Thread: Big Data

  1. #11
    meh Salomé's Avatar
    Join Date
    Sep 2008
    MBTI
    INTP
    Enneagram
    5w4 sx/sp
    Posts
    10,540

    Default

    In an enthusiastic kind of way.
    Quote Originally Posted by Ivy View Post
    Gosh, the world looks so small from up here on my high horse of menstruation.

  2. #12
    null Jonny's Avatar
    Join Date
    Sep 2009
    MBTI
    FREE
    Posts
    2,486

    Default

    Ah. I'll admit, I do sometimes get lost in my excitement. That isn't to say that I won't step back from the experience to reflect upon my own biases and/or seek advice from a (relatively) neutral party.

  3. #13
    meh Salomé's Avatar
    Join Date
    Sep 2008
    MBTI
    INTP
    Enneagram
    5w4 sx/sp
    Posts
    10,540

    Default

    Well, that's disappointing.

    Here was me all convinced you were the alpha and omega on the topic - what with you filming yourself in front of a whiteboard full of equations and all.

  4. #14
    null Jonny's Avatar
    Join Date
    Sep 2009
    MBTI
    FREE
    Posts
    2,486

    Default


  5. #15
    meh Salomé's Avatar
    Join Date
    Sep 2008
    MBTI
    INTP
    Enneagram
    5w4 sx/sp
    Posts
    10,540

    Default

    I'm teeeeeeeasing you.
    Damn, JohnBoy, how did you get all the way to being you without developing a sense of humour?

  6. #16
    Ginkgo
    Guest

    Default

    Regardless of the amount of data, I think it's a good idea to consciously take courses of action as if a statistical representation reflects a particular object or person, as long as the statistical representation is all across the board. It's convenient for companies that can easily monitor their entire consumer base. However, I also think they should continue to observe and take note if their predictions are wrong, rethink things, and make deductions in order to determine what they're really looking for in a product, consumer, or what have you. Over time, the statistics ought to refine themselves to cope with initial ignorance. We're never going to be omnipresent, but we're always going to have tabs on how we're willing to place bets, so to speak.

  7. #17
    meh Salomé's Avatar
    Join Date
    Sep 2008
    MBTI
    INTP
    Enneagram
    5w4 sx/sp
    Posts
    10,540

    Default

    Are you sure you're not INFJ?

  8. #18

    Default

    Quote Originally Posted by Jonnyboy View Post
    Too lazy to type... Here's this:

    Interesting to get your thoughts on this Jonny.

    But there are some counterpoints I wanted to bring up.

    1) The Big Data community itself is a source of the problem. In some sense, I consider myself part of this community. I believe machine learning, statistical analysis, and high-performance computation are going to be major parts of my career. My university is working on a Big Data initiative (they want to create a Data Science Institute). But parts of the foundational document for this initiative contain the very ethos you claim is not part of Big Data itself: "Let the data speak for themselves".

    You can read about some of the bold (bordering on nonsensical) claims that some key figures in the Big Data community make themselves.

    For instance: http://research.microsoft.com/en-us/...omplete_lr.pdf
    ^Here (in the second essay), Jim Gray talks about "A Transformed Scientific Method".

    He proposes that the "theoretical branch" of science only started a few hundred years ago. I wonder if he ever read Euclid, or knew about Babylonian mathematics. He then claims that the "empirical branch" of science started thousands of years ago...does he think we learned to forage, and hunt by magic? He then claims that computational science just started a decade ago...perhaps not familiar with the abacus.

    Of course, this is Jim Gray's hook into getting people to listen to him. He goes on to talk about the fact that software is becoming an increasingly large part of experimental budgets, how data management and cross-linking is becoming more important and so on. Many of the issues he brings up are real and important.

    However, does this excuse the nonsensical amount of hype he brings to these endeavors?

    Chris Anderson, a former editor at both Nature and Science, two of the highest-impact scientific journals in the world, wrote this article in WIRED magazine. It is titled "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete", and he concludes it with the following statement: "Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all."

    ^If that isn't overhyped, I don't know what is.

    Even Nate Silver (who states early in his book The Signal and the Noise that he didn't agree with Chris Anderson) has his own version of the "ideology of no ideology". He calls it "being Foxy". In his ideology (or Big Idea, if you will), he groups the following traits together, without presenting supporting data, mind you: "Multi-Disciplinary, Adaptable, Self-Critical, Tolerant of Complexity, Cautious, Empirical, Better Forecasters". He calls people with these characteristics "Foxes", who he claims use a lot of little ideas instead of big ones. He groups these other traits together: "Specialized, Stalwart, Stubborn, Order Seeking, Confident, Ideological, weaker forecasters". He calls the people with these characteristics "Hedgehogs", who want to explain things with some "Big Ideas".

    I wonder if the irony of this section of his book is lost on him. He certainly makes many good points in his book (which he ironically calls "principles"), about thinking probabilistically (too bad he didn't do this with his Fox-Hedgehog thing), adjusting forecasts with new data, looking for consensus, and to beware "magic bullet" forecasts. I think we are so far away from any semblance of mechanistic understanding in the social sciences, that this is probably a good set of temporary rules for forecasting elections or economic shifts. But does he, in some sense a specialist in the social sciences, believe that using Occam's Razor and avoiding ad-hoc hypothesizing are no longer valuable in the physical sciences?

    With that, I want to highlight the perils of "letting the data speak for itself".

    2) You are assuming you are able to adjust quickly enough to new correlations showing up. I am thinking specifically of the realm of biology and medicine. We are very quickly creating many, many biomarkers for diseases. The reliance on these biomarkers is based mainly on correlations. The data itself often relies on assays that are fairly automated. What happens if the company making the kits for these assays screws up? Did the correlations actually help the people who got individual poor diagnoses, or hurt them?

    3) Statistics work on groups, not individuals. Who would you trust more to diagnose you specifically, a pathologist or an epidemiologist? If you've seen the House episode called Three Stories, there is a great moment when one of the students he is lecturing to "diagnoses" the problem with a patient's leg based on statistics of people with leg problems. I think this moment illustrated quite well how the very questions a person with mechanistic understanding (in this case House) would ask differ from the questions someone with a correlation-based understanding (the student) would ask.

    4) All statistics assume a prior model, even rank-based statistics. There is no getting away from modeling. You literally have no choice.

    5) You have more information from common sense than you realize, and therefore blindly assuming a default prior could be very wrong. Famously, Enrico Fermi asked his class "How many piano tuners are there in Chicago?" to get them to realize that they can guess some order of magnitude by making assumptions explicit and following what that would imply. More importantly, how would you even begin to have a sense of the scale that your prior distribution should span without incorporating common sense? Here, you can get away from theory. But you do so at your peril.
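
To make the piano-tuner point concrete, here is a rough sketch of how such an estimate could go. Every number in it is an assumed round figure chosen for illustration, not data, and the function name is hypothetical. The point is only that writing the assumptions down gives you an order of magnitude and a sense of the range a prior ought to span.

```python
# Fermi-style estimate: how many piano tuners might a city like Chicago support?
# All default values below are ASSUMED round numbers, not measurements.

def fermi_piano_tuners(
    population=3_000_000,          # assumed number of people in the city
    people_per_household=2.5,      # assumed average household size
    pianos_per_household=1 / 20,   # assumed: one household in twenty has a piano
    tunings_per_piano_per_year=1,  # assumed: each piano is tuned once a year
    tunings_per_tuner_per_day=4,   # assumed workload of one tuner
    working_days_per_year=250,     # assumed working days
):
    households = population / people_per_household
    pianos = households * pianos_per_household
    tunings_needed = pianos * tunings_per_piano_per_year
    tunings_per_tuner = tunings_per_tuner_per_day * working_days_per_year
    return tunings_needed / tunings_per_tuner


estimate = fermi_piano_tuners()

# Halving or doubling a single shaky assumption brackets the answer.
low = fermi_piano_tuners(pianos_per_household=1 / 40)
high = fermi_piano_tuners(pianos_per_household=1 / 10)

print(f"central guess: about {estimate:.0f} tuners")
print(f"plausible range: about {low:.0f} to {high:.0f}")
```

Nobody should trust the central number, but it is hard to push the answer to 5 or to 5,000 without making one of the assumptions look silly. That is exactly the common-sense information a "default" prior throws away.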

    Accept the past. Live for the present. Look forward to the future.
    Robot Fusion
    "As our island of knowledge grows, so does the shore of our ignorance." John Wheeler
    "[A] scientist looking at nonscientific problems is just as dumb as the next guy." Richard Feynman
    "[P]etabytes of [] data is not the same thing as understanding emergent mechanisms and structures." Jim Crutchfield

  9. #19
    meh Salomé's Avatar
    Join Date
    Sep 2008
    MBTI
    INTP
    Enneagram
    5w4 sx/sp
    Posts
    10,540

    Default

    ^Agree with all of that. But then, I don't really know anyone working "at the coal-face" so to speak, who doesn't think this way.

    Couple of things to add. First, despite the hype, data-mining is nothing new; it's been around for decades. The thing that is feeding the "Big Data" frenzy is the ever-lower cost of storage and processing, which makes feasible the collection of unimaginably vast quantities of the stuff by unremarkable organisations - often without a pre-existing business case. This is an unmitigated headache for infrastructure managers. In this sense (and only this sense), it's unprecedented. It's a kind of unthinking devotion to data for data's sake, which is nonsensical. It reflects the fact that CIOs (or whoever is in charge of the IT budget) are more swayed by wild rhetoric than good judgement or understanding of technology.

    More isn't better. Ever heard of not being able to see the wood for the trees? It's a problem with unmanageably huge datasets, with this new requirement to capture absolutely everything and store it somewhere, just in case a future use might be found. I won't go into the technical difficulties and pitfalls and related security risks. Suffice it to say they are legion. And uranium is not a bad analogy (also @Jonnyboy, "intrinsic" doesn't mean what you think it means...).

    Then we turn to the uses themselves. Which broadly fall into two camps, commercial exploitation and government snooping. Privacy legislation cannot keep pace with the kinds of changes in technology which mean your supermarket might know you're pregnant before you do. I find such implications chilling. And I'm no Luddite.

    Erik Larson is reputed to be the first person to have used the phrase, in a 1989 piece for Harper's Magazine that was reprinted in The Washington Post: "The article begins with the author wondering how all that junk mail arrives in his mailbox and moves on to the direct-marketing industry. The article includes these two sentences: 'The keepers of big data say they do it for the consumer's benefit. But data have a way of being used for purposes other than originally intended.'"

    At the beginning of last year, I was asked to work on a project which would enable a large media organisation to identify the demographics of its audience to a fine-grained degree so that it could sell "premium" advertising space accordingly. We are all familiar with targeted marketing online - Amazon's "hey, people like you also like this!" or Google's creepy parsing of your gmails to draw your attention to yet more worthless shit. But in the near future, you won't have any choice about the adverts you see - you'll only be shown those products that some analyst has pre-judged you to be "fit" for. I hardly need join the dots as to the potential for exploitative reinforcing of stereotypes, advertising "ghettos" etc, etc (shades of "Minority Report").
    Another project I was approached about involved every telecoms provider in the country being forced by the government to expose their databases to interrogation by algorithms designed to sniff out "terrorist activity" in the run-up to the Olympics. Yet another is too commercially sensitive to discuss here but is similarly questionable ethically. Big Data is Big Brother, with a more insidious agenda.

    Don't believe the hype that Big Data 'lets the data speak', free of pre-existing assumptions. No such thing is possible. Instead, to paraphrase Daniel Dennett, "The baggage gets on board without being inspected."

  10. #20
    null Jonny's Avatar
    Join Date
    Sep 2009
    MBTI
    FREE
    Posts
    2,486

    Default

    Yeah, I know what intrinsic means, and I knew I went off course while I was talking about it in the video. The issue was your statement that "[Big data is] not Information. It has no intrinsic value." I linked the two sentences, and inferred from them that you were implying that in order for something to be information, it must have intrinsic value (or something like that). I contend that big data is information, simply information that has to be processed more than most information we utilize. No information has intrinsic value; its value comes from how a person uses it. If you disagree, I'd appreciate an example of information with intrinsic value; that way, I can get a clear understanding of where you draw the line at something having value for its own sake.

    Sorry for the confusion; one of the problems with talking off-the-cuff.


    @ygolo I'll address your claims point by point:


    The Big Data community itself is a source of the problem.

    I agree. The problem isn't Big Data, but how people use it. I stated this in the video.


    You are assuming you are able to adjust quickly enough to new correlations showing up.

    Am I? I don't recall assuming that.


    Statistics work on groups, not individuals.

    I agree wholeheartedly. I don't pour my milk and cereal into a shoe if I have a bowl available, and I don't use statistics to make claims about an individual when I can simply talk with the person. As I stated in the video, Big Data is good insofar as we use it to guide our intuition about the world.


    All statistics assume a prior model, even rank-based statistics.

    I agree. This is rooted in the very way we understand the world. We cannot think or comprehend without constructing our own mental models of our environment, so how could we ever hope to utilize data without doing the same?


    You have more information from common sense than you realize, and therefore blindly assuming a default prior could be very wrong.

    I think everyone does, and it would be foolhardy to take a shot in the dark when setting assumptions. But this is entirely pragmatic in nature; we simply do not have time to test every possibility that could ever exist. We have to start somewhere, and using prior knowledge to guide us often saves time.
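
For what it's worth, here is a small sketch of how much the starting assumption can matter, assuming the textbook setup of a Beta prior on a rate with binomial data. The counts and the prior parameters are made up for illustration, and the function name is hypothetical.

```python
# How much does the choice of prior matter? A Beta-Binomial sketch.
# With a Beta(a, b) prior and s successes in n trials, the posterior is
# Beta(a + s, b + n - s), whose mean is (a + s) / (a + b + n).

def posterior_mean(successes, trials, prior_a, prior_b):
    """Posterior mean of a binomial rate under a Beta(prior_a, prior_b) prior."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


flat_prior = (1, 1)       # "default" uniform prior
informed_prior = (2, 18)  # ASSUMED prior knowledge: the rate is probably near 10%

# Hypothetical small sample: 3 successes in 10 trials.
small_flat = posterior_mean(3, 10, *flat_prior)
small_informed = posterior_mean(3, 10, *informed_prior)

# Hypothetical large sample with the same observed rate: 300 in 1000.
large_flat = posterior_mean(300, 1000, *flat_prior)
large_informed = posterior_mean(300, 1000, *informed_prior)

print(f"n=10:   flat {small_flat:.3f} vs informed {small_informed:.3f}")
print(f"n=1000: flat {large_flat:.3f} vs informed {large_informed:.3f}")
```

With only ten observations the two priors give quite different answers; with a thousand they nearly agree. So the prior is where common sense earns its keep when data is scarce, and it matters less as the data piles up.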
