Tuesday, December 14, 2010

Is Everything Miscellaneous?

     Been reading "Everything is Miscellaneous" by David Weinberger recently. It's pretty good; it helped me understand some things about the Web/Internet that I hadn't understood, or hadn't appreciated, before. One thing is Wikipedia. He showed how, when its community-generated entries are controversial and fought over, the community eventually comes to a consensus on the entry and the vocabulary used to describe it. Of course, this pushed my centrist, compromising, consensus-builder buttons big time. But even after slowing down and thinking it over, I still agree that this technique will produce valid knowledge at least as often as reliance on experts does. God knows "experts" have pulled some amazing stunts in the past; that's a source of material for lots of web sites and stand-up comedy routines. But Weinberger, like most writers on new technology, seems to take the technology he writes about (the WWW) as the only source of all truth, beauty and goodness. He seems to me to say that in this brave new world, knowledge itself is different and better. He throws around the phrase "third order," short for third order of knowledge, with great gusto; the Web has made all things new.

     Not that Third Order doesn't make a lot of sense. The first two orders of knowledge can be summarized as data (descriptions of the real world) first, and metadata (data describing data) second. Both orders have definite limitations. Weinberger's favorite example of metadata is the Dewey Decimal Classification, especially as implemented in the good old card catalog. Classification and catalog, while vast improvements over simple lists of books, can only be so useful. He describes their limits as inevitable consequences of their physicality, and I find it a persuasive argument. The card catalog is especially convincing. When I was young it was how you found anything in the library, and it had limits. Subject terms have to be assigned by an educated professional, and if too many books get too many subject terms, the catalog swells to unusability. Conversely, if you keep the catalog to a manageable size, you inevitably leave out a lot of useful information. Plus, having studied indexing myself, I know that the best-educated expert in the world can't anticipate everyone's needs.

     It is certainly true that new information technology has enabled a drastic increase in what is possible and what is practical. On the Internet we are not hobbled by many of the physical limits on how we link things, and how we label them. But he sometimes seems to me too rapturous about what these changes and new powers mean. He almost seems to say that reality itself has been re-created. To be fair, he explicitly denies that in the last chapter, but his re-definition of "knowledge" is still much too sweeping for my taste.

     Probably my biggest beef with the book is the chapter in which he states that meaning is now a social process. I don't buy that, at least not without a whole lot of nuance and qualification. It reminded me of a humorous article from many years ago, when I was writing SPSS jobs for UD faculty research projects. SPSS stands for Statistical Package for the Social Sciences. It had commands to perform statistical calculations specially designed for social science research. You would usually start with a GET FILE command to fetch the file of data you wished to analyze. Surveys and other such data-gathering instruments usually have gaps, where for instance a respondent declined to answer a question. In such cases you would enter a special number, then tell the system to treat occurrences of that number as missing, not real data. The command ASSIGN MISSING did this.
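     For anyone who never met old SPSS, the same idea survives in today's tools. Here is a minimal sketch in Python with pandas, not real SPSS syntax; the file name "survey.csv" and the sentinel value 99 are invented for illustration.

        import numpy as np
        import pandas as pd

        # Fetch the data file -- the modern counterpart of GET FILE.
        # ("survey.csv" is a made-up name for illustration.)
        df = pd.read_csv("survey.csv")

        # Suppose non-responses were keyed in as 99 (an assumed sentinel).
        # Replacing the sentinel with NaN plays the role ASSIGN MISSING
        # played: downstream statistics skip those cells instead of
        # treating 99 as a real answer.
        df = df.replace(99, np.nan)

        # Means, correlations, and the like now ignore the missing entries.
        print(df.mean(numeric_only=True))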

     The article started out by observing that sometimes, sociological research just doesn't go well. You blow through a whole pile of grant money, and nothing correlates, regresses, or lines up with anything else. So its author invented some new SPSS commands for when this happens. Instead of GET FILE, use FAKE FILE. You enter the variables and the coefficients you want, and the system generates a data set where everything fits the way it should. Way easier and cheaper, right? But this is a data set with no meaning. So, use the ASSIGN MEANING command. After each variable name, enter a description of its meaning. There; perfect research every time. Of course, the journal editors or referees might balk at "fake data." So you describe your data set as "stochastically inferred data." Problem solved!
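     The joke lands harder when you see how little code FAKE FILE would actually take. Here is a toy sketch in Python, with every name, coefficient, and "meaning" invented, and obviously not real SPSS:

        import numpy as np

        def fake_file(n_rows, coefficients, noise=0.1, seed=0):
            """Generate predictors plus an outcome that fits the given
            coefficients almost perfectly -- the spirit of FAKE FILE."""
            rng = np.random.default_rng(seed)
            X = rng.normal(size=(n_rows, len(coefficients)))
            y = X @ np.asarray(coefficients) + rng.normal(scale=noise, size=n_rows)
            return X, y

        # The coefficients we "want to find."
        X, y = fake_file(500, coefficients=[2.0, -1.5, 0.75])

        # ASSIGN MEANING, more or less: labels bolted on after the fact.
        meanings = ["years of schooling", "commute time", "income"]

        # The "research" now checks out: least squares dutifully recovers
        # the numbers we put in.
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        print(dict(zip(meanings, beta.round(2))))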

     I can't hear anything about "assigning meaning" without flashing back to that article, and I just can't take the concept seriously. But this post is long enough; maybe I'll ruminate sometime about meaning, and what it means.