Thursday, 2 October 2014

Data needs information

I was going to include this as a footnote in a document I'm writing, but it didn't seem necessary, so I'll put it here.

It can be useful to distinguish between data and information in order to demonstrate that data is not information and that you need to extract meaning from data to get information. However, data needs information in the same way that information needs data. Data presupposes at least the potential for finding meaning and therefore information. Stuff would not be data if there were no chance of meaning ever being extracted from it. It would just be: stuff. We wouldn’t call it data.


Tony Hirst said...

I started thinking that information is contextualised data. This is a crude draft from TM351:

data is a collective noun that refers to one or more raw facts or pieces of what we might describe as raw or decontextualised information (“information without context”). When presented in a particular context, data may become useful as information. Data may be represented in a standardised way (a data representation, data format or data interchange format) to facilitate its processing or interchange between different processing systems. Data may be stored collectively within a dataset or a database. A single item of data is usually referred to as a “data point”, a “data item” or a “data element”. Data itself is often presented in a decontextualised form, although that is not to say it is without context: for example, there is always some context arising from the way it was collected, from the original purposes for which it was collected, from the original interpretation that the person collecting the data identified as part of the rationale for collecting just that data in just that way, from the way it is stored (making certain forms of retrieval easier, or harder, to achieve) and from the way it is presented.

There is an issue, of course: any data point is political in the sense that it was collected in a particular way for a particular purpose, and this metadata contextualises it in a particular way.

I'm a little stuck on what the 'context' might be that leads us to interpret data as information. Perhaps it's that the data is interpreted in a way that means we understand something about the world in one way rather than another. For example, 6' may be a piece of data, and the relation height(A,6') the context that gives us the information that A is 6' high. (This may or may not be true. If it isn't, is it still information? What if A is actually 5' high?) Also, is it 'more informative' if we know that 6' is relatively large or small compared to some other set of heights? Is that different information?

onesecbeforetheend said...

"Data" is the plural of "datum," the Latin version of the Greek "dedomenon," which actually means "given." An etymological approach, at least, would support the claim that "stuff could be data." What's an example of stuff from which no meaning could ever be extracted? Even if one mentions such examples, the very fact that they are mentioned in the discussion will magically turn them into semanticized information. They even get information from Big Data garbage bins nowadays. :) Fantastic blog. I loved the fact that I finally disagreed with something posted here, so I'm commenting to express my respect as well!

David Chapman said...

Sorry for the delay in responding to these comments.

Yes, I agree, onesec, in an absolute sense, and that line of argument leads to the infosphere of Luciano Floridi, in which the whole of reality is information.

I keep coming back to the language of maps (models, abstractions) and territories (reality, data, stuff).
It is impossible to deal with the territory, so we use maps. For any given purpose we have a map, and abstracting from the territory gives us entities on the map. The stuff in the territory that becomes entities on the map we call the data, and since no map encompasses everything in the territory (you can’t have a 1:1 scale map), there is stuff in the territory that is not data *for that map*. However, since we can’t deal with the territory, only with maps, stuff that never gets represented on any map can’t be talked about, and so doesn’t exist in any meaningful sense.

I can’t quite decide whether I want to call the entity on the map ‘information’, or to call the process of generating/extracting the entity ‘information’.

Tony, perhaps this is all a bit too abstract for TM351.

onesecbeforetheend said...

Thanks for the response David!
Yes, in the sense of the map/territory relation, I suppose I agree. Data as territory then becomes a micro-metaphysics, semantic meaning passes to a mid-level metaphysics, and the potential virtualities of our mappings become a Deleuzoguattarian deterritorializing "line of flight", or a macro-metaphysics.

Of course there's always Bateson's "difference that makes a difference", which helps guide us within the threshold between abstracted realities and "real" raw data.

All best :)