Data and Information

Data is everywhere in abundant amounts. Surveillance cameras continuously capture video; every time you make a phone call, your name and location get recorded; your clicking pattern is often recorded when you surf the web; most financial transactions are recorded; satellites and observatories generate terabytes of data every year; the FBI maintains a DNA database of most convicted criminals; and soon all written text from our libraries will be digitized. Need I go on?

But data in itself is useless. Hidden inside the data is valuable information. The objective of machine learning is to pull the relevant information from the data and make it available to the user. What do we mean by “relevant information”? When analyzing data, we typically have a specific question in mind, such as “How many types of car can be discerned in this video?” or “What will the weather be next week?”. So the answer can take the form of a single number (there are 5 types of car), a sequence of numbers (the temperature next week) or a complicated pattern (the cloud configuration next week). If the answer to our query is itself complex, we like to visualize it using graphs, bar plots or even little movies. But one should keep in mind that the particular analysis depends on the task one has in mind.

Let me spell out a few tasks that are typically considered in machine learning:

Prediction: Here we ask ourselves whether we can extrapolate the information in the data to new, unseen cases. For instance, if I have a database of attributes of Hummers, such as weight, color and number of passengers, and another database of attributes of Ferraris, then one can try to predict the type of car (Hummer or Ferrari) from a new set of attributes. Another example is predicting the weather (given all the recorded weather patterns in the past, can we predict the weather next week?) or stock prices.


Interpretation: Here we seek to answer questions about the data. For instance, what property of this drug was responsible for its high success rate? Does a security officer at the airport apply racial profiling in deciding whose luggage to check? How many natural groups are there in the data?

Compression: Here we are interested in compressing the original data, that is, reducing the number of bits needed to represent it. For instance, files on your computer can be “zipped” to a much smaller size by removing much of the redundancy in those files. Also, JPEG and GIF (among others) are compressed representations of the original pixel map.

All of the above objectives depend on the fact that there is _structure_ in the data. If data is completely random, there is nothing to predict, nothing to interpret and nothing to compress. Hence, all tasks are somehow related to discovering or leveraging this structure. One could say that data is highly redundant and that this redundancy is exactly what makes it interesting. Take the example of natural images. If you were asked to predict the color of the pixels neighboring some random pixel in an image, you would be able to do a pretty good job (for instance, 20% of an image may be blue sky, and predicting the neighbors of a blue-sky pixel is easy). Also, if we were to generate images at random, they would not look like natural scenes at all. For one, they would not contain objects. Only a tiny fraction of all possible images looks “natural”, and so the space of natural images is highly structured.
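The link between structure and compressibility can be checked with a few lines of code. The sketch below is only an illustration and assumes Python's standard `zlib` module, which implements DEFLATE, the algorithm commonly used inside zip files. It compresses a highly redundant byte string and an equally long string of random bytes. The helper name `compression_ratio` and the example phrase are hypothetical choices made for this demonstration.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means more compressible).

    Hypothetical helper for illustration; uses zlib at its highest compression level.
    """
    return len(zlib.compress(data, 9)) / len(data)


# Structured data: a short phrase repeated many times, so it is highly redundant.
structured = b"the sky is blue. " * 1000

# Unstructured data: the same number of bytes drawn at random, with no redundancy to exploit.
random_bytes = os.urandom(len(structured))

print(f"structured: {compression_ratio(structured):.3f}")
print(f"random:     {compression_ratio(random_bytes):.3f}")
```

The repeated phrase shrinks to a tiny fraction of its size. The random bytes do not shrink at all, and in practice come out marginally larger because of the compressor's own bookkeeping. With no structure, there is nothing for the compressor to remove.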

Thus, all of these concepts are intimately related: structure, redundancy, predictability, regularity, interpretability, compressibility. They are the “food” for machine learning: without structure there is nothing to learn. The same is true for human learning. From the day we are born, we start noticing that there is structure in this world. Our survival depends on discovering and recording this structure. If I walk into this brown cylinder with a green canopy, I suddenly stop; it won’t give way. In fact, it damages my body. Perhaps this holds for all these objects. When I cry, my mother suddenly appears. Our game is to predict the future accurately, and we predict it by learning its structure.
