Trending in 2014:
- dashboard cams
- cop cams
- home security cameras
- GoPro
- Google Glass
- stealth video glasses
- first person hyperlapse video
- quantified self
We are recording more and more of our personal lives, and we are starting to do it with passive devices that film whatever is in front of them.
What happens when the technology advances a few more steps? We will have:
- unobtrusive devices
- recordings that are streamed directly to the cloud and stored forever
- batteries that last all day
- high-resolution video and high-quality stereo audio
- a digital rights system that protects everyone's privacy
Privacy

We enter a new era of digital rights when people begin recording their everyday conversations. If you film me talking to you, you own the recording. You can play it for yourself whenever you want. But there would be terrible consequences if you could make it public without my permission, or hide it from me. Most communication would go off camera.
There has to be a way for the subjects of a video to control access to it.
How do we know who is in a video? The video is geo-coded. If the direction it is pointed in is also known, and the positions of the people nearby are known (since they are also creating geo-coded videos), then we can be accurate about who is in the video, and therefore who needs to be consulted before access is granted to people who were not there. Face recognition serves as a double-check.
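As a sketch of that geometric test (everything here is invented for illustration: the function names, the 30-meter range, the flat-earth shortcut), a frame tagged with the camera's position, compass heading, and field of view could be checked against the known positions of nearby people like this:

```python
import math

def bearing_deg(cam_lat, cam_lon, lat, lon):
    """Approximate compass bearing from the camera to a nearby point.
    A flat-earth approximation is fine at the tens-of-meters range
    that matters for deciding who is in frame."""
    d_north = lat - cam_lat
    d_east = (lon - cam_lon) * math.cos(math.radians(cam_lat))
    return math.degrees(math.atan2(d_east, d_north)) % 360

def in_frame(cam_lat, cam_lon, heading_deg, fov_deg,
             person_lat, person_lon, max_range_m=30.0):
    """Is a person plausibly visible in this geo-coded frame?
    heading_deg: compass direction the camera is pointed (0 = north).
    fov_deg: horizontal field of view of the lens."""
    # Rough distance in meters (1 degree of latitude is about 111 km).
    d_north_m = (person_lat - cam_lat) * 111_000
    d_east_m = (person_lon - cam_lon) * 111_000 * math.cos(math.radians(cam_lat))
    if math.hypot(d_north_m, d_east_m) > max_range_m:
        return False
    # Angular offset between the camera heading and the bearing to the person.
    offset = (bearing_deg(cam_lat, cam_lon, person_lat, person_lon)
              - heading_deg + 180) % 360 - 180
    return abs(offset) <= fov_deg / 2

# Anyone who passes this test is treated as a subject of the video;
# face recognition then confirms or rejects each candidate.
print(in_frame(37.7749, -122.4194, heading_deg=90, fov_deg=70,
               person_lat=37.77491, person_lon=-122.41925))
```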
Some principles:
- I can always access my own feed, no matter who is in it.
- I can access any feed that I am in.
- If I am the only person in my feed, I have complete control over its access by other people.
- If my feed contains other people, I cannot necessarily share it. I need permission from the other people.
- I can release any feed containing me to any given group. But it only becomes available to that group when everyone in the feed allows access by that group.
- Each person can set a default access level. If I'm going out, I may want everything to be public by default. For many people, that will be the default most of the time. But there's an easy way to switch to private whenever I want, which means any video that includes me will require my explicit permission to share.

People can decide to make portions of their feeds available to various groups, and the general public is one such group.
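A sketch of how these principles might be enforced, assuming each clip carries the list of people who appear in it and each person's per-group grants (all of the names and structures below are hypothetical, not a real system):

```python
def can_view(viewer, clip_owner, subjects, grants, group_members):
    """Decide whether `viewer` may see a clip.
    subjects: set of people who appear in the clip.
    grants: person -> set of groups that person has allowed for this clip
            (a person's default access level can pre-populate this).
    group_members: group name -> set of people in that group."""
    # I can always access my own feed, and any feed that I am in.
    if viewer == clip_owner or viewer in subjects:
        return True
    # Otherwise the clip is visible only through a group that the owner
    # and every subject have allowed, and that the viewer belongs to.
    everyone = set(subjects) | {clip_owner}
    shared = set.intersection(*(grants.get(p, set()) for p in everyone))
    return any(viewer in group_members.get(g, set()) for g in shared)

# Alice filmed a clip containing Alice and Bob.
grants = {"alice": {"public", "friends"}, "bob": {"friends"}}
groups = {"public": {"carol", "dave"}, "friends": {"carol"}}
print(can_view("carol", "alice", {"alice", "bob"}, grants, groups))  # True: both allowed "friends"
print(can_view("dave", "alice", {"alice", "bob"}, grants, groups))   # False: Bob never allowed "public"
```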
The movie of your life
First person videos are always lacking something -- the hero of the story is behind the camera, never seen. As recording becomes omnipresent, there will be images of you in feeds filmed by other people. Software will be able to stitch together the information in multiple videos to create something resembling a movie about your day, automatically selecting scenes with interesting content, showing you as you appear to the people you have encountered.
Hitchcock said "drama is real life with all the boring parts cut out". The same software that presents the interesting moments of your day can make the movie of your life, at various running times. Everybody may have a novel in them, but someone has to write it. Your life movie is generated automatically. All you have to do is wear the device.
Passive filming produces videos that are mostly dead space. No one watches them from start to finish, much less live. But they capture rare moments of interest that are missed by active recording. The times we pose for a photo and say "cheese" are significant but dull. We will stop interrupting our lives to take pictures and start picking key frames from video feeds instead.
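The cutting itself could start very simply. As a sketch (the scoring is assumed to already exist; in practice it is the hard part, combining motion, detected speech, and familiar faces), selecting the highlights of a day-long feed might look like this:

```python
def select_highlights(scores, keep=5, min_gap=10):
    """Pick the `keep` most interesting moments from a day-long feed.
    scores: list of (timestamp_minutes, interest_score) pairs.
    min_gap: skip moments closer than this many minutes to one already
             chosen, so the cut samples the whole day, not one event."""
    chosen = []
    for t, s in sorted(scores, key=lambda p: p[1], reverse=True):
        if all(abs(t - c) >= min_gap for c, _ in chosen):
            chosen.append((t, s))
        if len(chosen) == keep:
            break
    return sorted(chosen)  # play the selected moments in chronological order

# A toy day: mostly dead space, with a few spikes of activity.
day = [(minute, 0.05) for minute in range(0, 960, 15)]
day += [(125, 0.9), (126, 0.8), (480, 0.7), (700, 0.95)]
print(select_highlights(day, keep=3))  # [(125, 0.9), (480, 0.7), (700, 0.95)]
```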
Our collective perceptual universe

Taken together, these videos form a complete record of whatever is seen and heard by all the people wearing recording devices, and we are stipulating that this is everyone.
The vast majority of the information we receive is through sound and image. By a remarkable coincidence, these are also the two senses that can be captured and replayed at will. Or maybe it's not a coincidence. We could have developed methods to record and play back sequences of smell, or touch, or taste, but why? Odorama never caught on. Audio and video are enough.
Metaphysics

There is a much richer physical reality that causes the video and audio, one far more difficult to capture, but we are also stipulating that the recording devices are as good as our senses. For our normal human purposes, the deep physical reality below our awareness is not important. We have only our perceptions of it to work with to build the world of experience and ideas. The perceptual world is becoming tractable in a way the physical world below it cannot be. These feeds form the bedrock and raw data for any true description of the perceptual world.
Street View Video

The feeds available to you are stitched together into an interface like Street View. You will be able to pan and zoom around the map and inside buildings, now with a lot more granularity and the added dimension of time. You can sit at a cafe and watch the world go by all day, thanks to feeds from multiple patrons, or see a concert from the point of view of every audience member.
Street View Video will become, over time, a partial 3D model of the physical world. How do we combine multiple feeds? Street View does its best to morph one street photo into the next as you virtually walk down the street. The corresponding operation for video is a lot more complicated. The work on hyperlapse video applies here, extended to multiple feeds from multiple sources. Each new feed of the same event from a different perspective clarifies some part of the unified model.
Your complete history is a combination of your own feed and the feeds of everyone around you. The 3D model gives out at the edges. You can't go watch that tree falling in a forest unless someone was there to see it.
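The full multi-feed problem is far beyond a few lines, but the basic pairwise step, relating two views of the same scene, is well understood. A sketch using OpenCV (my choice for illustration; nothing here commits to a particular library), given two grayscale frames from different wearers:

```python
import cv2
import numpy as np

def align(frame_a, frame_b, min_matches=10):
    """Estimate the homography relating two frames of the same scene.
    frame_a, frame_b: grayscale frames as uint8 numpy arrays."""
    orb = cv2.ORB_create(1000)                       # local feature detector
    kp_a, des_a = orb.detectAndCompute(frame_a, None)
    kp_b, des_b = orb.detectAndCompute(frame_b, None)
    if des_a is None or des_b is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    if len(matches) < min_matches:
        return None                                  # the views barely overlap
    src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC keeps the static scene and discards points on moving people,
    # giving the mapping a Street View-style morph between viewpoints needs.
    homography, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return homography
```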
Information extraction

We can also make symbolic models using this raw data, recognizing speech and faces across multiple videos. Individuals would be represented by point locations, and speech by text. This leaves behind most of the detail at the raw feed level in favor of a more compact model that's easier to navigate and contains much of the interesting information. You might explore at this level, then dive down at times into the raw feeds for the actual video.
The software that edits out the boring stuff will rely heavily on recognized speech and natural language processing methods like text summarization and named entity recognition.
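As a small example of what that looks like, a few lines of spaCy (my choice here; no particular tool is implied, and the transcript is made up) pulling the who/where/when out of a stretch of recognized speech:

```python
import spacy

# A small English pipeline (assumes `python -m spacy download en_core_web_sm`).
nlp = spacy.load("en_core_web_sm")

transcript = ("We met Maria outside the theater around noon "
              "and talked about the trip to Portland in March.")

doc = nlp(transcript)
# Named entities are the hooks for indexing and summarizing a feed.
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Maria PERSON, Portland GPE, March DATE
```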
Speech

The meaning of a word is only approximated by a dictionary definition. Meaning is determined by use. Which use? Some uses are more equal than others, but the full story of a word would include every use. Every time someone has uttered or written a word, they contribute some epsilon of meaning to it. Web-based written text allows this kind of analysis now; the new thing is that we can now see the much larger corpus of speech as well.
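A toy version of that idea is easy to state with word embeddings: train vectors on transcribed utterances so that words used in similar contexts land near each other, and every additional use nudges the vectors a little. The gensim library and the four-sentence corpus are only stand-ins:

```python
from gensim.models import Word2Vec

# Each utterance arrives as a list of tokens; the real corpus would be
# everything anyone says on camera, not four sentences repeated.
utterances = [
    ["grab", "a", "coffee", "before", "the", "meeting"],
    ["the", "coffee", "at", "that", "cafe", "is", "terrible"],
    ["meet", "me", "at", "the", "cafe", "after", "lunch"],
    ["lunch", "ran", "long", "so", "we", "skipped", "the", "meeting"],
] * 50

model = Word2Vec(utterances, vector_size=32, window=3, min_count=1,
                 epochs=20, seed=1)

# A word's neighbors in this space are an approximation of its use.
print(model.wv.most_similar("coffee", topn=3))
```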
Levels of summarization

At a much higher level, there is the News, where reporters write descriptions of events by summarizing the feeds of the participants. Summarization at this level is beyond the software I am imagining.
Beyond News is News Analysis and finally History, which summarizes thousands of news reports.
All levels of summarization will be traced back to the raw data they describe. In the future, there are no more arguments about facts, only about interpretations.
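One way to make that traceability concrete is for every statement, at every level, to keep pointers to the statements or feed segments it summarizes, so a reader can drill from History down to raw video. A sketch with invented names:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FeedSegment:
    feed_id: str          # whose recording
    start: float          # seconds into the feed
    end: float

@dataclass
class Statement:
    text: str
    level: str                                    # "news", "analysis", "history"
    sources: List["Statement"] = field(default_factory=list)
    evidence: List[FeedSegment] = field(default_factory=list)

    def raw_evidence(self):
        """Walk down through the levels to the raw feed segments."""
        segments = list(self.evidence)
        for source in self.sources:
            segments.extend(source.raw_evidence())
        return segments

# A news sentence cites raw feeds; a history sentence cites news sentences.
news = Statement("The council meeting ended in a walkout.", "news",
                 evidence=[FeedSegment("feed:alice", 3600.0, 3720.0)])
history = Statement("Local government gridlock worsened that year.", "history",
                    sources=[news])
print(history.raw_evidence())     # the underlying segment(s) of raw video
```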
It is amazing that this is in the realm of possibility. The perceptual world is huge but finite and we may someday have enough storage for the whole thing.
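A back-of-envelope check on "huge but finite", with numbers that are only assumptions (population, waking hours, a modest video bitrate):

```python
people = 8e9                 # assumed world population
hours_per_day = 16           # waking hours recorded
bitrate_bps = 5e6            # ~5 Mbit/s: reasonable 1080p video plus stereo audio

seconds_per_year = 3600 * hours_per_day * 365
bytes_per_person_year = bitrate_bps / 8 * seconds_per_year
total_per_year = people * bytes_per_person_year

print(f"per person per year: {bytes_per_person_year / 1e12:.1f} TB")   # ~13 TB
print(f"everyone, per year:  {total_per_year / 1e21:.0f} ZB")          # ~100 ZB
```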