December 12, 2011
For a long time now, I’ve wanted to write a computer program that can take a film of me roaming around town and automatically produce a 3D model of the area based on the footage alone. I found a video on YouTube where researchers did just that.
Isn’t that awesome? I can write code that does the same thing, but what I want to know now is how to separate objects from one another. For example, your mind would automatically see the statue, the sidewalk, and the buildings as separate things. To the computer algorithm, it’s all connected: just one stream of points located in space. It doesn’t “know” what those points represent. It doesn’t say, “This block of points over here represents the civil engineering building, these other ones a statue, whereas these over here are the sidewalk extending toward the central park.”
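To make the "one stream of points" problem concrete, here is a minimal sketch of what a purely geometric approach can do. It is not the researchers' method, just a naive distance-based clustering I'm using for illustration, and the toy scene, the `cluster_points` name, and the `radius` value are all made up. Points that sit close together end up in the same group, and nothing more is understood about them.

```python
from collections import deque
from math import dist


def cluster_points(points, radius):
    """Label each point with a cluster id. Two points share a cluster when a
    chain of neighbors, each within `radius` of the next, connects them."""
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue  # already assigned to an earlier cluster
        labels[start] = next_label
        queue = deque([start])
        while queue:
            i = queue.popleft()
            # Brute-force neighbor search; fine for a toy scene.
            for j, candidate in enumerate(points):
                if labels[j] is None and dist(points[i], candidate) <= radius:
                    labels[j] = next_label
                    queue.append(j)
        next_label += 1
    return labels


# Hypothetical toy scene: (x, y, z) points, spaced 0.5 units apart.
statue = [(0.0, 0.0, 0.5 * k) for k in range(5)]          # a vertical column
building = [(10.0, 0.5 * k, 0.0) for k in range(5)]       # a wall far from the statue
sidewalk = [(10.0 - 0.5 * k, 0.0, 0.0) for k in range(1, 4)]  # starts at the wall

scene = statue + building + sidewalk
labels = cluster_points(scene, radius=0.6)
print(labels)
```

In this toy scene the statue comes out as its own cluster because it stands far away from everything else, but the sidewalk gets the same label as the building simply because their points touch. Distance alone can split things that are far apart, but it has no idea that a sidewalk and a building are different kinds of things.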
The more I look into this problem, the more difficult it gets. For example, how would an algorithm know whether the ramp in front of a building is part of the building or just a structure nearby? How would it know to group them together? It’d have to know the purpose of ramps, and know that people with disabilities need them in order to get inside the building. Knowing how to separate the world into objects will be really complicated, requiring a deep understanding of humans, their desires, and how the world works. It would require learning about objects and what role they play in life. That takes the problem to a whole new level. I still have a lot to learn about how our mind works, but I’m making progress!