April 17, 2011
I’ve been hard pressed to determine how I want to spend my summer. I thought about doing physics research, but I decided I wanted to take some time and research some things on my own. That brings me to everything I’ve been wanting to study and learn. I have a huge pile of books here to read, and I’m going to go through some of them over the summer. Most of them are related to neuroscience and the mind. They’re detailed looks into how the brain processes different types of information and represents things. Also, I plan to spend a lot of hours studying quantum mechanics, which I still feel I don’t fully understand – and who knows if I ever will. But there have been some technological innovations which have been … well, distracting me!
One of my dreams has been to reverse-translate 2D images into their corresponding 3D environments. For example, I’d love to build a little robot which has two camera eyes, knows that it’s in a 3D environment, is aware of what’s around it and how those things are moving, and can successfully drive around without bumping into anything. I used to spend my time thinking about how that process of going from 2D images to 3D geometry would work, and I found it immensely difficult. In fact, the greatest minds and philosophers have been working on the problem for a very, very long time (hundreds of years). Well, about a year and a half or so ago I discovered Vision Science, and later, Computer Vision.
I’ve acquired a little library of computer vision books and have been trying to find time to study through them. Now I’m finding out that developers have released open source libraries of computer vision technology which do exactly what I want! Just last night I downloaded OpenCV (Open Source Computer Vision) and installed it on my computer. I developed a few simple applications with it using Visual Studio, including an application that supported facial recognition. I integrated it with my webcam and it identified my face as I moved about on the screen. I was practically screaming out, “Yes! This is what I’ve wanted for ages!”
What I’m thinking of doing for my personal robot is to first make the simplest robot possible. I’ll attach two webcams to a laptop and I’ll walk around the house with it and see if it successfully builds a 3D environment, properly separating objects from one another. For example, it will build an environment of my kitchen, with the chairs, table, and so on, all located in the proper locations and orientations. Next I want to make sure that it separates the objects properly, knowing that the chairs are separate from the table, and so forth.
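To give a rough sense of how two webcams can recover depth, here is a minimal sketch of stereo triangulation in plain Python. It assumes two identical, perfectly aligned cameras mounted side by side and a point that has already been matched in both images. The function names, the focal length, the camera spacing, and the pixel coordinates are all made-up values for illustration; this is not OpenCV's API, and a real setup would also need calibration and a matching step.

```python
def depth_from_disparity(x_left, x_right, focal_length_px, baseline_m):
    """Distance (in meters) to a point seen in both cameras.

    x_left / x_right are the pixel columns of the same point in the left
    and right images. The closer the point, the bigger the shift
    (disparity) between the two views.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity


def triangulate(x_left, x_right, y, focal_length_px, baseline_m, cx, cy):
    """Return (X, Y, Z) in meters, relative to the left camera.

    (cx, cy) is the pixel where the optical axis hits the image,
    assumed here to be the image center.
    """
    z = depth_from_disparity(x_left, x_right, focal_length_px, baseline_m)
    x = (x_left - cx) * z / focal_length_px
    y_world = (y - cy) * z / focal_length_px
    return (x, y_world, z)


# Hypothetical numbers: 640x480 webcams placed 10 cm apart, 800 px focal length.
# A chair corner appears at column 340 in the left image and 300 in the right.
point = triangulate(340, 300, 240, focal_length_px=800, baseline_m=0.1, cx=320, cy=240)
print(point)
```

Doing this for every matched pixel would give a cloud of 3D points, which is the raw material for separating the chairs from the table.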
Next I will program in a simple sort of mind, where I link words to these “objects” and it will be capable of identifying what it’s looking at. I will be able to stand in front of it and it will say, “Oh, that’s Jason!”
After that I want to try to program some physics and intelligence into it. For example, I want to spin a swivel chair in front of it and have it know that it’s a spinning chair, know its velocity, angular momentum, and so forth. That way it will have a basic ability to anticipate what will happen in various situations. I don’t know to what degree OpenCV supports this sort of thing. I’ll have to research that.
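As a rough illustration of that kind of "anticipation", here is a small sketch in plain Python. It assumes the vision system has already found the center of the chair and is tracking one point on it (say, an armrest) from frame to frame. The function names, the coordinates, and the moment of inertia are hypothetical, and it treats the spin as a flat rotation at constant speed, which a real chair seen at an angle would not quite be.

```python
import math


def angular_velocity(center, p_prev, p_curr, dt):
    """Angular velocity (rad/s) of a point circling `center` between two frames.

    Only valid if the point turns less than half a revolution between
    frames; a faster spin would be misread.
    """
    a_prev = math.atan2(p_prev[1] - center[1], p_prev[0] - center[0])
    a_curr = math.atan2(p_curr[1] - center[1], p_curr[0] - center[0])
    turn = a_curr - a_prev
    # Wrap the turn into the range -pi..pi so crossing the +/-pi seam is handled.
    turn = math.atan2(math.sin(turn), math.cos(turn))
    return turn / dt


def angular_momentum(moment_of_inertia, omega):
    """L = I * omega. The moment of inertia has to be supplied; it is an assumed value here."""
    return moment_of_inertia * omega


def predict_point(center, p_curr, omega, dt_ahead):
    """Where the tracked point should be after dt_ahead seconds at constant omega."""
    radius = math.hypot(p_curr[0] - center[0], p_curr[1] - center[1])
    angle = math.atan2(p_curr[1] - center[1], p_curr[0] - center[0]) + omega * dt_ahead
    return (center[0] + radius * math.cos(angle), center[1] + radius * math.sin(angle))


# Hypothetical measurements: the armrest makes a quarter turn in half a second.
omega = angular_velocity((0.0, 0.0), (1.0, 0.0), (0.0, 1.0), dt=0.5)
print(omega)                                          # angular velocity in rad/s
print(angular_momentum(2.0, omega))                   # with an assumed I of 2.0 kg*m^2
print(predict_point((0.0, 0.0), (0.0, 1.0), omega, 0.5))
```

One thing to watch for: in image coordinates the y axis usually points down, so the sign of the rotation comes out flipped compared to the usual math convention.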
So as I thought about all of this, I decided to do more research over the weekend and see what else I could find. It turns out that the same group which supports OpenCV just late last year started a new open-source framework called OpenNI (Open Natural Interaction). I don’t know all that much about it yet, but from what I’ve read, it’s all about capturing your body movements. Check it out.
Using this I could not only identify human beings who are standing in front of the robot, but also anticipate their movements and behavior. Maybe I could even hook two little robotic arms to the robot and it could swing at anyone near it! (Or at least anyone on my enemy list.) That’d be really neat to construct.
Here’s another video of OpenNI in action.
That is just too cool! I thought about starting a YouTube channel, and I would do various videos where I talk about the same sorts of things I blog about, and other philosophical questions on my mind. I thought about using ARToolKit in conjunction with this other stuff and having dialogs with little 3D animated models on my desk. The computer vision toolkit could be reprogrammed to track something far less conspicuous. Probably I’d have it track a little … oh, what are they called – they’re little mats you have on your desk which, if you had say a cup of iced tea, you’d place the glass on top of it. I’d use that as the “platform” for my little 3D people who would show up. I could have a whole little gang of animated characters who would show up from time to time and have discussions with me. I could download 3D models online and then animate them using OpenNI, so I wouldn’t have to mess with 3D Studio Max or other complex video effects. All of this would be very time consuming, however. I don’t know if I’d have enough time to work on all of that AND do my physics research.
One such character would be a reverend. If I started talking about evolution, or arguments against God, he’d pop out and start preaching to me. “That’s the path of the devil, son! Just listen to yourself!” Another character would be an evil mad scientist who pops up from time to time, who would be my “dark side”. He’d always offer a confused perspective lacking all ethical considerations. I’d also love to have a little Gary Coleman who pops out from time to time to say, “Whatchu talkin’ about?”
When I’d relate an embarrassing story that happened to me, I’d first tell the story in a way that tries to “save face”. Then he’d pop up on my cup holder and say, “Whatchu talking about? You CAN’T be serious. Y’all, let me tell you what really happened to Jason…” Then he’d retell the same story in a way that makes me look pathetic and ridiculous, with a delivery tone similar to this clip.
I’d have the time of my life making something like that. It’d be hilarious too, but I’d need help making all of it. Like I said, it’d be VERY time consuming. Using OpenCV and ARToolKit, I could place little markers around the hallways and be able to transform the house into literally any 3D environment. I could walk around my bedroom and it would appear as if I were on the deck of the Death Star. The problem is I’d be completely reliant on 3D models others have created. I don’t have any artistic skills for making 3D models or environments. I know how to render them though, and place them anywhere I want, even my own house!
I wouldn’t mind writing the scripts for a YouTube program like this, but it’s the video editing, splicing, uploading to YouTube, and all of that which I don’t have time for. Though in other ways it would save me time because I could talk about the brain and whip out a physical model for everyone to look at, show which areas are doing what, and it’d be easier to communicate all I want to communicate that way. It opens up a new window of ways I could communicate with people.
There’s one central problem to all of this — it takes time away from my research. When I think about what I want to do with my physics degree once I actually get it, I think I’m going to do work with holograms. That involves light and lasers, space, and combining images from multiple angles using tools like computer vision. It’s right up my alley and ties in with everything I’m interested in. I need to find out which universities have the best hologram research programs and start moving in that direction.
As for the YouTube program, I’d love to show how a robot starts from a 2D camera image, analyzes it in amazing ways, and is then able to know what it’s looking at. Sometimes I try to do so with words, but it’d be more awesome to use OpenCV to walk you guys through the algorithms step by step, explaining each stage as you watch it run on the screen. After the explanation I could show you the areas in your own brain which perform similar functions!
I’ve thought about this for a long time, and I think my passion lies in understanding how the brain works, and how the “virtual model” of reality we experience consciously differs from the deeper reality that is actually out there. I used to spend many hours and long walks contemplating how the brain stores objects and identifies them from the raw signals coming in from the eyes. Now with vision science and computer vision, I’m actually seeing how that works, which is really exciting to me. The brain probably doesn’t use the exact same algorithms that, say, OpenCV uses, but it likely performs something very similar. The overall idea that our subjective sense of space and identification of objects is an information processing task is probably true. The more I read, the more I’m convinced that that is the case. When I’m studying neuroscience, I find out which areas of the brain are doing these information processing tasks, and I just feel like I’ve finally found out what I am as a human being. I think, “Yes. This is what the human brain does. This is what it means to be a human being.” Of course, our brain does a lot more than just build up a concept of space and identify objects – it also has a sort of “intuitive physics” where it anticipates what objects will do in various situations, it’s able to think about objects and categorize them, it produces emotions, and so on – but I’ve come to a deeper understanding of myself from all of this.
Some of those books in my “to read” list over this summer include books on how the mind represents numbers. That’s something I don’t understand. I don’t know what a number is. I was watching the videos on the edge.org website and saw Dr. Stanislas Dehaene lecture on consciousness and found out that he wrote a book on just this. It’s called The Number Sense: How the Mind Creates Mathematics. I’m really looking forward to reading it. Hopefully I’ll understand numbers after reading it.
I also don’t understand the object categorization process the brain uses. Sure, I’ve read books, such as my textbook on Vision Science, which speculate on what the brain might be doing, but I don’t feel confident in any of them. I spent several hours yesterday thinking about that problem, but no matter how much time I spend on it, I never seem to make any progress. Unfortunately I don’t know of any detailed books on this subject, so I’m forced to think it all out for myself. That’s a very slow process.
I have a few books on the mind by Steven Pinker which I still haven’t read, including The Stuff of Thought. I plan to read that one this summer.
Ugh, I never can decide what to do with myself. I have like 100 different things I want to do at any given moment, and there’s not enough time for any of them. I’ll have to prioritize all of this and decide how to spend my summer most effectively.