What is this? I haven’t posted anything on my blog for almost a month?! How irresponsible of me. If I said I’ve been too busy to post anything, I’d be lying. Really I’ve been immersed in some personal projects. Maybe I’ll talk with all of you about some of the things I’ve been up to. So let’s get right to it.
As some of you probably already know, my passion is understanding the mind and our world, and as of the past few years, I’ve been researching how to build intelligent machines which understand space. I want to better learn how our brain understands the world we see out of our eyes, and how it builds a model of the world. What sort of data structures are used within the human brain to hold onto spatial and object information? How is it accessed? How is that information processed and changed? How does our brain make predictions on what we will experience next, forming expectations of the world? In other words, I want to build a machine that has two camera eyes and can understand the world it sees through those eyes. I want it to be capable of driving or walking around, avoiding obstacles, identifying objects, having memories of past experiences with those objects, capable of predicting your next action, and so on.
I’ve been searching for algorithms that are modeled after the human brain, using simulated neurons and neural networks. After searching online for months and months, and reading textbooks I acquired after checking out MIT OpenCourseWare’s Brain and Cognitive Sciences program (I bought the same textbooks they use for their neuroscience degree program at MIT, and have been reading them), I eventually found a jewel. Jeff Hawkins, president of a company called Numenta, and founder of Palm Computing, has been traveling around the country giving introductory lectures on building intelligent machines and simulating the brain in computers. He’s developed an algorithm which he’s modeled after the neocortex. He calls it Hierarchical Temporal Memory (HTM). It’s the exact sort of thing I’ve been searching for. So how does it work?
Let’s first pose a difficult question. How does your brain identify objects, say, a cat or a dog? Think of how different each cat or dog you see is from each other. One cat may be skinny, another fat. One may be white, another brown, while another is black with white spots. One dog may be a beagle, while another is a golden retriever. Even so, you’re able to notice that they’re all cats or all dogs and can easily identify them as such. Small children simply look at a few pictures of dogs in a picture book and can then easily identify the animals they see in real life. How can the brain do things like that? Or better yet, how can I build a computer program that can look at a video feed and watch hours and hours of footage and then identify a particular type of animal when it comes on screen?
For example, when I was watching David Attenborough’s films, in the behind the scenes footage I saw their team having to stake out a bird’s nest for days, filming and filming, waiting for the bird to come back. Imagine if they could just leave their camera there, hidden in the brush, and film all the footage, and then have a computer “watch” all of that film, identifying the moments in time when the animal came back to the nest. AI could track the bird, moving the electronic tripod to keep the bird in view. That way the researchers wouldn’t have to sit and watch hours and hours and hours of footage.
Or say you wanted to make YouTube and Google far more intelligent. Instead of applying intelligent machines only to visual information, say we applied them to audio information as well. Say you are wondering about the positions of a political candidate in an upcoming election. Has that candidate ever stated their position on such and such an issue? You could ask, “What does Ron Paul think of Medicare and Social Security?” And then AI algorithms would search YouTube and find particular clips of Ron Paul stating his positions on those topics. It could cut out small portions of much longer clips and bring them up to you for viewing.
Sounds neat doesn’t it? Well that’s what’s currently being developed these days and it’s amazing technology. That’s the sort of thing Jeff and his colleagues at Numenta are working on. But how does it work? First off, you can read their research papers here.
I was particularly drawn to them because I have always wondered how abstract thought took place within the human brain. I always wondered how the brain stored information about a generic “cat”. How would I write an algorithm that could identify a cat? I had no idea how the brain did that. But now I think I get it. Take a look at this picture below. This is the idea behind HTMs.
If you look at your brain’s neocortex, which is where this sort of thing happens, you’ll see that it is structured in layers. Though this is a gross simplification, the sensory organs, such as your eyes, feed into the bottom layers, which then process the information upward to higher and higher layers. Higher layers also feed back down to lower layers, but we’ll talk about that in a second. So what’s going on there?
Basically the brain starts with simple patterns, such as a direct image input from your eyes. The neurons then feed that information upward to the next layer up, which finds patterns in a small portion of the image. You can see that in the HTM image above. And then that feeds up to the next layer above it, which finds patterns in the patterns. Then the next layer up finds patterns within the patterns, within the patterns. And so on and so forth. A common very simple pattern algorithm is run over and over and over, passing the results upward in a pyramid of pattern information.
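To make the “patterns within patterns” idea a bit more concrete, here is a toy sketch in Python. To be clear, this is not Numenta’s algorithm, just my own simplified illustration. The names `PatternLayer` and `run_hierarchy` are made up for this example, and the assumption that each layer simply memorizes fixed-size chunks and gives each distinct chunk a number is a big simplification.

```python
from typing import Dict, List, Tuple


class PatternLayer:
    """Toy layer (hypothetical): memorizes each distinct chunk of its input
    and names it with an integer id. Not the real HTM algorithm."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size
        self.known: Dict[Tuple[int, ...], int] = {}

    def process(self, signal: List[int]) -> List[int]:
        out: List[int] = []
        # Walk over the input in non-overlapping chunks.
        for i in range(0, len(signal) - self.chunk_size + 1, self.chunk_size):
            chunk = tuple(signal[i:i + self.chunk_size])
            # A chunk never seen before gets the next free id.
            pattern_id = self.known.setdefault(chunk, len(self.known))
            out.append(pattern_id)
        return out


def run_hierarchy(layers: List[PatternLayer], signal: List[int]) -> List[int]:
    """Feed the signal upward: each layer sees only the ids from the layer below."""
    for layer in layers:
        signal = layer.process(signal)
    return signal
```

The same simple step runs at every level. If you feed `[0, 1, 0, 1, 1, 1, 0, 1]` into three layers with `chunk_size=2`, the first layer turns it into four pattern ids, the second into two, and the top layer ends up with a single id that stands for the whole input.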
The same idea applies to audio information coming into your ears. You start with basic audio coming in from your left or right ear, which then feeds up to a higher level, and then another higher level. Each layer looks for patterns within the layer below it, and you end up with patterns of patterns of patterns of patterns.
Going back to our cat example, the information “cat” would be a higher level concept in this pyramid, and if you traced “downward” in the pyramid you would come to individual experiences with particular cats you’ve had contact with. So in one grand stroke, your brain is forming memories of the particular pet cat you’re playing with, but also forming generalized ideas about how cats behave in general, how they appear, and so forth. Your brain then comes to an understanding, “Ah, so this is what a cat is like. This is how they behave.” And then in the future your brain can identify cats and have expectations about how they behave. For example, you’ll know to be careful when dangling your socks in front of their eyes as cats have an instinct to claw such interesting objects, possibly injuring your hand if you’re not paying attention.
Now let’s talk about the connections that feed downward. Your brain doesn’t just passively observe the world around it; it tries to make predictions about what will happen in the future. When I see my pet cat Meanus lying on my bed, I have had a lot of experiences with her. I know what to expect and when I go to rub her belly, I know what she’s going to do. My brain takes visual input from my eyes, which then triggers the pattern recognition process described above. It finds patterns, and then patterns within the patterns, and then patterns within the patterns within the patterns, and then matches that up with, “Oh, that’s Meanus!” So those particular neurons fire and I become conscious of being in the room with my cat. Now at the same time, my brain is constantly comparing my present experience with experiences I’ve had in the past. Past memories of Meanus are being called up and accessed, being used to predict what she will do next. That’s what the downward-feeding links do. In particular they compare past experiences to the present, and if they’re not lining up the brain says, “Wooaaahhhh. Something new is going on here. Alert! Attention, attention, focus attention on this!”
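The “compare the present to the past and raise an alert on a mismatch” step can be sketched in a few lines of Python. Again, this is only a toy under my own assumptions, not how HTMs or the brain actually do it. The class name `ExpectationMemory` and the rule “an event is surprising if it has never followed the previous event before” are invented for illustration.

```python
from typing import Dict, Hashable, Optional, Set


class ExpectationMemory:
    """Toy predictor (hypothetical): remembers which events have followed
    which, and flags an event that breaks those remembered expectations."""

    def __init__(self) -> None:
        self.followers: Dict[Hashable, Set[Hashable]] = {}
        self.previous: Optional[Hashable] = None

    def observe(self, event: Hashable) -> bool:
        """Record the event. Return True if it violates an expectation."""
        expected = self.followers.get(self.previous, set())
        # Only be surprised if we had some expectation and it was not met.
        surprised = bool(expected) and event not in expected
        if self.previous is not None:
            self.followers.setdefault(self.previous, set()).add(event)
        self.previous = event
        return surprised
```

Feed it “belly rub”, “purr”, “belly rub”, “purr” a couple of times and it stays quiet. Feed it “belly rub” followed by “sing” and `observe` returns `True`, which is the toy version of “focus attention on this!”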
For example, if I was here typing on my computer and then Meanus stood on her hind legs and started audibly singing, “Fly me to the mooonnn, let me plaaayyyyy among the starrrssss…” My head would spin, I’d be blown away, and then I’d think, “What the HELL IS GOING ON!” I’d lose interest in everything else and watch in silence as Meanus crooned me a tune. My brain would recognize the Meanus patterns but when it compared what Meanus is doing now to what she’s done in the past, it wouldn’t recognize the behavior, would consider it “weird” and out of the ordinary, and my attention would be drawn to it. That’s because this would be violating my brain’s current mental model of the world. Cats don’t sing! My brain would then have to start rethinking Meanus, such as, “How is she able to sing? Has she been possessed by spirits? Is she being controlled by aliens? How intelligent is she? Am I dreaming? Is this really happening?” I’d have to then change my relationship to that object and how I plan to respond to it in the future.
Before going on, I’d like to bring up something which I found fascinating about all of this. Many years ago I remember reading Wittgenstein’s Philosophical Investigations and he mistook this process for free will. Or maybe this is what free will is? He said we only notice our free will in action when our expectations are violated. For example, we may will to place an object down on the kitchen table, but as we lay it down and let go of it, it starts to fall over and roll toward the edge of the table. That’s when we scramble to grab it before it falls off and breaks. Our free will decided to place the object on the table, but that decision was violated by reality, and then we had to make a new decision to grab for the object. Now, however, I understand that that’s just how the brain is structured to work: the HTM process. It’s how that pyramid hierarchy of information processing works. If something violates our expectations, attention is focused on that until the situation is brought under control. Most of it is an unconscious process though, and I don’t think it explains free will.
Ok, so how does this tie to how I’ve been spending my time? Well, first off, I’ve been studying neural computation, and how to model neural networks in software. I want to implement something like Numenta’s HTMs and then rig a computer up to go around, processing information from cameras. I want to store all the visual information in this HTM hierarchy and then train the system to identify objects. Next, I want to be able to go into this huge multi-terabyte database of visual information and run a program on it which can go into that HTM database and pull out 3D spatial information. I want to be able to say, “Computer, generate me a 3D model of my bedroom.” It then searches its database, finds that tree of information, processes downward through it, and then builds a 3D model of my bedroom on my screen, rendered in OpenGL. Then I can fly through it and look around.
I want to be able to walk around a place with a camera, filming things, and then show that film to my computer, let it parse in the video feeds, and build up an ever growing database of visual information. I could show it a video clip of me walking around a college campus and then say, “Computer, build a 3D model of the buildings you saw in that video.” It would then do so.
That’s the goal I’m working on. I want to fully understand how our subjective sense of space works and how our brain works with space and numbers, and logic and everything else. This HTM stuff is deeper than just space. It’s how language and abstract thought work. It’s how intelligence works. This is what intelligence is. It’s this process of finding patterns within patterns within patterns, and organizing them in a hierarchy, and making predictions with that information. At least, that’s what I’m currently thinking intelligence is.
From what I gather, algorithms similar to this are being used to build intelligent computer chips. Companies like IBM want to build computer chips which process information in a way similar to these HTMs, so that computers can be intelligent. This doesn’t get rid of normal processors, but by changing the way information is processed, sensory type information, such as sight and sound, can be handled far more effectively and easily.
However, during the past week or so I’ve had a long diversion from my research. I was a little burnt out from my brain research and had just taken two exams in class – math and physics. I wanted a break and needed some time to work on something different. It’s nice to do that here and there. I have just acquired a new development IDE and was reading through some programming books to learn the new features. That was nice. I was writing some “goof off” programs to test how things work. I ended up making a really stupid program with a paintbox which drew random lines within it. Once I finished that I leaned back in my chair, yawned, and went for a walk. It was a good day.
Once I got back from my walk I got to looking through my bookshelf and saw an old classic I hadn’t read in ages. It’s called The Black Art of 3D Game Programming: Writing Your own High-Speed 3D Polygon Video Games in C. Now there’s a gem for you! It was written in the early 90s and shows you how to write 3D games in DOS! Epic! Beyond Epic! Why is that? This is before the days of Windows, and DirectX, and OpenGL. This is back when you had to write directly to your video card’s memory buffer, writing your pixels for each dot on your screen. Memory address (X,Y) on your 320×200 screen set to some RGB value. I was thinking, “Ah, I remember this book. I love this book.” Then I got to reading the section on 3D game programming, the mathematics involved, the matrices, the vector mathematics to calculate collisions, and so forth. Then I got a wild idea. What if I wrote a 3D engine over the weekend, rendering to Windows paintbox? LOL. Software rendered 3D graphics engine using the code from this ancient book. So, that’s what I did!
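For anyone who hasn’t seen it, writing straight to a screen buffer boils down to one bit of arithmetic: the pixel at (X,Y) lives at offset Y × 320 + X. Here is a small Python sketch of that idea. It is not code from the book. I’m assuming one byte per pixel (as in the common 320×200 256-color mode, where the byte is a palette index rather than a full RGB value), and the function names are made up for this example.

```python
WIDTH, HEIGHT = 320, 200  # screen size mentioned above


def make_framebuffer() -> bytearray:
    """One byte per pixel (assumption), all pixels start at color 0."""
    return bytearray(WIDTH * HEIGHT)


def put_pixel(fb: bytearray, x: int, y: int, color: int) -> None:
    """Set the pixel at (x, y); silently ignore points that are off screen."""
    if 0 <= x < WIDTH and 0 <= y < HEIGHT:
        # Rows are stored one after another, so row y starts at y * WIDTH.
        fb[y * WIDTH + x] = color & 0xFF


def get_pixel(fb: bytearray, x: int, y: int) -> int:
    """Read back the color stored at (x, y)."""
    return fb[y * WIDTH + x]
```

Everything else (lines, polygons, lighting) is built on top of a `put_pixel` like this one.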
I got to cranking and then made a simple Doom-like game, where I was running around in a 3D world. I wrote the code to render the picture, pixel by pixel. I had to write code to draw individual lines, to draw polygons and triangles, and to do lighting effects. LOL. I didn’t use any libraries of any kind. No help. I did it all from scratch. I coded my 3D points in 1 by 4 matrices, which I then multiplied by rotation and translation matrices, rotating my scene relative to the camera. I’d swing my mouse around and change my camera’s direction cosine angles and rotate my scene. That was cool.
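Here is roughly what that matrix math looks like, sketched in plain Python rather than my actual engine code. A point is a 1 by 4 row `[x, y, z, 1]`, and it gets multiplied on the right by 4 by 4 matrices. The helper names and the choice of a rotation about the Y axis with this particular sign convention are my assumptions for the example; other engines flip the signs or use column vectors instead.

```python
import math
from typing import List

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: (n x m) times (m x p) gives (n x p)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def translation(tx: float, ty: float, tz: float) -> Matrix:
    """Row-vector convention: the offsets sit in the bottom row."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [tx, ty, tz, 1.0],
    ]


def rotation_y(theta: float) -> Matrix:
    """Rotation about the Y axis by theta radians (one sign convention)."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c, 0.0, -s, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [s, 0.0, c, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform(point: List[float], m: Matrix) -> List[float]:
    """Multiply a 1 x 4 point [x, y, z, 1] by a 4 x 4 matrix."""
    return mat_mul([point], m)[0]
```

To rotate and then move a point, multiply the matrices together once with `mat_mul(rotation_y(angle), translation(0, 0, 5))` and push every point in the scene through the combined matrix with `transform`. Doing the combining up front is what keeps the per-point work cheap.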
I logged into MSN Messenger and told one of my old friends how my Saturday had been one of the best days of my life. He didn’t seem to understand why writing a 3D graphics engine from scratch into a Windows paintbox was anything to be happy about. But to me, I went under the hood into how virtual reality and video games work, understanding how all the fancy 3D graphics and video games of today work. How do they render the walls, the textures, the lighting, and all of that? How do they make it look so real? Well, I know how it works, all the way down to drawing each individual pixel to the screen. That’s why it’s cool. I like knowing how stuff works. I especially like programming simulations.
In the video above, you find a ray tracer, which I would like to eventually find time to program. I programmed something similar, but much more simplistic. Mine also used complete software rendering, without using any libraries. That way I had full control over how things were rendered, the physics used, and so forth. In their simulation, they blast billions of light rays into a scene, which then bounce around, physics calculates how the light bends and reflects, and then it collides with the camera. It works how reality works. My simulation I wrote over the weekend uses a series of tricks and rotations, similar to how 3D games programmed today work. Ray tracing requires too much CPU power to handle in games today, but in 20 years, I guess it will be the norm as to how games are rendered. It produces photo-realistic graphics, as you can see.
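The core building block of a ray tracer is surprisingly small: take one ray and ask where it first hits an object. Below is a minimal Python sketch for a ray hitting a sphere. It is not the code from the video and not something I have in my engine; the function names are made up, and I’m assuming the common shortcut of shooting the ray outward from the camera rather than from the light source.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def ray_sphere_hit(origin: Vec3, direction: Vec3,
                   center: Vec3, radius: float) -> Optional[float]:
    """Return the distance parameter t of the nearest hit in front of the
    ray origin, or None if the ray misses the sphere.

    Points on the ray are origin + t * direction. Substituting that into
    the sphere equation gives a quadratic a*t^2 + b*t + c = 0.
    """
    oc = sub(origin, center)
    a = dot(direction, direction)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # no real roots: the ray misses
    root = math.sqrt(disc)
    # Try the nearer root first, then the farther one.
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 0.0:
            return t
    return None  # the sphere is entirely behind the ray
```

A full ray tracer repeats this for every pixel and every object, then spawns new rays for reflections and shadows at each hit point, which is where all that CPU time goes.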
I like video games and virtual reality in particular because in that world you’re God. Think of it this way: say you were made God for a day. You could change anything, and make anything work however you wanted. You could build your own world from scratch, to your every specification. How would your world work? Well, when you write your own video games, that’s essentially what you’re doing. You can make any experience for the player you can imagine. The real test is how creative you can be. When I’m out for walks, I try to notice every little detail of this world. I look at things both from the angle of the physicist, where everything is ordered and following laws, and from the emotional and graphical angle, such as an artist’s perspective. I look at the sky, with the blues and oranges and reds, or the stars flickering in the sky. I notice specular highlights from the lights in the room shining on reflective objects, and light and soft shadows being cast from multiple light sources in the kitchen. I think to myself, “If I were God of my own virtual reality experience, how would I program my reality to work? What good things from my world would I keep, and what things would I change? What would people in my world experience?” I find that doing such an exercise makes me extremely happy because it makes me focus on everything that’s awesome about this world, and learn how those things work. Then I try to program those things into a computer, and I come to a very deep understanding of the things I love most. When undergoing this process, I also have to search for ways to bring those experiences to other people.
Richard Feynman once said, “What I cannot create I do not understand.” If you can’t create your own virtual reality similar to our own, you don’t understand the world you live in.
Oh yeah, and before I forget, you may have noticed that comments are closed. I started getting like 500 spam comments a day from bots. I got tired of cleaning them up. I don’t know how to stop them, so I just closed comments down entirely. I’ll try to work on fixing that sometime. The internet is such a sleazy place. Bots go around trying to create links to people’s sites, creating false sites full of viruses and spyware, all to help push some crappy website’s page ranking up in Google’s search results. Losers. Try writing real content and having a real website, and then maybe people would come to view your site without having to resort to tricks and lies. You’re like Newt Gingrich, hiring companies to make millions of fake Twitter followers.