Years ago I remember sending an email to a scientist friend of mine who, at the time, worked for Lockheed Boeing. I asked him how he thought the brain processed the objects we see and handle. I remember posing a dilemma I wasn’t able to figure out at the time: how did the brain know it was looking at the same object even though it rotated, changed size, and even changed appearance?
The example I gave him was actually a tree stump out in the middle of a field. At a distance the stump is just a brown and beige dot. You can’t really make out any detail at all. As you approach the stump, you start to see the bark, and if you get closer still, you start to see all the detailed fabric of the bark, the wood grain, and the tree rings. In terms of retinal images on the eyeball, these are vastly different experiences. If you were a painter, and you were to paint the stump at a distance and the stump up close, the two paintings would be completely different. Yet somehow our brain knows that the stump is the same stump.
You may think, “What’s the problem?” Well, maybe I can pose this problem another way. Sit down at a computer and imagine having a digital video clip of a person walking toward the stump. Now you have to write a computer program that examines each image from the video, one by one, sees how the colors change, analyzes them in some complex way, and knows that it’s looking at blades of grass in a field, that you’re an observer standing so many feet above the ground, and that you’re approaching a tree stump. How would you store each blade of grass? How would the algorithm know that the blade of grass in one image is the same blade of grass in the next? How would it know about the tree stump? What sort of storage mechanism would you use to store the information about the stump, the grass, and the spatial relations between everything? How much detail would you keep? How would you relate them in a time sequence?
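Even the very first step of such a program shows the difficulty. Here’s a toy sketch (the tiny “frames” of brightness values are made up for illustration, nothing like real video) that compares two consecutive frames to find which pixels changed:

```python
# Two tiny consecutive "video frames" as grids of brightness values (0-255).
frame1 = [
    [10, 10, 10, 10],
    [10, 200, 10, 10],
    [10, 10, 10, 10],
]
frame2 = [
    [10, 10, 10, 10],
    [10, 10, 200, 10],
    [10, 10, 10, 10],
]

def changed_pixels(a, b, threshold=50):
    """Frame differencing: flag pixels whose brightness changed a lot."""
    return [(r, c)
            for r, row in enumerate(a)
            for c in range(len(row))
            if abs(a[r][c] - b[r][c]) > threshold]

moved = changed_pixels(frame1, frame2)
# The bright spot vanished at (1, 1) and appeared at (1, 2) -- but the program
# has no idea those two events are the SAME object moving. That's the hard part.
```

Frame differencing is about the crudest possible starting point; notice that it says nothing about *what* moved or whether the thing at the new location is the same thing, which is exactly the question I was stuck on.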
I pondered away at the problem for another two years and I slowly made some headway, but it was very difficult. I can remember one evening going out to Lowe’s and buying a bunch of meter sticks and heading out to my backyard. I sat out there for four hours staring at meter sticks laid out across the yard. I walked toward them, away from them, and rotated them every which way. After a while I thought to myself, “My God, the brain is simply amazing. However it does manage to achieve this, it must be so complex that it’d take a lifetime to figure out.”
Though I didn’t know it at the time, it turns out that very smart people at M.I.T. and other universities had been working on this problem for decades under the name computer vision. When I first found out such a subject existed, I was like an elated schoolboy given a bowl of ice cream and chocolate milk. It turned out that they’d had algorithms since the late 1980s which could process objects and their orientation in space. I was practically ripping the books open once they arrived from Amazon, screaming in my head, “How did you guys do it! How!”
I opened up David Marr’s book Vision and I was greeted with this introduction,
“What does it mean, to see? The plain man’s answer (and Aristotle’s, too) would be, to know what is where by looking. In other words, vision is the process of discovering from images what is present in the world, and where it is.
Vision is therefore, first and foremost, an information-processing task, but we cannot think of it just as a process. For if we are capable of knowing what is where in the world, our brains must somehow be capable of representing this information – in all its profusion of color and form, beauty, motion, and detail. The study of vision must therefore include not only the study of how to extract from images the various aspects of the world that are useful to us, but also an inquiry into the nature of the internal representations by which we capture this information and thus make it available as a basis for decisions about our thoughts and actions. This duality — the representation and the processing of information — lies at the heart of most information-processing tasks and will profoundly shape our investigation of the particular problems posed by vision.”
Marr wrote this book before I was even born, but I believe he somehow was thinking of me. Several hundred pages later, he had done just what he set out to do: give a series of algorithms and processes the brain might use to parse objects out of the images we see, and then store them in some usable format which we can use to make decisions. It turns out I was correct about one thing: vision is practically a miracle.
Scientists from all over the world have been working on these issues, and they’ve come up with so many clever ideas trying to reverse engineer what the brain is doing for us. It turns out the problem I was thinking of has a fancy scientific name among vision scientists: the inverse problem. Quoting from my other great book, Vision Science: Photons to Phenomenology,
“1.2.3 Vision as an “Inverse” Problem
We have now described how light reflected from the 3-D world produces 2-D images at the back of the eye where vision begins. This process of image formation is completely determined by the laws of optics, so for any given scene with well-specified lighting conditions and a point of observation, we can determine with great accuracy what image would be produced. In fact, the field of computer graphics is concerned with exactly this problem: how to render images on a computer display screen that realistically depict scenes of objects by modeling the process of image formation…”
Ah, computer graphics. My love in life — programming computers to do fancy 3-D graphics, virtual environments, and simulations. I was floating away as I read that. I thought, “I’m right at home in this world.” Vision Science continues,
“In effect, the program simulates the optical events of photon emission, reflection, transmission, and absorption to construct an image of a “virtual” environment that does not exist in the physical world. Such programs allow the effects of different orders of light reflection to be illustrated (e.g., in Color Plate 1.1 A-D) because the program can be stopped after each cycle of simulated reflection to see what the image looks like. This is not possible with real optical image formation.”
Mmmm. Ray-tracing. Reflective surfaces. Specular highlights. Dynamic lighting. Mirrors. I can remember the first book I learned 3D graphics programming from. It’s called The Black Art of 3D Game Programming, written by Andre LaMothe. I still have it here. It’s a gem. It was one of the first books out on the subject. It’s old school. Real old school. New books use OpenGL or DirectX to do all the rendering for you. LaMothe does everything from scratch, rendering each line and polygon, pixel by pixel. That’s how real programmers do it! I can remember the early chapters on mathematics and line drawing. He loads up the screen to 320×240 VGA, then he shows you how to draw a line from X1, Y1 to X2, Y2. He writes directly to the video card’s memory buffer. Next you establish your 256-color palette. Yeah baby, those were the days! Then you draw colored polygons, pixel by pixel. Next up, matrices and transformations, camera positions and projections. You scale and distort those polygons based on the observer’s position. Then in the later chapters he gets into lighting. He gets into the physics of how surfaces reflect light and how you shade those polygons differently based on the relative positions of the light sources and the observer. To save CPU cycles, various optimizations, such as binary space partitioning, are introduced. Visual illusions and effects, such as motion parallax, are discussed… *Heart races… faints like a school girl meeting Justin Bieber*
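That kind of pixel-by-pixel line drawing can be sketched in a few lines (Python here rather than the C and assembly of that era, and a toy-sized buffer standing in for real VGA memory): the classic Bresenham-style stepping from (x1, y1) to (x2, y2).

```python
def draw_line(buf, x1, y1, x2, y2, color):
    """Bresenham's line algorithm: step pixel by pixel from (x1, y1) to (x2, y2)."""
    dx, dy = abs(x2 - x1), abs(y2 - y1)
    sx = 1 if x1 < x2 else -1
    sy = 1 if y1 < y2 else -1
    err = dx - dy
    while True:
        buf[y1][x1] = color            # the "write to the framebuffer" step
        if (x1, y1) == (x2, y2):
            break
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x1 += sx
        if e2 < dx:
            err += dx
            y1 += sy

# A toy framebuffer (scaled way down from 320x240 to keep the example small).
WIDTH, HEIGHT = 16, 8
framebuffer = [[0] * WIDTH for _ in range(HEIGHT)]
draw_line(framebuffer, 0, 0, 15, 7, color=255)
```

The whole trick is that it uses only integer additions and comparisons per pixel, which is why it was fast enough for those old CPUs.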
Sorry, the sheer epicness of the subject matter being discussed made me faint. I can’t stress how much fun it is to program simulators that, say, emulate mirrors, especially using OpenGL. You use matrices and transformations to emulate them. What you do is render your scene normally without the mirror. Then you lay out a stencil over everywhere but the pixels where your mirror is located. Then you take the mirror’s surface normal vector and render your scene again, but change your camera position as if you’re standing behind the mirror at just the right distance, aligned with the mirror’s surface normal. So for every mirror in the scene you do multiple rendering passes. It takes up a lot of CPU cycles, and back when I was first programming these sorts of things computers weren’t nearly as powerful. I think my first simulator which had a mirror in it was running on a Pentium II at 450 MHz, so I had to be careful not to overload it. It’s not much of a simulator if it runs at 3 frames a second! It was even more fun rendering mirrors looking into mirrors. Or even more fun, distortions caused by curved mirrors! Back to Vision Science,
“The early stages of visual perception can be viewed as trying to solve what is often called the inverse problem: how to get from optical images of scenes back to knowledge of the objects that gave rise to them. From this perspective, the most obvious solution is for vision to invert the process of image formation by undoing the optical transformations that happen during image formation.
Unfortunately, there is no easy way to do this. The difficulty is that the mathematical relation between the environment and its projective image is not symmetrical. The projection from environment to image goes from three dimensions to two and so is a well-defined function: Each point in environment maps into a unique point in the image. The inverse mapping from image to environment goes from two dimensions to three, and this is not a well-defined function: Each point in the image could map into an infinite number of points in the environment. Therefore, logic dictates that for every 2-D image on the back of our eyes, there are infinitely many distinct 3-D environments that could have given rise to it.”
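The asymmetry the book describes is easy to see in code. A minimal pinhole-camera sketch (the focal length and the points are made up for illustration): the forward projection gives one unique answer, but every point along the same ray lands on the same image point.

```python
def project(point, focal=1.0):
    """Pinhole projection: 3-D point (x, y, z) -> 2-D image point (x', y')."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

# Forward direction: one scene point -> one image point (a well-defined function).
image_point = project((2.0, 1.0, 4.0))   # -> (0.5, 0.25)

# Inverse direction: every point on the ray through the pinhole projects to the
# SAME image point, so the image alone cannot tell these depths apart.
for depth in (1.0, 2.0, 10.0, 100.0):
    scene_point = (0.5 * depth, 0.25 * depth, depth)
    assert project(scene_point) == (0.5, 0.25)
```

The loop is the inverse problem in miniature: infinitely many 3-D scenes, one 2-D image.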
Ouch! An infinite number of possible 3-D environments could give rise to each 2-D image. No wonder I was having such difficulties figuring out how this worked! But in comes Hermann von Helmholtz, one of the greatest physicists to ever live, and the founder of vision science. He has a brilliant idea. He theorizes that the brain makes certain assumptions about the environment, and those assumptions constrain the possible 3-D environments. These constraints allow the brain to construct, or at least estimate, a model of the environment by making educated guesses about what we might be looking at. The keyword is MIGHT. But what sorts of assumptions are we talking about? Ah, here is where life gets interesting!
As you know, we humans evolved here on planet Earth, roaming about the fields under the sun and moonlight. It turns out that the assumptions mother nature filled our brains with reflect environmental conditions common to planet Earth. For example, our brains unconsciously assume that light comes from above us. Take a look at the images below.
You’ll notice that three of the dents look like holes, whereas one looks like a bump. That’s how your brain parses the color shading and gives spatial meaning to the picture. The bottom-left bump juts out whereas the other three are holes. But what happens if I simply flip the image over?
Wild huh? The three holes now become bumps, and the bump becomes a hole. To those who don’t understand how the vision system works, these come across as party tricks and neat coincidences. But this is no cheap trick. This is very important. Carefully examining the small and strange things in life oftentimes leads to the most important discoveries. These sorts of things are key to understanding the very nature of how our brain gives us a sense of space and time! This is important: your brain builds a virtual model of space based on those 2-D images on the back of the retina, but it has to make certain assumptions about the environment in order to do so. When those assumptions don’t hold in a given instance, we experience what we commonly call a “visual illusion.”
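The light-from-above trick can be caricatured in a few lines. This is a deliberately toy 1-D sketch, not a real shading model: the point is only that moving the light to the other side exactly inverts the shading pattern, so shading that read as a bump now reads as a hole.

```python
# A 1-D "bump": a height profile that rises and falls.
heights = [0, 1, 2, 1, 0]

def shade(profile, light_from_above=True):
    """Brightness of each slope segment: slopes tilted toward the light are brighter."""
    sign = 1 if light_from_above else -1
    # A rising slope faces the light when the light comes from one side;
    # the sign flip models moving the light to the opposite side.
    return [sign * (profile[i + 1] - profile[i]) for i in range(len(profile) - 1)]

lit_from_above = shade(heights, light_from_above=True)
lit_from_below = shade(heights, light_from_above=False)

# Flipping the light exactly inverts the shading. Flipping the *image* upside
# down does the same thing to the shading pattern -- and since the brain keeps
# assuming light from above, it reinterprets the identical shading as a hole.
assert lit_from_below == [-b for b in lit_from_above]
```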
What you see is not necessarily what’s there. Frankly, I’d argue that what we see in the virtual models in our heads is pretty far from what reality actually is. For example, we see a stone as a solid object, whereas it’s actually mostly empty space. We don’t see it moving, but actually the atoms of which it’s composed are vibrating and wiggling all over, not to mention the strange quantum effects going on. I don’t claim to fully understand the quantum effects yet, so I won’t even talk about them.
So how does the brain construct this model of 3D space? And Jason, you’re saying space isn’t actually 3D? Nope, it isn’t. Einstein’s relativity tells us we live in something more akin to a 4-D space-time that curves and is very complicated. Quantum mechanics tells us reality is even stranger. But, probabilistically speaking, the photons that end up making it to our eyes behave consistently enough to generate similar images each time, and our brains have evolved to generate semi-accurate models of reality based on the most probable conditions and scenarios on Earth.
Just to name a few factors the brain uses: it looks at shading (making assumptions about lighting); it parses out 2-D contours, generating a sort of 2-D line drawing of each picture, then analyzes the angles and curves of those lines, building up objects and space. It looks at shadows. It looks for horizon lines and uses a neat trigonometric relationship to establish depth. It looks for convergence in the 2-D lines it parsed out and, based on things like their slopes and where they intersect, generates surface normals and orientations. It compares the images from both eyes and sees how much they differ (binocular disparity). It builds up a gigantic database of objects which it is constantly comparing, analyzing, and using in its calculations. It looks at the textures of things, such as the statistical similarities in the colors of a field of green grass, and from how that texture pattern shrinks and blurs as it recedes into the distance it recovers the depth of the surface. It looks for a dulling of color, which happens for hills in the distant background as shorter-wavelength light is scattered along the path toward you.
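That “neat trigonometric relationship” with the horizon is simple enough to sketch (the observer numbers here are made up): on flat ground, a point that appears at angle θ below the horizon lies at distance roughly h / tan(θ), where h is your eye height.

```python
import math

def ground_distance(eye_height_m, angle_below_horizon_deg):
    """Distance to a point on flat ground from its angular drop below the horizon."""
    return eye_height_m / math.tan(math.radians(angle_below_horizon_deg))

# Hypothetical observer: eyes 1.6 m up; a patch of grass 5 degrees below the horizon.
d = ground_distance(1.6, 5.0)   # roughly 18 meters away
```

Notice the built-in assumptions: flat ground and a known eye height. Which is exactly Helmholtz’s point: the cue only works because the brain assumes Earth-typical conditions.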
The 2-D line parsing system is pretty neat to understand. Take a Necker cube. You’ve all seen it.
From the image you can see the Necker cube and the two possible ways your brain interprets it. One of the things you learn in vision science is how these lines are parsed out and how, say, the vertex patterns are analyzed and interpreted. It’s really neat stuff. But sometimes the lines can play havoc with your brain, which tries to assemble the lines into surfaces and the surfaces into space, but then finds it to be impossible. Take the famous Penrose triangle.
Your brain starts to parse out the lines and form surfaces but then says, “Wait a minute. This surface won’t go with this other surface.” These are called impossible objects, but they are neat to look at and ponder.
Everything gets really complicated in vision science. Different information starts to conflict. The 2-D line contours which your brain parses out may say that space is doing one thing, but then the shading information may say another. The brain has a whole complex system it uses to make the best guess as to which information source best represents what you’re looking at.
The brain also does more than this, though. We’ve just scratched the surface of all your brain is doing. It also groups things together by their similarities in color, size, orientation, and “fate” (it calculates a trajectory of where an object is moving); it looks for parallel and symmetric patterns in the image, searches for continuities, and the list goes on.
I’ll have to write more on all of this at a later time. What interests me about all of this is that the model our brain builds up is inaccurate, and the assumptions it makes aren’t true all the time. As Richard Dawkins points out in his books and many of his lectures, our brain assumes slow walking speeds and medium-sized objects. Our brain builds up a model that is intended to lug our big body around, help it walk from one place to another, and do common things like find something to eat. But the world much larger than our bodies, and much smaller than our bodies, follows different rules than the model in our heads. For example, if you accelerate toward light speed, object shapes start to morph due to Lorentz contraction. Though your brain tells you, “This coffee mug has this cylindrical shape, with this circular bottom and open top,” in reality the cup’s shape is much stranger. If you were to lay that cup on your kitchen table and then zoom past it at near light speed, it would bend and morph and do all kinds of weird things. There are actually relativity simulators you can download for the computer. One is called Real Time Relativity, which is neat to play around with.
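The measured flattening follows the Lorentz contraction formula, L = L₀·√(1 − v²/c²). A tiny sketch (the mug size is made up; note that what you’d actually *see* in those simulators also involves light-travel-time effects, so objects appear to rotate and bend rather than simply flatten):

```python
import math

def contracted_length(rest_length, v_over_c):
    """Lorentz contraction: length along the motion direction shrinks by 1/gamma."""
    return rest_length * math.sqrt(1.0 - v_over_c ** 2)

# A hypothetical 10 cm coffee mug zooming past at various fractions of light speed.
# At everyday speeds the contraction is utterly negligible, which is exactly why
# our brains never evolved to expect it.
for v in (0.1, 0.5, 0.9, 0.99):
    length = contracted_length(10.0, v)   # at 0.5c the mug measures about 8.66 cm
    print(f"v = {v}c -> measured length {length:.2f} cm")
```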
If you’d rather just watch a video of optical relativistic effects, I found a video below. In it, you can see how matter starts to behave at near light speeds. You see that objects begin to bend in weird ways and their colors change.
It’s only weird to us though because we never see this happen. Once again, our brain makes an assumption that we’ll never travel that fast, and so it makes assumptions about geometry and space which only hold up if we’re moving slowly. I still haven’t mastered quantum mechanics, but from what I can tell, it seems our brain’s views on causality and time are flawed as well. I’ll write about that when I feel more comfortable with quantum physics (if that ever happens).
Combine all this with neuroscience and how brain activity gives rise to consciousness, and you’re getting the gist of where my research takes me. It’s nice to see that things I had no idea about just a few years ago are starting to make sense to me now. I suppose that’s growth and development.