Wednesday, January 11, 2012

Kinect Connect

So I haven't posted for a while, but that doesn't mean I haven't been doing anything. Quite the contrary: David and I have been pushing ahead full-throttle with the Microsoft Kinect. This coming quarter I am beginning my gaming capstone development project, which will run 20 weeks with two other team members; I am working hard with David on Zombie Yoga to get pose detection working; and we are going to jump into Game Jam at the end of this month to develop something awesome.

So yeah, a lot of games, a lot of work, and not much time to do it all and my homework.  Oh dear.
F*#k
Anyway, where shall I begin?  Where did I last leave off, anyway? Dang.

Well, let's start with the Kinect, then. The Kinect in its current state, or rather using the current SDK Microsoft has released (the Kinect SDK beta), returns raw, unfiltered, and mostly really, really, really, really wonky data.

Like having a conversation with the Kinect
Now, to be fair, the Kinect is an awesome and amazing tool, and given what they have to work with, Microsoft has done a spectacular job.  I could sit here and grumble about the Kinect having issues, which is what a lot of people might prefer to do (the let's-rag-on-what's-popular-to-rag-on mentality) but the truth is what they have done with the input the sensor gets is phenomenal.
It's just not perfect like many developers would prefer, because perfect would take all of the serious effort out of our hands. No, we have to do some of the tweaking and fiddling ourselves.


So what David and I have decided to do is kind-of-sort-of what the Kinect already does: bring together an infrared depth image and pair it with some bone data. The Kinect software does the hard part, using the information it gets to put together a "skeleton" structure that represents a player's body. The only problem is that if my limbs go off-screen, or begin to cross, the Kinect just loses its marbles and spends the duration of the limb-crossing comically running around and bumping into tables while trying to collect said stray marbles.

Now, for Zombie Yoga we want the Kinect to register poses, and the thing about yoga is that it's not all nice and planar; some of the positions people can get themselves into are downright nuts.
Like cramming a monster into my homework assignment
Solution? Take what the Kinect gives us depth-wise, split that into individual chunks of depth-based objects that know where they are in relative 3D space, and then check those against the bones that are acting wacky. If there is an object of appropriate size at about the right depth where we think a bone should be, we can tell the program, "Hey, this crazy bone joint that is freaking out? It should go here."
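Here's a rough sketch of that correction idea, written in Python rather than our actual code; the DepthObject fields, the thresholds, and the correct_joint name are all placeholders I'm making up for illustration:

from dataclasses import dataclass

@dataclass
class DepthObject:
    x: float      # average screen position of the blob
    y: float
    depth: float  # average depth of the blob
    size: int     # how many depth pixels it contains

def correct_joint(expected_pos, reported_pos, depth_objects,
                  max_dist=0.3, min_size=50):
    # If the Kinect's reported joint looks wrong, find the nearest depth
    # object of reasonable size close to where we expect the joint to be,
    # and snap the joint there; otherwise keep what the Kinect gave us.
    ex, ey, ez = expected_pos
    best = None
    for obj in depth_objects:
        dist = ((obj.x - ex) ** 2 + (obj.y - ey) ** 2 + (obj.depth - ez) ** 2) ** 0.5
        if obj.size >= min_size and dist <= max_dist:
            if best is None or dist < best[0]:
                best = (dist, obj)
    if best is not None:
        return (best[1].x, best[1].y, best[1].depth)  # "hey, crazy joint, go here"
    return reported_pos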



That's the gist of it, anyway.  The actual implementation is going a little bit slower.  We are just about done with the code that checks out the depth image and comes up with a bunch of depth objects from said image.

Again, the Kinect and the SDK do most of the heavy lifting: the sensor scans the environment, picks out up to 7 players, and sends back pretty darn accurate depth data on how far each player is from the camera. And not just the player as a whole, but every single part of the player; it can tell the difference between your nose and your forehead (which, if you have a bigger nose like me, isn't too impressive, but trust me).
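For a sense of what the raw data looks like, here is a little Python sketch of unpacking one 16-bit value from the depth-with-player-index stream. The exact bit layout (player index in the low three bits, depth in millimeters in the rest) is my recollection of the beta SDK's format, so treat this as illustrative rather than gospel:

def unpack_depth_pixel(raw16):
    player_index = raw16 & 0b111  # which player this pixel belongs to (0 = none)
    depth_mm = raw16 >> 3         # distance from the sensor, in millimeters
    return player_index, depth_mm

# Example with a made-up raw value:
player, depth = unpack_depth_pixel(16002)
print(player, depth)  # -> 2 2000, i.e. player 2 at about two meters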

Snakes in a Notebook.
Anyway, this part might be boring, but how you go about extracting depth information from the depth image isn't as simple as just looking and saying "oh, look, there's my hand!"
It's easy for us, because we have eyes, but computers, like justice and awkward dates, are blind.



The trouble was being able to break apart one large image into a bunch of smaller pieces of data, without spending a whole lot of time on the image.  Basically, when a computer displays or processes an image it starts from the top left and goes through the entire image row by row.  Pixel by pixel.  That's how it has to draw.  While there are lots of "fancy" computer concepts such as multi-threading and other sweet processor behaviors, essentially a computer operates linearly.  Line by line, instruction by instruction.  One at a time.  It just does it awesomely fast.

Anyway, the problem is when you only get one pixel of information at a time, how do you, quickly and efficiently, begin to group pixels that share the same properties together when you only have information pertaining to the current pixel?

Through the power of dragons?

Well, I'll skip all the ways my logic failed and give you the good version: as I run across "depth pixels" of a similar depth, I keep track of the first pixel that matched my depth description, and as we go across horizontally (how pixels are read, one by one, row by row) I keep track of the latest pixel that met my requirements. Then, when I find a pixel that doesn't match what I was looking at, I kick that line over to an object that maintains a grouping of what I call "depth objects," and the line either glues itself onto a depth object it sits directly underneath, or it creates a new depth object that will be looked at during later passes.
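In Python-flavored sketch form (made-up names, a made-up depth tolerance, and the add_line call belongs to the grouping object sketched a little further down), the row scan looks something like this:

DEPTH_TOLERANCE = 50  # how close two depths must be to count as the "same" surface

def scan_row(row_pixels, y, depth_objects):
    # row_pixels: one row of depth values; y: the row index;
    # depth_objects: the collection that glues finished lines onto depth objects.
    start = 0
    for x in range(1, len(row_pixels) + 1):
        run_broke = (x == len(row_pixels) or
                     abs(row_pixels[x] - row_pixels[x - 1]) > DEPTH_TOLERANCE)
        if run_broke:
            # first pixel, last pixel, row, and the depth that started the run
            line = (start, x - 1, y, row_pixels[start])
            depth_objects.add_line(line)
            start = x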

So why is this good? It sounds stupid, right, adding things on line by line? And keep in mind, these can be short lines too; when I go through a row of pixels I will probably find several different lines at different depths in the depth image. Why not just look at the depth pixels immediately surrounding the one I'm currently looking at, and let it create a depth object by expanding outward and checking outlying pixels?

Several reasons. It doesn't maintain any sort of order; my depth-checking of an individual pixel just begins expanding out to the surrounding pixels, and how do I know when I've found enough? That approach is also likely to be recursive, which if you are checking nodes for pathfinding maybe isn't so bad, but in my case I'm checking a 320x240 depth image, which if you do the math is 76,800 pixels. That's a lot of ground to cover, and even when I do find the edges of the depth region I'm checking, how do I find the next place to start? I may very well just be jumping around to random places on the image, checking all the pixels in no particular order with recursion and eating up whatever stack space I have left. And this is happening at about 24 frames per second, on average. Sure, a computer could handle it, but on top of running a game? A 3D game at that? You might as well set your computer on fire as is.

So by just collecting lines of information and then sorting those lines into their appropriate depth object, we manage to build up an entire grouping of depth objects whose size and location match the different depths you see on your screen when you boot up the Kinect Skeletal Viewer.
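The grouping side might look something like the sketch below; again, the class name, the line format, and the "directly underneath" test are placeholders for illustration, not our real implementation:

class DepthObjectCollection:
    def __init__(self, depth_tolerance=50):
        self.objects = []  # each depth object is just a list of the lines it owns
        self.depth_tolerance = depth_tolerance

    def add_line(self, line):
        x0, x1, y, depth = line
        for obj in self.objects:
            for ox0, ox1, oy, odepth in obj:
                directly_above = (oy == y - 1) and not (x1 < ox0 or x0 > ox1)
                similar_depth = abs(depth - odepth) <= self.depth_tolerance
                if directly_above and similar_depth:
                    obj.append(line)  # glue the line onto this depth object
                    return
        self.objects.append([line])   # no match above: start a new depth object

Feed every row of a depth frame through scan_row with one of these collections and, by the bottom of the image, objects holds roughly the blobs you'd see in the depth view.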

Wait, you can see it already? So why did we go to all that trouble? Because the computer is going to have to look at that image every time the Kinect sends out a DepthReady event (whenever a new depth image is ready), and the depth stream only keeps track of which depth pixels are how deep, not their mass or the average location of that mass; that's what the Kinect bone tracker is doing... to an extent. It still gives crazy and weird data sometimes, which is why we went to all the trouble to do this thing. Plus, it has given us some pretty crazy game ideas that you will see up here for Game Jam and my capstone project, so stay tuned!
Indeed.


-Kevin