Russ Allbery and Jeff Townsend
Blackjack is a game that can be played entirely with hand gestures. It is easy to sit at a table and play for hours without saying a word. Thus, it seemed the perfect choice for developing a gesture recognition project.
Here is a basic summary of the game of blackjack.
We wanted to produce a system that was as close to the real thing as possible. The gestures used must be natural and familiar. In all cases where there was an equivalent casino gesture for a function, that gesture is supported. For some vital functions that have no gesture-only analog in the real game, we provided natural substitutes.
We wanted anyone familiar with playing blackjack to be able to hook up the system and begin playing with very little instruction. The biggest mistake in designing a system such as this is requiring the user to adapt to the system, rather than designing it with the expectations of the user in mind.
We used three technologies to implement the game. For recognition of finger positioning, we used the CyberGlove from Virtual Technologies. It is a data glove worn over the hand that reports the angles of the fingers and thumb. Knowledge of the relative position of the entire hand was also required; for this we used a FASTRAK tracker (from Polhemus) mounted on the CyberGlove. The game was then displayed on a Responsive Workbench run by an SGI Onyx.
Neither the CyberGlove nor the Polhemus tracker was sufficient by itself to produce the interface. Integrating the two in an efficient and simple manner to produce complex time-dependent gestures was key to meeting the design goals of the project.
There are many different betting options in blackjack -- the motions usually involve touching and moving real casino chips. Since we have virtual chips, we needed to change the betting paradigm to fit the virtual interface in a way that was consistent with the design goals of the project.
The following gestures were recognized by our system. Also included are the magic numbers we came up with to check against the raw data from the Cyberglove to determine whether the user's hand was in that gesture.
The standard blackjack hit gesture is a repeated curling of the fingers, with the palm either up or down. Rather than look for the motion, which produced more inconsistent results, we just looked for a closed hand gesture. We didn't check hand orientation.
Magic numbers: Index knuckle >= 135, middle knuckle >= 130, ring knuckle >= 120, index, middle, and ring first joints >= 110.
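A minimal sketch of how the hit thresholds above could be checked against one glove sample. The GlovePose type and its field names are hypothetical; the original code and the glove's data layout are not shown here, and only the threshold values come from the list above.

```python
from dataclasses import dataclass


@dataclass
class GlovePose:
    """One glove sample in raw glove units (hypothetical field names)."""
    index_knuckle: float
    middle_knuckle: float
    ring_knuckle: float
    index_joint: float   # first joint past the knuckle
    middle_joint: float
    ring_joint: float


def is_hit(pose: GlovePose) -> bool:
    """Closed-hand test: every listed joint must meet its minimum bend."""
    return (
        pose.index_knuckle >= 135
        and pose.middle_knuckle >= 130
        and pose.ring_knuckle >= 120
        and pose.index_joint >= 110
        and pose.middle_joint >= 110
        and pose.ring_joint >= 110
    )
```

Because hand orientation was not checked, nothing from the tracker appears in this test.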
The standard stay gesture is palm flat and down, hand moving from side to side. The Cyberglove model that we were working with didn't have wrist sensors, so we just detected the flat palm. In order to distinguish the stay gesture from the split gesture, which also uses a flat hand but held on its side, we required that the hand be lying flat, which we checked with the azimuth angle reported by the Polhemus tracker.
Magic numbers: Index and pinkie knuckles <= 105, middle and ring knuckles <= 105, all first joints <= 85, azimuth angle on Polhemus tracker between 40 and 120 degrees (assuming tracker positioning such that the tracker cord was pointed straight up the arm of the user).
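The stay test combines glove and tracker data, which might look like the sketch below. The dictionary keys and function names are assumptions made for illustration, and treating the 40 to 120 degree range as inclusive is also an assumption.

```python
FINGERS = ("index", "middle", "ring", "pinkie")


def is_flat_hand(knuckles: dict, first_joints: dict) -> bool:
    """Flat-hand test on raw glove values keyed by finger name (hypothetical layout)."""
    return (
        all(knuckles[f] <= 105 for f in FINGERS)
        and all(first_joints[f] <= 85 for f in FINGERS)
    )


def is_stay(knuckles: dict, first_joints: dict, azimuth_deg: float) -> bool:
    """Flat hand plus a tracker azimuth inside the stay range."""
    return is_flat_hand(knuckles, first_joints) and 40 <= azimuth_deg <= 120
```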
We decided to use a tap on the chips to represent a bet and a tap on the winnings to represent a desire to collect winnings rather than attempt to emulate a standard blackjack table (where the player would have to move a chip to the betting circle or collect the chips, something that wasn't possible in the interface we used). The betting gestures, the collecting chips gestures, and the double down gesture all collectively fell into the category of tap gestures; the actual gesture recognized was the same in each case, and the program just behaved differently depending on the x and y tracker location on the table. We originally tried to recognize a tap gesture (tapping on the chips on the table) with the Polhemus tracker, but this proved highly unsatisfactory because of the wrist-mounted position of the tracker. That required arm movement to make the gesture register.
Instead, we used just the Cyberglove and detected motion of the index finger, requiring that the middle and ring fingers be curled into the hand at the same time. Polling the glove for data every 50ms, we required two consecutive increases in the bend of the index knuckle (while the first joint remained straight) followed by a decrease in the bend and a total movement of 15 (in the units returned by the Cyberglove).
Magic numbers: Middle and ring knuckles >= 110, index first joint <= 110, total movement of 15, two consecutive polls (100ms) of movement.
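The tap is the only glove gesture here that depends on motion across polls, so a small state machine fed once per 50ms poll is one way to express it. This is a sketch, not the original code: the class and parameter names are hypothetical, and it assumes that "total movement" means the rise in index knuckle bend from the start of the run to its peak.

```python
class TapDetector:
    MIN_RISES = 2     # two consecutive polls of increasing bend (100ms)
    MIN_TRAVEL = 15   # total movement, in raw glove units

    def __init__(self):
        self._start = None  # knuckle bend where the current rise began
        self._prev = None   # knuckle bend at the previous poll
        self._rises = 0     # consecutive increases seen so far

    def update(self, index_knuckle, index_joint, middle_knuckle, ring_knuckle):
        """Feed one glove poll; returns True on the poll that completes a tap."""
        posture_ok = (
            middle_knuckle >= 110 and ring_knuckle >= 110 and index_joint <= 110
        )
        if not posture_ok:
            self._start = self._prev = None
            self._rises = 0
            return False
        if self._prev is None:
            # First poll in a valid posture: just remember where we are.
            self._start = self._prev = index_knuckle
            self._rises = 0
            return False

        tapped = False
        if index_knuckle > self._prev:
            self._rises += 1
        else:
            if index_knuckle < self._prev:
                # Bend decreased: a tap if the rise was long and large enough.
                tapped = (
                    self._rises >= self.MIN_RISES
                    and self._prev - self._start >= self.MIN_TRAVEL
                )
            # A decrease or a stalled poll ends the run; start a new one here.
            self._start = index_knuckle
            self._rises = 0
        self._prev = index_knuckle
        return tapped
```

Feeding it index knuckle readings of 100, 110, 120, 112 with the other fingers curled reports a tap on the fourth poll.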
The gesture to change chips of one denomination into another denomination was a double tap (tapping with both the index and middle finger). Only the motion of the index finger was tracked, just like with the tap gestures, but the middle finger was also required to be straight.
Magic numbers: Ring knuckle >= 120, index and middle first joints <= 110, total movement of 15, two consecutive polls (100ms) of movement.
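Since every tap-style gesture was interpreted according to where on the table it landed, the dispatch step might be sketched as a lookup from table region to action. The region names, rectangle coordinates, and gesture labels below are all hypothetical placeholders; the real table layout is not given in this report.

```python
# Hypothetical table regions as (x_min, y_min, x_max, y_max) in Workbench units.
REGIONS = {
    "chips": (0.0, 0.0, 10.0, 10.0),
    "winnings": (20.0, 0.0, 30.0, 10.0),
    "double_down": (10.0, 15.0, 20.0, 25.0),
}

# What each (gesture, region) pair means; unlisted pairs are ignored.
ACTIONS = {
    ("tap", "chips"): "bet",
    ("tap", "winnings"): "collect winnings",
    ("tap", "double_down"): "double down",
    ("double_tap", "chips"): "change denomination",
}


def region_at(x, y):
    """Return the name of the region containing (x, y), or None."""
    for name, (x0, y0, x1, y1) in REGIONS.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None


def action_for(gesture, x, y):
    """Map a recognized gesture at tracker position (x, y) to a game action."""
    return ACTIONS.get((gesture, region_at(x, y)))
```

Keeping the recognizer and the dispatch table separate mirrors the description above: the gesture recognized is the same in each case, and only the location changes its meaning.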
The final gesture, indicating a desire to split the current hand, doesn't have a natural counterpart in the way blackjack is normally played (at a casino, you would move your cards apart and then put a bet next to each card). We invented a reasonable gesture that turned out to be very natural: holding the hand flat on its side and moving it first right and then left (the glove we had was left-handed).
Magic numbers: Hand position the same as for a stay, but azimuth angle from the tracker not between 40 and 120 degrees, movement to the left for at least 6 units followed by movement back to the right for at least 6 units (Polhemus tracker data).
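A sketch of the split test, following the order of movement given in the magic numbers. It assumes that the tracker's x coordinate decreases as the hand moves left and that positions arrive as a simple list of samples; both are assumptions, as are the function names.

```python
def swept_left_then_right(xs, min_travel=6.0):
    """True if xs moves left by min_travel and then back right by min_travel."""
    if not xs:
        return False
    high = low = xs[0]  # right-most point before the sweep, left-most point so far
    for x in xs[1:]:
        if x < low:
            low = x                      # still moving left
        elif high - low >= min_travel:
            if x - low >= min_travel:    # came back far enough to the right
                return True
        elif x > high:
            high = low = x               # drifting right first; restart from here
    return False


def is_split(hand_is_flat, azimuth_deg, xs):
    """Flat hand held on its side (azimuth outside 40..120) plus the sweep."""
    on_edge = not (40 <= azimuth_deg <= 120)
    return hand_is_flat and on_edge and swept_left_then_right(xs)
```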
All of our gesture recognition code was determined entirely empirically; we took a best guess at an initial way to recognize the gesture and then refined our code with experimentation until it correctly recognized the gesture the way we intuitively performed it. This approach served our goal of making the gesture recognition as natural as possible; the project was intended to recognize the natural gestures a person would use playing real blackjack, not artificial gestures invented solely for the convenience of a computer.
The resulting gesture recognition was largely user-independent, but we did notice that some minor calibration to each user could be convenient if possible. In particular, some users preferred to have the constraints for the hit gesture extremely loose so only a small curling of the fingers would be required, while other users found that generated too many false positives and wanted stricter requirements before the gesture would be recognized.
One of the primary problems we had to overcome from the beginning was that the Cyberglove we had was on loan from Germany and therefore had no English documentation or libraries. Thankfully, future teams at Stanford shouldn't have this problem since Stanford should be getting its own Cyberglove. Although this lack of documentation caused some problems early, it proved useful in the long run since it forced an entirely empirical approach to writing the gesture recognition libraries, which produced the best final results.
For this project, we used a Polhemus FASTRAK tracker, with a sensor mounted on the Cyberglove and another on the Responsive Workbench. The blackjack game was displayed on the Responsive Workbench, although the actual interface wasn't in 3D and didn't use the advanced display technology of the Workbench, since the flat surface of the table was a very natural interface for playing blackjack and tapping chips and cards.
Originally, we attempted to use the LWP FASTRAK library with the Stanford Workbench code, but it interfered in some way with the Cyberglove interface libraries, causing bogus errors from the FASTRAK library and extreme delays and errors in the gesture recognition. We fell back on the FASTRAK libraries written at GMD, which worked considerably better. Over top of those libraries, we used the tracker interface object written for the Responsive Workbench, which performed the coordinate transformation into Workbench coordinates from tracker coordinates.
After originally polling the Cyberglove too quickly, getting back bogus data, and then adjusting the polling rate until we got clean data, we arrived at a rate of one poll every 50ms. That worked reliably for all of our gesture recognition. We polled the Polhemus tracker once every 10ms, but it's quite likely we would have been able to poll it faster; that was more than sufficient for our uses.
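To make the relationship between the two rates concrete, the sketch below lists which device is due for a poll at each instant. It is only an illustration of the timing (five tracker samples per glove sample); the function and constant names are hypothetical, and the real polling code is not shown in this report.

```python
GLOVE_PERIOD_MS = 50    # glove polling interval that gave clean data
TRACKER_PERIOD_MS = 10  # tracker polling interval used in the project


def poll_schedule(duration_ms):
    """Yield (time_ms, device) for every poll due in [0, duration_ms)."""
    for t in range(0, duration_ms, TRACKER_PERIOD_MS):
        yield t, "tracker"
        if t % GLOVE_PERIOD_MS == 0:
            yield t, "glove"
```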
One obvious need in a gesture recognition system is a rest position, a position of the hand where no gestures will be recognized by the system so that the user doesn't send gestures that they don't intend. After some experimentation and observation, we arrived at a rest position of fairly straight knuckles and slightly bent first finger joints, which proved fairly natural. The boundaries for the rest position were the hit gesture (bent knuckles) and the stay gesture (completely flat hand). Recognizing the rest position was one of the most challenging parts of the project, and even with our final gesture recognition code we got some false stays if the person's hand was too flat or the Cyberglove couldn't distinguish the small bends in the finger, and some false hits if the person's hand bent farther than we had calibrated the glove for.
As a fallback, we used the switch on the Cyberglove as an escape mechanism; no gestures were recognized if the switch was off.
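Put together, the rest position is simply whatever falls between the hit and flat-hand thresholds, with the glove switch acting as an overall gate. A minimal sketch, assuming glove values arrive as dictionaries keyed by finger name (a hypothetical layout) and reusing the thresholds listed for the hit and stay gestures:

```python
FINGERS = ("index", "middle", "ring", "pinkie")
HIT_KNUCKLE_MIN = {"index": 135, "middle": 130, "ring": 120}


def classify(knuckles, first_joints, switch_on):
    """Return 'hit', 'flat', 'rest', or None when the glove switch is off."""
    if not switch_on:
        return None  # escape mechanism: recognize nothing at all
    if all(knuckles[f] >= m for f, m in HIT_KNUCKLE_MIN.items()) and all(
        first_joints[f] >= 110 for f in HIT_KNUCKLE_MIN
    ):
        return "hit"
    if all(knuckles[f] <= 105 for f in FINGERS) and all(
        first_joints[f] <= 85 for f in FINGERS
    ):
        return "flat"  # stay or split, depending on the tracker azimuth
    return "rest"
```

A hand with fairly straight knuckles and slightly bent first joints, for example knuckles near 100 and first joints near 95, lands in the rest band under these thresholds.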
One problem that we ran into while integrating the Polhemus tracking with the Cyberglove is that the mounting position for the Polhemus on the glove (the wrist) simply wasn't as accurate as we would have liked. Wrist movement was just too decoupled from finger movement, and the majority of movement was in the fingers and in wrist bends that the tracker wouldn't pick up.
Some of this is solved by the newer version of the Cyberglove, which has wrist bend sensors, but ideally the Polhemus tracker should really be placed on the fingertip for applications like this. That would have allowed extremely accurate determination of the location of finger taps on the table.
Some of the problems that we had with rest position recognition make us think that a very brief per-user calibration session might prove useful. A short group of exercises should be sufficient to determine an individual's preferred rest position, preferred constraints on the hit gesture, and position of the wrist for a given position of the finger (in order to determine the exact location on the table of a finger tap). At the least, it would be worth trying to see how useful it proved.
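One possible shape for such a calibration, offered purely as a sketch of an idea we did not implement: record a few glove readings with the user's hand at rest and a few while performing the gesture, then place the threshold somewhere between the two averages. The function name and the strictness parameter are hypothetical.

```python
from statistics import mean


def calibrate_threshold(rest_samples, gesture_samples, strictness=0.5):
    """Pick a per-user threshold between the rest and gesture averages.

    strictness=0 puts the threshold at the rest average (very loose);
    strictness=1 puts it at the gesture average (very strict).
    """
    rest = mean(rest_samples)
    gesture = mean(gesture_samples)
    return rest + strictness * (gesture - rest)
```

A user who wants a small curl of the fingers to count as a hit would use a low strictness, while a user troubled by false positives would raise it.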
Various portions of our blackjack interface could be improved, adding such things as letting bets ride between hands, implementing betting limits, house rules, and other standard features of casino blackjack, and allowing the user to select which hand in a split they wish to play first (rather than just playing them in a particular order).
Virtual Technologies' product information and pricing for the Cyberglove.
Polhemus's product information, features, specifications, and pricing information for the FASTRAK tracker.
The Responsive Workbench project at GMD in Germany, the origin of the Cyberglove and Polhemus libraries that we used in our project, as well as the original site for the Responsive Workbench, which we used as our display device.
An on-line German to English Dictionary we used to translate the error messages and comments in the interface libraries we used in this project. (!)