The following is a refactored translation of the talk I gave at the EVA10 Expo in Buenos Aires on 11/12/10. It does not mirror exactly what I said during the talk, but it should give you a general idea. I’ve divided it into three parts: Player limits and systemic constraints (this post), Cheap tricks and easy ways out, and Best practices.

The designer tasked with creating a game experience using some flavor of motion control is always fighting a pitched battle against two cunning and deceitful enemies: the tech he’s using, and the player he’s designing for. Both will lay traps for him that might prove dangerous if he designs naively.

Having had some catastrophic encounters with those foes myself, I’ll now attempt to pass on the knowledge I’ve gained from those experiences, in the hope that you will be able to avoid easy mistakes when designing for motion capture systems.

Underlying constraints

When basing a game mechanic upon the detection of movements from a human player, it is important to study both the tech and the player and discover their underlying constraints.

These are the limits you shouldn’t trespass, as they are deeply linked to how the technology works and to how a player thinks and moves. Crossing the line on any of these fronts might prove disastrous for the project and translate directly into production delays or, worse, a botched, uninteresting game mechanic.

Player Limits

This set of constraints arises from the player’s psychology, cognition and anatomy. They will always apply, regardless of what revolutionary tech is invented; by keeping them in mind when designing, you will boost how accessible and comfortable your game mechanics are.

Pre-existing control fantasy

What I call the “pre-existing control fantasy” (let’s settle for control fantasy from now on) is a mental construct of how the player thinks the motion mechanic will work. It is made of three notions:

  • What the player thinks the tech will allow him to do,
  • What the player thinks the game will ask him to do, and
  • What the player knows about the real-world proxy of the motion mechanic.

The first tier of the control fantasy is the hardware marketer’s responsibility. To illustrate how this tier is built, I’ve compiled marketing videos from the major motion platforms:

What the videos show is the formal similarity in the way these technically different platforms are sold. In every advertisement, we are shown the sensor object (camera and/or “wand”) and how easily and instantaneously it captures our every movement. We are also shown actors performing almost universally recognizable motions (“dancing”, “golfing”, “tennis”, etc.). The aural component reinforces the metaphor and also promises instantaneous detection of any motion, however exaggerated it might be by the actors. Also, actual gameplay footage is entirely optional, which is understandable since, at the time most of these commercials aired, games were being rushed to match launch dates.

Although game industry folks can’t help but smile at some of the bold claims made by these commercials, the game-buying public actually believes that what they’re being shown is what they’ll get, and any experience short of that will be disappointing. This expectation fades with time as the public gets used to what the platform can actually do, instead of what they were promised at launch. The designer will have to at least match the users’ current expectations when implementing his mechanic.

The next tier of the control fantasy is the particular experience your game is based on. If you’re making a game about futuristic swordplay, players will start building an idea of what futuristic swordplay is like while watching preview videos, reading about your game or just looking at the box art when purchasing it. The media they’ll see might be gameplay footage, fake gameplay footage (an actor performing, aided by keen editing) or just atmospheric material (story trailers, concept art, audio, written material…), but all of it will contribute to building a general idea of the game.

In a nutshell, the player will already have imagined a version of the game’s control scheme based on what he imagines the game will allow him to do or experience. Designers must take into account what most players will expect to be doing while playing; if you manage to stick close to what first comes to mind when thinking about the experience the game will provide, the player will learn the moves easily and enjoy the game right from the beginning.

Finally, the player will add his experience with the best real-life proxy of the game experience he can find. A game about swordfighting will not produce the same expectations in a professional fencer as in a boy who plays with a wooden sword, or in a fan of Kurosawa films. Nevertheless, all will bring some part of their personal experience into the control fantasy, and the designer needs to guess which part will be the most prevalent, in order to tailor the controls to what most players have already experienced that resembles the control scheme being built.

Combined, the three aspects form a strong idea of how the game will play in the player’s mind. The designer must study how the tech and the game are (or will be) perceived by the target audience and what most of that target audience might have experienced that is close to what the game offers. By studying how the pre-existing control fantasy is built, he will be able to tailor the control scheme to avoid misunderstandings and make it more natural to most players, reducing the number of players who will need detailed instructions.

Untrustworthy and fallible controller

One of Microsoft Kinect’s mottos was “You are the controller”. In a way, this affirmation actually holds true for all games, not just Kinect-enabled ones; the player is always the one in control of the game.

Of course, the most straightforward interpretation is that the player is the equivalent of the control pad, i.e. the input mechanism. One could argue this interpretation is flawed, since the actual controller is the camera, wand or floor mat connected to the computer; but then, aren’t buttons also motion-sensing devices?

And yet, for the sake of illustration, let’s take this phrase literally: the player IS the controller. That makes him the worst controller ever: not only is he prone to errors caused by lack of attention or clumsiness, he will often fail to recognize that he even made a mistake!

Buttons are great for gaming because they leave no room for interpretation: either they’re pressed, or they’re not. There is no ambiguous state where a button might count as pressed or not depending on external circumstances.

Irresistible!

The player, however, can fail to produce an acceptable input for a variety of reasons. Maybe he’s holding the wand incorrectly, maybe he’s wearing a bright sweater, or stumbled over a piece of furniture. Maybe he’s left-handed, or wasn’t paying attention when the game explained the controls. The player will fail.

Even more aggravating, the player is very bad at realizing he made a mistake. When a misdetection occurs during motion play, it is more often than not a player error (doing the wrong move at the wrong time) rather than a detection error (doing the right move in a way the system can’t recognize). The crux of the problem is that the player is also the judge of whether the system interpreted his move correctly, and his judgment depends on how much time he’s had to reflect on what he’s done. I call this “Judgement Lag”.

This judgement lag means that the sooner the game tells the player he got a move wrong, the less inclined he will be to accept it. The effect is amplified if you require the player to perform quick and/or violent moves, or to react quickly. When reacting to something, we spend little to no time thinking about what exactly our body is doing. The analysis (and thus the value judgement of whether we did well or not) comes some time after the movement is completed.

You need to take special care with this judgement lag if you require some flow in the way moves are chained together. One smart example of handling this lag well is Ubisoft’s Just Dance. In it, although immediate detection is technically possible, you are not told right away that you got a move wrong; you are told at the next move instead. By then, enough time has elapsed for you to compare what was asked against what you achieved and see for yourself whether you were wrong. If the game told you you got it wrong right after the move, before your judgement had time to reach a conclusion, most players would drop their arms in protest and stop dancing altogether, which is catastrophic to the enjoyment of a dance game. Just Dance’s player error management is what keeps the flow going in the face of player mistakes.

Your game should also give the player the bad news at a time when he is ready to process it, lest you break the flow at every player mistake. And trust me, you don’t want that.
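The deferred-feedback idea described above can be sketched in a few lines. This is a hypothetical illustration, not Just Dance’s actual code: the class and method names (`DeferredFeedback`, `record_result`, `on_next_prompt`) are invented for the example. The verdict is queued the instant detection finishes, but only revealed when the next move is prompted.

```python
from collections import deque

class DeferredFeedback:
    """Queue move verdicts and reveal each one a beat late."""

    def __init__(self):
        self._pending = deque()

    def record_result(self, move_id, success):
        # Called the moment detection finishes; nothing is shown yet.
        self._pending.append((move_id, success))

    def on_next_prompt(self):
        # Called when the next move is prompted; by now the player has
        # had time to judge his own move, so the verdict is acceptable.
        if self._pending:
            move_id, success = self._pending.popleft()
            return f"move {move_id}: {'OK' if success else 'missed'}"
        return None

fb = DeferredFeedback()
fb.record_result(1, False)       # miss detected, but kept silent
verdict = fb.on_next_prompt()    # revealed one move later: "move 1: missed"
```

The design choice is simply to decouple *detection time* from *feedback time*, which is all the judgement lag requires.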

Ambiguous interpretations

The player, being human, is able to understand that a sign can hold many different meanings depending on the context, and even many different meanings at once. This is what semiotics is all about. For the designer of motion games, though, semiotics is a real pain.

As I said when discussing the pre-existing control fantasy, the player will approach your game with an idea of what he will have to do to control it. I also said that every player will add a different component in the form of his previous experience with the nearest real-life proxy of the experience you’re building. Since everyone’s life experience is different, the designer needs to specify the correct move the system knows how to detect and make that instruction available to the player.

Alas, it is very hard to create a concise, absolute, non-ambiguous instruction, even when combining words with pictures. The nature of signs means that there will always be players who misinterpret the instructions at least once, providing another source of player error you will have to take into account.

As a designer, you should never underestimate the player’s need for exact, precise instructions.

I can recall two examples from my short experience where being more thorough when providing instructions would’ve saved precious time. The first one was a game where I asked players to “do a throwing motion”. Nine out of ten players had absolutely no problem completing the movement, but one female player (who “threw like a girl”, as stereotypical as that sounds) could not do it, no matter how hard she tried. Her interpretation of the word “throw” was too different from what the other players were used to, yet she was following the instructions and was sure she was doing the correct movement. The game was simply “not getting it”.

The second example’s problem is also due to cultural gender differences, which goes to show how important the “personal experience” tier of the control fantasy is. In this other (comedy) game, players had to use the Wii Balance Board to direct a stream of liquid by shifting their body weight. The behavior and display of this stream were designed to mimic the act of urination. Male urination, that is: stable feet, straight legs, outward hip movements. The game was tested by giving no instructions to the playtesters, as we were hoping to see if they got the “joke”. Male testers got it pretty quickly and smirked throughout the play sessions, while female testers did well at the game but were clearly not making the connection. When we explained to them that it was actually mimicking urination, female playtesters invariably fared worse at the game than before: the instructions had polluted their control fantasy, and they started mimicking female urination motions (crouched position, stable feet, inward hip movements), which we couldn’t detect with the balance board since the shifts in weight were too slight.

There are many more examples I could give, but these gender-specific ones are very useful for demonstrating the relationship between the control fantasy and the instructions provided. A basic rule of thumb here is that you should never trust your own judgment that a motion instruction is clear “enough”; always test it with the most diverse public possible and see if it still means the same thing. Another rule is that 100% effective instructions are impossible, but 95% is achievable with moderate testing.

Anatomically constrained

Finally, you should always think about the player’s physical integrity. Safety first, as they say. I can’t imagine a studio head or publisher being keen on paying player injury bills or weathering the PR storm of a health scandal.

You should always make sure to design within the physical limits of the player, not beyond (unless you are working on an expert training serious game or a punishing art-game).

Do not ask of the player motions that are beyond his maximum range of movement; care for his joints. Your players are not contortionists.

Do not ask of the player impossible or strenuous motion combos. The time a movement takes is roughly proportional to its amplitude: broad movements take longer to complete than short ones. This is why the pacing of motion games often feels slower than that of other games; spamming a button is easy because of the small amplitude of movement required. If instead of pressing a button with your thumb (low amplitude) the game asks you to wave your arm up and down (high amplitude), the maximum movement frequency you can reach will be lower. The designer should take this inverse relationship between attainable frequency and movement amplitude into account when designing input timings.
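The amplitude–frequency trade-off above can be turned into a back-of-the-envelope model for input timings. The linear coefficients below are invented for illustration, not measured data; the point is only the shape of the relationship (minimum interval grows with amplitude, so attainable frequency falls).

```python
def min_interval_s(amplitude_cm, base_s=0.08, s_per_cm=0.01):
    """Crude model: larger amplitude -> longer minimum time per repetition.
    base_s and s_per_cm are made-up constants for the example."""
    return base_s + s_per_cm * amplitude_cm

def max_frequency_hz(amplitude_cm):
    """Highest rate at which the player could repeat the movement."""
    return 1.0 / min_interval_s(amplitude_cm)

thumb_press = max_frequency_hz(1)   # ~1 cm thumb travel: fast spamming
arm_wave = max_frequency_hz(60)     # ~60 cm arm sweep: much slower
# thumb_press comes out far higher than arm_wave, so input prompts for
# broad movements must be spaced further apart than button prompts.
```

A designer could use a curve like this (calibrated against playtest data) to reject input timings that physically can’t be met.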

Breather

Those were all the constraints relative to the player of motion games that I could discover so far. Have you found other constraints of this kind? Feel free to share them in the comment section.

Technological constraints

In addition to the limits of the player’s body and perception, the designer must take into account the constraints inherent to the technology he’s designing for. Be it IR sensors, accelerometers, cameras with image analysis or a combination of these methods, the constraints are shared by all existing and future motion gaming systems. A good understanding of them is important for designing within the bounds of what can be detected with good accuracy, to provide better motion experiences.

To better explain these constraints, I’ll be basing my explanations upon a simple model:

PLAYER  =>  SENSOR  =>  TREATMENT SYSTEM  =>  DISPLAY  =>  PLAYER
where => means "sends information to"

Latency

Latency is a given in any computer system. To make things simpler, I’ll divide it into Sensing Latency, Treatment Latency and Display Latency.

Sensing Latency depends on the frequency at which the sensor sends the system information about a movement, the amount of information contained in a “sensor frame” and the bandwidth available between the sensor and the system. This type of latency is usually constant and can be written off, since the maker of the game system is expected to make sure the sensor performs normally. The actual value depends on the particular system you’re designing for.

Treatment Latency is highly variable because it’s specific to the software you’re building and the tasks it needs to do. It’s the time the game engine needs to parse the incoming sensor data and extract meaningful input from it. It varies according to the available processing resources (more is faster), the amount of input information (more is slower) and the complexity of the input it’s looking for (more is slower).

Display Latency is caused by the time the screen takes to display a new image plus the time the player’s brain takes to perceive it. It depends on the person and the type of screen in use, but it usually falls between 100 and 200 milliseconds.

Obviously, in virtually all cases you’ll be looking for lower latency. If you manage to keep total latency below the display latency for most players (around 200 ms), it will be unnoticeable. If you go higher, there’s an ever-increasing chance that the player will complain about sluggish or unresponsive controls.
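Since the three latencies accumulate along the PLAYER => SENSOR => TREATMENT SYSTEM => DISPLAY chain, a simple budget check can be sketched as follows. The sample millisecond values and the 200 ms threshold are illustrative placeholders, not figures for any real platform.

```python
PERCEPTION_BUDGET_MS = 200  # roughly where lag starts being noticeable

def total_latency_ms(sensing_ms, treatment_ms, display_ms):
    """Assume the three latency components simply add up along the chain."""
    return sensing_ms + treatment_ms + display_ms

def feels_responsive(sensing_ms, treatment_ms, display_ms):
    """True if the total stays within the perceptual budget."""
    return total_latency_ms(sensing_ms, treatment_ms, display_ms) <= PERCEPTION_BUDGET_MS

ok = feels_responsive(33, 50, 100)         # 183 ms: under the budget
too_slow = feels_responsive(33, 120, 100)  # 253 ms: likely "sluggish"
```

In practice, only the treatment term is under the gameplay programmer’s control, which is why the next section focuses on making detection cheaper.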

Signal/noise ratio

If you need to bring latency down to acceptable levels, you need to study the Signal-to-Noise Ratio (or SNR) of the system you’re designing for. This ratio depends on what you’re looking for (Signal) and everything else (Noise).

Difference between signal and noise

 

You must always try to have a strong signal and weak noise, thus raising the SNR. The problem with motion sensing is that it’s hard to strengthen the signal, other than by asking for broad, clear and identical movements from the player (see above for why that’s very hard) or, in the case of image-analysis systems, asking him to wear specific clothes.

It is also possible to increase the SNR by using a subset of the total sensor data. The subset must be chosen according to what you’re looking for, to avoid throwing out useful information.

Take the following two images as an example:

 

If we're looking for a "gloved hand", the regular one might be a false positive and require additional treatment.

By filtering the image data, the differences between the gloved and regular hands are more apparent.

The analogous example for accelerometer data would be choosing a particular axis on which to detect the requested movement.
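Such axis selection can be sketched as follows. This is a hypothetical illustration with fabricated sample readings: it keeps only the axis where the sought movement produces the strongest variation and discards the rest as noise.

```python
def pick_strongest_axis(samples):
    """samples: list of (x, y, z) accelerometer tuples.
    Returns the index (0=x, 1=y, 2=z) of the axis with the highest
    variance, i.e. where the movement's signal is strongest."""
    n = len(samples)
    best_axis, best_var = 0, -1.0
    for axis in range(3):
        values = [s[axis] for s in samples]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        if var > best_var:
            best_axis, best_var = axis, var
    return best_axis

# A vertical "chop" mostly shakes the y axis; x and z are near-constant noise:
chop = [(0.1, 0.0, 0.0), (0.1, 2.0, 0.1), (0.0, -2.0, 0.1), (0.1, 0.1, 0.0)]
axis = pick_strongest_axis(chop)  # -> 1 (the y axis)
```

Restricting detection to that single axis is exactly the “subset of the total sensor data” trick: the other two axes carry mostly noise for this movement, so dropping them raises the SNR without losing the signal.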

The last resort to increase your SNR is to simplify the Signal you’re looking for. In our case, this translates to the phenomenon known as “waggle”, and we’ll see later why that’s bad.

So, in a nutshell, a high SNR means lower treatment latency and higher detection rate, but the methods you’ll use to increase that SNR might render your movement mechanic too simplistic, uninteresting or boring. This is why most gameplay programmers lose their hair young.

Lack of contextual data, or information state

The last technical constraint I’ll talk about is the lack of native meta-information, or information state. This is a typical problem of accelerometer-based systems, where doing the same movement while holding the sensor differently can yield different results, because the sensor has a hard time knowing how it is being held.

Camera-based technologies have the benefit of fixed axes: the designer can assume constant “up-down” and “left-right” directions. The counterpart is that they cannot make assumptions about the player’s position with respect to the sensor. He could be turning his back to the screen or standing on his hands, which would send the detection into disarray.

This means it is necessary either to make sure the player complies with and maintains a predefined position (the true reason for calibration screens) or to spend extra processing time extracting the information state from the data (which can always fail because some case was not predicted).

Your best bet is to design your game so that the player can reach and maintain the same information state as easily as possible, thus keeping the sensor information “in context”.
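A minimal sketch of what a calibration screen actually checks, under invented assumptions: here the “information state” is simply that the player faces the camera, tested by comparing the depth of two shoulder joints. The joint layout, threshold and return values are all hypothetical, not from any real SDK.

```python
def is_facing_sensor(left_shoulder_z, right_shoulder_z, tolerance=0.15):
    """If both shoulders sit at roughly the same depth (in meters),
    the player is facing the sensor rather than turned sideways."""
    return abs(left_shoulder_z - right_shoulder_z) <= tolerance

def accept_input(left_z, right_z):
    """Gate gesture detection on the expected information state."""
    if not is_facing_sensor(left_z, right_z):
        return "recalibrate"  # ask the player to face the screen again
    return "track"

state = accept_input(2.00, 2.05)   # facing: depths nearly equal
turned = accept_input(2.00, 2.60)  # turned sideways: depths diverge
```

Gating detection this way keeps the expensive gesture code running only while the sensor data is “in context”, instead of trying to interpret every possible orientation.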

Stop!

This concludes the enumeration of the constraints and limits the keen designer should keep close when designing, prototyping or refining a motion mechanic, regardless of platform. Keep in mind that these come only from personal experience and are by no means exhaustive. If you’ve found another rule shaping motion game design and feel like sharing, the comment section is your best friend.

The next part in this series will list some cheap tricks put forth to work around time, budget or bad-design factors. Stay tuned!