Deck officers on American aircraft carriers use hand gestures to guide planes around their vessels. These signals are fast, efficient and perfect for a noisy environment. Unfortunately, they work only with people. They are utterly lost on robotic drones, and even if a drone is under the control of a remote pilot deep in the bowels of the ship, that pilot often has difficulty reading them. Since drones are becoming more and more important in modern warfare, this is a nuisance. Life would be easier for all if drones were smart enough to respond directly to a deck officer’s gesticulations.
Making them that smart is the goal of Yale Song, a computer scientist at the Massachusetts Institute of Technology. He is not there yet but, as he reports in ACM Transactions on Interactive Intelligent Systems, he and his colleagues David Demirdjian and Randall Davis have developed a promising prototype.
To try teaching drones the language of hand signals, Mr Song and his colleagues made a series of videos in which various deck officers performed a set of 24 commonly used gestures to camera. They then fed these videos into an algorithm of their own devising that was designed to analyse the position and movement of a human body, and told the algorithm what each gesture represented. The idea was that the algorithm would learn the association and, having seen the same gesture performed by different people, would be able to generalise what was going on and thus recognise gestures performed by strangers.
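For readers who like to see the idea in code, the learning step can be sketched roughly as follows. This is not the researchers' algorithm, whose details the report does not give. It is a deliberately simple stand-in (a nearest-centroid classifier) that learns from labelled examples recorded from several performers and then labels an example from someone it has not seen. The two-number feature vectors and the gesture names `stop` and `move_ahead` are invented for illustration.

```python
import math
from collections import defaultdict


def train_centroids(examples):
    """Average the feature vectors of each gesture label.

    `examples` is a list of (feature_vector, label) pairs; the features
    here are hypothetical summaries of body position and movement.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(centroids, features):
    """Return the label whose average example is closest to `features`."""
    return min(
        centroids,
        key=lambda label: math.dist(centroids[label], features),
    )


# Invented training data: two performers, two gestures each.
examples = [
    ([0.9, 0.1], "stop"),
    ([1.0, 0.2], "stop"),
    ([0.1, 0.9], "move_ahead"),
    ([0.2, 1.0], "move_ahead"),
]
centroids = train_centroids(examples)

# A "stranger" whose version of the gesture differs slightly.
print(classify(centroids, [0.8, 0.3]))
```

Averaging over several performers is what lets even this toy version tolerate a newcomer's slightly different style, which is the kind of generalisation the team was after.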
Unfortunately, it did not quite work out like that. In much the same way that spoken language is actually a continuous stream of sound (perceived gaps between words are, in most cases, an audio illusion), so the language of gestures to pilots is also continuous, with one flowing seamlessly into the next. And the algorithm could not cope with that.
To overcome this difficulty Mr Song imposed gaps by chopping the videos up into three-second blocks. That allowed the computer time for reflection. Its accuracy was also increased by interpreting each block in light of those immediately before and after it, to see if the result was a coherent message of the sort a deck officer might actually wish to impart.
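The two steps described here, cutting the video into fixed blocks and then reading each block in the light of its neighbours, can be sketched as below. This is an assumption-laden illustration rather than the team's method: the report does not say how coherence was judged, so a plain majority vote over a block and its two neighbours stands in for it. The function names and the gesture labels are hypothetical.

```python
from collections import Counter


def split_into_blocks(frames, fps, block_seconds=3.0):
    """Chop a sequence of frames into consecutive fixed-length blocks.

    The final block may be shorter if the video does not divide evenly.
    """
    size = int(round(fps * block_seconds))
    if size <= 0:
        raise ValueError("a block must contain at least one frame")
    return [frames[i:i + size] for i in range(0, len(frames), size)]


def smooth_with_neighbours(labels):
    """Relabel each block using itself and the blocks on either side.

    A block's label is overridden only when at least two blocks in its
    window agree on a label; otherwise it keeps its own. Majority voting
    is a stand-in for the coherence check, not the published technique.
    """
    smoothed = []
    for i, label in enumerate(labels):
        window = labels[max(0, i - 1): i + 2]
        winner, count = Counter(window).most_common(1)[0]
        smoothed.append(winner if count >= 2 else label)
    return smoothed


# Ten one-second "frames" cut into three-second blocks.
blocks = split_into_blocks(list(range(10)), fps=1)
print(blocks)

# Hypothetical per-block guesses with one out-of-place reading.
guesses = ["brakes_on", "brakes_on", "fold_wings", "brakes_on", "brakes_on"]
print(smooth_with_neighbours(guesses))
```

In this toy run the lone `fold_wings` reading is outvoted by the blocks around it, which mirrors the intuition that an isolated, out-of-sequence gesture is more likely a misreading than a genuine command.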
The result is a system that gets it right three-quarters of the time. Obviously that is not enough: you would not entrust the fate of a multi-million-dollar drone to such a system. But it is a good start. If Mr Song can push the accuracy up to that displayed by a human pilot, then the task of controlling activity on deck should become a lot easier.