Wednesday, May 8, 2013

Why facial recognition tech failed in the Boston bombing manhunt

Despite what you see on TV, facial recognition isn't a silver bullet.

The driver's license and student ID photos of Dzhokhar Tsarnaev, and images published by the FBI and Massachusetts law enforcement during the manhunt for him and his brother.
FBI, photo illustration by Sean Gallagher
In the last decade, the US government has made a big investment in facial recognition technology. The Department of Homeland Security paid out hundreds of millions of dollars in grants to state and local governments to build facial recognition databases, pulling photos from driver's licenses and other identification to create a massive library of residents, all in the name of anti-terrorism. In New York, the Port Authority is installing a "defense grade" computer-driven surveillance system around the World Trade Center site to automatically catch potential terrorists through a network of hundreds of digital eyes.
But then an act of terror happened in Boston on April 15. Alleged perpetrators Dzhokhar and Tamerlan Tsarnaev were both in the database. Despite having an array of photos of the suspects, the system couldn't come up with a match. Or at least it didn't come up with one before the Tsarnaev brothers had been identified by other means.
For people who understand how facial recognition works, this comes as no surprise. Despite advances in the technology, systems are only as good as the data they're given to work with. Real life isn't like anything you may have seen on NCIS or Hawaii Five-0. Simply put, facial recognition isn't an instantaneous, magical process. Video from a gas station surveillance camera or a police CCTV camera on some lamppost cannot suddenly be turned into a high-resolution image of a suspect's face that can then be thrown against a driver's license photo database to spit out an instant match.
Not yet. Facial recognition technology has gotten a lot better in the past decade, and the addition of other biometric technologies to facial recognition is making it increasingly accurate. Facial recognition and other biometric and image processing technologies, such as gait recognition, helped law enforcement find the suspects in the rush of people around Copley Place that day with the help of retailers' own computerized surveillance systems.
The fact is that it's much more likely for a bank or department store to know who you are when you walk past a camera than for law enforcement to make an ID based on video footage. That's because you give retailers a lot more information to work with—and the systems they use are arguably better suited to keeping track of you than most police surveillance systems.

Three steps to (sometimes) finding the perfect match

Under ideal conditions, facial recognition can be extremely accurate, returning the right person as a potential match more than 99 percent of the time. But getting that level of accuracy almost always requires some skilled guidance from humans, plus some up-front work to get a good image. Depending on the type of facial recognition system, finding the right match usually requires three stages of processing.

Face detection and enhancement

The software looks for patterns in the image that match the face models built into its algorithms. A simpler form of this technology is used in consumer cameras, in photo apps for mobile devices, and in applications and services like iPhoto or Facebook.
In some circumstances, even detecting a face within an image can be difficult for software without human guidance. Lighting, camera angle, and facial expression can all muddle the process. A photo will often be taken from an angle that requires investigators to do preprocessing. "Typically, you'll do some preprocessing of the image," said Brian Martin, director of Biometric Research for facial recognition system provider MorphoTrust USA. "You can try to get rid of blur or the interlacing artifacts from older cameras. Some people use Photoshop to clean up the image; our company has what we call ABIS Face Examiner Workstation, which is face-specific tools to clean up an image. You can take a non-frontal looking face and physically model it as a three-dimensional image, then rotate it toward the camera and re-render a new face. So you do this sort of cleanup of the image and then submit it to the database."
At left, a face from an ATM camera video is recognized and evaluated for facial recognition quality; at right, a photo of a face is enhanced with a 3D model to improve its searchability.
If an image is too low-resolution, sometimes multiple images can be combined to create a higher-resolution composite. Lower resolution images may still work, but the results are more likely to misidentify the person—or miss him or her completely.
"Hollywood does a pretty good job of creating a myth that you could extract a better image by enhancing and zooming where information wasn't captured," said Masayuki Karahashi, senior vice president of engineering for surveillance and video analysis technology firm 3VR. "You're not going to create more information out of nothing."

Feature registration and extraction

Next, the software tries to identify common facial features to use as reference points to extract a "faceprint"—the centers of the eyes, tip of nose, and corners of the mouth are common features used for this. Again, depending on the quality of the image, a human may have to help the software with this, marking the location of reference points to help the software along.
With the reference points set, the software then adjusts the image to "normalize" it against the images in its database—making sure the face is scaled to the same size and removing other elements of the photo that might reduce the likelihood of a match. Then it runs calculations on the image to generate a faceprint. This is a binary value based on a mathematical representation of the patterns in the face.
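As a rough illustration of the normalization step, the sketch below maps a set of reference points into a common frame by moving the left eye to (0, 0) and the right eye to (1, 0), which removes differences in position, scale, and in-plane rotation. The landmark names and the choice of the eyes as anchors are assumptions made for this example; commercial systems use their own, more elaborate alignment.

```python
def normalize_landmarks(landmarks, left="left_eye", right="right_eye"):
    """Return landmarks in a frame where the left eye is (0, 0) and the right eye is (1, 0).

    Treating each point as a complex number turns the similarity transform
    (translate, rotate, scale) into one subtraction and one division.
    """
    origin = complex(*landmarks[left])
    axis = complex(*landmarks[right]) - origin
    if axis == 0:
        raise ValueError("eye centers must be distinct")
    normalized = {}
    for name, (x, y) in landmarks.items():
        z = (complex(x, y) - origin) / axis
        normalized[name] = (z.real, z.imag)
    return normalized


# Hypothetical pixel coordinates for the same face at two image sizes.
small = {"left_eye": (10, 10), "right_eye": (30, 10), "nose_tip": (20, 25)}
large = {"left_eye": (20, 20), "right_eye": (60, 20), "nose_tip": (40, 50)}
```

After normalization, the nose tip lands on the same coordinates for both `small` and `large`, so later calculations see one face rather than two differently sized ones.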
There are several approaches to creating a faceprint. Some systems use algorithms that measure the distance between sets of features in the normalized image, while others detect contours and "facial boundaries."
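The distance-based approach can be sketched in a few lines. This is only a toy version under stated assumptions: the five landmark names are hypothetical, and dividing by the distance between the eyes is one simple way to make the numbers independent of image size, not a description of any vendor's algorithm.

```python
import math
from itertools import combinations


def distance_faceprint(landmarks):
    """Build a feature vector from every pairwise landmark distance.

    Distances are divided by the distance between the eyes so the vector
    does not change when the same face is photographed at a different size.
    """
    eye_span = math.dist(landmarks["left_eye"], landmarks["right_eye"])
    if eye_span == 0:
        raise ValueError("eye centers must be distinct")
    names = sorted(landmarks)  # fixed ordering keeps vectors comparable
    return [
        math.dist(landmarks[a], landmarks[b]) / eye_span
        for a, b in combinations(names, 2)
    ]


# Hypothetical landmark coordinates, in pixels.
face = {
    "left_eye": (30, 40),
    "right_eye": (70, 40),
    "nose_tip": (50, 70),
    "mouth_left": (35, 90),
    "mouth_right": (65, 90),
}
faceprint = distance_faceprint(face)
```

Five landmarks yield ten pairwise distances. Real faceprints are built from far more measurements, which is one reason a sharper source image helps.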
Feature extraction is "the classic way" to gather data for facial recognition, according to Parham Aarabi, a professor of computer science at the University of Toronto and CEO of facial software firm ModiFace. "Another way is to do a direct match," he noted. This technique involves using the facial image itself as the basis of comparison rather than using an algorithmic representation. "A lot of the more recent work in facial recognition has been in direct face-to-face matching," Aarabi said. Other systems use multiple images of an individual to "learn" their facial characteristics to build a model, much like the Faces feature in Apple's iPhoto.
But in all of these approaches, the more detailed a source image is, the better. More data to base the faceprint on means a higher likelihood of success in the next steps—matching and classification.

Matching and classification

The feature-based faceprint of a subject can be used in a number of ways, depending on the facial recognition application. Some systems perform additional indexing based on the images to classify the subject for narrowing searches, processing the faceprint with algorithms that can estimate the age and gender of the subject. Other characteristics, such as skin tone and facial features, can be used to help index the image as well, allowing for searches to be narrowed by race, estimated weight, or hair color.
Classification can also be used with what Martin called "short-term biometrics": things such as gait, clothing, or other identifying features (such as a black backpack). All of these can help locate a subject within a set of images or video streams. This approach was used to find the Tsarnaev brothers in surveillance video and other images that law enforcement collected from multiple sources. Video analysis showed Dzhokhar walking quickly and calmly away from the site of the second bomb as the first exploded; retailers used characteristics such as the brothers' ball caps and backpacks to quickly pick the suspects out of their footage. These businesses had surveillance systems from vendors such as 3VR that could find relevant footage to provide to law enforcement.
"The fact that they were able to start looking for a person with a white baseball cap, a black bag—they were able to use those as variables to pull up videos," said Masayuki Karahashi, 3VR's senior vice president of engineering. Several 3VR customers were able to automatically pull results from their systems to provide to law enforcement from terabytes of video footage from the day.
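The kind of attribute search Karahashi describes can be pictured as a filter over tagged detections. The sketch below is a guess at the general shape of such a query, not 3VR's actual software: the `Detection` record, the camera names, the timestamps, and the tags are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    camera: str
    timestamp: str          # ISO 8601 text, kept as a string for simplicity
    attributes: frozenset   # tags assigned by the video analytics software


def find_detections(detections, required):
    """Return detections whose tags include every required attribute."""
    wanted = frozenset(required)
    return [d for d in detections if wanted <= d.attributes]


# Hypothetical index entries; cameras, times, and tags are invented.
index = [
    Detection("entrance-1", "2020-01-01T14:37:10", frozenset({"white cap", "black bag"})),
    Detection("entrance-1", "2020-01-01T14:37:42", frozenset({"red jacket"})),
    Detection("sidewalk-3", "2020-01-01T14:38:05", frozenset({"white cap"})),
]
hits = find_detections(index, ["white cap", "black bag"])
```

Only the first entry carries both tags, so it is the only hit. The hard part in practice is generating reliable tags from raw video, which this sketch takes as given.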
Finding the actual identity of someone in an image still requires a match against a facial database. In a facial recognition search, the binary faceprint of the subject is checked against those of a collection of "candidate" images. The bigger the pool of "candidates," the longer it takes to find a match—and the larger the pool of possible matches will likely be.
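A minimal sketch of that search, assuming faceprints are short lists of numbers compared by Euclidean distance (real systems use much longer, proprietary representations and their own scoring). The record names, the three-number faceprints, and the distance cutoff are hypothetical.

```python
import math


def rank_candidates(probe, gallery, max_distance, top_k=5):
    """Return up to top_k (name, distance) pairs within max_distance, closest first."""
    scored = [(math.dist(probe, fp), name) for name, fp in gallery.items()]
    within = sorted(pair for pair in scored if pair[0] <= max_distance)
    return [(name, dist) for dist, name in within[:top_k]]


# Hypothetical three-number faceprints; real ones are far longer.
gallery = {
    "record-001": (0.10, 0.80, 0.30),
    "record-002": (0.90, 0.20, 0.50),
    "record-003": (0.12, 0.78, 0.33),
}
probe = (0.11, 0.79, 0.31)
matches = rank_candidates(probe, gallery, max_distance=0.1)
```

Every record in the gallery has to be scored, and every record that falls under the cutoff becomes a candidate for a human to review, which is why a bigger pool means both more work and more possible matches.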
Performing matching, like everything else in facial recognition, requires significant computation resources. "Given how fast computers have become, it's not that much of an issue," said Aarabi. "If you narrow down a database to 10 million potential matches, that can be done in a reasonably short amount of time, so matching is not really a bottleneck anymore."
According to some National Institute of Standards and Technology benchmarks performed in 2010 (PDF), "Using the most accurate face recognition algorithm, the chance of identifying the unknown subject (at rank 1) in a database of 1.6 million criminal records is about 92 percent." But the study found that for larger data sets, such as the FBI's 12 million image database, the accuracy of searches rapidly degrades. "For other population sizes, this accuracy rate decreases linearly with the logarithm of the population size. In all cases a secondary (human) adjudication process will be necessary to verify that the top-rank hit is indeed that hypothesized by the system," the authors of the study wrote.
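The relationship the authors describe, accuracy falling linearly with the logarithm of the population size, can be written as a one-line function. The anchor point (about 92 percent at 1.6 million records) comes from the passage quoted above; the slope is not given here, so the sketch takes it as a required argument, and the value used in the example is a placeholder rather than a figure from the study.

```python
import math


def rank1_accuracy(population, slope_per_tenfold,
                   ref_accuracy=0.92, ref_population=1.6e6):
    """Accuracy that falls linearly with log10 of the database size.

    ref_accuracy and ref_population are the figures quoted in the text.
    slope_per_tenfold (accuracy lost per tenfold growth in records) is NOT
    from the study; the caller must supply it.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return ref_accuracy - slope_per_tenfold * math.log10(population / ref_population)


# Placeholder slope chosen only to exercise the formula.
ILLUSTRATIVE_SLOPE = 0.05
estimate = rank1_accuracy(12e6, ILLUSTRATIVE_SLOPE)
```

Whatever the true slope, the shape of the curve means each tenfold increase in database size costs the same number of percentage points.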
Under ideal conditions, a facial recognition scan can at least come close to how such things play out in the movies. And even though facial recognition requires significant computing power to pull off, cloud computing and improved graphics processing are making it a lot easier to deploy—even to consumer devices. In testimony before the Senate Judiciary Committee last July, MorphoTrust's Martin told senators, "The technology is currently at a state where these face recognition algorithms can be deployed in anything from cell phones to large multiserver search engines capable of searching over 100 million faces in just a few seconds with operational accuracy."

That driver’s license photo is worse than you think

All that search speed, however, depends on the quality of the images in the database. Simply put, "if you don't have a good database, [you] won't get a match," Aarabi said. There's more to having a good facial database than having the suspect's picture in it.
Part of the problem investigators faced was that the facial database that did have images of at least one of the suspected bombers in it was built for a very specific purpose—preventing people from obtaining fraudulent driver's licenses.
Originally provided by Digimarc ID Systems starting in 2006, the Massachusetts Registry of Motor Vehicles' facial recognition system is currently maintained (thanks to a series of corporate acquisitions) by MorphoTrust USA. The system, purchased with a $1.5 million grant from the Department of Homeland Security, holds images of the more than four million licensed drivers in the state of Massachusetts. It catches as many as a thousand fraudulent license applicants every year.
"The DMV systems, they essentially have facial recognition in place because they want to prevent fraud with people trying to get licenses under different names," said Martin. "There are two cases that they typically use facial recognition in—the first is when you apply for a new license. They check against the database of images to see if they get a hit for someone under a different name. Then some human examiner goes through to see if there is a fraud case. The second case is to ensure, if you've had three or four licenses in the past, that your new one matches those past licenses."
In both of these scenarios, Martin said, "The capture of the photo is pretty controlled. You're looking at the camera with a flash and looking directly at the camera, so you can get really high accuracy."
Because the faces being matched are all in the same format, with the same lighting, and of essentially the same resolution, the Registry's system is the facial recognition equivalent of shooting fish in a barrel. Even so, a bad match occasionally manages to squeak by. That's what happened to John Gass.
In 2011, Gass' license was revoked by the Registry of Motor Vehicles when another driver's image matched his enough to fool the system—and likely the inspector who checked the results. He ended up suing the state for loss of wages because of the 10-day ordeal he went through to prove that he was, in fact, a unique being. This sort of case is "incredibly rare," Martin said.
But when applied to the task of locating a terrorism suspect, the Registry's database is less than ideal. Although it captures each person's forward-facing image almost perfectly, every photo in it is taken from the same angle.
There's also the issue of how that face changes. Dzhokhar Tsarnaev's driver's license photo was taken when he was 16; his facial structure may have changed during the last three years as he grew. And while many photos of Tsarnaev emerged once he was determined to be a suspect, the early images that law enforcement had to work with were less than ideal for matching up against a driver's license photo.
"If they had the driver's license image, it was a few years old, and they might have looked much different, and might not have been able to get a match from blurry surveillance image of the guy," said Martin. "It was a relatively hard case for face recognition. With some of the later images that came out, they weren't impossible to work with, and I think the technology could have come up with a match. But the ones the FBI posted on the website, I don't think there was a chance for matching them. It was too hard."
Inevitably, it came down to time. As more images were collected, a positive facial identification could have been made for Dzhokhar Tsarnaev—images were combined and a more detailed composite was assembled. But the desire to quickly apprehend the brothers led to the FBI publishing the surveillance images on April 18, hoping that the images would spur tips from the public. They ended up inspiring the Tsarnaevs' attempted flight. That ordeal contained many incidents authorities would likely want to forget: the killing of an MIT police officer, a car-jacking, and a shootout with police. According to law enforcement accounts, that last incident saw another policeman wounded and Tamerlan Tsarnaev shot; he was then run over by his brother in the stolen vehicle.

Uncontrolled environments

Surveillance video of the Tsarnaev brothers captured by a retailer's surveillance camera near Copley Square.
There were a lot of pictures taken of the Tsarnaev brothers on April 15—some of them blurry, many of them not. But few of them were good candidates in the early hours of the investigation for getting a good match against a driver's license photo. Many of the best images came from digital surveillance systems set up at Lord & Taylor and other retailers near the bombings. Sadly, those cameras weren't in the best position to get a clean shot at the brothers' faces either. "These were uncontrolled environments," said 3VR's Karahashi. "In case of Boston bombing, you have so much raw footage, and they were processing video from cameras that weren't designed to be capturing faces." There were plenty of megapixel surveillance cameras in the mall, stores, and hotels around Copley Square—"but there [were] also a lot of analog and VGA resolution cameras," Karahashi said.
Some of those cameras were intended to capture faces, just not of people passing outside. Retailers, banks, and casino operators are among the businesses who have made the biggest investments in video surveillance, footage analysis, and facial recognition technology. Casinos in Las Vegas were among the first to embrace facial recognition—largely to help them keep "undesirables" off their gaming floors—and they have invested heavily in video analysis systems that watch gaming tables for regulatory purposes or track car license plates as they go in and out of garages. Retailers and banks want to capture people's faces for similar reasons: "loss prevention" in retail, regulatory purposes, and security in banking. But retailers also want to be able to use surveillance footage to improve marketing and track customer behavior in their stores.
"In a store, you know people are going to come into [the] store through specific entrances, and [you] know that they'll be about a certain height, so if I have this camera angle, I'll have a pretty good rate capturing faces," said Karahashi.
Video analysis systems can categorize and track individuals across multiple video feeds. But they can also pull in other sources of time-indexed data for context—such as transactions, alarms, and other events in a company's information systems.
Retailers can watch individuals walk the floor in a playback and see what they bought, if anything, and then adjust marketing to improve their "conversion rate." If someone calls a bank to complain about fraudulent ATM transactions, Karahashi said, a bank can pull up video footage from ATMs at the time of the transaction to spot the person making them. They can then use that person's image to search for other incidents across the entire network. A single complaint could uncover a broader ATM skimming operation.
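The ATM example comes down to joining two time-indexed data sets: transactions and recorded clips. Here is a minimal sketch of that lookup, assuming each clip is stored with a camera name and start and end times. The field names, the 30-second margin, and the sample records are all hypothetical.

```python
from datetime import datetime, timedelta


def clips_for_event(clips, camera, event_time, margin=timedelta(seconds=30)):
    """Return clips from one camera that overlap a window around event_time."""
    start, end = event_time - margin, event_time + margin
    return [
        c for c in clips
        if c["camera"] == camera and c["start"] <= end and c["end"] >= start
    ]


# Hypothetical clip index and transaction record.
clips = [
    {"camera": "atm-7", "start": datetime(2020, 1, 1, 9, 0), "end": datetime(2020, 1, 1, 9, 5)},
    {"camera": "atm-7", "start": datetime(2020, 1, 1, 9, 5), "end": datetime(2020, 1, 1, 9, 10)},
    {"camera": "atm-9", "start": datetime(2020, 1, 1, 9, 0), "end": datetime(2020, 1, 1, 9, 5)},
]
transaction = {"atm": "atm-7", "time": datetime(2020, 1, 1, 9, 7, 15)}
footage = clips_for_event(clips, transaction["atm"], transaction["time"])
```

Only the second clip from `atm-7` covers the moment of the transaction. A face pulled from that clip could then serve as the probe for a wider search across the network.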
But those scenarios all play out within a single system, and the digital cameras in Lord & Taylor and other stores couldn't capture the faces of passers-by unless they "volunteered their faces," Karahashi said.
One of the early images of Dzhokhar Tsarnaev released by the FBI. The profile image would have been useless in a facial recognition search of the Massachusetts Registry of Motor Vehicles database.
Another early image released by the FBI. The low resolution of this image would have made a false negative (a total miss in the search) more likely, according to experts.
The angle of the footage from this retailer's surveillance camera, in combination with lighting, the sunglasses worn by Tamerlan Tsarnaev, and the ball caps worn by both of the brothers would have made a facial match difficult.

Crowdsourcing the cameras, beefing up the database

A chart from the 2010 NIST study of facial recognition algorithms, with images from the FBI's Multiple Encounter, Deceased Subject (MEDS) facial database. Converting multiple images over time of an individual into a single model for search improved the probability of an accurate match.
Even without surveillance cameras at face-level, there were plenty of cameras on Copley Square that were in position to capture the Tsarnaev brothers' faces. "One of the things we have now more than ever before is multiple mobile device images of the same scene," said Aarabi. "You can now take multiple mobile images and combine them to make a high-resolution image of a person's face—you can have 10 photos from 10 different angles and they can be combined to expand your chances of finding the right person."
But if the goal is to have the ability to pull the name of someone on a surveillance camera feed out of the air at an instant, the problem may not be solved by more surveillance cameras. As NIST pointed out in its study of facial recognition, "multimode biometrics"—the combination of multiple images and characteristics of a person—can dramatically improve the value of biometric databases. Just having multiple photos of a person in the database from different angles and with different facial expressions can significantly improve the probability of a match.
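One simple way to see why multiple enrolled photos help is to score each person by their closest stored template. The sketch below is a toy under stated assumptions: the two-number templates, the identities, and the use of Euclidean distance are all invented for illustration and are not taken from the NIST study.

```python
import math


def best_match(probe, enrolled):
    """Return (identity, distance) using each identity's closest stored template."""
    best = None
    for identity, templates in enrolled.items():
        dist = min(math.dist(probe, t) for t in templates)
        if best is None or dist < best[1]:
            best = (identity, dist)
    return best


# Hypothetical two-number templates.
single_view = {"person-a": [(0.0, 0.0)], "person-b": [(1.0, 0.0)]}
multi_view = {"person-a": [(0.0, 0.0), (0.7, 0.1)], "person-b": [(1.0, 0.0)]}
probe = (0.75, 0.1)  # person-a photographed from a different angle
```

With only a frontal template on file, `best_match(probe, single_view)` picks person-b. Once a second view of person-a is enrolled, `best_match(probe, multi_view)` picks person-a.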
Current systems "are not complex enough," said Kushan Ahmadian, an Alberta-based software developer who received his PhD in computer science studying biometrics and facial recognition. "If you use video and record their movement pattern at the time you take their picture, it significantly improves the quality of recognition."
Martin agrees that multimode can help. With high-resolution cameras, multiple biometrics can be collected by the same camera. He said iris recognition could be combined with facial recognition in some applications, since high-resolution cameras can pick up iris patterns in photos. According to Martin, researchers are even looking at using skin pores for identification. "If you have a high-enough resolution image, you can detect pore patterns that are unique to an individual that would distinguish him even from an identical twin," he said.

License and (biometric) registration

This sort of multimode biometric identification is already being used by the Defense Department and law enforcement agencies. The military widely used iris recognition along with photo recognition to record the identities of individuals in Iraq and Afghanistan. Police departments have begun to collect iris data on arrestees, and irises are part of the biometrics used by the DHS's US-VISIT database. That database is used to screen people entering the country to determine if they're illegally coming into the US.
Companies with an eye on knowing who wanders through their hallways have kept facial databases with corporate ID systems for more than a decade. The falling price and improved resolution of biometric collection hardware may lead the more security-conscious to increase the capabilities of their digital identity databases—especially as facial recognition systems become incorporated into login credentials. Financial institutions are starting to look at facial recognition systems to reduce transaction fraud. In April, the London-based software firm Facebanx introduced a technology that will let customers submit their own facial image to be added to their account information via webcam or mobile device to increase their accounts' security. Healthcare organizations are looking at iris scans for more reliable patient identification.
No one is standing in line at the DMV for multiple digital mug shots and iris scans yet. But with the technology within reach and demands for better surveillance voiced after the Boston bombing, it may soon become routine to get your pores and irises recorded by the DMV in addition to a front and side photo. It's a development with massive implications for security. Sadly, there's no guarantee you'll like your picture any better.
