The Democratic National Committee attempted to raise awareness of the hazards of AI-doctored videos last week at the Black Hat cybersecurity conference in Las Vegas by exhibiting a deepfaked video of DNC Chair Tom Perez. Deepfakes are videos altered with deep learning algorithms to superimpose one person’s face onto footage of another.
As the 2020 presidential election approaches, there is growing worry about the risks deepfakes pose to the democratic process. The House Permanent Select Committee on Intelligence held a hearing in June to address the concerns raised by deepfakes and other AI-manipulated media. There is also debate over whether tech companies are prepared to deal with deepfakes: Rep. Adam Schiff, chairman of the House Intelligence Committee, warned earlier this month that Google, Facebook, and Twitter do not have a clear plan to address the issue.
Fear of a deepfake onslaught has sparked a slew of projects and efforts to detect deepfakes and other image- and video-tampering techniques.
Inconsistent blinking
Deepfakes use neural networks to superimpose the target person’s face on an actor in the source video. While neural networks are capable of mapping the attributes of one person’s face onto another, they lack knowledge of the physical and natural qualities of human faces.
That is why they can give themselves away through unnatural artefacts, and unblinking eyes are one of the most noteworthy. Before a deepfake can be created, the neural network that generates it must be trained on examples; in the case of deepfakes, those examples are images of the target person. Because most of the images used in training show open eyes, the neural network tends to produce deepfakes that don’t blink, or that blink in unnatural ways.
Researchers at the University at Albany published a study last year on a method for detecting this type of discrepancy in eye blinking. Fittingly, the technique employs deep learning, the same technology used to make the fake videos in the first place. The researchers found that neural networks trained on videos of eye blinking could localise the blinking segments in a video and evaluate the frame sequence for abnormal motion.
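The paper’s actual model is a recurrent convolutional network, which isn’t reproduced here. As a rough sketch of the underlying idea, assume a separate, hypothetical per-frame eye-state classifier already exists; the forgery check then reduces to asking whether the resulting blink rate looks human:

```python
import numpy as np

def blink_rate(open_probs: np.ndarray, fps: float, threshold: float = 0.5) -> float:
    """Blinks per minute, estimated from per-frame eye-openness scores.

    `open_probs` is assumed to come from a separate per-frame eye-state
    classifier (e.g. a small CNN over cropped eye regions); that model
    is not shown here.
    """
    open_flags = open_probs > threshold
    # Count open -> closed transitions as blinks.
    blinks = np.sum(open_flags[:-1] & ~open_flags[1:])
    minutes = len(open_probs) / fps / 60.0
    return blinks / minutes if minutes > 0 else 0.0

def flag_abnormal_blinking(open_probs: np.ndarray, fps: float,
                           normal_range=(4.0, 45.0)) -> bool:
    """Flag a clip whose blink rate falls outside a typical human range."""
    rate = blink_rate(open_probs, fps)
    return rate < normal_range[0] or rate > normal_range[1]

# Example: a 10-second clip at 30 fps in which the eyes never close.
scores = np.full(300, 0.9)
print(flag_abnormal_blinking(scores, fps=30))  # True: zero blinks is suspicious
```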
However, with technology improving by the day, it’s only a matter of time before someone creates deepfakes that can blink naturally.
Tracking head movement
Researchers at UC Berkeley recently built an AI algorithm that detects face-swapped videos using something considerably more difficult to fake: head and face motions. Every person has their own set of head movements (for example, nodding while presenting a fact) and facial expressions (for example, smirking when making a point). Deepfakes inherit the actor’s head and face gestures, not the targeted person’s.
A neural network trained on an individual’s head and face gestures can then flag videos containing gestures that do not belong to that person. To put their model to the test, the UC Berkeley researchers trained the neural network on real videos of world leaders. The AI detected deepfake videos of the same people with 92% accuracy.
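The sketch below illustrates that kind of per-person approach rather than the Berkeley team’s exact pipeline: each clip is summarised by how its motion features co-vary over time, and a one-class model trained on genuine clips of one person flags outliers. The per-frame head-pose and expression measurements are assumed to come from a face-tracking step that isn’t shown.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def clip_signature(features: np.ndarray) -> np.ndarray:
    """Summarise a clip by the pairwise correlations between its per-frame
    motion features (e.g. head yaw/pitch, expression intensities).
    `features` has shape (n_frames, n_features)."""
    corr = np.corrcoef(features, rowvar=False)   # (n_features, n_features)
    upper = np.triu_indices_from(corr, k=1)      # unique feature pairs
    return corr[upper]

# Train a one-class model on many genuine clips of a single person
# (placeholder random data stands in for real tracked measurements).
real_clips = [np.random.randn(300, 16) for _ in range(50)]
X_train = np.stack([clip_signature(c) for c in real_clips])
detector = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_train)

# Score a new clip: +1 means consistent with that person's mannerisms,
# -1 means an outlier and therefore a candidate forgery.
new_clip = np.random.randn(300, 16)
print(detector.predict(clip_signature(new_clip).reshape(1, -1)))
```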
Head movement detection provides a strong defence against deepfakes. However, unlike the eye-blinking detector, which only needs to be trained once, the head movement detector must be trained separately for each person it protects. As a result, while it is well suited to public figures such as world leaders and celebrities, it is less suitable for general-purpose deepfake detection.
Pixel inconsistencies
When forgers alter a picture or video, they try to make it appear realistic. While picture modification can be difficult to notice with the naked eye, it can leave some artefacts behind that a well-trained deep learning algorithm can detect.
Researchers at the University of California, Riverside created an AI model that identifies tampering by analysing the edges of objects in photos. Pixels near the edges of objects that have been artificially inserted into or removed from an image have distinctive properties, such as unnatural smoothing and feathering.
The UCR researchers trained their algorithm on a large annotated dataset of tampered and untampered photos. The neural network learned the common patterns that distinguish the boundaries of manipulated objects from those of unmodified ones. When given new photos, the AI detected and highlighted the modified objects.
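The UCR detector is a trained deep network, but the kind of low-level cue it learns from can be illustrated with a much simpler sketch: pasted-in content that has been smoothed or feathered disturbs the image’s high-frequency noise residual, so the residual’s local variance is out of step with its surroundings. The snippet below only demonstrates that cue; it is not the researchers’ model.

```python
import numpy as np
from scipy import ndimage

def noise_residual(gray: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    """High-frequency residual: the image minus a smoothed copy."""
    return gray - ndimage.gaussian_filter(gray, sigma)

def residual_variance_map(gray: np.ndarray, window: int = 8) -> np.ndarray:
    """Local variance of the residual. Spliced or in-painted regions with
    unnatural smoothing/feathering tend to stand out in this map."""
    res = noise_residual(gray)
    local_mean = ndimage.uniform_filter(res, window)
    local_sq_mean = ndimage.uniform_filter(res ** 2, window)
    return local_sq_mean - local_mean ** 2

# Synthetic example: paste a blurred patch into a noisy background.
rng = np.random.default_rng(0)
img = rng.normal(0.5, 0.1, (128, 128))
img[40:80, 40:80] = ndimage.gaussian_filter(rng.normal(0.5, 0.1, (40, 40)), 2.0)
vmap = residual_variance_map(img)
print(vmap[50:70, 50:70].mean() < vmap[:20, :20].mean())  # True: pasted area is "too smooth"
```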
While the researchers tested this method on static photos, it may also apply to videos. A deepfake is essentially a succession of modified image frames, so the same object-manipulation artefacts should appear along the edges of the subject’s face in those individual frames.
While this is a useful strategy for detecting a variety of tampering tactics, it may become obsolete as deepfakes and other video-manipulation tools get more sophisticated.
Establishing a truth baseline
While most efforts in the field focus on detecting tampering in videos, an alternative approach to combating deepfakes is to show what is true. This is the method taken by researchers at the University of Surrey in the United Kingdom in Archangel, a project they are testing with national archives from around the world.
Archangel blends neural networks and blockchain to create a smart archive of videos that can later be used as a single source of truth. When a record is added to the archive, Archangel trains a neural network on various formats of the video. The neural network can subsequently determine whether a new video is identical to the original or a manipulated version.
Traditional fingerprinting methods validate file authenticity by comparing files at the byte level. That doesn’t work for video, because the byte structure changes whenever the file is compressed with a different codec. Because the neural network instead learns and compares the visual content of the video, its fingerprint is codec-independent.
Archangel saves these neural network fingerprints on a permissioned blockchain managed by the national archives participating in the trial programme to ensure their integrity. Adding records to the archive necessitates agreement among the organisations involved. This means that no single entity can judge which videos are genuine. When Archangel is made public, anyone will be able to run a video through the neural networks to verify its validity.
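A minimal sketch of that verification flow, using a trivial thumbnail-averaging fingerprint as a stand-in for Archangel’s trained networks and a plain Python list as a stand-in for the permissioned ledger (none of this is the project’s actual code):

```python
import hashlib
import numpy as np

def video_fingerprint(frames) -> np.ndarray:
    """Codec-independent visual fingerprint: tiny grayscale thumbnails of
    each frame, averaged over the clip. Archangel instead trains a neural
    network per archived record; this is only a stand-in."""
    thumbs = []
    for frame in frames:
        gray = frame.mean(axis=2) if frame.ndim == 3 else frame
        h, w = gray.shape
        thumbs.append(gray[:: max(h // 16, 1), :: max(w // 16, 1)][:16, :16] / 255.0)
    return np.mean(thumbs, axis=0).ravel()

def register(ledger: list, video_id: str, frames) -> None:
    """Append a fingerprint record. In Archangel this write requires
    consensus among the participating archives."""
    fp = video_fingerprint(frames)
    ledger.append({"id": video_id,
                   "fingerprint": fp,
                   "digest": hashlib.sha256(fp.tobytes()).hexdigest()})

def verify(ledger: list, video_id: str, frames, threshold: float = 0.99) -> bool:
    """Check a candidate video against the archived fingerprint."""
    record = next(r for r in ledger if r["id"] == video_id)
    fp = video_fingerprint(frames)
    ref = record["fingerprint"]
    cosine = float(np.dot(fp, ref) / (np.linalg.norm(fp) * np.linalg.norm(ref) + 1e-9))
    return cosine >= threshold

# Example: register a clip, then verify an identical copy.
clip = [np.random.default_rng(1).integers(0, 255, (64, 64, 3)).astype(float) for _ in range(10)]
ledger = []
register(ledger, "speech-001", clip)
print(verify(ledger, "speech-001", clip))  # True
```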
The disadvantage of this strategy is that it necessitates the use of a trained neural network for each video. This can limit its application because training neural networks takes hours and requires a lot of processing resources. It is, however, appropriate for sensitive videos such as Congressional records and speeches by high-profile persons, which are more likely to be tampered with.
A game of cat and mouse
While it is encouraging to see these and other efforts protecting elections and individuals from deepfakes, they are up against a rapidly evolving technology. As deepfakes become more advanced, it’s uncertain if defence and detection technologies can keep up.