Tuesday, May 12, 2015

YouTube's Content ID Program is Leading to Hundreds of Distorted Variations of Songs and Shows

Recently, while looking for a song on YouTube, I came across what appeared to be the song I was searching for, but slightly off pitch. I assumed it was a fluke and moved on. However, over the past few months I have noticed hundreds of videos, both of popular songs and hit television shows, that have been distorted in one way or another. There are songs that have been slowed down or sped up by ~5%, videos that are missing the top or bottom 20% of the frame, videos with black boxes over parts of the picture, audio with altered pitch, and videos where the original source is shrunk to half its size and embedded within some background. The list goes on.
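To put the "~5%" figure in perspective, here is a rough back-of-the-envelope sketch. It is my own illustration, not anything taken from the uploaders or from YouTube, and it assumes the speed change is a plain resample with no pitch correction, which may not be how every uploader does it. The function names are made up for this example, and the 210-second song length is just a placeholder.

```python
import math


def semitone_shift(speed_factor: float) -> float:
    """Pitch change in semitones when audio is played at `speed_factor`
    times its original speed and no pitch correction is applied."""
    return 12 * math.log2(speed_factor)


def new_duration(seconds: float, speed_factor: float) -> float:
    """Running time of a clip after changing its playback speed."""
    return seconds / speed_factor


# A hypothetical 3.5-minute (210 s) song, slowed down and sped up by 5%.
for factor in (0.95, 1.05):
    print(
        f"{factor:.2f}x speed: {semitone_shift(factor):+.2f} semitones, "
        f"a 210 s song now runs {new_duration(210, factor):.0f} s"
    )
```

Under that assumption, a 5% speed change moves the pitch by a little under one semitone in either direction, which would be consistent with a song sounding "slightly off" rather than obviously wrong.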

After a bit of digging, it became apparent that this was almost certainly an attempt by the uploaders to defeat YouTube's Content ID recognition system. For those who aren't familiar, Content ID is an automated system used by YouTube to detect copyrighted works within videos uploaded to the site. In most cases, Content ID will alert the uploader that the work is copyrighted and give them the option to swap the audio or display advertisements on the video (proceeds go to the rights holders). While this is arguably a workable approach to copyright detection, it has misidentified countless videos in the past (example, another, another). Besides the constant false positives and the lack of recourse for users accused of a violation, Content ID has another major issue: it struggles to detect derivatives of an original work, which leads to multiple copies of the same work, all of them distorted in some way.

Because Content ID is an automated system, it invites the constant back-and-forth that such technologies always face: for every step YouTube takes to curb copyright violations, the violators look for a way to defeat it. The result is that YouTube is now littered with thousands, if not millions, of defective copies of original works.

Take the hit show, "Shark Tank," as an example.

Here's a copy of a recent episode where about 10% of the left side of the video has been removed: https://www.youtube.com/watch?v=O9wPE-GZpQI.

Here's an episode where the entire video has been framed by a black box and translucent lines: https://www.youtube.com/watch?v=9kJq9hNgPrw

In this episode, the uploader shrank the entire video within a large black border: https://www.youtube.com/watch?v=bmg5oqXWskY.

Here's an even more annoying example where the actual video only makes up about 40% of the box, while the rest is filled with a background pattern: https://www.youtube.com/watch?v=JB78NFhmIYs.

In this copy, the uploader has added a giant image background behind the original video, which has also been cropped: https://www.youtube.com/watch?v=fsSlp9b2P4Q.

When it comes to audio changes, pitch shifting of Taylor Swift's music has become so rampant that not having a pitch shift has become a selling point.

Unfortunately, I doubt that Content ID will ever be able to detect all of the variations of copyrighted videos and songs. Because these works have now been altered, sometimes to an almost laughable extent, I'm afraid that the outcome for the original artists is actually worse than if the originals had been allowed to remain. Now, instead of hearing Taylor Swift's "Welcome to New York" the way she intended, I've heard about a hundred different variations: some fast, some slow, some higher-pitched, some lower. It remains to be seen how YouTube will respond to these videos, but my guess is that they will continue to adjust Content ID. Eventually, new uploads of Justin Bieber may sound like this: https://www.youtube.com/watch?v=bidHnEekXpE