Why live video production is on the cusp of massive democratization.
A Gentle Introduction to Live Video.
Producing good live video has historically been hard. And I don’t only mean in terms of human expertise and effort. It happens to be an extremely challenging computing problem. Consider the last professional sporting event you watched on TV or your computer — it likely included multiple camera angles, graphics, instant replays, data overlays, animations, multiple audio inputs, etc. How does all of that live data get combined, manipulated by a producer and transferred at lightning speed to your screen within a few seconds?
The best way to answer this question starts with a quick primer on how a computer works. At its most basic level, a computer is a vehicle for transmitting electric pulses. You can think of these pulses as either “1s” or “0s”. This binary representation enables us to build logical statements on the computer. From there, the designers of a computer can create chips (collections of logic on physical hardware) as the foundation for the software written on top of them. You may have heard the term “levels of abstraction” from one of your nerdiest friends when talking about computers. It refers to the different layers of software (you can think of them as a stack) written on top of the hardware, which a programmer uses to make a computer do something (or even a regular person can, using an interface…yet another, even higher level of abstraction!). Each level higher up the stack typically means your computer executes more and more instructions to do what you tell it (but it also becomes easier for a regular person to tell it what to do as you go up the stack). This means the level of abstraction you choose to operate at has an impact on the performance of your program or the task you want the computer to execute.
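To make the idea of stacking abstractions concrete, here is a toy sketch (my own illustration, not how real chips are built): bit-level logic at the bottom, and each function above it built only out of the layer below.

```python
# A toy illustration of "levels of abstraction": logical statements built
# from 1s and 0s, then composed into a higher-level operation (addition).
# This is a sketch of the idea, not how real hardware is implemented.

def AND(a, b): return a & b   # lowest level: logic on single bits
def XOR(a, b): return a ^ b

def half_adder(a, b):
    """One level up: combine gates into a circuit that adds two bits."""
    return XOR(a, b), AND(a, b)   # (sum bit, carry bit)

def add_2bit(a, b):
    """Another level up: chain adders to add two 2-bit numbers."""
    s0, c0 = half_adder(a & 1, b & 1)
    s1a, c1a = half_adder((a >> 1) & 1, (b >> 1) & 1)
    s1, c1b = half_adder(s1a, c0)
    carry = c1a | c1b
    return (carry << 2) | (s1 << 1) | s0

print(add_2bit(3, 2))  # 5
```

In practice you would just write `3 + 2` — the point is that each layer hides the instructions of the layer beneath it, which is exactly the performance trade-off described above.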
So, you may be wondering, “OK, what does that have to do with live video?” Well, live video requires the ability to process a huge amount of data really quickly. This brings me to the final “computer science” part of the story. When faced with a computing challenge like this, there are a few options. You can create custom hardware chips designed for one specific process (e.g., handling video) and no other function. If you do this, you can execute the instructions needed to handle and produce video at “the lowest level of abstraction” (the hardware level). The alternative is handling the video in software. That takes place at a “higher level of abstraction” and as a result requires more instructions (i.e., time), or might simply not be possible depending on the underlying hardware.
In a world where the data transfer and processing requirements of live video production exceeded the capacity of a general-purpose computer (i.e., your laptop, tablet or phone), hardware was really the only viable solution. For this reason, high-quality live video production has been a very hardware-intensive process to date.
If you are asking yourself what makes video production such an intensive process, consider the tasks that take place: multiple high-resolution video feeds are sent to a hardware box, which needs to read in their massive quantities of “1s” and “0s” for every frame of video. Consider a 1080p video at 30 frames per second. That is roughly 1.2 Gb of video data from each camera per second. Now multiply that by 4 or more (the number of cameras typically used in a good production). That’s a lot. By the way, these cameras won’t always send their video data in the same format…so the hardware also has the responsibility of converting formats. Now let’s add graphics and data (like scoreboard information). These need to be incorporated into the stream, which requires a process similar to a format conversion: decoding the video, inserting the graphics and then encoding it again. OK, how about audio? That needs to be processed (and formatted) and remain synchronized with the video throughout the production. Lastly, there is instant replay. This is a tricky one for your computer. In addition to doing all of this processing, it needs to store huge volumes of data (i.e., all of the video from all of the cameras captured during the event) in memory, to be quickly recalled when you want to show a replay.
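The arithmetic behind those figures is easy to check. The sketch below assumes a 10-bit 4:2:2 pixel format (20 bits per pixel), a common broadcast choice; other formats (8-bit RGB, 4:2:0) would shift the per-camera number up or down.

```python
# Back-of-the-envelope raw data rate for a multi-camera live production.
# Assumes 1080p at 30 fps in a 10-bit 4:2:2 pixel format (20 bits/pixel);
# the exact figure depends on the format the cameras actually emit.

WIDTH, HEIGHT = 1920, 1080   # 1080p resolution
BITS_PER_PIXEL = 20          # 10-bit 4:2:2 chroma subsampling
FPS = 30                     # frames per second
CAMERAS = 4                  # a typical small multi-camera production

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL
gbps_per_camera = bits_per_frame * FPS / 1e9
total_gbps = gbps_per_camera * CAMERAS

print(f"Per camera: {gbps_per_camera:.2f} Gb/s")        # ~1.24 Gb/s
print(f"All {CAMERAS} cameras: {total_gbps:.2f} Gb/s")  # ~4.98 Gb/s
```

Nearly 5 Gb/s of raw video, every second, before graphics, audio or replay storage is even considered — that is the workload custom hardware was built to absorb.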
I hope that provides a good sense of the magnitude of the challenge. And I didn’t even touch on other processing you might want to do (like graphics that track objects in the video, autonomous viewports, etc.).
As a result of what I illustrated above, people involved in the world of video production could be forgiven for adhering to the conventional wisdom that it still necessitates custom hardware (both in computing power and cameras). However, that is no longer the case. We are all familiar with Moore’s law and the exponential improvements in the processing power of our devices over time. At this moment in video, I would argue that general-purpose computer processing power is reaching a point where it can adequately handle the tasks laid out above. It’s important to keep in mind that the human eye (or at least my eye) can’t tell the difference between anything above 4K and can scarcely tell the difference between 1080p and 4K. My point is that the amount of data required for live video processing won’t grow at an exponential rate anymore, because it’s already at the upper bound of what humans can differentiate, yet computer processing power will.
This means it will get easier and easier for common devices to handle HD video processing tasks. Another challenge with live video is the network’s ability to transfer the video data from its origin to your device: a connection that is too slow simply can’t deliver a 1080p livestream at a decent frame rate. This is sometimes more of a constraint than the processing power of a computer or your eye’s ability to process images. The last component of this convergence is the camera quality of new smart devices (iPhones, iPads, etc.). They can now capture 4K video at 60 frames per second, which certainly meets or exceeds the quality of almost any camcorder.
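A quick comparison shows why compression is what makes streaming over an ordinary connection possible at all. The bitrates below are rough ballpark figures I am assuming for illustration, not measurements.

```python
# Raw vs. compressed 1080p30 bitrates against a typical uplink.
# All figures are rough, assumed ballpark numbers for illustration.

RAW_MBPS = 1244        # uncompressed 10-bit 4:2:2 1080p30, in Mb/s
H264_MBPS = 8          # a typical H.264 livestream bitrate for 1080p30
UPLOAD_MBPS = 25       # an optimistic home or mobile upload link

compression_ratio = RAW_MBPS / H264_MBPS
print(f"Compression ratio: ~{compression_ratio:.0f}x")
print(f"Raw fits the uplink: {RAW_MBPS <= UPLOAD_MBPS}")
print(f"Compressed fits the uplink: {H264_MBPS <= UPLOAD_MBPS}")
```

The raw feed exceeds the uplink by a factor of about fifty, while the compressed stream fits comfortably — which is why the encoder, not the camera, has been the gatekeeper of live streaming.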
What does this all mean? In a sentence: the rapidly improving quality of mobile wireless networks (see 5G), the insane quality of the camera in your pocket and the processing power of your computers (phones, tablets, laptops) mean that very soon ANYONE will be able to produce a live video event at the same quality you see on television. The constraints will shift from technical limits to human desire. And if there is any doubt about the human desire to produce live video, just look at the enormous quantity pouring into Facebook Live, YouTube Live, Instagram, Twitch, Periscope, etc. Now it’s a question of the desire to make BETTER video.
The Innovator’s Dilemma
Another argument for the coming rapid and abrupt change in live video production can be made by considering the patterns described in The Innovator’s Dilemma. Simply put, most “video experts” and existing companies are focused on incremental gains in existing video technologies (and hardware systems). Historically, this is the moment when market leaders following that strategy tend to be disrupted. Combine this with the “convergence” taking place in consumer devices and wireless networks, and the conditions are ripe for a seismic shift in how “professional quality video” is produced and who can produce it. Instead of making custom video hardware smaller and slightly more powerful, how about simply doing it on everyday devices? Or instead of building an autonomous camera system that also requires custom hardware (i.e., the ability to have a camera follow the action of a sporting event), how about making that happen on iPhones?
Creating a New Market
So, who exactly can produce high-quality live video? This group has historically been small. Without expertise, expensive hardware and a lot of effort, a person was simply not in a position to produce a multi-camera livestream with any real production quality. For all of the reasons described above, this is no longer the case. Millions (or maybe even billions) of people now own the hardware required to create great live video. This represents an explosion in the number of people who can express themselves more effectively and share with the world in real time!
What Does This Mean?
Anyone can create amazing video. Whether it is a group of kids who want to stream the athletic events or concerts at their school, parents at their children’s events, a professional athlete or movie star creating live content for social media, a college or high school athletic department streaming its Olympic sports, a concert, a live speech…all of these groups will soon be sharing their stories with the world at quality levels not previously imagined, using only the devices in their pockets.
The Next Convergence
The video evolution I described is extremely exciting in its own right (to me at least). But there is a parallel revolution taking place in artificial intelligence. The same improvements in computer hardware have made massive neural networks and computer vision applications viable. Their constraint is often the quantity of data available to make them smarter. So what happens when all of this video data becomes accessible to these applications? How easy can it become for someone to produce a great video with minimal effort? Will people even want to watch full events, or will a computer decide which parts you want to see?