Friday, May 5, 2017

Facebook's march to augmented reality

The big theme coming out of Facebook's recent F8 Developer Conference in San Jose, California was augmented reality (AR). Mark Zuckerberg told the audience that people's sense of community has weakened over time and that he believes social media could play a role in strengthening these ties.

Augmented reality begins as an add-on to Facebook Stories, its answer to Snapchat. Users simply take a photo and then use the app to place an overlay on top of the image, such as a silly hat or a fake moustache, while funky filters keep users engaged and help them create a unique image. Over time, the filter suggestions become increasingly smart, adapting to the content of the photo - think of a perfectly matched frame if the photo is of the Eiffel Tower. The idea is to make messaging more fun. In addition, geo-location data might be carried back to the FB data centre to enhance the intelligence of the application, but most of the processing can happen on the device.

Many observers saw Facebook's demos as simply a needed response to Snapchat. However, Facebook is serious about pushing this concept far beyond cute visual effects for photos and video. AR and VR are key principles for what Facebook believes is the future of communications and community building.

As a thought experiment, one can consider some of the networking implications of real-time AR. In the Facebook demonstration, a user turns on the video chat application on their smartphone. While the application parameters of the demonstration are not known, the latest smartphones can record in 4K at 30 frames per second, and will soon be even sharper and faster. Apple's FaceTime has required about 1 Mbit/s for HD resolution (720p at 30 fps) for several years. AR will certainly benefit from high resolution, so one can estimate that the video stream leaves the smartphone on a 4 Mbit/s link (a guesstimate at the low end). The website www.livestream.com recommends a minimum of 5 Mbit/s of upstream bandwidth for launching a video stream at high to medium resolution. LTE-Advanced networks are capable of delivering 4 Mbit/s upstream with plenty of headroom, and WiFi networks are even better.
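As a back-of-the-envelope check on that 4 Mbit/s figure, the short sketch below compares the raw bitrate of an uncompressed 4K stream with the assumed uplink rate. The resolution, frame rate and 12 bits-per-pixel (8-bit 4:2:0) figures are illustrative assumptions, not Facebook's actual encoding parameters.

```python
# Back-of-the-envelope estimate: how much compression is needed to squeeze
# a 4K video stream into an assumed 4 Mbit/s uplink.
WIDTH, HEIGHT = 3840, 2160   # 4K UHD resolution (assumption)
FPS = 30                     # frames per second (assumption)
BITS_PER_PIXEL = 12          # 8-bit 4:2:0 chroma subsampling (assumption)
UPLINK_BPS = 4e6             # assumed 4 Mbit/s upstream link

raw_bps = WIDTH * HEIGHT * FPS * BITS_PER_PIXEL
print(f"Raw bitrate:        {raw_bps / 1e9:.1f} Gbit/s")      # ~3.0 Gbit/s
print(f"Compression needed: {raw_bps / UPLINK_BPS:.0f}:1")    # ~750:1
# A roughly 750:1 compression ratio is aggressive for 4K, which is why
# the 4 Mbit/s figure above is flagged as a low-end guesstimate.
```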

To identify people, places and things in the video, Facebook will have to perform sophisticated graphics processing with machine learning. Currently this cannot be done locally by the app on the smartphone, so it will need to be done at a Facebook data centre. The 4 Mbit/s stream will therefore have to leave the carrier network and be routed to the nearest Facebook data centre.

It is known from previous Open Compute Project (OCP) announcements that Facebook is building its own AI-ready compute clusters. The first design, called Big Sur, is an Open Rack-compatible chassis that incorporates eight high-performance GPUs of up to 300 watts each, with the flexibility to configure multiple PCIe topologies. It uses NVIDIA's Tesla accelerated computing platform. This design was announced in late 2015 and subsequently deployed in Facebook data centres to support its early work in AI. In March, Facebook unveiled Big Basin, its next-generation GPU server, capable of training machine learning models that are 30% bigger than those handled on Big Sur, thanks to greater arithmetic throughput and an increase in per-GPU memory from 12 to 16 Gbytes. The new chassis also allows CPU compute to be disaggregated from the GPUs, something Facebook calls JBOG (just a bunch of GPUs), which should bring the benefits of virtualisation when many streams need to be processed simultaneously. The engineers also anticipated that more PCIe bandwidth would be needed between the GPUs and the CPU head nodes, which is why a new server platform, Tioga Pass, was needed as well.
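As a quick sanity check, the quoted 30% jump in model size lines up roughly with the quoted increase in per-GPU memory; the figures below are only the ones cited above, and the calculation is purely illustrative.

```python
# Does the 12 -> 16 GB per-GPU memory increase explain ~30% bigger models?
BIG_SUR_GPU_MEM_GB = 12    # per-GPU memory quoted for Big Sur
BIG_BASIN_GPU_MEM_GB = 16  # per-GPU memory quoted for Big Basin

growth_pct = (BIG_BASIN_GPU_MEM_GB / BIG_SUR_GPU_MEM_GB - 1) * 100
print(f"Per-GPU memory growth: {growth_pct:.0f}%")  # ~33%, consistent with
                                                    # the ~30% larger models cited
```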

The Tioga Pass server features a dual-socket motherboard, with DIMMs on both PCB sides for maximum memory configuration. The PCIe slot has been upgraded from x24 to x32, which allows for two x16 slots, or one x16 slot and two x8 slots, to make the server more flexible as the head node for the Big Basin JBOG. This new hardware will need to be deployed at scale in Facebook data centres. Therefore, one can envision that the video stream originates at 4 Mbit/s and travels from the user's smartphone and is routed via the mobile operator to the nearest Facebook data centre.

Machine learning processes running on the GPU servers perform what Facebook terms Simultaneous Localisation and Mapping (SLAM). The AI essentially identifies the three-dimensional space of the video and the objects or people within it. The demo showed a number of 3D effects being applied to the video stream, such as lighting and shading, or the placement of additional objects or text. Once this processing has been completed, the output stream must continue to its destination: the other participants on the video call. Further encoding may compress the stream, but Facebook will still be burning some amount of outbound bandwidth to hand the video stream over to another mobile operator for delivery via IP to the app on the recipient's smartphone. Most likely, the recipients of the call will have their video cameras turned on, and those streams will need the same AR processing in the reverse direction. Therefore, one can foresee a two-way AR video call burning tens of megabits of WAN capacity to and from the Facebook data centre.
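To put a rough number on "tens of megabits", here is a hedged sketch of the WAN load a single AR call might place on the data centre, assuming the same illustrative 4 Mbit/s per video stream in each direction and that every participant receives the processed streams of every other participant.

```python
def call_wan_load_mbps(participants: int, stream_mbps: float = 4.0) -> float:
    """Rough WAN load (Mbit/s) at the data centre for one AR call.

    Each participant uploads one stream for processing (inbound) and
    receives the processed streams of every other participant (outbound).
    The symmetric 4 Mbit/s per stream is an illustrative assumption.
    """
    inbound = participants * stream_mbps
    outbound = participants * (participants - 1) * stream_mbps
    return inbound + outbound

print(call_wan_load_mbps(2))  # two-way call: 16 Mbit/s
print(call_wan_load_mbps(4))  # four-way call: 64 Mbit/s
```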

The question of scalability

Facebook does not charge users for accessing any of its services, which generally roll out across the entire platform in one go or in a rapid series of upgrade steps. Furthermore, Facebook often reminds us that it now serves over a billion users worldwide. So clearly, it must be thinking about AR on a massive scale. When Facebook first began serving videos from its own servers, the scalability question was also raised, but that test was passed successfully thanks to the power of caching and CDNs. When Facebook Live began rolling out, it likewise seemed a stretch that it could work at global scale. Yet both are now very successful Facebook video services.

Mobile operators should be able to handle large numbers of Facebook users opening 4 Mbit/s upstream connections, but each of those 4 Mbit/s streams will have to visit the FB data centre for processing. Fifty users will burn 200 Mbit/s of inbound capacity to the data centre, 500 users will eat up 2 Gbit/s, 5,000 users 20 Gbit/s and 50,000 users 200 Gbit/s. If AR chats prove popular, a lot of traffic will be moving in and out of Facebook data centres: one could easily envision a big carrier like Verizon or Sprint having more than 500,000 simultaneous users on Facebook AR, and it would present a real challenge if 10 million users worldwide decided to try this out on a Sunday evening. That would demand a lot of bandwidth that network engineers would have to find a way to support. Another point is that, from experience with other chat applications, people are no longer accustomed to economising on the length of a call or the number of participants. One can expect many users to kick off a Facebook AR call with friends on another continent and keep the stream open for hours.
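Those aggregate figures follow directly from multiplying the per-user uplink by the number of simultaneous users; the sketch below reproduces them, again assuming 4 Mbit/s of inbound traffic per user and ignoring the outbound leg.

```python
STREAM_MBPS = 4  # assumed inbound rate per user, as estimated above

for users in (50, 500, 5_000, 50_000, 500_000, 10_000_000):
    gbps = users * STREAM_MBPS / 1_000
    print(f"{users:>10,} users -> {gbps:,.1f} Gbit/s inbound to the data centre")
# 50 users -> 0.2 Gbit/s, 500 -> 2, 5,000 -> 20, 50,000 -> 200,
# 500,000 -> 2,000 and 10 million -> 40,000 Gbit/s (40 Tbit/s).
```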

Of course, clever compression algorithms could be in play so that the 4 Mbit/s at each end of the connection is reduced, and if the participants do not move and nothing changes in the background, perhaps the AR can snooze, reducing both the processing needed and the bandwidth load. In addition, some of the AR processing may be done on next-generation smartphones. However, the opposite could also be true: AR performance could be enhanced by using 4K, by adding multiple cameras to the handset for better depth perception, and by running the video at 60 fps or faster.

Augmented reality is so new that it is not yet known whether it will take off quickly or be dismissed as a fad. Maybe it will only make sense in narrow applications. In addition, by the time AR calling is ready for mass deployment, Facebook will have more data centres in operation with a lot more DWDM to provide its massive optical transport – for example the MAREA submarine cable across the Atlantic Ocean between Virginia and Spain, which Facebook announced last year in partnership with Microsoft. The MAREA cable, which will be managed by Telxius, Telefónica’s new infrastructure company, will feature eight fibre pairs and an initial estimated design capacity of 160 Tbit/s. So what will fill all that bandwidth? Perhaps AR video calls, but the question then is, will metro and regional networks be ready?
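As a closing thought experiment, one can work out how many of the assumed 4 Mbit/s AR streams MAREA's quoted 160 Tbit/s design capacity could theoretically carry, ignoring protocol overhead and the fact that the cable will obviously carry far more than AR traffic.

```python
MAREA_CAPACITY_BPS = 160e12  # 160 Tbit/s initial design capacity (as quoted)
STREAM_BPS = 4e6             # assumed 4 Mbit/s per AR video stream

streams = MAREA_CAPACITY_BPS / STREAM_BPS
print(f"{streams:,.0f} simultaneous 4 Mbit/s streams")  # ~40,000,000
```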