One of the things that attending Nvidia’s (NVDA) annual developer conference drives home is the extent to which Nvidia is now a software company as much as it is a graphics chip developer.
And from the perspective of the head of Nvidia’s server GPU business, this end-to-end approach to product development — one that spans designing GPUs and related hardware, creating software tools and libraries that work with those GPUs, and working with third-party developers on software that can effectively leverage those GPUs and tools/libraries — has become a vital competitive strength.
While at Nvidia’s GPU Technology Conference (GTC) this week, I had a chance to talk with Ian Buck, the company’s VP of Accelerated Computing. Buck is in charge of Nvidia’s Datacenter business (it covers server GPUs and related hardware and software), which produced $2.9 billion in revenue during the company’s most recent fiscal year. He’s also the creator of Nvidia’s widely adopted CUDA programming model.
Here’s what Buck said about a number of interesting subjects:
AI Adoption at Cloud Services Firms
Nvidia remains the dominant provider of accelerators used to handle the demanding task of training AI/deep learning models to do things such as understand voice commands, detect objects within photos and determine what content to show on a search results page or personalized news feed. And, although this field is more competitive, the company is also a major provider of accelerators used to perform inference — the running of trained AI models against new data and content, such as that produced by hundreds of millions of consumers and office workers.
As Buck pointed out, tech giants running massive cloud data centers (the proverbial hyperscalers) have been at the forefront of deploying AI at scale. “Hyperscalers all jumped on it very quickly,” he said. “Logically, because they had a lot of the talent…they also had a lot of the data, and of course massive amounts of compute [capacity].”
But thanks both to the deployment of AI training systems (such as Nvidia’s DGX line) within traditional data centers and to the launch of cloud services for training and deploying AI models, AI adoption has been spreading rapidly. Buck noted that Internet/cloud services providers such as Uber and Yelp (YELP) have become major adopters, using AI to do things such as figure out which image should go on a business’ web page or maintain the right mix of ride-sharing drivers and passengers in an area.
Enterprise AI Adoption
When asked which enterprise verticals are seeing strong AI adoption, Buck cited financial services as one area where Nvidia sees a lot of work happening. This is an industry where firms have long invested in computing to gain an edge, and “AI is another tool” for them to leverage, he said. Meanwhile, the oil and gas industry, which has long used high-performance computing (HPC) systems featuring GPUs to assist with oil discovery, is now turning to AI to help with the process.
Healthcare is a “particularly interesting” field, Buck added. Here, AI is being used to do things such as helping doctors who order MRIs and CAT scans prioritize the most important scans, and helping radiologists focus on “hotspots” within scans.
The Versatility of GPUs
As Nvidia starts facing more competition from firms developing custom chips (ASICs) meant for AI workloads — companies developing them include Intel (INTC), Alphabet/Google (GOOGL) and several startups — the company has been talking up the versatility of its GPUs. That is, their ability not only to handle AI workloads, but also workloads in areas such as graphics rendering, analytics and traditional HPC.
Buck suggested Nvidia’s recently launched Tesla T4 GPU, which is based on the company’s new Turing GPU architecture, strengthens the company’s sales pitch with regard to versatility. The Tesla T4, Buck said, can be used to handle everything from training to inference to video transcoding to providing GPU resources for virtual desktops. Google has begun providing access to the T4 to its cloud infrastructure clients, and Amazon.com (AMZN) announced this week that it plans to do the same.
Nvidia’s Inference Strengths
A lot of data center inference work is still performed using server CPUs (quite often Intel’s). And in the accelerator space, Nvidia faces competition from programmable chips (FPGAs) developed by Intel and Xilinx (XLNX), as well as ASICs from the likes of Intel, Google and Amazon.
Nvidia is banking on the performance gains delivered by the Tesla T4, which comes with specialized processing cores (known as Tensor Cores) to accelerate AI workloads, to help it gain inference share. And Buck indicated the fact that so much training work relies on Nvidia’s GPUs helps its cause.
“By training on Nvidia GPUs, you train to a certain level of accuracy,” he said. And preserving that accuracy once a trained AI model has been deployed for inference can be easier if a firm is relying on a common processor architecture and software stack. He added that when deployed AI models make mistakes (as they inevitably do), relying on a common platform makes it easier to send data back to a training system to help improve the model.
“What we often see is a constant re-training of the service,” Buck said. “The world-class AI deployed today is constantly re-training.”
When asked about the latency performance of Nvidia’s GPUs for inference — that is, how quickly the GPUs can act on incoming data — Buck mentioned that Nvidia’s GPUs can deliver latencies of less than 10 milliseconds (thousandths of a second). He added that while latency (often a selling point for FPGAs) matters, so does the ability to support a wide variety of inference workloads, as the number of AI models deployed by companies to power services keeps growing. That’s an area where Nvidia’s software support helps it out.
During his Monday keynote address at GTC, Nvidia CEO Jensen Huang pointed out how a simple voice query fielded by Microsoft’s (MSFT) Bing search engine requires engaging over a half-dozen deployed AI models — these cover, among other things, speech recognition, language modeling, text-to-speech and choosing which images to show on a results page. “All that [work] has to happen in 300 milliseconds or less,” Buck noted.
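To make that budget concrete: if the models in such a pipeline run one after another, their per-stage latencies have to sum to under the 300-millisecond target. A minimal sketch — the individual stage numbers below are hypothetical, since the article gives only the total:

```python
# Hypothetical per-stage latencies (in milliseconds) for a voice-query
# pipeline like the one Huang described. The individual numbers are
# illustrative; only the 300 ms total budget comes from the article.
stage_latency_ms = {
    "speech_recognition": 80,
    "language_model": 50,
    "search_ranking": 60,
    "image_selection": 40,
    "text_to_speech": 50,
}

total_ms = sum(stage_latency_ms.values())
budget_ms = 300

print(f"pipeline latency: {total_ms} ms (budget: {budget_ms} ms)")
assert total_ms <= budget_ms, "pipeline misses the latency budget"
```

This is also why sub-10-millisecond per-model latency matters: with a half-dozen models in the chain, per-stage delays compound quickly.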
Improving Performance via Software Advances
It’s now been about two years since Nvidia launched the Tesla V100, its current flagship GPU for AI training and HPC workloads. And though some were hoping that the company would announce a successor to the V100 at GTC, no such announcement arrived.
But while the V100 may be two years old, software advances — made possible in part by Nvidia’s work with third-party developers — have considerably improved its performance for many AI and HPC workloads since 2017, Buck pointed out. One example he gave: Training performance for ResNet-50, a well-known image-recognition neural network, has more than doubled.
At GTC, Nvidia took another step to improve the training performance of its existing GPUs by announcing support for a technology known as automatic mixed precision. In a nutshell, automatic mixed precision allows AI researchers to use a blend of computationally demanding, high-precision arithmetic and less demanding, low-precision arithmetic to effectively train certain models, and to do so with little programming work.
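The numerical problem mixed precision has to manage can be demonstrated without a GPU: half precision (FP16) has so few mantissa bits that small weight updates can round away entirely, which is why mixed-precision schemes keep a higher-precision "master" copy of the weights. A stdlib-only sketch of the numerics (this simulates FP16 rounding via `struct`; it illustrates the idea, not Nvidia's actual implementation):

```python
import struct

def round_to_fp16(x: float) -> float:
    """Round a Python float through IEEE half precision (10 mantissa bits)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

update = 1e-4     # a small gradient-descent step
w_fp16 = 1.0      # weight stored only in half precision
w_master = 1.0    # higher-precision master copy

for _ in range(1000):
    # In pure FP16 the update is smaller than half the gap between
    # representable values near 1.0 (~9.8e-4), so it rounds away.
    w_fp16 = round_to_fp16(w_fp16 + update)
    # Accumulating in higher precision preserves every update.
    w_master += update

print(f"pure FP16 weight:   {w_fp16}")        # stuck at 1.0
print(f"master-copy weight: {w_master:.4f}")  # ~1.1000
```

Automatic mixed precision automates this kind of bookkeeping (deciding which operations can safely run in low precision) so researchers get the speedup without hand-tuning it.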
The move comes as firms such as Intel and Google give their support to Bfloat16, a 16-bit number format that they argue can, for certain AI models, deliver the kind of accuracy that has traditionally required 32-bit (single-precision) arithmetic.
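Bfloat16 is essentially a float32 with the lower 16 mantissa bits dropped: it keeps float32's 8-bit exponent (and thus its range) while cutting the mantissa to 7 bits. A stdlib-only sketch of the truncation — an illustration of the format itself, not any vendor's hardware path:

```python
import struct

def to_bfloat16(x: float) -> float:
    """Truncate a float32 to bfloat16 by zeroing its low 16 mantissa bits."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

# Coarser precision: only about 2-3 decimal digits survive.
print(to_bfloat16(3.14159))   # -> 3.140625

# Same range as float32: 1e38 remains representable in bfloat16...
print(to_bfloat16(1e38))
# ...but it overflows IEEE half precision (FP16), whose max is 65504.
try:
    struct.pack('<e', 1e38)
except OverflowError:
    print("1e38 does not fit in FP16")
```

The trade-off is the reverse of FP16's: Bfloat16 gives up precision to keep range, which its backers argue matches what deep learning training actually needs.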
Accelerating Analytics and Machine Learning Workloads
Last fall, Nvidia unveiled RAPIDS, an open-source software platform for using GPUs to accelerate traditional enterprise analytics and machine learning workloads, many of which are still handled by server CPUs. At a GTC developer session, a slide outlining the company’s RAPIDS roadmap suggested it’s aiming for the platform to deliver massive performance improvements (in some cases, well over 10x) for data analytics, machine learning and graph analytics workloads relative to CPU-based approaches.
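Buck's "same math" point (quoted below) can be made concrete: a classic machine-learning task like fitting a linear model reduces to the dense linear algebra that HPC codes — and GPUs — are built around. A stdlib-only sketch of ordinary least squares via the closed-form normal equations (purely illustrative; RAPIDS itself exposes GPU-backed, pandas- and scikit-learn-style APIs rather than anything hand-rolled like this):

```python
# Fit y = slope*x + intercept by ordinary least squares.
# The closed-form solution is pure linear algebra -- the kind of
# arithmetic GPU-accelerated analytics libraries parallelize.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]   # noiseless data on the line y = 2x + 1

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xx = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

print(f"slope={slope}, intercept={intercept}")  # -> slope=2.0, intercept=1.0
```

Scale this from five points to billions of rows and the sums become large reductions and matrix products — exactly the operations GPUs accelerate.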
“Data analytics can benefit the same way that HPC and AI have benefited from accelerated computing,” Buck said when asked about the performance gains RAPIDS can deliver. “In large part, it’s the same math.”
Nvidia’s Software Ecosystem
When discussing Nvidia’s software stack for developers, the CUDA programming model is just “half the story” at this point, Buck observed. The company now offers developers nearly 40 domain-specific software libraries that run on top of CUDA, including 15 AI-related libraries. “This represents a huge body of work,” he said.
At GTC, Nvidia re-branded its various software libraries, declaring them to all be part of a common software platform known as CUDA-X. The company also revamped its AI and data science software libraries, and it placed these libraries within a common software developer kit (SDK) known as CUDA-X AI.