AMD @ SC15: Boltzmann Initiative Announced - C++ and CUDA Compilers for AMD GPUs
by Ryan Smith on November 16, 2015 8:00 AM EST
The second of our major SC15 announcements comes from AMD, who is taking to the show to focus on the HPC capabilities of their FirePro S line of server cards. Of all of the pre-briefings we’ve sat in on in the past two weeks, AMD’s announcement today is by far the most significant. And it’s only fitting, then, that this happens while SC is taking place in AMD’s backyard: Austin, Texas.
So what has AMD so excited for SC15? In short, the company is about to embark on a massive overhaul of their HPC software plans. Dubbed the Boltzmann Initiative – after the father of statistical mechanics, Ludwig Boltzmann – it will see AMD undertake a much-needed redevelopment of their HPC software ecosystem in order to close the gap with NVIDIA and offer an environment competitive (and compatible!) with CUDA. So with that in mind, let’s jump right in.
Headless Linux & HSA-based GPU Environment
Perhaps the cornerstone of the Boltzmann Initiative is AMD’s driver work, as the drivers are being improved and overhauled to support AMD’s other plans. The company will be building a dedicated 64-bit driver specifically for headless operation under Linux. It’s only been in the last year that AMD has really focused on headless Linux operation – prior to that, headless OpenCL execution was a bit of a hack – and with the new driver AMD completes what they’ve started.
But more important than that, the headless Linux driver will implement an HSA extended environment, which will bring many of the advantages of the Heterogeneous System Architecture to AMD’s FirePro discrete GPUs. This environment, which AMD is calling HSA+, builds on the formal HSA standard by adding extensions specifically to support HSA on discrete GPUs. The extensions themselves are going to be non-standard – the HSA Foundation has been focused on truly integrated devices such as APUs, and it doesn’t sound like these extensions will be accepted upstream into mainstream HSA any time soon – but AMD will be releasing the extensions as an open source project in the future.
The purpose of extending HSA to dGPUs, besides meeting earlier promises, is to bring as many of the benefits of the HSA execution model to dGPUs as is practical. For AMD this means being able to put HSA CPUs and dGPUs into a single unified address space – closing a gap that has existed since NVIDIA introduced Unified Memory with CUDA 6 – which can significantly simplify programming for applications that actively execute work on both the CPU and the GPU. Using the HSA model along with this driver also allows AMD to address other needs, such as bringing down dispatch latency and improving support and performance for large clusters where fabrics such as InfiniBand link together the nodes. Combined with the basic abilities of the new driver, AMD is in essence laying some much-needed groundwork to offer a cluster feature set more on par with the competition.
Heterogeneous Compute Compiler – Diverging From OpenCL, Going C++
The second part of the Boltzmann Initiative is AMD’s new compiler for HPC, the Heterogeneous Compute Compiler. Built on top of work the company has already done for their HSA compiler, HCC will be the first of AMD’s two efforts to address the programming needs of the HPC user base, which by and large has passed on AMD’s GPUs in part because of a lackluster HPC software environment.
As a bit of background before going any further, one of the earliest advantages for NVIDIA and CUDA was supporting C++ and other high-level programming languages at a time when OpenCL could only support a C-like syntax, and programming for OpenCL was decidedly lower level. AMD meanwhile continued to back OpenCL, in part in order to support an open ecosystem, and while OpenCL made great strides this year with the provisional release of OpenCL 2.1 and the OpenCL C++ kernel language, in a sense the damage has been done. OpenCL sees minimal use in the HPC space, and further complicating matters is the fact that not all of the major vendors support OpenCL 2.x. AMD for their part is polite enough not to name names, but at this point the laggard is well known to be NVIDIA, who only supports up to OpenCL 1.2 (and seems to be in no rush to support anything newer).
As a result of these developments AMD is altering their software strategy, as it’s clear that the company can no longer just bank on OpenCL for their HPC software API needs. I hesitate to say that AMD is backing away from OpenCL at all, as in our briefings AMD made it clear that they intend to continue to support OpenCL, and based on their attitude and presentation this doesn’t appear to be a hollow corporate boilerplate promise in order to avoid rocking the boat. But there’s a realization that even if OpenCL delivers everything AMD ever wanted, it’s hard to leverage OpenCL when support for the API is fragmented and when aspects of OpenCL C++ are still too low level, so AMD will simultaneously be working on their own API and environment.
This environment will be built around the Heterogeneous Compute Compiler. In some ways AMD’s answer to CUDA, HCC is a single C/C++/OpenMP compiler for both the CPU and the GPU. As with so many recent compiler projects, AMD will be leveraging parts of Clang and LLVM to handle the compilation, along with portions of HSA as previously described to serve as the runtime environment.
The purpose of HCC will be to allow developers to write CPU and/or GPU code using a single compiler, in a single language, inside a single source file. The end result is something that resembles Microsoft’s C++ AMP, with developers simply making parallel calls within a C++ program as they see fit. Perhaps most importantly for AMD and their prospective HPC audience, HCC means that a separate source file for GPU kernels is not needed, a limitation that persists right up to OpenCL C++.
An Example of HCC Code
Overall, HCC will expose parallelism in two ways. The first is through explicit syntax for parallel operations, à la C++ AMP, with developers calling parallel-capable functions such as parallel_for_each to explicitly set up segments of code that can be run in parallel and define how they interact with the rest of the program, with this functionality built around C++ lambdas. The second method, at an even higher level, will be to leverage the forthcoming Parallel STL (Standard Template Library), which is slated to come with C++17. The Parallel STL will contain a number of parallelized standard functions for GPU/accelerator execution, making things even simpler for developers, as they no longer need to explicitly account for and control certain aspects of parallel execution, and can use the STL functions as a base for modification/extension.
Ultimately HCC is intended to modernize GPU programming for AMD GPUs and to bring some much-desired features to the environment. Along with the immediate addition of basic parallelism and standard parallel functions, HCC will also include some other features specifically for improving performance on GPUs and other accelerators. This includes support for pre-fetching data, asynchronous compute kernels, and even scratchpad memories (i.e. AMD’s LDS, the Local Data Share). Between these features, AMD is hopeful that they can offer the kind of programming environment that HPC users have wanted, one that is more welcoming to new HPC programmers and to seasoned CUDA programmers alike.
Heterogeneous-compute Interface for Portability (HIP) – CUDA Compilation For AMD GPUs
Last but certainly not least in the Boltzmann Initiative is AMD’s effort to fully extend a bridge into the world of CUDA developers. Even with HCC bringing AMD’s programming environment more on par with what CUDA developers expect, AMD realizes that just being as good as NVIDIA won’t always be good enough, that developers accustomed to the syntax of CUDA won’t want to change, and that CUDA won’t be going anywhere anytime soon. The solution to that problem is the Heterogeneous-compute Interface for Portability, otherwise known as HIP, which gives CUDA developers the tools they need to easily move over to AMD GPUs.
Through HIP, AMD will bridge the gap between HCC and CUDA by giving developers a CUDA-like syntax – the various HIP API commands – allowing them to program for AMD GPUs in a CUDA-like fashion. Meanwhile HIP will also include a toolset (the HIPify Tools) that further simplifies porting by automatically converting CUDA code to HIP code. And finally, once code is in HIP – be it natively written that way or converted – it can then be compiled for either NVIDIA or AMD GPUs, through NVCC (using a HIP header file to add HIP support) or HCC respectively.
To be clear, HIP is not a means for AMD GPUs to run compiled CUDA programs. CUDA is and remains an NVIDIA technology. But HIP is a means for source-to-source translation, so that developers will have a far easier time targeting AMD GPUs. Given that the HPC market is one where developers are typically writing all of their own code anyhow and tweaking it for the specific architecture it’s meant to run on, source-to-source translation covers most of AMD’s needs right there, and it retains AMD’s ability to compile CUDA code from a high level, where they can better optimize that code for their GPUs.
Now there are some unknowns here, including whether AMD can keep HIP up to date with CUDA feature additions, but more importantly there’s the question of just what NVIDIA’s reaction will be. CUDA is NVIDIA’s through and through, and it does make one wonder whether NVIDIA would try to sue AMD for implementing the CUDA API without NVIDIA’s permission, particularly in light of the latest developments in the Oracle vs. Google case on the Java API. AMD for their part has had their legal team look at the issue extensively and doesn’t believe they’re at risk – pointing in part to Google’s own efforts to bring CUDA support to LLVM with GPUCC – though I suspect AMD’s efforts are a bit more inflammatory given the direct competition. Ultimately it’s a matter that AMD and NVIDIA will only have to handle if it comes to that, but it’s something that does need to be pointed out.
Otherwise by creating HIP AMD is solving one of the biggest issues that has hindered the company’s HPC efforts since CUDA gained traction, which is the fact that they can’t run CUDA. A compatibility layer against a proprietary API is never the perfect solution – AMD would certainly be happier if everyone could and did program in standard C++ – but there is a sizable user base that has grown up on CUDA and is at this point entrenched with it. And simply put AMD needs to have CUDA compatibility if they wish to wrest HPC GPU market share away from NVIDIA.
Wrapping things up then, with the Boltzmann Initiative AMD is taking an important and very much necessary step to redefine themselves in the HPC space. By providing an improved driver layer for Linux supporting headless operation and a unified memory space, a compiler for direct, single-source C++ compilation on top of that, and a CUDA compatibility layer to reach the established CUDA user base, AMD is finally getting far more aggressive on the HPC side of matters, and making the moves that many have argued they have needed to make for quite some time. At this point AMD needs to deliver on their roadmap and ensure they deliver quality tools in the process. Even then, NVIDIA earned their place in the HPC space through good products and will not be easily dislodged – CUDA came at exactly the time when developers needed it – but if AMD can execute on Boltzmann, it will be the first time in half a decade that they have a fighting chance at tapping into the lucrative HPC market.
MikhailT - Monday, November 16, 2015
Why would anyone do this, given the legal issue around HIP?
I fail to see why people should be excited about this. AMD should've presented a better solution for OpenCL instead of investing in a solution built around a competitor's platform, one where they have to play catch-up and could even face legal issues.
Yojimbo - Monday, November 16, 2015
They have to play catch-up either way. Are there really legal issues for writing your own compiler for code written for someone else's compiler?
Alexvrb - Monday, November 16, 2015
OpenCL is an open standard with some serious shortcomings, and poor support on popular Nvidia hardware. Extending this would not solve all their problems in the short term, and anything they add that isn't standard simply would not see much use. What they did was literally the smartest thing they could have done and it involves a three-pronged approach. Continue to support existing and future OpenCL solutions, add extensive support for C++/C to produce CPU and GPU code using a single compiler (HCC), and add support for converting existing CUDA code to run through said HCC *or* NVCC to produce binaries for both major vendors using existing CUDA code.
If they only did the first or even the first two they'd still be "playing catch up" in your eyes, and they wouldn't be interesting to those currently standardizing on CUDA and Nvidia. Now they're going to get a lot more interest from formerly Nvidia-only outfits.
johnpombrio - Monday, November 16, 2015
Once again, AMD comes out with a massive paper launch and media conference for a non-existent software and programming solution, with no dates on when it will be available or how AMD will be able to pay for creating it. PowerPoint slides, slick presenters, and nice graphics are a lot cheaper than actually creating the software that they represent. As with the Fury X launch, with its massive AMD hype before launch and its meh performance when it was released, AMD found out how hard it is to actually produce what it said it would in the AMD paper launch of the product years before. Meantime, NVidia and Intel continue to steal away market share without all the press releases, PowerPoint slides, and media talks. AMD, continue to generate more FUD and smoke and mirrors please. It gives the press something to write about.
MrSpadge - Tuesday, November 17, 2015
Yeah, Intel & nVidia never present their new products and ideas. And never fail to deliver once they show something... like Tegra, Itanium, etc.
anubis44 - Tuesday, November 17, 2015
The Fury X is impressive. The card is designed for DX12, not decrepit DX11, which is all the reviewers used to test the card. In addition, the latest Windows 10 drivers have produced significant performance increases for all AMD cards:
The tables are turning, and AMD is getting stronger, not weaker.
D. Lister - Thursday, November 19, 2015
Fury is designed for an API that will only come into full swing maybe in the next couple of years, when the Fury would already be a generation or two old? Secondly, DX11 has been around for quite a few years, yet DX9 still prevails. Both of these APIs shall continue to exist for a long time right beside DX12.
Also, wccftech is not really a very good choice as a reference.
hammer256 - Monday, November 16, 2015
Back when Ryan first wrote about GCN I remember thinking that hopefully AMD's software support for programming the GPUs will get better, and maybe get to the point of being as well documented and convenient to use as CUDA. I guess it took them a while...
In terms of HPC, I think that their effort in this is a lot bigger deal than their next GPU. Hardware is nice and pretty, but it's the software support that allows the hardware to be practical for us ordinary folks to program.
Marcelo Viana - Saturday, November 21, 2015
Well, I don't see any grounds for NVidia to sue AMD.
The AMD tools take uncompiled source code that the programmer wrote. It's code I could write in Notepad if I wished, like any small C++ program.
If AMD ran CUDA code on their GPUs it would be a different story, but AMD only uses code written by the programmer, which AMD's tools translate to C++ and then compile with their own compiler to run on their FirePro cards.
Very smart move from AMD. Lots of resources: imagine a GPU on node 15 sharing the memory of a GPU on node 512 through InfiniBand, or all the RAM and VRAM of the entire system as one unified memory. So easy to program now. Finally, AMD, finally. The only thing I hope is that their compiler delivers performance and not just flexibility and resources.
Marcelo Viana - Saturday, November 21, 2015
Another thought that I forgot.
So now it will be better to program in HIP, since the language is almost the same as CUDA. Somebody posted this on a site (I can't remember which now): for all the commands you change "cuda" to "hip", like:
cudaMalloc() becomes hipMalloc()
cudaMemcpy() becomes hipMemcpy()
cudaDeviceSynchronize() becomes hipDeviceSynchronize()
and so on.
The difference is that with HIP I can choose which GPU I want my code to compile for, whereas with CUDA I have only one.
One developer makes GPU render software in CUDA, another makes GPU render software in HIP, but only the second one can compile his code for both vendors, AMD and NVidia.
Very smart move AMD, very smart.