GPU Computing – The Next BIG Thing?
It was the mass market that made the powerful multi-core CPUs in today’s PCs possible. Without huge sales volumes, Intel and AMD could never justify the multi-billion-dollar cost of a chip fabrication plant. Now the mass market has spawned another technology – the graphics chips (GPUs) that gamers use to render the almost photo-realistic scenes in modern games – and this time the spin-off is being felt in the world of High Performance Computing (HPC). In fact, the fastest supercomputer in the Top500 chart is already the Chinese Tianhe-1A system, which relies on GPUs to deliver a peak performance of 4.7 petaflops.
The most powerful processor in a modern gaming computer is not the multi-core CPU – it’s the many-core GPU on the graphics card. The GPU chip may have an order of magnitude more memory bandwidth and peak processing power than a modern CPU. Instead of 4 or 8 cores, there may be several hundred. A modern GPU can handle all the standard data types, including double precision, and NVIDIA’s Fermi architecture even offers 64-bit addressing and ECC memory.
Of course, the trick is to find a way to use that power. In part that is down to development tools, and here there are several viable options. The most widely used is probably CUDA C (CUDA stands for Compute Unified Device Architecture), a parallel processing language based on C.
CUDA C is a proprietary language designed for NVIDIA GPUs. AMD offer the Stream SDK on their GPUs, but it would be fair to say that this is less widely used. Both NVIDIA and AMD have versions of OpenCL, a standardised language that may end up as the preferred option. Microsoft also offer an API called DirectCompute, available for Vista and Windows 7.
For Fortran users, PGI have a Fortran compiler that offers both CUDA-like extensions and a directive-based accelerator that aims to take the complexity out of GPU programming. Other Fortran vendors have GPU interfaces in development, but at the time of writing none is yet as mature as PGI’s. Of course, Fortran users also have the option to link to code written in CUDA C.
The other part of the equation is the selection of algorithms. To work efficiently, GPUs need to be treated as massively parallel computers. For peak efficiency, a GPU with a few hundred cores needs to be fed with several thousand threads working in parallel. This is because GPUs use a fast thread-switching mechanism to hide memory latency. For some applications, this is easily done; if the computational core of the program can be viewed as one or more loops over a large number of independent items, each of which is processed by the same logic, then the program may be quite easy to convert. In other cases, a more thoughtful approach is required.

Polyhedron has been working with GPUs since the first double-precision-capable chips became available in 2008. We have developed GPU algorithms for various computational tasks, including a sparse linear solver for use in a commercial oil reservoir simulator. This is not a natural candidate for a massively parallel computer, because the solver needs to reconcile strongly interacting driving forces from different parts of the problem domain. Nevertheless, for large problems, this solver is an order of magnitude faster than a state-of-the-art serial algorithm running on a modern CPU.
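To make the easy case concrete, here is a minimal CUDA C sketch of a loop over independent items and its GPU equivalent. It is an illustration only, not code from Polyhedron's solver. The operation (y = a*x + y) and the names saxpy_cpu, saxpy_kernel and saxpy_gpu are chosen for this example. The block size of 256 threads is an assumption, not a tuned value.

```cuda
#include <stddef.h>
#include <cuda_runtime.h>

/* Serial CPU version: one loop over n independent items. */
void saxpy_cpu(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* GPU version: the loop body becomes a kernel, and each thread
   handles exactly one value of i. */
__global__ void saxpy_kernel(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 /* threads past the end of the data do nothing */
        y[i] = a * x[i] + y[i];
}

/* Hypothetical host wrapper (illustrative name): copies the data to the
   GPU, launches one thread per element, and copies the result back
   into y. Returns 0 on success and -1 on any CUDA error. */
int saxpy_gpu(int n, float a, const float *x, float *y)
{
    if (n <= 0)
        return 0;

    float *d_x = NULL;
    float *d_y = NULL;
    size_t bytes = (size_t)n * sizeof(float);

    cudaError_t err = cudaMalloc((void **)&d_x, bytes);
    if (err == cudaSuccess)
        err = cudaMalloc((void **)&d_y, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 256;                         /* assumed block size */
        int blocks = (n + threads - 1) / threads;  /* round up to cover all n */
        saxpy_kernel<<<blocks, threads>>>(n, a, d_x, d_y);
        err = cudaGetLastError();                  /* catches launch failures */
    }

    /* This blocking copy also waits for the kernel to finish. */
    if (err == cudaSuccess)
        err = cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_x);   /* cudaFree on a null pointer is a no-op */
    cudaFree(d_y);
    return err == cudaSuccess ? 0 : -1;
}
```

Calling saxpy_gpu(n, a, x, y) from host code compiled with nvcc replaces a call to saxpy_cpu. With n = 1,000,000 the launch creates 3,907 blocks of 256 threads, which gives the hardware far more threads than cores to switch between while memory requests are outstanding. For a calculation this small, the time spent copying data to and from the GPU is likely to outweigh the arithmetic, so in practice the data would normally stay on the GPU across many kernel launches.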
If you are interested in applying GPU technology to your problem, why not call on Polyhedron? Our consultants can help you to assess the feasibility of a GPU-based solution, and can even code it for you. Email john.gpu@polyhedron.com to discuss this further.