Compilers
» Calseum
Calseum is a compiler library that compiles a C-like language for AMD GPUs. It is designed as a replacement for CAL IL. It is a small library that takes a string of C-like code as input and returns CAL IL as output. It is written in C++ and currently runs on 64-bit Linux®.
|
Additional Details |
It is being developed at the Dept of Computing Science, University of Alberta as one part of a thesis project under the supervision of Prof. Nelson Amaral. |
|
Author(s) |
Rahul Garg |
|
Organization Type
|
Research |
|
Organization |
University of Alberta |
|
API Used |
CAL/IL |
|
Software License |
Apache 2.0 |
|
Available Content |
Website |
» HMPP™ Workbench
Based on a set of compiler directives, the HMPP Workbench contains a pre-processor, target generators and a runtime library to develop and execute GPU-accelerated applications. By driving the entire compilation workflow using standard C and Fortran compilers as well as the Brook+ and CAL/IL compilers, HMPP integrates seamlessly into the user environment as a hybrid compiler. As well as helping to lower development time and effort, HMPP delivers code portability, interoperability and easy application scaling.
|
Additional Details |
HMPP programming directives allow you to declare and execute hardware-accelerated versions of functions. Communication and device allocation can be further optimized using the data transfer and synchronization directives.
HMPP helps ease the scaling of applications in multi-GPU systems by detecting at runtime the presence and availability of GPUs before offloading computations into them. |
|
Author(s) |
CAPS entreprise |
|
Organization Type
|
Commercial |
|
Organization |
CAPS entreprise |
|
API Used |
Integration of Brook+ and CAL/IL in C and Fortran |
|
Software License |
Proprietary |
|
Available Content |
Website |
» RapidMind Multi-core Development Platform
RapidMind Multi-core Development Platform allows software organizations to quickly build applications that can harness the full potential of the latest multi-core processors as well as seamlessly take advantage of the application acceleration available in today’s stream processors such as the GPU or the Cell Broadband Engine.
|
Author(s) |
RapidMind |
|
Organization Type
|
Commercial |
|
Organization |
RapidMind |
|
API Used |
OpenGL |
|
Software License |
Proprietary |
|
Available Content |
Website |
Libraries
» PyGWA – GPGPU Library for Python
PyGWA is a GPGPU library for Python. It contains Python bindings for ATI CAL and PyGWA.DP – a toy data-parallel programming API.
|
Author(s) |
Rafal Lewczuk |
|
Organization Type
|
Research |
|
API Used |
CAL/IL |
|
Software License |
GPL |
|
Available Content |
Website |
Physics
» “PowderToy” Particle & Fluid Simulation Application
The “PowderToy” particle & fluid simulation application was the first public demonstration of OpenCL functionality. It was presented by AMD at Siggraph Asia 2008.
|
Additional Details |
This demonstration shows how OpenCL can be used to implement high-performance parallel computing on multi-core CPUs. |
|
Author(s) |
Advanced Micro Devices, Inc. |
|
Organization Type
|
Commercial |
|
Organization |
Advanced Micro Devices, Inc. |
|
API Used |
OpenCL |
|
Software License |
Proprietary |
|
Available Content |
Video |
» Optimized Executions of Havok Middleware on AMD Platforms
AMD and Havok demonstrated new, optimized executions of Havok's physics middleware on AMD platforms at the 2009 Game Developers Conference. The demonstration included the first OpenCL supported execution of Havok Cloth™.
» Bullet Physics SDK 3D Rigid Body Constraints Solver Demo
A port of the Bullet Physics SDK 3D rigid body constraint solver was demonstrated at the 2009 Game Developers Conference.
Computational Fluid Dynamics
» Smoothed Particle Hydrodynamics Fluid Simulation on ATI GPU
A fluid simulator running on the GPU using smoothed particle hydrodynamics to simulate a particle-based fluid.
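As a rough illustration of what an SPH step computes, the sketch below estimates particle densities by summing a smoothing kernel over all particles. It is not the simulator's code: the function names, the brute-force neighbor loop, and the choice of the commonly used poly6 kernel are assumptions made for illustration.

```python
import math

def poly6(r2, h):
    """Poly6 smoothing kernel evaluated at squared distance r2 (assumed kernel choice)."""
    if r2 > h * h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r2) ** 3

def densities(positions, mass, h):
    """Density at each particle as a kernel-weighted sum over all particles."""
    rho = []
    for pi in positions:
        total = 0.0
        for pj in positions:
            r2 = sum((a - b) ** 2 for a, b in zip(pi, pj))
            total += mass * poly6(r2, h)
        rho.append(total)
    return rho
```

Each particle's sum is independent of the others, which is why this kind of computation maps naturally onto a GPU.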
|
Additional Details |

The image shown is a graphical representation of a Smoothed Particle Hydrodynamics (SPH) fluid simulation on an ATI GPU. The simulation was performed on an ATI Radeon HD 4870 GPU co-processor. More than 10K fluid particles are simulated with physical boundary interactions at an interactive frame rate. The simulator is written using C/C++ and ATI Brook+. The fluid surface is rendered using Direct3D. |
|
Author(s) |
Jiawei Ou Chanjuan Wen |
|
Organization Type
|
Academia |
|
Organization |
Tongji University |
|
API Used |
Brook+ |
|
Software License |
GPLv3 |
|
Available Content |
Website |
» PhyFluids3D
Fluids simulation plug-in for Maxon Cinema 4D
|
Additional Details |
 |
|
Author(s) |
Remotion |
|
Organization Type
|
Freelance |
|
API Used |
Brook+ |
|
Software License |
Proprietary |
|
Available Content |
Demo Video Website |
Numerical Simulator
» Shallow Water Systems Simulation
Shallow water systems permit the simulation of rivers, channels and dam-break problems. Highly efficient solvers are required to solve and analyze these problems in a reasonable amount of execution time. Because the spatial domain and/or the number of time steps required is frequently very large, the computational requirements of the algorithms are high and execution time optimizations are desirable.
|
Additional Details |

We proposed a strategy to design an efficient implementation on AMD GPUs using Brook+ based on computational kernel analysis at the domain independent concept level (e.g. inductions, irregular reductions). |
|
Speedup |
140x* |
|
Author(s) |
J. Lobeiras M. Amor M. Arenaz B. B. Fraguela J. A. García M. J. Castro |
|
Organization Type
|
Academia |
|
Organization |
The project is a collaboration between several groups:
|
|
API Used |
Brook+ |
|
Software License |
Proprietary |
|
Available Content |
Demo Videos
M. Arenaz, R. Doallo and J. Touriño, XARK: An Extensible Framework for Automatic Recognition of Computational Kernels, ACM Transactions on Programming Languages and Systems, 30(6), 2008. |
| |
* Based on difference in elapsed time required to perform shallow water simulation of 25,693 sec. for CPU-based system vs. 183 sec. for CPU-based system with numerical simulator software and ATI Radeon™ GPU. Configuration: AMD Athlon™ 4850e processor 2.5 GHz with ATI Radeon™ HD 4850 GPU, 4GB DDR2 359 MHz, AMD 790X Chipset-based motherboard, Microsoft® Windows® XP 64-bit, Microsoft® Visual C++® 2005 x64, ATI Stream SDK v1.4-beta, ATI Catalyst™ 9.2 software. |
Molecular Dynamics
» LAMMPS Molecular Dynamics Simulations
Computational chemistry employs molecular dynamics simulations to simulate the motions of interacting atoms and molecules. An existing production code used for molecular dynamics simulations (LAMMPS) was modified to take advantage of ATI Stream processors for calculating the particle-particle interactions, which accounted for the largest part of the overall simulation time on a standard processor.
|
Additional Details |

The image shown represents a 32,000-atom molecular dynamics simulation of a rhodopsin protein in a solvated lipid bilayer with the particle-particle interaction calculated on the GPU. The simulation represents the most challenging benchmark system distributed with the LAMMPS code. (More information on the LAMMPS molecular dynamics code can be found at http://lammps.sandia.gov). |
|
Author(s) |
David Richie |
|
Organization Type
|
Commercial |
|
Organization |
Brown Deer Technology |
|
API Used |
Brook+ |
|
Software License |
GPLv2 |
|
Available Content |
Website |
Science
» Astronomical Many-body Simulations
The gravitational many-body problem is concerned with the movement of bodies interacting through gravity. Solving the gravitational many-body problem with a CPU takes significant time due to its O(N²) computational complexity. The demonstrated technique uses an ATI Radeon™ HD 4850 GPU from AMD to accelerate the exact force calculation. The speedup is achieved by a loop-unrolling technique that is highly effective on the ATI Radeon™ HD 4850 GPU.
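To make the quadratic cost concrete, here is a minimal direct-summation sketch in plain Python. It is not the authors' GPU kernel and does not reproduce their loop unrolling. The function name, the gravitational constant `G=1.0`, and the softening length `eps` are assumptions chosen for illustration.

```python
import math

def accelerations(pos, mass, G=1.0, eps=0.0):
    """Exact pairwise gravitational accelerations: N * (N - 1) interactions."""
    n = len(pos)
    acc = []
    for i in range(n):
        ax = ay = az = 0.0
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dz = pos[j][2] - pos[i][2]
            # eps is a hypothetical softening term that avoids division by zero
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            scale = G * mass[j] / (r2 * math.sqrt(r2))
            ax += scale * dx
            ay += scale * dy
            az += scale * dz
        acc.append((ax, ay, az))
    return acc
```

The inner loop over `j` is the part a GPU implementation would typically unroll, since every iteration performs the same arithmetic on independent data.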
|
Additional Details |

The image shows a snapshot of the demo program simulating the interaction between galaxies. The simulation starts out as four galaxies in a figure-eight orbit. As the simulation proceeds, the four galaxies eventually merge into a single spherical galaxy. |
|
Author(s) |
N. Nakasato K. Fujiwara M. Sato |
|
Organization Type |
Academia |
|
Organization |
University of Aizu, Japan |
|
API Used |
CAL/IL |
|
Available Content |
Website |
Video Processing
» ArcSoft SimHD™ Upscaling Technology Demo
By utilizing the ATI Stream™ SDK, ArcSoft managed to port the intensive upscaling computation from the CPU to ATI GPUs. The ATI Stream version of SimHD included in TotalMedia Theatre™ was demonstrated at CEATEC Japan in 2008 and is planned to launch at Computex (June 2009).
|
Additional Details |
To achieve realistic upscaled images, ArcSoft SimHD™ technology requires high precision on floating point manipulations as well as intensive usage of parallel processing. Both are proven to perform far better on the GPU than the CPU. |
|
Author(s) |
ArcSoft, Inc.
For more information, please contact: Kam Shek Vickie Wei |
|
Organization Type
|
Commercial |
|
Organization |
ArcSoft, Inc. |
|
API Used |
CAL/IL |
|
Software License |
Proprietary (included in TotalMedia Theatre 3 Platinum Edition) |
|
Available Content |
Video Website |
Numerics
» GOSpMV: Automatic Performance Tuning of SpMV Software Package
GOSpMV performs automatic performance tuning for Sparse Matrix-Vector Multiplication (SpMV) on AMD GPUs. It utilizes ATI Stream Computing and uses register-level blocking algorithms for its optimization. SpMV is an important computational kernel in scientific applications that tends to perform poorly on modern processors because of irregular memory accesses.
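For readers unfamiliar with the kernel, the following sketch shows a baseline SpMV over the widely used compressed sparse row (CSR) layout. It is not GOSpMV's code and does not implement register-level blocking. The function and argument names are assumptions for illustration.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        total = 0.0
        # Nonzeros of this row live in values[row_ptr[row]:row_ptr[row + 1]]
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx, which is the irregular memory access
            total += values[k] * x[col_idx[k]]
        y[row] = total
    return y

# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0])  # [7.0, 6.0, 17.0]
```

Blocking schemes store small dense blocks instead of individual nonzeros so that a block's slice of `x` can be reused from registers, which reduces the indirect loads seen in the inner loop.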
» AMD Core Math Library for Graphics Processors (ACML-GPU)
AMD Core Math Library for Graphics Processors (ACML-GPU) provides an ATI Stream-accelerated version of ACML. ACML-GPU accelerates certain routines in ACML, such as SGEMM and DGEMM, by off-loading the computation to compatible GPUs in the system. Based on the parameters passed to the routines, the library dynamically decides whether to run the computation on the CPU or the GPU, depending on which processor will yield the best performance.
|
Additional Details |
ACML-GPU automatically scales its computation across multiple GPUs, if available, and can take advantage of the double-precision floating-point (DPFP) hardware in the GPU on products that contain hardware DPFP support. |
|
Author(s) |
Advanced Micro Devices, Inc. |
|
Organization Type
|
Commercial |
|
Organization |
Advanced Micro Devices, Inc. |
|
API Used |
CAL/IL |
|
Software License |
Proprietary |
|
Available Content |
Website |
Seismic Imaging
» Finite-Difference Time-Domain Solvers for Modeling Velocity-Stress Wave Propagation
Finite-difference time-domain solvers are used to model velocity-stress wave propagation for seismic applications. A direct solution of the time-dependent velocity vector and stress tensor is computed on a staggered three-dimensional grid.
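The staggered-grid idea can be sketched in one dimension, where a single velocity component and a single stress component leapfrog each other in time. This is a simplified illustration, not the solver described here: the grid size, material values, time step, and source parameters below are all assumptions, and a Ricker wavelet is used as the source.

```python
import math

def ricker(t, peak_freq, delay):
    """Ricker wavelet with the given peak frequency, centered at t = delay."""
    a = (math.pi * peak_freq * (t - delay)) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)

def simulate(n=200, steps=300, dx=5.0, dt=0.001, rho=2000.0, vp=2000.0):
    """1D velocity-stress update on a staggered grid (hypothetical parameters)."""
    modulus = rho * vp * vp      # elastic modulus from density and wave speed
    v = [0.0] * n                # velocity stored at integer grid nodes
    s = [0.0] * (n - 1)          # stress stored halfway between nodes
    src = n // 2
    for it in range(steps):
        # Velocity is advanced from the spatial difference of stress
        for i in range(1, n - 1):
            v[i] += dt / (rho * dx) * (s[i] - s[i - 1])
        # Source injected into the velocity field at the center of the domain
        v[src] += dt * ricker(it * dt, 25.0, 0.04)
        # Stress is advanced from the spatial difference of velocity
        for i in range(n - 1):
            s[i] += dt * modulus / dx * (v[i + 1] - v[i])
    return v, s
```

With these values the Courant number `vp * dt / dx` is 0.4, which keeps the explicit scheme stable. The end nodes are never updated, so they act as rigid boundaries.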
|
Additional Details |

The image shown represents the propagating seismic pressure wave in a three-dimensional seismic model of the earth’s surface (1500m x 750m x 750m) with a 40m cubic high-velocity blocky inclusion.
The simulation was performed entirely on the AMD FireStream™ 9250 GPU co-processor. The velocity vector and stress tensor values were modeled on a 256 x 128 x 128 grid. The propagating wave was generated by a Ricker wavelet applied to the z-component of the velocity field at the center of the simulation domain. |
|
Author(s) |
David Richie |
|
Organization Type
|
Commercial |
|
Organization |
Brown Deer Technology |
|
API Used |
Brook+ |
|
Software License |
GPLv3 |
|
Available Content |
Website Source Code |
Medical Imaging
» GpuFdk
Analytic tomographic reconstruction using the FDK algorithm, for a non-uniform detector geometry.
Financial
» Accelerating Binomial Options Pricing Scenarios
This demonstration presents three implementations of the Binomial Tree pricing model. The first is from the widely-used, open source project Quantlib. The second is a hand-tuned C version of the same model. The third is an implementation of this model using the RapidMind platform with ATI Stream technology accelerating the calculations.
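For context, a binomial tree prices an option by building a lattice of possible asset prices and rolling the payoff back to the present. The sketch below is not taken from Quantlib, the hand-tuned C version, or the RapidMind implementation. It assumes a European option and the Cox-Ross-Rubinstein parameterization, and all names are illustrative.

```python
import math

def binomial_price(spot, strike, rate, sigma, maturity, steps, is_call=True):
    """European option value on a Cox-Ross-Rubinstein binomial tree."""
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    prob = (math.exp(rate * dt) - down) / (up - down)  # risk-neutral up probability
    disc = math.exp(-rate * dt)

    # Payoffs at maturity for every terminal node (j = number of up moves)
    values = []
    for j in range(steps + 1):
        price = spot * up ** j * down ** (steps - j)
        payoff = price - strike if is_call else strike - price
        values.append(max(payoff, 0.0))

    # Backward induction: each node is the discounted expectation of its children
    for step in range(steps, 0, -1):
        values = [disc * (prob * values[j + 1] + (1.0 - prob) * values[j])
                  for j in range(step)]
    return values[0]
```

All nodes within one time step can be updated independently, which is the parallelism that a stream processor can exploit when many options are priced at once.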
|
Additional Details |
Competitive advantage in computational finance is about deploying the smartest, fastest algorithms for modeling financial management models and for pricing financial instruments. The RapidMind Multi-core Development Platform allows software organizations to quickly build applications that can harness the full potential of the latest multi-core processors as well as seamlessly take advantage of the application acceleration available in today’s stream processors such as the GPU or the Cell Broadband Engine. The RapidMind platform lets financial organizations focus on their internal algorithmic expertise yet quickly deploy applications on the best possible hardware. |
|
Speedup |
55x* |
|
Author(s) |
RapidMind |
|
Organization Type
|
Commercial |
|
Organization |
RapidMind |
|
API Used |
RapidMind Multi-core Development Platform |
|
Software License |
Proprietary |
|
Available Content |
* For backup and configuration information, see Video. RapidMind Website |
Security
» Elcomsoft Wireless Security Auditor
Wireless Security Auditor utilizes GPUs to help accelerate the process of auditing password security of wireless networks protected by WPA/WPA2-PSK.
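The cost of such an audit comes from the WPA/WPA2-PSK key derivation itself: the pairwise master key is PBKDF2-HMAC-SHA1 of the passphrase, salted with the network SSID, using 4096 iterations and a 32-byte output. The sketch below illustrates that derivation with Python's standard library. It does not describe the product's internals. The function names are hypothetical, and comparing against a known key (rather than a captured handshake) is a simplification.

```python
import hashlib

def wpa_pmk(passphrase, ssid):
    """Derive the 256-bit WPA/WPA2-PSK pairwise master key."""
    return hashlib.pbkdf2_hmac(
        "sha1", passphrase.encode("utf-8"), ssid.encode("utf-8"), 4096, 32
    )

def audit(ssid, target_pmk, candidates):
    """Return the first candidate passphrase whose key matches, else None."""
    for word in candidates:
        if wpa_pmk(word, ssid) == target_pmk:
            return word
    return None
```

Every candidate requires thousands of independent hash iterations, and candidates do not depend on each other, so the workload spreads well across many GPU threads.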
|
Additional Details* |

Wireless Security Auditor is capable of taking advantage of multiple compatible GPUs in a single system. |
|
Author(s) |
Elcomsoft Co. Ltd. |
|
Organization Type |
Commercial |
|
Organization |
Elcomsoft Co. Ltd. |
|
API Used |
CAL/IL |
|
Software License |
Proprietary |
|
Available Content |
Website |
|
|
* Configuration: Intel Core 2 Duo E4500 (2.2 GHz), GIGABYTE GA-EP45-DS3P, 4GB DDR2 PC6400 memory, ATI Catalyst™ 9.4 drivers, Microsoft® Windows® XP Professional (32-bit) SP3. |
» RSA Labs RC5-72 Secret-Key Challenge Client
Distributed.net uses AMD GPUs to find the solution for the RC5-72 secret key challenge.
|
Additional Details* |

The client is capable of taking advantage of multiple GPUs in a single system. |
|
Author(s) |
Vyacheslav Chupyatov |
|
Organization Type |
Research |
|
Organization |
distributed.net |
|
API Used |
CAL/IL |
|
Software License |
Proprietary |
|
Available Content |
Website
Source Code |
|
|
*Based on AMD internal testing using RC5-72 clients as of 9/04/09. Results shown in MKeys evaluated per second. Configuration: AMD Phenom™ X4 9950 Black Edition processor, 8GB DDR2 RAM, Windows Vista® 32-bit. AMD drivers: ATI Catalyst™ 9.8 (ATI Radeon™ HD 48xx), prerelease driver (ATI Radeon HD 5870). Nvidia driver: GeForce 190.62. AMD client: [x86/Stream], v2.9106.513 (beta8). Nvidia client: [x86/CUDA-2.2], v2.9105.512 (beta8). |
Electromagnetics
» Finite-Difference Time-Domain Solvers for Simulating Electromagnetic Wave Propagation
Finite-difference time-domain solvers are used for full-wave simulations of electromagnetic propagation. A direct solution of the time-dependent electromagnetic vector field is calculated on a staggered three-dimensional grid.
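A one-dimensional version of the scheme shows how the electric and magnetic fields are stored on interleaved grid points and updated in alternation. This is a minimal sketch in normalized units, not the solver described here. The grid size, Courant number, source period, and source position are assumptions.

```python
import math

def fdtd_1d(cells=200, steps=400, courant=0.5, period=40):
    """1D staggered-grid update of Ez and Hy with PEC ends (normalized units)."""
    ez = [0.0] * cells           # electric field at integer grid points
    hy = [0.0] * (cells - 1)     # magnetic field at half grid points
    for n in range(steps):
        # Magnetic field is advanced from the spatial difference of Ez
        for i in range(cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # Electric field is advanced from the spatial difference of Hy;
        # ez[0] and ez[-1] are never touched, which models PEC walls
        for i in range(1, cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Sinusoidal source added next to the left wall
        ez[1] += math.sin(2.0 * math.pi * n / period)
    return ez, hy
```

In three dimensions the same pattern applies to each of the six field components, and every grid cell can be updated in parallel within a half step.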
|
Additional Details |

The image shown represents the electrical field in a three-dimensional waveguide from simulations run directly on the AMD FireStream™ 9250 GPU co-processor.
The simulation employed a 256 x 128 x 128 grid for the electric and magnetic vector fields with PEC boundary conditions and a sinusoidal source applied to the z-component of the electric field at the (left) edge of the waveguide. |
|
Author(s) |
David Richie |
|
Organization Type
|
Commercial |
|
Organization |
Brown Deer Technology |
|
API Used |
Brook+ |
|
Software License |
GPLv3 |
|
Available Content |
Website Source Code |
Real-Time Procedural Modeling
» Procedural Planets
Procedural Planets generates 3D noise and composites it in various ways to produce interesting models of planets.
|