AMD Developer Central

ATI Stream Developer Showcase

ATI Stream

The ATI Stream Developer Showcase is an exhibition of notable applications and demos created by customers and technology partners using ATI Stream Technology. ATI Stream Technology allows developers to accelerate code used in a wide variety of applications, including physics, video processing, scientific simulations, finance and medical imaging.

If you have an application or demo that you have created using ATI Stream Technology, please let us know!

Compilers

» Calseum

Calseum is a compiler library that compiles a C-like language for AMD GPUs. It is designed as an alternative to writing CAL IL by hand: the small library takes a string of C-like code as input and returns CAL IL as output. It is written in C++ and currently runs on 64-bit Linux®.

Additional Details

It is being developed at the Department of Computing Science, University of Alberta, as part of a thesis project under the supervision of Prof. Nelson Amaral.

Author(s)

Rahul Garg

Organization Type

Research

Organization

University of Alberta

API Used

CAL/IL

Software License

Apache 2.0

Available Content

Website

» HMPP™ Workbench

Based on a set of compiler directives, the HMPP Workbench contains a preprocessor, target generators and a runtime library for developing and executing GPU-accelerated applications. Because it drives the entire compilation workflow using standard C and Fortran compilers as well as the Brook+ and CAL/IL compilers, HMPP integrates seamlessly into the user environment as a hybrid compiler. In addition to lowering development time and effort, HMPP delivers code portability, interoperability and easy application scaling.

Additional Details

HMPP programming directives allow you to declare and execute hardware-accelerated versions of functions. Communication and device allocation can be further optimized using the data transfer and synchronization directives.

HMPP helps applications scale on multi-GPU systems by detecting the presence and availability of GPUs at runtime before offloading computations onto them.

Author(s)

CAPS entreprise

Organization Type

Commercial

Organization

CAPS entreprise

API Used

Integration of Brook+ and CAL/IL in C and Fortran

Software License

Proprietary

Available Content

Website

» RapidMind Multi-core Development Platform

RapidMind Multi-core Development Platform allows software organizations to quickly build applications that can harness the full potential of the latest multi-core processors as well as seamlessly take advantage of the application acceleration available in today’s stream processors such as the GPU or the Cell Broadband Engine.

Author(s)

RapidMind

Organization Type

Commercial

Organization

RapidMind

API Used

OpenGL

Software License

Proprietary

Available Content

Website

Libraries

» PyGWA – GPGPU Library for Python

PyGWA is a GPGPU library for Python. It contains Python bindings for ATI CAL and PyGWA.DP – a toy data-parallel programming API.

Author(s)

Rafal Lewczuk

Organization Type

Research

API Used

CAL/IL

Software License

GPL

Available Content

Website

Physics

» “PowderToy” Particle & Fluid Simulation Application

The “PowderToy” particle & fluid simulation application was the first public demonstration of OpenCL functionality. It was presented by AMD at SIGGRAPH Asia 2008.

Additional Details

This demonstration shows how OpenCL can be used to implement high-performance parallel computing on multi-core CPUs.

Author(s)

Advanced Micro Devices, Inc.

Organization Type

Commercial

Organization

Advanced Micro Devices, Inc.

API Used

OpenCL

Software License

Proprietary

Available Content

Video

» Optimized Executions of Havok Middleware on AMD Platforms

AMD and Havok demonstrated new, optimized executions of Havok's physics middleware on AMD platforms at the 2009 Game Developers Conference. The demonstration included the first OpenCL-supported execution of Havok Cloth™.

Author(s)

Advanced Micro Devices, Inc. / Havok

Organization Type

Commercial

Organization

Advanced Micro Devices, Inc. / Havok

API Used

OpenCL

Software License

Proprietary

Available Content

Video

» Bullet Physics SDK 3D Rigid Body Constraints Solver Demo

A port of the Bullet Physics SDK 3D rigid body constraint solver was demonstrated at the 2009 Game Developers Conference.

Author(s)

Advanced Micro Devices, Inc.

Organization Type

Commercial

Organization

Advanced Micro Devices, Inc.

API Used

OpenCL

Software License

MIT License

Available Content

Website

Computational Fluid Dynamics

» Smoothed Particle Hydrodynamics Fluid Simulation on ATI GPU

A fluid simulator running on the GPU using smoothed particle hydrodynamics to simulate a particle-based fluid.

Additional Details

The image shown is a graphical representation of a Smoothed Particle Hydrodynamics (SPH) fluid simulation on an ATI GPU. The simulation was performed on an ATI Radeon HD 4870 GPU co-processor. More than 10,000 fluid particles, with physical boundary interactions, are simulated at interactive frame rates. The simulator is written in C/C++ and ATI Brook+. The fluid surface is rendered using Direct3D.
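The core SPH step is estimating each particle's density from its neighbours with a smoothing kernel. The Python sketch below is not the authors' Brook+ code; it assumes the commonly used 3D poly6 kernel, a single particle mass and a brute-force O(N²) neighbour search (a real simulator would use a spatial grid).

```python
import math

def poly6(r2, h):
    """Poly6 smoothing kernel in 3D, evaluated at squared distance r2."""
    if r2 >= h * h:
        return 0.0  # outside the support radius h
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r2) ** 3

def densities(positions, mass, h):
    """SPH density estimate rho_i = sum_j m * W(|r_i - r_j|, h), self term included."""
    rho = []
    for xi, yi, zi in positions:
        total = 0.0
        # Brute-force neighbour search: every particle is tested against every other one.
        for xj, yj, zj in positions:
            r2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            total += mass * poly6(r2, h)
        rho.append(total)
    return rho
```

Each particle's sum is independent of the others, which is what makes this step a good fit for one GPU thread per particle.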

Author(s)

Jiawei Ou
Chanjuan Wen

Organization Type

Academia

Organization

Tongji University

API Used

Brook+

Software License

GPLv3

Available Content

Website

» PhyFluids3D

Fluid simulation plug-in for Maxon Cinema 4D.

Author(s)

Remotion

Organization Type

Freelance

API Used

Brook+

Software License

Proprietary

Available Content

Demo Video
Website

Numerical Simulator

» Shallow Water Systems Simulation

Shallow water systems permit the simulation of rivers, channels and dam-break problems. Highly efficient solvers are required to solve and analyze these problems in a reasonable execution time. Because the spatial domains and/or simulated time spans are frequently very large, the computational requirements of the algorithms are high, and optimizing execution time is desirable.

Additional Details

We proposed a strategy for designing an efficient implementation on AMD GPUs using Brook+, based on computational kernel analysis at the domain-independent concept level (e.g., inductions, irregular reductions).
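The authors' Brook+ kernels are not reproduced here. As a generic illustration of the per-cell stencil work such solvers perform, the sketch below advances the 1D shallow water equations with the Lax-Friedrichs scheme, a simple textbook scheme chosen for brevity and not the authors' method. It assumes periodic boundaries (so the initial condition also contains a second depth step at the wrap-around point) and a CFL-limited time step.

```python
G = 9.81  # gravitational acceleration in m/s^2

def flux(h, hu):
    """Physical flux (mass flux, momentum flux) of the 1D shallow water equations."""
    u = hu / h
    return hu, hu * u + 0.5 * G * h * h

def lax_friedrichs_step(h, hu, dx, dt):
    """Advance depth h and discharge hu by one Lax-Friedrichs step (periodic boundaries)."""
    n = len(h)
    f = [flux(h[i], hu[i]) for i in range(n)]
    c = dt / (2.0 * dx)
    new_h, new_hu = [0.0] * n, [0.0] * n
    for i in range(n):  # each cell update is independent of the others
        l, r = (i - 1) % n, (i + 1) % n
        new_h[i] = 0.5 * (h[l] + h[r]) - c * (f[r][0] - f[l][0])
        new_hu[i] = 0.5 * (hu[l] + hu[r]) - c * (f[r][1] - f[l][1])
    return new_h, new_hu

def dam_break(n=200, length=1.0, steps=100, cfl=0.4):
    """Dam break: water depth 2 on the left half, depth 1 on the right, initially at rest."""
    dx = length / n
    h = [2.0 if i < n // 2 else 1.0 for i in range(n)]
    hu = [0.0] * n
    for _ in range(steps):
        # Time step limited by the fastest wave speed |u| + sqrt(g h).
        smax = max(abs(hu[i] / h[i]) + (G * h[i]) ** 0.5 for i in range(n))
        dt = cfl * dx / smax
        h, hu = lax_friedrichs_step(h, hu, dx, dt)
    return h, hu
```

Because the update is written in conservative form, total water volume is preserved up to rounding, and every cell within a step can be updated in parallel.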

Speedup

140x*

Author(s)

J. Lobeiras
M. Amor
M. Arenaz
B. B. Fraguela
J. A. García
M. J. Castro

Organization Type

Academia

Organization

The project is a collaboration between several groups:

API Used

Brook+

Software License

Proprietary

Available Content

Demo Videos

M. Arenaz, R. Doallo and J. Touriño, XARK: An Extensible Framework for Automatic Recognition of Computational Kernels, ACM Transactions on Programming Languages and Systems, vol. 30, no. 6, 2008.

* Based on the difference in elapsed time required to perform the shallow water simulation: 25,693 s on the CPU-based system vs. 183 s on the CPU-based system with the numerical simulator software and an ATI Radeon™ GPU. Configuration: AMD Athlon™ 4850e processor 2.5 GHz with ATI Radeon™ HD 4850 GPU, 4GB DDR2 359 MHz, AMD 790X Chipset-based motherboard, Microsoft® Windows® XP 64-bit, Microsoft® Visual C++® 2005 x64, ATI Stream SDK v1.4-beta, ATI Catalyst™ 9.2 software.

Molecular Dynamics

» LAMMPS Molecular Dynamics Simulations

Computational chemistry employs molecular dynamics simulations to simulate the motions of interacting atoms and molecules. An existing production code used for molecular dynamics simulations (LAMMPS) was modified to take advantage of ATI Stream processors for calculating the particle-particle interactions, which accounted for the largest part of the overall simulation time on a standard processor.
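To illustrate the kind of particle-particle interaction that dominates the run time, here is a minimal Python sketch of Lennard-Jones pair forces with a cutoff. The Lennard-Jones potential, reduced units, the 2.5σ cutoff and the all-pairs loop are assumptions for illustration: the LAMMPS rhodopsin benchmark uses a more elaborate biomolecular force field, and production codes use neighbour lists rather than an O(N²) loop.

```python
def lj_forces(positions, epsilon=1.0, sigma=1.0, cutoff=2.5):
    """Per-atom forces from a truncated Lennard-Jones pair potential.

    positions: list of (x, y, z) tuples in reduced units.
    V(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6] for r < cutoff * sigma.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    rc2 = (cutoff * sigma) ** 2
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(i + 1, n):  # each pair is visited once
            dx = xi - positions[j][0]
            dy = yi - positions[j][1]
            dz = zi - positions[j][2]
            r2 = dx * dx + dy * dy + dz * dz
            if r2 >= rc2:
                continue  # beyond the cutoff: no interaction
            sr2 = sigma * sigma / r2
            sr6 = sr2 ** 3
            # |F|/r = 24 eps [2 (sigma/r)^12 - (sigma/r)^6] / r^2
            f_over_r = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k, d in enumerate((dx, dy, dz)):
                forces[i][k] += f_over_r * d  # Newton's third law:
                forces[j][k] -= f_over_r * d  # equal and opposite forces
    return forces
```

On a GPU the pair loop is usually reorganized so that each thread accumulates the force on one atom, trading the doubled arithmetic of skipping Newton's third law for conflict-free writes.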

Additional Details

The image shown represents a 32,000-atom molecular dynamics simulation of a rhodopsin protein in a solvated lipid bilayer with the particle-particle interaction calculated on the GPU. The simulation represents the most challenging benchmark system distributed with the LAMMPS code. (More information on the LAMMPS molecular dynamics code can be found at http://lammps.sandia.gov).

Author(s)

David Richie

Organization Type

Commercial

Organization

Brown Deer Technology

API Used

Brook+

Software License

GPLv2

Available Content

Website

Science

» Astronomical Many-body Simulations

The gravitational many-body problem is concerned with the motion of bodies interacting through gravity. Solving it on a CPU takes significant time because the direct force calculation has O(N²) computational complexity. The demonstrated technique uses an ATI Radeon™ HD 4850 GPU from AMD to accelerate the exact force calculation. The speedup is achieved with a loop-unrolling technique that is highly effective on the ATI Radeon™ HD 4850 GPU.
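The direct force calculation being accelerated can be sketched as follows. This Python version shows only the O(N²) structure, in which every body accumulates contributions from every other body; the gravitational constant G = 1 and the Plummer softening length `eps` are assumptions, and the GPU loop-unrolling optimization is not reproduced.

```python
def accelerations(pos, mass, eps=1e-3, G=1.0):
    """Direct-summation gravitational accelerations with Plummer softening eps."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = eps * eps
    for i in range(n):  # independent per body: one GPU thread per body
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        for j in range(n):  # N - 1 interactions per body, N^2 work in total
            if j == i:
                continue
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            s = G * mass[j] * r2 ** -1.5  # G m_j / (r^2 + eps^2)^(3/2)
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc[i] = [ax, ay, az]
    return acc
```

Because the inner loop has a fixed body and a simple, branch-free body (after the self-interaction check), it is the natural target for unrolling on the GPU.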

Additional Details

The image shows a snapshot of the demo program simulating the interaction between galaxies. The simulation starts out as four galaxies in a figure-eight orbit. As the simulation proceeds, the four galaxies eventually merge into a single spherical galaxy.

Author(s)

N. Nakasato
K. Fujiwara
M. Sato

Organization Type

Academia

Organization

University of Aizu, Japan

API Used

CAL/IL

Available Content

Website

Video Processing

» ArcSoft SimHD™ Upscaling Technology Demo

By using the ATI Stream™ SDK, ArcSoft ported the intensive upscaling computation from the CPU to ATI GPUs. The ATI Stream version of SimHD included in TotalMedia Theatre™ was demonstrated at CEATEC Japan 2008 and is planned to launch at Computex (June 2009).

Additional Details

To achieve realistic upscaled images, ArcSoft SimHD™ technology requires high-precision floating-point arithmetic as well as intensive parallel processing, both of which perform far better on the GPU than on the CPU.

Author(s)

ArcSoft, Inc.

For more information, please contact:
Kam Shek
Vickie Wei

Organization Type

Commercial

Organization

ArcSoft, Inc.

API Used

CAL/IL

Software License

Proprietary (included in TotalMedia Theatre 3 Platinum Edition)

Available Content

Video
Website

Numerics

» GOSpMV: Automatic Performance Tuning of SpMV Software Package

GOSpMV performs automatic performance tuning of Sparse Matrix-Vector Multiplication (SpMV) on AMD GPUs. It uses ATI Stream Computing and optimizes with register-level blocking algorithms. SpMV is an important computational kernel in scientific applications that tends to perform poorly on modern processors because of its irregular memory accesses.
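Register-level blocking is typically realized by storing the matrix in block compressed sparse row (BCSR) format, with small fixed-size dense blocks. The Python sketch below assumes 2 x 2 blocks and matrix dimensions divisible by the block size; it illustrates the data layout only and does not perform the automatic tuning GOSpMV provides.

```python
def to_bcsr(dense, r=2, c=2):
    """Convert a dense matrix (list of rows) to block CSR with r x c blocks.

    All-zero blocks are skipped; any block with a nonzero is stored densely,
    explicit zeros included.
    """
    nrows, ncols = len(dense), len(dense[0])
    assert nrows % r == 0 and ncols % c == 0, "sketch assumes divisible dimensions"
    row_ptr, col_idx, blocks = [0], [], []
    for bi in range(nrows // r):
        for bj in range(ncols // c):
            block = [dense[bi * r + a][bj * c + b] for a in range(r) for b in range(c)]
            if any(v != 0 for v in block):
                col_idx.append(bj)
                blocks.append(block)
        row_ptr.append(len(col_idx))  # blocks of block-row bi end here
    return row_ptr, col_idx, blocks

def bcsr_spmv(row_ptr, col_idx, blocks, x, r=2, c=2):
    """Compute y = A x for a matrix stored in block CSR format."""
    nblock_rows = len(row_ptr) - 1
    y = [0.0] * (nblock_rows * r)
    for bi in range(nblock_rows):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj, block = col_idx[k], blocks[k]
            # Fixed-size r x c block: a register-blocked GPU kernel would fully
            # unroll these two loops, so the partial sums stay in registers.
            for a in range(r):
                acc = 0.0
                for b in range(c):
                    acc += block[a * c + b] * x[bj * c + b]
                y[bi * r + a] += acc
    return y
```

The design trade-off is that each stored block may carry explicit zeros, in exchange for regular, unrollable inner loops and fewer column indices to load.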

Author(s)

Xianyi Zhang
Xiangzheng Sun
Shengfei Liu
Yuxin Tang
Fangfang Liu

Organization Type

Academia

Organization

Lab of Parallel Computing, Institute of Software Chinese Academy of Sciences

API Used

Brook+

Available Content

Website

» AMD Core Math Library for Graphics Processors (ACML-GPU)

AMD Core Math Library for Graphics Processors (ACML-GPU) provides an ATI Stream-accelerated version of ACML. ACML-GPU accelerates certain ACML routines, such as SGEMM and DGEMM, by offloading the computation to compatible GPUs in the system. Based on the parameters passed to a routine, the library dynamically decides whether to run the computation on the CPU or the GPU, depending on which processor will yield the better performance.
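ACML-GPU's actual decision logic is not described here. As a hypothetical illustration of a size-based dispatch for a single-precision GEMM call, the sketch below compares an estimated CPU time with an estimated GPU time that includes bus transfers and a fixed launch overhead. The function name `choose_device` and every rate parameter are placeholders, not measured values and not part of the ACML API.

```python
def choose_device(m, n, k, cpu_gflops=40.0, gpu_gflops=400.0,
                  bus_gbytes_per_s=4.0, gpu_overhead_s=1e-4, bytes_per_elem=4):
    """Hypothetical CPU-vs-GPU dispatch for C = alpha*A*B + beta*C.

    A is m x k, B is k x n, C is m x n. All rates are illustrative
    placeholders, not measurements and not ACML-GPU's real heuristics.
    """
    flops = 2.0 * m * n * k  # one multiply and one add per term
    cpu_time = flops / (cpu_gflops * 1e9)
    # The GPU path must send A, B and C to the device and bring C back.
    elems_moved = m * k + k * n + 2 * m * n
    transfer_time = elems_moved * bytes_per_elem / (bus_gbytes_per_s * 1e9)
    gpu_time = gpu_overhead_s + transfer_time + flops / (gpu_gflops * 1e9)
    return "gpu" if gpu_time < cpu_time else "cpu"
```

Small matrices stay on the CPU because transfer and launch overheads dominate, while large ones amortize those costs over O(mnk) arithmetic.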

Additional Details

ACML-GPU automatically scales its computation across multiple GPUs, if available, and can take advantage of the double-precision floating-point (DPFP) hardware on GPUs that support it.

Author(s)

Advanced Micro Devices, Inc.

Organization Type

Commercial

Organization

Advanced Micro Devices, Inc.

API Used

CAL/IL

Software License

Proprietary

Available Content

Website

Seismic Imaging

» Finite-Difference Time-Domain Solvers for Modeling Velocity-Stress Wave Propagation

Finite-difference time-domain solvers are used to model velocity-stress wave propagation for seismic applications. The time-dependent velocity vector and stress tensor are solved directly on a staggered three-dimensional grid.
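The staggered-grid update can be illustrated in one dimension. The Python sketch below is a simplification of the 3D solver described here, under stated assumptions: one velocity component and one stress component, a homogeneous medium (density `rho`, elastic modulus `modulus`), rigid ends, and an initial velocity pulse in place of the Ricker wavelet source. Velocity and stress live on grids offset by half a cell and are updated alternately (leapfrog).

```python
def velocity_stress_1d(nx=400, steps=200, dx=1.0, rho=1.0, modulus=1.0, cfl=0.5):
    """1D velocity-stress FDTD on a staggered grid.

    Solves rho * dv/dt = ds/dx and ds/dt = modulus * dv/dx with a leapfrog scheme.
    v[i] sits at x = i*dx; s[i] sits halfway between v[i] and v[i + 1].
    """
    c = (modulus / rho) ** 0.5  # wave speed of the homogeneous medium
    dt = cfl * dx / c           # this explicit scheme is stable for cfl <= 1 in 1D
    v = [0.0] * nx
    s = [0.0] * (nx - 1)
    v[nx // 2] = 1.0            # initial velocity pulse (stand-in for a wavelet source)
    for _ in range(steps):
        # Stress update from the velocity difference across each half-cell.
        for i in range(nx - 1):
            s[i] += dt * modulus * (v[i + 1] - v[i]) / dx
        # Interior velocity update from the stress difference; v[0] and v[-1] stay 0.
        for i in range(1, nx - 1):
            v[i] += dt / rho * (s[i] - s[i - 1]) / dx
    return v, s
```

Each update loop touches every grid point independently, so on a GPU each loop maps naturally to one kernel launch over the grid.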

Additional Details

The image shown represents the propagating seismic pressure wave in a three-dimensional seismic model of the earth’s surface (1500 m x 750 m x 750 m) with a 40 m cubic high-velocity blocky inclusion.

The simulation was performed entirely on the AMD FireStream™ 9250 GPU co-processor. The velocity vector and stress tensor values were modeled on a 256 x 128 x 128 grid. The propagating wave was generated by a Ricker wavelet applied to the z-component of the velocity field at the center of the simulation domain.

Author(s)

David Richie

Organization Type

Commercial

Organization

Brown Deer Technology

API Used

Brook+

Software License

GPLv3

Available Content

Website
Source Code

Medical Imaging

» GpuFdk

Analytic tomographic reconstruction using the FDK (Feldkamp-Davis-Kress) algorithm for a non-uniform detector geometry.

Author(s)

Alain Bonnisent

Organization Type

Academia

Organization

CPPM-CNRS  

API Used

Brook+

Software License

CeCILL

Available Content

Website
Website (multi-GPU implementation)

Financial

» Accelerating Binomial Options Pricing Scenarios

This demonstration presents three implementations of the binomial tree pricing model. The first is from QuantLib, a widely used open-source project. The second is a hand-tuned C version of the same model. The third implements the model using the RapidMind platform, with ATI Stream technology accelerating the calculations.
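For readers unfamiliar with the model, a minimal Cox-Ross-Rubinstein binomial tree for a European call is sketched below in Python. The parameter names and the choice of a European (rather than American) option are assumptions; this is not the QuantLib, hand-tuned C or RapidMind code from the demonstration. The closed-form Black-Scholes price is included only as a convergence check.

```python
import math

def binomial_call(spot, strike, rate, vol, maturity, steps):
    """European call price on a Cox-Ross-Rubinstein binomial tree."""
    dt = maturity / steps
    u = math.exp(vol * math.sqrt(dt))  # up factor
    d = 1.0 / u                        # down factor
    p = (math.exp(rate * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-rate * dt)
    # Payoffs at expiry for j up-moves out of `steps`.
    values = [max(spot * u ** j * d ** (steps - j) - strike, 0.0) for j in range(steps + 1)]
    # Backward induction: each level depends only on the level after it.
    for n in range(steps, 0, -1):
        values = [disc * (p * values[j + 1] + (1.0 - p) * values[j]) for j in range(n)]
    return values[0]

def black_scholes_call(spot, strike, rate, vol, maturity):
    """Closed-form Black-Scholes call price, used here only to check convergence."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * math.sqrt(maturity))
    d2 = d1 - vol * math.sqrt(maturity)
    return spot * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)
```

Backward induction is sequential across time levels, but the nodes within each level are independent, and separate options are independent of each other; that is the parallelism a GPU implementation can exploit.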

Additional Details

Competitive advantage in computational finance depends on deploying the smartest, fastest algorithms for building financial models and pricing financial instruments. The RapidMind platform lets financial organizations focus on their internal algorithmic expertise while quickly deploying applications on the best possible hardware.

Speedup

55x*

Author(s)

RapidMind

Organization Type

Commercial

Organization

RapidMind  

API Used

RapidMind Multi-core Development Platform

Software License

Proprietary

Available Content

* For backup and configuration information, see Video.
RapidMind Website

Security

» Elcomsoft Wireless Security Auditor

Wireless Security Auditor utilizes GPUs to help accelerate the process of auditing password security of wireless networks protected by WPA/WPA2-PSK.
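Most of the cost of auditing a WPA/WPA2-PSK passphrase lies in deriving the pairwise master key (PMK) from each candidate, which the standard defines as PBKDF2-HMAC-SHA1 over the passphrase and SSID with 4,096 iterations and a 32-byte output. The Python sketch below shows that derivation using the standard library's `hashlib.pbkdf2_hmac`. The `audit` helper and its known `target_pmk` are hypothetical simplifications: a real auditor checks candidates against a captured handshake rather than a known PMK, and this is not Elcomsoft's implementation.

```python
import hashlib

def wpa_psk_pmk(passphrase, ssid):
    """WPA/WPA2-PSK pairwise master key: PBKDF2-HMAC-SHA1, 4096 iterations, 32 bytes."""
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(), 4096, 32)

def audit(candidates, ssid, target_pmk):
    """Hypothetical check: return the first candidate whose PMK matches, else None."""
    for word in candidates:
        # Each candidate costs two PBKDF2 output blocks of 4096 HMAC-SHA1 iterations each.
        if wpa_psk_pmk(word, ssid) == target_pmk:
            return word
    return None
```

Each candidate is checked independently, so candidates can be spread across GPU threads and across multiple GPUs in one system.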

Additional Details*

Wireless Security Auditor is capable of taking advantage of multiple compatible GPUs in a single system.

Author(s)

Elcomsoft Co. Ltd.

Organization Type

Commercial

Organization

Elcomsoft Co. Ltd.

API Used

CAL/IL

Software License

Proprietary

Available Content

Website

* Configuration: Intel Core 2 Duo E4500 (2.2 GHz), GIGABYTE GA-EP45-DS3P, 4GB DDR2 PC6400 memory, ATI Catalyst™ 9.4 drivers, Microsoft® Windows® XP Professional (32-bit) SP3.

» RSA Labs RC5-72 Secret-Key Challenge Client

Distributed.net uses AMD GPUs to search for the key that solves the RC5-72 secret-key challenge.

Additional Details*

The client is capable of taking advantage of multiple GPUs in a single system.

Author(s)

Vyacheslav Chupyatov

Organization Type

Research

Organization

distributed.net

API Used

CAL/IL

Software License

Proprietary

Available Content

Website
Source Code

*Based on AMD internal testing using RC5-72 clients as of 9/04/09. Results shown in MKeys evaluated per second. Configuration: AMD Phenom™ X4 9950 Black Edition processor, 8GB DDR2 RAM, Windows Vista® 32-bit. AMD drivers: ATI Catalyst™ 9.8 (ATI Radeon™ HD 48xx), prerelease driver (ATI Radeon HD 5870). Nvidia driver: GeForce 190.62. AMD client: [x86/Stream], v2.9106.513 (beta8). Nvidia client: [x86/CUDA-2.2], v2.9105.512 (beta8).

Electromagnetics

» Finite-Difference Time-Domain Solvers for Simulating Electromagnetic Wave Propagation

Finite-difference time-domain solvers are used for full-wave simulations of electromagnetic propagation. The time-dependent electromagnetic vector fields are calculated directly on a staggered three-dimensional grid.
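A one-dimensional version of the staggered (Yee) update is sketched below in Python. The normalized field units, the Courant number of 0.5, the 40-cell wavelength and the 1D geometry are assumptions that simplify the 3D waveguide described here: `ez` and `hy` sit on grids offset by half a cell, a hard sinusoidal source drives the left edge, and the right edge is a PEC wall where the tangential electric field is held at zero.

```python
import math

def waveguide_1d(nx=200, steps=300, courant=0.5, wavelength_cells=40.0):
    """1D Yee-scheme FDTD with a hard sinusoidal source and a PEC wall.

    Fields are normalized so that a single Courant number couples E and H.
    ez[i] sits at x = i*dx; hy[i] sits at x = (i + 1/2)*dx.
    """
    ez = [0.0] * nx
    hy = [0.0] * (nx - 1)
    # omega * dt for a wave spanning `wavelength_cells` cells.
    omega_dt = 2.0 * math.pi * courant / wavelength_cells
    for n in range(1, steps + 1):
        for i in range(nx - 1):      # H update from the curl of E
            hy[i] += courant * (ez[i + 1] - ez[i])
        for i in range(1, nx - 1):   # E update from the curl of H
            ez[i] += courant * (hy[i] - hy[i - 1])
        ez[0] = math.sin(omega_dt * n)  # hard source at the left edge
        ez[nx - 1] = 0.0                # PEC: tangential E vanishes on the wall
    return ez, hy
```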

Additional Details

The image shown represents the electric field in a three-dimensional waveguide from simulations run directly on the AMD FireStream™ 9250 GPU co-processor.

The simulation employed a 256 x 128 x 128 grid for the electric and magnetic vector fields, with perfect electric conductor (PEC) boundary conditions and a sinusoidal source applied to the z-component of the electric field at the left edge of the waveguide.

Author(s)

David Richie

Organization Type

Commercial

Organization

Brown Deer Technology

API Used

Brook+

Software License

GPLv3

Available Content

Website
Source Code

Real-Time Procedural Modeling

» Procedural Planets

Procedural Planets generates 3D noise and composites it in various ways to produce interesting planet models.

Additional Details

Procedural Planets is a technology demonstration using an ATI Stream-accelerated version of libnoise, a portable, open-source, coherent noise-generating library for C++.

Author(s)

Doug Morrow

API Used

Brook+

Available Content

Forum Post With Pictures: 1 2
Additional Rendered Images
libnoise Home Page (original version, without ATI Stream acceleration)