[fwlug] Fwd: CCL: Computing on graphical processors
John McKelvey
jmmckel at gmail.com
Thu Feb 21 22:40:29 EST 2008
Folks,
Here is a post I got from the Computational Chemistry List about using GPUs
for heavy computing.
The links below give some very interesting insight into the possibility of
using GPUs for intensive calculations [other than gaming, etc.].
Cheers!
John McKelvey
---------- Forwarded message ----------
From: Peter Burger <burger at chemie.uni-hamburg.de>
Date: Sun, Feb 17, 2008 at 5:45 AM
Subject: Re: CCL: Computing on graphical processors
To: jmmckel at gmail.com
John McKelvey jmmckel,,gmail.com wrote:
> It seems that there is beginning to be significant efforts to do
> processing using video card processors. Could someone comment on this as
> to why this is being so successful [probably many factors, I suppose], what
> kinds of computations work well on them, and what kinds of work probably do
> not work well on them.... and why?
> Thanks!
> John McKelvey
>
>
Hi John,
Simply put: performance! A GPU is a new type of math coprocessor, much like
the separate 80x87 math coprocessors used with Intel CPUs before the
floating-point unit was incorporated on the CPU, starting with the 80486.
GPUs are highly parallelized, i.e. single instruction multiple data (SIMD)
machines, and the fastest ones deliver on the order of a few hundred GFlop,
while a single (dual core) Core-2 Duo CPU at 3 GHz delivers 3*8*2 GFlop
peak = 48 GFlop in sgemm (single precision) and 24 GFlop in dgemm (double
precision) matrix multiplication code. In reality ca. 90% of these values
can be achieved on the Intel CPUs. It is important to note that these codes
are optimized in such a way that they are not limited by memory bandwidth.
Using eight cores of the Harpertown Xeons pushes this to the limit of the
memory bandwidth, and "only" 80 GFlop (ca. 80% of peak) are achievable.
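The peak figures quoted above are simple products, and a few lines of Python
reproduce them. This is only a sketch of the arithmetic: the function name
peak_gflops is made up for illustration, the factor of 8 single-precision
flops per cycle per core is taken from the 3*8*2 product above, and the
factor of 4 for double precision is inferred from the quoted 24 GFlop rather
than stated in the post.

```python
def peak_gflops(clock_ghz, flops_per_cycle, cores):
    """Theoretical peak in GFlop: clock (GHz) * flops per cycle per core * cores."""
    return clock_ghz * flops_per_cycle * cores


# Core-2 Duo at 3 GHz, two cores (numbers from the post).
sp_peak = peak_gflops(3.0, 8, 2)  # single precision (sgemm)
dp_peak = peak_gflops(3.0, 4, 2)  # double precision (dgemm); 4 is inferred

# "ca. 90%" of peak is quoted as achievable in practice.
sp_real = 0.9 * sp_peak
dp_real = 0.9 * dp_peak

print("sgemm peak %.1f GFlop, ca. %.1f GFlop achievable" % (sp_peak, sp_real))
print("dgemm peak %.1f GFlop, ca. %.1f GFlop achievable" % (dp_peak, dp_real))
```

The same function can be reused for other CPUs, provided the flops per cycle
per core are known for the chip in question.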
There exist free BLAS and also LAPACK (maybe only commercially available
from third parties) libraries for Nvidia graphics cards within the "CUDA"
environment.
Currently, the fastest implementations deliver >200 GFlops with the best
Nvidia GPUs in sgemm matrix multiplication. AMD/ATI has afaik something
similar.
This means that one just links the corresponding Nvidia libraries
rather than the Intel MKL or AMD ACML library. There is still an issue
with the bandwidth of the GPU <-> main memory interface, which is likely
to improve with the newer PCI-express 2.0 and future bus(es).
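To get a feeling for the bus issue, here is a rough Python estimate of
compute time versus transfer time for a square sgemm. It is a
back-of-the-envelope sketch, not a benchmark: the function name sgemm_times
is hypothetical, the 200 GFlop rate is the figure quoted above, the bus
rates are nominal x16 peak values (4 GB/s for PCI-express 1.x, 8 GB/s for
2.0) that real transfers will not reach, and it is assumed that A and B are
copied to the card and C is copied back once.

```python
def sgemm_times(n, gpu_flops, bus_bytes_per_s):
    """Return (compute_seconds, transfer_seconds) for an n x n sgemm."""
    flops = 2.0 * n ** 3           # n^3 multiplies plus n^3 additions
    bytes_moved = 3 * n * n * 4    # A and B in, C out, 4 bytes per float
    return flops / gpu_flops, bytes_moved / bus_bytes_per_s


GPU_RATE = 200e9   # 200 GFlop sgemm rate, as quoted in the post
PCIE1_X16 = 4e9    # assumption: nominal PCI-express 1.x x16 peak, bytes/s
PCIE2_X16 = 8e9    # assumption: nominal PCI-express 2.0 x16 peak, bytes/s

for n in (256, 1024, 4096):
    for name, bus in (("PCIe 1.x", PCIE1_X16), ("PCIe 2.0", PCIE2_X16)):
        compute, transfer = sgemm_times(n, GPU_RATE, bus)
        print("n=%5d %s: compute %.2e s, transfer %.2e s"
              % (n, name, compute, transfer))
```

Under these assumptions the work grows as n^3 while the data grows only as
n^2, so small matrices are dominated by the transfer and large ones by the
computation.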
http://www.nvidia.com/object/cuda_home.html
more is here
===> The following link is quite informative...
http://www.gpucomputing.eu/
The major problem is that video card processors (GPUs) have so far
offered only IEEE754 32-bit single precision, which is not sufficient for
quantum-chemical codes.
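A toy illustration of how quickly round-off accumulates in 32-bit
arithmetic, using only the Python standard library. This is not a
quantum-chemical calculation, just a repeated sum; the helper names to_f32
and sum_repeated are made up for this sketch, and single precision is
emulated by rounding every intermediate result to an IEEE754 32-bit float
via the struct module.

```python
import struct


def to_f32(x):
    """Round a Python float (64-bit) to the nearest IEEE754 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_repeated(value, count, single_precision):
    """Add `value` to a running total `count` times.

    With single_precision=True the value and every partial sum are rounded
    to 32 bits, which emulates accumulating in a float variable.
    """
    if single_precision:
        value = to_f32(value)
    total = 0.0
    for _ in range(count):
        total += value
        if single_precision:
            total = to_f32(total)
    return total


N = 100000
exact = 10000.0  # 0.1 * N
err32 = abs(sum_repeated(0.1, N, True) - exact)
err64 = abs(sum_repeated(0.1, N, False) - exact)

print("absolute error, single precision: %.3e" % err32)
print("absolute error, double precision: %.3e" % err64)
```

The single-precision sum drifts visibly away from the exact value while the
double-precision one does not, which is the kind of accumulated error that
long chains of floating-point operations are sensitive to.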
The latest development/rumor is rather exciting: Nvidia is working on
double precision capabilities for their latest GPUs (Geforce 9).
So far GPUs have been used with great success and huge performance gains in
classical MD calculations; in that case several GPU cards were also used
in parallel.
Check for instance the VMD home page.
http://www.ks.uiuc.edu/Research/gpu/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gpu.pdf
Type: application/pdf
Size: 329467 bytes
Desc: not available
Url : http://fortwaynelug.org/pipermail/fwlug_fortwaynelug.org/attachments/20080221/7c81cad4/attachment-0001.pdf