
dc.contributor.advisor: Gregg, David
dc.contributor.author: Callanan, Owen
dc.date.accessioned: 2016-11-07T14:03:31Z
dc.date.available: 2016-11-07T14:03:31Z
dc.date.issued: 2007
dc.identifier.citation: Owen Callanan, 'High performance scientific computing using FPGAs for lattice QCD', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 2007, pp 161
dc.identifier.other: THESIS 8144
dc.identifier.uri: http://hdl.handle.net/2262/77604
dc.description.abstract: The recent development of large FPGAs, combined with the availability of a variety of FPGA-based non-integer arithmetic cores, has made it possible to implement high performance matrix kernel operations on FPGAs. This thesis seeks to evaluate the performance of FPGAs for real scientific computations by implementing lattice Quantum Chromodynamics (lattice QCD), which is one of the classic scientific computing problems. Lattice QCD computing machinery is the focus of considerable research work worldwide, including two custom ASIC-based solutions and a variety of custom-built PC cluster machines. This wide variety of highly optimised lattice QCD computing machinery permits comparison with the state of the art for high performance computing machinery. The results presented in this thesis give significant insights into the usefulness of FPGAs for scientific computing. This thesis also evaluates two different number systems available for running scientific computing applications on FPGAs. FPGA-based lattice QCD processors are implemented using both double precision IEEE floating point and logarithmic arithmetic cores with precision equivalent to IEEE single precision floating point. The performance of the FPGA-based lattice QCD processors is compared with that of two lattice QCD targeted custom ASIC-based supercomputers, with that of commercial supercomputers and with that of some highly optimised PC cluster based machines. The logarithmic arithmetic designs return per-FPGA performance of 1320 MFLOPS for the performance-critical lattice QCD Dirac operator, and 1050 MFLOPS for the full conjugate gradient solver application. The latest generation of PC clusters return per-processor performance of about 1300 MFLOPS for the Dirac operator using single precision arithmetic. Thus the logarithmic arithmetic designs are competitive with the latest PC cluster machines, which are the main platform for single precision lattice QCD calculations.
The double precision designs return performance of 1200 MFLOPS for the core Dirac operator and 940 MFLOPS for the conjugate gradient solver application. This compares very well with the double precision per-processor performance of the latest PC clusters at 650 MFLOPS and with the performance of the IBM BlueGene/L supercomputer at 1100 MFLOPS per processor. Each BlueGene/L processor is an ASIC with two CPU cores, so the per-core performance of the BlueGene/L is 550 MFLOPS. The double precision FPGA design's performance also compares very well with the per-processor performance of the two custom ASIC supercomputers that have been constructed specifically for lattice QCD. The QCDOC machine has per-processor performance of 396 MFLOPS, whilst the apeNEXT system has per-processor performance of 894 MFLOPS. All figures are for the Dirac operator. All current lattice QCD machines are constructed using many processors. The computational requirements of lattice QCD are so great that they can never be met by a single processor. To investigate the viability of multiple-FPGA based systems, a dual-FPGA version of the Dirac operator was implemented. Lattice QCD is a highly parallelisable problem and can be implemented efficiently on multiple processor machines. The dual-FPGA Dirac operator, which is based on the logarithmic Dirac operator, uses a low latency communications system to allow two FPGAs to work together on a single application of the Dirac operator. The dual-FPGA design delivers a speedup of 1.98 times over the single-FPGA design by parallelising the computation across the two FPGAs. This result strongly indicates that FPGAs have the potential to form a scalable multiple processor platform for high performance computing applications such as lattice QCD. These three sets of results demonstrate that FPGAs can return excellent performance for a typical high performance computing application, lattice QCD, using two different arithmetic systems.
Double precision floating point is the most commonly used arithmetic system for high performance computing applications. This makes the results from the double precision designs particularly significant, since they demonstrate that FPGAs can return highly competitive performance for real scientific computing applications using double precision arithmetic. Finally, the dual-FPGA Dirac operator demonstrates that FPGAs have the potential to form a scalable multiple processor platform for high performance computing.
dc.format: 1 volume
dc.language.iso: en
dc.publisher: Trinity College (Dublin, Ireland). School of Computer Science & Statistics
dc.relation.isversionof: http://stella.catalogue.tcd.ie/iii/encore/record/C__Rb12916943
dc.subject: Computer Science, Ph.D.
dc.subject: Ph.D. Trinity College Dublin
dc.title: High performance scientific computing using FPGAs for lattice QCD
dc.type: thesis
dc.type.supercollection: refereed_publications
dc.type.supercollection: thesis_dissertations
dc.type.qualificationlevel: Doctoral
dc.type.qualificationname: Doctor of Philosophy (Ph.D.)
dc.rights.ecaccessrights: openAccess
dc.format.extentpagination: pp 161
dc.description.note: TARA (Trinity's Access to Research Archive) has a robust takedown policy. Please contact us if you have any concerns: rssadmin@tcd.ie

