GPU parallel computing for machine learning in Python: how to build a parallel computer
English | 2017 | ASIN: B071KN8484 | 22 pages | PDF, EPUB, AZW3 (conv) | 1.22 Mb
This book illustrates how to build a GPU parallel computer. If you don't want to spend time building one, you can buy a desktop or laptop machine with a built-in GPU; all you need to do then is install GPU-enabled software for parallel computing. We are in the midst of a parallel computing era, and the GPU parallel computer is well suited to machine learning and deep (neural network) learning. For example, the GeForce GTX 1080 Ti is a GPU board with 3584 CUDA cores; using it, performance is roughly 20 times that of an Intel i7 quad-core CPU. We have benchmarked the MNIST hand-written digit recognition problem (60,000 training images of hand-written digits from 0 to 9). In this machine learning benchmark, a single GeForce GTX 1080 Ti board takes less than 48 seconds, while the Intel i7 quad-core CPU requires 15 minutes and 42 seconds.
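As a rough, hedged illustration of this kind of benchmark (not the book's code; the network, batch size, and learning rate below are arbitrary choices), here is a minimal PyTorch sketch that times one MNIST training epoch on whichever device is available:

```python
import time
import torch
import torch.nn as nn
from torchvision import datasets, transforms

# Run on the GPU if CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# A deliberately small fully connected network, for timing purposes only.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256),
                      nn.ReLU(), nn.Linear(256, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

start = time.time()
for images, labels in loader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
print(f"One epoch on {device}: {time.time() - start:.1f} s")
```

Running the same script once with CUDA available and once forced to the CPU gives a crude version of the GPU-versus-CPU comparison described above.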
A CUDA core most commonly refers to a single-precision floating-point unit within an SM (streaming multiprocessor); each CUDA core can initiate one single-precision floating-point instruction per clock cycle. CUDA is a parallel computing platform and application programming interface (API) model created by NVIDIA. It allows software developers and engineers to use a CUDA-enabled graphics processing unit (GPU) for general-purpose processing. The GPU parallel computer is based on SIMD (single instruction, multiple data) computing.
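As a small sketch (assuming PyTorch is installed), the SM count of a CUDA device can be inspected with standard torch.cuda calls, and the SIMD idea is visible in any elementwise tensor operation, where one instruction is applied across all elements at once:

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # multi_processor_count is the number of SMs; each SM contains many
    # CUDA cores (a GTX 1080 Ti reports 28 SMs x 128 cores = 3584 cores).
    print(f"{props.name}: {props.multi_processor_count} SMs")

    # SIMD in miniature: a single multiply instruction is applied to
    # all ten million elements in parallel across the CUDA cores.
    x = torch.rand(10_000_000, device="cuda")
    y = 2.0 * x
```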
The first GPU implementation of neural networks, applied to image processing, was published by Kyoung-Su Oh et al. in 2004 (1). A minimal GPU parallel computer is composed of a CPU board and a GPU board.
This book addresses the important question of which CPU/GPU board you should buy, and also illustrates how to integrate them in a single box while accounting for heat. The power consumption of a GPU is so large that the temperature of, and heat dissipated by, the GPU board inside the box must be managed. Our goal is a fast parallel computer with low power dissipation.
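One practical way to keep an eye on this (a hedged sketch; nvidia-smi ships with the NVIDIA driver, and the query fields below are its standard options) is to poll GPU temperature and power draw during training:

```python
import subprocess

# Query the GPU's current temperature (degrees C) and power draw (watts)
# via nvidia-smi; polling this during a long training run helps catch
# overheating inside a densely packed case.
result = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=temperature.gpu,power.draw",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True)
print(result.stdout.strip())  # e.g. "72, 245.31 W"
```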
Software installation is another critical issue for machine learning in Python. Two operating systems, Ubuntu 16.04 and Windows 10, are described as examples, and the book shows how to install CUDA and the cuDNN library on each. Three machine learning frameworks that run on CUDA and cuDNN, namely PyTorch, Keras, and Chainer, are introduced, and compatibility issues between operating system (Ubuntu, Windows 10), libraries (CUDA, cuDNN), and machine learning framework (PyTorch, Keras, Chainer) are discussed.
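A quick post-install sanity check might look like the following sketch (using current APIs, not the book's exact instructions; only the frameworks actually installed will import, and Keras here is assumed to be running on a TensorFlow backend):

```python
# Confirm that each framework can see the CUDA/cuDNN installation.
import torch
print("PyTorch CUDA:", torch.cuda.is_available(),
      "| cuDNN version:", torch.backends.cudnn.version())

import tensorflow as tf  # backend used by Keras in this sketch
print("TensorFlow GPUs:", tf.config.list_physical_devices("GPU"))

import chainer
print("Chainer CUDA:", chainer.backends.cuda.available)
```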