☕️ Buy me a coffee: https://paypal.me/donationlink240
🙏🏻 Support me on Patreon: / ahmadbazzi
In this tutorial, I’ll show you everything you need to know about CUDA programming so that you can take advantage of GPU parallelization through simple modifications of your existing code that currently runs on a boring CPU. This tutorial was recorded on NVIDIA’s Jetson Orin supercomputer. CUDA stands for Compute Unified Device Architecture; it is a parallel computing platform and application programming interface that lets software use certain types of graphics processing units for general-purpose processing, an approach known as general-purpose computing on GPUs.
First, I will start by writing a simple function that performs a vector multiplication on the CPU. Then we get the same job done using CUDA parallelization on the GPU. Keep in mind that GPUs have far more cores than CPUs, so when it comes to parallel computations over large amounts of data, GPUs significantly outperform CPUs, even though they run at lower clock speeds and lack several of the core management features found in CPUs. One example shows that running 64 million multiplications takes about 0.64 seconds on the GPU, as opposed to 31.4 seconds on the CPU. That is roughly a 50x speedup, thanks to the parallelization across such a huge number of cores. Amazing! It means a complex program that takes about a month on a CPU could finish in roughly 14 hours, and it could be even faster given more cores.
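For reference, here is a minimal sketch of what that comparison can look like in Python, assuming Numba's CUDA support; the kernel name, vector size, and block/thread configuration below are illustrative, not necessarily what is used in the video:

import numpy as np
from numba import cuda

@cuda.jit
def multiply_kernel(a, b, out):
    # Each GPU thread handles one element of the vectors.
    i = cuda.grid(1)
    if i < out.size:
        out[i] = a[i] * b[i]

n = 64_000_000  # 64 million elements, as in the video
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)

# CPU baseline: a plain Python loop, one element at a time.
out_cpu = np.empty_like(a)
for i in range(n):
    out_cpu[i] = a[i] * b[i]

# GPU version: copy to device memory, launch the kernel, copy back.
d_a, d_b = cuda.to_device(a), cuda.to_device(b)
d_out = cuda.device_array_like(a)
threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
multiply_kernel[blocks, threads_per_block](d_a, d_b, d_out)
out_gpu = d_out.copy_to_host()

The key idea is that the CPU loop touches one element at a time, while the kernel launches enough threads for every element to be processed in parallel.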
Then, I’ll show you the gains from filling arrays in Python on a CPU vs. on a GPU. This example shows that filling the array takes about 2.58 seconds on the CPU, as opposed to 0.39 seconds on the GPU, a gain of about 6.6x.
The last fundamental section of this video shows the gains from rendering images (or videos) in Python, and demonstrates why you see some film producers and movie makers rendering and editing their content on a GPU. GPU rendering uses a graphics card rather than a CPU, which can substantially speed up the rendering process because GPUs are built primarily for fast image rendering; in fact, GPUs were developed in response to graphically intensive applications that taxed CPUs and slowed processing down. I will use the Mandelbrot set to compare CPU and GPU power. This example shows that only 1.4 seconds of execution are needed on the GPU as opposed to 110 seconds on the CPU, which is a 78x gain. That simply means that instead of rendering a 4K resolution video over a week on a CPU, you could get the same video in 8K resolution rendered in about 2 hours on a GPU, if you are using 32 threads. Now imagine doubling the threads and blocks involved in the GPU optimization.
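A minimal sketch of the array-filling comparison mentioned above, again assuming Numba CUDA; the array size and fill value are assumptions for illustration:

import numpy as np
from numba import cuda

@cuda.jit
def fill_kernel(arr, value):
    # Each thread writes one slot of the array in parallel.
    i = cuda.grid(1)
    if i < arr.size:
        arr[i] = value

n = 64_000_000
# CPU: fill element by element in a Python loop.
arr_cpu = np.empty(n, dtype=np.float32)
for i in range(n):
    arr_cpu[i] = 1.0

# GPU: every element is written by its own thread.
d_arr = cuda.device_array(n, dtype=np.float32)
threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
fill_kernel[blocks, threads_per_block](d_arr, 1.0)
arr_gpu = d_arr.copy_to_host()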
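And a minimal sketch of a Mandelbrot renderer on the GPU with Numba CUDA; the image size, complex-plane bounds, and iteration count below are assumptions, not the exact settings from the video:

import numpy as np
from numba import cuda

@cuda.jit(device=True)
def escape_time(c_re, c_im, max_iters):
    # Count how many iterations z -> z^2 + c stays bounded (|z| <= 2).
    z_re = 0.0
    z_im = 0.0
    for i in range(max_iters):
        new_re = z_re * z_re - z_im * z_im + c_re
        z_im = 2.0 * z_re * z_im + c_im
        z_re = new_re
        if z_re * z_re + z_im * z_im > 4.0:
            return i
    return max_iters

@cuda.jit
def mandelbrot_kernel(image, x_min, x_max, y_min, y_max, max_iters):
    # One GPU thread per pixel.
    col, row = cuda.grid(2)
    height, width = image.shape
    if row < height and col < width:
        c_re = x_min + col * (x_max - x_min) / width
        c_im = y_min + row * (y_max - y_min) / height
        image[row, col] = escape_time(c_re, c_im, max_iters)

image = cuda.device_array((2048, 2048), dtype=np.int32)
threads = (16, 16)
blocks = ((2048 + 15) // 16, (2048 + 15) // 16)
mandelbrot_kernel[blocks, threads](image, -2.0, 1.0, -1.5, 1.5, 200)
result = image.copy_to_host()

Because every pixel's escape time is independent of its neighbours, the Mandelbrot set maps naturally onto a 2D grid of GPU threads, which is exactly why the GPU gains are so large here.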
⏲Outline⏲
00:00 Introduction
00:33 Multiplication gains on GPUs vs CPUs
08:31 Filling an array on GPUs vs CPUs
11:55 Rendering gains on GPU vs CPU
12:35 What is a Mandelbrot set?
13:39 Mandelbrot set rendering on CPU
17:01 Mandelbrot set rendering on GPU
20:54 Outro
📚Related Lectures
Jetson Orin Supercomputer - • The BEST & SMALLEST AI supercomputer ...
Quick Deploy: Object Detection via NGC on Vertex AI Workbench Google Cloud - • Quick Deploy: Object Detection via NG...
Voice Swap using NVIDIA's NeMo - • Voice Swap using NVIDIA's NeMo on Pyt...
🔴 Subscribe for more videos on CUDA programming
👍 Smash that like button if you find this tutorial useful.
👁🗨 Speak up and comment, I am all ears.
💰 Donate to help the channel
Patreon - / ahmadbazzi
#cuda #cudaprogramming #gpu