Cuda tutorial Introduction 1

12 February 2019

Installation

The easiest way to insatll CUDA is to use the standard installation form a package

sudo apt install nvidia-cuda-toolkit
sudo apt-get install libglu1-mesa libxi-dev libxmu-dev libglu1-mesa-dev

On Ubuntu 18.04 it will install the Cuda version 9.

Vocabulary

A small definition of concept is necessary to understand the CUDA architecture.
Host An another word to say the CPU of the running machine.
Device: The GPU itself
Kernel Function or portion of code that runs on a grid
GridSet of blocks
BlocksSet of threads
ThreadsSmallest unit of execution
Ouch.
Surprisingly the smallest unit of a GPU is called a thread (so a different meaning from Linux vocabulary) but the power of a GPU os it can start all threads in one or two instruction and synchronization is done directly on the card, whithout any intervention from the developer.

Example

To use this new concept, the Nvidia add new directive or keywords in the C++ syntax. To use them you must use the Nvidia compiler nvcc.

CUDA C keyword __global__ indicates that a function runs on the device called from host code (value by default to execute a kernel on a GPU)
CUDA C keyword __device__ indicates that a function runs on the device called from a device code
CUDA C keyword __host__ indicates that a function runs on the host (the main CPU)

After all theses consideration, let’s start with a small example.

#include "performancetiming.hpp"
#include 
#include 

// function to add the elements of two arrays
void add(int n, float *x, float *y)
{
  for (int i = 0; i < n; i++)
      y[i] = x[i] + y[i];
}

int main(void)
{
  int N = 1<<24; // 16M elements

  float *x = new float[N];
  float *y = new float[N];

  // initialize x and y arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Run kernel on 16M elements on the CPU

  add(N, x, y);


  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmax(maxError, fabs(y[i]-3.0f));
  std::cout << "Max error: " << maxError << std::endl;

  // Free memory
  delete [] x;
  delete [] y;

  return 0;
}

This code is adding number one by one all elements of two arrays and store it in the second. The main processing part is done in add function. Each result is independent so we can

Category: CUDA

Cuda tutorial Introduction 1

Installation

Vocabulary

Example