Using CUDA API

List available devices and their properties

Let us start familiarizing ourselves with CUDA by writing a simple “Hello CUDA” program that queries all available devices and prints some information about them. We will start with a basic .cpp source file, change it so that it is compiled by the CUDA compiler, and add a few CUDA API calls to see which devices are available.

To do that, we are going to need a couple of CUDA API functions. First, we want to ask the API how many CUDA-capable devices are available, which is done with the cudaGetDeviceCount(..) function:

__host__ __device__ cudaError_t cudaGetDeviceCount(int* count)

The function queries the API and writes the number of available devices to the address provided as its only argument. There are a couple of things to notice here. First, the function is declared with two CUDA specifiers, __host__ and __device__, which means that it is available in both host and device code. Second, like most CUDA calls, this function returns a value of the cudaError_t enumeration type, which holds an error code if something went wrong; on success, cudaSuccess is returned. Because the return value is reserved for the error status, the actual number of devices is passed back through the argument: one declares an integer and passes a pointer to it, and the function updates the value at this address. This type of signature is quite common among CUDA functions, with most of them returning cudaError_t and taking a pointer for their actual output.
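As a minimal sketch of this calling pattern, the program below declares an integer, passes its address to cudaGetDeviceCount, and checks the returned cudaError_t before using the result. The variable names (numDevices, status) are chosen here for illustration, and cudaGetErrorString is used only to turn the error code into readable text.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // The API writes the device count to this variable through the pointer.
    int numDevices = 0;

    // The return value carries only the error status, not the count itself.
    cudaError_t status = cudaGetDeviceCount(&numDevices);
    if (status != cudaSuccess)
    {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(status));
        return 1;
    }

    printf("Found %d CUDA-capable device(s)\n", numDevices);
    return 0;
}
```

Assuming the file is saved under a hypothetical name such as hello.cu, it can be built with nvcc hello.cu -o hello. The printed count depends on the machine the program runs on.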

With the number of devices known, we can cycle through them and check what kind of devices are available, along with their names and capabilities. In CUDA, these are stored in the cudaDeviceProp structure. This structure contains extensive information on the device, for instance its name (prop.name), major and minor compute capability (prop.major and prop.minor), number of streaming multiprocessors (prop.multiProcessorCount), core clock (prop.clockRate) and available global memory in bytes (prop.totalGlobalMem). See the cudaDeviceProp API reference for the full list of fields in the cudaDeviceProp structure. To populate the cudaDeviceProp structure, CUDA has the cudaGetDeviceProperties(..) function:

__host__ cudaError_t cudaGetDeviceProperties(cudaDeviceProp* prop, int device)

The function has only the __host__ specifier, which means that it cannot be called from device code. It also returns a cudaError_t value, which will be cudaErrorInvalidDevice if we try to get the properties of a non-existent device (e.g. when deviceId is greater than or equal to the number of devices, numDevices, since device indices start at zero). The function takes a pointer to the cudaDeviceProp structure in which the data is saved, and the integer index of the device to query. The following code should get you information on the first device in the system (the one with deviceId = 0).

cudaDeviceProp prop;                // structure that the API will fill in
cudaGetDeviceProperties(&prop, 0);  // query the device with index 0
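
To see how these two calls fit into a complete program, here is a small sketch that queries only the first device, checks the returned status and prints a few of the fields mentioned earlier. The variable names and the output layout are illustrative choices, not part of the API, and the memory size is converted from bytes to GiB purely for readability.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const int deviceId = 0;  // index of the first device in the system

    // Structure that cudaGetDeviceProperties fills in.
    cudaDeviceProp prop;
    cudaError_t status = cudaGetDeviceProperties(&prop, deviceId);
    if (status != cudaSuccess)
    {
        // For example, cudaErrorInvalidDevice if no device has this index.
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(status));
        return 1;
    }

    printf("Device %d: %s\n", deviceId, prop.name);
    printf("  Compute capability: %d.%d\n", prop.major, prop.minor);
    printf("  Multiprocessors:    %d\n", prop.multiProcessorCount);
    // totalGlobalMem is reported in bytes; convert to GiB for display.
    printf("  Global memory:      %.1f GiB\n",
           prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    return 0;
}
```

Checking the status before reading prop matters here, because the contents of the structure should not be relied upon when the call fails.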

Exercise