cuda_loop


cuda_loop, a Fortran90 code which shows how, in a CUDA program running on a Graphics Processing Unit (GPU), the choice of block and thread factors determines the allocation of tasks to processors.

A CUDA kernel "kernel()" is invoked by a command of the form

    
      kernel <<< blocks, threads >>> ( args )
      
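where blocks and threads are each vectors of up to 3 values, listing the number of blocks in the grid and the number of threads in each block along the x, y, and z dimensions. Any dimension that is not specified defaults to 1.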
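
To make the launch parameters concrete, the following Python sketch pads a blocks or threads vector to three values and counts the processes a launch would create. It is a host-side illustration only, not CUDA code, and the helper names `pad3` and `process_count` are hypothetical; they are not part of cuda_loop.

```python
from math import prod

def pad3(dims):
    """Pad a vector of 1 to 3 positive integers with trailing 1s."""
    dims = tuple(dims)
    if not 1 <= len(dims) <= 3 or any(d < 1 for d in dims):
        raise ValueError("expected 1 to 3 positive integers")
    return dims + (1,) * (3 - len(dims))

def process_count(blocks, threads):
    """Total number of processes: (number of blocks) * (threads per block)."""
    return prod(pad3(blocks)) * prod(pad3(threads))

if __name__ == "__main__":
    # A 2 x 3 grid of blocks, each holding 4 x 2 x 2 threads.
    print(pad3((2, 3)))                       # (2, 3, 1)
    print(process_count((2, 3), (4, 2, 2)))   # 6 blocks * 16 threads = 96
```

The product of all six values is the same quantity that later appears as the increment in the task loop.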

If a problem involves N tasks, then the tasks are allotted to specific CUDA processes in an organized fashion. Depending on N and the number of processes, a given process may get no tasks, one task, or multiple tasks.
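
As a rough illustration of that claim, assume there are P processes with linear indices 0 to P-1, and that process K takes tasks K, K+P, K+2P, and so on, which is the scheme described in the paragraphs that follow. The number of tasks per process can then be counted directly. The function name `tasks_per_process` is hypothetical.

```python
def tasks_per_process(n, p):
    """Number of tasks each of p processes receives when process k
    handles the tasks k, k + p, k + 2p, ... that are less than n."""
    return [len(range(k, n, p)) for k in range(p)]

if __name__ == "__main__":
    print(tasks_per_process(10, 4))  # [3, 3, 2, 2]
    print(tasks_per_process(3, 8))   # [1, 1, 1, 0, 0, 0, 0, 0]
```

With more tasks than processes, every process gets at least one task; with fewer, the higher-numbered processes sit idle.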

Each process is given the built-in variables threadIdx, blockIdx, blockDim, and gridDim, each with x, y, and z components, which it can use to determine the tasks to be performed.

Essentially, a process can determine its linear index K by:

      K = threadIdx.x
        +  blockDim.x  * threadIdx.y
        +  blockDim.x  *  blockDim.y  * threadIdx.z
        +  blockDim.x  *  blockDim.y  *  blockDim.z  * blockIdx.x
        +  blockDim.x  *  blockDim.y  *  blockDim.z  *  gridDim.x  * blockIdx.y
        +  blockDim.x  *  blockDim.y  *  blockDim.z  *  gridDim.x  *  gridDim.y  * blockIdx.z
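
The formula for K can be checked on the host with a short Python sketch. It evaluates K for one sample process and then confirms that, over a whole launch, the K values are exactly 0 through P-1 with no repeats. The function names `linear_index` and `all_indices` and the sample dimensions are assumptions made for illustration.

```python
from itertools import product

def linear_index(thread_idx, block_idx, block_dim, grid_dim):
    """Linear index K of a process; each argument is an (x, y, z) triple."""
    tx, ty, tz = thread_idx
    bx, by, bz = block_idx
    bdx, bdy, bdz = block_dim
    gdx, gdy, _ = grid_dim  # gridDim.z does not appear in the formula
    threads_per_block = bdx * bdy * bdz
    return (tx
            + bdx * ty
            + bdx * bdy * tz
            + threads_per_block * bx
            + threads_per_block * gdx * by
            + threads_per_block * gdx * gdy * bz)

def all_indices(block_dim, grid_dim):
    """Sorted K values over every (block, thread) pair of the launch."""
    ks = []
    for bz, by, bx in product(*(range(d) for d in reversed(grid_dim))):
        for tz, ty, tx in product(*(range(d) for d in reversed(block_dim))):
            ks.append(linear_index((tx, ty, tz), (bx, by, bz),
                                   block_dim, grid_dim))
    return sorted(ks)

if __name__ == "__main__":
    block_dim, grid_dim = (4, 2, 1), (2, 3, 1)
    # Thread (1, 1, 0) of block (1, 2, 0): 1 + 4*1 + 0 + 8*1 + 8*2*2 + 0 = 45
    print(linear_index((1, 1, 0), (1, 2, 0), block_dim, grid_dim))
    # Every process gets a distinct index from 0 to 47.
    print(all_indices(block_dim, grid_dim) == list(range(48)))
```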
      
It should use this index as follows:

      Set task T = K.

      while ( T < N )
        carry out task T;
        T = T + blockDim.x * blockDim.y * blockDim.z * gridDim.x * gridDim.y * gridDim.z.
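
The loop above can be mimicked on the host to list the tasks that a single process would carry out. This is a minimal Python sketch, not the cuda_loop source; the function name `tasks_for_process` is hypothetical, and appending T to a list stands in for actually carrying out task T.

```python
def tasks_for_process(k, n, block_dim, grid_dim):
    """Tasks carried out by the process with linear index k, out of n tasks."""
    stride = 1
    for d in (*block_dim, *grid_dim):
        stride *= d  # total number of processes in the launch
    tasks = []
    t = k
    while t < n:
        tasks.append(t)  # stands in for "carry out task T"
        t += stride
    return tasks

if __name__ == "__main__":
    block_dim, grid_dim = (2, 1, 1), (2, 1, 1)  # 4 processes in total
    for k in range(4):
        print(k, tasks_for_process(k, 10, block_dim, grid_dim))
    # 0 [0, 4, 8]
    # 1 [1, 5, 9]
    # 2 [2, 6]
    # 3 [3, 7]
```

Because the increment equals the total number of processes, every task from 0 to N-1 is carried out by exactly one process.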
      

The cuda_loop code shows how a specific set of block and thread parameters would determine the assignment of individual tasks to CUDA processes.

Licensing:

The information on this web page is distributed under the MIT license.

Languages:

cuda_loop is available in a C version, a C++ version, a Fortran90 version, a MATLAB version, an Octave version, and a Python version.

Related Programs and Data:

cuda_loop_test

Reference:

  1. John Cheng, Max Grossman, Ty McKercher,
    Professional CUDA C Programming,
    John Wiley, 2014,
    ISBN: 978-1-118-73932-7.
  2. Jason Sanders, Edward Kandrot,
    CUDA by Example,
    Addison Wesley, 2010,
    ISBN: 978-0-13-138768-3,
    LC: QA76.76.A65S255 2010.


Last revised on 14 June 2020.