Hermes
Holds CUDA launch parameters.
#include <cuda_utils.h>
Public Member Functions
| Return type | Member | Description |
| | LaunchInfo (u32 n, size_t shared_memory_size_in_bytes=0, cudaStream_t stream={}) | 1-dimensional launch constructor |
| | LaunchInfo (size2 b, size2 s={0, 0}, size_t shared_memory_size_in_bytes=0, cudaStream_t stream={}) | 2-dimensional launch constructor |
| | LaunchInfo (size3 b, size3 s={0, 0, 0}, size_t shared_memory_size_in_bytes=0, cudaStream_t stream={}) | 3-dimensional launch constructor |
| u32 | threadCount () const | Computes the total number of threads. |
| u32 | blockThreadCount () const | Computes the total number of threads per block. |
Static Public Member Functions
| Return type | Member | Description |
| static void | distribute (u32 max_b, u32 n, u32 &b, u32 &g) | Recomputes block and grid sizes to achieve good occupancy. |
| static void | redistribute (dim3 b, dim3 g, dim3 &new_b, dim3 &new_g) | Redistributes threads to fit the GPU block size limits. |
Public Attributes
| Type | Member | Description |
| dim3 | grid_size | CUDA grid size (in blocks) |
| dim3 | block_size | CUDA block size (in threads) |
| size_t | shared_memory_size {0} | shared memory size in bytes (per block) |
| cudaStream_t | stream_id {} | launch stream identifier |
Holds CUDA launch parameters.
The number of threads in a CUDA launch is subject to hardware limits on both the block size and the grid size.
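As a hedged illustration, the limits commonly documented for devices of compute capability 3.0 and later (at most 1024 threads per block; block dimensions of at most 1024 × 1024 × 64; grid dimensions of at most (2^31 − 1) × 65535 × 65535) can be checked on the host before a launch. The constants, the Dim3 stand-in for dim3, the u32 alias and the function fitsLaunchLimits are assumptions for this sketch, not part of Hermes; production code should read the actual values from cudaGetDeviceProperties.

```cpp
#include <cassert>
#include <cstdint>

using u32 = std::uint32_t;  // assumed to match the library's u32

// Stand-in for CUDA's dim3 (assumption: lets the sketch build without the CUDA toolkit).
struct Dim3 {
  u32 x = 1, y = 1, z = 1;
};

// Assumed limits for compute capability 3.0 and later; real code should read
// cudaDeviceProp::maxThreadsPerBlock, maxThreadsDim and maxGridSize instead.
constexpr u32 kMaxThreadsPerBlock = 1024;
constexpr u32 kMaxBlockDim[3] = {1024, 1024, 64};
constexpr u32 kMaxGridDim[3] = {2147483647u, 65535, 65535};

// Returns true if a launch with these block and grid sizes respects every limit above.
bool fitsLaunchLimits(Dim3 block, Dim3 grid) {
  const u32 b[3] = {block.x, block.y, block.z};
  const u32 g[3] = {grid.x, grid.y, grid.z};
  for (int i = 0; i < 3; ++i) {
    // Each dimension must be at least 1 and at most its per-axis limit.
    if (b[i] == 0 || b[i] > kMaxBlockDim[i]) return false;
    if (g[i] == 0 || g[i] > kMaxGridDim[i]) return false;
  }
  // The per-axis checks bound the product by 1024 * 1024 * 64, so u32 cannot overflow here.
  return block.x * block.y * block.z <= kMaxThreadsPerBlock;
}
```

Note that the per-axis limits and the per-block total are separate constraints: a 1024 × 2 × 1 block passes every axis check but still exceeds 1024 threads per block.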
LaunchInfo (u32 n, size_t shared_memory_size_in_bytes=0, cudaStream_t stream={}) [inline]
1-dimensional launch constructor
| n | thread count |
| shared_memory_size_in_bytes | shared memory size in bytes (per block) |
| stream | stream id |
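A 1-dimensional launch must cover n threads with whole blocks, so the grid size is usually obtained by rounding n divided by the block size up. The sketch below assumes a caller-chosen block size; oneDimGrid and surplusThreads are hypothetical helpers, not the Hermes constructor, whose internal choice of block size is not documented here.

```cpp
#include <cassert>
#include <cstdint>

using u32 = std::uint32_t;  // assumed to match the library's u32
using u64 = std::uint64_t;

// Hypothetical helper: number of blocks of `block` threads needed to cover n threads.
u32 oneDimGrid(u32 n, u32 block) {
  assert(block > 0);
  // Ceiling division that avoids the overflow of n + block - 1.
  return n / block + (n % block != 0 ? 1 : 0);
}

// Hypothetical helper: threads launched beyond n, which the kernel must skip.
u32 surplusThreads(u32 n, u32 block) {
  // Widen before multiplying; the surplus itself is always smaller than block.
  const u64 launched = u64{oneDimGrid(n, block)} * block;
  return static_cast<u32>(launched - n);
}
```

For example, 1000 threads in blocks of 256 need 4 blocks, which launch 24 surplus threads. Inside the kernel, computing i = blockIdx.x * blockDim.x + threadIdx.x and returning when i >= n keeps those threads idle.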
LaunchInfo (size2 b, size2 s={0, 0}, size_t shared_memory_size_in_bytes=0, cudaStream_t stream={}) [inline]
2-dimensional launch constructor
| b | block size (threads per block) |
| s | grid size (blocks) |
| shared_memory_size_in_bytes | shared memory size in bytes (per block) |
| stream | stream id |
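With the 2-dimensional constructor the caller supplies the block size b and the grid size s. One common way to choose s for a width × height domain is to round up along each axis independently, as in this sketch; gridFor2D, ceilDiv, the Dim3 stand-in and the 16 × 16 block used in the example are assumptions, not Hermes API.

```cpp
#include <cassert>
#include <cstdint>

using u32 = std::uint32_t;  // assumed to match the library's u32

// Stand-in for CUDA's dim3 (assumption: avoids depending on the CUDA toolkit).
struct Dim3 {
  u32 x = 1, y = 1, z = 1;
};

// Ceiling division; assumes d > 0.
u32 ceilDiv(u32 n, u32 d) {
  assert(d > 0);
  return n / d + (n % d != 0 ? 1 : 0);
}

// Hypothetical helper: grid that covers a width x height domain with blocks of size `block`.
Dim3 gridFor2D(u32 width, u32 height, Dim3 block) {
  return Dim3{ceilDiv(width, block.x), ceilDiv(height, block.y), 1};
}
```

A 1920 × 1080 image with 16 × 16 blocks gives a 120 × 68 grid; the last row of blocks extends past row 1079, so the kernel should check both coordinates against the domain size.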
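LaunchInfo (size3 b, size3 s={0, 0, 0}, size_t shared_memory_size_in_bytes=0, cudaStream_t stream={}) [inline]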
3-dimensional launch constructor
| b | block size (threads per block) |
| s | grid size (blocks) |
| shared_memory_size_in_bytes | shared memory size in bytes (per block) |
| stream | stream id |
u32 blockThreadCount () const [inline]
Computes the total number of threads per block.
static void distribute (u32 max_b, u32 n, u32 &b, u32 &g) [inline, static]
Recomputes block and grid sizes to achieve good occupancy.
| max_b | maximum number of threads per block |
| n | total number of threads |
| b | output block size |
| g | output grid size |
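The documentation does not spell out how distribute chooses its sizes. A minimal strategy consistent with the parameter descriptions is to use at most max_b threads per block and then round the grid up so that b · g ≥ n; the sketch below implements that assumption under the hypothetical name distributeSketch and should not be read as the Hermes implementation. For occupancy-driven block sizes, the CUDA runtime itself offers cudaOccupancyMaxPotentialBlockSize.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using u32 = std::uint32_t;  // assumed to match the library's u32

// Hypothetical sketch: split n threads into blocks of at most max_b threads.
void distributeSketch(u32 max_b, u32 n, u32 &b, u32 &g) {
  assert(max_b > 0);
  if (n == 0) {  // Nothing to launch; the caller should skip the launch.
    b = 0;
    g = 0;
    return;
  }
  // Never use more threads per block than requested in total.
  b = std::min(max_b, n);
  // Round up so that b * g >= n.
  g = n / b + (n % b != 0 ? 1 : 0);
}
```

With max_b = 1024, 5000 threads become 5 blocks of 1024, while 100 threads fit in a single block of 100.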
static void redistribute (dim3 b, dim3 g, dim3 &new_b, dim3 &new_g) [inline, static]
Redistributes threads to fit the GPU block size limits.
| b | input block size |
| g | input grid size |
| new_b | output block size |
| new_g | output grid size |
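redistribute is described only as fitting the GPU block size limits. One way to do that while still covering at least as many threads along each axis is to split any oversized block dimension into near-equal pieces and move the extra pieces into the grid, then halve the largest block dimension (doubling the matching grid dimension) until the block holds at most 1024 threads. The sketch assumes the compute capability 3.0+ limits (1024 threads per block; 1024 × 1024 × 64 per block dimension), ignores grid-size limits, and uses the hypothetical name redistributeSketch; it is not the Hermes implementation.

```cpp
#include <cassert>
#include <cstdint>

using u32 = std::uint32_t;  // assumed to match the library's u32

// Stand-in for CUDA's dim3 (assumption: avoids depending on the CUDA toolkit).
struct Dim3 {
  u32 x = 1, y = 1, z = 1;
};

// Assumed limits for compute capability 3.0 and later.
constexpr u32 kMaxThreadsPerBlock = 1024;
constexpr u32 kMaxBlockDim[3] = {1024, 1024, 64};

// Ceiling division; assumes d > 0.
u32 ceilDiv(u32 n, u32 d) { return n / d + (n % d != 0 ? 1 : 0); }

// Hypothetical sketch: reshape (b, g) so the block fits the limits above while each axis
// still covers at least b[i] * g[i] threads. Grid-size limits are not checked.
void redistributeSketch(Dim3 b, Dim3 g, Dim3 &new_b, Dim3 &new_g) {
  const u32 ob[3] = {b.x, b.y, b.z};
  const u32 og[3] = {g.x, g.y, g.z};
  u32 *nb[3] = {&new_b.x, &new_b.y, &new_b.z};
  u32 *ng[3] = {&new_g.x, &new_g.y, &new_g.z};
  for (int i = 0; i < 3; ++i) {
    assert(ob[i] > 0);
    // Split an oversized block dimension into near-equal pieces; the grid grows by that factor.
    const u32 pieces = ceilDiv(ob[i], kMaxBlockDim[i]);
    *nb[i] = ceilDiv(ob[i], pieces);
    *ng[i] = og[i] * pieces;
  }
  // Halve the largest block dimension (doubling its grid dimension) until the block fits.
  while (new_b.x * new_b.y * new_b.z > kMaxThreadsPerBlock) {
    int i = 0;
    for (int j = 1; j < 3; ++j)
      if (*nb[j] > *nb[i]) i = j;
    *nb[i] = ceilDiv(*nb[i], 2);
    *ng[i] *= 2;
  }
}
```

Because the per-axis coverage can grow (for example, a block of 100 along z becomes two blocks of 50), kernels launched with the reshaped sizes should derive global indices from blockIdx, blockDim and threadIdx and bounds-check them against the original extents.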
u32 threadCount () const [inline]
Computes the total number of threads.
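The total thread count of a launch is the product of the three block dimensions and the three grid dimensions. Since the member summary gives threadCount a u32 return type, that product can exceed 32 bits for large launches: a grid of 2^31 − 1 blocks of 1024 threads already does. (For a valid launch, blockThreadCount is at most 1024, so u32 is ample there.) The sketch below multiplies the six dimensions in 64-bit arithmetic and reports failure instead of wrapping; totalThreads and the Dim3 stand-in are hypothetical, not Hermes API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using u32 = std::uint32_t;  // assumed to match the library's u32
using u64 = std::uint64_t;

// Stand-in for CUDA's dim3 (assumption: avoids depending on the CUDA toolkit).
struct Dim3 {
  u32 x = 1, y = 1, z = 1;
};

// Hypothetical helper: total threads of a launch (block product times grid product).
// Returns false instead of wrapping around when the total does not fit in u32.
bool totalThreads(Dim3 block, Dim3 grid, u32 &total) {
  const u32 dims[6] = {block.x, block.y, block.z, grid.x, grid.y, grid.z};
  for (u32 d : dims) {
    if (d == 0) {  // Any empty dimension means no threads at all.
      total = 0;
      return true;
    }
  }
  u64 product = 1;
  for (u32 d : dims) {
    // product <= UINT32_MAX here, so multiplying by a u32 stays within 64 bits.
    product *= d;
    if (product > std::numeric_limits<u32>::max()) return false;
  }
  total = static_cast<u32>(product);
  return true;
}
```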