Hermes
Holds CUDA launch parameters.
#include <cuda_utils.h>
Public Member Functions
  LaunchInfo (u32 n, size_t shared_memory_size_in_bytes=0, cudaStream_t stream={})
      1-dimensional launch constructor.
  LaunchInfo (size2 b, size2 s={0, 0}, size_t shared_memory_size_in_bytes=0, cudaStream_t stream={})
      2-dimensional launch constructor.
  LaunchInfo (size3 b, size3 s={0, 0, 0}, size_t shared_memory_size_in_bytes=0, cudaStream_t stream={})
      3-dimensional launch constructor.
  u32 threadCount () const
      Computes the total number of threads.
  u32 blockThreadCount () const
      Computes the total number of threads per block.
Static Public Member Functions
  static void distribute (u32 max_b, u32 n, u32 &b, u32 &g)
      Recomputes block and grid sizes to achieve good occupancy.
  static void redistribute (dim3 b, dim3 g, dim3 &new_b, dim3 &new_g)
      Redistributes threads to fit the GPU block size limits.
Public Attributes
  dim3 grid_size
      CUDA grid size (in number of blocks).
  dim3 block_size
      CUDA block size (in number of threads).
  size_t shared_memory_size {0}
      Size of shared memory per block, in bytes.
  cudaStream_t stream_id {}
      Launch stream identifier.
Holds CUDA launch parameters.
The number of threads in a CUDA launch is subject to the following limits:
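LaunchInfo (u32 n, size_t shared_memory_size_in_bytes=0, cudaStream_t stream={})   [inline]
1-dimensional launch constructor.
Parameters
  n                            thread count
  shared_memory_size_in_bytes  shared memory size in bytes (per block)
  stream                       stream id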
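The 1-D constructor takes only a total thread count n. The block size it picks is not described here. Whatever block size is used, a grid that is rounded up launches more than n threads whenever the block size does not divide n. For that reason, kernels usually guard their global index. The following minimal sketch simulates such a launch on the host in plain C++, so it runs without a GPU. `simulate_1d_launch` is a hypothetical helper. The block size is passed in explicitly rather than chosen the way LaunchInfo does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using u32 = std::uint32_t;  // assumption: mirrors the u32 alias used by Hermes

// Hypothetical helper: simulates a 1-D launch of n useful threads on the host.
// Each (block, thread) pair computes its global index exactly as a kernel
// would (blockIdx.x * blockDim.x + threadIdx.x); indices past n do no work.
// Returns how many times each element 0..n-1 was touched.
std::vector<int> simulate_1d_launch(u32 n, u32 block_size) {
  assert(block_size > 0);
  // Round the grid up so that grid_size * block_size >= n.
  const u32 grid_size = (n + block_size - 1) / block_size;
  std::vector<int> hits(n, 0);
  for (u32 block_idx = 0; block_idx < grid_size; ++block_idx) {
    for (u32 thread_idx = 0; thread_idx < block_size; ++thread_idx) {
      const u32 i = block_idx * block_size + thread_idx;
      if (i < n) {  // bounds guard: surplus threads in the last block do nothing
        ++hits[i];
      }
    }
  }
  return hits;
}
```

With n = 1000 and 256 threads per block, the grid has 4 blocks, so 1024 threads run and the last 24 fail the guard. Every element is still touched exactly once.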
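LaunchInfo (size2 b, size2 s={0, 0}, size_t shared_memory_size_in_bytes=0, cudaStream_t stream={})   [inline]
2-dimensional launch constructor.
Parameters
  b                            block size (threads per block)
  s                            grid size (blocks)
  shared_memory_size_in_bytes  shared memory size in bytes (per block)
  stream                       stream id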
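For a 2-D launch, b gives the block shape and s gives the grid shape in blocks. To cover a width x height domain such as an image, a common way to choose s is ceiling division along each axis. This is a minimal sketch. `Size2`, `blocks_needed` and `grid_covering` are hypothetical stand-ins, not Hermes' size2 type or API. Nothing is assumed about what the default s = {0, 0} does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 2-D size type (Hermes' size2 is not shown here).
struct Size2 {
  std::size_t width = 0;
  std::size_t height = 0;
};

// Fewest blocks along one axis such that blocks * block_extent >= extent.
constexpr std::size_t blocks_needed(std::size_t extent, std::size_t block_extent) {
  return (extent + block_extent - 1) / block_extent;
}

// Grid shape (in blocks) covering a width x height domain with the given block shape.
Size2 grid_covering(Size2 domain, Size2 block) {
  assert(block.width > 0 && block.height > 0);
  return Size2{blocks_needed(domain.width, block.width),
               blocks_needed(domain.height, block.height)};
}
```

For a 1920 x 1080 image with 16 x 16 blocks, this gives a 120 x 68 grid. The last row of blocks covers rows 1072 to 1087, so the kernel must skip rows past 1079.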
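LaunchInfo (size3 b, size3 s={0, 0, 0}, size_t shared_memory_size_in_bytes=0, cudaStream_t stream={})   [inline]
3-dimensional launch constructor.
Parameters
  b                            block size (threads per block)
  s                            grid size (blocks)
  shared_memory_size_in_bytes  shared memory size in bytes (per block)
  stream                       stream id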
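In three dimensions, each thread derives one coordinate per axis as blockIdx * blockDim + threadIdx. It then usually linearizes that coordinate into a flat buffer index. The following sketch simulates a 3-D launch on the host and checks that every voxel is visited exactly once. It uses a hypothetical `Dim3` in place of CUDA's dim3. It also assumes a layout where x varies fastest, which Hermes does not prescribe here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using u32 = std::uint32_t;  // assumption: mirrors Hermes' u32 alias

// Hypothetical stand-in for CUDA's dim3 (unset components default to 1).
struct Dim3 {
  u32 x = 1, y = 1, z = 1;
};

// Host-side simulation of a 3-D launch over a volume.x * volume.y * volume.z
// grid of voxels. Each simulated thread computes its coordinate per axis as
// blockIdx * blockDim + threadIdx, skips coordinates outside the volume, and
// marks its voxel in a flat buffer where x varies fastest.
// Returns the visit count of every voxel.
std::vector<int> simulate_3d_launch(Dim3 volume, Dim3 block) {
  assert(block.x > 0 && block.y > 0 && block.z > 0);
  const Dim3 grid{(volume.x + block.x - 1) / block.x,  // ceiling division per axis
                  (volume.y + block.y - 1) / block.y,
                  (volume.z + block.z - 1) / block.z};
  std::vector<int> hits(std::size_t{volume.x} * volume.y * volume.z, 0);
  for (u32 bz = 0; bz < grid.z; ++bz)
    for (u32 by = 0; by < grid.y; ++by)
      for (u32 bx = 0; bx < grid.x; ++bx)
        for (u32 tz = 0; tz < block.z; ++tz)
          for (u32 ty = 0; ty < block.y; ++ty)
            for (u32 tx = 0; tx < block.x; ++tx) {
              const u32 x = bx * block.x + tx;
              const u32 y = by * block.y + ty;
              const u32 z = bz * block.z + tz;
              if (x >= volume.x || y >= volume.y || z >= volume.z) continue;
              ++hits[(std::size_t{z} * volume.y + y) * volume.x + x];
            }
  return hits;
}
```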
u32 blockThreadCount () const   [inline]
Computes the total number of threads per block.
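The number of threads per block is the product of the three block dimensions. One common use for it is sizing per-block shared memory, for example one element per thread in a block-level reduction. This is a sketch with hypothetical names (`Dim3`, `block_thread_count`, `shared_bytes_one_per_thread`). It assumes each thread stages exactly one element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using u32 = std::uint32_t;  // assumption: mirrors Hermes' u32 alias

// Hypothetical stand-in for CUDA's dim3 (unset components default to 1).
struct Dim3 {
  u32 x = 1, y = 1, z = 1;
};

// Threads per block: the product of the three block dimensions.
u32 block_thread_count(const Dim3& block) {
  assert(block.x > 0 && block.y > 0 && block.z > 0);
  return block.x * block.y * block.z;
}

// Shared memory (bytes per block) if every thread stages one element of type T,
// as in a typical block-level reduction.
template <typename T>
std::size_t shared_bytes_one_per_thread(const Dim3& block) {
  return static_cast<std::size_t>(block_thread_count(block)) * sizeof(T);
}
```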
static void distribute (u32 max_b, u32 n, u32 &b, u32 &g)   [inline, static]
Recomputes block and grid sizes to achieve good occupancy.
Parameters
  max_b  maximum number of threads per block
  n      total number of threads
  b      output block size
  g      output grid size
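The strategy distribute uses to reach good occupancy is not described here. The following hypothetical `distribute_sketch` keeps the same parameter order and shows the simplest possible version. It caps the block at max_b and rounds the grid up so that b * g >= n. An occupancy-aware version might also account for registers and shared memory per block, for example via the CUDA runtime's cudaOccupancyMaxPotentialBlockSize. This sketch ignores those factors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using u32 = std::uint32_t;  // assumption: mirrors Hermes' u32 alias

// Hypothetical, simplest-possible version of distribute(max_b, n, b, g):
// use as many threads per block as allowed (but no more than n), then
// round the grid up so that b * g >= n. Occupancy factors such as register
// and shared-memory usage are ignored.
void distribute_sketch(u32 max_b, u32 n, u32& b, u32& g) {
  assert(max_b > 0);
  b = std::min(max_b, n);
  if (b == 0) {  // n == 0: nothing to launch; the caller should skip the launch
    g = 0;
    return;
  }
  g = (n + b - 1) / b;  // ceiling division
}
```

With max_b = 256 and n = 1000, this yields b = 256 and g = 4. With max_b = 1024, a single block of 1000 threads suffices.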
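static void redistribute (dim3 b, dim3 g, dim3 &new_b, dim3 &new_g)   [inline, static]
Redistributes threads to fit the GPU block size limits.
Parameters
  b      input block size (threads per block)
  g      input grid size (blocks)
  new_b  output block size
  new_g  output grid size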
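How redistribute moves threads is not described here. As an illustration only, the following sketch first checks the per-block limits documented for NVIDIA GPUs of compute capability 2.0 and later. Those limits are at most 1024 threads per block and block dimensions of at most 1024 x 1024 x 64. Real code should read maxThreadsPerBlock and maxThreadsDim from cudaGetDeviceProperties. The sketch then repeatedly halves the largest even block dimension and doubles the matching grid dimension. This keeps both the total thread count and the per-axis extent unchanged. It gives up when every block dimension is odd, and it does not check grid limits. `Dim3`, `fits_block_limits` and `redistribute_sketch` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using u32 = std::uint32_t;  // assumption: mirrors Hermes' u32 alias

// Hypothetical stand-in for CUDA's dim3 (unset components default to 1).
struct Dim3 {
  u32 x = 1, y = 1, z = 1;
};

// Per-block limits for compute capability 2.0 and later. Real code should
// read maxThreadsPerBlock / maxThreadsDim from cudaGetDeviceProperties.
constexpr u32 kMaxThreadsPerBlock = 1024;
constexpr u32 kMaxBlockDim[3] = {1024, 1024, 64};

bool fits_block_limits(const Dim3& b) {
  const std::uint64_t total = std::uint64_t{b.x} * b.y * b.z;
  return b.x <= kMaxBlockDim[0] && b.y <= kMaxBlockDim[1] &&
         b.z <= kMaxBlockDim[2] && total <= kMaxThreadsPerBlock;
}

// Hypothetical redistribution: while the block is too large, halve the largest
// even block dimension and double the matching grid dimension, so the total
// thread count and the per-axis extent (block * grid) never change. Returns
// false if the block still does not fit once every block dimension is odd.
// Grid-size limits are not checked.
bool redistribute_sketch(Dim3 b, Dim3 g, Dim3& new_b, Dim3& new_g) {
  assert(b.x > 0 && b.y > 0 && b.z > 0);
  while (!fits_block_limits(b)) {
    u32* block_dims[3] = {&b.x, &b.y, &b.z};
    u32* grid_dims[3] = {&g.x, &g.y, &g.z};
    int pick = -1;
    for (int k = 0; k < 3; ++k) {
      if (*block_dims[k] % 2 == 0 && (pick < 0 || *block_dims[k] > *block_dims[pick])) {
        pick = k;
      }
    }
    if (pick < 0) return false;  // no even dimension left to split
    *block_dims[pick] /= 2;
    *grid_dims[pick] *= 2;
  }
  new_b = b;
  new_g = g;
  return true;
}
```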
u32 threadCount () const   [inline]
Computes the total number of threads.
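Suppose threadCount reports the threads the launch actually runs. That number is the count of blocks in the grid times the threads per block. The summary lists a u32 return type, but the following hypothetical `total_thread_count` uses 64-bit arithmetic instead. On compute capability 3.0 and later, a 1-D grid may have up to 2^31 - 1 blocks. With 1024 threads per block, that total exceeds the 32-bit range.

```cpp
#include <cassert>
#include <cstdint>

using u32 = std::uint32_t;  // assumption: mirrors Hermes' u32 alias

// Hypothetical stand-in for CUDA's dim3 (unset components default to 1).
struct Dim3 {
  u32 x = 1, y = 1, z = 1;
};

// Total threads in a launch: blocks in the grid times threads per block.
// Every factor is widened to 64 bits before multiplying, because the product
// can exceed the 32-bit range for large (but legal) grids.
std::uint64_t total_thread_count(const Dim3& grid, const Dim3& block) {
  assert(grid.x > 0 && block.x > 0);
  const std::uint64_t blocks = std::uint64_t{grid.x} * grid.y * grid.z;
  const std::uint64_t per_block = std::uint64_t{block.x} * block.y * block.z;
  return blocks * per_block;
}
```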