CUDA Support
Description
The types in this library support compilation with NVCC. To preserve the safety guarantees, CUDA code needs a few small modifications. Normally you would write something like this:
```cpp
using test_type = boost::safe_numbers::u128;

__global__ void cuda_test(const test_type *in, test_type *out, int numElements)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < numElements)
    {
        out[i] = boost::safe_numbers::bit_ceil(in[i]);
    }
}

int main()
{
    // Setup: generate inputs, allocate space for the output, etc.

    // Launch the CUDA kernel, and then synchronize the results and get errors (if they exist)
    cudaError_t err = cudaSuccess;
    cuda_test<<<blocksPerGrid, threadsPerBlock>>>(input_vector.get(), output_vector.get(), numElements);
    cudaDeviceSynchronize();
    err = cudaGetLastError();
    if (err != cudaSuccess)
    {
        // Handle error
    }
}
```
For the on-device computation behavior to match the host computation behavior, the library provides its own error context class. This reduces the example above to the following:
```cpp
using test_type = boost::safe_numbers::u128;

__global__ void cuda_test(const test_type *in, test_type *out, int numElements)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < numElements)
    {
        out[i] = boost::safe_numbers::bit_ceil(in[i]);
    }
}

int main()
{
    // Setup: generate inputs, allocate space for the output, etc.

    // First, initialize the device error context.
    // Then, launch the CUDA kernel, synchronize the results, and get errors (if any exist).
    // Upon synchronization, the device_error_context will throw the same exception type
    // as the operation would have thrown on the host.
    boost::safe_numbers::device_error_context ctx;
    cuda_test<<<blocksPerGrid, threadsPerBlock>>>(input_vector.get(), output_vector.get(), numElements);

    try
    {
        // ctx.synchronize() will internally call BOOST_THROW_EXCEPTION if an error occurred on device
        ctx.synchronize();
    }
    catch (const /*exception type*/& e)
    {
        // Perform error handling
    }
}
```
An exception thrown by ctx.synchronize() will have an e.what() that looks something like:
```
Device error on thread 256 at /home/runner/work/safe_numbers/boost-root/libs/safe_numbers/include/boost/safe_numbers/detail/unsigned_integer_basis.hpp:1067: Underflow detected in u16 subtraction
```
The device_error_context will also attempt to printf the error to the terminal.
This works when compiling in verbose mode (-V).
The printf message looks the same as the message carried by the thrown exception.
The device_exception_mode Enum
```cpp
#include <boost/safe_numbers/cuda_error_reporting.hpp>

namespace boost::safe_numbers {

enum class device_exception_mode : unsigned
{
    trapped,
    untrapped,
};

inline constexpr auto trapped = device_exception_mode::trapped;
inline constexpr auto untrapped = device_exception_mode::untrapped;

} // namespace boost::safe_numbers
```
This enum controls what happens when a safe_numbers operation detects an error on the CUDA device.
| Mode | Behavior |
|---|---|
| `trapped` | Calls `__trap()`, which halts the kernel and invalidates the CUDA context. |
| `untrapped` | Records the error in managed memory and returns without calling `__trap()`. |
Convenience constants boost::safe_numbers::trapped and boost::safe_numbers::untrapped are provided so the mode can be passed without qualifying the enum:
```cpp
boost::safe_numbers::device_error_context ctx{boost::safe_numbers::untrapped};
```
The device_error_context Class
```cpp
#include <boost/safe_numbers/cuda_error_reporting.hpp>

namespace boost::safe_numbers {

class device_error_context
{
public:
    device_error_context();
    explicit device_error_context(device_exception_mode e);
    ~device_error_context();

    device_error_context(const device_error_context&) = delete;
    device_error_context& operator=(const device_error_context&) = delete;

    void reset();
    void set_device_exception_method(device_exception_mode e);
    void synchronize();
};

} // namespace boost::safe_numbers
```
The device_error_context class manages a CUDA managed memory buffer used to capture errors from device code.
When a safe_numbers operation detects an error on the GPU (overflow, underflow, domain error), the error details — file, line, thread ID, expression, and exception type — are written into this shared buffer.
The host then reads the buffer during synchronize() and throws the corresponding std::exception.
Only one device_error_context may exist at a time.
Constructing a second instance while one is already alive throws std::logic_error.
This constraint prevents races on the shared error buffer.
Constructors
```cpp
device_error_context();
```
Constructs a context with the default device_exception_mode::trapped mode.
Clears any stale error state.
```cpp
explicit device_error_context(device_exception_mode e);
```
Constructs a context with the specified exception mode. Clears any stale error state.
reset
```cpp
void reset();
```
Clears the error fields (flag, file, line, thread ID, expression) so the context can be reused across kernel launches.
This is called automatically by the constructors and by synchronize() after reading the error state.
set_device_exception_method
```cpp
void set_device_exception_method(device_exception_mode e);
```
Changes the device exception mode after construction.
This writes to managed memory, so it takes effect on the next kernel launch.
synchronize
```cpp
void synchronize();
```
Calls cudaDeviceSynchronize(), then inspects the managed error buffer.
If an error was captured by device code, the error state is cleared and the appropriate exception is thrown on the host:
| Device Error | Host Exception |
|---|---|
| Overflow | `std::overflow_error` |
| Underflow | `std::underflow_error` |
| Domain error (e.g. division by zero) | `std::domain_error` |
| Unknown | `std::runtime_error` |
The error state is cleared before throwing, so after catching the exception the same context is immediately reusable — no manual reset() call is needed.
If no device error was captured but cudaDeviceSynchronize() returned a non-success status (e.g. from a __trap() in trapped mode), a std::runtime_error is thrown with the CUDA error string.
Choosing a Mode
Use trapped (the default) when errors must halt execution immediately and silently continuing with wrong results is unacceptable.
This is the safest option, but the CUDA context cannot be recovered — the process must exit.
Use untrapped when you want to detect errors on the host and handle them gracefully (e.g. retry with different inputs, log and continue, or run a fallback path).
Be aware that other threads in the kernel may continue executing with incorrect values between the point of error and kernel completion.
```cpp
// Trapped mode (default): any device error is immediately fatal
{
    boost::safe_numbers::device_error_context ctx;
    my_kernel<<<blocks, threads>>>(input, output, n);

    try
    {
        ctx.synchronize();
    }
    catch (const std::runtime_error& e)
    {
        // CUDA context is corrupted — log and terminate
        std::cerr << e.what() << std::endl;
        return EXIT_FAILURE;
    }
}
```
```cpp
// Untrapped mode: errors are deferred to the host
{
    boost::safe_numbers::device_error_context ctx{boost::safe_numbers::untrapped};
    my_kernel<<<blocks, threads>>>(input, output, n);

    try
    {
        ctx.synchronize();
    }
    catch (const std::overflow_error& e)
    {
        // Context is still valid — can reuse for another launch
        std::cerr << "Overflow detected: " << e.what() << std::endl;
    }
}
```