
Run inference and read output

Inference is a two-step process:

  1. Call Inference(instancePtr, prompt).
  2. Call OutputNLM(instancePtr) to retrieve a pointer to an output struct in WASM memory.

The output struct is laid out as follows (offsets in bytes):

  • token_speed (float) at offset 0
  • size (int32) at offset 4
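  • buffer_ptr (uint32) at offset 8

// Step 1: run inference on the prompt.
module.ccall('Inference', null, ['number', 'string'], [instance, prompt]);
// Step 2: get the address of the output struct in WASM memory.
const outputPtr = module.ccall('OutputNLM', 'number', ['number'], [instance]);

// Heap views are indexed by element, not by byte, so each byte offset is
// divided by 4. This assumes outputPtr is 4-byte aligned.
const tokenSpeed = module.HEAPF32[outputPtr / 4];
const size = module.HEAP32[(outputPtr + 4) / 4];
const bufferPtr = module.HEAPU32[(outputPtr + 8) / 4];

// Decode at most `size` bytes of UTF-8 text starting at bufferPtr.
const text = module.UTF8ToString(bufferPtr, size);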

When decoding the string, treat size as an upper bound: scan for a null terminator and clamp the length to the actual string length so that you never read beyond the buffer.
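The clamping described above can be sketched as a small helper. This is an illustration under stated assumptions, not part of the library: `readOutput` is a hypothetical name, `heapU8` is assumed to be a `Uint8Array` over the module's memory (for example `module.HEAPU8`), and the struct is assumed to match the layout listed earlier.

```javascript
// Hypothetical helper (not part of the module's API): decodes the output
// struct from a byte view of WASM memory, e.g. module.HEAPU8.
function readOutput(heapU8, outputPtr) {
  const view = new DataView(heapU8.buffer, heapU8.byteOffset, heapU8.byteLength);

  // WebAssembly memory is little-endian, hence the `true` flags.
  const tokenSpeed = view.getFloat32(outputPtr, true);    // offset 0
  const size = view.getInt32(outputPtr + 4, true);        // offset 4
  const bufferPtr = view.getUint32(outputPtr + 8, true);  // offset 8

  // Treat `size` as an upper bound and never read past the end of memory.
  const maxLen = Math.max(0, Math.min(size, heapU8.length - bufferPtr));
  const end = bufferPtr + maxLen;

  // Stop early at the first null terminator, if there is one.
  let stop = bufferPtr;
  while (stop < end && heapU8[stop] !== 0) stop++;

  // slice() copies the bytes, so the result does not depend on the heap
  // view staying valid afterwards.
  const text = new TextDecoder('utf-8').decode(heapU8.slice(bufferPtr, stop));
  return { tokenSpeed, size, text };
}
```

With the calls shown earlier, usage would look like `const { text } = readOutput(module.HEAPU8, outputPtr);`. A `DataView` is used here instead of the typed heap views so that the byte offsets from the layout can be used directly, without dividing by the element size.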
