Run inference and read output
Inference is a two-step process:
- Call Inference(instancePtr, prompt).
- Call OutputNLM(instancePtr) to retrieve a pointer to an output struct in WASM memory.
The output struct layout is:
- token_speed (float) at offset 0
- size (int32) at offset 4
- buffer_ptr (uint32) at offset 8
// Run inference on the prompt, then fetch a pointer to the output struct.
module.ccall('Inference', null, ['number', 'string'], [instance, prompt]);
const outputPtr = module.ccall('OutputNLM', 'number', ['number'], [instance]);

// Read the struct fields at their byte offsets; the heap views are indexed in 4-byte units.
const tokenSpeed = module.HEAPF32[outputPtr / 4];        // offset 0: token_speed (float)
const size = module.HEAP32[(outputPtr + 4) / 4];         // offset 4: size (int32)
const bufferPtr = module.HEAPU32[(outputPtr + 8) / 4];   // offset 8: buffer_ptr (uint32)

// Decode the UTF-8 text from the output buffer.
const text = module.UTF8ToString(bufferPtr, size);
When decoding strings, scan for a null terminator and clamp size to the real string length to avoid reading beyond the buffer.
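A minimal sketch of that check, assuming an Emscripten-style module that exposes HEAPU8 and UTF8ToString; the readOutputText helper name is illustrative, not part of the library:

// Illustrative helper: decode at most `size` bytes, stopping at the first null terminator.
function readOutputText(module, bufferPtr, size) {
  // Scan for a null byte, but never scan past the reported size.
  let length = 0;
  while (length < size && module.HEAPU8[bufferPtr + length] !== 0) {
    length++;
  }
  // Decode only the clamped length so we never read beyond the buffer.
  return module.UTF8ToString(bufferPtr, length);
}

// Use in place of the direct UTF8ToString call above:
const text = readOutputText(module, bufferPtr, size);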
Next: Cleanup