1.0 Introduction
Safetensors is a simple format for storing tensors safely. Unlike pickle-based checkpoints, loading a safetensors file never executes code, so attackers cannot hide malicious executables in your AI model.
This is a low-level guide to opening, parsing and loading a safetensors file in C.
These files are large so this also serves as a guide to memory-mapping in C.
You can follow along with the code in this Gist.
2.0 Environment Setup
This section covers installing cJSON and downloading a safetensors file from Hugging Face.
Our tutorial only handles 32-bit floating-point (FP32) tensors, so you can skip the Hugging Face download if you already have an FP32 safetensors file.
*Sign up for Part 2 where we handle even more binary formats.
2.1 Downloading cJSON
We use the cJSON library to parse JSON in C. You can clone the repository here on GitHub. Then copy cJSON.c and cJSON.h into a directory called Dependencies.
2.2 Downloading a Safetensors File

We use GPT-2 from the openai-community repository on Hugging Face. Its model.safetensors file is about 548 MB. Download it and save it in a folder called Safetensors, so that its path is Safetensors/model.safetensors.
2.3 Verify Environment
Environment verification is the final step. We create a small program that prints the size of our model file.
Step 1: Create a file called Safetensor.c in your project directory.
Step 2: Write your includes. I'm working on Linux. sys/mman.h (used for memory mapping) is a POSIX header, so the includes may differ on Windows:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <math.h>
#include <sys/mman.h>
#include <stdbool.h>
#include "Dependencies/cJSON.h" //Include cJSON from Dependencies
int main()
{
return 0;
}
Step 3: Write a simple function called GetFileSize that reports the size in bytes of the downloaded safetensors file.
size_t GetFileSize(char *fileName)
{
	FILE *fp = fopen(fileName, "rb");
	assert(fp != NULL);
	//Seek to the end: the position there is the file size in bytes
	fseek(fp, 0L, SEEK_END);
	long currentFileSize = ftell(fp);
	assert(currentFileSize >= 0);
	fclose(fp);
	return (size_t) currentFileSize;
}
int main()
{
	char *fileName = "Safetensors/model.safetensors";
	size_t fileSize = GetFileSize(fileName);
	//%zu is the printf format for size_t
	printf("%zu bytes\n", fileSize);
	return 0;
}
Step 4: Compile and run using this command:
clear && gcc Safetensor.c Dependencies/cJSON.c -lm -o m.o && ./m.o
The program should print the file size in bytes, roughly 548 MB for GPT-2.
3.0 The Safetensors File Format Specification

A safetensors file is divided into three sections:
Section 1: 8 bytes holding the header size N, an unsigned 64-bit little-endian integer.
Section 2: N bytes of JSON header.
Section 3: Tensor data stored as raw bytes.
The format authors add some notes. The most important are:
JSON keys must be unique.
Tensor values are not validated, so NaN and Infinity values can appear.
All values are stored in little-endian byte order.
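To make the layout concrete, here is a minimal sketch that splits an in-memory safetensors buffer into its three sections and rejects a header length that runs past the end of the buffer. The struct SafetensorsSections and the function SplitSafetensors are hypothetical names used only for illustration; they are not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical view of the three sections of a safetensors buffer.
typedef struct
{
	uint64_t headerLength;        // Section 1: value stored in the first 8 bytes
	const unsigned char *header;  // Section 2: JSON header (not null-terminated)
	const unsigned char *data;    // Section 3: raw tensor bytes
	size_t dataLength;            // Number of bytes in section 3
} SafetensorsSections;

// Hypothetical helper: splits buffer into its sections. Returns false if the
// buffer cannot hold the 8-byte prefix plus the declared header.
bool SplitSafetensors(const unsigned char *buffer, size_t bufferSize, SafetensorsSections *out)
{
	if(bufferSize < 8) return false;
	uint64_t headerLength = 0;
	// Little-endian: byte 7 is the most significant, byte 0 the least
	for(int i = 7; i >= 0; i--)
	{
		headerLength = (headerLength << 8) | buffer[i];
	}
	// Reject a header that would extend past the end of the buffer
	if(headerLength > bufferSize - 8) return false;
	out->headerLength = headerLength;
	out->header = buffer + 8;
	out->data = buffer + 8 + headerLength;
	out->dataLength = bufferSize - 8 - (size_t) headerLength;
	return true;
}
```

Each tensor's data_offsets are then measured from the data pointer, not from the start of the file.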
4.0 Coding a Safetensors Parser
This section forms the bulk of our article. It is divided into two parts:
Memory-mapping a file to access its raw bytes (as unsigned char).
Parsing the different sections of the file.
4.1 Memory Mapping
Memory mapping lets us access a large file through a pointer, as if it were an array in memory, without copying the whole file into RAM; the operating system loads pages from disk as we touch them. We write a function called LoadSafeTensorData to do this:
unsigned char *LoadSafeTensorData(char *fileName, size_t *fileSizeHolder)
{
	//Get the file size (mmap cannot map an empty file)
	size_t fileSize = GetFileSize(fileName);
	assert(fileSize > 0);
	//Open the file in binary mode and get its file descriptor
	FILE *fp = fopen(fileName, "rb");
	assert(fp != NULL);
	int fileNumber = fileno(fp);
	//Map the whole file in read-only, private mode
	unsigned char *fileData = mmap(NULL, fileSize, PROT_READ, MAP_PRIVATE, fileNumber, 0);
	//mmap signals failure with MAP_FAILED, not NULL
	assert(fileData != MAP_FAILED);
	//The mapping stays valid after the file is closed
	fclose(fp);
	*fileSizeHolder = fileSize;
	return fileData;
}
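LoadSafeTensorData opens the file twice: once in GetFileSize to measure it and once to map it. A common POSIX alternative, sketched here under the assumption of a Linux or other POSIX system, reads the size with fstat on the same descriptor that mmap uses. MapFileReadOnly is a hypothetical name, and unlike the asserts above it returns NULL on failure so the caller can decide how to react.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical alternative to LoadSafeTensorData: maps fileName read-only
// using open/fstat/mmap. Returns NULL on failure; on success the file size
// is written to *fileSizeHolder.
unsigned char *MapFileReadOnly(const char *fileName, size_t *fileSizeHolder)
{
	int fd = open(fileName, O_RDONLY);
	if(fd == -1) return NULL;
	struct stat info;
	// fstat reports the size without seeking through the file;
	// empty files are rejected because mmap cannot map zero bytes
	if(fstat(fd, &info) == -1 || info.st_size <= 0)
	{
		close(fd);
		return NULL;
	}
	size_t fileSize = (size_t) info.st_size;
	void *map = mmap(NULL, fileSize, PROT_READ, MAP_PRIVATE, fd, 0);
	// The mapping stays valid after the descriptor is closed
	close(fd);
	if(map == MAP_FAILED) return NULL;
	*fileSizeHolder = fileSize;
	return (unsigned char *) map;
}
```

Release the mapping with munmap(data, size) when finished, just as main does with the result of LoadSafeTensorData.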
Modify your main function to map the file and unmap at the very end like this:
int main()
{
	char *fileName = "Safetensors/model.safetensors";
	size_t fileSize = 0;
	unsigned char *safeTensorData = LoadSafeTensorData(fileName, &fileSize);
	assert(safeTensorData != NULL);
	printf("%zu bytes\n", fileSize);
	//Unmap memory; munmap is called outside assert so it still runs when NDEBUG is defined
	int unmapResult = munmap(safeTensorData, fileSize);
	assert(unmapResult == 0);
	return 0;
}
This should compile, run and print the size of the mapped file.
4.2 Parsing Different File Sections
As mentioned previously, a safetensors file is divided into three sections:
Section 1: 8 bytes that state the header size.
Section 2: JSON character header.
Section 3: Tensor data stored as bytes.
This section demonstrates how to parse these sections correctly.
4.2.1 Getting Header Length
First, we read the first 8 bytes of the file into a size_t variable called headerLength. Remember that values are stored in little-endian order, so index 0 holds the least significant byte and index 7 the most significant.
Inside our main function, we write a for loop that starts at index 7 and iterates backwards, shifting the running total left by 8 bits before adding each byte:
size_t headerLength = 0;
//Start at the most significant byte (index 7) and shift in one byte at a time
for(int i = 7; i >= 0; i--)
{
	headerLength <<= 8;
	headerLength += safeTensorData[i];
}
printf("HeaderSize : %zu bytes\n", headerLength);
Your code should run and print 14283 as the header size.
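The same decoding can be written forwards, shifting each byte into its position. As a worked example, 14283 is 0x37CB in hexadecimal, so a header size of 14283 is stored as the bytes CB 37 00 00 00 00 00 00. ReadU64LE is a hypothetical helper name for this sketch.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: decodes an unsigned 64-bit little-endian integer
// from 8 bytes. Byte 0 is the least significant, byte 7 the most.
uint64_t ReadU64LE(const unsigned char *bytes)
{
	uint64_t value = 0;
	for(int i = 0; i < 8; i++)
	{
		// Byte i contributes bits 8*i through 8*i + 7
		value |= (uint64_t) bytes[i] << (8 * i);
	}
	return value;
}
```

Calling ReadU64LE(safeTensorData) gives the same result as the backwards loop above.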
4.2.2 Parsing JSON Data using CJSON
As mentioned in the file specification, the JSON header begins immediately after the first 8 bytes.
A quick sanity check tests whether the byte at index 8 is the '{' character.
If it is, we use pointer arithmetic (safeTensorData + 8) to point at the start of the header and parse exactly headerLength bytes. This is the code:
//Sanity check: the JSON header starts at index 8 with '{'
assert(safeTensorData[8] == '{');
//Parse exactly headerLength bytes; the header is not null-terminated
cJSON *tensorData = cJSON_ParseWithLength((const char *) (safeTensorData + 8), headerLength);
assert(tensorData != NULL);
//Render the parsed JSON as a formatted string
char *formatted_json = cJSON_Print(tensorData);
assert(formatted_json != NULL);
printf("%s\n", formatted_json);
//Free formatted_json
free(formatted_json);
//Delete cJSON data structures (move this to the end of main once you query tensors)
cJSON_Delete(tensorData);
//...then unmap memory as before
If everything works, the program prints the formatted JSON header, where each tensor name maps to its dtype, shape and data_offsets.
If you open the model page on Hugging Face and click the Files info button, you will see information similar to the terminal output.
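Inside the mapped file, the header bytes are not null-terminated: the byte right after the header already belongs to the tensor data. That is why the code uses cJSON_ParseWithLength rather than cJSON_Parse. If you need the header as an ordinary C string, for instance to pass it to a function that expects one, a minimal sketch copies it into a fresh buffer. CopyHeaderString is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: copies headerLength bytes starting at header into a
// new null-terminated string. The caller must free() the result.
char *CopyHeaderString(const unsigned char *header, size_t headerLength)
{
	char *text = malloc(headerLength + 1);
	if(text == NULL) return NULL;
	memcpy(text, header, headerLength);
	// Terminate the copy so string functions stop at the end of the header
	text[headerLength] = '\0';
	return text;
}
```

With the mapped file, CopyHeaderString(safeTensorData + 8, headerLength) returns a string that could be handed to cJSON_Parse.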
4.2.3 Querying Tensor Information From JSON
Now, we have enough information to query specific tensors.
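Each tensor name is a JSON key whose value holds these fields:
dtype: the data type (F32 in this tutorial).
shape: the tensor's dimensions.
data_offsets: the start and end byte indices of the tensor within the weights (data) section. The end index is exclusive.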
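The three fields are linked: for an F32 tensor, end minus start in data_offsets should equal the product of the shape dimensions times 4 bytes. For example, a tensor of shape [768] spans 768 × 4 = 3072 bytes. Here is a minimal sketch of that calculation with a hypothetical helper ExpectedByteLength; it assumes the shape has already been read into a size_t array and does not guard against overflow.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: number of bytes a tensor of the given shape occupies.
// bytesPerElement is 4 for F32. A scalar (ndims == 0) has one element.
// Overflow of the product is not checked in this sketch.
size_t ExpectedByteLength(const size_t *shape, size_t ndims, size_t bytesPerElement)
{
	size_t elements = 1;
	for(size_t i = 0; i < ndims; i++)
	{
		elements *= shape[i];
	}
	return elements * bytesPerElement;
}
```

Comparing this value with tensorOffsetEnd - tensorOffsetStart is a cheap consistency check on the header.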
We write a function called GetTensorOffset that searches the parsed header for a tensor name and reads its data_offsets:
//Looks up tensorName in the parsed header and writes its data_offsets
//to tensorStart and tensorEnd. Returns 1 when both offsets were read
//and -1 when the tensor is not found.
int GetTensorOffset(cJSON *tensorData, char *tensorName, size_t *tensorStart, size_t *tensorEnd)
{
	int foundTensor = -1;
	cJSON *item = NULL;
	cJSON *offset = NULL;
	cJSON *dtype = NULL;
	cJSON *data_offsets = NULL;
	cJSON *shape = NULL;
	cJSON_ArrayForEach(item, tensorData)
	{
		dtype = cJSON_GetObjectItem(item, "dtype");
		data_offsets = cJSON_GetObjectItem(item, "data_offsets");
		shape = cJSON_GetObjectItem(item, "shape");
		//Skip entries that are not tensors (e.g. the optional __metadata__ key)
		if(dtype && data_offsets && shape)
		{
			if(strcmp(tensorName, item->string) == 0)
			{
				//data_offsets is [start, end]; read both values
				cJSON_ArrayForEach(offset, data_offsets)
				{
					foundTensor += 1;
					if(foundTensor == 0)
					{
						*tensorStart = (size_t) offset->valuedouble;
					}
					else if(foundTensor == 1)
					{
						*tensorEnd = (size_t) offset->valuedouble;
					}
				}
				break;
			}
		}
	}
	return foundTensor;
}
We call it from main, before cJSON_Delete, like this:
size_t tensorOffsetStart = 0;
size_t tensorOffsetEnd = 0;
int foundTensor = GetTensorOffset(tensorData, "h.4.ln_1.bias", &tensorOffsetStart, &tensorOffsetEnd);
//1 means both offsets were found
assert(foundTensor == 1);
assert(tensorOffsetEnd > tensorOffsetStart);
//FP32 has 4 bytes per value
assert((tensorOffsetEnd - tensorOffsetStart) % 4 == 0);
printf("Tensor start: %zu Tensor end: %zu\n", tensorOffsetStart, tensorOffsetEnd);
If everything is running, the program prints the tensor's start and end offsets within the weights section.
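data_offsets are relative to the start of the weights section, so a tensor's absolute position in the file is 8 + headerLength + start. With the 14283-byte GPT-2 header, the weights section begins at file byte 8 + 14283 = 14291. Because the offsets come from the file itself, it is worth checking that the range fits inside the file before reading it. A minimal sketch with a hypothetical helper TensorInBounds:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: checks that the byte range [start, end) of a tensor,
// measured from the start of the weights section, lies inside the file.
bool TensorInBounds(size_t fileSize, size_t headerLength, size_t start, size_t end)
{
	size_t dataStart = 8 + headerLength;  // first byte of the weights section
	if(dataStart > fileSize || start > end) return false;
	// Compare against the weights section size to avoid overflow in dataStart + end
	return end <= fileSize - dataStart;
}
```

In main, assert(TensorInBounds(fileSize, headerLength, tensorOffsetStart, tensorOffsetEnd)) would guard the read that follows.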
4.2.4 Converting Raw Bytes to Floats
Now that we know tensor locations, we can get raw byte information and convert bytes to float.
First, we use pointer arithmetic to move to the start of the weights section:
unsigned char *weightData = (safeTensorData+8+headerLength);
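Next, we read the tensor's bytes as floats. Casting weightData + tensorOffsetStart to float * works on x86, but the address is not guaranteed to be 4-byte aligned (with a 14283-byte header the weights section starts at file byte 14291), and misaligned float access is undefined behaviour in C. Copying each 4-byte value with memcpy avoids this. Since FP32 uses 4 bytes per value, the tensor holds (tensorOffsetEnd - tensorOffsetStart) / 4 floats:
//Convert and print weights
size_t tensorLength = (tensorOffsetEnd - tensorOffsetStart) / 4;
unsigned char *tensorBytes = weightData + tensorOffsetStart;
for(size_t i = 0; i < tensorLength; i++)
{
	//memcpy avoids misaligned access; assumes a little-endian host
	float value;
	memcpy(&value, tensorBytes + 4 * i, sizeof value);
	printf("%.3f ", value);
}
printf("\n");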
If everything is working, you will see the tensor's values printed to three decimal places.
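The memcpy approach relies on the host being little-endian, which holds for x86 and most ARM systems. If you want code that does not depend on host byte order, a minimal sketch assembles the 32 bits explicitly and then reinterprets them as a float. It assumes float is an IEEE-754 32-bit type, and ReadF32LE is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: decodes one little-endian IEEE-754 float from 4 bytes.
// Building the bits with shifts makes the result independent of host byte
// order; memcpy then reinterprets those bits as a float without aliasing issues.
float ReadF32LE(const unsigned char *bytes)
{
	uint32_t bits = (uint32_t) bytes[0]
		| (uint32_t) bytes[1] << 8
		| (uint32_t) bytes[2] << 16
		| (uint32_t) bytes[3] << 24;
	float value;
	memcpy(&value, &bits, sizeof value);
	return value;
}
```

Inside the print loop, ReadF32LE(tensorBytes + 4 * i) can replace the memcpy call. For example, the bytes 00 00 80 3F decode to 1.0.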
Congratulations! You parsed a safetensors file in C.
If you have any feedback, feel free to share it: murage.kibicho@leetarxiv.com