Emulating a GPU on a CPU using Finite Field Assembly
A Friendly yet Thorough Introduction to the Finite Field Assembly programming language and the Recursive Computing Paradigm
Quick Note
This is Chapter 4 in our book, A Super-friendly Guide to Finite Fields for Programmers.
LeetArxiv and the Finite Field Assembly (FF-asm) programming language are entirely funded by our Substack readers. Subscribe to support us!
Past Chapters (for paying subscribers)
Chapter 1 : FizzBuzz and Finite Field Theory
Chapter 2 : Introducing Silly Math Jargon like Congruences and Primes.
Chapter 3 (Unfinished) : Constructing our First Finite Field
Chapter 4 (We are here): Emulating a GPU using a CPU using Congruences, Primes and Number Theory.
Introduction
Finite Field Assembly (FF-asm) is a programming language founded on the thesis: Math is mostly invented, rarely discovered.
For instance, binary digits (0 and 1) fall into the category of discovered math, while concepts like 2’s complement, fixed-point arithmetic, and floating-point arithmetic are examples of invented math.
The FF-asm language empowers programmers to create their own mathematical systems to solve the problem at hand.
Finite Field Assembly’s main feature is recursive computing : it’s not vectorization, not parallelization, but performing a calculation inside a calculation inside another calculation.
*The finite field is the most basic data structure in FF-asm.
Defining our Problem
The problem : We need to emulate a GPU on a regular CPU.
Here are our constraints :
We’re NOT using SIMD vectorization.
We’re NOT utilizing OpenMP’s parallelization routines.
We lack access to a GPU.
Therefore, we need to build math that permits us to perform lots of parallel computations on our CPU.
This tutorial guides us through constructing a custom finite field that supports addition and multiplication using unsigned 8-bit integers in FF-asm.
Setting up our FF-asm Environment
1. Installing Dependencies
One needs a working installation of the GNU MP Bignum library.[1] Here's the link to install libgmp.
Ubuntu installation instructions here.
One also needs the ff_asm_runtime.h and ff_asm_primes.h files located in the official GitHub repository.
2. Verifying Installation : Writing Hello World in FF-asm
Remember, FF-asm, just like CUDA, is an extension of C. It's super low-level so you need to handle memory allocations.
This Hello World Program creates an 8-bit unsigned integer finite field.
Create a file called 01_HelloWorld.c in the same directory that contains ff_asm_runtime.h and ff_asm_primes.h. Copy the code below. You can find the source here.
#include <stdio.h>
#include "ff_asm_runtime.h"
//Run : gcc 01_HelloWorld.c -lgmp -lm -o m.o && ./m.o
int main()
{
    //Unsigned 8-bit array
    uint8_t data[] = {50, 100, 20};
    uint64_t fieldOrder[] = {8, 9, 11};
    size_t dataLength = sizeof(data) / sizeof(uint8_t);
    //Allocate memory for a FF-asm field of type Int_8_U_Field
    ff_asmField field = ff_asmMalloc(dataLength, Int_8_U_Field);
    //Set our finite field order
    ff_asmSetFieldOrder(field, dataLength, fieldOrder);
    //Place data in the FF-asm fieldNode at index 0
    ff_asmAppendData(field, dataLength, data, Int_8_U_Field);
    //Print the FF-asm field
    ff_asmPrintField(field);
    //Free the FF-asm field
    ff_asmFreeField(field);
    return 0;
}
Compile and run using these two commands:
$ gcc 01_HelloWorld.c -lgmp -lm -o m.o
$ ./m.o
If your installation is working, you should get this:
Field Data Type: Int_8_U_Field
Field Length: 3
Field Node Count: 1
Field Order : 8 9 11
Node Data Type: Int_8_U_Field
Node Data Length: 3
Node Data : 50, 100, 20,
Node integer : 658
Int_8_U_Field : This is the field data type, an unsigned 8-bit integer in this case.
(8,9,11) : Field order (the number of elements your field can hold), i.e. you can store 8 * 9 * 11 = 792 elements in the field.
658 : This is a unique integer that represents your specific set of finite field elements. It is the solution to a system of linear congruences: the smallest non-negative x with x ≡ 50 (mod 8), x ≡ 100 (mod 9) and x ≡ 20 (mod 11).
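To make the idea of a node integer concrete, here is a minimal sketch in plain C that recovers it by brute force. The helper name crt_pack is hypothetical and is not part of ff_asm_runtime.h. We assume the moduli are pairwise coprime, which holds for {8, 9, 11}. The real runtime is built on GMP and may compute this differently, so treat the code as an illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of ff_asm_runtime.h.
   Returns the smallest x >= 0 with x % moduli[i] == data[i] % moduli[i]
   for every i. Assumes the moduli are pairwise coprime and that their
   product fits in 64 bits. Brute force: fine for tiny fields only. */
uint64_t crt_pack(const uint8_t *data, const uint64_t *moduli, size_t n)
{
    uint64_t product = 1;
    for (size_t i = 0; i < n; i++) {
        assert(moduli[i] != 0);
        product *= moduli[i];
    }
    for (uint64_t x = 0; x < product; x++) {
        int matches = 1;
        for (size_t i = 0; i < n; i++) {
            if (x % moduli[i] != data[i] % moduli[i]) {
                matches = 0;
                break;
            }
        }
        if (matches) {
            return x;
        }
    }
    /* Not reached when the moduli are pairwise coprime. */
    return product;
}
```

Calling crt_pack with data = {50, 100, 20} and moduli = {8, 9, 11} returns 658, which matches the Node integer printed above.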
3. Performing Recursive Addition on CPU
Recursive computing is a first-class feature in Finite Field Assembly (FF-asm). We use the mathematical theory of congruences and primes to perform lots of different calculations at the same time.
In this section, we show how to perform addition in Finite Field Assembly (FF-asm) using the Recursive Computing paradigm.
We cover the mathematical theory of Recursive Computing in Chapter 2 of our book, A Super-friendly Guide to Finite Fields for Programmers.
*Recursive computing is not parallelization, not cache-aware vectorization, but a calculation inside a calculation inside another calculation.
*Recursive computing - performing one outer calculation lets you perform several inner calculations at the same time.
Coding Lab : Recursive Addition for the vectors {5,3,6} and {2,4,1}
This coding lab introduces the functions :
ff_asmMalloc
ff_asmSetFieldOrder
ff_asmAppendData
ff_asmAdd
ff_asmDataDebug
ff_asmPrintField
ff_asmFreeField
You can find the file on GitHub here.
#include <stdio.h>
#include "ff_asm_runtime.h"
//Run : gcc 02_RecursiveAddition.c -lgmp -lm -o m.o && ./m.o
int main()
{
    //Unsigned 8-bit arrays we want to add
    uint8_t data0[] = {5, 3, 6};
    uint8_t data1[] = {2, 4, 1};
    uint64_t fieldOrder[] = {8, 9, 11};
    int dataLength = sizeof(data0) / sizeof(uint8_t);
    //Allocate memory for one FF-asm field of type Int_8_U_Field
    ff_asmField field = ff_asmMalloc(dataLength, Int_8_U_Field);
    //Set our finite field order
    ff_asmSetFieldOrder(field, dataLength, fieldOrder);
    //Place data0 in the FF-asm fieldNode at index 0
    ff_asmAppendData(field, dataLength, data0, Int_8_U_Field);
    //Place data1 in the FF-asm fieldNode at index 1
    ff_asmAppendData(field, dataLength, data1, Int_8_U_Field);
    //Add data at index 0 and data at index 1 : store at index 2.
    ff_asmAdd(field, 0, 1, 2);
    //Prepare the data for printing.
    ff_asmDataDebug(field);
    //Print the FF-asm field
    ff_asmPrintField(field);
    //Free the FF-asm field
    ff_asmFreeField(field);
    return 0;
}
This will output
Field Data Type: Int_8_U_Field
Field Length: 3
Field Node Count: 3
Field Order : 8 9 11
Node Data Type: Int_8_U_Field
Node Data Length: 3
Node Data : 5, 3, 6,
Node integer : 237
Node Data Type: Int_8_U_Field
Node Data Length: 3
Node Data : 2, 4, 1,
Node integer : 562
Node Data Type: Int_8_U_Field
Node Data Length: 3
Node Data : 7, 7, 7,
Node integer : 7
We observe that our result array {7,7,7} is the last node in the field. Its node integer is 7 because 237 + 562 = 799, and 799 mod (8 * 9 * 11) = 799 mod 792 = 7.
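Here is a minimal sketch, in plain C without GMP, of what "one outer calculation, several inner calculations" could look like for this output. The helper names field_size, packed_add and crt_unpack are hypothetical and are not part of ff_asm_runtime.h. We assume the node integers and the product of the moduli fit in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, NOT part of ff_asm_runtime.h. */

/* Product of all moduli: the number of distinct node integers. */
uint64_t field_size(const uint64_t *moduli, size_t n)
{
    uint64_t product = 1;
    for (size_t i = 0; i < n; i++) {
        assert(moduli[i] != 0);
        product *= moduli[i];
    }
    return product;
}

/* One "outer" addition on the packed node integers. */
uint64_t packed_add(uint64_t a, uint64_t b, const uint64_t *moduli, size_t n)
{
    return (a + b) % field_size(moduli, n);
}

/* Read the "inner" results back: one residue per modulus. */
void crt_unpack(uint64_t x, const uint64_t *moduli, size_t n, uint64_t *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = x % moduli[i];
    }
}
```

With moduli {8, 9, 11}, packed_add(237, 562, ...) gives 7, and crt_unpack turns 7 back into {7, 7, 7}: a single addition on two integers produced all three sums.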
4. Performing Recursive Multiplication on CPU
Coding Lab : Recursive Multiplication for the vectors {5,3,6} and {2,4,1}
Note : we increase the order of the field (i.e. the values in the fieldOrder array) to construct a field that can hold our expected result. With the earlier order {8, 9, 11}, the products 10 and 12 would wrap around modulo 8 and 9.
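As a rough way to check whether a field order is large enough, one could compare each expected result against its modulus. The sketch below does exactly that. The helper name lanes_fit is hypothetical and is not part of ff_asm_runtime.h, and we are assuming that a result "fits" when it is strictly smaller than the modulus in the same position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of ff_asm_runtime.h.
   Returns 1 if every expected value is strictly smaller than the modulus
   at the same index (so it survives reduction unchanged), 0 otherwise. */
int lanes_fit(const uint64_t *values, const uint64_t *moduli, size_t n)
{
    assert(values != NULL && moduli != NULL);
    for (size_t i = 0; i < n; i++) {
        if (values[i] >= moduli[i]) {
            return 0; /* this value would wrap around */
        }
    }
    return 1;
}
```

For the expected products {10, 12, 6}, lanes_fit returns 0 with the order {8, 9, 11} (10 mod 8 would become 2) and 1 with the order {15, 17, 19}.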
This coding lab introduces the function:
ff_asmMultiply
#include <stdio.h>
#include "ff_asm_runtime.h"
//Run : gcc 03_RecursiveMultiplication.c -lgmp -lm -o m.o && ./m.o
int main()
{
    //Unsigned 8-bit arrays we want to multiply
    uint8_t data0[] = {5, 3, 6};
    uint8_t data1[] = {2, 4, 1};
    uint64_t fieldOrder[] = {15, 17, 19};
    int dataLength = sizeof(data0) / sizeof(uint8_t);
    //Allocate memory for one FF-asm field of type Int_8_U_Field
    ff_asmField field = ff_asmMalloc(dataLength, Int_8_U_Field);
    //Set our finite field order
    ff_asmSetFieldOrder(field, dataLength, fieldOrder);
    //Place data0 in the FF-asm fieldNode at index 0
    ff_asmAppendData(field, dataLength, data0, Int_8_U_Field);
    //Place data1 in the FF-asm fieldNode at index 1
    ff_asmAppendData(field, dataLength, data1, Int_8_U_Field);
    //Multiply data at index 0 and data at index 1 : store at index 2.
    ff_asmMultiply(field, 0, 1, 2);
    //Prepare the data for printing.
    ff_asmDataDebug(field);
    //Print the FF-asm field
    ff_asmPrintField(field);
    //Free the FF-asm field
    ff_asmFreeField(field);
    return 0;
}
Output
Field Data Type: Int_8_U_Field
Field Length: 3
Field Node Count: 3
Field Order : 15 17 19
Node Data Type: Int_8_U_Field
Node Data Length: 3
Node Data : 5, 3, 6,
Node integer : 785
Node Data Type: Int_8_U_Field
Node Data Length: 3
Node Data : 2, 4, 1,
Node integer : 4067
Node Data Type: Int_8_U_Field
Node Data Length: 3
Node Data : 10, 12, 6,
Node integer : 4585
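We observe that our result array {10, 12, 6} is the last node in the field. Its node integer is 4585 because 785 * 4067 = 3192595, and 3192595 mod (15 * 17 * 19) = 3192595 mod 4845 = 4585.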
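The same trick can be sketched for multiplication in plain C without GMP. The helper names packed_mul and lanes_match are hypothetical and are not part of ff_asm_runtime.h. We assume the product of the moduli is below 2^32 so the intermediate product fits in 64 bits; the real runtime relies on GMP for big integers and may work differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, NOT part of ff_asm_runtime.h. */

/* One "outer" multiplication on the packed node integers. */
uint64_t packed_mul(uint64_t a, uint64_t b, const uint64_t *moduli, size_t n)
{
    uint64_t product = 1;
    for (size_t i = 0; i < n; i++) {
        assert(moduli[i] != 0);
        product *= moduli[i];
    }
    /* Assumption: product < 2^32, so the multiplication cannot overflow. */
    assert(product <= UINT32_MAX);
    return ((a % product) * (b % product)) % product;
}

/* Check every residue of the packed result against an ordinary
   element-by-element product reduced by the same modulus. */
int lanes_match(uint64_t packed, const uint8_t *x, const uint8_t *y,
                const uint64_t *moduli, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t expected = ((uint64_t)x[i] * y[i]) % moduli[i];
        if (packed % moduli[i] != expected) {
            return 0;
        }
    }
    return 1;
}
```

With moduli {15, 17, 19}, packed_mul(785, 4067, ...) gives 4585, and its residues 10, 12 and 6 agree with multiplying {5, 3, 6} and {2, 4, 1} element by element.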
This was a short introduction to Finite Field Assembly. Make sure to look at our source code and star the repo here.
Funding Goals
These are our immediate goals. It’s mostly math research and tooling.
*Paying subscribers, please add a note describing what you’re supporting.
Finite Field GEMM - writing a Matrix Multiplication library over Finite Fields. (Research was funded and completed here)
Add subtraction, division and better error handling.
Support signed integers, floats and doubles.
Add a polynomial arithmetic library.
Add Tree-Sitter syntax highlighting and LSP support.
Hire webdev for better documentation.
Special thanks to the Red Bull Futurism Fund and
.