Rectified Flow for Everyday Programmers
Coding the Flux/Stable Diffusion Paper - Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
Quick Intro
LeetArxiv is Leetcode for implementing Arxiv papers.
This is Chapter 4 in our upcoming book, Diffusion Models from Scratch in Python and C.

Past Chapters (Early access for paying subscribers)
Chapter 1 : Noise Generation and Forward Diffusion in C, Ruby and Rust.
Chapter 2 : NASA Almost Invented Stable Diffusion in the 1980’s.
Chapter 3 : Building a Tensor and Convolution Library In C and Python for Backpropagation.
Chapter 4 (we are here) : Rectified Flow Powers Stable Diffusion 3.5 and Flux.
1.0 Introduction
In 2022, researchers at UT Austin introduced Rectified Flow, a generative model that learns to connect noise and data smoothly using ordinary differential equations (ODEs). The model learns a class of ODEs called straight transports, which enable fast generative models that can be simulated in a single step (Liu, Hu & Liu 2024).
LeetArxiv Summary
Rectified flow is a generative model that learns to draw straight lines between noise and actual data.
The model was introduced in the papers Liu, Gong & Liu (2022) and Liu (2022).
Rectified flow is the diffusion model powering Stable Diffusion 3.5 and Flux.
This article is divided into:
Generating the paper’s dataset.
Coding the network architecture.
Comparing our model’s results to the author’s results.
The original paper is 40 pages long. Don’t fret because the idea is pretty simple: Use straight-line paths to train a neural network for style transfer.
However, training Stable Diffusion with rectified flows costs between $5,000 and $10,000 and takes 199 days on a single A100 GPU (Isozaki 2023).
2.0 Generating the Dataset
The graphic above features prominently in Liu, Gong & Liu (2022). It shows a rectified flow model learning to move data from one point to another. The purple dots are the original positions while the red dots are the final positions. The green and blue lines show the paths learnt by the model.
The next section demonstrates how one generates this dataset in both C and Python.
2.1 Generating Clusters and their Centers
The dataset lives in 2-dimensional space, with points clustered along a circle’s circumference. Therefore, we need to define these variables:
clusterRadius: the distance between our point clusters and the center.
anglesInRadians: the angles (in radians) at which our clusters occur.
standardDeviation: the standard deviation of the noise added to our points.
clusterCenters: the center of each point cluster in our dataset.
datasetSize: the number of x,y points in our dataset.
In Python, we have:
import numpy as np

datasetSize = 1000
standardDeviation = 0.5
anglesInDegrees = np.array([0, 60, 120, 180, 240, 300])
anglesInRadians = np.deg2rad(anglesInDegrees)
clusterRadius0 = 12  # radius of the starting clusters
clusterRadius1 = 5   # radius of the target clusters
In C, we have:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
//Some compilers already define M_PI in math.h, so only define it when missing
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
//Run: clear && gcc RectifiedFlow.c -lm -o m.o && ./m.o
void PrintFloatArray(int length, float *array)
{
    for(int i = 0; i < length; i++)
    {
        printf("%.3f,", array[i]);
    }
    printf("\n");
}
void DegreesToRadians(int angleCount, int anglesInDegrees[], float anglesInRadians[])
{
    //Formula radians = degrees × (π/180)
    for(int i = 0; i < angleCount; i++)
    {
        anglesInRadians[i] = (float)anglesInDegrees[i] * (M_PI / 180);
    }
}
int main()
{
    int datasetSize = 1000;
    float standardDeviation = 0.5;
    int anglesInDegrees[] = {0,60,120,180,240,300};
    int angleCount = sizeof(anglesInDegrees) / sizeof(int);
    float anglesInRadians[angleCount];
    DegreesToRadians(angleCount, anglesInDegrees, anglesInRadians);
    float clusterRadius0 = 12;
    float clusterRadius1 = 5;
    return 0;
}
Next, we write these two functions:
GenerateClusterCenters:
This function takes the clusterRadius variable and the anglesInRadians array.
It outputs an array of the same length as anglesInRadians.
The output array holds the x and y coordinates of each cluster’s center.
The coordinates are found by multiplying the radius by the cosine (for x) and the sine (for y) of each input angle.
GenerateDataset:
This function takes the datasetSize and standardDeviation variables, as well as the clusterCenters array.
It outputs an array of points centered around different clusters.
For each point, the function first draws 2D Gaussian noise scaled by standardDeviation, then picks a random cluster and shifts the noise to that cluster’s center.
In Python we have:
import random
import numpy as np

def GenerateDataset(datasetSize, standardDeviation, clusterCenters):
    dataset = []
    for i in range(datasetSize):
        # Draw 2D Gaussian noise scaled by standardDeviation
        sample = np.random.randn(clusterCenters.shape[1]) * standardDeviation
        # Choose a random cluster center (randint is inclusive on both ends)
        currentClusterCenter = random.randint(0, len(clusterCenters) - 1)
        # Shift the noise so it is centered on the chosen cluster
        sample[0] += clusterCenters[currentClusterCenter, 0]
        sample[1] += clusterCenters[currentClusterCenter, 1]
        dataset.append(sample)
    return np.array(dataset)

def GenerateClusterCenters(anglesInRadians, clusterRadius):
    cluster = []
    for i in range(len(anglesInRadians)):
        # (x, y) = (r * cos(angle), r * sin(angle))
        points = np.array([clusterRadius * np.cos(anglesInRadians[i]), clusterRadius * np.sin(anglesInRadians[i])])
        cluster.append(points)
    return np.array(cluster)
In C, we have:
Feel free to compare either your Python or C code to mine at this link.
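As a quick check on the steps above, here is a minimal vectorized sketch that builds both point clouds in one go. The function names ending in `_vectorized`, the `rng` argument and the fixed seed are our own assumptions for illustration; they are not part of the original code, which loops point by point.

```python
import numpy as np

def generate_cluster_centers_vectorized(angles_in_radians, cluster_radius):
    # (x, y) = (r cos(angle), r sin(angle)) for every angle -> shape (num_angles, 2)
    return cluster_radius * np.stack(
        [np.cos(angles_in_radians), np.sin(angles_in_radians)], axis=1
    )

def generate_dataset_vectorized(dataset_size, standard_deviation, cluster_centers, rng):
    # Pick one cluster index per point, then add Gaussian noise around its center.
    chosen = rng.integers(0, len(cluster_centers), size=dataset_size)
    noise = rng.normal(0.0, standard_deviation,
                       size=(dataset_size, cluster_centers.shape[1]))
    return cluster_centers[chosen] + noise

rng = np.random.default_rng(0)  # fixed seed is an assumption, for repeatability
angles_in_radians = np.deg2rad(np.array([0, 60, 120, 180, 240, 300]))

# Outer ring (radius 12) is the start, inner ring (radius 5) is the target.
outer_points = generate_dataset_vectorized(
    1000, 0.5, generate_cluster_centers_vectorized(angles_in_radians, 12), rng)
inner_points = generate_dataset_vectorized(
    1000, 0.5, generate_cluster_centers_vectorized(angles_in_radians, 5), rng)

print(outer_points.shape, inner_points.shape)
```

Both arrays hold one x,y pair per row, so they can be scatter-plotted directly to compare against the expected figure.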
By the end of this section, you should have a dataset that resembles this:
3.0 Training the Model
Our training objective is to find a straight transport schedule from the points at radius 12 to the points at radius 5. We don’t want the drawn lines to intersect.
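Written out, the objective looks roughly like the sketch below. We use z_0 for a point on the radius-12 ring, z_1 for a point on the radius-5 ring and t for a time drawn uniformly from [0, 1]; the symbol v_θ for the network is our own label, chosen for illustration.

```latex
z_t = t\, z_1 + (1 - t)\, z_0,
\qquad
\min_{\theta}\;
\mathbb{E}_{z_0,\, z_1,\, t \sim \mathcal{U}[0,1]}
\left\lVert (z_1 - z_0) - v_\theta(z_t, t) \right\rVert^2
```

The target z_1 - z_0 is the constant velocity of the straight line joining the two points, so the network is asked to predict that direction from any point along the line.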

Many image generators use U-Nets to learn the schedule, while Stable Diffusion 3.5 uses a transformer. We follow the multi-layer perceptron model used in Papers in 100 lines (2024).
The training section is written in Python. We interface C and Python using the code written here.
First, we build a simple MLP using Tanh as our activation function:
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim, context_dim, h, out_dim):
        super(MLP, self).__init__()
        # Two hidden layers of width h with Tanh activations
        self.network = nn.Sequential(
            nn.Linear(in_dim + context_dim, h),
            nn.Tanh(),
            nn.Linear(h, h),
            nn.Tanh(),
            nn.Linear(h, out_dim)
        )

    def forward(self, x, context):
        # Convert inputs to model's dtype if needed
        x = x.to(next(self.parameters()).dtype)
        context = context.to(next(self.parameters()).dtype)
        # Concatenate the point and its context (the timestep) along the feature axis
        return self.network(torch.cat((x, context), dim=1))
Next, we build a dataset class:
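import torch

class Dataset(torch.utils.data.Dataset):
    def __init__(self, dist1, dist2):
        self.dist1 = dist1
        self.dist2 = dist2
        # Both point clouds must have the same shape so they can be paired by index
        assert self.dist1.shape == self.dist2.shape

    def __len__(self):
        return self.dist1.shape[0]

    def __getitem__(self, idx):
        # Return one (start point, target point) pair
        return self.dist1[idx], self.dist2[idx]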
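The training loop expects a `train_dataloader` that yields batches of paired points. A minimal sketch of building one is shown here, using PyTorch's built-in `TensorDataset`, which pairs tensors by index in the same way as the class above. The stand-in point clouds, the seed and the batch size of 256 are assumptions for illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # assumption: fixed seed for repeatability

# Stand-in point clouds (assumption): swap in the two generated datasets,
# e.g. via torch.from_numpy(...), when running the full pipeline.
dist1 = torch.randn(1000, 2) * 0.5 + torch.tensor([12.0, 0.0])
dist2 = torch.randn(1000, 2) * 0.5 + torch.tensor([5.0, 0.0])

# TensorDataset returns (dist1[idx], dist2[idx]) for each index.
paired = TensorDataset(dist1, dist2)
train_dataloader = DataLoader(paired, batch_size=256, shuffle=True)

z0, z1 = next(iter(train_dataloader))
print(z0.shape, z1.shape)
```

Shuffling changes the order in which pairs are visited in each epoch, but every start point stays paired with the same target point.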
We need to write our training loop. This interpolates between z0 and z1 at different timesteps and uses the L2 loss:
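import torch
from tqdm import tqdm

# Use the GPU when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def train_rectified_flow(rectified_flow, optimizer, train_dataloader, NB_EPOCHS, eps=1e-15):
    for epoch in tqdm(range(NB_EPOCHS)):
        for z0, z1 in (train_dataloader):
            z0, z1 = z0.to(device), z1.to(device)
            # One random timestep in [0, 1) per pair
            t = torch.rand((z1.shape[0], 1), device=device)
            # Point on the straight line between z0 and z1 at time t
            z_t = t * z1 + (1.-t) * z0
            # The velocity of a straight path is constant: z1 - z0
            target = z1 - z0
            pred = rectified_flow(z_t, t)
            # Squared L2 error per sample, averaged over the batch
            loss = (target - pred).view(pred.shape[0], -1).abs().pow(2).sum(dim=1).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()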
Running our code should generate this image. The model learns a straight transport schedule and the lines do not intersect.
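To draw the learnt paths we still need to move points along the velocity field. One common way, sketched here, is forward Euler integration of the ODE from t = 0 to t = 1. The helper `sample_euler` and the hand-written field `radial_shrink_velocity` are hypothetical names of our own; the latter is only a stand-in for a trained model so that the snippet runs without training.

```python
import torch

def sample_euler(velocity_fn, z0, num_steps=1):
    """Integrate dz/dt = velocity_fn(z, t) from t=0 to t=1 with forward Euler."""
    z = z0.clone()
    dt = 1.0 / num_steps
    with torch.no_grad():
        for k in range(num_steps):
            # Same (batch, 1) time shape that the training loop feeds the model.
            t = torch.full((z.shape[0], 1), k * dt, dtype=z.dtype, device=z.device)
            z = z + velocity_fn(z, t) * dt
    return z

def radial_shrink_velocity(z, t):
    # Hypothetical stand-in for a trained model: a perfectly straight flow
    # that pulls every point radially from radius 12 to radius 5.
    return -(7.0 / 12.0) * z / (1.0 - (7.0 / 12.0) * t)

angles = torch.deg2rad(torch.tensor([0.0, 60.0, 120.0, 180.0, 240.0, 300.0]))
start = 12.0 * torch.stack([torch.cos(angles), torch.sin(angles)], dim=1)

one_step = sample_euler(radial_shrink_velocity, start, num_steps=1)
many_steps = sample_euler(radial_shrink_velocity, start, num_steps=100)
print(one_step.norm(dim=1))
print((one_step - many_steps).abs().max())
```

Because the stand-in flow is exactly straight, one Euler step and one hundred steps land on the same radius-5 points up to floating-point error, which is the single-step property that straight transports are meant to give. With a trained network, pass the model itself as `velocity_fn` and move `start` to the same device as the model.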
References
Liu, B., Hu, X., & Liu, Q. (2024). Rectified Flow: Straight is Fast. Let us Flow Together Blog Post.
Liu, X., Gong, C., & Liu, Q. (2022). Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv.
Isozaki, I. (2023). Understanding InstaFlow/Rectified Flow. HuggingFace.