Rectified Flow for Everyday Programmers

Coding the Flux/Stable Diffusion Paper - Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Murage Kibicho

Jul 10, 2025

LeetArxiv is Leetcode for implementing Arxiv papers.

This is Chapter 4 in our upcoming book, Diffusion Models from Scratch in Python and C.

**Frontmatter for the paper “Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow”**

Past Chapters (Early access for paying subscribers)

Chapter 1 : Noise Generation and Forward Diffusion in C, Ruby and Rust.
Chapter 2 : NASA Almost Invented Stable Diffusion in the 1980’s.
Chapter 3 : Building a Tensor and Convolution Library In C and Python for Backpropagation.
Chapter 4 (we are here) : Rectified Flow Powers Stable Diffusion 3.5 and Flux.

1.0 Introduction

In 2022, researchers at UT Austin introduced Rectified Flow, a generative model that learns to smoothly connect noise and data using ordinary differential equations (ODEs). The model learns a class of ODEs called straight transports, which enable fast generative models that can be simulated in a single step (Liu, Hu & Liu 2024)1.

LeetArxiv Summary
Rectified flow is a generative model that learns to draw straight lines between noise and actual data.
The model was introduced in the papers Liu, Gong & Liu (2022)2 and Liu (2022)3
Rectified flow is the diffusion model powering Stable Diffusion 3.5 and Flux.
This article is divided into:
Generating the paper’s dataset.
Coding the network architecture.
Comparing our model’s results to the author’s results.

The original paper is 40 pages long. Don’t fret, because the idea is pretty simple: train a neural network to move samples along straight-line paths from one distribution (such as noise) to another (such as data).

**Rectified Flow Learning a Straight Line Path**. Taken from page 26 of Liu, Gong & Liu (2022)

Note, however, that training Stable Diffusion with rectified flows costs between $5,000 and $10,000, and takes about 199 days on a single A100 GPU (Isozaki 2023)4.

2.0 Generating the Dataset

**Dataset used to test rectified flows in Liu, Gong & Liu (2022)**

The graphic above features prominently in Liu, Gong & Liu (2022). It shows a rectified flow model learning to move data from one point to another. The purple dots are the original positions while the red dots are the final positions. The green and blue lines show the paths learnt by the model.

The next section demonstrates how one generates this dataset in both C and Python.

2.1 Generating Clusters and their Centers

The dataset lives in 2-dimensional space, with points clustered along a circle’s circumference. Therefore, we need to define these variables:

clusterRadius: the distance between our point clusters and the center.
anglesInRadians: the angles (in radians) at which our clusters occur.
standardDeviation: the standard deviation of the noise added to our points.
clusterCenters: the center of each point cluster in our dataset.

datasetSize : the number of x,y points in our dataset

In Python, we have:

import numpy as np

datasetSize = 1000
standardDeviation = 0.5
anglesInDegrees = np.array([0, 60, 120, 180, 240,  300])
anglesInRadians = np.deg2rad(anglesInDegrees)

clusterRadius0  = 12
clusterRadius1  = 5

In C, we have:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

//math.h already defines M_PI on many platforms, so only define it if missing
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

//Run: clear && gcc RectifiedFlow.c -lm -o m.o && ./m.o
void PrintFloatArray(int length, float *array)
{
	for(int i = 0; i < length; i++)
	{
		printf("%.3f,", array[i]);
	}
	printf("\n");
}
void DegreesToRadians(int angleCount, int anglesInDegrees[], float anglesInRadians[])
{
	//Formula radians = degrees × (π/180)
	for(int i = 0; i < angleCount; i++)
	{
		anglesInRadians[i] = (float)anglesInDegrees[i] * (M_PI / 180);
	}
}

int main()
{
	int datasetSize = 1000;
	float standardDeviation = 0.5;
	int anglesInDegrees[] = {0,60,120,180,240,300};
	int angleCount = sizeof(anglesInDegrees) / sizeof(int);
	float anglesInRadians[angleCount];
	DegreesToRadians(angleCount, anglesInDegrees, anglesInRadians);
	float clusterRadius0 = 12;
	float clusterRadius1 = 5;

	return 0;
}

Next, we write these two functions:

GenerateClusterCenters:
- This function takes the clusterRadius variable and the anglesInRadians array.
- It outputs an array with one entry per angle in anglesInRadians.
- The output array holds the x and y coordinates of each cluster’s center.
- The coordinates are found by scaling the cosine (for x) and the sine (for y) of each input angle by the radius.
GenerateDataset:
- This function takes the datasetSize and standardDeviation variables, as well as the clusterCenters array.
- It outputs an array of points scattered around the different cluster centers.
- For each point, the function first draws 2D Gaussian noise scaled by standardDeviation, then picks a random cluster center, and finally shifts the noise by that center.

In Python we have:

import random
import numpy as np

def GenerateDataset(datasetSize, standardDeviation, clusterCenters):
    dataset = []
    for i in range(datasetSize):
        #Draw 2D Gaussian noise scaled by the standard deviation
        sample = np.random.randn(clusterCenters.shape[1]) * standardDeviation
        #Choose a random cluster center
        currentClusterCenter = random.randint(0, len(clusterCenters) - 1)
        #Shift the noise so the point sits around the chosen center
        sample[0] += clusterCenters[currentClusterCenter, 0]
        sample[1] += clusterCenters[currentClusterCenter, 1]
        dataset.append(sample)
    return np.array(dataset)

def GenerateClusterCenters(anglesInRadians, clusterRadius):
    cluster = []
    for i in range(len(anglesInRadians)):
        points = np.array([clusterRadius * np.cos(anglesInRadians[i]), clusterRadius * np.sin(anglesInRadians[i])])
        cluster.append(points)
    return np.array(cluster)
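
To connect these functions to the variables defined earlier, here is a minimal sketch that builds both point clouds, one at radius 12 and one at radius 5. It uses a vectorized NumPy variant of the two functions; the names `cluster_centers` and `sample_dataset`, as well as the fixed seed, are assumptions made for illustration and are not part of the original code.

```python
import numpy as np

def cluster_centers(angles_in_radians, cluster_radius):
    # One (x, y) center per angle: x = r * cos(theta), y = r * sin(theta)
    return cluster_radius * np.stack(
        [np.cos(angles_in_radians), np.sin(angles_in_radians)], axis=1)

def sample_dataset(dataset_size, standard_deviation, centers, rng):
    # Pick a random cluster for every point, then add Gaussian noise around it
    indices = rng.integers(0, len(centers), size=dataset_size)
    noise = rng.normal(0.0, standard_deviation,
                       size=(dataset_size, centers.shape[1]))
    return centers[indices] + noise

rng = np.random.default_rng(0)  # assumed seed, only for reproducibility
angles = np.deg2rad(np.array([0, 60, 120, 180, 240, 300]))

centers0 = cluster_centers(angles, 12)  # outer ring
centers1 = cluster_centers(angles, 5)   # inner ring
dataset0 = sample_dataset(1000, 0.5, centers0, rng)
dataset1 = sample_dataset(1000, 0.5, centers1, rng)

print(dataset0.shape, dataset1.shape)  # (1000, 2) (1000, 2)
```

Both arrays have the same shape, which is what the dataset class in the training section expects when it pairs points from the two rings.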

In C, we have:

Feel free to compare either your Python or C code to mine at this link.

By the end of this section, you should have a dataset that resembles this:

3.0 Training the Model

Our training objective is to find a straight transport schedule from the points at radius 12 to the points at radius 5. We don’t want the transport lines to intersect.
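
In symbols, this objective can be written roughly as follows. The notation is borrowed from Liu, Gong & Liu (2022): here we assume $X_0$ is a point at radius 12, $X_1$ is its paired point at radius 5, and $v$ is the velocity field the network learns.

```latex
% Straight-line interpolation between a paired start and end point
X_t = t\,X_1 + (1 - t)\,X_0, \qquad t \in [0, 1]

% Least-squares fit of the velocity field to the straight-line direction
\min_{v} \int_0^1 \mathbb{E}\left[\, \left\| (X_1 - X_0) - v(X_t, t) \right\|^2 \,\right] dt

% After training, new points are transported by solving the ODE
\frac{dZ_t}{dt} = v(Z_t, t), \qquad Z_0 \sim \pi_0
```

The direction $X_1 - X_0$ is constant along each line, which is why a well-fitted $v$ can be integrated with very few steps.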

**Straight transport properties** taken from Qiang Liu’s Montecarlo Seminar on YouTube

Many image generators use U-Nets to learn the schedule, while Stable Diffusion 3 uses a transformer. We follow the multi-layer perceptron model used in Papers in 100 lines (2024)5.

The training section is written in Python. We interface C and Python using the code written here.

Related post: Making C and Python Talk to Each Other (Murage Kibicho, May 27).

First, we build a simple MLP using Tanh as our activation function:

import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim, context_dim, h, out_dim):
        super(MLP, self).__init__()
        self.network = nn.Sequential(
            nn.Linear(in_dim + context_dim, h),
            nn.Tanh(),
            nn.Linear(h, h), 
            nn.Tanh(),
            nn.Linear(h, out_dim)
        )
        
    def forward(self, x, context):
        # Convert inputs to model's dtype if needed
        x = x.to(next(self.parameters()).dtype)
        context = context.to(next(self.parameters()).dtype)
        return self.network(torch.cat((x, context), dim=1))

Next, we build a dataset class:

class Dataset(torch.utils.data.Dataset):
    def __init__(self, dist1, dist2):
        self.dist1 = dist1
        self.dist2 = dist2
        assert self.dist1.shape == self.dist2.shape

    def __len__(self):
        return self.dist1.shape[0]

    def __getitem__(self, idx):
        return self.dist1[idx], self.dist2[idx]

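Next, we write our training loop. It samples a random timestep t for each pair, linearly interpolates between z0 and z1 at that timestep, and regresses the network’s output onto the direction z1 - z0 using the L2 loss: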

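import torch
from tqdm import tqdm

# Train on the GPU when one is available; the model must live on the same device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def train_rectified_flow(rectified_flow, optimizer, train_dataloader, NB_EPOCHS, eps=1e-15):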

    for epoch in tqdm(range(NB_EPOCHS)):
        for z0, z1 in (train_dataloader):

            z0, z1 = z0.to(device), z1.to(device)
            t = torch.rand((z1.shape[0], 1), device=device)
            z_t = t * z1 + (1.-t) * z0
            target = z1 - z0

            pred = rectified_flow(z_t, t)
            loss = (target - pred).view(pred.shape[0], -1).abs().pow(2).sum(dim=1).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
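
To show how the pieces fit together, here is a minimal end-to-end sketch. It is an illustration under stated assumptions rather than the article's exact setup: the stand-in data, hidden width of 64, batch size, learning rate, and epoch count are all placeholder choices, and the network is written inline with `nn.Sequential` instead of the `MLP` class.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumed seed, only for reproducibility
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in data: swap in the radius-12 and radius-5 point clouds here
z0_all = torch.randn(256, 2) * 0.5 + torch.tensor([12.0, 0.0])
z1_all = torch.randn(256, 2) * 0.5 + torch.tensor([5.0, 0.0])
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(z0_all, z1_all),
    batch_size=64, shuffle=True)

# Input is the 2D point plus the scalar time t; output is a 2D velocity
model = nn.Sequential(
    nn.Linear(2 + 1, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 2)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

losses = []
for epoch in range(20):
    for z0, z1 in loader:
        z0, z1 = z0.to(device), z1.to(device)
        # Random timestep per pair, then a point on the straight line
        t = torch.rand((z0.shape[0], 1), device=device)
        z_t = t * z1 + (1.0 - t) * z0
        pred = model(torch.cat((z_t, t), dim=1))
        # Squared error against the straight-line direction z1 - z0
        loss = ((z1 - z0) - pred).pow(2).sum(dim=1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    losses.append(loss.item())

print(f"first epoch loss {losses[0]:.3f}, last epoch loss {losses[-1]:.3f}")
```

The time t is concatenated to the point as an extra input column, which is the role the `context` argument plays in the `MLP` class.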

Running our code should generate this image. The model learns a straight transport schedule and the lines do not intersect.
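
Drawing the learned paths requires integrating the ODE from t = 0 to t = 1. A minimal sketch with fixed-step Euler integration is shown below; the helper name `euler_sample` and the toy `constant_field` are assumptions for illustration, and any callable with the signature `velocity(z, t)` can be passed in.

```python
import torch

def euler_sample(velocity, z0, num_steps=100):
    # Integrate dz/dt = velocity(z, t) from t = 0 to t = 1 with fixed Euler steps
    z = z0.clone()
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = torch.full((z.shape[0], 1), step * dt,
                       dtype=z.dtype, device=z.device)
        z = z + dt * velocity(z, t)
    return z

# Toy check with a known, perfectly straight field: every point moves by (-7, 0)
def constant_field(z, t):
    return torch.tensor([-7.0, 0.0]).expand_as(z)

start = torch.tensor([[12.0, 0.0], [11.5, 0.3]])
one_step = euler_sample(constant_field, start, num_steps=1)
many_steps = euler_sample(constant_field, start, num_steps=100)

print(one_step)
print(torch.allclose(one_step, many_steps, atol=1e-4))
```

With a perfectly straight field, one Euler step and one hundred steps land on the same points, which is the sense in which straight transports can be simulated in a single step. For a trained model, something like `with torch.no_grad(): samples = euler_sample(rectified_flow, z0)` should work, since the model's `forward(x, context)` takes the point and the time in that order.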

References

Liu, B., Hu, X., & Liu, Q. (2024). Rectified Flow: Straight is Fast. Let us Flow Together Blog Post.

Liu, X., Gong, C., & Liu, Q. (2022). Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv.

Liu, Q. (2022). Rectified Flow: A Marginal Preserving Approach to Optimal Transport. arXiv.

Isozaki, I. (2023). Understanding InstaFlow/Rectified Flow. HuggingFace.

Papers in 100 Lines of Code. (2024). Rectified Flow: The Game-Changing Technique Powering Stable Diffusion 3 (Full Reimplementation!). YouTube.
