Quick Intro
LeetArxiv is LeetCode for implementing arXiv and other research papers.
*We code this paper in C and Python.
Complete code is available here on GitHub.
Paper Summary
Sudoku solvers are great for testing reasoning models. This paper showcases Belief Propagation, an alternative to backpropagation rooted in Optimal Transport theory.
The idea is: we can use the Sinkhorn-Knopp algorithm to turn a non-negative integer matrix into a floating-point probability matrix whose rows and columns each sum to 1.
It’s somewhat analogous to performing a softmax but without the derivatives.
1.0 Paper Introduction
The paper Sinkhorn Solves Sudoku (Moon, Gunther & Kupin, 2009) demonstrates the convergence of the Sinkhorn-Knopp algorithm on constrained reasoning problems such as solving Sudoku puzzles.
The authors utilise the concept of Sinkhorn balancing to iteratively solve puzzles.
Informally, Sinkhorn balancing is the idea that one can use permutation matrices to design a custom sorting algorithm. For Sudoku, a solved row, column or square assigns nine distinct digits to nine cells, which is exactly a permutation matrix.
More formally, Sinkhorn balancing is an algorithm for projecting a matrix onto the space of doubly stochastic matrices (Moon, Gunther & Kupin, 2009).
2.0 Quick Primer on Sudoku and Sinkhorn-Knopp Algorithm
This section quickly introduces Sudoku to the reader. We gloss over the Sinkhorn-Knopp algorithm as well.
*The smallest value for a sudoku cell is 1, not zero lol
2.1 Introduction to Sudoku
Standard Sudoku is played on a 9×9 grid subdivided into nine 3×3 squares. The objective is to fill out the grid with the numbers 1-9, without repeating any number within a row, column or square (Sudoku.com, 2025).
Standard Sudoku is called Sudoku of rank 3 (Cornell Math, 2025).
A Sudoku of rank n is an n²×n² square grid, subdivided into n² blocks, each of size n×n, and the numbers used to fill the grid are 1, 2, 3, ..., n² (Cornell Math, 2025).
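The rules above translate into a short checker. Here's a minimal Python sketch that tests whether a filled grid is a valid rank-n Sudoku. The function name is_valid_solution is ours, and the rank-2 grid is just a small made-up example:

```python
from math import isqrt

def is_valid_solution(grid):
    """Return True if `grid` (N lists of N ints, with N = n*n) is a solved rank-n Sudoku."""
    size = len(grid)
    n = isqrt(size)
    if n * n != size or any(len(row) != size for row in grid):
        return False
    wanted = set(range(1, size + 1))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    # One set per n×n block, scanning block corners left to right, top to bottom.
    boxes = [
        {grid[r][c] for r in range(br, br + n) for c in range(bc, bc + n)}
        for br in range(0, size, n)
        for bc in range(0, size, n)
    ]
    # Every row, column and block must contain each of 1..N exactly once.
    return all(group == wanted for group in rows + cols + boxes)

# A solved rank-2 Sudoku (4×4 grid, digits 1-4).
solved_rank2 = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]
print(is_valid_solution(solved_rank2))
```

The same function works for standard Sudoku, since it reads the rank from the grid size.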
2.2 Our Dataset
We use the sapientinc/sudoku-extreme dataset from Hugging Face.
The dataset is split into these NumPy arrays (Inamdar, 2025):
inputs: The starting grid of the puzzle.
labels: The solved grid (the “answer key”).
puzzle_indices: This acts like a card catalog, telling us where each individual puzzle starts and ends in the big stack of examples.
group_indices: It groups together all the augmented versions of the same original puzzle.
We use this script to download and prepare the dataset.
Then we can test to ensure it loaded correctly in Python.
and in C
*The dataset needs some processing since some questions are filled with random noise, not zeros. The annotated line shows how to handle this:
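One way such a clean-up could look in Python is sketched here. It rests on a big assumption of ours: a given is always a digit from 1 to 9, and anything else in a question counts as an empty cell (0). The helper name clean_puzzle is hypothetical too, so treat this as an illustration of the idea and not as the annotated line itself:

```python
def clean_puzzle(cells):
    """Return a copy of `cells` where every value outside 1..9 becomes 0 (blank)."""
    # Assumption: valid givens are the integers 1..9; everything else is noise.
    return [value if value in range(1, 10) else 0 for value in cells]

# A made-up fragment of a flattened question with two noisy entries.
noisy = [5, 0, 17, -3, 9, 1]
print(clean_puzzle(noisy))
```

If the noise can also land inside 1..9, this rule can't tell it apart from a real given, and a different clean-up would be needed.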
2.3 Introduction to the Sinkhorn-Knopp Algorithm and Sinkhorn Balancing
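Sinkhorn balancing, commonly known as the Sinkhorn-Knopp algorithm (Sinkhorn & Knopp, 1967), is a technique for scaling a non-negative square matrix into a doubly stochastic matrix. Permutation matrices are the special case of doubly stochastic matrices whose entries are all 0 or 1.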
A doubly stochastic matrix is a matrix with non-negative elements whose rows each sum to 1 and whose columns each sum to 1 (Moon, Gunther & Kupin, 2009).
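That definition is easy to check in code. A minimal Python sketch, where the function name and the tolerance are our own choices:

```python
def is_doubly_stochastic(matrix, tol=1e-9):
    """True if `matrix` is square, non-negative, and every row and column sums to 1."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return False
    if any(x < 0 for row in matrix for x in row):
        return False
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in matrix)
    cols_ok = all(abs(sum(col) - 1.0) <= tol for col in zip(*matrix))
    return rows_ok and cols_ok

permutation = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # entries are only 0 or 1
uniform = [[1 / 3] * 3 for _ in range(3)]         # every entry is 1/3
row_only = [[0.5, 0.5], [1.0, 0.0]]               # rows sum to 1, columns do not

print(is_doubly_stochastic(permutation))
print(is_doubly_stochastic(uniform))
print(is_doubly_stochastic(row_only))
```

The last matrix is only row stochastic: its columns sum to 1.5 and 0.5.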
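Matrix balancing techniques help find a doubly stochastic diagonal scaling of a square non-negative matrix (Knight & Ruiz, 2013).
If A is a non-negative square matrix, A is said to have total support if every positive element of A lies on a positive diagonal. Here a diagonal means n entries of A with exactly one taken from each row and each column, and it is positive when all n entries are positive.
*NOTE: A doubly stochastic diagonal scaling of A exists only when A is non-zero and has total support. The row/column iteration itself still converges to a doubly stochastic limit as long as A contains at least one positive diagonal (Sinkhorn & Knopp, 1967).
A thorough description of total support is given in (Gnarls, 2024).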
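For tiny matrices, total support can be checked by brute force: try every permutation and see which positive entries end up on an all-positive diagonal. The Python sketch below does exactly that. The function names are ours, and the n! cost makes it suitable for small examples only:

```python
from itertools import permutations

def positive_diagonals(matrix):
    """Yield each permutation sigma where matrix[i][sigma[i]] > 0 for every row i."""
    n = len(matrix)
    for sigma in permutations(range(n)):
        if all(matrix[i][sigma[i]] > 0 for i in range(n)):
            yield sigma

def has_total_support(matrix):
    """True if every positive entry lies on at least one positive diagonal."""
    n = len(matrix)
    covered = set()
    for sigma in positive_diagonals(matrix):
        covered.update((i, sigma[i]) for i in range(n))
    positive = {(i, j) for i in range(n) for j in range(n) if matrix[i][j] > 0}
    return positive <= covered

full = [[1, 1], [1, 1]]        # both diagonals are positive
triangular = [[1, 1], [0, 1]]  # the entry at row 0, column 1 is on no positive diagonal

print(has_total_support(full))
print(has_total_support(triangular))
```

The triangular matrix still contains one positive diagonal (its main diagonal), but the extra positive entry is left uncovered, so it lacks total support.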
2.4 Coding the Sinkhorn-Knopp Algorithm
The Sinkhorn-Knopp algorithm is fundamental to Belief Propagation. We alternately rescale the rows and columns of a non-negative integer matrix until every row and every column sums to 1, which leaves each entry in the range 0 to 1.
Think of it as performing a softmax of sorts.
We went into the fine details of implementing the Sinkhorn-Knopp algorithm here in Chapter 2. So we’ll just lift the gist code lol.
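For reference, the alternating row and column scaling could be sketched in Python roughly as follows. This is a minimal stand-in and not the gist from Chapter 2: the name sinkhorn_knopp, the iteration cap and the tolerance are all our assumptions.

```python
def sinkhorn_knopp(matrix, max_iters=1000, tol=1e-9):
    """Alternately normalise the rows and columns of a square non-negative matrix."""
    m = [[float(x) for x in row] for row in matrix]
    n = len(m)
    for _ in range(max_iters):
        # Row step: divide each row by its sum (all-zero rows are left alone).
        for i in range(n):
            row_sum = sum(m[i])
            if row_sum > 0:
                m[i] = [x / row_sum for x in m[i]]
        # Column step: divide each column by its sum (all-zero columns are left alone).
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        for i in range(n):
            m[i] = [m[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
        # Columns now sum to 1, so stop once the rows do too.
        if all(abs(sum(row) - 1.0) <= tol for row in m):
            break
    return m

balanced = sinkhorn_knopp([[1, 2], [3, 4]])
for row in balanced:
    print([round(x, 3) for x in row])
# roughly [[0.449, 0.551], [0.551, 0.449]]
```

Unlike a softmax, nothing is exponentiated here. The only operations are divisions by row and column sums.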
3.0 Using Belief Propagation to Solve Sudoku
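In this post, belief propagation means repeatedly refining each cell’s digit probabilities against the constraints the cell belongs to, with Sinkhorn balancing as the refinement step.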
The paper provides pseudocode for the Sinkhorn Sudoku Solution (SSS) algorithm. Each probability constraint matrix (Q) stacks the digit probabilities of the cells covered by one constraint, with one row per cell and one column per possible digit.
As described in the paper, there are 27 constraints: 9 row constraints + 9 column constraints + 9 grid subdivision constraints.
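The 27 constraints are easy to enumerate as lists of cell coordinates. A small Python sketch, where the function name and the (row, column) tuple layout are our own choices:

```python
from collections import Counter

def sudoku_constraints(n=3):
    """Return the row, column and box constraints of a rank-n Sudoku as lists of (row, col) cells."""
    size = n * n
    rows = [[(r, c) for c in range(size)] for r in range(size)]
    cols = [[(r, c) for r in range(size)] for c in range(size)]
    boxes = [
        [(br + dr, bc + dc) for dr in range(n) for dc in range(n)]
        for br in range(0, size, n)
        for bc in range(0, size, n)
    ]
    return rows + cols + boxes

constraints = sudoku_constraints()
membership = Counter(cell for group in constraints for cell in group)

print(len(constraints))          # 27
print(set(membership.values()))  # {3}
```

Every one of the 81 cells shows up in exactly three constraints: one row, one column and one box.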
We use Sinkhorn balancing to find the solution this way:
1. For each of the 81 cells, find a probability distribution over the possible digits 1 through 9.
2. Build all 27 probability constraint matrices (Q), each of dimension N × N (N = 9 for standard Sudoku), by stacking the probability distributions from step 1.
3. Perform Sinkhorn balancing on all 27 constraint matrices.
4. Update the Sudoku grid using the new probabilities.
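To make the first three steps concrete, here's a hedged Python sketch on a single row of a rank-2 puzzle (digits 1-4), which keeps the numbers small. The starting distribution is our assumption: a given digit gets probability 1 and a blank cell is uniform over all digits. The helper names and the fixed sweep count are also ours.

```python
def cell_distribution(value, size=9):
    """Step 1: a given digit gets probability 1; a blank (0) is uniform over all digits."""
    if value:
        return [1.0 if d == value else 0.0 for d in range(1, size + 1)]
    return [1.0 / size] * size

def normalise(vector):
    """Scale a vector so it sums to 1 (all-zero vectors are returned unchanged)."""
    total = sum(vector)
    return [x / total for x in vector] if total > 0 else list(vector)

def balance(q, sweeps=50):
    """Step 3: alternate row and column normalisation for a fixed number of sweeps."""
    for _ in range(sweeps):
        q = [normalise(row) for row in q]            # each cell holds one digit
        cols = [normalise(col) for col in zip(*q)]   # each digit is used once
        q = [list(row) for row in zip(*cols)]        # transpose back to cell-major
    return q

row_cells = [1, 0, 0, 4]                                   # one puzzle row, 0 = blank
q = [cell_distribution(v, size=4) for v in row_cells]      # Step 2: stack into Q
balanced_q = balance(q)

for cell in balanced_q:
    print([round(p, 3) for p in cell])
```

Balancing drains the blanks' share of digits 1 and 4, because the givens already claim those columns. Each blank ends up close to [0, 0.5, 0.5, 0], so the row constraint alone narrows both blanks down to digits 2 and 3.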
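The paper doesn’t tell us how to update probabilities. So here the implementation becomes somewhat hand-wavy lol. Every cell belongs to three constraints (its row, its column and its 3×3 square), so we sum the cell’s balanced probabilities across them and then average.
We take the hardmax (the most probable digit) as the new cell symbol.
We update our cell probabilities every iteration.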
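The averaging and hardmax steps are small enough to sketch directly. This Python snippet uses made-up balanced distributions for one cell of a rank-2 puzzle. The function names and the tie-breaking rule (first digit wins) are our assumptions:

```python
def average_beliefs(distributions):
    """Element-wise mean of several equally long probability vectors."""
    count = len(distributions)
    return [sum(values) / count for values in zip(*distributions)]

def hardmax(distribution):
    """Return the 1-based digit with the largest probability (first digit wins ties)."""
    return max(range(len(distribution)), key=distribution.__getitem__) + 1

# Hypothetical balanced probabilities for one cell, taken from its three constraints.
from_row = [0.0, 0.5, 0.5, 0.0]
from_col = [0.5, 0.5, 0.0, 0.0]
from_box = [0.0, 0.5, 0.5, 0.0]

belief = average_beliefs([from_row, from_col, from_box])
digit = hardmax(belief)

print([round(p, 3) for p in belief])  # [0.167, 0.5, 0.333, 0.0]
print(digit)                          # 2
```

Averaging keeps the result a probability distribution when the three inputs each sum to 1, which is one reason it is a convenient (if hand-wavy) choice.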
4.0 Results
The training run jumps around quite a lot but it works lol.
Lots are solved and lots are also partially solved like the one below.
According to the paper’s original authors, we should see close to 100% accuracy all the time. They didn’t publish their code, so we take this with a grain of salt.
We could also have misinterpreted some things lol.
References
Moon, T., Gunther, J., & Kupin, J. (2009). Sinkhorn Solves Sudoku. IEEE Transactions on Information Theory, vol. 55, no. 4, pp. 1741-1746. DOI: 10.1109/TIT.2009.2013004.
Inamdar, A. (2025). Beyond the Grid: How I Taught an AI the Subtle Art of Sudoku (Without a Million Identical Puzzles).
Sinkhorn, R., & Knopp, P. (1967). Concerning nonnegative matrices and doubly stochastic matrices. Pacific Journal of Mathematics, vol. 21, no. 2, pp. 343-348. DOI: 10.2140/pjm.1967.21.343.