
Belief Propagation is an Obscure Alternative to Backpropagation for Training Reasoning Models

We code the 2009 paper 'Sinkhorn Solves Sudoku' to understand the Belief Propagation algorithm in Python and C
Quick Intro
LeetArxiv is Leetcode for implementing Arxiv and other research papers.
*We code this paper in C and Python.


Frontmatter for the 2009 paper Sinkhorn Solves Sudoku (Moon, Gunther & Kupin, 2009)

Complete code is available here on Github.

Paper Summary

Sudoku puzzles are a good testbed for reasoning models. This paper showcases Belief Propagation, an alternative to backpropagation built on Sinkhorn balancing, a tool closely tied to Optimal Transport theory.

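The idea is: we can use the Sinkhorn-Knopp algorithm to turn a nonnegative matrix (for example, an integer matrix) into a doubly stochastic matrix, a floating-point matrix whose rows and columns are each probability distributions.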

It’s loosely analogous to performing a softmax, except that it normalizes both rows and columns and needs no derivatives.

1.0 Paper Introduction

The paper Sinkhorn Solves Sudoku (Moon, Gunther & Kupin, 2009)[1] demonstrates the convergence of the Sinkhorn-Knopp algorithm on constrained reasoning problems such as Sudoku puzzles.

The authors utilise the concept of Sinkhorn balancing to iteratively solve puzzles.

Informally, Sinkhorn balancing builds on the idea that a doubly stochastic matrix behaves like a "soft" permutation matrix, which lets one design custom (soft) sorting and assignment algorithms.

Definition of Sinkhorn balancing taken from page 1 of (Moon, Gunther & Kupin, 2009)

More formally, Sinkhorn balancing is an algorithm for projecting a matrix onto the space of doubly stochastic matrices (Moon, Gunther & Kupin, 2009).

2.0 Quick Primer on Sudoku and Sinkhorn-Knopp Algorithm

This section quickly introduces Sudoku to the reader and gives a brief overview of the Sinkhorn-Knopp algorithm.

*The smallest value for a sudoku cell is 1, not zero lol

2.1 Introduction to Sudoku

Sudoku is introduced on Page 1 of (Moon, Gunther & Kupin, 2009)

Standard Sudoku is played on a 9 × 9 grid subdivided into nine 3 × 3 squares. The objective is to fill the grid with the digits 1-9 without repeating any digit within a row, column or square (Sudoku.com, 2025)[2].

Standard Sudoku is called a Sudoku of rank 3 (Cornell Math, 2025)[3].

A Sudoku of rank $n$ is an $n^2 \times n^2$ square grid, subdivided into $n^2$ blocks, each of size $n \times n$, and the numbers used to fill the grid are $1, 2, 3, \ldots, n^2$ (Cornell Math, 2025).
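The rank-n definition translates directly into a solution checker. Here is a minimal Python sketch (the function name `is_valid_solution` is our own, not from the paper or the Cornell page) that tests whether a grid is a completed Sudoku of rank n:

```python
import numpy as np


def is_valid_solution(grid, n: int = 3) -> bool:
    """Return True if `grid` is a completed Sudoku of rank n.

    A rank-n Sudoku is an n^2 x n^2 grid whose rows, columns and n x n
    blocks each contain every digit 1..n^2 exactly once.
    """
    size = n * n
    grid = np.asarray(grid)
    if grid.shape != (size, size):
        return False
    digits = set(range(1, size + 1))
    # A unit of n^2 cells is valid iff its set of values is exactly {1..n^2}.
    for k in range(size):
        if set(grid[k, :].tolist()) != digits or set(grid[:, k].tolist()) != digits:
            return False
    # Check each n x n block, stepping through block corners.
    for br in range(0, size, n):
        for bc in range(0, size, n):
            if set(grid[br:br + n, bc:bc + n].ravel().tolist()) != digits:
                return False
    return True
```

For example, the 4 × 4 grid [[1, 2, 3, 4], [3, 4, 1, 2], [2, 1, 4, 3], [4, 3, 2, 1]] is a valid rank-2 Sudoku, so `is_valid_solution(..., n=2)` returns True.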

2.2 Our Dataset

We use the sapientinc/sudoku-extreme dataset from Hugging Face.

The dataset is split into these NumPy arrays (Inamdar, 2025)[4]:

  • inputs: The starting grid of the puzzle.

  • labels: The solved grid (the “answer key”).

  • puzzle_indices: This acts like a card catalog, telling us where each individual puzzle starts and ends in the big stack of examples.

  • group_indices: It groups together all the augmented versions of the same original puzzle.

We use this script to download and prepare the dataset.

Then we can test to ensure it loaded correctly in Python.

Test Python to show the first puzzle and its solution

and in C

Test C to ensure dataset is loaded

*The dataset needs some processing since some questions are filled with random noise, not zeros. The annotated line shows how to handle this:

Annotated extra steps needed to parse Sudoku dataset
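The annotated fix itself lives in the original code. As a rough Python equivalent, here is a minimal sketch that treats any value outside 1 to 9 as an empty cell and pretty-prints the result. The helper names `clean_puzzle` and `format_grid` and the `offset` parameter are hypothetical; check the dataset's build script for how blanks are actually encoded before relying on this:

```python
import numpy as np


def clean_puzzle(raw, offset: int = 0) -> np.ndarray:
    """Map a flat puzzle of 81 raw values to a 9x9 grid with 0 = empty cell.

    ASSUMPTION (hypothetical encoding): after subtracting `offset`, valid clues
    are the digits 1..9 and anything else (noise, padding, blank markers) is empty.
    """
    grid = np.asarray(raw, dtype=np.int64).reshape(9, 9) - offset
    # Anything outside 1..9 is treated as an empty cell.
    grid[(grid < 1) | (grid > 9)] = 0
    return grid


def format_grid(grid) -> str:
    """Render a 9x9 grid as text, printing '.' for empty cells."""
    lines = []
    for r, row in enumerate(grid):
        if r in (3, 6):
            lines.append("------+-------+------")
        cells = [str(v) if v else "." for v in row]
        lines.append(" | ".join(" ".join(cells[i:i + 3]) for i in (0, 3, 6)))
    return "\n".join(lines)
```

Printing `format_grid(clean_puzzle(inputs[0]))` should then show the first puzzle with its blanks as dots, assuming the guessed encoding matches the data.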

2.3 Introduction to the Sinkhorn-Knopp Algorithm and Sinkhorn Balancing

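Sinkhorn balancing, commonly known as the Sinkhorn-Knopp algorithm (Sinkhorn & Knopp, 1967)[5], is a technique for scaling a nonnegative matrix into a doubly stochastic matrix. Doubly stochastic matrices are not the same as permutation matrices: permutation matrices are the special doubly stochastic matrices whose entries are all 0 or 1, and every doubly stochastic matrix is a weighted average of permutation matrices (the Birkhoff-von Neumann theorem).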

Sinkhorn Balancing described on page 2 of (Moon, Gunther & Kupin, 2009)

A doubly stochastic matrix is a matrix with non-negative elements whose rows each sum to 1 and whose columns each sum to 1 (Moon, Gunther & Kupin, 2009).

Matrix balancing techniques find a doubly stochastic diagonal scaling of a square nonnegative matrix A, that is, positive diagonal matrices D1 and D2 such that D1 A D2 is doubly stochastic (Knight & Ruiz, 2013)[6].
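Written out, the scaling problem and the alternating Sinkhorn-Knopp updates look like this. This is the standard textbook form in our own notation (the scaling vectors r and c are our labels, not the paper's):

```latex
\begin{aligned}
&\text{Find } r, c > 0 \text{ such that } P = \operatorname{diag}(r)\, A \,\operatorname{diag}(c)
  \text{ satisfies } P\mathbf{1} = \mathbf{1},\; P^{\top}\mathbf{1} = \mathbf{1}. \\
&\text{Starting from } c^{(0)} = \mathbf{1}, \text{ alternate:} \\
&r^{(k+1)}_i = \frac{1}{\sum_j A_{ij}\, c^{(k)}_j}, \qquad
 c^{(k+1)}_j = \frac{1}{\sum_i A_{ij}\, r^{(k+1)}_i}.
\end{aligned}
```

Since $P_{ij} = r_i A_{ij} c_j$, each r-update makes every row sum exactly 1, and each c-update makes every column sum exactly 1 but may disturb the rows again, which is why the two steps are repeated.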

Important definitions taken from (Sinkhorn & Knopp, 1967)

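If A is a non-negative square matrix, A is said to have total support if A ≠ 0 and every positive element of A lies on a positive diagonal. Here a diagonal is a set of entries a_{1σ(1)}, ..., a_{nσ(n)} picked out by a permutation σ, and it is positive if all of those entries are positive.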

*NOTE: Sinkhorn & Knopp (1967) show that the alternating row and column normalisation converges to a doubly stochastic matrix of the form D1 A D2 if and only if A has total support.

A thorough description of total support is given in (Gnarls, 2024)[7].

Summary of total support taken from (Gnarls, 2024)
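To make the definition concrete, here is a brute-force Python sketch (the name `has_total_support` is our own) that tests exactly the condition above: A is nonzero and every positive entry lies on some positive diagonal. It enumerates all n! permutations, so it is only practical for tiny matrices:

```python
from itertools import permutations

import numpy as np


def has_total_support(A) -> bool:
    """Brute-force total-support test for a small nonnegative square matrix."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if A.shape != (n, n) or (A < 0).any() or not (A > 0).any():
        return False
    # Collect every entry that lies on at least one positive diagonal.
    covered = set()
    for sigma in permutations(range(n)):
        if all(A[k, sigma[k]] > 0 for k in range(n)):
            covered.update((k, sigma[k]) for k in range(n))
    # Total support: every positive entry must be covered.
    positive = {(int(i), int(j)) for i, j in zip(*np.nonzero(A))}
    return positive <= covered
```

The upper-triangular matrix [[1, 1], [0, 1]] fails the test because its (0, 1) entry lies on no positive diagonal. It still has a positive main diagonal, and Sinkhorn iterations on it drive the (0, 1) entry towards 0, so they approach the identity even though no diagonal scaling D1 A D2 ever reaches it.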

2.4 Coding the Sinkhorn-Knopp Algorithm

The Sinkhorn-Knopp algorithm is fundamental to Belief Propagation as used here. We alternately divide each row by its sum and each column by its sum until every row and every column sums to 1, so every entry ends up between 0 and 1.

Think of it as performing a softmax of sorts.

We covered the fine details of implementing the Sinkhorn-Knopp algorithm in Chapter 2, so we’ll just lift the gist code lol.

C Implementation of Sinkhorn-Knopp taken from Chapter 2
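The gist above is in C. For readers following along in Python, here is a minimal NumPy sketch of the same alternating row/column normalisation. The function name, iteration cap and tolerance are our own choices, not taken from the Chapter 2 code, and it assumes A has no all-zero rows or columns:

```python
import numpy as np


def sinkhorn_knopp(A, max_iters: int = 1000, tol: float = 1e-9) -> np.ndarray:
    """Scale a nonnegative square matrix towards a doubly stochastic one.

    Alternately divides each row by its row sum and each column by its
    column sum. Assumes no all-zero rows or columns (else division by zero).
    """
    P = np.asarray(A, dtype=float).copy()
    for _ in range(max_iters):
        P /= P.sum(axis=1, keepdims=True)  # rows now sum to 1
        P /= P.sum(axis=0, keepdims=True)  # columns now sum to 1
        # Columns are exact after the last step, so only the rows can be off.
        if np.abs(P.sum(axis=1) - 1.0).max() < tol:
            break
    return P
```

For example, `sinkhorn_knopp([[1, 2], [3, 4]])` returns a matrix whose rows and columns all sum to 1 within the tolerance. Because every step is a diagonal scaling, the cross ratio P[0,0]·P[1,1] / (P[0,1]·P[1,0]) stays at 1·4 / (2·3) = 2/3, which makes a handy sanity check.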

3.0 Using Belief Propagation to Solve Sudoku

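In general, belief propagation is a message-passing algorithm for computing marginal probabilities on graphical models. Here we use it in a narrower sense: per-cell probability "beliefs" are passed through the Sudoku constraints, and Sinkhorn balancing is what enforces each constraint.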

Sinkhorn Sudoku Solution algorithm pseudocode. Taken from page 2

The paper provides pseudocode for the Sinkhorn Sudoku Solution (SSS) algorithm. Each probability constraint matrix (Q) stacks the digit probability distributions of the cells involved in one constraint: one row per cell and one column per possible digit.

Definition of Constraint Probability Matrix

As described in the paper, there are 27 constraints: 9 row constraints + 9 column constraints + 9 grid subdivision constraints.
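The 27 constraints and their constraint matrices can be sketched in Python as follows. The helper names `constraint_groups` and `constraint_matrix` are hypothetical, and we assume the digit probabilities are stored as an 81 × 9 array (one row per cell, cells numbered row by row):

```python
import numpy as np


def constraint_groups(n: int = 3) -> list:
    """Return the 3 * n^2 constraints of a rank-n Sudoku as lists of flat cell indices.

    Cells are numbered r * n^2 + c. The first n^2 groups are rows, the next
    n^2 are columns and the last n^2 are the n x n blocks.
    """
    size = n * n
    rows = [[r * size + c for c in range(size)] for r in range(size)]
    cols = [[r * size + c for r in range(size)] for c in range(size)]
    blocks = [
        [(br + dr) * size + (bc + dc) for dr in range(n) for dc in range(n)]
        for br in range(0, size, n)
        for bc in range(0, size, n)
    ]
    return rows + cols + blocks


def constraint_matrix(probs: np.ndarray, group: list) -> np.ndarray:
    """Stack the digit distributions of one constraint's cells into Q.

    probs[k] is the distribution over digits 1..n^2 for cell k, so Q has one
    row per cell in the group and one column per digit (a copy, via fancy indexing).
    """
    return probs[group, :]
```

Every cell lands in exactly three groups (its row, its column and its block), which is what the update step later relies on.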

We use Sinkhorn balancing to find the solution this way:

  1. For each of the 81 cells, find a probability distribution for the possible digits from 1 through 9.

Probability distribution described by the authors.
Sample C code to find probabilities as described in the paper. We check horizontally, vertically and within 3*3 block
  2. Find all 27 probability constraint matrices (Q), each of dimension N × N, by stacking the probability distributions from step 1.

    Finding a constraint matrix is stacking all the probability matrices for the cells involved in a single constraint
  3. Perform Sinkhorn balancing on all 27 constraint matrices.

    Balance all 27 constraints
  4. Update the Sudoku grid using the new probabilities.

    • The paper doesn’t tell us how to update probabilities, so here the implementation becomes somewhat hand-wavy lol. Each cell belongs to three constraints, so we sum the three balanced probability vectors it receives, then average them.

    • We take the hardmax (argmax) of the averaged distribution as the new cell symbol.

    • We update our cell probabilities every iteration.
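Putting steps 1 to 4 together, here is a compact Python sketch of the whole loop. It is our reading of the pseudocode, not the authors' code (they didn't publish any): the 1e-9 smoothing, keeping the clues fixed, the 50 inner Sinkhorn sweeps and averaging the three balanced distributions per cell are all assumptions, and every helper name is hypothetical:

```python
import numpy as np

N = 9  # rank-3 Sudoku: 9 digits, 81 cells, 27 constraints


def constraint_groups():
    """The 27 constraints (9 rows, 9 columns, 9 blocks) as lists of flat cell indices."""
    rows = [[r * N + c for c in range(N)] for r in range(N)]
    cols = [[r * N + c for r in range(N)] for c in range(N)]
    blocks = [[(br + dr) * N + bc + dc for dr in range(3) for dc in range(3)]
              for br in range(0, N, 3) for bc in range(0, N, 3)]
    return rows + cols + blocks


GROUPS = constraint_groups()
DIGITS = set(range(1, N + 1))


def initial_probs(grid):
    """Step 1: one-hot for clues, uniform over the digits still allowed for blanks."""
    probs = np.zeros((N * N, N))
    for cell in range(N * N):
        r, c = divmod(cell, N)
        if grid[r, c]:
            probs[cell, grid[r, c] - 1] = 1.0
            continue
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used = (set(grid[r, :].tolist()) | set(grid[:, c].tolist())
                | set(grid[br:br + 3, bc:bc + 3].ravel().tolist()))
        allowed = [d - 1 for d in DIGITS if d not in used]
        if allowed:
            probs[cell, allowed] = 1.0 / len(allowed)
    return probs


def sinkhorn(Q, iters=50):
    """Steps 2-3: balance one 9x9 constraint matrix (rows = cells, cols = digits)."""
    Q = Q + 1e-9  # strictly positive => total support, so balancing is well defined
    for _ in range(iters):
        Q = Q / Q.sum(axis=1, keepdims=True)
        Q = Q / Q.sum(axis=0, keepdims=True)
    return Q


def is_solved(pred):
    """True if every row, column and block holds each digit 1..9 once."""
    flat = pred.ravel()
    return all(set(flat[g].tolist()) == DIGITS for g in GROUPS)


def solve(grid, outer_iters=100):
    grid = np.asarray(grid, dtype=int)
    clues = grid.ravel() > 0
    probs = initial_probs(grid)
    clue_probs = probs[clues].copy()
    pred = grid.copy()
    for _ in range(outer_iters):
        total = np.zeros_like(probs)
        for g in GROUPS:
            total[g] += sinkhorn(probs[g])  # balance each constraint matrix Q
        # Step 4 (our reading): average the 3 balanced distributions per cell.
        probs = total / 3.0
        probs[clues] = clue_probs  # ASSUMPTION: keep the givens fixed
        pred = probs.argmax(axis=1).reshape(N, N) + 1  # hardmax -> digit 1..9
        if is_solved(pred):
            break
    return pred
```

On easy puzzles where step 1 already pins every blank down, this returns the solution after one outer iteration; on harder puzzles it may, like the results below, only solve part of the grid.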

4.0 Results

The solver's accuracy jumps around quite a lot from iteration to iteration, but it works lol.

100% prediction accuracy on some puzzles

Many puzzles are solved outright, and many others are only partially solved, like the one below.

Almost 100% prediction accuracy on some tasks

According to the paper’s original authors, we should see close to 100% accuracy all the time. They didn’t publish their code, so we take this with a grain of salt.

We could also have misinterpreted some things lol.

5.0 Further Reading

You might like:

  1. Target Propagation: A Biologically Plausible Neural Network Training Algorithm

  2. Sinkhorn Knopp 1967 paper


References

1

Moon, T., Gunther, J., & Kupin, J. (2009). Sinkhorn Solves Sudoku. IEEE Transactions on Information Theory, vol. 55, no. 4, pp. 1741-1746. doi: 10.1109/TIT.2009.2013004.

2

Sudoku.com. (2025). Sudoku Rules for Complete Beginners. Link.

3

Cornell University Department of Mathematics. (2025). The Math Behind Sudoku. Link.

4

Inamdar, A. (2025). Beyond the Grid: How I Taught an AI the Subtle Art of Sudoku (Without a Million Identical Puzzles). Link.

5

Sinkhorn, R., & Knopp, P. (1967). Concerning nonnegative matrices and doubly stochastic matrices. Pacific Journal of Mathematics Vol. 21 (1967), No. 2, 343–348. DOI: 10.2140/pjm.1967.21.343.

6

Knight, P., & Ruiz, D. (2013). A fast algorithm for matrix balancing. IMA Journal of Numerical Analysis, 33 (3). pp. 1029-1047. ISSN 0272-4979. Link.

7

Gnarls. (2024). What does it mean for a matrix to have total support? Mathematics Stack Exchange. Link.
