Target Propagation: A Biologically Plausible Neural Network Training Algorithm
An Alternative to Backpropagation Founded on Targets, Not Gradients
Quick Intro
LeetArxiv is Leetcode for implementing Arxiv papers.

The code for this article is available on GitHub.
1.0 Introduction
Gradient-based learning algorithms like backpropagation and conjugate gradient methods are biologically implausible. Biologically-inspired alternatives1 to gradient-based learning include the forward-forward algorithm (Hinton, 2022)2, NEAT, or Neuro-Evolution of Augmenting Topologies (Stanley & Miikkulainen, 2002)3, equilibrium propagation (Scellier & Bengio, 2016)4, direct feedback alignment (Nøkland, 2016)5 and NoProp (Li et al., 2025)6.
This article focuses on target propagation, a biologically plausible alternative to backpropagation introduced in (Bengio, 2014)7 and improved upon as difference target propagation in (Lee et al., 2015)8.
Target propagation is built on the thesis that auto-encoders are good at reconstruction, and that this reconstruction ability can be used to learn the computation that backpropagation normally performs.
2.0 Forward Pass
Target propagation permits training deep networks with long-term dependencies or strong non-linearities, such as a composition of tanh layers (Lee et al., 2015). Such networks are constrained neither by depth nor by vanishing or exploding gradients.
Each layer of our neural network resembles part of a denoising auto-encoder (DAE). DAEs are trained to find meaningful representations of data by learning to remove noise from corrupted inputs (Vincent et al., 2008)9.
3.0 Backward Pass
The backward pass propagates targets instead of gradients. Learning occurs in target propagation by using a learned approximate inverse of each layer to turn the target of the layer above into a target for the layer below.
As mentioned in (Lee et al., 2015), each layer's target is chosen to be a value close to its current activation that, if reached, would hopefully lead to a lower global loss.
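In the notation of (Lee et al., 2015), if h_i is the activation of layer i, ĥ_i its target, and g_i the learned approximate inverse of the layer's forward mapping, the difference target propagation rule is

ĥ_i = h_i + g_i(ĥ_{i+1}) - g_i(h_{i+1}),

where the subtracted term corrects for g_i being only an approximate inverse. This is the rule the compute_targets method below implements.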
In Python, an inverse layer resembles a forward layer. This is the structure of a single layer:
import torch.nn as nn
import torch.nn.functional as F

class LinearWithInverse(nn.Module):
    """Linear layer with a learned approximate inverse"""
    def __init__(self, in_features, out_features):
        super(LinearWithInverse, self).__init__()
        # Forward mapping f_i: layer i -> layer i+1
        self.forward_layer = nn.Linear(in_features, out_features)
        # Learned approximate inverse g_i: layer i+1 -> layer i
        self.inverse_layer = nn.Linear(out_features, in_features)

    def forward(self, x):
        return self.forward_layer(x)

    def inverse(self, y):
        return self.inverse_layer(y)
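The training loop shown later in this article does not train the inverse mappings with a separate objective, but in (Bengio, 2014) and (Lee et al., 2015) the inverse (feedback) mapping is trained with a reconstruction loss on noise-corrupted activations, much like the denoising auto-encoders mentioned above. A minimal sketch of such an inverse-training step, assuming the LinearWithInverse layer defined above and an illustrative noise level sigma and step size lr, might look like this:
import torch
import torch.nn.functional as F

def train_inverse_step(layer, h, sigma=0.1, lr=0.01):
    """One reconstruction step for a layer's inverse mapping (illustrative sketch)."""
    # Corrupt the layer input, push it through the forward mapping...
    h_noisy = h + sigma * torch.randn_like(h)
    out = F.relu(layer.forward_layer(h_noisy))
    # ...and ask the inverse to reconstruct the corrupted input
    reconstruction = layer.inverse(out)
    loss = F.mse_loss(reconstruction, h_noisy)
    # Update only the inverse layer's parameters
    grads = torch.autograd.grad(loss, list(layer.inverse_layer.parameters()))
    with torch.no_grad():
        for p, g in zip(layer.inverse_layer.parameters(), grads):
            p -= lr * g
    return loss.item()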
The improved Difference Target Propagation network architecture has this structure:
class DTPNetwork(nn.Module):
    """Network for Difference Target Propagation"""
    def __init__(self, layer_sizes):
        super(DTPNetwork, self).__init__()
        self.layers = nn.ModuleList()
        # Create layers with forward and inverse mappings
        for i in range(len(layer_sizes) - 1):
            self.layers.append(LinearWithInverse(layer_sizes[i], layer_sizes[i + 1]))

    def forward(self, x):
        activations = [x]
        for layer in self.layers:
            x = F.relu(layer(x))
            activations.append(x)
        return activations

    def compute_targets(self, activations, labels, learning_rate=0.1):
        targets = [None] * len(activations)
        top_layer = len(activations) - 1
        # Top-layer target: a small step from the current output toward the label
        targets[top_layer] = activations[top_layer] + learning_rate * (labels - activations[top_layer])
        # Propagate targets downward with the difference correction:
        # h_hat_i = h_i + g_i(h_hat_{i+1}) - g_i(h_{i+1})
        for i in range(top_layer - 1, 0, -1):
            targets[i] = (activations[i]
                          + self.layers[i].inverse(targets[i + 1])
                          - self.layers[i].inverse(activations[i + 1]))
        return targets
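As a quick sanity check, here is a minimal usage sketch. The layer sizes, batch size, and random inputs are arbitrary choices for illustration:
import torch
import torch.nn.functional as F

# Hypothetical 784-256-10 network on a batch of 32 flattened 28x28 images
model = DTPNetwork([784, 256, 10])
x = torch.randn(32, 784)
labels = F.one_hot(torch.randint(0, 10, (32,)), num_classes=10).float()

activations = model(x)   # [input, hidden, output]: shapes (32, 784), (32, 256), (32, 10)
targets = model.compute_targets(activations, labels, learning_rate=0.1)
# targets[0] is None (the input gets no target); the other targets match
# the hidden and output activation shapes
print([None if t is None else tuple(t.shape) for t in targets])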
4.0 Training the Network
Training via target propagation is computationally more expensive than backpropagation. A local gradient is still computed for each layer's own loss, but that gradient is never shared across layers. Here is the training function:
def train_dtp(model, train_loader, optimizer, epochs=10, learning_rate=0.1):
    model.train()
    for epoch in range(epochs):
        total_loss = 0
        for batch_idx, (data, target) in enumerate(train_loader):
            # Flatten the image
            data = data.view(data.size(0), -1)
            # Forward pass to get activations
            activations = model(data)
            # Convert target to one-hot encoding
            target_onehot = F.one_hot(target, num_classes=10).float()
            # Compute targets using DTP
            targets = model.compute_targets(activations, target_onehot, learning_rate)
            # The optimizer is only used to clear gradients; updates are manual
            optimizer.zero_grad()
            # Compute a local loss for each layer and update that layer alone
            for i in range(1, len(activations)):
                # Layer-specific loss: drive the activation toward its target
                layer_loss = F.mse_loss(activations[i], targets[i])
                # Backward pass for this layer's loss only
                layer_loss.backward(retain_graph=True)
                # Update only the current layer's parameters
                for param in model.layers[i - 1].parameters():
                    if param.grad is not None:
                        param.data -= learning_rate * param.grad
                        param.grad.zero_()
                total_loss += layer_loss.item()
            if batch_idx % 100 == 0:
                print(f'Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)}'
                      f' ({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {layer_loss.item():.6f}')
        print(f'Epoch: {epoch}, Average Loss: {total_loss / len(train_loader.dataset):.6f}')
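For completeness, a minimal driver for this function might look like the sketch below. The MNIST transform, batch size, and SGD optimizer are illustrative assumptions rather than choices prescribed by the papers:
import torch
import torch.optim as optim
from torchvision import datasets, transforms

# Hypothetical setup: 28x28 MNIST digits flattened into a 784-256-10 DTP network
train_set = datasets.MNIST('./data', train=True, download=True, transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

model = DTPNetwork([784, 256, 10])
# train_dtp only uses the optimizer to zero gradients; parameter updates are manual
optimizer = optim.SGD(model.parameters(), lr=0.1)

train_dtp(model, train_loader, optimizer, epochs=10, learning_rate=0.1)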
5.0 Results
On CPU, this implementation takes about 5 minutes to reach 39% accuracy on MNIST. Target propagation is extremely slow compared to backpropagation, which is a major reason it failed to go mainstream after 2015.
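The accuracy figure comes from a standard evaluation loop. A minimal sketch, assuming a test_loader built the same way as the hypothetical train_loader above:
import torch

def evaluate(model, test_loader):
    """Classification accuracy from the argmax of the top-layer activation."""
    model.eval()
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data = data.view(data.size(0), -1)
            output = model(data)[-1]  # top-layer activation
            correct += (output.argmax(dim=1) == target).sum().item()
    return correct / len(test_loader.dataset)

print(f'Test accuracy: {evaluate(model, test_loader):.2%}')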
Footnotes
Penkovsky. (2019). Are There Alternatives to Backpropagation? StackOverflow.
Hinton, G. (2022). The Forward-Forward Algorithm: Some Preliminary Investigations. ArXiv. https://doi.org/10.48550/arXiv.2212.13345
Stanley, K., & Miikkulainen, R. (2002). Evolving Neural Networks Through Augmenting Topologies. Evolutionary Computation, 10(2), 99–127. doi:10.1162/106365602320169811
Scellier, B., & Bengio, Y. (2016). Equilibrium Propagation: Bridging the Gap Between Energy-Based Models and Backpropagation. ArXiv. https://doi.org/10.48550/arXiv.1602.05179
Nøkland, A. (2016). Direct Feedback Alignment Provides Learning in Deep Neural Networks. ArXiv. https://doi.org/10.48550/arXiv.1609.01596
Li, Q., Teh, Y., & Pascanu, R. (2025). NoProp: Training Neural Networks without Back-propagation or Forward-propagation. ArXiv. https://doi.org/10.48550/arXiv.2503.24322
Bengio, Y. (2014). How Auto-Encoders Could Provide Credit Assignment in Deep Networks via Target Propagation. ArXiv. https://doi.org/10.48550/arXiv.1407.7906
Lee, D.-H., Zhang, S., Fischer, A., & Bengio, Y. (2015). Difference Target Propagation. ArXiv. https://doi.org/10.48550/arXiv.1412.7525
Vincent, P., Larochelle, H., Bengio, Y., & Manzagol, P.-A. (2008). Extracting and Composing Robust Features with Denoising Autoencoders. ICML 2008.