Alex Sanchez-Stern, Abhishek Varghese, Zhanna Kaufman,
Dylan Zhang, Talia Ringer, Yuriy Brun
Automating Formal Verification with Reward-Free Reinforcement Learning
writing proofs about programs!
more reliability
than testing!
requires specialized
expertise
labor
intensive
Labor saving!
2015
2020
2025
# of proof synthesis tools
Time
Theorem Prover
...
...
Predictor
Search
...
Unbounded Search Space
There are some techniques that can
help us prune the search tree
It's hard to explore when you don't know how close you are!
forall n,
n + 1 > n
Medium
forall n,
False
Hard
n = n
Easy
26% Shorter Proofs
in 27% Fewer Steps*
* Than a search without state scoring
Theorem Prover
...
...
Predictor
Search
"Reward-free Reinforcement Learning"
In particular, V-learning
S1
S2
S3
S5
\(\gamma\) = time discount
finding rewards sooner is better!
S4
A
B
C
Generalizes to
...
Action
State
State
Action
Goal
...
...
Goal
Goal
Goal
Action
Goal
Goal
Goal
Goal
Goal
Adding extra rewards can lead to bad behavior
induction x
induction x
induction x
Inductive Case
Base Case
Reward-free doesn't have this problem!
A new update equation that accounts for
the branching structure of proofs
Assumptions:
S1
S2
S3
S4
S1
S2
S3
S4
Assumption: The state of a completed proof has value 1
G1
G2
G3
V(G1) = ???
\(m\) = Steps to complete proof from G2
\(n\) = Steps to complete proof from G3
Steps to complete proof from G1 = \( m + n + 1 \)
Goal
Action
Since state values are always < 1,
generating more subgoals always has a cost, even if they are solved immediately
S1
S2
But within a subgoal, the agent learns as if it was a whole proof!
S3
S4
26% Shorter Proofs
in 27% Fewer Steps
124 Coq Projects
68,501 Theorems
85/15 train-test split
QEDCartographer, except without state scoring-based search
Uses a variant of depth-first search instead
27% faster
than Proverbot9001
26% Shorter Proofs
than Proverbot9001
Shorter proofs are:
Proves slightly more theorems, and proves complementary theorems
Automating Formal Verification with Reward-Free Reinforcement Learning
Uses a new V-value equation for branching goal structure
Makes producing verified-correct code easier and faster
Preprint available at alexsanchezstern.com