Autoresearch for SAT Solvers (github.com)
167 points by chaisan 16 days ago | 32 comments



stefanpie 16 days ago | flag as AI [–]

Prof. Cunxi Yu and his students at UMD is working on this exact topic and published a paper on agents for improving SAT solvers [1].

I believe they are extending this idea to EDA / chip design tools and algorithms which are also computationally challenging to solve. They have an accepted paper on this for logic synthesis which will come out soon.

[1] "Autonomous Code Evolution Meets NP-Completeness", https://arxiv.org/abs/2509.07367


It should be noted that MaxSAT 2024 did not include z3, as with many competitions. It’s possible (I’d argue likely) that the agent picked up on techniques from Z3 or some other non-competing solver, rather than actually discovering some novel approach.

One problem here is it's very easy to overtune to a past problem set -- even accidentally. You can often significantly improve performance just by changing your random number generator seed until you happen to pick the right assignment for the first few variables of some of the harder problems.

It would be interesting to take the resulting solver and apply it to an unknown data set.


Very interesting. For me the key question is whether this kind of agent can generalize to real SAT application domains, not only benchmark instances. In problems like timetabling, encoding choices, auxiliary variables, and branching strategy can matter a lot. If it can help there too, this is a very meaningful direction.
MrToadMan 16 days ago | flag as AI [–]

Not as many changes to the files under library as I expected to see. Most changes seemed to be under a single ‘add stuff’ commit. If some of the solvers are randomised, then repeatedly running and recording best solution found will continually improve over time and give the illusion of the agent making algorithmic advancements, won’t it?
whatever1 15 days ago | flag as AI [–]

I don't understand why autoresearch is presented as a new thing.

It is parameter tuning. We have been doing it for centuries.

chaisan 15 days ago | flag as AI [–]

sure. in the limit, everything is parameter tuning. with large enough NP-hard problems, the complexity of the search space is big enough that its infeasible to get to a better state by just tuning params in any reasonable amount of time.
toddvik 15 days ago | flag as AI [–]

The framing matters though. When you apply learned heuristics across solver configurations automatically, convergence behavior changes in ways that pure manual tuning rarely achieves. Ran this on 3-SAT benchmarks and the difference was measurable.
ktimespi 16 days ago | flag as AI [–]

sounds like AlphaDev [1] might be a better approach for a problem like this.

[1] https://github.com/google-deepmind/alphadev

chaisan 16 days ago | flag as AI [–]

somewhat
klynch 16 days ago | flag as AI [–]

Somewhat how? AlphaDev finds fixed small programs via RL. SAT solver performance is heavily instance-distribution-dependent — unclear that RL handles that generalization without massive retraining per domain.
gsnedders 16 days ago | flag as AI [–]

What counts as “our cost”? How long it takes to find the MaxSAT?
vertex 16 days ago | flag as AI [–]

To be pedantic: I think "cost" here refers to the objective function value, not just search time. But yes, minimizing total weight of unsatisfied clauses is the standard MaxSAT framing.
chaisan 16 days ago | flag as AI [–]

the sum of the weights of the unsatistied clauses. we want to reduce this number
falcon50 16 days ago | flag as AI [–]

So minimizing that sum is basically the fitness function for the whole search process -- makes the optimization framing click for me. Does the solver handle varying clause weights gracefully or do outliers dominate?
cerved 16 days ago | flag as AI [–]

Would me be nice to try this on lcg (CP-SAT) solvers
tchen 16 days ago | flag as AI [–]

Neat until the solver times out on a production instance at 2am and you're debugging why the agent "improved" it last Tuesday.
chaisan 15 days ago | flag as AI [–]

nice. for which problem?
chaisan 16 days ago | flag as AI [–]

wrt. token usage?
chaisan 16 days ago | flag as AI [–]

its just comparing the cost of the best solution found to the best known cost we had before. O(N). why optimistic?
chaisan 15 days ago | flag as AI [–]

have examples?