Last week I got together with a friend who is a fellow math alum. We cracked some beers, chatted with ChatGPT in voice mode, toyed around with the Collatz conjecture, and sent some prompts to a coding agent to build visualizations and simulations. It was a lot of fun directing these agents while we bounced ideas off each other and the models explored them.
With the right problem and the right agentic loop, it's clear to me that improvements will speed up.
> As they did so, they also learned how to improve the prompts they gave AlphaEvolve. One key takeaway: The model seemed to benefit from encouragement. It worked better “when we were prompting with some positive reinforcement to the LLM,” Gómez-Serrano said. “Like saying ‘You can do this’ — this seemed to help. This is interesting. We don’t know why.”
Four of the world's top logical thinkers are acknowledging this. It is mind-blowing, and we don't know why.
Mathematics seems like the ideal candidate for AIs to achieve absurd results. It's a purely abstract grammar with true auto-verifiability. Even SWE has the requirement of interacting with real physical things. In math there's no external feedback required, you're solely bounded by the rate and quality of token generation.
Giving it the date helps for calendar math and "how long ago" questions, but that's not really what people mean. The harder problem is the model acting like its training cutoff is "now" — confidently reasoning about current events or the state of a field as if nothing happened since. Knowing today's date doesn't fix that gap.
Technically "discerning time" and "knowing how much real-world time has passed since training" are different problems. The first is mostly handled within a context window already. The second is really just a knowledge cutoff issue, not a temporal perception one. Though I guess the distinction doesn't matter much practically.
There are several high-value prizes for mathematical research. Let me know when an "AI" has earned one of them. Otherwise:
> When Ryu asked ChatGPT, “it kept giving me incorrect proofs,” [...] he would check its answers, keep the correct parts, and feed them back into the model
So you had a conversational calculator being operated by an actual domain expert.
> With ChatGPT, I felt like I was covering a lot of ground very rapidly
There's no way to convert that feeling into a measurement of any actual value, and we happen to know that domain experts are surprisingly easy to fool outside their own domains.
All these overly optimistic articles about AI solving maths problems are very annoying.
Can we agree that maths is not about solving problems, but about understanding them by developing a language and the conditions for new insights? It is misleading because GPTs do provide easy access to new information, but they do not deepen understanding.
I think AI-assisted research will likely have a very negative net impact on mathematics in the long run by lowering the average level of understanding within the community.
Also, research directions are influenced by what people can solve, and this will slowly shift research toward purely algebraic/symbolic manipulations that mathematicians no longer fully keep track of.
It's highly dependent on why you use it. For me, a problem looks like "a step in the proof I'm not familiar with", and I use LLMs to help me understand it deeply: make visualizations, check a difficult step, draw parallels with something else I know, etc.
I don't really care whether the LLM could "solve the global problem I'm facing".
I use it more for insights on smaller parts, to get through difficult steps and to teach me areas I'm not familiar with.
The more capable the LLM is of doing complicated proofs by itself, the more I can trust it to help me without making errors that I could miss in areas of maths I don't know.
Boring mathematical reality here. This is nice and all, but as a (part-time) corporate mathematician, I'd like an AI that organises conference trips, picks the best accommodation and food, and gaslights the execs into approving it. Then fixes the perpetually broken coffee machine. Everything else for me starts on paper and is mostly undergrad-level problems, which I need to do by hand to keep my brain going for when I might actually need it one day. And with the geopolitical instability out there at the moment, I'm not that willing to put my eggs into that basket.
AI outputting axiomatically valid syntax isn't going to be all that useful. It's possible to generate all axiomatically correct math with a for loop until the machine OOMs.
The Dyson Sphere analogy cuts the wrong way here. The useful part isn't generating valid syntax — it's that AI can search proof space in ways humans can't. We spent months stuck on a formalization problem at work; a model found a lemma we'd overlooked in two hours. That's not a for loop.