What I love about the FunSearch paper is how simple the core iteration loop is:
generate => evaluate => best-shot => generate => ...
I imagine this is similar to how a lot of people ideate with LLMs (one difference being that FunSearch maintains a population of programs to avoid getting stuck at local optima).
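A minimal single-population sketch of that loop (the `generate` and `evaluate` functions here are toy stand-ins for the LLM call and the scorer, and this simplifies away the paper's actual island-based population model):

```python
import random

random.seed(0)

def generate(parent):
    # Stand-in for an LLM call: perturb the best program/solution so far.
    return parent + random.uniform(-1.0, 1.0)

def evaluate(candidate):
    # Toy objective: higher score the closer the candidate is to 3.0.
    return -abs(candidate - 3.0)

def funsearch_loop(iterations=200, population_size=5):
    # Keep a population of candidates rather than a single best,
    # so one bad run doesn't trap the search at a local optimum.
    population = [random.uniform(-10, 10) for _ in range(population_size)]
    for _ in range(iterations):
        parent = max(population, key=evaluate)   # best-shot selection
        child = generate(parent)                 # generate
        ranked = sorted(population + [child], key=evaluate, reverse=True)
        population = ranked[:population_size]    # evaluate + keep the best
    return max(population, key=evaluate)

best = funsearch_loop()
```

With a real LLM in place of `generate`, the parent would be spliced into the prompt ("here is the best program so far; improve it"), which is the best-shot part of the loop.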
As they call out, the biggest challenge is establishing an effective evaluator, especially for multi-step tasks (how do you do credit assignment across steps?).
I'd be interested to see how a FunSearch-style strategy would work on problems like those tackled in OpenAI's "Let's Verify Step by Step" paper: rather than training the model directly, have it optimize strategies for each step of a math problem...