Note

FunSearch

What I love about the FunSearch paper is how simple the core iteration loop is:


generate => evaluate => best-shot => generate => ...
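To make the loop concrete, here is a minimal sketch in Python. It is a toy, not the paper's implementation: `generate` stands in for the LLM call (here it just perturbs the current favorites with random noise), and `evaluate` is a made-up scoring function with its peak at 3. All names and parameters are assumptions for illustration.

```python
import random

def evaluate(candidate):
    # Toy evaluator (assumption): higher is better, with the peak at 3.0.
    return -(candidate - 3.0) ** 2

def generate(best_shots, rng, n=5):
    # Stand-in for the LLM call: propose n variations of the current favorites.
    return [rng.choice(best_shots) + rng.gauss(0, 0.5) for _ in range(n)]

def search(rounds=50, keep=2, seed=0):
    rng = random.Random(seed)
    best_shots = [0.0]  # initial candidate
    for _ in range(rounds):
        # generate => evaluate => keep the best as the next "best-shot" prompt
        candidates = best_shots + generate(best_shots, rng)
        candidates.sort(key=evaluate, reverse=True)
        best_shots = candidates[:keep]
    return best_shots[0]

result = search()
```

Because the current favorites are always carried into the next round, the best score can only stay the same or improve from one round to the next.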

I imagine this is similar to how a lot of people ideate with LLMs:

  • generate 5 ideas
  • "my favorites are #2 and #4, please generate 5 more"
  • ...

(one difference being that FunSearch keeps a population of candidate programs, split into separate islands, rather than a single running favorite, to avoid getting stuck in local optima)
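A rough sketch of the population idea, assuming an island-style scheme: several small populations evolve independently, and every so often the weakest islands are restarted from the best member of a stronger one. The objective, the mutation step, and every parameter below are invented for illustration; only the general island idea comes from the paper.

```python
import math
import random

def score(x):
    # Toy objective with several local peaks (assumption, for illustration only).
    return math.sin(3 * x) - 0.1 * (x - 5.0) ** 2

def step(island, rng, keep=2, n=5):
    # One generate => evaluate => select round within a single island.
    children = [rng.choice(island) + rng.gauss(0, 0.3) for _ in range(n)]
    return sorted(island + children, key=score, reverse=True)[:keep]

def island_search(n_islands=4, rounds=40, reset_every=10, seed=0):
    rng = random.Random(seed)
    # Each island starts from its own random point and evolves independently.
    islands = [[rng.uniform(0, 10)] for _ in range(n_islands)]
    for r in range(1, rounds + 1):
        islands = [step(isl, rng) for isl in islands]
        if r % reset_every == 0:
            # Rank islands by their best member, then restart the worse half
            # from the best member of a randomly chosen surviving island.
            islands.sort(key=lambda isl: score(isl[0]), reverse=True)
            half = n_islands // 2
            for i in range(half, n_islands):
                islands[i] = [rng.choice(islands[:half])[0]]
    return max((isl[0] for isl in islands), key=score)

best = island_search()
```

With a single population, an early lucky candidate can take over and pin the search to one peak; separate islands let several peaks be explored at once before the weaker ones are recycled.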


As the authors call out, the biggest challenge is "how do you establish an effective evaluator?", especially for multi-step tasks, where it is unclear how to assign credit to individual steps.


I'd be interested to see how a FunSearch-style strategy would work on problems like those tackled in OpenAI's "Let's Verify Step by Step" paper: rather than training the model directly, have it optimize strategies for each step of a math problem...