Multi-Armed Bandit: Explore and Exploit

TL;DR: The multi-armed bandit problem, which balances exploiting known-best options against exploring potentially better ones, maps well onto navigating BJJ’s thousands of techniques and concepts.

An algorithm is a set of rules to optimize problem-solving. Problem-solving under physical stress is what submission grappling is about.

In computer science, algorithms are finite sequences of instructions. Quicksort, published in 1961, can run two to three times faster than heapsort when implemented well. Good algorithms make systems more efficient.

Wikipedia has a great visualization of Quicksort.

What is your algorithm to sort through the thousands of techniques, concepts and variations in Jiu Jitsu?

The Lapel Guard Encyclopedia by Keenan: 155 individual sections. High-Percentage No-gi Chokes by Lachlan: 86 sections. Danaher’s Enter the System - Leglocks: 121 sections. That’s 362 sections across just three systems.

BJJ Fanatics has ~675 series, each hours long, and releases new ones faster than anyone could keep up.

So how do we sort through what works? How do we avoid dismissing high-value techniques that don’t click immediately? How do we avoid false positives?

Listening to battle-tested pros (coaches, world champions) is a start. Studying competition footage for what works at high rates helps too. But we all differ in body type, ability, and affinity, so at some point we develop our own games.

What do we include and exclude from our A Game?

Enter: The multi-armed bandit.

I learned of this while working at Adobe in Digital Marketing; it’s used for recommendation engines and A/B testing. Named after the slot machine (the one-armed bandit), it plays the best-performing option most of the time while leaving room to try new things.

Exploit – Explore.

Say you Exploit 80% of the time, relying on your go-to options, and Explore 20%, trying new stuff. Those parameters should shift “as the color of our belt darkens”. White and blue belts: mostly exploration, mapping the terrain. Later: exploit and refine what works, while staying open to improvements.

There are dozens of strategies for it. Four of them:

Epsilon-greedy: Select the best-known option with probability 1-ε and a random option with probability ε, where ε ≈ 10%. BJJ: 90% of the time, kick their butt with what you know. 10% of the time, play around.
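
For the programmers in the room, here is a minimal sketch of epsilon-greedy in Python. The technique names and success rates are made up for illustration, and the function names `choose` and `update` are my own, not from any library. It assumes each attempt either works or doesn't, and it keeps a running success rate per technique.

```python
import random

def choose(estimates, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.choice(list(estimates))        # explore: any technique
    return max(estimates, key=estimates.get)      # exploit: best so far

def update(estimates, counts, technique, worked):
    """Fold one result into the running success rate for that technique."""
    counts[technique] += 1
    estimates[technique] += (float(worked) - estimates[technique]) / counts[technique]

# Made-up "true" success rates, only there to drive the simulation.
true_rates = {"armbar": 0.35, "triangle": 0.25, "berimbolo": 0.05}
estimates = {name: 0.0 for name in true_rates}
counts = {name: 0 for name in true_rates}

rng = random.Random(0)
for _ in range(2000):
    technique = choose(estimates, epsilon=0.1, rng=rng)
    worked = rng.random() < true_rates[technique]
    update(estimates, counts, technique, worked)

print(counts)      # how often each technique got tried
print(estimates)   # estimated success rate per technique
```

With ε fixed at 10%, the low-percentage moves keep getting an occasional try, which is the point: the estimates stay honest even for techniques you rarely use.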

Epsilon-first: Pure exploration phase, then pure exploitation. BJJ: interesting, but anyone who locked in their game this way in the ’90s would have missed the leg lock, lapel, and rolling back-take revolutions.
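
A rough sketch of epsilon-first, assuming you know up front how many rolls you have (`total_rolls`, a hypothetical parameter) and spend the first ε share of them exploring. The success rates are invented, and in a real run they would be measured during the exploration phase rather than given.

```python
import random

def epsilon_first(estimates, roll_number, total_rolls, epsilon=0.2, rng=random):
    """Explore during the first epsilon * total_rolls rolls, then only exploit."""
    if roll_number < epsilon * total_rolls:
        return rng.choice(list(estimates))        # exploration phase
    return max(estimates, key=estimates.get)      # exploitation phase

# Invented success rates, for illustration only.
estimates = {"armbar": 0.35, "triangle": 0.25, "berimbolo": 0.05}
rng = random.Random(1)
early = epsilon_first(estimates, roll_number=10, total_rolls=1000, rng=rng)
late = epsilon_first(estimates, roll_number=900, total_rolls=1000, rng=rng)
print(early, late)  # early is a random pick; late is always "armbar"
```

Once the exploration window closes, nothing new ever gets tried again, which is exactly the weakness described above.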

Epsilon-decreasing: Start highly exploratory, then gradually become more exploitative. BJJ: been there, done that, no more fancy spinning stuff that doesn’t work anyway.
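
One way to sketch epsilon-decreasing is a decay schedule for ε. The hyperbolic formula and the values of `start`, `floor`, and `decay` below are my own assumptions, not part of any standard definition. The floor keeps a little exploration alive no matter how many rolls you have logged.

```python
import random

def decayed_epsilon(roll_number, start=1.0, floor=0.05, decay=0.01):
    """Epsilon shrinks as rolls pile up, but never drops below the floor."""
    return max(floor, start / (1.0 + decay * roll_number))

def epsilon_decreasing(estimates, roll_number, rng=random):
    """Epsilon-greedy pick using the decayed epsilon for this roll."""
    if rng.random() < decayed_epsilon(roll_number):
        return rng.choice(list(estimates))        # explore
    return max(estimates, key=estimates.get)      # exploit

# Exploration rate at a few points in a (hypothetical) training history.
for n in (0, 100, 1000, 10000):
    print(n, round(decayed_epsilon(n), 3))
```

Setting `floor=0.0` would reproduce the "no more fancy stuff, ever" attitude; a small positive floor is closer to staying open to improvements.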

Contextual-Epsilon-Greedy: Highly exploratory when non-critical, highly exploitative when it matters.

This one’s most relatable for me. There’s a time for exploration: with good partners who give you “the look” to drill and stress-test new moves. And a time for full exploitation: competitions.
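
A small sketch of the contextual version, assuming ε is simply looked up from the situation you are in. The context names and their exploration rates are my own guesses, chosen only to illustrate the idea.

```python
import random

# Hypothetical exploration rates per context (illustrative guesses).
EPSILON_BY_CONTEXT = {"drilling": 0.8, "open_mat": 0.3, "competition": 0.0}

def contextual_epsilon_greedy(estimates, context, rng=random):
    """Epsilon-greedy pick where epsilon depends on how much is at stake."""
    epsilon = EPSILON_BY_CONTEXT[context]
    if rng.random() < epsilon:
        return rng.choice(list(estimates))        # explore when it's safe to
    return max(estimates, key=estimates.get)      # exploit when it matters

# Invented success rates, for illustration only.
estimates = {"armbar": 0.35, "triangle": 0.25, "berimbolo": 0.05}
rng = random.Random(2)
print(contextual_epsilon_greedy(estimates, "drilling", rng=rng))
print(contextual_epsilon_greedy(estimates, "competition", rng=rng))  # always "armbar"
```

With ε at zero for competition, the A Game is all that ever comes out; everything else gets its chance in the lower-stakes contexts.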

This isn’t a judgment that old or new is better.

On one extreme: a jack of all trades, constantly chasing novelty, never developing depth. On the other: Roger Gracie, finishing top-level opponents with BJJ 101.

Personally I love both exploring and deep-diving, which makes this hobby very time-consuming, but I wouldn’t have it any other way!

Ars longa, vita brevis: art is long, life is short.

Happy rolling, jelaludo

2020/07/31