Free Energy

A deep dive into Expected Free Energy and its calculation in LRS-Agents.

Overview

Expected Free Energy (G) is the core quantity minimized by Active Inference agents. This document explains:

  • What G represents

  • How it’s calculated

  • Why it balances exploration and exploitation

  • Implementation details in LRS-Agents

What is Expected Free Energy?

Definition

Expected Free Energy \(G\) for a policy \(\pi\) is:

\[G(\pi) = \mathbb{E}_{Q(o_\tau, s_\tau | \pi)}[\ln Q(s_\tau | \pi) - \ln P(o_\tau, s_\tau | C)]\]

where:

  • \(o_\tau\) = Future observations under policy \(\pi\)

  • \(s_\tau\) = Hidden states

  • \(C\) = Preferences (goals)

  • \(Q\) = Approximate posterior (beliefs)

  • \(P\) = Generative model

Intuitive Explanation

\(G\) measures the “badness” of a policy considering:

  1. Uncertainty reduction (Will I learn something?)

  2. Goal achievement (Will I get reward?)

Lower \(G\) = Better policy

A policy with low \(G\):

  • Reduces uncertainty about the world (epistemic value)

  • Achieves desired outcomes (pragmatic value)

Decomposition

\(G\) decomposes into two terms:

\[G(\pi) = \underbrace{\mathbb{E}[H[P(o|s)]]}_{\text{Epistemic}} - \underbrace{\mathbb{E}[\ln P(o|C)]}_{\text{Pragmatic}}\]

Epistemic Value (Information Gain):

  • How much will this policy reduce uncertainty?

  • High for novel, uncertain outcomes

  • Drives exploration

Pragmatic Value (Expected Utility):

  • How much will this policy achieve my goals?

  • High for reliable, rewarding outcomes

  • Drives exploitation
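As a concrete illustration of the two terms, here is a minimal sketch for a toy discrete model. The function names, the two-state model, and all numbers are assumptions made for this example; they are not part of the LRS-Agents API.

```python
import math

def expected_observation_entropy(q_states, likelihood):
    """E_Q(s)[H[P(o|s)]]: entropy of each row of P(o|s), averaged over Q(s)."""
    total = 0.0
    for q_s, p_o_given_s in zip(q_states, likelihood):
        entropy = -sum(p * math.log(p) for p in p_o_given_s if p > 0)
        total += q_s * entropy
    return total

def expected_log_preference(q_states, likelihood, preferences):
    """E_Q(o)[ln P(o|C)], where Q(o) = sum over s of Q(s) * P(o|s)."""
    n_obs = len(preferences)
    q_obs = [sum(q_s * row[o] for q_s, row in zip(q_states, likelihood))
             for o in range(n_obs)]
    return sum(q_o * math.log(p_c) for q_o, p_c in zip(q_obs, preferences))

# Toy model: 2 hidden states, 2 observations (illustrative numbers only).
q_states = [0.5, 0.5]                    # Q(s | pi)
likelihood = [[0.9, 0.1], [0.5, 0.5]]    # row s holds P(o | s)
preferences = [0.8, 0.2]                 # P(o | C)

epistemic = expected_observation_entropy(q_states, likelihood)
pragmatic = expected_log_preference(q_states, likelihood, preferences)
G = epistemic - pragmatic
print(f"epistemic={epistemic:.3f}, pragmatic={pragmatic:.3f}, G={G:.3f}")
```

The pragmatic term is a log-probability, so it is never positive here; preferring an outcome more strongly moves it closer to zero and lowers G.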

The Trade-off

High Epistemic, Low Pragmatic → Explore
     (Learn but risky)

Low Epistemic, High Pragmatic → Exploit
     (Safe but boring)

Low Epistemic, Low Pragmatic → Avoid
     (Neither learn nor gain)

High Epistemic, High Pragmatic → Ideal!
     (Learn and gain)

Epistemic Value Calculation

Definition

Epistemic value measures information gain:

\[\text{Epistemic}(\pi) = \mathbb{E}_{Q(s|\pi)}[H[P(o|s)]]\]

where \(H\) is entropy (uncertainty).

High entropy → High uncertainty → High information gain

In LRS-Agents

For a policy (sequence of tools):

\[\text{Epistemic} = \sum_{t=1}^{T} H[\text{Tool}_t]\]

where the entropy of each tool depends on:

  1. Historical reliability: More failures → More uncertainty

  2. Novelty: Never used → Maximum uncertainty

  3. Context: State-dependent uncertainty

from lrs.core.free_energy import calculate_epistemic_value

epistemic = calculate_epistemic_value(
    policy=[novel_tool, uncertain_tool],
    state={},
    historical_stats=None  # No history = high uncertainty
)
# Returns: ~1.39 (2 × log 2: maximum uncertainty for both tools)

Calculation Details

For each tool in the policy:

\[H[\text{Tool}] = -\sum_i P(\text{outcome}_i) \log P(\text{outcome}_i)\]

Outcome probabilities from historical statistics:

\[\begin{split}P(\text{success}) &= \frac{\text{successes}}{\text{total calls}} \\ P(\text{failure}) &= 1 - P(\text{success})\end{split}\]

Binary entropy:

\[H = -P(\text{success}) \log P(\text{success}) - P(\text{failure}) \log P(\text{failure})\]

Special cases:

  • No history: \(H = \log 2 \approx 0.69\) (maximum uncertainty)

  • Always succeeds: \(H = 0\) (no uncertainty)

  • 50/50 success: \(H = \log 2\) (maximum binary entropy)
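The formulas and special cases above can be sketched in a few lines. This is an illustrative helper, not the library's implementation; the name `tool_entropy` and its arguments are assumptions.

```python
import math

def tool_entropy(successes, total_calls):
    """Binary entropy (in nats) of a tool's success/failure outcome."""
    if total_calls == 0:
        return math.log(2)      # no history: assume maximum uncertainty
    p = successes / total_calls
    if p in (0.0, 1.0):
        return 0.0              # the outcome is certain
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

print(round(tool_entropy(0, 0), 2))     # 0.69 (no history)
print(round(tool_entropy(10, 10), 2))   # 0.0  (always succeeds)
print(round(tool_entropy(5, 10), 2))    # 0.69 (50/50)
print(round(tool_entropy(7, 10), 2))    # 0.61 (70% success)
```

Handling `p = 0` and `p = 1` explicitly avoids evaluating `log(0)`.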

Example

import math

# Tool with 70% success rate
h_known = -0.7 * math.log(0.7) - 0.3 * math.log(0.3)  # ≈ 0.61

# Tool never used before
h_novel = math.log(2)  # ≈ 0.69

# Total epistemic value for policy
epistemic = h_known + h_novel  # ≈ 1.30

Pragmatic Value Calculation

Definition

Pragmatic value measures expected utility:

\[\text{Pragmatic}(\pi) = \mathbb{E}_{Q(o|\pi)}[\ln P(o|C)]\]

where \(P(o|C)\) represents preferences over outcomes.

In simpler terms:

\[\text{Pragmatic} = \sum_{t=1}^{T} \gamma^{t-1} \left[ P_t(\text{success}) \cdot R_{\text{success}} + P_t(\text{failure}) \cdot R_{\text{error}} \right]\]

where:

  • \(\gamma\) = Discount factor (default 0.99)

  • \(R\) = Rewards from preferences

  • \(P_t\) = Outcome probability at step \(t\)

The detailed calculation below also adds a fixed cost per step.

In LRS-Agents

from lrs.core.free_energy import calculate_pragmatic_value

pragmatic = calculate_pragmatic_value(
    policy=[reliable_tool, fast_tool],
    state={},
    preferences={
        'success': 5.0,     # Reward for success
        'error': -3.0,      # Penalty for error
        'step_cost': -0.1   # Small cost per step
    },
    historical_stats=registry.statistics,
    discount_factor=0.99
)

Calculation Details

For each tool at step \(t\):

\[V_t = \gamma^{t-1} \left[ p_{\text{success}} \cdot R_{\text{success}} + (1 - p_{\text{success}}) \cdot R_{\text{error}} \right] + R_{\text{step}}\]

where:

  • \(p_{\text{success}}\) from historical statistics

  • \(R_{\text{success}}\) from preferences (default 5.0)

  • \(R_{\text{error}}\) from preferences (default -3.0)

  • \(R_{\text{step}}\) = step cost (default -0.1)

Total pragmatic value:

\[\text{Pragmatic} = \sum_{t=1}^{T} V_t\]
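The per-step formula and the sum can be sketched as follows. The function name and signature are assumptions for illustration; the defaults mirror the preference defaults listed above.

```python
def pragmatic_value(success_probs, r_success=5.0, r_error=-3.0,
                    r_step=-0.1, discount=0.99):
    """Sum of V_t over a policy, given each step's success probability."""
    total = 0.0
    for t, p in enumerate(success_probs, start=1):
        expected_reward = p * r_success + (1 - p) * r_error
        # The discount applies to the expected reward; the step cost is flat.
        total += discount ** (t - 1) * expected_reward + r_step
    return total

value = pragmatic_value([0.8, 0.9])
print(round(value, 2))  # 7.36
```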

Example

# Policy: [tool_a, tool_b]
# tool_a: 80% success
# tool_b: 90% success

gamma = 0.99  # discount factor

# Step 1: tool_a
V_1 = gamma**0 * (0.8 * 5.0 + 0.2 * (-3.0)) - 0.1
# V_1 = 1.0 * (4.0 - 0.6) - 0.1 = 3.3

# Step 2: tool_b
V_2 = gamma**1 * (0.9 * 5.0 + 0.1 * (-3.0)) - 0.1
# V_2 = 0.99 * (4.5 - 0.3) - 0.1 ≈ 4.06

# Total pragmatic value
pragmatic = V_1 + V_2  # 3.3 + 4.06 = 7.36

Total Expected Free Energy

Formula

Combining epistemic and pragmatic values:

\[G = \alpha \cdot \text{Epistemic} - \text{Pragmatic}\]

where \(\alpha\) is the epistemic weight (default 1.0).

Lower G is better because:

  • High epistemic → Higher G (exploration cost)

  • High pragmatic → Lower G (exploitation benefit)

The agent balances both by minimizing G.
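To make the combination concrete, here is a minimal sketch that computes G directly from per-step success probabilities, using the binary-entropy and step-value formulas from the previous sections. The helper names are assumptions, and the real `calculate_expected_free_energy` may differ in detail.

```python
import math

def binary_entropy(p):
    """Entropy (nats) of a success/failure outcome with P(success) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def expected_free_energy(success_probs, alpha=1.0, r_success=5.0,
                         r_error=-3.0, r_step=-0.1, discount=0.99):
    """G = alpha * Epistemic - Pragmatic for a list of per-step success rates."""
    epistemic = sum(binary_entropy(p) for p in success_probs)
    pragmatic = sum(
        discount ** t * (p * r_success + (1 - p) * r_error) + r_step
        for t, p in enumerate(success_probs)   # t = 0 for the first step
    )
    return alpha * epistemic - pragmatic

G_reliable = expected_free_energy([0.8, 0.9])   # ≈ -6.53
G_shaky = expected_free_energy([0.5, 0.5])      # ≈ -0.40
```

Under these assumptions the reliable policy has the lower G and would be preferred.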

In LRS-Agents

from lrs.core.free_energy import calculate_expected_free_energy

G = calculate_expected_free_energy(
    policy=[tool_a, tool_b],
    state={},
    preferences={'success': 5.0, 'error': -3.0},
    historical_stats=registry.statistics,
    epistemic_weight=1.0,
    discount_factor=0.99
)

Detailed Example

Compare two policies:

Policy A: Reliable tools [cache_tool, db_tool]

# Epistemic (low - known tools)
Epistemic_A = 0.1 + 0.15  # = 0.25

# Pragmatic (high - reliable)
Pragmatic_A = 4.8 + 4.5  # = 9.3

# G
G_A = Epistemic_A - Pragmatic_A  # 0.25 - 9.3 = -9.05, very negative (good!)

Policy B: Novel tools [new_api, experimental_tool]

# Epistemic (high - uncertain)
Epistemic_B = 0.69 + 0.69  # = 1.38

# Pragmatic (low - unreliable)
Pragmatic_B = 2.0 + 1.5  # = 3.5

# G
G_B = Epistemic_B - Pragmatic_B  # 1.38 - 3.5 = -2.12, less negative (worse)

Result: Agent prefers Policy A (lower G).

Precision-Weighted Selection

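G alone doesn’t determine policy selection. Precision \(\gamma\) controls how sharply the choice concentrates on the lowest-G policy. In this section \(\gamma\) denotes precision, not the discount factor used in the pragmatic value.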

Softmax Selection

Policies are selected via softmax:

\[P(\pi_i) = \frac{\exp(-\beta \cdot G_i)}{\sum_j \exp(-\beta \cdot G_j)}\]

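where \(\beta\) is the inverse temperature:

\[\beta = \frac{1}{T \cdot (1 - \gamma + \epsilon)}\]

Here \(T\) is a base temperature (not the policy length \(T\) used in the sums above) and \(\epsilon\) is a small constant that keeps the denominator positive. The example below uses \(T = 0.7\) and \(\epsilon = 0.01\).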

Key insight:

  • High \(\gamma\) → High \(\beta\) → Deterministic selection (exploit)

  • Low \(\gamma\) → Low \(\beta\) → Stochastic selection (explore)
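A minimal sketch of this selection rule using only the standard library. The function names, and the defaults `temperature=0.7` and `eps=0.01`, are assumptions chosen for illustration rather than the library's API.

```python
import math
import random

def selection_probabilities(G_values, precision, temperature=0.7, eps=0.01):
    """Softmax over -beta * G, with beta derived from precision."""
    beta = 1.0 / (temperature * (1.0 - precision + eps))
    scores = [-beta * g for g in G_values]
    max_score = max(scores)                 # subtract the max for stability
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def select_policy(policies, G_values, precision, rng=random):
    """Sample one policy according to the softmax probabilities."""
    probs = selection_probabilities(G_values, precision)
    return rng.choices(policies, weights=probs, k=1)[0]

G_values = [1.0, 1.5, 2.0]
p_low = selection_probabilities(G_values, precision=0.3)
p_high = selection_probabilities(G_values, precision=0.8)
chosen = select_policy(["a", "b", "c"], G_values, precision=0.8,
                       rng=random.Random(0))
```

Subtracting the largest score before exponentiating leaves the probabilities unchanged but avoids overflow when \(\beta\) is large.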

Example

Three policies with G values:

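import numpy as np

def softmax(x):
    x = np.asarray(x) - np.max(x)  # shift for numerical stability
    return np.exp(x) / np.exp(x).sum()

G_values = np.array([-9.05, -7.2, -5.1])  # Lower is better

# High precision (γ = 0.8), with T = 0.7 and ε = 0.01
β_high = 1 / (0.7 * (1 - 0.8 + 0.01))  # ≈ 6.8
P_high = softmax(-β_high * G_values)
# Result: ≈ [1.00, 0.00, 0.00]  # Exploit best (near-deterministic)

# Low precision (γ = 0.3)
β_low = 1 / (0.7 * (1 - 0.3 + 0.01))  # ≈ 2.0
P_low = softmax(-β_low * G_values)
# Result: ≈ [0.98, 0.02, 0.00]  # Slightly more exploration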

Precision-Dependent Behavior

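| Precision        | Temperature         | Behavior                        |
|------------------|---------------------|---------------------------------|
| γ > 0.7 (High)   | Low (deterministic) | Exploit: Select best policy     |
| γ ≈ 0.5 (Medium) | Medium              | Balanced: Softmax over policies |
| γ < 0.3 (Low)    | High (stochastic)   | Explore: Try alternatives       |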

Adaptive G Evaluation

Epistemic Weight Adaptation

The epistemic weight \(\alpha\) can adapt with precision:

\[\alpha(\gamma) = \alpha_{\text{base}} \cdot \left(1 + \frac{1 - \gamma}{\gamma + \epsilon}\right)\]

Low precision → Higher epistemic weight → More exploration

def adaptive_epistemic_weight(base_alpha, precision):
    return base_alpha * (1 + (1 - precision) / (precision + 0.01))

# High precision
alpha_high = adaptive_epistemic_weight(1.0, 0.8)
# Result: 1.25 (slightly higher epistemic)

# Low precision
alpha_low = adaptive_epistemic_weight(1.0, 0.3)
# Result: 3.3 (much higher epistemic - explore!)

Context-Dependent G

G can depend on current state:

def calculate_contextual_G(policy, state, precision):
    # Standard G calculation
    G_base = calculate_expected_free_energy(policy, state, ...)

    # Adjust based on context
    if state.get('urgent'):
        # Prioritize pragmatic value when urgent
        G_adjusted = G_base * 0.5  # Favor low-G policies more
    elif state.get('exploratory_phase'):
        # Increase epistemic weight
        G_adjusted = G_base * 2.0  # Allow higher-G exploration
    else:
        G_adjusted = G_base

    return G_adjusted

Multiple Objectives

Handle multiple competing objectives:

\[G_{\text{total}} = \sum_i w_i \cdot G_i\]

# Example: Balance speed and accuracy
G_speed = calculate_G(policy, preferences_speed)
G_accuracy = calculate_G(policy, preferences_accuracy)

# Weight based on precision
if precision > 0.7:
    w_speed, w_accuracy = 0.3, 0.7  # Prioritize accuracy
else:
    w_speed, w_accuracy = 0.6, 0.4  # Try faster approaches

G_total = w_speed * G_speed + w_accuracy * G_accuracy
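The weighted sum generalizes to any number of objectives. A small sketch, assuming G values and weights are kept in dictionaries keyed by objective name (the helper `combine_objectives` is hypothetical):

```python
def combine_objectives(G_by_objective, weights):
    """Weighted sum of per-objective G values; both dicts share the same keys."""
    if G_by_objective.keys() != weights.keys():
        raise ValueError("every objective needs exactly one weight")
    return sum(weights[name] * G for name, G in G_by_objective.items())

G_parts = {"speed": -4.0, "accuracy": -6.0}
G_high_precision = combine_objectives(G_parts, {"speed": 0.3, "accuracy": 0.7})
G_low_precision = combine_objectives(G_parts, {"speed": 0.6, "accuracy": 0.4})
```

Keeping the weights summing to 1 keeps the combined value on the same scale as the individual G values.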

Hybrid G Evaluation

LLM + Mathematical G

LRS-Agents support hybrid evaluation combining:

  • Mathematical G (precise but limited)

  • LLM-estimated G (flexible but noisy)

\[G_{\text{hybrid}} = (1 - \lambda) \cdot G_{\text{math}} + \lambda \cdot G_{\text{llm}}\]

where \(\lambda = 1 - \gamma\) (trust LLM more when uncertain).

from lrs.inference.evaluator import HybridGEvaluator

evaluator = HybridGEvaluator()

eval_result = evaluator.evaluate_hybrid(
    proposal=llm_proposal,
    state={},
    preferences={'success': 5.0},
    precision=0.5,
    historical_stats=registry.statistics
)

print(f"G_hybrid: {eval_result.total_G}")
print(f"G_math: {eval_result.components['G_math']}")
print(f"G_llm: {eval_result.components['G_llm']}")
print(f"λ: {eval_result.components['lambda']}")
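The blend itself is simple arithmetic. A minimal sketch of the formula, assuming precision lies in [0, 1]; this illustrates the weighting only and is not the `HybridGEvaluator` implementation:

```python
def hybrid_G(G_math, G_llm, precision):
    """Blend the two estimates with lambda = 1 - precision."""
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must lie in [0, 1]")
    lam = 1.0 - precision
    return (1.0 - lam) * G_math + lam * G_llm

balanced = hybrid_G(G_math=-6.0, G_llm=-4.0, precision=0.5)   # -5.0
confident = hybrid_G(G_math=-6.0, G_llm=-4.0, precision=0.9)  # ≈ -5.8
```

At precision 0.9 the result stays close to the mathematical estimate; at 0.5 the two estimates are averaged.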

Why Hybrid?

Mathematical G:

  • ✓ Precise

  • ✓ Consistent

  • ✗ Limited to known tools

  • ✗ Can’t handle novel contexts

LLM G:

  • ✓ Flexible

  • ✓ Handles novel scenarios

  • ✗ Noisy

  • ✗ Can be overconfident

Hybrid:

  • ✓ Precise when certain (high γ)

  • ✓ Flexible when uncertain (low γ)

  • ✓ Best of both worlds

Edge Cases and Special Scenarios

Empty Policy

\[G_{\text{empty}} = 0\]

No action = No information gain, no reward.

Single Tool

\[G = H[\text{tool}] - [p \cdot R_{\text{success}} + (1-p) \cdot R_{\text{error}}] - R_{\text{step}}\]

Long Policies

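For policies with many steps, discount future contributions:

\[G = \sum_{t=1}^{T} \gamma^{t-1} [H_t - V_t]\]

Here \(\gamma\) is the discount factor, and \(H_t\) and \(V_t\) should be read as the per-step entropy and value before discounting, so that each step is discounted only once.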

Novel Tools

For tools never seen before:

  • Assume maximum entropy: \(H = \log 2\)

  • Assume neutral success probability: \(p = 0.5\)

  • Results in moderate G (neither avoid nor strongly prefer)
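To see why this gives a moderate G, the single-tool formula can be evaluated with the default preferences (5.0, -3.0, step cost -0.1). The helper below is an illustrative sketch, not library code:

```python
import math

def single_tool_G(p, r_success=5.0, r_error=-3.0, r_step=-0.1):
    """G = H[tool] - V for a one-step policy with success probability p."""
    entropy = 0.0 if p in (0.0, 1.0) else (
        -p * math.log(p) - (1 - p) * math.log(1 - p))
    value = p * r_success + (1 - p) * r_error + r_step
    return entropy - value

G_novel = single_tool_G(0.5)        # ≈ -0.21
G_reliable = single_tool_G(0.95)    # ≈ -4.30
G_unreliable = single_tool_G(0.2)   # ≈ 2.00
```

With these defaults the novel tool sits between a known-reliable and a known-unreliable tool.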

Failed Policies

If a policy fails during execution:

  • G becomes irrelevant (policy didn’t complete)

  • Precision drops based on failure

  • Next iteration explores alternatives

Implementation Details

Caching

G calculations can be expensive. Cache results:

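from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_calculate_G(policy_tuple, state_items, preferences_items):
    # lru_cache needs hashable arguments, and a hash cannot be turned back
    # into a dict, so pass dicts as tuple(sorted(d.items())) and rebuild them.
    return calculate_expected_free_energy(
        policy=list(policy_tuple),
        state=dict(state_items),
        preferences=dict(preferences_items),
        ...
    )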
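`lru_cache` only accepts hashable arguments, so dictionaries need a hashable stand-in. One possible way to build such keys is sketched below; `freeze` and the stand-in `cached_G` are hypothetical, and the approach assumes flat dicts with hashable values.

```python
from functools import lru_cache

def freeze(mapping):
    """Turn a flat dict into a hashable, order-independent key."""
    return tuple(sorted(mapping.items()))

call_count = 0

@lru_cache(maxsize=1000)
def cached_G(policy_names, state_key, preferences_key):
    """Stand-in for an expensive G computation (not the real formula)."""
    global call_count
    call_count += 1
    preferences = dict(preferences_key)
    return -preferences.get("success", 0.0) * len(policy_names)

policy = ("tool_a", "tool_b")
first = cached_G(policy, freeze({}), freeze({"success": 5.0, "error": -3.0}))
# Same preferences in a different insertion order hit the same cache entry.
second = cached_G(policy, freeze({}), freeze({"error": -3.0, "success": 5.0}))
print(call_count, cached_G.cache_info().hits)  # 1 1
```

Sorting the items makes the key independent of dict insertion order, which would otherwise cause avoidable cache misses.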

Numerical Stability

Avoid numerical issues:

import numpy as np

def safe_log(x, epsilon=1e-10):
    """Log with numerical stability"""
    return np.log(np.maximum(x, epsilon))

def safe_entropy(p, epsilon=1e-10):
    """Entropy with stability"""
    p = np.clip(p, epsilon, 1 - epsilon)
    return -p * safe_log(p) - (1 - p) * safe_log(1 - p)

Batch Evaluation

Evaluate multiple policies efficiently:

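import numpy as np
from lrs.core.free_energy import (
    calculate_epistemic_value,
    calculate_pragmatic_value,
)

def evaluate_batch(policies, state, preferences, stats):
    """Compute G for each policy (epistemic weight fixed at 1.0)."""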
    epistemics = [calculate_epistemic_value(p, state, stats)
                  for p in policies]
    pragmatics = [calculate_pragmatic_value(p, state, preferences, stats)
                  for p in policies]

    G_values = np.array(epistemics) - np.array(pragmatics)
    return G_values

Validation

Sanity checks:

def validate_G(G, policy):
    """Ensure G is reasonable"""
    assert np.isfinite(G), "G must be finite"
    assert -100 < G < 100, "G out of reasonable range"

    # More pragmatic policies should have lower G
    # (all else equal)
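The ordering property mentioned in the comment can also be checked explicitly. A minimal sketch, assuming each evaluation is available as an `(epistemic, pragmatic, G)` tuple; the helper name is hypothetical:

```python
def pragmatic_ordering_holds(evaluations, tol=1e-9):
    """Check that, among policies with equal epistemic value, a higher
    pragmatic value always comes with a lower G.

    evaluations: list of (epistemic, pragmatic, G) tuples.
    """
    for e1, p1, g1 in evaluations:
        for e2, p2, g2 in evaluations:
            same_epistemic = abs(e1 - e2) <= tol
            if same_epistemic and p1 > p2 + tol and not g1 < g2:
                return False
    return True

consistent = [(0.5, 7.0, -6.5), (0.5, 3.0, -2.5), (1.2, 4.0, -2.8)]
broken = [(0.5, 7.0, -2.5), (0.5, 3.0, -6.5)]
print(pragmatic_ordering_holds(consistent))  # True
print(pragmatic_ordering_holds(broken))      # False
```

The check is quadratic in the number of policies, which is fine for the small candidate sets typical of a single planning step.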

Debugging

Inspect G components:

from lrs.core.free_energy import evaluate_policy

eval_obj = evaluate_policy(policy, state, preferences, stats)

print(f"Total G: {eval_obj.total_G}")
print(f"Epistemic: {eval_obj.epistemic_value}")
print(f"Pragmatic: {eval_obj.pragmatic_value}")
print(f"Per-step breakdown:")
for i, (e, p) in enumerate(zip(eval_obj.step_epistemics,
                                 eval_obj.step_pragmatics)):
    print(f"  Step {i+1}: E={e:.2f}, P={p:.2f}, G={e-p:.2f}")

Further Reading

Next Steps

  • Understand Precision Dynamics for adaptation

  • Try ../tutorials/02_understanding_precision for hands-on practice

  • Read Core Concepts for implementation details