Free Energy
A deep dive into Expected Free Energy and its calculation in LRS-Agents.
Overview
Expected Free Energy (G) is the core quantity minimized by Active Inference agents. This document explains:
What G represents
How it’s calculated
Why it balances exploration and exploitation
Implementation details in LRS-Agents
What is Expected Free Energy?
Definition
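Expected Free Energy \(G\) for a policy \(\pi\) is:
\[ G(\pi) = \sum_{\tau} \mathbb{E}_{Q(o_\tau, s_\tau \mid \pi)} \left[ \log Q(s_\tau \mid \pi) - \log P(o_\tau, s_\tau \mid C) \right] \]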
where:
\(o_\tau\) = Future observations under policy \(\pi\)
\(s_\tau\) = Hidden states
\(C\) = Preferences (goals)
\(Q\) = Approximate posterior (beliefs)
\(P\) = Generative model
Intuitive Explanation
\(G\) measures the “badness” of a policy considering:
Uncertainty reduction (Will I learn something?)
Goal achievement (Will I get reward?)
Lower \(G\) = Better policy
A policy with low \(G\):
Reduces uncertainty about the world (epistemic value)
Achieves desired outcomes (pragmatic value)
Decomposition
\(G\) decomposes into two terms:
\[ G(\pi) = \text{Epistemic}(\pi) - \text{Pragmatic}(\pi) \]
Epistemic Value (Information Gain):
How much will this policy reduce uncertainty?
High for novel, uncertain outcomes
Drives exploration
Pragmatic Value (Expected Utility):
How much will this policy achieve my goals?
High for reliable, rewarding outcomes
Drives exploitation
The Trade-off
High Epistemic, Low Pragmatic → Explore (learn, but risky)
Low Epistemic, High Pragmatic → Exploit (safe, but boring)
Low Epistemic, Low Pragmatic → Avoid (neither learn nor gain)
High Epistemic, High Pragmatic → Ideal (learn and gain)
Epistemic Value Calculation
Definition
Epistemic value measures information gain:
\[ \text{Epistemic}(\pi) = H\left[ P(o \mid \pi) \right] \]
where \(H\) is entropy (uncertainty).
High entropy → High uncertainty → High information gain
In LRS-Agents
For a policy (sequence of tools):
\[ \text{Epistemic}(\pi) = \sum_{t} H(\text{tool}_t) \]
where the entropy of each tool depends on:
Historical reliability: More failures → More uncertainty
Novelty: Never used → Maximum uncertainty
Context: State-dependent uncertainty
from lrs.core.free_energy import calculate_epistemic_value
epistemic = calculate_epistemic_value(
    policy=[novel_tool, uncertain_tool],
    state={},
    historical_stats=None  # No history = high uncertainty
)
# Returns: ~1.39 (2 × log 2, maximum uncertainty for both tools)
Calculation Details
For each tool in the policy:
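Outcome probabilities from historical statistics (success and failure counts \(n_{\text{success}}\), \(n_{\text{failure}}\)):
\[ p_{\text{success}} = \frac{n_{\text{success}}}{n_{\text{success}} + n_{\text{failure}}} \]
Binary entropy:
\[ H = -p_{\text{success}} \log p_{\text{success}} - (1 - p_{\text{success}}) \log (1 - p_{\text{success}}) \]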
Special cases:
No history: \(H = \log 2 \approx 0.69\) (maximum uncertainty)
Always succeeds: \(H = 0\) (no uncertainty)
50/50 success: \(H = \log 2\) (maximum binary entropy)
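The special cases above can be checked with a small sketch. The helper name `tool_entropy` and its count-based arguments are illustrative assumptions, not part of the LRS-Agents API; it uses natural logarithms and treats a tool with no history as a 50/50 outcome.

```python
import math

def tool_entropy(successes: int, failures: int) -> float:
    """Binary entropy (in nats) of a tool's outcome, estimated from counts.

    Hypothetical helper for illustration; not the library's implementation.
    """
    total = successes + failures
    if total == 0:
        return math.log(2)  # no history: assume p = 0.5, maximum uncertainty
    p = successes / total
    if p in (0.0, 1.0):
        return 0.0  # outcome is certain; 0 * log(0) is taken as 0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

print(round(tool_entropy(0, 0), 2))   # 0.69 (no history)
print(round(tool_entropy(10, 0), 2))  # 0.0  (always succeeds)
print(round(tool_entropy(7, 3), 2))   # 0.61 (70% success)
```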
Example
from lrs.core.free_energy import calculate_epistemic_value
# Tool with 70% success rate
# H = -0.7*log(0.7) - 0.3*log(0.3) ≈ 0.61
# Tool never used before
# H = log(2) ≈ 0.69
# Total epistemic value for policy
epistemic = 0.61 + 0.69  # = 1.30
Pragmatic Value Calculation
Definition
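Pragmatic value measures expected utility:
\[ \text{Pragmatic}(\pi) = \sum_{\tau} \mathbb{E}_{Q(o_\tau \mid \pi)} \left[ \log P(o_\tau \mid C) \right] \]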
where \(P(o|C)\) represents preferences over outcomes.
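In simpler terms, a discounted sum of expected rewards:
\[ \text{Pragmatic}(\pi) \approx \sum_{t=1}^{T} \gamma^{t-1}\, \mathbb{E}_{P_t}[R] \]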
where:
\(\gamma\) = Discount factor (default 0.99)
\(R\) = Rewards from preferences
\(P_t\) = Success probability at step \(t\)
In LRS-Agents
from lrs.core.free_energy import calculate_pragmatic_value
pragmatic = calculate_pragmatic_value(
    policy=[reliable_tool, fast_tool],
    state={},
    preferences={
        'success': 5.0,     # Reward for success
        'error': -3.0,      # Penalty for error
        'step_cost': -0.1   # Small cost per step
    },
    historical_stats=registry.statistics,
    discount_factor=0.99
)
Calculation Details
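For each tool at step \(t\) (counting from \(t = 1\)):
\[ V_t = \gamma^{t-1} \left[ p_{\text{success}} R_{\text{success}} + (1 - p_{\text{success}}) R_{\text{error}} \right] + R_{\text{step}} \]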
where:
\(p_{\text{success}}\) from historical statistics
\(R_{\text{success}}\) from preferences (default 5.0)
\(R_{\text{error}}\) from preferences (default -3.0)
\(R_{\text{step}}\) = step cost (default -0.1)
Total pragmatic value:
\[ \text{Pragmatic}(\pi) = \sum_{t=1}^{T} V_t \]
Example
# Policy: [tool_a, tool_b]
# tool_a: 80% success
# tool_b: 90% success
# Step 1: tool_a
V_1 = 0.99**0 * (0.8 * 5.0 + 0.2 * (-3.0)) - 0.1
# V_1 = 1.0 * (4.0 - 0.6) - 0.1 = 3.3
# Step 2: tool_b
V_2 = 0.99**1 * (0.9 * 5.0 + 0.1 * (-3.0)) - 0.1
# V_2 = 0.99 * (4.5 - 0.3) - 0.1 ≈ 4.06
# Total pragmatic value
pragmatic = V_1 + V_2  # ≈ 3.3 + 4.06 = 7.36
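The worked example can be reproduced with a short sketch. The function `pragmatic_value` and its parameter names are hypothetical and chosen for illustration; it takes one success probability per step and, as in the example, adds the step cost undiscounted.

```python
def pragmatic_value(success_probs, r_success=5.0, r_error=-3.0,
                    step_cost=-0.1, discount=0.99):
    """Discounted expected reward of a policy (illustrative sketch).

    success_probs holds one success probability per step; the first step
    is undiscounted, matching the worked example.
    """
    total = 0.0
    for t, p in enumerate(success_probs):
        expected_reward = p * r_success + (1 - p) * r_error
        total += discount ** t * expected_reward + step_cost
    return total

print(round(pragmatic_value([0.8, 0.9]), 2))  # 7.36
```

Because the per-step values simply add up, appending a tool to a policy raises the pragmatic value only when its discounted expected reward exceeds the step cost.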
Total Expected Free Energy
Formula
Combining epistemic and pragmatic values:
\[ G(\pi) = \alpha \cdot \text{Epistemic}(\pi) - \text{Pragmatic}(\pi) \]
where \(\alpha\) is the epistemic weight (default 1.0).
Lower G is better because:
High epistemic → Higher G (exploration cost)
High pragmatic → Lower G (exploitation benefit)
The agent balances both by minimizing G.
In LRS-Agents
from lrs.core.free_energy import calculate_expected_free_energy
G = calculate_expected_free_energy(
    policy=[tool_a, tool_b],
    state={},
    preferences={'success': 5.0, 'error': -3.0},
    historical_stats=registry.statistics,
    epistemic_weight=1.0,
    discount_factor=0.99
)
Detailed Example
Compare two policies:
Policy A: Reliable tools [cache_tool, db_tool]
# Epistemic (low - known tools)
Epistemic_A = 0.1 + 0.15  # = 0.25
# Pragmatic (high - reliable)
Pragmatic_A = 4.8 + 4.5  # = 9.3
# G
G_A = Epistemic_A - Pragmatic_A  # = -9.05, very negative (good!)
Policy B: Novel tools [new_api, experimental_tool]
# Epistemic (high - uncertain)
Epistemic_B = 0.69 + 0.69  # = 1.38
# Pragmatic (low - unreliable)
Pragmatic_B = 2.0 + 1.5  # = 3.5
# G
G_B = Epistemic_B - Pragmatic_B  # = -2.12, less negative (worse)
Result: Agent prefers Policy A (lower G).
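A minimal sketch of the same comparison in code, assuming the sign convention used in this document (\(G = \alpha \cdot \text{Epistemic} - \text{Pragmatic}\)). The function `expected_free_energy` and the `policies` dictionary are illustrative names, not the library's API; the numbers are the per-step values from the two policies above.

```python
def expected_free_energy(epistemic, pragmatic, epistemic_weight=1.0):
    """G = alpha * Epistemic - Pragmatic (this document's sign convention)."""
    return epistemic_weight * epistemic - pragmatic

# Per-step values taken from the Policy A / Policy B example.
policies = {
    "A": {"epistemic": 0.1 + 0.15, "pragmatic": 4.8 + 4.5},
    "B": {"epistemic": 0.69 + 0.69, "pragmatic": 2.0 + 1.5},
}

G = {name: expected_free_energy(**values) for name, values in policies.items()}
best = min(G, key=G.get)  # lower G is better

print({name: round(value, 2) for name, value in G.items()}, best)
# {'A': -9.05, 'B': -2.12} A
```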
Precision-Weighted Selection
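G alone doesn’t determine policy selection. Precision \(\gamma\) weights the choice. (In this section \(\gamma\) denotes precision, not the discount factor used in the pragmatic value.)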
Softmax Selection
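Policies are selected via softmax:
\[ P(\pi) = \frac{\exp(-\beta\, G(\pi))}{\sum_{\pi'} \exp(-\beta\, G(\pi'))} \]
where inverse temperature:
\[ \beta = \frac{1}{0.7\,(1 - \gamma + 0.01)} \]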
Key insight:
High \(\gamma\) → High \(\beta\) → Deterministic selection (exploit)
Low \(\gamma\) → Low \(\beta\) → Stochastic selection (explore)
Example
Three policies with G values:
import numpy as np
from scipy.special import softmax

G_values = np.array([-9.05, -7.2, -5.1])  # Lower is better
# High precision (γ = 0.8)
β_high = 1 / (0.7 * (1 - 0.8 + 0.01))  # ≈ 6.8
P_high = softmax(-β_high * G_values)
# Result: ≈ [1.00, 0.00, 0.00]  # Exploit best (runner-up ≈ 3e-6)
# Low precision (γ = 0.3)
β_low = 1 / (0.7 * (1 - 0.3 + 0.01))  # ≈ 2.0
P_low = softmax(-β_low * G_values)
# Result: ≈ [0.98, 0.02, 0.00]  # Flatter, but the best policy still dominates
Precision-Dependent Behavior
| Precision | Temperature | Behavior |
|---|---|---|
| γ > 0.7 (High) | Low (deterministic) | Exploit: Select best policy |
| γ ≈ 0.5 (Medium) | Medium | Balanced: Softmax over policies |
| γ < 0.3 (Low) | High (stochastic) | Explore: Try alternatives |
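The mapping from precision to selection behavior can be explored with a short sketch. The function names and the parameter names `base_temperature` and `eps` are assumptions that simply label the constants 0.7 and 0.01 from the example; the softmax subtracts the largest logit before exponentiating to avoid overflow.

```python
import numpy as np

def inverse_temperature(precision, base_temperature=0.7, eps=0.01):
    """beta = 1 / (base_temperature * (1 - precision + eps))."""
    return 1.0 / (base_temperature * (1.0 - precision + eps))

def selection_probabilities(G_values, precision):
    """Softmax over -beta * G, shifted by the max logit for stability."""
    logits = -inverse_temperature(precision) * np.asarray(G_values, dtype=float)
    logits -= logits.max()
    weights = np.exp(logits)
    return weights / weights.sum()

G_values = [-9.05, -7.2, -5.1]
for precision in (0.8, 0.5, 0.2):
    probs = selection_probabilities(G_values, precision)
    print(precision, np.round(probs, 3))
```

Lower precision spreads probability mass toward the alternatives, though how far depends on the gaps between the G values: when one policy has a much lower G, it dominates at every precision level.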
Adaptive G Evaluation
Epistemic Weight Adaptation
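The epistemic weight \(\alpha\) can adapt with precision, starting from a base weight \(\alpha_0\):
\[ \alpha(\gamma) = \alpha_0 \left( 1 + \frac{1 - \gamma}{\gamma + 0.01} \right) \]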
Low precision → Higher epistemic weight → More exploration
def adaptive_epistemic_weight(base_alpha, precision):
    return base_alpha * (1 + (1 - precision) / (precision + 0.01))

# High precision
alpha_high = adaptive_epistemic_weight(1.0, 0.8)
# Result: 1.25 (slightly higher epistemic)
# Low precision
alpha_low = adaptive_epistemic_weight(1.0, 0.3)
# Result: 3.3 (much higher epistemic - explore!)
Context-Dependent G
G can depend on current state:
def calculate_contextual_G(policy, state, precision):
    # Standard G calculation
    G_base = calculate_expected_free_energy(policy, state, ...)
    # Adjust based on context
    if state.get('urgent'):
        # Prioritize pragmatic value when urgent
        G_adjusted = G_base * 0.5  # Favor low-G policies more
    elif state.get('exploratory_phase'):
        # Increase epistemic weight
        G_adjusted = G_base * 2.0  # Allow higher-G exploration
    else:
        G_adjusted = G_base
    return G_adjusted
Multiple Objectives
Handle multiple competing objectives:
# Example: Balance speed and accuracy
G_speed = calculate_G(policy, preferences_speed)
G_accuracy = calculate_G(policy, preferences_accuracy)
# Weight based on precision
if precision > 0.7:
    w_speed, w_accuracy = 0.3, 0.7  # Prioritize accuracy
else:
    w_speed, w_accuracy = 0.6, 0.4  # Try faster approaches
G_total = w_speed * G_speed + w_accuracy * G_accuracy
Hybrid G Evaluation
LLM + Mathematical G
LRS-Agents support hybrid evaluation combining:
Mathematical G (precise but limited)
LLM-estimated G (flexible but noisy)
\[ G_{\text{hybrid}} = (1 - \lambda)\, G_{\text{math}} + \lambda\, G_{\text{LLM}} \]
where \(\lambda = 1 - \gamma\) (trust LLM more when uncertain).
from lrs.inference.evaluator import HybridGEvaluator
evaluator = HybridGEvaluator()
eval_result = evaluator.evaluate_hybrid(
    proposal=llm_proposal,
    state={},
    preferences={'success': 5.0},
    precision=0.5,
    historical_stats=registry.statistics
)
print(f"G_hybrid: {eval_result.total_G}")
print(f"G_math: {eval_result.components['G_math']}")
print(f"G_llm: {eval_result.components['G_llm']}")
print(f"λ: {eval_result.components['lambda']}")
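To make the role of \(\lambda\) concrete, here is a minimal sketch of the blending step. It assumes a convex combination weighted by \(\lambda = 1 - \gamma\); the function `hybrid_G` is hypothetical, and the actual `HybridGEvaluator` may combine the two estimates differently.

```python
def hybrid_G(G_math, G_llm, precision):
    """Blend two G estimates with lambda = 1 - precision.

    Assumed convex combination for illustration only.
    """
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must lie in [0, 1]")
    lam = 1.0 - precision
    return (1.0 - lam) * G_math + lam * G_llm

print(hybrid_G(G_math=-9.0, G_llm=-5.0, precision=0.5))  # -7.0
```

At full precision the result equals the mathematical estimate, and at zero precision it equals the LLM estimate.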
Why Hybrid?
Mathematical G:
✓ Precise
✓ Consistent
✗ Limited to known tools
✗ Can’t handle novel contexts
LLM G:
✓ Flexible
✓ Handles novel scenarios
✗ Noisy
✗ Can be overconfident
Hybrid:
✓ Precise when certain (high γ)
✓ Flexible when uncertain (low γ)
✓ Best of both worlds
Edge Cases and Special Scenarios
Empty Policy
No action = No information gain, no reward, so \(G = 0\).
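Single Tool
For a one-step policy the sums reduce to a single term: \(G = \alpha H - \left[ p_{\text{success}} R_{\text{success}} + (1 - p_{\text{success}}) R_{\text{error}} + R_{\text{step}} \right]\).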
Long Policies
For policies with many steps, discount future contributions: step \(t\) of the pragmatic value is weighted by \(\gamma^{t-1}\), so later steps count for less.
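A quick sketch of how geometric discounting shrinks the weight of later steps. The helper `discount_weights` is an illustrative name; it assumes the first step is undiscounted and uses the default discount factor 0.99.

```python
def discount_weights(num_steps, discount=0.99):
    """Weights 1, discount, discount**2, ... for successive steps."""
    return [discount ** t for t in range(num_steps)]

weights = discount_weights(50)
print(round(weights[-1], 3))   # weight on the last of 50 steps
print(round(sum(weights), 2))  # total weight, always below 1 / (1 - discount)
```

With a discount factor this close to 1 the decay is gentle, so long policies are penalized mainly through accumulated step costs rather than through discounting.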
Novel Tools
For tools never seen before:
Assume maximum entropy: \(H = \log 2\)
Assume neutral success probability: \(p = 0.5\)
Results in moderate G (neither avoid nor strongly prefer)
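Under these two assumptions, G for a single novel tool can be worked out directly. The sketch below uses the default preferences quoted earlier (5.0, -3.0, step cost -0.1) and this document's sign convention; `novel_tool_G` is a hypothetical helper, not a library function.

```python
import math

def novel_tool_G(r_success=5.0, r_error=-3.0, step_cost=-0.1,
                 epistemic_weight=1.0):
    """G for a one-step policy using a tool with no history (illustrative)."""
    entropy = math.log(2)  # maximum binary entropy
    p = 0.5                # neutral success probability
    pragmatic = p * r_success + (1 - p) * r_error + step_cost
    return epistemic_weight * entropy - pragmatic

print(round(novel_tool_G(), 2))  # -0.21
```

The value sits close to zero: the expected reward of 0.9 is nearly offset by the entropy term of about 0.69.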
Failed Policies
If a policy fails during execution:
G becomes irrelevant (policy didn’t complete)
Precision drops based on failure
Next iteration explores alternatives
Implementation Details
Caching
G calculations can be expensive. Cache results:
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_calculate_G(policy_tuple, state_hash, preferences_hash):
    # lru_cache requires hashable arguments, hence the tuple and hash keys
    return calculate_expected_free_energy(
        policy=list(policy_tuple),
        state=unhash(state_hash),  # unhash: placeholder for recovering the original object
        preferences=unhash(preferences_hash),
        ...
    )
Numerical Stability
Avoid numerical issues:
import numpy as np

def safe_log(x, epsilon=1e-10):
    """Log with numerical stability"""
    return np.log(np.maximum(x, epsilon))

def safe_entropy(p, epsilon=1e-10):
    """Entropy with stability"""
    p = np.clip(p, epsilon, 1 - epsilon)
    return -p * safe_log(p) - (1 - p) * safe_log(1 - p)
Batch Evaluation
Evaluate multiple policies efficiently:
def evaluate_batch(policies, state, preferences, stats):
    """Vectorized G calculation"""
    epistemics = [calculate_epistemic_value(p, state, stats)
                  for p in policies]
    pragmatics = [calculate_pragmatic_value(p, state, preferences, stats)
                  for p in policies]
    G_values = np.array(epistemics) - np.array(pragmatics)
    return G_values
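When every policy has the same length and each tool is summarized by a success probability, the whole batch can be computed with array operations. This is a sketch under those assumptions; `batch_G` and its parameters are illustrative names and do not describe how LRS-Agents implements batching.

```python
import numpy as np

def batch_G(success_probs, r_success=5.0, r_error=-3.0, step_cost=-0.1,
            discount=0.99, epistemic_weight=1.0, eps=1e-10):
    """G for each row of success_probs (shape: policies x steps)."""
    # Clip so the logarithms stay finite at p = 0 or p = 1.
    p = np.clip(np.asarray(success_probs, dtype=float), eps, 1 - eps)
    entropy = -p * np.log(p) - (1 - p) * np.log(1 - p)
    discounts = discount ** np.arange(p.shape[1])  # 1, gamma, gamma**2, ...
    step_values = discounts * (p * r_success + (1 - p) * r_error) + step_cost
    return epistemic_weight * entropy.sum(axis=1) - step_values.sum(axis=1)

# Two policies of two steps each: a reliable one and a coin-flip one.
G = batch_G([[0.8, 0.9], [0.5, 0.5]])
print(np.round(G, 2))
```

The reliable policy comes out with the lower G, as expected from its lower entropy and higher expected reward.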
Validation
Sanity checks:
def validate_G(G, policy):
    """Ensure G is reasonable"""
    assert np.isfinite(G), "G must be finite"
    assert -100 < G < 100, "G out of reasonable range"
    # More pragmatic policies should have lower G
    # (all else equal)
Debugging
Inspect G components:
from lrs.core.free_energy import evaluate_policy
eval_obj = evaluate_policy(policy, state, preferences, stats)
print(f"Total G: {eval_obj.total_G}")
print(f"Epistemic: {eval_obj.epistemic_value}")
print(f"Pragmatic: {eval_obj.pragmatic_value}")
print("Per-step breakdown:")
for i, (e, p) in enumerate(zip(eval_obj.step_epistemics,
                               eval_obj.step_pragmatics)):
    print(f"  Step {i+1}: E={e:.2f}, P={p:.2f}, G={e-p:.2f}")
Further Reading
Active Inference - Theoretical foundations
Precision Dynamics - How precision affects G
Core Components - API reference for free_energy module
Friston et al. (2015). “Active inference and epistemic value”
Next Steps
Understand Precision Dynamics for adaptation
Try ../tutorials/02_understanding_precision for hands-on practice
Read Core Concepts for implementation details