Free Energy =========== A deep dive into Expected Free Energy and its calculation in LRS-Agents. Overview -------- **Expected Free Energy** (G) is the core quantity minimized by Active Inference agents. This document explains: * What G represents * How it's calculated * Why it balances exploration and exploitation * Implementation details in LRS-Agents What is Expected Free Energy? ------------------------------ Definition ^^^^^^^^^^ Expected Free Energy :math:`G` for a policy :math:`\pi` is: .. math:: G(\pi) = \mathbb{E}_{Q(o_\tau | \pi)}[\ln Q(s_\tau | \pi) - \ln P(o_\tau, s_\tau | C)] where: * :math:`o_\tau` = Future observations under policy :math:`\pi` * :math:`s_\tau` = Hidden states * :math:`C` = Preferences (goals) * :math:`Q` = Approximate posterior (beliefs) * :math:`P` = Generative model Intuitive Explanation ^^^^^^^^^^^^^^^^^^^^^ :math:`G` measures the "badness" of a policy considering: 1. **Uncertainty reduction** (Will I learn something?) 2. **Goal achievement** (Will I get reward?) Lower :math:`G` = Better policy A policy with low :math:`G`: * Reduces uncertainty about the world (epistemic value) * Achieves desired outcomes (pragmatic value) Decomposition ^^^^^^^^^^^^^ :math:`G` decomposes into two terms: .. math:: G(\pi) = \underbrace{\mathbb{E}[H[P(o|s)]]}_{\text{Epistemic}} - \underbrace{\mathbb{E}[\ln P(o|C)]}_{\text{Pragmatic}} **Epistemic Value** (Information Gain): * How much will this policy reduce uncertainty? * High for novel, uncertain outcomes * Drives **exploration** **Pragmatic Value** (Expected Utility): * How much will this policy achieve my goals? * High for reliable, rewarding outcomes * Drives **exploitation** The Trade-off ^^^^^^^^^^^^^ .. code-block:: text High Epistemic, Low Pragmatic → Explore (Learn but risky) Low Epistemic, High Pragmatic → Exploit (Safe but boring) Low Epistemic, Low Pragmatic → Avoid (Neither learn nor gain) High Epistemic, High Pragmatic → Ideal! (Learn and gain) Epistemic Value Calculation ---------------------------- Definition ^^^^^^^^^^ Epistemic value measures information gain: .. math:: \text{Epistemic}(\pi) = \mathbb{E}_{Q(s|\pi)}[H[P(o|s)]] where :math:`H` is entropy (uncertainty). High entropy → High uncertainty → High information gain In LRS-Agents ^^^^^^^^^^^^^ For a policy (sequence of tools): .. math:: \text{Epistemic} = \sum_{t=1}^{T} H[\text{Tool}_t] where the entropy of each tool depends on: 1. **Historical reliability**: More failures → More uncertainty 2. **Novelty**: Never used → Maximum uncertainty 3. **Context**: State-dependent uncertainty .. code-block:: python from lrs.core.free_energy import calculate_epistemic_value epistemic = calculate_epistemic_value( policy=[novel_tool, uncertain_tool], state={}, historical_stats=None # No history = high uncertainty ) # Returns: ~1.5 (high information gain) Calculation Details ^^^^^^^^^^^^^^^^^^^ For each tool in the policy: .. math:: H[\text{Tool}] = -\sum_i P(\text{outcome}_i) \log P(\text{outcome}_i) Outcome probabilities from historical statistics: .. math:: P(\text{success}) &= \frac{\text{successes}}{\text{total calls}} \\ P(\text{failure}) &= 1 - P(\text{success}) Binary entropy: .. math:: H = -P(\text{success}) \log P(\text{success}) - P(\text{failure}) \log P(\text{failure}) Special cases: * **No history**: :math:`H = \log 2 \approx 0.69` (maximum uncertainty) * **Always succeeds**: :math:`H = 0` (no uncertainty) * **50/50 success**: :math:`H = \log 2` (maximum binary entropy) Example ^^^^^^^ .. code-block:: python from lrs.core.free_energy import calculate_epistemic_value # Tool with 70% success rate # H = -0.7*log(0.7) - 0.3*log(0.3) ≈ 0.61 # Tool never used before # H = log(2) ≈ 0.69 # Total epistemic value for policy epistemic = 0.61 + 0.69 = 1.30 Pragmatic Value Calculation ---------------------------- Definition ^^^^^^^^^^ Pragmatic value measures expected utility: .. math:: \text{Pragmatic}(\pi) = \mathbb{E}_{Q(o|\pi)}[\ln P(o|C)] where :math:`P(o|C)` represents preferences over outcomes. In simpler terms: .. math:: \text{Pragmatic} = \sum_{t=1}^{T} \gamma^t \left[ P_t(\text{success}) \cdot R_{\text{success}} + P_t(\text{failure}) \cdot R_{\text{failure}} \right] where: * :math:`\gamma` = Discount factor (default 0.99) * :math:`R` = Rewards from preferences * :math:`P_t` = Success probability at step :math:`t` In LRS-Agents ^^^^^^^^^^^^^ .. code-block:: python from lrs.core.free_energy import calculate_pragmatic_value pragmatic = calculate_pragmatic_value( policy=[reliable_tool, fast_tool], state={}, preferences={ 'success': 5.0, # Reward for success 'error': -3.0, # Penalty for error 'step_cost': -0.1 # Small cost per step }, historical_stats=registry.statistics, discount_factor=0.99 ) Calculation Details ^^^^^^^^^^^^^^^^^^^ For each tool at step :math:`t`: .. math:: V_t = \gamma^{t-1} \left[ p_{\text{success}} \cdot R_{\text{success}} + (1 - p_{\text{success}}) \cdot R_{\text{error}} \right] + R_{\text{step}} where: * :math:`p_{\text{success}}` from historical statistics * :math:`R_{\text{success}}` from preferences (default 5.0) * :math:`R_{\text{error}}` from preferences (default -3.0) * :math:`R_{\text{step}}` = step cost (default -0.1) Total pragmatic value: .. math:: \text{Pragmatic} = \sum_{t=1}^{T} V_t Example ^^^^^^^ .. code-block:: python # Policy: [tool_a, tool_b] # tool_a: 80% success # tool_b: 90% success # Step 1: tool_a V_1 = 0.99^0 * [0.8 * 5.0 + 0.2 * (-3.0)] - 0.1 V_1 = 1.0 * [4.0 - 0.6] - 0.1 = 3.3 # Step 2: tool_b V_2 = 0.99^1 * [0.9 * 5.0 + 0.1 * (-3.0)] - 0.1 V_2 = 0.99 * [4.5 - 0.3] - 0.1 ≈ 4.06 # Total pragmatic value Pragmatic = 3.3 + 4.06 = 7.36 Total Expected Free Energy --------------------------- Formula ^^^^^^^ Combining epistemic and pragmatic values: .. math:: G = \alpha \cdot \text{Epistemic} - \text{Pragmatic} where :math:`\alpha` is the epistemic weight (default 1.0). **Lower G is better** because: * High epistemic → Higher G (exploration cost) * High pragmatic → Lower G (exploitation benefit) The agent balances both by minimizing G. In LRS-Agents ^^^^^^^^^^^^^ .. code-block:: python from lrs.core.free_energy import calculate_expected_free_energy G = calculate_expected_free_energy( policy=[tool_a, tool_b], state={}, preferences={'success': 5.0, 'error': -3.0}, historical_stats=registry.statistics, epistemic_weight=1.0, discount_factor=0.99 ) Detailed Example ^^^^^^^^^^^^^^^^ Compare two policies: **Policy A: Reliable tools** [cache_tool, db_tool] .. code-block:: python # Epistemic (low - known tools) Epistemic_A = 0.1 + 0.15 = 0.25 # Pragmatic (high - reliable) Pragmatic_A = 4.8 + 4.5 = 9.3 # G G_A = 0.25 - 9.3 = -9.05 # Very negative (good!) **Policy B: Novel tools** [new_api, experimental_tool] .. code-block:: python # Epistemic (high - uncertain) Epistemic_B = 0.69 + 0.69 = 1.38 # Pragmatic (low - unreliable) Pragmatic_B = 2.0 + 1.5 = 3.5 # G G_B = 1.38 - 3.5 = -2.12 # Less negative (worse) **Result**: Agent prefers Policy A (lower G). Precision-Weighted Selection ----------------------------- G alone doesn't determine policy selection. Precision :math:`\gamma` weights the choice. Softmax Selection ^^^^^^^^^^^^^^^^^ Policies are selected via softmax: .. math:: P(\pi_i) = \frac{\exp(-\beta \cdot G_i)}{\sum_j \exp(-\beta \cdot G_j)} where inverse temperature: .. math:: \beta = \frac{1}{T \cdot (1 - \gamma + \epsilon)} Key insight: * High :math:`\gamma` → High :math:`\beta` → Deterministic selection (exploit) * Low :math:`\gamma` → Low :math:`\beta` → Stochastic selection (explore) Example ^^^^^^^ Three policies with G values: .. code-block:: python G_values = [-9.05, -7.2, -5.1] # Lower is better # High precision (γ = 0.8) β_high = 1 / (0.7 * (1 - 0.8 + 0.01)) ≈ 6.8 P_high = softmax(-6.8 * G_values) # Result: [0.85, 0.12, 0.03] # Exploit best # Low precision (γ = 0.3) β_low = 1 / (0.7 * (1 - 0.3 + 0.01)) ≈ 2.0 P_low = softmax(-2.0 * G_values) # Result: [0.50, 0.32, 0.18] # More exploration Precision-Dependent Behavior ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. list-table:: :header-rows: 1 :widths: 20 40 40 * - Precision - Temperature - Behavior * - γ > 0.7 (High) - Low (deterministic) - Exploit: Select best policy * - γ ≈ 0.5 (Medium) - Medium - Balanced: Softmax over policies * - γ < 0.3 (Low) - High (stochastic) - Explore: Try alternatives Adaptive G Evaluation ---------------------- Epistemic Weight Adaptation ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The epistemic weight :math:`\alpha` can adapt with precision: .. math:: \alpha(\gamma) = \alpha_{\text{base}} \cdot \left(1 + \frac{1 - \gamma}{\gamma + \epsilon}\right) Low precision → Higher epistemic weight → More exploration .. code-block:: python def adaptive_epistemic_weight(base_alpha, precision): return base_alpha * (1 + (1 - precision) / (precision + 0.01)) # High precision alpha_high = adaptive_epistemic_weight(1.0, 0.8) # Result: 1.25 (slightly higher epistemic) # Low precision alpha_low = adaptive_epistemic_weight(1.0, 0.3) # Result: 3.3 (much higher epistemic - explore!) Context-Dependent G ^^^^^^^^^^^^^^^^^^^^ G can depend on current state: .. code-block:: python def calculate_contextual_G(policy, state, precision): # Standard G calculation G_base = calculate_expected_free_energy(policy, state, ...) # Adjust based on context if state.get('urgent'): # Prioritize pragmatic value when urgent G_adjusted = G_base * 0.5 # Favor low-G policies more elif state.get('exploratory_phase'): # Increase epistemic weight G_adjusted = G_base * 2.0 # Allow higher-G exploration else: G_adjusted = G_base return G_adjusted Multiple Objectives ^^^^^^^^^^^^^^^^^^^^ Handle multiple competing objectives: .. math:: G_{\text{total}} = \sum_i w_i \cdot G_i .. code-block:: python # Example: Balance speed and accuracy G_speed = calculate_G(policy, preferences_speed) G_accuracy = calculate_G(policy, preferences_accuracy) # Weight based on precision if precision > 0.7: w_speed, w_accuracy = 0.3, 0.7 # Prioritize accuracy else: w_speed, w_accuracy = 0.6, 0.4 # Try faster approaches G_total = w_speed * G_speed + w_accuracy * G_accuracy Hybrid G Evaluation -------------------- LLM + Mathematical G ^^^^^^^^^^^^^^^^^^^^ LRS-Agents support **hybrid evaluation** combining: * Mathematical G (precise but limited) * LLM-estimated G (flexible but noisy) .. math:: G_{\text{hybrid}} = (1 - \lambda) \cdot G_{\text{math}} + \lambda \cdot G_{\text{llm}} where :math:`\lambda = 1 - \gamma` (trust LLM more when uncertain). .. code-block:: python from lrs.inference.evaluator import HybridGEvaluator evaluator = HybridGEvaluator() eval_result = evaluator.evaluate_hybrid( proposal=llm_proposal, state={}, preferences={'success': 5.0}, precision=0.5, historical_stats=registry.statistics ) print(f"G_hybrid: {eval_result.total_G}") print(f"G_math: {eval_result.components['G_math']}") print(f"G_llm: {eval_result.components['G_llm']}") print(f"λ: {eval_result.components['lambda']}") Why Hybrid? ^^^^^^^^^^^ **Mathematical G**: * ✓ Precise * ✓ Consistent * ✗ Limited to known tools * ✗ Can't handle novel contexts **LLM G**: * ✓ Flexible * ✓ Handles novel scenarios * ✗ Noisy * ✗ Can be overconfident **Hybrid**: * ✓ Precise when certain (high γ) * ✓ Flexible when uncertain (low γ) * ✓ Best of both worlds Edge Cases and Special Scenarios --------------------------------- Empty Policy ^^^^^^^^^^^^ .. math:: G_{\text{empty}} = 0 No action = No information gain, no reward. Single Tool ^^^^^^^^^^^ .. math:: G = H[\text{tool}] - [p \cdot R_{\text{success}} + (1-p) \cdot R_{\text{error}}] - R_{\text{step}} Long Policies ^^^^^^^^^^^^^ For policies with many steps, discount future contributions: .. math:: G = \sum_{t=1}^{T} \gamma^{t-1} [H_t - V_t] Novel Tools ^^^^^^^^^^^ For tools never seen before: * Assume maximum entropy: :math:`H = \log 2` * Assume neutral success probability: :math:`p = 0.5` * Results in moderate G (neither avoid nor strongly prefer) Failed Policies ^^^^^^^^^^^^^^^ If a policy fails during execution: * G becomes irrelevant (policy didn't complete) * Precision drops based on failure * Next iteration explores alternatives Implementation Details ---------------------- Caching ^^^^^^^ G calculations can be expensive. Cache results: .. code-block:: python from functools import lru_cache @lru_cache(maxsize=1000) def cached_calculate_G(policy_tuple, state_hash, preferences_hash): return calculate_expected_free_energy( policy=list(policy_tuple), state=unhash(state_hash), preferences=unhash(preferences_hash), ... ) Numerical Stability ^^^^^^^^^^^^^^^^^^^ Avoid numerical issues: .. code-block:: python import numpy as np def safe_log(x, epsilon=1e-10): """Log with numerical stability""" return np.log(np.maximum(x, epsilon)) def safe_entropy(p, epsilon=1e-10): """Entropy with stability""" p = np.clip(p, epsilon, 1 - epsilon) return -p * safe_log(p) - (1 - p) * safe_log(1 - p) Batch Evaluation ^^^^^^^^^^^^^^^^ Evaluate multiple policies efficiently: .. code-block:: python def evaluate_batch(policies, state, preferences, stats): """Vectorized G calculation""" epistemics = [calculate_epistemic_value(p, state, stats) for p in policies] pragmatics = [calculate_pragmatic_value(p, state, preferences, stats) for p in policies] G_values = np.array(epistemics) - np.array(pragmatics) return G_values Validation ^^^^^^^^^^ Sanity checks: .. code-block:: python def validate_G(G, policy): """Ensure G is reasonable""" assert np.isfinite(G), "G must be finite" assert -100 < G < 100, "G out of reasonable range" # More pragmatic policies should have lower G # (all else equal) Debugging ^^^^^^^^^ Inspect G components: .. code-block:: python from lrs.core.free_energy import evaluate_policy eval_obj = evaluate_policy(policy, state, preferences, stats) print(f"Total G: {eval_obj.total_G}") print(f"Epistemic: {eval_obj.epistemic_value}") print(f"Pragmatic: {eval_obj.pragmatic_value}") print(f"Per-step breakdown:") for i, (e, p) in enumerate(zip(eval_obj.step_epistemics, eval_obj.step_pragmatics)): print(f" Step {i+1}: E={e:.2f}, P={p:.2f}, G={e-p:.2f}") Further Reading --------------- * :doc:`active_inference` - Theoretical foundations * :doc:`precision_dynamics` - How precision affects G * :doc:`../api/core` - API reference for free_energy module * Friston et al. (2015). "Active inference and epistemic value" Next Steps ---------- * Understand :doc:`precision_dynamics` for adaptation * Try :doc:`../tutorials/02_understanding_precision` for hands-on practice * Read :doc:`../getting_started/core_concepts` for implementation details