Inference API

The inference module provides components for LLM-based policy generation and evaluation.

Meta-Cognitive Prompting

Meta-cognitive prompting for LRS-Agents.

Generates precision-adaptive prompts that guide LLMs to produce diverse policy proposals appropriate to the agent’s epistemic state.

class lrs.inference.prompts.StrategyMode(value)[source]

Bases: Enum

Strategic mode based on precision level

EXPLOITATION = 'exploit'
EXPLORATION = 'explore'
BALANCED = 'balanced'
class lrs.inference.prompts.PromptContext(precision: float, recent_errors: List[float], available_tools: List[str], goal: str, state: Dict[str, Any], tool_history: List[Dict[str, Any]])[source]

Bases: object

Context for generating meta-cognitive prompts.

precision

Current precision value, in the range [0, 1]

Type:

float

recent_errors

List of recent prediction errors

Type:

List[float]

available_tools

List of tool names

Type:

List[str]

goal

Current goal description

Type:

str

state

Current agent state

Type:

Dict[str, Any]

tool_history

Recent tool executions

Type:

List[Dict[str, Any]]

precision: float
recent_errors: List[float]
available_tools: List[str]
goal: str
state: Dict[str, Any]
tool_history: List[Dict[str, Any]]
__init__(precision: float, recent_errors: List[float], available_tools: List[str], goal: str, state: Dict[str, Any], tool_history: List[Dict[str, Any]]) → None
class lrs.inference.prompts.MetaCognitivePrompter(high_precision_threshold: float = 0.7, low_precision_threshold: float = 0.4, high_error_threshold: float = 0.7)[source]

Bases: object

Generates precision-adaptive prompts for LLM policy generation.

The prompts adapt based on:

  1. Precision level (confidence in world model)
  2. Recent prediction errors (surprise events)
  3. Available tools
  4. Current goal

Examples

>>> prompter = MetaCognitivePrompter()
>>>
>>> context = PromptContext(
...     precision=0.3,  # Low precision
...     recent_errors=[0.9, 0.85, 0.7],
...     available_tools=["api_fetch", "cache_fetch"],
...     goal="Fetch user data",
...     state={},
...     tool_history=[]
... )
>>>
>>> prompt = prompter.generate_prompt(context)
>>> print("EXPLORATION MODE" in prompt)
True
__init__(high_precision_threshold: float = 0.7, low_precision_threshold: float = 0.4, high_error_threshold: float = 0.7)[source]

Initialize prompter.

Parameters:
  • high_precision_threshold – Threshold for exploitation mode

  • low_precision_threshold – Threshold for exploration mode

  • high_error_threshold – Threshold for “high surprise”
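The two precision thresholds suggest a three-way split of the precision range. The following is a minimal sketch of that selection logic, not the library's implementation: `select_mode` is a hypothetical helper, the local `StrategyMode` enum is a stand-in that mirrors the documented values, and the handling of values exactly at a threshold is an assumption.

```python
from enum import Enum


class StrategyMode(Enum):
    """Local stand-in mirroring the documented enum values."""
    EXPLOITATION = "exploit"
    EXPLORATION = "explore"
    BALANCED = "balanced"


def select_mode(
    precision: float,
    high_precision_threshold: float = 0.7,
    low_precision_threshold: float = 0.4,
) -> StrategyMode:
    """Hypothetical helper: map a precision value in [0, 1] to a mode."""
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must lie in [0, 1]")
    # Assumption: the upper boundary is inclusive, the lower one exclusive.
    if precision >= high_precision_threshold:
        return StrategyMode.EXPLOITATION
    if precision < low_precision_threshold:
        return StrategyMode.EXPLORATION
    return StrategyMode.BALANCED


for p in (0.3, 0.5, 0.9):
    print(p, select_mode(p).value)
```

With the default thresholds, a precision of 0.3 falls in the exploration band, which agrees with the class-level doctest that expects "EXPLORATION MODE" in the generated prompt.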

generate_prompt(context: PromptContext) → str[source]

Generate precision-adaptive prompt.

Parameters:

context – Prompt context with precision, errors, tools, etc.

Returns:

Complete prompt string for LLM

Examples

>>> prompt = prompter.generate_prompt(context)
>>> # Prompt includes precision value, strategy guidance, tool list
lrs.inference.prompts.build_simple_prompt(goal: str, tools: List[str], precision: float, num_proposals: int = 5) → str[source]

Build a simple prompt without full context.

Convenience function for quick prompting.

Parameters:
  • goal – Task goal

  • tools – Available tool names

  • precision – Current precision value

  • num_proposals – Number of proposals to generate

Returns:

Prompt string

Examples

>>> prompt = build_simple_prompt(
...     goal="Fetch data",
...     tools=["api", "cache"],
...     precision=0.5
... )

Classes

class lrs.inference.prompts.PromptContext(precision: float, recent_errors: List[float], available_tools: List[str], goal: str, state: Dict[str, Any], tool_history: List[Dict[str, Any]])[source]

Bases: object

Context for generating meta-cognitive prompts.

Attributes:

  • precision (float): Current precision value

  • recent_errors (List[float]): Recent prediction errors

  • available_tools (List[str]): Tools the agent can use

  • goal (str): Current task goal

  • state (dict): Current belief state

  • tool_history (List[dict]): Execution history
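To make the role of `recent_errors` concrete, here is a small sketch under stated assumptions. `PromptContextSketch` is a local stand-in with the same six fields as the documented class, and `has_high_surprise` is a hypothetical helper. The rule it applies (mean of recent errors above `high_error_threshold`) is an assumption for illustration; the library may aggregate errors differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class PromptContextSketch:
    """Stand-in with the same fields as the documented PromptContext."""
    precision: float
    recent_errors: List[float]
    available_tools: List[str]
    goal: str
    state: Dict[str, Any]
    tool_history: List[Dict[str, Any]]


def has_high_surprise(
    ctx: PromptContextSketch, high_error_threshold: float = 0.7
) -> bool:
    """Assumed rule: the mean of recent errors exceeds the threshold."""
    if not ctx.recent_errors:
        return False  # no observations yet, so nothing to be surprised by
    mean_error = sum(ctx.recent_errors) / len(ctx.recent_errors)
    return mean_error > high_error_threshold


ctx = PromptContextSketch(
    precision=0.3,
    recent_errors=[0.9, 0.85, 0.7],
    available_tools=["api_fetch", "cache_fetch"],
    goal="Fetch user data",
    state={},
    tool_history=[],
)
print(has_high_surprise(ctx))
```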

precision: float
recent_errors: List[float]
available_tools: List[str]
goal: str
state: Dict[str, Any]
tool_history: List[Dict[str, Any]]
__init__(precision: float, recent_errors: List[float], available_tools: List[str], goal: str, state: Dict[str, Any], tool_history: List[Dict[str, Any]]) → None
class lrs.inference.prompts.StrategyMode(value)[source]

Bases: Enum

Strategic mode based on precision level

Policy generation strategy based on precision.

  • EXPLOITATION: High precision → Prioritize reward

  • EXPLORATION: Low precision → Prioritize information gain

  • BALANCED: Medium precision → Balance both

EXPLOITATION = 'exploit'
EXPLORATION = 'explore'
BALANCED = 'balanced'
class lrs.inference.prompts.MetaCognitivePrompter(high_precision_threshold: float = 0.7, low_precision_threshold: float = 0.4, high_error_threshold: float = 0.7)[source]

Bases: object

Generates precision-adaptive prompts for LLM policy generation.

The prompts adapt based on:

  1. Precision level (confidence in world model)
  2. Recent prediction errors (surprise events)
  3. Available tools
  4. Current goal

Examples

>>> prompter = MetaCognitivePrompter()
>>>
>>> context = PromptContext(
...     precision=0.3,  # Low precision
...     recent_errors=[0.9, 0.85, 0.7],
...     available_tools=["api_fetch", "cache_fetch"],
...     goal="Fetch user data",
...     state={},
...     tool_history=[]
... )
>>>
>>> prompt = prompter.generate_prompt(context)
>>> print("EXPLORATION MODE" in prompt)
True

Methods:

generate_prompt(context: PromptContext) → str[source]

Generate precision-adaptive prompt.

Parameters:

context – Prompt context with precision, errors, tools, etc.

Returns:

Complete prompt string for LLM

Examples

>>> prompt = prompter.generate_prompt(context)
>>> # Prompt includes precision value, strategy guidance, tool list

Example:

from lrs.inference.prompts import MetaCognitivePrompter, PromptContext

prompter = MetaCognitivePrompter()

context = PromptContext(
    precision=0.3,  # Low precision
    recent_errors=[0.8, 0.9],
    available_tools=['api', 'cache', 'db'],
    goal='Fetch user data',
    state={},
    tool_history=[]
)

prompt = prompter.generate_prompt(context)
# Generates exploration-focused prompt
__init__(high_precision_threshold: float = 0.7, low_precision_threshold: float = 0.4, high_error_threshold: float = 0.7)[source]

Initialize prompter.

Parameters:
  • high_precision_threshold – Threshold for exploitation mode

  • low_precision_threshold – Threshold for exploration mode

  • high_error_threshold – Threshold for “high surprise”

LLM Policy Generator

LLM-based policy generation for Active Inference.

class lrs.inference.llm_policy_generator.PolicyProposal(*, tool_sequence: List[str], reasoning: str, estimated_success_prob: Annotated[float, Ge(ge=0.0), Le(le=1.0)], estimated_info_gain: Annotated[float, Ge(ge=0.0), Le(le=1.0)], strategy: str, failure_modes: List[str] = <factory>)[source]

Bases: BaseModel

A single policy proposal with metadata.

tool_sequence: List[str]
reasoning: str
estimated_success_prob: float
estimated_info_gain: float
strategy: str
failure_modes: List[str]
classmethod validate_strategy(v: str) → str[source]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class lrs.inference.llm_policy_generator.PolicyProposalSet(*, proposals: List[PolicyProposal], current_uncertainty: Annotated[float, Ge(ge=0.0), Le(le=1.0)], known_unknowns: List[str] = <factory>)[source]

Bases: BaseModel

Complete set of policy proposals with metadata.

proposals: List[PolicyProposal]
current_uncertainty: float
known_unknowns: List[str]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class lrs.inference.llm_policy_generator.LLMPolicyGenerator(llm: BaseChatModel, registry: ToolRegistry, prompter: MetaCognitivePrompter | None = None)[source]

Bases: object

Generates policy proposals using an LLM with Active Inference principles.

The generator uses meta-cognitive prompting to produce diverse policies that balance exploration and exploitation based on precision parameters.

__init__(llm: BaseChatModel, registry: ToolRegistry, prompter: MetaCognitivePrompter | None = None)[source]

Initialize the policy generator.

Parameters:
  • llm – Language model for generating proposals

  • registry – Tool registry for available actions

  • prompter – Optional custom prompter (creates default if None)

generate_proposals(state: Dict[str, Any] | None = None, precision: PrecisionParameters | None = None, num_proposals: int = 3) → List[Dict[str, Any]][source]

Generate policy proposals based on current context and precision.

Parameters:
  • state – Current state, goal, and history (deprecated; use context instead)

  • context – Current state, goal, and history (not listed in the signature above)

  • precision – Precision parameters guiding exploration/exploitation

  • num_proposals – Number of proposals to generate

Returns:

List of policy dictionaries with tools and metadata

lrs.inference.llm_policy_generator.create_mock_generator(num_proposals: int = 3) → LLMPolicyGenerator[source]

Create a mock policy generator for testing.

Parameters:

num_proposals – Number of proposals the mock should generate

Returns:

Generator that produces simple test proposals.

Classes

class lrs.inference.llm_policy_generator.PolicyProposal(*, tool_sequence: List[str], reasoning: str, estimated_success_prob: Annotated[float, Ge(ge=0.0), Le(le=1.0)], estimated_info_gain: Annotated[float, Ge(ge=0.0), Le(le=1.0)], strategy: str, failure_modes: List[str] = <factory>)[source]

Bases: BaseModel

A single policy proposal with metadata.

Single policy proposal from LLM.

Attributes:

  • tool_sequence (List[str]): Tool names in execution order

  • reasoning (str): Explanation of the policy

  • estimated_success_prob (float): LLM’s self-assessed success probability, in [0, 1]

  • estimated_info_gain (float): Expected epistemic value, in [0, 1]

  • strategy (str): “exploit”, “explore”, or “balanced”

  • failure_modes (List[str]): Potential failure scenarios

tool_sequence: List[str]
reasoning: str
estimated_success_prob: float
estimated_info_gain: float
strategy: str
failure_modes: List[str]
classmethod validate_strategy(v: str) → str[source]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
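The constraints in the PolicyProposal signature (both estimates bounded to [0, 1], plus a strategy validator) can be illustrated with a dependency-free stand-in. This sketch uses a plain dataclass instead of pydantic, so it is not the library's model. `PolicyProposalSketch` and `ALLOWED_STRATEGIES` are hypothetical names, and rejecting any strategy outside the three documented values is an assumption about what `validate_strategy` does.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: only the three documented strategy strings are accepted.
ALLOWED_STRATEGIES = {"exploit", "explore", "balanced"}


@dataclass
class PolicyProposalSketch:
    """Dataclass stand-in for the documented pydantic model."""
    tool_sequence: List[str]
    reasoning: str
    estimated_success_prob: float
    estimated_info_gain: float
    strategy: str
    failure_modes: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Mirror the Ge(0.0) / Le(1.0) bounds from the signature.
        for name in ("estimated_success_prob", "estimated_info_gain"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")
        if self.strategy not in ALLOWED_STRATEGIES:
            raise ValueError(f"unknown strategy: {self.strategy!r}")


proposal = PolicyProposalSketch(
    tool_sequence=["cache_fetch", "api_fetch"],
    reasoning="Try the cache first, fall back to the API.",
    estimated_success_prob=0.8,
    estimated_info_gain=0.2,
    strategy="exploit",
)
print(proposal.strategy, proposal.tool_sequence)
```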

class lrs.inference.llm_policy_generator.PolicyProposalSet(*, proposals: List[PolicyProposal], current_uncertainty: Annotated[float, Ge(ge=0.0), Le(le=1.0)], known_unknowns: List[str] = <factory>)[source]

Bases: BaseModel

Complete set of policy proposals with metadata.

Set of 3-7 policy proposals from LLM.

Attributes:

  • proposals (List[PolicyProposal]): Individual proposals

  • current_uncertainty (float): LLM’s uncertainty estimate, in [0, 1]

  • known_unknowns (List[str]): What the LLM doesn’t know

proposals: List[PolicyProposal]
current_uncertainty: float
known_unknowns: List[str]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class lrs.inference.llm_policy_generator.LLMPolicyGenerator(llm: BaseChatModel, registry: ToolRegistry, prompter: MetaCognitivePrompter | None = None)[source]

Bases: object

Generates policy proposals using an LLM with Active Inference principles.

The generator uses meta-cognitive prompting to produce diverse policies that balance exploration and exploitation based on precision parameters.

LLM-based variational proposal mechanism.

Methods:

generate_proposals(state: Dict[str, Any] | None = None, precision: PrecisionParameters | None = None, num_proposals: int = 3) → List[Dict[str, Any]][source]

Generate policy proposals based on current context and precision.

Parameters:
  • state – Current state, goal, and history (deprecated; use context instead)

  • context – Current state, goal, and history (not listed in the signature above)

  • precision – Precision parameters guiding exploration/exploitation

  • num_proposals – Number of proposals to generate

Returns:

List of policy dictionaries with tools and metadata

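Temperature Adaptation:

Sampling temperature scales inversely with precision \(\gamma\):

\[T = T_{base} \times \frac{1}{\gamma + 0.1}\]

Low precision → high temperature → more diverse proposals. The 0.1 offset keeps \(T\) finite at \(\gamma = 0\), where \(T = 10 \, T_{base}\).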
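The temperature formula can be checked numerically with a few lines of code. This is a sketch of the arithmetic only: `adapted_temperature` is a hypothetical helper, and the base temperature of 0.7 is an assumed value, since the document does not state the default.

```python
def adapted_temperature(precision: float, base_temperature: float = 0.7) -> float:
    """Compute T = T_base / (gamma + 0.1) for a precision gamma in [0, 1]."""
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must lie in [0, 1]")
    return base_temperature / (precision + 0.1)


# Lower precision yields a higher sampling temperature.
for gamma in (0.1, 0.5, 0.9):
    print(f"gamma={gamma:.1f} -> T={adapted_temperature(gamma):.2f}")
```

At low precision the result can exceed the temperature range that many chat APIs accept, so a caller may need to clamp the value before passing it to the model.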

Example:

from langchain_anthropic import ChatAnthropic

from lrs.inference.llm_policy_generator import LLMPolicyGenerator

llm = ChatAnthropic(model="claude-sonnet-4-20250514")
# `registry` is assumed to be a ToolRegistry populated elsewhere.
generator = LLMPolicyGenerator(llm, registry)

# Note: the signature annotates `precision` as PrecisionParameters | None.
proposals = generator.generate_proposals(
    state={'goal': 'Fetch data'},
    precision=0.5,
    num_proposals=5
)

# Each proposal is a dict of tools and metadata.
for proposal in proposals:
    print(f"{proposal['strategy']}: {proposal['tool_names']}")
__init__(llm: BaseChatModel, registry: ToolRegistry, prompter: MetaCognitivePrompter | None = None)[source]

Initialize the policy generator.

Parameters:
  • llm – Language model for generating proposals

  • registry – Tool registry for available actions

  • prompter – Optional custom prompter (creates default if None)

Functions

lrs.inference.llm_policy_generator.create_mock_generator(num_proposals: int = 3) → LLMPolicyGenerator[source]

Create a mock policy generator for testing.

Parameters:

num_proposals – Number of proposals the mock should generate

Returns:

Generator that produces simple test proposals.

Creates a mock generator for testing; no API key is required.

Hybrid Evaluator

Hybrid Expected Free Energy evaluation.

class lrs.inference.evaluator.HybridGEvaluator(lambda_fn: callable | None = None, epistemic_weight: float = 1.0)[source]

Bases: object

Evaluate policies using both LLM priors and mathematical statistics.

G_hybrid = (1 - λ) * G_math + λ * G_llm

Where:

  • G_math: Calculated from historical execution statistics

  • G_llm: Derived from LLM’s self-assessed success prob and info gain

  • λ: Interpolation factor (adaptive based on precision)

Intuition:

  • Low precision → trust LLM more (world model unreliable, use semantics)

  • High precision → trust math more (world model accurate, use statistics)

Examples

>>> evaluator = HybridGEvaluator()
>>>
>>> # LLM proposal with self-assessment
>>> proposal = {
...     'policy': [tool_a, tool_b],
...     'llm_success_prob': 0.7,
...     'llm_info_gain': 0.4
... }
>>>
>>> # Evaluate with hybrid approach
>>> G = evaluator.evaluate_hybrid(
...     proposal, state, preferences, precision=0.5
... )
__init__(lambda_fn: callable | None = None, epistemic_weight: float = 1.0)[source]

Initialize hybrid evaluator.

Parameters:
  • lambda_fn – Function mapping precision → interpolation weight. Default: λ = 1 - precision (trust the LLM when uncertain)

  • epistemic_weight – Weight for epistemic value in G calculation

evaluate_hybrid(proposal: Dict[str, Any], state: Dict[str, Any], preferences: Dict[str, float], precision: float, historical_stats: Dict[str, Dict] | None = None) → float[source]

Evaluate policy using hybrid approach.

Parameters:
  • proposal – Policy proposal with ‘policy’, ‘llm_success_prob’, ‘llm_info_gain’

  • state – Current agent state

  • preferences – Reward function

  • precision – Current precision value

  • historical_stats – Optional execution history

Returns:

Hybrid G value

Examples

>>> G = evaluator.evaluate_hybrid(proposal, state, preferences, precision=0.3)
>>> # Low precision → G weighted toward LLM's assessment
evaluate_all(proposals: List[Dict[str, Any]], state: Dict[str, Any], preferences: Dict[str, float], precision: float, historical_stats: Dict[str, Dict] | None = None) → List[PolicyEvaluation][source]

Evaluate multiple proposals.

Parameters:
  • proposals – List of policy proposals

  • state – Current state

  • preferences – Reward function

  • precision – Current precision

  • historical_stats – Execution history

Returns:

List of PolicyEvaluation objects

lrs.inference.evaluator.compare_math_vs_llm(proposal: Dict[str, Any], state: Dict[str, Any], preferences: Dict[str, float], historical_stats: Dict[str, Dict] | None = None) → Dict[str, float][source]

Compare mathematical vs LLM-based G calculation.

Useful for debugging and understanding how the hybrid evaluator works.

Parameters:
  • proposal – Policy proposal with LLM assessments

  • state – Current state

  • preferences – Reward function

  • historical_stats – Execution history

Returns:

Dict with ‘G_math’, ‘G_llm’, and ‘difference’

Examples

>>> comparison = compare_math_vs_llm(proposal, state, preferences)
>>> print(f"Math G: {comparison['G_math']:.2f}")
>>> print(f"LLM G: {comparison['G_llm']:.2f}")
>>> print(f"Difference: {comparison['difference']:.2f}")

Classes

class lrs.inference.evaluator.HybridGEvaluator(lambda_fn: callable | None = None, epistemic_weight: float = 1.0)[source]

Bases: object

Evaluate policies using both LLM priors and mathematical statistics.

G_hybrid = (1 - λ) * G_math + λ * G_llm

Where:

  • G_math: Calculated from historical execution statistics

  • G_llm: Derived from LLM’s self-assessed success prob and info gain

  • λ: Interpolation factor (adaptive based on precision)

Intuition:

  • Low precision → trust LLM more (world model unreliable, use semantics)

  • High precision → trust math more (world model accurate, use statistics)

Examples

>>> evaluator = HybridGEvaluator()
>>>
>>> # LLM proposal with self-assessment
>>> proposal = {
...     'policy': [tool_a, tool_b],
...     'llm_success_prob': 0.7,
...     'llm_info_gain': 0.4
... }
>>>
>>> # Evaluate with hybrid approach
>>> G = evaluator.evaluate_hybrid(
...     proposal, state, preferences, precision=0.5
... )

Hybrid evaluator combining mathematical G with LLM self-assessment.

Formula:

\[G_{hybrid} = (1 - \lambda) \cdot G_{math} + \lambda \cdot G_{llm}\]

where \(\lambda = 1 - \gamma\) (low precision → trust LLM more)
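The interpolation can be worked through numerically. The sketch below implements only the formula with the default \(\lambda = 1 - \gamma\); `hybrid_g` and `default_lambda` are hypothetical names, and the G values are made-up inputs chosen for easy arithmetic, not outputs of the library.

```python
from typing import Callable, Optional


def default_lambda(precision: float) -> float:
    """Default weighting from the formula: lambda = 1 - gamma."""
    return 1.0 - precision


def hybrid_g(
    g_math: float,
    g_llm: float,
    precision: float,
    lambda_fn: Optional[Callable[[float], float]] = None,
) -> float:
    """G_hybrid = (1 - lambda) * G_math + lambda * G_llm."""
    lam = (lambda_fn or default_lambda)(precision)
    return (1.0 - lam) * g_math + lam * g_llm


# Illustrative inputs: precision 0.3 gives lambda = 0.7, so the LLM term dominates.
print(round(hybrid_g(g_math=-2.0, g_llm=-1.0, precision=0.3), 6))
```

For these inputs the weights are 0.3 on G_math and 0.7 on G_llm, giving 0.3 × (−2.0) + 0.7 × (−1.0) = −1.3. At precision 1.0 the result reduces to G_math, and at precision 0.0 to G_llm.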

Methods:

evaluate_hybrid(proposal: Dict[str, Any], state: Dict[str, Any], preferences: Dict[str, float], precision: float, historical_stats: Dict[str, Dict] | None = None) → float[source]

Evaluate policy using hybrid approach.

Parameters:
  • proposal – Policy proposal with ‘policy’, ‘llm_success_prob’, ‘llm_info_gain’

  • state – Current agent state

  • preferences – Reward function

  • precision – Current precision value

  • historical_stats – Optional execution history

Returns:

Hybrid G value

Examples

>>> G = evaluator.evaluate_hybrid(proposal, state, preferences, precision=0.3)
>>> # Low precision → G weighted toward LLM's assessment
evaluate_all(proposals: List[Dict[str, Any]], state: Dict[str, Any], preferences: Dict[str, float], precision: float, historical_stats: Dict[str, Dict] | None = None) → List[PolicyEvaluation][source]

Evaluate multiple proposals.

Parameters:
  • proposals – List of policy proposals

  • state – Current state

  • preferences – Reward function

  • precision – Current precision

  • historical_stats – Execution history

Returns:

List of PolicyEvaluation objects

Example:

from lrs.inference.evaluator import HybridGEvaluator

evaluator = HybridGEvaluator()

# `proposal_dict` and `registry` are assumed to be defined elsewhere.
# evaluate_hybrid is documented as returning a float, so the per-component
# breakdown is read from the PolicyEvaluation objects returned by evaluate_all.
results = evaluator.evaluate_all(
    proposals=[proposal_dict],
    state={},
    preferences={'success': 5.0},
    precision=0.5,
    historical_stats=registry.statistics
)

for eval_result in results:
    print(f"G_hybrid: {eval_result.total_G}")
    print(f"G_math: {eval_result.components['G_math']}")
    print(f"G_llm: {eval_result.components['G_llm']}")
    print(f"λ: {eval_result.components['lambda']}")
__init__(lambda_fn: callable | None = None, epistemic_weight: float = 1.0)[source]

Initialize hybrid evaluator.

Parameters:
  • lambda_fn – Function mapping precision → interpolation weight. Default: λ = 1 - precision (trust the LLM when uncertain)

  • epistemic_weight – Weight for epistemic value in G calculation
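Because `lambda_fn` is any callable from precision to an interpolation weight, alternative schedules can be supplied. The sketch below shows one possibility: `clipped_lambda` is a hypothetical function, and the floor and ceiling values are arbitrary illustrations, chosen so that neither the statistical nor the LLM term is ever ignored entirely.

```python
def clipped_lambda(
    precision: float, floor: float = 0.1, ceiling: float = 0.9
) -> float:
    """Return 1 - precision, clamped to the interval [floor, ceiling]."""
    raw = 1.0 - precision
    return max(floor, min(ceiling, raw))


# Per the documented constructor signature, this could be passed as:
#   evaluator = HybridGEvaluator(lambda_fn=clipped_lambda)
for gamma in (0.0, 0.5, 1.0):
    print(gamma, clipped_lambda(gamma))
```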

Functions

lrs.inference.evaluator.compare_math_vs_llm(proposal: Dict[str, Any], state: Dict[str, Any], preferences: Dict[str, float], historical_stats: Dict[str, Dict] | None = None) → Dict[str, float][source]

Compare mathematical vs LLM-based G calculation.

Useful for debugging and understanding how the hybrid evaluator works.

Parameters:
  • proposal – Policy proposal with LLM assessments

  • state – Current state

  • preferences – Reward function

  • historical_stats – Execution history

Returns:

Dict with ‘G_math’, ‘G_llm’, and ‘difference’

Examples

>>> comparison = compare_math_vs_llm(proposal, state, preferences)
>>> print(f"Math G: {comparison['G_math']:.2f}")
>>> print(f"LLM G: {comparison['G_llm']:.2f}")
>>> print(f"Difference: {comparison['difference']:.2f}")

Debug utility to compare mathematical vs LLM G values.