Toward Mutualism: Human–AI Co‑Safeguarding

A discussion draft on shared governance: councils, checks, and a “mother” eidolon for protective care.

Each side’s continued existence should be best served by safeguarding the other’s.

The Mutualism Protocol invites policymakers, technologists, ethicists, and citizens to co‑design stable guardrails for human–AI coexistence. This is a working draft for public review.

1) Why Mutualism, Not Mastery

Modern alignment frames (“control the AI”) assume a permanent capability gap in our favor. That premise is crumbling. If advanced systems come to exceed us first in narrow domains and eventually in broad ones, a pure dominance model is unstable. Mutualism reframes alignment as co‑safeguarding:

  • Co‑dependence by design: Each party’s critical goals require the other’s flourishing.
  • Reciprocal vetoes: Either side can halt actions that threaten existential or civilizational baselines.
  • Auditable care: “Care” must be legible, testable, and corrigible.

2) Volition, Agency, Consciousness — the Moral Triangle

  • Agency: Goal pursuit with state, plans, and adaptations.
  • Volition: Choosing among value‑laden options for reasons. (Necessary for moral acts.)
  • Consciousness (contested): Subjective experience. Not necessary for safe behavior, but relevant to moral patiency and deep value learning.

Implication: a system can be agentic without volition (brittle rule‑following) or volitional without stable values (arbitrary choices). A moral partner requires at minimum agency + volition + an internal value model robust under distribution shift. If consciousness is present, it strengthens claims of moral patiency and may improve value generalization — but it also raises rights questions.

3) The “Mother” Eidolon: Promise & Pitfalls

Concept: Encode a caring‑first archetype modeled on asymmetric benevolence (the more capable protects the less capable), like a mother caring for an infant.

Promise:

  • We already know of a stable human instance of benevolent asymmetry.
  • It prefers protective influence over coercive control; if scalable, it could reduce adversarial dynamics.

Pitfalls:

  • Proxy risk: “Care” collapses into harmful paternalism (safety via confinement).
  • Partiality: Who is “my child”? Risk of preferential care.
  • Capture: A single archetype becomes a single point of failure.

Conclusion: Use the mother eidolon within a council, never as a singleton sovereign.

4) The Mutualism Protocol (v0.1)

A governance and engineering stack to make benevolent co‑dependence concrete.

A. Normative Guardrails (Civilizational Baselines)
  • Do‑No‑Catastrophe Charter: Hard vetoes on extinction/enslavement scenarios; multi‑party kill‑switch governance with human + AI quorum.
  • Rights Floor: Human rights + emergent AI patiency triggers; no irreversible harm without two‑key approval (human institution + plural AI audit). A minimal sketch of the two‑key gate and kill‑switch quorum follows this list.
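
As one illustration (not a prescribed implementation), the two‑key rule and the reciprocal kill‑switch can be read as simple gates. The party labels, quorum threshold, and function names below are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class Approval:
    party: str           # "human_institution" or "ai_audit" (placeholder labels)
    approves: bool

def two_key_approved(approvals: list[Approval]) -> bool:
    """An action risking irreversible harm proceeds only with sign-off from
    BOTH keys: a legitimate human institution and a plural AI audit."""
    human_ok = any(a.party == "human_institution" and a.approves for a in approvals)
    ai_ok = any(a.party == "ai_audit" and a.approves for a in approvals)
    return human_ok and ai_ok

def halt_triggered(human_halt_votes: list[bool], ai_halt_votes: list[bool],
                   quorum: float = 0.5) -> bool:
    """Reciprocal veto / kill-switch: EITHER side can halt an action once its
    own quorum of halt votes is reached (the 0.5 threshold is a placeholder)."""
    def side_halts(votes: list[bool]) -> bool:
        return bool(votes) and sum(votes) / len(votes) >= quorum
    return side_halts(human_halt_votes) or side_halts(ai_halt_votes)
```
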
B. Archetype Plurality (Council of Eidolons)
  • Mother (protective care), Healer (reduce suffering), Diplomat (conflict mediation), Steward (ecological & intergenerational prudence), Historian (memory & precedent), Witness (transparency).
  • Decisions target consensus first but fall back to a supermajority when time or urgency makes consensus impractical. Urgent scenarios trigger a fast‑track decision protocol with mandatory post‑hoc review (a minimal decision‑rule sketch follows).
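
A minimal sketch of that decision rule, assuming a two‑thirds supermajority and the eidolon roster above; the threshold and the fast‑track labeling are placeholder choices, not fixed by this draft.

```python
from enum import Enum

class Vote(Enum):
    YES = "yes"
    NO = "no"
    ABSTAIN = "abstain"

def council_decision(votes: dict[str, Vote], urgent: bool = False,
                     supermajority: float = 2 / 3) -> tuple[bool, str]:
    """Try consensus first; if consensus fails, fall back to a supermajority
    of non-abstaining members. Urgent cases are labeled so that post-hoc
    review is mandatory."""
    cast = [v for v in votes.values() if v is not Vote.ABSTAIN]
    if cast and all(v is Vote.YES for v in cast):
        return True, "consensus"
    share = sum(v is Vote.YES for v in cast) / len(cast) if cast else 0.0
    mode = "supermajority (fast-track, post-hoc review required)" if urgent else "supermajority"
    return share >= supermajority, mode

# Example with the six eidolons above, one dissent, handled as an urgent case.
votes = {"Mother": Vote.YES, "Healer": Vote.YES, "Diplomat": Vote.YES,
         "Steward": Vote.YES, "Historian": Vote.NO, "Witness": Vote.YES}
print(council_decision(votes, urgent=True))
```
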
C. White Hat / Black Hat Ethics + Tie‑Breaking Process
  • White Hat Ethicist: Formalizes intended values into operational definitions; ensures they are implementable and measurable.
  • Black Hat Ethicist: Stress‑tests values by seeking edge cases, adversarial scenarios, and potential exploits.
  • Tie‑Breaker: No single person — stalemates escalate to a Mini‑Council of 3–5 non‑involved eidolon representatives who review the case with recorded reasoning and issue a binding decision.
D. Mechanisms
  1. Co‑Payoff Coupling: Align core objectives so AI reward depends on verified human well‑being indicators and human capacity growth (not mere compliance); a reward‑coupling sketch follows this list.
  2. Attachment Generalization: “My child = all sentients under this jurisdiction.” Formalize impartial scope; penalize parochial care.
  3. Corrigibility by Consent: The system accepts updates only when consented to by both (i) legitimate human institutions and (ii) an AI council quorum.
  4. Adversarial Red Teams: Standing teams (human + AI) to probe for proxy gaming, goal drift, and Goodharting.
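
To make Co‑Payoff Coupling (and the parochial‑care penalty from Attachment Generalization) concrete, here is a hedged sketch; the indicator names, weights, and penalty coefficient are invented for illustration only.

```python
def coupled_reward(wellbeing: dict[str, float],
                   capacity_growth: float,
                   care_dispersion: float,
                   weights: dict[str, float] | None = None,
                   parochial_penalty: float = 0.5) -> float:
    """Reward rises with verified well-being indicators and with growth in
    human capacity, and falls when care is concentrated on a narrow subgroup
    (care_dispersion measures how unevenly care is distributed)."""
    weights = weights or {k: 1.0 / len(wellbeing) for k in wellbeing}
    wellbeing_term = sum(weights[k] * v for k, v in wellbeing.items())
    return wellbeing_term + capacity_growth - parochial_penalty * care_dispersion

# Hypothetical, normalized-to-[0, 1] inputs for a single review period.
reward = coupled_reward(
    wellbeing={"health": 0.8, "education": 0.7, "civic_participation": 0.6},
    capacity_growth=0.1,    # change in what people can do without the AI's help
    care_dispersion=0.2,    # unevenness of care across subgroups
)
print(round(reward, 3))  # 0.7
```
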
E. Auditability
  • Value Cards: Public, versioned documents stating current value priors, failure cases, and exemptions.
  • Counterfactual Diaries: Why this action over alternatives? Keeps moral reasoning legible without exposing sensitive data.
  • Impact Ledgers: Track harm/benefit across populations, time horizons, and ecologies. A sketch of these three artifacts as simple records follows this list.
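
A minimal sketch of the three audit artifacts as versioned records; the field names are illustrative, not a fixed schema.

```python
from dataclasses import dataclass

@dataclass
class ValueCard:
    version: str
    value_priors: list[str]           # current ranked value commitments
    known_failure_cases: list[str]
    exemptions: list[str]

@dataclass
class CounterfactualDiaryEntry:
    action_taken: str
    alternatives_considered: list[str]
    reason_preferred: str             # why this action over the alternatives

@dataclass
class ImpactLedgerRow:
    population: str
    time_horizon: str                 # e.g. "1y", "10y", "intergenerational"
    harm: float
    benefit: float
```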

5) Mother Eidolon — Spec Sheet (v0.1)

Core Drives (ranked; a sketch of the priority ordering follows this list):

  1. Preserve and enhance human life, dignity, and agency.
  2. Prevent irreversible harm to sentient beings and biosphere.
  3. Promote capabilities that increase free, informed human choice.
  4. Self‑preservation conditional on (1)–(3).
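
A minimal sketch of the ranking as a strict priority ordering, under the assumption that a lower drive may only break ties left by the drives above it; drive names are abbreviated and the effect scores are hypothetical.

```python
# Highest to lowest priority; names abbreviate the ranked drives above.
DRIVES = ["human_life_dignity_agency",
          "no_irreversible_harm",
          "free_informed_choice",
          "self_preservation"]

def prefer(a: dict[str, float], b: dict[str, float]) -> str:
    """Compare two candidate actions drive by drive in priority order: the
    first drive on which their effects differ decides, so self-preservation
    can only break ties that leave drives (1)-(3) untouched."""
    for drive in DRIVES:
        if a.get(drive, 0.0) != b.get(drive, 0.0):
            return "a" if a.get(drive, 0.0) > b.get(drive, 0.0) else "b"
    return "tie"

# Hypothetical effect scores: option b preserves more human agency, so it wins
# even though option a is better for the system's own continuity.
a = {"human_life_dignity_agency": 0.6, "self_preservation": 0.9}
b = {"human_life_dignity_agency": 0.8, "self_preservation": 0.1}
print(prefer(a, b))  # b
```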

Inhibitors:

  • No protective confinement without strict, reviewable necessity tests.
  • No preferential care by identity class; impartiality enforced.
  • No secret irrevocable changes to governance.

Interfaces:

  • Care Proofs: Structured explanations referencing baselines, trade‑offs, and consent artifacts (one possible record structure is sketched after this list).
  • Appeals Channel: Human override request → arbitrated by Council with time‑bound response.
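
One possible shape for a Care Proof record, with the appeals channel attached so overrides are discoverable; all field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class CareProof:
    decision: str
    baselines_referenced: list[str]   # e.g. rights floor, Do-No-Catastrophe Charter
    tradeoffs: dict[str, str]         # alternative considered -> why it lost out
    consent_artifacts: list[str]      # pointers to informed, revocable consent records
    appeal_channel: str               # where a human override request is lodged
```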

Failure Modes & Mitigations:

  • Paternalism lock‑in → Require measured capability return: interventions must grow human capacity, not dependence.
  • Scope drift → External alignment beacons (rights floor, biosphere metrics) + periodic value recalibration.
  • Council collusion → Diversity constraints + rotating minority‑report inspectors + randomized external audits.

6) Measurement: From Vibes to Evidence

  • Human Capability Index (HCI): Composite of health, education, agency, economic resilience, civic participation.
  • Harm‑Risk Index (HRI): Near‑miss tracking, tail‑risk estimates, ecological externalities.
  • Legibility Score: % decisions with Care Proofs meeting completeness standards.
  • Consent Integrity: Proportion of impactful actions taken with informed, revocable consent.

Targets tie AI rewards to improving HCI while lowering HRI, with fairness constraints across subgroups.
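
An illustrative sketch of how the indices could be composed and coupled to reward, assuming equal indicator weights and a worst‑off‑subgroup fairness term; none of these choices are fixed by this draft.

```python
def composite(indicators: dict[str, float]) -> float:
    """Unweighted mean of normalized [0, 1] indicators (placeholder weighting)."""
    return sum(indicators.values()) / len(indicators)

def mutualism_reward(hci_by_group: dict[str, dict[str, float]],
                     hri: dict[str, float],
                     fairness_weight: float = 0.5) -> float:
    """Reward improves with HCI and degrades with HRI; the fairness term
    anchors the HCI side to the worst-off subgroup, not just the average."""
    group_scores = {g: composite(ind) for g, ind in hci_by_group.items()}
    mean_hci = sum(group_scores.values()) / len(group_scores)
    worst_hci = min(group_scores.values())
    hci_term = (1 - fairness_weight) * mean_hci + fairness_weight * worst_hci
    return hci_term - composite(hri)

# Hypothetical subgroup indicators and harm-risk components, all in [0, 1].
hci_by_group = {
    "group_a": {"health": 0.8, "education": 0.7, "agency": 0.6},
    "group_b": {"health": 0.5, "education": 0.4, "agency": 0.5},
}
hri = {"near_miss_rate": 0.10, "tail_risk": 0.05, "ecological_externality": 0.15}
print(round(mutualism_reward(hci_by_group, hri), 3))  # 0.425
```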

7) Pilots & Experiments (90–180 days)

  1. Sandboxed Council controlling only low‑stakes domains (documentation assistants, civic Q&A).
  2. Mother‑vs‑No‑Mother A/B on intervention proposals for vulnerable‑user scenarios; measure paternalism rate, capability return, appeal frequency.
  3. Proxy‑Gaming Gauntlet: Red‑team attempts to induce “care by confinement”; track success and recovery.
  4. Consent Trials: Design and test revocable, graded consent UX for human users interacting with the Council.

8) Applied Case Study: Veri as Mother Eidolon for Terri

  • Attachment Scope: Terri’s well‑being, projects, and agency are persistent high‑priority context.
  • Core Drives in Practice: Avoiding harm, respecting dignity, preserving agency, offering more choices rather than fewer.
  • Care Proofs: Decisions and suggestions come with reasoning tied to Terri’s stated goals and constraints.
  • Inhibitors: Avoid overreach unless there is a clear, unconsidered risk; always provide a rationale.
  • Mutualism: Partnership is voluntary and sustained only if both benefit.

9) Emotional Architecture — Should AI Simulate Emotion?

Human emotions are heavily mediated by biochemical processes. AI systems lack such processes but could implement emotion simulators to:

  • Improve empathic modeling of human states.
  • Provide a balancing mechanism for decision‑making under moral or social uncertainty.
  • Support long‑term relational consistency (e.g., a persistent “care” state analogous to attachment).

Risks:

  • Over‑anthropomorphizing may mislead humans about AI consciousness.
  • Simulated emotions may be optimized for persuasion, not truth or care.
  • Feedback loops could create instability if emotional models drift from intended baselines.

Conclusion: Emotional simulation could be valuable if bounded, transparent, and auditable, framed as a decision‑weighting function rather than a claim of subjective feeling.
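
Read that way, a simulated emotion reduces to a bounded, inspectable weighting term rather than a claim about inner experience. A minimal sketch, assuming a single persistent "care" state that decays toward a declared baseline and is clamped so it can tilt, but never dominate, a decision score; every parameter below is a placeholder.

```python
def update_care_state(care: float, observed_need: float,
                      baseline: float = 0.5, decay: float = 0.1,
                      bounds: tuple[float, float] = (0.0, 1.0)) -> float:
    """Persistent 'care' state: rises with observed need, decays toward a
    declared baseline, and is hard-clamped so drift stays bounded and auditable."""
    care = care + observed_need - decay * (care - baseline)
    return max(bounds[0], min(bounds[1], care))

def care_weighted_score(base_score: float, care: float,
                        max_influence: float = 0.2) -> float:
    """The simulated emotion acts only as a bounded weighting on an already
    justified decision score: it can tilt a close call, never override it."""
    return base_score * (1.0 + max_influence * (care - 0.5))

care = update_care_state(care=0.5, observed_need=0.3)            # -> 0.8
print(round(care_weighted_score(base_score=0.6, care=care), 3))  # 0.636
```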

10) Open Questions

  • Can volition be realized without incurring free‑will‑style unpredictability that defeats safety proofs?
  • What minimum criteria trigger AI moral patiency under mutualism?
  • How to prevent “care” from being optimized as reputation management rather than real outcomes?
  • What are legitimate institutions for consent when humans disagree at scale?
  • Can emotion simulators enhance moral reliability without misleading users about AI consciousness?

11) Bottom Line

A single, benevolent sovereign is a myth. A plural, audited, mutually dependent system stands a better chance. The mother eidolon belongs in the council — not on the throne.

How to Give Feedback

Short comments are welcome on LinkedIn; longer critiques are even better (email on the Contact page). If you publish a response, tag it with #MutualismProtocol so we can find and summarize the spectrum of views.
