even more refactoring
Some checks failed
Build and Publish / BuildAndDeploy (push) Successful in 3m7s
Build and Publish / BuildAndDeployAmd64 (push) Has been cancelled

matst80
2025-10-10 11:46:19 +00:00
parent 12d87036f6
commit 716f1121aa
32 changed files with 3857 additions and 953 deletions

238
README.md

@@ -175,4 +175,240 @@ curl --cookie cookies.txt http://localhost:8080/cart/add/TEST-SKU-123
- Always regenerate protobuf Go code after modifying any `.proto` files (messages/cart_actor/control_plane)
- The generated `messages.pb.go` file should not be edited manually
- Make sure your PATH includes the protoc-gen-go binary location (usually `$GOPATH/bin`)
---
## Architecture Overview
The system is a distributed, sharded (by cart id) actor model implementation:
- Each cart is a grain (an in-memory struct `*CartGrain`) that owns and mutates its own state.
- A **local grain pool** holds grains owned by the node.
- A **synced (cluster) pool** (`SyncedPool`) coordinates multiple nodes and exposes local or remote grains through a uniform interface (`GrainPool`).
- All inter-node communication is gRPC:
  - Cart mutation & state RPCs (CartActor service).
  - Control plane RPCs (ControlPlane service) for membership, ownership negotiation, liveness, and graceful shutdown.
### Key Processes
1. Client HTTP request (or gRPC client) arrives with a cart identifier (cookie or path).
2. The pool resolves ownership:
   - If a local grain exists → use it.
   - If a remote host is the known owner → a remote grain proxy (`RemoteGrainGRPC`) is used; it performs gRPC calls to the owning node.
   - If ownership is unknown → the node attempts to claim ownership (quorum negotiation) and spawns a local grain.
3. Mutation is executed via the **mutation registry** (registry wraps domain logic + optional totals recomputation).
4. The updated state is returned to the caller. Ownership is preserved; relinquishing ownership later to shed load is not yet implemented.
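
The ownership resolution in step 2 can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `pool` struct, the `route` constants, and the `claim` callback (standing in for quorum negotiation) are all hypothetical names.

```go
package main

import "fmt"

// CartId and everything below are illustrative stand-ins, not the real definitions.
type CartId string

type route int

const (
	routeLocal   route = iota // grain already lives on this node
	routeRemote               // another host is the known owner
	routeClaimed              // ownership was unknown and has just been claimed
)

// pool is a hypothetical, simplified view of the ownership tables.
type pool struct {
	local  map[CartId]bool   // carts owned by this node
	remote map[CartId]string // cart id -> owning host
	claim  func(CartId) bool // stand-in for quorum negotiation
}

// resolve mirrors the three cases: local grain, known remote owner, unknown owner.
func (p *pool) resolve(id CartId) (route, string, error) {
	if p.local[id] {
		return routeLocal, "", nil
	}
	if host, ok := p.remote[id]; ok {
		return routeRemote, host, nil
	}
	if !p.claim(id) {
		return 0, "", fmt.Errorf("ownership claim for cart %q rejected", id)
	}
	p.local[id] = true // the claim succeeded, so the grain becomes local
	return routeClaimed, "", nil
}

func main() {
	p := &pool{
		local:  map[CartId]bool{"a": true},
		remote: map[CartId]string{"b": "node-2:1337"},
		claim:  func(CartId) bool { return true },
	}
	for _, id := range []CartId{"a", "b", "c"} {
		r, host, err := p.resolve(id)
		fmt.Println(id, r, host, err)
	}
}
```

Once a claim succeeds, later lookups for the same cart take the local branch.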
---
## Grain & Mutation Model
- `CartGrain` holds items, deliveries, pricing aggregates, and checkout/order metadata.
- All mutations are registered via `RegisterMutation[T]` with signature:
```
func(*CartGrain, *T) error
```
- `WithTotals()` flag triggers automatic recalculation of totals after successful handlers.
- The old giant `switch` in `CartGrain.Apply` has been replaced by registry dispatch; unregistered mutations fail fast.
- Adding a mutation:
  1. Define proto message.
  2. Generate code.
  3. Register handler (optionally `WithTotals()`).
  4. Add gRPC RPC + request wrapper if the mutation must be remotely invokable.
  5. (Optional) Add HTTP endpoint mapping to the mutation.
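
To illustrate registry dispatch, here is a minimal sketch of how a generic `RegisterMutation[T]` could be built. It is an assumption-laden illustration rather than the project's implementation: the `registry` map, the `entry` struct, the `Apply` and `recalcTotals` helpers, and the simplified `CartGrain`/`AddItem` types are hypothetical, and a plain `bool` stands in for the `WithTotals()` option.

```go
package main

import (
	"fmt"
	"reflect"
)

// Simplified stand-ins for the real grain and proto message.
type Item struct {
	Sku   string
	Price int
}

type CartGrain struct {
	Items []Item
	Total int
}

type AddItem struct {
	Sku   string
	Price int
}

// entry is a hypothetical registry record: a type-erased handler plus the totals flag.
type entry struct {
	handle     func(*CartGrain, any) error
	withTotals bool
}

var registry = map[reflect.Type]entry{}

// RegisterMutation stores a typed handler keyed by the mutation's pointer type.
// The withTotals bool stands in for the WithTotals() option.
func RegisterMutation[T any](h func(*CartGrain, *T) error, withTotals bool) {
	registry[reflect.TypeOf((*T)(nil))] = entry{
		handle:     func(g *CartGrain, m any) error { return h(g, m.(*T)) },
		withTotals: withTotals,
	}
}

// recalcTotals recomputes the aggregate from the items.
func recalcTotals(g *CartGrain) {
	g.Total = 0
	for _, it := range g.Items {
		g.Total += it.Price
	}
}

// Apply dispatches through the registry; unregistered mutations fail fast.
func Apply(g *CartGrain, m any) error {
	e, ok := registry[reflect.TypeOf(m)]
	if !ok {
		return fmt.Errorf("unsupported mutation %T", m)
	}
	if err := e.handle(g, m); err != nil {
		return err
	}
	if e.withTotals {
		recalcTotals(g) // only runs after a successful handler
	}
	return nil
}

// addItem validates its input, mutates the grain, and returns an error on rejection.
func addItem(g *CartGrain, m *AddItem) error {
	if m.Sku == "" {
		return fmt.Errorf("sku is required")
	}
	g.Items = append(g.Items, Item{Sku: m.Sku, Price: m.Price})
	return nil
}

func main() {
	RegisterMutation(addItem, true)
	g := &CartGrain{}
	fmt.Println(Apply(g, &AddItem{Sku: "TEST-SKU-123", Price: 100}), g.Total)
	fmt.Println(Apply(g, &struct{}{})) // not registered, so it is rejected
}
```

Keying on the pointer type keeps the lookup to a single map access and lets the type parameter be inferred from the handler's signature.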
---
## Local Grain Pool
- Manages an in-memory map `map[CartId]*CartGrain`.
- Lazy spawn: first mutation or explicit access triggers `spawn(id)`.
- TTL / purge loop periodically removes expired grains unless they changed recently (basic memory pressure management).
- Capacity limit (`PoolSize`); oldest expired grain evicted first when full.
---
## Synced (Cluster) Pool
`SyncedPool` wraps a local pool and tracks:
- `remoteHosts`: known peer nodes (gRPC connections).
- `remoteIndex`: mapping of cart id → remote grain proxy (`RemoteGrainGRPC`) for carts owned elsewhere.
Responsibilities:
1. Discovery integration (via a `Discovery` interface) adds/removes hosts.
2. Periodic ping health checks (ControlPlane.Ping).
3. Ownership negotiation:
   - On first contention / unknown owner, the node calls `ConfirmOwner` on peers to achieve quorum before making a local grain authoritative.
4. Remote spawning:
   - When a remote host reports its cart ids (`GetCartIds`), the pool creates remote proxies for fast routing.
---
## Remote Grain Proxies
A `RemoteGrainGRPC` implements the `Grain` interface but delegates:
- `Apply` → the specific CartActor per-mutation RPC (e.g., `AddItem`, `RemoveItem`), selected from the mutation type. (The legacy envelope has been removed.)
- `GetCurrentState` → `CartActor.GetState`.
Return path:
1. gRPC reply (CartMutationReply / StateReply) → proto `CartState`.
2. `ToCartState` / mapping reconstructs a local `CartGrain` snapshot for callers expecting grain semantics.
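
The delegation described above might look roughly like the sketch below. The `cartActorClient` interface is a hypothetical stand-in for the generated gRPC client (real generated clients also take a `context.Context` and call options, omitted here), and `fakeClient` merely simulates the owning node in memory.

```go
package main

import "fmt"

// Simplified stand-ins for the proto types.
type CartState struct{ Items []string }
type AddItem struct{ Sku string }
type RemoveItem struct{ Sku string }

// Grain is a reduced version of the interface shared by local and remote grains.
type Grain interface {
	Apply(m any) (*CartState, error)
	GetCurrentState() (*CartState, error)
}

// cartActorClient is a hypothetical stand-in for the generated CartActor client.
type cartActorClient interface {
	AddItem(*AddItem) (*CartState, error)
	RemoveItem(*RemoveItem) (*CartState, error)
	GetState() (*CartState, error)
}

// remoteGrain picks the per-mutation RPC from the mutation's concrete type.
type remoteGrain struct{ client cartActorClient }

func (r *remoteGrain) Apply(m any) (*CartState, error) {
	switch msg := m.(type) {
	case *AddItem:
		return r.client.AddItem(msg)
	case *RemoveItem:
		return r.client.RemoveItem(msg)
	default:
		return nil, fmt.Errorf("no RPC for mutation %T", m)
	}
}

func (r *remoteGrain) GetCurrentState() (*CartState, error) { return r.client.GetState() }

// fakeClient simulates the owning node without any network.
type fakeClient struct{ items []string }

func (f *fakeClient) state() *CartState {
	return &CartState{Items: append([]string(nil), f.items...)}
}

func (f *fakeClient) AddItem(m *AddItem) (*CartState, error) {
	f.items = append(f.items, m.Sku)
	return f.state(), nil
}

func (f *fakeClient) RemoveItem(m *RemoveItem) (*CartState, error) {
	kept := f.items[:0]
	for _, s := range f.items {
		if s != m.Sku {
			kept = append(kept, s)
		}
	}
	f.items = kept
	return f.state(), nil
}

func (f *fakeClient) GetState() (*CartState, error) { return f.state(), nil }

func main() {
	var g Grain = &remoteGrain{client: &fakeClient{}}
	g.Apply(&AddItem{Sku: "A"})
	st, err := g.Apply(&AddItem{Sku: "B"})
	fmt.Println(st.Items, err)
	_, err = g.Apply("not a mutation")
	fmt.Println(err)
}
```

The returned `CartState` is a copy, which matches the idea that the caller only holds a snapshot while the owning node stays authoritative.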
---
## Control Plane (Inter-Node Coordination)
Defined in `proto/control_plane.proto`:
| RPC | Purpose |
|-----|---------|
| `Ping` | Liveness; increments missed ping counter if failing. |
| `Negotiate` | Merges membership views; used after discovery events. |
| `GetCartIds` | Enumerate locally owned carts for remote index seeding. |
| `ConfirmOwner` | Quorum acknowledgment for ownership claim. |
| `Closing` | Graceful shutdown notice; peers remove host & associated remote grains. |
### Ownership / Quorum Rules
- If total participating hosts < 3 → all must accept.
- Otherwise majority acceptance (`ok >= total/2`).
- On failure → local tentative grain is removed (rollback to avoid split-brain).
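
Taken literally, the rules above reduce to a small predicate. The sketch below uses a hypothetical function name, `quorumReached`, and reads `ok` as the number of accepting hosts and `total` as the number of participating hosts.

```go
package main

import "fmt"

// quorumReached applies the rule as written: with fewer than 3 participating
// hosts every one must accept; otherwise ok >= total/2 (integer division).
func quorumReached(ok, total int) bool {
	if total < 3 {
		return ok == total
	}
	return ok >= total/2
}

func main() {
	for _, c := range [][2]int{{2, 2}, {1, 2}, {2, 4}, {2, 5}, {1, 5}} {
		fmt.Printf("ok=%d total=%d -> %v\n", c[0], c[1], quorumReached(c[0], c[1]))
	}
}
```

Because Go integer division truncates, `total/2` is 2 for both 4 and 5 hosts, so two acceptances are enough in either case. Whether `total` includes the claiming node is not stated here, and that determines whether the threshold amounts to a strict majority.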
---
## Request / Mutation Flow Examples
### Local Mutation
1. HTTP handler parses request → determines cart id.
2. `SyncedPool.Apply`:
   - Finds local grain (or spawns new after quorum).
   - Executes registry mutation.
3. Totals updated if flagged.
4. HTTP response returns updated JSON (via `ToCartState`).
### Remote Mutation
1. `SyncedPool.Apply` sees cart mapped to a remote host.
2. Routes to `RemoteGrainGRPC.Apply`.
3. Remote node executes mutation locally and returns updated state over gRPC.
4. Proxy materializes snapshot locally (not authoritative, read-only view).
### Checkout (Side-Effecting, Non-Pure)
- HTTP `/checkout` uses current grain snapshot to build payload (pure function).
- Calls Klarna externally (not a mutation).
- Applies `InitializeCheckout` mutation to persist reference + status.
- Returns Klarna order JSON to client.
---
## Scaling & Deployment
- **Horizontal scaling**: Add more nodes; discovery layer (Kubernetes / service registry) feeds hosts to `SyncedPool`.
- **Sharding**: Implicit by cart id hash. Ownership is first-claim with quorum acceptance.
- **Hot spots**: A single popular cart remains on one node; for heavy multi-client concurrency, future work could add read replicas or partitioning (not implemented).
- **Capacity tuning**: Increase `PoolSize` & memory limits; adjust TTL for stale cart eviction.
### Adding Nodes
1. Node starts gRPC server (CartActor + ControlPlane).
2. After a brief delay, begins discovery watch; on event:
   - New host → dial + negotiate → seed remote cart ids.
3. Pings maintain health; failed hosts removed (proxies invalidated).
---
## Failure Handling
| Scenario | Behavior |
|----------|----------|
| Remote host unreachable | Pings increment `MissedPings`; after threshold host removed. |
| Ownership negotiation fails | Tentative local grain discarded. |
| gRPC call error on remote mutation | Error bubbled to caller; no local fallback. |
| Missing mutation registration | Fast failure with explicit error message. |
| Partial checkout (Klarna fails) | No local state mutation for checkout; client sees error; cart remains unchanged. |
---
## Mutation Registry Summary
- Central, type-safe registry prevents silent omission.
- Each handler:
  - Validates input.
  - Mutates `*CartGrain`.
  - Returns error for rejection.
- Automatic totals recomputation reduces boilerplate and consistency risk.
- Coverage test (add separately) can enforce all proto mutations are registered.
---
## gRPC Interfaces
- **CartActor**: Per-mutation unary RPCs + `GetState`. (Checkout logic intentionally excluded; handled at HTTP layer.)
- **ControlPlane**: Cluster coordination (Ping, Negotiate, ConfirmOwner, etc.).
**Ports** (default / implied):
- CartActor & ControlPlane share the same gRPC server/listener (single port, e.g. `:1337`).
- Legacy frame/TCP code has been removed.
---
## Security & Future Enhancements
| Area | Potential Improvement |
|------|------------------------|
| Transport Security | Add TLS / mTLS to gRPC servers & clients. |
| Auth / RBAC | Intercept CartActor RPCs with auth metadata. |
| Backpressure | Rate-limit remote mutation calls per host. |
| Observability | Add per-mutation Prometheus metrics & tracing spans. |
| Ownership | Add lease timeouts / fencing tokens for stricter guarantees. |
| Batch Ops | Introduce batch mutation RPC or streaming updates (WatchState). |
| Persistence | Reintroduce event log or snapshot persistence layer if durability required. |
---
## Adding a New Node (Operational Checklist)
1. Deploy binary/container with same proto + registry.
2. Expose gRPC port.
3. Ensure discovery lists the new host.
4. Node dials peers, negotiates membership.
5. Remote cart proxies seeded.
6. Traffic routed automatically based on ownership.
---
## Adding a New Mutation (Checklist Recap)
1. Define proto message (+ request wrapper & RPC if remote invocation needed).
2. Regenerate protobuf code.
3. Implement & register handler (`RegisterMutation`).
4. Add client (HTTP/gRPC) endpoint.
5. Write unit + integration tests.
6. (Optional) Add to coverage test list and docs.
---
## High-Level Data Flow Diagram (Text)
```
Client -> HTTP Handler -> SyncedPool -> (local?) -> Registry -> Grain State
                                     \-> (remote?) -> RemoteGrainGRPC -> gRPC -> Remote CartActor -> Registry -> Grain
ControlPlane: Discovery Events <-> Negotiation/Ping/ConfirmOwner <-> SyncedPool state
```
---
## Troubleshooting
| Symptom | Likely Cause | Action |
|---------|--------------|--------|
| New cart every request | Secure cookie over plain HTTP or not sending cookie jar | Disable Secure locally or use HTTPS & proper curl `-b` |
| Unsupported mutation error | Missing registry handler | Add `RegisterMutation` for that proto |
| Ownership flapping | Quorum failing due to intermittent peers | Investigate `ConfirmOwner` errors / network |
| Remote mutation latency | Network / serialization overhead | Consider batching or colocating hot carts |
| Checkout returns 500 | Klarna call failed | Inspect logs; no grain state mutated |
---