even more refactoring
238
README.md
@@ -175,4 +175,240 @@ curl --cookie cookies.txt http://localhost:8080/cart/add/TEST-SKU-123

- Always regenerate protobuf Go code after modifying any `.proto` files (messages/cart_actor/control_plane)
- The generated `messages.pb.go` file should not be edited manually
- Make sure your PATH includes the protoc-gen-go binary location (usually `$GOPATH/bin`)

---

## Architecture Overview

The system is a distributed, sharded (by cart id) actor model implementation:

- Each cart is a grain (an in-memory struct `*CartGrain`) that owns and mutates its own state.
- A **local grain pool** holds grains owned by the node.
- A **synced (cluster) pool** (`SyncedPool`) coordinates multiple nodes and exposes local or remote grains through a uniform interface (`GrainPool`).
- All inter-node communication is gRPC:
  - Cart mutation & state RPCs (CartActor service).
  - Control plane RPCs (ControlPlane service) for membership, ownership negotiation, liveness, and graceful shutdown.

### Key Processes

1. A client HTTP request (or gRPC call) arrives with a cart identifier (cookie or path).
2. The pool resolves ownership:
   - If a local grain exists → use it.
   - If a remote host is the known owner → a remote grain proxy (`RemoteGrainGRPC`) is used; it performs gRPC calls to the owning node.
   - If ownership is unknown → the node attempts to claim ownership (quorum negotiation) and spawns a local grain.
3. The mutation is executed via the **mutation registry** (the registry wraps domain logic + optional totals recomputation).
4. The updated state is returned to the caller. The node keeps ownership; relinquishing ownership later to shed load is not yet implemented.
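
The ownership resolution in step 2 can be sketched roughly as follows. This is a simplified illustration, not the actual `SyncedPool` code: `pool`, `resolve`, `localGrain`, `remoteProxy`, and the string cart ids are hypothetical stand-ins, and the quorum negotiation is reduced to a `claim` callback.

```go
package main

import (
	"errors"
	"fmt"
)

// Grain is a hypothetical, minimal stand-in for the grain interface.
type Grain interface {
	Describe() string
}

type localGrain struct{ id string }

func (g *localGrain) Describe() string { return "local:" + g.id }

// remoteProxy stands in for RemoteGrainGRPC; a real proxy would hold a gRPC client.
type remoteProxy struct{ id, host string }

func (p *remoteProxy) Describe() string { return "remote:" + p.id + "@" + p.host }

// pool is a simplified stand-in for SyncedPool.
type pool struct {
	local  map[string]*localGrain
	remote map[string]*remoteProxy
	claim  func(id string) bool // models the quorum negotiation (assumed callback)
}

var errClaimRejected = errors.New("ownership claim rejected")

// resolve follows the three cases: local grain, known remote owner, unknown owner.
func (p *pool) resolve(id string) (Grain, error) {
	if g, ok := p.local[id]; ok {
		return g, nil
	}
	if r, ok := p.remote[id]; ok {
		return r, nil
	}
	if !p.claim(id) {
		return nil, errClaimRejected
	}
	g := &localGrain{id: id}
	p.local[id] = g
	return g, nil
}

func main() {
	p := &pool{
		local:  map[string]*localGrain{},
		remote: map[string]*remoteProxy{"cart-b": {id: "cart-b", host: "node-2"}},
		claim:  func(string) bool { return true },
	}
	for _, id := range []string{"cart-a", "cart-b", "cart-a"} {
		g, err := p.resolve(id)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(id, "->", g.Describe())
	}
}
```

Checking the local map before the remote index keeps the common case (a cart already owned by this node) free of any network work.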

---

## Grain & Mutation Model

- `CartGrain` holds items, deliveries, pricing aggregates, and checkout/order metadata.
- All mutations are registered via `RegisterMutation[T]` with signature:
  ```
  func(*CartGrain, *T) error
  ```
- `WithTotals()` flag triggers automatic recalculation of totals after successful handlers.
- The old giant `switch` in `CartGrain.Apply` has been replaced by registry dispatch; unregistered mutations fail fast.
- Adding a mutation:
  1. Define proto message.
  2. Generate code.
  3. Register handler (optionally WithTotals).
  4. Add gRPC RPC + request wrapper if the mutation must be remotely invokable.
  5. (Optional) Add HTTP endpoint mapping to the mutation.
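
As an illustration of how such a registry can work, here is a minimal sketch. Only the names `RegisterMutation`, `WithTotals`, and `CartGrain` and the handler signature come from this README; the `entry` type, the `reflect`-keyed map, the `Apply` function, and the reduced `CartGrain` and `AddItem` fields are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// CartGrain is reduced to two fields for illustration.
type CartGrain struct {
	Items map[string]int // sku -> quantity
	Total int            // total quantity; stands in for pricing totals
}

// AddItem stands in for a generated proto message.
type AddItem struct {
	Sku string
	Qty int
}

type entry struct {
	apply      func(g *CartGrain, msg any) error
	withTotals bool
}

type Option func(*entry)

// WithTotals marks a mutation so totals are recomputed after a successful handler.
func WithTotals() Option { return func(e *entry) { e.withTotals = true } }

var registry = map[reflect.Type]entry{}

// RegisterMutation stores a typed handler keyed by the message type *T.
func RegisterMutation[T any](h func(*CartGrain, *T) error, opts ...Option) {
	e := entry{apply: func(g *CartGrain, msg any) error { return h(g, msg.(*T)) }}
	for _, o := range opts {
		o(&e)
	}
	registry[reflect.TypeOf((*T)(nil))] = e
}

// Apply dispatches by dynamic type and fails fast for unregistered mutations.
func Apply(g *CartGrain, msg any) error {
	e, ok := registry[reflect.TypeOf(msg)]
	if !ok {
		return fmt.Errorf("unsupported mutation %T", msg)
	}
	if err := e.apply(g, msg); err != nil {
		return err
	}
	if e.withTotals {
		g.Total = 0
		for _, q := range g.Items {
			g.Total += q
		}
	}
	return nil
}

func main() {
	RegisterMutation(func(g *CartGrain, m *AddItem) error {
		if m.Qty <= 0 {
			return errors.New("quantity must be positive")
		}
		g.Items[m.Sku] += m.Qty
		return nil
	}, WithTotals())

	g := &CartGrain{Items: map[string]int{}}
	err := Apply(g, &AddItem{Sku: "TEST-SKU-123", Qty: 2})
	fmt.Println("add item:", err, "total:", g.Total)
	fmt.Println("unregistered:", Apply(g, struct{}{}))
}
```

Keying the map by the pointer type lets `Apply` look up the handler directly from the incoming message, so a missing registration surfaces as an explicit error instead of a silently skipped `switch` case.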

---

## Local Grain Pool

- Manages an in-memory map `map[CartId]*CartGrain`.
- Lazy spawn: first mutation or explicit access triggers `spawn(id)`.
- TTL / purge loop periodically removes expired grains unless they changed recently (basic memory pressure management).
- Capacity limit (`PoolSize`); oldest expired grain evicted first when full.

---

## Synced (Cluster) Pool

`SyncedPool` wraps a local pool and tracks:

- `remoteHosts`: known peer nodes (gRPC connections).
- `remoteIndex`: mapping of cart id → remote grain proxy (`RemoteGrainGRPC`) for carts owned elsewhere.

Responsibilities:

1. Discovery integration (via a `Discovery` interface) adds/removes hosts.
2. Periodic ping health checks (ControlPlane.Ping).
3. Ownership negotiation:
   - On first contention / unknown owner, node calls `ConfirmOwner` on peers to achieve quorum before making a local grain authoritative.
4. Remote spawning:
   - When a remote host reports its cart ids (`GetCartIds`), the pool creates remote proxies for fast routing.

---

## Remote Grain Proxies

A `RemoteGrainGRPC` implements the `Grain` interface but delegates:

- `Apply` → Specific CartActor per-mutation RPC (e.g., `AddItem`, `RemoveItem`) constructed from the mutation type. (Legacy envelope removed.)
- `GetCurrentState` → `CartActor.GetState`.

Return path:

1. gRPC reply (CartMutationReply / StateReply) → proto `CartState`.
2. `ToCartState` / mapping reconstructs a local `CartGrain` snapshot for callers expecting grain semantics.
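
A type switch is one plausible way to pick the per-mutation RPC from the mutation type. The sketch below avoids a gRPC dependency, so `cartActorClient`, `fakeClient`, `RemoteGrain`, and the reduced `CartState` are hypothetical; real generated client methods would also take a `context.Context` and return the reply types named above.

```go
package main

import "fmt"

type AddItem struct{ Sku string }
type RemoveItem struct{ Sku string }

// CartState stands in for the proto state carried in RPC replies.
type CartState struct{ Skus []string }

// cartActorClient is a hypothetical subset of a generated gRPC client.
type cartActorClient interface {
	AddItem(cartId string, m *AddItem) (*CartState, error)
	RemoveItem(cartId string, m *RemoveItem) (*CartState, error)
}

// RemoteGrain is a simplified stand-in for RemoteGrainGRPC.
type RemoteGrain struct {
	cartId string
	client cartActorClient
}

// Apply selects the RPC from the concrete mutation type.
func (r *RemoteGrain) Apply(mutation any) (*CartState, error) {
	switch m := mutation.(type) {
	case *AddItem:
		return r.client.AddItem(r.cartId, m)
	case *RemoveItem:
		return r.client.RemoveItem(r.cartId, m)
	default:
		return nil, fmt.Errorf("no RPC mapped for mutation %T", mutation)
	}
}

// fakeClient plays the owning node in this example.
type fakeClient struct{ skus []string }

func (f *fakeClient) AddItem(_ string, m *AddItem) (*CartState, error) {
	f.skus = append(f.skus, m.Sku)
	return &CartState{Skus: append([]string(nil), f.skus...)}, nil
}

func (f *fakeClient) RemoveItem(_ string, m *RemoveItem) (*CartState, error) {
	var kept []string
	for _, s := range f.skus {
		if s != m.Sku {
			kept = append(kept, s)
		}
	}
	f.skus = kept
	return &CartState{Skus: append([]string(nil), kept...)}, nil
}

func main() {
	r := &RemoteGrain{cartId: "cart-1", client: &fakeClient{}}

	state, err := r.Apply(&AddItem{Sku: "TEST-SKU-123"})
	fmt.Println(state.Skus, err)

	_, err = r.Apply("not a mutation")
	fmt.Println(err)
}
```

The returned `CartState` is a copy, which matches the idea that the proxy only hands out a non-authoritative snapshot.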

---

## Control Plane (Inter-Node Coordination)

Defined in `proto/control_plane.proto`:

| RPC | Purpose |
|-----|---------|
| `Ping` | Liveness; increments missed ping counter if failing. |
| `Negotiate` | Merges membership views; used after discovery events. |
| `GetCartIds` | Enumerate locally owned carts for remote index seeding. |
| `ConfirmOwner` | Quorum acknowledgment for ownership claim. |
| `Closing` | Graceful shutdown notice; peers remove host & associated remote grains. |

### Ownership / Quorum Rules

- If total participating hosts < 3 → all must accept.
- Otherwise majority acceptance (`ok >= total/2`).
- On failure → local tentative grain is removed (rollback to avoid split-brain).
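
Expressed as code, the two acceptance rules could look like the sketch below. The function name `quorumReached` and the use of plain `int` counts are assumptions; whether the claiming node counts itself in `ok` or `total` is not stated in this README.

```go
package main

import "fmt"

// quorumReached applies the rules literally: ok is the number of accepting
// hosts, total the number of participating hosts.
func quorumReached(ok, total int) bool {
	if total < 3 {
		return ok == total // small clusters: everyone must accept
	}
	return ok >= total/2 // integer division, as written in the rule
}

func main() {
	cases := [][2]int{{2, 2}, {1, 2}, {2, 4}, {1, 4}, {2, 5}}
	for _, c := range cases {
		fmt.Printf("ok=%d total=%d -> %v\n", c[0], c[1], quorumReached(c[0], c[1]))
	}
}
```

Note that with Go integer division `total/2` truncates, so if the counts are integers, 2 acceptances out of 5 hosts satisfy `ok >= total/2`. A strict majority would instead be `ok > total/2`.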

---

## Request / Mutation Flow Examples

### Local Mutation
1. HTTP handler parses request → determines cart id.
2. `SyncedPool.Apply`:
   - Finds local grain (or spawns new after quorum).
   - Executes registry mutation.
3. Totals updated if flagged.
4. HTTP response returns updated JSON (via `ToCartState`).

### Remote Mutation
1. `SyncedPool.Apply` sees cart mapped to a remote host.
2. Routes to `RemoteGrainGRPC.Apply`.
3. Remote node executes mutation locally and returns updated state over gRPC.
4. Proxy materializes snapshot locally (not authoritative, read-only view).

### Checkout (Side-Effecting, Non-Pure)
- HTTP `/checkout` uses current grain snapshot to build payload (pure function).
- Calls Klarna externally (not a mutation).
- Applies `InitializeCheckout` mutation to persist reference + status.
- Returns Klarna order JSON to client.

---

## Scaling & Deployment

- **Horizontal scaling**: Add more nodes; discovery layer (Kubernetes / service registry) feeds hosts to `SyncedPool`.
- **Sharding**: Implicit by cart id hash. Ownership is first-claim with quorum acceptance.
- **Hot spots**: A single popular cart remains on one node; for heavy multi-client concurrency, future work could add read replicas or partitioning (not implemented).
- **Capacity tuning**: Increase `PoolSize` & memory limits; adjust TTL for stale cart eviction.

### Adding Nodes
1. Node starts gRPC server (CartActor + ControlPlane).
2. After brief delay, begins discovery watch; on event:
   - New host → dial + negotiate → seed remote cart ids.
3. Pings maintain health; failed hosts removed (proxies invalidated).

---

## Failure Handling

| Scenario | Behavior |
|----------|----------|
| Remote host unreachable | Pings increment `MissedPings`; after threshold host removed. |
| Ownership negotiation fails | Tentative local grain discarded. |
| gRPC call error on remote mutation | Error bubbled to caller; no local fallback. |
| Missing mutation registration | Fast failure with explicit error message. |
| Partial checkout (Klarna fails) | No local state mutation for checkout; client sees error; cart remains unchanged. |

---

## Mutation Registry Summary

- Central, type-safe registry prevents silent omission.
- Each handler:
  - Validates input.
  - Mutates `*CartGrain`.
  - Returns error for rejection.
- Automatic totals recomputation reduces boilerplate and consistency risk.
- A coverage test (to be added separately) can enforce that all proto mutations are registered.

---

## gRPC Interfaces

- **CartActor**: Per-mutation unary RPCs + `GetState`. (Checkout logic intentionally excluded; handled at HTTP layer.)
- **ControlPlane**: Cluster coordination (Ping, Negotiate, ConfirmOwner, etc.).

**Ports** (default / implied):
- CartActor & ControlPlane share the same gRPC server/listener (single port, e.g. `:1337`).
- Legacy frame/TCP code has been removed.

---

## Security & Future Enhancements

| Area | Potential Improvement |
|------|------------------------|
| Transport Security | Add TLS / mTLS to gRPC servers & clients. |
| Auth / RBAC | Intercept CartActor RPCs with auth metadata. |
| Backpressure | Rate-limit remote mutation calls per host. |
| Observability | Add per-mutation Prometheus metrics & tracing spans. |
| Ownership | Add lease timeouts / fencing tokens for stricter guarantees. |
| Batch Ops | Introduce batch mutation RPC or streaming updates (WatchState). |
| Persistence | Reintroduce event log or snapshot persistence layer if durability required. |

---

## Adding a New Node (Operational Checklist)

1. Deploy binary/container with same proto + registry.
2. Expose gRPC port.
3. Ensure discovery lists the new host.
4. Node dials peers, negotiates membership.
5. Remote cart proxies seeded.
6. Traffic routed automatically based on ownership.

---

## Adding a New Mutation (Checklist Recap)

1. Define proto message (+ request wrapper & RPC if remote invocation needed).
2. Regenerate protobuf code.
3. Implement & register handler (`RegisterMutation`).
4. Add client (HTTP/gRPC) endpoint.
5. Write unit + integration tests.
6. (Optional) Add to coverage test list and docs.

---

## High-Level Data Flow Diagram (Text)

```
Client -> HTTP Handler -> SyncedPool -> (local?) -> Registry -> Grain State
                                    \-> (remote?) -> RemoteGrainGRPC -> gRPC -> Remote CartActor -> Registry -> Grain
ControlPlane: Discovery Events <-> Negotiation/Ping/ConfirmOwner <-> SyncedPool state
```

---

## Troubleshooting

| Symptom | Likely Cause | Action |
|---------|--------------|--------|
| New cart every request | `Secure` cookie sent over plain HTTP, or the cookie jar is not being sent | Disable `Secure` locally, or use HTTPS and pass the jar with curl `-b` |
| Unsupported mutation error | Missing registry handler | Add `RegisterMutation` for that proto |
| Ownership flapping | Quorum failing due to intermittent peers | Investigate `ConfirmOwner` errors / network |
| Remote mutation latency | Network / serialization overhead | Consider batching or colocating hot carts |
| Checkout returns 500 | Klarna call failed | Inspect logs; no grain state mutated |

---