Complete refactor to new grpc control plane and only http proxy for carts (#4)
All checks were successful
Build and Publish / Metadata (push) Successful in 11s
Build and Publish / BuildAndDeployAmd64 (push) Successful in 1m14s
Build and Publish / BuildAndDeployArm64 (push) Successful in 3m54s

Co-authored-by: matst80 <mats.tornberg@gmail.com>
Reviewed-on: https://git.tornberg.me/mats/go-cart-actor/pulls/4
Co-authored-by: Mats Törnberg <mats@tornberg.me>
Co-committed-by: Mats Törnberg <mats@tornberg.me>
This commit was merged in pull request #4.
2025-10-14 22:31:12 +02:00
committed by mats
parent f735540c3d
commit f5014fe906
88 changed files with 9836 additions and 5646 deletions

374
README.md

@@ -1,12 +1,43 @@
# Go Cart Actor
## Migration Notes (Ring-based Ownership Transition)
This release removes the legacy ConfirmOwner ownership negotiation RPC in favor of deterministic ownership via the consistent hashing ring.
Summary of changes:
- ConfirmOwner RPC removed from the ControlPlane service.
- OwnerChangeRequest message removed (was only used by ConfirmOwner).
- OwnerChangeAck retained solely as the response type for the Closing RPC.
- SyncedPool now relies exclusively on the ring for ownership (no quorum negotiation).
- Remote proxy creation includes a bounded readiness retry to reduce first-call failures.
- New Prometheus ring metrics:
  - cart_ring_epoch
  - cart_ring_hosts
  - cart_ring_vnodes
  - cart_ring_host_share{host}
  - cart_ring_lookup_local_total
  - cart_ring_lookup_remote_total
Action required for consumers:
1. Regenerate protobuf code after pulling (requires protoc-gen-go and protoc-gen-go-grpc installed).
2. Remove any client code or automation invoking ConfirmOwner (calls will now return UNIMPLEMENTED if using stale generated stubs).
3. Update monitoring/alerts that referenced ConfirmOwner or ownership quorum failures—use ring metrics instead.
4. If you previously interpreted “ownership flapping” via ConfirmOwner logs, now check for:
   - Rapid changes in ring epoch (cart_ring_epoch)
   - Host churn (cart_ring_hosts)
   - Imbalance in vnode distribution (cart_ring_host_share)
No data migration is necessary; cart IDs and grain state are unaffected.
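As a rough illustration of the imbalance check in step 4, the sketch below turns per-host vnode counts into shares and flags hosts that deviate from an even split. It assumes `cart_ring_host_share` reports each host's fraction of vnodes; the function names, host names, counts, and tolerance are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// hostShares converts per-host vnode counts into each host's fraction of the ring.
func hostShares(vnodes map[string]int) map[string]float64 {
	total := 0
	for _, n := range vnodes {
		total += n
	}
	shares := make(map[string]float64, len(vnodes))
	if total == 0 {
		return shares
	}
	for host, n := range vnodes {
		shares[host] = float64(n) / float64(total)
	}
	return shares
}

// skewedHosts returns, sorted by name, the hosts whose share differs from the
// even share (1 / number of hosts) by more than tolerance.
func skewedHosts(shares map[string]float64, tolerance float64) []string {
	if len(shares) == 0 {
		return nil
	}
	even := 1 / float64(len(shares))
	var out []string
	for host, s := range shares {
		if math.Abs(s-even) > tolerance {
			out = append(out, host)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical vnode counts; node-c holds far fewer vnodes than its peers.
	shares := hostShares(map[string]int{"node-a": 64, "node-b": 64, "node-c": 16})
	fmt.Println(skewedHosts(shares, 0.15)) // prints [node-c]
}
```

In practice the same comparison would more likely live in an alert rule over the exported metric than in application code.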
---
A distributed cart management system using the actor model pattern.
## Prerequisites
- Go 1.24.2+
- Protocol Buffers compiler (`protoc`)
- protoc-gen-go plugin
- protoc-gen-go and protoc-gen-go-grpc plugins
### Installing Protocol Buffers
@@ -32,17 +63,20 @@ sudo apt install protobuf-compiler
```bash
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
```
## Working with Protocol Buffers
### Generating Go code from proto files
After modifying `proto/messages.proto`, regenerate the Go code:
After modifying any proto (`proto/messages.proto`, `proto/cart_actor.proto`, `proto/control_plane.proto`), regenerate the Go code (all three share the unified `messages` package):
```bash
cd proto
protoc --go_out=. --go_opt=paths=source_relative messages.proto
protoc --go_out=. --go_opt=paths=source_relative \
--go-grpc_out=. --go-grpc_opt=paths=source_relative \
messages.proto cart_actor.proto control_plane.proto
```
### Protocol Buffer Messages
@@ -73,8 +107,338 @@ go build .
go test ./...
```
## HTTP API Quick Start (curl Examples)
Assuming the service is reachable at http://localhost:8080 and the cart API is mounted at /cart.
Most endpoints use an HTTP cookie named `cartid` to track the cart. The first request will set it.
### 1. Get (or create) a cart
```bash
curl -i http://localhost:8080/cart/
```
Response sets a `cartid` cookie and returns the current (possibly empty) cart JSON.
### 2. Add an item by SKU (implicit quantity = 1)
```bash
curl -i --cookie-jar cookies.txt http://localhost:8080/cart/add/TEST-SKU-123
```
Stores cookie in `cookies.txt` for subsequent calls.
### 3. Add an item with explicit payload (country, quantity)
```bash
curl -i --cookie cookies.txt \
-H "Content-Type: application/json" \
-d '{"sku":"TEST-SKU-456","quantity":2,"country":"se"}' \
http://localhost:8080/cart/
```
### 4. Change quantity of an existing line
(First list the cart to find `id` of the line; here we use id=1 as an example)
```bash
curl -i --cookie cookies.txt \
-X PUT -H "Content-Type: application/json" \
-d '{"id":1,"quantity":3}' \
http://localhost:8080/cart/
```
### 5. Remove an item
```bash
curl -i --cookie cookies.txt -X DELETE http://localhost:8080/cart/1
```
### 6. Set entire cart contents (overwrites items)
```bash
curl -i --cookie cookies.txt \
-X POST -H "Content-Type: application/json" \
-d '{"items":[{"sku":"TEST-SKU-AAA","quantity":1,"country":"se"},{"sku":"TEST-SKU-BBB","quantity":2,"country":"se"}]}' \
http://localhost:8080/cart/set
```
### 7. Add a delivery (provider + optional items)
If `items` is empty or omitted, all items without a delivery get this one.
```bash
curl -i --cookie cookies.txt \
-X POST -H "Content-Type: application/json" \
-d '{"provider":"standard","items":[1,2]}' \
http://localhost:8080/cart/delivery
```
### 8. Remove a delivery by deliveryId
```bash
curl -i --cookie cookies.txt -X DELETE http://localhost:8080/cart/delivery/1
```
### 9. Set a pickup point for a delivery
```bash
curl -i --cookie cookies.txt \
-X PUT -H "Content-Type: application/json" \
-d '{"id":"PUP123","name":"Locker 5","address":"Main St 1","city":"Stockholm","zip":"11122","country":"SE"}' \
http://localhost:8080/cart/delivery/1/pickupPoint
```
### 10. Checkout (returns HTML snippet from Klarna)
```bash
curl -i --cookie cookies.txt http://localhost:8080/cart/checkout
```
### 11. Using a known cart id directly (bypassing cookie)
If you already have a cart id (e.g. 1720000000000000):
```bash
CART_ID=1720000000000000
curl -i http://localhost:8080/cart/byid/$CART_ID
curl -i -X POST -H "Content-Type: application/json" \
-d '{"sku":"TEST-SKU-XYZ","quantity":1,"country":"se"}' \
http://localhost:8080/cart/byid/$CART_ID
```
### 12. Clear cart cookie (forces a new cart on next request)
```bash
curl -i --cookie cookies.txt -X DELETE http://localhost:8080/cart/
```
Tip: Use `--cookie-jar` and `--cookie` to persist the session across multiple commands:
```bash
curl --cookie-jar cookies.txt http://localhost:8080/cart/
curl --cookie cookies.txt http://localhost:8080/cart/add/TEST-SKU-123
```
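For callers written in Go, the cookie handling shown with curl maps onto `net/http/cookiejar`. The sketch below is an assumption-laden illustration: it talks to an in-process stub server (not the real service) that only hands out a `cartid` cookie, so the helper names and the response body format are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
	"sync/atomic"
)

// newStubCartServer is a stand-in for the real service: it assigns a cartid
// cookie when none is sent and echoes the cart id in the body.
func newStubCartServer() *httptest.Server {
	var next atomic.Int64
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		c, err := r.Cookie("cartid")
		if err != nil {
			c = &http.Cookie{Name: "cartid", Value: fmt.Sprintf("cart-%d", next.Add(1)), Path: "/"}
			http.SetCookie(w, c)
		}
		fmt.Fprintf(w, "cart=%s", c.Value)
	}))
}

// cartIDs performs two requests and returns the cart reported by each.
// With useJar the client keeps cookies between calls, like curl's
// --cookie-jar / --cookie pair.
func cartIDs(baseURL string, useJar bool) (first, second string, err error) {
	client := &http.Client{}
	if useJar {
		jar, err := cookiejar.New(nil)
		if err != nil {
			return "", "", err
		}
		client.Jar = jar
	}
	get := func(path string) (string, error) {
		resp, err := client.Get(baseURL + path)
		if err != nil {
			return "", err
		}
		defer resp.Body.Close()
		body, err := io.ReadAll(resp.Body)
		return string(body), err
	}
	if first, err = get("/cart/"); err != nil {
		return "", "", err
	}
	second, err = get("/cart/add/TEST-SKU-123")
	return first, second, err
}

func main() {
	srv := newStubCartServer()
	defer srv.Close()
	a, b, err := cartIDs(srv.URL, true)
	fmt.Println(a, b, err) // same cart on both requests
	c, d, err := cartIDs(srv.URL, false)
	fmt.Println(c, d, err) // a new cart per request
}
```

Against the real service, `srv.URL` would be replaced by `http://localhost:8080` and the stub server dropped.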
## Important Notes
- Always regenerate protobuf Go code after modifying `.proto` files
- Always regenerate protobuf Go code after modifying any `.proto` files (messages/cart_actor/control_plane)
- The generated `messages.pb.go` file should not be edited manually
- Make sure your PATH includes the protoc-gen-go binary location (usually `$GOPATH/bin`)
---
## Architecture Overview
The system is a distributed, sharded (by cart id) actor model implementation:
- Each cart is a grain (an in-memory struct `*CartGrain`) that owns and mutates its own state.
- A **local grain pool** holds grains owned by the node.
- A **synced (cluster) pool** (`SyncedPool`) coordinates multiple nodes and exposes local or remote grains through a uniform interface (`GrainPool`).
- All inter-node communication is gRPC:
  - Cart mutation & state RPCs (CartActor service).
  - Control plane RPCs (ControlPlane service) for membership, liveness, and graceful shutdown; ownership is derived from the ring rather than negotiated.
### Key Processes
1. Client HTTP request (or gRPC client) arrives with a cart identifier (cookie or path).
2. The pool resolves ownership:
   - If local grain exists → use it.
   - If a remote host is known owner → a remote grain proxy (`RemoteGrainGRPC`) is used; it performs gRPC calls to the owning node.
   - If ownership is unknown → the owner is derived from the consistent hashing ring; if that is this node, it spawns a local grain.
3. Mutation is executed via the **mutation registry** (registry wraps domain logic + optional totals recomputation).
4. Updated state is returned to the caller; the owning node keeps the cart (relinquishing ownership to shed load is not yet implemented).
---
## Grain & Mutation Model
- `CartGrain` holds items, deliveries, pricing aggregates, and checkout/order metadata.
- All mutations are registered via `RegisterMutation[T]` with signature:
```
func(*CartGrain, *T) error
```
- `WithTotals()` flag triggers automatic recalculation of totals after successful handlers.
- The old giant `switch` in `CartGrain.Apply` has been replaced by registry dispatch; unregistered mutations fail fast.
- Adding a mutation:
1. Define proto message.
2. Generate code.
3. Register handler (optionally WithTotals).
4. Add gRPC RPC + request wrapper if the mutation must be remotely invokable.
5. (Optional) Add HTTP endpoint mapping to the mutation.
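To make the registry idea concrete, here is a minimal sketch of type-keyed dispatch with the `func(*CartGrain, *T) error` handler shape. Everything beyond that signature and the `RegisterMutation` / `WithTotals` names is an assumption: the option type, the `Apply` helper, the reduced `CartGrain`, and the `AddItem` fields are stand-ins, not the project's actual definitions.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// CartGrain and AddItem are simplified stand-ins for the real grain and proto message.
type CartGrain struct {
	Items         map[string]int // sku -> quantity
	TotalQuantity int
}

type AddItem struct {
	Sku      string
	Quantity int
}

type handler struct {
	apply  func(*CartGrain, any) error
	totals bool
}

// Option configures a registration; WithTotals requests a totals pass after success.
type Option func(*handler)

func WithTotals() Option { return func(h *handler) { h.totals = true } }

var registry = map[reflect.Type]handler{}

// RegisterMutation stores a typed handler under the pointer type *T.
func RegisterMutation[T any](fn func(*CartGrain, *T) error, opts ...Option) {
	h := handler{apply: func(g *CartGrain, m any) error { return fn(g, m.(*T)) }}
	for _, opt := range opts {
		opt(&h)
	}
	registry[reflect.TypeOf((*T)(nil))] = h
}

// Apply dispatches on the mutation's dynamic type and fails fast when unregistered.
func Apply(g *CartGrain, m any) error {
	h, ok := registry[reflect.TypeOf(m)]
	if !ok {
		return fmt.Errorf("unsupported mutation %T", m)
	}
	if err := h.apply(g, m); err != nil {
		return err
	}
	if h.totals {
		g.TotalQuantity = 0
		for _, q := range g.Items {
			g.TotalQuantity += q
		}
	}
	return nil
}

func registerDefaults() {
	RegisterMutation(func(g *CartGrain, m *AddItem) error {
		if m.Quantity < 1 {
			return errors.New("quantity must be at least 1")
		}
		g.Items[m.Sku] += m.Quantity
		return nil
	}, WithTotals())
}

func main() {
	registerDefaults()
	g := &CartGrain{Items: map[string]int{}}
	fmt.Println(Apply(g, &AddItem{Sku: "TEST-SKU-123", Quantity: 2}), g.TotalQuantity)
	fmt.Println(Apply(g, &struct{}{})) // unregistered type: rejected
}
```

Keying on `reflect.Type` keeps the handlers typed while letting `Apply` accept any mutation; a rejected or unregistered mutation leaves the totals untouched.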
---
## Local Grain Pool
- Manages an in-memory map `map[CartId]*CartGrain`.
- Lazy spawn: first mutation or explicit access triggers `spawn(id)`.
- TTL / purge loop periodically removes expired grains unless they changed recently (basic memory pressure management).
- Capacity limit (`PoolSize`); oldest expired grain evicted first when full.
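The pool behaviour described above (lazy spawn, TTL purge, evict the oldest expired grain when full) could look roughly like the following. This is a sketch under assumptions: the struct fields, `ErrPoolFull`, the `Get`/`Purge` signatures, and the explicit `now` parameter are illustrative choices, and the real pool's locking and timing are not shown.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type CartId uint64

// CartGrain is reduced to the fields the pool needs in this sketch.
type CartGrain struct {
	ID         CartId
	LastChange time.Time
}

var ErrPoolFull = errors.New("pool full and no expired grain to evict")

// LocalPool is not safe for concurrent use; a real pool would need locking.
type LocalPool struct {
	grains   map[CartId]*CartGrain
	poolSize int
	ttl      time.Duration
}

func NewLocalPool(poolSize int, ttl time.Duration) *LocalPool {
	return &LocalPool{grains: make(map[CartId]*CartGrain), poolSize: poolSize, ttl: ttl}
}

// Get returns the grain for id, spawning it lazily. When the pool is full,
// the oldest expired grain is evicted to make room.
func (p *LocalPool) Get(id CartId, now time.Time) (*CartGrain, error) {
	if g, ok := p.grains[id]; ok {
		return g, nil
	}
	if len(p.grains) >= p.poolSize && !p.evictOldestExpired(now) {
		return nil, ErrPoolFull
	}
	g := &CartGrain{ID: id, LastChange: now}
	p.grains[id] = g
	return g, nil
}

func (p *LocalPool) evictOldestExpired(now time.Time) bool {
	var victim *CartGrain
	for _, g := range p.grains {
		if now.Sub(g.LastChange) < p.ttl {
			continue // changed recently, keep it
		}
		if victim == nil || g.LastChange.Before(victim.LastChange) {
			victim = g
		}
	}
	if victim == nil {
		return false
	}
	delete(p.grains, victim.ID)
	return true
}

// Purge removes every expired grain and reports how many were dropped.
func (p *LocalPool) Purge(now time.Time) int {
	n := 0
	for id, g := range p.grains {
		if now.Sub(g.LastChange) >= p.ttl {
			delete(p.grains, id)
			n++
		}
	}
	return n
}

// at returns a fixed instant plus sec seconds, to keep the demo deterministic.
func at(sec int) time.Time { return time.Unix(0, 0).Add(time.Duration(sec) * time.Second) }

const demoTTL = time.Minute

func main() {
	p := NewLocalPool(2, demoTTL)
	p.Get(1, at(0))
	p.Get(2, at(30))
	_, err := p.Get(3, at(40)) // full, nothing expired yet
	fmt.Println(err)
	_, err = p.Get(3, at(70)) // cart 1 is now expired and gets evicted
	fmt.Println(err, len(p.grains))
}
```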
---
## Synced (Cluster) Pool
`SyncedPool` wraps a local pool and tracks:
- `remoteHosts`: known peer nodes (gRPC connections).
- `remoteIndex`: mapping of cart id → remote grain proxy (`RemoteGrainGRPC`) for carts owned elsewhere.
Responsibilities:
1. Discovery integration (via a `Discovery` interface) adds/removes hosts.
2. Periodic ping health checks (ControlPlane.Ping).
3. Ring-based deterministic ownership:
   - Ownership is derived directly from the consistent hashing ring (no quorum RPC or `ConfirmOwner`).
4. Remote spawning:
   - When a remote host reports its cart ids (`GetCartIds`), the pool creates remote proxies for fast routing.
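For readers unfamiliar with ring-based ownership, the sketch below shows one common way a consistent hashing ring with virtual nodes resolves an owner. It is not taken from the project: the `Ring` type, the FNV-1a hash, the `host#i` vnode naming, and the vnode count are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring is a hypothetical consistent hashing ring; the project's real type may differ.
type Ring struct {
	points []uint64          // sorted vnode positions
	owner  map[uint64]string // vnode position -> host
}

func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// NewRing places vnodes virtual nodes per host on the ring. Hosts are sorted
// first so every node builds the same ring regardless of discovery order.
func NewRing(hosts []string, vnodes int) *Ring {
	sorted := append([]string(nil), hosts...)
	sort.Strings(sorted)
	r := &Ring{owner: make(map[uint64]string)}
	for _, host := range sorted {
		for i := 0; i < vnodes; i++ {
			p := hash64(fmt.Sprintf("%s#%d", host, i))
			if _, taken := r.owner[p]; taken {
				continue // hash collision: the first host keeps the position
			}
			r.owner[p] = host
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Owner returns the host of the first vnode at or after the cart's hash,
// wrapping around to the start of the ring.
func (r *Ring) Owner(cartID string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hash64(cartID)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	ring := NewRing([]string{"node-a", "node-b", "node-c"}, 64)
	fmt.Println(ring.Owner("1720000000000000"))
}
```

Because every node computes the same answer from the same host list, no RPC is needed to agree on an owner; disagreement can only arise while membership views differ.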
---
## Remote Grain Proxies
A `RemoteGrainGRPC` implements the `Grain` interface but delegates:
- `Apply` → Specific CartActor per-mutation RPC (e.g., `AddItem`, `RemoveItem`) constructed from the mutation type. (Legacy envelope removed.)
- `GetCurrentState` → `CartActor.GetState`.
Return path:
1. gRPC reply (CartMutationReply / StateReply) → proto `CartState`.
2. `ToCartState` / mapping reconstructs a local `CartGrain` snapshot for callers expecting grain semantics.
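The delegation described above can be sketched with a type switch that maps each mutation to its RPC. The types here are stand-ins, not the generated gRPC code: `CartActorClient`, the method signatures, the `RemoteGrain` fields, and the in-process `fakeOwner` are all assumptions made so the example runs without a network.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// All types below are stand-ins; the real ones come from the generated proto code.
type CartState struct{ Items map[string]int }

type AddItem struct {
	Sku      string
	Quantity int
}

type RemoveItem struct{ Sku string }

// CartActorClient mimics a generated gRPC client with one RPC per mutation.
type CartActorClient interface {
	AddItem(ctx context.Context, cartID uint64, m *AddItem) (*CartState, error)
	RemoveItem(ctx context.Context, cartID uint64, m *RemoveItem) (*CartState, error)
	GetState(ctx context.Context, cartID uint64) (*CartState, error)
}

// RemoteGrain forwards every call to the owning node.
type RemoteGrain struct {
	ID      uint64
	Client  CartActorClient
	Timeout time.Duration
}

// Apply picks the RPC from the mutation's concrete type.
func (r *RemoteGrain) Apply(m any) (*CartState, error) {
	ctx, cancel := context.WithTimeout(context.Background(), r.Timeout)
	defer cancel()
	switch v := m.(type) {
	case *AddItem:
		return r.Client.AddItem(ctx, r.ID, v)
	case *RemoveItem:
		return r.Client.RemoveItem(ctx, r.ID, v)
	default:
		return nil, fmt.Errorf("no RPC for mutation %T", m)
	}
}

func (r *RemoteGrain) GetCurrentState() (*CartState, error) {
	ctx, cancel := context.WithTimeout(context.Background(), r.Timeout)
	defer cancel()
	return r.Client.GetState(ctx, r.ID)
}

// fakeOwner plays the remote node in-process so the sketch runs without a network.
type fakeOwner struct{ items map[string]int }

// snapshot copies the state, so callers never hold the authoritative map.
func (f *fakeOwner) snapshot() *CartState {
	out := make(map[string]int, len(f.items))
	for k, v := range f.items {
		out[k] = v
	}
	return &CartState{Items: out}
}

func (f *fakeOwner) AddItem(_ context.Context, _ uint64, m *AddItem) (*CartState, error) {
	f.items[m.Sku] += m.Quantity
	return f.snapshot(), nil
}

func (f *fakeOwner) RemoveItem(_ context.Context, _ uint64, m *RemoveItem) (*CartState, error) {
	delete(f.items, m.Sku)
	return f.snapshot(), nil
}

func (f *fakeOwner) GetState(context.Context, uint64) (*CartState, error) {
	return f.snapshot(), nil
}

func newDemoGrain() *RemoteGrain {
	return &RemoteGrain{ID: 1, Client: &fakeOwner{items: map[string]int{}}, Timeout: time.Second}
}

func main() {
	g := newDemoGrain()
	st, err := g.Apply(&AddItem{Sku: "TEST-SKU-123", Quantity: 2})
	fmt.Println(st.Items, err)
	_, err = g.Apply(&struct{}{})
	fmt.Println(err)
}
```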
---
## Control Plane (Inter-Node Coordination)
Defined in `proto/control_plane.proto`:
| RPC | Purpose |
|-----|---------|
| `Ping` | Liveness; increments missed ping counter if failing. |
| `Negotiate` | Merges membership views; used after discovery events. |
| `GetCartIds` | Enumerate locally owned carts for remote index seeding. |
| `Closing` | Graceful shutdown notice; peers remove host & associated remote grains. |
### Ownership / Quorum Rules (legacy, superseded by ring-based ownership)
- If total participating hosts < 3 → all must accept.
- Otherwise majority acceptance (`ok >= total/2`).
- On failure → local tentative grain is removed (rollback to avoid split-brain).
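Written out as a predicate, the acceptance rule listed above behaves as follows. The function name is hypothetical and the sketch assumes integer division, as in Go; under that assumption `ok >= total/2` admits 2 of 5, which is not a strict majority (that would be `ok > total/2`).

```go
package main

import "fmt"

// accepted applies the rule as written: unanimity below three hosts,
// otherwise ok >= total/2 using integer division.
func accepted(total, ok int) bool {
	if total < 3 {
		return ok == total
	}
	return ok >= total/2
}

func main() {
	for _, c := range [][2]int{{2, 1}, {2, 2}, {4, 2}, {5, 2}, {5, 3}} {
		fmt.Printf("total=%d ok=%d accepted=%v\n", c[0], c[1], accepted(c[0], c[1]))
	}
}
```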
---
## Request / Mutation Flow Examples
### Local Mutation
1. HTTP handler parses request → determines cart id.
2. `SyncedPool.Apply`:
   - Finds local grain (or spawns a new one when the ring assigns the cart to this node).
   - Executes registry mutation.
3. Totals updated if flagged.
4. HTTP response returns updated JSON (via `ToCartState`).
### Remote Mutation
1. `SyncedPool.Apply` sees cart mapped to a remote host.
2. Routes to `RemoteGrainGRPC.Apply`.
3. Remote node executes mutation locally and returns updated state over gRPC.
4. Proxy materializes snapshot locally (not authoritative, read-only view).
### Checkout (Side-Effecting, Non-Pure)
- HTTP `/checkout` uses current grain snapshot to build payload (pure function).
- Calls Klarna externally (not a mutation).
- Applies `InitializeCheckout` mutation to persist reference + status.
- Returns Klarna order JSON to client.
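The ordering above (build payload, call out, mutate only on success) can be sketched as below. All names are hypothetical stand-ins: `OrderCreator` abstracts the external Klarna call, the `Cart` fields and the `"checkout_initialized"` status are invented for the example, and the direct field assignment stands in for applying `InitializeCheckout`.

```go
package main

import (
	"errors"
	"fmt"
)

// Cart is a reduced stand-in for the grain snapshot.
type Cart struct {
	Items    map[string]int
	OrderRef string
	Status   string
}

type Payload struct{ Lines int }

// OrderCreator abstracts the external call; name and signature are hypothetical.
type OrderCreator interface {
	CreateOrder(p Payload) (orderRef string, err error)
}

// buildPayload only reads the cart, so it has no side effects.
func buildPayload(c *Cart) Payload { return Payload{Lines: len(c.Items)} }

// Checkout calls the external service first and touches the cart only on success.
func Checkout(c *Cart, k OrderCreator) (string, error) {
	ref, err := k.CreateOrder(buildPayload(c))
	if err != nil {
		return "", fmt.Errorf("checkout: %w", err) // cart left unchanged
	}
	// Stands in for applying the InitializeCheckout mutation.
	c.OrderRef, c.Status = ref, "checkout_initialized"
	return ref, nil
}

// fakeKlarna simulates the external service for the demo.
type fakeKlarna struct{ fail bool }

func (f fakeKlarna) CreateOrder(Payload) (string, error) {
	if f.fail {
		return "", errors.New("klarna unavailable")
	}
	return "order-1", nil
}

func main() {
	c := &Cart{Items: map[string]int{"TEST-SKU-123": 1}}
	_, err := Checkout(c, fakeKlarna{fail: true})
	fmt.Println(err, c.OrderRef == "")
	ref, err := Checkout(c, fakeKlarna{})
	fmt.Println(ref, err, c.Status)
}
```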
---
## Scaling & Deployment
- **Horizontal scaling**: Add more nodes; discovery layer (Kubernetes / service registry) feeds hosts to `SyncedPool`.
- **Sharding**: Implicit by cart id hash. Ownership is determined by the consistent hashing ring (no quorum negotiation).
- **Hot spots**: A single popular cart remains on one node; for heavy multi-client concurrency, future work could add read replicas or partitioning (not implemented).
- **Capacity tuning**: Increase `PoolSize` & memory limits; adjust TTL for stale cart eviction.
### Adding Nodes
1. Node starts gRPC server (CartActor + ControlPlane).
2. After brief delay, begins discovery watch; on event:
- New host → dial + negotiate → seed remote cart ids.
3. Pings maintain health; failed hosts removed (proxies invalidated).
---
## Failure Handling
| Scenario | Behavior |
|----------|----------|
| Remote host unreachable | Pings increment `MissedPings`; after threshold host removed. |
| Ownership negotiation fails | Tentative local grain discarded. |
| gRPC call error on remote mutation | Error bubbled to caller; no local fallback. |
| Missing mutation registration | Fast failure with explicit error message. |
| Partial checkout (Klarna fails) | No local state mutation for checkout; client sees error; cart remains unchanged. |
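The first row of the table (missed pings leading to host removal) might be implemented along these lines. This is a sketch, not the project's code: the `Host` struct, `checkHosts`, the threshold value, and resetting the counter after a successful ping are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

type Host struct {
	Name        string
	MissedPings int
}

var ErrUnreachable = errors.New("host unreachable")

// checkHosts pings every host once. A failed ping increments MissedPings;
// a host that reaches threshold is moved to removed. Resetting the counter
// on success is an assumption of this sketch.
func checkHosts(hosts []*Host, ping func(name string) error, threshold int) (kept, removed []*Host) {
	for _, h := range hosts {
		if err := ping(h.Name); err != nil {
			h.MissedPings++
		} else {
			h.MissedPings = 0
		}
		if h.MissedPings >= threshold {
			removed = append(removed, h)
		} else {
			kept = append(kept, h)
		}
	}
	return kept, removed
}

func main() {
	hosts := []*Host{{Name: "node-a"}, {Name: "node-b"}}
	ping := func(name string) error {
		if name == "node-b" {
			return ErrUnreachable
		}
		return nil
	}
	for round := 1; round <= 3; round++ {
		var removed []*Host
		hosts, removed = checkHosts(hosts, ping, 3)
		fmt.Println("round", round, "kept", len(hosts), "removed", len(removed))
	}
}
```

On removal the caller would also drop the remote proxies that point at the host, as the Adding Nodes steps note.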
---
## Mutation Registry Summary
- Central, type-safe registry prevents silent omission.
- Each handler:
  - Validates input.
  - Mutates `*CartGrain`.
  - Returns error for rejection.
- Automatic totals recomputation reduces boilerplate and consistency risk.
- Coverage test (add separately) can enforce all proto mutations are registered.
---
## gRPC Interfaces
- **CartActor**: Per-mutation unary RPCs + `GetState`. (Checkout logic intentionally excluded; handled at HTTP layer.)
- **ControlPlane**: Cluster coordination (Ping, Negotiate, GetCartIds, Closing) — ownership now ring-determined (no ConfirmOwner).
**Ports** (default / implied):
- CartActor & ControlPlane share the same gRPC server/listener (single port, e.g. `:1337`).
- Legacy frame/TCP code has been removed.
---
## Security & Future Enhancements
| Area | Potential Improvement |
|------|------------------------|
| Transport Security | Add TLS / mTLS to gRPC servers & clients. |
| Auth / RBAC | Intercept CartActor RPCs with auth metadata. |
| Backpressure | Rate-limit remote mutation calls per host. |
| Observability | Add per-mutation Prometheus metrics & tracing spans. |
| Ownership | Add lease timeouts / fencing tokens for stricter guarantees. |
| Batch Ops | Introduce batch mutation RPC or streaming updates (WatchState). |
| Persistence | Reintroduce event log or snapshot persistence layer if durability required. |
---
## Adding a New Node (Operational Checklist)
1. Deploy binary/container with same proto + registry.
2. Expose gRPC port.
3. Ensure discovery lists the new host.
4. Node dials peers, negotiates membership.
5. Remote cart proxies seeded.
6. Traffic routed automatically based on ownership.
---
## Adding a New Mutation (Checklist Recap)
1. Define proto message (+ request wrapper & RPC if remote invocation needed).
2. Regenerate protobuf code.
3. Implement & register handler (`RegisterMutation`).
4. Add client (HTTP/gRPC) endpoint.
5. Write unit + integration tests.
6. (Optional) Add to coverage test list and docs.
---
## High-Level Data Flow Diagram (Text)
```
Client -> HTTP Handler -> SyncedPool -> (local?) -> Registry -> Grain State
                                     \-> (remote?) -> RemoteGrainGRPC -> gRPC -> Remote CartActor -> Registry -> Grain
ControlPlane: Discovery Events <-> Negotiation/Ping <-> SyncedPool state (ring determines ownership)
```
---
## Troubleshooting
| Symptom | Likely Cause | Action |
|---------|--------------|--------|
| New cart every request | Secure cookie over plain HTTP or not sending cookie jar | Disable Secure locally or use HTTPS & proper curl `-b` |
| Unsupported mutation error | Missing registry handler | Add `RegisterMutation` for that proto |
| Ownership imbalance | Ring host distribution skew or rapid host churn | Examine `cart_ring_host_share`, `cart_ring_hosts`, and logs for host add/remove; rebalance or investigate instability |
| Remote mutation latency | Network / serialization overhead | Consider batching or colocating hot carts |
| Checkout returns 500 | Klarna call failed | Inspect logs; no grain state mutated |
---