Complete refactor to new grpc control plane and only http proxy for carts (#4)
Co-authored-by: matst80 <mats.tornberg@gmail.com>
Reviewed-on: https://git.tornberg.me/mats/go-cart-actor/pulls/4
Co-authored-by: Mats Törnberg <mats@tornberg.me>
Co-committed-by: Mats Törnberg <mats@tornberg.me>
This commit was merged in pull request #4.
README.md
@@ -1,12 +1,43 @@

# Go Cart Actor

## Migration Notes (Ring-based Ownership Transition)

This release removes the legacy ConfirmOwner ownership negotiation RPC in favor of deterministic ownership via the consistent hashing ring.

Summary of changes:

- ConfirmOwner RPC removed from the ControlPlane service.
- OwnerChangeRequest message removed (was only used by ConfirmOwner).
- OwnerChangeAck retained solely as the response type for the Closing RPC.
- SyncedPool now relies exclusively on the ring for ownership (no quorum negotiation).
- Remote proxy creation includes a bounded readiness retry to reduce first-call failures.
- New Prometheus ring metrics:
  - cart_ring_epoch
  - cart_ring_hosts
  - cart_ring_vnodes
  - cart_ring_host_share{host}
  - cart_ring_lookup_local_total
  - cart_ring_lookup_remote_total

Action required for consumers:

1. Regenerate protobuf code after pulling (requires protoc-gen-go and protoc-gen-go-grpc installed).
2. Remove any client code or automation invoking ConfirmOwner (calls will now return UNIMPLEMENTED if using stale generated stubs).
3. Update monitoring/alerts that referenced ConfirmOwner or ownership quorum failures; use ring metrics instead.
4. If you previously interpreted “ownership flapping” via ConfirmOwner logs, now check for:
   - Rapid changes in ring epoch (cart_ring_epoch)
   - Host churn (cart_ring_hosts)
   - Imbalance in vnode distribution (cart_ring_host_share)

No data migration is necessary; cart IDs and grain state are unaffected.

---

A distributed cart management system using the actor model pattern.

## Prerequisites

- Go 1.24.2+
- Protocol Buffers compiler (`protoc`)
- protoc-gen-go and protoc-gen-go-grpc plugins

### Installing Protocol Buffers

@@ -32,17 +63,20 @@ sudo apt install protobuf-compiler

```bash
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
```

## Working with Protocol Buffers

### Generating Go code from proto files

After modifying any proto file (`proto/messages.proto`, `proto/cart_actor.proto`, `proto/control_plane.proto`), regenerate the Go code (all three share the unified `messages` package):

```bash
cd proto
protoc --go_out=. --go_opt=paths=source_relative \
  --go-grpc_out=. --go-grpc_opt=paths=source_relative \
  messages.proto cart_actor.proto control_plane.proto
```

### Protocol Buffer Messages

@@ -73,8 +107,338 @@ go build .

```bash
go test ./...
```

## HTTP API Quick Start (curl Examples)

The examples assume the service is reachable at http://localhost:8080 and the cart API is mounted at /cart.
Most endpoints use an HTTP cookie named `cartid` to track the cart. The first request will set it.

### 1. Get (or create) a cart

```bash
curl -i http://localhost:8080/cart/
```

Response sets a `cartid` cookie and returns the current (possibly empty) cart JSON.

### 2. Add an item by SKU (implicit quantity = 1)

```bash
curl -i --cookie-jar cookies.txt http://localhost:8080/cart/add/TEST-SKU-123
```

Stores cookie in `cookies.txt` for subsequent calls.

### 3. Add an item with explicit payload (country, quantity)

```bash
curl -i --cookie cookies.txt \
  -H "Content-Type: application/json" \
  -d '{"sku":"TEST-SKU-456","quantity":2,"country":"se"}' \
  http://localhost:8080/cart/
```

### 4. Change quantity of an existing line

(First list the cart to find the `id` of the line; here we use id=1 as an example.)

```bash
curl -i --cookie cookies.txt \
  -X PUT -H "Content-Type: application/json" \
  -d '{"id":1,"quantity":3}' \
  http://localhost:8080/cart/
```

### 5. Remove an item

```bash
curl -i --cookie cookies.txt -X DELETE http://localhost:8080/cart/1
```

### 6. Set entire cart contents (overwrites items)

```bash
curl -i --cookie cookies.txt \
  -X POST -H "Content-Type: application/json" \
  -d '{"items":[{"sku":"TEST-SKU-AAA","quantity":1,"country":"se"},{"sku":"TEST-SKU-BBB","quantity":2,"country":"se"}]}' \
  http://localhost:8080/cart/set
```

### 7. Add a delivery (provider + optional items)

If `items` is empty or omitted, all items without a delivery get this one.

```bash
curl -i --cookie cookies.txt \
  -X POST -H "Content-Type: application/json" \
  -d '{"provider":"standard","items":[1,2]}' \
  http://localhost:8080/cart/delivery
```

### 8. Remove a delivery by deliveryId

```bash
curl -i --cookie cookies.txt -X DELETE http://localhost:8080/cart/delivery/1
```

### 9. Set a pickup point for a delivery

```bash
curl -i --cookie cookies.txt \
  -X PUT -H "Content-Type: application/json" \
  -d '{"id":"PUP123","name":"Locker 5","address":"Main St 1","city":"Stockholm","zip":"11122","country":"SE"}' \
  http://localhost:8080/cart/delivery/1/pickupPoint
```

### 10. Checkout (returns HTML snippet from Klarna)

```bash
curl -i --cookie cookies.txt http://localhost:8080/cart/checkout
```

### 11. Using a known cart id directly (bypassing cookie)

If you already have a cart id (e.g. 1720000000000000):

```bash
CART_ID=1720000000000000
curl -i http://localhost:8080/cart/byid/$CART_ID
curl -i -X POST -H "Content-Type: application/json" \
  -d '{"sku":"TEST-SKU-XYZ","quantity":1,"country":"se"}' \
  http://localhost:8080/cart/byid/$CART_ID
```

### 12. Clear cart cookie (forces a new cart on next request)

```bash
curl -i --cookie cookies.txt -X DELETE http://localhost:8080/cart/
```

Tip: Use `--cookie-jar` and `--cookie` to persist the session across multiple commands:

```bash
curl --cookie-jar cookies.txt http://localhost:8080/cart/
curl --cookie cookies.txt http://localhost:8080/cart/add/TEST-SKU-123
```

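
For Go clients, the same cookie persistence can be had from the standard library's `net/http/cookiejar`. The sketch below is illustrative only: it talks to a throwaway `httptest` server that merely imitates the `cartid` cookie behaviour described above (`newFakeCartServer`, `newCartClient`, and `fetch` are hypothetical helpers), so the response bodies are not the real cart JSON.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
)

// newFakeCartServer stands in for the real service. It only imitates the
// cookie handshake: set cartid when absent, recognise it when present.
func newFakeCartServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if c, err := r.Cookie("cartid"); err == nil {
			fmt.Fprintf(w, "existing cart %s", c.Value)
			return
		}
		const id = "1720000000000000" // example id reused from the curl section
		http.SetCookie(w, &http.Cookie{Name: "cartid", Value: id, Path: "/"})
		fmt.Fprintf(w, "new cart %s", id)
	}))
}

// newCartClient returns a client whose in-memory jar plays the role of
// curl's cookies.txt.
func newCartClient() (*http.Client, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return nil, err
	}
	return &http.Client{Jar: jar}, nil
}

// fetch performs a GET and returns the response body as a string.
func fetch(c *http.Client, url string) (string, error) {
	resp, err := c.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	srv := newFakeCartServer()
	defer srv.Close()

	client, err := newCartClient()
	if err != nil {
		panic(err)
	}
	// The first call receives the cookie; the second one sends it back.
	for i := 0; i < 2; i++ {
		body, err := fetch(client, srv.URL+"/cart/")
		if err != nil {
			panic(err)
		}
		fmt.Println(body)
	}
}
```
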
## Important Notes

- Always regenerate protobuf Go code after modifying any `.proto` file (messages/cart_actor/control_plane)
- The generated `messages.pb.go` file should not be edited manually
- Make sure your PATH includes the protoc-gen-go binary location (usually `$GOPATH/bin`)

---

## Architecture Overview

The system is a distributed, sharded (by cart id) actor model implementation:

- Each cart is a grain (an in-memory struct `*CartGrain`) that owns and mutates its own state.
- A **local grain pool** holds grains owned by the node.
- A **synced (cluster) pool** (`SyncedPool`) coordinates multiple nodes and exposes local or remote grains through a uniform interface (`GrainPool`).
- All inter-node communication is gRPC:
  - Cart mutation & state RPCs (CartActor service).
  - Control plane RPCs (ControlPlane service) for membership, liveness, and graceful shutdown. Ownership is no longer negotiated over RPC; it is derived from the ring.

### Key Processes

1. Client HTTP request (or gRPC client) arrives with a cart identifier (cookie or path).
2. The pool resolves ownership:
   - If local grain exists → use it.
   - If a remote host is known owner → a remote grain proxy (`RemoteGrainGRPC`) is used; it performs gRPC calls to the owning node.
   - If no grain exists yet → the consistent hashing ring decides the owner (no quorum negotiation); when that owner is this node, it spawns a local grain.
3. Mutation is executed via the **mutation registry** (registry wraps domain logic + optional totals recomputation).
4. Updated state is returned to the caller; ownership stays with the node (relinquishing ownership to shed load is not yet implemented).

---

## Grain & Mutation Model

- `CartGrain` holds items, deliveries, pricing aggregates, and checkout/order metadata.
- All mutations are registered via `RegisterMutation[T]` with signature:
  ```
  func(*CartGrain, *T) error
  ```
- `WithTotals()` flag triggers automatic recalculation of totals after successful handlers.
- The old giant `switch` in `CartGrain.Apply` has been replaced by registry dispatch; unregistered mutations fail fast.
- Adding a mutation:
  1. Define proto message.
  2. Generate code.
  3. Register handler (optionally WithTotals).
  4. Add gRPC RPC + request wrapper if the mutation must be remotely invokable.
  5. (Optional) Add HTTP endpoint mapping to the mutation.

---

## Local Grain Pool

- Manages an in-memory map `map[CartId]*CartGrain`.
- Lazy spawn: first mutation or explicit access triggers `spawn(id)`.
- TTL / purge loop periodically removes expired grains unless they changed recently (basic memory pressure management).
- Capacity limit (`PoolSize`); oldest expired grain evicted first when full.

---

## Synced (Cluster) Pool

`SyncedPool` wraps a local pool and tracks:

- `remoteHosts`: known peer nodes (gRPC connections).
- `remoteIndex`: mapping of cart id → remote grain proxy (`RemoteGrainGRPC`) for carts owned elsewhere.

Responsibilities:

1. Discovery integration (via a `Discovery` interface) adds/removes hosts.
2. Periodic ping health checks (ControlPlane.Ping).
3. Ring-based deterministic ownership:
   - Ownership is derived directly from the consistent hashing ring (no quorum RPC or `ConfirmOwner`).
4. Remote spawning:
   - When a remote host reports its cart ids (`GetCartIds`), the pool creates remote proxies for fast routing.

---

## Remote Grain Proxies

A `RemoteGrainGRPC` implements the `Grain` interface but delegates:

- `Apply` → Specific CartActor per-mutation RPC (e.g., `AddItem`, `RemoveItem`) constructed from the mutation type. (Legacy envelope removed.)
- `GetCurrentState` → `CartActor.GetState`.

Return path:

1. gRPC reply (CartMutationReply / StateReply) → proto `CartState`.
2. `ToCartState` / mapping reconstructs a local `CartGrain` snapshot for callers expecting grain semantics.

---

## Control Plane (Inter-Node Coordination)

Defined in `proto/control_plane.proto`:

| RPC | Purpose |
|-----|---------|
| `Ping` | Liveness; increments missed ping counter if failing. |
| `Negotiate` | Merges membership views; used after discovery events. |
| `GetCartIds` | Enumerate locally owned carts for remote index seeding. |
| `Closing` | Graceful shutdown notice; peers remove host & associated remote grains. |

### Ownership / Quorum Rules

These rules applied to the removed `ConfirmOwner` negotiation. With ring-based ownership there is no acceptance vote; they are listed only as a record of the legacy behavior:

- If total participating hosts < 3 → all had to accept.
- Otherwise majority acceptance was required (`ok >= total/2`).
- On failure → the local tentative grain was removed (rollback to avoid split-brain).

---

## Request / Mutation Flow Examples

### Local Mutation

1. HTTP handler parses request → determines cart id.
2. `SyncedPool.Apply`:
   - Finds local grain (or spawns a new one when the ring assigns the cart to this node).
   - Executes registry mutation.
3. Totals updated if flagged.
4. HTTP response returns updated JSON (via `ToCartState`).

### Remote Mutation

1. `SyncedPool.Apply` sees cart mapped to a remote host.
2. Routes to `RemoteGrainGRPC.Apply`.
3. Remote node executes mutation locally and returns updated state over gRPC.
4. Proxy materializes snapshot locally (not authoritative, read-only view).

### Checkout (Side-Effecting, Non-Pure)

- HTTP `/checkout` uses current grain snapshot to build payload (pure function).
- Calls Klarna externally (not a mutation).
- Applies `InitializeCheckout` mutation to persist reference + status.
- Returns Klarna order JSON to client.

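
The ordering in the checkout flow matters: the payload is built from a snapshot, the external call happens next, and cart state is only touched after that call succeeds. A minimal sketch of that ordering follows. Everything in it is hypothetical: the `Provider` interface stands in for the Klarna client, the field assignments stand in for the `InitializeCheckout` mutation, and the status string is invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Cart is a pared-down stand-in for the grain state.
type Cart struct {
	Items          map[string]int
	CheckoutRef    string
	CheckoutStatus string
}

// Payload is what gets sent to the payment provider.
type Payload struct{ Lines []string }

// buildPayload is pure: it only reads the snapshot it is given.
func buildPayload(c Cart) Payload {
	p := Payload{}
	for sku, qty := range c.Items {
		p.Lines = append(p.Lines, fmt.Sprintf("%s x%d", sku, qty))
	}
	sort.Strings(p.Lines) // stable order for a deterministic payload
	return p
}

// Provider abstracts the external payment call (hypothetical interface).
type Provider interface {
	CreateOrder(Payload) (ref string, body []byte, err error)
}

// Checkout performs the side effect first and records the result on the cart
// only when the provider call succeeded, so a failure leaves the cart as-is.
func Checkout(c *Cart, p Provider) ([]byte, error) {
	ref, body, err := p.CreateOrder(buildPayload(*c))
	if err != nil {
		return nil, fmt.Errorf("checkout: %w", err)
	}
	// Stand-in for applying the InitializeCheckout mutation.
	c.CheckoutRef, c.CheckoutStatus = ref, "checkout_started"
	return body, nil
}

// fakeProvider lets the example run without any network access.
type fakeProvider struct{ fail bool }

func (f fakeProvider) CreateOrder(Payload) (string, []byte, error) {
	if f.fail {
		return "", nil, errors.New("provider unavailable")
	}
	return "order-1", []byte("provider response"), nil
}

func main() {
	cart := &Cart{Items: map[string]int{"TEST-SKU-123": 2}}

	_, err := Checkout(cart, fakeProvider{fail: true})
	fmt.Println(err, cart.CheckoutRef == "")

	body, err := Checkout(cart, fakeProvider{})
	fmt.Println(string(body), err, cart.CheckoutRef)
}
```
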
---

## Scaling & Deployment

- **Horizontal scaling**: Add more nodes; discovery layer (Kubernetes / service registry) feeds hosts to `SyncedPool`.
- **Sharding**: Implicit by cart id hash. Ownership is determined by the consistent hashing ring (no first-claim or quorum acceptance).
- **Hot spots**: A single popular cart remains on one node; for heavy multi-client concurrency, future work could add read replicas or partitioning (not implemented).
- **Capacity tuning**: Increase `PoolSize` & memory limits; adjust TTL for stale cart eviction.

### Adding Nodes

1. Node starts gRPC server (CartActor + ControlPlane).
2. After brief delay, begins discovery watch; on event:
   - New host → dial + negotiate → seed remote cart ids.
3. Pings maintain health; failed hosts removed (proxies invalidated).

---

## Failure Handling

| Scenario | Behavior |
|----------|----------|
| Remote host unreachable | Pings increment `MissedPings`; after threshold host removed. |
| Ownership negotiation fails (legacy) | No longer occurs: `ConfirmOwner` was removed and ownership is ring-determined. |
| gRPC call error on remote mutation | Error bubbled to caller; no local fallback. |
| Missing mutation registration | Fast failure with explicit error message. |
| Partial checkout (Klarna fails) | No local state mutation for checkout; client sees error; cart remains unchanged. |

---

## Mutation Registry Summary

- Central, type-safe registry prevents silent omission.
- Each handler:
  - Validates input.
  - Mutates `*CartGrain`.
  - Returns error for rejection.
- Automatic totals recomputation reduces boilerplate and consistency risk.
- Coverage test (add separately) can enforce all proto mutations are registered.

---

## gRPC Interfaces

- **CartActor**: Per-mutation unary RPCs + `GetState`. (Checkout logic intentionally excluded; handled at HTTP layer.)
- **ControlPlane**: Cluster coordination (Ping, Negotiate, GetCartIds, Closing); ownership is now ring-determined (no ConfirmOwner).

**Ports** (default / implied):

- CartActor & ControlPlane share the same gRPC server/listener (single port, e.g. `:1337`).
- Legacy frame/TCP code has been removed.

---

## Security & Future Enhancements

| Area | Potential Improvement |
|------|------------------------|
| Transport Security | Add TLS / mTLS to gRPC servers & clients. |
| Auth / RBAC | Intercept CartActor RPCs with auth metadata. |
| Backpressure | Rate-limit remote mutation calls per host. |
| Observability | Add per-mutation Prometheus metrics & tracing spans. |
| Ownership | Add lease timeouts / fencing tokens for stricter guarantees. |
| Batch Ops | Introduce batch mutation RPC or streaming updates (WatchState). |
| Persistence | Reintroduce event log or snapshot persistence layer if durability required. |

---

## Adding a New Node (Operational Checklist)

1. Deploy binary/container with same proto + registry.
2. Expose gRPC port.
3. Ensure discovery lists the new host.
4. Node dials peers, negotiates membership.
5. Remote cart proxies seeded.
6. Traffic routed automatically based on ownership.

---

## Adding a New Mutation (Checklist Recap)

1. Define proto message (+ request wrapper & RPC if remote invocation needed).
2. Regenerate protobuf code.
3. Implement & register handler (`RegisterMutation`).
4. Add client (HTTP/gRPC) endpoint.
5. Write unit + integration tests.
6. (Optional) Add to coverage test list and docs.

---

## High-Level Data Flow Diagram (Text)

```
Client -> HTTP Handler -> SyncedPool -> (local?) -> Registry -> Grain State
                                    \-> (remote?) -> RemoteGrainGRPC -> gRPC -> Remote CartActor -> Registry -> Grain
ControlPlane: Discovery Events <-> Negotiation/Ping <-> SyncedPool state (ring determines ownership)
```

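
The local/remote branch in the diagram boils down to one interface with two implementations. The following sketch shows that shape only. The `Grain` interface here is reduced to a single string-based method, the owner function is a toy rule rather than a ring lookup, and `remoteGrain` just formats a message where the real proxy would issue a gRPC call; all names are illustrative.

```go
package main

import "fmt"

// Grain is a reduced, hypothetical version of the grain interface.
type Grain interface {
	Apply(mutation string) (string, error)
}

// localGrain mutates state held on this node.
type localGrain struct{ log []string }

func (g *localGrain) Apply(m string) (string, error) {
	g.log = append(g.log, m)
	return fmt.Sprintf("local state after %d mutation(s)", len(g.log)), nil
}

// remoteGrain stands in for a gRPC-backed proxy; it only reports where the
// mutation would be sent.
type remoteGrain struct{ host string }

func (g remoteGrain) Apply(m string) (string, error) {
	return fmt.Sprintf("forwarded %q to %s", m, g.host), nil
}

// Pool routes each call to a local grain or a remote proxy.
type Pool struct {
	self    string
	ownerOf func(cartID string) string // ring lookup in the real system
	local   map[string]*localGrain
}

func (p *Pool) Apply(cartID, mutation string) (string, error) {
	owner := p.ownerOf(cartID)
	var g Grain
	if owner == p.self {
		lg, ok := p.local[cartID]
		if !ok {
			lg = &localGrain{} // lazy spawn on first access
			p.local[cartID] = lg
		}
		g = lg
	} else {
		g = remoteGrain{host: owner}
	}
	return g.Apply(mutation)
}

func main() {
	p := &Pool{
		self: "cart-0",
		// Toy ownership rule for the example: even-length ids are local.
		ownerOf: func(id string) string {
			if len(id)%2 == 0 {
				return "cart-0"
			}
			return "cart-1"
		},
		local: map[string]*localGrain{},
	}
	fmt.Println(p.Apply("ab", "AddItem"))
	fmt.Println(p.Apply("abc", "AddItem"))
}
```
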
---

## Troubleshooting

| Symptom | Likely Cause | Action |
|---------|--------------|--------|
| New cart every request | Secure cookie over plain HTTP or not sending cookie jar | Disable Secure locally or use HTTPS & proper curl `-b` |
| Unsupported mutation error | Missing registry handler | Add `RegisterMutation` for that proto |
| Ownership imbalance | Ring host distribution skew or rapid host churn | Examine `cart_ring_host_share`, `cart_ring_hosts`, and logs for host add/remove; rebalance or investigate instability |
| Remote mutation latency | Network / serialization overhead | Consider batching or colocating hot carts |
| Checkout returns 500 | Klarna call failed | Inspect logs; no grain state mutated |

---