Load Shedding

go-zero uses an adaptive load shedder driven by CPU utilisation and in-flight request statistics. When the system is overloaded, new requests are rejected with HTTP 503 (REST) or gRPC RESOURCE_EXHAUSTED (zrpc) to protect work that is already in flight.

The shedder combines two signals to decide whether to accept a request:

  1. CPU usage — sampled every 250 ms via /proc/stat. Shedding activates once CPU exceeds the configured threshold (default 90%).
  2. Pass rate — the rolling ratio of completed requests to total attempted requests in the last sliding window. If the pass rate drops below a calculated floor, new arrivals are shed.

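The two gates work together: requests are never shed while CPU usage is below the threshold, and once it is above the threshold they are shed only if the pass rate also signals overload. A CPU spike at low request volume therefore does not trigger shedding on its own.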
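The decision can be sketched as a pure function. This is a simplified model that assumes both signals must indicate overload before a request is shed. It is not go-zero's implementation, and `shouldShed` and its parameters are illustrative names.

```go
package main

import "fmt"

// shouldShed models the two-gate decision: shed only when the CPU is over
// the threshold AND the pass rate has dropped below its floor.
// cpu and threshold use the 0-1000 scale; passRate and floor are in [0, 1].
// Hypothetical helper for illustration only.
func shouldShed(cpu, threshold int64, passRate, floor float64) bool {
	cpuOverloaded := cpu > threshold
	passRateLow := passRate < floor
	return cpuOverloaded && passRateLow
}

func main() {
	fmt.Println(shouldShed(950, 900, 0.60, 0.80)) // true: both gates tripped
	fmt.Println(shouldShed(500, 900, 0.60, 0.80)) // false: CPU is healthy
	fmt.Println(shouldShed(950, 900, 0.95, 0.80)) // false: pass rate is fine
}
```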

Load shedding is enabled by default for every rest.Server. Configure the CPU threshold in your YAML:

etc/app.yaml
CpuThreshold: 900 # 90%; unit is percent × 10 (range 0-1000)
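Because the value is on a 0-1000 scale rather than a plain percentage, a small conversion helper can prevent off-by-ten mistakes when thresholds are computed in code. The sketch below assumes that 1000 corresponds to 100% CPU. `thresholdFromPercent` is a hypothetical helper, not part of go-zero.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// thresholdFromPercent converts a CPU percentage (0-100) into the 0-1000
// scale used by CpuThreshold. Hypothetical helper for illustration.
func thresholdFromPercent(percent float64) (int64, error) {
	if percent < 0 || percent > 100 {
		return 0, errors.New("percent must be between 0 and 100")
	}
	return int64(math.Round(percent * 10)), nil
}

func main() {
	fmt.Println(thresholdFromPercent(90)) // 900 <nil>
}
```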

When a request is shed, the server responds with HTTP 503 Service Unavailable and an X-Content-Type-Options: nosniff header.

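To return a custom body for shed requests, register your own shedding middleware with server.Use. Note that rest.WithUnauthorizedCallback only handles JWT authentication failures and is not invoked when a request is shed. The built-in shedding middleware runs first, so disable it in RestConf if only the custom response should be sent:

server := rest.MustNewServer(c.RestConf)
shedder := load.NewAdaptiveShedder(load.WithCpuThreshold(900))
server.Use(func(next http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        promise, err := shedder.Allow()
        if err != nil {
            // custom 503 body
            httpx.WriteJson(w, http.StatusServiceUnavailable, map[string]string{
                "code": "OVERLOADED",
                "msg":  "service temporarily unavailable",
            })
            return
        }
        next(w, r)
        promise.Pass() // simplification: every handled request counts as a pass
    }
})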

The SheddingInterceptor is registered automatically on every zrpc.Server. Shed requests return codes.ResourceExhausted (gRPC status code 8, conventionally mapped to HTTP 429 by gateways):

// Automatically registered — no code changes needed.
// Shed requests get:
// status.Error(codes.ResourceExhausted, "service overloaded")

Callers using go-zero’s gRPC client receive an error whose gRPC status code is codes.ResourceExhausted (check it with status.Code(err)) and can decide to retry with backoff or fall back to an alternative.

Use load.NewAdaptiveShedder when you need programmatic control — for example, to wrap a non-HTTP workload like a message consumer:

import (
    "time"

    "github.com/zeromicro/go-zero/core/load"
)

// Message, handle, metrics and ErrOverloaded are application-defined.
var shedder = load.NewAdaptiveShedder(
    load.WithCpuThreshold(800),     // activate at 80% CPU
    load.WithWindow(5*time.Second), // sliding window size
    load.WithBuckets(50),           // window buckets (granularity)
)

func processMessage(msg Message) error {
    promise, err := shedder.Allow()
    if err != nil {
        // overloaded: drop the message or push it back to the queue
        metrics.Inc("messages.shed")
        return ErrOverloaded
    }

    procErr := handle(msg)
    // IMPORTANT: always call exactly one of Pass or Fail
    if procErr != nil {
        promise.Fail() // counts as failed, lowers the pass rate
    } else {
        promise.Pass() // counts as success
    }
    return procErr
}

| Option | Default | Description |
| --- | --- | --- |
| WithCpuThreshold(n) | 900 | CPU threshold as percent × 10 (0–1000) |
| WithWindow(d) | 5s | Sliding window duration |
| WithBuckets(n) | 50 | Number of buckets in window |

When Prometheus is enabled, the shedder exports:

| Metric | Type | Description |
| --- | --- | --- |
| shedding_drops_total | Counter | Total requests shed |
| shedding_pass_total | Counter | Total requests passed |
| cpu_usage | Gauge | Current CPU usage (0–1000) |

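A drop ratio is often easier to alert on than the raw counters. The sketch below assumes that the two counters are disjoint, so that drops plus passes equals the total number of requests seen over the same interval. `dropRatio` is a hypothetical helper.

```go
package main

import "fmt"

// dropRatio returns the fraction of requests that were shed, given the
// increase of each counter over the same interval. It assumes drops and
// passes are disjoint counts and returns 0 when there was no traffic.
func dropRatio(drops, passes float64) float64 {
	total := drops + passes
	if total == 0 {
		return 0
	}
	return drops / total
}

func main() {
	fmt.Println(dropRatio(25, 75)) // 0.25
}
```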
  • Tune the threshold based on your service’s CPU profile. Stateless services can tolerate higher thresholds (900–950); CPU-intensive services should use 700–800.
  • Monitor shedding_drops_total alongside error rate. A spike in drops usually indicates a traffic surge or a slow downstream dependency.
  • Combine with circuit breaker and rate limiter for defence in depth: rate limiter caps steady-state load, circuit breaker stops calls to unhealthy dependencies, load shedder protects the process itself.
  • Do not disable shedding in production unless you have an external quota enforcement layer.