The full set of ORCA headers has been returned by Arcana model containers
since the 20260115 release.

HTTP ORCA header

The ORCA header in HTTP responses looks like:

Max concurrency

The application_utilization metric is calculated by dividing the number of
concurrent inference requests by a preconfigured max concurrency.
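
The division described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the container's actual code; the function name and its parameters are hypothetical.

```python
def application_utilization(concurrent_requests: int, max_concurrency: int) -> float:
    """Return in-flight inference requests divided by the max concurrency.

    Hypothetical helper: it mirrors the formula in the text, nothing more.
    """
    if max_concurrency <= 0:
        raise ValueError("max_concurrency must be positive")
    return concurrent_requests / max_concurrency


# Example: 6 concurrent requests against a max concurrency of 8.
print(application_utilization(6, 8))  # 0.75
```
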
After parameter tuning, you can override the max concurrency by setting the
INFERENCE_CONCURRENCY_CAPACITY variable to the desired value.
The INFERENCE_CONCURRENCY_CAPACITY variable is used only to calculate the
utilization that is reported to the load balancer. Setting it does not cause
the container to reject or queue requests that exceed the capacity.
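
As a rough sketch of how such an override might behave, the Python below reads INFERENCE_CONCURRENCY_CAPACITY and falls back to a default when it is unset. Several things here are assumptions made for illustration: that the variable is an environment variable, the DEFAULT_MAX_CONCURRENCY value of 8, both function names, and the choice not to clamp the result at 1.0. The source does not say how the container handles any of these.

```python
import os
from typing import Mapping, Optional

# Hypothetical stand-in for the container's preconfigured max concurrency.
DEFAULT_MAX_CONCURRENCY = 8


def max_concurrency(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the override if set, otherwise the assumed default."""
    env = os.environ if env is None else env
    raw = env.get("INFERENCE_CONCURRENCY_CAPACITY")
    return int(raw) if raw else DEFAULT_MAX_CONCURRENCY


def reported_utilization(in_flight: int, env: Optional[Mapping[str, str]] = None) -> float:
    """Compute the value reported to the load balancer.

    Nothing is rejected or queued here: the capacity only scales the number.
    """
    return in_flight / max_concurrency(env)


override = {"INFERENCE_CONCURRENCY_CAPACITY": "16"}
print(reported_utilization(4, override))   # 0.25
print(reported_utilization(20, override))  # 1.25, over capacity but still served
```

In this sketch, 20 in-flight requests against a capacity of 16 simply produce a reported utilization above 1.0, which matches the point that the setting informs the load balancer rather than limiting traffic.
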
