Hi, one feature that seems to be missing from this library is a way to access the response headers returned with the original HTTP response. In particular, this would let us track rate limit usage for Anthropic First Party and latency information for AWS Bedrock.
For example, a curl request to Anthropic returns useful information in the response headers:
anthropic-ratelimit-input-tokens-limit: 10000
anthropic-ratelimit-input-tokens-remaining: 8000
anthropic-ratelimit-input-tokens-reset: 2025-07-23T15:05:10Z
anthropic-ratelimit-output-tokens-limit: 4000
anthropic-ratelimit-output-tokens-remaining: 4000
anthropic-ratelimit-output-tokens-reset: 2025-07-23T15:04:59Z
anthropic-ratelimit-requests-limit: 5
anthropic-ratelimit-requests-remaining: 4
anthropic-ratelimit-requests-reset: 2025-07-23T15:05:09Z
anthropic-ratelimit-tokens-limit: 14000
anthropic-ratelimit-tokens-remaining: 12000
anthropic-ratelimit-tokens-reset: 2025-07-23T15:04:59Z
And for AWS Bedrock:
x-amzn-bedrock-invocation-latency: 2270
I can see how to access these in the Python SDK by using with_raw_response, e.g.:
>>> response = client.messages.with_raw_response.create(
... model="claude-3-5-sonnet-20241022",
... max_tokens=1000,
... messages=[{"role": "user", "content": "Hello"}]
... )
>>> response
<APIResponse [200 OK] type=<class 'anthropic.types.message.Message'>>
>>> response.headers
Headers({'date': 'Fri, 25 Jul 2025 14:28:57 GMT', 'content-type': 'application/json', 'transfer-encoding': 'chunked', 'connection': 'keep-alive', 'content-encoding': 'gzip', 'anthropic-ratelimit-input-tokens-limit': '20000', 'anthropic-ratelimit-input-tokens-remaining': '20000', 'anthropic-ratelimit-input-tokens-reset': '2025-07-25T14:28:57Z', 'anthropic-ratelimit-output-tokens-limit': '4000', 'anthropic-ratelimit-output-tokens-remaining': '4000', 'anthropic-ratelimit-output-tokens-reset': '2025-07-25T14:28:57Z', 'anthropic-ratelimit-requests-limit': '5', 'anthropic-ratelimit-requests-remaining': '4', 'anthropic-ratelimit-requests-reset': '2025-07-25T14:28:55Z', 'anthropic-ratelimit-tokens-limit': '24000', 'anthropic-ratelimit-tokens-remaining': '24000', 'anthropic-ratelimit-tokens-reset': '2025-07-25T14:28:57Z', 'request-id': 'req_011CRTtstudugSKpBy5C6idJ', 'strict-transport-security': 'max-age=31536000; includeSubDomains; preload', 'anthropic-organization-id': 'f8bf172e-0b01-42f5-9955-e0c43e1cb1a4', 'via': '1.1 google', 'cf-cache-status': 'DYNAMIC', 'x-robots-tag': 'none', 'server': 'cloudflare', 'cf-ray': '964c586c4a7b6405-LHR'})
But there doesn't seem to be an equivalent for Ruby.
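To illustrate the use case, here is a minimal Python sketch of what we would want to do with these headers once they are accessible. It assumes the headers arrive as a plain mapping of strings to strings; parse_rate_limits and the shape of its return value are hypothetical and not part of any SDK.

```python
from datetime import datetime, timezone

PREFIX = "anthropic-ratelimit-"


def parse_rate_limits(headers):
    """Group anthropic-ratelimit-* headers by bucket.

    Hypothetical helper for illustration only, not an SDK API.
    """
    limits = {}
    for name, value in headers.items():
        name = name.lower()
        if not name.startswith(PREFIX):
            continue
        # "input-tokens-remaining" -> bucket "input-tokens", field "remaining"
        bucket, _, field = name[len(PREFIX):].rpartition("-")
        if field in ("limit", "remaining"):
            parsed = int(value)
        elif field == "reset":
            # Reset timestamps look like 2025-07-23T15:05:10Z
            parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(
                tzinfo=timezone.utc
            )
        else:
            continue  # ignore any field this sketch does not know about
        limits.setdefault(bucket, {})[field] = parsed
    return limits


# Sample values copied from the curl output above.
sample = {
    "anthropic-ratelimit-requests-limit": "5",
    "anthropic-ratelimit-requests-remaining": "4",
    "anthropic-ratelimit-requests-reset": "2025-07-23T15:05:09Z",
    "anthropic-ratelimit-input-tokens-remaining": "8000",
    "content-type": "application/json",
}
limits = parse_rate_limits(sample)
```

With the Python SDK, the same function could be passed response.headers from the with_raw_response call shown earlier. Something equivalent is what we would like to be able to do from Ruby.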