* Returning the bucket name in RGW response
From: Wido den Hollander @ 2013-11-06 19:33 UTC
To: ceph-devel
Hi,
I'm working on an RGW setup where I use Varnish[0] to cache objects. The
problem is that many (cached) requests never reach RGW itself, so its
traffic accounting isn't correct.
To overcome this I've been sending all the Varnish logs to Logstash[1]
and on into ElasticSearch, then analyzing them there to find out how much
traffic each bucket served.
This method works, but it isn't safe enough: I'm currently parsing the
"Host" header to find out which bucket a request was for, and that isn't
always reliable since users can CNAME.
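To illustrate why Host parsing breaks down, here is a minimal Python sketch. It is not the actual Logstash filter; the endpoint name s3.example.com and the function name are made up for the example. It recovers the bucket from a virtual-hosted-style hostname and has nothing to go on once a customer domain is CNAMEd to the gateway.

```python
from typing import Optional

# Hypothetical endpoint name; not taken from the thread.
RGW_DNS_NAME = "s3.example.com"


def bucket_from_host(host: str, dns_name: str = RGW_DNS_NAME) -> Optional[str]:
    """Guess the bucket from a virtual-hosted-style Host header.

    Returns None when the host is not a subdomain of dns_name, which is
    what happens when a user points a CNAME at the gateway.
    """
    host = host.split(":", 1)[0].lower().rstrip(".")  # drop port, normalise
    suffix = "." + dns_name
    if host.endswith(suffix) and len(host) > len(suffix):
        return host[: -len(suffix)]
    return None


print(bucket_from_host("photos.s3.example.com"))     # photos
print(bucket_from_host("photos.s3.example.com:80"))  # photos
print(bucket_from_host("cdn.customer.example"))      # None
```

The last call is the problem case: from the hostname alone the log pipeline cannot tell which bucket served the request.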
So I've been playing with the idea of adding an "Rgwx-bucket" header to
each response, which tells you which bucket the request was made to.
In Varnish I can catch this response header and send it to Logstash, so I
have a safer way to tell which requests were made to which bucket.
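As a rough sketch of the accounting step once the header is logged: the Python below sums bytes per bucket from JSON log lines. The field names "bucket" and "bytes" and the sample records are assumptions for illustration; the real Varnish/Logstash field names depend on how the logging is configured.

```python
import json
from collections import Counter
from typing import Iterable


def bytes_per_bucket(log_lines: Iterable[str]) -> Counter:
    """Sum response bytes per bucket from JSON-encoded log lines."""
    totals: Counter = Counter()
    for line in log_lines:
        record = json.loads(line)
        # Requests without the header (for example service-level calls)
        # are grouped under a placeholder key.
        bucket = record.get("bucket") or "(unknown)"
        totals[bucket] += int(record.get("bytes", 0))
    return totals


sample = [
    '{"host": "photos.s3.example.com", "bucket": "photos", "bytes": 1024}',
    '{"host": "cdn.customer.example", "bucket": "photos", "bytes": 2048}',
    '{"host": "s3.example.com", "bytes": 512}',
]
for name, total in sorted(bytes_per_bucket(sample).items()):
    print(name, total)
```

Note that the second record is attributed to the right bucket even though its Host is a customer domain, which is the point of logging the header instead of parsing the hostname.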
I'm using Varnish, but you could do the same with nginx or any HTTP
caching proxy.
Would it be an idea to add this to RGW? I have it running on my system
and it works fine, but it's currently a bit hacky.
A config variable like "rgw expose bucket" could be false by default,
but when set to true RGW would send the response header with the bucket
name.
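To make the proposed switch concrete, here is a small Python model of the behaviour. It is only a sketch and not RGW code: the function name is hypothetical, `expose_bucket` stands in for the proposed "rgw expose bucket" option, and the header name is the one floated above.

```python
from typing import Dict, Optional

# Header name as floated in this mail; assumed here for illustration.
BUCKET_HEADER = "Rgwx-Bucket"


def response_headers(bucket: Optional[str],
                     expose_bucket: bool = False) -> Dict[str, str]:
    """Return the extra response headers for one request.

    expose_bucket models the proposed option and defaults to off, so
    existing deployments would see no change in their responses.
    """
    headers: Dict[str, str] = {}
    if expose_bucket and bucket:
        headers[BUCKET_HEADER] = bucket
    return headers


print(response_headers("photos"))                      # {}
print(response_headers("photos", expose_bucket=True))  # {'Rgwx-Bucket': 'photos'}
```

Requests that have no bucket at all yield no header even when the option is on.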
How does this sound?
P.S.: When this is all up and running I'm planning to make a cool
presentation about this for the next Ceph day.
[0]: http://www.varnish-cache.org/
[1]: http://www.logstash.net/
--
Wido den Hollander
42on B.V.
Phone: +31 (0)20 700 9902
Skype: contact42on
* Re: Returning the bucket name in RGW response
From: Yehuda Sadeh @ 2013-11-06 21:12 UTC
To: Wido den Hollander; +Cc: ceph-devel
On Wed, Nov 6, 2013 at 11:33 AM, Wido den Hollander <wido@42on.com> wrote:
> Hi,
>
> I'm working on an RGW setup where I use Varnish[0] to cache objects. The
> problem is that many (cached) requests never reach RGW itself, so its
> traffic accounting isn't correct.
>
> To overcome this I've been sending all the Varnish logs to Logstash[1] and
> on into ElasticSearch, then analyzing them there to find out how much
> traffic each bucket served.
>
> This method works, but it isn't safe enough: I'm currently parsing the
> "Host" header to find out which bucket a request was for, and that isn't
> always reliable since users can CNAME.
>
> So I've been playing with the idea of adding an "Rgwx-bucket" header to each
> response, which tells you which bucket the request was made to.
>
> In Varnish I can catch this response header and send it to Logstash, so I
> have a safer way to tell which requests were made to which bucket.
>
> I'm using Varnish, but you could do the same with nginx or any HTTP caching
> proxy.
>
> Would it be an idea to add this to RGW? I have it running on my system and
> it works fine, but it's currently a bit hacky.
Yeah, I don't see why not. As long as it's configurable.
>
> A config variable like "rgw expose bucket" could be false by default, but
> when set to true RGW would send the response header with the bucket name.
>
> How does this sound?
Sounds good, just need to see the code now ...
>
> P.S.: When this is all up and running I'm planning to make a cool
> presentation about this for the next Ceph day.
>
Awesome!
Yehuda
* Re: Returning the bucket name in RGW response
From: Wido den Hollander @ 2013-11-11 20:40 UTC
To: Yehuda Sadeh; +Cc: ceph-devel
On 11/06/2013 10:12 PM, Yehuda Sadeh wrote:
> On Wed, Nov 6, 2013 at 11:33 AM, Wido den Hollander <wido@42on.com> wrote:
>> Hi,
>>
>> I'm working on an RGW setup where I use Varnish[0] to cache objects. The
>> problem is that many (cached) requests never reach RGW itself, so its
>> traffic accounting isn't correct.
>>
>> To overcome this I've been sending all the Varnish logs to Logstash[1] and
>> on into ElasticSearch, then analyzing them there to find out how much
>> traffic each bucket served.
>>
>> This method works, but it isn't safe enough: I'm currently parsing the
>> "Host" header to find out which bucket a request was for, and that isn't
>> always reliable since users can CNAME.
>>
>> So I've been playing with the idea of adding an "Rgwx-bucket" header to each
>> response, which tells you which bucket the request was made to.
>>
>> In Varnish I can catch this response header and send it to Logstash, so I
>> have a safer way to tell which requests were made to which bucket.
>>
>> I'm using Varnish, but you could do the same with nginx or any HTTP caching
>> proxy.
>>
>> Would it be an idea to add this to RGW? I have it running on my system and
>> it works fine, but it's currently a bit hacky.
>
> Yeah, I don't see why not. As long as it's configurable.
>
>>
>> A config variable like "rgw expose bucket" could be false by default, but
>> when set to true RGW would send the response header with the bucket name.
>>
>> How does this sound?
>
>
> Sounds good, just need to see the code now ...
>
I made it way too complex at first, until I looked at the code again today
and came up with a much simpler patch. It's in wip-rgw-expose-bucket now:
https://github.com/ceph/ceph/commit/f321471df2703ae706910757a133ab8a13803acb
The dump_bucket_from_state method was already there, but it wasn't used
anywhere. So I modified it a bit to have it honor the configuration boolean.
It writes the header "Bucket", although we might want to change that to
Rgwx-Bucket or X-Bucket; I prefer the latter.
The unwritten rule is that when you come up with a custom header, you
prefix it with "X-".
How does this sound?
Wido
* Fwd: Returning the bucket name in RGW response
From: Yehuda Sadeh @ 2013-11-12 4:37 UTC
To: Wido den Hollander, ceph-devel
(resending as plain text)
I'm away and on my phone, I'll make it short. Overall direction is ok.
Run git submodule update because a submodule change snuck in.
I'd move the header dumping into a new RGWOp::pre_exec() callback and
fold the dump_continue() also into it.
Note that dumping the bucket is only applicable to the object store API,
so you need to take that into account (e.g., do it in the appropriate
subclass of RGWOp).
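The shape of that suggestion can be sketched in Python as below. This is only a model of the control flow under stated assumptions: the class names `Op` and `BucketScopedOp` and the `run()` driver are hypothetical stand-ins, not RGW's real RGWOp hierarchy, and the dump_continue() folding is left out. The header name "Bucket" is the one the current patch writes.

```python
from typing import List, Optional, Tuple

Header = Tuple[str, str]


class Op:
    """Stand-in for a request operation; not RGW's real RGWOp."""

    def __init__(self, bucket: Optional[str] = None,
                 expose_bucket: bool = False) -> None:
        self.bucket = bucket
        self.expose_bucket = expose_bucket
        self.headers: List[Header] = []

    def pre_exec(self) -> None:
        """Hook that runs before execute(); the base class emits nothing."""

    def execute(self) -> None:
        """Placeholder for the operation body."""

    def run(self) -> List[Header]:
        self.pre_exec()
        self.execute()
        return self.headers


class BucketScopedOp(Op):
    """Stand-in for object store API operations that target a bucket."""

    def pre_exec(self) -> None:
        # Only this subclass dumps the bucket, and only when configured to.
        if self.expose_bucket and self.bucket:
            self.headers.append(("Bucket", self.bucket))


print(BucketScopedOp("photos", expose_bucket=True).run())  # [('Bucket', 'photos')]
print(Op("photos", expose_bucket=True).run())              # []
```

Keeping the header logic in the subclass hook means operations outside the object store API never emit it, regardless of the config setting.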
Yehuda
* Re: Returning the bucket name in RGW response
From: Wido den Hollander @ 2013-11-12 21:09 UTC
To: Yehuda Sadeh; +Cc: ceph-devel
> On 12 Nov 2013, at 05:37, "Yehuda Sadeh" <yehuda@inktank.com> wrote:
>
> (resending as plain text)
>
> I'm away and on my phone, I'll make it short. Overall direction is ok.
> Run git submodule update because a submodule change snuck in.
> I'd move the header dumping into a new RGWOp::pre_exec() callback and
> fold the dump_continue() also into it.
> Note that dumping the bucket is only applicable to the object store API,
> so you need to take that into account (e.g., do it in the appropriate
> subclass of RGWOp).
>
I pushed a revised patch. Is this what you meant?
I backported it to dumpling on my cluster and it works fine. I didn't feel like upgrading to master yet.
Wido