* RGW object purging in upstream caches
@ 2013-01-22 16:41 Wido den Hollander
  2013-01-22 17:15 ` Jeff Mitchell
  2013-05-07 19:45 ` John Nielsen
  0 siblings, 2 replies; 5+ messages in thread
From: Wido den Hollander @ 2013-01-22 16:41 UTC (permalink / raw)
  To: ceph-devel@vger.kernel.org

Hi,

(http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/12316)
 > Hopefully not Varnish-specific; something like the Last-Modified header
 > would be good.
 >
 > Also there are tricks you can do with queries; see for instance
 > http://forum.nginx.org/read.php?2,1047,1052

It seems like a good discussion to have, since more people are looking 
at this.

Yehuda started a thread in May last year about the future directions of 
the RGW: http://www.spinics.net/lists/ceph-devel/msg06257.html

To improve performance I still think using a proxy like Varnish or Nginx 
can help a lot.

Now, running just one Varnish instance which load-balances over
multiple RGW instances is not a real problem. When it sees a PUT
operation it can "purge" (called banning in Varnish) the object from
its cache.

When looking at the scenario where you have multiple caches you run into 
the cache-consistency problem. If an object is modified the caches are 
not notified and will continue to serve an outdated object.
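
The consistency problem described above can be reproduced with a small in-memory model. This is only a sketch: the `Backend` and `Cache` classes are hypothetical stand-ins for RGW and for a Varnish or Nginx instance, not real APIs, and the object key is made up.

```python
class Backend:
    """Stands in for RGW: the authoritative object store."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]


class Cache:
    """Stands in for one caching proxy in front of the backend."""

    def __init__(self, backend):
        self.backend = backend
        self.store = {}

    def get(self, key):
        # On a hit the backend is never contacted, so a changed
        # Last-Modified value on the backend goes unnoticed.
        if key not in self.store:
            self.store[key] = self.backend.get(key)
        return self.store[key]

    def put(self, key, data):
        # A PUT passing through this cache updates the backend and
        # purges only this cache's local copy.
        self.backend.put(key, data)
        self.store.pop(key, None)


backend = Backend()
cache_a, cache_b = Cache(backend), Cache(backend)

cache_a.put("bucket/obj", b"v1")
cache_a.get("bucket/obj")  # cache A now holds v1
cache_b.get("bucket/obj")  # cache B now holds v1

cache_a.put("bucket/obj", b"v2")  # the update goes through cache A only
fresh = cache_a.get("bucket/obj")  # cache A refetches the new version
stale = cache_b.get("bucket/obj")  # cache B still serves its old copy
```

Cache B was never told about the second PUT, so it keeps serving the first version until its copy expires or is purged by some other means.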

Looking at the Last-Modified header is not an option since the cache 
will not contact RGW when serving out of its cache.

To handle this there has to be some kind of "hook" inside RGW that can 
notify Varnish (or some other cache) when an object changes.
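
As a rough sketch of what such a hook could do once it knows an object has changed, the code below fans one HTTP `PURGE` request out to every cache. Everything here is an assumption for illustration: the host names are placeholders, the function names are made up, and the caches must be configured to accept `PURGE` (Varnish, for example, only does so if its VCL handles that method). No such hook exists in RGW as far as this thread establishes.

```python
import http.client

# Hypothetical cache endpoints; replace with the real cache hosts.
CACHES = [("cache1.example.com", 80), ("cache2.example.com", 80)]


def purge_requests(object_path, caches=CACHES):
    """Return one (host, port, method, path) tuple per cache."""
    return [(host, port, "PURGE", object_path) for host, port in caches]


def notify_caches(object_path, caches=CACHES, timeout=2.0):
    """Send the PURGE requests and return a result per cache.

    The result maps (host, port) to the HTTP status code, or to the
    exception raised if that cache could not be reached.
    """
    results = {}
    for host, port, method, path in purge_requests(object_path, caches):
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request(method, path)
            results[(host, port)] = conn.getresponse().status
        except (OSError, http.client.HTTPException) as exc:
            # One unreachable cache must not block the others.
            results[(host, port)] = exc
        finally:
            conn.close()
    return results
```

Building the request list separately from sending it keeps the fan-out logic easy to check without touching the network.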

Is this something that's on the roadmap? Thoughts?

Wido


* Re: RGW object purging in upstream caches
  2013-01-22 16:41 RGW object purging in upstream caches Wido den Hollander
@ 2013-01-22 17:15 ` Jeff Mitchell
  2013-05-07 19:45 ` John Nielsen
  1 sibling, 0 replies; 5+ messages in thread
From: Jeff Mitchell @ 2013-01-22 17:15 UTC (permalink / raw)
  To: Wido den Hollander; +Cc: ceph-devel@vger.kernel.org

Wido den Hollander wrote:
> Now, running just one Varnish instance which load-balances over
> multiple RGW instances is not a real problem. When it sees a PUT
> operation it can "purge" (called banning in Varnish) the object from
> its cache.
>
> When looking at the scenario where you have multiple caches you run into
> the cache-consistency problem. If an object is modified the caches are
> not notified and will continue to serve an outdated object.
>
> Looking at the Last-Modified header is not an option since the cache
> will not contact RGW when serving out of its cache.
>
> To handle this there has to be some kind of "hook" inside RGW that can
> notify Varnish (or some other cache) when an object changes.

For nginx, it appears there is a well-tested production module that does 
this: http://labs.frickle.com/nginx_ngx_cache_purge/ (see the examples 
at the end of the README)

--Jeff



* Re: RGW object purging in upstream caches
  2013-01-22 16:41 RGW object purging in upstream caches Wido den Hollander
  2013-01-22 17:15 ` Jeff Mitchell
@ 2013-05-07 19:45 ` John Nielsen
  2013-05-07 22:25   ` Yehuda Sadeh
  1 sibling, 1 reply; 5+ messages in thread
From: John Nielsen @ 2013-05-07 19:45 UTC (permalink / raw)
  To: Wido den Hollander; +Cc: ceph-devel@vger.kernel.org

On Jan 22, 2013, at 9:41 AM, Wido den Hollander <wido@widodh.nl> wrote:

> To improve performance I still think using a proxy like Varnish or Nginx can help a lot.
> 
> Now, running just one Varnish instance which load-balances over multiple RGW instances is not a real problem. When it sees a PUT operation it can "purge" (called banning in Varnish) the object from its cache.
> 
> When looking at the scenario where you have multiple caches you run into the cache-consistency problem. If an object is modified the caches are not notified and will continue to serve an outdated object.
> 
> Looking at the Last-Modified header is not an option since the cache will not contact RGW when serving out of its cache.
> 
> To handle this there has to be some kind of "hook" inside RGW that can notify Varnish (or some other cache) when an object changes.
> 
> Is this something that's on the roadmap? Thoughts?

Responding to an old thread, but I've been thinking about this again lately.

I agree it would be nice for an upstream cache to know when its content is invalid. I realized today that RGW already does this for itself--multiple gateways on the same RADOS cluster share cache state with each other by updating and rados_watch()ing the notify_oid's in the .rgw.control pool.

I would like to see a small "RGW cache notifier" utility that just does the "watch" half. It could be a standalone read-only librados client with configurable output. You would tell it what pool to watch (e.g. ".rgw.control") and it would print event details as they happen. A small amount of scripting or other glue could turn this utility into a real-time cache invalidator for Varnish or anything else.
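
The "glue" half could look roughly like the sketch below. The notifier does not exist, so its output format is purely an assumption: one event per line, in the form `<bucket> <object-name>`. Whether the control-pool notifications actually carry enough information to produce such lines is not established here. The function name is also made up.

```python
import io
from urllib.parse import quote


def purge_paths(event_lines):
    """Yield a URL path to purge for each notifier event line.

    Assumed (hypothetical) line format: "<bucket> <object-name>".
    Blank and malformed lines are skipped.
    """
    for line in event_lines:
        line = line.rstrip("\n")
        # Split on the first space only; object names may contain spaces.
        bucket, sep, name = line.partition(" ")
        if not bucket or not sep or not name:
            continue
        # Percent-encode both parts but keep "/" inside object names.
        yield "/" + quote(bucket) + "/" + quote(name)


# Example input standing in for the notifier's stdout.
events = io.StringIO("mybucket photos/cat 1.jpg\n\nbroken\n")
paths = list(purge_paths(events))
```

In real use the function would be fed `sys.stdin` from a pipe, and each yielded path would be sent to the caches as a purge or ban request.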

Any comments or volunteers? If I'm not mistaken (which is by no means certain), someone familiar with the code could add such a utility by creating a new .cc file with a simple main() function and a modified RGWCache::watch_cb() function, then linking it to the build with a stanza in Makefile.am similar to that for radosgw-admin. I might even take a crack at it myself but my C++ is more than a bit rusty.

In any case, comments welcome.

JN



* Re: RGW object purging in upstream caches
  2013-05-07 19:45 ` John Nielsen
@ 2013-05-07 22:25   ` Yehuda Sadeh
  2013-05-07 23:01     ` John Nielsen
  0 siblings, 1 reply; 5+ messages in thread
From: Yehuda Sadeh @ 2013-05-07 22:25 UTC (permalink / raw)
  To: John Nielsen; +Cc: Wido den Hollander, ceph-devel@vger.kernel.org

On Tue, May 7, 2013 at 12:45 PM, John Nielsen <lists@jnielsen.net> wrote:
> On Jan 22, 2013, at 9:41 AM, Wido den Hollander <wido@widodh.nl> wrote:
>
>> To improve performance I still think using a proxy like Varnish or Nginx can help a lot.
>>
>> Now, running just one Varnish instance which load-balances over multiple RGW instances is not a real problem. When it sees a PUT operation it can "purge" (called banning in Varnish) the object from its cache.
>>
>> When looking at the scenario where you have multiple caches you run into the cache-consistency problem. If an object is modified the caches are not notified and will continue to serve an outdated object.
>>
>> Looking at the Last-Modified header is not an option since the cache will not contact RGW when serving out of its cache.
>>
>> To handle this there has to be some kind of "hook" inside RGW that can notify Varnish (or some other cache) when an object changes.
>>
>> Is this something that's on the roadmap? Thoughts?
>
> Responding to an old thread, but I've been thinking about this again lately.
>
> I agree it would be nice for an upstream cache to know when its content is invalid. I realized today that RGW already does this for itself--multiple gateways on the same RADOS cluster share cache state with each other by updating and rados_watch()ing the notify_oid's in the .rgw.control pool.

Not quite. The gateway only caches metadata, not data. I don't think
doing it for data would make much sense, performance wise.
>
> I would like to see a small "RGW cache notifier" utility that just does the "watch" half. It could be a standalone read-only librados client with configurable output. You would tell it what pool to watch (e.g. ".rgw.control") and it would print event details as they happen. A small amount of scripting or other glue could turn this utility into a real-time cache invalidator for Varnish or anything else.
>
> Any comments or volunteers? If I'm not mistaken (which is by no means certain), someone familiar with the code could add such a utility by creating a new .cc file with a simple main() function and a modified RGWCache::watch_cb() function, then linking it to the build with a stanza in Makefile.am similar to that for radosgw-admin. I might even take a crack at it myself but my C++ is more than a bit rusty.
>
> In any case, comments welcome.
>
> JN


* Re: RGW object purging in upstream caches
  2013-05-07 22:25   ` Yehuda Sadeh
@ 2013-05-07 23:01     ` John Nielsen
  0 siblings, 0 replies; 5+ messages in thread
From: John Nielsen @ 2013-05-07 23:01 UTC (permalink / raw)
  To: Yehuda Sadeh; +Cc: Wido den Hollander, ceph-devel@vger.kernel.org

On May 7, 2013, at 4:25 PM, Yehuda Sadeh <yehuda@inktank.com> wrote:

>> Responding to an old thread, but I've been thinking about this again lately.
>> 
>> I agree it would be nice for an upstream cache to know when its content is invalid. I realized today that RGW already does this for itself--multiple gateways on the same RADOS cluster share cache state with each other by updating and rados_watch()ing the notify_oid's in the .rgw.control pool.
> 
> Not quite. The gateway only caches metadata, not data. I don't think
> doing it for data would make much sense, performance wise.

If we only want to know when an object has changed (so we can invalidate it in our external cache), isn't the metadata sufficient? Presumably it changes whenever the data changes. If the external cache wants a fresh copy of the data it can request it directly; there's no need for the helper utility I have in mind to pass any actual object data.

Is my idea of watching the (rados) control objects to learn about updates to rgw objects feasible? The watcher would just need to supply the outside cache with a key to identify the object that should be evicted--maybe the etag?



