* sharded collection list
@ 2015-06-02 22:54 Sage Weil
  2015-06-04 13:30 ` John Spray
From: Sage Weil @ 2015-06-02 22:54 UTC
  To: john.spray, sjust; +Cc: ceph-devel

Hey John-

So the sharded pgls stuff has collided a bit with the looming hobject 
sorting changes.  Sam and I just talked it over and came up with the 
librados API we think would be most appealing:

 - the listing API would have start/end markers

 - it would be driven by a new opaque type rados_list_cursor_t, which is 
just data, no state, and internally is just an hobject_t.

 - it would be totally stateless: kill the [N]ListContext stuff in 
Objecter (and reimplement a simple wrapper in librados.cc or even .h).  
Note that the important bits of state there now are:

 epoch (needed for detecting split; this will go away with a better cursor)
 result buffer (we can drop this)
 nspace (part of the ioctx, it just tags each request)
 cookie (this basically becomes the cursor .. it's just an hobject_t typedef)

 - the list call would take a start cursor and an optional end cursor, 
and output the next cursor to continue from.

 - we'd lose the buffering that ListContext currently does, which means 
that the request that goes over the wire will return the same number 
of entries that the C caller asks for.  The C++ interface is an iterator 
so it'll have to do its own buffering, but that should be pretty 
trivial...

 - we should kill these calls, which were never used:

 CEPH_RADOS_API uint32_t rados_nobjects_list_get_pg_hash_position(rados_list_ctx_t ctx);

 CEPH_RADOS_API uint32_t rados_nobjects_list_seek(rados_list_ctx_t ctx,
                                                  uint32_t pos);

 - we'd add a new call, something like:

 int rados_construct_iterator(ioctx, int n, int m, cursor *out);

so that you can get a position partway through the pg.
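To make the stateless design above concrete, here is a toy model in plain C. It is a hypothetical sketch, not librados: `list_cursor_t`, `toy_list()`, `count_range()` and the in-memory pool are invented for illustration, and the cursor holds only a hash position (assuming distinct hashes) where the real one would wrap an hobject_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed rados_list_cursor_t: plain data,
 * no hidden state. A real cursor would wrap an hobject_t; this toy keeps
 * only a hash position, so it assumes every object has a distinct hash. */
typedef struct { uint64_t pos; } list_cursor_t;

#define CURSOR_END ((uint64_t)1 << 32)   /* one past the last 32-bit hash */

/* Toy "pool": object names with made-up hashes, sorted by hash. */
typedef struct { const char *name; uint32_t hash; } toy_obj_t;
static const toy_obj_t pool[] = {
  {"a", 0x10000000u}, {"b", 0x30000000u}, {"c", 0x50000000u},
  {"d", 0x90000000u}, {"e", 0xf0000000u},
};
#define POOL_LEN (sizeof(pool) / sizeof(pool[0]))

/* Stateless list call: fill out[] with at most max names whose hash lies in
 * [start, end) (end == NULL means the end of the pool), and store the
 * cursor to continue from in *next. */
int toy_list(list_cursor_t start, const list_cursor_t *end, size_t max,
             const char **out, list_cursor_t *next) {
  uint64_t stop = end ? end->pos : CURSOR_END;
  size_t n = 0;
  assert(max > 0);
  next->pos = stop;                          /* default: range exhausted */
  for (size_t i = 0; i < POOL_LEN; i++) {
    uint64_t h = pool[i].hash;
    if (h < start.pos || h >= stop) continue;
    if (n == max) { next->pos = h; break; }  /* resume at this object */
    out[n++] = pool[i].name;
  }
  return (int)n;
}

/* Caller-side loop: the only thing carried between calls is the cursor,
 * and each call returns at most `batch` entries (no ListContext-style
 * buffering). Returns the number of objects seen in [start, end). */
int count_range(list_cursor_t start, const list_cursor_t *end, size_t batch) {
  const char *names[16];
  uint64_t stop = end ? end->pos : CURSOR_END;
  list_cursor_t cur = start, next;
  int total = 0;
  assert(batch > 0 && batch <= 16);
  while (cur.pos < stop) {
    total += toy_list(cur, end, batch, names, &next);
    cur = next;
  }
  return total;
}
```

Because the cursor is a plain value, saving it, resuming later, or handing it to another client needs no Objecter-side context.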

What do you think?  Unfortunately it is quite a departure from what you've 
already implemented, but I think it'll be a net simplification *and* 
let you do all the things we want, like:

 - get a set of ranges to list from
 - change our mind partway through to break things into smaller shards 
without losing previous work
 - start listing from a random position in the pool
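A rough sketch of why resharding would not lose work, assuming (hypothetically) that the cursor built from (n, m) sits at hash position n*2^32/m, the reading the single-hash example relies on. Shards are ranges between consecutive cursors, and a partly listed shard can be split anywhere after the last cursor returned. `cursor_pos()`, `shard()` and `split_remaining()` are illustrative helpers, not librados calls.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_SPACE ((uint64_t)1 << 32)

/* Hypothetical: hash position of the cursor built from (n, m). */
uint64_t cursor_pos(uint64_t n, uint64_t m) {
  assert(m > 0 && m <= HASH_SPACE && n <= m);
  if (n == m) return HASH_SPACE;      /* avoid overflowing n << 32 */
  return (n << 32) / m;               /* n * 2^32 / m */
}

typedef struct { uint64_t begin, end; } hash_range_t;   /* [begin, end) */

/* Shard i of m: the range between the cursors for (i, m) and (i + 1, m). */
hash_range_t shard(uint64_t i, uint64_t m) {
  hash_range_t r = { cursor_pos(i, m), cursor_pos(i + 1, m) };
  return r;
}

/* A worker that has listed its shard up to `done` (the cursor returned by
 * its last list call) keeps the first half of what remains and gives the
 * second half away. Nothing before `done` is listed again. */
hash_range_t split_remaining(hash_range_t *mine, uint64_t done) {
  assert(mine->begin <= done && done <= mine->end);
  uint64_t mid = done + (mine->end - done) / 2;
  hash_range_t given = { mid, mine->end };
  mine->begin = done;
  mine->end = mid;
  return given;
}
```

Starting from a random position is the same idea: pick any (n, m) and use its cursor as the start.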

You could even list a single hash value by constructing a start cursor 
with n=hash and an end cursor with n=hash+1, both with m=2^32.
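Working the single-hash case through, again assuming (hypothetically) that the cursor for (n, m) sits at hash position n*2^32/m: m=2^32 does not fit in the `int` parameters of the proposed signature, so this sketch uses uint64_t. `cursor_pos()` and `single_hash_range()` are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_SPACE ((uint64_t)1 << 32)

/* Hypothetical: hash position of the cursor built from (n, m), taken to be
 * n * 2^32 / m. uint64_t is used because m = 2^32 does not fit in an int. */
uint64_t cursor_pos(uint64_t n, uint64_t m) {
  assert(m > 0 && m <= HASH_SPACE && n <= m);
  if (n == m) return HASH_SPACE;   /* n << 32 would overflow when n == 2^32 */
  return (n << 32) / m;
}

typedef struct { uint64_t begin, end; } hash_range_t;   /* [begin, end) */

/* Start cursor (n = hash) and end cursor (n = hash + 1), both with m = 2^32,
 * bracket exactly one hash value. */
hash_range_t single_hash_range(uint32_t hash) {
  hash_range_t r = { cursor_pos(hash, HASH_SPACE),
                     cursor_pos((uint64_t)hash + 1, HASH_SPACE) };
  return r;
}
```

The last hash value (0xffffffff) is the edge case: its end cursor is n=2^32, i.e. the end of the hash space.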

What do you think?
sage

