* sharded collection list
From: Sage Weil @ 2015-06-02 22:54 UTC
To: john.spray, sjust; +Cc: ceph-devel
Hey John-
So the sharded pgls stuff has collided a bit with the looming hobject
sorting changes. Sam and I just talked about it a bit and came up
with what librados API would be most appealing:
- the listing API would have start/end markers
- it would be driven by a new opaque type rados_list_cursor_t, which is
just data, no state, and internally is just an hobject_t.
- it would be totally stateless.. kill the [N]ListContext stuff in
Objecter (and reimplement a simple wrapper in librados.cc or even .h).
Note that the important bits of state there now are
    epoch (needed for detecting split; this will go away with a better cursor)
    result buffer (we can drop this)
    nspace (part of the ioctx, it just tags each request)
    cookie (this basically becomes the cursor .. it's just an hobject_t typedef)
- the list could take a start cursor, optional end cursor, and output the
next cursor to continue from.
- we'd lose the buffering that ListContext currently does, which means
that the request that goes over the wire will return the same number
of entries that the C caller asks for. The C++ interface is an iterator
so it'll have to do its own buffering, but that should be pretty
trivial...
- we should kill these calls, which were never used:
    CEPH_RADOS_API uint32_t rados_nobjects_list_get_pg_hash_position(rados_list_ctx_t ctx);
    CEPH_RADOS_API uint32_t rados_nobjects_list_seek(rados_list_ctx_t ctx,
                                                     uint32_t pos);
- we'd add a new call that is something like
int rados_construct_iterator(ioctx, int n, int m, cursor *out);
so that you can get a position partway through the pg.
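To make the start/end/next cursor idea above concrete, here is a minimal sketch in C. It is not librados code: toy_cursor_t, toy_pool_t, toy_list and TOY_END are hypothetical names invented for illustration, the "pool" is just a sorted in-memory array, and the cursor is a bare position in a 32-bit hash space rather than a real hobject_t. It also assumes every entry has a distinct key, so that a cursor always identifies an unambiguous resume point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one past the largest 32-bit hash, used as "end of pool". */
#define TOY_END (UINT64_C(1) << 32)

/* Hypothetical stand-in for rados_list_cursor_t: plain data, no state. */
typedef struct { uint64_t pos; } toy_cursor_t;

/* Toy "pool": distinct object hashes in ascending order (an assumption). */
typedef struct { const uint32_t *hashes; size_t count; } toy_pool_t;

/*
 * List up to max entries in [start, end); end may be NULL for "to the end".
 * Entries go to out, the position to resume from goes to *next, and the
 * number of entries written is returned. Nothing is kept between calls.
 */
static size_t toy_list(const toy_pool_t *pool, toy_cursor_t start,
                       const toy_cursor_t *end, uint32_t *out, size_t max,
                       toy_cursor_t *next)
{
    const uint64_t limit = end ? end->pos : TOY_END;
    size_t i = 0, n = 0;

    assert(pool && out && next);

    /* Skip entries that sort before the start cursor. */
    while (i < pool->count && pool->hashes[i] < start.pos)
        i++;

    /* Return exactly as many entries as asked for, never past the limit. */
    while (i < pool->count && n < max && pool->hashes[i] < limit)
        out[n++] = pool->hashes[i++];

    /* Resume at the first unreturned entry, or at the limit when done. */
    if (i < pool->count && pool->hashes[i] < limit)
        next->pos = pool->hashes[i];
    else
        next->pos = limit;
    return n;
}
```

A caller would loop, passing the returned next cursor back in as start, until next equals the end position. Because the cursor is the only thing carried between calls, it can be stored, handed to another worker, or replaced by a finer-grained range partway through.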
What do you think? Unfortunately it is quite a departure from what you
implemented already but I think it'll be a net simplification *and*
let you do all the things we want, like
- get a set of ranges to list from
- change our mind partway through to break things into smaller shards
without losing previous work
- start listing from a random position in the pool
You could even list a single hash value by constructing cursors with
n=hash and n=hash+1, and m=2^32.
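The n-of-m arithmetic can be sketched as follows. This is only an illustration under assumptions: toy_construct_cursor and TOY_HASH_SPACE are hypothetical names, positions are treated as plain offsets into a 32-bit hash space (the real hobject_t ordering may differ), and 64-bit arguments are used because m=2^32 would not fit in the int parameters of the proposed signature.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: size of the 32-bit hash space, 2^32. */
#define TOY_HASH_SPACE (UINT64_C(1) << 32)

/*
 * Position of boundary n of m in the hash space, i.e. floor(n * 2^32 / m).
 * Shard n of m then covers [cursor(n, m), cursor(n + 1, m)).
 */
static uint64_t toy_construct_cursor(uint64_t n, uint64_t m)
{
    assert(m > 0 && m <= TOY_HASH_SPACE && n <= m);

    /* n == m is the end of the space; handled first so the multiply
     * below cannot overflow 64 bits when n == m == 2^32. */
    if (n == m)
        return TOY_HASH_SPACE;
    return n * TOY_HASH_SPACE / m;
}
```

With this mapping, the single-hash case described above falls out directly: with m=2^32, boundary n=hash is the hash itself and boundary n=hash+1 is the next value, so the range between them holds exactly one hash. Splitting a shard is equally mechanical, since boundary n of m coincides with boundary 2n of 2m, which is what would let a caller break a range into smaller shards without invalidating cursors it already holds.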
What do you think?
sage
* Re: sharded collection list
From: John Spray @ 2015-06-04 13:30 UTC
To: Sage Weil, sjust; +Cc: ceph-devel
Talked about this elsewhere but for the benefit of the list:
* The API suggested here looks nicer to me too
* This depends on the new PGLS ordering OSD side, so that has to land
before this
* In the meantime I've rebased the #9964 (rados import/export) branch
to not depend on sharded pgls
Cheers,
John