* xio-rados-firefly branch update
[not found] <586483842.112.1398195579489.JavaMail.root@thunderbeast.private.linuxbox.com>
@ 2014-04-22 19:44 ` Matt W. Benjamin
2014-04-22 19:58 ` Gregory Farnum
0 siblings, 1 reply; 8+ messages in thread
From: Matt W. Benjamin @ 2014-04-22 19:44 UTC (permalink / raw)
To: ceph-devel; +Cc: Gregory Farnum, Yaron Haviv, Eyal Salomon
Hi,
We've completed a trial merge of our current XioMessenger branches to Ceph
upstream master. The resulting branch is on our public Ceph development
repo, on banch xio-rados-firefly:
https://github.com/linuxbox2/linuxbox-ceph/commits/xio-rados-firefly
This branch isn't fully vetted. We haven't re-run librados tests on it, and
there may be issues. In addition, the core XioMessenger code on this branch
has at least one open bug (a race condition) that appeared after we switched
to Accelio one-way messages (which streamline the workflow, and are faster than
the original request-response code). We'll send an update on this to the
list as soon as we can (later this week).
There is also ongoing merge work (making XioMessenger work with the rest of
Ceph), and open items (e.g., XioMessenger doesn't implement CephX).
What I think is good about this branch is it has all the commits broken out,
starting with general changes to the Messenger class family, for review.
Thanks and regards,
Matt
--
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104
http://linuxbox.com
tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xio-rados-firefly branch update
2014-04-22 19:44 ` Matt W. Benjamin
@ 2014-04-22 19:58 ` Gregory Farnum
0 siblings, 0 replies; 8+ messages in thread
From: Gregory Farnum @ 2014-04-22 19:58 UTC (permalink / raw)
To: Matt W. Benjamin; +Cc: ceph-devel, Yaron Haviv, Eyal Salomon
Awesome! I'll try and take a preliminary look at this in the next day
or two. What kind of feedback are you interested in right now?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Tue, Apr 22, 2014 at 12:44 PM, Matt W. Benjamin <matt@linuxbox.com> wrote:
> Hi,
>
> We've completed a trial merge of our current XioMessenger branches to Ceph
> upstream master. The resulting branch is on our public Ceph development
> repo, on banch xio-rados-firefly:
>
> https://github.com/linuxbox2/linuxbox-ceph/commits/xio-rados-firefly
>
> This branch isn't fully vetted. We haven't re-run librados tests on it, and
> there may be issues. In addition, the core XioMessenger code on this branch
> has at least one open bug (a race condition) that appeared after we switched
> to Accelio one-way messages (which streamline the workflow, and are faster than
> the original request-response code). We'll send an update on this to the
> list as soon as we can (later this week).
>
> There is also ongoing merge work (making XioMessenger work with the rest of
> Ceph), and open items (e.g., XioMessenger doesn't implement CephX).
>
> What I think is good about this branch is it has all the commits broken out,
> starting with general changes to the Messenger class family, for review.
>
> Thanks and regards,
>
> Matt
>
> --
> Matt Benjamin
> The Linux Box
> 206 South Fifth Ave. Suite 150
> Ann Arbor, MI 48104
>
> http://linuxbox.com
>
> tel. 734-761-4689
> fax. 734-769-8938
> cel. 734-216-5309
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xio-rados-firefly branch update
[not found] <226225586.131.1398202885165.JavaMail.root@thunderbeast.private.linuxbox.com>
@ 2014-04-22 21:50 ` Matt W. Benjamin
2014-04-28 23:40 ` Gregory Farnum
0 siblings, 1 reply; 8+ messages in thread
From: Matt W. Benjamin @ 2014-04-22 21:50 UTC (permalink / raw)
To: Gregory Farnum; +Cc: ceph-devel, Yaron Haviv, Eyal Salomon
Hi Greg,
Sure. I'm interested in all feedback you come up with, but for starters,
it would be helpful to have feedback on
1. the Messenger reorg in general, and perhaps specifically how to think
about the throttle ideas that I shoved into SimplePolicyMessenger
2. about the completion mechanisms--with the addition of claim_data, this
is provisionally complete
3. about DispatchStrategy and non-prio queueing, and how the way we're doing
things relates to your parallel work with SimpleMessenger
Thanks,
Matt
----- "Gregory Farnum" <greg@inktank.com> wrote:
> Awesome! I'll try and take a preliminary look at this in the next day
> or two. What kind of feedback are you interested in right now?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> On Tue, Apr 22, 2014 at 12:44 PM, Matt W. Benjamin <matt@linuxbox.com>
> wrote:
> > Hi,
> >
> > We've completed a trial merge of our current XioMessenger branches
> to Ceph
> > upstream master. The resulting branch is on our public Ceph
> development
> > repo, on banch xio-rados-firefly:
> >
> >
> https://github.com/linuxbox2/linuxbox-ceph/commits/xio-rados-firefly
> >
> > This branch isn't fully vetted. We haven't re-run librados tests on
> it, and
> > there may be issues. In addition, the core XioMessenger code on
> this branch
> > has at least one open bug (a race condition) that appeared after we
> switched
> > to Accelio one-way messages (which streamline the workflow, and are
> faster than
> > the original request-response code). We'll send an update on this
> to the
> > list as soon as we can (later this week).
> >
> > There is also ongoing merge work (making XioMessenger work with the
> rest of
> > Ceph), and open items (e.g., XioMessenger doesn't implement CephX).
> >
> > What I think is good about this branch is it has all the commits
> broken out,
> > starting with general changes to the Messenger class family, for
> review.
> >
> > Thanks and regards,
> >
> > Matt
> >
> > --
> > Matt Benjamin
> > The Linux Box
> > 206 South Fifth Ave. Suite 150
> > Ann Arbor, MI 48104
> >
> > http://linuxbox.com
> >
> > tel. 734-761-4689
> > fax. 734-769-8938
> > cel. 734-216-5309
--
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104
http://linuxbox.com
tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xio-rados-firefly branch update
2014-04-22 21:50 ` xio-rados-firefly branch update Matt W. Benjamin
@ 2014-04-28 23:40 ` Gregory Farnum
2014-04-29 0:42 ` Josh Durgin
2014-04-29 3:14 ` Matt W. Benjamin
0 siblings, 2 replies; 8+ messages in thread
From: Gregory Farnum @ 2014-04-28 23:40 UTC (permalink / raw)
To: Matt W. Benjamin; +Cc: ceph-devel, Yaron Haviv, Eyal Salomon
A few days later than I wanted, but I got through various pieces of
this today. It wasn't a thorough review but more a "shape of things"
check, but I have a bunch of notes.
On Tue, Apr 22, 2014 at 2:50 PM, Matt W. Benjamin <matt@linuxbox.com> wrote:
> Hi Greg,
>
> Sure. I'm interested in all feedback you come up with, but for starters,
> it would be helpful to have feedback on
>
> 1. the Messenger reorg in general, and perhaps specifically how to think
> about the throttle ideas that I shoved into SimplePolicyMessenger
The re-org mostly looks fine.
I notice you're adding a few more "friend" declarations though, and I
don't think those should be necessary — Connection can label stuff as
protected instead of private to help with that.
PipeConnection should probably live with Pipe or SimpleMessenger
instead of Connection? Certainly don't want the backwards knowledge
that "friend class PipeConnection" implies.
It would be better if we didn't need to add members to the shared
classes in order to allow one implementation but not the other
(looking particularly at bi::list_member_hook<> dispatch_q).
SimplePolicyMessenger seems a bit weird — it's just a
shared-implementation class, but it doesn't hide any of the meaning of
the policies away from you or provide interfaces to do the work for
you. In particular, it doesn't look like the Xio stuff is actually
following any of the throttle rules about how much data is allowed in
from clients, and SimplePolicyMessenger isn't helping it do so.
> 2. about the completion mechanisms--with the addition of claim_data, this
> is provisionally complete
Can you talk about this a bit in general before I comment? I want to
make sure I understand what's going on here, particularly with the way
you're doing memory allocation in the mempools and handling the
refcounting.
> 3. about DispatchStrategy and non-prio queueing, and how the way we're doing
> things relates to your parallel work with SimpleMessenger
This looks a bit problematic. In particular, the standard
SimpleMessenger interface is responsible for delivery fairness between
different peers, and none of your examples provide that. I suppose
they do have the information needed to do so as part of the
Message::connection member.
My work on speeding up the SimpleMessenger introduces "fast dispatch"
for specific types of messages, and shifts responsibility for fairness
onto the Dispatcher. It's essentially your "FastStrategy" but via
separate delivery interfaces (so the daemon can do less locking, etc).
Other notes of varying import:
XioMessenger doesn't implement any of the mark_down stuff. I realize
that those interfaces are a bit weird for a "persistent-in-hardware"
connection, but we need some way of simulating them — the Dispatcher
uses those interfaces to toss everything out and start over, making
sure messages don't survive from one daemon invocation to another. (A
separate daemon invocation from the cluster's perspective might still
be the same daemon on the local machine, if it went unresponsive for
some reason.)
Is MDataPing just for your messenger testing bits?
The buffer changes appear to have some unnecessary Xio class
declarations in buffer.h, and it’d be nice if all of that was guarded
by HAVE_XIO blocks.
Do you actually want the only exposed interface to be create_msg()? It
might become convenient to be able to allocate xio memory space
elsewhere and give it to the messenger (for instance, reading data off
disk for transmission to a client).
Is "new (bp) xio_msg_buffer(m_hook, buf, len);” some syntax I’m
unaware of, or a merge/rebase issue? (Seriously, no idea. :)
What is CEPH_ENTITY_TYPE_GENERIC_SERVER for?
Looking at your example setups in OSD and Monitor, I'm not entirely
thrilled with adding new parameters for an XioMessenger in addition to
SimpleMessenger. I'd rather we came up with some interface so this was
all transparent to the daemon code and endpoint selection was handled
in the messenger layer.
This is also introducing some difficulties with monitor identification
— they are pretty much identified by IP:port, and having two separate
addresses for a single monitor is going to get weird.
I notice that (for the OSD) you’re setting the “poison pill” on
ms_public twice rather than adding it to ms_xio_public.
Thanks!
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xio-rados-firefly branch update
2014-04-28 23:40 ` Gregory Farnum
@ 2014-04-29 0:42 ` Josh Durgin
2014-04-29 3:14 ` Matt W. Benjamin
1 sibling, 0 replies; 8+ messages in thread
From: Josh Durgin @ 2014-04-29 0:42 UTC (permalink / raw)
To: Gregory Farnum, Matt W. Benjamin; +Cc: ceph-devel, Yaron Haviv, Eyal Salomon
On 04/28/2014 04:40 PM, Gregory Farnum wrote:
> The buffer changes appear to have some unnecessary Xio class
> declarations in buffer.h, and it’d be nice if all of that was guarded
> by HAVE_XIO blocks.
buffer.h is exposed by the librados c++ api, so hopefully nothing
xio-specific really needs to be exposed there.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xio-rados-firefly branch update
2014-04-28 23:40 ` Gregory Farnum
2014-04-29 0:42 ` Josh Durgin
@ 2014-04-29 3:14 ` Matt W. Benjamin
2014-04-29 12:00 ` Matt W. Benjamin
2014-05-01 22:26 ` Gregory Farnum
1 sibling, 2 replies; 8+ messages in thread
From: Matt W. Benjamin @ 2014-04-29 3:14 UTC (permalink / raw)
To: Gregory Farnum; +Cc: ceph-devel, Yaron Haviv, Eyal Salomon
Hi Greg,
----- "Gregory Farnum" <greg@inktank.com> wrote:
>
> The re-org mostly looks fine.
> I notice you're adding a few more "friend" declarations though, and I
> don't think those should be necessary — Connection can label stuff as
> protected instead of private to help with that.
> PipeConnection should probably live with Pipe or SimpleMessenger
> instead of Connection? Certainly don't want the backwards knowledge
> that "friend class PipeConnection" implies.
Ok.
> It would be better if we didn't need to add members to the shared
> classes in order to allow one implementation but not the other
> (looking particularly at bi::list_member_hook<> dispatch_q).
No, but this is a potential optimization for both/all Messengers. It
could also be worth revisiting whether Message could be specialized by
Messenger.
>
> SimplePolicyMessenger seems a bit weird — it's just a
> shared-implementation class, but it doesn't hide any of the meaning
> of
> the policies away from you or provide interfaces to do the work for
> you. In particular, it doesn't look like the Xio stuff is actually
> following any of the throttle rules about how much data is allowed in
> from clients, and SimplePolicyMessenger isn't helping it do so.
I agree. What I meant by "shoving" all this stuff into SimplePolicyMessenger
was that we hadn't examined the policy interfaces fully, and were just
trying to clean up the interface, essentially, so XioMessenger could be
clearer. There's more work to do here, at least forward looking.
>
>
> > 2. about the completion mechanisms--with the addition of claim_data,
> this
> > is provisionally complete
>
> Can you talk about this a bit in general before I comment? I want to
> make sure I understand what's going on here, particularly with the
> way
> you're doing memory allocation in the mempools and handling the
> refcounting.
Yes, and also, we have just pushed a major update to this logic on our
current stabilization branch (xio-rados-exp-noreg-buf-qs-upstream-ow-mdw).
The logic in the last version was overly complex. We streamlined things,
and also have switched to using the Xio one-way message paradigm, which
as you mentioned a while back, is closer to Ceph's, and also has higher
performance.
>
> > 3. about DispatchStrategy and non-prio queueing, and how the way
> we're doing
> > things relates to your parallel work with SimpleMessenger
>
> This looks a bit problematic. In particular, the standard
> SimpleMessenger interface is responsible for delivery fairness
> between
> different peers, and none of your examples provide that. I suppose
> they do have the information needed to do so as part of the
> Message::connection member.
> My work on speeding up the SimpleMessenger introduces "fast dispatch"
> for specific types of messages, and shifts responsibility for
> fairness
> onto the Dispatcher. It's essentially your "FastStrategy" but via
> separate delivery interfaces (so the daemon can do less locking,
> etc).
I agree, there's a bit more cleanup needed here. Initially, I only had
the equivalent of FastStrategy. We realized that we wanted to make the
Dispatcher responsible for threading model, but didn't think we could
really tackle that without more feedback, so we made it possible to do
out-of-line delivery with the QueueStrategy, pending further discussion.
>
> Other notes of varying import:
> XioMessenger doesn't implement any of the mark_down stuff. I realize
> that those interfaces are a bit weird for a "persistent-in-hardware"
> connection, but we need some way of simulating them — the Dispatcher
> uses those interfaces to toss everything out and start over, making
> sure messages don't survive from one daemon invocation to another. (A
> separate daemon invocation from the cluster's perspective might still
> be the same daemon on the local machine, if it went unresponsive for
> some reason.)
Agree. Again, we tried to simplify the complexity away provisionally.
>
> Is MDataPing just for your messenger testing bits?
Yes.
>
> The buffer changes appear to have some unnecessary Xio class
> declarations in buffer.h, and it’d be nice if all of that was guarded
> by HAVE_XIO blocks.
> Do you actually want the only exposed interface to be create_msg()?
> It
> might become convenient to be able to allocate xio memory space
> elsewhere and give it to the messenger (for instance, reading data
> off
> disk for transmission to a client).
Yes, we only need to expose create_msg(), and (Josh) we could do it in
another header, or whatever.
>
> Is "new (bp) xio_msg_buffer(m_hook, buf, len);” some syntax I’m
> unaware of, or a merge/rebase issue? (Seriously, no idea. :)
This is C++ "placement new."
>
> What is CEPH_ENTITY_TYPE_GENERIC_SERVER for?
This was just cargo-cult coding. It's a server that isn't one of
the known types (like a test harness).
>
> Looking at your example setups in OSD and Monitor, I'm not entirely
> thrilled with adding new parameters for an XioMessenger in addition
> to
> SimpleMessenger. I'd rather we came up with some interface so this
> was
> all transparent to the daemon code and endpoint selection was handled
> in the messenger layer.
> This is also introducing some difficulties with monitor
> identification
> — they are pretty much identified by IP:port, and having two separate
> addresses for a single monitor is going to get weird.
> I notice that (for the OSD) you’re setting the “poison pill” on
> ms_public twice rather than adding it to ms_xio_public.
We have a bunch of work planned here. Our current thinking is to create
a facade (named something like MessengerGroup) that encapsulates all
the messengers that an endpoint deals with. It goes together with an
address-list extension to entity_name_t, as we talked about briefly.
Thank you, Greg (and Josh)!
>
> Thanks!
> -Greg
--
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104
http://linuxbox.com
tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xio-rados-firefly branch update
2014-04-29 3:14 ` Matt W. Benjamin
@ 2014-04-29 12:00 ` Matt W. Benjamin
2014-05-01 22:26 ` Gregory Farnum
1 sibling, 0 replies; 8+ messages in thread
From: Matt W. Benjamin @ 2014-04-29 12:00 UTC (permalink / raw)
To: Gregory Farnum; +Cc: ceph-devel, Yaron Haviv, Eyal Salomon
Hi Greg,
I forgot to finish a couple of thoughts, filling them in...
----- "Matt W. Benjamin" <matt@linuxbox.com> wrote:
> > > 2. about the completion mechanisms--with the addition of
> claim_data,
> > this
> > > is provisionally complete
> >
> > Can you talk about this a bit in general before I comment? I want
> to
> > make sure I understand what's going on here, particularly with the
> > way
> > you're doing memory allocation in the mempools and handling the
> > refcounting.
I didn't actually talk about it :). The mempools are used everywhere
to avoid allocation in XioMessenger fast paths (which is most of
them). We arrived at a need for this via profiling and measurement,
of course.
The most complicated case is reclaiming memory for a Message and related
structures on the responder side. We manage the memory together using
a per-request pool, which is a member of the completion hook object.
(The backing store is the Xio pool facility.) The decoded Message itself
(assuming we successfully decode) as well as each tracking Ceph buffer
(logic for which we re-wrote from last version, as noted earlier) holds
a reference on the hook object. When the last of those refs is released
(the upper-layer code is done with Message and with any buffers which
it claimed) a cleamup process is started on the completion hook. This
takes a new initial reference (yes, it's atomic) on the hook, which is
handed off to the XioPortal thread which originally delivered it, for
final cleanup. That thread needs to be the one which returns xio_msg
buffers to Accelio. Once that is done, the portal thread returns the
last reference to the completion hook, and it is recycled also.
>
> Yes, and also, we have just pushed a major update to this logic on
> our
> current stabilization branch
> (xio-rados-exp-noreg-buf-qs-upstream-ow-mdw).
> The logic in the last version was overly complex. We streamlined
> things,
> and also have switched to using the Xio one-way message paradigm,
> which
> as you mentioned a while back, is closer to Ceph's, and also has
> higher
> performance.
>
> >
> > Is "new (bp) xio_msg_buffer(m_hook, buf, len);” some syntax I’m
> > unaware of, or a merge/rebase issue? (Seriously, no idea. :)
>
> This is C++ "placement new."
The placement new syntax is the way you substitute your own memory allocation
strategy for built-in new and delete (which typically use the system
allocator, of course). Essentially, this step calls the desired constructor
on the already-allocated memory. This comes up because we're using Accelio's
pool allocator.
>
--
Matt Benjamin
CohortFS, LLC.
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104
http://cohortfs.com
tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xio-rados-firefly branch update
2014-04-29 3:14 ` Matt W. Benjamin
2014-04-29 12:00 ` Matt W. Benjamin
@ 2014-05-01 22:26 ` Gregory Farnum
1 sibling, 0 replies; 8+ messages in thread
From: Gregory Farnum @ 2014-05-01 22:26 UTC (permalink / raw)
To: Matt W. Benjamin; +Cc: ceph-devel, Yaron Haviv, Eyal Salomon
On Mon, Apr 28, 2014 at 8:14 PM, Matt W. Benjamin <matt@linuxbox.com> wrote:
> Hi Greg,
>
> ----- "Gregory Farnum" <greg@inktank.com> wrote:
>
>>
>> The re-org mostly looks fine.
>> I notice you're adding a few more "friend" declarations though, and I
>> don't think those should be necessary — Connection can label stuff as
>> protected instead of private to help with that.
>> PipeConnection should probably live with Pipe or SimpleMessenger
>> instead of Connection? Certainly don't want the backwards knowledge
>> that "friend class PipeConnection" implies.
>
> Ok.
>
>> It would be better if we didn't need to add members to the shared
>> classes in order to allow one implementation but not the other
>> (looking particularly at bi::list_member_hook<> dispatch_q).
>
> No, but this is a potential optimization for both/all Messengers. It
> could also be worth revisiting whether Message could be specialized by
> Messenger.
>
>>
>> SimplePolicyMessenger seems a bit weird — it's just a
>> shared-implementation class, but it doesn't hide any of the meaning
>> of
>> the policies away from you or provide interfaces to do the work for
>> you. In particular, it doesn't look like the Xio stuff is actually
>> following any of the throttle rules about how much data is allowed in
>> from clients, and SimplePolicyMessenger isn't helping it do so.
>
> I agree. What I meant by "shoving" all this stuff into SimplePolicyMessenger
> was that we hadn't examined the policy interfaces fully, and were just
> trying to clean up the interface, essentially, so XioMessenger could be
> clearer. There's more work to do here, at least forward looking.
I guess I'd like to hear how you think that might work — are you just
going to re-implement the DispatchQueue inside of a
SimplePolicyMessenger? Does the interface actually allow you to do
that? With TCP it's easy to stop accepting data from somebody — just
stop reading off their socket, which we do by blocking on the
Throttler in the socket reader loop. That seems harder here and IIRC
you're passing the messages out asynchronously with anything that can
turn a throttle limit into backpressure.
>> > 2. about the completion mechanisms--with the addition of claim_data,
>> this
>> > is provisionally complete
>>
>> Can you talk about this a bit in general before I comment? I want to
>> make sure I understand what's going on here, particularly with the
>> way
>> you're doing memory allocation in the mempools and handling the
>> refcounting.
>
> Yes, and also, we have just pushed a major update to this logic on our
> current stabilization branch (xio-rados-exp-noreg-buf-qs-upstream-ow-mdw).
> The logic in the last version was overly complex. We streamlined things,
> and also have switched to using the Xio one-way message paradigm, which
> as you mentioned a while back, is closer to Ceph's, and also has higher
> performance.
...
> I didn't actually talk about it :). The mempools are used everywhere
> to avoid allocation in XioMessenger fast paths (which is most of
> them). We arrived at a need for this via profiling and measurement,
> of course.
>
> The most complicated case is reclaiming memory for a Message and related
> structures on the responder side. We manage the memory together using
> a per-request pool, which is a member of the completion hook object.
> (The backing store is the Xio pool facility.) The decoded Message itself
> (assuming we successfully decode) as well as each tracking Ceph buffer
> (logic for which we re-wrote from last version, as noted earlier) holds
> a reference on the hook object. When the last of those refs is released
> (the upper-layer code is done with Message and with any buffers which
> it claimed) a cleamup process is started on the completion hook. This
> takes a new initial reference (yes, it's atomic) on the hook, which is
> handed off to the XioPortal thread which originally delivered it, for
> final cleanup. That thread needs to be the one which returns xio_msg
> buffers to Accelio. Once that is done, the portal thread returns the
> last reference to the completion hook, and it is recycled also.
Okay, I think this makes sense. I was hoping we could just make use of
static functions or something instead of carrying around a hook on
every message, but I can't think of a good way to do that while
supporting multiple messengers in a system.
>
>>
>> > 3. about DispatchStrategy and non-prio queueing, and how the way
>> we're doing
>> > things relates to your parallel work with SimpleMessenger
>>
>> This looks a bit problematic. In particular, the standard
>> SimpleMessenger interface is responsible for delivery fairness
>> between
>> different peers, and none of your examples provide that. I suppose
>> they do have the information needed to do so as part of the
>> Message::connection member.
>> My work on speeding up the SimpleMessenger introduces "fast dispatch"
>> for specific types of messages, and shifts responsibility for
>> fairness
>> onto the Dispatcher. It's essentially your "FastStrategy" but via
>> separate delivery interfaces (so the daemon can do less locking,
>> etc).
>
> I agree, there's a bit more cleanup needed here. Initially, I only had
> the equivalent of FastStrategy. We realized that we wanted to make the
> Dispatcher responsible for threading model, but didn't think we could
> really tackle that without more feedback, so we made it possible to do
> out-of-line delivery with the QueueStrategy, pending further discussion.
Hrm. "mak[ing] the Dispatcher responsible for threading model" does
not really fill me with warm fuzzy thoughts. I don't want to change
those interface without very good reason — it took a long time just to
do the fast dispatch work in the OSD and will be much more code churn
in the other daemons; I'm not really sure how we could do make it the
Dispatcher's responsibility except by moving the code over but I think
it lives better in the Messenger, because it has so much more
information available about what connections are busy, etc; but maybe
there's something I'm missing. I didn't read through much of the
XioMessenger implementation, but from the parts I looked at and the
interfaces I'm seeing I think we need to address this soon.
>
>>
>> Other notes of varying import:
>> XioMessenger doesn't implement any of the mark_down stuff. I realize
>> that those interfaces are a bit weird for a "persistent-in-hardware"
>> connection, but we need some way of simulating them — the Dispatcher
>> uses those interfaces to toss everything out and start over, making
>> sure messages don't survive from one daemon invocation to another. (A
>> separate daemon invocation from the cluster's perspective might still
>> be the same daemon on the local machine, if it went unresponsive for
>> some reason.)
>
> Agree. Again, we tried to simplify the complexity away provisionally.
Again, from what I'm seeing this is something that needs to get
addressed soon. These interfaces aren't minor bolt-ons even for a
TCP-based messenger, and I suspect it will be more work for an
RDMA-style one.
Along some similar lines, I didn't look closely enough to see how
you're handling stuff like the lossy/non-lossy and server/client
distinctions. Have you taken those on yet?
>> The buffer changes appear to have some unnecessary Xio class
>> declarations in buffer.h, and it’d be nice if all of that was guarded
>> by HAVE_XIO blocks.
>> Do you actually want the only exposed interface to be create_msg()?
>> It
>> might become convenient to be able to allocate xio memory space
>> elsewhere and give it to the messenger (for instance, reading data
>> off
>> disk for transmission to a client).
>
> Yes, we only need to expose create_msg(), and (Josh) we could do it in
> another header, or whatever.
>
>>
>> Is "new (bp) xio_msg_buffer(m_hook, buf, len);” some syntax I’m
>> unaware of, or a merge/rebase issue? (Seriously, no idea. :)
>
> This is C++ "placement new."
Ah, nifty. Not sure why I haven't run across that before.
>> Looking at your example setups in OSD and Monitor, I'm not entirely
>> thrilled with adding new parameters for an XioMessenger in addition
>> to
>> SimpleMessenger. I'd rather we came up with some interface so this
>> was
>> all transparent to the daemon code and endpoint selection was handled
>> in the messenger layer.
>> This is also introducing some difficulties with monitor
>> identification
>> — they are pretty much identified by IP:port, and having two separate
>> addresses for a single monitor is going to get weird.
>> I notice that (for the OSD) you’re setting the “poison pill” on
>> ms_public twice rather than adding it to ms_xio_public.
>
> We have a bunch of work planned here. Our current thinking is to create
> a facade (named something like MessengerGroup) that encapsulates all
> the messengers that an endpoint deals with. It goes together with an
> address-list extension to entity_name_t, as we talked about briefly.
Yeah, that sounds good. Just wanted to check.
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-05-01 22:26 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <226225586.131.1398202885165.JavaMail.root@thunderbeast.private.linuxbox.com>
2014-04-22 21:50 ` xio-rados-firefly branch update Matt W. Benjamin
2014-04-28 23:40 ` Gregory Farnum
2014-04-29 0:42 ` Josh Durgin
2014-04-29 3:14 ` Matt W. Benjamin
2014-04-29 12:00 ` Matt W. Benjamin
2014-05-01 22:26 ` Gregory Farnum
[not found] <586483842.112.1398195579489.JavaMail.root@thunderbeast.private.linuxbox.com>
2014-04-22 19:44 ` Matt W. Benjamin
2014-04-22 19:58 ` Gregory Farnum
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.