* xio-rados-firefly branch update [not found] <586483842.112.1398195579489.JavaMail.root@thunderbeast.private.linuxbox.com> @ 2014-04-22 19:44 ` Matt W. Benjamin 2014-04-22 19:58 ` Gregory Farnum 0 siblings, 1 reply; 8+ messages in thread From: Matt W. Benjamin @ 2014-04-22 19:44 UTC (permalink / raw) To: ceph-devel; +Cc: Gregory Farnum, Yaron Haviv, Eyal Salomon Hi, We've completed a trial merge of our current XioMessenger branches to Ceph upstream master. The resulting branch is on our public Ceph development repo, on banch xio-rados-firefly: https://github.com/linuxbox2/linuxbox-ceph/commits/xio-rados-firefly This branch isn't fully vetted. We haven't re-run librados tests on it, and there may be issues. In addition, the core XioMessenger code on this branch has at least one open bug (a race condition) that appeared after we switched to Accelio one-way messages (which streamline the workflow, and are faster than the original request-response code). We'll send an update on this to the list as soon as we can (later this week). There is also ongoing merge work (making XioMessenger work with the rest of Ceph), and open items (e.g., XioMessenger doesn't implement CephX). What I think is good about this branch is it has all the commits broken out, starting with general changes to the Messenger class family, for review. Thanks and regards, Matt -- Matt Benjamin The Linux Box 206 South Fifth Ave. Suite 150 Ann Arbor, MI 48104 http://linuxbox.com tel. 734-761-4689 fax. 734-769-8938 cel. 734-216-5309 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xio-rados-firefly branch update 2014-04-22 19:44 ` xio-rados-firefly branch update Matt W. Benjamin @ 2014-04-22 19:58 ` Gregory Farnum 0 siblings, 0 replies; 8+ messages in thread From: Gregory Farnum @ 2014-04-22 19:58 UTC (permalink / raw) To: Matt W. Benjamin; +Cc: ceph-devel, Yaron Haviv, Eyal Salomon Awesome! I'll try and take a preliminary look at this in the next day or two. What kind of feedback are you interested in right now? -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Apr 22, 2014 at 12:44 PM, Matt W. Benjamin <matt@linuxbox.com> wrote: > Hi, > > We've completed a trial merge of our current XioMessenger branches to Ceph > upstream master. The resulting branch is on our public Ceph development > repo, on banch xio-rados-firefly: > > https://github.com/linuxbox2/linuxbox-ceph/commits/xio-rados-firefly > > This branch isn't fully vetted. We haven't re-run librados tests on it, and > there may be issues. In addition, the core XioMessenger code on this branch > has at least one open bug (a race condition) that appeared after we switched > to Accelio one-way messages (which streamline the workflow, and are faster than > the original request-response code). We'll send an update on this to the > list as soon as we can (later this week). > > There is also ongoing merge work (making XioMessenger work with the rest of > Ceph), and open items (e.g., XioMessenger doesn't implement CephX). > > What I think is good about this branch is it has all the commits broken out, > starting with general changes to the Messenger class family, for review. > > Thanks and regards, > > Matt > > -- > Matt Benjamin > The Linux Box > 206 South Fifth Ave. Suite 150 > Ann Arbor, MI 48104 > > http://linuxbox.com > > tel. 734-761-4689 > fax. 734-769-8938 > cel. 734-216-5309 ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <226225586.131.1398202885165.JavaMail.root@thunderbeast.private.linuxbox.com>]
* Re: xio-rados-firefly branch update [not found] <226225586.131.1398202885165.JavaMail.root@thunderbeast.private.linuxbox.com> @ 2014-04-22 21:50 ` Matt W. Benjamin 2014-04-28 23:40 ` Gregory Farnum 0 siblings, 1 reply; 8+ messages in thread From: Matt W. Benjamin @ 2014-04-22 21:50 UTC (permalink / raw) To: Gregory Farnum; +Cc: ceph-devel, Yaron Haviv, Eyal Salomon Hi Greg, Sure. I'm interested in all feedback you come up with, but for starters, it would be helpful to have feedback on 1. the Messenger reorg in general, and perhaps specifically how to think about the throttle ideas that I shoved into SimplePolicyMessenger 2. about the completion mechanisms--with the addition of claim_data, this is provisionally complete 3. about DispatchStrategy and non-prio queueing, and how the way we're doing things relates to your parallel work with SimpleMessenger Thanks, Matt ----- "Gregory Farnum" <greg@inktank.com> wrote: > Awesome! I'll try and take a preliminary look at this in the next day > or two. What kind of feedback are you interested in right now? > -Greg > Software Engineer #42 @ http://inktank.com | http://ceph.com > > On Tue, Apr 22, 2014 at 12:44 PM, Matt W. Benjamin <matt@linuxbox.com> > wrote: > > Hi, > > > > We've completed a trial merge of our current XioMessenger branches > to Ceph > > upstream master. The resulting branch is on our public Ceph > development > > repo, on banch xio-rados-firefly: > > > > > https://github.com/linuxbox2/linuxbox-ceph/commits/xio-rados-firefly > > > > This branch isn't fully vetted. We haven't re-run librados tests on > it, and > > there may be issues. In addition, the core XioMessenger code on > this branch > > has at least one open bug (a race condition) that appeared after we > switched > > to Accelio one-way messages (which streamline the workflow, and are > faster than > > the original request-response code). We'll send an update on this > to the > > list as soon as we can (later this week). > > > > There is also ongoing merge work (making XioMessenger work with the > rest of > > Ceph), and open items (e.g., XioMessenger doesn't implement CephX). > > > > What I think is good about this branch is it has all the commits > broken out, > > starting with general changes to the Messenger class family, for > review. > > > > Thanks and regards, > > > > Matt > > > > -- > > Matt Benjamin > > The Linux Box > > 206 South Fifth Ave. Suite 150 > > Ann Arbor, MI 48104 > > > > http://linuxbox.com > > > > tel. 734-761-4689 > > fax. 734-769-8938 > > cel. 734-216-5309 -- Matt Benjamin The Linux Box 206 South Fifth Ave. Suite 150 Ann Arbor, MI 48104 http://linuxbox.com tel. 734-761-4689 fax. 734-769-8938 cel. 734-216-5309 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xio-rados-firefly branch update 2014-04-22 21:50 ` Matt W. Benjamin @ 2014-04-28 23:40 ` Gregory Farnum 2014-04-29 0:42 ` Josh Durgin 2014-04-29 3:14 ` Matt W. Benjamin 0 siblings, 2 replies; 8+ messages in thread From: Gregory Farnum @ 2014-04-28 23:40 UTC (permalink / raw) To: Matt W. Benjamin; +Cc: ceph-devel, Yaron Haviv, Eyal Salomon A few days later than I wanted, but I got through various pieces of this today. It wasn't a thorough review but more a "shape of things" check, but I have a bunch of notes. On Tue, Apr 22, 2014 at 2:50 PM, Matt W. Benjamin <matt@linuxbox.com> wrote: > Hi Greg, > > Sure. I'm interested in all feedback you come up with, but for starters, > it would be helpful to have feedback on > > 1. the Messenger reorg in general, and perhaps specifically how to think > about the throttle ideas that I shoved into SimplePolicyMessenger The re-org mostly looks fine. I notice you're adding a few more "friend" declarations though, and I don't think those should be necessary — Connection can label stuff as protected instead of private to help with that. PipeConnection should probably live with Pipe or SimpleMessenger instead of Connection? Certainly don't want the backwards knowledge that "friend class PipeConnection" implies. It would be better if we didn't need to add members to the shared classes in order to allow one implementation but not the other (looking particularly at bi::list_member_hook<> dispatch_q). SimplePolicyMessenger seems a bit weird — it's just a shared-implementation class, but it doesn't hide any of the meaning of the policies away from you or provide interfaces to do the work for you. In particular, it doesn't look like the Xio stuff is actually following any of the throttle rules about how much data is allowed in from clients, and SimplePolicyMessenger isn't helping it do so. > 2. about the completion mechanisms--with the addition of claim_data, this > is provisionally complete Can you talk about this a bit in general before I comment? I want to make sure I understand what's going on here, particularly with the way you're doing memory allocation in the mempools and handling the refcounting. > 3. about DispatchStrategy and non-prio queueing, and how the way we're doing > things relates to your parallel work with SimpleMessenger This looks a bit problematic. In particular, the standard SimpleMessenger interface is responsible for delivery fairness between different peers, and none of your examples provide that. I suppose they do have the information needed to do so as part of the Message::connection member. My work on speeding up the SimpleMessenger introduces "fast dispatch" for specific types of messages, and shifts responsibility for fairness onto the Dispatcher. It's essentially your "FastStrategy" but via separate delivery interfaces (so the daemon can do less locking, etc). Other notes of varying import: XioMessenger doesn't implement any of the mark_down stuff. I realize that those interfaces are a bit weird for a "persistent-in-hardware" connection, but we need some way of simulating them — the Dispatcher uses those interfaces to toss everything out and start over, making sure messages don't survive from one daemon invocation to another. (A separate daemon invocation from the cluster's perspective might still be the same daemon on the local machine, if it went unresponsive for some reason.) Is MDataPing just for your messenger testing bits? The buffer changes appear to have some unnecessary Xio class declarations in buffer.h, and it’d be nice if all of that was guarded by HAVE_XIO blocks. Do you actually want the only exposed interface to be create_msg()? It might become convenient to be able to allocate xio memory space elsewhere and give it to the messenger (for instance, reading data off disk for transmission to a client). Is "new (bp) xio_msg_buffer(m_hook, buf, len);” some syntax I’m unaware of, or a merge/rebase issue? (Seriously, no idea. :) What is CEPH_ENTITY_TYPE_GENERIC_SERVER for? Looking at your example setups in OSD and Monitor, I'm not entirely thrilled with adding new parameters for an XioMessenger in addition to SimpleMessenger. I'd rather we came up with some interface so this was all transparent to the daemon code and endpoint selection was handled in the messenger layer. This is also introducing some difficulties with monitor identification — they are pretty much identified by IP:port, and having two separate addresses for a single monitor is going to get weird. I notice that (for the OSD) you’re setting the “poison pill” on ms_public twice rather than adding it to ms_xio_public. Thanks! -Greg -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xio-rados-firefly branch update 2014-04-28 23:40 ` Gregory Farnum @ 2014-04-29 0:42 ` Josh Durgin 2014-04-29 3:14 ` Matt W. Benjamin 1 sibling, 0 replies; 8+ messages in thread From: Josh Durgin @ 2014-04-29 0:42 UTC (permalink / raw) To: Gregory Farnum, Matt W. Benjamin; +Cc: ceph-devel, Yaron Haviv, Eyal Salomon On 04/28/2014 04:40 PM, Gregory Farnum wrote: > The buffer changes appear to have some unnecessary Xio class > declarations in buffer.h, and it’d be nice if all of that was guarded > by HAVE_XIO blocks. buffer.h is exposed by the librados c++ api, so hopefully nothing xio-specific really needs to be exposed there. -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xio-rados-firefly branch update 2014-04-28 23:40 ` Gregory Farnum 2014-04-29 0:42 ` Josh Durgin @ 2014-04-29 3:14 ` Matt W. Benjamin 2014-04-29 12:00 ` Matt W. Benjamin 2014-05-01 22:26 ` Gregory Farnum 1 sibling, 2 replies; 8+ messages in thread From: Matt W. Benjamin @ 2014-04-29 3:14 UTC (permalink / raw) To: Gregory Farnum; +Cc: ceph-devel, Yaron Haviv, Eyal Salomon Hi Greg, ----- "Gregory Farnum" <greg@inktank.com> wrote: > > The re-org mostly looks fine. > I notice you're adding a few more "friend" declarations though, and I > don't think those should be necessary — Connection can label stuff as > protected instead of private to help with that. > PipeConnection should probably live with Pipe or SimpleMessenger > instead of Connection? Certainly don't want the backwards knowledge > that "friend class PipeConnection" implies. Ok. > It would be better if we didn't need to add members to the shared > classes in order to allow one implementation but not the other > (looking particularly at bi::list_member_hook<> dispatch_q). No, but this is a potential optimization for both/all Messengers. It could also be worth revisiting whether Message could be specialized by Messenger. > > SimplePolicyMessenger seems a bit weird — it's just a > shared-implementation class, but it doesn't hide any of the meaning > of > the policies away from you or provide interfaces to do the work for > you. In particular, it doesn't look like the Xio stuff is actually > following any of the throttle rules about how much data is allowed in > from clients, and SimplePolicyMessenger isn't helping it do so. I agree. What I meant by "shoving" all this stuff into SimplePolicyMessenger was that we hadn't examined the policy interfaces fully, and were just trying to clean up the interface, essentially, so XioMessenger could be clearer. There's more work to do here, at least forward looking. > > > > 2. about the completion mechanisms--with the addition of claim_data, > this > > is provisionally complete > > Can you talk about this a bit in general before I comment? I want to > make sure I understand what's going on here, particularly with the > way > you're doing memory allocation in the mempools and handling the > refcounting. Yes, and also, we have just pushed a major update to this logic on our current stabilization branch (xio-rados-exp-noreg-buf-qs-upstream-ow-mdw). The logic in the last version was overly complex. We streamlined things, and also have switched to using the Xio one-way message paradigm, which as you mentioned a while back, is closer to Ceph's, and also has higher performance. > > > 3. about DispatchStrategy and non-prio queueing, and how the way > we're doing > > things relates to your parallel work with SimpleMessenger > > This looks a bit problematic. In particular, the standard > SimpleMessenger interface is responsible for delivery fairness > between > different peers, and none of your examples provide that. I suppose > they do have the information needed to do so as part of the > Message::connection member. > My work on speeding up the SimpleMessenger introduces "fast dispatch" > for specific types of messages, and shifts responsibility for > fairness > onto the Dispatcher. It's essentially your "FastStrategy" but via > separate delivery interfaces (so the daemon can do less locking, > etc). I agree, there's a bit more cleanup needed here. Initially, I only had the equivalent of FastStrategy. We realized that we wanted to make the Dispatcher responsible for threading model, but didn't think we could really tackle that without more feedback, so we made it possible to do out-of-line delivery with the QueueStrategy, pending further discussion. > > Other notes of varying import: > XioMessenger doesn't implement any of the mark_down stuff. I realize > that those interfaces are a bit weird for a "persistent-in-hardware" > connection, but we need some way of simulating them — the Dispatcher > uses those interfaces to toss everything out and start over, making > sure messages don't survive from one daemon invocation to another. (A > separate daemon invocation from the cluster's perspective might still > be the same daemon on the local machine, if it went unresponsive for > some reason.) Agree. Again, we tried to simplify the complexity away provisionally. > > Is MDataPing just for your messenger testing bits? Yes. > > The buffer changes appear to have some unnecessary Xio class > declarations in buffer.h, and it’d be nice if all of that was guarded > by HAVE_XIO blocks. > Do you actually want the only exposed interface to be create_msg()? > It > might become convenient to be able to allocate xio memory space > elsewhere and give it to the messenger (for instance, reading data > off > disk for transmission to a client). Yes, we only need to expose create_msg(), and (Josh) we could do it in another header, or whatever. > > Is "new (bp) xio_msg_buffer(m_hook, buf, len);” some syntax I’m > unaware of, or a merge/rebase issue? (Seriously, no idea. :) This is C++ "placement new." > > What is CEPH_ENTITY_TYPE_GENERIC_SERVER for? This was just cargo-cult coding. It's a server that isn't one of the known types (like a test harness). > > Looking at your example setups in OSD and Monitor, I'm not entirely > thrilled with adding new parameters for an XioMessenger in addition > to > SimpleMessenger. I'd rather we came up with some interface so this > was > all transparent to the daemon code and endpoint selection was handled > in the messenger layer. > This is also introducing some difficulties with monitor > identification > — they are pretty much identified by IP:port, and having two separate > addresses for a single monitor is going to get weird. > I notice that (for the OSD) you’re setting the “poison pill” on > ms_public twice rather than adding it to ms_xio_public. We have a bunch of work planned here. Our current thinking is to create a facade (named something like MessengerGroup) that encapsulates all the messengers that an endpoint deals with. It goes together with an address-list extension to entity_name_t, as we talked about briefly. Thank you, Greg (and Josh)! > > Thanks! > -Greg -- Matt Benjamin The Linux Box 206 South Fifth Ave. Suite 150 Ann Arbor, MI 48104 http://linuxbox.com tel. 734-761-4689 fax. 734-769-8938 cel. 734-216-5309 -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xio-rados-firefly branch update 2014-04-29 3:14 ` Matt W. Benjamin @ 2014-04-29 12:00 ` Matt W. Benjamin 2014-05-01 22:26 ` Gregory Farnum 1 sibling, 0 replies; 8+ messages in thread From: Matt W. Benjamin @ 2014-04-29 12:00 UTC (permalink / raw) To: Gregory Farnum; +Cc: ceph-devel, Yaron Haviv, Eyal Salomon Hi Greg, I forgot to finish a couple of thoughts, filling them in... ----- "Matt W. Benjamin" <matt@linuxbox.com> wrote: > > > 2. about the completion mechanisms--with the addition of > claim_data, > > this > > > is provisionally complete > > > > Can you talk about this a bit in general before I comment? I want > to > > make sure I understand what's going on here, particularly with the > > way > > you're doing memory allocation in the mempools and handling the > > refcounting. I didn't actually talk about it :). The mempools are used everywhere to avoid allocation in XioMessenger fast paths (which is most of them). We arrived at a need for this via profiling and measurement, of course. The most complicated case is reclaiming memory for a Message and related structures on the responder side. We manage the memory together using a per-request pool, which is a member of the completion hook object. (The backing store is the Xio pool facility.) The decoded Message itself (assuming we successfully decode) as well as each tracking Ceph buffer (logic for which we re-wrote from last version, as noted earlier) holds a reference on the hook object. When the last of those refs is released (the upper-layer code is done with Message and with any buffers which it claimed) a cleamup process is started on the completion hook. This takes a new initial reference (yes, it's atomic) on the hook, which is handed off to the XioPortal thread which originally delivered it, for final cleanup. That thread needs to be the one which returns xio_msg buffers to Accelio. Once that is done, the portal thread returns the last reference to the completion hook, and it is recycled also. > > Yes, and also, we have just pushed a major update to this logic on > our > current stabilization branch > (xio-rados-exp-noreg-buf-qs-upstream-ow-mdw). > The logic in the last version was overly complex. We streamlined > things, > and also have switched to using the Xio one-way message paradigm, > which > as you mentioned a while back, is closer to Ceph's, and also has > higher > performance. > > > > > Is "new (bp) xio_msg_buffer(m_hook, buf, len);” some syntax I’m > > unaware of, or a merge/rebase issue? (Seriously, no idea. :) > > This is C++ "placement new." The placement new syntax is the way you substitute your own memory allocation strategy for built-in new and delete (which typically use the system allocator, of course). Essentially, this step calls the desired constructor on the already-allocated memory. This comes up because we're using Accelio's pool allocator. > -- Matt Benjamin CohortFS, LLC. 206 South Fifth Ave. Suite 150 Ann Arbor, MI 48104 http://cohortfs.com tel. 734-761-4689 fax. 734-769-8938 cel. 734-216-5309 -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xio-rados-firefly branch update 2014-04-29 3:14 ` Matt W. Benjamin 2014-04-29 12:00 ` Matt W. Benjamin @ 2014-05-01 22:26 ` Gregory Farnum 1 sibling, 0 replies; 8+ messages in thread From: Gregory Farnum @ 2014-05-01 22:26 UTC (permalink / raw) To: Matt W. Benjamin; +Cc: ceph-devel, Yaron Haviv, Eyal Salomon On Mon, Apr 28, 2014 at 8:14 PM, Matt W. Benjamin <matt@linuxbox.com> wrote: > Hi Greg, > > ----- "Gregory Farnum" <greg@inktank.com> wrote: > >> >> The re-org mostly looks fine. >> I notice you're adding a few more "friend" declarations though, and I >> don't think those should be necessary — Connection can label stuff as >> protected instead of private to help with that. >> PipeConnection should probably live with Pipe or SimpleMessenger >> instead of Connection? Certainly don't want the backwards knowledge >> that "friend class PipeConnection" implies. > > Ok. > >> It would be better if we didn't need to add members to the shared >> classes in order to allow one implementation but not the other >> (looking particularly at bi::list_member_hook<> dispatch_q). > > No, but this is a potential optimization for both/all Messengers. It > could also be worth revisiting whether Message could be specialized by > Messenger. > >> >> SimplePolicyMessenger seems a bit weird — it's just a >> shared-implementation class, but it doesn't hide any of the meaning >> of >> the policies away from you or provide interfaces to do the work for >> you. In particular, it doesn't look like the Xio stuff is actually >> following any of the throttle rules about how much data is allowed in >> from clients, and SimplePolicyMessenger isn't helping it do so. > > I agree. What I meant by "shoving" all this stuff into SimplePolicyMessenger > was that we hadn't examined the policy interfaces fully, and were just > trying to clean up the interface, essentially, so XioMessenger could be > clearer. There's more work to do here, at least forward looking. I guess I'd like to hear how you think that might work — are you just going to re-implement the DispatchQueue inside of a SimplePolicyMessenger? Does the interface actually allow you to do that? With TCP it's easy to stop accepting data from somebody — just stop reading off their socket, which we do by blocking on the Throttler in the socket reader loop. That seems harder here and IIRC you're passing the messages out asynchronously with anything that can turn a throttle limit into backpressure. >> > 2. about the completion mechanisms--with the addition of claim_data, >> this >> > is provisionally complete >> >> Can you talk about this a bit in general before I comment? I want to >> make sure I understand what's going on here, particularly with the >> way >> you're doing memory allocation in the mempools and handling the >> refcounting. > > Yes, and also, we have just pushed a major update to this logic on our > current stabilization branch (xio-rados-exp-noreg-buf-qs-upstream-ow-mdw). > The logic in the last version was overly complex. We streamlined things, > and also have switched to using the Xio one-way message paradigm, which > as you mentioned a while back, is closer to Ceph's, and also has higher > performance. ... > I didn't actually talk about it :). The mempools are used everywhere > to avoid allocation in XioMessenger fast paths (which is most of > them). We arrived at a need for this via profiling and measurement, > of course. > > The most complicated case is reclaiming memory for a Message and related > structures on the responder side. We manage the memory together using > a per-request pool, which is a member of the completion hook object. > (The backing store is the Xio pool facility.) The decoded Message itself > (assuming we successfully decode) as well as each tracking Ceph buffer > (logic for which we re-wrote from last version, as noted earlier) holds > a reference on the hook object. When the last of those refs is released > (the upper-layer code is done with Message and with any buffers which > it claimed) a cleamup process is started on the completion hook. This > takes a new initial reference (yes, it's atomic) on the hook, which is > handed off to the XioPortal thread which originally delivered it, for > final cleanup. That thread needs to be the one which returns xio_msg > buffers to Accelio. Once that is done, the portal thread returns the > last reference to the completion hook, and it is recycled also. Okay, I think this makes sense. I was hoping we could just make use of static functions or something instead of carrying around a hook on every message, but I can't think of a good way to do that while supporting multiple messengers in a system. > >> >> > 3. about DispatchStrategy and non-prio queueing, and how the way >> we're doing >> > things relates to your parallel work with SimpleMessenger >> >> This looks a bit problematic. In particular, the standard >> SimpleMessenger interface is responsible for delivery fairness >> between >> different peers, and none of your examples provide that. I suppose >> they do have the information needed to do so as part of the >> Message::connection member. >> My work on speeding up the SimpleMessenger introduces "fast dispatch" >> for specific types of messages, and shifts responsibility for >> fairness >> onto the Dispatcher. It's essentially your "FastStrategy" but via >> separate delivery interfaces (so the daemon can do less locking, >> etc). > > I agree, there's a bit more cleanup needed here. Initially, I only had > the equivalent of FastStrategy. We realized that we wanted to make the > Dispatcher responsible for threading model, but didn't think we could > really tackle that without more feedback, so we made it possible to do > out-of-line delivery with the QueueStrategy, pending further discussion. Hrm. "mak[ing] the Dispatcher responsible for threading model" does not really fill me with warm fuzzy thoughts. I don't want to change those interface without very good reason — it took a long time just to do the fast dispatch work in the OSD and will be much more code churn in the other daemons; I'm not really sure how we could do make it the Dispatcher's responsibility except by moving the code over but I think it lives better in the Messenger, because it has so much more information available about what connections are busy, etc; but maybe there's something I'm missing. I didn't read through much of the XioMessenger implementation, but from the parts I looked at and the interfaces I'm seeing I think we need to address this soon. > >> >> Other notes of varying import: >> XioMessenger doesn't implement any of the mark_down stuff. I realize >> that those interfaces are a bit weird for a "persistent-in-hardware" >> connection, but we need some way of simulating them — the Dispatcher >> uses those interfaces to toss everything out and start over, making >> sure messages don't survive from one daemon invocation to another. (A >> separate daemon invocation from the cluster's perspective might still >> be the same daemon on the local machine, if it went unresponsive for >> some reason.) > > Agree. Again, we tried to simplify the complexity away provisionally. Again, from what I'm seeing this is something that needs to get addressed soon. These interfaces aren't minor bolt-ons even for a TCP-based messenger, and I suspect it will be more work for an RDMA-style one. Along some similar lines, I didn't look closely enough to see how you're handling stuff like the lossy/non-lossy and server/client distinctions. Have you taken those on yet? >> The buffer changes appear to have some unnecessary Xio class >> declarations in buffer.h, and it’d be nice if all of that was guarded >> by HAVE_XIO blocks. >> Do you actually want the only exposed interface to be create_msg()? >> It >> might become convenient to be able to allocate xio memory space >> elsewhere and give it to the messenger (for instance, reading data >> off >> disk for transmission to a client). > > Yes, we only need to expose create_msg(), and (Josh) we could do it in > another header, or whatever. > >> >> Is "new (bp) xio_msg_buffer(m_hook, buf, len);” some syntax I’m >> unaware of, or a merge/rebase issue? (Seriously, no idea. :) > > This is C++ "placement new." Ah, nifty. Not sure why I haven't run across that before. >> Looking at your example setups in OSD and Monitor, I'm not entirely >> thrilled with adding new parameters for an XioMessenger in addition >> to >> SimpleMessenger. I'd rather we came up with some interface so this >> was >> all transparent to the daemon code and endpoint selection was handled >> in the messenger layer. >> This is also introducing some difficulties with monitor >> identification >> — they are pretty much identified by IP:port, and having two separate >> addresses for a single monitor is going to get weird. >> I notice that (for the OSD) you’re setting the “poison pill” on >> ms_public twice rather than adding it to ms_xio_public. > > We have a bunch of work planned here. Our current thinking is to create > a facade (named something like MessengerGroup) that encapsulates all > the messengers that an endpoint deals with. It goes together with an > address-list extension to entity_name_t, as we talked about briefly. Yeah, that sounds good. Just wanted to check. -Greg -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-05-01 22:26 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <586483842.112.1398195579489.JavaMail.root@thunderbeast.private.linuxbox.com>
2014-04-22 19:44 ` xio-rados-firefly branch update Matt W. Benjamin
2014-04-22 19:58 ` Gregory Farnum
[not found] <226225586.131.1398202885165.JavaMail.root@thunderbeast.private.linuxbox.com>
2014-04-22 21:50 ` Matt W. Benjamin
2014-04-28 23:40 ` Gregory Farnum
2014-04-29 0:42 ` Josh Durgin
2014-04-29 3:14 ` Matt W. Benjamin
2014-04-29 12:00 ` Matt W. Benjamin
2014-05-01 22:26 ` Gregory Farnum
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.