From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Gregory Haskins"
Subject: Re: [PATCH 1/7] shm-signal: shared-memory signals
Date: Thu, 06 Aug 2009 09:11:15 -0600
Message-ID: <4A7ABA530200005A00051C18@sinclair.provo.novell.com>
References: <20090803171030.17268.26962.stgit@dev.haskins.net>
	<20090803171735.17268.37490.stgit@dev.haskins.net>
	<200908061556.55390.arnd@arndb.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 8BIT
Cc: , , ,
To: "Arnd Bergmann"
Return-path:
In-Reply-To: <200908061556.55390.arnd@arndb.de>
Content-Disposition: inline
Sender: netdev-owner@vger.kernel.org
List-ID:

Hi Arnd,

>>> On 8/6/2009 at 9:56 AM, in message <200908061556.55390.arnd@arndb.de>,
Arnd Bergmann wrote:
> On Monday 03 August 2009, Gregory Haskins wrote:
>> shm-signal provides a generic shared-memory based bidirectional
>> signaling mechanism.  It is used in conjunction with an existing
>> signal transport (such as posix-signals, interrupts, pipes, etc) to
>> increase the efficiency of the transport, since the state information
>> is directly accessible to both sides of the link.  The shared-memory
>> design provides very cheap access to features such as event-masking
>> and spurious-delivery mitigation, and is useful for implementing
>> higher-level shared-memory constructs such as rings.
>
> Looks like a very useful feature in general.

Thanks, I was hoping that would be the case.

>
>> +struct shm_signal_irq {
>> +	__u8 enabled;
>> +	__u8 pending;
>> +	__u8 dirty;
>> +};
>
> Won't this layout cause cache line ping pong? Other schemes I have
> seen try to separate the bits so that each cache line is written to
> by only one side.

It could possibly use some optimization in that regard.  I generally
consider myself an expert at concurrent programming, but this lockless
stuff is, um, hard ;)  I was going for correctness first.
Long story short, any suggestions on ways to split this up are welcome
(particularly now, before the ABI is sealed ;)

> This gets much more interesting if the two sides
> are on remote ends of an I/O link, e.g. using a nontransparent
> PCI bridge, where you only want to send stores over the wire, but
> never fetches or even read-modify-write cycles.

/me head explodes ;)

>
> Your code is probably optimal if you only communicate between host
> and guest code on the same CPU, but not so good if it crosses NUMA
> nodes or worse.

Yeah, I won't lie and say it wasn't designed primarily with the former
case in mind (since it was my particular itch).  I would certainly
appreciate any insight on ways to make it more generally applicable for
things like the nontransparent bridge model and/or NUMA, though.

>
>> +struct shm_signal_desc {
>> +	__u32 magic;
>> +	__u32 ver;
>> +	struct shm_signal_irq irq[2];
>> +};
>
> This data structure has implicit padding of two bytes at the end.
> How about adding another '__u16 reserved' to make it explicit?

Good idea.  Will fix.

>
>> +	/*
>> +	 * We always mark the remote side as dirty regardless of whether
>> +	 * they need to be notified.
>> +	 */
>> +	irq->dirty = 1;
>> +	wmb(); /* dirty must be visible before we test the pending state */
>> +
>> +	if (irq->enabled && !irq->pending) {
>> +		rmb();
>> +
>> +		/*
>> +		 * If the remote side has enabled notifications, and we do
>> +		 * not see a notification pending, we must inject a new one.
>> +		 */
>> +		irq->pending = 1;
>> +		wmb(); /* make it visible before we do the injection */
>> +
>> +		s->ops->inject(s);
>> +	}
>
> Barriers always confuse me, but the rmb() looks slightly wrong. AFAIU
> it only prevents reads after the barrier from being done before the
> barrier, but you don't do any reads after it.

It's probably overzealous barriering on my part.  I had a conversation
with Paul McKenney (CC'd) where I was wondering if a conditional was an
implicit barrier.
His response was something to the effect of "on most arches, yes, but
not all", and we concluded that, to be conservative, there should be an
rmb() after the if().  That said, tbh I am not sure if it's actually
needed.  Paul?

>
> The (irq->enabled && !irq->pending) check could be done before the
> irq->dirty = 1 arrives at the bus, but that does not seem to hurt, it
> would at most cause a duplicate ->inject().
>
> Regarding the scope of the barrier, did you intentionally use the
> global versions (rmb()/wmb()) and not the lighter single-system
> (smp_rmb()/smp_wmb()) versions? Your version should cope with remote
> links over PCI but looks otherwise optimized for local use, as I
> wrote above.

Yes, it was intentional: both for the remote case, as you point out,
and for the case where the local build might be mismatched (for
instance, a guest compiled as UP).

Thanks Arnd,
-Greg