From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Ira W. Snyder"
Subject: Re: [PATCH 1/7] shm-signal: shared-memory signals
Date: Thu, 6 Aug 2009 13:51:09 -0700
Message-ID: <20090806205109.GA1330@ovro.caltech.edu>
References: <20090803171030.17268.26962.stgit@dev.haskins.net>
 <20090803171735.17268.37490.stgit@dev.haskins.net>
 <200908061556.55390.arnd@arndb.de>
 <4A7ABA530200005A00051C18@sinclair.provo.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Arnd Bergmann , paulmck@linux.vnet.ibm.com,
 alacrityvm-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org,
 netdev@vger.kernel.org
To: Gregory Haskins
Return-path:
Content-Disposition: inline
In-Reply-To: <4A7ABA530200005A00051C18@sinclair.provo.novell.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

On Thu, Aug 06, 2009 at 09:11:15AM -0600, Gregory Haskins wrote:
> Hi Arnd,
>
> >>> On 8/6/2009 at 9:56 AM, in message <200908061556.55390.arnd@arndb.de>, Arnd
> Bergmann wrote:
> > On Monday 03 August 2009, Gregory Haskins wrote:
> >> shm-signal provides a generic shared-memory based bidirectional
> >> signaling mechanism.  It is used in conjunction with an existing
> >> signal transport (such as posix-signals, interrupts, pipes, etc) to
> >> increase the efficiency of the transport since the state information
> >> is directly accessible to both sides of the link.  The shared-memory
> >> design provides very cheap access to features such as event-masking
> >> and spurious delivery mitigation, and is useful for implementing
> >> higher-level shared-memory constructs such as rings.
> >
> > Looks like a very useful feature in general.
>
> Thanks, I was hoping that would be the case.
>
> >
> >> +struct shm_signal_irq {
> >> +	__u8 enabled;
> >> +	__u8 pending;
> >> +	__u8 dirty;
> >> +};
> >
> > Won't this layout cause cache line ping pong? Other schemes I have
> > seen try to separate the bits so that each cache line is written to
> > by only one side.
>
> It could possibly use some optimization in that regard.  I generally
> consider myself an expert at concurrent programming, but this lockless
> stuff is, um, hard ;)  I was going for correctness first.
>
> Long story short, any suggestions on ways to split this up are welcome
> (particularly now, before the ABI is sealed ;)
>
> > This gets much more interesting if the two sides
> > are on remote ends of an I/O link, e.g. using a nontransparent
> > PCI bridge, where you only want to send stores over the wire, but
> > never fetches or even read-modify-write cycles.
>
> /me head explodes ;)
>

I've actually implemented this idea for virtio.  Read the virtio-over-PCI
patches I posted, and you'll see that the entire virtqueue implementation
NEVER uses reads across the PCI bus, only writes.  The slow-path
configuration space uses reads, but the virtqueues themselves are
write-only.

Some trivial benchmarking against an earlier driver that did writes+reads
across the PCI bus showed that the write-only driver was about 2x as fast
(throughput increased from ~30MB/sec to ~65MB/sec).  I'm sure the
write-only design was not the only change responsible for the speedup,
but it was definitely a contributing factor.

Ira
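
For concreteness, a minimal sketch of the write-only scheme described
above, with hypothetical names and sizes (this is not the actual
virtio-over-PCI code): every piece of ring state a side needs to read
lives in that side's own RAM and is updated by the peer with posted
writes, so the data path never issues a read cycle across the bus.

#include <linux/errno.h>
#include <linux/io.h>
#include <linux/types.h>

#define RING_ENTRIES	256	/* assumed power of two */
#define ENTRY_SIZE	256	/* assumed fixed-size slots */

struct tx_ring {
	/* Local RAM: the only memory this side ever reads. */
	u32 head;		/* producer index we own             */
	u32 tail;		/* consumer index, posted into our
				 * RAM by the peer as a bus write    */

	/* Peer's PCI window: this side only ever writes through it. */
	u32 __iomem *peer_head;	/* peer's local copy of our head     */
	void __iomem *peer_buf;	/* ring payload in the peer's RAM    */
};

static int tx_ring_put(struct tx_ring *r, const void *data, size_t len)
{
	u32 head = r->head;

	/* Full check uses local reads only (free-running indices). */
	if (head - r->tail == RING_ENTRIES)
		return -ENOSPC;

	/* Payload and the index update cross the bus as writes. */
	memcpy_toio(r->peer_buf + (head % RING_ENTRIES) * ENTRY_SIZE,
		    data, len);
	wmb();
	iowrite32(head + 1, r->peer_head);

	r->head = head + 1;
	return 0;
}

The consumer side would be the mirror image: it reads the payload and
the head index out of its own RAM and posts its updated tail back into
the producer's window with a single iowrite32().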