From: James Bottomley
Subject: Re: MMIO and gcc re-ordering issue
Date: Thu, 29 May 2008 10:01:29 -0500
Message-ID: <1212073289.3428.30.camel@localhost.localdomain>
References: <1211852026.3286.36.camel@pasglop>
    <20080526.184047.88207142.davem@davemloft.net>
    <1211854540.3286.42.camel@pasglop>
    <20080526.192812.184590464.davem@davemloft.net>
    <20080526204233.75b71bb8@infradead.org>
    <1211872130.3286.64.camel@pasglop>
    <1211922696.3286.82.camel@pasglop>
To: Jes Sorensen
Cc: Roland Dreier, benh@kernel.crashing.org, Arjan van de Ven,
    linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
    tpiepho@freescale.com, linuxppc-dev@ozlabs.org,
    scottwood@freescale.com, torvalds@linux-foundation.org,
    David Miller, alan@lxorguk.ukuu.org.uk

On Thu, 2008-05-29 at 10:47 -0400, Jes Sorensen wrote:
> >>>>> "Roland" == Roland Dreier writes:
>
> >> This is a different issue. We deal with it on powerpc by having
> >> writel set a per-cpu flag and spin_unlock() test it, and do the
> >> barrier if needed there.
>
> Roland> Cool... I assume you do this for mutex_unlock() etc.?
>
> Roland> Is there any reason why ia64 can't do this too so we can kill
> Roland> mmiowb and save everyone a lot of hassle? (mips, sh and frv
> Roland> have non-empty mmiowb() definitions too but I'd guess that
> Roland> these are all bugs based on misunderstandings of the mmiowb()
> Roland> semantics...)
>
> Hi Roland,
>
> That's not going to solve the problem on Altix. On Altix the issue is
> that there can be multiple paths through the NUMA fabric from cpu X to
> PCI bridge Y.
>
> Consider this uber-cool ASCII art (NR is my abbreviation for NUMA
> router):
>
>   -------         -------
>   |cpu X|         |cpu Y|
>   -------         -------
>      |  \____  ____/ |
>      |       \/      |
>      |   ____/\____  |
>      |  /          \ |
>    ------         ------
>    |NR 1|         |NR 2|
>    ------         ------
>          \       /
>           \     /
>           -------
>           | PCI |
>           -------
>
> The problem is that your two writel's, despite both being issued on
> cpu X (due to the spin lock, in your example), can end up with the
> first one going through NR 1 and the second one going through NR 2. If
> there's contention on NR 1, the write going via NR 2 may hit the PCI
> bridge before the one going via NR 1.
>
> Of course, the bigger the system, the worse the problem...
>
> The only way to guarantee ordering in the above setup is to either
> make writel() fully ordered or add an mmiowb() in between the two
> writel's. On Altix you have to go and read from the PCI bridge to
> ensure all writes to it have been flushed, which is also what mmiowb()
> is doing. If writel() were to guarantee this ordering, every writel()
> call would become extremely expensive :-(

So if a read from the bridge achieves the same effect, can't we just put
one (an unrelaxed one) after the writes, within the spinlock? That way
this whole sequence will look like a well-understood PCI posting flush
rather than having to muck around with little-understood (at least by
most driver writers) I/O barriers?

James
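The powerpc scheme quoted above works like this: writel() sets a per-CPU flag, and spin_unlock() tests that flag and issues the barrier only when it is set. It can be sketched as a toy userspace model in C. This is a sketch under stated assumptions, not the powerpc kernel code. A `_Thread_local` variable stands in for per-CPU data, and a plain memory store stands in for the MMIO write. The names `model_writel()`, `model_spin_lock()`, `model_spin_unlock()`, `io_pending` and `barriers_issued` are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: a _Thread_local flag stands in for the per-CPU flag.
 * All names are hypothetical; none of this is the kernel's code. */
static _Thread_local bool io_pending;          /* MMIO done since last unlock? */
static _Thread_local unsigned barriers_issued; /* barriers paid for at unlock */

static void model_writel(uint32_t val, volatile uint32_t *addr)
{
    assert(addr != NULL);
    *addr = val;       /* stands in for the MMIO store */
    io_pending = true; /* remember that this CPU has an MMIO write outstanding */
}

static void model_spin_lock(atomic_flag *lock)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ; /* busy-wait until the holder releases the lock */
}

static void model_spin_unlock(atomic_flag *lock)
{
    if (io_pending) {
        /* Only sections that did MMIO pay for the full barrier. In this
         * model the fence orders ordinary memory only; on real hardware it
         * would have to be an I/O barrier. */
        atomic_thread_fence(memory_order_seq_cst);
        barriers_issued++;
        io_pending = false;
    }
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

A lock/unlock pair with no writes leaves `barriers_issued` at 0. A section with two `model_writel()` calls raises it to 1 and clears `io_pending`. The design point is that critical sections doing no MMIO pay nothing extra at unlock time. As Jes notes, though, a barrier issued by the CPU does not by itself fix Altix, where consecutive writes may take different routes through the NUMA fabric.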
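James's proposal is an ordinary (non-relaxed) read from the bridge after the writes and before the spinlock is dropped. It can be illustrated with a toy model of the fabric in Jes's diagram. This is a hypothetical simulation, not Altix hardware behaviour or the mmiowb() implementation, and it rests on three assumptions:

- Each posted write is queued on one of two routes.
- NR 1 is contended, so the NR 2 route delivers first.
- A read completes only after all earlier posted writes have reached the bridge.

`struct bridge`, `posted_write()`, `bridge_read()`, `fabric_tick()` and `run()` are illustrative names, and the spinlock is indicated only by comments.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the diagram: each posted write travels to the PCI bridge over
 * one of two routes. All names are hypothetical; this is not Altix code. */
#define MAX_WRITES 16
#define ROUTE_NR1 0
#define ROUTE_NR2 1

struct route {
    int pending[MAX_WRITES]; /* values still in flight on this route */
    size_t count;
};

struct bridge {
    struct route nr[2];          /* nr[ROUTE_NR1], nr[ROUTE_NR2] */
    int arrived[2 * MAX_WRITES]; /* values in the order the bridge saw them */
    size_t arrived_count;
    unsigned reads;              /* read-backs issued (each one is costly) */
};

/* Deliver everything in flight on route r to the bridge, oldest first. */
static void drain_route(struct bridge *b, int r)
{
    for (size_t i = 0; i < b->nr[r].count; i++) {
        assert(b->arrived_count < 2 * MAX_WRITES);
        b->arrived[b->arrived_count++] = b->nr[r].pending[i];
    }
    b->nr[r].count = 0;
}

/* Posted write: the CPU moves on immediately; the value is only queued. */
static void posted_write(struct bridge *b, int r, int val)
{
    assert(r == ROUTE_NR1 || r == ROUTE_NR2);
    assert(b->nr[r].count < MAX_WRITES);
    b->nr[r].pending[b->nr[r].count++] = val;
}

/* Time passes; NR 1 is assumed contended, so NR 2 delivers first. */
static void fabric_tick(struct bridge *b)
{
    drain_route(b, ROUTE_NR2);
    drain_route(b, ROUTE_NR1);
}

/* A read cannot complete until earlier posted writes have reached the
 * bridge, so in this model it drains both routes before returning. */
static int bridge_read(struct bridge *b)
{
    drain_route(b, ROUTE_NR1);
    drain_route(b, ROUTE_NR2);
    b->reads++;
    return 0; /* register contents are irrelevant; only the flush matters */
}

/* Two lock holders in turn, each doing one write. With flush set, each
 * holder reads back from the bridge before releasing the lock. */
static void run(struct bridge *b, int flush)
{
    /* first holder has taken the lock */
    posted_write(b, ROUTE_NR1, 1);
    if (flush)
        (void)bridge_read(b);
    /* first holder unlocks; second holder takes the lock */
    posted_write(b, ROUTE_NR2, 2);
    if (flush)
        (void)bridge_read(b);
    /* second holder unlocks */
    fabric_tick(b); /* let anything still in flight land */
}
```

With `flush` set to 0 the bridge sees the value 2 before 1, which is the reordering Jes describes. With `flush` set to 1 it sees 1 then 2, at the cost of two reads.

In a real driver, the read-back would typically be a readl() of a device register, issued after the writel() calls and before spin_unlock(). That is the familiar PCI posting flush idiom. The model pays one read per critical section, whereas making every writel() fully ordered would pay that cost on each write.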