From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from cantor.suse.de ([195.135.220.2]:35559 "EHLO Cantor.suse.de") by vger.kernel.org with ESMTP id S268716AbUHLUDA (ORCPT ); Thu, 12 Aug 2004 16:03:00 -0400
Date: Thu, 12 Aug 2004 22:00:25 +0200
From: Andi Kleen
Subject: Re: clear_user_highpage()
Message-Id: <20040812220025.27cb260a.ak@suse.de>
In-Reply-To: <20040812125059.298ae914.davem@redhat.com>
References: <20040811161537.5e24c2b6.davem@redhat.com> <20040811165307.46ff1eb6.davem@redhat.com> <20040812020825.GA14411@wotan.suse.de> <20040811194545.0034428b.davem@redhat.com> <20040812110924.0713f5d9.ak@suse.de> <20040812125059.298ae914.davem@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
To: "David S. Miller"
Cc: torvalds@osdl.org, linux-arch@vger.kernel.org
List-ID:

On Thu, 12 Aug 2004 12:50:59 -0700
"David S. Miller" wrote:

> On Thu, 12 Aug 2004 11:09:24 +0200
> Andi Kleen wrote:
>
> > On Wed, 11 Aug 2004 19:45:45 -0700
> > "David S. Miller" wrote:
> >
> > > Do these cache-bypassing stores use the L2 cache on a hit?
> >
> > No, they invalidate the cache.
>
> That explains, at least partly, why they performed so poorly.

Well, the writes are usually faster. They don't go through the cache;
instead they use special write-combining (WC) buffers in the CPU that
hold the data until a full cache line can be blasted out. The advantage
is that the CPU doesn't have to read anything first. How effective this
is depends on the CPU; in general, newer x86s tend to have much larger
WC buffers than the previous generation (e.g. Intel just enlarged them
again in Prescott).

Unlike all other stores on x86, they are also very weakly ordered and
need explicit memory barriers.

Normally this is used for frame buffers and other hardware mappings,
but sometimes it can be useful for a lot of streaming data too.
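To make the above concrete, here is a minimal user-space sketch of clearing one page with cache-bypassing stores. It is an illustration only, not the kernel's clear_user_highpage() code: the names PAGE_BYTES, alloc_page() and clear_page_streaming() are hypothetical, the 4096-byte page size is an assumption, and it presumes an SSE2-capable compiler target (on anything else it falls back to ordinary stores so it still builds).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_setzero_si128, _mm_stream_si128, _mm_sfence */
#endif

#define PAGE_BYTES 4096u  /* assumed page size for this sketch */

/* Hypothetical helper: returns one page-aligned buffer, or NULL. */
static void *alloc_page(void)
{
    return aligned_alloc(PAGE_BYTES, PAGE_BYTES);
}

/* Hypothetical helper: zero one page without allocating cache lines. */
static void clear_page_streaming(void *page)
{
    /* The 16-byte streaming store requires a 16-byte aligned address. */
    assert(((uintptr_t)page & 15u) == 0);

#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)page;

    /* Non-temporal stores: collected in the WC buffers and written
     * out as whole lines, with no read of the old contents first. */
    for (size_t i = 0; i < PAGE_BYTES / sizeof(__m128i); i++)
        _mm_stream_si128(&p[i], zero);

    /* These stores are weakly ordered, so fence before anything
     * that relies on the page contents being globally visible. */
    _mm_sfence();
#else
    /* Portable fallback: ordinary (cached) stores. */
    unsigned char *p = page;
    for (size_t i = 0; i < PAGE_BYTES; i++)
        p[i] = 0;
#endif
}
```

A caller would get a buffer from alloc_page(), pass it to clear_page_streaming(), and free() it afterwards. Whether this beats ordinary stores depends on the CPU and on how soon the page is read again, so it has to be measured.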

> Is there any other platform that has the same kind of block
> stores sparc64 does (basically use L2 cache if line present,
> else bypass L2 cache for the store and do not allocate L2
> cache lines for the data)? I bet ia64 does have something
> like this.

This still has the same problem: in the end the data is out of cache,
and when someone else needs it later they eat large penalties.

-Andi

P.S.: I added a new experimental option now to use unordered WC stores
for writel(). Haven't benchmarked it much so far though.