From mboxrd@z Thu Jan 1 00:00:00 1970
From: William Lee Irwin III
Date: Sat, 20 Nov 2004 04:24:27 +0000
Subject: Re: page fault scalability patch V11 [0/7]: overview
Message-Id: <20041120042427.GK2714@holomorphy.com>
List-Id:
References: <419D5E09.20805@yahoo.com.au>
 <1100848068.25520.49.camel@gaston>
 <20041120020401.GC2714@holomorphy.com>
 <419EA96E.9030206@yahoo.com.au>
 <20041120023443.GD2714@holomorphy.com>
 <419EAEA8.2060204@yahoo.com.au>
 <20041120030425.GF2714@holomorphy.com>
 <20041120033312.GB1434@lnx-holt.americas.sgi.com>
In-Reply-To: <20041120033312.GB1434@lnx-holt.americas.sgi.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Robin Holt
Cc: Nick Piggin, Christoph Lameter, torvalds@osdl.org, akpm@osdl.org,
 Benjamin Herrenschmidt, Hugh Dickins, linux-mm@kvack.org,
 linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org

On Fri, Nov 19, 2004 at 09:33:12PM -0600, Robin Holt wrote:
> Agree, we are currently using atomic ops on a global rss on our 2.4
> kernel with 512cpu systems and not seeing much cacheline contention.
> I don't remember how little it ended up being, but it was very little.
> We had gone to dropping the page_table_lock and only reaquiring it if
> the pte was non-null when we went to insert our new one. I think that
> was how we had it working. I would have to wake up and actually look
> at that code as it was many months ago that Ray Bryant did that work.
> We did make rss atomic. Most of the contention is sorted out by the
> mmap_sem. Processes acquiring themselves off of mmap_sem were found
> to have spaced themselves out enough that they were all approximately
> equal time from doing their atomic_add and therefore had very little
> contention for the cacheline. At least it was not enough that we could
> measure it as significant.

Also, the densely-packed split counter can only get 4-16 cpus to a
cacheline with cachelines <= 128B, so there are definite limitations to
the amount of cacheline contention in such schemes.

-- wli