From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932641AbWLMKWO (ORCPT ); Wed, 13 Dec 2006 05:22:14 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S932648AbWLMKWO (ORCPT ); Wed, 13 Dec 2006 05:22:14 -0500 Received: from amsfep17-int.chello.nl ([213.46.243.15]:7807 "EHLO amsfep18-int.chello.nl" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S932641AbWLMKWN (ORCPT ); Wed, 13 Dec 2006 05:22:13 -0500 X-Greylist: delayed 1015 seconds by postgrey-1.27 at vger.kernel.org; Wed, 13 Dec 2006 05:22:12 EST Subject: Re: [patch 03/13] io-accounting: write accounting From: Peter Zijlstra To: Andrew Morton Cc: Suleiman Souhlal , linux-kernel@vger.kernel.org, balbir@in.ibm.com, csturtiv@sgi.com, daw@sgi.com, guillaume.thouvenin@bull.net, jlan@sgi.com, nagar@watson.ibm.com, tee@sgi.com In-Reply-To: <20061213005954.e2d32446.akpm@osdl.org> References: <200612081152.kB8BqQvb019756@shell0.pdx.osdl.net> <457FBDBE.10102@FreeBSD.org> <20061213005954.e2d32446.akpm@osdl.org> Content-Type: text/plain Date: Wed, 13 Dec 2006 11:04:54 +0100 Message-Id: <1166004294.32332.93.camel@twins> Mime-Version: 1.0 X-Mailer: Evolution 2.8.1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2006-12-13 at 00:59 -0800, Andrew Morton wrote: > On Wed, 13 Dec 2006 00:45:50 -0800 > Suleiman Souhlal wrote: > > > akpm@osdl.org wrote: > > > From: Andrew Morton > > > > > > Accounting writes is fairly simple: whenever a process flips a > page from clean > > > to dirty, we accuse it of having caused a write to underlying > storage of > > > PAGE_CACHE_SIZE bytes. > > > > On architectures where dirtying a page doesn't cause a page fault > (like i386), couldn't you end up billing the wrong process (in fact, I > think that even on other archituctures set_page_dirty() doesn't get > called immediately in the page fault handler)? > > Yes, that would be a problem in 2.6.18 and earlier. > > In 2.6.19 and later, we do take a fault when transitioning a page from > pte-clean to pte-dirty. That was done to get the dirty-page > accounting > right - to avoid the > all-of-memory-is-dirty-but-the-kernel-doesn't-know-it > problem. > > > > AFAICS, set_page_dirty() is mostly called when trying to unmap a > page when trying to shrink LRU lists, and there is no guarantee that > this happens under the process that dirtied it (in fact, the > set_page_dirty() is often done by kswapd). we're talking: shrink_page_list() TestSetPageLock() try_to_unmap() try_to_unmap_file() try_to_unmap_one() set_page_dirty() Which, because its run under the page lock, should always be false, right? perhaps a WARN_ON might do. > hm, that code is still there in zap_pte_range(). If all is well, that > set_page_dirty() call should never return true. exit_mmap(), do_unmap() unmap_vmas() unmap_page_range() zap_{pud,pmd,pte}_range() vmtruncate() unmap_mapping_range() unmap_mapping_range_*() unmap_mapping_range_vma(), madvise_dontneed() zap_page_range() unmap_vmas() unmap_page_range() zap_{pud,pmd,pte}_range() Which is all ran without the page lock, so might be open for a race; however, I don't see that hurting because the whole page is un-accounted pretty soon afterwards. > Peter did, you ever test for that? Nope > (Well, it might return true in rare races, because zap_pte_range() > doesn't lock the pages)