From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Andrew Morton <akpm@osdl.org>
Cc: Nick Piggin <piggin@cyberone.com.au>,
clameter@sgi.com, torvalds@osdl.org, ak@suse.de,
rohitseth@google.com, mbligh@google.com, hugh@veritas.com,
riel@redhat.com, andrea@suse.de, arjan@infradead.org,
apw@shadowen.org, mel@csn.ul.ie, marcelo@kvack.org,
anton@samba.org, paulmck@us.ibm.com, linux-mm@kvack.org
Subject: Re: [RFC][PATCH 1/3] tracking dirty pages in shared mappings -V4
Date: Fri, 12 May 2006 09:06:01 +0200 [thread overview]
Message-ID: <1147417561.8951.17.camel@twins> (raw)
In-Reply-To: <20060511213045.32b41aa6.akpm@osdl.org>
On Thu, 2006-05-11 at 21:30 -0700, Andrew Morton wrote:
> Nick Piggin <piggin@cyberone.com.au> wrote:
> >
> > >So let's see. We take a write fault, we mark the page dirty then we return
> > >to userspace which will proceed with the write and will mark the pte dirty.
> > >
> > >Later, the VM will write the page out.
> > >
> > >Later still, the pte will get cleaned by reclaim or by munmap or whatever
> > >and the page will be marked dirty and the page will again be written out.
> > >Potentially needlessly.
> > >
> >
> > page_wrprotect also marks the page clean,
>
> Oh. I missed that when reading the comment which describes
> page_wrprotect() (I do go on).
Yes, this name is not the best of names :-(
I was aware of this, but since in my mind the counting through
protection
faults was the prime idea, I stuck to page_wrprotect().
But I'm hard pressed to come up with a better one. Nick proposes:
page_mkclean()
But that also doesn't cover the whole of it from my perspective.
> > so this window is very small.
> > The window is that the fault path might set_page_dirty, then throttle
> > on writeout, and the page gets written out before it really gets dirtied
> > by the application (which then has to fault again).
>
> : int test_clear_page_dirty(struct page *page)
> : {
> : struct address_space *mapping = page_mapping(page);
> : unsigned long flags;
> :
> : if (mapping) {
> : write_lock_irqsave(&mapping->tree_lock, flags);
> : if (TestClearPageDirty(page)) {
> : radix_tree_tag_clear(&mapping->page_tree,
> : page_index(page),
> : PAGECACHE_TAG_DIRTY);
> : write_unlock_irqrestore(&mapping->tree_lock, flags);
> : /*
> : * We can continue to use `mapping' here because the
> : * page is locked, which pins the address_space
> : */
>
> So if userspace modifies the page right here, and marks the pte dirty.
>
> : if (mapping_cap_account_dirty(mapping)) {
> : page_wrprotect(page);
>
> We just lost that pte dirty bit, and hence the user's data.
I thought that at the time we clean PAGECACHE_TAG_DIRTY the page is in
flight to disk.
Now that I look at it again, perhaps the page_wrprotect() call in
clear_page_dirty_for_io()
should be in test_set_page_writeback().
> : dec_page_state(nr_dirty);
> : }
> : return 1;
> : }
> : write_unlock_irqrestore(&mapping->tree_lock, flags);
> : return 0;
> : }
> : return TestClearPageDirty(page);
> : }
> :
>
> Which is just the sort of subtle and nasty problem I was referring to...
>
> If that's correct then I guess we need the
>
> if (ptep_clear_flush_dirty(vma, addr, pte) ||
> page_test_and_clear_dirty(page))
> ret += set_page_dirty(page);
>
> treatment in page_wrprotect().
I would make that a BUG_ON(); the only way for a pte of a shared mapping
to become
dirty is through the fault handler, and that should already call
set_page_dirty() on it.
> Now I suppose it's not really a dataloss race, because in practice the
> kernel is about to write this page to backing store anwyay. I guess. I
> cannot immediately think of any clear_page_dirty() callers for whom that
> won't be true.
>
> Someone please convince me that this has all been thought about and is solid
> as a rock.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2006-05-12 7:06 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-05-05 20:35 [RFC][PATCH] tracking dirty pages in shared mappings Peter Zijlstra
2006-05-06 13:18 ` Nick Piggin
2006-05-06 13:34 ` Peter Zijlstra
2006-05-06 13:47 ` Nick Piggin
2006-05-06 15:29 ` Peter Zijlstra
2006-05-07 0:40 ` Nick Piggin
2006-05-07 3:43 ` Nick Piggin
2006-05-08 6:43 ` Christoph Lameter
2006-05-08 7:23 ` Peter Zijlstra
2006-05-08 19:20 ` [RFC][PATCH 1/2] tracking dirty pages in shared mappings -V3 Peter Zijlstra
2006-05-09 5:41 ` Christoph Lameter
2006-05-09 6:06 ` Peter Zijlstra
2006-05-09 20:44 ` [RFC][PATCH 1/3] tracking dirty pages in shared mappings -V4 Peter Zijlstra
2006-05-09 20:52 ` Peter Chubb
2006-05-09 20:55 ` Martin Bligh
2006-05-09 22:56 ` Brian Twichell
2006-05-10 0:24 ` Linus Torvalds
2006-05-10 0:29 ` Nick Piggin
2006-05-10 1:24 ` Linus Torvalds
2006-05-11 15:02 ` Andrew Morton
2006-05-11 16:39 ` Andy Whitcroft
2006-05-11 22:52 ` Christoph Lameter
2006-05-11 23:30 ` Linus Torvalds
2006-05-11 23:44 ` Andrew Morton
2006-05-12 0:10 ` Linus Torvalds
2006-05-12 8:07 ` Andy Whitcroft
2006-05-12 14:25 ` Martin J. Bligh
2006-05-14 15:58 ` Andy Whitcroft
2006-05-12 1:51 ` Nick Piggin
2006-05-12 4:30 ` Andrew Morton
2006-05-12 5:05 ` Nick Piggin
2006-05-12 7:06 ` Peter Zijlstra [this message]
2006-05-12 8:04 ` Nick Piggin
2006-05-12 8:52 ` Peter Zijlstra
2006-05-12 8:07 ` Nick Piggin
2006-05-12 4:51 ` Christoph Lameter
2006-05-09 20:44 ` [RFC][PATCH 2/3] throttle writers of shared mappings Peter Zijlstra
2006-05-09 22:54 ` Nick Piggin
2006-05-09 22:55 ` Nick Piggin
2006-05-10 6:25 ` Peter Zijlstra
2006-05-09 20:44 ` [RFC][PATCH 3/3] optimize follow_pages() Peter Zijlstra
2006-05-10 6:30 ` Peter Zijlstra
2006-05-08 19:24 ` [RFC][PATCH 2/2] throttle writers of shared mappings Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1147417561.8951.17.camel@twins \
--to=a.p.zijlstra@chello.nl \
--cc=ak@suse.de \
--cc=akpm@osdl.org \
--cc=andrea@suse.de \
--cc=anton@samba.org \
--cc=apw@shadowen.org \
--cc=arjan@infradead.org \
--cc=clameter@sgi.com \
--cc=hugh@veritas.com \
--cc=linux-mm@kvack.org \
--cc=marcelo@kvack.org \
--cc=mbligh@google.com \
--cc=mel@csn.ul.ie \
--cc=paulmck@us.ibm.com \
--cc=piggin@cyberone.com.au \
--cc=riel@redhat.com \
--cc=rohitseth@google.com \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.