From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Hugh Dickins <hugh@veritas.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@osdl.org>,
David Howells <dhowells@redhat.com>,
Christoph Lameter <christoph@lameter.com>,
Martin Bligh <mbligh@google.com>, Nick Piggin <npiggin@suse.de>,
Linus Torvalds <torvalds@osdl.org>
Subject: Re: [PATCH] mm: tracking shared dirty pages -v10
Date: Sat, 24 Jun 2006 00:00:17 +0200 [thread overview]
Message-ID: <1151100017.30819.50.camel@lappy> (raw)
In-Reply-To: <Pine.LNX.4.64.0606231933060.7524@blonde.wat.veritas.com>
On Fri, 2006-06-23 at 20:06 +0100, Hugh Dickins wrote:
> > - if (unlikely((vma->vm_flags & (VM_SHARED|VM_WRITE)) ==
> > - (VM_SHARED|VM_WRITE))) {
> > + /*
> > + * Only catch write-faults on shared writable pages, read-only
> > + * shared pages can get COWed by get_user_pages(.write=1, .force=1).
> > + */
> > + if (unlikely(((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
> > + (VM_WRITE|VM_SHARED)))) {
>
> I was about to say I'm satisfied with that test now, and it's all the
> better for not mentioning VM_PFNMAP there - though odd that you've
> exchanged WRITE and SHARED, and added extra unnecessary parentheses ;)
Oops, on the extra parentheses. I exchanged the order because that is
more consistent with all the other sites.
> But grrr, sigh, damn, argh - I now realize it's right to the first
> order (normal case) and to the second order (ptrace poke), but not
> to the third order (ptrace poke anon page here to be COWed -
> perhaps can't occur without intervening mprotects).
>
> That's not an issue for you at all (there are other places which
> are inconsistent on whether such pages are private or shared e.g.
> copy_one_pte does not wrprotect them), but could be a problem for
> David's page_mkwrite - there's a danger of passing it an anonymous
> page, which (depending on what the ->page_mkwrite actually does)
> could go seriously wrong.
>
> I guess it ought to be restructured
> if (PageAnon(old_page)) {
> ...
> } else if (shared writable vma) {
> ...
> }
> and a patch to do that should precede your dirty page patches
> (and the only change your dirty page patches need add here on top
> of that would be the dirty_page = old_page, get_page(dirty_page)).
Hmm, I'll investigate this and maybe write a follow-up patch if you or
someone else doesn't beaten me to it.
> (You've probably noticed that use of mprotect on problem vmas
> is liable to remove the cache bits. There have been occasional
> complaints about that, and perhaps a patch to fix it. But that's
> long-standing behaviour, so not something you need worry about:
> we've only had to worry about a new change in behaviour in mmap.)
Yeah, had noticed, but had assumed that since it was a user action
they'd know better than to call it on dubious mappings.
However if you say there is demand; I've been through all the arch's to
verify the use of that most ugly drm construct, and whipping up an arch
function like pgprot_modify() that would fold a pgprot_t and
protection_map[] argument together seems quite doable.
> > Page from remap_pfn_range() are explicitly excluded because their
> > COW semantics are already horrid enough (see vm_normal_page() in
> > do_wp_page()) and because they don't have a backing store anyway.
> >
> > copy_one_pte() cleans child PTEs, not wrong in the current context,
> > but why?
>
> Dates back to Linux 2.2 if not earlier. I think the idea is that the
> parent has its pte dirty, and that's enough to mark the page as dirty;
> the child is unlikely to dirty the page further itself, and a second
> instance of dirty pte for that page may entail some overhead. What
> overhead would vary from release to release, I've not thought through
> what the current saving would be in 2.6.17, probably very little.
> But you can imagine there might have been some release which would
> write out the page each time it found pte dirty, in which case good
> reason to clear it there. But I'm guessing, it's from before my time.
I'll keep that in mind for a future cleanup.
> > + if ((pgprot_val(vma->vm_page_prot) == pgprot_val(vm_page_prot) &&
> > + ((vm_flags & (VM_WRITE|VM_SHARED|VM_PFNMAP|VM_INSERTPAGE)) ==
> > + (VM_WRITE|VM_SHARED)) &&
> > + vma->vm_file && vma->vm_file->f_mapping &&
> > + mapping_cap_account_dirty(vma->vm_file->f_mapping)) ||
> > + (vma->vm_ops && vma->vm_ops->page_mkwrite))
> > + vma->vm_page_prot =
> > + protection_map[vm_flags & (VM_READ|VM_WRITE|VM_EXEC)];
> > +
>
> I'm dazzled by the beauty of it!
It's a real beauty isn't it :-)
> Given the change you've made to
> mapping_cap_account_dirty, you don't have to check vma->vm_file->f_mapping
> there; but other than that, seems right to me. May offend other tastes.
I meant to remove the mapping_cap_account_dirty() changes, so as not to
burden all call sites with the extra conditional. This one extra check
in here does not make the difference between beauty and beast.
> > Index: 2.6-mm/mm/rmap.c
> > ===================================================================
> > --- 2.6-mm.orig/mm/rmap.c 2006-06-22 17:59:07.000000000 +0200
> > +++ 2.6-mm/mm/rmap.c 2006-06-23 01:14:39.000000000 +0200
> > +static int page_mkclean_file(struct address_space *mapping, struct page *page)
> > +{
> > + pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> > + struct vm_area_struct *vma;
> > + struct prio_tree_iter iter;
> > + int ret = 0;
> > +
> > + BUG_ON(PageAnon(page));
> > +
> > + spin_lock(&mapping->i_mmap_lock);
> > + vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
> > + int protect = mapping_cap_account_dirty(mapping) &&
> > + ((vma->vm_flags & (VM_WRITE|VM_SHARED|VM_PFNMAP|VM_INSERTPAGE)) ==
> > + (VM_WRITE|VM_SHARED));
>
> I think you can forget about testing VM_PFNMAP|VM_INSERTPAGE here,
> or else BUG_ON or WARN_ON(vma->vm_flags & (VM_PFNMAP|VM_INSERTPAGE)):
> those just shouldn't arrive here.
Done
> But I'm still puzzled, why did you move page_mkclean out from under
> the mapping_cap_account_dirty tests in page-writeback.c: no explanation
> in your change history. Doesn't that just add some unnecessary work,
> and this "protect" business?
You're right, added back under the wing of mapping_cap_account_dirty().
> If you're thinking ahead to other changes you want to make, you'd
> do better to add these changes then, and for now just work on the
> VM_SHARED vmas.
But I'll leave page_mkclean() as callable without, that way its more
true to its name.
Peter
WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Hugh Dickins <hugh@veritas.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@osdl.org>,
David Howells <dhowells@redhat.com>,
Christoph Lameter <christoph@lameter.com>,
Martin Bligh <mbligh@google.com>, Nick Piggin <npiggin@suse.de>,
Linus Torvalds <torvalds@osdl.org>
Subject: Re: [PATCH] mm: tracking shared dirty pages -v10
Date: Sat, 24 Jun 2006 00:00:17 +0200 [thread overview]
Message-ID: <1151100017.30819.50.camel@lappy> (raw)
In-Reply-To: <Pine.LNX.4.64.0606231933060.7524@blonde.wat.veritas.com>
On Fri, 2006-06-23 at 20:06 +0100, Hugh Dickins wrote:
> > - if (unlikely((vma->vm_flags & (VM_SHARED|VM_WRITE)) ==
> > - (VM_SHARED|VM_WRITE))) {
> > + /*
> > + * Only catch write-faults on shared writable pages, read-only
> > + * shared pages can get COWed by get_user_pages(.write=1, .force=1).
> > + */
> > + if (unlikely(((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
> > + (VM_WRITE|VM_SHARED)))) {
>
> I was about to say I'm satisfied with that test now, and it's all the
> better for not mentioning VM_PFNMAP there - though odd that you've
> exchanged WRITE and SHARED, and added extra unnecessary parentheses ;)
Oops, on the extra parentheses. I exchanged the order because that is
more consistent with all the other sites.
> But grrr, sigh, damn, argh - I now realize it's right to the first
> order (normal case) and to the second order (ptrace poke), but not
> to the third order (ptrace poke anon page here to be COWed -
> perhaps can't occur without intervening mprotects).
>
> That's not an issue for you at all (there are other places which
> are inconsistent on whether such pages are private or shared e.g.
> copy_one_pte does not wrprotect them), but could be a problem for
> David's page_mkwrite - there's a danger of passing it an anonymous
> page, which (depending on what the ->page_mkwrite actually does)
> could go seriously wrong.
>
> I guess it ought to be restructured
> if (PageAnon(old_page)) {
> ...
> } else if (shared writable vma) {
> ...
> }
> and a patch to do that should precede your dirty page patches
> (and the only change your dirty page patches need add here on top
> of that would be the dirty_page = old_page, get_page(dirty_page)).
Hmm, I'll investigate this and maybe write a follow-up patch if you or
someone else doesn't beaten me to it.
> (You've probably noticed that use of mprotect on problem vmas
> is liable to remove the cache bits. There have been occasional
> complaints about that, and perhaps a patch to fix it. But that's
> long-standing behaviour, so not something you need worry about:
> we've only had to worry about a new change in behaviour in mmap.)
Yeah, had noticed, but had assumed that since it was a user action
they'd know better than to call it on dubious mappings.
However if you say there is demand; I've been through all the arch's to
verify the use of that most ugly drm construct, and whipping up an arch
function like pgprot_modify() that would fold a pgprot_t and
protection_map[] argument together seems quite doable.
> > Page from remap_pfn_range() are explicitly excluded because their
> > COW semantics are already horrid enough (see vm_normal_page() in
> > do_wp_page()) and because they don't have a backing store anyway.
> >
> > copy_one_pte() cleans child PTEs, not wrong in the current context,
> > but why?
>
> Dates back to Linux 2.2 if not earlier. I think the idea is that the
> parent has its pte dirty, and that's enough to mark the page as dirty;
> the child is unlikely to dirty the page further itself, and a second
> instance of dirty pte for that page may entail some overhead. What
> overhead would vary from release to release, I've not thought through
> what the current saving would be in 2.6.17, probably very little.
> But you can imagine there might have been some release which would
> write out the page each time it found pte dirty, in which case good
> reason to clear it there. But I'm guessing, it's from before my time.
I'll keep that in mind for a future cleanup.
> > + if ((pgprot_val(vma->vm_page_prot) == pgprot_val(vm_page_prot) &&
> > + ((vm_flags & (VM_WRITE|VM_SHARED|VM_PFNMAP|VM_INSERTPAGE)) ==
> > + (VM_WRITE|VM_SHARED)) &&
> > + vma->vm_file && vma->vm_file->f_mapping &&
> > + mapping_cap_account_dirty(vma->vm_file->f_mapping)) ||
> > + (vma->vm_ops && vma->vm_ops->page_mkwrite))
> > + vma->vm_page_prot =
> > + protection_map[vm_flags & (VM_READ|VM_WRITE|VM_EXEC)];
> > +
>
> I'm dazzled by the beauty of it!
It's a real beauty isn't it :-)
> Given the change you've made to
> mapping_cap_account_dirty, you don't have to check vma->vm_file->f_mapping
> there; but other than that, seems right to me. May offend other tastes.
I meant to remove the mapping_cap_account_dirty() changes, so as not to
burden all call sites with the extra conditional. This one extra check
in here does not make the difference between beauty and beast.
> > Index: 2.6-mm/mm/rmap.c
> > ===================================================================
> > --- 2.6-mm.orig/mm/rmap.c 2006-06-22 17:59:07.000000000 +0200
> > +++ 2.6-mm/mm/rmap.c 2006-06-23 01:14:39.000000000 +0200
> > +static int page_mkclean_file(struct address_space *mapping, struct page *page)
> > +{
> > + pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> > + struct vm_area_struct *vma;
> > + struct prio_tree_iter iter;
> > + int ret = 0;
> > +
> > + BUG_ON(PageAnon(page));
> > +
> > + spin_lock(&mapping->i_mmap_lock);
> > + vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
> > + int protect = mapping_cap_account_dirty(mapping) &&
> > + ((vma->vm_flags & (VM_WRITE|VM_SHARED|VM_PFNMAP|VM_INSERTPAGE)) ==
> > + (VM_WRITE|VM_SHARED));
>
> I think you can forget about testing VM_PFNMAP|VM_INSERTPAGE here,
> or else BUG_ON or WARN_ON(vma->vm_flags & (VM_PFNMAP|VM_INSERTPAGE)):
> those just shouldn't arrive here.
Done
> But I'm still puzzled, why did you move page_mkclean out from under
> the mapping_cap_account_dirty tests in page-writeback.c: no explanation
> in your change history. Doesn't that just add some unnecessary work,
> and this "protect" business?
You're right, added back under the wing of mapping_cap_account_dirty().
> If you're thinking ahead to other changes you want to make, you'd
> do better to add these changes then, and for now just work on the
> VM_SHARED vmas.
But I'll leave page_mkclean() as callable without, that way its more
true to its name.
Peter
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2006-06-23 22:00 UTC|newest]
Thread overview: 94+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-06-19 17:52 [PATCH 0/6] mm: tracking dirty pages -v9 Peter Zijlstra
2006-06-19 17:52 ` Peter Zijlstra
2006-06-19 17:52 ` [PATCH 1/6] mm: tracking shared dirty pages Peter Zijlstra
2006-06-19 17:52 ` Peter Zijlstra
2006-06-22 5:56 ` Andrew Morton
2006-06-22 5:56 ` Andrew Morton
2006-06-22 6:07 ` Christoph Lameter
2006-06-22 6:07 ` Christoph Lameter
2006-06-22 6:15 ` Andrew Morton
2006-06-22 6:15 ` Andrew Morton
2006-06-22 11:33 ` Peter Zijlstra
2006-06-22 11:33 ` Peter Zijlstra
2006-06-22 13:17 ` Hugh Dickins
2006-06-22 13:17 ` Hugh Dickins
2006-06-22 20:52 ` Hugh Dickins
2006-06-22 20:52 ` Hugh Dickins
2006-06-22 23:02 ` Peter Zijlstra
2006-06-22 23:02 ` Peter Zijlstra
2006-06-22 23:39 ` [PATCH] mm: tracking shared dirty pages -v10 Peter Zijlstra
2006-06-22 23:39 ` Peter Zijlstra
2006-06-23 3:10 ` Jeff Dike
2006-06-23 3:10 ` Jeff Dike
2006-06-23 3:31 ` Andrew Morton
2006-06-23 3:31 ` Andrew Morton
2006-06-23 3:50 ` Jeff Dike
2006-06-23 3:50 ` Jeff Dike
2006-06-23 4:01 ` H. Peter Anvin
2006-06-23 4:01 ` H. Peter Anvin
2006-06-23 15:08 ` Jeff Dike
2006-06-23 15:08 ` Jeff Dike
2006-06-23 6:08 ` Linus Torvalds
2006-06-23 6:08 ` Linus Torvalds
2006-06-23 7:27 ` Hugh Dickins
2006-06-23 7:27 ` Hugh Dickins
2006-06-23 17:00 ` Christoph Lameter
2006-06-23 17:00 ` Christoph Lameter
2006-06-23 17:22 ` Peter Zijlstra
2006-06-23 17:22 ` Peter Zijlstra
2006-06-23 17:52 ` Christoph Lameter
2006-06-23 17:52 ` Christoph Lameter
2006-06-23 18:11 ` Martin Bligh
2006-06-23 18:11 ` Martin Bligh
2006-06-23 18:20 ` Linus Torvalds
2006-06-23 18:20 ` Linus Torvalds
2006-06-23 17:56 ` Linus Torvalds
2006-06-23 17:56 ` Linus Torvalds
2006-06-23 18:03 ` Peter Zijlstra
2006-06-23 18:03 ` Peter Zijlstra
2006-06-23 18:23 ` Christoph Lameter
2006-06-23 18:23 ` Christoph Lameter
2006-06-23 18:41 ` Christoph Hellwig
2006-06-23 18:41 ` Christoph Hellwig
2006-06-23 17:49 ` Linus Torvalds
2006-06-23 17:49 ` Linus Torvalds
2006-06-23 18:05 ` Arjan van de Ven
2006-06-23 18:05 ` Arjan van de Ven
2006-06-23 18:08 ` Miklos Szeredi
2006-06-23 18:08 ` Miklos Szeredi
2006-06-23 19:06 ` Hugh Dickins
2006-06-23 19:06 ` Hugh Dickins
2006-06-23 22:00 ` Peter Zijlstra [this message]
2006-06-23 22:00 ` Peter Zijlstra
2006-06-23 22:35 ` Linus Torvalds
2006-06-23 22:35 ` Linus Torvalds
2006-06-23 22:44 ` Peter Zijlstra
2006-06-23 22:44 ` Peter Zijlstra
2006-06-28 14:58 ` [RFC][PATCH] mm: fixup do_wp_page() Peter Zijlstra
2006-06-28 14:58 ` Peter Zijlstra
2006-06-28 18:20 ` Hugh Dickins
2006-06-28 18:20 ` Hugh Dickins
2006-06-19 17:53 ` [PATCH 2/6] mm: balance dirty pages Peter Zijlstra
2006-06-19 17:53 ` Peter Zijlstra
2006-06-19 17:53 ` [PATCH 3/6] mm: msync() cleanup Peter Zijlstra
2006-06-19 17:53 ` Peter Zijlstra
2006-06-22 17:02 ` Hugh Dickins
2006-06-22 17:02 ` Hugh Dickins
2006-06-19 17:53 ` [PATCH 4/6] mm: optimize the new mprotect() code a bit Peter Zijlstra
2006-06-19 17:53 ` Peter Zijlstra
2006-06-22 17:21 ` Hugh Dickins
2006-06-22 17:21 ` Hugh Dickins
2006-06-19 17:53 ` [PATCH 5/6] mm: small cleanup of install_page() Peter Zijlstra
2006-06-19 17:53 ` Peter Zijlstra
2006-06-19 17:53 ` [PATCH 6/6] mm: remove some update_mmu_cache() calls Peter Zijlstra
2006-06-19 17:53 ` Peter Zijlstra
2006-06-22 16:29 ` Hugh Dickins
2006-06-22 16:29 ` Hugh Dickins
2006-06-22 16:37 ` Christoph Lameter
2006-06-22 16:37 ` Christoph Lameter
2006-06-22 17:35 ` Hugh Dickins
2006-06-22 17:35 ` Hugh Dickins
2006-06-22 18:31 ` Christoph Lameter
2006-06-22 18:31 ` Christoph Lameter
-- strict thread matches above, loose matches on Subject: below --
2006-06-23 18:27 [PATCH] mm: tracking shared dirty pages -v10 Brian D. McGrew
2006-06-23 18:27 ` Brian D. McGrew
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1151100017.30819.50.camel@lappy \
--to=a.p.zijlstra@chello.nl \
--cc=akpm@osdl.org \
--cc=christoph@lameter.com \
--cc=dhowells@redhat.com \
--cc=hugh@veritas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mbligh@google.com \
--cc=npiggin@suse.de \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.