All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@osdl.org>, Linus Torvalds <torvalds@osdl.org>,
	David Howells <dhowells@redhat.com>,
	linux-mm@kvack.org, Christoph Lameter <clameter@sgi.com>
Subject: Re: tracking dirty pages patches
Date: Wed, 24 May 2006 01:07:07 +0200	[thread overview]
Message-ID: <1148425627.10561.32.camel@lappy> (raw)
In-Reply-To: <Pine.LNX.4.64.0605222022100.11067@blonde.wat.veritas.com>

On Mon, 2006-05-22 at 20:31 +0100, Hugh Dickins wrote:
> Belated observations on your "tracking dirty pages" patches.

Thanks for the thorough examination, I always suspected there was
something I'd overlooked, this being my first foray into these parts of
the code.

> page_wrprotect is a nice use of rmap, but I see a couple of problems.
> One is in the lock ordering (there's info on mm lock ordering at the
> top of filemap.c, but I find the list at the top of rmap.c easier).
> 
> set_page_dirty has always (awkwardly) been liable to be called from
> very low in the hierarchy; whereas you're assuming clear_page_dirty
> is called from much higher up.  And in most cases there's no problem
> (please cross-check to confirm that); but try_to_free_buffers in fs/
> buffer.c calls it while holding mapping->private_lock - page_wrprotect
> called from test_clear_page_dirty then violates the order.
> 
> If we're lucky and that is indeed the only violation, maybe Andrew
> can recommend a change to try_to_free_buffers to avoid it: I have
> no appreciation of the issues at that end myself.

Not really familiar with the code myself either, but from some
inspection it seems safe to do so. ->private_lock seems to serialise
access to the page buffers, not the page state.

Will be in the next version.

> The other worries are in page_wrprotect_one's block
> 	entry = pte_mkclean(pte_wrprotect(*pte));
> 	ptep_establish(vma, address, pte, entry);
> 	update_mmu_cache(vma, address, entry);
> 	lazy_mmu_prot_update(entry);
> ptep_establish, update_mmu_cache and lazy_mmu_prot_update are tricky
> arch-dependent functions which have hitherto only been used on the
> current task mm, whereas you're now using them from (perhaps) another.
> 
> Well, no, I'm wrong: ptrace's get_user_pages has been using them
> from another process; but that's not so common a case as to reassure
> me there won't be issues on some architectures there.

Christoph Lameter has cleared up the waters here, thanks!

> Quite likely ptep_establish and update_mmu_cache are okay for use in
> that way (needs careful checking of arches), at least they take a vma
> argument from which the mm can be found.  Whereas lazy_mmu_prot_update
> looks likely to be wrong, but only does something on ia64: you need
> to consult ia64 mm gurus to check what's needed there.  Maybe it'll
> just be a suboptimal issue (but more important now than in ptrace
> to make it optimal).

Not much here, except to say: thanks for the discussions, they were very
educative.

> Is there a problem with page_wrprotect on VM_LOCKED vmas?  I'm not
> sure: usually VM_LOCKED guarantees no faulting, you abandon that.

Also cleared up by Christoph, he is my hero for today ;-)

> Like others, I don't care for "VM_SharedWritable": you followed the
> VM_ReadHint macros, but this isn't a read hint, and those are weird.
> 
> Personally, I much prefer the explicit
> 	((vma->vm_flags & (VM_SHARED|VM_WRITE)) == (VM_SHARED|VM_WRITE))
> which is the usual style for vm_flags tests throughout mm (except for
> the hugetlb test designed to melt away without HUGETLB).  But I may be
> in a minority on that, Linus did put an is_cow_mapping() in memory.c
> recently, so maybe you'd follow that and say is_shared_writable().

OK, done, is_shared_writable() is it.

> There's a clash and overlap between your "tracking dirty pages" patches
> and David Howell's "add notification of page becoming writable" patch.
> The merge of the two in 2.6.17-rc4-mm1 was wrong: your handle_pte_fault
> change meant it never reached David's page_mkwrite call in do_wp_page.

...

> Please take a look at that patch (David reposted it on linux-kernel
> last Friday, as 08/14 of FS-Cache try #10): I went over it with him
> many months ago, and it fills in at least one gap you're missing...
> 
> mprotect: we all forget mprotect, but mprotect(,,PROT_READ)
> followed by mprotect(,,PROT_WRITE) will give write permission to all
> those ptes you've carefully taken write permission from.  In the
> page_mkwrite patch, we found that was most easily fixed by using
> the !VM_SHARED vm_page_prot in place of the VM_SHARED one.  I
> expect you can simplify your patch a little by doing the same.

Whee, this took me a while to understand, but I think I've got it.
If I do what you propose to do, would there be any remaining users of
the MAP_SHARED part of protection_map left?

I shall try this approach tomorrow if I find some time.

> msync: I rather think that with your changes, if they're to stay,
> then all the page table walking code can be removed from msync -
> since it already skipped vmas which were not VM_SHARED, and there's
> nothing to gain from syncing the !mapping_cap_account_dirty ones.
> I think MS_ASYNC becomes a no-op, and sys_msync so small it won't
> deserve its own msync.c (madvise.c wouldn't be a bad place for it).
> Or am I missing something?

I vaguely remember some discussions on the semantics of these things.
I'll reread and examine the code.

> I'm not convinced that optimize-follow_pages is a worthwhile optimization
> (in some cases you're adding an atomic inc and dec), and it's irrelevant
> to your tracking of dirty pages, but I don't feel strongly about it.
> Except, if it stays then it needs fixing: the flags 0 case is doing
> a put_page without having done a get_page.

Not sure on the benefit either, I just did it to educate myself on the
subject (and blotched it on my way). Christoph kindly fixed the
offending condition.

I guess this patch could really do with some numbers if found that the
set_page_dirty() is needed at all.

> Though currently it seems only some powerpc #ifdef __DEBUG code is using
> follow_pages in that way: since that's not the common case, I think you'd
> best just remove the "if (flags & (FOLL_GET | FOLL_TOUCH))" condition
> from before the get_page.
> 
> (Why does follow_pages set_page_dirty at all?  I _think_ it's in case
> the get_user_pages caller forgets to set_page_dirty when releasing.
> But that's not how we usually write kernel code, to hide mistakes most
> of the time, and your mods may change the balance there.  Andrew will
> remember better whether that set_page_dirty has stronger justification.)
> 
> Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2006-05-23 23:07 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-05-22 19:31 tracking dirty pages patches Hugh Dickins
2006-05-22 20:29 ` Andrew Morton
2006-05-23  8:17   ` Nick Piggin
2006-05-23 14:55   ` Hugh Dickins
2006-05-23 16:24 ` Christoph Lameter
2006-05-23 19:21   ` Hugh Dickins
2006-05-23 19:31     ` Christoph Lameter
2006-05-23 20:34       ` Hugh Dickins
2006-05-23 21:16         ` Christoph Lameter
2006-05-23 21:17         ` Chen, Kenneth W
2006-05-23 21:40           ` update_mmu_cache vs. lazy_mmu_prot_update Christoph Lameter
2006-05-23 21:40             ` Christoph Lameter
2006-05-24 14:12             ` Hugh Dickins
2006-05-24 14:12               ` Hugh Dickins
2006-05-23 22:28         ` remove VM_LOCKED before remap_pfn_range and drop VM_SHM Christoph Lameter
2006-05-23 22:28           ` Christoph Lameter
2006-05-24 14:57           ` Hugh Dickins
2006-05-24 14:57             ` Hugh Dickins
2006-05-24  2:25         ` tracking dirty pages patches Arjan van de Ven
2006-05-24 15:10           ` Hugh Dickins
2006-05-25  2:26             ` Arjan van de Ven
2006-05-23 16:41 ` David Howells
2006-05-23 23:07 ` Peter Zijlstra [this message]
2006-05-24 14:20   ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1148425627.10561.32.camel@lappy \
    --to=a.p.zijlstra@chello.nl \
    --cc=akpm@osdl.org \
    --cc=clameter@sgi.com \
    --cc=dhowells@redhat.com \
    --cc=hugh@veritas.com \
    --cc=linux-mm@kvack.org \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.