All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Hugh Dickins <hugh@veritas.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@osdl.org>,
	David Howells <dhowells@redhat.com>,
	Christoph Lameter <christoph@lameter.com>,
	Martin Bligh <mbligh@google.com>, Nick Piggin <npiggin@suse.de>,
	Linus Torvalds <torvalds@osdl.org>
Subject: Re: [PATCH 1/6] mm: tracking shared dirty pages
Date: Fri, 23 Jun 2006 01:02:27 +0200	[thread overview]
Message-ID: <1151017347.15744.135.camel@lappy> (raw)
In-Reply-To: <Pine.LNX.4.64.0606222126310.26805@blonde.wat.veritas.com>

On Thu, 2006-06-22 at 21:52 +0100, Hugh Dickins wrote:
> On Mon, 19 Jun 2006, Peter Zijlstra wrote:

> > +static inline int is_shared_writable(unsigned int flags)
> > +{
> > +	return (flags & (VM_SHARED|VM_WRITE|VM_PFNMAP)) ==
> > +		(VM_SHARED|VM_WRITE);
> > +}
> > +
> 
> Andrew asked for the inclusion of VM_PFNMAP to be commented there,
> I don't believe that's enough: a function called "is_shared_writable"
> should be testing precisely that, or people will misuse it.
> 
> Either you change the name to "is_shared_writable_but_not_pfnmap"
> or somesuch, or you split out the VM_PFNMAP test, or you do away
> with the function and make the tests explicit inline.  As before,
> my instinctive preference is the latter: I really want to see what's
> being tested (especially in do_wp_page); but perhaps it'll just look
> too ugly all over - give it a try and see.

*sight*, thats it, explicit it will be :-)

> > +	/*
> > +	 * This is not fully correct in the light of trapping write faults
> > +	 * for writable shared mappings. However since we're going to mark
> > +	 * the page dirty anyway some few lines downward, we might as well
> > +	 * take the write fault now.
> > +	 */
> 
> I don't understand what you're getting at here: please explain,
> what is not fully correct and why?  In mail first, then we can
> decide what the comment should say, or if it should be removed.
> follow_page isn't making a pte writable, so what's the issue?

I have no idea either, I reread this part earlier today and found it one
big brainfart. It does indeed seem to do the right thing.

> > -	if (unlikely(vma->vm_flags & VM_SHARED)) {
> > +	if (unlikely(is_shared_writable(vma->vm_flags))) {
> 
> Most interesting line in the series, yes, and I'd find it
> easier to think through if it showed the flags test explicitly:
> 	if ((vma->vm_flags & (VM_SHARED|VM_WRITE|VM_PFNMAP)) ==
> 		(VM_SHARED|VM_WRITE))
> 
> Yes, Andrew, you're right it's a change in behaviour from David's
> page_mkwrite patch.  I've realized that when I was originally
> reviewing David's patch, I believed do_wp_page was mistaken to be
> doing COW on VM_SHARED areas.  But Linus has since asserted very
> forcefully that it's intentional, that ptrace poke on a VM_SHARED
> area which is currently not !VM_WRITE should COW it, so I mentioned
> that to Peter.
> 
> Has he got the test right there now?  Ummm... maybe: my brain
> exploded weeks ago.  Several strangenesses collide here, I'll
> try again tomorrow, maybe others will argue it to certainty before.

I don't think the VM_PFNMAP is needed here, but it doesn't hurt either.
Like said, I'll do explicits from now on.

> > @@ -1084,18 +1086,13 @@ munmap_back:
> >  		error = file->f_op->mmap(file, vma);
> >  		if (error)
> >  			goto unmap_and_free_vma;
> > +
> 
> Do you really need this blank line?

:-) uhu..

> > +	/*
> > +	 * Tracking of dirty pages for shared writable mappings. Do this by
> > +	 * write protecting writable pages, and mark dirty in the write fault.
> > +	 *
> > +	 * Modify vma->vm_page_prot (the default protection for new pages)
> > +	 * to this effect.
> > +	 *
> > +	 * Cannot do before because the condition depends on:
> > +	 *  - backing_dev_info having the right capabilities
> > +	 *    (set by f_op->open())
> 
> Is that so, backing_dev_info set by f_op->open()?
> And how would that be a problem here if it were so?

useless information indeed, a remnant from old times when I placed the
vm_page_prot modification between the two calls, shall remove.

> > +	 *  - vma->vm_flags being fully set
> > +	 *    (finished in f_op->mmap(), which could call remap_pfn_range())
> > +	 *
> > +	 *  Also, cannot reset vma->vm_page_prot from vma->vm_flags because
> > +	 *  f_op->mmap() can modify it.
> > +	 */
> > +	if (is_shared_writable(vm_flags) && vma->vm_file)
> > +		mapping = vma->vm_file->f_mapping;
> > +	if ((mapping && mapping_cap_account_dirty(mapping)) ||
> > +			(vma->vm_ops && vma->vm_ops->page_mkwrite))
> 
> The only way "mapping" might be set is just above.
> Wouldn't it all be clearer (though more indented) if you said
> 
> 	if (is_shared_writable(vm_flags) && vma->vm_file) {
> 		mapping = vma->vm_file->f_mapping;
> 		if ((mapping && mapping_cap_account_dirty(mapping)) ||
> 				(vma->vm_ops && vma->vm_ops->page_mkwrite)) {
> 			vma->vm_page_prot = whatever;
> 		}
> 	}
> 
> Or no need for "mapping" here at all if you change
> mapping_cap_account_dirty(vma->vm_file->f_mapping)
> to do the right thing with NULL.

Made it one big if stmt, perhaps too big, we'll see.

> 
> > +		vma->vm_page_prot =
> > +			__pgprot(pte_val
> > +				(pte_wrprotect
> > +				 (__pte(pgprot_val(vma->vm_page_prot)))));
> > +
> 
> In other mail I've suggested saving vm_page_prot above, and
> changing it here only if the driver's ->mmap did not change it.

Yes, that was a very good suggestion and has already been incorporated,
thanks.

> I remain uneasy about interfering with the permissions expected by
> strange drivers, but can't really justify my paranoia.  Certainly
> you're right to exclude VM_PFNMAPs from this interference, that's
> important; I'd be less uneasy if you also exclude VM_INSERTPAGEs,
> they're strange too - but at least they're dealing with proper struct
> pages, so should be able to handle an unexpected do_wp_page; that
> leaves the driver nopage cases, which again should be okay now you're
> (one way or another) protecting specially added vm_page_prot flags.

VM_INSERTPAGE thou shall have.

> I guess I'm just paranoid; it's irritating me that we do not have
> the right backing_dev_infos in place and having to hack around it.

Sad situation but true.

> > +static int page_mkclean_file(struct address_space *mapping, struct page *page)
> > +{
> > +	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> > +	struct vm_area_struct *vma;
> > +	struct prio_tree_iter iter;
> > +	int ret = 0;
> > +
> > +	BUG_ON(PageAnon(page));
> > +
> > +	spin_lock(&mapping->i_mmap_lock);
> > +	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
> > +		int protect = mapping_cap_account_dirty(mapping) &&
> > +			is_shared_writable(vma->vm_flags);
> > +		ret += page_mkclean_one(page, vma, protect);
> 
> You have a good point here, one I'd completely missed: because a vma
> may have been recently mprotected !VM_WRITE, you have to check readonly
> mappings too.  Perhaps worth a comment.  But I think "is_shared_writable"
> is not the best test here: just test for VM_SHARED vmas, they're the
> only ones which can be mprotected to/from shared writable.  And then
> I think you don't need to pass down an additional "protect" argument?
> It's only being called for mapping_cap_account_dirty mappings anyway,
> isn't it?

Well, no, not anymore. I thought to make it actually do what its name
said it does: clean the page's PTEs (I am even pondering about
implementing the anonymous branch).

In that light, its now called for each page.


New patch will follow shortly since I can't seem to sleep anyway...




WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Hugh Dickins <hugh@veritas.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@osdl.org>,
	David Howells <dhowells@redhat.com>,
	Christoph Lameter <christoph@lameter.com>,
	Martin Bligh <mbligh@google.com>, Nick Piggin <npiggin@suse.de>,
	Linus Torvalds <torvalds@osdl.org>
Subject: Re: [PATCH 1/6] mm: tracking shared dirty pages
Date: Fri, 23 Jun 2006 01:02:27 +0200	[thread overview]
Message-ID: <1151017347.15744.135.camel@lappy> (raw)
In-Reply-To: <Pine.LNX.4.64.0606222126310.26805@blonde.wat.veritas.com>

On Thu, 2006-06-22 at 21:52 +0100, Hugh Dickins wrote:
> On Mon, 19 Jun 2006, Peter Zijlstra wrote:

> > +static inline int is_shared_writable(unsigned int flags)
> > +{
> > +	return (flags & (VM_SHARED|VM_WRITE|VM_PFNMAP)) ==
> > +		(VM_SHARED|VM_WRITE);
> > +}
> > +
> 
> Andrew asked for the inclusion of VM_PFNMAP to be commented there,
> I don't believe that's enough: a function called "is_shared_writable"
> should be testing precisely that, or people will misuse it.
> 
> Either you change the name to "is_shared_writable_but_not_pfnmap"
> or somesuch, or you split out the VM_PFNMAP test, or you do away
> with the function and make the tests explicit inline.  As before,
> my instinctive preference is the latter: I really want to see what's
> being tested (especially in do_wp_page); but perhaps it'll just look
> too ugly all over - give it a try and see.

*sight*, thats it, explicit it will be :-)

> > +	/*
> > +	 * This is not fully correct in the light of trapping write faults
> > +	 * for writable shared mappings. However since we're going to mark
> > +	 * the page dirty anyway some few lines downward, we might as well
> > +	 * take the write fault now.
> > +	 */
> 
> I don't understand what you're getting at here: please explain,
> what is not fully correct and why?  In mail first, then we can
> decide what the comment should say, or if it should be removed.
> follow_page isn't making a pte writable, so what's the issue?

I have no idea either, I reread this part earlier today and found it one
big brainfart. It does indeed seem to do the right thing.

> > -	if (unlikely(vma->vm_flags & VM_SHARED)) {
> > +	if (unlikely(is_shared_writable(vma->vm_flags))) {
> 
> Most interesting line in the series, yes, and I'd find it
> easier to think through if it showed the flags test explicitly:
> 	if ((vma->vm_flags & (VM_SHARED|VM_WRITE|VM_PFNMAP)) ==
> 		(VM_SHARED|VM_WRITE))
> 
> Yes, Andrew, you're right it's a change in behaviour from David's
> page_mkwrite patch.  I've realized that when I was originally
> reviewing David's patch, I believed do_wp_page was mistaken to be
> doing COW on VM_SHARED areas.  But Linus has since asserted very
> forcefully that it's intentional, that ptrace poke on a VM_SHARED
> area which is currently not !VM_WRITE should COW it, so I mentioned
> that to Peter.
> 
> Has he got the test right there now?  Ummm... maybe: my brain
> exploded weeks ago.  Several strangenesses collide here, I'll
> try again tomorrow, maybe others will argue it to certainty before.

I don't think the VM_PFNMAP is needed here, but it doesn't hurt either.
Like said, I'll do explicits from now on.

> > @@ -1084,18 +1086,13 @@ munmap_back:
> >  		error = file->f_op->mmap(file, vma);
> >  		if (error)
> >  			goto unmap_and_free_vma;
> > +
> 
> Do you really need this blank line?

:-) uhu..

> > +	/*
> > +	 * Tracking of dirty pages for shared writable mappings. Do this by
> > +	 * write protecting writable pages, and mark dirty in the write fault.
> > +	 *
> > +	 * Modify vma->vm_page_prot (the default protection for new pages)
> > +	 * to this effect.
> > +	 *
> > +	 * Cannot do before because the condition depends on:
> > +	 *  - backing_dev_info having the right capabilities
> > +	 *    (set by f_op->open())
> 
> Is that so, backing_dev_info set by f_op->open()?
> And how would that be a problem here if it were so?

useless information indeed, a remnant from old times when I placed the
vm_page_prot modification between the two calls, shall remove.

> > +	 *  - vma->vm_flags being fully set
> > +	 *    (finished in f_op->mmap(), which could call remap_pfn_range())
> > +	 *
> > +	 *  Also, cannot reset vma->vm_page_prot from vma->vm_flags because
> > +	 *  f_op->mmap() can modify it.
> > +	 */
> > +	if (is_shared_writable(vm_flags) && vma->vm_file)
> > +		mapping = vma->vm_file->f_mapping;
> > +	if ((mapping && mapping_cap_account_dirty(mapping)) ||
> > +			(vma->vm_ops && vma->vm_ops->page_mkwrite))
> 
> The only way "mapping" might be set is just above.
> Wouldn't it all be clearer (though more indented) if you said
> 
> 	if (is_shared_writable(vm_flags) && vma->vm_file) {
> 		mapping = vma->vm_file->f_mapping;
> 		if ((mapping && mapping_cap_account_dirty(mapping)) ||
> 				(vma->vm_ops && vma->vm_ops->page_mkwrite)) {
> 			vma->vm_page_prot = whatever;
> 		}
> 	}
> 
> Or no need for "mapping" here at all if you change
> mapping_cap_account_dirty(vma->vm_file->f_mapping)
> to do the right thing with NULL.

Made it one big if stmt, perhaps too big, we'll see.

> 
> > +		vma->vm_page_prot =
> > +			__pgprot(pte_val
> > +				(pte_wrprotect
> > +				 (__pte(pgprot_val(vma->vm_page_prot)))));
> > +
> 
> In other mail I've suggested saving vm_page_prot above, and
> changing it here only if the driver's ->mmap did not change it.

Yes, that was a very good suggestion and has already been incorporated,
thanks.

> I remain uneasy about interfering with the permissions expected by
> strange drivers, but can't really justify my paranoia.  Certainly
> you're right to exclude VM_PFNMAPs from this interference, that's
> important; I'd be less uneasy if you also exclude VM_INSERTPAGEs,
> they're strange too - but at least they're dealing with proper struct
> pages, so should be able to handle an unexpected do_wp_page; that
> leaves the driver nopage cases, which again should be okay now you're
> (one way or another) protecting specially added vm_page_prot flags.

VM_INSERTPAGE thou shall have.

> I guess I'm just paranoid; it's irritating me that we do not have
> the right backing_dev_infos in place and having to hack around it.

Sad situation but true.

> > +static int page_mkclean_file(struct address_space *mapping, struct page *page)
> > +{
> > +	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> > +	struct vm_area_struct *vma;
> > +	struct prio_tree_iter iter;
> > +	int ret = 0;
> > +
> > +	BUG_ON(PageAnon(page));
> > +
> > +	spin_lock(&mapping->i_mmap_lock);
> > +	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
> > +		int protect = mapping_cap_account_dirty(mapping) &&
> > +			is_shared_writable(vma->vm_flags);
> > +		ret += page_mkclean_one(page, vma, protect);
> 
> You have a good point here, one I'd completely missed: because a vma
> may have been recently mprotected !VM_WRITE, you have to check readonly
> mappings too.  Perhaps worth a comment.  But I think "is_shared_writable"
> is not the best test here: just test for VM_SHARED vmas, they're the
> only ones which can be mprotected to/from shared writable.  And then
> I think you don't need to pass down an additional "protect" argument?
> It's only being called for mapping_cap_account_dirty mappings anyway,
> isn't it?

Well, no, not anymore. I thought to make it actually do what its name
said it does: clean the page's PTEs (I am even pondering about
implementing the anonymous branch).

In that light, its now called for each page.


New patch will follow shortly since I can't seem to sleep anyway...



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2006-06-22 23:02 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-06-19 17:52 [PATCH 0/6] mm: tracking dirty pages -v9 Peter Zijlstra
2006-06-19 17:52 ` Peter Zijlstra
2006-06-19 17:52 ` [PATCH 1/6] mm: tracking shared dirty pages Peter Zijlstra
2006-06-19 17:52   ` Peter Zijlstra
2006-06-22  5:56   ` Andrew Morton
2006-06-22  5:56     ` Andrew Morton
2006-06-22  6:07     ` Christoph Lameter
2006-06-22  6:07       ` Christoph Lameter
2006-06-22  6:15       ` Andrew Morton
2006-06-22  6:15         ` Andrew Morton
2006-06-22 11:33     ` Peter Zijlstra
2006-06-22 11:33       ` Peter Zijlstra
2006-06-22 13:17       ` Hugh Dickins
2006-06-22 13:17         ` Hugh Dickins
2006-06-22 20:52   ` Hugh Dickins
2006-06-22 20:52     ` Hugh Dickins
2006-06-22 23:02     ` Peter Zijlstra [this message]
2006-06-22 23:02       ` Peter Zijlstra
2006-06-22 23:39     ` [PATCH] mm: tracking shared dirty pages -v10 Peter Zijlstra
2006-06-22 23:39       ` Peter Zijlstra
2006-06-23  3:10       ` Jeff Dike
2006-06-23  3:10         ` Jeff Dike
2006-06-23  3:31         ` Andrew Morton
2006-06-23  3:31           ` Andrew Morton
2006-06-23  3:50           ` Jeff Dike
2006-06-23  3:50             ` Jeff Dike
2006-06-23  4:01           ` H. Peter Anvin
2006-06-23  4:01             ` H. Peter Anvin
2006-06-23 15:08             ` Jeff Dike
2006-06-23 15:08               ` Jeff Dike
2006-06-23  6:08       ` Linus Torvalds
2006-06-23  6:08         ` Linus Torvalds
2006-06-23  7:27         ` Hugh Dickins
2006-06-23  7:27           ` Hugh Dickins
2006-06-23 17:00           ` Christoph Lameter
2006-06-23 17:00             ` Christoph Lameter
2006-06-23 17:22             ` Peter Zijlstra
2006-06-23 17:22               ` Peter Zijlstra
2006-06-23 17:52               ` Christoph Lameter
2006-06-23 17:52                 ` Christoph Lameter
2006-06-23 18:11                 ` Martin Bligh
2006-06-23 18:11                   ` Martin Bligh
2006-06-23 18:20                   ` Linus Torvalds
2006-06-23 18:20                     ` Linus Torvalds
2006-06-23 17:56               ` Linus Torvalds
2006-06-23 17:56                 ` Linus Torvalds
2006-06-23 18:03                 ` Peter Zijlstra
2006-06-23 18:03                   ` Peter Zijlstra
2006-06-23 18:23                   ` Christoph Lameter
2006-06-23 18:23                     ` Christoph Lameter
2006-06-23 18:41                 ` Christoph Hellwig
2006-06-23 18:41                   ` Christoph Hellwig
2006-06-23 17:49           ` Linus Torvalds
2006-06-23 17:49             ` Linus Torvalds
2006-06-23 18:05             ` Arjan van de Ven
2006-06-23 18:05               ` Arjan van de Ven
2006-06-23 18:08             ` Miklos Szeredi
2006-06-23 18:08               ` Miklos Szeredi
2006-06-23 19:06       ` Hugh Dickins
2006-06-23 19:06         ` Hugh Dickins
2006-06-23 22:00         ` Peter Zijlstra
2006-06-23 22:00           ` Peter Zijlstra
2006-06-23 22:35           ` Linus Torvalds
2006-06-23 22:35             ` Linus Torvalds
2006-06-23 22:44             ` Peter Zijlstra
2006-06-23 22:44               ` Peter Zijlstra
2006-06-28 14:58         ` [RFC][PATCH] mm: fixup do_wp_page() Peter Zijlstra
2006-06-28 14:58           ` Peter Zijlstra
2006-06-28 18:20           ` Hugh Dickins
2006-06-28 18:20             ` Hugh Dickins
2006-06-19 17:53 ` [PATCH 2/6] mm: balance dirty pages Peter Zijlstra
2006-06-19 17:53   ` Peter Zijlstra
2006-06-19 17:53 ` [PATCH 3/6] mm: msync() cleanup Peter Zijlstra
2006-06-19 17:53   ` Peter Zijlstra
2006-06-22 17:02   ` Hugh Dickins
2006-06-22 17:02     ` Hugh Dickins
2006-06-19 17:53 ` [PATCH 4/6] mm: optimize the new mprotect() code a bit Peter Zijlstra
2006-06-19 17:53   ` Peter Zijlstra
2006-06-22 17:21   ` Hugh Dickins
2006-06-22 17:21     ` Hugh Dickins
2006-06-19 17:53 ` [PATCH 5/6] mm: small cleanup of install_page() Peter Zijlstra
2006-06-19 17:53   ` Peter Zijlstra
2006-06-19 17:53 ` [PATCH 6/6] mm: remove some update_mmu_cache() calls Peter Zijlstra
2006-06-19 17:53   ` Peter Zijlstra
2006-06-22 16:29   ` Hugh Dickins
2006-06-22 16:29     ` Hugh Dickins
2006-06-22 16:37     ` Christoph Lameter
2006-06-22 16:37       ` Christoph Lameter
2006-06-22 17:35       ` Hugh Dickins
2006-06-22 17:35         ` Hugh Dickins
2006-06-22 18:31         ` Christoph Lameter
2006-06-22 18:31           ` Christoph Lameter
  -- strict thread matches above, loose matches on Subject: below --
2006-06-28 20:17 [PATCH 0/6] mm: tracking dirty pages -v14 Peter Zijlstra
2006-06-28 20:17 ` [PATCH 1/6] mm: tracking shared dirty pages Peter Zijlstra
2006-06-28 20:17   ` Peter Zijlstra
2006-06-13 11:21 [PATCH 0/6] mm: tracking dirty pages -v8 Peter Zijlstra
2006-06-13 11:21 ` [PATCH 1/6] mm: tracking shared dirty pages Peter Zijlstra
2006-06-13 11:21   ` Peter Zijlstra
2006-06-13 12:03   ` Andi Kleen
2006-06-13 12:15     ` Peter Zijlstra
2006-06-13 12:24       ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1151017347.15744.135.camel@lappy \
    --to=a.p.zijlstra@chello.nl \
    --cc=akpm@osdl.org \
    --cc=christoph@lameter.com \
    --cc=dhowells@redhat.com \
    --cc=hugh@veritas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mbligh@google.com \
    --cc=npiggin@suse.de \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.