Re: [RFP-V2 0/3] Make mmu_notifier_invalidate_range_start able to sleep.

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Robin Holt <holt@sgi.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Robin Holt <holt@sgi.com>, Christoph Hellwig <hch@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jack Steiner <steiner@sgi.com>,
	linux-mm@kvack.org
Subject: Re: [RFP-V2 0/3] Make mmu_notifier_invalidate_range_start able to sleep.
Date: Tue, 2 Feb 2010 18:48:33 -0600	[thread overview]
Message-ID: <20100203004833.GS6653@sgi.com> (raw)
In-Reply-To: <20100202201718.GQ4135@random.random>

> Yes with mm_count it's better and this way it should be safe. I think
> it's an ok tradeoff, hopefully then nobody will ask to schedule in
> ->invalidate_page. Still it'd be interesting (back to Andrew's
> argument) to understand what is fundamentally different that you are
> ok not to schedule in ->invalidate_page but you absolutely need it
> here. And yes this will break also my transparent hugepage patch that

In the _invalidate_page case, it is called by the kernel from sites where
the kernel is relying upon the reference count to eliminate the page from
use while maintaining the page's data as clean and ready to be released.
If the page is marked as dirty, etc. then the kernel will "do the right
thing" with the page to maintain data consistency.

The _invalidate_range_start/end pairs are used in places where the
caller's address space is being modified.  If we allow the attachers
to continue to use the old pages from the old mapping even for a short
time after the process has started to use the new pages, there would be
silent data corruption.

A difference is the kernel's expectations.  The truncate case is the one
place where the kernel's expectation for _invalidate_range_start/end
more closely matches those of _invalidate_page.  When I was babbling
about a new version of the patch, it basically adds that concept to the
_invalidate_range_start callout as a parameter.  Essentially changing
the bool atomic into a flag indicating the kernel does not expect this
step to be complete prior finishing this callout.

I don't like that second patch which is why I have not posted it.
It relies upon the fuzzy quantity of an "adequate" period of time between
when the file is truncated down before it may be extended again to ensure
data consistency.  Shrink and extend too quickly and problems will ensue.

> can't schedule inside the anon_vma->lock and uses the range calls to
> be safer (then maybe we can require the mmu notifier users to check
> PageTransHuge against the pages and handle the invalidate through
> ->invalidate_page or we can add ->invalidate_transhuge_page.

I don't think that is a problem.  I don't think the GRU has any issues
at all.  I believe that the invalidate even of a standard page size will
eliminate the entire TLB.  Jack was going to verify that the last time
I talked with him.  If it behaved any differently, I would be surprised
as it would be inconsistent with nearly every other TLB out there.

XPMEM will currently not work, but I believe I can get it to work quite
easily as I can walk the segment's PFN table without acquiring any
sleeping locks and decide to expand the page size for any invalidation
within that range.  With that, at the time of the callout, I can schedule
an invalidation of the appropriate size.

As for the transparent huge page patches, I have just skimmed them
lightly as I really don't have much time to understand your intention.
I do think I am agreeing with Christoph Lameter that using the migration
style mechanism is probably better in that it handles the invalidate_page
callouts, follows the page reference counts, and allows my asynchronous
invalidation expectation to persist.

If I read it correctly, your patch would now require invalidate_page
to be complete and have reference counts pushed back down upon return.
I probably missed a key portion of that patch and would welcome the
chance to be informed.

Thanks,
Robin

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2010-02-03  0:48 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20100202040145.555474000@alcatraz.americas.sgi.com>
2010-02-02  4:01 ` [RFP-V2 1/3] Have mmu_notifiers use SRCU so they may safely schedule Robin Holt
2010-02-02  4:01 ` [RFP-V2 2/3] Fix unmap_vma() bug related to mmu_notifiers Robin Holt
2010-02-02  4:01 ` [RFP-V2 3/3] Make mmu_notifier_invalidate_range_start able to sleep Robin Holt
2010-02-02  8:09 ` [RFP-V2 0/3] " Christoph Hellwig
2010-02-02 12:59   ` Andrea Arcangeli
2010-02-02 13:13     ` Andrea Arcangeli
2010-02-02 13:29       ` Robin Holt
2010-02-02 13:40         ` Andrea Arcangeli
2010-02-02 13:51           ` Robin Holt
2010-02-02 14:10             ` Andrea Arcangeli
2010-02-02 14:21               ` Robin Holt
2010-02-02 14:59                 ` Andrea Arcangeli
2010-02-02 15:21                   ` Robin Holt
2010-02-02 16:01                     ` Andrea Arcangeli
2010-02-02 16:39                       ` Robin Holt
2010-02-02 16:52                         ` Andrea Arcangeli
2010-02-02 16:59                           ` Robin Holt
2010-02-02 17:31                             ` Robin Holt
2010-02-02 20:27                               ` Andrea Arcangeli
2010-02-02 20:17                             ` Andrea Arcangeli
2010-02-03  0:48                               ` Robin Holt [this message]
2010-02-03 17:14                                 ` Andrea Arcangeli
2010-02-03 17:18                                   ` Andrea Arcangeli
2010-02-03 19:54                                   ` Robin Holt
2010-02-02 13:23     ` Robin Holt
2010-02-02 13:35   ` Robin Holt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100203004833.GS6653@sgi.com \
    --to=holt@sgi.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hch@infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=steiner@sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).