linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Johannes Weiner <jweiner@redhat.com>
Cc: linux-mm@kvack.org, Mel Gorman <mel@csn.ul.ie>,
	Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH] THP: mremap support and TLB optimization #2
Date: Fri, 5 Aug 2011 17:52:59 +0200	[thread overview]
Message-ID: <20110805155259.GW9770@redhat.com> (raw)
In-Reply-To: <20110805105047.GA32064@redhat.com>

On Fri, Aug 05, 2011 at 12:50:47PM +0200, Johannes Weiner wrote:
> On Thu, Jul 28, 2011 at 04:26:31PM +0200, Andrea Arcangeli wrote:
> > Hello,
> > 
> > this is the latest version of the mremap THP native implementation
> > plus optimizations.
> > 
> > So first question is: am I right, the "- 1" that I am removing below
> > was buggy? It's harmless because these old_end/next are page aligned,
> > but if PAGE_SIZE would be 1, it'd break, right? It's really confusing
> > to read even if it happens to work. Please let me know because that "-
> > 1" ironically it's the thing I'm less comfortable about in this patch.
> 
> > @@ -134,14 +126,17 @@ unsigned long move_page_tables(struct vm
> >  {
> >  	unsigned long extent, next, old_end;
> >  	pmd_t *old_pmd, *new_pmd;
> > +	bool need_flush = false;
> >  
> >  	old_end = old_addr + len;
> >  	flush_cache_range(vma, old_addr, old_end);
> >  
> > +	mmu_notifier_invalidate_range_start(vma->vm_mm, old_addr, old_end);
> > +
> >  	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
> >  		cond_resched();
> >  		next = (old_addr + PMD_SIZE) & PMD_MASK;
> > -		if (next - 1 > old_end)
> > +		if (next > old_end)
> >  			next = old_end;
> 
> If old_addr + PMD_SIZE overflows, next will be zero, thus smaller than
> old_end and not fixed up.  This results in a bogus extent length here:

So basically this -1 is to prevent an overflow and it's relaying on
PAGE_SIZE > 1 to be safe, which is safe assumption.

> >  		extent = next - old_addr;
> 
> which I think can overrun old_addr + len if the extent should have
> actually been smaller than the distance between new_addr and the next
> PMD as well as that LATENCY_LIMIT to which extent is capped a few
> lines below.  I haven't checked all the possibilities, though.
> 
> It could probably be
> 
> 	if (next > old_end || next - 1 > old_end)
> 		next = old_end
> 
> to catch the theoretical next == old_end + 1 case, but PAGE_SIZE > 1
> looks like a sensible assumption to me.

However if old_end == 0, I doubt it could still work safe because next
would be again zero leading to an extreme high extent. It looks like
the last page must not be allowed to be mapped for this to work
safe. sparc is using 0xf0000000 as TASK_SIZE. But being safe on the
PMD is better in case it spans more than 32-4 bits. The pmd_shift on
sparc32 seems at most 23, so it doesn't look problematic. Maybe some
other arch would lead to trouble, but it's possible it never happens
to be problematic and it was just an off by one error as I thought?
For this to break the userland must end less than a PMD_SIZE before
the end of the address space and it's possible no arch is like
that. x86 has pgd_size of 4m on 32bit nopae, and 2m pmd_size for pae
32/64bit and address space 32bit ends at 3g... leaving 1g vs 4m or
>=1g vs 2m.

But hey I prefer to be safe against overflow even if no arch is
actually going to be in trouble and if TASK_SIZE is always set at
least one PMD away from the overflowing point like it seems to happen
on sparc32 too.

So rather than doing -1 which looks more like an off by one error,
it's cleaner to do an explicit overflow check.

This first calculates the next pmd start which may be 0 if it
overflows. Then does 0 - old_addr and stores it in extent. Then
comares old_end-old_addr with extent (where extent is from old_addr to
the next pmd start). If old_end-old_addr is smaller than extent it
stores that into extent to stop at old_end instead of at the next pmd
start. So the -1 goes away.

diff --git a/mm/mremap.c b/mm/mremap.c
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -136,9 +136,10 @@ unsigned long move_page_tables(struct vm
 	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
 		cond_resched();
 		next = (old_addr + PMD_SIZE) & PMD_MASK;
-		if (next > old_end)
-			next = old_end;
+		/* even if next overflowed, extent below will be ok */
 		extent = next - old_addr;
+		if (extent > old_end - old_addr)
+			extent = old_end - old_addr;
 		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
 		if (!old_pmd)
 			continue;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2011-08-05 15:53 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-28 14:26 [PATCH] THP: mremap support and TLB optimization #2 Andrea Arcangeli
2011-08-04 15:32 ` Andrea Arcangeli
2011-08-05 10:50 ` Johannes Weiner
2011-08-05 15:52   ` Andrea Arcangeli [this message]
2011-08-05 15:25 ` Mel Gorman
2011-08-05 16:21   ` Andrea Arcangeli
2011-08-05 17:11     ` Mel Gorman
2011-08-05 18:18       ` Andrea Arcangeli
2011-08-06 16:31   ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110805155259.GW9770@redhat.com \
    --to=aarcange@redhat.com \
    --cc=hughd@google.com \
    --cc=jweiner@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).