All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	Hillf Danton <dhillf@gmail.com>, Hugh Dickins <hughd@google.com>,
	Dave Jones <davej@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>, Rik van Riel <riel@redhat.com>
Subject: Re: oops in copy_page_rep()
Date: Wed, 9 Jan 2013 11:44:13 +0000	[thread overview]
Message-ID: <20130109114413.GA13475@suse.de> (raw)
In-Reply-To: <CA+55aFzaTvF7nYxWBT-G_b=xGz+_akRAeJ=U9iHy+Y=ZPo=pbA@mail.gmail.com>

On Tue, Jan 08, 2013 at 08:52:14AM -0800, Linus Torvalds wrote:
> On Tue, Jan 8, 2013 at 8:31 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >>
> >> Heh. I was more thinking about why do_huge_pmd_wp_page() needs it, but
> >> do_huge_pmd_numa_page() does not.
> >
> > It does. The check should be moved up.
> >
> >> Also, do we actually need it for huge_pmd_set_accessed()? The
> >> *placement* of that thing confuses me. And because it confuses me, I'd
> >> like to understand it.
> >
> > We need it for huge_pmd_set_accessed() too.
> >
> > Looks like a mis-merge. The original patch for huge_pmd_set_accessed() was
> > correct: http://lkml.org/lkml/2012/10/25/402
> 
> Not a merge error: the pmd_trans_splitting() check was removed by
> commit d10e63f29488 ("mm: numa: Create basic numa page hinting
> infrastructure").
> 
> Now, *why* it was removed, I can't tell. And it's not clear why the
> original code just had it in a conditional, while the suggested patch
> has that "goto repeat" thing.

It was a mistake by me to remove it and as I screwed up in October I no
longer remember how I managed it.

The retry versus "goto repeat" is a detail. By retrying the full fault
there is a possibility the split will still be in progress on fault
retry or that a new THP is collapsed underneath and a new split started
while the mmap_sem is released but both are unlikely. On the other side,
taking the anon_vma rwsem for write in wait_split_huge_page() could cause
delays elsewhere that would be almost impossible to detect so it is not
necessarily better. Retrying the fault as your patch does is reasonable.

> I suspect re-trying the fault (which I
> assume the original code did) is actually better, because that way you
> go through all the "should I reschedule as I return through the
> exception" stuff. I dunno.
> 
> Mel, that original patch came from you , although it was based on
> previous work by Peter/Ingo/Andrea. Can you walk us through the
> history and thinking about the loss of pmd_trans_splitting(). Was it
> purely a mistake? It looks intentional.
> 

Mistake. Andrea, Peter and Ingo did not make similar mistakes.

Looking at your patch, I also think that the check needs to be made before
the call to do_huge_pmd_numa_page() so it can reply on a pmd_same() check
to make sure a split did not start before the page table lock was taken.

In response you said to Andrea

	Also, and more fundamentally, since do_pmd_numa_page() doesn't
	take the orig_pmd thing as an argument (and re-check it under the
	page-table lock), testing pmd_trans_splitting() on it is pointless,
	since it can change later.

do_pmd_numa_page() is called for a normal PMD that is marked pmd_numa(), not
a THP PMD. As the mmap_sem is held it cannot collapse to a THP underneath us
after the pmd_trans_huge() check so it should be unnecessary to check
pmd_trans_splitting() there.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Mel Gorman <mgorman@suse.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	Hillf Danton <dhillf@gmail.com>, Hugh Dickins <hughd@google.com>,
	Dave Jones <davej@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>, Rik van Riel <riel@redhat.com>
Subject: Re: oops in copy_page_rep()
Date: Wed, 9 Jan 2013 11:44:13 +0000	[thread overview]
Message-ID: <20130109114413.GA13475@suse.de> (raw)
In-Reply-To: <CA+55aFzaTvF7nYxWBT-G_b=xGz+_akRAeJ=U9iHy+Y=ZPo=pbA@mail.gmail.com>

On Tue, Jan 08, 2013 at 08:52:14AM -0800, Linus Torvalds wrote:
> On Tue, Jan 8, 2013 at 8:31 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >>
> >> Heh. I was more thinking about why do_huge_pmd_wp_page() needs it, but
> >> do_huge_pmd_numa_page() does not.
> >
> > It does. The check should be moved up.
> >
> >> Also, do we actually need it for huge_pmd_set_accessed()? The
> >> *placement* of that thing confuses me. And because it confuses me, I'd
> >> like to understand it.
> >
> > We need it for huge_pmd_set_accessed() too.
> >
> > Looks like a mis-merge. The original patch for huge_pmd_set_accessed() was
> > correct: http://lkml.org/lkml/2012/10/25/402
> 
> Not a merge error: the pmd_trans_splitting() check was removed by
> commit d10e63f29488 ("mm: numa: Create basic numa page hinting
> infrastructure").
> 
> Now, *why* it was removed, I can't tell. And it's not clear why the
> original code just had it in a conditional, while the suggested patch
> has that "goto repeat" thing.

It was a mistake by me to remove it and as I screwed up in October I no
longer remember how I managed it.

The retry versus "goto repeat" is a detail. By retrying the full fault
there is a possibility the split will still be in progress on fault
retry or that a new THP is collapsed underneath and a new split started
while the mmap_sem is released but both are unlikely. On the other side,
taking the anon_vma rwsem for write in wait_split_huge_page() could cause
delays elsewhere that would be almost impossible to detect so it is not
necessarily better. Retrying the fault as your patch does is reasonable.

> I suspect re-trying the fault (which I
> assume the original code did) is actually better, because that way you
> go through all the "should I reschedule as I return through the
> exception" stuff. I dunno.
> 
> Mel, that original patch came from you , although it was based on
> previous work by Peter/Ingo/Andrea. Can you walk us through the
> history and thinking about the loss of pmd_trans_splitting(). Was it
> purely a mistake? It looks intentional.
> 

Mistake. Andrea, Peter and Ingo did not make similar mistakes.

Looking at your patch, I also think that the check needs to be made before
the call to do_huge_pmd_numa_page() so it can reply on a pmd_same() check
to make sure a split did not start before the page table lock was taken.

In response you said to Andrea

	Also, and more fundamentally, since do_pmd_numa_page() doesn't
	take the orig_pmd thing as an argument (and re-check it under the
	page-table lock), testing pmd_trans_splitting() on it is pointless,
	since it can change later.

do_pmd_numa_page() is called for a normal PMD that is marked pmd_numa(), not
a THP PMD. As the mmap_sem is held it cannot collapse to a THP underneath us
after the pmd_trans_huge() check so it should be unnecessary to check
pmd_trans_splitting() there.

-- 
Mel Gorman
SUSE Labs

  parent reply	other threads:[~2013-01-09 11:44 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-05 15:22 oops in copy_page_rep() Dave Jones
2013-01-06  3:57 ` Linus Torvalds
2013-01-07  0:37   ` Dave Jones
2013-01-07  3:38     ` Hugh Dickins
2013-01-06 11:55 ` Hillf Danton
2013-01-06 16:10   ` Dave Jones
2013-01-06 19:06   ` Hugh Dickins
2013-01-07 12:24     ` Hillf Danton
2013-01-07 17:34       ` Linus Torvalds
2013-01-08 13:04         ` Hillf Danton
2013-01-08 13:04           ` Hillf Danton
2013-01-08 15:37           ` Linus Torvalds
2013-01-08 15:37             ` Linus Torvalds
2013-01-08 16:31             ` Kirill A. Shutemov
2013-01-08 16:31               ` Kirill A. Shutemov
2013-01-08 16:52               ` Linus Torvalds
2013-01-08 16:52                 ` Linus Torvalds
2013-01-08 17:30                 ` Kirill A. Shutemov
2013-01-08 17:30                   ` Kirill A. Shutemov
2013-01-08 17:38                   ` Linus Torvalds
2013-01-08 17:38                     ` Linus Torvalds
2013-01-08 17:49                   ` Andrea Arcangeli
2013-01-08 17:49                     ` Andrea Arcangeli
2013-01-08 18:03                     ` Kirill A. Shutemov
2013-01-08 18:03                       ` Kirill A. Shutemov
2013-01-11  7:50                     ` Simon Jeons
2013-01-11  7:50                       ` Simon Jeons
2013-01-11 14:01                       ` Andrea Arcangeli
2013-01-11 14:01                         ` Andrea Arcangeli
2013-01-08 17:37                 ` Andrea Arcangeli
2013-01-08 17:37                   ` Andrea Arcangeli
2013-01-08 17:51                   ` Linus Torvalds
2013-01-08 18:03                     ` Andrea Arcangeli
2013-01-08 18:03                       ` Andrea Arcangeli
2013-01-08 18:21                       ` Linus Torvalds
2013-01-08 18:21                         ` Linus Torvalds
2013-01-09 11:38                         ` Hillf Danton
2013-01-09 11:38                           ` Hillf Danton
2013-01-09  4:23                   ` Hugh Dickins
2013-01-09  4:23                     ` Hugh Dickins
2013-01-09 11:44                 ` Mel Gorman [this message]
2013-01-09 11:44                   ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130109114413.GA13475@suse.de \
    --to=mgorman@suse.de \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=davej@redhat.com \
    --cc=dhillf@gmail.com \
    --cc=hughd@google.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.