From: Andrea Arcangeli <aarcange@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>, Hugh Dickins <hughd@google.com>,
	Larry Woodman <lwoodman@redhat.com>,
	Rik van Riel <riel@redhat.com>,
	Ulrich Obergfell <uobergfe@redhat.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Subject: Re: [PATCH] mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode
Date: Fri, 16 Mar 2012 12:54:18 +0100
Message-ID: <20120316115418.GC24602@redhat.com>
In-Reply-To: <20120315162711.0870c27b.akpm@linux-foundation.org>

On Thu, Mar 15, 2012 at 04:27:11PM -0700, Andrew Morton wrote:
> On Fri, 16 Mar 2012 00:15:56 +0100
> Andrea Arcangeli <aarcange@redhat.com> wrote:
> 
> > On Thu, Mar 15, 2012 at 03:45:04PM -0700, Andrew Morton wrote:
> > > Or do we still need pmd_trans_unstable() checking in
> > > mem_cgroup_count_precharge_pte_range() and
> > > mem_cgroup_move_charge_pte_range()?
> > 
> > I think we need a pmd_trans_unstable check before the
> > pte_offset_map_lock in both places. Otherwise with only the mmap_sem
> > held for reading, the pmd may have been transhuge,
> > mem_cgroup_move_charge_pte_range could be called, and then
> > MADV_DONTNEED would transform the pmd to none from another thread just
> > before pmd_trans_huge_lock runs, and we would end up doing
> > pte_offset_map_lock on a none pmd (or a transhuge pmd if it becomes
> > huge again before we get there).
> 
> page_table_lock doesn't prevent the race?  pmd_trans_huge_lock()
> rechecks after taking that lock...

pmd_trans_huge_lock makes the THP path safe. No change is needed in
that path: after taking the page_table_lock we're safe there and the
pmd stops changing.

The problem is when pmd_trans_huge isn't set when pmd_trans_huge_lock
runs, so we fall back to the pte walk without holding the
page_table_lock. The pte walk then needs a pmd_trans_unstable check
before calling pte_offset_map_lock on the pmd, to skip the pmd in case
the race triggered and the pmd has become none (or transhuge again).
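
To make the ordering concrete, here is a rough userspace model of the
control flow described above. It is only a sketch: every name in it
(model_pte_range, the enums, and so on) is hypothetical and it is not
the kernel code or the patch itself. It just assumes the pmd can be
observed in one of three states at the two points that matter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum pmd_state { PMD_NONE, PMD_TRANS_HUGE, PMD_REGULAR };
enum walk_result { WALK_HUGE_PATH, WALK_SKIPPED, WALK_PTE_PATH };

/*
 * Models the pmd_trans_unstable() idea: true when the observed pmd is
 * not a stable pointer to a pte page (it is none or transhuge).
 */
static bool model_pmd_trans_unstable(enum pmd_state snapshot)
{
	return snapshot == PMD_NONE || snapshot == PMD_TRANS_HUGE;
}

/*
 * at_lock: state observed by the pmd_trans_huge_lock() step
 * at_walk: state observed just before the pte walk would start
 */
static enum walk_result model_pte_range(enum pmd_state at_lock,
					enum pmd_state at_walk)
{
	if (at_lock == PMD_TRANS_HUGE)
		return WALK_HUGE_PATH;	/* handled under page_table_lock */

	/* Fallback path: no page_table_lock held, so recheck first. */
	if (model_pmd_trans_unstable(at_walk))
		return WALK_SKIPPED;	/* race hit: treat the pmd as none */

	/* Only a regular pmd may reach the pte_offset_map_lock step. */
	assert(at_walk == PMD_REGULAR);
	return WALK_PTE_PATH;
}
```

In this model the racy case is at_lock == PMD_NONE with at_walk being
either PMD_NONE or PMD_TRANS_HUGE, and both end up skipped rather than
walked.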

The pmd_trans_unstable check is a noop for builds with THP disabled.

The pmd can transition between none and transhuge freely under any
code holding mmap_sem in read mode. It stops changing only once it
becomes a regular pmd pointing to a pte page (that's because
free_pgtables only runs with mmap_sem in write mode). Only if it is a
regular pmd can we start the pte walk and take the PT lock.
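
Since the real pmd keeps changing until it is regular, the decision
has to be made on a single snapshot of the entry. A minimal userspace
sketch of that idea follows, assuming a made-up bit encoding
(MODEL_PMD_PRESENT, MODEL_PMD_HUGE) and made-up helper names. Real pmd
layouts are arch-specific, and the volatile read here only stands in
for the compiler barrier the kernel would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding for this sketch only. */
#define MODEL_PMD_PRESENT 0x1u
#define MODEL_PMD_HUGE    0x2u

typedef uintptr_t model_pmd_t;

/* Read the entry exactly once so the compiler cannot re-read it. */
static model_pmd_t model_read_pmd_once(const model_pmd_t *pmdp)
{
	return *(const volatile model_pmd_t *)pmdp;
}

/* True only if the single snapshot shows a regular (pte-pointing) pmd. */
static bool model_pmd_is_regular(const model_pmd_t *pmdp)
{
	model_pmd_t snap = model_read_pmd_once(pmdp);

	if (snap == 0)
		return false;		/* none */
	if (snap & MODEL_PMD_HUGE)
		return false;		/* transhuge */
	return (snap & MODEL_PMD_PRESENT) != 0;
}

/* Small usage example for the three states discussed above. */
static void model_demo(void)
{
	model_pmd_t none = 0;
	model_pmd_t huge = MODEL_PMD_PRESENT | MODEL_PMD_HUGE;
	model_pmd_t regular = MODEL_PMD_PRESENT | 0x1000u;

	assert(!model_pmd_is_regular(&none));
	assert(!model_pmd_is_regular(&huge));
	assert(model_pmd_is_regular(&regular));
}
```

All checks run against the local copy, so a concurrent change between
the "none" test and the "huge" test cannot produce an inconsistent
answer.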

> > Only if pmd_trans_unstable is false, the pmd can't change from under
> > us, so we can proceed safely with the pte level walk (and it just needs
> > to be checked with a compiler barrier, as the real pmd changes freely
> > from under us).
> > 
> > pmd_trans_unstable will never actually trigger unless we're hitting
> > the race, if the pmd was none when we started the walk we'd abort at
> > the higher level (method not called), if the pmd was transhuge we'd
> > stop at the pmd_trans_huge_lock() == 1 branch. So the only way to run
> > pmd_trans_unstable is when the result is undefined, i.e. the pmd was
> > not none initially but it became none or transhuge or none again at
> > some point, so we can just simply consider it still none and skip for
> > the undefined case.
> 
> Naoya, could you please take a look into this?

That would help, thanks.
