All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anshuman Khandual <khandual@linux.vnet.ibm.com>
To: Hugh Dickins <hughd@google.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Linux PPC dev <linuxppc-dev@ozlabs.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	kirill.shutemov@linux.intel.com
Subject: Re: Question on follow_page_mask
Date: Wed, 24 Feb 2016 17:15:07 +0530	[thread overview]
Message-ID: <56CD97C3.6080407@linux.vnet.ibm.com> (raw)
In-Reply-To: <alpine.LSU.2.11.1602231230010.3043@eggly.anvils>

On 02/24/2016 02:37 AM, Hugh Dickins via Linuxppc-dev wrote:
> On Tue, 23 Feb 2016, Kirill A. Shutemov wrote:
>> On Tue, Feb 23, 2016 at 06:45:05PM +0530, Anshuman Khandual wrote:
>>> Not able to understand the first code block of follow_page_mask
>>> function. follow_huge_addr function is expected to find the page
>>> struct for the given address if it turns out to be a HugeTLB page
>>> but then when it finds the page we bug on if it had been called
>>> with FOLL_GET flag.
>>>
>>> 	page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
>>> 	if (!IS_ERR(page)) {
>>> 		BUG_ON(flags & FOLL_GET);
>>> 		return page;
>>> 	}
>>>
>>> do_move_page_to_node_array calls follow_page with FOLL_GET which
>>> in turn calls follow_page_mask with FOLL_GET. On POWER, the
>>> function follow_huge_addr is defined and does not return -EINVAL
>>> like the generic one. It returns the page struct if its a HugeTLB
>>> page. Just curious to know what is the purpose behind the BUG_ON.
>>
>> I would guess requesting pin on non-reclaimable page is considered
>> useless, meaning suspicius behavior. BUG_ON() is overkill, I think.
>> WARN_ON_ONCE() would make it.

Hugh, thanks for such a detailed response.

> 
> No, it's there to guard against abuse, until the correct functionality
> is implemented: which has not so far been required, I think.
> 
> The problem is that a get_page() here is too late: it needs to be done
> inside each arch's implementation of follow_huge_addr(), while holding
> whatever is the appropriate lock, dropped by the time it returns here.

Got it.

> 
> If you look through where FOLL_GET is usually implemented, such as in
> follow_page_pte(), but pud and pmd cases too, I hope you'll still find
> that they are careful to get the reference on the page while it's safe
> in the pagetable.

yeah, true.

> 
> But follow_huge_addr() would need some work to offer the same guarantees:
> it's good for those "peep at a page without actually getting a reference"
> cases, but not good enough for preventing a page for being put to some
> other use completely, before we've secured it with our reference.

Yeah, I understand that now.

> 
> Unless something's changed: the last time I recall the issue coming up,
> was when Naoya Horiguchi was working on hugetlbfs page migration: see
> linux-kernel/linux-mm mail thread "BUG at mm/memory.c:1489!" from
> 28 May 2014; and the resolution there was not to support the
> follow_huge_addr() case (which IIRC is peculiar to powerpc alone?).

Yeah correct, looked into that discussion. Also tried out the discussed
small patch where we do a get_page(page) if the returned page is a head
of a HugeTLB compound page and the flag contains FOLL_GET.

That made the do_move_page_to_node_array function work before hitting a
race condition with the test case in the commit message of e66f17ff7177
("mm/hugetlb: take page table lock in follow_huge_pmd()").

I believe the test exposes the locking problem which is not being taken
care at the follow_huge_addr function level right now on powerpc and
forces it to call follow_huge_pmd (which does a BUG_ON() on powerpc)
after failing to detect a huge page at the PMD level when it checked
the first time around.

  reply	other threads:[~2016-02-24 11:46 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-23 13:15 Question on follow_page_mask Anshuman Khandual
2016-02-23 14:03 ` Kirill A. Shutemov
2016-02-23 21:07   ` Hugh Dickins
2016-02-24 11:45     ` Anshuman Khandual [this message]
2016-02-24 11:22   ` Anshuman Khandual

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56CD97C3.6080407@linux.vnet.ibm.com \
    --to=khandual@linux.vnet.ibm.com \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=n-horiguchi@ah.jp.nec.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.