All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Song Liu <songliubraving@fb.com>
Cc: lkml <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"matthew.wilcox@oracle.com" <matthew.wilcox@oracle.com>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>,
	Kernel Team <Kernel-team@fb.com>,
	"william.kucharski@oracle.com" <william.kucharski@oracle.com>,
	"srikar@linux.vnet.ibm.com" <srikar@linux.vnet.ibm.com>
Subject: Re: [PATCH v10 3/4] mm, thp: introduce FOLL_SPLIT_PMD
Date: Wed, 31 Jul 2019 17:18:43 +0200	[thread overview]
Message-ID: <20190731151842.GB25078@redhat.com> (raw)
In-Reply-To: <1E2B5653-BA85-4A05-9B41-57CF9E48F14A@fb.com>

On 07/30, Song Liu wrote:
>
>
> > On Jul 30, 2019, at 9:11 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > So after the next patch we have a single user of FOLL_SPLIT_PMD (uprobes)
> > and a single user of FOLL_SPLIT: arch/s390/mm/gmap.c:thp_split_mm().
> >
> > Hmm.
>
> I think this is what we want. :)

We? I don't ;)

> FOLL_SPLIT is the fallback solution for users who cannot handle THP.

and again, we have a single user: thp_split_mm(). I do not know if it
can use FOLL_SPLIT_PMD or not, may be you can take a look...

> With
> more THP aware code, there will be fewer users of FOLL_SPLIT.

Fewer than 1? Good ;)

> >> @@ -399,7 +399,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
> >> 		spin_unlock(ptl);
> >> 		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
> >> 	}
> >> -	if (flags & FOLL_SPLIT) {
> >> +	if (flags & (FOLL_SPLIT | FOLL_SPLIT_PMD)) {
> >> 		int ret;
> >> 		page = pmd_page(*pmd);
> >> 		if (is_huge_zero_page(page)) {
> >> @@ -408,7 +408,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
> >> 			split_huge_pmd(vma, pmd, address);
> >> 			if (pmd_trans_unstable(pmd))
> >> 				ret = -EBUSY;
> >> -		} else {
> >> +		} else if (flags & FOLL_SPLIT) {
> >> 			if (unlikely(!try_get_page(page))) {
> >> 				spin_unlock(ptl);
> >> 				return ERR_PTR(-ENOMEM);
> >> @@ -420,6 +420,10 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
> >> 			put_page(page);
> >> 			if (pmd_none(*pmd))
> >> 				return no_page_table(vma, flags);
> >> +		} else {  /* flags & FOLL_SPLIT_PMD */
> >> +			spin_unlock(ptl);
> >> +			split_huge_pmd(vma, pmd, address);
> >> +			ret = pte_alloc(mm, pmd);
> >
> > I fail to understand why this differs from the is_huge_zero_page() case above.
>
> split_huge_pmd() handles is_huge_zero_page() differently. In this case, we
> cannot use the pmd_trans_unstable() check.

Please correct me, but iiuc the problem is not that split_huge_pmd() handles
is_huge_zero_page() differently, the problem is that __split_huge_pmd_locked()
handles the !vma_is_anonymous(vma) differently and returns with pmd_none() = T
after pmdp_huge_clear_flush_notify(). This means that pmd_trans_unstable() will
fail.

Now, I don't understand why do we need pmd_trans_unstable() after
split_huge_pmd(huge-zero-pmd), but whatever reason we have, why can't we
unify both cases?

IOW, could you explain why the path below is wrong?

Oleg.


--- x/mm/gup.c
+++ x/mm/gup.c
@@ -399,14 +399,16 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 		spin_unlock(ptl);
 		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 	}
-	if (flags & FOLL_SPLIT) {
+	if (flags & (FOLL_SPLIT | FOLL_SPLIT_PMD)) {
 		int ret;
 		page = pmd_page(*pmd);
-		if (is_huge_zero_page(page)) {
+		if ((flags & FOLL_SPLIT_PMD) || is_huge_zero_page(page)) {
 			spin_unlock(ptl);
-			ret = 0;
 			split_huge_pmd(vma, pmd, address);
-			if (pmd_trans_unstable(pmd))
+			ret = 0;
+			if (pte_alloc(mm, pmd))
+				ret = -ENOMEM;
+			else if (pmd_trans_unstable(pmd))
 				ret = -EBUSY;
 		} else {
 			if (unlikely(!try_get_page(page))) {


  reply	other threads:[~2019-07-31 15:18 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-30  5:23 [PATCH v10 0/4] THP aware uprobe Song Liu
2019-07-30  5:23 ` [PATCH v10 1/4] mm: move memcmp_pages() and pages_identical() Song Liu
2019-07-30  5:23 ` [PATCH v10 2/4] uprobe: use original page when all uprobes are removed Song Liu
2019-07-30 16:58   ` Oleg Nesterov
2019-07-30  5:23 ` [PATCH v10 3/4] mm, thp: introduce FOLL_SPLIT_PMD Song Liu
2019-07-30 16:11   ` Oleg Nesterov
2019-07-30 17:42     ` Song Liu
2019-07-31 15:18       ` Oleg Nesterov [this message]
2019-07-31 17:10         ` Song Liu
2019-08-01 15:04           ` Oleg Nesterov
2019-07-30  5:23 ` [PATCH v10 4/4] uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190731151842.GB25078@redhat.com \
    --to=oleg@redhat.com \
    --cc=Kernel-team@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=matthew.wilcox@oracle.com \
    --cc=songliubraving@fb.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=william.kucharski@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.