All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bob Liu <bob.liu@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: "Fengguang Wu" <fengguang.wu@intel.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Jörn Engel" <joern@logfs.org>, "Mel Gorman" <mgorman@suse.de>,
	"Michel Lespinasse" <walken@google.com>,
	"Hugh Dickins" <hughd@google.com>,
	"Rik van Riel" <riel@redhat.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Hocko" <mhocko@suse.cz>
Subject: Re: [RFC PATCH RESEND] mm: munlock: Prevent walking off the end of a pagetable in no-pmd configuration
Date: Wed, 18 Sep 2013 09:17:46 +0800	[thread overview]
Message-ID: <5238FF3A.2070500@oracle.com> (raw)
In-Reply-To: <1379427739-31451-1-git-send-email-vbabka@suse.cz>

On 09/17/2013 10:22 PM, Vlastimil Babka wrote:
> The function __munlock_pagevec_fill() introduced in commit 7a8010cd3
> ("mm: munlock: manual pte walk in fast path instead of follow_page_mask()")
> uses pmd_addr_end() for restricting its operation within current page table.
> This is insufficient on architectures/configurations where pmd is folded
> and pmd_addr_end() just returns the end of the full range to be walked. In
> this case, it allows pte++ to walk off the end of a page table resulting in
> unpredictable behaviour.
> 
> This patch fixes the function by using pgd_addr_end() and pud_addr_end()
> before pmd_addr_end(), which will yield correct page table boundary on all
> configurations. This is similar to what existing page walkers do when walking
> each level of the page table.
> 
> Additionaly, the patch clarifies a comment for get_locked_pte() call in the
> function.
> 
> Reported-by: Fengguang Wu <fengguang.wu@intel.com>
> Cc: JA?rn Engel <joern@logfs.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/mlock.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/mlock.c b/mm/mlock.c
> index d638026..758c0fc 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -379,10 +379,14 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
>  
>  	/*
>  	 * Initialize pte walk starting at the already pinned page where we
> -	 * are sure that there is a pte.
> +	 * are sure that there is a pte, as it was pinned under the same
> +	 * mmap_sem write op.
>  	 */
>  	pte = get_locked_pte(vma->vm_mm, start,	&ptl);
> -	end = min(end, pmd_addr_end(start, end));
> +	/* Make sure we do not cross the page table boundary */
> +	end = pgd_addr_end(start, end);
> +	end = pud_addr_end(start, end);
> +	end = pmd_addr_end(start, end);
>  

Nitpick, how about unfolding pmd_addr_end(start, end) directly? Like:

--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -376,13 +376,14 @@ static unsigned long __munlock_pagevec_fill(struct
pagevec *pvec,
 {
        pte_t *pte;
        spinlock_t *ptl;
+       unsigned long pmd_end = (start + PMD_SIZE) & PMD_MASK;
+       end = (pmd_end - 1 < end - 1) ? pmd_end : end;

        /*
         * Initialize pte walk starting at the already pinned page where we
         * are sure that there is a pte.
         */
        pte = get_locked_pte(vma->vm_mm, start, &ptl);
-       end = min(end, pmd_addr_end(start, end));

        /* The page next to the pinned page is the first we will try to
get */
        start += PAGE_SIZE;

Anyway,
Reviewed-by: Bob Liu <bob.liu@oracle.com>

-- 
Regards,
-Bob

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Bob Liu <bob.liu@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: "Fengguang Wu" <fengguang.wu@intel.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Jörn Engel" <joern@logfs.org>, "Mel Gorman" <mgorman@suse.de>,
	"Michel Lespinasse" <walken@google.com>,
	"Hugh Dickins" <hughd@google.com>,
	"Rik van Riel" <riel@redhat.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Hocko" <mhocko@suse.cz>
Subject: Re: [RFC PATCH RESEND] mm: munlock: Prevent walking off the end of a pagetable in no-pmd configuration
Date: Wed, 18 Sep 2013 09:17:46 +0800	[thread overview]
Message-ID: <5238FF3A.2070500@oracle.com> (raw)
In-Reply-To: <1379427739-31451-1-git-send-email-vbabka@suse.cz>

On 09/17/2013 10:22 PM, Vlastimil Babka wrote:
> The function __munlock_pagevec_fill() introduced in commit 7a8010cd3
> ("mm: munlock: manual pte walk in fast path instead of follow_page_mask()")
> uses pmd_addr_end() for restricting its operation within current page table.
> This is insufficient on architectures/configurations where pmd is folded
> and pmd_addr_end() just returns the end of the full range to be walked. In
> this case, it allows pte++ to walk off the end of a page table resulting in
> unpredictable behaviour.
> 
> This patch fixes the function by using pgd_addr_end() and pud_addr_end()
> before pmd_addr_end(), which will yield correct page table boundary on all
> configurations. This is similar to what existing page walkers do when walking
> each level of the page table.
> 
> Additionaly, the patch clarifies a comment for get_locked_pte() call in the
> function.
> 
> Reported-by: Fengguang Wu <fengguang.wu@intel.com>
> Cc: Jörn Engel <joern@logfs.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Michel Lespinasse <walken@google.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/mlock.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/mlock.c b/mm/mlock.c
> index d638026..758c0fc 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -379,10 +379,14 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
>  
>  	/*
>  	 * Initialize pte walk starting at the already pinned page where we
> -	 * are sure that there is a pte.
> +	 * are sure that there is a pte, as it was pinned under the same
> +	 * mmap_sem write op.
>  	 */
>  	pte = get_locked_pte(vma->vm_mm, start,	&ptl);
> -	end = min(end, pmd_addr_end(start, end));
> +	/* Make sure we do not cross the page table boundary */
> +	end = pgd_addr_end(start, end);
> +	end = pud_addr_end(start, end);
> +	end = pmd_addr_end(start, end);
>  

Nitpick, how about unfolding pmd_addr_end(start, end) directly? Like:

--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -376,13 +376,14 @@ static unsigned long __munlock_pagevec_fill(struct
pagevec *pvec,
 {
        pte_t *pte;
        spinlock_t *ptl;
+       unsigned long pmd_end = (start + PMD_SIZE) & PMD_MASK;
+       end = (pmd_end - 1 < end - 1) ? pmd_end : end;

        /*
         * Initialize pte walk starting at the already pinned page where we
         * are sure that there is a pte.
         */
        pte = get_locked_pte(vma->vm_mm, start, &ptl);
-       end = min(end, pmd_addr_end(start, end));

        /* The page next to the pinned page is the first we will try to
get */
        start += PAGE_SIZE;

Anyway,
Reviewed-by: Bob Liu <bob.liu@oracle.com>

-- 
Regards,
-Bob

  reply	other threads:[~2013-09-18  1:18 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-16  8:47 [munlock] BUG: Bad page map in process killall5 pte:53425553 pmd:075f4067 Fengguang Wu
2013-09-16  8:47 ` Fengguang Wu
2013-09-16 15:27 ` Vlastimil Babka
2013-09-16 15:27   ` Vlastimil Babka
2013-09-17 13:29   ` Fengguang Wu
2013-09-17 13:34     ` Vlastimil Babka
2013-09-17 13:34       ` Vlastimil Babka
2013-09-17 14:22       ` [RFC PATCH RESEND] mm: munlock: Prevent walking off the end of a pagetable in no-pmd configuration Vlastimil Babka
2013-09-17 14:22         ` Vlastimil Babka
2013-09-18  1:17         ` Bob Liu [this message]
2013-09-18  1:17           ` Bob Liu
2013-09-20  9:04           ` Vlastimil Babka
2013-09-20  9:04             ` Vlastimil Babka
2013-09-20  9:16 ` [PATCH] " Vlastimil Babka
2013-09-20  9:16   ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5238FF3A.2070500@oracle.com \
    --to=bob.liu@oracle.com \
    --cc=fengguang.wu@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=joern@logfs.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=riel@redhat.com \
    --cc=vbabka@suse.cz \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.