From: Greg KH <gregkh@linuxfoundation.org>
To: Andrey Ryabinin <arbn@yandex-team.com>
Cc: stable@vger.kernel.org, aarcange@redhat.com,
	akpm@linux-foundation.org, mgorman@techsingularity.net,
	mhocko@suse.com, rientjes@google.com,
	torvalds@linux-foundation.org
Subject: Re: [PATCH 5.4] mm: mempolicy: fix THP allocations escaping mempolicy restrictions
Date: Mon, 27 Dec 2021 15:08:42 +0100
Message-ID: <YcnI6ppzfa0VydKG@kroah.com>
In-Reply-To: <20211227134539.1447-1-arbn@yandex-team.com>

On Mon, Dec 27, 2021 at 04:45:39PM +0300, Andrey Ryabinin wrote:
> commit 338635340669d5b317c7e8dcf4fff4a0f3651d87 upstream.
> 
> alloc_pages_vma() may try to allocate a THP page on the local NUMA node
> first:
> 
> 	page = __alloc_pages_node(hpage_node,
> 		gfp | __GFP_THISNODE | __GFP_NORETRY, order);
> 
> And if the allocation fails it retries allowing remote memory:
> 
> 	if (!page && (gfp & __GFP_DIRECT_RECLAIM))
> 		page = __alloc_pages_node(hpage_node,
> 					gfp, order);
> 
> However, this retry allocation completely ignores the memory policy
> nodemask, allowing the allocation to escape its restrictions.
> 
> The first appearance of this bug seems to be the commit ac5b2c18911f
> ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings").
> 
> The bug disappeared later in commit 89c83fb539f9 ("mm, thp:
> consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") and
> reappeared in a slightly different form in commit 76e654cc91bb
> ("mm, page_alloc: allow hugepage fallback to remote nodes when
> madvised").
> 
> Fix this by passing correct nodemask to the __alloc_pages() call.
> 
> The demonstration/reproducer of the problem:
> 
>     $ mount -oremount,size=4G,huge=always /dev/shm/
>     $ echo always > /sys/kernel/mm/transparent_hugepage/defrag
>     $ cat mbind_thp.c
>     #include <unistd.h>
>     #include <sys/mman.h>
>     #include <sys/stat.h>
>     #include <fcntl.h>
>     #include <assert.h>
>     #include <stdlib.h>
>     #include <stdio.h>
>     #include <numaif.h>
> 
>     #define SIZE (2ULL << 30)
>     int main(int argc, char **argv)
>     {
>         int fd;
>         unsigned long long i;
>         char *addr;
>         pid_t pid;
>         char buf[100];
>         unsigned long nodemask = 1;
> 
>         fd = open("/dev/shm/test", O_RDWR|O_CREAT, 0600);
>         assert(fd >= 0);
>         assert(ftruncate(fd, SIZE) == 0);
> 
>         addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
>                            MAP_SHARED, fd, 0);
>         assert(addr != MAP_FAILED);
> 
>         assert(mbind(addr, SIZE, MPOL_BIND, &nodemask, 2, MPOL_MF_STRICT|MPOL_MF_MOVE)==0);
>         for (i = 0; i < SIZE; i+=4096) {
>           addr[i] = 1;
>         }
>         pid = getpid();
>         snprintf(buf, sizeof(buf), "grep shm /proc/%d/numa_maps", pid);
>         system(buf);
>         sleep(10000);
> 
>         return 0;
>     }
>     $ gcc mbind_thp.c -o mbind_thp -lnuma
>     $ numactl -H
>     available: 2 nodes (0-1)
>     node 0 cpus: 0 2
>     node 0 size: 1918 MB
>     node 0 free: 1595 MB
>     node 1 cpus: 1 3
>     node 1 size: 2014 MB
>     node 1 free: 1731 MB
>     node distances:
>     node   0   1
>       0:  10  20
>       1:  20  10
>     $ rm -f /dev/shm/test; taskset -c 0 ./mbind_thp
>     7fd970a00000 bind:0 file=/dev/shm/test dirty=524288 active=0 N0=396800 N1=127488 kernelpagesize_kB=4
> 
> Link: https://lkml.kernel.org/r/20211208165343.22349-1-arbn@yandex-team.com
> Fixes: ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Acked-by: Mel Gorman <mgorman@techsingularity.net>
> Acked-by: David Rientjes <rientjes@google.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
> ---
>  mm/mempolicy.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Both backports now queued up, thanks.

greg k-h
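
The numa_maps line at the end of the quoted reproducer reports per-node
page counts (N0=..., N1=...) for a mapping bound to node 0 only. As a
rough aid for reading such output, here is a minimal sketch that sums
those fields and counts the pages that landed outside an allowed node
mask. The helper names parse_numa_maps_line() and pages_outside_mask(),
the MAX_NODES limit and the NUMA_MAPS_DEMO switch are assumptions made
for this illustration; they are not kernel or libnuma interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 64	/* illustrative limit, not taken from the kernel */

/* Hypothetical helper: collect the "N<node>=<pages>" fields of one
 * /proc/<pid>/numa_maps line into pages[] and return their sum. */
static unsigned long parse_numa_maps_line(const char *line,
					  unsigned long pages[MAX_NODES])
{
	char buf[512];
	char *tok, *save = NULL;
	unsigned long total = 0;

	memset(pages, 0, MAX_NODES * sizeof(pages[0]));
	snprintf(buf, sizeof(buf), "%s", line);	/* strtok_r() modifies its input */

	for (tok = strtok_r(buf, " \n", &save); tok;
	     tok = strtok_r(NULL, " \n", &save)) {
		unsigned int node;
		unsigned long count;

		/* Only tokens of the form N<node>=<count> are counted. */
		if (sscanf(tok, "N%u=%lu", &node, &count) == 2 &&
		    node < MAX_NODES) {
			pages[node] += count;
			total += count;
		}
	}
	return total;
}

/* Hypothetical helper: pages on nodes whose bit is clear in 'allowed'. */
static unsigned long pages_outside_mask(const unsigned long pages[MAX_NODES],
					unsigned long long allowed)
{
	unsigned long outside = 0;
	int node;

	for (node = 0; node < MAX_NODES; node++)
		if (!(allowed & (1ULL << node)))
			outside += pages[node];
	return outside;
}

#ifdef NUMA_MAPS_DEMO
int main(void)
{
	/* The line printed by the reproducer in the quoted message. */
	const char *line = "7fd970a00000 bind:0 file=/dev/shm/test "
			   "dirty=524288 active=0 N0=396800 N1=127488 "
			   "kernelpagesize_kB=4";
	unsigned long pages[MAX_NODES];
	unsigned long total = parse_numa_maps_line(line, pages);
	unsigned long outside = pages_outside_mask(pages, 1ULL); /* node 0 only */

	assert(total > 0);
	printf("%lu of %lu pages (%.1f%%) are outside the bound node\n",
	       outside, total, 100.0 * outside / total);
	return 0;
}
#endif
```

Built with "gcc -DNUMA_MAPS_DEMO", the demo feeds in the line from the
quoted message: 127488 of the 524288 4 kB pages (about 24.3% of the 2 GiB
mapping) sit on node 1 despite the MPOL_BIND to node 0. With the fix
applied, one would expect that count to be zero.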

Thread overview: 3+ messages
2021-12-27 12:34 FAILED: patch "[PATCH] mm: mempolicy: fix THP allocations escaping mempolicy" failed to apply to 5.4-stable tree gregkh
2021-12-27 13:45 ` [PATCH 5.4] mm: mempolicy: fix THP allocations escaping mempolicy restrictions Andrey Ryabinin
2021-12-27 14:08   ` Greg KH [this message]
