From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Alex Thorlton <athorlton@sgi.com>
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Hugh Dickins <hughd@google.com>, Bob Liu <lliubbo@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
linux-mm@kvack.org
Subject: Re: [BUG] mm, thp: khugepaged can't allocate on requested node when confined to a cpuset
Date: Tue, 14 Oct 2014 14:48:28 +0300
Message-ID: <20141014114828.GA6524@node.dhcp.inet.fi>
In-Reply-To: <20141008191050.GK3778@sgi.com>
On Wed, Oct 08, 2014 at 02:10:50PM -0500, Alex Thorlton wrote:
> Hey everyone,
>
> I've run into some frustrating behavior from the khugepaged thread
> that I'm hoping to get sorted out. It appears that if you pin
> khugepaged to a cpuset (e.g. node 0),
Why would you want to pin khugepaged? Is there a valid use case?
It looks like userspace is shooting itself in the foot.
> and it begins scanning/collapsing pages for a process on a cpuset that
> doesn't have any memory nodes in common with khugepaged (e.g. node 1),
> then the collapsed pages will all be allocated on khugepaged's node (in
> this case node 0), clearly breaking the cpuset boundary set up for the
> process in question.
>
> I'm aware that there are some known issues with khugepaged performing
> off-node allocations in certain situations, but I believe this is a bit
> of a special circumstance since, in this situation, there's no way for
> khugepaged to perform an allocation on the desired node.
>
> The problem really stems from the way that we determine the allowed
> memory nodes in get_page_from_freelist. When we call down to
> cpuset_zone_allowed_softwall, we check current->mems_allowed to
> determine what nodes we're allowed on. In the case of khugepaged, we'll
> be making allocations for the mm of the process we're collapsing for,
> but we'll be checking the mems_allowed of khugepaged, which can
> obviously cause some problems.
Is there a reason why we should respect cpuset limits for kernel
threads?
Should we bypass cpusets for PF_KTHREAD tasks completely?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 736d8e1b6381..03a74878ad46 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1960,6 +1960,9 @@ get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order,
 zonelist_scan:
 	zonelist_rescan = false;
 
+	/* Bypass cpuset limitation if allocate from kernel thread context */
+	if (current->flags & PF_KTHREAD)
+		alloc_flags &= ~ALLOC_CPUSET;
 	/*
 	 * Scan zonelist, looking for a zone with enough free.
 	 * See also __cpuset_node_allowed_softwall() comment in kernel/cpuset.c.
--
Kirill A. Shutemov