From: Marcelo Tosatti <mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: David Rientjes <rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Lai Jiangshan <laijs-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>,
Mel Gorman <mgorman-l3A5Bk7waGM@public.gmane.org>,
Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>,
Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH] page_alloc: skip cpuset enforcement for lower zone allocations
Date: Fri, 23 May 2014 20:33:14 -0300
Message-ID: <20140523233314.GA3775@amt.cnet>
In-Reply-To: <alpine.DEB.2.02.1405231334460.13205-X6Q0R45D7oAcqpCFd4KODRPsWskHk0ljAL8bYrjMMd8@public.gmane.org>
On Fri, May 23, 2014 at 01:51:12PM -0700, David Rientjes wrote:
> On Fri, 23 May 2014, Marcelo Tosatti wrote:
>
> > Zone specific allocations, such as GFP_DMA32, should not be restricted
> > to cpusets allowed node list: the zones which such allocations demand
> > might be contained in particular nodes outside the cpuset node list.
> >
> > The alternative would be to not perform such allocations from
> > applications which are cpuset restricted, which is unrealistic.
> >
>
> Or ensure applications that allocate from lowmem are allowed to do so, but
> I understand that might be hard to make sure always happens.
>
> > Fixes KVM's alloc_page(gfp_mask=GFP_DMA32) with cpuset as explained.
> >
> > Signed-off-by: Marcelo Tosatti <mtosatti-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 5dba293..f228039 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2698,6 +2698,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> >  	unsigned int cpuset_mems_cookie;
> >  	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
> >  	struct mem_cgroup *memcg = NULL;
> > +	nodemask_t *cpuset_mems_allowed = &cpuset_current_mems_allowed;
> >
> >  	gfp_mask &= gfp_allowed_mask;
> >
> > @@ -2726,9 +2727,14 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> >  retry_cpuset:
> >  	cpuset_mems_cookie = read_mems_allowed_begin();
> >
> > +#ifdef CONFIG_NUMA
> > +	if (gfp_zone(gfp_mask) < policy_zone)
> > +		cpuset_mems_allowed = NULL;
> > +#endif
> > +
> >  	/* The preferred zone is used for statistics later */
> >  	first_zones_zonelist(zonelist, high_zoneidx,
> > -				nodemask ? : &cpuset_current_mems_allowed,
> > +				nodemask ? : cpuset_mems_allowed,
> >  				&preferred_zone);
> >  	if (!preferred_zone)
> >  		goto out;
> >
>
> I think this is incomplete. Correct me if I'm wrong on how this is
> working: preferred_zone, today, is NULL because first_zones_zonelist() is
> restricted to a cpuset.mems that does not include lowmem and your patch
> fixes that. But if the fastpath allocation with mandatory ALLOC_CPUSET
> fails and we go to the slowpath, which may or may not have showed up in
> your testing, there's still issues, particularly if __GFP_WAIT and lots of
> allocators do GFP_KERNEL | __GFP_DMA32. This requires ALLOC_CPUSET on all
> allocations and you haven't updated __cpuset_node_allowed_softwall() with
> this exception nor zlc_setup().

Yes, thanks. Can you please review the updated patch below?

> After that's done, I think all of this is really convoluted and deserves a
> comment to describe the ALLOC_CPUSET and __GFP_DMA32 behavior.

The comment in mm/mempolicy.c seems sufficient:

/* Highest zone. An specific allocation for a zone below that is not
   policied. */
enum zone_type policy_zone = 0;

> Adding Li, the cpusets maintainer, to this as well.

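To make the policy_zone rule concrete, here is a minimal user-space sketch of the comparison the patch relies on. It is only a model: the enum is a simplified zone ordering, policy_zone is assumed to have been raised to ZONE_NORMAL at boot, and skips_cpuset_enforcement() is a hypothetical helper standing in for the gfp_zone(gfp_mask) < policy_zone test, not a kernel function.

```c
#include <stdbool.h>

/* Simplified zone ordering, lowest zone first (illustrative only). */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE };

/*
 * Model of policy_zone. The kernel variable starts at 0; this sketch
 * assumes it has already been raised to ZONE_NORMAL during boot.
 */
static enum zone_type policy_zone = ZONE_NORMAL;

/*
 * Hypothetical helper: takes the requested zone directly instead of
 * deriving it from a gfp mask the way gfp_zone() does. Returns true
 * when the request targets a zone below policy_zone, which is the
 * case the patch exempts from cpuset restrictions.
 */
static bool skips_cpuset_enforcement(enum zone_type requested_zone)
{
	return requested_zone < policy_zone;
}
```

Under these assumptions a ZONE_DMA32 request is exempt, while a ZONE_NORMAL request is still subject to the cpuset.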
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 3d54c41..b70a336 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2392,6 +2392,10 @@ int __cpuset_node_allowed_softwall(int node, gfp_t gfp_mask)

 	if (in_interrupt() || (gfp_mask & __GFP_THISNODE))
 		return 1;
+#ifdef CONFIG_NUMA
+	if (gfp_zone(gfp_mask) < policy_zone)
+		return 1;
+#endif
 	might_sleep_if(!(gfp_mask & __GFP_HARDWALL));
 	if (node_isset(node, current->mems_allowed))
 		return 1;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dba293..dfea3dc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2698,6 +2698,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	unsigned int cpuset_mems_cookie;
 	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
 	struct mem_cgroup *memcg = NULL;
+	nodemask_t *cpuset_mems_allowed = &cpuset_current_mems_allowed;

 	gfp_mask &= gfp_allowed_mask;

@@ -2726,9 +2727,14 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 retry_cpuset:
 	cpuset_mems_cookie = read_mems_allowed_begin();

+#ifdef CONFIG_NUMA
+	if (gfp_zone(gfp_mask) < policy_zone)
+		cpuset_mems_allowed = NULL;
+#endif
+
 	/* The preferred zone is used for statistics later */
 	first_zones_zonelist(zonelist, high_zoneidx,
-				nodemask ? : &cpuset_current_mems_allowed,
+				nodemask ? : cpuset_mems_allowed,
 				&preferred_zone);
 	if (!preferred_zone)
 		goto out;
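
For illustration only, the mask selection in the mm/page_alloc.c hunk can be modeled in user space as below. This is a sketch under stated assumptions, not kernel code: the four-node topology, the bitmask representation of a nodemask, and the helpers first_usable_node() and alloc_node() are all hypothetical. It shows why a DMA32 request from a task whose cpuset excludes the only node with low memory finds no zone without the exemption, and succeeds with it.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 4

enum zone_type { ZONE_DMA32, ZONE_NORMAL };

/* Toy topology (assumption): only node 0 has memory below 4 GiB. */
static const bool node_has_dma32[MAX_NODES] = { true, false, false, false };

/* Assumed to have been raised to ZONE_NORMAL at boot. */
static enum zone_type policy_zone = ZONE_NORMAL;

/*
 * Return the first node that can satisfy a request for `zone`, or -1.
 * A NULL mask means "no node restriction", mirroring how the patch
 * passes NULL instead of the cpuset's allowed nodes.
 */
static int first_usable_node(enum zone_type zone, const unsigned long *mask)
{
	for (int node = 0; node < MAX_NODES; node++) {
		if (mask && !(*mask & (1UL << node)))
			continue;	/* node excluded by the mask */
		if (zone == ZONE_DMA32 && !node_has_dma32[node])
			continue;	/* node has no low memory */
		return node;
	}
	return -1;
}

/*
 * Pick the mask the way the patched entry path does: an explicit
 * caller mask wins; otherwise use the cpuset mask, unless the request
 * targets a zone below policy_zone, in which case no mask is applied.
 */
static int alloc_node(enum zone_type zone, const unsigned long *caller_mask,
		      const unsigned long *cpuset_mask)
{
	const unsigned long *fallback = cpuset_mask;

	if (zone < policy_zone)
		fallback = NULL;

	return first_usable_node(zone, caller_mask ? caller_mask : fallback);
}
```

With a cpuset that allows only node 1, calling first_usable_node(ZONE_DMA32, &cpuset) directly models the unpatched behavior and finds nothing, whereas alloc_node(ZONE_DMA32, NULL, &cpuset) falls through to node 0. A ZONE_NORMAL request through alloc_node() still honors the cpuset.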