* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0 (was: Re: mm, page_alloc: avoid looking up the first zone in a zonelist twice)
[not found] <CAMuHMdV00vJJxoA7XABw+mFF+2QUd1MuQbPKKgkmGnK_NySZpg@mail.gmail.com>
@ 2016-05-30 15:56 ` Mel Gorman
[not found] ` <20160530155644.GP2527@techsingularity.net>
1 sibling, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2016-05-30 15:56 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Andrew Morton, Linux Kernel Mailing List, Linux MM, linux-m68k
On Mon, May 30, 2016 at 03:13:40PM +0200, Geert Uytterhoeven wrote:
> > The benefit is negligible and the results are within the noise but each
> > cycle counts.
> >
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>
> About one week ago, I started seeing an obscure intermittent crash during
> system shutdown on m68k/ARAnyM using atari_defconfig.
> The crash isn't 100% reproducible, but it happens during ca. 1 out of 5
> shutdowns.
>
> I finally managed to bisect it to the above commit.
> I did verify that the parent commit didn't crash after 60 tries.
> Unfortunately I couldn't revert the offending commit on top of v4.7-rc1, due to
> conflicting changes.
>
> Do you have any idea what's going wrong?
There isn't anything obvious from the crash log you showed but can you
try the following just in case?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dba8cfd0b2d6..f2c1e47adc11 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3232,6 +3232,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* allocations are system rather than user orientated
*/
ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
+ ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
+ ac->high_zoneidx, ac->nodemask);
+ ac->classzone_idx = zonelist_zone_idx(ac->preferred_zoneref);
page = get_page_from_freelist(gfp_mask, order,
ALLOC_NO_WATERMARKS, ac);
if (page)
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0 (was: Re: mm, page_alloc: avoid looking up the first zone in a zonelist twice)
[not found] ` <20160530155644.GP2527@techsingularity.net>
@ 2016-05-30 17:37 ` Geert Uytterhoeven
2016-05-30 18:56 ` Mel Gorman
[not found] ` <20160530185616.GQ2527@techsingularity.net>
2016-05-31 21:44 ` Vlastimil Babka
[not found] ` <574E05B8.3060009@suse.cz>
2 siblings, 2 replies; 19+ messages in thread
From: Geert Uytterhoeven @ 2016-05-30 17:37 UTC (permalink / raw)
To: Mel Gorman; +Cc: Andrew Morton, Linux Kernel Mailing List, Linux MM, linux-m68k
Hi Mel,
On Mon, May 30, 2016 at 5:56 PM, Mel Gorman <mgorman@techsingularity.net> wrote:
> On Mon, May 30, 2016 at 03:13:40PM +0200, Geert Uytterhoeven wrote:
>> > The benefit is negligible and the results are within the noise but each
>> > cycle counts.
>> >
>> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>> > Cc: Vlastimil Babka <vbabka@suse.cz>
>> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>
>> About one week ago, I started seeing an obscure intermittent crash during
>> system shutdown on m68k/ARAnyM using atari_defconfig.
>> The crash isn't 100% reproducible, but it happens during ca. 1 out of 5
>> shutdowns.
>>
>> I finally managed to bisect it to the above commit.
>> I did verify that the parent commit didn't crash after 60 tries.
>> Unfortunately I couldn't revert the offending commit on top of v4.7-rc1, due to
>> conflicting changes.
>>
>> Do you have any idea what's going wrong?
>
> There isn't anything obvious from the crash log you showed but can you
> try the following just in case?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index dba8cfd0b2d6..f2c1e47adc11 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3232,6 +3232,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> * allocations are system rather than user orientated
> */
> ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> + ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
> + ac->high_zoneidx, ac->nodemask);
> + ac->classzone_idx = zonelist_zone_idx(ac->preferred_zoneref);
> page = get_page_from_freelist(gfp_mask, order,
> ALLOC_NO_WATERMARKS, ac);
> if (page)
Thanks, but unfortunately it doesn't help.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0 (was: Re: mm, page_alloc: avoid looking up the first zone in a zonelist twice)
2016-05-30 17:37 ` Geert Uytterhoeven
@ 2016-05-30 18:56 ` Mel Gorman
[not found] ` <20160530185616.GQ2527@techsingularity.net>
1 sibling, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2016-05-30 18:56 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Andrew Morton, Linux Kernel Mailing List, Linux MM, linux-m68k
On Mon, May 30, 2016 at 07:37:39PM +0200, Geert Uytterhoeven wrote:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index dba8cfd0b2d6..f2c1e47adc11 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3232,6 +3232,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > * allocations are system rather than user orientated
> > */
> > ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> > + ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
> > + ac->high_zoneidx, ac->nodemask);
> > + ac->classzone_idx = zonelist_zone_idx(ac->preferred_zoneref);
> > page = get_page_from_freelist(gfp_mask, order,
> > ALLOC_NO_WATERMARKS, ac);
> > if (page)
>
> Thanks, but unfortunately it doesn't help.
>
Thanks. Please try the following instead
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb320cde4d6d..557549c81083 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3024,6 +3024,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
apply_fair = false;
fair_skipped = false;
reset_alloc_batches(ac->preferred_zoneref->zone);
+ z = ac->preferred_zoneref;
goto zonelist_scan;
}
--
Mel Gorman
SUSE Labs
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0 (was: Re: mm, page_alloc: avoid looking up the first zone in a zonelist twice)
[not found] ` <20160530185616.GQ2527@techsingularity.net>
@ 2016-05-31 9:28 ` Geert Uytterhoeven
[not found] ` <CAMuHMdXCN5LeNCNJ9=B5sGAtdd81JeRNrUMSCOjSL_Bx1-tDvA@mail.gmail.com>
1 sibling, 0 replies; 19+ messages in thread
From: Geert Uytterhoeven @ 2016-05-31 9:28 UTC (permalink / raw)
To: Mel Gorman; +Cc: Andrew Morton, Linux Kernel Mailing List, Linux MM, linux-m68k
Hi Mel,
On Mon, May 30, 2016 at 8:56 PM, Mel Gorman <mgorman@techsingularity.net> wrote:
> Thanks. Please try the following instead
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bb320cde4d6d..557549c81083 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3024,6 +3024,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> apply_fair = false;
> fair_skipped = false;
> reset_alloc_batches(ac->preferred_zoneref->zone);
> + z = ac->preferred_zoneref;
> goto zonelist_scan;
> }
Thanks a lot, that seems to fix the issue!.
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
JFTR, without the fix, sometimes I get a different, but equally obscure, crash
than the one I posted before:
Kernel panic - not syncing: Aiee, killing interrupt handler!
CPU: 0 PID: 668 Comm: cron Not tainted
4.7.0-rc1-atari-01184-g21f2f74eaf41989e #370
Stack from 00bf9be0:
00bf9be0 0031a075 000236d0 0000000f 007e9060 efcc5cd0 00bf800c 00bf9fcc
00000000 00000000 ffffffff 00bf9c18 00024cbe 00303d7f 0000000f 007e9060
efcc5cd0 007e9104 00bf9fcc 00bf9ea0 000061b2 00bf9cf8 efcc5d54 00bf9cf8
00000007 00000000 00000000 efcc5d50 00000000 00000000 00000000 00000000
00000000 00029efe 008fe870 008fe858 efcc5d50 008fe5d0 00029ee4 00bf9eb4
00000009 008fe858 008fe5d0 00355490 008fe5d0 0002a880 efcc5d50 007e9060
Call Trace: [<000236d0>] panic+0xae/0x23e
[<00024cbe>] do_exit+0x8e/0x6e4
[<000061b2>] send_fault_sig+0x5a/0xb8
[<00029efe>] __dequeue_signal+0x1a/0xb2
[<00029ee4>] __dequeue_signal+0x0/0xb2
[<0002a880>] dequeue_signal+0xec/0xf6
[<0002538e>] do_group_exit+0x7a/0xa2
[<0002c084>] get_signal+0x19c/0x3d2
[<0002c280>] get_signal+0x398/0x3d2
[<00003648>] do_notify_resume+0x2e/0x786
[<00005654>] buserr_c+0x2d4/0x610
[<00034212>] search_exception_tables+0x1a/0x34
[<00005654>] buserr_c+0x2d4/0x610
[<00003db0>] handle_kernel_fault+0x10/0x58
[<00005654>] buserr_c+0x2d4/0x610
[<000061b2>] send_fault_sig+0x5a/0xb8
[<00046e3e>] update_rmtp+0x4e/0x68
[<00034212>] search_exception_tables+0x1a/0x34
[<00046e3e>] update_rmtp+0x4e/0x68
[<00003db0>] handle_kernel_fault+0x10/0x58
[<00046e3e>] update_rmtp+0x4e/0x68
[<000061b2>] send_fault_sig+0x5a/0xb8
[<000056b2>] buserr_c+0x332/0x610
[<0000c350>] dnrm_lp+0x162/0x17e
[<00047276>] hrtimer_nanosleep+0xdc/0x128
[<00046e58>] hrtimer_wakeup+0x0/0x1e
[<00010000>] stanh+0xd8/0x140
[<000029a0>] do_signal_return+0x10/0x1a
[<00002924>] syscall+0x8/0xc
[<0008c00b>] ___cache_free+0xcd/0x13e
---[ end Kernel panic - not syncing: Aiee, killing interrupt handler!
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0 (was: Re: mm, page_alloc: avoid looking up the first zone in a zonelist twice)
[not found] ` <CAMuHMdXCN5LeNCNJ9=B5sGAtdd81JeRNrUMSCOjSL_Bx1-tDvA@mail.gmail.com>
@ 2016-05-31 10:13 ` Mel Gorman
0 siblings, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2016-05-31 10:13 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Andrew Morton, Linux Kernel Mailing List, Linux MM, linux-m68k
On Tue, May 31, 2016 at 11:28:05AM +0200, Geert Uytterhoeven wrote:
> Hi Mel,
>
> On Mon, May 30, 2016 at 8:56 PM, Mel Gorman <mgorman@techsingularity.net> wrote:
> > Thanks. Please try the following instead
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index bb320cde4d6d..557549c81083 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3024,6 +3024,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> > apply_fair = false;
> > fair_skipped = false;
> > reset_alloc_batches(ac->preferred_zoneref->zone);
> > + z = ac->preferred_zoneref;
> > goto zonelist_scan;
> > }
>
> Thanks a lot, that seems to fix the issue!.
>
> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
>
> JFTR, without the fix, sometimes I get a different, but equally obscure, crash
> than the one I posted before:
>
I'm afraid I don't recognise it. Given the nature of the previous bug
though, I have a vague suspicion that someone is not handling a page
allocation failure properly and goes boom later.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0 (was: Re: mm, page_alloc: avoid looking up the first zone in a zonelist twice)
[not found] ` <20160530155644.GP2527@techsingularity.net>
2016-05-30 17:37 ` Geert Uytterhoeven
@ 2016-05-31 21:44 ` Vlastimil Babka
[not found] ` <574E05B8.3060009@suse.cz>
2 siblings, 0 replies; 19+ messages in thread
From: Vlastimil Babka @ 2016-05-31 21:44 UTC (permalink / raw)
To: Mel Gorman, Geert Uytterhoeven
Cc: Andrew Morton, Linux Kernel Mailing List, Linux MM, linux-m68k
On 05/30/2016 05:56 PM, Mel Gorman wrote:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index dba8cfd0b2d6..f2c1e47adc11 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3232,6 +3232,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> * allocations are system rather than user orientated
> */
> ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> + ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
> + ac->high_zoneidx, ac->nodemask);
> + ac->classzone_idx = zonelist_zone_idx(ac->preferred_zoneref);
> page = get_page_from_freelist(gfp_mask, order,
> ALLOC_NO_WATERMARKS, ac);
> if (page)
>
Even if that didn't help for this report, I think it's needed too
(except the classzone_idx which doesn't exist anymore?).
And I think the following as well. (the changed comment could be also
just deleted).
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f8f3bfc435ee..0a8d8e2bf331 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3808,7 +3808,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned
int order,
/* Dirty zone balancing only done in the fast path */
ac.spread_dirty_pages = (gfp_mask & __GFP_WRITE);
- /* The preferred zone is used for statistics later */
+ /* The preferred zone is crucial for get_page_from_freelist */
ac.preferred_zoneref = first_zones_zonelist(ac.zonelist,
ac.high_zoneidx, ac.nodemask);
if (!ac.preferred_zoneref) {
@@ -3832,8 +3832,11 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned
int order,
* Restore the original nodemask if it was potentially replaced with
* &cpuset_current_mems_allowed to optimize the fast-path attempt.
*/
- if (cpusets_enabled())
+ if (cpusets_enabled()) {
ac.nodemask = nodemask;
+ ac.preferred_zoneref = first_zones_zonelist(ac.zonelist,
+ ac.high_zoneidx, ac.nodemask);
+ }
page = __alloc_pages_slowpath(alloc_mask, order, &ac);
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0 (was: Re: mm, page_alloc: avoid looking up the first zone in a zonelist twice)
[not found] ` <574E05B8.3060009@suse.cz>
@ 2016-06-01 9:19 ` Mel Gorman
[not found] ` <20160601091921.GT2527@techsingularity.net>
1 sibling, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2016-06-01 9:19 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Geert Uytterhoeven, Andrew Morton, Linux Kernel Mailing List,
Linux MM, linux-m68k
On Tue, May 31, 2016 at 11:44:24PM +0200, Vlastimil Babka wrote:
> On 05/30/2016 05:56 PM, Mel Gorman wrote:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index dba8cfd0b2d6..f2c1e47adc11 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3232,6 +3232,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > * allocations are system rather than user orientated
> > */
> > ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> > + ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
> > + ac->high_zoneidx, ac->nodemask);
> > + ac->classzone_idx = zonelist_zone_idx(ac->preferred_zoneref);
> > page = get_page_from_freelist(gfp_mask, order,
> > ALLOC_NO_WATERMARKS, ac);
> > if (page)
> >
>
> Even if that didn't help for this report, I think it's needed too
> (except the classzone_idx which doesn't exist anymore?).
>
> And I think the following as well. (the changed comment could be also
> just deleted).
>
Why?
The comment is fine but I do not see why the recalculation would occur.
In the original code, the preferred_zoneref for statistics is calculated
based on either the supplied nodemask or cpuset_current_mems_allowed during
the initial attempt. It then relies on the cpuset checks in the slowpath
to encorce mems_allowed but the preferred zone doesn't change.
With your proposed change, it's possible that the
preferred_zoneref recalculation points to a zoneref disallowed by
cpuset_current_mems_sllowed. While it'll be skipped during allocation,
the statistics will still be against a zone that is potentially outside
what is allowed.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0
[not found] ` <20160601091921.GT2527@techsingularity.net>
@ 2016-06-01 10:01 ` Vlastimil Babka
[not found] ` <574EB274.4030408@suse.cz>
1 sibling, 0 replies; 19+ messages in thread
From: Vlastimil Babka @ 2016-06-01 10:01 UTC (permalink / raw)
To: Mel Gorman
Cc: Geert Uytterhoeven, Andrew Morton, Linux Kernel Mailing List,
Linux MM, linux-m68k
On 06/01/2016 11:19 AM, Mel Gorman wrote:
> On Tue, May 31, 2016 at 11:44:24PM +0200, Vlastimil Babka wrote:
>> On 05/30/2016 05:56 PM, Mel Gorman wrote:
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index dba8cfd0b2d6..f2c1e47adc11 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -3232,6 +3232,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>>> * allocations are system rather than user orientated
>>> */
>>> ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
>>> + ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
>>> + ac->high_zoneidx, ac->nodemask);
>>> + ac->classzone_idx = zonelist_zone_idx(ac->preferred_zoneref);
>>> page = get_page_from_freelist(gfp_mask, order,
>>> ALLOC_NO_WATERMARKS, ac);
>>> if (page)
>>>
>>
>> Even if that didn't help for this report, I think it's needed too
>> (except the classzone_idx which doesn't exist anymore?).
But you agree that the hunk above should be merged?
>> And I think the following as well. (the changed comment could be also
>> just deleted).
>>
>
> Why?
>
> The comment is fine but I do not see why the recalculation would occur.
>
> In the original code, the preferred_zoneref for statistics is calculated
> based on either the supplied nodemask or cpuset_current_mems_allowed during
> the initial attempt. It then relies on the cpuset checks in the slowpath
> to encorce mems_allowed but the preferred zone doesn't change.
>
> With your proposed change, it's possible that the
> preferred_zoneref recalculation points to a zoneref disallowed by
> cpuset_current_mems_sllowed. While it'll be skipped during allocation,
> the statistics will still be against a zone that is potentially outside
> what is allowed.
Hmm that's true and I was ready to agree. But then I noticed that
gfp_to_alloc_flags() can mask out ALLOC_CPUSET for GFP_ATOMIC. So it's
like a lighter version of the ALLOC_NO_WATERMARKS situation. In that
case it's wrong if we leave ac->preferred_zoneref at a position that has
skipped some zones due to mempolicies?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0
[not found] ` <574EB274.4030408@suse.cz>
@ 2016-06-02 10:39 ` Mel Gorman
[not found] ` <20160602103936.GU2527@techsingularity.net>
1 sibling, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2016-06-02 10:39 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Geert Uytterhoeven, Andrew Morton, Linux Kernel Mailing List,
Linux MM, linux-m68k
On Wed, Jun 01, 2016 at 12:01:24PM +0200, Vlastimil Babka wrote:
> > Why?
> >
> > The comment is fine but I do not see why the recalculation would occur.
> >
> > In the original code, the preferred_zoneref for statistics is calculated
> > based on either the supplied nodemask or cpuset_current_mems_allowed during
> > the initial attempt. It then relies on the cpuset checks in the slowpath
> > to encorce mems_allowed but the preferred zone doesn't change.
> >
> > With your proposed change, it's possible that the
> > preferred_zoneref recalculation points to a zoneref disallowed by
> > cpuset_current_mems_sllowed. While it'll be skipped during allocation,
> > the statistics will still be against a zone that is potentially outside
> > what is allowed.
>
> Hmm that's true and I was ready to agree. But then I noticed that
> gfp_to_alloc_flags() can mask out ALLOC_CPUSET for GFP_ATOMIC. So it's
> like a lighter version of the ALLOC_NO_WATERMARKS situation. In that
> case it's wrong if we leave ac->preferred_zoneref at a position that has
> skipped some zones due to mempolicies?
>
So both options are wrong then. How about this?
---8<---
mm, page_alloc: Recalculate the preferred zoneref if the context can ignore memory policies
The optimistic fast path may use cpuset_current_mems_allowed instead of
of a NULL nodemask supplied by the caller for cpuset allocations. The
preferred zone is calculated on this basis for statistic purposes and
as a starting point in the zonelist iterator.
However, if the context can ignore memory policies due to being atomic or
being able to ignore watermarks then the starting point in the zonelist
iterator is no longer correct. This patch resets the zonelist iterator in
the allocator slowpath if the context can ignore memory policies. This will
alter the zone used for statistics but only after it is known that it makes
sense for that context. Resetting it before entering the slowpath would
potentially allow an ALLOC_CPUSET allocation to be accounted for against
the wrong zone. Note that while nodemask is not explicitly set to the
original nodemask, it would only have been overwritten if cpuset_enabled()
and it was reset before the slowpath was entered.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
mm/page_alloc.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 557549c81083..b17358617a1b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3598,6 +3598,17 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
*/
alloc_flags = gfp_to_alloc_flags(gfp_mask);
+ /*
+ * Reset the zonelist iterators if memory policies can be ignored.
+ * These allocations are high priority and system rather than user
+ * orientated.
+ */
+ if ((alloc_flags & ALLOC_NO_WATERMARKS) || !(alloc_flags & ALLOC_CPUSET)) {
+ ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
+ ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
+ ac->high_zoneidx, ac->nodemask);
+ }
+
/* This is the last chance, in general, before the goto nopage. */
page = get_page_from_freelist(gfp_mask, order,
alloc_flags & ~ALLOC_NO_WATERMARKS, ac);
@@ -3606,12 +3617,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
/* Allocate without watermarks if the context allows */
if (alloc_flags & ALLOC_NO_WATERMARKS) {
- /*
- * Ignore mempolicies if ALLOC_NO_WATERMARKS on the grounds
- * the allocation is high priority and these type of
- * allocations are system rather than user orientated
- */
- ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
page = get_page_from_freelist(gfp_mask, order,
ALLOC_NO_WATERMARKS, ac);
if (page)
@@ -3810,7 +3815,11 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
/* Dirty zone balancing only done in the fast path */
ac.spread_dirty_pages = (gfp_mask & __GFP_WRITE);
- /* The preferred zone is used for statistics later */
+ /*
+ * The preferred zone is used for statistics but crucially it is
+ * also used as the starting point for the zonelist iterator. It
+ * may get reset for allocations that ignore memory policies.
+ */
ac.preferred_zoneref = first_zones_zonelist(ac.zonelist,
ac.high_zoneidx, ac.nodemask);
if (!ac.preferred_zoneref) {
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0
[not found] ` <20160602103936.GU2527@techsingularity.net>
@ 2016-06-02 12:04 ` Vlastimil Babka
2016-06-02 12:19 ` Mel Gorman
0 siblings, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2016-06-02 12:04 UTC (permalink / raw)
To: Mel Gorman
Cc: Geert Uytterhoeven, Andrew Morton, Linux Kernel Mailing List,
Linux MM, linux-m68k
On 06/02/2016 12:39 PM, Mel Gorman wrote:
> On Wed, Jun 01, 2016 at 12:01:24PM +0200, Vlastimil Babka wrote:
>>> Why?
>>>
>>> The comment is fine but I do not see why the recalculation would occur.
>>>
>>> In the original code, the preferred_zoneref for statistics is calculated
>>> based on either the supplied nodemask or cpuset_current_mems_allowed during
>>> the initial attempt. It then relies on the cpuset checks in the slowpath
>>> to encorce mems_allowed but the preferred zone doesn't change.
>>>
>>> With your proposed change, it's possible that the
>>> preferred_zoneref recalculation points to a zoneref disallowed by
>>> cpuset_current_mems_sllowed. While it'll be skipped during allocation,
>>> the statistics will still be against a zone that is potentially outside
>>> what is allowed.
>>
>> Hmm that's true and I was ready to agree. But then I noticed that
>> gfp_to_alloc_flags() can mask out ALLOC_CPUSET for GFP_ATOMIC. So it's
>> like a lighter version of the ALLOC_NO_WATERMARKS situation. In that
>> case it's wrong if we leave ac->preferred_zoneref at a position that has
>> skipped some zones due to mempolicies?
>>
>
> So both options are wrong then. How about this?
I wonder if the original patch we're fixing was worth all this trouble
(and more
for my compaction priority series :), but yeah this should work.
> ---8<---
> mm, page_alloc: Recalculate the preferred zoneref if the context can ignore memory policies
>
> The optimistic fast path may use cpuset_current_mems_allowed instead of
> of a NULL nodemask supplied by the caller for cpuset allocations. The
> preferred zone is calculated on this basis for statistic purposes and
> as a starting point in the zonelist iterator.
>
> However, if the context can ignore memory policies due to being atomic or
> being able to ignore watermarks then the starting point in the zonelist
> iterator is no longer correct. This patch resets the zonelist iterator in
> the allocator slowpath if the context can ignore memory policies. This will
> alter the zone used for statistics but only after it is known that it makes
> sense for that context. Resetting it before entering the slowpath would
> potentially allow an ALLOC_CPUSET allocation to be accounted for against
> the wrong zone. Note that while nodemask is not explicitly set to the
> original nodemask, it would only have been overwritten if cpuset_enabled()
> and it was reset before the slowpath was entered.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0
2016-06-02 12:04 ` Vlastimil Babka
@ 2016-06-02 12:19 ` Mel Gorman
2016-06-02 18:43 ` Andrew Morton
0 siblings, 1 reply; 19+ messages in thread
From: Mel Gorman @ 2016-06-02 12:19 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Geert Uytterhoeven, Andrew Morton, Linux Kernel Mailing List,
Linux MM, linux-m68k
On Thu, Jun 02, 2016 at 02:04:42PM +0200, Vlastimil Babka wrote:
> On 06/02/2016 12:39 PM, Mel Gorman wrote:
> >On Wed, Jun 01, 2016 at 12:01:24PM +0200, Vlastimil Babka wrote:
> >>>Why?
> >>>
> >>>The comment is fine but I do not see why the recalculation would occur.
> >>>
> >>>In the original code, the preferred_zoneref for statistics is calculated
> >>>based on either the supplied nodemask or cpuset_current_mems_allowed during
> >>>the initial attempt. It then relies on the cpuset checks in the slowpath
> >>>to encorce mems_allowed but the preferred zone doesn't change.
> >>>
> >>>With your proposed change, it's possible that the
> >>>preferred_zoneref recalculation points to a zoneref disallowed by
> >>>cpuset_current_mems_sllowed. While it'll be skipped during allocation,
> >>>the statistics will still be against a zone that is potentially outside
> >>>what is allowed.
> >>
> >>Hmm that's true and I was ready to agree. But then I noticed that
> >>gfp_to_alloc_flags() can mask out ALLOC_CPUSET for GFP_ATOMIC. So it's
> >>like a lighter version of the ALLOC_NO_WATERMARKS situation. In that
> >>case it's wrong if we leave ac->preferred_zoneref at a position that has
> >>skipped some zones due to mempolicies?
> >>
> >
> >So both options are wrong then. How about this?
>
> I wonder if the original patch we're fixing was worth all this trouble (and
> more
> for my compaction priority series :), but yeah this should work.
>
I considered that option when the bug report first came in. It was a 2%
hit to the page allocator microbenchmark to revert it. More than I expected
but enough to care. If this causes another problem, I'll revert it as
there will be other options later.
> >---8<---
> >mm, page_alloc: Recalculate the preferred zoneref if the context can ignore memory policies
> >
> >The optimistic fast path may use cpuset_current_mems_allowed instead of
> >of a NULL nodemask supplied by the caller for cpuset allocations. The
> >preferred zone is calculated on this basis for statistic purposes and
> >as a starting point in the zonelist iterator.
> >
> >However, if the context can ignore memory policies due to being atomic or
> >being able to ignore watermarks then the starting point in the zonelist
> >iterator is no longer correct. This patch resets the zonelist iterator in
> >the allocator slowpath if the context can ignore memory policies. This will
> >alter the zone used for statistics but only after it is known that it makes
> >sense for that context. Resetting it before entering the slowpath would
> >potentially allow an ALLOC_CPUSET allocation to be accounted for against
> >the wrong zone. Note that while nodemask is not explicitly set to the
> >original nodemask, it would only have been overwritten if cpuset_enabled()
> >and it was reset before the slowpath was entered.
> >
> >Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>
Thanks.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0
2016-06-02 12:19 ` Mel Gorman
@ 2016-06-02 18:43 ` Andrew Morton
2016-06-03 3:52 ` Stephen Rothwell
2016-06-03 7:57 ` Geert Uytterhoeven
0 siblings, 2 replies; 19+ messages in thread
From: Andrew Morton @ 2016-06-02 18:43 UTC (permalink / raw)
To: Mel Gorman
Cc: Vlastimil Babka, Geert Uytterhoeven, Linux Kernel Mailing List,
Linux MM, linux-m68k
On Thu, 2 Jun 2016 13:19:36 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:
> > >Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> >
> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> >
>
> Thanks.
I queued this. A tested-by:Geert would be nice?
From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
The optimistic fast path may use cpuset_current_mems_allowed instead of of
a NULL nodemask supplied by the caller for cpuset allocations. The
preferred zone is calculated on this basis for statistic purposes and as a
starting point in the zonelist iterator.
However, if the context can ignore memory policies due to being atomic or
being able to ignore watermarks then the starting point in the zonelist
iterator is no longer correct. This patch resets the zonelist iterator in
the allocator slowpath if the context can ignore memory policies. This
will alter the zone used for statistics but only after it is known that it
makes sense for that context. Resetting it before entering the slowpath
would potentially allow an ALLOC_CPUSET allocation to be accounted for
against the wrong zone. Note that while nodemask is not explicitly set to
the original nodemask, it would only have been overwritten if
cpuset_enabled() and it was reset before the slowpath was entered.
Link: http://lkml.kernel.org/r/20160602103936.GU2527@techsingularity.net
Fixes: c33d6c06f60f710 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
diff -puN mm/page_alloc.c~mm-page_alloc-recalculate-the-preferred-zoneref-if-the-context-can-ignore-memory-policies mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-recalculate-the-preferred-zoneref-if-the-context-can-ignore-memory-policies
+++ a/mm/page_alloc.c
@@ -3604,6 +3604,17 @@ retry:
*/
alloc_flags = gfp_to_alloc_flags(gfp_mask);
+ /*
+ * Reset the zonelist iterators if memory policies can be ignored.
+ * These allocations are high priority and system rather than user
+ * orientated.
+ */
+ if ((alloc_flags & ALLOC_NO_WATERMARKS) || !(alloc_flags & ALLOC_CPUSET)) {
+ ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
+ ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
+ ac->high_zoneidx, ac->nodemask);
+ }
+
/* This is the last chance, in general, before the goto nopage. */
page = get_page_from_freelist(gfp_mask, order,
alloc_flags & ~ALLOC_NO_WATERMARKS, ac);
@@ -3612,12 +3623,6 @@ retry:
/* Allocate without watermarks if the context allows */
if (alloc_flags & ALLOC_NO_WATERMARKS) {
- /*
- * Ignore mempolicies if ALLOC_NO_WATERMARKS on the grounds
- * the allocation is high priority and these type of
- * allocations are system rather than user orientated
- */
- ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
page = get_page_from_freelist(gfp_mask, order,
ALLOC_NO_WATERMARKS, ac);
if (page)
@@ -3816,7 +3821,11 @@ retry_cpuset:
/* Dirty zone balancing only done in the fast path */
ac.spread_dirty_pages = (gfp_mask & __GFP_WRITE);
- /* The preferred zone is used for statistics later */
+ /*
+ * The preferred zone is used for statistics but crucially it is
+ * also used as the starting point for the zonelist iterator. It
+ * may get reset for allocations that ignore memory policies.
+ */
ac.preferred_zoneref = first_zones_zonelist(ac.zonelist,
ac.high_zoneidx, ac.nodemask);
if (!ac.preferred_zoneref) {
_
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0
2016-06-02 18:43 ` Andrew Morton
@ 2016-06-03 3:52 ` Stephen Rothwell
2016-06-03 7:57 ` Geert Uytterhoeven
1 sibling, 0 replies; 19+ messages in thread
From: Stephen Rothwell @ 2016-06-03 3:52 UTC (permalink / raw)
To: Andrew Morton
Cc: Mel Gorman, Vlastimil Babka, Geert Uytterhoeven,
Linux Kernel Mailing List, Linux MM, linux-m68k
Hi Andrew,
On Thu, 2 Jun 2016 11:43:41 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 2 Jun 2016 13:19:36 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:
>
> > > >Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > >
> > > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > >
> >
> > Thanks.
>
> I queued this. A tested-by:Geert would be nice?
>
>
> From: Mel Gorman <mgorman@techsingularity.net>
> Subject: mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
I dumped that into linux-next today as well.
--
Cheers,
Stephen Rothwell
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0
2016-06-02 18:43 ` Andrew Morton
2016-06-03 3:52 ` Stephen Rothwell
@ 2016-06-03 7:57 ` Geert Uytterhoeven
2016-06-03 8:41 ` Mel Gorman
1 sibling, 1 reply; 19+ messages in thread
From: Geert Uytterhoeven @ 2016-06-03 7:57 UTC (permalink / raw)
To: Andrew Morton, Mel Gorman
Cc: Vlastimil Babka, Linux Kernel Mailing List, Linux MM, linux-m68k
Hi Andrew, Mel,
On Thu, Jun 2, 2016 at 8:43 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 2 Jun 2016 13:19:36 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:
>> > >Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>> >
>> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
>> >
>>
>> Thanks.
>
> I queued this. A tested-by:Geert would be nice?
>
>
> From: Mel Gorman <mgorman@techsingularity.net>
> Subject: mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
>
> The optimistic fast path may use cpuset_current_mems_allowed instead of of
> a NULL nodemask supplied by the caller for cpuset allocations. The
> preferred zone is calculated on this basis for statistic purposes and as a
> starting point in the zonelist iterator.
>
> However, if the context can ignore memory policies due to being atomic or
> being able to ignore watermarks then the starting point in the zonelist
> iterator is no longer correct. This patch resets the zonelist iterator in
> the allocator slowpath if the context can ignore memory policies. This
> will alter the zone used for statistics but only after it is known that it
> makes sense for that context. Resetting it before entering the slowpath
> would potentially allow an ALLOC_CPUSET allocation to be accounted for
> against the wrong zone. Note that while nodemask is not explicitly set to
> the original nodemask, it would only have been overwritten if
> cpuset_enabled() and it was reset before the slowpath was entered.
>
> Link: http://lkml.kernel.org/r/20160602103936.GU2527@techsingularity.net
> Fixes: c33d6c06f60f710 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
My understanding was that this was an an additional patch, not fixing
the problem in-se?
Indeed, after applying this patch (without the other one that added
"z = ac->preferred_zoneref;" to the reset_fair block of
get_page_from_freelist()) I still get crashes...
Now testing with both applied...
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
> mm/page_alloc.c | 23 ++++++++++++++++-------
> 1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff -puN mm/page_alloc.c~mm-page_alloc-recalculate-the-preferred-zoneref-if-the-context-can-ignore-memory-policies mm/page_alloc.c
> --- a/mm/page_alloc.c~mm-page_alloc-recalculate-the-preferred-zoneref-if-the-context-can-ignore-memory-policies
> +++ a/mm/page_alloc.c
> @@ -3604,6 +3604,17 @@ retry:
> */
> alloc_flags = gfp_to_alloc_flags(gfp_mask);
>
> + /*
> + * Reset the zonelist iterators if memory policies can be ignored.
> + * These allocations are high priority and system rather than user
> + * orientated.
> + */
> + if ((alloc_flags & ALLOC_NO_WATERMARKS) || !(alloc_flags & ALLOC_CPUSET)) {
> + ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> + ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
> + ac->high_zoneidx, ac->nodemask);
> + }
> +
> /* This is the last chance, in general, before the goto nopage. */
> page = get_page_from_freelist(gfp_mask, order,
> alloc_flags & ~ALLOC_NO_WATERMARKS, ac);
> @@ -3612,12 +3623,6 @@ retry:
>
> /* Allocate without watermarks if the context allows */
> if (alloc_flags & ALLOC_NO_WATERMARKS) {
> - /*
> - * Ignore mempolicies if ALLOC_NO_WATERMARKS on the grounds
> - * the allocation is high priority and these type of
> - * allocations are system rather than user orientated
> - */
> - ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> page = get_page_from_freelist(gfp_mask, order,
> ALLOC_NO_WATERMARKS, ac);
> if (page)
> @@ -3816,7 +3821,11 @@ retry_cpuset:
> /* Dirty zone balancing only done in the fast path */
> ac.spread_dirty_pages = (gfp_mask & __GFP_WRITE);
>
> - /* The preferred zone is used for statistics later */
> + /*
> + * The preferred zone is used for statistics but crucially it is
> + * also used as the starting point for the zonelist iterator. It
> + * may get reset for allocations that ignore memory policies.
> + */
> ac.preferred_zoneref = first_zones_zonelist(ac.zonelist,
> ac.high_zoneidx, ac.nodemask);
> if (!ac.preferred_zoneref) {
> _
>
--
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0
2016-06-03 7:57 ` Geert Uytterhoeven
@ 2016-06-03 8:41 ` Mel Gorman
2016-06-03 9:00 ` Geert Uytterhoeven
0 siblings, 1 reply; 19+ messages in thread
From: Mel Gorman @ 2016-06-03 8:41 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Andrew Morton, Vlastimil Babka, Linux Kernel Mailing List,
Linux MM, linux-m68k
On Fri, Jun 03, 2016 at 09:57:22AM +0200, Geert Uytterhoeven wrote:
> Hi Andrew, Mel,
>
> On Thu, Jun 2, 2016 at 8:43 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Thu, 2 Jun 2016 13:19:36 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:
> >> > >Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> >> >
> >> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> >> >
> >>
> >> Thanks.
> >
> > I queued this. A tested-by:Geert would be nice?
> >
> >
> > From: Mel Gorman <mgorman@techsingularity.net>
> > Subject: mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
> >
> > The optimistic fast path may use cpuset_current_mems_allowed instead of of
> > a NULL nodemask supplied by the caller for cpuset allocations. The
> > preferred zone is calculated on this basis for statistic purposes and as a
> > starting point in the zonelist iterator.
> >
> > However, if the context can ignore memory policies due to being atomic or
> > being able to ignore watermarks then the starting point in the zonelist
> > iterator is no longer correct. This patch resets the zonelist iterator in
> > the allocator slowpath if the context can ignore memory policies. This
> > will alter the zone used for statistics but only after it is known that it
> > makes sense for that context. Resetting it before entering the slowpath
> > would potentially allow an ALLOC_CPUSET allocation to be accounted for
> > against the wrong zone. Note that while nodemask is not explicitly set to
> > the original nodemask, it would only have been overwritten if
> > cpuset_enabled() and it was reset before the slowpath was entered.
> >
> > Link: http://lkml.kernel.org/r/20160602103936.GU2527@techsingularity.net
> > Fixes: c33d6c06f60f710 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
>
> My understanding was that this was an an additional patch, not fixing
> the problem in-se?
>
It doesn't fix the problem you had, it is a follow-on patch that
potentially affects.
> Indeed, after applying this patch (without the other one that added
> "z = ac->preferred_zoneref;" to the reset_fair block of
> get_page_from_freelist()) I still get crashes...
>
The patch you have is the only one required for the crash. This patch
handles a corner case with atomic allocations that can ignore memory
policies.
> Now testing with both applied...
Thanks.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0
2016-06-03 8:41 ` Mel Gorman
@ 2016-06-03 9:00 ` Geert Uytterhoeven
2016-06-03 16:35 ` Andrew Morton
0 siblings, 1 reply; 19+ messages in thread
From: Geert Uytterhoeven @ 2016-06-03 9:00 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Vlastimil Babka, Linux Kernel Mailing List,
Linux MM, linux-m68k
Hi Mel,
On Fri, Jun 3, 2016 at 10:41 AM, Mel Gorman <mgorman@techsingularity.net> wrote:
> On Fri, Jun 03, 2016 at 09:57:22AM +0200, Geert Uytterhoeven wrote:
>> On Thu, Jun 2, 2016 at 8:43 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>> > On Thu, 2 Jun 2016 13:19:36 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:
>> >> > >Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>> >> >
>> >> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
>> >>
>> >> Thanks.
>> >
>> > I queued this. A tested-by:Geert would be nice?
>> >
>> > From: Mel Gorman <mgorman@techsingularity.net>
>> > Subject: mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
>> >
>> > The optimistic fast path may use cpuset_current_mems_allowed instead of of
>> > a NULL nodemask supplied by the caller for cpuset allocations. The
>> > preferred zone is calculated on this basis for statistic purposes and as a
>> > starting point in the zonelist iterator.
>> >
>> > However, if the context can ignore memory policies due to being atomic or
>> > being able to ignore watermarks then the starting point in the zonelist
>> > iterator is no longer correct. This patch resets the zonelist iterator in
>> > the allocator slowpath if the context can ignore memory policies. This
>> > will alter the zone used for statistics but only after it is known that it
>> > makes sense for that context. Resetting it before entering the slowpath
>> > would potentially allow an ALLOC_CPUSET allocation to be accounted for
>> > against the wrong zone. Note that while nodemask is not explicitly set to
>> > the original nodemask, it would only have been overwritten if
>> > cpuset_enabled() and it was reset before the slowpath was entered.
>> >
>> > Link: http://lkml.kernel.org/r/20160602103936.GU2527@techsingularity.net
>> > Fixes: c33d6c06f60f710 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
>>
>> My understanding was that this was an an additional patch, not fixing
>> the problem in-se?
>
> It doesn't fix the problem you had, it is a follow-on patch that
> potentially affects.
Thanks for confirming!
>> Indeed, after applying this patch (without the other one that added
>> "z = ac->preferred_zoneref;" to the reset_fair block of
>> get_page_from_freelist()) I still get crashes...
>
> The patch you have is the only one required for the crash. This patch
> handles a corner case with atomic allocations that can ignore memory
> policies.
OK.
In the mean time my tests completed successfully with both patches applied.
Thanks!
Gr{oetje,eeting}s,
Geert
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0
2016-06-03 9:00 ` Geert Uytterhoeven
@ 2016-06-03 16:35 ` Andrew Morton
2016-06-03 16:46 ` Mel Gorman
0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2016-06-03 16:35 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Mel Gorman, Vlastimil Babka, Linux Kernel Mailing List, Linux MM,
linux-m68k
On Fri, 3 Jun 2016 11:00:30 +0200 Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> In the mean time my tests completed successfully with both patches applied.
Can we please identify "both patches" with specificity? I have the
below one.
From: Mel Gorman <mgorman@techsingularity.net>
Subject: mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies
The optimistic fast path may use cpuset_current_mems_allowed instead of of
a NULL nodemask supplied by the caller for cpuset allocations. The
preferred zone is calculated on this basis for statistic purposes and as a
starting point in the zonelist iterator.
However, if the context can ignore memory policies due to being atomic or
being able to ignore watermarks then the starting point in the zonelist
iterator is no longer correct. This patch resets the zonelist iterator in
the allocator slowpath if the context can ignore memory policies. This
will alter the zone used for statistics but only after it is known that it
makes sense for that context. Resetting it before entering the slowpath
would potentially allow an ALLOC_CPUSET allocation to be accounted for
against the wrong zone. Note that while nodemask is not explicitly set to
the original nodemask, it would only have been overwritten if
cpuset_enabled() and it was reset before the slowpath was entered.
Link: http://lkml.kernel.org/r/20160602103936.GU2527@techsingularity.net
Fixes: c33d6c06f60f710 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
diff -puN mm/page_alloc.c~mm-page_alloc-recalculate-the-preferred-zoneref-if-the-context-can-ignore-memory-policies mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-recalculate-the-preferred-zoneref-if-the-context-can-ignore-memory-policies
+++ a/mm/page_alloc.c
@@ -3604,6 +3604,17 @@ retry:
*/
alloc_flags = gfp_to_alloc_flags(gfp_mask);
+ /*
+ * Reset the zonelist iterators if memory policies can be ignored.
+ * These allocations are high priority and system rather than user
+ * orientated.
+ */
+ if ((alloc_flags & ALLOC_NO_WATERMARKS) || !(alloc_flags & ALLOC_CPUSET)) {
+ ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
+ ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
+ ac->high_zoneidx, ac->nodemask);
+ }
+
/* This is the last chance, in general, before the goto nopage. */
page = get_page_from_freelist(gfp_mask, order,
alloc_flags & ~ALLOC_NO_WATERMARKS, ac);
@@ -3612,12 +3623,6 @@ retry:
/* Allocate without watermarks if the context allows */
if (alloc_flags & ALLOC_NO_WATERMARKS) {
- /*
- * Ignore mempolicies if ALLOC_NO_WATERMARKS on the grounds
- * the allocation is high priority and these type of
- * allocations are system rather than user orientated
- */
- ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
page = get_page_from_freelist(gfp_mask, order,
ALLOC_NO_WATERMARKS, ac);
if (page)
@@ -3816,7 +3821,11 @@ retry_cpuset:
/* Dirty zone balancing only done in the fast path */
ac.spread_dirty_pages = (gfp_mask & __GFP_WRITE);
- /* The preferred zone is used for statistics later */
+ /*
+ * The preferred zone is used for statistics but crucially it is
+ * also used as the starting point for the zonelist iterator. It
+ * may get reset for allocations that ignore memory policies.
+ */
ac.preferred_zoneref = first_zones_zonelist(ac.zonelist,
ac.high_zoneidx, ac.nodemask);
if (!ac.preferred_zoneref) {
_
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0
2016-06-03 16:35 ` Andrew Morton
@ 2016-06-03 16:46 ` Mel Gorman
2016-06-03 16:49 ` Andrew Morton
0 siblings, 1 reply; 19+ messages in thread
From: Mel Gorman @ 2016-06-03 16:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Geert Uytterhoeven, Vlastimil Babka, Linux Kernel Mailing List,
Linux MM, linux-m68k
On Fri, Jun 03, 2016 at 09:35:18AM -0700, Andrew Morton wrote:
> On Fri, 3 Jun 2016 11:00:30 +0200 Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> > In the mean time my tests completed successfully with both patches applied.
>
> Can we please identify "both patches" with specificity? I have the
> below one.
>
mm, page_alloc: Reset zonelist iterator after resetting fair zone allocation policy
mm, page_alloc: Recalculate the preferred zoneref if the context can ignore memory policies
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0
2016-06-03 16:46 ` Mel Gorman
@ 2016-06-03 16:49 ` Andrew Morton
0 siblings, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2016-06-03 16:49 UTC (permalink / raw)
To: Mel Gorman
Cc: Geert Uytterhoeven, Vlastimil Babka, Linux Kernel Mailing List,
Linux MM, linux-m68k
On Fri, 3 Jun 2016 17:46:25 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:
> On Fri, Jun 03, 2016 at 09:35:18AM -0700, Andrew Morton wrote:
> > On Fri, 3 Jun 2016 11:00:30 +0200 Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >
> > > In the mean time my tests completed successfully with both patches applied.
> >
> > Can we please identify "both patches" with specificity? I have the
> > below one.
> >
>
> mm, page_alloc: Reset zonelist iterator after resetting fair zone allocation policy
> mm, page_alloc: Recalculate the preferred zoneref if the context can ignore memory policies
Cool, thanks. I'll get both over to Linus today.
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2016-06-03 16:49 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CAMuHMdV00vJJxoA7XABw+mFF+2QUd1MuQbPKKgkmGnK_NySZpg@mail.gmail.com>
2016-05-30 15:56 ` BUG: scheduling while atomic: cron/668/0x10c9a0c0 (was: Re: mm, page_alloc: avoid looking up the first zone in a zonelist twice) Mel Gorman
[not found] ` <20160530155644.GP2527@techsingularity.net>
2016-05-30 17:37 ` Geert Uytterhoeven
2016-05-30 18:56 ` Mel Gorman
[not found] ` <20160530185616.GQ2527@techsingularity.net>
2016-05-31 9:28 ` Geert Uytterhoeven
[not found] ` <CAMuHMdXCN5LeNCNJ9=B5sGAtdd81JeRNrUMSCOjSL_Bx1-tDvA@mail.gmail.com>
2016-05-31 10:13 ` Mel Gorman
2016-05-31 21:44 ` Vlastimil Babka
[not found] ` <574E05B8.3060009@suse.cz>
2016-06-01 9:19 ` Mel Gorman
[not found] ` <20160601091921.GT2527@techsingularity.net>
2016-06-01 10:01 ` BUG: scheduling while atomic: cron/668/0x10c9a0c0 Vlastimil Babka
[not found] ` <574EB274.4030408@suse.cz>
2016-06-02 10:39 ` Mel Gorman
[not found] ` <20160602103936.GU2527@techsingularity.net>
2016-06-02 12:04 ` Vlastimil Babka
2016-06-02 12:19 ` Mel Gorman
2016-06-02 18:43 ` Andrew Morton
2016-06-03 3:52 ` Stephen Rothwell
2016-06-03 7:57 ` Geert Uytterhoeven
2016-06-03 8:41 ` Mel Gorman
2016-06-03 9:00 ` Geert Uytterhoeven
2016-06-03 16:35 ` Andrew Morton
2016-06-03 16:46 ` Mel Gorman
2016-06-03 16:49 ` Andrew Morton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox