From: Qian Cai <cai@lca.pw>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: vbabka@suse.cz, Linux-MM <linux-mm@kvack.org>
Subject: Re: low-memory crash with patch "capture a page under direct compaction"
Date: Tue, 05 Mar 2019 10:13:24 -0500
Message-ID: <1551798804.7087.7.camel@lca.pw>
In-Reply-To: <20190305144234.GH9565@techsingularity.net>

On Tue, 2019-03-05 at 14:42 +0000, Mel Gorman wrote:
> On Mon, Mar 04, 2019 at 10:55:04PM -0500, Qian Cai wrote:
> > Reverting the patches below from linux-next seems to fix a crash while
> > running LTP oom01.
> > 
> > 915c005358c1 mm, compaction: Capture a page under direct compaction -fix
> > e492a5711b67 mm, compaction: capture a page under direct compaction
> > 
> > In particular, removing just this chunk alone seems to fix the problem.
> > 
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -2227,10 +2227,10 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
> >                 }
> > 
> >                 /* Stop if a page has been captured */
> > -               if (capc && capc->page) {
> > -                       ret = COMPACT_SUCCESS;
> > -                       break;
> > -               }
> > 
> 
> It's hard to make sense of how this is connected to the bug. The
> out-of-bounds warning would have required page flags to be corrupted
> quite badly or maybe the use of an uninitialised page. How reproducible
> has this been for you? I just ran the test 100 times with UBSAN and page
> alloc debugging enabled and it completed correctly.
> 

I can reproduce this reliably on this x86_64 server: it triggers within 3 runs
of oom01 every time. So far I have been unable to reproduce it on arm64 or
ppc64le servers.

# for i in `seq 1 3`; do /opt/ltp/testcases/bin/oom01 ; done
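
For unattended runs, the same loop could be wrapped as in the minimal sketch
below. It is not part of LTP; the helper name `run_repro` and the log file
name are hypothetical. It writes the attempt number to disk before each run,
so that the record survives if the kernel crashes during that run.

```python
import os
import subprocess


def run_repro(cmd, tries=3, log_path="repro_attempts.log"):
    """Run cmd up to `tries` times and return the list of exit codes.

    The attempt number is written and fsync'ed before each run so that it
    is still on disk if the machine goes down during that run.
    """
    codes = []
    with open(log_path, "a") as log:
        for attempt in range(1, tries + 1):
            log.write(f"attempt {attempt}: {' '.join(cmd)}\n")
            log.flush()
            os.fsync(log.fileno())
            codes.append(subprocess.run(cmd).returncode)
    return codes


# Example call (path taken from the shell loop above):
#   run_repro(["/opt/ltp/testcases/bin/oom01"], tries=3)
```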

Sometimes it triggers different traces. For example:

[  391.704320] SLUB: Unable to allocate memory on node -1, gfp=0x800(GFP_NOWAIT)
[  391.737794]   cache: kmalloc-64, object size: 64, buffer size: 416, default order: 2, min order: 0
[  391.778079]   node 0: slabs: 5999, objs: 232851, free: 16
[  391.802926]   node 1: slabs: 4303, objs: 167067, free: 37
[  499.866479] ------------[ cut here ]------------
[  499.866500] BUG: Bad page state in process oom01  pfn:fffffe7a09fffd07
[  499.890013] kernel BUG at mm/page_alloc.c:3124!
[  499.935430] double fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  499.971334] CPU: 0 PID: 1623 Comm: oom01 Tainted: G        W 5.0.0-next-20190305+ #49
[  499.992805] ================================================================================
[  500.009887] Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
[  500.009901] RIP: 0010:check_memory_region+0x10/0x1e0
[  500.048252] UBSAN: Undefined behaviour in kernel/locking/qspinlock.c:138:9
[  500.085378] Code: 00 00 00 48 89 e5 e8 ff 3e 9f 00 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 48 85 f6 0f 84 68 01 00 00 55 0f b6 d2 48 89 e5 <41> 55 41 54 53 e9 b3 00 00 00 48 b8 00 00 00 00 00 00 00 ff 48 39
[  500.107608] index 8190 is out of range for type 'long unsigned int [256]'
[  500.138462] RSP: 0000:ffff888428f80000 EFLAGS: 00010002
[  500.223186] CPU: 42 PID: 0 Comm: swapper/42 Tainted: G        W 5.0.0-next-20190305+ #49
[  500.253922] RAX: ffff88827fff41c0 RBX: ffff88827fff41c8 RCX: ffffffff9c0a9468
[  500.253925] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88827fff41f8
[  500.277367] Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
[  500.277370] Call Trace:
[  500.318081] RBP: ffff888428f80000 R08: ffffed104fffe840 R09: ffffed104fffe83f
[  500.318085] R10: ffffed104fffe83f R11: ffff88827fff41fb R12: ffff88827fff41f8
[  500.349838]  <IRQ>
[  500.381765] R13: ffff88827fff41c8 R14: ffff88842a96f770 R15: ffff88827fff41c8
[  500.381768] FS:  00007fdfd3559700(0000) GS:ffff8881f3c00000(0000) knlGS:0000000000000000
[  500.424074]  dump_stack+0x62/0x9a
[  500.435452] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  500.435455] CR2: ffff888428f7fff8 CR3: 000000041abca003 CR4: 00000000001606b0
[  500.467546]  ubsan_epilogue+0xd/0x7f
[  500.500039] Call Trace:
[  500.500042] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables xfs sd_mod ahci igb libahci i2c_algo_bit i2c_core libata dm_mirror dm_region_hash dm_log dm_mod efivarfs
[  500.509058]  __ubsan_handle_out_of_bounds+0x14d/0x192
[  500.541152] ---[ end trace f9ff2b89b6b88c5f ]---
[  500.541155] invalid opcode: 0000 [#2] SMP DEBUG_PAGEALLOC KASAN PTI
[  500.541159] CPU: 10 PID: 262 Comm: kcompactd0 Tainted: G      D W 5.0.0-next-20190305+ #49
[  500.541161] Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
[  500.541167] RIP: 0010:__isolate_free_page+0x464/0x600
[  500.541170] Code: 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 c7 c6 20 6f 0b 9d 48 89 df e8 4a 8b f8 ff 0f 0b 48 c7 c7 a0 32 69 9d e8 51 40 43 00 <0f> 0b 48 c7 c7 e0 31 69 9d e8 43 40 43 00 48 c7 c6 80 71 0b 9d 48
[  500.541172] RSP: 0000:ffff8881f1fdf848 EFLAGS: 00010002
[  500.541175] RAX: 00000000f0000080 RBX: ffffea00064fc000 RCX: ffff88827fff41d0
[  500.541177] RDX: 1ffffd4000c9f806 RSI: 0000000000000008 RDI: ffffffff9d9f1640
[  500.541179] RBP: ffff8881f1fdf898 R08: ffffea00064fc000 R09: ffff8881f1fdfd30
[  500.541181] R10: 0000000000000002 R11: 1ffff1104fffe83b R12: 0000000000000008
[  500.541183] R13: dffffc0000000000 R14: ffff88827fff3000 R15: 0000000000000002
[  500.541185] FS:  0000000000000000(0000) GS:ffff8881f4100000(0000) knlGS:0000000000000000
[  500.541188] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  500.541190] CR2: 00007fdce416a000 CR3: 000000026ea16002 CR4: 00000000001606a0
[  500.541191] Call Trace:
[  500.541199]  compaction_alloc+0x886/0x25f0
[  500.541221]  unmap_and_move+0x37/0x1e70
[  500.541228]  migrate_pages+0x2ca/0xb20
[  500.541238]  compact_zone+0x19cb/0x3620
[  500.541252]  kcompactd_do_work+0x2df/0x680
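
As a rough sanity check on the pfn printed in the "Bad page state" line, the
sketch below tests whether a pfn could correspond to any real physical address.
It assumes 4 KiB pages and the 52-bit architectural limit on x86_64 physical
addresses; the helper name `pfn_is_plausible` and the second sample pfn are
purely illustrative. A pfn this far out of range would fit the suspicion of a
corrupted or uninitialised struct page, though it does not prove it.

```python
PAGE_SHIFT = 12      # assumption: 4 KiB pages
MAX_PHYS_BITS = 52   # assumption: x86_64 architectural physical address limit


def pfn_is_plausible(pfn, max_phys_bits=MAX_PHYS_BITS, page_shift=PAGE_SHIFT):
    """Return True if (pfn << page_shift) fits in max_phys_bits bits."""
    return 0 <= pfn < (1 << (max_phys_bits - page_shift))


reported = 0xfffffe7a09fffd07          # pfn from the "Bad page state" line
print(pfn_is_plausible(reported))      # False
print(pfn_is_plausible(0x100000))      # True (illustrative pfn at 4 GiB)
```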


Thread overview: 7+ messages
2019-03-05  3:55 low-memory crash with patch "capture a page under direct compaction" Qian Cai
2019-03-05 14:42 ` Mel Gorman
2019-03-05 15:13   ` Qian Cai [this message]
2019-03-05 15:27     ` Mel Gorman
2019-03-06  3:01       ` Qian Cai
2019-03-06  3:14         ` Qian Cai
2019-03-06  9:13           ` Mel Gorman
