From: Florian Weimer <fw@deneb.enyo.de>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>,
linux-mm@kvack.org, Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
Date: Wed, 16 Oct 2019 21:38:49 +0200
Message-ID: <87blugh452.fsf@mid.deneb.enyo.de>
In-Reply-To: <96023250-6168-3806-320a-a3468f1cd8c9@suse.cz> (Vlastimil Babka's message of "Tue, 1 Oct 2019 11:10:22 +0200")

* Vlastimil Babka:
> On 9/30/19 11:17 PM, Dave Chinner wrote:
>> On Mon, Sep 30, 2019 at 09:07:53PM +0200, Florian Weimer wrote:
>>> * Dave Chinner:
>>>
>>>> On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
>>>>> Simply running “du -hc” on a large directory tree causes du to be
>>>>> killed because of kernel paging request failure in the XFS code.
>>>>
>>>> dmesg output? if the system was still running, then you might be
>>>> able to pull the trace from syslog. But we can't do much without
>>>> knowing what the actual failure was....
>>>
>>> Huh. I actually have something in syslog:
>>>
>>> [ 4001.238411] BUG: kernel NULL pointer dereference, address: 0000000000000000
>>> [ 4001.238415] #PF: supervisor read access in kernel mode
>>> [ 4001.238417] #PF: error_code(0x0000) - not-present page
>>> [ 4001.238418] PGD 0 P4D 0
>>> [ 4001.238420] Oops: 0000 [#1] SMP PTI
>>> [ 4001.238423] CPU: 3 PID: 143 Comm: kswapd0 Tainted: G I 5.2.16fw+ #1
>>> [ 4001.238424] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
>>> [ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
>>
>> That's memory compaction code it's crashed in.
>>
>>> [ 4001.238432] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
>
> Tried to decode it, but couldn't match it to source code, my version of
> compiled code is too different. Would it be possible to either send
> mm/compaction.o from the matching build, or output of 'objdump -d -l'
> for the __reset_isolation_pfn function?

(dropping the fs lists)

I got another crash, this time triggered by rsync (large tree with
many small files, few files changed).

Oops:
[41969.140117] BUG: kernel NULL pointer dereference, address: 0000000000000000
[41969.140121] #PF: supervisor read access in kernel mode
[41969.140122] #PF: error_code(0x0000) - not-present page
[41969.140123] PGD 0 P4D 0
[41969.140125] Oops: 0000 [#1] SMP PTI
[41969.140127] CPU: 5 PID: 144 Comm: kswapd0 Tainted: G I 5.2.18fw+ #10
[41969.140128] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
[41969.140133] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
[41969.140134] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
[41969.140135] RSP: 0018:ffffc900003ffde0 EFLAGS: 00010246
[41969.140137] RAX: 000000000004fdac RBX: 0000000000118000 RCX: 0000000000000000
[41969.140138] RDX: 0000000000000000 RSI: 0000000000000230 RDI: ffff88833fffa000
[41969.140138] RBP: ffffc900003ffe18 R08: 000000000000003c R09: ffff888335080000
[41969.140139] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000001
[41969.140140] R13: 0000000000000001 R14: ffff888338dc01c0 R15: 0000000000000001
[41969.140141] FS: 0000000000000000(0000) GS:ffff888333d40000(0000) knlGS:0000000000000000
[41969.140142] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41969.140143] CR2: 0000000000000000 CR3: 000000000200a001 CR4: 00000000000206e0
[41969.140144] Call Trace:
[41969.140147] __reset_isolation_suitable+0x9b/0x120
[41969.140149] reset_isolation_suitable+0x3b/0x40
[41969.140152] kswapd+0x98/0x300
[41969.140154] ? wait_woken+0x80/0x80
[41969.140157] kthread+0x114/0x130
[41969.140158] ? balance_pgdat+0x450/0x450
[41969.140159] ? kthread_park+0x80/0x80
[41969.140162] ret_from_fork+0x1f/0x30
[41969.140163] Modules linked in: usb_storage nfnetlink 8021q garp stp llc fuse ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_filter xt_state xt_conntrack iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter tun ip6_tables binfmt_misc mxm_wmi evdev snd_hda_codec_hdmi coretemp serio_raw snd_hda_intel kvm_intel snd_hda_codec kvm snd_hwdep irqbypass snd_hda_core pcspkr snd_pcm snd_timer snd soundcore sg i7core_edac asus_atk0110 wmi button loop ip_tables x_tables raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 multipath linear md_mod hid_generic usbhid hid crc32c_intel psmouse sr_mod cdrom radeon e1000e xhci_pci ptp ehci_pci uhci_hcd xhci_hcd pps_core ehci_hcd sky2 usbcore ttm usb_common sd_mod
[41969.140187] CR2: 0000000000000000
[41969.140189] ---[ end trace e27ddb472a95c047 ]---

This time, I've got a kernel with debugging information (still
5.2.18). The crash is at offset 0x39f, i.e. __reset_isolation_pfn+0x27f:
if (!mem_section[SECTION_NR_TO_ROOT(nr)])
384: 48 c1 ea 35 shr $0x35,%rdx
388: 48 8b 14 d7 mov (%rdi,%rdx,8),%rdx
38c: 48 c1 e8 2d shr $0x2d,%rax
390: 48 85 d2 test %rdx,%rdx
393: 74 0a je 39f <__reset_isolation_pfn+0x27f>
return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
395: 0f b6 c0 movzbl %al,%eax
398: 48 c1 e0 04 shl $0x4,%rax
39c: 48 01 c2 add %rax,%rdx
unsigned long map = section->section_mem_map;
39f: 48 8b 02 mov (%rdx),%rax
clear_pageblock_skip(page);
3a2: 4c 89 f2 mov %r14,%rdx
3a5: 41 b8 01 00 00 00 mov $0x1,%r8d
3ab: 31 f6 xor %esi,%esi
3ad: b9 03 00 00 00 mov $0x3,%ecx
3b2: 4c 89 f7 mov %r14,%rdi

Hmm, the 'objdump -d -l' output is likely more helpful here:
/home/fw/src/linux/linux/mm/compaction.c:293
37a: a8 10 test $0x10,%al
37c: 74 bc je 33a <__reset_isolation_pfn+0x21a>
page_to_section():
/home/fw/src/linux/linux/./include/linux/mm.h:1265
37e: 49 8b 16 mov (%r14),%rdx
381: 48 89 d0 mov %rdx,%rax
__nr_to_section():
/home/fw/src/linux/linux/./include/linux/mmzone.h:1218
384: 48 c1 ea 35 shr $0x35,%rdx
388: 48 8b 14 d7 mov (%rdi,%rdx,8),%rdx
page_to_section():
/home/fw/src/linux/linux/./include/linux/mm.h:1265
38c: 48 c1 e8 2d shr $0x2d,%rax
__nr_to_section():
/home/fw/src/linux/linux/./include/linux/mmzone.h:1218
390: 48 85 d2 test %rdx,%rdx
393: 74 0a je 39f <__reset_isolation_pfn+0x27f>
/home/fw/src/linux/linux/./include/linux/mmzone.h:1220
395: 0f b6 c0 movzbl %al,%eax
398: 48 c1 e0 04 shl $0x4,%rax
39c: 48 01 c2 add %rax,%rdx
__section_mem_map_addr():
/home/fw/src/linux/linux/./include/linux/mmzone.h:1247
39f: 48 8b 02 mov (%rdx),%rax
__reset_isolation_pfn():
/home/fw/src/linux/linux/mm/compaction.c:294
3a2: 4c 89 f2 mov %r14,%rdx
3a5: 41 b8 01 00 00 00 mov $0x1,%r8d
3ab: 31 f6 xor %esi,%esi

It's this loop in mm/compaction.c:
286     /*
287      * Only clear the hint if a sample indicates there is either a
288      * free page or an LRU page in the block. One or other condition
289      * is necessary for the block to be a migration source/target.
290      */
291     do {
292         if (pfn_valid_within(pfn)) {
293             if (check_source && PageLRU(page)) {
294                 clear_pageblock_skip(page);
295                 return true;
296             }
297
298             if (check_target && PageBuddy(page)) {
299                 clear_pageblock_skip(page);
300                 return true;
301             }
302         }
303
304         page += (1 << PAGE_ALLOC_COSTLY_ORDER);
305         pfn += (1 << PAGE_ALLOC_COSTLY_ORDER);
306     } while (page < end_page);