From: Florian Weimer
To: Vlastimil Babka
Cc: Dave Chinner, linux-mm@kvack.org, Mel Gorman
Subject: Re: [bug, 5.2.16] kswapd/compaction null pointer crash [was Re: xfs_inode not reclaimed/memory leak on 5.2.16]
Date: Wed, 16 Oct 2019 21:38:49 +0200
Message-ID: <87blugh452.fsf@mid.deneb.enyo.de>
In-Reply-To: <96023250-6168-3806-320a-a3468f1cd8c9@suse.cz> (Vlastimil Babka's message of "Tue, 1 Oct 2019 11:10:22 +0200")
References: <87pnji8cpw.fsf@mid.deneb.enyo.de> <20190930085406.GP16973@dread.disaster.area> <87o8z1fvqu.fsf@mid.deneb.enyo.de> <20190930211727.GQ16973@dread.disaster.area> <96023250-6168-3806-320a-a3468f1cd8c9@suse.cz>

* Vlastimil Babka:

> On 9/30/19 11:17 PM, Dave Chinner wrote:
>> On Mon, Sep 30, 2019 at 09:07:53PM +0200, Florian Weimer wrote:
>>> * Dave Chinner:
>>>
>>>> On Mon, Sep 30, 2019 at 09:28:27AM +0200, Florian Weimer wrote:
>>>>> Simply running “du -hc” on a large directory tree causes du to be
>>>>> killed because of kernel paging request failure in the XFS code.
>>>>
>>>> dmesg output? if the system was still running, then you might be
>>>> able to pull the trace from syslog. But we can't do much without
>>>> knowing what the actual failure was....
>>>
>>> Huh.
>>> I actually have something in syslog:
>>>
>>> [ 4001.238411] BUG: kernel NULL pointer dereference, address:
>>> 0000000000000000
>>> [ 4001.238415] #PF: supervisor read access in kernel mode
>>> [ 4001.238417] #PF: error_code(0x0000) - not-present page
>>> [ 4001.238418] PGD 0 P4D 0
>>> [ 4001.238420] Oops: 0000 [#1] SMP PTI
>>> [ 4001.238423] CPU: 3 PID: 143 Comm: kswapd0 Tainted: G I 5.2.16fw+
>>> #1
>>> [ 4001.238424] Hardware name: System manufacturer System Product
>>> Name/P6X58D-E, BIOS 0701 05/10/2011
>>> [ 4001.238430] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
>>
>> That's memory compaction code it's crashed in.
>>
>>> [ 4001.238432] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0
>>> 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1
>>> e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00
>>> 00 00 4c 89 f7
>
> Tried to decode it, but couldn't match it to source code, my version of
> compiled code is too different. Would it be possible to either send
> mm/compaction.o from the matching build, or output of 'objdump -d -l'
> for the __reset_isolation_pfn function?

(dropping the fs lists)

I got another crash, this time triggered by rsync (large tree with
many small files, few files changed).
Oops:

[41969.140117] BUG: kernel NULL pointer dereference, address: 0000000000000000
[41969.140121] #PF: supervisor read access in kernel mode
[41969.140122] #PF: error_code(0x0000) - not-present page
[41969.140123] PGD 0 P4D 0
[41969.140125] Oops: 0000 [#1] SMP PTI
[41969.140127] CPU: 5 PID: 144 Comm: kswapd0 Tainted: G I 5.2.18fw+ #10
[41969.140128] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011
[41969.140133] RIP: 0010:__reset_isolation_pfn+0x27f/0x3c0
[41969.140134] Code: 44 c6 48 8b 00 a8 10 74 bc 49 8b 16 48 89 d0 48 c1 ea 35 48 8b 14 d7 48 c1 e8 2d 48 85 d2 74 0a 0f b6 c0 48 c1 e0 04 48 01 c2 <48> 8b 02 4c 89 f2 41 b8 01 00 00 00 31 f6 b9 03 00 00 00 4c 89 f7
[41969.140135] RSP: 0018:ffffc900003ffde0 EFLAGS: 00010246
[41969.140137] RAX: 000000000004fdac RBX: 0000000000118000 RCX: 0000000000000000
[41969.140138] RDX: 0000000000000000 RSI: 0000000000000230 RDI: ffff88833fffa000
[41969.140138] RBP: ffffc900003ffe18 R08: 000000000000003c R09: ffff888335080000
[41969.140139] R10: ffff88833fff9000 R11: 0000000000000000 R12: 0000000000000001
[41969.140140] R13: 0000000000000001 R14: ffff888338dc01c0 R15: 0000000000000001
[41969.140141] FS: 0000000000000000(0000) GS:ffff888333d40000(0000) knlGS:0000000000000000
[41969.140142] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41969.140143] CR2: 0000000000000000 CR3: 000000000200a001 CR4: 00000000000206e0
[41969.140144] Call Trace:
[41969.140147] __reset_isolation_suitable+0x9b/0x120
[41969.140149] reset_isolation_suitable+0x3b/0x40
[41969.140152] kswapd+0x98/0x300
[41969.140154] ? wait_woken+0x80/0x80
[41969.140157] kthread+0x114/0x130
[41969.140158] ? balance_pgdat+0x450/0x450
[41969.140159] ? kthread_park+0x80/0x80
[41969.140162] ret_from_fork+0x1f/0x30
[41969.140163] Modules linked in: usb_storage nfnetlink 8021q garp stp llc fuse ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_filter xt_state xt_conntrack iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter tun ip6_tables binfmt_misc mxm_wmi evdev snd_hda_codec_hdmi coretemp serio_raw snd_hda_intel kvm_intel snd_hda_codec kvm snd_hwdep irqbypass snd_hda_core pcspkr snd_pcm snd_timer snd soundcore sg i7core_edac asus_atk0110 wmi button loop ip_tables x_tables raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 multipath linear md_mod hid_generic usbhid hid crc32c_intel psmouse sr_mod cdrom radeon e1000e xhci_pci ptp ehci_pci uhci_hcd xhci_hcd pps_core ehci_hcd sky2 usbcore ttm usb_common sd_mod
[41969.140187] CR2: 0000000000000000
[41969.140189] ---[ end trace e27ddb472a95c047 ]---

This time, I've got a kernel with debugging information (still
5.2.18).
The crash is at offset 0x39f:

        if (!mem_section[SECTION_NR_TO_ROOT(nr)])
 384:   48 c1 ea 35             shr    $0x35,%rdx
 388:   48 8b 14 d7             mov    (%rdi,%rdx,8),%rdx
 38c:   48 c1 e8 2d             shr    $0x2d,%rax
 390:   48 85 d2                test   %rdx,%rdx
 393:   74 0a                   je     39f <__reset_isolation_pfn+0x27f>
        return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK];
 395:   0f b6 c0                movzbl %al,%eax
 398:   48 c1 e0 04             shl    $0x4,%rax
 39c:   48 01 c2                add    %rax,%rdx
        unsigned long map = section->section_mem_map;
 39f:   48 8b 02                mov    (%rdx),%rax
        clear_pageblock_skip(page);
 3a2:   4c 89 f2                mov    %r14,%rdx
 3a5:   41 b8 01 00 00 00       mov    $0x1,%r8d
 3ab:   31 f6                   xor    %esi,%esi
 3ad:   b9 03 00 00 00          mov    $0x3,%ecx
 3b2:   4c 89 f7                mov    %r14,%rdi

Hmm, -l output is likely more helpful here:

/home/fw/src/linux/linux/mm/compaction.c:293
 37a:   a8 10                   test   $0x10,%al
 37c:   74 bc                   je     33a <__reset_isolation_pfn+0x21a>
page_to_section():
/home/fw/src/linux/linux/./include/linux/mm.h:1265
 37e:   49 8b 16                mov    (%r14),%rdx
 381:   48 89 d0                mov    %rdx,%rax
__nr_to_section():
/home/fw/src/linux/linux/./include/linux/mmzone.h:1218
 384:   48 c1 ea 35             shr    $0x35,%rdx
 388:   48 8b 14 d7             mov    (%rdi,%rdx,8),%rdx
page_to_section():
/home/fw/src/linux/linux/./include/linux/mm.h:1265
 38c:   48 c1 e8 2d             shr    $0x2d,%rax
__nr_to_section():
/home/fw/src/linux/linux/./include/linux/mmzone.h:1218
 390:   48 85 d2                test   %rdx,%rdx
 393:   74 0a                   je     39f <__reset_isolation_pfn+0x27f>
/home/fw/src/linux/linux/./include/linux/mmzone.h:1220
 395:   0f b6 c0                movzbl %al,%eax
 398:   48 c1 e0 04             shl    $0x4,%rax
 39c:   48 01 c2                add    %rax,%rdx
__section_mem_map_addr():
/home/fw/src/linux/linux/./include/linux/mmzone.h:1247
 39f:   48 8b 02                mov    (%rdx),%rax
__reset_isolation_pfn():
/home/fw/src/linux/linux/mm/compaction.c:294
 3a2:   4c 89 f2                mov    %r14,%rdx
 3a5:   41 b8 01 00 00 00       mov    $0x1,%r8d
 3ab:   31 f6                   xor    %esi,%esi

It's this loop:

   286          /*
   287           * Only clear the hint if a sample indicates there is either a
   288           * free page or an LRU page in the block. One or other condition
   289           * is necessary for the block to be a migration source/target.
   290           */
   291          do {
   292                  if (pfn_valid_within(pfn)) {
   293                          if (check_source && PageLRU(page)) {
   294                                  clear_pageblock_skip(page);
   295                                  return true;
   296                          }
   297
   298                          if (check_target && PageBuddy(page)) {
   299                                  clear_pageblock_skip(page);
   300                                  return true;
   301                          }
   302                  }
   303
   304                  page += (1 << PAGE_ALLOC_COSTLY_ORDER);
   305                  pfn += (1 << PAGE_ALLOC_COSTLY_ORDER);
   306          } while (page < end_page);