From: Baoquan He <bhe@redhat.com>
To: Qian Cai <cai@lca.pw>
Cc: Mel Gorman <mgorman@suse.de>, Vlastimil Babka <vbabka@suse.cz>,
	Michal Hocko <mhocko@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Mike Rapoport <rppt@linux.ibm.com>
Subject: Re: compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))
Date: Fri, 24 Apr 2020 11:43:53 +0800
Message-ID: <20200424034353.GD4247@MiWiFi-R3L-srv>
In-Reply-To: <8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw>

On 04/23/20 at 05:25pm, Qian Cai wrote:
> Compaction started to crash as shown below on linux-next today. The faulty page belongs to the Node 0 DMA32 zone.
> I’ll continue to narrow it down, but just want to give a heads-up in case someone could beat me to it.
> 
> Debug output from free_area_init_core()
> [    0.000000] KK start page = ffffea0000000040, end page = ffffea0000040000, nid = 0 DMA
> [    0.000000] KK start page = ffffea0000040000, end page = ffffea0004000000, nid = 0 DMA32
> [    0.000000] KK start page = ffffea0004000000, end page = ffffea0012000000, nid = 0 NORMAL
> [    0.000000] KK start page = ffffea0012000000, end page = ffffea0021fc0000, nid = 4 NORMAL

Where are these printed? Are they the direct mapping addresses of the pages?
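
If they are struct page pointers in the vmemmap rather than direct mapping
addresses, they can be turned into pfns as in the sketch below. It assumes
x86_64 with the default vmemmap base 0xffffea0000000000 and a 64-byte
struct page; neither is stated in the report. VMEMMAP_BASE, STRUCT_PAGE_SIZE
and page_addr_to_pfn() are names made up for illustration.

```c
#include <stdint.h>

/*
 * Assumptions (not from the report): default x86_64 vmemmap base with
 * 4-level paging, and sizeof(struct page) == 64.
 */
#define VMEMMAP_BASE     0xffffea0000000000ULL
#define STRUCT_PAGE_SIZE 64ULL

/* Convert a struct page address inside the vmemmap to a page frame number. */
static inline uint64_t page_addr_to_pfn(uint64_t page_addr)
{
	return (page_addr - VMEMMAP_BASE) / STRUCT_PAGE_SIZE;
}
```

Under those assumptions, page_addr_to_pfn(0xffffea0000aa0000) gives 0x2a800,
which is the value in R12 in the trace, and the DMA32 line corresponds to
pfns 0x1000 up to (not including) 0x100000.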

> 
> I don’t understand how it could end up in such a situation. Several recent patches look
> more related than others.
> 
> - mm: rework free_area_init*() functions
> https://lore.kernel.org/linux-mm/20200412194859.12663-1-rppt@kernel.org/
> Could this somehow allow an invalid pfn to escape into the page allocator?
> In particular, is it related to skipping the checks in memmap_init_zone()?
> https://lore.kernel.org/linux-mm/20200412194859.12663-16-rppt@kernel.org

Possibly. On which arch is this happening? Do you have the boot log?

> 
> # numactl -H
> available: 8 nodes (0-7)
> node 0 cpus: 0 1 16 17
> node 0 size: 7951 MB
> node 0 free: 4445 MB
> node 1 cpus: 2 3 18 19
> node 1 size: 0 MB
> node 1 free: 0 MB
> node 2 cpus: 4 5 20 21
> node 2 size: 0 MB
> node 2 free: 0 MB
> node 3 cpus: 6 7 22 23
> node 3 size: 0 MB
> node 3 free: 0 MB
> node 4 cpus: 8 9 24 25
> node 4 size: 15354 MB
> node 4 free: 78 MB
> node 5 cpus: 10 11 26 27
> node 5 size: 0 MB
> node 5 free: 0 MB
> node 6 cpus: 12 13 28 29
> node 6 size: 0 MB
> node 6 free: 0 MB
> node 7 cpus: 14 15 30 31
> node 7 size: 0 MB
> node 7 free: 0 MB
> node distances:
> node   0   1   2   3   4   5   6   7 
>   0:  10  16  16  16  32  32  32  32 
>   1:  16  10  16  16  32  32  32  32 
>   2:  16  16  10  16  32  32  32  32 
>   3:  16  16  16  10  32  32  32  32 
>   4:  32  32  32  32  10  16  16  16 
>   5:  32  32  32  32  16  10  16  16 
>   6:  32  32  32  32  16  16  10  16 
>   7:  32  32  32  32  16  16  16  10
> 
> [ 6803.941550] LTP: starting swapping01 (swapping01 -i 5)
> [ 6821.098489] page:ffffea0000aa0000 refcount:1 mapcount:0 mapping:000000002243743b index:0x0
> [ 6821.107077] flags: 0x1fffe000001000(reserved)
> [ 6821.111534] raw: 001fffe000001000 ffffea0000aa0008 ffffea0000aa0008 0000000000000000
> [ 6821.119365] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> [ 6821.127167] page dumped because: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn))
> [ 6821.135399] page_owner info is not present (never set?)
> [ 6821.140717] ------------[ cut here ]------------
> [ 6821.145372] kernel BUG at mm/page_alloc.c:533!
> [ 6821.150075] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [ 6821.150083] irq event stamp: 10075005
> [ 6821.150102] hardirqs last  enabled at (10075005): [<ffffffff99ea403f>] do_page_fault+0x45f/0x9d7
> [ 6821.156829] CPU: 17 PID: 218 Comm: kcompactd0 Not tainted 5.7.0-rc2-next-20200423+ #7
> [ 6821.160522] hardirqs last disabled at (10075004): [<ffffffff99e03ed1>] trace_hardirqs_off_thunk+0x1a/0x1c
> [ 6821.160535] softirqs last  enabled at (10067158): [<ffffffff9ac00478>] __do_softirq+0x478/0x77f
> [ 6821.169366] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
> [ 6821.169378] RIP: 0010:set_pfnblock_flags_mask+0x150/0x210
> [ 6821.177257] softirqs last disabled at (10067149): [<ffffffff99ed22a6>] irq_exit+0xd6/0xf0
> [ 6821.218362] Code: 1a 49 8d 7f 38 e8 80 ee 05 00 48 8b 45 a0 48 8b 55 a8 48 03 50 78 49 39 d4 72 1d 48 c7 c6 80 ee ef 9a 4c 89 ef e8 70 73 fb ff <0f> 0b 48 c7 c7 40 50 75 9b e8 c4 09 2b 00 4c 8b 65 d0 b9 3f 00 00
> [ 6821.237457] RSP: 0018:ffffc900042ff858 EFLAGS: 00010282
> [ 6821.242719] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff9a002382
> [ 6821.249900] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8884535b8e6c
> [ 6821.257384] RBP: ffffc900042ff8b8 R08: ffffed108a6b8459 R09: ffffed108a6b8459
> [ 6821.264566] R10: ffff8884535c22c7 R11: ffffed108a6b8458 R12: 000000000002a800
> [ 6821.271748] R13: ffffea0000aa0000 R14: ffff88847fff3000 R15: ffff88847fff3040
> [ 6821.278930] FS:  0000000000000000(0000) GS:ffff888453580000(0000) knlGS:0000000000000000
> [ 6821.287318] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 6821.293102] CR2: 00007fd1eb4a1000 CR3: 000000083154c000 CR4: 00000000003406e0
> [ 6821.300283] Call Trace:
> [ 6821.302752]  isolate_freepages+0xb20/0x1140
> [ 6821.307167]  ? isolate_freepages_block+0x730/0x730
> [ 6821.311993]  ? mark_held_locks+0x34/0xb0
> [ 6821.315942]  ? free_unref_page+0x7d/0x90
> [ 6821.319891]  ? free_unref_page+0x7d/0x90
> [ 6821.323842]  ? check_flags.part.28+0x86/0x220
> [ 6821.328234]  compaction_alloc+0xdd/0x100
> [ 6821.332401]  migrate_pages+0x304/0x17e0
> [ 6821.336277]  ? __ClearPageMovable+0x100/0x100
> [ 6821.340674]  ? isolate_freepages+0x1140/0x1140
> [ 6821.345153]  compact_zone+0x1249/0x1e90
> [ 6821.349020]  ? compaction_suitable+0x260/0x260
> [ 6821.353494]  kcompactd_do_work+0x231/0x650
> [ 6821.357873]  ? sysfs_compact_node+0x80/0x80
> [ 6821.362088]  ? finish_wait+0xe6/0x110
> [ 6821.365775]  kcompactd+0x162/0x490
> [ 6821.369202]  ? kcompactd_do_work+0x650/0x650
> [ 6821.373501]  ? finish_wait+0x110/0x110
> [ 6821.377280]  ? __kasan_check_read+0x11/0x20
> [ 6821.381693]  ? __kthread_parkme+0xd4/0xf0
> [ 6821.385729]  ? kcompactd_do_work+0x650/0x650
> [ 6821.390027]  kthread+0x1f7/0x220
> [ 6821.393280]  ? kthread_create_worker_on_cpu+0xc0/0xc0
> [ 6821.398369]  ret_from_fork+0x27/0x50
> [ 6821.401968] Modules linked in: brd vfat fat ext4 crc16 mbcache jbd2 loop kvm_amd ses kvm enclosure dax_pmem dax_pmem_core irqbypass acpi_cpufreq ip_tables x_tables xfs sd_mod smartpqi scsi_transport_sas tg3 firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod
> [ 6821.426127] ---[ end trace 9783087562801ccf ]---
> [ 6821.430800] RIP: 0010:set_pfnblock_flags_mask+0x150/0x210
> [ 6821.436410] Code: 1a 49 8d 7f 38 e8 80 ee 05 00 48 8b 45 a0 48 8b 55 a8 48 03 50 78 49 39 d4 72 1d 48 c7 c6 80 ee ef 9a 4c 89 ef e8 70 73 fb ff <0f> 0b 48 c7 c7 40 50 75 9b e8 c4 09 2b 00 4c 8b 65 d0 b9 3f 00 00
> [ 6821.455319] RSP: 0018:ffffc900042ff858 EFLAGS: 00010282
> [ 6821.460863] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff9a002382
> [ 6821.468063] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8884535b8e6c
> [ 6821.475245] RBP: ffffc900042ff8b8 R08: ffffed108a6b8459 R09: ffffed108a6b8459
> [ 6821.482675] R10: ffff8884535c22c7 R11: ffffed108a6b8458 R12: 000000000002a800
> [ 6821.489877] R13: ffffea0000aa0000 R14: ffff88847fff3000 R15: ffff88847fff3040
> [ 6821.497062] FS:  0000000000000000(0000) GS:ffff888453580000(0000) knlGS:0000000000000000
> [ 6821.505218] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 6821.511284] CR2: 00007fd1eb4a1000 CR3: 000000083154c000 CR4: 00000000003406e0
> [ 6821.518487] Kernel panic - not syncing: Fatal exception
> [ 6821.523876] Kernel Offset: 0x18e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 6821.534915] ---[ end Kernel panic - not syncing: Fatal exception ]---
> 
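
For reference, the check that fires is a plain range test on the zone's pfn
span. The sketch below is a userspace model of it, not the kernel code itself:
struct zone_range and the zone_range_*() helpers are illustrative stand-ins
for the zone_start_pfn and spanned_pages fields of struct zone.

```c
#include <stdbool.h>

/* Illustrative stand-in for the two struct zone fields the check reads. */
struct zone_range {
	unsigned long zone_start_pfn;	/* first pfn covered by the zone */
	unsigned long spanned_pages;	/* span size in pages, holes included */
};

/* One past the last pfn covered by the zone. */
static inline unsigned long zone_range_end_pfn(const struct zone_range *z)
{
	return z->zone_start_pfn + z->spanned_pages;
}

/* True if pfn lies in [zone_start_pfn, zone_start_pfn + spanned_pages). */
static inline bool zone_range_spans_pfn(const struct zone_range *z,
					 unsigned long pfn)
{
	return z->zone_start_pfn <= pfn && pfn < zone_range_end_pfn(z);
}
```

If the debug output above is read as vmemmap addresses with a 64-byte
struct page (an assumption), DMA32 covers pfns 0x1000 to 0x100000 and the pfn
in R12, 0x2a800, lies inside it. In that reading the check could only fail if
page_zone(page) returned some zone other than DMA32 for this reserved page.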
