From: Zhenyu Zhang <zhenyzha@redhat.com>
To: linux-xfs@vger.kernel.org
Cc: Guowen Shan <gshan@redhat.com>,
Shaoqin Huang <shahuang@redhat.com>,
Zhenyu Zhang <zhenyzha@redhat.com>
Subject: Endless calls to xas_split_alloc() due to corrupted xarray entry
Date: Fri, 22 Sep 2023 11:56:43 +0800
Message-ID: <CAJFLiB+J4mKGDOppp=1moMe2aNqeJhM9F2cD4KPTXoM6nzb5RA@mail.gmail.com>
Hi all,

We don't know how the xarray entry got corrupted. Maybe it is already a
known issue in the community; let's see.
Contents
--------
1. Problem Statement
2. The call trace
3. The data captured by bpftrace
1. Problem Statement
--------------------
With a 4k page-size guest on a 64k page-size host, we hit the following call
trace on aarch64 (Ampere Altra Max CPU).
Steps:
1) Set up hugepages on the host.
# echo 60 > /proc/sys/vm/nr_hugepages
2) Mount hugetlbfs at /mnt/kvm_hugepage.
# mount -t hugetlbfs -o pagesize=524288K none /mnt/kvm_hugepage
3) Create the mem-path directory for the guest (a plain directory rather than
the hugetlbfs mount; the test scenario checks that HugePages do not leak in
this case).
# mkdir -p /mnt/tmp
4) Boot guest.
# /usr/libexec/qemu-kvm \
...
-m 30720 \
-object '{"size": 32212254720, "mem-path": "/mnt/tmp", "qom-type":
"memory-backend-file"}' \
-smp 4,maxcpus=4,cores=2,threads=1,clusters=1,sockets=2 \
-blockdev '{"node-name": "file_image1", "driver": "file",
"auto-read-only": true, "discard": "unmap", "aio": "threads",
"filename": "/home/kvm_autotest_root/images/back_up_4k.qcow2",
"cache": {"direct": true, "no-flush": false}}' \
-blockdev '{"node-name": "drive_image1", "driver": "qcow2",
"read-only": false, "cache": {"direct": true, "no-flush": false},
"file": "file_image1"}' \
-device '{"driver": "scsi-hd", "id": "image1", "drive":
"drive_image1", "write-cache": "on"}' \
5) Wait about one minute, after which the call trace below is hit.
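
For what it's worth, the 60 x 512MB hugepages match the 30GB guest size, but
since "mem-path" points at /mnt/tmp rather than the hugetlbfs mount, the guest
RAM ends up backed by a regular (and immediately unlinked) file on an XFS
filesystem, per the xfs_filemap_fault() and do_cow_fault() frames in the trace
below. Roughly, and only as a hedged sketch (this is not QEMU source; the file
name, flags and error handling are assumptions for illustration), the
memory-backend-file object boils down to something like:

    /*
     * Simplified sketch of the guest RAM setup implied by the -object line
     * above: a ~30GB private (CoW) mapping of a regular file under /mnt/tmp.
     * Not a verified standalone reproducer.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        size_t size = 32212254720ULL;           /* matches the -object "size" */
        int fd = open("/mnt/tmp/qemu_back_mem.sketch", O_RDWR | O_CREAT, 0600);

        if (fd < 0 || ftruncate(fd, (off_t)size) < 0) {
            perror("backing file");
            return 1;
        }
        /* The bpftrace dump below shows the real backing file as "(deleted)",
         * i.e. QEMU unlinks it after creating it. */
        unlink("/mnt/tmp/qemu_back_mem.sketch");

        /* Private mapping: guest writes fault via do_cow_fault(), while reads
         * populate the page cache via filemap_fault()/readahead. */
        void *ram = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (ram == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        printf("guest RAM sketch mapped at %p\n", ram);
        munmap(ram, size);
        close(fd);
        return 0;
    }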
2. The call trace
--------------------
[ 14.982751] block dm-0: the capability attribute has been deprecated.
[ 15.690043] PEFILE: Unsigned PE binary
[ 90.135676] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 90.136629] rcu: 3-...0: (3 ticks this GP)
idle=e6ec/1/0x4000000000000000 softirq=6847/6849 fqs=232
[ 90.137293] rcu: (detected by 2, t=6012 jiffies, g=2085, q=2539 ncpus=4)
[ 90.137796] Task dump for CPU 3:
[ 90.138037] task:PK-Backend state:R running task stack:0
pid:2287 ppid:1 flags:0x00000202
[ 90.138757] Call trace:
[ 90.138940] __switch_to+0xc8/0x110
[ 90.139203] 0xb54a54f8c5fb0700
[ 270.190849] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 270.191722] rcu: 3-...0: (3 ticks this GP)
idle=e6ec/1/0x4000000000000000 softirq=6847/6849 fqs=1020
[ 270.192405] rcu: (detected by 1, t=24018 jiffies, g=2085, q=3104 ncpus=4)
[ 270.192876] Task dump for CPU 3:
[ 270.193099] task:PK-Backend state:R running task stack:0
pid:2287 ppid:1 flags:0x00000202
[ 270.193774] Call trace:
[ 270.193946] __switch_to+0xc8/0x110
[ 270.194336] 0xb54a54f8c5fb0700
[ 1228.068406] ------------[ cut here ]------------
[ 1228.073011] WARNING: CPU: 2 PID: 4496 at lib/xarray.c:1010
xas_split_alloc+0xf8/0x128
[ 1228.080828] Modules linked in: binfmt_misc vhost_net vhost
vhost_iotlb tap xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc
qrtr rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu
ipmi_devintf arm_cmn arm_dmc620_pmu ipmi_msghandler cppc_cpufreq
arm_dsu_pmu xfs libcrc32c ast drm_shmem_helper drm_kms_helper drm
crct10dif_ce ghash_ce igb nvme sha2_ce nvme_core sha256_arm64 sha1_ce
i2c_designware_platform sbsa_gwdt nvme_common i2c_algo_bit
i2c_designware_core xgene_hwmon dm_mirror dm_region_hash dm_log dm_mod
fuse
[ 1228.137630] CPU: 2 PID: 4496 Comm: qemu-kvm Kdump: loaded Tainted:
G W 6.6.0-rc2-zhenyzha+ #5
[ 1228.147529] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS
F31h (SCP: 2.10.20220810) 07/27/2022
[ 1228.156820] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1228.163767] pc : xas_split_alloc+0xf8/0x128
[ 1228.167938] lr : __filemap_add_folio+0x33c/0x4e0
[ 1228.172543] sp : ffff80008dd4f1c0
[ 1228.175844] x29: ffff80008dd4f1c0 x28: ffffd15825388c40 x27: 0000000000000001
[ 1228.182967] x26: 0000000000000001 x25: ffffffffffffc005 x24: 0000000000000000
[ 1228.190089] x23: ffff80008dd4f270 x22: ffffffc202b00000 x21: 0000000000000000
[ 1228.197211] x20: ffffffc2007f9600 x19: 000000000000000d x18: 0000000000000014
[ 1228.204334] x17: 00000000b21b8a3f x16: 0000000013a8aa94 x15: ffffd15824625944
[ 1228.211456] x14: ffffffffffffffff x13: 0000000000000030 x12: 0101010101010101
[ 1228.218578] x11: 7f7f7f7f7f7f7f7f x10: 000000000000000a x9 : ffffd158252dd3fc
[ 1228.225701] x8 : ffff80008dd4f1c0 x7 : ffff07ffa0945468 x6 : ffff80008dd4f1c0
[ 1228.232823] x5 : 0000000000000018 x4 : 0000000000000000 x3 : 0000000000012c40
[ 1228.239945] x2 : 000000000000000d x1 : 000000000000000c x0 : 0000000000000000
[ 1228.247067] Call trace:
[ 1228.249500] xas_split_alloc+0xf8/0x128
[ 1228.253324] __filemap_add_folio+0x33c/0x4e0
[ 1228.257582] filemap_add_folio+0x48/0xd0
[ 1228.261493] page_cache_ra_order+0x214/0x310
[ 1228.265750] ondemand_readahead+0x1a8/0x320
[ 1228.269921] page_cache_async_ra+0x64/0xa8
[ 1228.274005] filemap_fault+0x238/0xaa8
[ 1228.277742] __xfs_filemap_fault+0x60/0x3c0 [xfs]
[ 1228.282491] xfs_filemap_fault+0x54/0x68 [xfs]
[ 1228.286979] __do_fault+0x40/0x210
[ 1228.290368] do_cow_fault+0xf0/0x300
[ 1228.293931] do_pte_missing+0x140/0x238
[ 1228.297754] handle_pte_fault+0x100/0x160
[ 1228.301751] __handle_mm_fault+0x100/0x310
[ 1228.305835] handle_mm_fault+0x6c/0x270
[ 1228.309659] faultin_page+0x70/0x128
[ 1228.313221] __get_user_pages+0xc8/0x2d8
[ 1228.317131] get_user_pages_unlocked+0xc4/0x3b8
[ 1228.321648] hva_to_pfn+0xf8/0x468
[ 1228.325037] __gfn_to_pfn_memslot+0xa4/0xf8
[ 1228.329207] user_mem_abort+0x174/0x7e8
[ 1228.333031] kvm_handle_guest_abort+0x2dc/0x450
[ 1228.337548] handle_exit+0x70/0x1c8
[ 1228.341024] kvm_arch_vcpu_ioctl_run+0x224/0x5b8
[ 1228.345628] kvm_vcpu_ioctl+0x28c/0x9d0
[ 1228.349450] __arm64_sys_ioctl+0xa8/0xf0
[ 1228.353360] invoke_syscall.constprop.0+0x7c/0xd0
[ 1228.358052] do_el0_svc+0xb4/0xd0
[ 1228.361354] el0_svc+0x50/0x228
[ 1228.364484] el0t_64_sync_handler+0x134/0x150
[ 1228.368827] el0t_64_sync+0x17c/0x180
[ 1228.372476] ---[ end trace 0000000000000000 ]---
[ 1228.377124] ------------[ cut here ]------------
[ 1228.381728] WARNING: CPU: 2 PID: 4496 at lib/xarray.c:1010
xas_split_alloc+0xf8/0x128
[ 1228.389546] Modules linked in: binfmt_misc vhost_net vhost
vhost_iotlb tap xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc
qrtr rfkill sunrpc vfat fat acpi_ipmi ipmi_ssif arm_spe_pmu
ipmi_devintf arm_cmn arm_dmc620_pmu ipmi_msghandler cppc_cpufreq
arm_dsu_pmu xfs libcrc32c ast drm_shmem_helper drm_kms_helper drm
crct10dif_ce ghash_ce igb nvme sha2_ce nvme_core sha256_arm64 sha1_ce
i2c_designware_platform sbsa_gwdt nvme_common i2c_algo_bit
i2c_designware_core xgene_hwmon dm_mirror dm_region_hash dm_log dm_mod
fuse
[ 1228.446348] CPU: 2 PID: 4496 Comm: qemu-kvm Kdump: loaded Tainted:
G W 6.6.0-rc2-zhenyzha+ #5
[ 1228.456248] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS
F31h (SCP: 2.10.20220810) 07/27/2022
[ 1228.465538] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1228.472486] pc : xas_split_alloc+0xf8/0x128
[ 1228.476656] lr : __filemap_add_folio+0x33c/0x4e0
[ 1228.481261] sp : ffff80008dd4f1c0
[ 1228.484563] x29: ffff80008dd4f1c0 x28: ffffd15825388c40 x27: 0000000000000001
[ 1228.491685] x26: 0000000000000001 x25: ffffffffffffc005 x24: 0000000000000000
[ 1228.498807] x23: ffff80008dd4f270 x22: ffffffc202b00000 x21: 0000000000000000
[ 1228.505930] x20: ffffffc2007f9600 x19: 000000000000000d x18: 0000000000000014
[ 1228.513052] x17: 00000000b21b8a3f x16: 0000000013a8aa94 x15: ffffd15824625944
[ 1228.520174] x14: ffffffffffffffff x13: 0000000000000030 x12: 0101010101010101
[ 1228.527297] x11: 7f7f7f7f7f7f7f7f x10: 000000000000000a x9 : ffffd158252dd3fc
[ 1228.534419] x8 : ffff80008dd4f1c0 x7 : ffff07ffa0945468 x6 : ffff80008dd4f1c0
[ 1228.541542] x5 : 0000000000000018 x4 : 0000000000000000 x3 : 0000000000012c40
[ 1228.548664] x2 : 000000000000000d x1 : 000000000000000c x0 : 0000000000000000
[ 1228.555786] Call trace:
[ 1228.558220] xas_split_alloc+0xf8/0x128
[ 1228.562043] __filemap_add_folio+0x33c/0x4e0
[ 1228.566300] filemap_add_folio+0x48/0xd0
[ 1228.570211] page_cache_ra_order+0x214/0x310
[ 1228.574469] ondemand_readahead+0x1a8/0x320
[ 1228.578639] page_cache_async_ra+0x64/0xa8
[ 1228.582724] filemap_fault+0x238/0xaa8
[ 1228.586460] __xfs_filemap_fault+0x60/0x3c0 [xfs]
[ 1228.591210] xfs_filemap_fault+0x54/0x68 [xfs]
3. The data captured by bpftrace
--------------------
(The following dump and analysis were provided by gshan@redhat.com.)
pid: 4475 task: qemu-kvm
file: /mnt/tmp/qemu_back_mem.mem-machine_mem.OdGYet (deleted)
-------------------- inode --------------------
i_flags: 0x0
i_ino: 67333199
i_size: 32212254720
----------------- address_space ----------------
flags: 040
invalidate_lock
count: 256
owner: 0xffff07fff6e759c1
pid: 4496 task: qemu-kvm
wait_list.next: 0xffff07ffa20422e0
wait_list.prev: 0xffff07ffa20422e0
-------------------- xarray --------------------
entry[0]: 0xffff080f7eda0002
shift: 18
offset: 0
count: 2
nr_values: 0
parent: 0x0
slots[00]: 0xffff07ffa094546a
slots[01]: 0xffff07ffa1b09b22
entry[1]: 0xffff07ffa094546a
shift: 12
offset: 0
count: 20
nr_values: 0
parent: 0xffff080f7eda0000
slots[00]: 0xffffffc202880000
slots[01]: 0x2
entry[2]: 0xffffffc202880000
shift: 104
offset: 128
count: 0
nr_values: 0
parent: 0xffffffc20304c888
slots[00]: 0xffff08009a960000
slots[01]: 0x2001ffffffff
It seems the last xarray entry ("entry[2]") has been corrupted: "shift"
becomes 104 and "offset" becomes 128, neither of which is a plausible value
for a valid xa_node.
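
For reference (assuming the default XA_CHUNK_SHIFT of 6, i.e. 64 slots per
node): in a valid xa_node, "shift" is a multiple of XA_CHUNK_SHIFT -- compare
entry[0] and entry[1] above, with shifts 18 and 12 -- and "offset" indexes the
parent's slots[] array, so it must be below XA_CHUNK_SIZE. A trivial userspace
check of the dumped values (the macros below merely mirror the kernel
defaults; this is not kernel code):

    #include <stdio.h>

    #define XA_CHUNK_SHIFT 6                    /* kernel default */
    #define XA_CHUNK_SIZE  (1U << XA_CHUNK_SHIFT)

    int main(void)
    {
        unsigned int shift = 104, offset = 128; /* from entry[2] above */

        printf("shift %% XA_CHUNK_SHIFT = %u (must be 0 for a valid node)\n",
               shift % XA_CHUNK_SHIFT);         /* prints 2 */
        printf("offset < XA_CHUNK_SIZE: %s (must be yes for a valid node)\n",
               offset < XA_CHUNK_SIZE ? "yes" : "no");  /* prints no */
        return 0;
    }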
The corrupted entry explains why we keep running into xas_split_alloc() from
__filemap_add_folio():
if (order > folio_order(folio))
xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
order, gfp);
folio_order(folio) is likely 6, since the readahead window on the BDI device
is 4MB and the host page size is 64KB:

log2(0x400000 / 0x10000) = log2(64) = 6

However, @order, as derived from the corrupted xarray entry, is much larger
than 6.
[root@ampere-mtsnow-altramax-28 ~]# uname -r
6.6.0-rc2-zhenyzha+
[root@ampere-mtsnow-altramax-28 ~]# cat /sys/devices/virtual/bdi/253:0/read_ahead_kb
4096
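
Putting the pieces together, this also matches the WARNING itself: in this
kernel, lib/xarray.c:1010 should be the "no support for splitting really large
entries yet" check near the top of xas_split_alloc(),
WARN_ON(xas->xa_shift + 2 * XA_CHUNK_SHIFT < order). As far as I can tell,
XA_STATE_ORDER() in __filemap_add_folio() sets xa_shift to the new folio's
order rounded down to a multiple of XA_CHUNK_SHIFT, so the split only
tolerates an existing order up to xa_shift + 2 * XA_CHUNK_SHIFT. A
back-of-the-envelope check (the numbers are taken from the reasoning above,
not from instrumentation):

    #include <stdio.h>

    #define XA_CHUNK_SHIFT 6                    /* kernel default */

    int main(void)
    {
        unsigned int folio_order = 6;           /* 4MB readahead / 64KB pages */
        unsigned int xa_shift = folio_order - (folio_order % XA_CHUNK_SHIFT);
        unsigned int max_order = xa_shift + 2 * XA_CHUNK_SHIFT;

        /* Largest existing-entry order xas_split_alloc() can split here */
        printf("split tolerates order <= %u\n", max_order);    /* 18 */

        /* Any @order implied by the corrupted entry[2] (shift 104) is far
         * beyond 18, so the WARN_ON fires; and since nothing repairs the
         * corrupted entry, every later readahead insertion at that index hits
         * the same path again -- the "endless calls" in the subject. */
        return 0;
    }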