* [bug report] raid0 array mkfs.xfs hang
@ 2024-08-08 17:12 John Garry
2024-08-12 14:50 ` John Garry
0 siblings, 1 reply; 10+ messages in thread
From: John Garry @ 2024-08-08 17:12 UTC (permalink / raw)
To: linux-xfs, linux-scsi, linux-block, linux-raid
After upgrading from v6.10 to v6.11-rc1/2, I am seeing a hang when
attempting to format a software raid0 array:
$sudo mkfs.xfs -f -K /dev/md127
meta-data=/dev/md127 isize=512 agcount=32,
agsize=33550272 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1 bigtime=0 inobtcount=0
data = bsize=4096 blocks=1073608704, imaxpct=5
= sunit=64 swidth=256 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=521728, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
^C^C^C^C
I'm using mkfs.xfs -K to avoid discard-related lock-up issues which I
have seen reported when googling - maybe this is just another similar issue.
The kernel lockup callstack is at the bottom.
Some array details:
$sudo mdadm --detail /dev/md127
/dev/md127:
Version : 1.2
Creation Time : Thu Aug 8 13:23:59 2024
Raid Level : raid0
Array Size : 4294438912 (4.00 TiB 4.40 TB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Thu Aug 8 13:23:59 2024
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : -unknown-
Chunk Size : 256K
Consistency Policy : none
Name : 0
UUID : 3490e53f:36d0131b:7c7eb913:0fd62deb
Events : 0
Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
1 8 64 1 active sync /dev/sde
2 8 48 2 active sync /dev/sdd
3 8 80 3 active sync /dev/sdf
$lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 46.6G 0 disk
├─sda1 8:1 0 100M 0 part /boot/efi
├─sda2 8:2 0 1G 0 part /boot
└─sda3 8:3 0 45.5G 0 part
├─ocivolume-root 252:0 0 35.5G 0 lvm /
└─ocivolume-oled 252:1 0 10G 0 lvm /var/oled
sdb 8:16 0 1T 0 disk
└─md127 9:127 0 4T 0 raid0
sdc 8:32 0 1T 0 disk
sdd 8:48 0 1T 0 disk
└─md127 9:127 0 4T 0 raid0
sde 8:64 0 1T 0 disk
└─md127 9:127 0 4T 0 raid0
sdf 8:80 0 1T 0 disk
└─md127 9:127 0 4T 0 raid0
I'll start to look deeper, but any suggestions on the problem are welcome.
Thanks,
John
ort_iscsi aesni_intel crypto_simd cryptd
[ 396.110305] CPU: 0 UID: 0 PID: 321 Comm: kworker/0:1H Not tainted
6.11.0-rc1-g8400291e289e #11
[ 396.111020] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.5.1 06/16/2021
[ 396.111695] Workqueue: kblockd blk_mq_run_work_fn
[ 396.112114] RIP: 0010:bio_endio+0xa0/0x1b0
[ 396.112455] Code: 96 9a 02 00 48 8b 43 08 48 85 c0 74 09 0f b7 53 14
f6 c2 80 75 3b 48 8b 43 38 48 3d e0 a3 3c b2 75 44 0f b6 43 19 48 8b 6b
40 <84> c0 74 09 80 7d 19 00 75 03 88 45 19 48 89 df 48 89 eb e8 58 fe
[ 396.113962] RSP: 0018:ffffa3fec19fbc38 EFLAGS: 00000246
[ 396.114392] RAX: 0000000000000001 RBX: ffff97a284c3e600 RCX:
00000000002a0001
[ 396.114974] RDX: 0000000000000000 RSI: ffffcfb0f1130f80 RDI:
0000000000020000
[ 396.115546] RBP: ffff97a284c41bc0 R08: ffff97a284c3e3c0 R09:
00000000002a0001
[ 396.116185] R10: 0000000000000000 R11: 0000000000000000 R12:
ffff9798216ed000
[ 396.116766] R13: ffff97975bf071c0 R14: ffff979751be4798 R15:
0000000000009000
[ 396.117393] FS: 0000000000000000(0000) GS:ffff97b5ff600000(0000)
knlGS:0000000000000000
[ 396.118122] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 396.118709] CR2: 00007f2477a45f68 CR3: 0000000107998005 CR4:
0000000000770ef0
[ 396.119398] PKRU: 55555554
[ 396.119627] Call Trace:
[ 396.119905] <IRQ>
[ 396.120078] ? watchdog_timer_fn+0x1e2/0x260
[ 396.120457] ? __pfx_watchdog_timer_fn+0x10/0x10
[ 396.120900] ? __hrtimer_run_queues+0x10c/0x270
[ 396.121276] ? hrtimer_interrupt+0x109/0x250
[ 396.121663] ? __sysvec_apic_timer_interrupt+0x55/0x120
[ 396.122197] ? sysvec_apic_timer_interrupt+0x6c/0x90
[ 396.122640] </IRQ>
[ 396.122815] <TASK>
[ 396.123009] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 396.123473] ? bio_endio+0xa0/0x1b0
[ 396.123794] ? bio_endio+0xb8/0x1b0
[ 396.124082] md_end_clone_io+0x42/0xa0
[ 396.124406] blk_update_request+0x128/0x490
[ 396.124745] ? srso_alias_return_thunk+0x5/0xfbef5
[ 396.125554] ? scsi_dec_host_busy+0x14/0x90
[ 396.126290] blk_mq_end_request+0x22/0x2e0
[ 396.126965] blk_mq_dispatch_rq_list+0x2b6/0x730
[ 396.127660] ? srso_alias_return_thunk+0x5/0xfbef5
[ 396.128386] __blk_mq_sched_dispatch_requests+0x442/0x640
[ 396.129152] blk_mq_sched_dispatch_requests+0x2a/0x60
[ 396.130005] blk_mq_run_work_fn+0x67/0x80
[ 396.130697] process_scheduled_works+0xa6/0x3e0
[ 396.131413] worker_thread+0x117/0x260
[ 396.132051] ? __pfx_worker_thread+0x10/0x10
[ 396.132697] kthread+0xd2/0x100
[ 396.133288] ? __pfx_kthread+0x10/0x10
[ 396.133977] ret_from_fork+0x34/0x40
[ 396.134613] ? __pfx_kthread+0x10/0x10
[ 396.135207] ret_from_fork_asm+0x1a/0x30
[ 396.135863] </TASK>
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [bug report] raid0 array mkfs.xfs hang 2024-08-08 17:12 [bug report] raid0 array mkfs.xfs hang John Garry @ 2024-08-12 14:50 ` John Garry 2024-08-14 14:00 ` John Garry 0 siblings, 1 reply; 10+ messages in thread From: John Garry @ 2024-08-12 14:50 UTC (permalink / raw) To: linux-xfs, linux-scsi, linux-block, linux-raid On 08/08/2024 18:12, John Garry wrote: Update for anyone interested: xfsprogs 5.3.0 does not have this issue for v6.11-rc. xfsprogs 5.15.0 and later does. For xfsprogs on my modestly recent baseline, mkfs.xfs is getting stuck in prepare_devices() -> libxfs_log_clear() -> libxfs_device_zero() -> libxfs_device_zero() -> platform_zero_range() -> fallocate(start=2198746472448 len=2136997888), and this never returns AFAICS. With v6.10 kernel, that fallocate with same args returns promptly. That code path is just not in xfsprogs 5.3.0 > After upgrading from v6.10 to v6.11-rc1/2, I am seeing a hang when > attempting to format a software raid0 array: > > $sudo mkfs.xfs -f -K /dev/md127 > meta-data=/dev/md127 isize=512 agcount=32, > agsize=33550272 blks > = sectsz=4096 attr=2, projid32bit=1 > = crc=1 finobt=1, sparse=1, rmapbt=0 > = reflink=1 bigtime=0 inobtcount=0 > data = bsize=4096 blocks=1073608704, imaxpct=5 > = sunit=64 swidth=256 blks > naming =version 2 bsize=4096 ascii-ci=0, ftype=1 > log =internal log bsize=4096 blocks=521728, version=2 > = sectsz=4096 sunit=1 blks, lazy-count=1 > realtime =none extsz=4096 blocks=0, rtextents=0 > ^C^C^C^C > > > I'm using mkfs.xfs -K to avoid discard-related lock-up issues which I > have seen reported when googling - maybe this is just another similar > issue. > > The kernel lockup callstack is at the bottom. > > Some array details: > $sudo mdadm --detail /dev/md127 > /dev/md127: > Version : 1.2 > Creation Time : Thu Aug 8 13:23:59 2024 > Raid Level : raid0 > Array Size : 4294438912 (4.00 TiB 4.40 TB) > Raid Devices : 4 > Total Devices : 4 > Persistence : Superblock is persistent > > Update Time : Thu Aug 8 13:23:59 2024 > State : clean > Active Devices : 4 > Working Devices : 4 > Failed Devices : 0 > Spare Devices : 0 > > Layout : -unknown- > Chunk Size : 256K > > Consistency Policy : none > > Name : 0 > UUID : 3490e53f:36d0131b:7c7eb913:0fd62deb > Events : 0 > > Number Major Minor RaidDevice State > 0 8 16 0 active sync /dev/sdb > 1 8 64 1 active sync /dev/sde > 2 8 48 2 active sync /dev/sdd > 3 8 80 3 active sync /dev/sdf > > > > $lsblk > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT > sda 8:0 0 46.6G 0 disk > ├─sda1 8:1 0 100M 0 part /boot/efi > ├─sda2 8:2 0 1G 0 part /boot > └─sda3 8:3 0 45.5G 0 part > ├─ocivolume-root 252:0 0 35.5G 0 lvm / > └─ocivolume-oled 252:1 0 10G 0 lvm /var/oled > sdb 8:16 0 1T 0 disk > └─md127 9:127 0 4T 0 raid0 > sdc 8:32 0 1T 0 disk > sdd 8:48 0 1T 0 disk > └─md127 9:127 0 4T 0 raid0 > sde 8:64 0 1T 0 disk > └─md127 9:127 0 4T 0 raid0 > sdf 8:80 0 1T 0 disk > └─md127 9:127 0 4T 0 raid0 > > I'll start to look deeper, but any suggestions on the problem are welcome. 
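A minimal user-space sketch of the zeroing call that hangs here, for anyone who
wants to reproduce it without mkfs.xfs. The device path and the offset/length
are assumptions taken from the report above, not code lifted from xfsprogs:

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Assumed values from the report; adjust for your own array. */
	const char *dev = "/dev/md127";
	off_t start = (off_t)2198746472448;
	off_t len = (off_t)2136997888;
	int fd = open(dev, O_RDWR);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* On a block device this reaches blkdev_issue_zeroout() in the kernel. */
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE, start, len) < 0)
		perror("fallocate");
	close(fd);
	return 0;
}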
* Re: [bug report] raid0 array mkfs.xfs hang
2024-08-12 14:50 ` John Garry
@ 2024-08-14 14:00 ` John Garry
2024-08-14 14:46 ` Keith Busch
2024-08-15  5:52 ` Christoph Hellwig
0 siblings, 2 replies; 10+ messages in thread
From: John Garry @ 2024-08-14 14:00 UTC (permalink / raw)
To: linux-xfs, linux-scsi, linux-block, linux-raid, hch, axboe, martin.petersen

This looks to resolve the issue:

------>8------

Author: John Garry <john.g.garry@oracle.com>
Date:   Wed Aug 14 12:15:26 2024 +0100

    block: Read max write zeroes once for __blkdev_issue_write_zeroes()

    As reported in [0], we may get a hang when formatting an XFS FS on a
    RAID0 disk.

    Commit 73a768d5f955 ("block: factor out a blk_write_zeroes_limit
    helper") changed __blkdev_issue_write_zeroes() to read the max write
    zeroes value in a loop. This is not safe in case max write zeroes
    changes, which it seems to do. For the case of [0], the value goes to
    0, and we get an infinite loop.

    Lift the limit reading out of the loop.

    [0] https://lore.kernel.org/linux-xfs/4d31268f-310b-4220-88a2-e191c3932a82@oracle.com/T/#t

    Signed-off-by: John Garry <john.g.garry@oracle.com>

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9f735efa6c94..f65fb083c25d 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -113,11 +113,11 @@ static sector_t bio_write_zeroes_limit(struct block_device *bdev)
 
 static void __blkdev_issue_write_zeroes(struct block_device *bdev,
 		sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
-		struct bio **biop, unsigned flags)
+		struct bio **biop, unsigned flags, sector_t limit)
 {
+
 	while (nr_sects) {
-		unsigned int len = min_t(sector_t, nr_sects,
-			bio_write_zeroes_limit(bdev));
+		unsigned int len = min_t(sector_t, nr_sects, limit);
 		struct bio *bio;
 
 		if ((flags & BLKDEV_ZERO_KILLABLE) &&
@@ -144,9 +144,10 @@ static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector,
 	struct bio *bio = NULL;
 	struct blk_plug plug;
 	int ret = 0;
+	sector_t limit = bio_write_zeroes_limit(bdev);
 
 	blk_start_plug(&plug);
-	__blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio, flags);
+	__blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio, flags, limit);
 	if (bio) {
 		if ((flags & BLKDEV_ZERO_KILLABLE) &&
 		    fatal_signal_pending(current)) {
@@ -165,7 +166,7 @@ static int blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector,
 	 * on an I/O error, in which case we'll turn any error into
 	 * "not supported" here.
 	 */
-	if (ret && !bdev_write_zeroes_sectors(bdev))
+	if (ret && !limit)
 		return -EOPNOTSUPP;
 	return ret;
 }
@@ -265,12 +266,14 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
 		unsigned flags)
 {
+
+	sector_t limit = bio_write_zeroes_limit(bdev);
 	if (bdev_read_only(bdev))
 		return -EPERM;
 
-	if (bdev_write_zeroes_sectors(bdev)) {
+	if (limit) {
 		__blkdev_issue_write_zeroes(bdev, sector, nr_sects,
-				gfp_mask, biop, flags);
+				gfp_mask, biop, flags, limit);
 	} else {
 		if (flags & BLKDEV_ZERO_NOFALLBACK)
 			return -EOPNOTSUPP;

-----8<------

The max write zeroes value is changing in raid0_map_submit_bio() ->
mddev_check_write_zeroes().

> xfsprogs 5.3.0 does not have this issue for v6.11-rc. xfsprogs 5.15.0
> and later does.
>
> For xfsprogs on my modestly recent baseline, mkfs.xfs is getting stuck
> in prepare_devices() -> libxfs_log_clear() -> libxfs_device_zero() ->
> platform_zero_range() -> fallocate(start=2198746472448 len=2136997888),
> and this never returns AFAICS.
>
> [...]
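To illustrate the failure mode the patch addresses, here is a stripped-down
sketch of the zeroing loop before and after the change. It is deliberately
simplified user-space C, not the kernel code itself; read_current_limit() is a
stand-in for bio_write_zeroes_limit():

#include <stdint.h>

typedef uint64_t sector_t;

/* Stand-in for bio_write_zeroes_limit(); the real value can drop to 0 at
 * runtime, e.g. via mddev_check_write_zeroes() on this raid0 array. */
extern sector_t read_current_limit(void);

/* Pre-fix behaviour: the limit is re-read on every pass, so once it drops
 * to 0, len becomes 0, nr_sects never shrinks and the loop spins forever. */
static void issue_write_zeroes_buggy(sector_t nr_sects)
{
	while (nr_sects) {
		sector_t limit = read_current_limit();
		sector_t len = nr_sects < limit ? nr_sects : limit;

		/* ... submit a bio covering 'len' sectors ... */
		nr_sects -= len;
	}
}

/* Post-fix behaviour: the caller samples the limit once and only enters the
 * loop when that snapshot is non-zero, so every pass makes forward progress
 * even if the device limit changes underneath. */
static void issue_write_zeroes_fixed(sector_t nr_sects, sector_t limit)
{
	while (nr_sects) {
		sector_t len = nr_sects < limit ? nr_sects : limit;

		/* ... submit a bio covering 'len' sectors ... */
		nr_sects -= len;
	}
}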
* Re: [bug report] raid0 array mkfs.xfs hang
2024-08-14 14:00 ` John Garry
@ 2024-08-14 14:46 ` Keith Busch
2024-08-14 14:52 ` Martin K. Petersen
2024-08-15  5:52 ` Christoph Hellwig
1 sibling, 1 reply; 10+ messages in thread
From: Keith Busch @ 2024-08-14 14:46 UTC (permalink / raw)
To: John Garry
Cc: linux-xfs, linux-scsi, linux-block, linux-raid, hch, axboe, martin.petersen

On Wed, Aug 14, 2024 at 03:00:06PM +0100, John Garry wrote:
>
> The max write zeroes value is changing in raid0_map_submit_bio() ->
> mddev_check_write_zeroes().

Your change looks fine, though it sounds odd that md raid is changing
queue_limit values outside the limits_lock. The stacking limits should
have set the md device to 0 if one of the member drives doesn't
support write_zeroes, right?
* Re: [bug report] raid0 array mkfs.xfs hang
2024-08-14 14:46 ` Keith Busch
@ 2024-08-14 14:52 ` Martin K. Petersen
2024-08-14 17:25 ` John Garry
0 siblings, 1 reply; 10+ messages in thread
From: Martin K. Petersen @ 2024-08-14 14:52 UTC (permalink / raw)
To: Keith Busch
Cc: John Garry, linux-xfs, linux-scsi, linux-block, linux-raid, hch, axboe, martin.petersen

Keith,

> Your change looks fine, though it sounds odd that md raid is changing
> queue_limit values outside the limits_lock. The stacking limits should
> have set the md device to 0 if one of the member drives doesn't
> support write_zeroes, right?

SCSI can't reliably detect ahead of time whether a device supports WRITE
SAME. So we'll issue a WRITE SAME command and if that fails we'll set
the queue limit to 0. So it is "normal" that the limit changes at
runtime.

I'm debugging a couple of other regressions from the queue limits
shuffle so I haven't looked into this one yet. But I assume that's
what's happening.

--
Martin K. Petersen	Oracle Linux Engineering
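A schematic of the runtime fallback Martin describes, using hypothetical names
rather than the actual sd.c code: the limit is advertised optimistically and
only zeroed once the device rejects the command, so in-flight callers can see
it change under them:

#include <stdbool.h>

struct example_queue_limits {
	unsigned int max_write_zeroes_sectors;	/* 0 means "not supported" */
};

/* Hypothetical completion handler: the limit was advertised optimistically
 * and is zeroed the first time the target rejects the command, so any caller
 * that re-reads the limit mid-operation can watch it go to 0. */
static void example_write_zeroes_done(struct example_queue_limits *lim,
				      bool target_rejected_cmd)
{
	if (target_rejected_cmd)
		lim->max_write_zeroes_sectors = 0;
}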
* Re: [bug report] raid0 array mkfs.xfs hang
2024-08-14 14:52 ` Martin K. Petersen
@ 2024-08-14 17:25 ` John Garry
2024-08-15  5:42 ` Christoph Hellwig
0 siblings, 1 reply; 10+ messages in thread
From: John Garry @ 2024-08-14 17:25 UTC (permalink / raw)
To: Martin K. Petersen, Keith Busch
Cc: linux-xfs, linux-scsi, linux-block, linux-raid, hch, axboe

On 14/08/2024 15:52, Martin K. Petersen wrote:
>
> Keith,
>
>> Your change looks fine, though it sounds odd that md raid is changing
>> queue_limit values outside the limits_lock. The stacking limits should
>> have set the md device to 0 if one of the member drives doesn't
>> support write_zeroes, right?

And even if we had used the limits lock to synchronize the update, that
only synchronizes writers but not readers (of the limits).

> SCSI can't reliably detect ahead of time whether a device supports WRITE
> SAME. So we'll issue a WRITE SAME command and if that fails we'll set
> the queue limit to 0. So it is "normal" that the limit changes at
> runtime.
>
> I'm debugging a couple of other regressions from the queue limits
> shuffle so I haven't looked into this one yet. But I assume that's
> what's happening.
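A small sketch of that point, with made-up names: even with every writer
serialised on limits_lock, a reader that takes no lock can see a different
value on each load, which is exactly what bites the zeroing loop:

#include <pthread.h>

struct limits {
	pthread_mutex_t limits_lock;	/* serialises writers only */
	unsigned int max_write_zeroes_sectors;
};

/* Writer side: perfectly serialised against other writers. */
static void update_limit(struct limits *l, unsigned int new_val)
{
	pthread_mutex_lock(&l->limits_lock);
	l->max_write_zeroes_sectors = new_val;
	pthread_mutex_unlock(&l->limits_lock);
}

/* Reader side: takes no lock, so the two loads below may return different
 * values if update_limit() runs in between - the lock bought us nothing
 * here.  A single read into a local snapshot avoids the inconsistency. */
static unsigned int read_limit_twice(const struct limits *l)
{
	unsigned int first = l->max_write_zeroes_sectors;
	unsigned int second = l->max_write_zeroes_sectors;

	return first == second ? first : 0;	/* may legitimately differ */
}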
* Re: [bug report] raid0 array mkfs.xfs hang
2024-08-14 17:25 ` John Garry
@ 2024-08-15  5:42 ` Christoph Hellwig
0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2024-08-15 5:42 UTC (permalink / raw)
To: John Garry
Cc: Martin K. Petersen, Keith Busch, linux-xfs, linux-scsi, linux-block, linux-raid, hch, axboe

On Wed, Aug 14, 2024 at 06:25:39PM +0100, John Garry wrote:
> On 14/08/2024 15:52, Martin K. Petersen wrote:
>> [...]
>
> And even if we had used the limits lock to synchronize the update, that
> only synchronizes writers but not readers (of the limits).

Readers are blocked by freezing the queues (for most drivers) or doing
the internal mddev suspend for md.

So I suspect kicking off a workqueue to do the limits update will be
the right thing going ahead.  For now we'll just need to hack around by
doing single reads of the field.
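A rough sketch of the direction suggested here, assuming the
queue_limits_start_update()/queue_limits_commit_update() API and a hypothetical
driver structure (this is not a real patch): the completion path only schedules
a work item, and the limit is updated from process context:

#include <linux/blkdev.h>
#include <linux/workqueue.h>

struct example_disk {
	struct request_queue *queue;
	struct work_struct limit_update_work;	/* assumed INIT_WORK()ed at probe time */
};

/* Runs in process context, where the limits can be updated safely. */
static void example_limit_update_workfn(struct work_struct *work)
{
	struct example_disk *disk =
		container_of(work, struct example_disk, limit_update_work);
	struct queue_limits lim;

	lim = queue_limits_start_update(disk->queue);
	lim.max_write_zeroes_sectors = 0;	/* device rejected the command */
	queue_limits_commit_update(disk->queue, &lim);
}

/* Completion path: do not touch the limits here, just defer the update. */
static void example_write_zeroes_failed(struct example_disk *disk)
{
	schedule_work(&disk->limit_update_work);
}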
* Re: [bug report] raid0 array mkfs.xfs hang
2024-08-14 14:00 ` John Garry
2024-08-14 14:46 ` Keith Busch
@ 2024-08-15  5:52 ` Christoph Hellwig
2024-08-15  6:19 ` John Garry
1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2024-08-15 5:52 UTC (permalink / raw)
To: John Garry
Cc: linux-xfs, linux-scsi, linux-block, linux-raid, hch, axboe, martin.petersen

On Wed, Aug 14, 2024 at 03:00:06PM +0100, John Garry wrote:
> -	__blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio, flags);
> +	__blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio, flags, limit);

Please fix the overly long line while touching this.

> {
> +
> +	sector_t limit = bio_write_zeroes_limit(bdev);
> 	if (bdev_read_only(bdev))
> 		return -EPERM;

Can you add a comment explaining why the limit is read once for future
readers?

Also please keep an empty line after the variable declaration
instead of before it.
* Re: [bug report] raid0 array mkfs.xfs hang
2024-08-15  5:52 ` Christoph Hellwig
@ 2024-08-15  6:19 ` John Garry
2024-08-15  6:21 ` Christoph Hellwig
0 siblings, 1 reply; 10+ messages in thread
From: John Garry @ 2024-08-15 6:19 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-xfs, linux-scsi, linux-block, linux-raid, axboe, martin.petersen

On 15/08/2024 06:52, Christoph Hellwig wrote:
> On Wed, Aug 14, 2024 at 03:00:06PM +0100, John Garry wrote:
>> -	__blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio, flags);
>> +	__blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio, flags, limit);
>
> Please fix the overly long line while touching this.
>
>> {
>> +
>> +	sector_t limit = bio_write_zeroes_limit(bdev);
>> 	if (bdev_read_only(bdev))
>> 		return -EPERM;
>
> Can you add a comment explaining why the limit is read once for future
> readers?

Yes, I was going to do that.

> Also please keep an empty line after the variable declaration
> instead of before it.

ok

BTW, on a slightly related topic, why is bdev_write_zeroes_sectors()
seemingly the only bdev helper which checks the bdev_get_queue() return
value:

static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev)
{
	struct request_queue *q = bdev_get_queue(bdev);

	if (q)
		return q->limits.max_write_zeroes_sectors;

	return 0;
}

According to the comment in bdev_get_queue(), it is never NULL.
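For reference, dropping the NULL check would presumably collapse the helper to
something like the following untested sketch, relying on bdev_get_queue() never
returning NULL as its comment states:

static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev)
{
	return bdev_get_queue(bdev)->limits.max_write_zeroes_sectors;
}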
* Re: [bug report] raid0 array mkfs.xfs hang
2024-08-15  6:19 ` John Garry
@ 2024-08-15  6:21 ` Christoph Hellwig
0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2024-08-15 6:21 UTC (permalink / raw)
To: John Garry
Cc: Christoph Hellwig, linux-xfs, linux-scsi, linux-block, linux-raid, axboe, martin.petersen

On Thu, Aug 15, 2024 at 07:19:50AM +0100, John Garry wrote:
> static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev)
> {
> 	struct request_queue *q = bdev_get_queue(bdev);
>
> 	if (q)
> 		return q->limits.max_write_zeroes_sectors;
>
> 	return 0;
> }
>
> According to the comment in bdev_get_queue(), it is never NULL.

Probably because no one got around to removing it.  There never was a
need to check the return value, but lots of places did check it.  I
removed most of them, as did a few others when touching the code, but
apparently we never got to this one.