* Re: Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline)
  [not found] <669f22fc89e45dd4e56d75876dc8f2bf@3xo.fr>
@ 2025-01-01 18:00 ` nicolas.baranger
  2025-01-06  7:20   ` Christoph Hellwig
  2025-01-06  9:13   ` David Howells
  2025-01-06 11:37 ` [PATCH] netfs: Fix kernel async DIO David Howells
  1 sibling, 2 replies; 14+ messages in thread
From: nicolas.baranger @ 2025-01-01 18:00 UTC (permalink / raw)
To: linux-cifs, netfs

Dear maintainers,

I don't know if this is the right place to contact kernel developers, but today I reported the bug below, which appears in Linux 6.10 and which I can reproduce from Linux 6.10 up to mainline.

I think the new way CIFS uses netfslib could be one cause of the issue, since running:

    git log --pretty=oneline v6.9.12..v6.10 | grep cifs: | grep netfs

returns:

    3ee1a1fc39819906f04d6c62c180e760cd3a689d cifs: Cut over to using netfslib
    69c3c023af25edb5433a2db824d3e7cc328f0183 cifs: Implement netfslib hooks
    dc5939de82f149633d6ec1c403003538442ec9ef cifs: Replace the writedata replay bool with a netfs sreq flag
    ab58fbdeebc7f9fe8b9bc202660eae3a10e5e678 cifs: Use more fields from netfs_io_subrequest
    a975a2f22cdce7ec0c678ce8d73d2f6616cb281c cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest
    753b67eb630db34e36ec4ae1e86c75e243ea4fc9 cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest

The bug ID is: https://bugzilla.kernel.org/show_bug.cgi?id=219648

Content of bug #219648:

Dear maintainer,

Since I upgraded one server from Linux 6.9.12 to Linux 6.10, I have encountered the following problem: backups of that server are made by writing a dd copy of each LVM volume snapshot into a back-file (LUKS + BTRFS formatted) which resides on a big Samba share (SMB 3.1.1).
The back-file is mounted on the server using:

    losetup --sector-size 4096 --direct-io=on /dev/loop2046 /path/to/back-file/on/samba/share

This has worked fine for years, and the method still works on the same share (with different back-files) from other servers running Linux 6.1.25, 6.5.10 and 6.9.12 respectively.

But since I updated the kernel to Linux 6.10, when I mount the back-file it becomes read-only at the first write, and I get these errors in the kernel logs:

[lun. 23 déc. 10:08:49 2024] loop2046: detected capacity change from 0 to 8589934592
[lun. 23 déc. 10:08:51 2024] BTRFS: device fsid a2c979e6-2c6e-4308-a238-55e417a3bcd9 devid 1 transid 395 /dev/mapper/bckcrypt (253:30) scanned by mount (3643571)
[lun. 23 déc. 10:08:51 2024] BTRFS info (device dm-30): first mount of filesystem a2c979e6-2c6e-4308-a238-55e417a3bcd9
[lun. 23 déc. 10:08:51 2024] BTRFS info (device dm-30): using crc32c (crc32c-intel) checksum algorithm
[lun. 23 déc. 10:08:51 2024] BTRFS info (device dm-30): disk space caching is enabled
[lun. 23 déc. 10:08:51 2024] BTRFS warning (device dm-30): space cache v1 is being deprecated and will be removed in a future release, please use -o space_cache=v2
[lun. 23 déc. 10:08:51 2024] BTRFS info (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 15, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:08:54 2024] blk_print_req_error: 62 callbacks suppressed
[lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014676032 op 0x1:(WRITE) flags 0x8800 phys_seg 6 prio class 0
[lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014675776 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
[lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014688456 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
[lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014688968 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
[lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014689224 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
[lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014689736 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
[lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014688200 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
[lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014689480 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
[lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014688712 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
[lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014690248 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
[lun. 23 déc. 10:08:54 2024] btrfs_dev_stat_inc_and_print: 54 callbacks suppressed
[lun. 23 déc. 10:08:54 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 16, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:08:54 2024] BTRFS warning (device dm-30): direct IO failed ino 361 op 0x8801 offset 0x611000 len 2138112 err no 10
[lun. 23 déc. 10:08:54 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 17, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:08:54 2024] BTRFS warning (device dm-30): direct IO failed ino 361 op 0x8801 offset 0x81b000 len 2129920 err no 10
[lun. 23 déc. 10:08:54 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 18, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:08:54 2024] BTRFS warning (device dm-30): direct IO failed ino 361 op 0x8801 offset 0xa23000 len 2088960 err no 10
[lun. 23 déc. 10:08:54 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 19, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:08:54 2024] BTRFS warning (device dm-30): direct IO failed ino 361 op 0x8801 offset 0x0 len 4190208 err no 10
[lun. 23 déc. 10:08:54 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 20, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:08:54 2024] BTRFS warning (device dm-30): direct IO failed ino 361 op 0x8801 offset 0x3ff000 len 2170880 err no 10
[lun. 23 déc. 10:09:24 2024] blk_print_req_error: 82 callbacks suppressed
[lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 183168 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 182912 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 21, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 180864 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 181120 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 22, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 181120 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 180864 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 23, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 180864 op 0x1:(WRITE) flags 0x800 phys_seg 4 prio class 0
[lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 185216 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 24, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 184960 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 25, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 183680 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 26, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 27, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 28, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:09:25 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 29, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:09:25 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 30, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:09:25 2024] BTRFS: error (device dm-30) in btrfs_commit_transaction:2524: errno=-5 IO failure (Error while writing out transaction)
[lun. 23 déc. 10:09:25 2024] BTRFS info (device dm-30 state E): forced readonly
[lun. 23 déc. 10:09:25 2024] BTRFS warning (device dm-30 state E): Skipping commit of aborted transaction.
[lun. 23 déc. 10:09:25 2024] BTRFS error (device dm-30 state EA): Transaction aborted (error -5)
[lun. 23 déc. 10:09:25 2024] BTRFS: error (device dm-30 state EA) in cleanup_transaction:2018: errno=-5 IO failure

At first I thought that BTRFS and the fifteen pre-existing read/write errors ("bdev /dev/mapper/bckcrypt errs: wr 15, rd 15") were responsible, so I recreated the back-file on the CIFS share from scratch, as follows:

    dd if=/dev/urandom of=/mnt/FBX24T/bck0crypt2044 bs=1G count=4096 oflag=direct status=progress
    losetup --sector-size 4096 --direct-io=on loop2044 /mnt/FBX24T/bck0crypt2044
    cryptsetup luksFormat /dev/loop2044

and then normally (but it did not work here, as the back-file becomes read-only):

    cryptsetup open /dev/loop2044 bck0crypt2044
    mkfs.btrfs /dev/mapper/bck0crypt2044
    mount /dev/mapper/bck0crypt2044 /mnt/bck0crypt

But I cannot do the last part: it breaks at the first write of 'cryptsetup luksFormat /dev/loop2044'. When I tried formatting the loop device directly with BTRFS, XFS and EXT4, I got the same failure, i.e. at the first write I/O the loop device becomes read-only with the errors shown above.

So I tested again, removing --direct-io=on from the losetup command, and I was able to format with cryptsetup and btrfs (I also tried xfs and ext4, all with the same result: broken with --direct-io=on and working without, and likewise after mounting the loop device, whatever filesystem it contains).

So it is '--direct-io=on' that breaks (from Linux 6.10 to mainline) when the back-file is on a CIFS share. To validate this, I tried '--direct-io=on' on a local filesystem with no issue (same process with cryptsetup and btrfs or xfs or ext4).

And to be sure, I unmounted the read-only device on the server, removed the loop device as well, and went to my Debian 12 laptop running kernel 6.9.12. There I mounted the CIFS share on which the 'broken' back-file resides, created a loop device (with '--direct-io=on'), opened it with cryptsetup and mounted it as described above. I then wrote about 1 TB to this loop device from the laptop with no issues.
The biggest problem with removing '--direct-io=on' from the losetup command is the read/write performance of the loop device: about 150 MiB/s without '--direct-io=on' (which makes it unusable for backing up a server every day) versus more than 3 GiB/s with the option. So with '--direct-io=on' the write speed is 25 to 50 times what I observe without it, making this solution viable for an everyday full backup of 2 TB+.

Taking a deeper look at the kernel logs, I found that the issue is always preceded by the following netfs warning:

[mer. 1 janv. 10:58:53 2025] ------------[ cut here ]------------
[mer. 1 janv. 10:58:53 2025] WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]
[mer. 1 janv. 10:58:53 2025] Modules linked in: dm_crypt(E) cmac(E) nls_utf8(E) cifs(E) cifs_arc4(E) nls_ucs2_utils(E) cifs_md4(E) dns_resolver(E) netfs(E) snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) rfkill(E) qrtr(E) binfmt_misc(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) ext4(E) crc16(E) mbcache(E) jbd2(E) intel_rapl_msr(E) intel_rapl_common(E) intel_uncore_frequency_common(E) kvm_intel(E) kvm(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) sha1_ssse3(E) snd_intel8x0(E) snd_ac97_codec(E) ac97_bus(E) aesni_intel(E) crypto_simd(E) cryptd(E) snd_pcm(E) joydev(E) rapl(E) snd_timer(E) snd(E) vboxguest(OE) pcspkr(E) soundcore(E) ac(E) sg(E) serio_raw(E) evdev(E) msr(E) parport_pc(E) ppdev(E) lp(E) parport(E) loop(E) configfs(E) efi_pstore(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) dm_mod(E) hid_generic(E) usbhid(E) hid(E) nvme(E) sr_mod(E) cdrom(E) nvme_core(E) t10_pi(E) vmwgfx(E) ahci(E) xhci_pci(E) drm_ttm_helper(E)
[mer. 1 janv. 10:58:53 2025] virtio_net(E) ttm(E) libahci(E) net_failover(E) crc64_rocksoft(E) crc32_pclmul(E) drm_kms_helper(E) xhci_hcd(E) failover(E) crc64(E) libata(E) crc32c_intel(E) psmouse(E) crc_t10dif(E) dimlib(E) crct10dif_generic(E) crct10dif_pclmul(E) crct10dif_common(E) scsi_mod(E) usbcore(E) drm(E) scsi_common(E) usb_common(E) i2c_piix4(E) video(E) wmi(E) button(E)
[mer. 1 janv. 10:58:53 2025] CPU: 2 PID: 109 Comm: kworker/u35:1 Tainted: G W OE 6.10.0-amd64 #1
[mer. 1 janv. 10:58:53 2025] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[mer. 1 janv. 10:58:53 2025] Workqueue: loop2046 loop_rootcg_workfn [loop]
[mer. 1 janv. 10:58:53 2025] RIP: 0010:netfs_extract_user_iter+0x170/0x250 [netfs]
[mer. 1 janv. 10:58:53 2025] Code: 00 29 fb 31 ff 89 5a f8 4c 39 d9 75 c2 4d 85 c9 0f 84 c2 00 00 00 45 39 f2 0f 83 b9 00 00 00 4d 89 cd 44 89 d3 e9 35 ff ff ff <0f> 0b 48 c7 c3 fb ff ff ff 48 8b 44 24 28 65 48 2b 04 25 28 00 00
[mer. 1 janv. 10:58:53 2025] RSP: 0018:ffffa44f00887c10 EFLAGS: 00010202
[mer. 1 janv. 10:58:53 2025] RAX: 0000000000000000 RBX: ffff937da084b200 RCX: 0000000000000000
[mer. 1 janv. 10:58:53 2025] RDX: ffff937da084b340 RSI: 0000000000100000 RDI: ffffa44f00887d10
[mer. 1 janv. 10:58:53 2025] RBP: ffff937da0837920 R08: ffffffffc13f7bc0 R09: 0000000000000000
[mer. 1 janv. 10:58:53 2025] R10: ffff937da084b200 R11: ffff937b81149100 R12: 0000000000100000
[mer. 1 janv. 10:58:53 2025] R13: ffffa44f00887d10 R14: ffffffffc0dbbeb0 R15: 0000034702500000
[mer. 1 janv. 10:58:53 2025] FS: 0000000000000000(0000) GS:ffff937e8fb00000(0000) knlGS:0000000000000000
[mer. 1 janv. 10:58:53 2025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[mer. 1 janv. 10:58:53 2025] CR2: 00007efe292cb9f0 CR3: 00000001c53f4001 CR4: 00000000000706f0
[mer. 1 janv. 10:58:53 2025] Call Trace:
[mer. 1 janv. 10:58:53 2025] <TASK>
[mer. 1 janv. 10:58:53 2025] ? __warn+0x80/0x120
[mer. 1 janv. 10:58:53 2025] ? netfs_extract_user_iter+0x170/0x250 [netfs]
[mer. 1 janv. 10:58:53 2025] ? report_bug+0x164/0x190
[mer. 1 janv. 10:58:53 2025] ? handle_bug+0x41/0x70
[mer. 1 janv. 10:58:53 2025] ? exc_invalid_op+0x17/0x70
[mer. 1 janv. 10:58:53 2025] ? asm_exc_invalid_op+0x1a/0x20
[mer. 1 janv. 10:58:53 2025] ? __pfx_lo_rw_aio_complete+0x10/0x10 [loop]
[mer. 1 janv. 10:58:53 2025] ? netfs_extract_user_iter+0x170/0x250 [netfs]
[mer. 1 janv. 10:58:53 2025] ? __pfx_lo_rw_aio_complete+0x10/0x10 [loop]
[mer. 1 janv. 10:58:53 2025] netfs_unbuffered_write_iter_locked+0x97/0x3a0 [netfs]
[mer. 1 janv. 10:58:53 2025] netfs_unbuffered_write_iter+0x177/0x230 [netfs]
[mer. 1 janv. 10:58:53 2025] lo_rw_aio.isra.0+0x2ad/0x2d0 [loop]
[mer. 1 janv. 10:58:53 2025] loop_process_work+0xae/0x980 [loop]
[mer. 1 janv. 10:58:53 2025] ? psi_task_switch+0xd6/0x230
[mer. 1 janv. 10:58:53 2025] ? _raw_spin_unlock+0xe/0x30
[mer. 1 janv. 10:58:53 2025] ? finish_task_switch.isra.0+0x88/0x2d0
[mer. 1 janv. 10:58:53 2025] ? __schedule+0x3f3/0xb40
[mer. 1 janv. 10:58:53 2025] process_one_work+0x17c/0x390
[mer. 1 janv. 10:58:53 2025] worker_thread+0x265/0x380
[mer. 1 janv. 10:58:53 2025] ? __pfx_worker_thread+0x10/0x10
[mer. 1 janv. 10:58:53 2025] kthread+0xd2/0x100
[mer. 1 janv. 10:58:53 2025] ? __pfx_kthread+0x10/0x10
[mer. 1 janv. 10:58:53 2025] ret_from_fork+0x34/0x50
[mer. 1 janv. 10:58:53 2025] ? __pfx_kthread+0x10/0x10
[mer. 1 janv. 10:58:53 2025] ret_from_fork_asm+0x1a/0x30
[mer. 1 janv. 10:58:53 2025] </TASK>
[mer. 1 janv. 10:58:53 2025] ---[ end trace 0000000000000000 ]---
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038117888 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038187520 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038185472 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038183424 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038181376 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038179328 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038177280 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038175232 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038173184 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038171136 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] BTRFS error (device dm-7): bdev /dev/mapper/bckcrypt errs: wr 16, rd 15, flush 0, corrupt 0, gen 0
[mer. 1 janv. 10:58:53 2025] BTRFS warning (device dm-7): direct IO failed ino 360 op 0x8801 offset 0x0 len 268435456 err no 10

NB: I am able to reproduce this every time, on virtual machines as well as on physical hardware (server, laptop, PC, ...). While trying to bisect the regression, I built and tested nearly all kernels from Linux 6.9.6 to mainline, with the exact same result: still working from Linux 6.9.6 to 6.9.12, failing from Linux 6.10(.0) to mainline.

I am not a kernel developer, but I remain available to help reproduce this issue and provide traces (I now have some VMs in my lab dedicated to building new kernels and reproducing this particular issue; if needed, public IPv6 access to those lab machines can be shared privately with a maintainer, and a live demo of the issue can be organised). I would be happy to help; let me know how.

Thanks for your help.
Kind regards
Nicolas Baranger

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline)
  2025-01-01 18:00 ` Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline) nicolas.baranger
@ 2025-01-06  7:20   ` Christoph Hellwig
  2025-01-06  9:13   ` David Howells
  1 sibling, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2025-01-06 7:20 UTC (permalink / raw)
To: nicolas.baranger; +Cc: linux-cifs, netfs, David Howells

On Wed, Jan 01, 2025 at 07:00:58PM +0100, nicolas.baranger@3xo.fr wrote:
> Dear maintainers
>
> Don't know if it's the right place to contact kernel developers but today I
> did report the following bug which appears in Linux 6.10 and I'm able to
> reproduce from Linux 6.10 to mainline
>
> I think the new way CIFS is using NETFS could be one of the causes of the
> issue, as doing :

The problem is that netfs_extract_user_iter() rejects iter types other
than ubuf and iovec, which breaks loop, which is using bvec iters. It
would also break other things like io_uring pre-registered buffers, and
all of these are regressions compared to the old cifs code.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline)
  2025-01-01 18:00 ` Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline) nicolas.baranger
  2025-01-06  7:20   ` Christoph Hellwig
@ 2025-01-06  9:13   ` David Howells
  2025-01-06  9:16     ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: David Howells @ 2025-01-06 9:13 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: dhowells, nicolas.baranger, linux-cifs, netfs

Christoph Hellwig <hch@infradead.org> wrote:

> > I think the new way CIFS is using NETFS could be one of the causes of the
> > issue, as doing :
>
> The problem is that netfs_extract_user_iter rejects iter types other than
> ubuf and iovec, which breaks loop which is using bvec iters. It would
> also break other things like io_uring pre-registered buffers, and all
> of these are regressions compared to the old cifs code.

Okay, I can reproduce it trivially. The question is: do I need to copy the
bio_vec array (or kvec array or folio_queue list), or can I rely on that
being maintained till the end of the op? (Obviously, I can't rely on the
iov_iter struct itself being maintained.) I think I have to copy the
contents, just in case.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline)
  2025-01-06  9:13   ` David Howells
@ 2025-01-06  9:16     ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:16 UTC (permalink / raw)
To: David Howells; +Cc: Christoph Hellwig, nicolas.baranger, linux-cifs, netfs

On Mon, Jan 06, 2025 at 09:13:02AM +0000, David Howells wrote:
> Okay, I can reproduce it trivially. Question is, do I need to copy the
> bio_vec array (or kvec array or folio_queue list) or can I rely on that being
> maintained till the end of the op? (Obviously, I can't rely on the iov_iter
> struct itself being maintained). I think I have to copy the contents, just in
> case.

The bio_vec array can't be freed while I/O is in progress. Take a look
at the iov_iter_is_bvec case in bio_iov_iter_get_pages for how simple
ITER_BVEC handling can be.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* [PATCH] netfs: Fix kernel async DIO
  [not found] <669f22fc89e45dd4e56d75876dc8f2bf@3xo.fr>
  2025-01-01 18:00 ` Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline) nicolas.baranger
@ 2025-01-06 11:37 ` David Howells
  2025-01-06 12:07   ` nicolas.baranger
  ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: David Howells @ 2025-01-06 11:37 UTC (permalink / raw)
To: nicolas.baranger
Cc: dhowells, Steve French, Christoph Hellwig, Jeff Layton, Christian Brauner, netfs, linux-cifs, linux-fsdevel, linux-kernel

Hi Nicolas,

Does the attached fix your problem?

David
---
netfs: Fix kernel async DIO

Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
is supplied with a bio_vec[] array. Currently, because of the async flag,
this gets passed to netfs_extract_user_iter() which throws a warning and
fails because it only handles IOVEC and UBUF iterators. This can be
triggered through a combination of cifs and a loopback blockdev with
something like:

    mount //my/cifs/share /foo
    dd if=/dev/zero of=/foo/m0 bs=4K count=1K
    losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
    echo hello >/dev/loop2046

This causes the following to appear in syslog:

    WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]

and the write to fail.

Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
causes async kernel DIO writes to be handled as userspace writes. Note
that this change relies on the kernel caller maintaining the existence of
the bio_vec array (or kvec[] or folio_queue) until the op is complete.
Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/direct_write.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index eded8afaa60b..42ce53cc216e 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
	 * allocate a sufficiently large bvec array and may shorten the
	 * request.
	 */
-	if (async || user_backed_iter(iter)) {
+	if (user_backed_iter(iter)) {
		n = netfs_extract_user_iter(iter, len, &wreq->buffer.iter, 0);
		if (n < 0) {
			ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
		wreq->direct_bv_count = n;
		wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
	} else {
+		/* If this is a kernel-generated async DIO request,
+		 * assume that any resources the iterator points to
+		 * (eg. a bio_vec array) will persist till the end of
+		 * the op.
+		 */
		wreq->buffer.iter = *iter;
	}
 }

^ permalink raw reply related	[flat|nested] 14+ messages in thread
* Re: [PATCH] netfs: Fix kernel async DIO
  2025-01-06 11:37 ` [PATCH] netfs: Fix kernel async DIO David Howells
@ 2025-01-06 12:07   ` nicolas.baranger
  2025-01-07  8:26     ` nicolas.baranger
  2025-01-07 14:49     ` David Howells
  2025-01-06 15:34   ` Christoph Hellwig
  2025-01-07 12:03   ` [PATCH] netfs: Fix kernel async DIO Paulo Alcantara
  2 siblings, 2 replies; 14+ messages in thread
From: nicolas.baranger @ 2025-01-06 12:07 UTC (permalink / raw)
To: David Howells
Cc: Steve French, Christoph Hellwig, Jeff Layton, Christian Brauner, netfs, linux-cifs, linux-fsdevel, linux-kernel

Hi David,

Thanks for the work!

I will build Linux 6.10 and mainline with the provided change, and I will report back here as soon as I get test results (CET working hours).

Thanks again for your help with this issue.
Nicolas

On 2025-01-06 12:37, David Howells wrote:

> Hi Nicolas,
>
> Does the attached fix your problem?
>
> David
> ---
> netfs: Fix kernel async DIO
>
> Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
> is supplied with a bio_vec[] array. Currently, because of the async flag,
> this gets passed to netfs_extract_user_iter() which throws a warning and
> fails because it only handles IOVEC and UBUF iterators. This can be
> triggered through a combination of cifs and a loopback blockdev with
> something like:
>
>     mount //my/cifs/share /foo
>     dd if=/dev/zero of=/foo/m0 bs=4K count=1K
>     losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
>     echo hello >/dev/loop2046
>
> This causes the following to appear in syslog:
>
>     WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]
>
> and the write to fail.
>
> Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
> causes async kernel DIO writes to be handled as userspace writes. Note
> that this change relies on the kernel caller maintaining the existence of
> the bio_vec array (or kvec[] or folio_queue) until the op is complete.
>
> Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
> Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
> Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Steve French <smfrench@gmail.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: netfs@lists.linux.dev
> cc: linux-cifs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/netfs/direct_write.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
> index eded8afaa60b..42ce53cc216e 100644
> --- a/fs/netfs/direct_write.c
> +++ b/fs/netfs/direct_write.c
> @@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
> 	 * allocate a sufficiently large bvec array and may shorten the
> 	 * request.
> 	 */
> -	if (async || user_backed_iter(iter)) {
> +	if (user_backed_iter(iter)) {
> 		n = netfs_extract_user_iter(iter, len, &wreq->buffer.iter, 0);
> 		if (n < 0) {
> 			ret = n;
> @@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
> 		wreq->direct_bv_count = n;
> 		wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
> 	} else {
> +		/* If this is a kernel-generated async DIO request,
> +		 * assume that any resources the iterator points to
> +		 * (eg. a bio_vec array) will persist till the end of
> +		 * the op.
> +		 */
> 		wreq->buffer.iter = *iter;
> 	}
> }

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] netfs: Fix kernel async DIO
  2025-01-06 12:07   ` nicolas.baranger
@ 2025-01-07  8:26     ` nicolas.baranger
  2025-01-07 14:49     ` David Howells
  1 sibling, 0 replies; 14+ messages in thread
From: nicolas.baranger @ 2025-01-07 8:26 UTC (permalink / raw)
To: David Howells
Cc: Steve French, Christoph Hellwig, Jeff Layton, Christian Brauner, netfs, linux-cifs, linux-fsdevel, linux-kernel

Hi David,

As your patch was written on top of linux-next, I had to make some small modifications to make it apply to mainline (6.13-rc6). The following patch works fine for me on mainline, but I think it would be better to wait for your confirmation / validation (or a new patch) before applying it in production.

#-------- PATCH --------#

diff --git a/linux-6.13-rc6/nba/_orig_fs.netfs.direct_write.c b/linux-6.13-rc6/fs/netfs/direct_write.c
index 88f2adf..94a1ee8 100644
--- a/linux-6.13-rc6/nba/_orig_fs.netfs.direct_write.c
+++ b/linux-6.13-rc6/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
	 * allocate a sufficiently large bvec array and may shorten the
	 * request.
	 */
-	if (async || user_backed_iter(iter)) {
+	if (user_backed_iter(iter)) {
		n = netfs_extract_user_iter(iter, len, &wreq->iter, 0);
		if (n < 0) {
			ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
		wreq->direct_bv_count = n;
		wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
	} else {
+		/* If this is a kernel-generated async DIO request,
+		 * assume that any resources the iterator points to
+		 * (eg. a bio_vec array) will persist till the end of
+		 * the op.
+		 */
		wreq->iter = *iter;
	}

#-------- TESTS --------#

With this patch, Linux 6.13-rc6 builds with no errors and '--direct-io=on' works:

18:38:47 root@deb12-lab-10d:~# uname -a
Linux deb12-lab-10d.lab.lan 6.13.0-rc6-amd64 #0 SMP PREEMPT_DYNAMIC Mon Jan 6 18:14:07 CET 2025 x86_64 GNU/Linux

18:39:29 root@deb12-lab-10d:~# losetup
NAME          SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE                       DIO LOG-SEC
/dev/loop2046         0      0         0  0 /mnt/FBX24T/FS-LAN/bckcrypt2046   1    4096

18:39:32 root@deb12-lab-10d:~# dmsetup ls | grep bckcrypt
bckcrypt        (254:7)

18:39:55 root@deb12-lab-10d:~# cryptsetup status bckcrypt
/dev/mapper/bckcrypt is active and is in use.
  type:         LUKS2
  cipher:       aes-xts-plain64
  keysize:      512 bits
  key location: keyring
  device:       /dev/loop2046
  loop:         /mnt/FBX24T/FS-LAN/bckcrypt2046
  sector size:  512
  offset:       32768 sectors
  size:         8589901824 sectors
  mode:         read/write

18:40:36 root@deb12-lab-10d:~# df -h | egrep 'cifs|bckcrypt'
//10.0.10.100/FBX24T  cifs    22T   13T  9,0T  60% /mnt/FBX24T
/dev/mapper/bckcrypt  btrfs  4,0T  3,3T  779G  82% /mnt/bckcrypt

09:08:44 root@deb12-lab-10d:~# LANG=en_US.UTF-8
09:08:46 root@deb12-lab-10d:~# dd if=/dev/zero of=/mnt/bckcrypt/test/test.dd bs=256M count=16 oflag=direct status=progress
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 14 s, 302 MB/s
16+0 records in
16+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 14.2061 s, 302 MB/s

No write errors using the '--direct-io=on' option of losetup with this patch; writing to the back-file is more than 20x faster.

It seems to be OK! Let me know if something is wrong with this patch or if it can safely be used in production.

Again, thanks everyone for the help.
Nicolas

On 2025-01-06 13:07, nicolas.baranger@3xo.fr wrote:

> Hi David
>
> Thanks for the work!
> I will build Linux 6.10 and mainline with the provided change and I'm
> coming back here as soon as I get results from tests (CET working time).
>
> Thanks again for help with this issue
> Nicolas
>
> On 2025-01-06 12:37, David Howells wrote:
>
>> Hi Nicolas,
>>
>> Does the attached fix your problem?
>>
>> David
>> ---
>> netfs: Fix kernel async DIO
>>
>> Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
>> is supplied with a bio_vec[] array. Currently, because of the async flag,
>> this gets passed to netfs_extract_user_iter() which throws a warning and
>> fails because it only handles IOVEC and UBUF iterators. This can be
>> triggered through a combination of cifs and a loopback blockdev with
>> something like:
>>
>>     mount //my/cifs/share /foo
>>     dd if=/dev/zero of=/foo/m0 bs=4K count=1K
>>     losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
>>     echo hello >/dev/loop2046
>>
>> This causes the following to appear in syslog:
>>
>>     WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]
>>
>> and the write to fail.
>>
>> Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
>> causes async kernel DIO writes to be handled as userspace writes. Note
>> that this change relies on the kernel caller maintaining the existence of
>> the bio_vec array (or kvec[] or folio_queue) until the op is complete.
>>
>> Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
>> Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
>> Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
>> Signed-off-by: David Howells <dhowells@redhat.com>
>> cc: Steve French <smfrench@gmail.com>
>> cc: Jeff Layton <jlayton@kernel.org>
>> cc: netfs@lists.linux.dev
>> cc: linux-cifs@vger.kernel.org
>> cc: linux-fsdevel@vger.kernel.org
>> ---
>>  fs/netfs/direct_write.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
>> index eded8afaa60b..42ce53cc216e 100644
>> --- a/fs/netfs/direct_write.c
>> +++ b/fs/netfs/direct_write.c
>> @@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
>> 	 * allocate a sufficiently large bvec array and may shorten the
>> 	 * request.
>> 	 */
>> -	if (async || user_backed_iter(iter)) {
>> +	if (user_backed_iter(iter)) {
>> 		n = netfs_extract_user_iter(iter, len, &wreq->buffer.iter, 0);
>> 		if (n < 0) {
>> 			ret = n;
>> @@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
>> 		wreq->direct_bv_count = n;
>> 		wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
>> 	} else {
>> +		/* If this is a kernel-generated async DIO request,
>> +		 * assume that any resources the iterator points to
>> +		 * (eg. a bio_vec array) will persist till the end of
>> +		 * the op.
>> +		 */
>> 		wreq->buffer.iter = *iter;
>> 	}
>> }

^ permalink raw reply related	[flat|nested] 14+ messages in thread
* Re: [PATCH] netfs: Fix kernel async DIO
  2025-01-06 12:07 ` nicolas.baranger
  2025-01-07  8:26 ` nicolas.baranger
@ 2025-01-07 14:49 ` David Howells
  2025-01-07 18:08   ` Nicolas Baranger
  1 sibling, 1 reply; 14+ messages in thread
From: David Howells @ 2025-01-07 14:49 UTC (permalink / raw)
To: nicolas.baranger
Cc: dhowells, Steve French, Christoph Hellwig, Jeff Layton,
    Christian Brauner, netfs, linux-cifs, linux-fsdevel, linux-kernel

Thanks!

I ported the patch to linus/master (see below) and it looks pretty much the
same as yours, give or take tabs getting converted to spaces.

Could I put you down as a Tested-by?

David
---
netfs: Fix kernel async DIO

Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
is supplied with a bio_vec[] array.  Currently, because of the async flag,
this gets passed to netfs_extract_user_iter() which throws a warning and
fails because it only handles IOVEC and UBUF iterators.  This can be
triggered through a combination of cifs and a loopback blockdev with
something like:

    mount //my/cifs/share /foo
    dd if=/dev/zero of=/foo/m0 bs=4K count=1K
    losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
    echo hello >/dev/loop2046

This causes the following to appear in syslog:

    WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]

and the write to fail.

Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
causes async kernel DIO writes to be handled as userspace writes.  Note
that this change relies on the kernel caller maintaining the existence of
the bio_vec array (or kvec[] or folio_queue) until the op is complete.

Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/direct_write.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index 173e8b5e6a93..f9421f3e6d37 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 	 * allocate a sufficiently large bvec array and may shorten the
 	 * request.
 	 */
-	if (async || user_backed_iter(iter)) {
+	if (user_backed_iter(iter)) {
 		n = netfs_extract_user_iter(iter, len, &wreq->iter, 0);
 		if (n < 0) {
 			ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 		wreq->direct_bv_count = n;
 		wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
 	} else {
+		/* If this is a kernel-generated async DIO request,
+		 * assume that any resources the iterator points to
+		 * (eg. a bio_vec array) will persist till the end of
+		 * the op.
+		 */
 		wreq->iter = *iter;
 	}

^ permalink raw reply related	[flat|nested] 14+ messages in thread
* Re: [PATCH] netfs: Fix kernel async DIO
  2025-01-07 14:49 ` David Howells
@ 2025-01-07 18:08 ` Nicolas Baranger
  0 siblings, 0 replies; 14+ messages in thread
From: Nicolas Baranger @ 2025-01-07 18:08 UTC (permalink / raw)
To: David Howells
Cc: Steve French, Christoph Hellwig, Jeff Layton, Christian Brauner,
    netfs, linux-cifs, linux-fsdevel, linux-kernel

Hi David

Sure you can!

Please also note that after building 'linux-next' and applying the first
patch you provided, I successfully tested DIO write (same test process as
before). It works fine too!

I remain available for further testing.

Thanks again for the help (special thanks to Christoph and David)
Nicolas

On 2025-01-07 15:49, David Howells wrote:

> Thanks!
>
> I ported the patch to linus/master (see below) and it looks pretty much
> the same as yours, give or take tabs getting converted to spaces.
>
> Could I put you down as a Tested-by?
>
> David
>
> ---
> netfs: Fix kernel async DIO
>
> Netfslib needs to be able to handle kernel-initiated asynchronous DIO
> that is supplied with a bio_vec[] array.  Currently, because of the
> async flag, this gets passed to netfs_extract_user_iter() which throws
> a warning and fails because it only handles IOVEC and UBUF iterators.
> This can be triggered through a combination of cifs and a loopback
> blockdev with something like:
>
>     mount //my/cifs/share /foo
>     dd if=/dev/zero of=/foo/m0 bs=4K count=1K
>     losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
>     echo hello >/dev/loop2046
>
> This causes the following to appear in syslog:
>
>     WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]
>
> and the write to fail.
>
> Fix this by removing the check in netfs_unbuffered_write_iter_locked()
> that causes async kernel DIO writes to be handled as userspace writes.
> Note that this change relies on the kernel caller maintaining the
> existence of the bio_vec array (or kvec[] or folio_queue) until the op
> is complete.
>
> Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
> Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
> Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Steve French <smfrench@gmail.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: netfs@lists.linux.dev
> cc: linux-cifs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/netfs/direct_write.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
> index 173e8b5e6a93..f9421f3e6d37 100644
> --- a/fs/netfs/direct_write.c
> +++ b/fs/netfs/direct_write.c
> @@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
>  	 * allocate a sufficiently large bvec array and may shorten the
>  	 * request.
>  	 */
> -	if (async || user_backed_iter(iter)) {
> +	if (user_backed_iter(iter)) {
>  		n = netfs_extract_user_iter(iter, len, &wreq->iter, 0);
>  		if (n < 0) {
>  			ret = n;
> @@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
>  		wreq->direct_bv_count = n;
>  		wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
>  	} else {
> +		/* If this is a kernel-generated async DIO request,
> +		 * assume that any resources the iterator points to
> +		 * (eg. a bio_vec array) will persist till the end of
> +		 * the op.
> +		 */
>  		wreq->iter = *iter;
>  	}

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] netfs: Fix kernel async DIO
  2025-01-06 11:37 ` [PATCH] netfs: Fix kernel async DIO David Howells
  2025-01-06 12:07   ` nicolas.baranger
@ 2025-01-06 15:34   ` Christoph Hellwig
  2025-03-20  8:46     ` [Linux 6.14 - netfs/cifs] loop on file cat + file copy Nicolas Baranger
  2025-01-07 12:03   ` [PATCH] netfs: Fix kernel async DIO Paulo Alcantara
  2 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2025-01-06 15:34 UTC (permalink / raw)
To: David Howells
Cc: nicolas.baranger, Steve French, Christoph Hellwig, Jeff Layton,
    Christian Brauner, netfs, linux-cifs, linux-fsdevel, linux-kernel

On Mon, Jan 06, 2025 at 11:37:24AM +0000, David Howells wrote:
> 	mount //my/cifs/share /foo
> 	dd if=/dev/zero of=/foo/m0 bs=4K count=1K
> 	losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
> 	echo hello >/dev/loop2046

Can you add a testcase using losetup --direct-io with a file on $TEST_DIR
so that we get coverage for ITER_BVEC directio to xfstests?

^ permalink raw reply	[flat|nested] 14+ messages in thread
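[Editorial note: the thread does not contain the requested test. The following is a rough sketch of what such an xfstests-style case could look like. It is NOT an actual xfstests test: `$TEST_DIR`, the ad-hoc skip checks and the file names are assumptions, and a real test would be built on the xfstests `_require_loop` / `_notrun` helpers instead.]

```shell
#!/bin/bash
# Sketch: exercise kernel-initiated ITER_BVEC direct I/O by layering a
# loop device with --direct-io=on over a file on the filesystem under
# test, then writing through the loop device.
set -u

TEST_DIR=${TEST_DIR:-/mnt/test}
result="not run"

if [ "$(id -u)" -eq 0 ] && command -v losetup >/dev/null 2>&1 && [ -d "$TEST_DIR" ]; then
    backfile="$TEST_DIR/dio-backfile"
    dd if=/dev/zero of="$backfile" bs=4K count=1K status=none

    # --direct-io=on makes the loop driver submit async DIO with a
    # kernel-owned bio_vec[] array against the back-file.
    if loopdev=$(losetup --find --show --sector-size 4096 --direct-io=on "$backfile" 2>/dev/null); then
        # An ordinary write to the loop device is what triggered the
        # netfs_extract_user_iter() warning on unpatched kernels.
        if echo hello > "$loopdev" && sync "$loopdev"; then
            result="ok"
        fi
        losetup -d "$loopdev"
    fi
    rm -f "$backfile"
fi

echo "$result"
```

The sketch deliberately degrades to "not run" (xfstests' `_notrun` idiom) when run without root, without losetup, or without a test directory.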
* [Linux 6.14 - netfs/cifs] loop on file cat + file copy
  2025-01-06 15:34 ` Christoph Hellwig
@ 2025-03-20  8:46 ` Nicolas Baranger
  0 siblings, 0 replies; 14+ messages in thread
From: Nicolas Baranger @ 2025-03-20 8:46 UTC (permalink / raw)
To: Christoph Hellwig
Cc: David Howells, Steve French, Jeff Layton, Christian Brauner,
    netfs, linux-cifs, linux-fsdevel, linux-kernel

Hi Christoph

Sorry to contact you again, but last time you and David H. helped me a lot
with 'kernel async DIO' / 'Losetup Direct I/O breaks BACK-FILE filesystem
on CIFS share (Appears in Linux 6.10 and reproduced on mainline)'.

I don't know if it has already been reported, but after building Linux
6.14-rc1 I observed the following behaviour:

The 'cat' command goes into a loop when I cat a file which resides on a
cifs share, and the 'cp' command does the same: it copies the content of a
file on the cifs share and loops writing it to the destination.

I tested with a file named 'toto' containing only the ASCII string 'toto'.
When I started copying it from the cifs share to a local filesystem, I had
to Ctrl+C the copy of this 5-byte file after some time, because the
destination file was using all of the filesystem's free space and
contained billions of 'toto' lines.

Here is an example with cat:

CIFS SHARE is mounted as /mnt/fbx/FBX-24T

CIFS mount options:

grep cifs /proc/mounts
//10.0.10.100/FBX24T /mnt/fbx/FBX-24T cifs rw,nosuid,nodev,noexec,relatime,vers=3.1.1,cache=none,upcall_target=app,username=fbx,domain=HOMELAN,uid=0,noforceuid,gid=0,noforcegid,addr=10.0.10.100,file_mode=0666,dir_mode=0755,iocharset=utf8,soft,nounix,serverino,mapposix,mfsymlinks,reparse=nfs,nativesocket,symlink=mfsymlinks,rsize=65536,wsize=65536,bsize=16777216,retrans=1,echo_interval=60,actimeo=1,closetimeo=1 0 0

KERNEL:

uname -a
Linux 14RV-SERVER.14rv.lan 6.14.0.1-ast-rc2-amd64 #0 SMP PREEMPT_DYNAMIC Wed Feb 12 18:23:00 CET 2025 x86_64 GNU/Linux

To reproduce:

echo toto >/mnt/fbx/FBX-24T/toto

ls -l /mnt/fbx/FBX-24T/toto
-rw-rw-rw- 1 root root 5 20 mars 09:20 /mnt/fbx/FBX-24T/toto

cat /mnt/fbx/FBX-24T/toto
toto
toto
toto
toto
toto
toto
toto
^C

strace cat /mnt/fbx/FBX-24T/toto
execve("/usr/bin/cat", ["cat", "/mnt/fbx/FBX-24T/toto"], 0x7ffc39b41848 /* 19 vars */) = 0
brk(NULL) = 0x55755b1c1000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f55f95d6000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "glibc-hwcaps/x86-64-v3/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "glibc-hwcaps/x86-64-v2/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "tls/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "tls/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/glibc-hwcaps/x86-64-v3/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/glibc-hwcaps/x86-64-v3", 0x7fff25937800, 0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/glibc-hwcaps/x86-64-v2/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/glibc-hwcaps/x86-64-v2", 0x7fff25937800, 0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/tls/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/tls/haswell/x86_64", 0x7fff25937800, 0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/tls/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/tls/haswell", 0x7fff25937800, 0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/tls/x86_64", 0x7fff25937800, 0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/tls", 0x7fff25937800, 0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/haswell/x86_64", 0x7fff25937800, 0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/haswell", 0x7fff25937800, 0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/x86_64", 0x7fff25937800, 0) = -1 ENOENT (Aucun fichier ou dossier de ce type)
openat(AT_FDCWD, "/usr/local/cuda-12.6/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (Aucun fichier ou dossier de ce type)
newfstatat(AT_FDCWD, "/usr/local/cuda-12.6/lib64", {st_mode=S_IFDIR|S_ISGID|0755, st_size=4570, ...}, 0) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=148466, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 148466, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f55f95b1000
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20t\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=1922136, ...}, AT_EMPTY_PATH) = 0
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
mmap(NULL, 1970000, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f55f93d0000
mmap(0x7f55f93f6000, 1396736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x26000) = 0x7f55f93f6000
mmap(0x7f55f954b000, 339968, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17b000) = 0x7f55f954b000
mmap(0x7f55f959e000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ce000) = 0x7f55f959e000
mmap(0x7f55f95a4000, 53072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f55f95a4000
close(3) = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f55f93cd000
arch_prctl(ARCH_SET_FS, 0x7f55f93cd740) = 0
set_tid_address(0x7f55f93cda10) = 38427
set_robust_list(0x7f55f93cda20, 24) = 0
rseq(0x7f55f93ce060, 0x20, 0, 0x53053053) = 0
mprotect(0x7f55f959e000, 16384, PROT_READ) = 0
mprotect(0x55754475e000, 4096, PROT_READ) = 0
mprotect(0x7f55f960e000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
munmap(0x7f55f95b1000, 148466) = 0
getrandom("\x19\x6b\x9e\x55\x7e\x09\x74\x5f", 8, GRND_NONBLOCK) = 8
brk(NULL) = 0x55755b1c1000
brk(0x55755b1e2000) = 0x55755b1e2000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=3048928, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 3048928, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f55f9000000
close(3) = 0
newfstatat(1, "", {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0), ...}, AT_EMPTY_PATH) = 0
openat(AT_FDCWD, "/mnt/fbx/FBX-24T/toto", O_RDONLY) = 3
newfstatat(3, "", {st_mode=S_IFREG|0666, st_size=5, ...}, AT_EMPTY_PATH) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
mmap(NULL, 16785408, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f55f7ffe000
read(3, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16777216) = 16711680
write(1, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16711680toto
) = 16711680
read(3, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16777216) = 16711680
write(1, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16711680toto
) = 16711680
read(3, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16777216) = 16711680
write(1, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16711680toto
) = 16711680
read(3, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16777216) = 16711680
write(1, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16711680toto
) = 16711680
read(3, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16777216) = 16711680
write(1, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16711680toto
) = 16711680
read(3, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16777216) = 16711680
write(1, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16711680toto
) = 16711680
read(3, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16777216) = 16711680
write(1, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16711680toto
^Cstrace: Process 38427 detached
 <detached ...>

Please let me know if this has already been fixed or reported, and whether
you're able to reproduce the issue.

Thanks for the help
Kind regards
Nicolas Baranger

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] netfs: Fix kernel async DIO
  2025-01-06 11:37 ` [PATCH] netfs: Fix kernel async DIO David Howells
  2025-01-06 12:07   ` nicolas.baranger
  2025-01-06 15:34   ` Christoph Hellwig
@ 2025-01-07 12:03   ` Paulo Alcantara
  2 siblings, 0 replies; 14+ messages in thread
From: Paulo Alcantara @ 2025-01-07 12:03 UTC (permalink / raw)
To: David Howells, nicolas.baranger
Cc: dhowells, Steve French, Christoph Hellwig, Jeff Layton,
    Christian Brauner, netfs, linux-cifs, linux-fsdevel, linux-kernel

David Howells <dhowells@redhat.com> writes:

> netfs: Fix kernel async DIO
>
> Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
> is supplied with a bio_vec[] array.  Currently, because of the async flag,
> this gets passed to netfs_extract_user_iter() which throws a warning and
> fails because it only handles IOVEC and UBUF iterators.  This can be
> triggered through a combination of cifs and a loopback blockdev with
> something like:
>
>     mount //my/cifs/share /foo
>     dd if=/dev/zero of=/foo/m0 bs=4K count=1K
>     losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
>     echo hello >/dev/loop2046
>
> This causes the following to appear in syslog:
>
>     WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]
>
> and the write to fail.
>
> Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
> causes async kernel DIO writes to be handled as userspace writes.  Note
> that this change relies on the kernel caller maintaining the existence of
> the bio_vec array (or kvec[] or folio_queue) until the op is complete.
>
> Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
> Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
> Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Steve French <smfrench@gmail.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: netfs@lists.linux.dev
> cc: linux-cifs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/netfs/direct_write.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

LGTM. Feel free to add:

Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>

Thanks Christoph and Dave!

^ permalink raw reply	[flat|nested] 14+ messages in thread
* [PATCH] netfs: Fix kernel async DIO
@ 2025-01-07 18:39 David Howells
2025-01-09 16:19 ` Christian Brauner
0 siblings, 1 reply; 14+ messages in thread
From: David Howells @ 2025-01-07 18:39 UTC (permalink / raw)
To: Christian Brauner
Cc: dhowells, Nicolas Baranger, Paulo Alcantara, Steve French,
Jeff Layton, netfs, linux-cifs, linux-fsdevel, linux-kernel
Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
is supplied with a bio_vec[] array. Currently, because of the async flag,
this gets passed to netfs_extract_user_iter() which throws a warning and
fails because it only handles IOVEC and UBUF iterators. This can be
triggered through a combination of cifs and a loopback blockdev with
something like:
mount //my/cifs/share /foo
dd if=/dev/zero of=/foo/m0 bs=4K count=1K
losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
echo hello >/dev/loop2046
This causes the following to appear in syslog:
WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]
and the write to fail.
Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
causes async kernel DIO writes to be handled as userspace writes. Note
that this change relies on the kernel caller maintaining the existence of
the bio_vec array (or kvec[] or folio_queue) until the op is complete.
Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Steve French <smfrench@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
fs/netfs/direct_write.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index 173e8b5e6a93..f9421f3e6d37 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
* allocate a sufficiently large bvec array and may shorten the
* request.
*/
- if (async || user_backed_iter(iter)) {
+ if (user_backed_iter(iter)) {
n = netfs_extract_user_iter(iter, len, &wreq->iter, 0);
if (n < 0) {
ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
wreq->direct_bv_count = n;
wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
} else {
+ /* If this is a kernel-generated async DIO request,
+ * assume that any resources the iterator points to
+ * (eg. a bio_vec array) will persist till the end of
+ * the op.
+ */
wreq->iter = *iter;
}
^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] netfs: Fix kernel async DIO
  2025-01-07 18:39 David Howells
@ 2025-01-09 16:19 ` Christian Brauner
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Brauner @ 2025-01-09 16:19 UTC (permalink / raw)
To: David Howells
Cc: Christian Brauner, Nicolas Baranger, Paulo Alcantara, Steve French,
    Jeff Layton, netfs, linux-cifs, linux-fsdevel, linux-kernel

On Tue, 07 Jan 2025 18:39:27 +0000, David Howells wrote:
> Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
> is supplied with a bio_vec[] array.  Currently, because of the async flag,
> this gets passed to netfs_extract_user_iter() which throws a warning and
> fails because it only handles IOVEC and UBUF iterators.  This can be
> triggered through a combination of cifs and a loopback blockdev with
> something like:
>
> [...]

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/1] netfs: Fix kernel async DIO
      https://git.kernel.org/vfs/vfs/c/3f6bc9e3ab9b

^ permalink raw reply	[flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-03-20 8:56 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <669f22fc89e45dd4e56d75876dc8f2bf@3xo.fr>
2025-01-01 18:00 ` Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline) nicolas.baranger
2025-01-06 7:20 ` Christoph Hellwig
2025-01-06 9:13 ` David Howells
2025-01-06 9:16 ` Christoph Hellwig
2025-01-06 11:37 ` [PATCH] netfs: Fix kernel async DIO David Howells
2025-01-06 12:07 ` nicolas.baranger
2025-01-07 8:26 ` nicolas.baranger
2025-01-07 14:49 ` David Howells
2025-01-07 18:08 ` Nicolas Baranger
2025-01-06 15:34 ` Christoph Hellwig
2025-03-20 8:46 ` [Linux 6.14 - netfs/cifs] loop on file cat + file copy Nicolas Baranger
2025-01-07 12:03 ` [PATCH] netfs: Fix kernel async DIO Paulo Alcantara
2025-01-07 18:39 David Howells
2025-01-09 16:19 ` Christian Brauner