* Re: Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline)
  [not found] <669f22fc89e45dd4e56d75876dc8f2bf@3xo.fr>
@ 2025-01-01 18:00 ` nicolas.baranger
  2025-01-06  7:20   ` Christoph Hellwig
  2025-01-06  9:13   ` David Howells
  2025-01-06 11:37 ` [PATCH] netfs: Fix kernel async DIO David Howells
  1 sibling, 2 replies; 12+ messages in thread
From: nicolas.baranger @ 2025-01-01 18:00 UTC (permalink / raw)
  To: linux-cifs, netfs

Dear maintainers,

I don't know if this is the right place to contact the kernel developers, but today I reported the bug below, which appears in Linux 6.10 and which I can reproduce on every kernel from Linux 6.10 up to mainline.

I think the new way CIFS uses netfs could be one cause of the issue, since running:

	git log --pretty=oneline v6.9.12..v6.10 | grep cifs: | grep netfs

returns:

	3ee1a1fc39819906f04d6c62c180e760cd3a689d cifs: Cut over to using netfslib
	69c3c023af25edb5433a2db824d3e7cc328f0183 cifs: Implement netfslib hooks
	dc5939de82f149633d6ec1c403003538442ec9ef cifs: Replace the writedata replay bool with a netfs sreq flag
	ab58fbdeebc7f9fe8b9bc202660eae3a10e5e678 cifs: Use more fields from netfs_io_subrequest
	a975a2f22cdce7ec0c678ce8d73d2f6616cb281c cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest
	753b67eb630db34e36ec4ae1e86c75e243ea4fc9 cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest

The bug ID is: https://bugzilla.kernel.org/show_bug.cgi?id=219648

Content of bug #219648:

Dear maintainer,

Since I upgraded one server from Linux 6.9.12 to Linux 6.10, I have encountered the following problem: backups of that server are made by writing a dd copy of each LVM volume snapshot into a back-file (LUKS + BTRFS formatted) which resides on a big Samba share (SMB 3.1.1).
The back-file is mounted on the server using:

	losetup --sector-size 4096 --direct-io=on /dev/loop2046 /path/to/back-file/on/samba/share

This has worked fine for years, and the method still works on the same share (with different back-files) from other servers running Linux 6.1.25, Linux 6.5.10 and Linux 6.9.12 respectively. But since I updated the kernel to Linux 6.10, the back-file becomes read-only at the first write after mounting it, and I get these errors in the kernel logs:

[lun. 23 déc. 10:08:49 2024] loop2046: detected capacity change from 0 to 8589934592
[lun. 23 déc. 10:08:51 2024] BTRFS: device fsid a2c979e6-2c6e-4308-a238-55e417a3bcd9 devid 1 transid 395 /dev/mapper/bckcrypt (253:30) scanned by mount (3643571)
[lun. 23 déc. 10:08:51 2024] BTRFS info (device dm-30): first mount of filesystem a2c979e6-2c6e-4308-a238-55e417a3bcd9
[lun. 23 déc. 10:08:51 2024] BTRFS info (device dm-30): using crc32c (crc32c-intel) checksum algorithm
[lun. 23 déc. 10:08:51 2024] BTRFS info (device dm-30): disk space caching is enabled
[lun. 23 déc. 10:08:51 2024] BTRFS warning (device dm-30): space cache v1 is being deprecated and will be removed in a future release, please use -o space_cache=v2
[lun. 23 déc. 10:08:51 2024] BTRFS info (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 15, rd 15, flush 0, corrupt 0, gen 0
[lun. 23 déc. 10:08:54 2024] blk_print_req_error: 62 callbacks suppressed
[lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014676032 op 0x1:(WRITE) flags 0x8800 phys_seg 6 prio class 0
[lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014675776 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
[lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014688456 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
[lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014688968 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0 [lun. 23 déc.
10:08:54 2024] I/O error, dev loop2046, sector 7014689224 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0 [lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014689736 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0 [lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014688200 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0 [lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014689480 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0 [lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014688712 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0 [lun. 23 déc. 10:08:54 2024] I/O error, dev loop2046, sector 7014690248 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0 [lun. 23 déc. 10:08:54 2024] btrfs_dev_stat_inc_and_print: 54 callbacks suppressed [lun. 23 déc. 10:08:54 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 16, rd 15, flush 0, corrupt 0, gen 0 [lun. 23 déc. 10:08:54 2024] BTRFS warning (device dm-30): direct IO failed ino 361 op 0x8801 offset 0x611000 len 2138112 err no 10 [lun. 23 déc. 10:08:54 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 17, rd 15, flush 0, corrupt 0, gen 0 [lun. 23 déc. 10:08:54 2024] BTRFS warning (device dm-30): direct IO failed ino 361 op 0x8801 offset 0x81b000 len 2129920 err no 10 [lun. 23 déc. 10:08:54 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 18, rd 15, flush 0, corrupt 0, gen 0 [lun. 23 déc. 10:08:54 2024] BTRFS warning (device dm-30): direct IO failed ino 361 op 0x8801 offset 0xa23000 len 2088960 err no 10 [lun. 23 déc. 10:08:54 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 19, rd 15, flush 0, corrupt 0, gen 0 [lun. 23 déc. 10:08:54 2024] BTRFS warning (device dm-30): direct IO failed ino 361 op 0x8801 offset 0x0 len 4190208 err no 10 [lun. 23 déc. 10:08:54 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 20, rd 15, flush 0, corrupt 0, gen 0 [lun. 23 déc. 
10:08:54 2024] BTRFS warning (device dm-30): direct IO failed ino 361 op 0x8801 offset 0x3ff000 len 2170880 err no 10 [lun. 23 déc. 10:09:24 2024] blk_print_req_error: 82 callbacks suppressed [lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 183168 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0 [lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 182912 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0 [lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 21, rd 15, flush 0, corrupt 0, gen 0 [lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 180864 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0 [lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 181120 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0 [lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 22, rd 15, flush 0, corrupt 0, gen 0 [lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 181120 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0 [lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 180864 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0 [lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 23, rd 15, flush 0, corrupt 0, gen 0 [lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 180864 op 0x1:(WRITE) flags 0x800 phys_seg 4 prio class 0 [lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 185216 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0 [lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 24, rd 15, flush 0, corrupt 0, gen 0 [lun. 23 déc. 10:09:24 2024] I/O error, dev loop2046, sector 184960 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0 [lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 25, rd 15, flush 0, corrupt 0, gen 0 [lun. 23 déc. 
10:09:24 2024] I/O error, dev loop2046, sector 183680 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0 [lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 26, rd 15, flush 0, corrupt 0, gen 0 [lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 27, rd 15, flush 0, corrupt 0, gen 0 [lun. 23 déc. 10:09:24 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 28, rd 15, flush 0, corrupt 0, gen 0 [lun. 23 déc. 10:09:25 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 29, rd 15, flush 0, corrupt 0, gen 0 [lun. 23 déc. 10:09:25 2024] BTRFS error (device dm-30): bdev /dev/mapper/bckcrypt errs: wr 30, rd 15, flush 0, corrupt 0, gen 0 [lun. 23 déc. 10:09:25 2024] BTRFS: error (device dm-30) in btrfs_commit_transaction:2524: errno=-5 IO failure (Error while writing out transaction) [lun. 23 déc. 10:09:25 2024] BTRFS info (device dm-30 state E): forced readonly [lun. 23 déc. 10:09:25 2024] BTRFS warning (device dm-30 state E): Skipping commit of aborted transaction. [lun. 23 déc. 10:09:25 2024] BTRFS error (device dm-30 state EA): Transaction aborted (error -5) [lun. 23 déc. 
10:09:25 2024] BTRFS: error (device dm-30 state EA) in cleanup_transaction:2018: errno=-5 IO failure

At first I thought that BTRFS and the fifteen pre-existing read/write errors ("bdev /dev/mapper/bckcrypt errs: wr 15, rd 15") were responsible for the issue, so I recreated the back-file on the CIFS share from scratch, as follows:

	dd if=/dev/urandom of=/mnt/FBX24T/bck0crypt2044 bs=1G count=4096 oflag=direct status=progress
	losetup --sector-size 4096 --direct-io=on loop2044 /mnt/FBX24T/bck0crypt2044
	cryptsetup luksFormat /dev/loop2044

and normally after that (but it didn't work here, as the back-file becomes read-only):

	cryptsetup open /dev/loop2044 bck0crypt2044
	mkfs.btrfs /dev/mapper/bck0crypt2044
	mount /dev/mapper/bck0crypt2044 /mnt/bck0crypt

But I cannot do the last part: it breaks at the first write of 'cryptsetup luksFormat /dev/loop2044'. When I tried to format the loop device directly with BTRFS, XFS and EXT4, I got the same failure, i.e. at the first write I/O the loop device becomes read-only with the errors shown above.

So I tested again, removing --direct-io=on from the losetup command, and this time I was able to format with cryptsetup and btrfs (I also tried xfs and ext4, all with the same result: broken with --direct-io=on and working without, and the same after mounting the loop device, whatever filesystem it contains).

So it is '--direct-io=on' that breaks (from Linux 6.10 to mainline) when the back-file is on a CIFS share. To validate this, I tried '--direct-io=on' on a local filesystem with no issue (same process with cryptsetup and btrfs, xfs or ext4). And to be sure, I unmounted the read-only device from the server, removed the loop device too, went to my Debian 12 laptop with kernel 6.9.12, mounted the CIFS share on which the 'broken' back-file resides, created a loop device (with '--direct-io=on'), opened it with cryptsetup and mounted it (as previously described). After that I issued about 1 TB of writes from my laptop to this loop device with no issues.
One of the biggest problems with removing '--direct-io=on' from the losetup command is the read/write performance of the loop device: about 150 MiB/s without '--direct-io=on' (which makes it unusable for backing up a server every day) versus more than 3 GiB/s with the option. So with '--direct-io=on' the write speed is 25 to 50 times what I observe without it (which is what makes this solution viable for an everyday full backup of 2 TB+).

Taking a deeper look at the kernel logs, I found that the issue is always preceded by the following netfs warning:

[mer. 1 janv. 10:58:53 2025] ------------[ cut here ]------------
[mer. 1 janv. 10:58:53 2025] WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]
[mer. 1 janv. 10:58:53 2025] Modules linked in: dm_crypt(E) cmac(E) nls_utf8(E) cifs(E) cifs_arc4(E) nls_ucs2_utils(E) cifs_md4(E) dns_resolver(E) netfs(E) snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) rfkill(E) qrtr(E) binfmt_misc(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) ext4(E) crc16(E) mbcache(E) jbd2(E) intel_rapl_msr(E) intel_rapl_common(E) intel_uncore_frequency_common(E) kvm_intel(E) kvm(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) sha1_ssse3(E) snd_intel8x0(E) snd_ac97_codec(E) ac97_bus(E) aesni_intel(E) crypto_simd(E) cryptd(E) snd_pcm(E) joydev(E) rapl(E) snd_timer(E) snd(E) vboxguest(OE) pcspkr(E) soundcore(E) ac(E) sg(E) serio_raw(E) evdev(E) msr(E) parport_pc(E) ppdev(E) lp(E) parport(E) loop(E) configfs(E) efi_pstore(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) dm_mod(E) hid_generic(E) usbhid(E) hid(E) nvme(E) sr_mod(E) cdrom(E) nvme_core(E) t10_pi(E) vmwgfx(E) ahci(E) xhci_pci(E) drm_ttm_helper(E) [mer. 1 janv.
10:58:53 2025] virtio_net(E) ttm(E) libahci(E) net_failover(E) crc64_rocksoft(E) crc32_pclmul(E) drm_kms_helper(E) xhci_hcd(E) failover(E) crc64(E) libata(E) crc32c_intel(E) psmouse(E) crc_t10dif(E) dimlib(E) crct10dif_generic(E) crct10dif_pclmul(E) crct10dif_common(E) scsi_mod(E) usbcore(E) drm(E) scsi_common(E) usb_common(E) i2c_piix4(E) video(E) wmi(E) button(E) [mer. 1 janv. 10:58:53 2025] CPU: 2 PID: 109 Comm: kworker/u35:1 Tainted: G W OE 6.10.0-amd64 #1 [mer. 1 janv. 10:58:53 2025] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [mer. 1 janv. 10:58:53 2025] Workqueue: loop2046 loop_rootcg_workfn [loop] [mer. 1 janv. 10:58:53 2025] RIP: 0010:netfs_extract_user_iter+0x170/0x250 [netfs] [mer. 1 janv. 10:58:53 2025] Code: 00 29 fb 31 ff 89 5a f8 4c 39 d9 75 c2 4d 85 c9 0f 84 c2 00 00 00 45 39 f2 0f 83 b9 00 00 00 4d 89 cd 44 89 d3 e9 35 ff ff ff <0f> 0b 48 c7 c3 fb ff ff ff 48 8b 44 24 28 65 48 2b 04 25 28 00 00 [mer. 1 janv. 10:58:53 2025] RSP: 0018:ffffa44f00887c10 EFLAGS: 00010202 [mer. 1 janv. 10:58:53 2025] RAX: 0000000000000000 RBX: ffff937da084b200 RCX: 0000000000000000 [mer. 1 janv. 10:58:53 2025] RDX: ffff937da084b340 RSI: 0000000000100000 RDI: ffffa44f00887d10 [mer. 1 janv. 10:58:53 2025] RBP: ffff937da0837920 R08: ffffffffc13f7bc0 R09: 0000000000000000 [mer. 1 janv. 10:58:53 2025] R10: ffff937da084b200 R11: ffff937b81149100 R12: 0000000000100000 [mer. 1 janv. 10:58:53 2025] R13: ffffa44f00887d10 R14: ffffffffc0dbbeb0 R15: 0000034702500000 [mer. 1 janv. 10:58:53 2025] FS: 0000000000000000(0000) GS:ffff937e8fb00000(0000) knlGS:0000000000000000 [mer. 1 janv. 10:58:53 2025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [mer. 1 janv. 10:58:53 2025] CR2: 00007efe292cb9f0 CR3: 00000001c53f4001 CR4: 00000000000706f0 [mer. 1 janv. 10:58:53 2025] Call Trace: [mer. 1 janv. 10:58:53 2025] <TASK> [mer. 1 janv. 10:58:53 2025] ? __warn+0x80/0x120 [mer. 1 janv. 10:58:53 2025] ? netfs_extract_user_iter+0x170/0x250 [netfs] [mer. 1 janv. 
10:58:53 2025] ? report_bug+0x164/0x190 [mer. 1 janv. 10:58:53 2025] ? handle_bug+0x41/0x70 [mer. 1 janv. 10:58:53 2025] ? exc_invalid_op+0x17/0x70 [mer. 1 janv. 10:58:53 2025] ? asm_exc_invalid_op+0x1a/0x20 [mer. 1 janv. 10:58:53 2025] ? __pfx_lo_rw_aio_complete+0x10/0x10 [loop] [mer. 1 janv. 10:58:53 2025] ? netfs_extract_user_iter+0x170/0x250 [netfs] [mer. 1 janv. 10:58:53 2025] ? __pfx_lo_rw_aio_complete+0x10/0x10 [loop] [mer. 1 janv. 10:58:53 2025] netfs_unbuffered_write_iter_locked+0x97/0x3a0 [netfs] [mer. 1 janv. 10:58:53 2025] netfs_unbuffered_write_iter+0x177/0x230 [netfs] [mer. 1 janv. 10:58:53 2025] lo_rw_aio.isra.0+0x2ad/0x2d0 [loop] [mer. 1 janv. 10:58:53 2025] loop_process_work+0xae/0x980 [loop] [mer. 1 janv. 10:58:53 2025] ? psi_task_switch+0xd6/0x230 [mer. 1 janv. 10:58:53 2025] ? _raw_spin_unlock+0xe/0x30 [mer. 1 janv. 10:58:53 2025] ? finish_task_switch.isra.0+0x88/0x2d0 [mer. 1 janv. 10:58:53 2025] ? __schedule+0x3f3/0xb40 [mer. 1 janv. 10:58:53 2025] process_one_work+0x17c/0x390 [mer. 1 janv. 10:58:53 2025] worker_thread+0x265/0x380 [mer. 1 janv. 10:58:53 2025] ? __pfx_worker_thread+0x10/0x10 [mer. 1 janv. 10:58:53 2025] kthread+0xd2/0x100 [mer. 1 janv. 10:58:53 2025] ? __pfx_kthread+0x10/0x10 [mer. 1 janv. 10:58:53 2025] ret_from_fork+0x34/0x50 [mer. 1 janv. 10:58:53 2025] ? __pfx_kthread+0x10/0x10 [mer. 1 janv. 10:58:53 2025] ret_from_fork_asm+0x1a/0x30 [mer. 1 janv. 10:58:53 2025] </TASK> [mer. 1 janv. 10:58:53 2025] ---[ end trace 0000000000000000 ]--- [mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038117888 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0 [mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038187520 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0 [mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038185472 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0 [mer. 1 janv. 
10:58:53 2025] I/O error, dev loop2046, sector 7038183424 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038181376 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038179328 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038177280 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038175232 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038173184 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] I/O error, dev loop2046, sector 7038171136 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
[mer. 1 janv. 10:58:53 2025] BTRFS error (device dm-7): bdev /dev/mapper/bckcrypt errs: wr 16, rd 15, flush 0, corrupt 0, gen 0
[mer. 1 janv. 10:58:53 2025] BTRFS warning (device dm-7): direct IO failed ino 360 op 0x8801 offset 0x0 len 268435456 err no 10

NB: I can reproduce this every time, on virtual machines and on physical hardware (server, laptop, PC...). While trying to bisect the regression, I built and tested nearly all kernels from Linux 6.9.6 to mainline, with the exact same result: still working from Linux 6.9.6 to 6.9.12 and failing from Linux 6.10(.0) to mainline.

I'm not a kernel developer, but I remain available to help, to reproduce the issue and to provide traces (I now have some VMs in my lab dedicated to building new kernels and to reproducing this particular issue; if needed, public IPv6 access to those lab machines can be shared privately with the maintainer, and a live demo of the issue can be organised). I would be happy to help, so let me know how.

Thanks for the help.
Kind regards,
Nicolas Baranger

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline)
  2025-01-01 18:00 ` Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline) nicolas.baranger
@ 2025-01-06  7:20 ` Christoph Hellwig
  2025-01-06  9:13 ` David Howells
  1 sibling, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2025-01-06 7:20 UTC (permalink / raw)
  To: nicolas.baranger; +Cc: linux-cifs, netfs, David Howells

On Wed, Jan 01, 2025 at 07:00:58PM +0100, nicolas.baranger@3xo.fr wrote:
> 
> Dear maintainers,
> 
> I don't know if this is the right place to contact the kernel developers,
> but today I reported the bug below, which appears in Linux 6.10 and which
> I can reproduce on every kernel from Linux 6.10 up to mainline.
> 
> I think the new way CIFS uses netfs could be one cause of the issue,
> since running:

The problem is that netfs_extract_user_iter rejects iter types other than
ubuf and iovec, which breaks loop, which uses bvec iters.  It would
also break other things like io_uring pre-registered buffers, and all
of these are regressions compared to the old cifs code.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline)
  2025-01-01 18:00 ` Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline) nicolas.baranger
  2025-01-06  7:20 ` Christoph Hellwig
@ 2025-01-06  9:13 ` David Howells
  2025-01-06  9:16   ` Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: David Howells @ 2025-01-06 9:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: dhowells, nicolas.baranger, linux-cifs, netfs

Christoph Hellwig <hch@infradead.org> wrote:

> > I think the new way CIFS uses netfs could be one cause of the issue,
> > since running:
> 
> The problem is that netfs_extract_user_iter rejects iter types other than
> ubuf and iovec, which breaks loop, which uses bvec iters.  It would
> also break other things like io_uring pre-registered buffers, and all
> of these are regressions compared to the old cifs code.

Okay, I can reproduce it trivially.  Question is, do I need to copy the
bio_vec array (or kvec array or folio_queue list) or can I rely on that being
maintained till the end of the op?  (Obviously, I can't rely on the iov_iter
struct itself being maintained.)  I think I have to copy the contents, just in
case.

David

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline)
  2025-01-06  9:13 ` David Howells
@ 2025-01-06  9:16 ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2025-01-06 9:16 UTC (permalink / raw)
  To: David Howells; +Cc: Christoph Hellwig, nicolas.baranger, linux-cifs, netfs

On Mon, Jan 06, 2025 at 09:13:02AM +0000, David Howells wrote:
> Okay, I can reproduce it trivially.  Question is, do I need to copy the
> bio_vec array (or kvec array or folio_queue list) or can I rely on that being
> maintained till the end of the op?  (Obviously, I can't rely on the iov_iter
> struct itself being maintained).  I think I have to copy the contents, just in
> case.

The bio_vec array can't be freed while I/O is in progress.  Take a look
at the iov_iter_is_bvec case in bio_iov_iter_get_pages for how simple
ITER_BVEC handling can be.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* [PATCH] netfs: Fix kernel async DIO
  [not found] <669f22fc89e45dd4e56d75876dc8f2bf@3xo.fr>
  2025-01-01 18:00 ` Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline) nicolas.baranger
@ 2025-01-06 11:37 ` David Howells
  2025-01-06 12:07   ` nicolas.baranger
                      ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: David Howells @ 2025-01-06 11:37 UTC (permalink / raw)
  To: nicolas.baranger
  Cc: dhowells, Steve French, Christoph Hellwig, Jeff Layton, Christian Brauner, netfs, linux-cifs, linux-fsdevel, linux-kernel

Hi Nicolas,

Does the attached fix your problem?

David
---
netfs: Fix kernel async DIO

Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
is supplied with a bio_vec[] array.  Currently, because of the async flag,
this gets passed to netfs_extract_user_iter() which throws a warning and
fails because it only handles IOVEC and UBUF iterators.  This can be
triggered through a combination of cifs and a loopback blockdev with
something like:

	mount //my/cifs/share /foo
	dd if=/dev/zero of=/foo/m0 bs=4K count=1K
	losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
	echo hello >/dev/loop2046

This causes the following to appear in syslog:

	WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]

and the write to fail.

Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
causes async kernel DIO writes to be handled as userspace writes.  Note
that this change relies on the kernel caller maintaining the existence of
the bio_vec array (or kvec[] or folio_queue) until the op is complete.
Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <smfrench@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/direct_write.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index eded8afaa60b..42ce53cc216e 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
	 * allocate a sufficiently large bvec array and may shorten the
	 * request.
	 */
-	if (async || user_backed_iter(iter)) {
+	if (user_backed_iter(iter)) {
		n = netfs_extract_user_iter(iter, len, &wreq->buffer.iter, 0);
		if (n < 0) {
			ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
		wreq->direct_bv_count = n;
		wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
	} else {
+		/* If this is a kernel-generated async DIO request,
+		 * assume that any resources the iterator points to
+		 * (eg. a bio_vec array) will persist till the end of
+		 * the op.
+		 */
		wreq->buffer.iter = *iter;
	}
 }

^ permalink raw reply related	[flat|nested] 12+ messages in thread
* Re: [PATCH] netfs: Fix kernel async DIO
  2025-01-06 11:37 ` [PATCH] netfs: Fix kernel async DIO David Howells
@ 2025-01-06 12:07 ` nicolas.baranger
  2025-01-07  8:26   ` nicolas.baranger
  2025-01-07 14:49   ` David Howells
  2025-01-06 15:34 ` Christoph Hellwig
  2025-01-07 12:03 ` [PATCH] netfs: Fix kernel async DIO Paulo Alcantara
  2 siblings, 2 replies; 12+ messages in thread
From: nicolas.baranger @ 2025-01-06 12:07 UTC (permalink / raw)
  To: David Howells
  Cc: Steve French, Christoph Hellwig, Jeff Layton, Christian Brauner, netfs, linux-cifs, linux-fsdevel, linux-kernel

Hi David,

Thanks for the work! I will build Linux 6.10 and mainline with the provided change, and I will report back here as soon as I have test results (CET working time).

Thanks again for the help with this issue.
Nicolas

On 2025-01-06 12:37, David Howells wrote:

> Hi Nicolas,
> 
> Does the attached fix your problem?
> 
> David
> ---
> netfs: Fix kernel async DIO
> 
> Netfslib needs to be able to handle kernel-initiated asynchronous DIO that
> is supplied with a bio_vec[] array.  Currently, because of the async flag,
> this gets passed to netfs_extract_user_iter() which throws a warning and
> fails because it only handles IOVEC and UBUF iterators.  This can be
> triggered through a combination of cifs and a loopback blockdev with
> something like:
> 
> 	mount //my/cifs/share /foo
> 	dd if=/dev/zero of=/foo/m0 bs=4K count=1K
> 	losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0
> 	echo hello >/dev/loop2046
> 
> This causes the following to appear in syslog:
> 
> 	WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs]
> 
> and the write to fail.
> 
> Fix this by removing the check in netfs_unbuffered_write_iter_locked() that
> causes async kernel DIO writes to be handled as userspace writes.  Note
> that this change relies on the kernel caller maintaining the existence of
> the bio_vec array (or kvec[] or folio_queue) until the op is complete.
> 
> Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
> Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr>
> Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Steve French <smfrench@gmail.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: netfs@lists.linux.dev
> cc: linux-cifs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/netfs/direct_write.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
> index eded8afaa60b..42ce53cc216e 100644
> --- a/fs/netfs/direct_write.c
> +++ b/fs/netfs/direct_write.c
> @@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
> 	 * allocate a sufficiently large bvec array and may shorten the
> 	 * request.
> 	 */
> -	if (async || user_backed_iter(iter)) {
> +	if (user_backed_iter(iter)) {
> 		n = netfs_extract_user_iter(iter, len, &wreq->buffer.iter, 0);
> 		if (n < 0) {
> 			ret = n;
> @@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
> 		wreq->direct_bv_count = n;
> 		wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
> 	} else {
> +		/* If this is a kernel-generated async DIO request,
> +		 * assume that any resources the iterator points to
> +		 * (eg. a bio_vec array) will persist till the end of
> +		 * the op.
> +		 */
> 		wreq->buffer.iter = *iter;
> 	}
> }

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH] netfs: Fix kernel async DIO
  2025-01-06 12:07 ` nicolas.baranger
@ 2025-01-07  8:26 ` nicolas.baranger
  2025-01-07 14:49 ` David Howells
  1 sibling, 0 replies; 12+ messages in thread
From: nicolas.baranger @ 2025-01-07 8:26 UTC (permalink / raw)
  To: David Howells
  Cc: Steve French, Christoph Hellwig, Jeff Layton, Christian Brauner, netfs, linux-cifs, linux-fsdevel, linux-kernel

Hi David,

As your patch was written on top of linux-next, I had to make some small modifications to make it apply on mainline (6.13-rc6). The following patch works fine for me on mainline, but I think it would be better to wait for your confirmation / validation (or a new patch) before applying it in production.

#-------- PATCH --------#

diff --git a/linux-6.13-rc6/nba/_orig_fs.netfs.direct_write.c b/linux-6.13-rc6/fs/netfs/direct_write.c
index 88f2adf..94a1ee8 100644
--- a/linux-6.13-rc6/nba/_orig_fs.netfs.direct_write.c
+++ b/linux-6.13-rc6/fs/netfs/direct_write.c
@@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
	 * allocate a sufficiently large bvec array and may shorten the
	 * request.
	 */
-	if (async || user_backed_iter(iter)) {
+	if (user_backed_iter(iter)) {
		n = netfs_extract_user_iter(iter, len, &wreq->iter, 0);
		if (n < 0) {
			ret = n;
@@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
		wreq->direct_bv_count = n;
		wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
	} else {
+		/* If this is a kernel-generated async DIO request,
+		 * assume that any resources the iterator points to
+		 * (eg. a bio_vec array) will persist till the end of
+		 * the op.
+		 */
		wreq->iter = *iter;
	}

#-------- TESTS --------#

With this patch, Linux 6.13-rc6 builds with no errors and '--direct-io=on' works:

18:38:47 root@deb12-lab-10d:~# uname -a
Linux deb12-lab-10d.lab.lan 6.13.0-rc6-amd64 #0 SMP PREEMPT_DYNAMIC Mon Jan 6 18:14:07 CET 2025 x86_64 GNU/Linux

18:39:29 root@deb12-lab-10d:~# losetup
NAME          SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE                       DIO LOG-SEC
/dev/loop2046         0      0         0  0 /mnt/FBX24T/FS-LAN/bckcrypt2046   1    4096

18:39:32 root@deb12-lab-10d:~# dmsetup ls | grep bckcrypt
bckcrypt (254:7)

18:39:55 root@deb12-lab-10d:~# cryptsetup status bckcrypt
/dev/mapper/bckcrypt is active and is in use.
  type:         LUKS2
  cipher:       aes-xts-plain64
  keysize:      512 bits
  key location: keyring
  device:       /dev/loop2046
  loop:         /mnt/FBX24T/FS-LAN/bckcrypt2046
  sector size:  512
  offset:       32768 sectors
  size:         8589901824 sectors
  mode:         read/write

18:40:36 root@deb12-lab-10d:~# df -h | egrep 'cifs|bckcrypt'
//10.0.10.100/FBX24T  cifs   22T  13T  9,0T  60% /mnt/FBX24T
/dev/mapper/bckcrypt  btrfs 4,0T 3,3T  779G  82% /mnt/bckcrypt

09:08:44 root@deb12-lab-10d:~# LANG=en_US.UTF-8
09:08:46 root@deb12-lab-10d:~# dd if=/dev/zero of=/mnt/bckcrypt/test/test.dd bs=256M count=16 oflag=direct status=progress
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 14 s, 302 MB/s
16+0 records in
16+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 14.2061 s, 302 MB/s

No write errors using the '--direct-io=on' option of losetup with this patch, and writing to the back-file is more than 20x faster. It seems to be OK!

Let me know if something's wrong with this patch or whether it can safely be used in production. Again, thanks everyone for the help.

Nicolas

On 2025-01-06 13:07, nicolas.baranger@3xo.fr wrote:
> Hi David
> 
> Thanks for the work! I will build Linux 6.10 and mainline with the
> provided change, and I will report back here as soon as I have test
> results (CET working time).
> > Thanks again for the help with this issue > Nicolas > > On 2025-01-06 12:37, David Howells wrote: > >> Hi Nicolas, >> >> Does the attached fix your problem? >> >> David >> --- >> netfs: Fix kernel async DIO >> >> Netfslib needs to be able to handle kernel-initiated asynchronous DIO >> that >> is supplied with a bio_vec[] array. Currently, because of the async >> flag, >> this gets passed to netfs_extract_user_iter() which throws a warning >> and >> fails because it only handles IOVEC and UBUF iterators. This can be >> triggered through a combination of cifs and a loopback blockdev with >> something like: >> >> mount //my/cifs/share /foo >> dd if=/dev/zero of=/foo/m0 bs=4K count=1K >> losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0 >> echo hello >/dev/loop2046 >> >> This causes the following to appear in syslog: >> >> WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 >> netfs_extract_user_iter+0x170/0x250 [netfs] >> >> and the write to fail. >> >> Fix this by removing the check in netfs_unbuffered_write_iter_locked() >> that >> causes async kernel DIO writes to be handled as userspace writes. >> Note >> that this change relies on the kernel caller maintaining the existence >> of >> the bio_vec array (or kvec[] or folio_queue) until the op is complete. 
>> >> Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support") >> Reported by: Nicolas Baranger <nicolas.baranger@3xo.fr> >> Closes: >> https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/ >> Signed-off-by: David Howells <dhowells@redhat.com> >> cc: Steve French <smfrench@gmail.com> >> cc: Jeff Layton <jlayton@kernel.org> >> cc: netfs@lists.linux.dev >> cc: linux-cifs@vger.kernel.org >> cc: linux-fsdevel@vger.kernel.org >> --- >> fs/netfs/direct_write.c | 7 ++++++- >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c >> index eded8afaa60b..42ce53cc216e 100644 >> --- a/fs/netfs/direct_write.c >> +++ b/fs/netfs/direct_write.c >> @@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct >> kiocb *iocb, struct iov_iter * >> * allocate a sufficiently large bvec array and may shorten the >> * request. >> */ >> - if (async || user_backed_iter(iter)) { >> + if (user_backed_iter(iter)) { >> n = netfs_extract_user_iter(iter, len, &wreq->buffer.iter, 0); >> if (n < 0) { >> ret = n; >> @@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct >> kiocb *iocb, struct iov_iter * >> wreq->direct_bv_count = n; >> wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter); >> } else { >> + /* If this is a kernel-generated async DIO request, >> + * assume that any resources the iterator points to >> + * (eg. a bio_vec array) will persist till the end of >> + * the op. >> + */ >> wreq->buffer.iter = *iter; >> } >> } ^ permalink raw reply related [flat|nested] 12+ messages in thread
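The effect of the one-line change above can be sketched with a userspace toy model (plain Python, not kernel code; the class and function names are illustrative stand-ins for the netfs types): only user-backed iterators (IOVEC/UBUF) need their pages extracted and pinned, while a kernel-supplied bio_vec iterator can be used as-is because its caller guarantees the array outlives the operation.

```python
# Toy model of the iterator dispatch in netfs_unbuffered_write_iter_locked().
# "iovec"/"ubuf" mimic user-backed iterators; "bvec" mimics a kernel bio_vec.

class Iter:
    def __init__(self, kind, data):
        self.kind = kind
        self.data = data

def user_backed(it):
    return it.kind in ("iovec", "ubuf")

def extract_user_iter(it):
    # Models netfs_extract_user_iter(), which WARNs and fails on
    # anything that is not an IOVEC or UBUF iterator.
    if not user_backed(it):
        raise RuntimeError("WARNING: unsupported iterator type")
    return list(it.data)  # "pinned" copy of the user pages

def setup_write(it, async_req):
    # Pre-fix, the condition was `if async_req or user_backed(it)`, which
    # routed a kernel bvec iterator into extract_user_iter() and hit the
    # WARN above.  Post-fix, async_req no longer matters: dispatch is
    # purely on whether the iterator is user-backed.
    if user_backed(it):
        return {"pages": extract_user_iter(it), "pinned": True}
    # Kernel caller keeps the bio_vec alive until the op completes.
    return {"pages": it.data, "pinned": False}

kernel_dio = Iter("bvec", [b"page0", b"page1"])
print(setup_write(kernel_dio, async_req=True)["pinned"])   # prints False

user_write = Iter("iovec", [b"buf"])
print(setup_write(user_write, async_req=False)["pinned"])  # prints True
```

This is only a model of the control flow, not of page pinning itself; the real code also records `direct_bv_count` and `direct_bv_unpin` for the extracted case.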
* Re: [PATCH] netfs: Fix kernel async DIO 2025-01-06 12:07 ` nicolas.baranger 2025-01-07 8:26 ` nicolas.baranger @ 2025-01-07 14:49 ` David Howells 2025-01-07 18:08 ` Nicolas Baranger 1 sibling, 1 reply; 12+ messages in thread From: David Howells @ 2025-01-07 14:49 UTC (permalink / raw) To: nicolas.baranger Cc: dhowells, Steve French, Christoph Hellwig, Jeff Layton, Christian Brauner, netfs, linux-cifs, linux-fsdevel, linux-kernel Thanks! I ported the patch to linus/master (see below) and it looks pretty much the same as yours, give or take tabs getting converted to spaces. Could I put you down as a Tested-by? David --- netfs: Fix kernel async DIO Netfslib needs to be able to handle kernel-initiated asynchronous DIO that is supplied with a bio_vec[] array. Currently, because of the async flag, this gets passed to netfs_extract_user_iter() which throws a warning and fails because it only handles IOVEC and UBUF iterators. This can be triggered through a combination of cifs and a loopback blockdev with something like: mount //my/cifs/share /foo dd if=/dev/zero of=/foo/m0 bs=4K count=1K losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0 echo hello >/dev/loop2046 This causes the following to appear in syslog: WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs] and the write to fail. Fix this by removing the check in netfs_unbuffered_write_iter_locked() that causes async kernel DIO writes to be handled as userspace writes. Note that this change relies on the kernel caller maintaining the existence of the bio_vec array (or kvec[] or folio_queue) until the op is complete. 
Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support") Reported-by: Nicolas Baranger <nicolas.baranger@3xo.fr> Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/ Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <smfrench@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org --- fs/netfs/direct_write.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c index 173e8b5e6a93..f9421f3e6d37 100644 --- a/fs/netfs/direct_write.c +++ b/fs/netfs/direct_write.c @@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter * * allocate a sufficiently large bvec array and may shorten the * request. */ - if (async || user_backed_iter(iter)) { + if (user_backed_iter(iter)) { n = netfs_extract_user_iter(iter, len, &wreq->iter, 0); if (n < 0) { ret = n; @@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter * wreq->direct_bv_count = n; wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter); } else { + /* If this is a kernel-generated async DIO request, + * assume that any resources the iterator points to + * (eg. a bio_vec array) will persist till the end of + * the op. + */ wreq->iter = *iter; } ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] netfs: Fix kernel async DIO 2025-01-07 14:49 ` David Howells @ 2025-01-07 18:08 ` Nicolas Baranger 0 siblings, 0 replies; 12+ messages in thread From: Nicolas Baranger @ 2025-01-07 18:08 UTC (permalink / raw) To: David Howells Cc: Steve French, Christoph Hellwig, Jeff Layton, Christian Brauner, netfs, linux-cifs, linux-fsdevel, linux-kernel Hi David, Sure you can! Please also note that after building linux-next and applying the first patch you provided, I successfully tested the DIO write (same test process as before). It works fine too! I remain available for further testing. Thanks again for the help (special thanks to Christoph and David). Nicolas On 2025-01-07 15:49, David Howells wrote: > Thanks! > > I ported the patch to linus/master (see below) and it looks pretty much > the > same as yours, give or take tabs getting converted to spaces. > > Could I put you down as a Tested-by? > > David > > --- > netfs: Fix kernel async DIO > > Netfslib needs to be able to handle kernel-initiated asynchronous DIO > that > is supplied with a bio_vec[] array. Currently, because of the async > flag, > this gets passed to netfs_extract_user_iter() which throws a warning > and > fails because it only handles IOVEC and UBUF iterators. This can be > triggered through a combination of cifs and a loopback blockdev with > something like: > > mount //my/cifs/share /foo > dd if=/dev/zero of=/foo/m0 bs=4K count=1K > losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0 > echo hello >/dev/loop2046 > > This causes the following to appear in syslog: > > WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 > netfs_extract_user_iter+0x170/0x250 [netfs] > > and the write to fail. > > Fix this by removing the check in netfs_unbuffered_write_iter_locked() > that > causes async kernel DIO writes to be handled as userspace writes. Note > that this change relies on the kernel caller maintaining the existence > of > the bio_vec array (or kvec[] or folio_queue) until the op is complete. 
> > Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support") > Reported by: Nicolas Baranger <nicolas.baranger@3xo.fr> > Closes: > https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/ > Signed-off-by: David Howells <dhowells@redhat.com> > cc: Steve French <smfrench@gmail.com> > cc: Jeff Layton <jlayton@kernel.org> > cc: netfs@lists.linux.dev > cc: linux-cifs@vger.kernel.org > cc: linux-fsdevel@vger.kernel.org > --- > fs/netfs/direct_write.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c > index 173e8b5e6a93..f9421f3e6d37 100644 > --- a/fs/netfs/direct_write.c > +++ b/fs/netfs/direct_write.c > @@ -67,7 +67,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct > kiocb *iocb, struct iov_iter * > * allocate a sufficiently large bvec array and may shorten the > * request. > */ > - if (async || user_backed_iter(iter)) { > + if (user_backed_iter(iter)) { > n = netfs_extract_user_iter(iter, len, &wreq->iter, 0); > if (n < 0) { > ret = n; > @@ -77,6 +77,11 @@ ssize_t netfs_unbuffered_write_iter_locked(struct > kiocb *iocb, struct iov_iter * > wreq->direct_bv_count = n; > wreq->direct_bv_unpin = iov_iter_extract_will_pin(iter); > } else { > + /* If this is a kernel-generated async DIO request, > + * assume that any resources the iterator points to > + * (eg. a bio_vec array) will persist till the end of > + * the op. > + */ > wreq->iter = *iter; > } ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] netfs: Fix kernel async DIO 2025-01-06 11:37 ` [PATCH] netfs: Fix kernel async DIO David Howells 2025-01-06 12:07 ` nicolas.baranger @ 2025-01-06 15:34 ` Christoph Hellwig 2025-03-20 8:46 ` [Linux 6.14 - netfs/cifs] loop on file cat + file copy Nicolas Baranger 2025-01-07 12:03 ` [PATCH] netfs: Fix kernel async DIO Paulo Alcantara 2 siblings, 1 reply; 12+ messages in thread From: Christoph Hellwig @ 2025-01-06 15:34 UTC (permalink / raw) To: David Howells Cc: nicolas.baranger, Steve French, Christoph Hellwig, Jeff Layton, Christian Brauner, netfs, linux-cifs, linux-fsdevel, linux-kernel On Mon, Jan 06, 2025 at 11:37:24AM +0000, David Howells wrote: > mount //my/cifs/share /foo > dd if=/dev/zero of=/foo/m0 bs=4K count=1K > losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0 > echo hello >/dev/loop2046 Can you add a test case to xfstests using losetup --direct-io with a file on $TEST_DIR, so that we get coverage for ITER_BVEC direct I/O? ^ permalink raw reply [flat|nested] 12+ messages in thread
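The reproducer steps quoted above could be wrapped into a test script along the following lines. This is only a sketch, not an actual xfstests test (those use the fstests harness and its `_require_*` helpers): the share path and loop device are placeholders, and the commands are printed rather than executed unless RUN=1 is set, since the real steps need root and a mounted CIFS share.

```shell
#!/bin/sh
# Sketch of an xfstests-style reproducer for ITER_BVEC direct I/O through
# the loop driver on a CIFS-backed file.  All paths/devices are illustrative.
TEST_DIR=${TEST_DIR:-/mnt/cifs-test}     # would be $TEST_DIR in xfstests
BACKFILE="$TEST_DIR/loop-dio-backfile"
LOOPDEV=${LOOPDEV:-/dev/loop2046}        # placeholder loop device
PLAN=""

run() {
    # Dry-run by default: the real commands need root and a CIFS mount.
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        PLAN="$PLAN
would run: $*"
        echo "would run: $*"
    fi
}

run dd if=/dev/zero of="$BACKFILE" bs=4K count=1K
run losetup --sector-size 4096 --direct-io=on "$LOOPDEV" "$BACKFILE"
run sh -c "echo hello > $LOOPDEV"        # triggers kernel async DIO via loop
run losetup -d "$LOOPDEV"
```

The `echo hello > $LOOPDEV` step is the one that exercises the kernel-initiated bio_vec write path; without the fix it fails with the netfs_extract_user_iter warning.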
* [Linux 6.14 - netfs/cifs] loop on file cat + file copy 2025-01-06 15:34 ` Christoph Hellwig @ 2025-03-20 8:46 ` Nicolas Baranger 0 siblings, 0 replies; 12+ messages in thread From: Nicolas Baranger @ 2025-03-20 8:46 UTC (permalink / raw) To: Christoph Hellwig Cc: David Howells, Steve French, Jeff Layton, Christian Brauner, netfs, linux-cifs, linux-fsdevel, linux-kernel Hi Christoph, Sorry to contact you again, but last time you and David H. helped me a lot with 'kernel async DIO' / 'Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline)'. I don't know if this has already been reported, but after building Linux 6.14-rc1 I observed the following behaviour: the 'cat' command goes into a loop when I cat a file that resides on a CIFS share. The 'cp' command does the same: it copies the content of a file on the CIFS share and loops writing it to the destination. I tested with a file named 'toto' containing only the ASCII string 'toto'. When I started copying it from the CIFS share to a local filesystem, I had to Ctrl+C the copy of this 5-byte file after some time, because the destination file was consuming all the free space on the filesystem and contained billions of 'toto' lines. Here is an example with cat: The CIFS share is mounted as /mnt/fbx/FBX-24T. CIFS mount options: grep cifs /proc/mounts //10.0.10.100/FBX24T /mnt/fbx/FBX-24T cifs rw,nosuid,nodev,noexec,relatime,vers=3.1.1,cache=none,upcall_target=app,username=fbx,domain=HOMELAN,uid=0,noforceuid,gid=0,noforcegid,addr=10.0.10.100,file_mode=0666,dir_mode=0755,iocharset=utf8,soft,nounix,serverino,mapposix,mfsymlinks,reparse=nfs,nativesocket,symlink=mfsymlinks,rsize=65536,wsize=65536,bsize=16777216,retrans=1,echo_interval=60,actimeo=1,closetimeo=1 0 0 KERNEL: uname -a Linux 14RV-SERVER.14rv.lan 6.14.0.1-ast-rc2-amd64 #0 SMP PREEMPT_DYNAMIC Wed Feb 12 18:23:00 CET 2025 x86_64 GNU/Linux To be reproduced: echo toto >/mnt/fbx/FBX-24T/toto ls -l /mnt/fbx/FBX-24T/toto -rw-rw-rw- 1 root root 5 20 mars 
09:20 /mnt/fbx/FBX-24T/toto cat /mnt/fbx/FBX-24T/toto toto toto toto toto toto toto toto ^C strace cat /mnt/fbx/FBX-24T/toto execve("/usr/bin/cat", ["cat", "/mnt/fbx/FBX-24T/toto"], 0x7ffc39b41848 /* 19 vars */) = 0 [... roughly 40 lines of dynamic-linker library probing (ENOENT under /usr/local/cuda-12.6 and the standard search paths) and process setup elided; libc was eventually loaded from /lib/x86_64-linux-gnu/libc.so.6 ...] 
getrandom("\x19\x6b\x9e\x55\x7e\x09\x74\x5f", 8, GRND_NONBLOCK) = 8 brk(NULL) = 0x55755b1c1000 brk(0x55755b1e2000) = 0x55755b1e2000 openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=3048928, ...}, AT_EMPTY_PATH) = 0 mmap(NULL, 3048928, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f55f9000000 close(3) = 0 newfstatat(1, "", {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0), ...}, AT_EMPTY_PATH) = 0 openat(AT_FDCWD, "/mnt/fbx/FBX-24T/toto", O_RDONLY) = 3 newfstatat(3, "", {st_mode=S_IFREG|0666, st_size=5, ...}, AT_EMPTY_PATH) = 0 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 mmap(NULL, 16785408, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f55f7ffe000 read(3, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16777216) = 16711680 write(1, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16711680toto ) = 16711680 read(3, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16777216) = 16711680 write(1, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16711680toto ) = 16711680 read(3, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16777216) = 16711680 write(1, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16711680toto ) = 16711680 read(3, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16777216) = 16711680 write(1, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16711680toto ) = 16711680 read(3, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16777216) = 16711680 write(1, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16711680toto ) = 16711680 read(3, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16777216) = 16711680 write(1, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16711680toto ) = 16711680 read(3, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
16777216) = 16711680 write(1, "toto\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16711680toto ^Cstrace: Process 38427 detached <detached ...> Please let me know if this has already been fixed or reported, and whether you're able to reproduce the issue. Thanks for the help. Kind regards Nicolas Baranger ^ permalink raw reply [flat|nested] 12+ messages in thread
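The strace excerpt shows the core of the bug: `read()` on a 5-byte file keeps returning 16711680 bytes instead of 5 followed by 0 at EOF, so any POSIX-style read loop (such as cat's) never terminates. A small userspace illustration of that dependence on the 0-byte EOF return (plain Python, deliberately not the kernel code path; the buggy reader below just mimics the trace, with a smaller chunk size standing in for the 16711680-byte reads):

```python
import io

def cat(read, limit=10):
    # Minimal cat-style loop: POSIX readers stop only on a 0-byte read (EOF).
    out, iterations = b"", 0
    while iterations < limit:          # guard so the demo itself terminates
        chunk = read(16 * 1024 * 1024)
        if not chunk:
            return out, iterations     # correct EOF behaviour
        out += chunk
        iterations += 1
    return out, iterations             # hit the guard: would loop forever

# Correct file semantics: 5 bytes, then EOF.
data, n = cat(io.BytesIO(b"toto\n").read)
assert data == b"toto\n" and n == 1

# Buggy semantics seen in the strace: every read returns a full buffer again
# (the file content followed by NUL padding), never a 0-byte EOF.
def buggy_read(size):
    return b"toto\n" + b"\0" * (65536 - 5)

data, n = cat(buggy_read)
print(n)   # prints 10: the loop stopped only because of the demo's guard
```

Without the `limit` guard the second call never returns, which is exactly why the destination file grew until the filesystem filled up during the `cp`.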
* Re: [PATCH] netfs: Fix kernel async DIO 2025-01-06 11:37 ` [PATCH] netfs: Fix kernel async DIO David Howells 2025-01-06 12:07 ` nicolas.baranger 2025-01-06 15:34 ` Christoph Hellwig @ 2025-01-07 12:03 ` Paulo Alcantara 2 siblings, 0 replies; 12+ messages in thread From: Paulo Alcantara @ 2025-01-07 12:03 UTC (permalink / raw) To: David Howells, nicolas.baranger Cc: dhowells, Steve French, Christoph Hellwig, Jeff Layton, Christian Brauner, netfs, linux-cifs, linux-fsdevel, linux-kernel David Howells <dhowells@redhat.com> writes: > netfs: Fix kernel async DIO > > Netfslib needs to be able to handle kernel-initiated asynchronous DIO that > is supplied with a bio_vec[] array. Currently, because of the async flag, > this gets passed to netfs_extract_user_iter() which throws a warning and > fails because it only handles IOVEC and UBUF iterators. This can be > triggered through a combination of cifs and a loopback blockdev with > something like: > > mount //my/cifs/share /foo > dd if=/dev/zero of=/foo/m0 bs=4K count=1K > losetup --sector-size 4096 --direct-io=on /dev/loop2046 /foo/m0 > echo hello >/dev/loop2046 > > This causes the following to appear in syslog: > > WARNING: CPU: 2 PID: 109 at fs/netfs/iterator.c:50 netfs_extract_user_iter+0x170/0x250 [netfs] > > and the write to fail. > > Fix this by removing the check in netfs_unbuffered_write_iter_locked() that > causes async kernel DIO writes to be handled as userspace writes. Note > that this change relies on the kernel caller maintaining the existence of > the bio_vec array (or kvec[] or folio_queue) until the op is complete. 
> > Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support") > Reported by: Nicolas Baranger <nicolas.baranger@3xo.fr> > Closes: https://lore.kernel.org/r/fedd8a40d54b2969097ffa4507979858@3xo.fr/ > Signed-off-by: David Howells <dhowells@redhat.com> > cc: Steve French <smfrench@gmail.com> > cc: Jeff Layton <jlayton@kernel.org> > cc: netfs@lists.linux.dev > cc: linux-cifs@vger.kernel.org > cc: linux-fsdevel@vger.kernel.org > --- > fs/netfs/direct_write.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) LGTM. Feel free to add: Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Thanks Christoph and Dave! ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-03-20 8:56 UTC | newest]
Thread overview: 12+ messages
-- links below jump to the message on this page --
[not found] <669f22fc89e45dd4e56d75876dc8f2bf@3xo.fr>
2025-01-01 18:00 ` Losetup Direct I/O breaks BACK-FILE filesystem on CIFS share (Appears in Linux 6.10 and reproduced on mainline) nicolas.baranger
2025-01-06 7:20 ` Christoph Hellwig
2025-01-06 9:13 ` David Howells
2025-01-06 9:16 ` Christoph Hellwig
2025-01-06 11:37 ` [PATCH] netfs: Fix kernel async DIO David Howells
2025-01-06 12:07 ` nicolas.baranger
2025-01-07 8:26 ` nicolas.baranger
2025-01-07 14:49 ` David Howells
2025-01-07 18:08 ` Nicolas Baranger
2025-01-06 15:34 ` Christoph Hellwig
2025-03-20 8:46 ` [Linux 6.14 - netfs/cifs] loop on file cat + file copy Nicolas Baranger
2025-01-07 12:03 ` [PATCH] netfs: Fix kernel async DIO Paulo Alcantara