* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
[not found] <1107690.1750683895@warthog.procyon.org.uk>
@ 2025-06-24 12:25 ` Stefan Metzmacher
2025-06-25 14:18 ` Stefan Metzmacher
2025-06-25 16:00 ` David Howells
2025-06-24 14:22 ` David Howells
2025-06-25 12:47 ` David Howells
2 siblings, 2 replies; 10+ messages in thread
From: Stefan Metzmacher @ 2025-06-24 12:25 UTC (permalink / raw)
To: David Howells
Cc: linux-cifs@vger.kernel.org, netfs, linux-fsdevel, Steve French
Hi David,
this looks very useful! Just a few comments below...
> Collapse smbd_recv_buf() and smbd_recv_page() into smbd_recv() and just use
> copy_to_iter() instead of memcpy().
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Steve French <stfrench@microsoft.com>
> cc: Tom Talpey <tom@talpey.com>
> cc: Stefan Metzmacher <metze@samba.org>
> cc: Paulo Alcantara (Red Hat) <pc@manguebit.com>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: linux-cifs@vger.kernel.org
> cc: netfs@lists.linux.dev
> cc: linux-fsdevel@vger.kernel.org
> ---
> fs/smb/client/smbdirect.c | 116 +++++++---------------------------------------
> 1 file changed, 20 insertions(+), 96 deletions(-)
>
> diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c
> index 5ae847919da5..dc64c337aae0 100644
> --- a/fs/smb/client/smbdirect.c
> +++ b/fs/smb/client/smbdirect.c
> @@ -1747,35 +1747,39 @@ struct smbd_connection *smbd_get_connection(
> }
>
> /*
> - * Receive data from receive reassembly queue
> + * Receive data from the transport's receive reassembly queue
> * All the incoming data packets are placed in reassembly queue
> - * buf: the buffer to read data into
> + * iter: the buffer to read data into
> * size: the length of data to read
> * return value: actual data read
> - * Note: this implementation copies the data from reassebmly queue to receive
> + *
> + * Note: this implementation copies the data from reassembly queue to receive
> * buffers used by upper layer. This is not the optimal code path. A better way
> * to do it is to not have upper layer allocate its receive buffers but rather
> * borrow the buffer from reassembly queue, and return it after data is
> * consumed. But this will require more changes to upper layer code, and also
> * need to consider packet boundaries while they still being reassembled.
> */
> -static int smbd_recv_buf(struct smbd_connection *info, char *buf,
> - unsigned int size)
> +int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
> {
> struct smbdirect_socket *sc = &info->socket;
> struct smbd_response *response;
> struct smbdirect_data_transfer *data_transfer;
> + size_t size = msg->msg_iter.count;
I think this should be iov_iter_count()?
> int to_copy, to_read, data_read, offset;
> u32 data_length, remaining_data_length, data_offset;
> int rc;
>
> + if (WARN_ON_ONCE(iov_iter_rw(&msg->msg_iter) == WRITE))
> + return -EINVAL; /* It's a bug in upper layer to get there */
> +
> again:
> /*
> * No need to hold the reassembly queue lock all the time as we are
> * the only one reading from the front of the queue. The transport
> * may add more entries to the back of the queue at the same time
> */
> - log_read(INFO, "size=%d info->reassembly_data_length=%d\n", size,
> + log_read(INFO, "size=%zd info->reassembly_data_length=%d\n", size,
> info->reassembly_data_length);
> if (info->reassembly_data_length >= size) {
> int queue_length;
> @@ -1811,9 +1815,12 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
> * transport layer is added
> */
> if (response->first_segment && size == 4) {
> - unsigned int rfc1002_len =
> + unsigned int len =
> data_length + remaining_data_length;
> - *((__be32 *)buf) = cpu_to_be32(rfc1002_len);
> + __be32 rfc1002_len = cpu_to_be32(len);
> + if (copy_to_iter(&rfc1002_len, sizeof(rfc1002_len),
> + &msg->msg_iter) != sizeof(rfc1002_len))
> + return -EFAULT;
> data_read = 4;
> response->first_segment = false;
> log_read(INFO, "returning rfc1002 length %d\n",
> @@ -1822,10 +1829,9 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
> }
>
> to_copy = min_t(int, data_length - offset, to_read);
> - memcpy(
> - buf + data_read,
> - (char *)data_transfer + data_offset + offset,
> - to_copy);
> + if (copy_to_iter((char *)data_transfer + data_offset + offset,
> + to_copy, &msg->msg_iter) != to_copy)
> + return -EFAULT;
>
> /* move on to the next buffer? */
> if (to_copy == data_length - offset) {
> @@ -1870,6 +1876,8 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
> data_read, info->reassembly_data_length,
> info->first_entry_offset);
> read_rfc1002_done:
> + /* SMBDirect will read it all or nothing */
> + msg->msg_iter.count = 0;
And this iov_iter_truncate(0);
While I'm wondering why we had this at all.
It seems all callers of cifs_read_iter_from_socket()
don't care and the code path via sock_recvmsg() doesn't
truncate it just calls copy_to_iter() via this chain:
->inet_recvmsg->tcp_recvmsg->skb_copy_datagram_msg->skb_copy_datagram_iter
->simple_copy_to_iter->copy_to_iter()
I think the old code should have called
iov_iter_advance(rc) instead of msg->msg_iter.count = 0.
But the new code doesn't need it as copy_to_iter()
calls iterate_and_advance().
> return data_read;
> }
>
> @@ -1890,90 +1898,6 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
> goto again;
> }
>
> -/*
> - * Receive a page from receive reassembly queue
> - * page: the page to read data into
> - * to_read: the length of data to read
> - * return value: actual data read
> - */
> -static int smbd_recv_page(struct smbd_connection *info,
> - struct page *page, unsigned int page_offset,
> - unsigned int to_read)
> -{
> - struct smbdirect_socket *sc = &info->socket;
> - int ret;
> - char *to_address;
> - void *page_address;
> -
> - /* make sure we have the page ready for read */
> - ret = wait_event_interruptible(
> - info->wait_reassembly_queue,
> - info->reassembly_data_length >= to_read ||
> - sc->status != SMBDIRECT_SOCKET_CONNECTED);
> - if (ret)
> - return ret;
> -
> - /* now we can read from reassembly queue and not sleep */
> - page_address = kmap_atomic(page);
> - to_address = (char *) page_address + page_offset;
> -
> - log_read(INFO, "reading from page=%p address=%p to_read=%d\n",
> - page, to_address, to_read);
> -
> - ret = smbd_recv_buf(info, to_address, to_read);
> - kunmap_atomic(page_address);
> -
> - return ret;
> -}
> -
> -/*
> - * Receive data from transport
> - * msg: a msghdr point to the buffer, can be ITER_KVEC or ITER_BVEC
> - * return: total bytes read, or 0. SMB Direct will not do partial read.
> - */
> -int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
> -{
> - char *buf;
> - struct page *page;
> - unsigned int to_read, page_offset;
> - int rc;
> -
> - if (iov_iter_rw(&msg->msg_iter) == WRITE) {
> - /* It's a bug in upper layer to get there */
> - cifs_dbg(VFS, "Invalid msg iter dir %u\n",
> - iov_iter_rw(&msg->msg_iter));
> - rc = -EINVAL;
> - goto out;
> - }
> -
> - switch (iov_iter_type(&msg->msg_iter)) {
> - case ITER_KVEC:
> - buf = msg->msg_iter.kvec->iov_base;
> - to_read = msg->msg_iter.kvec->iov_len;
> - rc = smbd_recv_buf(info, buf, to_read);
> - break;
> -
> - case ITER_BVEC:
> - page = msg->msg_iter.bvec->bv_page;
> - page_offset = msg->msg_iter.bvec->bv_offset;
> - to_read = msg->msg_iter.bvec->bv_len;
> - rc = smbd_recv_page(info, page, page_offset, to_read);
> - break;
> -
> - default:
> - /* It's a bug in upper layer to get there */
> - cifs_dbg(VFS, "Invalid msg type %d\n",
> - iov_iter_type(&msg->msg_iter));
> - rc = -EINVAL;
> - }
I guess this is actually a real fix as I just saw
CIFS: VFS: Invalid msg type 4
in logs while running the cifs/001 test.
And 4 is ITER_FOLIOQ.
So there might be something broken when ITER_FOLIOQ was
introduced, but I wasn't able to find a specific commit.
Maybe it was also already broken when using
smb3 encryption over smbdirect, when ITER_XARRAY was still used.
metze
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
[not found] <1107690.1750683895@warthog.procyon.org.uk>
2025-06-24 12:25 ` [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter() Stefan Metzmacher
@ 2025-06-24 14:22 ` David Howells
2025-06-24 16:05 ` Stefan Metzmacher
2025-06-25 8:07 ` Stefan Metzmacher
2025-06-25 12:47 ` David Howells
2 siblings, 2 replies; 10+ messages in thread
From: David Howells @ 2025-06-24 14:22 UTC (permalink / raw)
To: Stefan Metzmacher
Cc: dhowells, linux-cifs@vger.kernel.org, netfs, linux-fsdevel,
Steve French
Stefan Metzmacher <metze@samba.org> wrote:
> > read_rfc1002_done:
> > + /* SMBDirect will read it all or nothing */
> > + msg->msg_iter.count = 0;
>
> And this iov_iter_truncate(0);
Actually, it should probably have been iov_iter_advance().
> While I'm wondering why we had this at all.
>
> It seems all callers of cifs_read_iter_from_socket()
> don't care and the code path via sock_recvmsg() doesn't
> truncate it just calls copy_to_iter() via this chain:
> ->inet_recvmsg->tcp_recvmsg->skb_copy_datagram_msg->skb_copy_datagram_iter
> ->simple_copy_to_iter->copy_to_iter()
>
> I think the old code should have called
> iov_iter_advance(rc) instead of msg->msg_iter.count = 0.
>
> But the new code doesn't need it as copy_to_iter()
> calls iterate_and_advance().
Yeah, it should. I seem to remember that there were situations in which it
didn't, but it's possible I managed to get rid of them.
> > - default:
> > - /* It's a bug in upper layer to get there */
> > - cifs_dbg(VFS, "Invalid msg type %d\n",
> > - iov_iter_type(&msg->msg_iter));
> > - rc = -EINVAL;
> > - }
>
> I guess this is actually a real fix as I just saw
> CIFS: VFS: Invalid msg type 4
> in logs while running the cifs/001 test.
> And 4 is ITER_FOLIOQ.
Ah... Were you using "-o seal"? The encrypted data is held in a buffer formed
from a folioq with a series of folios in it.
David
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
2025-06-24 14:22 ` David Howells
@ 2025-06-24 16:05 ` Stefan Metzmacher
2025-06-25 8:07 ` Stefan Metzmacher
1 sibling, 0 replies; 10+ messages in thread
From: Stefan Metzmacher @ 2025-06-24 16:05 UTC (permalink / raw)
To: David Howells
Cc: linux-cifs@vger.kernel.org, netfs, linux-fsdevel, Steve French
Hi David,
>>> read_rfc1002_done:
>>> + /* SMBDirect will read it all or nothing */
>>> + msg->msg_iter.count = 0;
>>
>> And this iov_iter_truncate(0);
>
> Actually, it should probably have been iov_iter_advance().
>
>> While I'm wondering why we had this at all.
>>
>> It seems all callers of cifs_read_iter_from_socket()
>> don't care and the code path via sock_recvmsg() doesn't
>> truncate it just calls copy_to_iter() via this chain:
>> ->inet_recvmsg->tcp_recvmsg->skb_copy_datagram_msg->skb_copy_datagram_iter
>> ->simple_copy_to_iter->copy_to_iter()
>>
>> I think the old code should have called
>> iov_iter_advance(rc) instead of msg->msg_iter.count = 0.
>>
>> But the new code doesn't need it as copy_to_iter()
>> calls iterate_and_advance().
>
> Yeah, it should. I seem to remember that there were situations in which it
> didn't, but it's possible I managed to get rid of them.
>
>>> - default:
>>> - /* It's a bug in upper layer to get there */
>>> - cifs_dbg(VFS, "Invalid msg type %d\n",
>>> - iov_iter_type(&msg->msg_iter));
>>> - rc = -EINVAL;
>>> - }
>>
>> I guess this is actually a real fix as I just saw
>> CIFS: VFS: Invalid msg type 4
>> in logs while running the cifs/001 test.
>> And 4 is ITER_FOLIOQ.
>
> Ah... Were you using "-o seal"? The encrypted data is held in a buffer formed
> from a folioq with a series of folios in it.
In local.config I have this:
[smb3-1-rdma]
FSTYP=cifs
TEST_DEV=//172.31.9.1/TEST
TEST_DIR=/mnt/test
TEST_FS_MOUNT_OPTS='-ousername=administrator,password=...,rdma,noperm,vers=3.0,mfsymlinks,actimeo=0'
export MOUNT_OPTIONS='-ousername=administrator,password=...,rdma,noperm,mfsymlinks,actimeo=0'
export SCRATCH_DEV=//172.31.9.1/SCRATCH
export SCRATCH_MNT=/mnt/scratch
And called:
./check -s smb3-1-rdma -E tests/cifs/exclude.incompatible-smb3.txt -E tests/cifs/exclude.very-slow.txt cifs/001
So I don't think it used seal, I'm also not seeing encrypted stuff in the capture,
so maybe the folioq iter comes from a higher layer (via cifs_readv_receive()) instead
of receive_encrypted_read().
metze
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
2025-06-24 14:22 ` David Howells
2025-06-24 16:05 ` Stefan Metzmacher
@ 2025-06-25 8:07 ` Stefan Metzmacher
2025-06-25 10:10 ` Stefan Metzmacher
2025-06-25 11:25 ` David Howells
1 sibling, 2 replies; 10+ messages in thread
From: Stefan Metzmacher @ 2025-06-25 8:07 UTC (permalink / raw)
To: David Howells
Cc: linux-cifs@vger.kernel.org, netfs, linux-fsdevel, Steve French
Am 24.06.25 um 16:22 schrieb David Howells:
> Stefan Metzmacher <metze@samba.org> wrote:
>
>>> read_rfc1002_done:
>>> + /* SMBDirect will read it all or nothing */
>>> + msg->msg_iter.count = 0;
>>
>> And this iov_iter_truncate(0);
>
> Actually, it should probably have been iov_iter_advance().
>
>> While I'm wondering why we had this at all.
>>
>> It seems all callers of cifs_read_iter_from_socket()
>> don't care and the code path via sock_recvmsg() doesn't
>> truncate it just calls copy_to_iter() via this chain:
>> ->inet_recvmsg->tcp_recvmsg->skb_copy_datagram_msg->skb_copy_datagram_iter
>> ->simple_copy_to_iter->copy_to_iter()
>>
>> I think the old code should have called
>> iov_iter_advance(rc) instead of msg->msg_iter.count = 0.
>>
>> But the new code doesn't need it as copy_to_iter()
>> calls iterate_and_advance().
>
> Yeah, it should. I seem to remember that there were situations in which it
> didn't, but it's possible I managed to get rid of them.
>
>>> - default:
>>> - /* It's a bug in upper layer to get there */
>>> - cifs_dbg(VFS, "Invalid msg type %d\n",
>>> - iov_iter_type(&msg->msg_iter));
>>> - rc = -EINVAL;
>>> - }
>>
>> I guess this is actually a real fix as I just saw
>> CIFS: VFS: Invalid msg type 4
>> in logs while running the cifs/001 test.
>> And 4 is ITER_FOLIOQ.
>
> Ah... Were you using "-o seal"? The encrypted data is held in a buffer formed
> from a folioq with a series of folios in it.
I know tested it standalone in this tree:
https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=46a31189b8b059b3595a9586714761e6e76ba7c4
Doing following mount:
mount -t cifs -ousername=administrator,password=...,rdma,noperm,vers=3.0,mfsymlinks,actimeo=0 //172.31.9.1/test /mnt/test/
It's using the siw driver (with modifications to work against the chelsio t404-bt card on windows) from
here:
https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=5b89ff89f440ec36cf2c5ed2212be0d8523a4c9b
But the siw difference should not really matter.
This realiable generates this:
[ 922.048997] [ T6639] CIFS: Attempting to mount //172.31.9.1/test
[ 922.188445] [ T6639] CIFS: VFS: RDMA transport established
[ 922.217974] [ T6642] usercopy: Kernel memory exposure attempt detected from SLUB object 'smbd_response_0000000091e24ea1' (offset 81, size 63)!
[ 922.218221] [ T6642] ------------[ cut here ]------------
[ 922.218230] [ T6642] kernel BUG at mm/usercopy.c:102!
[ 922.218299] [ T6642] Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
[ 922.218439] [ T6642] CPU: 1 UID: 0 PID: 6642 Comm: cifsd Kdump: loaded Tainted: G OE 6.16.0-rc3-metze.01+ #1 PREEMPT(voluntary)
[ 922.218585] [ T6642] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 922.218635] [ T6642] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 922.218704] [ T6642] RIP: 0010:usercopy_abort+0x6c/0x80
[ 922.218783] [ T6642] Code: fa 91 51 48 c7 c2 c0 d4 fa 91 41 52 48 c7 c7 40 d5 fa 91 48 0f 45 d6 48 c7 c6 00 d5 fa 91 48 89 c1 49 0f 45 f3 e8 84 aa 6b ff <0f> 0b 49 c7
c1 c0 d3 fa 91 4d 89 ca 4d 89 c8 eb a8 0f 1f 00 90 90
[ 922.218925] [ T6642] RSP: 0018:ffffc90001887820 EFLAGS: 00010246
[ 922.218983] [ T6642] RAX: 0000000000000079 RBX: 0000000000000051 RCX: 0000000000000000
[ 922.219046] [ T6642] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 922.219108] [ T6642] RBP: ffffc90001887838 R08: 0000000000000000 R09: 0000000000000000
[ 922.219201] [ T6642] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000003f
[ 922.219261] [ T6642] R13: ffff88801f579280 R14: 0000000000000001 R15: ffffea0000163340
[ 922.219323] [ T6642] FS: 0000000000000000(0000) GS:ffff8881466e8000(0000) knlGS:0000000000000000
[ 922.219415] [ T6642] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 922.219469] [ T6642] CR2: 000075a216d19bb8 CR3: 000000000f5f6004 CR4: 00000000000726f0
[ 922.219560] [ T6642] Call Trace:
[ 922.219591] [ T6642] <TASK>
[ 922.219624] [ T6642] __check_heap_object+0xe3/0x120
[ 922.221090] [ T6642] __check_object_size+0x4dc/0x6d0
[ 922.222547] [ T6642] smbd_recv+0x77f/0xfe0 [cifs]
[ 922.224416] [ T6642] ? __pfx_smbd_recv+0x10/0x10 [cifs]
[ 922.226195] [ T6642] ? __kasan_check_write+0x14/0x30
[ 922.227722] [ T6642] ? _raw_spin_lock+0x81/0xf0
[ 922.229190] [ T6642] ? __pfx__raw_spin_lock+0x10/0x10
[ 922.230699] [ T6642] ? sched_clock_noinstr+0x9/0x10
[ 922.232248] [ T6642] cifs_readv_from_socket+0x276/0x8f0 [cifs]
[ 922.234149] [ T6642] ? __pfx_cifs_readv_from_socket+0x10/0x10 [cifs]
[ 922.236222] [ T6642] ? mempool_alloc_slab+0x15/0x20
[ 922.237705] [ T6642] cifs_read_from_socket+0xcd/0x120 [cifs]
[ 922.239559] [ T6642] ? __pfx_cifs_read_from_socket+0x10/0x10 [cifs]
[ 922.241403] [ T6642] ? __pfx_mempool_alloc_noprof+0x10/0x10
[ 922.242827] [ T6642] ? __kasan_check_write+0x14/0x30
[ 922.244141] [ T6642] ? cifs_small_buf_get+0x62/0x90 [cifs]
[ 922.245500] [ T6642] ? allocate_buffers+0x216/0x390 [cifs]
[ 922.246810] [ T6642] cifs_demultiplex_thread+0x7e9/0x2d50 [cifs]
[ 922.248150] [ T6642] ? _raw_spin_lock_irqsave+0x95/0x100
[ 922.249143] [ T6642] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
[ 922.250163] [ T6642] ? __pfx___schedule+0x10/0x10
[ 922.250977] [ T6642] ? _raw_spin_lock_irqsave+0x95/0x100
[ 922.251715] [ T6642] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 922.252415] [ T6642] ? __pfx_try_to_wake_up+0x10/0x10
[ 922.253094] [ T6642] ? __kasan_check_read+0x11/0x20
[ 922.253766] [ T6642] ? __kthread_parkme+0xa0/0x190
[ 922.254344] [ T6642] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
[ 922.255073] [ T6642] kthread+0x396/0x830
[ 922.255584] [ T6642] ? __pfx__raw_spin_lock_irq+0x10/0x10
[ 922.256070] [ T6642] ? __pfx_kthread+0x10/0x10
[ 922.256568] [ T6642] ? __kasan_check_write+0x14/0x30
[ 922.257047] [ T6642] ? recalc_sigpending+0x180/0x210
[ 922.257535] [ T6642] ? _raw_spin_unlock_irq+0xe/0x50
[ 922.258015] [ T6642] ? calculate_sigpending+0x84/0xb0
[ 922.258509] [ T6642] ? __pfx_kthread+0x10/0x10
[ 922.258976] [ T6642] ret_from_fork+0x2b8/0x3b0
[ 922.259377] [ T6642] ? __pfx_kthread+0x10/0x10
[ 922.259757] [ T6642] ret_from_fork_asm+0x1a/0x30
[ 922.260133] [ T6642] </TASK>
[ 922.260514] [ T6642] Modules linked in: cifs(OE) ccm cmac nls_utf8 cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm cifs_md4 netfs siw(OE) ib_uverbs ib_core softdog vboxsf
vboxguest intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_class intel_pmc_ssram_telemetry intel_vsec polyval_clmulni
ghash_clmulni_intel sha1_ssse3 aesni_intel rapl i2c_piix4 i2c_smbus input_leds joydev mac_hid sunrpc binfmt_misc kvm_intel kvm irqbypass sch_fq_codel efi_pstore nfnetlink
vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci dmi_sysfs ip_tables x_tables autofs4 hid_generic vboxvideo drm_vram_helper usbhid
drm_ttm_helper vga16fb hid vgastate ahci ttm libahci video pata_acpi psmouse serio_raw wmi [last unloaded: cifs(OE)]
Reverting it fixes it again.
metze
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
2025-06-25 8:07 ` Stefan Metzmacher
@ 2025-06-25 10:10 ` Stefan Metzmacher
2025-06-25 11:25 ` David Howells
1 sibling, 0 replies; 10+ messages in thread
From: Stefan Metzmacher @ 2025-06-25 10:10 UTC (permalink / raw)
To: David Howells
Cc: linux-cifs@vger.kernel.org, netfs, linux-fsdevel, Steve French
Am 25.06.25 um 10:07 schrieb Stefan Metzmacher:
> Am 24.06.25 um 16:22 schrieb David Howells:
>> Stefan Metzmacher <metze@samba.org> wrote:
>>
>>>> read_rfc1002_done:
>>>> + /* SMBDirect will read it all or nothing */
>>>> + msg->msg_iter.count = 0;
>>>
>>> And this iov_iter_truncate(0);
>>
>> Actually, it should probably have been iov_iter_advance().
>>
>>> While I'm wondering why we had this at all.
>>>
>>> It seems all callers of cifs_read_iter_from_socket()
>>> don't care and the code path via sock_recvmsg() doesn't
>>> truncate it just calls copy_to_iter() via this chain:
>>> ->inet_recvmsg->tcp_recvmsg->skb_copy_datagram_msg->skb_copy_datagram_iter
>>> ->simple_copy_to_iter->copy_to_iter()
>>>
>>> I think the old code should have called
>>> iov_iter_advance(rc) instead of msg->msg_iter.count = 0.
>>>
>>> But the new code doesn't need it as copy_to_iter()
>>> calls iterate_and_advance().
>>
>> Yeah, it should. I seem to remember that there were situations in which it
>> didn't, but it's possible I managed to get rid of them.
>>
>>>> - default:
>>>> - /* It's a bug in upper layer to get there */
>>>> - cifs_dbg(VFS, "Invalid msg type %d\n",
>>>> - iov_iter_type(&msg->msg_iter));
>>>> - rc = -EINVAL;
>>>> - }
>>>
>>> I guess this is actually a real fix as I just saw
>>> CIFS: VFS: Invalid msg type 4
>>> in logs while running the cifs/001 test.
>>> And 4 is ITER_FOLIOQ.
>>
>> Ah... Were you using "-o seal"? The encrypted data is held in a buffer formed
>> from a folioq with a series of folios in it.
>
> I know tested it standalone in this tree:
> https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=46a31189b8b059b3595a9586714761e6e76ba7c4
It also happens with this:
https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=442dcd18dc1bf8d1e39f53d20810ca0a4958d139
Which contains your netfs fixes...
> Doing following mount:
>
> mount -t cifs -ousername=administrator,password=...,rdma,noperm,vers=3.0,mfsymlinks,actimeo=0 //172.31.9.1/test /mnt/test/
>
> It's using the siw driver (with modifications to work against the chelsio t404-bt card on windows) from
> here:
> https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=5b89ff89f440ec36cf2c5ed2212be0d8523a4c9b
>
> But the siw difference should not really matter.
>
> This realiable generates this:
>
> [ 922.048997] [ T6639] CIFS: Attempting to mount //172.31.9.1/test
> [ 922.188445] [ T6639] CIFS: VFS: RDMA transport established
> [ 922.217974] [ T6642] usercopy: Kernel memory exposure attempt detected from SLUB object 'smbd_response_0000000091e24ea1' (offset 81, size 63)!
> [ 922.218221] [ T6642] ------------[ cut here ]------------
> [ 922.218230] [ T6642] kernel BUG at mm/usercopy.c:102!
> [ 922.218299] [ T6642] Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
> [ 922.218439] [ T6642] CPU: 1 UID: 0 PID: 6642 Comm: cifsd Kdump: loaded Tainted: G OE 6.16.0-rc3-metze.01+ #1 PREEMPT(voluntary)
> [ 922.218585] [ T6642] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [ 922.218635] [ T6642] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> [ 922.218704] [ T6642] RIP: 0010:usercopy_abort+0x6c/0x80
> [ 922.218783] [ T6642] Code: fa 91 51 48 c7 c2 c0 d4 fa 91 41 52 48 c7 c7 40 d5 fa 91 48 0f 45 d6 48 c7 c6 00 d5 fa 91 48 89 c1 49 0f 45 f3 e8 84 aa 6b ff <0f> 0b 49 c7
> c1 c0 d3 fa 91 4d 89 ca 4d 89 c8 eb a8 0f 1f 00 90 90
> [ 922.218925] [ T6642] RSP: 0018:ffffc90001887820 EFLAGS: 00010246
> [ 922.218983] [ T6642] RAX: 0000000000000079 RBX: 0000000000000051 RCX: 0000000000000000
> [ 922.219046] [ T6642] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [ 922.219108] [ T6642] RBP: ffffc90001887838 R08: 0000000000000000 R09: 0000000000000000
> [ 922.219201] [ T6642] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000003f
> [ 922.219261] [ T6642] R13: ffff88801f579280 R14: 0000000000000001 R15: ffffea0000163340
> [ 922.219323] [ T6642] FS: 0000000000000000(0000) GS:ffff8881466e8000(0000) knlGS:0000000000000000
> [ 922.219415] [ T6642] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 922.219469] [ T6642] CR2: 000075a216d19bb8 CR3: 000000000f5f6004 CR4: 00000000000726f0
> [ 922.219560] [ T6642] Call Trace:
> [ 922.219591] [ T6642] <TASK>
> [ 922.219624] [ T6642] __check_heap_object+0xe3/0x120
> [ 922.221090] [ T6642] __check_object_size+0x4dc/0x6d0
> [ 922.222547] [ T6642] smbd_recv+0x77f/0xfe0 [cifs]
> [ 922.224416] [ T6642] ? __pfx_smbd_recv+0x10/0x10 [cifs]
> [ 922.226195] [ T6642] ? __kasan_check_write+0x14/0x30
> [ 922.227722] [ T6642] ? _raw_spin_lock+0x81/0xf0
> [ 922.229190] [ T6642] ? __pfx__raw_spin_lock+0x10/0x10
> [ 922.230699] [ T6642] ? sched_clock_noinstr+0x9/0x10
> [ 922.232248] [ T6642] cifs_readv_from_socket+0x276/0x8f0 [cifs]
> [ 922.234149] [ T6642] ? __pfx_cifs_readv_from_socket+0x10/0x10 [cifs]
> [ 922.236222] [ T6642] ? mempool_alloc_slab+0x15/0x20
> [ 922.237705] [ T6642] cifs_read_from_socket+0xcd/0x120 [cifs]
> [ 922.239559] [ T6642] ? __pfx_cifs_read_from_socket+0x10/0x10 [cifs]
> [ 922.241403] [ T6642] ? __pfx_mempool_alloc_noprof+0x10/0x10
> [ 922.242827] [ T6642] ? __kasan_check_write+0x14/0x30
> [ 922.244141] [ T6642] ? cifs_small_buf_get+0x62/0x90 [cifs]
> [ 922.245500] [ T6642] ? allocate_buffers+0x216/0x390 [cifs]
> [ 922.246810] [ T6642] cifs_demultiplex_thread+0x7e9/0x2d50 [cifs]
> [ 922.248150] [ T6642] ? _raw_spin_lock_irqsave+0x95/0x100
> [ 922.249143] [ T6642] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
> [ 922.250163] [ T6642] ? __pfx___schedule+0x10/0x10
> [ 922.250977] [ T6642] ? _raw_spin_lock_irqsave+0x95/0x100
> [ 922.251715] [ T6642] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> [ 922.252415] [ T6642] ? __pfx_try_to_wake_up+0x10/0x10
> [ 922.253094] [ T6642] ? __kasan_check_read+0x11/0x20
> [ 922.253766] [ T6642] ? __kthread_parkme+0xa0/0x190
> [ 922.254344] [ T6642] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
> [ 922.255073] [ T6642] kthread+0x396/0x830
> [ 922.255584] [ T6642] ? __pfx__raw_spin_lock_irq+0x10/0x10
> [ 922.256070] [ T6642] ? __pfx_kthread+0x10/0x10
> [ 922.256568] [ T6642] ? __kasan_check_write+0x14/0x30
> [ 922.257047] [ T6642] ? recalc_sigpending+0x180/0x210
> [ 922.257535] [ T6642] ? _raw_spin_unlock_irq+0xe/0x50
> [ 922.258015] [ T6642] ? calculate_sigpending+0x84/0xb0
> [ 922.258509] [ T6642] ? __pfx_kthread+0x10/0x10
> [ 922.258976] [ T6642] ret_from_fork+0x2b8/0x3b0
> [ 922.259377] [ T6642] ? __pfx_kthread+0x10/0x10
> [ 922.259757] [ T6642] ret_from_fork_asm+0x1a/0x30
> [ 922.260133] [ T6642] </TASK>
> [ 922.260514] [ T6642] Modules linked in: cifs(OE) ccm cmac nls_utf8 cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm cifs_md4 netfs siw(OE) ib_uverbs ib_core softdog vboxsf
> vboxguest intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_class intel_pmc_ssram_telemetry intel_vsec polyval_clmulni
> ghash_clmulni_intel sha1_ssse3 aesni_intel rapl i2c_piix4 i2c_smbus input_leds joydev mac_hid sunrpc binfmt_misc kvm_intel kvm irqbypass sch_fq_codel efi_pstore nfnetlink
> vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci dmi_sysfs ip_tables x_tables autofs4 hid_generic vboxvideo drm_vram_helper usbhid
> drm_ttm_helper vga16fb hid vgastate ahci ttm libahci video pata_acpi psmouse serio_raw wmi [last unloaded: cifs(OE)]
>
>
> Reverting it fixes it again.
>
> metze
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
2025-06-25 8:07 ` Stefan Metzmacher
2025-06-25 10:10 ` Stefan Metzmacher
@ 2025-06-25 11:25 ` David Howells
2025-06-25 11:51 ` Stefan Metzmacher
1 sibling, 1 reply; 10+ messages in thread
From: David Howells @ 2025-06-25 11:25 UTC (permalink / raw)
To: Stefan Metzmacher
Cc: dhowells, linux-cifs@vger.kernel.org, netfs, linux-fsdevel,
Steve French
Stefan Metzmacher <metze@samba.org> wrote:
> > [ 922.218230] [ T6642] kernel BUG at mm/usercopy.c:102!
Ah, I don't have that config option enabled. With it, I can reproduce that.
David
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
2025-06-25 11:25 ` David Howells
@ 2025-06-25 11:51 ` Stefan Metzmacher
0 siblings, 0 replies; 10+ messages in thread
From: Stefan Metzmacher @ 2025-06-25 11:51 UTC (permalink / raw)
To: David Howells
Cc: linux-cifs@vger.kernel.org, netfs, linux-fsdevel, Steve French
Am 25.06.25 um 13:25 schrieb David Howells:
> Stefan Metzmacher <metze@samba.org> wrote:
>
>>> [ 922.218230] [ T6642] kernel BUG at mm/usercopy.c:102!
>
> Ah, I don't have that config option enabled. With it, I can reproduce that.
Ah, allocate_caches_and_workqueue() needs to use kmem_cache_create_usercopy/KMEM_CACHE_USERCOPY...
I was already using that in my old wip smbdirect code.
metze
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
[not found] <1107690.1750683895@warthog.procyon.org.uk>
2025-06-24 12:25 ` [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter() Stefan Metzmacher
2025-06-24 14:22 ` David Howells
@ 2025-06-25 12:47 ` David Howells
2 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2025-06-25 12:47 UTC (permalink / raw)
Cc: dhowells, Stefan Metzmacher, linux-cifs@vger.kernel.org, netfs,
linux-fsdevel, Steve French
David Howells <dhowells@redhat.com> wrote:
> > And 4 is ITER_FOLIOQ.
I dumped some of the fields from the MID involved:
CIFS: VFS: Invalid msg type 4 (mid=a4 optype=0 command=8)
CIFS: VFS: - rcv=cifs_readv_receive+0x0/0x270 cb=smb2_readv_callback+0x0/0x480 hand=smb3_handle_read_data+0x0/0x40
So the ITER_FOLIOQ is from netfslib. I've attached corresponding trace log,
edited down a bit to remove some columns. Note that the EINVAL error gets
discarded by cifs_demultiplex_thread() and replaced with EAGAIN by netfslib.
David
---
diff-6828: netfs_rreq_ref: R=0000000c NEW r=2
diff-6828: netfs_read: R=0000000c READPAGE c=00000000 ni=1d4072 s=0 l=1000 sz=400
diff-6828: netfs_rreq_ref: R=0000000c GET SUBREQ r=3
diff-6828: smb3_rw_credits: R=0000000c[1] rd-submit cred=16 chg=0 pool=1688 ifl=1
diff-6828: netfs_sreq: R=0000000c[1] DOWN PREP f=000 s=0 0/400 s=0 e=0
diff-6828: smb3_rw_credits: R=0000000c[1] rd-issu-adj cred=16 chg=-15 pool=1688 ifl=1
diff-6828: netfs_sreq: R=0000000c[1] DOWN SUBMT f=102 s=0 0/400 s=0 e=0
diff-6828: netfs_rreq_ref: R=0000000c GET SUBREQ r=4
diff-6828: netfs_sreq: R=0000000c[2] ZERO SUBMT f=000 s=400 0/c00 s=0 e=0
diff-6828: netfs_sreq: R=0000000c[2] ZERO TERM f=102 s=400 c00/c00 s=1 e=0
diff-6828: netfs_collect_stream: R=0000000c[0:] cto=0 frn=0
diff-6828: netfs_rreq: R=0000000c RP WAIT-IP f=03
cifsd-6506: netfs_sreq: R=0000000c[1] DOWN I-RTR f=102 s=0 0/400 s=0 e=0
cifsd-6506: smb3_read_err: R=0000000c[1] xid=200 sid=0x8 tid=0x2 fid=0xa0952 offset=0x0 len=0x400 rc=-11
cifsd-6506: smb3_rw_credits: R=0000000c[1] rd-resp-clr cred=1 chg=0 pool=1703 ifl=1
cifsd-6506: netfs_sreq: R=0000000c[1] DOWN I-OK f=302 s=0 0/400 s=0 e=-11
cifsd-6506: netfs_failure: R=0000000c[1] DOWN f=302 s=0 0/400 read e=-11
cifsd-6506: netfs_rreq: R=0000000c RP PAUSE f=03
cifsd-6506: netfs_sreq: R=0000000c[1] DOWN TERM f=702 s=0 0/400 s=0 e=-11
cifsd-6506: netfs_rreq: R=0000000c RP WAKE-Q f=07
cifsd-6506: smb3_rw_credits: R=0000000c[1] rd-resp-add cred=0 chg=0 pool=1703 ifl=1
diff-6828: netfs_collect_stream: R=0000000c[0:] cto=0 frn=0
diff-6828: netfs_rreq: R=0000000c RP COLLECT f=07
diff-6828: netfs_collect: R=0000000c s=0-1000
diff-6828: netfs_collect_sreq: R=0000000c[0:01] s=0 t=0/400
diff-6828: netfs_rreq: R=0000000c RP S-ABNDN f=07
diff-6828: netfs_sreq: R=0000000c[1] DOWN ABNDN f=602 s=0 400/400 s=0 e=-11
diff-6828: netfs_sreq: R=0000000c[1] DOWN FREE f=602 s=0 400/400 s=0 e=-11
diff-6828: netfs_rreq_ref: R=0000000c PUT SUBREQ r=3
diff-6828: netfs_collect_sreq: R=0000000c[0:02] s=400 t=c00/c00
diff-6828: netfs_sreq: R=0000000c[2] ZERO ABNDN f=002 s=400 c00/c00 s=1 e=0
diff-6828: netfs_sreq: R=0000000c[2] ZERO FREE f=002 s=400 c00/c00 s=1 e=0
diff-6828: netfs_rreq_ref: R=0000000c PUT SUBREQ r=2
diff-6828: netfs_collect_stream: R=0000000c[0:] cto=1000 frn=ffffffff
diff-6828: netfs_collect_state: R=0000000c col=1000 cln=1000 n=8c
diff-6828: netfs_rreq: R=0000000c RP UNPAUSE f=0b
diff-6828: netfs_collect_stream: R=0000000c[0:] cto=1000 frn=ffffffff
diff-6828: netfs_collect_state: R=0000000c col=1000 cln=1000 n=8
diff-6828: netfs_rreq: R=0000000c RP COMPLET f=0b
diff-6828: netfs_rreq: R=0000000c RP WAKE-IP f=0a
diff-6828: netfs_rreq: R=0000000c RP DONE f=0a
diff-6828: netfs_rreq_ref: R=0000000c PUT WORK IP r=1
diff-6828: netfs_rreq: R=0000000c RP DONE-IP f=0a
diff-6828: netfs_rreq_ref: R=0000000c PUT RETURN r=0
kworker/u16:0-12 : netfs_rreq: R=0000000c RP FREE f=0a
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
2025-06-24 12:25 ` [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter() Stefan Metzmacher
@ 2025-06-25 14:18 ` Stefan Metzmacher
2025-06-25 16:00 ` David Howells
1 sibling, 0 replies; 10+ messages in thread
From: Stefan Metzmacher @ 2025-06-25 14:18 UTC (permalink / raw)
To: David Howells
Cc: linux-cifs@vger.kernel.org, netfs, linux-fsdevel, Steve French
Am 24.06.25 um 14:25 schrieb Stefan Metzmacher:
> Hi David,
>
> this looks very useful! Just a few comments below...
>
>> Collapse smbd_recv_buf() and smbd_recv_page() into smbd_recv() and just use
>> copy_to_iter() instead of memcpy().
>>
>> Signed-off-by: David Howells <dhowells@redhat.com>
>> cc: Steve French <stfrench@microsoft.com>
>> cc: Tom Talpey <tom@talpey.com>
>> cc: Stefan Metzmacher <metze@samba.org>
>> cc: Paulo Alcantara (Red Hat) <pc@manguebit.com>
>> cc: Matthew Wilcox <willy@infradead.org>
>> cc: linux-cifs@vger.kernel.org
>> cc: netfs@lists.linux.dev
>> cc: linux-fsdevel@vger.kernel.org
>> ---
>> fs/smb/client/smbdirect.c | 116 +++++++---------------------------------------
>> 1 file changed, 20 insertions(+), 96 deletions(-)
>>
>> diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c
>> index 5ae847919da5..dc64c337aae0 100644
>> --- a/fs/smb/client/smbdirect.c
>> +++ b/fs/smb/client/smbdirect.c
>> @@ -1747,35 +1747,39 @@ struct smbd_connection *smbd_get_connection(
>> }
>> /*
>> - * Receive data from receive reassembly queue
>> + * Receive data from the transport's receive reassembly queue
>> * All the incoming data packets are placed in reassembly queue
>> - * buf: the buffer to read data into
>> + * iter: the buffer to read data into
>> * size: the length of data to read
>> * return value: actual data read
>> - * Note: this implementation copies the data from reassebmly queue to receive
>> + *
>> + * Note: this implementation copies the data from reassembly queue to receive
>> * buffers used by upper layer. This is not the optimal code path. A better way
>> * to do it is to not have upper layer allocate its receive buffers but rather
>> * borrow the buffer from reassembly queue, and return it after data is
>> * consumed. But this will require more changes to upper layer code, and also
>> * need to consider packet boundaries while they still being reassembled.
>> */
>> -static int smbd_recv_buf(struct smbd_connection *info, char *buf,
>> - unsigned int size)
>> +int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
>> {
>> struct smbdirect_socket *sc = &info->socket;
>> struct smbd_response *response;
>> struct smbdirect_data_transfer *data_transfer;
>> + size_t size = msg->msg_iter.count;
>
> I think this should be iov_iter_count()?
>
>> int to_copy, to_read, data_read, offset;
>> u32 data_length, remaining_data_length, data_offset;
>> int rc;
>> + if (WARN_ON_ONCE(iov_iter_rw(&msg->msg_iter) == WRITE))
>> + return -EINVAL; /* It's a bug in upper layer to get there */
>> +
>> again:
>> /*
>> * No need to hold the reassembly queue lock all the time as we are
>> * the only one reading from the front of the queue. The transport
>> * may add more entries to the back of the queue at the same time
>> */
>> - log_read(INFO, "size=%d info->reassembly_data_length=%d\n", size,
>> + log_read(INFO, "size=%zd info->reassembly_data_length=%d\n", size,
>> info->reassembly_data_length);
>> if (info->reassembly_data_length >= size) {
>> int queue_length;
>> @@ -1811,9 +1815,12 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
>> * transport layer is added
>> */
>> if (response->first_segment && size == 4) {
>> - unsigned int rfc1002_len =
>> + unsigned int len =
Please keep the rfc1002_len variable as it's used in the log_read message below
and it should by host byteorder.
I'd propose a diff like this:
@@ -1846,8 +1850,11 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
if (response->first_segment && size == 4) {
unsigned int rfc1002_len =
data_length + remaining_data_length;
- *((__be32 *)buf) = cpu_to_be32(rfc1002_len);
+ __be32 rfc1002_hdr = cpu_to_be32(rfc1002_len);
data_read = 4;
+ if (copy_to_iter(&rfc1002_hdr, sizeof(rfc1002_hdr),
+ &msg->msg_iter) != data_read)
+ return -EFAULT;
response->first_segment = false;
log_read(INFO, "returning rfc1002 length %d\n",
rfc1002_len);
>> data_length + remaining_data_length;
>> - *((__be32 *)buf) = cpu_to_be32(rfc1002_len);
>> + __be32 rfc1002_len = cpu_to_be32(len);
>> + if (copy_to_iter(&rfc1002_len, sizeof(rfc1002_len),
>> + &msg->msg_iter) != sizeof(rfc1002_len))
>> + return -EFAULT;
>> data_read = 4;
>> response->first_segment = false;
>> log_read(INFO, "returning rfc1002 length %d\n",
>> @@ -1822,10 +1829,9 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
>> }
>> to_copy = min_t(int, data_length - offset, to_read);
>> - memcpy(
>> - buf + data_read,
>> - (char *)data_transfer + data_offset + offset,
>> - to_copy);
>> + if (copy_to_iter((char *)data_transfer + data_offset + offset,
>> + to_copy, &msg->msg_iter) != to_copy)
>> + return -EFAULT;
>> /* move on to the next buffer? */
>> if (to_copy == data_length - offset) {
>> @@ -1870,6 +1876,8 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
>> data_read, info->reassembly_data_length,
>> info->first_entry_offset);
>> read_rfc1002_done:
>> + /* SMBDirect will read it all or nothing */
>> + msg->msg_iter.count = 0;
>
> And this iov_iter_truncate(0);
>
> While I'm wondering why we had this at all.
>
> It seems all callers of cifs_read_iter_from_socket()
> don't care and the code path via sock_recvmsg() doesn't
> truncate it just calls copy_to_iter() via this chain:
> ->inet_recvmsg->tcp_recvmsg->skb_copy_datagram_msg->skb_copy_datagram_iter
> ->simple_copy_to_iter->copy_to_iter()
>
> I think the old code should have called
> iov_iter_advance(rc) instead of msg->msg_iter.count = 0.
>
> But the new code doesn't need it as copy_to_iter()
> calls iterate_and_advance().
>
>> return data_read;
>> }
>> @@ -1890,90 +1898,6 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
>> goto again;
>> }
>> -/*
>> - * Receive a page from receive reassembly queue
>> - * page: the page to read data into
>> - * to_read: the length of data to read
>> - * return value: actual data read
>> - */
>> -static int smbd_recv_page(struct smbd_connection *info,
>> - struct page *page, unsigned int page_offset,
>> - unsigned int to_read)
>> -{
>> - struct smbdirect_socket *sc = &info->socket;
>> - int ret;
>> - char *to_address;
>> - void *page_address;
>> -
>> - /* make sure we have the page ready for read */
>> - ret = wait_event_interruptible(
>> - info->wait_reassembly_queue,
>> - info->reassembly_data_length >= to_read ||
>> - sc->status != SMBDIRECT_SOCKET_CONNECTED);
>> - if (ret)
>> - return ret;
>> -
>> - /* now we can read from reassembly queue and not sleep */
>> - page_address = kmap_atomic(page);
>> - to_address = (char *) page_address + page_offset;
>> -
>> - log_read(INFO, "reading from page=%p address=%p to_read=%d\n",
>> - page, to_address, to_read);
>> -
>> - ret = smbd_recv_buf(info, to_address, to_read);
>> - kunmap_atomic(page_address);
>> -
>> - return ret;
>> -}
>> -
>> -/*
>> - * Receive data from transport
>> - * msg: a msghdr point to the buffer, can be ITER_KVEC or ITER_BVEC
>> - * return: total bytes read, or 0. SMB Direct will not do partial read.
>> - */
>> -int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
>> -{
>> - char *buf;
>> - struct page *page;
>> - unsigned int to_read, page_offset;
>> - int rc;
>> -
>> - if (iov_iter_rw(&msg->msg_iter) == WRITE) {
>> - /* It's a bug in upper layer to get there */
>> - cifs_dbg(VFS, "Invalid msg iter dir %u\n",
>> - iov_iter_rw(&msg->msg_iter));
>> - rc = -EINVAL;
>> - goto out;
>> - }
>> -
>> - switch (iov_iter_type(&msg->msg_iter)) {
>> - case ITER_KVEC:
>> - buf = msg->msg_iter.kvec->iov_base;
>> - to_read = msg->msg_iter.kvec->iov_len;
>> - rc = smbd_recv_buf(info, buf, to_read);
>> - break;
>> -
>> - case ITER_BVEC:
>> - page = msg->msg_iter.bvec->bv_page;
>> - page_offset = msg->msg_iter.bvec->bv_offset;
>> - to_read = msg->msg_iter.bvec->bv_len;
>> - rc = smbd_recv_page(info, page, page_offset, to_read);
>> - break;
>> -
>> - default:
>> - /* It's a bug in upper layer to get there */
>> - cifs_dbg(VFS, "Invalid msg type %d\n",
>> - iov_iter_type(&msg->msg_iter));
>> - rc = -EINVAL;
>> - }
>
> I guess this is actually a real fix as I just saw
> CIFS: VFS: Invalid msg type 4
> in logs while running the cifs/001 test.
> And 4 is ITER_FOLIOQ.
>
> So there might be something broken when ITER_FOLIOQ was
> introduced, but I wasn't able to find a specific commit.
> Maybe it was also already broken when using
> smb3 encryption over smbdirect, when ITER_XARRAY was still used.
>
> metze
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
2025-06-24 12:25 ` [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter() Stefan Metzmacher
2025-06-25 14:18 ` Stefan Metzmacher
@ 2025-06-25 16:00 ` David Howells
1 sibling, 0 replies; 10+ messages in thread
From: David Howells @ 2025-06-25 16:00 UTC (permalink / raw)
To: Stefan Metzmacher
Cc: dhowells, linux-cifs@vger.kernel.org, netfs, linux-fsdevel,
Steve French
Stefan Metzmacher <metze@samba.org> wrote:
> Please keep the rfc1002_len variable as it's used in the log_read message
> below and it should by host byteorder.
So this change on top of the patch I posted?
@@ -1838,11 +1838,11 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
* transport layer is added
*/
if (response->first_segment && size == 4) {
- unsigned int len =
+ unsigned int rfc1002_len =
data_length + remaining_data_length;
- __be32 rfc1002_len = cpu_to_be32(len);
- if (copy_to_iter(&rfc1002_len, sizeof(rfc1002_len),
- &msg->msg_iter) != sizeof(rfc1002_len))
+ __be32 rfc1002_hdr = cpu_to_be32(rfc1002_len);
+ if (copy_to_iter(&rfc1002_hdr, sizeof(rfc1002_hdr),
+ &msg->msg_iter) != sizeof(rfc1002_hdr))
return -EFAULT;
data_read = 4;
response->first_segment = false;
Btw, I'm changing the patch subject and description to:
cifs: Fix reading into an ITER_FOLIOQ from the smbdirect code
When performing a file read from RDMA, smbd_recv() prints an "Invalid msg
type 4" error and fails the I/O. This is due to the switch-statement there
not handling the ITER_FOLIOQ handed down from netfslib.
Fix this by collapsing smbd_recv_buf() and smbd_recv_page() into
smbd_recv() and just using copy_to_iter() instead of memcpy(). This
future-proofs the function too, in case more ITER_* types are added.
Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <stfrench@microsoft.com>
cc: Tom Talpey <tom@talpey.com>
cc: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
David
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-06-25 16:00 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1107690.1750683895@warthog.procyon.org.uk>
2025-06-24 12:25 ` [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter() Stefan Metzmacher
2025-06-25 14:18 ` Stefan Metzmacher
2025-06-25 16:00 ` David Howells
2025-06-24 14:22 ` David Howells
2025-06-24 16:05 ` Stefan Metzmacher
2025-06-25 8:07 ` Stefan Metzmacher
2025-06-25 10:10 ` Stefan Metzmacher
2025-06-25 11:25 ` David Howells
2025-06-25 11:51 ` Stefan Metzmacher
2025-06-25 12:47 ` David Howells
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).