netfs.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
       [not found] <1107690.1750683895@warthog.procyon.org.uk>
@ 2025-06-24 12:25 ` Stefan Metzmacher
  2025-06-25 14:18   ` Stefan Metzmacher
  2025-06-25 16:00   ` David Howells
  2025-06-24 14:22 ` David Howells
  2025-06-25 12:47 ` David Howells
  2 siblings, 2 replies; 10+ messages in thread
From: Stefan Metzmacher @ 2025-06-24 12:25 UTC (permalink / raw)
  To: David Howells
  Cc: linux-cifs@vger.kernel.org, netfs, linux-fsdevel, Steve French

Hi David,

this looks very useful! Just a few comments below...

> Collapse smbd_recv_buf() and smbd_recv_page() into smbd_recv() and just use
> copy_to_iter() instead of memcpy().
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Steve French <stfrench@microsoft.com>
> cc: Tom Talpey <tom@talpey.com>
> cc: Stefan Metzmacher <metze@samba.org>
> cc: Paulo Alcantara (Red Hat) <pc@manguebit.com>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: linux-cifs@vger.kernel.org
> cc: netfs@lists.linux.dev
> cc: linux-fsdevel@vger.kernel.org
> ---
>   fs/smb/client/smbdirect.c |  116 +++++++---------------------------------------
>   1 file changed, 20 insertions(+), 96 deletions(-)
> 
> diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c
> index 5ae847919da5..dc64c337aae0 100644
> --- a/fs/smb/client/smbdirect.c
> +++ b/fs/smb/client/smbdirect.c
> @@ -1747,35 +1747,39 @@ struct smbd_connection *smbd_get_connection(
>   }
>   
>   /*
> - * Receive data from receive reassembly queue
> + * Receive data from the transport's receive reassembly queue
>    * All the incoming data packets are placed in reassembly queue
> - * buf: the buffer to read data into
> + * iter: the buffer to read data into
>    * size: the length of data to read
>    * return value: actual data read
> - * Note: this implementation copies the data from reassebmly queue to receive
> + *
> + * Note: this implementation copies the data from reassembly queue to receive
>    * buffers used by upper layer. This is not the optimal code path. A better way
>    * to do it is to not have upper layer allocate its receive buffers but rather
>    * borrow the buffer from reassembly queue, and return it after data is
>    * consumed. But this will require more changes to upper layer code, and also
>    * need to consider packet boundaries while they still being reassembled.
>    */
> -static int smbd_recv_buf(struct smbd_connection *info, char *buf,
> -		unsigned int size)
> +int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
>   {
>   	struct smbdirect_socket *sc = &info->socket;
>   	struct smbd_response *response;
>   	struct smbdirect_data_transfer *data_transfer;
> +	size_t size = msg->msg_iter.count;

I think this should be iov_iter_count()?

>   	int to_copy, to_read, data_read, offset;
>   	u32 data_length, remaining_data_length, data_offset;
>   	int rc;
>   
> +	if (WARN_ON_ONCE(iov_iter_rw(&msg->msg_iter) == WRITE))
> +		return -EINVAL; /* It's a bug in upper layer to get there */
> +
>   again:
>   	/*
>   	 * No need to hold the reassembly queue lock all the time as we are
>   	 * the only one reading from the front of the queue. The transport
>   	 * may add more entries to the back of the queue at the same time
>   	 */
> -	log_read(INFO, "size=%d info->reassembly_data_length=%d\n", size,
> +	log_read(INFO, "size=%zd info->reassembly_data_length=%d\n", size,
>   		info->reassembly_data_length);
>   	if (info->reassembly_data_length >= size) {
>   		int queue_length;
> @@ -1811,9 +1815,12 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
>   			 * transport layer is added
>   			 */
>   			if (response->first_segment && size == 4) {
> -				unsigned int rfc1002_len =
> +				unsigned int len =
>   					data_length + remaining_data_length;
> -				*((__be32 *)buf) = cpu_to_be32(rfc1002_len);
> +				__be32 rfc1002_len = cpu_to_be32(len);
> +				if (copy_to_iter(&rfc1002_len, sizeof(rfc1002_len),
> +						 &msg->msg_iter) != sizeof(rfc1002_len))
> +					return -EFAULT;
>   				data_read = 4;
>   				response->first_segment = false;
>   				log_read(INFO, "returning rfc1002 length %d\n",
> @@ -1822,10 +1829,9 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
>   			}
>   
>   			to_copy = min_t(int, data_length - offset, to_read);
> -			memcpy(
> -				buf + data_read,
> -				(char *)data_transfer + data_offset + offset,
> -				to_copy);
> +			if (copy_to_iter((char *)data_transfer + data_offset + offset,
> +					 to_copy, &msg->msg_iter) != to_copy)
> +				return -EFAULT;
>   
>   			/* move on to the next buffer? */
>   			if (to_copy == data_length - offset) {
> @@ -1870,6 +1876,8 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
>   			 data_read, info->reassembly_data_length,
>   			 info->first_entry_offset);
>   read_rfc1002_done:
> +		/* SMBDirect will read it all or nothing */
> +		msg->msg_iter.count = 0;

And this iov_iter_truncate(0);

While I'm wondering why we had this at all.

It seems all callers of cifs_read_iter_from_socket()
don't care and the code path via sock_recvmsg() doesn't
truncate it just calls copy_to_iter() via this chain:
->inet_recvmsg->tcp_recvmsg->skb_copy_datagram_msg->skb_copy_datagram_iter
->simple_copy_to_iter->copy_to_iter()

I think the old code should have called
iov_iter_advance(rc) instead of msg->msg_iter.count = 0.

But the new code doesn't need it as copy_to_iter()
calls iterate_and_advance().

>   		return data_read;
>   	}
>   
> @@ -1890,90 +1898,6 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
>   	goto again;
>   }
>   
> -/*
> - * Receive a page from receive reassembly queue
> - * page: the page to read data into
> - * to_read: the length of data to read
> - * return value: actual data read
> - */
> -static int smbd_recv_page(struct smbd_connection *info,
> -		struct page *page, unsigned int page_offset,
> -		unsigned int to_read)
> -{
> -	struct smbdirect_socket *sc = &info->socket;
> -	int ret;
> -	char *to_address;
> -	void *page_address;
> -
> -	/* make sure we have the page ready for read */
> -	ret = wait_event_interruptible(
> -		info->wait_reassembly_queue,
> -		info->reassembly_data_length >= to_read ||
> -			sc->status != SMBDIRECT_SOCKET_CONNECTED);
> -	if (ret)
> -		return ret;
> -
> -	/* now we can read from reassembly queue and not sleep */
> -	page_address = kmap_atomic(page);
> -	to_address = (char *) page_address + page_offset;
> -
> -	log_read(INFO, "reading from page=%p address=%p to_read=%d\n",
> -		page, to_address, to_read);
> -
> -	ret = smbd_recv_buf(info, to_address, to_read);
> -	kunmap_atomic(page_address);
> -
> -	return ret;
> -}
> -
> -/*
> - * Receive data from transport
> - * msg: a msghdr point to the buffer, can be ITER_KVEC or ITER_BVEC
> - * return: total bytes read, or 0. SMB Direct will not do partial read.
> - */
> -int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
> -{
> -	char *buf;
> -	struct page *page;
> -	unsigned int to_read, page_offset;
> -	int rc;
> -
> -	if (iov_iter_rw(&msg->msg_iter) == WRITE) {
> -		/* It's a bug in upper layer to get there */
> -		cifs_dbg(VFS, "Invalid msg iter dir %u\n",
> -			 iov_iter_rw(&msg->msg_iter));
> -		rc = -EINVAL;
> -		goto out;
> -	}
> -
> -	switch (iov_iter_type(&msg->msg_iter)) {
> -	case ITER_KVEC:
> -		buf = msg->msg_iter.kvec->iov_base;
> -		to_read = msg->msg_iter.kvec->iov_len;
> -		rc = smbd_recv_buf(info, buf, to_read);
> -		break;
> -
> -	case ITER_BVEC:
> -		page = msg->msg_iter.bvec->bv_page;
> -		page_offset = msg->msg_iter.bvec->bv_offset;
> -		to_read = msg->msg_iter.bvec->bv_len;
> -		rc = smbd_recv_page(info, page, page_offset, to_read);
> -		break;
> -
> -	default:
> -		/* It's a bug in upper layer to get there */
> -		cifs_dbg(VFS, "Invalid msg type %d\n",
> -			 iov_iter_type(&msg->msg_iter));
> -		rc = -EINVAL;
> -	}

I guess this is actually a real fix as I just saw
CIFS: VFS: Invalid msg type 4
in logs while running the cifs/001 test.
And 4 is ITER_FOLIOQ.

So there might be something broken when ITER_FOLIOQ was
introduced, but I wasn't able to find a specific commit.
Maybe it was also already broken when using
smb3 encryption over smbdirect, when ITER_XARRAY was still used.

metze


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
       [not found] <1107690.1750683895@warthog.procyon.org.uk>
  2025-06-24 12:25 ` [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter() Stefan Metzmacher
@ 2025-06-24 14:22 ` David Howells
  2025-06-24 16:05   ` Stefan Metzmacher
  2025-06-25  8:07   ` Stefan Metzmacher
  2025-06-25 12:47 ` David Howells
  2 siblings, 2 replies; 10+ messages in thread
From: David Howells @ 2025-06-24 14:22 UTC (permalink / raw)
  To: Stefan Metzmacher
  Cc: dhowells, linux-cifs@vger.kernel.org, netfs, linux-fsdevel,
	Steve French

Stefan Metzmacher <metze@samba.org> wrote:

> >   read_rfc1002_done:
> > +		/* SMBDirect will read it all or nothing */
> > +		msg->msg_iter.count = 0;
> 
> And this iov_iter_truncate(0);

Actually, it should probably have been iov_iter_advance().

> While I'm wondering why we had this at all.
> 
> It seems all callers of cifs_read_iter_from_socket()
> don't care and the code path via sock_recvmsg() doesn't
> truncate it just calls copy_to_iter() via this chain:
> ->inet_recvmsg->tcp_recvmsg->skb_copy_datagram_msg->skb_copy_datagram_iter
> ->simple_copy_to_iter->copy_to_iter()
> 
> I think the old code should have called
> iov_iter_advance(rc) instead of msg->msg_iter.count = 0.
> 
> But the new code doesn't need it as copy_to_iter()
> calls iterate_and_advance().

Yeah, it should.  I seem to remember that there were situations in which it
didn't, but it's possible I managed to get rid of them.

> > -	default:
> > -		/* It's a bug in upper layer to get there */
> > -		cifs_dbg(VFS, "Invalid msg type %d\n",
> > -			 iov_iter_type(&msg->msg_iter));
> > -		rc = -EINVAL;
> > -	}
> 
> I guess this is actually a real fix as I just saw
> CIFS: VFS: Invalid msg type 4
> in logs while running the cifs/001 test.
> And 4 is ITER_FOLIOQ.

Ah... Were you using "-o seal"?  The encrypted data is held in a buffer formed
from a folioq with a series of folios in it.

David


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
  2025-06-24 14:22 ` David Howells
@ 2025-06-24 16:05   ` Stefan Metzmacher
  2025-06-25  8:07   ` Stefan Metzmacher
  1 sibling, 0 replies; 10+ messages in thread
From: Stefan Metzmacher @ 2025-06-24 16:05 UTC (permalink / raw)
  To: David Howells
  Cc: linux-cifs@vger.kernel.org, netfs, linux-fsdevel, Steve French

Hi David,

>>>    read_rfc1002_done:
>>> +		/* SMBDirect will read it all or nothing */
>>> +		msg->msg_iter.count = 0;
>>
>> And this iov_iter_truncate(0);
> 
> Actually, it should probably have been iov_iter_advance().
> 
>> While I'm wondering why we had this at all.
>>
>> It seems all callers of cifs_read_iter_from_socket()
>> don't care and the code path via sock_recvmsg() doesn't
>> truncate it just calls copy_to_iter() via this chain:
>> ->inet_recvmsg->tcp_recvmsg->skb_copy_datagram_msg->skb_copy_datagram_iter
>> ->simple_copy_to_iter->copy_to_iter()
>>
>> I think the old code should have called
>> iov_iter_advance(rc) instead of msg->msg_iter.count = 0.
>>
>> But the new code doesn't need it as copy_to_iter()
>> calls iterate_and_advance().
> 
> Yeah, it should.  I seem to remember that there were situations in which it
> didn't, but it's possible I managed to get rid of them.
> 
>>> -	default:
>>> -		/* It's a bug in upper layer to get there */
>>> -		cifs_dbg(VFS, "Invalid msg type %d\n",
>>> -			 iov_iter_type(&msg->msg_iter));
>>> -		rc = -EINVAL;
>>> -	}
>>
>> I guess this is actually a real fix as I just saw
>> CIFS: VFS: Invalid msg type 4
>> in logs while running the cifs/001 test.
>> And 4 is ITER_FOLIOQ.
> 
> Ah... Were you using "-o seal"?  The encrypted data is held in a buffer formed
> from a folioq with a series of folios in it.


In local.config I have this:

[smb3-1-rdma]
FSTYP=cifs
TEST_DEV=//172.31.9.1/TEST
TEST_DIR=/mnt/test
TEST_FS_MOUNT_OPTS='-ousername=administrator,password=...,rdma,noperm,vers=3.0,mfsymlinks,actimeo=0'
export MOUNT_OPTIONS='-ousername=administrator,password=...,rdma,noperm,mfsymlinks,actimeo=0'
export SCRATCH_DEV=//172.31.9.1/SCRATCH
export SCRATCH_MNT=/mnt/scratch

And called:

./check -s smb3-1-rdma -E tests/cifs/exclude.incompatible-smb3.txt -E tests/cifs/exclude.very-slow.txt cifs/001

So I don't think it used seal, I'm also not seeing encrypted stuff in the capture,
so maybe the folioq iter comes from a higher layer (via cifs_readv_receive()) instead
of receive_encrypted_read().

metze


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
  2025-06-24 14:22 ` David Howells
  2025-06-24 16:05   ` Stefan Metzmacher
@ 2025-06-25  8:07   ` Stefan Metzmacher
  2025-06-25 10:10     ` Stefan Metzmacher
  2025-06-25 11:25     ` David Howells
  1 sibling, 2 replies; 10+ messages in thread
From: Stefan Metzmacher @ 2025-06-25  8:07 UTC (permalink / raw)
  To: David Howells
  Cc: linux-cifs@vger.kernel.org, netfs, linux-fsdevel, Steve French

Am 24.06.25 um 16:22 schrieb David Howells:
> Stefan Metzmacher <metze@samba.org> wrote:
> 
>>>    read_rfc1002_done:
>>> +		/* SMBDirect will read it all or nothing */
>>> +		msg->msg_iter.count = 0;
>>
>> And this iov_iter_truncate(0);
> 
> Actually, it should probably have been iov_iter_advance().
> 
>> While I'm wondering why we had this at all.
>>
>> It seems all callers of cifs_read_iter_from_socket()
>> don't care and the code path via sock_recvmsg() doesn't
>> truncate it just calls copy_to_iter() via this chain:
>> ->inet_recvmsg->tcp_recvmsg->skb_copy_datagram_msg->skb_copy_datagram_iter
>> ->simple_copy_to_iter->copy_to_iter()
>>
>> I think the old code should have called
>> iov_iter_advance(rc) instead of msg->msg_iter.count = 0.
>>
>> But the new code doesn't need it as copy_to_iter()
>> calls iterate_and_advance().
> 
> Yeah, it should.  I seem to remember that there were situations in which it
> didn't, but it's possible I managed to get rid of them.
> 
>>> -	default:
>>> -		/* It's a bug in upper layer to get there */
>>> -		cifs_dbg(VFS, "Invalid msg type %d\n",
>>> -			 iov_iter_type(&msg->msg_iter));
>>> -		rc = -EINVAL;
>>> -	}
>>
>> I guess this is actually a real fix as I just saw
>> CIFS: VFS: Invalid msg type 4
>> in logs while running the cifs/001 test.
>> And 4 is ITER_FOLIOQ.
> 
> Ah... Were you using "-o seal"?  The encrypted data is held in a buffer formed
> from a folioq with a series of folios in it.

I know tested it standalone in this tree:
https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=46a31189b8b059b3595a9586714761e6e76ba7c4

Doing following mount:

mount -t cifs -ousername=administrator,password=...,rdma,noperm,vers=3.0,mfsymlinks,actimeo=0 //172.31.9.1/test /mnt/test/

It's using the siw driver (with modifications to work against the chelsio t404-bt card on windows) from
here:
https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=5b89ff89f440ec36cf2c5ed2212be0d8523a4c9b

But the siw difference should not really matter.

This realiable generates this:

[  922.048997] [   T6639] CIFS: Attempting to mount //172.31.9.1/test
[  922.188445] [   T6639] CIFS: VFS: RDMA transport established
[  922.217974] [   T6642] usercopy: Kernel memory exposure attempt detected from SLUB object 'smbd_response_0000000091e24ea1' (offset 81, size 63)!
[  922.218221] [   T6642] ------------[ cut here ]------------
[  922.218230] [   T6642] kernel BUG at mm/usercopy.c:102!
[  922.218299] [   T6642] Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
[  922.218439] [   T6642] CPU: 1 UID: 0 PID: 6642 Comm: cifsd Kdump: loaded Tainted: G           OE       6.16.0-rc3-metze.01+ #1 PREEMPT(voluntary)
[  922.218585] [   T6642] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  922.218635] [   T6642] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  922.218704] [   T6642] RIP: 0010:usercopy_abort+0x6c/0x80
[  922.218783] [   T6642] Code: fa 91 51 48 c7 c2 c0 d4 fa 91 41 52 48 c7 c7 40 d5 fa 91 48 0f 45 d6 48 c7 c6 00 d5 fa 91 48 89 c1 49 0f 45 f3 e8 84 aa 6b ff <0f> 0b 49 c7 
c1 c0 d3 fa 91 4d 89 ca 4d 89 c8 eb a8 0f 1f 00 90 90
[  922.218925] [   T6642] RSP: 0018:ffffc90001887820 EFLAGS: 00010246
[  922.218983] [   T6642] RAX: 0000000000000079 RBX: 0000000000000051 RCX: 0000000000000000
[  922.219046] [   T6642] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  922.219108] [   T6642] RBP: ffffc90001887838 R08: 0000000000000000 R09: 0000000000000000
[  922.219201] [   T6642] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000003f
[  922.219261] [   T6642] R13: ffff88801f579280 R14: 0000000000000001 R15: ffffea0000163340
[  922.219323] [   T6642] FS:  0000000000000000(0000) GS:ffff8881466e8000(0000) knlGS:0000000000000000
[  922.219415] [   T6642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  922.219469] [   T6642] CR2: 000075a216d19bb8 CR3: 000000000f5f6004 CR4: 00000000000726f0
[  922.219560] [   T6642] Call Trace:
[  922.219591] [   T6642]  <TASK>
[  922.219624] [   T6642]  __check_heap_object+0xe3/0x120
[  922.221090] [   T6642]  __check_object_size+0x4dc/0x6d0
[  922.222547] [   T6642]  smbd_recv+0x77f/0xfe0 [cifs]
[  922.224416] [   T6642]  ? __pfx_smbd_recv+0x10/0x10 [cifs]
[  922.226195] [   T6642]  ? __kasan_check_write+0x14/0x30
[  922.227722] [   T6642]  ? _raw_spin_lock+0x81/0xf0
[  922.229190] [   T6642]  ? __pfx__raw_spin_lock+0x10/0x10
[  922.230699] [   T6642]  ? sched_clock_noinstr+0x9/0x10
[  922.232248] [   T6642]  cifs_readv_from_socket+0x276/0x8f0 [cifs]
[  922.234149] [   T6642]  ? __pfx_cifs_readv_from_socket+0x10/0x10 [cifs]
[  922.236222] [   T6642]  ? mempool_alloc_slab+0x15/0x20
[  922.237705] [   T6642]  cifs_read_from_socket+0xcd/0x120 [cifs]
[  922.239559] [   T6642]  ? __pfx_cifs_read_from_socket+0x10/0x10 [cifs]
[  922.241403] [   T6642]  ? __pfx_mempool_alloc_noprof+0x10/0x10
[  922.242827] [   T6642]  ? __kasan_check_write+0x14/0x30
[  922.244141] [   T6642]  ? cifs_small_buf_get+0x62/0x90 [cifs]
[  922.245500] [   T6642]  ? allocate_buffers+0x216/0x390 [cifs]
[  922.246810] [   T6642]  cifs_demultiplex_thread+0x7e9/0x2d50 [cifs]
[  922.248150] [   T6642]  ? _raw_spin_lock_irqsave+0x95/0x100
[  922.249143] [   T6642]  ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
[  922.250163] [   T6642]  ? __pfx___schedule+0x10/0x10
[  922.250977] [   T6642]  ? _raw_spin_lock_irqsave+0x95/0x100
[  922.251715] [   T6642]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  922.252415] [   T6642]  ? __pfx_try_to_wake_up+0x10/0x10
[  922.253094] [   T6642]  ? __kasan_check_read+0x11/0x20
[  922.253766] [   T6642]  ? __kthread_parkme+0xa0/0x190
[  922.254344] [   T6642]  ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
[  922.255073] [   T6642]  kthread+0x396/0x830
[  922.255584] [   T6642]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  922.256070] [   T6642]  ? __pfx_kthread+0x10/0x10
[  922.256568] [   T6642]  ? __kasan_check_write+0x14/0x30
[  922.257047] [   T6642]  ? recalc_sigpending+0x180/0x210
[  922.257535] [   T6642]  ? _raw_spin_unlock_irq+0xe/0x50
[  922.258015] [   T6642]  ? calculate_sigpending+0x84/0xb0
[  922.258509] [   T6642]  ? __pfx_kthread+0x10/0x10
[  922.258976] [   T6642]  ret_from_fork+0x2b8/0x3b0
[  922.259377] [   T6642]  ? __pfx_kthread+0x10/0x10
[  922.259757] [   T6642]  ret_from_fork_asm+0x1a/0x30
[  922.260133] [   T6642]  </TASK>
[  922.260514] [   T6642] Modules linked in: cifs(OE) ccm cmac nls_utf8 cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm cifs_md4 netfs siw(OE) ib_uverbs ib_core softdog vboxsf 
vboxguest intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_class intel_pmc_ssram_telemetry intel_vsec polyval_clmulni 
ghash_clmulni_intel sha1_ssse3 aesni_intel rapl i2c_piix4 i2c_smbus input_leds joydev mac_hid sunrpc binfmt_misc kvm_intel kvm irqbypass sch_fq_codel efi_pstore nfnetlink 
vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci dmi_sysfs ip_tables x_tables autofs4 hid_generic vboxvideo drm_vram_helper usbhid 
drm_ttm_helper vga16fb hid vgastate ahci ttm libahci video pata_acpi psmouse serio_raw wmi [last unloaded: cifs(OE)]


Reverting it fixes it again.

metze

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
  2025-06-25  8:07   ` Stefan Metzmacher
@ 2025-06-25 10:10     ` Stefan Metzmacher
  2025-06-25 11:25     ` David Howells
  1 sibling, 0 replies; 10+ messages in thread
From: Stefan Metzmacher @ 2025-06-25 10:10 UTC (permalink / raw)
  To: David Howells
  Cc: linux-cifs@vger.kernel.org, netfs, linux-fsdevel, Steve French

Am 25.06.25 um 10:07 schrieb Stefan Metzmacher:
> Am 24.06.25 um 16:22 schrieb David Howells:
>> Stefan Metzmacher <metze@samba.org> wrote:
>>
>>>>    read_rfc1002_done:
>>>> +        /* SMBDirect will read it all or nothing */
>>>> +        msg->msg_iter.count = 0;
>>>
>>> And this iov_iter_truncate(0);
>>
>> Actually, it should probably have been iov_iter_advance().
>>
>>> While I'm wondering why we had this at all.
>>>
>>> It seems all callers of cifs_read_iter_from_socket()
>>> don't care and the code path via sock_recvmsg() doesn't
>>> truncate it just calls copy_to_iter() via this chain:
>>> ->inet_recvmsg->tcp_recvmsg->skb_copy_datagram_msg->skb_copy_datagram_iter
>>> ->simple_copy_to_iter->copy_to_iter()
>>>
>>> I think the old code should have called
>>> iov_iter_advance(rc) instead of msg->msg_iter.count = 0.
>>>
>>> But the new code doesn't need it as copy_to_iter()
>>> calls iterate_and_advance().
>>
>> Yeah, it should.  I seem to remember that there were situations in which it
>> didn't, but it's possible I managed to get rid of them.
>>
>>>> -    default:
>>>> -        /* It's a bug in upper layer to get there */
>>>> -        cifs_dbg(VFS, "Invalid msg type %d\n",
>>>> -             iov_iter_type(&msg->msg_iter));
>>>> -        rc = -EINVAL;
>>>> -    }
>>>
>>> I guess this is actually a real fix as I just saw
>>> CIFS: VFS: Invalid msg type 4
>>> in logs while running the cifs/001 test.
>>> And 4 is ITER_FOLIOQ.
>>
>> Ah... Were you using "-o seal"?  The encrypted data is held in a buffer formed
>> from a folioq with a series of folios in it.
> 
> I know tested it standalone in this tree:
> https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=46a31189b8b059b3595a9586714761e6e76ba7c4

It also happens with this:
https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=442dcd18dc1bf8d1e39f53d20810ca0a4958d139

Which contains your netfs fixes...

> Doing following mount:
> 
> mount -t cifs -ousername=administrator,password=...,rdma,noperm,vers=3.0,mfsymlinks,actimeo=0 //172.31.9.1/test /mnt/test/
> 
> It's using the siw driver (with modifications to work against the chelsio t404-bt card on windows) from
> here:
> https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=5b89ff89f440ec36cf2c5ed2212be0d8523a4c9b
> 
> But the siw difference should not really matter.
> 
> This realiable generates this:
> 
> [  922.048997] [   T6639] CIFS: Attempting to mount //172.31.9.1/test
> [  922.188445] [   T6639] CIFS: VFS: RDMA transport established
> [  922.217974] [   T6642] usercopy: Kernel memory exposure attempt detected from SLUB object 'smbd_response_0000000091e24ea1' (offset 81, size 63)!
> [  922.218221] [   T6642] ------------[ cut here ]------------
> [  922.218230] [   T6642] kernel BUG at mm/usercopy.c:102!
> [  922.218299] [   T6642] Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
> [  922.218439] [   T6642] CPU: 1 UID: 0 PID: 6642 Comm: cifsd Kdump: loaded Tainted: G           OE       6.16.0-rc3-metze.01+ #1 PREEMPT(voluntary)
> [  922.218585] [   T6642] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [  922.218635] [   T6642] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> [  922.218704] [   T6642] RIP: 0010:usercopy_abort+0x6c/0x80
> [  922.218783] [   T6642] Code: fa 91 51 48 c7 c2 c0 d4 fa 91 41 52 48 c7 c7 40 d5 fa 91 48 0f 45 d6 48 c7 c6 00 d5 fa 91 48 89 c1 49 0f 45 f3 e8 84 aa 6b ff <0f> 0b 49 c7 
> c1 c0 d3 fa 91 4d 89 ca 4d 89 c8 eb a8 0f 1f 00 90 90
> [  922.218925] [   T6642] RSP: 0018:ffffc90001887820 EFLAGS: 00010246
> [  922.218983] [   T6642] RAX: 0000000000000079 RBX: 0000000000000051 RCX: 0000000000000000
> [  922.219046] [   T6642] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [  922.219108] [   T6642] RBP: ffffc90001887838 R08: 0000000000000000 R09: 0000000000000000
> [  922.219201] [   T6642] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000003f
> [  922.219261] [   T6642] R13: ffff88801f579280 R14: 0000000000000001 R15: ffffea0000163340
> [  922.219323] [   T6642] FS:  0000000000000000(0000) GS:ffff8881466e8000(0000) knlGS:0000000000000000
> [  922.219415] [   T6642] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  922.219469] [   T6642] CR2: 000075a216d19bb8 CR3: 000000000f5f6004 CR4: 00000000000726f0
> [  922.219560] [   T6642] Call Trace:
> [  922.219591] [   T6642]  <TASK>
> [  922.219624] [   T6642]  __check_heap_object+0xe3/0x120
> [  922.221090] [   T6642]  __check_object_size+0x4dc/0x6d0
> [  922.222547] [   T6642]  smbd_recv+0x77f/0xfe0 [cifs]
> [  922.224416] [   T6642]  ? __pfx_smbd_recv+0x10/0x10 [cifs]
> [  922.226195] [   T6642]  ? __kasan_check_write+0x14/0x30
> [  922.227722] [   T6642]  ? _raw_spin_lock+0x81/0xf0
> [  922.229190] [   T6642]  ? __pfx__raw_spin_lock+0x10/0x10
> [  922.230699] [   T6642]  ? sched_clock_noinstr+0x9/0x10
> [  922.232248] [   T6642]  cifs_readv_from_socket+0x276/0x8f0 [cifs]
> [  922.234149] [   T6642]  ? __pfx_cifs_readv_from_socket+0x10/0x10 [cifs]
> [  922.236222] [   T6642]  ? mempool_alloc_slab+0x15/0x20
> [  922.237705] [   T6642]  cifs_read_from_socket+0xcd/0x120 [cifs]
> [  922.239559] [   T6642]  ? __pfx_cifs_read_from_socket+0x10/0x10 [cifs]
> [  922.241403] [   T6642]  ? __pfx_mempool_alloc_noprof+0x10/0x10
> [  922.242827] [   T6642]  ? __kasan_check_write+0x14/0x30
> [  922.244141] [   T6642]  ? cifs_small_buf_get+0x62/0x90 [cifs]
> [  922.245500] [   T6642]  ? allocate_buffers+0x216/0x390 [cifs]
> [  922.246810] [   T6642]  cifs_demultiplex_thread+0x7e9/0x2d50 [cifs]
> [  922.248150] [   T6642]  ? _raw_spin_lock_irqsave+0x95/0x100
> [  922.249143] [   T6642]  ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
> [  922.250163] [   T6642]  ? __pfx___schedule+0x10/0x10
> [  922.250977] [   T6642]  ? _raw_spin_lock_irqsave+0x95/0x100
> [  922.251715] [   T6642]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> [  922.252415] [   T6642]  ? __pfx_try_to_wake_up+0x10/0x10
> [  922.253094] [   T6642]  ? __kasan_check_read+0x11/0x20
> [  922.253766] [   T6642]  ? __kthread_parkme+0xa0/0x190
> [  922.254344] [   T6642]  ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
> [  922.255073] [   T6642]  kthread+0x396/0x830
> [  922.255584] [   T6642]  ? __pfx__raw_spin_lock_irq+0x10/0x10
> [  922.256070] [   T6642]  ? __pfx_kthread+0x10/0x10
> [  922.256568] [   T6642]  ? __kasan_check_write+0x14/0x30
> [  922.257047] [   T6642]  ? recalc_sigpending+0x180/0x210
> [  922.257535] [   T6642]  ? _raw_spin_unlock_irq+0xe/0x50
> [  922.258015] [   T6642]  ? calculate_sigpending+0x84/0xb0
> [  922.258509] [   T6642]  ? __pfx_kthread+0x10/0x10
> [  922.258976] [   T6642]  ret_from_fork+0x2b8/0x3b0
> [  922.259377] [   T6642]  ? __pfx_kthread+0x10/0x10
> [  922.259757] [   T6642]  ret_from_fork_asm+0x1a/0x30
> [  922.260133] [   T6642]  </TASK>
> [  922.260514] [   T6642] Modules linked in: cifs(OE) ccm cmac nls_utf8 cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm cifs_md4 netfs siw(OE) ib_uverbs ib_core softdog vboxsf 
> vboxguest intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_class intel_pmc_ssram_telemetry intel_vsec polyval_clmulni 
> ghash_clmulni_intel sha1_ssse3 aesni_intel rapl i2c_piix4 i2c_smbus input_leds joydev mac_hid sunrpc binfmt_misc kvm_intel kvm irqbypass sch_fq_codel efi_pstore nfnetlink 
> vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci dmi_sysfs ip_tables x_tables autofs4 hid_generic vboxvideo drm_vram_helper usbhid 
> drm_ttm_helper vga16fb hid vgastate ahci ttm libahci video pata_acpi psmouse serio_raw wmi [last unloaded: cifs(OE)]
> 
> 
> Reverting it fixes it again.
> 
> metze


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
  2025-06-25  8:07   ` Stefan Metzmacher
  2025-06-25 10:10     ` Stefan Metzmacher
@ 2025-06-25 11:25     ` David Howells
  2025-06-25 11:51       ` Stefan Metzmacher
  1 sibling, 1 reply; 10+ messages in thread
From: David Howells @ 2025-06-25 11:25 UTC (permalink / raw)
  To: Stefan Metzmacher
  Cc: dhowells, linux-cifs@vger.kernel.org, netfs, linux-fsdevel,
	Steve French

Stefan Metzmacher <metze@samba.org> wrote:

> > [  922.218230] [   T6642] kernel BUG at mm/usercopy.c:102!

Ah, I don't have that config option enabled.  With it, I can reproduce that.

David


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
  2025-06-25 11:25     ` David Howells
@ 2025-06-25 11:51       ` Stefan Metzmacher
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan Metzmacher @ 2025-06-25 11:51 UTC (permalink / raw)
  To: David Howells
  Cc: linux-cifs@vger.kernel.org, netfs, linux-fsdevel, Steve French

Am 25.06.25 um 13:25 schrieb David Howells:
> Stefan Metzmacher <metze@samba.org> wrote:
> 
>>> [  922.218230] [   T6642] kernel BUG at mm/usercopy.c:102!
> 
> Ah, I don't have that config option enabled.  With it, I can reproduce that.

Ah, allocate_caches_and_workqueue() needs to use kmem_cache_create_usercopy/KMEM_CACHE_USERCOPY...

I was already using that in my old wip smbdirect code.

metze


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
       [not found] <1107690.1750683895@warthog.procyon.org.uk>
  2025-06-24 12:25 ` [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter() Stefan Metzmacher
  2025-06-24 14:22 ` David Howells
@ 2025-06-25 12:47 ` David Howells
  2 siblings, 0 replies; 10+ messages in thread
From: David Howells @ 2025-06-25 12:47 UTC (permalink / raw)
  Cc: dhowells, Stefan Metzmacher, linux-cifs@vger.kernel.org, netfs,
	linux-fsdevel, Steve French

David Howells <dhowells@redhat.com> wrote:

> > And 4 is ITER_FOLIOQ.

I dumped some of the fields from the MID involved:

   CIFS: VFS: Invalid msg type 4 (mid=a4 optype=0 command=8)
   CIFS: VFS:  - rcv=cifs_readv_receive+0x0/0x270 cb=smb2_readv_callback+0x0/0x480 hand=smb3_handle_read_data+0x0/0x40

So the ITER_FOLIOQ is from netfslib.  I've attached corresponding trace log,
edited down a bit to remove some columns.  Note that the EINVAL error gets
discarded by cifs_demultiplex_thread() and replaced with EAGAIN by netfslib.

David
---
         diff-6828: netfs_rreq_ref: R=0000000c NEW         r=2
         diff-6828: netfs_read: R=0000000c READPAGE  c=00000000 ni=1d4072 s=0 l=1000 sz=400
         diff-6828: netfs_rreq_ref: R=0000000c GET SUBREQ  r=3
         diff-6828: smb3_rw_credits: R=0000000c[1] rd-submit   cred=16 chg=0 pool=1688 ifl=1
         diff-6828: netfs_sreq: R=0000000c[1] DOWN PREP  f=000 s=0 0/400 s=0 e=0
         diff-6828: smb3_rw_credits: R=0000000c[1] rd-issu-adj cred=16 chg=-15 pool=1688 ifl=1
         diff-6828: netfs_sreq: R=0000000c[1] DOWN SUBMT f=102 s=0 0/400 s=0 e=0
         diff-6828: netfs_rreq_ref: R=0000000c GET SUBREQ  r=4
         diff-6828: netfs_sreq: R=0000000c[2] ZERO SUBMT f=000 s=400 0/c00 s=0 e=0
         diff-6828: netfs_sreq: R=0000000c[2] ZERO TERM  f=102 s=400 c00/c00 s=1 e=0
         diff-6828: netfs_collect_stream: R=0000000c[0:] cto=0 frn=0
         diff-6828: netfs_rreq: R=0000000c RP WAIT-IP f=03
        cifsd-6506: netfs_sreq: R=0000000c[1] DOWN I-RTR f=102 s=0 0/400 s=0 e=0
        cifsd-6506: smb3_read_err:       R=0000000c[1] xid=200 sid=0x8 tid=0x2 fid=0xa0952 offset=0x0 len=0x400 rc=-11
        cifsd-6506: smb3_rw_credits: R=0000000c[1] rd-resp-clr cred=1 chg=0 pool=1703 ifl=1
        cifsd-6506: netfs_sreq: R=0000000c[1] DOWN I-OK  f=302 s=0 0/400 s=0 e=-11
        cifsd-6506: netfs_failure: R=0000000c[1] DOWN f=302 s=0 0/400 read e=-11
        cifsd-6506: netfs_rreq: R=0000000c RP PAUSE   f=03
        cifsd-6506: netfs_sreq: R=0000000c[1] DOWN TERM  f=702 s=0 0/400 s=0 e=-11
        cifsd-6506: netfs_rreq: R=0000000c RP WAKE-Q  f=07
        cifsd-6506: smb3_rw_credits: R=0000000c[1] rd-resp-add cred=0 chg=0 pool=1703 ifl=1
         diff-6828: netfs_collect_stream: R=0000000c[0:] cto=0 frn=0
         diff-6828: netfs_rreq: R=0000000c RP COLLECT f=07
         diff-6828: netfs_collect: R=0000000c s=0-1000
         diff-6828: netfs_collect_sreq: R=0000000c[0:01] s=0 t=0/400
         diff-6828: netfs_rreq: R=0000000c RP S-ABNDN f=07
         diff-6828: netfs_sreq: R=0000000c[1] DOWN ABNDN f=602 s=0 400/400 s=0 e=-11
         diff-6828: netfs_sreq: R=0000000c[1] DOWN FREE  f=602 s=0 400/400 s=0 e=-11
         diff-6828: netfs_rreq_ref: R=0000000c PUT SUBREQ  r=3
         diff-6828: netfs_collect_sreq: R=0000000c[0:02] s=400 t=c00/c00
         diff-6828: netfs_sreq: R=0000000c[2] ZERO ABNDN f=002 s=400 c00/c00 s=1 e=0
         diff-6828: netfs_sreq: R=0000000c[2] ZERO FREE  f=002 s=400 c00/c00 s=1 e=0
         diff-6828: netfs_rreq_ref: R=0000000c PUT SUBREQ  r=2
         diff-6828: netfs_collect_stream: R=0000000c[0:] cto=1000 frn=ffffffff
         diff-6828: netfs_collect_state: R=0000000c col=1000 cln=1000 n=8c
         diff-6828: netfs_rreq: R=0000000c RP UNPAUSE f=0b
         diff-6828: netfs_collect_stream: R=0000000c[0:] cto=1000 frn=ffffffff
         diff-6828: netfs_collect_state: R=0000000c col=1000 cln=1000 n=8
         diff-6828: netfs_rreq: R=0000000c RP COMPLET f=0b
         diff-6828: netfs_rreq: R=0000000c RP WAKE-IP f=0a
         diff-6828: netfs_rreq: R=0000000c RP DONE    f=0a
         diff-6828: netfs_rreq_ref: R=0000000c PUT WORK IP  r=1
         diff-6828: netfs_rreq: R=0000000c RP DONE-IP f=0a
         diff-6828: netfs_rreq_ref: R=0000000c PUT RETURN  r=0
kworker/u16:0-12  : netfs_rreq: R=0000000c RP FREE    f=0a


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
  2025-06-24 12:25 ` [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter() Stefan Metzmacher
@ 2025-06-25 14:18   ` Stefan Metzmacher
  2025-06-25 16:00   ` David Howells
  1 sibling, 0 replies; 10+ messages in thread
From: Stefan Metzmacher @ 2025-06-25 14:18 UTC (permalink / raw)
  To: David Howells
  Cc: linux-cifs@vger.kernel.org, netfs, linux-fsdevel, Steve French

Am 24.06.25 um 14:25 schrieb Stefan Metzmacher:
> Hi David,
> 
> this looks very useful! Just a few comments below...
> 
>> Collapse smbd_recv_buf() and smbd_recv_page() into smbd_recv() and just use
>> copy_to_iter() instead of memcpy().
>>
>> Signed-off-by: David Howells <dhowells@redhat.com>
>> cc: Steve French <stfrench@microsoft.com>
>> cc: Tom Talpey <tom@talpey.com>
>> cc: Stefan Metzmacher <metze@samba.org>
>> cc: Paulo Alcantara (Red Hat) <pc@manguebit.com>
>> cc: Matthew Wilcox <willy@infradead.org>
>> cc: linux-cifs@vger.kernel.org
>> cc: netfs@lists.linux.dev
>> cc: linux-fsdevel@vger.kernel.org
>> ---
>>   fs/smb/client/smbdirect.c |  116 +++++++---------------------------------------
>>   1 file changed, 20 insertions(+), 96 deletions(-)
>>
>> diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c
>> index 5ae847919da5..dc64c337aae0 100644
>> --- a/fs/smb/client/smbdirect.c
>> +++ b/fs/smb/client/smbdirect.c
>> @@ -1747,35 +1747,39 @@ struct smbd_connection *smbd_get_connection(
>>   }
>>   /*
>> - * Receive data from receive reassembly queue
>> + * Receive data from the transport's receive reassembly queue
>>    * All the incoming data packets are placed in reassembly queue
>> - * buf: the buffer to read data into
>> + * iter: the buffer to read data into
>>    * size: the length of data to read
>>    * return value: actual data read
>> - * Note: this implementation copies the data from reassebmly queue to receive
>> + *
>> + * Note: this implementation copies the data from reassembly queue to receive
>>    * buffers used by upper layer. This is not the optimal code path. A better way
>>    * to do it is to not have upper layer allocate its receive buffers but rather
>>    * borrow the buffer from reassembly queue, and return it after data is
>>    * consumed. But this will require more changes to upper layer code, and also
>>    * need to consider packet boundaries while they still being reassembled.
>>    */
>> -static int smbd_recv_buf(struct smbd_connection *info, char *buf,
>> -        unsigned int size)
>> +int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
>>   {
>>       struct smbdirect_socket *sc = &info->socket;
>>       struct smbd_response *response;
>>       struct smbdirect_data_transfer *data_transfer;
>> +    size_t size = msg->msg_iter.count;
> 
> I think this should be iov_iter_count()?
> 
>>       int to_copy, to_read, data_read, offset;
>>       u32 data_length, remaining_data_length, data_offset;
>>       int rc;
>> +    if (WARN_ON_ONCE(iov_iter_rw(&msg->msg_iter) == WRITE))
>> +        return -EINVAL; /* It's a bug in upper layer to get there */
>> +
>>   again:
>>       /*
>>        * No need to hold the reassembly queue lock all the time as we are
>>        * the only one reading from the front of the queue. The transport
>>        * may add more entries to the back of the queue at the same time
>>        */
>> -    log_read(INFO, "size=%d info->reassembly_data_length=%d\n", size,
>> +    log_read(INFO, "size=%zd info->reassembly_data_length=%d\n", size,
>>           info->reassembly_data_length);
>>       if (info->reassembly_data_length >= size) {
>>           int queue_length;
>> @@ -1811,9 +1815,12 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
>>                * transport layer is added
>>                */
>>               if (response->first_segment && size == 4) {
>> -                unsigned int rfc1002_len =
>> +                unsigned int len =

Please keep the rfc1002_len variable as it's used in the log_read message below
and it should by host byteorder.

I'd propose a diff like this:

@@ -1846,8 +1850,11 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
                         if (response->first_segment && size == 4) {
                                 unsigned int rfc1002_len =
                                         data_length + remaining_data_length;
-                               *((__be32 *)buf) = cpu_to_be32(rfc1002_len);
+                               __be32 rfc1002_hdr = cpu_to_be32(rfc1002_len);
                                 data_read = 4;
+                               if (copy_to_iter(&rfc1002_hdr, sizeof(rfc1002_hdr),
+                                                &msg->msg_iter) != data_read)
+                                       return -EFAULT;
                                 response->first_segment = false;
                                 log_read(INFO, "returning rfc1002 length %d\n",
                                         rfc1002_len);


>>                       data_length + remaining_data_length;
>> -                *((__be32 *)buf) = cpu_to_be32(rfc1002_len);
>> +                __be32 rfc1002_len = cpu_to_be32(len);
>> +                if (copy_to_iter(&rfc1002_len, sizeof(rfc1002_len),
>> +                         &msg->msg_iter) != sizeof(rfc1002_len))
>> +                    return -EFAULT;
>>                   data_read = 4;
>>                   response->first_segment = false;
>>                   log_read(INFO, "returning rfc1002 length %d\n",
>> @@ -1822,10 +1829,9 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
>>               }
>>               to_copy = min_t(int, data_length - offset, to_read);
>> -            memcpy(
>> -                buf + data_read,
>> -                (char *)data_transfer + data_offset + offset,
>> -                to_copy);
>> +            if (copy_to_iter((char *)data_transfer + data_offset + offset,
>> +                     to_copy, &msg->msg_iter) != to_copy)
>> +                return -EFAULT;
>>               /* move on to the next buffer? */
>>               if (to_copy == data_length - offset) {
>> @@ -1870,6 +1876,8 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
>>                data_read, info->reassembly_data_length,
>>                info->first_entry_offset);
>>   read_rfc1002_done:
>> +        /* SMBDirect will read it all or nothing */
>> +        msg->msg_iter.count = 0;
> 
> And this iov_iter_truncate(0);
> 
> While I'm wondering why we had this at all.
> 
> It seems all callers of cifs_read_iter_from_socket()
> don't care and the code path via sock_recvmsg() doesn't
> truncate it just calls copy_to_iter() via this chain:
> ->inet_recvmsg->tcp_recvmsg->skb_copy_datagram_msg->skb_copy_datagram_iter
> ->simple_copy_to_iter->copy_to_iter()
> 
> I think the old code should have called
> iov_iter_advance(rc) instead of msg->msg_iter.count = 0.
> 
> But the new code doesn't need it as copy_to_iter()
> calls iterate_and_advance().
> 
>>           return data_read;
>>       }
>> @@ -1890,90 +1898,6 @@ static int smbd_recv_buf(struct smbd_connection *info, char *buf,
>>       goto again;
>>   }
>> -/*
>> - * Receive a page from receive reassembly queue
>> - * page: the page to read data into
>> - * to_read: the length of data to read
>> - * return value: actual data read
>> - */
>> -static int smbd_recv_page(struct smbd_connection *info,
>> -        struct page *page, unsigned int page_offset,
>> -        unsigned int to_read)
>> -{
>> -    struct smbdirect_socket *sc = &info->socket;
>> -    int ret;
>> -    char *to_address;
>> -    void *page_address;
>> -
>> -    /* make sure we have the page ready for read */
>> -    ret = wait_event_interruptible(
>> -        info->wait_reassembly_queue,
>> -        info->reassembly_data_length >= to_read ||
>> -            sc->status != SMBDIRECT_SOCKET_CONNECTED);
>> -    if (ret)
>> -        return ret;
>> -
>> -    /* now we can read from reassembly queue and not sleep */
>> -    page_address = kmap_atomic(page);
>> -    to_address = (char *) page_address + page_offset;
>> -
>> -    log_read(INFO, "reading from page=%p address=%p to_read=%d\n",
>> -        page, to_address, to_read);
>> -
>> -    ret = smbd_recv_buf(info, to_address, to_read);
>> -    kunmap_atomic(page_address);
>> -
>> -    return ret;
>> -}
>> -
>> -/*
>> - * Receive data from transport
>> - * msg: a msghdr point to the buffer, can be ITER_KVEC or ITER_BVEC
>> - * return: total bytes read, or 0. SMB Direct will not do partial read.
>> - */
>> -int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
>> -{
>> -    char *buf;
>> -    struct page *page;
>> -    unsigned int to_read, page_offset;
>> -    int rc;
>> -
>> -    if (iov_iter_rw(&msg->msg_iter) == WRITE) {
>> -        /* It's a bug in upper layer to get there */
>> -        cifs_dbg(VFS, "Invalid msg iter dir %u\n",
>> -             iov_iter_rw(&msg->msg_iter));
>> -        rc = -EINVAL;
>> -        goto out;
>> -    }
>> -
>> -    switch (iov_iter_type(&msg->msg_iter)) {
>> -    case ITER_KVEC:
>> -        buf = msg->msg_iter.kvec->iov_base;
>> -        to_read = msg->msg_iter.kvec->iov_len;
>> -        rc = smbd_recv_buf(info, buf, to_read);
>> -        break;
>> -
>> -    case ITER_BVEC:
>> -        page = msg->msg_iter.bvec->bv_page;
>> -        page_offset = msg->msg_iter.bvec->bv_offset;
>> -        to_read = msg->msg_iter.bvec->bv_len;
>> -        rc = smbd_recv_page(info, page, page_offset, to_read);
>> -        break;
>> -
>> -    default:
>> -        /* It's a bug in upper layer to get there */
>> -        cifs_dbg(VFS, "Invalid msg type %d\n",
>> -             iov_iter_type(&msg->msg_iter));
>> -        rc = -EINVAL;
>> -    }
> 
> I guess this is actually a real fix as I just saw
> CIFS: VFS: Invalid msg type 4
> in logs while running the cifs/001 test.
> And 4 is ITER_FOLIOQ.
> 
> So there might be something broken when ITER_FOLIOQ was
> introduced, but I wasn't able to find a specific commit.
> Maybe it was also already broken when using
> smb3 encryption over smbdirect, when ITER_XARRAY was still used.
> 
> metze
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter()
  2025-06-24 12:25 ` [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter() Stefan Metzmacher
  2025-06-25 14:18   ` Stefan Metzmacher
@ 2025-06-25 16:00   ` David Howells
  1 sibling, 0 replies; 10+ messages in thread
From: David Howells @ 2025-06-25 16:00 UTC (permalink / raw)
  To: Stefan Metzmacher
  Cc: dhowells, linux-cifs@vger.kernel.org, netfs, linux-fsdevel,
	Steve French

Stefan Metzmacher <metze@samba.org> wrote:

> Please keep the rfc1002_len variable as it's used in the log_read message
> below and it should by host byteorder.

So this change on top of the patch I posted?

@@ -1838,11 +1838,11 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
                         * transport layer is added
                         */
                        if (response->first_segment && size == 4) {
-                               unsigned int len =
+                               unsigned int rfc1002_len =
                                        data_length + remaining_data_length;
-                               __be32 rfc1002_len = cpu_to_be32(len);
-                               if (copy_to_iter(&rfc1002_len, sizeof(rfc1002_len),
-                                                &msg->msg_iter) != sizeof(rfc1002_len))
+                               __be32 rfc1002_hdr = cpu_to_be32(rfc1002_len);
+                               if (copy_to_iter(&rfc1002_hdr, sizeof(rfc1002_hdr),
+                                                &msg->msg_iter) != sizeof(rfc1002_hdr))
                                        return -EFAULT;
                                data_read = 4;
                                response->first_segment = false;

Btw, I'm changing the patch subject and description to:

    cifs: Fix reading into an ITER_FOLIOQ from the smbdirect code
    
    When performing a file read from RDMA, smbd_recv() prints an "Invalid msg
    type 4" error and fails the I/O.  This is due to the switch-statement there
    not handling the ITER_FOLIOQ handed down from netfslib.
    
    Fix this by collapsing smbd_recv_buf() and smbd_recv_page() into
    smbd_recv() and just using copy_to_iter() instead of memcpy().  This
    future-proofs the function too, in case more ITER_* types are added.
    
    Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
    Reported-by: Stefan Metzmacher <metze@samba.org>
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Steve French <stfrench@microsoft.com>
    cc: Tom Talpey <tom@talpey.com>
    cc: Paulo Alcantara (Red Hat) <pc@manguebit.com>
    cc: Matthew Wilcox <willy@infradead.org>
    cc: linux-cifs@vger.kernel.org
    cc: netfs@lists.linux.dev
    cc: linux-fsdevel@vger.kernel.org

David


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-06-25 16:00 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1107690.1750683895@warthog.procyon.org.uk>
2025-06-24 12:25 ` [PATCH] cifs: Collapse smbd_recv_*() into smbd_recv() and just use copy_to_iter() Stefan Metzmacher
2025-06-25 14:18   ` Stefan Metzmacher
2025-06-25 16:00   ` David Howells
2025-06-24 14:22 ` David Howells
2025-06-24 16:05   ` Stefan Metzmacher
2025-06-25  8:07   ` Stefan Metzmacher
2025-06-25 10:10     ` Stefan Metzmacher
2025-06-25 11:25     ` David Howells
2025-06-25 11:51       ` Stefan Metzmacher
2025-06-25 12:47 ` David Howells

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).