* Re: RPC-with-TLS client does not receive traffic
       [not found] <0288b61b-6a8e-409d-8e4c-3f482526cf46@oracle.com>
@ 2025-05-15 14:44 ` Chuck Lever
  2025-05-15 15:02   ` Hannes Reinecke
  0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2025-05-15 14:44 UTC
  To: Jakub Kicinski, Sabrina Dubroca
  Cc: netdev, Steve Sears, Thomas Haynes, Linux NFS Mailing List,
	kernel-tls-handshake

Resending with linux-nfs and kernel-tls-handshake on Cc


On 5/15/25 10:35 AM, Chuck Lever wrote:
> Hi -
>
> I'm troubleshooting an issue where, after a successful handshake, the
> kernel TLS socket's data_ready callback is never invoked. I'm able to
> reproduce this 100% on an Atom-based system with a Realtek Ethernet
> device. But on many other systems, the problem is intermittent or not
> reproducible.
>
> The problem seems to be that strp->msg_ready is already set when
> tls_data_ready is called, and that prevents any further processing. I
> see that msg_ready is set when the handshake daemon sets the ktls
> security parameters, and is then never cleared.
>
> function: tls_setsockopt
> function: do_tls_setsockopt_conf
> function: tls_set_device_offload_rx
> function: tls_set_sw_offload
> function: init_prot_info
> function: tls_strp_init
> function: tls_sw_strparser_arm
> function: tls_strp_check_rcv
> function: tls_strp_read_sock
> function: tls_strp_load_anchor_with_queue
> function: tls_rx_msg_size
> function: tls_device_rx_resync_new_rec
> function: tls_rx_msg_ready
>
> For a working system (a VMware guest using a VMXNet device), setsockopt
> leaves msg_ready set to zero:
>
> function: tls_setsockopt
> function: do_tls_setsockopt_conf
> function: tls_set_device_offload_rx
> function: tls_set_sw_offload
> function: init_prot_info
> function: tls_strp_init
> function: tls_sw_strparser_arm
> function: tls_strp_check_rcv
>
> The first tls_data_ready call then handles the waiting ingress data as
> expected.
>
> Any advice is appreciated.
>

-- 
Chuck Lever
* Re: RPC-with-TLS client does not receive traffic
  2025-05-15 14:44 ` RPC-with-TLS client does not receive traffic Chuck Lever
@ 2025-05-15 15:02   ` Hannes Reinecke
  2025-05-15 15:05     ` Chuck Lever
  0 siblings, 1 reply; 6+ messages in thread
From: Hannes Reinecke @ 2025-05-15 15:02 UTC
  To: Chuck Lever, Jakub Kicinski, Sabrina Dubroca
  Cc: netdev, Steve Sears, Thomas Haynes, Linux NFS Mailing List,
	kernel-tls-handshake

On 5/15/25 16:44, Chuck Lever wrote:
> Resending with linux-nfs and kernel-tls-handshake on Cc
>
>
> On 5/15/25 10:35 AM, Chuck Lever wrote:
>> Hi -
>>
>> I'm troubleshooting an issue where, after a successful handshake, the
>> kernel TLS socket's data_ready callback is never invoked. I'm able to
>> reproduce this 100% on an Atom-based system with a Realtek Ethernet
>> device. But on many other systems, the problem is intermittent or not
>> reproducible.
>>
>> The problem seems to be that strp->msg_ready is already set when
>> tls_data_ready is called, and that prevents any further processing. I
>> see that msg_ready is set when the handshake daemon sets the ktls
>> security parameters, and is then never cleared.
>>
>> function: tls_setsockopt
>> function: do_tls_setsockopt_conf
>> function: tls_set_device_offload_rx
>> function: tls_set_sw_offload
>> function: init_prot_info
>> function: tls_strp_init
>> function: tls_sw_strparser_arm
>> function: tls_strp_check_rcv
>> function: tls_strp_read_sock
>> function: tls_strp_load_anchor_with_queue
>> function: tls_rx_msg_size
>> function: tls_device_rx_resync_new_rec
>> function: tls_rx_msg_ready
>>
>> For a working system (a VMware guest using a VMXNet device), setsockopt
>> leaves msg_ready set to zero:
>>
>> function: tls_setsockopt
>> function: do_tls_setsockopt_conf
>> function: tls_set_device_offload_rx
>> function: tls_set_sw_offload
>> function: init_prot_info
>> function: tls_strp_init
>> function: tls_sw_strparser_arm
>> function: tls_strp_check_rcv
>>
>> The first tls_data_ready call then handles the waiting ingress data as
>> expected.
>>
>> Any advice is appreciated.
>>
>
I _think_ you are expected to set the callbacks prior to do the tls
handshake upcall (at least, that's what I'm doing).
It's not that you can (nor should) receive anything on the socket
while the handshake is active.
If it fails you can always reset them to the original callbacks.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: RPC-with-TLS client does not receive traffic
  2025-05-15 15:02   ` Hannes Reinecke
@ 2025-05-15 15:05     ` Chuck Lever
  2025-05-16 23:27       ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2025-05-15 15:05 UTC
  To: Hannes Reinecke, Jakub Kicinski, Sabrina Dubroca
  Cc: netdev, Steve Sears, Thomas Haynes, Linux NFS Mailing List,
	kernel-tls-handshake

On 5/15/25 11:02 AM, Hannes Reinecke wrote:
> On 5/15/25 16:44, Chuck Lever wrote:
>> Resending with linux-nfs and kernel-tls-handshake on Cc
>>
>>
>> On 5/15/25 10:35 AM, Chuck Lever wrote:
>>> Hi -
>>>
>>> I'm troubleshooting an issue where, after a successful handshake, the
>>> kernel TLS socket's data_ready callback is never invoked. I'm able to
>>> reproduce this 100% on an Atom-based system with a Realtek Ethernet
>>> device. But on many other systems, the problem is intermittent or not
>>> reproducible.
>>>
>>> The problem seems to be that strp->msg_ready is already set when
>>> tls_data_ready is called, and that prevents any further processing. I
>>> see that msg_ready is set when the handshake daemon sets the ktls
>>> security parameters, and is then never cleared.
>>>
>>> function: tls_setsockopt
>>> function: do_tls_setsockopt_conf
>>> function: tls_set_device_offload_rx
>>> function: tls_set_sw_offload
>>> function: init_prot_info
>>> function: tls_strp_init
>>> function: tls_sw_strparser_arm
>>> function: tls_strp_check_rcv
>>> function: tls_strp_read_sock
>>> function: tls_strp_load_anchor_with_queue
>>> function: tls_rx_msg_size
>>> function: tls_device_rx_resync_new_rec
>>> function: tls_rx_msg_ready
>>>
>>> For a working system (a VMware guest using a VMXNet device), setsockopt
>>> leaves msg_ready set to zero:
>>>
>>> function: tls_setsockopt
>>> function: do_tls_setsockopt_conf
>>> function: tls_set_device_offload_rx
>>> function: tls_set_sw_offload
>>> function: init_prot_info
>>> function: tls_strp_init
>>> function: tls_sw_strparser_arm
>>> function: tls_strp_check_rcv
>>>
>>> The first tls_data_ready call then handles the waiting ingress data as
>>> expected.
>>>
>>> Any advice is appreciated.
>>>
>>
> I _think_ you are expected to set the callbacks prior to do the tls
> handshake upcall (at least, that's what I'm doing).
> It's not that you can (nor should) receive anything on the socket
> while the handshake is active.
> If it fails you can always reset them to the original callbacks.

It looks to me like the socket callbacks are set up correctly. If I
apply a patch to remove the msg_ready optimization from tls_data_ready,
everything works as expected.

diff --git a/net/tls/tls_strp.c b/net/tls/tls_strp.c
index 77e33e1e340e..0440391dc476 100644
--- a/net/tls/tls_strp.c
+++ b/net/tls/tls_strp.c
@@ -537,7 +537,7 @@ static int tls_strp_read_sock(struct tls_strparser *strp)
 
 void tls_strp_check_rcv(struct tls_strparser *strp)
 {
-	if (unlikely(strp->stopped) || strp->msg_ready)
+	if (unlikely(strp->stopped))
 		return;
 
 	if (tls_strp_read_sock(strp) == -ENOMEM)

-- 
Chuck Lever
* Re: RPC-with-TLS client does not receive traffic
  2025-05-15 15:05     ` Chuck Lever
@ 2025-05-16 23:27       ` Jakub Kicinski
       [not found]         ` <8ABF3663-1BDD-4B87-8DA5-AB39774B1B89@oracle.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-05-16 23:27 UTC
  To: Chuck Lever
  Cc: Hannes Reinecke, Sabrina Dubroca, netdev, Steve Sears,
	Thomas Haynes, Linux NFS Mailing List, kernel-tls-handshake

On Thu, 15 May 2025 11:05:21 -0400 Chuck Lever wrote:
> >>> The first tls_data_ready call then handles the waiting ingress data as
> >>> expected.
> >
> > I _think_ you are expected to set the callbacks prior to do the tls
> > handshake upcall (at least, that's what I'm doing).
> > It's not that you can (nor should) receive anything on the socket
> > while the handshake is active.
> > If it fails you can always reset them to the original callbacks.
>
> It looks to me like the socket callbacks are set up correctly. If I
> apply a patch to remove the msg_ready optimization from tls_data_ready,
> everything works as expected.

The thinking is that we can stop reporting "data ready" once we have
a data record, because reader must check for pre-existing data when
starting to monitor the socket. I suspect when you say "everything
works as expected" you mean that the next chunk of data coming in
wakes the reader and reader catches up?

Could you point me to the exact code path that handles the callback
installation? Does it handle a socket with data in rcvq already?
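The contract described in the message above (notifications stop once a record is parked, so a reader that attaches later has to look for pre-existing data itself) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `model_sock`, `model_ingress`, `model_start_monitoring` and `model_run` are hypothetical names invented for illustration, and the only behaviour borrowed from the thread is that a set `msg_ready` suppresses further "data ready" reports.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a socket whose parser reports "data ready" only
 * once per parked record. Not a kernel structure. */
struct model_sock {
	bool msg_ready;                          /* a record is parsed and waiting */
	void (*data_ready)(struct model_sock *); /* reader's callback, may be NULL */
	int wakeups;                             /* times the reader was woken */
};

static void model_reader_wake(struct model_sock *sk)
{
	sk->wakeups++;
}

/* Ingress event: suppressed while a record is already parked. */
static void model_ingress(struct model_sock *sk)
{
	if (sk->msg_ready)
		return;
	sk->msg_ready = true;
	if (sk->data_ready)
		sk->data_ready(sk);
}

/* Install the callback, then optionally check for a record that was
 * parked before the reader attached. */
static void model_start_monitoring(struct model_sock *sk, bool check_existing)
{
	sk->data_ready = model_reader_wake;
	if (check_existing && sk->msg_ready)
		sk->data_ready(sk);
}

/* Scenario: a record arrives before the reader attaches, then more data
 * arrives afterwards. Returns how often the reader was woken. */
static int model_run(bool check_existing)
{
	struct model_sock sk = { 0 };

	model_ingress(&sk);            /* record parked, nobody listening yet */
	assert(sk.msg_ready);
	model_start_monitoring(&sk, check_existing);
	model_ingress(&sk);            /* suppressed: record is still parked */
	return sk.wakeups;
}
```

In this model `model_run(false)` leaves the reader with zero wakeups, while `model_run(true)` wakes it once when it attaches. Whether the real reader performs such a check is exactly the question being asked in the thread.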
[parent not found: <8ABF3663-1BDD-4B87-8DA5-AB39774B1B89@oracle.com>]
[parent not found: <20250516165355.6efb470e@kernel.org>]
* Re: RPC-with-TLS client does not receive traffic
       [not found]           ` <20250516165355.6efb470e@kernel.org>
@ 2025-05-17 16:39             ` Chuck Lever
  2025-05-19 23:01               ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2025-05-17 16:39 UTC
  To: Jakub Kicinski
  Cc: netdev, Steve Sears, Thomas Haynes, kernel-tls-handshake,
	Sabrina Dubroca

On 5/16/25 7:53 PM, Jakub Kicinski wrote:
> On Fri, 16 May 2025 23:38:18 +0000 Chuck Lever III wrote:
>>> On Thu, 15 May 2025 11:05:21 -0400 Chuck Lever wrote:
>>>> It looks to me like the socket callbacks are set up correctly. If I
>>>> apply a patch to remove the msg_ready optimization from tls_data_ready,
>>>> everything works as expected.
>>>
>>> The thinking is that we can stop reporting "data ready" once we have
>>> a data record, because reader must check for pre-existing data when
>>> starting to monitor the socket. I suspect when you say "everything
>>> works as expected" you mean that the next chunk of data coming in
>>> wakes the reader and reader catches up?
>>>
>>> Could you point me to the exact code path that handles the callback
>>> installation? Does it handle a socket with data in rcvq already?
>>
>> I’m away from my plaintext MUA at the moment, so HTML only, I’m afraid.
>>
>> xs_tcp_tls_finish_connecting() is where the data_ready callback address is modified.
>
> Hm, yes, my intuition would be to add a xs_poll_check_readable()
> after connection set up to check if we raced with data being queued?
>
> IIUC sk->sk_user_data is not set up when the first event fires
> so xs_data_ready() ignores it? We can't set user_data sooner?

I think the answer to this is that sunrpc never sees a data ready event.
The value contained in sk->sk_user_data is therefore irrelevant.

Because tls_setsockopt() sets strp->msg_ready, when the underlying
socket event arrives tls_data_ready() is a no-op. That terminates the
->data_ready call chain before xs_data_ready can be called.

The handshake daemon sets the session key by calling tls_setsockopt.
When it hangs:

function: tls_setsockopt
function: do_tls_setsockopt_conf
function: tls_set_device_offload_rx
function: tls_set_sw_offload
function: init_prot_info
function: tls_strp_init
function: tls_sw_strparser_arm
function: tls_strp_check_rcv
function: tls_strp_read_sock
function: tls_strp_load_anchor_with_queue
function: tls_rx_msg_size
function: tls_device_rx_resync_new_rec
function: tls_rx_msg_ready <<<<<

The next call to tls_data_ready sees strp->msg_ready is set, returns
without doing anything, and progress stops.

In the successful case, tls_strp_check_rcv() simply returns, leaving
strp->msg_ready set to zero. The next call to tls_data_ready can
then process the ingress data and call xs_data_ready.

-- 
Chuck Lever
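The gating described in the message above can be modelled in user space. The following is a minimal sketch, not kernel code: `model_strp`, `model_check_rcv` and `model_consume` are hypothetical names, and only the early-return condition is taken from the `tls_strp_check_rcv()` hunk quoted earlier in this thread. It further assumes that a parked record keeps `msg_ready` set until some reader consumes it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the parser state. Only the
 * "stopped" and "msg_ready" names come from the thread. */
struct model_strp {
	bool stopped;
	bool msg_ready;
	int queued_records;   /* complete records waiting in the receive queue */
	int upcalls;          /* times the next layer up was notified */
};

/* Mirrors the early return in the quoted hunk: nothing happens while a
 * record is already marked ready. */
static void model_check_rcv(struct model_strp *strp)
{
	if (strp->stopped || strp->msg_ready)
		return;

	if (strp->queued_records > 0) {
		strp->msg_ready = true;   /* a record is parsed and parked */
		strp->upcalls++;          /* the next layer is notified once */
	}
}

/* Assumed reader behaviour: consuming the parked record clears msg_ready
 * so that later events are processed again. */
static void model_consume(struct model_strp *strp)
{
	assert(strp->msg_ready);
	strp->queued_records--;
	strp->msg_ready = false;
}
```

If a record is already queued when the parser is armed, the first `model_check_rcv()` call parks it and notifies once; every later call is a no-op until `model_consume()` runs. In the failing trace that single notification appears to fire during `tls_setsockopt`, which would explain why no later event reaches the RPC layer.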
* Re: RPC-with-TLS client does not receive traffic
  2025-05-17 16:39             ` Chuck Lever
@ 2025-05-19 23:01               ` Jakub Kicinski
  0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2025-05-19 23:01 UTC
  To: Chuck Lever
  Cc: netdev, Steve Sears, Thomas Haynes, kernel-tls-handshake,
	Sabrina Dubroca

On Sat, 17 May 2025 12:39:58 -0400 Chuck Lever wrote:
> > Hm, yes, my intuition would be to add a xs_poll_check_readable()
> > after connection set up to check if we raced with data being queued?
> >
> > IIUC sk->sk_user_data is not set up when the first event fires
> > so xs_data_ready() ignores it? We can't set user_data sooner?
>
> I think the answer to this is that sunrpc never sees a data ready event.
> The value contained in sk->sk_user_data is therefore irrelevant.
>
> Because tls_setsockopt() sets strp->msg_ready, when the underlying
> socket event arrives tls_data_ready() is a no-op. That terminates the
> ->data_ready call chain before xs_data_ready can be called.
>
> The handshake daemon sets the session key by calling tls_setsockopt.
> When it hangs:
>
> function: tls_setsockopt
> function: do_tls_setsockopt_conf
> function: tls_set_device_offload_rx
> function: tls_set_sw_offload
> function: init_prot_info
> function: tls_strp_init
> function: tls_sw_strparser_arm
> function: tls_strp_check_rcv
> function: tls_strp_read_sock
> function: tls_strp_load_anchor_with_queue
> function: tls_rx_msg_size
> function: tls_device_rx_resync_new_rec
> function: tls_rx_msg_ready <<<<<
>
> The next call to tls_data_ready sees strp->msg_ready is set, returns
> without doing anything, and progress stops.
>
> In the successful case, tls_strp_check_rcv() simply returns, leaving
> strp->msg_ready set to zero. The next call to tls_data_ready can
> then process the ingress data and call xs_data_ready.

Is there any data queued on the TLS socket already when it "hangs" ?
If it's getting into msg_ready state without the data - it's a bug
in TLS. If there's a full record queued at the time when handshake
passes the socket back to the kernel - it's up to the reader to read
the already queued data out.