netdev.vger.kernel.org archive mirror
* RPC-with-TLS client does not receive traffic
@ 2025-05-15 14:35 Chuck Lever
  2025-05-15 14:44 ` Chuck Lever
  0 siblings, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2025-05-15 14:35 UTC (permalink / raw)
  To: Jakub Kicinski, Sabrina Dubroca; +Cc: netdev, Steve Sears, Thomas Haynes

Hi -

I'm troubleshooting an issue where, after a successful handshake, the
kernel TLS socket's data_ready callback is never invoked. I'm able to
reproduce this 100% on an Atom-based system with a Realtek Ethernet
device. But on many other systems, the problem is intermittent or not
reproducible.

The problem seems to be that strp->msg_ready is already set when
tls_data_ready is called, and that prevents any further processing. I
see that msg_ready is set when the handshake daemon sets the ktls
security parameters, and is then never cleared.

function:             tls_setsockopt
function:                do_tls_setsockopt_conf
function:                   tls_set_device_offload_rx
function:                   tls_set_sw_offload
function:                      init_prot_info
function:                      tls_strp_init
function:                   tls_sw_strparser_arm
function:                   tls_strp_check_rcv
function:                      tls_strp_read_sock
function:                         tls_strp_load_anchor_with_queue
function:                         tls_rx_msg_size
function:                            tls_device_rx_resync_new_rec
function:                         tls_rx_msg_ready

For a working system (a VMware guest using a VMXNet device), setsockopt
leaves msg_ready set to zero:

function:             tls_setsockopt
function:                do_tls_setsockopt_conf
function:                   tls_set_device_offload_rx
function:                   tls_set_sw_offload
function:                      init_prot_info
function:                      tls_strp_init
function:                   tls_sw_strparser_arm
function:                   tls_strp_check_rcv

The first tls_data_ready call then handles the waiting ingress data as
expected.
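
For illustration only, the stall described above can be modelled in
userspace. This is a minimal sketch, not kernel code: struct model_strp and
every model_* function are hypothetical names invented here, and the sketch
assumes that each arriving unit of data is one complete record and that
only the reader consuming a record clears msg_ready.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; NOT the kernel's struct tls_strparser. */
struct model_strp {
	bool stopped;
	bool msg_ready;   /* a parsed record is waiting for the reader */
	int queued;       /* complete records sitting in the receive queue */
	int parse_calls;  /* how many times the parser actually ran */
};

/* Models the early return: nothing happens while msg_ready is set. */
static void model_check_rcv(struct model_strp *strp)
{
	if (strp->stopped || strp->msg_ready)
		return;
	strp->parse_calls++;
	if (strp->queued > 0)
		strp->msg_ready = true;
}

/* Models the reader consuming the pending record. In this sketch it is
 * the only place msg_ready is cleared, after which parsing resumes. */
static void model_msg_done(struct model_strp *strp)
{
	if (!strp->msg_ready)
		return;
	strp->queued--;
	strp->msg_ready = false;
	model_check_rcv(strp);
}

/* Arms the parser (as at setsockopt time), then delivers `events`
 * data-ready events with no reader attached. Returns parser runs. */
static int model_stall(int queued_at_arm, int events)
{
	struct model_strp strp = { .queued = queued_at_arm };
	int i;

	model_check_rcv(&strp);
	for (i = 0; i < events; i++) {
		strp.queued++;
		model_check_rcv(&strp);
	}
	return strp.parse_calls;
}
```

With a record already queued at arm time, the parser runs once and every
later event is a no-op until something calls model_msg_done(). With an
empty queue at arm time, the first event after arming still reaches the
parser.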

Any advice is appreciated.

-- 
Chuck Lever



* Re: RPC-with-TLS client does not receive traffic
  2025-05-15 14:35 RPC-with-TLS client does not receive traffic Chuck Lever
@ 2025-05-15 14:44 ` Chuck Lever
  2025-05-15 15:02   ` Hannes Reinecke
  0 siblings, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2025-05-15 14:44 UTC (permalink / raw)
  To: Jakub Kicinski, Sabrina Dubroca
  Cc: netdev, Steve Sears, Thomas Haynes, Linux NFS Mailing List,
	kernel-tls-handshake

Resending with linux-nfs and kernel-tls-handshake on Cc


On 5/15/25 10:35 AM, Chuck Lever wrote:
> Hi -
> 
> I'm troubleshooting an issue where, after a successful handshake, the
> kernel TLS socket's data_ready callback is never invoked. I'm able to
> reproduce this 100% on an Atom-based system with a Realtek Ethernet
> device. But on many other systems, the problem is intermittent or not
> reproducible.
> 
> The problem seems to be that strp->msg_ready is already set when
> tls_data_ready is called, and that prevents any further processing. I
> see that msg_ready is set when the handshake daemon sets the ktls
> security parameters, and is then never cleared.
> 
> function:             tls_setsockopt
> function:                do_tls_setsockopt_conf
> function:                   tls_set_device_offload_rx
> function:                   tls_set_sw_offload
> function:                      init_prot_info
> function:                      tls_strp_init
> function:                   tls_sw_strparser_arm
> function:                   tls_strp_check_rcv
> function:                      tls_strp_read_sock
> function:                         tls_strp_load_anchor_with_queue
> function:                         tls_rx_msg_size
> function:                            tls_device_rx_resync_new_rec
> function:                         tls_rx_msg_ready
> 
> For a working system (a VMware guest using a VMXNet device), setsockopt
> leaves msg_ready set to zero:
> 
> function:             tls_setsockopt
> function:                do_tls_setsockopt_conf
> function:                   tls_set_device_offload_rx
> function:                   tls_set_sw_offload
> function:                      init_prot_info
> function:                      tls_strp_init
> function:                   tls_sw_strparser_arm
> function:                   tls_strp_check_rcv
> 
> The first tls_data_ready call then handles the waiting ingress data as
> expected.
> 
> Any advice is appreciated.
> 


-- 
Chuck Lever


* Re: RPC-with-TLS client does not receive traffic
  2025-05-15 14:44 ` Chuck Lever
@ 2025-05-15 15:02   ` Hannes Reinecke
  2025-05-15 15:05     ` Chuck Lever
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2025-05-15 15:02 UTC (permalink / raw)
  To: Chuck Lever, Jakub Kicinski, Sabrina Dubroca
  Cc: netdev, Steve Sears, Thomas Haynes, Linux NFS Mailing List,
	kernel-tls-handshake

On 5/15/25 16:44, Chuck Lever wrote:
> Resending with linux-nfs and kernel-tls-handshake on Cc
> 
> 
> On 5/15/25 10:35 AM, Chuck Lever wrote:
>> Hi -
>>
>> I'm troubleshooting an issue where, after a successful handshake, the
>> kernel TLS socket's data_ready callback is never invoked. I'm able to
>> reproduce this 100% on an Atom-based system with a Realtek Ethernet
>> device. But on many other systems, the problem is intermittent or not
>> reproducible.
>>
>> The problem seems to be that strp->msg_ready is already set when
>> tls_data_ready is called, and that prevents any further processing. I
>> see that msg_ready is set when the handshake daemon sets the ktls
>> security parameters, and is then never cleared.
>>
>> function:             tls_setsockopt
>> function:                do_tls_setsockopt_conf
>> function:                   tls_set_device_offload_rx
>> function:                   tls_set_sw_offload
>> function:                      init_prot_info
>> function:                      tls_strp_init
>> function:                   tls_sw_strparser_arm
>> function:                   tls_strp_check_rcv
>> function:                      tls_strp_read_sock
>> function:                         tls_strp_load_anchor_with_queue
>> function:                         tls_rx_msg_size
>> function:                            tls_device_rx_resync_new_rec
>> function:                         tls_rx_msg_ready
>>
>> For a working system (a VMware guest using a VMXNet device), setsockopt
>> leaves msg_ready set to zero:
>>
>> function:             tls_setsockopt
>> function:                do_tls_setsockopt_conf
>> function:                   tls_set_device_offload_rx
>> function:                   tls_set_sw_offload
>> function:                      init_prot_info
>> function:                      tls_strp_init
>> function:                   tls_sw_strparser_arm
>> function:                   tls_strp_check_rcv
>>
>> The first tls_data_ready call then handles the waiting ingress data as
>> expected.
>>
>> Any advice is appreciated.
>>
> 
I _think_ you are expected to set the callbacks prior to doing the TLS
handshake upcall (at least, that's what I'm doing).
It's not as if you can (or should) receive anything on the socket
while the handshake is active.
If it fails you can always reset them to the original callbacks.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


* Re: RPC-with-TLS client does not receive traffic
  2025-05-15 15:02   ` Hannes Reinecke
@ 2025-05-15 15:05     ` Chuck Lever
  2025-05-16 23:27       ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2025-05-15 15:05 UTC (permalink / raw)
  To: Hannes Reinecke, Jakub Kicinski, Sabrina Dubroca
  Cc: netdev, Steve Sears, Thomas Haynes, Linux NFS Mailing List,
	kernel-tls-handshake

On 5/15/25 11:02 AM, Hannes Reinecke wrote:
> On 5/15/25 16:44, Chuck Lever wrote:
>> Resending with linux-nfs and kernel-tls-handshake on Cc
>>
>>
>> On 5/15/25 10:35 AM, Chuck Lever wrote:
>>> Hi -
>>>
>>> I'm troubleshooting an issue where, after a successful handshake, the
>>> kernel TLS socket's data_ready callback is never invoked. I'm able to
>>> reproduce this 100% on an Atom-based system with a Realtek Ethernet
>>> device. But on many other systems, the problem is intermittent or not
>>> reproducible.
>>>
>>> The problem seems to be that strp->msg_ready is already set when
>>> tls_data_ready is called, and that prevents any further processing. I
>>> see that msg_ready is set when the handshake daemon sets the ktls
>>> security parameters, and is then never cleared.
>>>
>>> function:             tls_setsockopt
>>> function:                do_tls_setsockopt_conf
>>> function:                   tls_set_device_offload_rx
>>> function:                   tls_set_sw_offload
>>> function:                      init_prot_info
>>> function:                      tls_strp_init
>>> function:                   tls_sw_strparser_arm
>>> function:                   tls_strp_check_rcv
>>> function:                      tls_strp_read_sock
>>> function:                         tls_strp_load_anchor_with_queue
>>> function:                         tls_rx_msg_size
>>> function:                            tls_device_rx_resync_new_rec
>>> function:                         tls_rx_msg_ready
>>>
>>> For a working system (a VMware guest using a VMXNet device), setsockopt
>>> leaves msg_ready set to zero:
>>>
>>> function:             tls_setsockopt
>>> function:                do_tls_setsockopt_conf
>>> function:                   tls_set_device_offload_rx
>>> function:                   tls_set_sw_offload
>>> function:                      init_prot_info
>>> function:                      tls_strp_init
>>> function:                   tls_sw_strparser_arm
>>> function:                   tls_strp_check_rcv
>>>
>>> The first tls_data_ready call then handles the waiting ingress data as
>>> expected.
>>>
>>> Any advice is appreciated.
>>>
>>
> I _think_ you are expected to set the callbacks prior to doing the TLS
> handshake upcall (at least, that's what I'm doing).
> It's not as if you can (or should) receive anything on the socket
> while the handshake is active.
> If it fails you can always reset them to the original callbacks.

It looks to me like the socket callbacks are set up correctly. If I
apply a patch to remove the msg_ready optimization from tls_data_ready,
everything works as expected.

diff --git a/net/tls/tls_strp.c b/net/tls/tls_strp.c
index 77e33e1e340e..0440391dc476 100644
--- a/net/tls/tls_strp.c
+++ b/net/tls/tls_strp.c
@@ -537,7 +537,7 @@ static int tls_strp_read_sock(struct tls_strparser *strp)

 void tls_strp_check_rcv(struct tls_strparser *strp)
 {
-       if (unlikely(strp->stopped) || strp->msg_ready)
+       if (unlikely(strp->stopped))
                return;

        if (tls_strp_read_sock(strp) == -ENOMEM)


-- 
Chuck Lever


* Re: RPC-with-TLS client does not receive traffic
  2025-05-15 15:05     ` Chuck Lever
@ 2025-05-16 23:27       ` Jakub Kicinski
       [not found]         ` <8ABF3663-1BDD-4B87-8DA5-AB39774B1B89@oracle.com>
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2025-05-16 23:27 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Hannes Reinecke, Sabrina Dubroca, netdev, Steve Sears,
	Thomas Haynes, Linux NFS Mailing List, kernel-tls-handshake

On Thu, 15 May 2025 11:05:21 -0400 Chuck Lever wrote:
> >>> The first tls_data_ready call then handles the waiting ingress data as
> >>> expected.
> >
> > I _think_ you are expected to set the callbacks prior to doing the TLS
> > handshake upcall (at least, that's what I'm doing).
> > It's not as if you can (or should) receive anything on the socket
> > while the handshake is active.
> > If it fails you can always reset them to the original callbacks.  
> 
> It looks to me like the socket callbacks are set up correctly. If I
> apply a patch to remove the msg_ready optimization from tls_data_ready,
> everything works as expected.

The thinking is that we can stop reporting "data ready" once we have
a data record, because the reader must check for pre-existing data when
it starts to monitor the socket. I suspect when you say "everything
works as expected" you mean that the next chunk of data coming in
wakes the reader and the reader catches up?

Could you point me to the exact code path that handles the callback
installation? Does it handle a socket with data in rcvq already?
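
For illustration only, the "check for pre-existing data when starting to
monitor the socket" rule can be sketched in userspace. This is a
hypothetical model, not sunrpc or TLS code: struct model_sock and all
model_* functions are names invented here, and rcvq_records simply stands
in for "a record was queued before the callback was installed".

```c
#include <assert.h>
#include <stddef.h>

struct model_sock;
typedef void (*model_data_ready_fn)(struct model_sock *sk);

/* Hypothetical stand-in for a socket with a data-ready callback. */
struct model_sock {
	int rcvq_records;               /* records queued before install */
	model_data_ready_fn data_ready; /* currently installed callback */
	int reader_wakeups;             /* times the reader was notified */
};

/* The reader's callback: here it only counts notifications. */
static void model_reader_data_ready(struct model_sock *sk)
{
	sk->reader_wakeups++;
}

/* Install only: the reader depends on a future event to be woken. */
static void model_install_only(struct model_sock *sk)
{
	sk->data_ready = model_reader_data_ready;
}

/* Install, then check once for data that was queued before the
 * callback existed, so an event that already fired is not lost. */
static void model_install_and_check(struct model_sock *sk)
{
	sk->data_ready = model_reader_data_ready;
	if (sk->rcvq_records > 0)
		sk->data_ready(sk);
}
```

In this model, installing the callback without the follow-up check leaves
a pre-queued record unnoticed until another event arrives, which may never
happen if the layer below has stopped reporting.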


* Re: RPC-with-TLS client does not receive traffic
       [not found]           ` <20250516165355.6efb470e@kernel.org>
@ 2025-05-17 16:39             ` Chuck Lever
  2025-05-19 23:01               ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2025-05-17 16:39 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, Steve Sears, Thomas Haynes, kernel-tls-handshake,
	Sabrina Dubroca

On 5/16/25 7:53 PM, Jakub Kicinski wrote:
> On Fri, 16 May 2025 23:38:18 +0000 Chuck Lever III wrote:
>>> On Thu, 15 May 2025 11:05:21 -0400 Chuck Lever wrote:  
>>>> It looks to me like the socket callbacks are set up correctly. If I
>>>> apply a patch to remove the msg_ready optimization from tls_data_ready,
>>>> everything works as expected.  
>>>
>>> The thinking is that we can stop reporting "data ready" once we have
>>> a data record, because the reader must check for pre-existing data when
>>> it starts to monitor the socket. I suspect when you say "everything
>>> works as expected" you mean that the next chunk of data coming in
>>> wakes the reader and the reader catches up?
>>>
>>> Could you point me to the exact code path that handles the callback
>>> installation? Does it handle a socket with data in rcvq already?  
>>
>> I’m away from my plaintext MUA at the moment, so HTML only, I’m afraid.
>>
>> xs_tcp_tls_finish_connecting() is where the data_ready callback address is modified.
> 
> Hm, yes, my intuition would be to add a xs_poll_check_readable() 
> after connection set up to check if we raced with data being queued?
> 
> IIUC sk->sk_user_data is not set up when the first event fires
> so xs_data_ready() ignores it?  We can't set user_data sooner?

I think the answer to this is that sunrpc never sees a data ready event.
The value contained in sk->sk_user_data is therefore irrelevant.

Because tls_setsockopt() sets strp->msg_ready, when the underlying
socket event arrives tls_data_ready() is a no-op. That terminates the
 ->data_ready call chain before xs_data_ready can be called.

The handshake daemon sets the session key by calling tls_setsockopt.
When it hangs:

function:             tls_setsockopt
function:                do_tls_setsockopt_conf
function:                   tls_set_device_offload_rx
function:                   tls_set_sw_offload
function:                      init_prot_info
function:                      tls_strp_init
function:                   tls_sw_strparser_arm
function:                   tls_strp_check_rcv
function:                      tls_strp_read_sock
function:                         tls_strp_load_anchor_with_queue
function:                         tls_rx_msg_size
function:                            tls_device_rx_resync_new_rec
function:                         tls_rx_msg_ready    <<<<<

The next call to tls_data_ready sees strp->msg_ready is set, returns
without doing anything, and progress stops.

In the successful case, tls_strp_check_rcv() simply returns, leaving
strp->msg_ready set to zero. The next call to tls_data_ready can
then process the ingress data and call xs_data_ready.
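
For illustration only, the way the ->data_ready chain ends early can be
modelled in userspace. This is a hypothetical sketch, not kernel code: the
model_* names are invented here, each event is assumed to carry one
complete record, and no reader ever consumes a record in this model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a TLS socket with an upper-layer consumer. */
struct model_tls_sock {
	bool msg_ready;   /* set once a record has been parsed */
	int upper_calls;  /* times the upper-layer callback ran */
};

/* Stands in for the upper-layer callback (xs_data_ready in the thread). */
static void model_upper_data_ready(struct model_tls_sock *sk)
{
	sk->upper_calls++;
}

/* Stands in for the TLS-level callback: when msg_ready is already set,
 * it returns before the upper layer is notified. */
static void model_tls_data_ready(struct model_tls_sock *sk)
{
	if (sk->msg_ready)
		return;
	sk->msg_ready = true;
	model_upper_data_ready(sk);
}

/* Delivers `events` socket events and reports how many of them reached
 * the upper layer, given the msg_ready value left behind by setsockopt. */
static int model_upper_calls(bool ready_after_setsockopt, int events)
{
	struct model_tls_sock sk = { .msg_ready = ready_after_setsockopt };
	int i;

	for (i = 0; i < events; i++)
		model_tls_data_ready(&sk);
	return sk.upper_calls;
}
```

When msg_ready is already set after setsockopt, no event reaches the upper
layer in this model. When it starts at zero, the first event does, and
later ones are gated until a reader consumes the record.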


-- 
Chuck Lever


* Re: RPC-with-TLS client does not receive traffic
  2025-05-17 16:39             ` Chuck Lever
@ 2025-05-19 23:01               ` Jakub Kicinski
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Kicinski @ 2025-05-19 23:01 UTC (permalink / raw)
  To: Chuck Lever
  Cc: netdev, Steve Sears, Thomas Haynes, kernel-tls-handshake,
	Sabrina Dubroca

On Sat, 17 May 2025 12:39:58 -0400 Chuck Lever wrote:
> > Hm, yes, my intuition would be to add a xs_poll_check_readable() 
> > after connection set up to check if we raced with data being queued?
> > 
> > IIUC sk->sk_user_data is not set up when the first event fires
> > so xs_data_ready() ignores it?  We can't set user_data sooner?  
> 
> I think the answer to this is that sunrpc never sees a data ready event.
> The value contained in sk->sk_user_data is therefore irrelevant.
> 
> Because tls_setsockopt() sets strp->msg_ready, when the underlying
> socket event arrives tls_data_ready() is a no-op. That terminates the
>  ->data_ready call chain before xs_data_ready can be called.  
> 
> The handshake daemon sets the session key by calling tls_setsockopt.
> When it hangs:
> 
> function:             tls_setsockopt
> function:                do_tls_setsockopt_conf
> function:                   tls_set_device_offload_rx
> function:                   tls_set_sw_offload
> function:                      init_prot_info
> function:                      tls_strp_init
> function:                   tls_sw_strparser_arm
> function:                   tls_strp_check_rcv
> function:                      tls_strp_read_sock
> function:                         tls_strp_load_anchor_with_queue
> function:                         tls_rx_msg_size
> function:                            tls_device_rx_resync_new_rec
> function:                         tls_rx_msg_ready    <<<<<
> 
> The next call to tls_data_ready sees strp->msg_ready is set, returns
> without doing anything, and progress stops.
> 
> In the successful case, tls_strp_check_rcv() simply returns, leaving
> strp->msg_ready set to zero. The next call to tls_data_ready can
> then process the ingress data and call xs_data_ready.

Is there any data queued on the TLS socket already when it "hangs" ?
If it's getting into msg_ready state without the data - it's a bug 
in TLS. If there's a full record queued at the time when handshake
passes the socket back to the kernel - it's up to the reader to read
the already queued data out.
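
For illustration only, the two cases distinguished above can be written
down as a small decision function. This is a hypothetical sketch: the enum,
the function name and both parameters are invented here, and how to observe
"a full record is queued" on a live socket is left open.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical verdicts matching the two cases named above. */
enum model_verdict {
	MODEL_GATE_OPEN,          /* msg_ready clear: events still flow */
	MODEL_TLS_BUG,            /* msg_ready set with no record queued */
	MODEL_READER_MUST_DRAIN   /* msg_ready set, full record is queued */
};

/* Classifies a stalled socket from two observations taken at the point
 * where the handshake hands the socket back to the kernel consumer. */
static enum model_verdict model_triage(bool msg_ready,
				       bool full_record_queued)
{
	if (!msg_ready)
		return MODEL_GATE_OPEN;
	if (!full_record_queued)
		return MODEL_TLS_BUG;
	return MODEL_READER_MUST_DRAIN;
}
```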
