Re: [PATCH net] tls: Skip tls_append_frag on zero copy size

From: Maxim Mikityanskiy <maximmi@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Boris Pismenny <borisp@nvidia.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	Paolo Abeni <pabeni@redhat.com>, Tariq Toukan <tariqt@nvidia.com>,
	Aviad Yehezkel <aviadye@mellanox.com>,
	Ilya Lesokhin <ilyal@mellanox.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net] tls: Skip tls_append_frag on zero copy size
Date: Mon, 18 Apr 2022 17:56:26 +0300
Message-ID: <3c90d3cd-5224-4224-e9d9-e45546ce51c6@nvidia.com>
In-Reply-To: <20220414122808.09f31bfe@kernel.org>

On 2022-04-14 13:28, Jakub Kicinski wrote:
> On Wed, 13 Apr 2022 16:49:56 +0300 Maxim Mikityanskiy wrote:
>> Calling tls_append_frag when max_open_record_len == record->len might
>> add an empty fragment to the TLS record if the call happens to be on the
>> page boundary. Normally tls_append_frag coalesces the zero-sized
>> fragment to the previous one, but not if it's on page boundary.
>>
>> If a resync happens then, the mlx5 driver posts dump WQEs in
>> tx_post_resync_dump, and the empty fragment may become a data segment
>> with byte_count == 0, which will confuse the NIC and lead to a CQE
>> error.
>>
>> This commit fixes the described issue by skipping tls_append_frag on
>> zero size to avoid adding empty fragments. The fix is not in the driver,
>> because an empty fragment is hardly the desired behavior.
>>
>> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
>> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
>> ---
>>   net/tls/tls_device.c | 12 +++++++-----
>>   1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>> index 12f7b56771d9..af875ad4a822 100644
>> --- a/net/tls/tls_device.c
>> +++ b/net/tls/tls_device.c
>> @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk,
>>   		copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
>>   		copy = min_t(size_t, copy, (max_open_record_len - record->len));
>>   
>> -		rc = tls_device_copy_data(page_address(pfrag->page) +
>> -					  pfrag->offset, copy, msg_iter);
>> -		if (rc)
>> -			goto handle_error;
>> -		tls_append_frag(record, pfrag, copy);
>> +		if (copy) {
>> +			rc = tls_device_copy_data(page_address(pfrag->page) +
>> +						  pfrag->offset, copy, msg_iter);
>> +			if (rc)
>> +				goto handle_error;
>> +			tls_append_frag(record, pfrag, copy);
>> +		}
> 
> I appreciate you're likely trying to keep the fix minimal but Greg
> always says "fix it right, worry about backports later".
> 
> I think we should skip more, we can reorder the mins and if
> min(size, rec space) == 0 then we can skip the allocation as well.

Sorry, I didn't get the idea. Could you elaborate?

Reordering the mins:

copy = min_t(size_t, size, max_open_record_len - record->len);
copy = min_t(size_t, copy, pfrag->size - pfrag->offset);

I assume by skipping the allocation you mean skipping
tls_do_allocation(), right? Are you suggesting we skip it if the result
of the first min_t() is 0?

record->len used in the first min_t() comes from ctx->open_record, which
either exists or is allocated by tls_do_allocation(). If we move the
copy == 0 check above the tls_do_allocation() call, we'll first have to
check whether ctx->open_record is NULL, which tls_do_allocation()
currently checks itself.

If open_record is not NULL, there isn't much to skip in
tls_do_allocation() when copy == 0: the main part (allocating the
record) is already skipped, regardless of the value of copy. If
open_record is NULL, we can't skip tls_do_allocation(), and copy won't
be 0 afterwards.
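
For reference, a minimal userspace sketch of the two min orderings (plain
C, not kernel code; min_sz, copy_frag_first, copy_rec_first and
check_orders_agree are hypothetical names used only for illustration).
It shows that the final copy is the same in either order, and that the
reordering only changes which intermediate value is available for an
early check:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's min_t(size_t, a, b). */
static size_t min_sz(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Order used in the patch: page-frag space first, then record space. */
static size_t copy_frag_first(size_t size, size_t frag_space, size_t rec_space)
{
	size_t copy = min_sz(size, frag_space);

	return min_sz(copy, rec_space);
}

/*
 * Reordered: record space first. The intermediate value is stored in
 * *first so a caller could test it before touching the page frag.
 */
static size_t copy_rec_first(size_t size, size_t frag_space, size_t rec_space,
			     size_t *first)
{
	*first = min_sz(size, rec_space);

	return min_sz(*first, frag_space);
}

/* Both orders agree on the final copy; only the intermediate differs. */
static void check_orders_agree(size_t size, size_t frag_space, size_t rec_space)
{
	size_t first;

	assert(copy_frag_first(size, frag_space, rec_space) ==
	       copy_rec_first(size, frag_space, rec_space, &first));
}
```

Here rec_space stands for max_open_record_len - record->len and
frag_space for pfrag->size - pfrag->offset. With rec_space == 0 the
intermediate value is already 0 before the page frag is consulted, but
computing rec_space still requires an open record.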

To compare, before (pseudocode):

tls_do_allocation {
     if (!ctx->open_record) {
         ALLOCATE RECORD
         // Now ctx->open_record is not NULL
     }
     if (!sk_page_frag_refill(sk, pfrag))
         return -ENOMEM
}
handle errors from tls_do_allocation
copy = min(size, pfrag->size - pfrag->offset)
copy = min(copy, max_open_record_len - ctx->open_record->len)
if (copy)
     copy data and append frag

After:

if (ctx->open_record) {
     copy = min(size, max_open_record_len - ctx->open_record->len)
     if (copy) {
         // You want to put this part of tls_do_allocation under if (copy)?
         if (!sk_page_frag_refill(sk, pfrag))
             handle errors
         copy = min(copy, pfrag->size - pfrag->offset)
         if (copy)
             copy data and append frag
     }
} else {
     ALLOCATE RECORD
     if (!sk_page_frag_refill(sk, pfrag))
         handle errors
     // Have to do this after the allocation anyway.
     copy = min(size, max_open_record_len - ctx->open_record->len)
     copy = min(copy, pfrag->size - pfrag->offset)
     if (copy)
         copy data and append frag
}

Either I totally don't get what you suggested, or it doesn't make sense
to me: we get one extra branch in the common path (a record is open and
copy is not 0), no change when there is no record, and more duplicated
code that is hard to factor out.
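
The branch counts can be checked with a small userspace model of the two
flows. This is only a sketch under stated assumptions: struct model_ctx
and all function names are hypothetical, the page-frag refill is assumed
to always succeed with non-zero space, and a freshly allocated record is
modelled as empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these are kernel types. */
struct model_ctx {
	bool   has_record;  /* models ctx->open_record != NULL */
	size_t record_len;  /* models ctx->open_record->len */
	size_t max_len;     /* models max_open_record_len */
	size_t frag_space;  /* models pfrag->size - pfrag->offset */
};

static size_t min_sz(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Models ALLOCATE RECORD; the new record is assumed to start empty. */
static void open_record(struct model_ctx *c)
{
	c->has_record = true;
	c->record_len = 0;
}

/* "Before" flow: allocation step first, then both mins, then if (copy). */
static size_t flow_before(struct model_ctx *c, size_t size, int *branches)
{
	size_t copy;

	assert(c->record_len <= c->max_len);
	(*branches)++;                  /* if (!ctx->open_record) */
	if (!c->has_record)
		open_record(c);
	(*branches)++;                  /* refill check, assumed to pass */
	copy = min_sz(size, c->frag_space);
	copy = min_sz(copy, c->max_len - c->record_len);
	(*branches)++;                  /* if (copy) */
	if (copy)
		c->record_len += copy;
	return copy;
}

/* "After" flow: record-space check hoisted above the refill. */
static size_t flow_after(struct model_ctx *c, size_t size, int *branches)
{
	size_t copy;

	assert(c->record_len <= c->max_len);
	(*branches)++;                  /* if (ctx->open_record) */
	if (c->has_record) {
		copy = min_sz(size, c->max_len - c->record_len);
		(*branches)++;          /* if (copy), record-space check */
		if (!copy)
			return 0;
		(*branches)++;          /* refill check, assumed to pass */
		copy = min_sz(copy, c->frag_space);
	} else {
		open_record(c);
		(*branches)++;          /* refill check, assumed to pass */
		copy = min_sz(size, c->max_len - c->record_len);
		copy = min_sz(copy, c->frag_space);
	}
	(*branches)++;                  /* if (copy), append check */
	if (copy)
		c->record_len += copy;
	return copy;
}
```

In this model both flows return the same copy. With an open record and a
non-zero copy, the "before" flow evaluates 3 conditions and the "after"
flow 4; with no open record both evaluate 3; only the full-record case
gets shorter (2 instead of 3).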

If I missed your idea, please explain in more detail.

> Maybe some application wants to do zero-length sends to flush the
> MSG_MORE and would benefit that way?

If it's a zero-length send, it means that size is 0 initially, and
max_open_record_len - ctx->open_record->len isn't 0 (otherwise the
record would have been closed in a previous iteration). So that doesn't
seem related to swapping the mins and skipping tls_do_allocation() on
copy == 0.
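
The two ways copy can end up 0 can be told apart with a short userspace
sketch (hypothetical names, not kernel code; page-frag space is assumed
to be non-zero after a successful refill, so it is left out):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for why copy is (or is not) zero. */
enum zero_copy_cause {
	COPY_NONZERO,
	COPY_ZERO_EMPTY_SEND,   /* size == 0 on entry */
	COPY_ZERO_RECORD_FULL,  /* record->len == max_open_record_len */
};

static enum zero_copy_cause classify(size_t size, size_t max_len,
				     size_t record_len)
{
	assert(record_len <= max_len);

	if (size == 0)
		return COPY_ZERO_EMPTY_SEND;
	if (max_len - record_len == 0)
		return COPY_ZERO_RECORD_FULL;
	return COPY_NONZERO;
}
```

A zero-length send falls into the first case whatever the order of the
mins; the empty-fragment problem described in the patch is the second
case, where size is non-zero but the record has no room left.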

Thanks,
Max

>>   		size -= copy;
>>   		if (!size) {
>
