Re: [PATCH net] tls: Skip tls_append_frag on zero copy size

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Maxim Mikityanskiy <maximmi@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Boris Pismenny <borisp@nvidia.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	Paolo Abeni <pabeni@redhat.com>, Tariq Toukan <tariqt@nvidia.com>,
	Aviad Yehezkel <aviadye@mellanox.com>,
	Ilya Lesokhin <ilyal@mellanox.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net] tls: Skip tls_append_frag on zero copy size
Date: Mon, 18 Apr 2022 17:56:26 +0300	[thread overview]
Message-ID: <3c90d3cd-5224-4224-e9d9-e45546ce51c6@nvidia.com> (raw)
In-Reply-To: <20220414122808.09f31bfe@kernel.org>

On 2022-04-14 13:28, Jakub Kicinski wrote:
> On Wed, 13 Apr 2022 16:49:56 +0300 Maxim Mikityanskiy wrote:
>> Calling tls_append_frag when max_open_record_len == record->len might
>> add an empty fragment to the TLS record if the call happens to be on the
>> page boundary. Normally tls_append_frag coalesces the zero-sized
>> fragment to the previous one, but not if it's on page boundary.
>>
>> If a resync happens then, the mlx5 driver posts dump WQEs in
>> tx_post_resync_dump, and the empty fragment may become a data segment
>> with byte_count == 0, which will confuse the NIC and lead to a CQE
>> error.
>>
>> This commit fixes the described issue by skipping tls_append_frag on
>> zero size to avoid adding empty fragments. The fix is not in the driver,
>> because an empty fragment is hardly the desired behavior.
>>
>> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
>> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
>> ---
>>   net/tls/tls_device.c | 12 +++++++-----
>>   1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>> index 12f7b56771d9..af875ad4a822 100644
>> --- a/net/tls/tls_device.c
>> +++ b/net/tls/tls_device.c
>> @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk,
>>   		copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
>>   		copy = min_t(size_t, copy, (max_open_record_len - record->len));
>>   
>> -		rc = tls_device_copy_data(page_address(pfrag->page) +
>> -					  pfrag->offset, copy, msg_iter);
>> -		if (rc)
>> -			goto handle_error;
>> -		tls_append_frag(record, pfrag, copy);
>> +		if (copy) {
>> +			rc = tls_device_copy_data(page_address(pfrag->page) +
>> +						  pfrag->offset, copy, msg_iter);
>> +			if (rc)
>> +				goto handle_error;
>> +			tls_append_frag(record, pfrag, copy);
>> +		}
> 
> I appreciate you're likely trying to keep the fix minimal but Greg
> always says "fix it right, worry about backports later".
> 
> I think we should skip more, we can reorder the mins and if
> min(size, rec space) == 0 then we can skip the allocation as well.

Sorry, I didn't get the idea. Could you elaborate?

Reordering the mins:

copy = min_t(size_t, size, max_open_record_len - record->len);
copy = min_t(size_t, copy, pfrag->size - pfrag->offset);

I assume by skipping the allocation you mean skipping 
tls_do_allocation(), right? Do you suggest to skip it if the result of 
the first min_t() is 0?

record->len used in the first min_t() comes from ctx->open_record, which 
either exists or is allocated by tls_do_allocation(). If we move the 
copy == 0 check above the tls_do_allocation() call, first we'll have to 
check whether ctx->open_record is NULL, which is currently checked by 
tls_do_allocation() itself.

If open_record is not NULL, there isn't much to skip in 
tls_do_allocation on copy == 0, the main part is already skipped, 
regardless of the value of copy. If open_record is NULL, we can't skip 
tls_do_allocation, and copy won't be 0 afterwards.

To compare, before (pseudocode):

tls_do_allocation {
     if (!ctx->open_record)
         ALLOCATE RECORD
         Now ctx->open_record is not NULL
     if (!sk_page_frag_refill(sk, pfrag))
         return -ENOMEM
}
handle errors from tls_do_allocation
copy = min(size, pfrag->size - pfrag->offset)
copy = min(copy, max_open_record_len - ctx->open_record->len)
if (copy)
     copy data and append frag

After:

if (ctx->open_record) {
     copy = min(size, max_open_record_len - ctx->open_record->len)
     if (copy) {
         // You want to put this part of tls_do_allocation under if (copy)?
         if (!sk_page_frag_refill(sk, pfrag))
             handle errors
         copy = min(copy, pfrag->size - pfrag->offset)
         if (copy)
             copy data and append frag
     }
} else {
     ALLOCATE RECORD
     if (!sk_page_frag_refill(sk, pfrag))
         handle errors
     // Have to do this after the allocation anyway.
     copy = min(size, max_open_record_len - ctx->open_record->len)
     copy = min(copy, pfrag->size - pfrag->offset)
     if (copy)
         copy data and append frag
}

Either I totally don't get what you suggested, or it doesn't make sense 
to me, because we have +1 branch in the common path when a record is 
open and copy is not 0, no changes when there is no record, and more 
repeating code hard to compress.

If I missed your idea, please explain in more details.

> Maybe some application wants to do zero-length sends to flush the
> MSG_MORE and would benefit that way?

If it's a zero-length send, it means that size is 0 initially, and 
max_open_record_len - ctx->open_record->len isn't 0 (otherwise the 
record would have been closed at a previous iteration). That doesn't 
sound related to swapping the mins and skipping tls_do_allocation on 
copy == 0.

Thanks,
Max

>>   		size -= copy;
>>   		if (!size) {
>

next prev parent reply	other threads:[~2022-04-18 15:34 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-13 13:49 [PATCH net] tls: Skip tls_append_frag on zero copy size Maxim Mikityanskiy
2022-04-14 10:28 ` Jakub Kicinski
2022-04-18 14:56   ` Maxim Mikityanskiy [this message]
2022-04-21  9:47     ` Maxim Mikityanskiy
2022-04-22 14:55       ` Jakub Kicinski
2022-04-26 15:48         ` Maxim Mikityanskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3c90d3cd-5224-4224-e9d9-e45546ce51c6@nvidia.com \
    --to=maximmi@nvidia.com \
    --cc=aviadye@mellanox.com \
    --cc=borisp@nvidia.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=ilyal@mellanox.com \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=tariqt@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.