From: Maxim Mikityanskiy <maximmi@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Boris Pismenny <borisp@nvidia.com>,
John Fastabend <john.fastabend@gmail.com>,
Daniel Borkmann <daniel@iogearbox.net>,
"David S. Miller" <davem@davemloft.net>,
Paolo Abeni <pabeni@redhat.com>, Tariq Toukan <tariqt@nvidia.com>,
Aviad Yehezkel <aviadye@mellanox.com>,
Ilya Lesokhin <ilyal@mellanox.com>,
netdev@vger.kernel.org
Subject: Re: [PATCH net] tls: Skip tls_append_frag on zero copy size
Date: Mon, 18 Apr 2022 17:56:26 +0300 [thread overview]
Message-ID: <3c90d3cd-5224-4224-e9d9-e45546ce51c6@nvidia.com> (raw)
In-Reply-To: <20220414122808.09f31bfe@kernel.org>
On 2022-04-14 13:28, Jakub Kicinski wrote:
> On Wed, 13 Apr 2022 16:49:56 +0300 Maxim Mikityanskiy wrote:
>> Calling tls_append_frag when max_open_record_len == record->len might
>> add an empty fragment to the TLS record if the call happens to be on the
>> page boundary. Normally tls_append_frag coalesces the zero-sized
>> fragment to the previous one, but not if it's on page boundary.
>>
>> If a resync happens then, the mlx5 driver posts dump WQEs in
>> tx_post_resync_dump, and the empty fragment may become a data segment
>> with byte_count == 0, which will confuse the NIC and lead to a CQE
>> error.
>>
>> This commit fixes the described issue by skipping tls_append_frag on
>> zero size to avoid adding empty fragments. The fix is not in the driver,
>> because an empty fragment is hardly the desired behavior.
>>
>> Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
>> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
>> ---
>> net/tls/tls_device.c | 12 +++++++-----
>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>> index 12f7b56771d9..af875ad4a822 100644
>> --- a/net/tls/tls_device.c
>> +++ b/net/tls/tls_device.c
>> @@ -483,11 +483,13 @@ static int tls_push_data(struct sock *sk,
>> copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
>> copy = min_t(size_t, copy, (max_open_record_len - record->len));
>>
>> - rc = tls_device_copy_data(page_address(pfrag->page) +
>> - pfrag->offset, copy, msg_iter);
>> - if (rc)
>> - goto handle_error;
>> - tls_append_frag(record, pfrag, copy);
>> + if (copy) {
>> + rc = tls_device_copy_data(page_address(pfrag->page) +
>> + pfrag->offset, copy, msg_iter);
>> + if (rc)
>> + goto handle_error;
>> + tls_append_frag(record, pfrag, copy);
>> + }
>
> I appreciate you're likely trying to keep the fix minimal but Greg
> always says "fix it right, worry about backports later".
>
> I think we should skip more, we can reorder the mins and if
> min(size, rec space) == 0 then we can skip the allocation as well.
Sorry, I didn't get the idea. Could you elaborate?
Reordering the mins:
copy = min_t(size_t, size, max_open_record_len - record->len);
copy = min_t(size_t, copy, pfrag->size - pfrag->offset);
I assume by skipping the allocation you mean skipping
tls_do_allocation(), right? Do you suggest to skip it if the result of
the first min_t() is 0?
record->len used in the first min_t() comes from ctx->open_record, which
either exists or is allocated by tls_do_allocation(). If we move the
copy == 0 check above the tls_do_allocation() call, first we'll have to
check whether ctx->open_record is NULL, which is currently checked by
tls_do_allocation() itself.
If open_record is not NULL, there isn't much to skip in
tls_do_allocation on copy == 0, the main part is already skipped,
regardless of the value of copy. If open_record is NULL, we can't skip
tls_do_allocation, and copy won't be 0 afterwards.
To compare, before (pseudocode):
tls_do_allocation {
if (!ctx->open_record)
ALLOCATE RECORD
Now ctx->open_record is not NULL
if (!sk_page_frag_refill(sk, pfrag))
return -ENOMEM
}
handle errors from tls_do_allocation
copy = min(size, pfrag->size - pfrag->offset)
copy = min(copy, max_open_record_len - ctx->open_record->len)
if (copy)
copy data and append frag
After:
if (ctx->open_record) {
copy = min(size, max_open_record_len - ctx->open_record->len)
if (copy) {
// You want to put this part of tls_do_allocation under if (copy)?
if (!sk_page_frag_refill(sk, pfrag))
handle errors
copy = min(copy, pfrag->size - pfrag->offset)
if (copy)
copy data and append frag
}
} else {
ALLOCATE RECORD
if (!sk_page_frag_refill(sk, pfrag))
handle errors
// Have to do this after the allocation anyway.
copy = min(size, max_open_record_len - ctx->open_record->len)
copy = min(copy, pfrag->size - pfrag->offset)
if (copy)
copy data and append frag
}
Either I totally don't get what you suggested, or it doesn't make sense
to me, because we have +1 branch in the common path when a record is
open and copy is not 0, no changes when there is no record, and more
repeating code hard to compress.
If I missed your idea, please explain in more details.
> Maybe some application wants to do zero-length sends to flush the
> MSG_MORE and would benefit that way?
If it's a zero-length send, it means that size is 0 initially, and
max_open_record_len - ctx->open_record->len isn't 0 (otherwise the
record would have been closed at a previous iteration). That doesn't
sound related to swapping the mins and skipping tls_do_allocation on
copy == 0.
Thanks,
Max
>> size -= copy;
>> if (!size) {
>
next prev parent reply other threads:[~2022-04-18 15:34 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-13 13:49 [PATCH net] tls: Skip tls_append_frag on zero copy size Maxim Mikityanskiy
2022-04-14 10:28 ` Jakub Kicinski
2022-04-18 14:56 ` Maxim Mikityanskiy [this message]
2022-04-21 9:47 ` Maxim Mikityanskiy
2022-04-22 14:55 ` Jakub Kicinski
2022-04-26 15:48 ` Maxim Mikityanskiy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3c90d3cd-5224-4224-e9d9-e45546ce51c6@nvidia.com \
--to=maximmi@nvidia.com \
--cc=aviadye@mellanox.com \
--cc=borisp@nvidia.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=ilyal@mellanox.com \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).