From: Brett Creeley <bcreeley@amd.com>
To: Jakub Kicinski <kuba@kernel.org>,
Shannon Nelson <shannon.nelson@amd.com>
Cc: netdev@vger.kernel.org, davem@davemloft.net,
andrew+netdev@lunn.ch, edumazet@google.com, pabeni@redhat.com,
brett.creeley@amd.com
Subject: Re: [PATCH net 2/2] pds_core: Add a retry mechanism when the adminq is full
Date: Fri, 31 Jan 2025 11:25:24 -0800 [thread overview]
Message-ID: <fb796fb9-90d9-4618-a228-e4a8aee8bde6@amd.com> (raw)
In-Reply-To: <20250129190326.456680c8@kernel.org>
On 1/29/2025 7:03 PM, Jakub Kicinski wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>
>
> On Tue, 28 Jan 2025 16:43:37 -0800 Shannon Nelson wrote:
>> If the adminq is full, the driver reports failure when trying to post
>> new adminq commands. This is a bit aggressive and unexpected because
>> technically the adminq post didn't fail in this case, it was just full.
>> To harden this path add support for a bounded retry mechanism.
>>
>> It's possible some commands take longer than expected, maybe hundreds
>> of milliseconds or seconds due to other processing on the device side,
>> so to further reduce the chance of failure due to adminq full increase
>> the PDS_CORE_DEVCMD_TIMEOUT from 5 to 10 seconds.
>>
>> The caller of pdsc_adminq_post() may still see -EAGAIN reported if the
>> space in the adminq never freed up. In this case they can choose to
>> call the function again or fail. For now, no callers will retry.
>
> How about a semaphore? You can initialize it to the number of slots
> in the queue, and use down_timeout() if you want the 10 sec timeout?
After spending time digging into it a bit more, I think that this is
probably the best long term solution.
It seems like we could refactor and replace the pds_core's adminq_refcnt
that was originally introduced to resolve race conditions related to the
adminq use/teardown between various client drivers (i.e. vfio/vdpa) with
a semaphore(). This would solve both the race condition issues mentioned
above and also the adminq overflow issue in this series.
However, I'm hoping you can accept this v1 solution as the fix for net
because it does solve a problem and is a simple solution.
In the meantime we can commit to working on a refactor to use a
semaphore/down_timeout() instead of the adminq_refcnt and this
sleep/-EAGAIN mechanism for the long term solution that gets pushed to
net-next. Ideally we do this right now, but it's a bit more of a
refactor than we feel comfortable with for net.
Thanks,
Brett
prev parent reply other threads:[~2025-01-31 19:25 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-29 0:43 [PATCH net 0/2] pds_core: fixes for adminq overflow Shannon Nelson
2025-01-29 0:43 ` [PATCH net 1/2] pds_core: Prevent possible adminq overflow/stuck condition Shannon Nelson
2025-01-29 7:59 ` Michal Swiatkowski
2025-01-29 0:43 ` [PATCH net 2/2] pds_core: Add a retry mechanism when the adminq is full Shannon Nelson
2025-01-29 8:08 ` Michal Swiatkowski
2025-01-30 3:03 ` Jakub Kicinski
2025-01-31 19:25 ` Brett Creeley [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fb796fb9-90d9-4618-a228-e4a8aee8bde6@amd.com \
--to=bcreeley@amd.com \
--cc=andrew+netdev@lunn.ch \
--cc=brett.creeley@amd.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=shannon.nelson@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).