From: Simon Horman <horms@kernel.org>
To: Brett Creeley <brett.creeley@amd.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, shannon.nelson@amd.com
Subject: Re: [PATCH net-next 4/8] pds_core: Prevent race issues involving the adminq
Date: Thu, 4 Jan 2024 19:16:43 +0000
Message-ID: <20240104191643.GL31813@kernel.org>
In-Reply-To: <20240104171221.31399-5-brett.creeley@amd.com>

On Thu, Jan 04, 2024 at 09:12:17AM -0800, Brett Creeley wrote:
> There are multiple paths that can result in using the pdsc's
> adminq.
>
> [1] pdsc_adminq_isr and the resulting work from queue_work(),
> i.e. pdsc_work_thread()->pdsc_process_adminq()
>
> [2] pdsc_adminq_post()
>
> When the device goes through reset via PCIe reset and/or
> a fw_down/fw_up cycle due to bad PCIe state or bad device
> state the adminq is destroyed and recreated.
>
> A NULL pointer dereference can happen if [1] or [2] happens
> after the adminq is already destroyed.
>
> In order to fix this, add some further state checks and
> implement reference counting for adminq uses. Reference
> counting was used because multiple threads can attempt to
> access the adminq at the same time via [1] or [2]. Additionally,
> multiple clients (i.e. pds-vfio-pci) can be using [2]
> at the same time.
>
> The adminq_refcnt is initialized to 1 when the adminq has been
> allocated and is ready to use. Users/clients of the adminq
> (i.e. [1] and [2]) will increment the refcnt when they are using
> the adminq. When the driver goes into a fw_down cycle it will
> set the PDSC_S_FW_DEAD bit and then wait for the adminq_refcnt
> to hit 1. Setting the PDSC_S_FW_DEAD before waiting will prevent
> any further adminq_refcnt increments. Waiting for the
> adminq_refcnt to hit 1 allows for any current users of the adminq
> to finish before the driver frees the adminq. Once the
> adminq_refcnt hits 1 the driver clears the refcnt to signify that
> the adminq is deleted and cannot be used. On the fw_up cycle the
> driver will once again initialize the adminq_refcnt to 1 allowing
> the adminq to be used again.
>
> Signed-off-by: Brett Creeley <brett.creeley@amd.com>
> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
...
> diff --git a/drivers/net/ethernet/amd/pds_core/core.c b/drivers/net/ethernet/amd/pds_core/core.c
> index 0356e56a6e99..3b3e1541dd1c 100644
> --- a/drivers/net/ethernet/amd/pds_core/core.c
> +++ b/drivers/net/ethernet/amd/pds_core/core.c
> @@ -450,6 +450,7 @@ int pdsc_setup(struct pdsc *pdsc, bool init)
> pdsc_debugfs_add_viftype(pdsc);
> }
>
> + refcount_set(&pdsc->adminq_refcnt, 1);
> clear_bit(PDSC_S_FW_DEAD, &pdsc->state);
> return 0;
>
> @@ -514,6 +515,24 @@ void pdsc_stop(struct pdsc *pdsc)
> PDS_CORE_INTR_MASK_SET);
> }
>
> +void pdsc_adminq_wait_and_dec_once_unused(struct pdsc *pdsc)

Hi Brett,

a minor nit from my side: pdsc_adminq_wait_and_dec_once_unused is only used
in this file so perhaps it should be static?

> +{
> + /* The driver initializes the adminq_refcnt to 1 when the adminq is
> + * allocated and ready for use. Other users/requesters will increment
> + * the refcnt while in use. If the refcnt is down to 1 then the adminq
> + * is not in use and the refcnt can be cleared and adminq freed. Before
> + * calling this function the driver will set PDSC_S_FW_DEAD, which
> + * prevent subsequent attempts to use the adminq and increment the
> + * refcnt to fail. This guarantees that this function will eventually
> + * exit.
> + */
> + while (!refcount_dec_if_one(&pdsc->adminq_refcnt)) {
> + dev_dbg_ratelimited(pdsc->dev, "%s: adminq in use\n",
> + __func__);
> + cpu_relax();
> + }
> +}
> +
> void pdsc_fw_down(struct pdsc *pdsc)
> {
> union pds_core_notifyq_comp reset_event = {
> @@ -529,6 +548,8 @@ void pdsc_fw_down(struct pdsc *pdsc)
> if (pdsc->pdev->is_virtfn)
> return;
>
> + pdsc_adminq_wait_and_dec_once_unused(pdsc);
> +
> /* Notify clients of fw_down */
> if (pdsc->fw_reporter)
> devlink_health_report(pdsc->fw_reporter, "FW down reported", pdsc);
...
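
For readers following the scheme in the commit message, the refcount life
cycle can be sketched in user-space C. This is only an illustration under
stated assumptions, not the driver's code: struct fake_pdsc, adminq_setup(),
adminq_get(), adminq_put() and adminq_try_teardown() are hypothetical names,
C11 atomics stand in for the kernel's refcount_t, and a boolean stands in for
the PDSC_S_FW_DEAD bit. The two small helpers mimic the behaviour of the
kernel's refcount_inc_not_zero() and refcount_dec_if_one().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; this is not struct pdsc. */
struct fake_pdsc {
	atomic_int adminq_refcnt; /* 0 = deleted, 1 = idle, >1 = in use */
	atomic_bool fw_dead;      /* stands in for the PDSC_S_FW_DEAD bit */
};

/* Increment only while the count is non-zero. */
static bool ref_inc_not_zero(atomic_int *r)
{
	int old = atomic_load(r);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(r, &old, old + 1))
			return true;
	}
	return false;
}

/* Move the count from 1 to 0; fail for any other value. */
static bool ref_dec_if_one(atomic_int *r)
{
	int expected = 1;

	return atomic_compare_exchange_strong(r, &expected, 0);
}

/* fw_up side: the adminq exists again, so the base reference is 1. */
static void adminq_setup(struct fake_pdsc *p)
{
	atomic_store(&p->adminq_refcnt, 1);
	atomic_store(&p->fw_dead, false);
}

/* A user takes a reference; refused once teardown has begun. */
static bool adminq_get(struct fake_pdsc *p)
{
	if (atomic_load(&p->fw_dead))
		return false;
	return ref_inc_not_zero(&p->adminq_refcnt);
}

/* A user drops its reference; the base reference must remain. */
static void adminq_put(struct fake_pdsc *p)
{
	int prev = atomic_fetch_sub(&p->adminq_refcnt, 1);

	assert(prev > 1);
	(void)prev;
}

/* fw_down side: block new users, then try to drop the base reference.
 * Returns false while a user still holds a reference, so the caller
 * would retry, as the loop in the quoted patch does.
 */
static bool adminq_try_teardown(struct fake_pdsc *p)
{
	atomic_store(&p->fw_dead, true);
	return ref_dec_if_one(&p->adminq_refcnt);
}
```

In this sketch a teardown attempt fails while any user holds a reference and
succeeds once the last user has called adminq_put(). After that, adminq_get()
is refused until adminq_setup() runs again.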
Thread overview: 14+ messages
2024-01-04 17:12 [PATCH net-next 0/8] pds_core: Various improvements and AQ race condition cleanup Brett Creeley
2024-01-04 17:12 ` [PATCH net-next 1/8] pds_core: Prevent health thread from running during reset/remove Brett Creeley
2024-01-04 17:12 ` [PATCH net-next 2/8] pds_core: Cancel AQ work on teardown Brett Creeley
2024-01-04 17:12 ` [PATCH net-next 3/8] pds_core: Use struct pdsc for the pdsc_adminq_isr private data Brett Creeley
2024-01-04 17:12 ` [PATCH net-next 4/8] pds_core: Prevent race issues involving the adminq Brett Creeley
2024-01-04 19:16 ` Simon Horman [this message]
2024-01-04 19:24 ` Brett Creeley
2024-01-06 1:50 ` kernel test robot
2024-01-07 1:28 ` kernel test robot
2024-01-04 17:12 ` [PATCH net-next 5/8] pds_core: Clear BARs on reset Brett Creeley
2024-01-04 17:12 ` [PATCH net-next 6/8] pds_core: Don't assign interrupt index/bound_intr to notifyq Brett Creeley
2024-01-04 17:12 ` [PATCH net-next 7/8] pds_core: Unmask adminq interrupt in work thread Brett Creeley
2024-01-04 17:12 ` [PATCH net-next 8/8] pds_core: Fix up some RCT issues Brett Creeley
2024-01-05 7:59 ` [PATCH net-next 0/8] pds_core: Various improvements and AQ race condition cleanup Przemek Kitszel