* [PATCH v4 0/1] PCI: switchtec: Fix stdev_release() crash after surprise hot remove
@ 2023-11-22 4:23 Daniel Stodden
2023-11-22 4:23 ` [PATCH v4 1/1] " Daniel Stodden
2023-11-22 15:48 ` [PATCH v4 0/1] " Bjorn Helgaas
0 siblings, 2 replies; 7+ messages in thread
From: Daniel Stodden @ 2023-11-22 4:23 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Dmitry Safonov, Logan Gunthorpe, Kurt Schwemmer, linux-pci,
Daniel Stodden
Changes since v3:
* Restart from upstream f9724598e29df3acfcf5327df11aae2aba1b7f61
* Add missing pci_dev_put() to stdev_create()'s failure path.
* Reviewed-by: Dmitry Safonov <dima@arista.com>
Thanks for the patience.
Daniel
Daniel Stodden (1):
PCI: switchtec: Fix stdev_release() crash after surprise hot remove
drivers/pci/switch/switchtec.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
--
2.41.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v4 1/1] PCI: switchtec: Fix stdev_release() crash after surprise hot remove
2023-11-22 4:23 [PATCH v4 0/1] PCI: switchtec: Fix stdev_release() crash after surprise hot remove Daniel Stodden
@ 2023-11-22 4:23 ` Daniel Stodden
2023-11-22 15:55 ` Bjorn Helgaas
2023-12-07 21:08 ` Christophe JAILLET
2023-11-22 15:48 ` [PATCH v4 0/1] " Bjorn Helgaas
1 sibling, 2 replies; 7+ messages in thread
From: Daniel Stodden @ 2023-11-22 4:23 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Dmitry Safonov, Logan Gunthorpe, Kurt Schwemmer, linux-pci,
Daniel Stodden, Bjorn Helgaas
A PCI device hot removal may occur while stdev->cdev is held open. The call
to stdev_release() then happens during close or exit, at a point way past
switchtec_pci_remove(). Otherwise the last ref would vanish with the
trailing put_device(), just before return.
At that later point in time, the devm cleanup has already removed the
stdev->mmio_mrpc mapping. Also, the stdev->pdev reference was not a counted
one. Therefore, in DMA mode, the iowrite32() in stdev_release() will cause
a fatal page fault, and the subsequent dma_free_coherent(), if reached,
would pass a stale &stdev->pdev->dev pointer.
Fix by moving MRPC DMA shutdown into switchtec_pci_remove(), after
stdev_kill(). Counting the stdev->pdev ref is now optional, but may prevent
future accidents.
Reproducible via the script at
https://lore.kernel.org/r/20231113212150.96410-1-dns@arista.com
Link: https://lore.kernel.org/r/20231113212150.96410-2-dns@arista.com
Signed-off-by: Daniel Stodden <dns@arista.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
---
drivers/pci/switch/switchtec.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
index 5b921387eca6..1804794d0e68 100644
--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -1308,13 +1308,6 @@ static void stdev_release(struct device *dev)
{
struct switchtec_dev *stdev = to_stdev(dev);
- if (stdev->dma_mrpc) {
- iowrite32(0, &stdev->mmio_mrpc->dma_en);
- flush_wc_buf(stdev);
- writeq(0, &stdev->mmio_mrpc->dma_addr);
- dma_free_coherent(&stdev->pdev->dev, sizeof(*stdev->dma_mrpc),
- stdev->dma_mrpc, stdev->dma_mrpc_dma_addr);
- }
kfree(stdev);
}
@@ -1358,7 +1351,7 @@ static struct switchtec_dev *stdev_create(struct pci_dev *pdev)
return ERR_PTR(-ENOMEM);
stdev->alive = true;
- stdev->pdev = pdev;
+ stdev->pdev = pci_dev_get(pdev);
INIT_LIST_HEAD(&stdev->mrpc_queue);
mutex_init(&stdev->mrpc_mutex);
stdev->mrpc_busy = 0;
@@ -1391,6 +1384,7 @@ static struct switchtec_dev *stdev_create(struct pci_dev *pdev)
return stdev;
err_put:
+ pci_dev_put(stdev->pdev);
put_device(&stdev->dev);
return ERR_PTR(rc);
}
@@ -1644,6 +1638,18 @@ static int switchtec_init_pci(struct switchtec_dev *stdev,
return 0;
}
+static void switchtec_exit_pci(struct switchtec_dev *stdev)
+{
+ if (stdev->dma_mrpc) {
+ iowrite32(0, &stdev->mmio_mrpc->dma_en);
+ flush_wc_buf(stdev);
+ writeq(0, &stdev->mmio_mrpc->dma_addr);
+ dma_free_coherent(&stdev->pdev->dev, sizeof(*stdev->dma_mrpc),
+ stdev->dma_mrpc, stdev->dma_mrpc_dma_addr);
+ stdev->dma_mrpc = NULL;
+ }
+}
+
static int switchtec_pci_probe(struct pci_dev *pdev,
const struct pci_device_id *id)
{
@@ -1703,6 +1709,9 @@ static void switchtec_pci_remove(struct pci_dev *pdev)
ida_free(&switchtec_minor_ida, MINOR(stdev->dev.devt));
dev_info(&stdev->dev, "unregistered.\n");
stdev_kill(stdev);
+ switchtec_exit_pci(stdev);
+ pci_dev_put(stdev->pdev);
+ stdev->pdev = NULL;
put_device(&stdev->dev);
}
--
2.41.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v4 0/1] PCI: switchtec: Fix stdev_release() crash after surprise hot remove
2023-11-22 4:23 [PATCH v4 0/1] PCI: switchtec: Fix stdev_release() crash after surprise hot remove Daniel Stodden
2023-11-22 4:23 ` [PATCH v4 1/1] " Daniel Stodden
@ 2023-11-22 15:48 ` Bjorn Helgaas
1 sibling, 0 replies; 7+ messages in thread
From: Bjorn Helgaas @ 2023-11-22 15:48 UTC (permalink / raw)
To: Daniel Stodden; +Cc: Dmitry Safonov, Logan Gunthorpe, Kurt Schwemmer, linux-pci
On Tue, Nov 21, 2023 at 08:23:15PM -0800, Daniel Stodden wrote:
> Changes since v3:
> * Restart from upstream f9724598e29df3acfcf5327df11aae2aba1b7f61
> * Add missing pci_dev_put() to stdev_create()'s failure path.
> * Reviewed-by: Dmitry Safonov <dima@arista.com>
>
> Thanks for the patience.
> Daniel
>
> Daniel Stodden (1):
> PCI: switchtec: Fix stdev_release() crash after surprise hot remove
>
> drivers/pci/switch/switchtec.c | 25 +++++++++++++++++--------
> 1 file changed, 17 insertions(+), 8 deletions(-)
Thanks for the v4, I replaced the v3 patch, so this is on the
"switchtec" branch for v6.8.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v4 1/1] PCI: switchtec: Fix stdev_release() crash after surprise hot remove
2023-11-22 4:23 ` [PATCH v4 1/1] " Daniel Stodden
@ 2023-11-22 15:55 ` Bjorn Helgaas
2023-11-22 19:03 ` Daniel Stodden
2023-12-07 21:08 ` Christophe JAILLET
1 sibling, 1 reply; 7+ messages in thread
From: Bjorn Helgaas @ 2023-11-22 15:55 UTC (permalink / raw)
To: Daniel Stodden
Cc: Dmitry Safonov, Logan Gunthorpe, Kurt Schwemmer, linux-pci,
Bjorn Helgaas
On Tue, Nov 21, 2023 at 08:23:16PM -0800, Daniel Stodden wrote:
> A PCI device hot removal may occur while stdev->cdev is held open. The call
> to stdev_release() then happens during close or exit, at a point way past
> switchtec_pci_remove(). Otherwise the last ref would vanish with the
> trailing put_device(), just before return.
>
> At that later point in time, the devm cleanup has already removed the
> stdev->mmio_mrpc mapping. Also, the stdev->pdev reference was not a counted
> one. Therefore, in DMA mode, the iowrite32() in stdev_release() will cause
> a fatal page fault, and the subsequent dma_free_coherent(), if reached,
> would pass a stale &stdev->pdev->dev pointer.
>
> Fix by moving MRPC DMA shutdown into switchtec_pci_remove(), after
> stdev_kill(). Counting the stdev->pdev ref is now optional, but may prevent
> future accidents.
>
> Reproducible via the script at
> https://lore.kernel.org/r/20231113212150.96410-1-dns@arista.com
>
> Link: https://lore.kernel.org/r/20231113212150.96410-2-dns@arista.com
> Signed-off-by: Daniel Stodden <dns@arista.com>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: Dmitry Safonov <dima@arista.com>
Oh, I forgot to mention: for future reference, you should only add
Signed-off-by when you create the patch or you receive it from
somebody else and are passing it on.
You should not add the Signed-off-by of the person you're sending it
*to*, because that person will add their own Signed-off-by when they
process it. E.g., I apply patches with "git am --signoff" which adds
my Signed-off-by, which would result in a duplicate.
No worries, I took care of it so there's no duplicate for me :)
Bjorn
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v4 1/1] PCI: switchtec: Fix stdev_release() crash after surprise hot remove
2023-11-22 15:55 ` Bjorn Helgaas
@ 2023-11-22 19:03 ` Daniel Stodden
0 siblings, 0 replies; 7+ messages in thread
From: Daniel Stodden @ 2023-11-22 19:03 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Dmitry Safonov, Logan Gunthorpe, Kurt Schwemmer, linux-pci,
Bjorn Helgaas
> On Nov 22, 2023, at 7:55 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, Nov 21, 2023 at 08:23:16PM -0800, Daniel Stodden wrote:
>> A PCI device hot removal may occur while stdev->cdev is held open. The call
>> to stdev_release() then happens during close or exit, at a point way past
>> switchtec_pci_remove(). Otherwise the last ref would vanish with the
>> trailing put_device(), just before return.
>>
>> At that later point in time, the devm cleanup has already removed the
>> stdev->mmio_mrpc mapping. Also, the stdev->pdev reference was not a counted
>> one. Therefore, in DMA mode, the iowrite32() in stdev_release() will cause
>> a fatal page fault, and the subsequent dma_free_coherent(), if reached,
>> would pass a stale &stdev->pdev->dev pointer.
>>
>> Fix by moving MRPC DMA shutdown into switchtec_pci_remove(), after
>> stdev_kill(). Counting the stdev->pdev ref is now optional, but may prevent
>> future accidents.
>>
>> Reproducible via the script at
>> https://lore.kernel.org/r/20231113212150.96410-1-dns@arista.com
>>
>> Link: https://lore.kernel.org/r/20231113212150.96410-2-dns@arista.com
>> Signed-off-by: Daniel Stodden <dns@arista.com>
>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
>> Reviewed-by: Dmitry Safonov <dima@arista.com>
>
> Oh, I forgot to mention: for future reference, you should only add
> Signed-off-by when you create the patch or you receive it from
> somebody else and are passing it on.
>
> You should not add the Signed-off-by of the person you're sending it
> *to*, because that person will add their own Signed-off-by when they
> process it. E.g., I apply patches with "git am --signoff" which adds
> my Signed-off-by, which would result in a duplicate.
>
> No worries, I took care of it so there's no duplicate for me :)
Foremost thanks for rolling the thing back again. Much appreciated.
It did occur to me before emailing thatI might as well remove it again. However, since the v4 change notice
explictly mentioned that v4 was re-made from your prior v3 commit — to not lose the commentary updates —
it seemed more like a passing-it-on---again. So I left it.
If it risks messing with your workflow — sorry.
Then again, since the whole thing made a loop out of what’s normally a fairly linear process,
seemed, to me, potentially messy either way. At that point.
Daniel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v4 1/1] PCI: switchtec: Fix stdev_release() crash after surprise hot remove
2023-11-22 4:23 ` [PATCH v4 1/1] " Daniel Stodden
2023-11-22 15:55 ` Bjorn Helgaas
@ 2023-12-07 21:08 ` Christophe JAILLET
2023-12-08 22:36 ` Daniel Stodden
1 sibling, 1 reply; 7+ messages in thread
From: Christophe JAILLET @ 2023-12-07 21:08 UTC (permalink / raw)
To: dns
Cc: bhelgaas, dima, helgaas, kurt.schwemmer, linux-pci, logang,
Christophe JAILLET
> A PCI device hot removal may occur while stdev->cdev is held open. The call
> to stdev_release() then happens during close or exit, at a point way past
> switchtec_pci_remove(). Otherwise the last ref would vanish with the
> trailing put_device(), just before return.
>
> At that later point in time, the devm cleanup has already removed the
> stdev->mmio_mrpc mapping. Also, the stdev->pdev reference was not a counted
> one. Therefore, in DMA mode, the iowrite32() in stdev_release() will cause
> a fatal page fault, and the subsequent dma_free_coherent(), if reached,
> would pass a stale &stdev->pdev->dev pointer.
>
> Fix by moving MRPC DMA shutdown into switchtec_pci_remove(), after
> stdev_kill(). Counting the stdev->pdev ref is now optional, but may prevent
> future accidents.
>
> Reproducible via the script at
> https://lore.kernel.org/r/20231113212150.96410-1-dns@arista.com
>
> Link: https://lore.kernel.org/r/20231113212150.96410-2-dns@arista.com
> Signed-off-by: Daniel Stodden <dns@arista.com>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
> Reviewed-by: Dmitry Safonov <dima@arista.com>
---
...
> @@ -1703,6 +1709,9 @@ static void switchtec_pci_remove(struct pci_dev *pdev) > ida_free(&switchtec_minor_ida, MINOR(stdev->dev.devt));
> dev_info(&stdev->dev, "unregistered.\n");
> stdev_kill(stdev);
> + switchtec_exit_pci(stdev); > + pci_dev_put(stdev->pdev); > + stdev->pdev = NULL; > put_device(&stdev->dev);
> }
Hi,
does a similarswitchtec_exit_pci() should be called in the error handling path of switchtec_pci_probe() if an error occurs after switchtec_init_pci()?
CJ
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v4 1/1] PCI: switchtec: Fix stdev_release() crash after surprise hot remove
2023-12-07 21:08 ` Christophe JAILLET
@ 2023-12-08 22:36 ` Daniel Stodden
0 siblings, 0 replies; 7+ messages in thread
From: Daniel Stodden @ 2023-12-08 22:36 UTC (permalink / raw)
To: Christophe JAILLET
Cc: Bjorn Helgaas, dima, Bjorn Helgaas, kurt.schwemmer, linux-pci,
logang
> On Dec 7, 2023, at 1:08 PM, Christophe JAILLET <christophe.jaillet@wanadoo.fr> wrote:
>
>> A PCI device hot removal may occur while stdev->cdev is held open. The call
>> to stdev_release() then happens during close or exit, at a point way past
>> switchtec_pci_remove(). Otherwise the last ref would vanish with the
>> trailing put_device(), just before return.
>> At that later point in time, the devm cleanup has already removed the
>> stdev->mmio_mrpc mapping. Also, the stdev->pdev reference was not a counted
>> one. Therefore, in DMA mode, the iowrite32() in stdev_release() will cause
>> a fatal page fault, and the subsequent dma_free_coherent(), if reached,
>> would pass a stale &stdev->pdev->dev pointer.
>> Fix by moving MRPC DMA shutdown into switchtec_pci_remove(), after
>> stdev_kill(). Counting the stdev->pdev ref is now optional, but may prevent
>> future accidents.
>> Reproducible via the script at
>> https://lore.kernel.org/r/20231113212150.96410-1-dns@arista.com
>> Link: https://lore.kernel.org/r/20231113212150.96410-2-dns@arista.com
>> Signed-off-by: Daniel Stodden <dns@arista.com>
>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
>> Reviewed-by: Dmitry Safonov <dima@arista.com>
> ---
>
> ...
>
>> @@ -1703,6 +1709,9 @@ static void switchtec_pci_remove(struct pci_dev *pdev) > ida_free(&switchtec_minor_ida, MINOR(stdev->dev.devt));
>> dev_info(&stdev->dev, "unregistered.\n");
>> stdev_kill(stdev);
>> + switchtec_exit_pci(stdev); > + pci_dev_put(stdev->pdev); > + stdev->pdev = NULL; > put_device(&stdev->dev);
>> }
> Hi,
>
> does a similarswitchtec_exit_pci() should be called in the error handling path of switchtec_pci_probe() if an error occurs after switchtec_init_pci()?
>
Yep, that is correct.
Looks like this is actually another regression resulting from evacuating stdev_release.
Previous code could rely on the trailing put_device(), past err_put. No more.
Cheers,
Daniel
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-12-08 22:36 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-22 4:23 [PATCH v4 0/1] PCI: switchtec: Fix stdev_release() crash after surprise hot remove Daniel Stodden
2023-11-22 4:23 ` [PATCH v4 1/1] " Daniel Stodden
2023-11-22 15:55 ` Bjorn Helgaas
2023-11-22 19:03 ` Daniel Stodden
2023-12-07 21:08 ` Christophe JAILLET
2023-12-08 22:36 ` Daniel Stodden
2023-11-22 15:48 ` [PATCH v4 0/1] " Bjorn Helgaas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox