public inbox for linux-kernel@vger.kernel.org
From: Philipp Stanner <pstanner@redhat.com>
To: "Ashish Kalra" <Ashish.Kalra@amd.com>,
	"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
	"Bjorn Helgaas" <helgaas@kernel.org>
Cc: airlied@gmail.com, bhelgaas@google.com, dakr@redhat.com,
	daniel@ffwll.ch,  dri-devel@lists.freedesktop.org,
	hdegoede@redhat.com,  linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org,  maarten.lankhorst@linux.intel.com,
	mripard@kernel.org, sam@ravnborg.org,  tzimmermann@suse.de,
	thomas.lendacky@amd.com, mario.limonciello@amd.com
Subject: Re: [PATCH v9 10/13] PCI: Give pci_intx() its own devres callback
Date: Tue, 09 Jul 2024 09:21:34 +0200
Message-ID: <7734192dbf4d07ce77ab7a20481ccb12ff71ffcb.camel@redhat.com>
In-Reply-To: <20240708214656.4721-1-Ashish.Kalra@amd.com>

@Bjorn, @Krzysztof

On Mon, 2024-07-08 at 21:46 +0000, Ashish Kalra wrote:
> With this patch applied, we are observing unloading and then
> reloading issues with the AMD Crypto (CCP) driver:

Thank you very much for digging into this, Ashish.

Could you give me some pointers on how I could test CCP myself?

> 
> with DEVRES logging enabled, we observe the following logs:
> 
> [  218.093588] ccp 0000:a2:00.1: DEVRES REL 00000000c18c52fb
> 0xffff8d09dc1972c0 devm_kzalloc_release (152 bytes)
> [  218.105527] ccp 0000:a2:00.1: DEVRES REL 000000003091fb95
> 0xffff8d09d3aad000 devm_kzalloc_release (3072 bytes)
> [  218.117500] ccp 0000:a2:00.1: DEVRES REL 0000000049e4adfe
> 0xffff8d09d588f000 pcim_intx_restore (4 bytes)
> [  218.129519] ccp 0000:a2:00.1: DEVRES ADD 000000001a2ac6ad
> 0xffff8cfa867b7cc0 pcim_intx_restore (4 bytes)
> [  218.140434] ccp 0000:a2:00.1: DEVRES REL 00000000627ecaf7
> 0xffff8d09d588f680 pcim_msi_release (16 bytes)
> [  218.151665] ccp 0000:a2:00.1: DEVRES REL 0000000058b2252a
> 0xffff8d09dc199680 msi_device_data_release (80 bytes)
> [  218.163625] ccp 0000:a2:00.1: DEVRES REL 00000000435cc85e
> 0xffff8d09d588ff80 devm_attr_group_remove (8 bytes)
> [  218.175224] ccp 0000:a2:00.1: DEVRES REL 00000000cb6fcd9b
> 0xffff8d09eb583660 pcim_addr_resource_release (40 bytes)
> [  218.187319] ccp 0000:a2:00.1: DEVRES REL 00000000d64a8b84
> 0xffff8d09eb583180 pcim_iomap_release (48 bytes)
> [  218.198615] ccp 0000:a2:00.1: DEVRES REL 0000000099ac6b28
> 0xffff8d09eb5830c0 pcim_addr_resource_release (40 bytes)
> [  218.210730] ccp 0000:a2:00.1: DEVRES REL 00000000bdd27f88
> 0xffff8d09d3ac2700 pcim_release (0 bytes)
> [  218.221489] ccp 0000:a2:00.1: DEVRES REL 00000000e763315c
> 0xffff8d09d3ac2240 devm_kzalloc_release (20 bytes)
> [  218.233008] ccp 0000:a2:00.1: DEVRES REL 00000000ae90f983
> 0xffff8d09dc25a800 devm_kzalloc_release (184 bytes)
> [  218.245251] ccp 0000:23:00.1: DEVRES REL 00000000a2ec0085
> 0xffff8cfa86bee700 fw_name_devm_release (16 bytes)
> [  218.256748] ccp 0000:23:00.1: DEVRES REL 0000000021bccd98
> 0xffff8cfaa528d5c0 devm_pages_release (16 bytes)
> [  218.268044] ccp 0000:23:00.1: DEVRES REL 000000003ef7cbc7
> 0xffff8cfaa1b5ec00 devm_kzalloc_release (104 bytes)
> [  218.279631] ccp 0000:23:00.1: DEVRES REL 00000000619322e1
> 0xffff8cfaa1b5e480 devm_kzalloc_release (152 bytes)
> [  218.300438] ccp 0000:23:00.1: DEVRES REL 00000000c261523b
> 0xffff8cfaad88b000 devm_kzalloc_release (3072 bytes)
> [  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618
> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
> [  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767
> 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
> [  218.391226] ccp 0000:23:00.1: DEVRES REL 0000000058c9dce1
> 0xffff8cfaa528d880 pcim_msi_release (16 bytes)
> [  218.421340] ccp 0000:23:00.1: DEVRES REL 00000000c8ab08a7
> 0xffff8cfa9e617300 msi_device_data_release (80 bytes)
> [  218.452357] ccp 0000:23:00.1: DEVRES REL 00000000cf5baccb
> 0xffff8cfaa528d8c0 devm_attr_group_remove (8 bytes)
> [  218.483011] ccp 0000:23:00.1: DEVRES REL 00000000b8cbbadd
> 0xffff8cfa9c596060 pcim_addr_resource_release (40 bytes)
> [  218.514343] ccp 0000:23:00.1: DEVRES REL 00000000920f9607
> 0xffff8cfa9c596c60 pcim_iomap_release (48 bytes)
> [  218.544659] ccp 0000:23:00.1: DEVRES REL 00000000d401a708
> 0xffff8cfa9c596840 pcim_addr_resource_release (40 bytes)
> [  218.575774] ccp 0000:23:00.1: DEVRES REL 00000000865d2fa2
> 0xffff8cfaa528d940 pcim_release (0 bytes)
> [  218.605758] ccp 0000:23:00.1: DEVRES REL 00000000f5b79222
> 0xffff8cfaa528d080 devm_kzalloc_release (20 bytes)
> [  218.636260] ccp 0000:23:00.1: DEVRES REL 0000000037ef240a
> 0xffff8cfa9eeb3f00 devm_kzalloc_release (184 bytes)
> 
> and the CCP driver reload issue during driver probe:
> 
> [  226.552684] pci 0000:23:00.1: Resources present before probing
> [  226.568846] pci 0000:a2:00.1: Resources present before probing
> 
> From the above DEVRES logging, it looks like pcim_intx_restore
> associated resource is being released but then
> being re-added during detach/unload, which causes really_probe() to
> fail at probe time, as dev->devres_head is
> not empty due to this added resource:
> ...
> [  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618
> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
> [  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767
> 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
> ...
> 
> Going deeper into this:
> 
> This is the initial pcim_intx_restore associated resource being added
> during first (CCP) driver load:
> 
> [   40.418933]  pcim_intx+0x3a/0x120
> [   40.418936]  pci_intx+0x8b/0xa0
> [   40.418939]  __pci_enable_msix_range+0x369/0x530
> [   40.418943]  pci_enable_msix_range+0x18/0x20
> [   40.418946]  sp_pci_probe+0x106/0x310 [ccp]
> [   40.418965] ipmi device interface
> [   40.418960]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   40.418969]  local_pci_probe+0x4f/0xb0
> [   40.418973]  work_for_cpu_fn+0x1e/0x30
> [   40.418976]  process_one_work+0x183/0x350
> [   40.418980]  worker_thread+0x2df/0x3f0
> [   40.418982]  ? __pfx_worker_thread+0x10/0x10
> [   40.418985]  kthread+0xd0/0x100
> [   40.418987]  ? __pfx_kthread+0x10/0x10
> [   40.418990]  ret_from_fork+0x40/0x60
> [   40.418993]  ? __pfx_kthread+0x10/0x10
> [   40.418996]  ret_from_fork_asm+0x1a/0x30
> [   40.419001]  </TASK>
> ..
> ..
> [   40.419012] ccp 0000:23:00.1: DEVRES ADD 00000000fbd19618
> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
> 
> Now, at driver unload: 
> devres_release_all() -> remove_nodes() -> release_nodes() ...
> 
> remove_nodes() moves normal devres entries to the todo list, as can
> be seen with the following log:
> ...
> [  218.245241] moving node 00000000fbd19618 0xffff8cfaa528d140 from
> devres to todo list
> ...
> 
> So, now this pcim_intx_restore associated resource is no longer part
> of the dev->devres_head list and has been
> moved to the todo list.
> 
> Later, when release_nodes() is invoked, it calls the release()
> callback associated with this devres:
> ...
> [  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618
> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
> ...
> 
> The call flow for that is:
> pcim_intx_restore() -> pci_intx() -> pcim_intx() ...
> 
> Now, pcim_intx() calls get_or_create_intx_devres() which tries to
> find its associated devres using devres_find(), but
> that fails to find the devres, as the devres is no longer on
> dev->devres_head and has been moved to the todo list.
> 
> Therefore, get_or_create_intx_devres() adds a new devres at driver
> unload/detach time:
> ...
> [  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767
> 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
> ...

You're absolutely right, that seems to be the issue precisely. In fact,
this problem of PCI hybrid functions calling themselves again even
forced me to implement a "pure unmanaged" version of
__pci_request_region(). So it's a pity that I didn't think of that
problem for pci_intx().
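
To make the mechanism concrete, here is a small userspace model of the
sequence Ashish describes. It is only a sketch and not kernel code: all
model_* names are made up for illustration, the real pci_intx() ->
pcim_intx() -> get_or_create_intx_devres() chain is collapsed into one
function, and the list handling is far simpler than the real devres
implementation. It only shows why a release callback that re-enters the
hybrid function leaves a fresh node on the device after detach.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy userspace model of devres; NOT the kernel's implementation. */
struct dev;
typedef void (*release_fn)(struct dev *dev, void *data);

struct node {
	struct node *next;
	release_fn release;
	int data;			/* stands in for the saved INTx state */
};

struct dev {
	struct node *devres_head;	/* resources attached to the device */
};

static struct node *model_find(struct dev *dev, release_fn fn)
{
	for (struct node *n = dev->devres_head; n; n = n->next)
		if (n->release == fn)
			return n;
	return NULL;
}

static struct node *model_add(struct dev *dev, release_fn fn)
{
	struct node *n = calloc(1, sizeof(*n));

	assert(n);
	n->release = fn;
	n->next = dev->devres_head;
	dev->devres_head = n;
	return n;
}

static void model_intx_restore(struct dev *dev, void *data);

/* Hybrid setter: looks up its node and creates one if none is found. */
static void model_pci_intx(struct dev *dev, int enable)
{
	struct node *n = model_find(dev, model_intx_restore);

	if (!n)
		n = model_add(dev, model_intx_restore);
	n->data = enable;
}

/* Release callback that calls back into the hybrid setter (the problem). */
static void model_intx_restore(struct dev *dev, void *data)
{
	model_pci_intx(dev, *(int *)data);
}

/* Detach: move all nodes to a private todo list, then run the callbacks. */
static void model_release_all(struct dev *dev)
{
	struct node *todo = dev->devres_head;

	dev->devres_head = NULL;
	while (todo) {
		struct node *next = todo->next;

		todo->release(dev, &todo->data);
		free(todo);
		todo = next;
	}
}

/* Returns how many nodes are left on the device after probe + unload. */
int model_leftover_after_unload(void)
{
	struct dev dev = { 0 };
	int left = 0;

	model_pci_intx(&dev, 1);	/* "probe" */
	model_release_all(&dev);	/* "unload" */
	for (struct node *n = dev.devres_head; n; n = n->next)
		left++;
	return left;			/* leftover node is leaked on purpose */
}
```

In this model the lookup inside the callback fails because the node has
already been moved to the todo list, so one new node remains on the
device afterwards. That corresponds to the "Resources present before
probing" condition in the logs above.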

> 
> But, then this is an issue as pcim_intx() is supposed to restore the
> original PCI INTx state on driver detach, but it is now
> operating on a newly added devres and not the original devres (added
> at driver probe) which contains the original PCI INTx
> state, so it will be restoring an incorrect PCI INTx state?

I think this is just UB and we don't have to think about whether it's
the correct state or not – it must only be restored once, so it's
broken in any case.

> 
> Additionally, this newly added devres causes driver reload/probe
> failure as really_probe() now finds resources present
> before probing.

Yes, that has to be separated.

@Bjorn:
So I think the solution will be not to call into pci_intx() from
pcim_intx_restore() at all anymore.

Similar to what we do with __pci_request_region() <-> __pcim_request_region().
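
As a rough illustration of that direction, the following userspace
sketch splits the setter into a pure unmanaged part and a managed
wrapper, and lets the release callback use only the unmanaged part.
This is not the fixup itself: every model_* name is hypothetical, the
intx_enabled field merely stands in for the device's INTx state, and
the real change may well look different.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy userspace model; NOT kernel code. All names are hypothetical. */
struct dev;
typedef void (*release_fn)(struct dev *dev, void *data);

struct node {
	struct node *next;
	release_fn release;
	int orig;			/* INTx state to restore on detach */
};

struct dev {
	struct node *devres_head;
	int intx_enabled;		/* stands in for the device's INTx state */
};

/* Pure unmanaged setter: changes device state, never touches devres. */
static void model_intx_unmanaged(struct dev *dev, int enable)
{
	dev->intx_enabled = enable;
}

/* Release callback: restores via the unmanaged setter only. */
static void model_intx_restore(struct dev *dev, void *data)
{
	model_intx_unmanaged(dev, *(int *)data);
}

/* Managed setter: records the original state once, then sets the new one. */
static void model_pcim_intx(struct dev *dev, int enable)
{
	struct node *n;

	for (n = dev->devres_head; n; n = n->next)
		if (n->release == model_intx_restore)
			break;
	if (!n) {
		n = calloc(1, sizeof(*n));
		assert(n);
		n->release = model_intx_restore;
		n->orig = dev->intx_enabled;
		n->next = dev->devres_head;
		dev->devres_head = n;
	}
	model_intx_unmanaged(dev, enable);
}

/* Detach: move all nodes to a private todo list, then run the callbacks. */
static void model_release_all(struct dev *dev)
{
	struct node *todo = dev->devres_head;

	dev->devres_head = NULL;
	while (todo) {
		struct node *next = todo->next;

		todo->release(dev, &todo->orig);
		free(todo);
		todo = next;
	}
}

struct model_result {
	int leftover;			/* nodes still on the device after unload */
	int intx_enabled;		/* INTx state after unload */
};

struct model_result model_probe_and_unload(void)
{
	struct dev dev = { .devres_head = NULL, .intx_enabled = 1 };
	struct model_result r = { 0, 0 };

	model_pcim_intx(&dev, 0);	/* "probe": e.g. MSI-X setup disables INTx */
	model_release_all(&dev);	/* "unload" */
	for (struct node *n = dev.devres_head; n; n = n->next)
		r.leftover++;
	r.intx_enabled = dev.intx_enabled;
	return r;
}
```

With the callback no longer re-entering the managed path, the model
ends with no nodes left on the device and the original state restored
exactly once.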

Let me dig into that..

I guess you'll prefer me to send a fixup commit to squash into the
pcim_intx() commit?

I'm quite busy today, but will definitely deliver that quite soon.

> 
> Not sure, if this issue has been observed with other PCI device
> drivers.

Everyone using pci_intx() AND pcim_enable_device() will have this
issue.

The only thing I'm wondering about is where your code in
drivers/crypto/ccp/ calls into pci_intx()?


Regards,
P.

> 
> Thanks,
> Ashish
> 

