All of lore.kernel.org
 help / color / mirror / Atom feed
From: Baolu Lu <baolu.lu@linux.intel.com>
To: Robin Murphy <robin.murphy@arm.com>,
	Joerg Roedel <joro@8bytes.org>,
	David Woodhouse <dwmw2@infradead.org>
Cc: kevin.tian@intel.com, ashok.raj@intel.com,
	linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org,
	cai@lca.pw, jacob.jun.pan@intel.com
Subject: Re: [PATCH v2 5/7] iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()
Date: Fri, 1 Jul 2022 15:19:14 +0800	[thread overview]
Message-ID: <ffca1789-1e96-ae01-74a0-942fecb9caac@linux.intel.com> (raw)
In-Reply-To: <f3619c80-14d3-d934-755a-4c3734bfde20@arm.com>

On 2022/6/29 21:03, Robin Murphy wrote:
> On 2019-06-12 01:28, Lu Baolu wrote:
>> The drhd and device scope list should be iterated with the
>> iommu global lock held. Otherwise, a suspicious RCU usage
>> message will be displayed.
>>
>> [    3.695886] =============================
>> [    3.695917] WARNING: suspicious RCU usage
>> [    3.695950] 5.2.0-rc2+ #2467 Not tainted
>> [    3.695981] -----------------------------
>> [    3.696014] drivers/iommu/intel-iommu.c:4569 suspicious 
>> rcu_dereference_check() usage!
>> [    3.696069]
>>                 other info that might help us debug this:
>>
>> [    3.696126]
>>                 rcu_scheduler_active = 2, debug_locks = 1
>> [    3.696173] no locks held by swapper/0/1.
>> [    3.696204]
>>                 stack backtrace:
>> [    3.696241] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2467
>> [    3.696370] Call Trace:
>> [    3.696404]  dump_stack+0x85/0xcb
>> [    3.696441]  intel_iommu_init+0x128c/0x13ce
>> [    3.696478]  ? kmem_cache_free+0x16b/0x2c0
>> [    3.696516]  ? __fput+0x14b/0x270
>> [    3.696550]  ? __call_rcu+0xb7/0x300
>> [    3.696583]  ? get_max_files+0x10/0x10
>> [    3.696631]  ? set_debug_rodata+0x11/0x11
>> [    3.696668]  ? e820__memblock_setup+0x60/0x60
>> [    3.696704]  ? pci_iommu_init+0x16/0x3f
>> [    3.696737]  ? set_debug_rodata+0x11/0x11
>> [    3.696770]  pci_iommu_init+0x16/0x3f
>> [    3.696805]  do_one_initcall+0x5d/0x2e4
>> [    3.696844]  ? set_debug_rodata+0x11/0x11
>> [    3.696880]  ? rcu_read_lock_sched_held+0x6b/0x80
>> [    3.696924]  kernel_init_freeable+0x1f0/0x27c
>> [    3.696961]  ? rest_init+0x260/0x260
>> [    3.696997]  kernel_init+0xa/0x110
>> [    3.697028]  ret_from_fork+0x3a/0x50
>>
>> Fixes: fa212a97f3a36 ("iommu/vt-d: Probe DMA-capable ACPI name space 
>> devices")
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/intel-iommu.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 19c4c387a3f6..84e650c6a46d 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -4793,8 +4793,10 @@ int __init intel_iommu_init(void)
>>       cpuhp_setup_state(CPUHP_IOMMU_INTEL_DEAD, "iommu/intel:dead", NULL,
>>                 intel_iommu_cpu_dead);
>> +    down_read(&dmar_global_lock);
>>       if (probe_acpi_namespace_devices())
>>           pr_warn("ACPI name space devices didn't probe correctly\n");
>> +    up_read(&dmar_global_lock);
> 
> Doing a bit of archaeology here, is this actually broken? If any ANDD 
> entries exist, we'd end up doing:
> 
>    down_read(&dmar_global_lock)
>    probe_acpi_namespace_devices()
>    -> iommu_probe_device()
>       -> iommu_create_device_direct_mappings()
>          -> iommu_get_resv_regions()
>             -> intel_iommu_get_resv_regions()
>                -> down_read(&dmar_global_lock)
> 
> I'm wondering whether this might explain why my bus_set_iommu series 
> prevented Baolu's machine from booting, since "iommu: Move bus setup to 
> IOMMU device registration" creates the same condition where we end up in 
> get_resv_regions (via bus_iommu_probe() this time) from the same task 
> that already holds dmar_global_lock. Of course that leaves me wondering 
> how it *did* manage to boot OK on my Xeon box, but maybe there's a 
> config difference or dumb luck at play?

This is really problematic. Where does the latest bus_set_iommu series
locate? I'd like to take a closer look at what happened here. Perhaps
two weeks later? I'm busy with preparing Intel IOMMU patches for v5.20
these days.

Best regards,
baolu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

WARNING: multiple messages have this Message-ID (diff)
From: Baolu Lu <baolu.lu@linux.intel.com>
To: Robin Murphy <robin.murphy@arm.com>,
	Joerg Roedel <joro@8bytes.org>,
	David Woodhouse <dwmw2@infradead.org>
Cc: baolu.lu@linux.intel.com, kevin.tian@intel.com,
	ashok.raj@intel.com, linux-kernel@vger.kernel.org,
	iommu@lists.linux-foundation.org, cai@lca.pw,
	jacob.jun.pan@intel.com
Subject: Re: [PATCH v2 5/7] iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()
Date: Fri, 1 Jul 2022 15:19:14 +0800	[thread overview]
Message-ID: <ffca1789-1e96-ae01-74a0-942fecb9caac@linux.intel.com> (raw)
In-Reply-To: <f3619c80-14d3-d934-755a-4c3734bfde20@arm.com>

On 2022/6/29 21:03, Robin Murphy wrote:
> On 2019-06-12 01:28, Lu Baolu wrote:
>> The drhd and device scope list should be iterated with the
>> iommu global lock held. Otherwise, a suspicious RCU usage
>> message will be displayed.
>>
>> [    3.695886] =============================
>> [    3.695917] WARNING: suspicious RCU usage
>> [    3.695950] 5.2.0-rc2+ #2467 Not tainted
>> [    3.695981] -----------------------------
>> [    3.696014] drivers/iommu/intel-iommu.c:4569 suspicious 
>> rcu_dereference_check() usage!
>> [    3.696069]
>>                 other info that might help us debug this:
>>
>> [    3.696126]
>>                 rcu_scheduler_active = 2, debug_locks = 1
>> [    3.696173] no locks held by swapper/0/1.
>> [    3.696204]
>>                 stack backtrace:
>> [    3.696241] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2467
>> [    3.696370] Call Trace:
>> [    3.696404]  dump_stack+0x85/0xcb
>> [    3.696441]  intel_iommu_init+0x128c/0x13ce
>> [    3.696478]  ? kmem_cache_free+0x16b/0x2c0
>> [    3.696516]  ? __fput+0x14b/0x270
>> [    3.696550]  ? __call_rcu+0xb7/0x300
>> [    3.696583]  ? get_max_files+0x10/0x10
>> [    3.696631]  ? set_debug_rodata+0x11/0x11
>> [    3.696668]  ? e820__memblock_setup+0x60/0x60
>> [    3.696704]  ? pci_iommu_init+0x16/0x3f
>> [    3.696737]  ? set_debug_rodata+0x11/0x11
>> [    3.696770]  pci_iommu_init+0x16/0x3f
>> [    3.696805]  do_one_initcall+0x5d/0x2e4
>> [    3.696844]  ? set_debug_rodata+0x11/0x11
>> [    3.696880]  ? rcu_read_lock_sched_held+0x6b/0x80
>> [    3.696924]  kernel_init_freeable+0x1f0/0x27c
>> [    3.696961]  ? rest_init+0x260/0x260
>> [    3.696997]  kernel_init+0xa/0x110
>> [    3.697028]  ret_from_fork+0x3a/0x50
>>
>> Fixes: fa212a97f3a36 ("iommu/vt-d: Probe DMA-capable ACPI name space 
>> devices")
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/intel-iommu.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 19c4c387a3f6..84e650c6a46d 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -4793,8 +4793,10 @@ int __init intel_iommu_init(void)
>>       cpuhp_setup_state(CPUHP_IOMMU_INTEL_DEAD, "iommu/intel:dead", NULL,
>>                 intel_iommu_cpu_dead);
>> +    down_read(&dmar_global_lock);
>>       if (probe_acpi_namespace_devices())
>>           pr_warn("ACPI name space devices didn't probe correctly\n");
>> +    up_read(&dmar_global_lock);
> 
> Doing a bit of archaeology here, is this actually broken? If any ANDD 
> entries exist, we'd end up doing:
> 
>    down_read(&dmar_global_lock)
>    probe_acpi_namespace_devices()
>    -> iommu_probe_device()
>       -> iommu_create_device_direct_mappings()
>          -> iommu_get_resv_regions()
>             -> intel_iommu_get_resv_regions()
>                -> down_read(&dmar_global_lock)
> 
> I'm wondering whether this might explain why my bus_set_iommu series 
> prevented Baolu's machine from booting, since "iommu: Move bus setup to 
> IOMMU device registration" creates the same condition where we end up in 
> get_resv_regions (via bus_iommu_probe() this time) from the same task 
> that already holds dmar_global_lock. Of course that leaves me wondering 
> how it *did* manage to boot OK on my Xeon box, but maybe there's a 
> config difference or dumb luck at play?

This is really problematic. Where does the latest bus_set_iommu series
locate? I'd like to take a closer look at what happened here. Perhaps
two weeks later? I'm busy with preparing Intel IOMMU patches for v5.20
these days.

Best regards,
baolu

  reply	other threads:[~2022-07-01  7:19 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-12  0:28 [PATCH v2 0/7] iommu/vt-d: Fixes and cleanups for linux-next Lu Baolu
2019-06-12  0:28 ` Lu Baolu
2019-06-12  0:28 ` [PATCH v2 1/7] iommu/vt-d: Don't return error when device gets right domain Lu Baolu
2019-06-12  0:28   ` Lu Baolu
2019-06-12  0:28 ` [PATCH v2 2/7] iommu/vt-d: Set domain type for a private domain Lu Baolu
2019-06-12  0:28   ` Lu Baolu
2019-06-12  0:28 ` [PATCH v2 3/7] iommu/vt-d: Don't enable iommu's which have been ignored Lu Baolu
2019-06-12  0:28   ` Lu Baolu
2019-06-12  0:28 ` [PATCH v2 4/7] iommu/vt-d: Allow DMA domain attaching to rmrr locked device Lu Baolu
2019-06-12  0:28   ` Lu Baolu
2019-06-12  0:28 ` [PATCH v2 5/7] iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices() Lu Baolu
2019-06-12  0:28   ` Lu Baolu
2022-06-29 13:03   ` Robin Murphy
2022-06-29 13:03     ` Robin Murphy
2022-07-01  7:19     ` Baolu Lu [this message]
2022-07-01  7:19       ` Baolu Lu
2022-07-01  8:18       ` Robin Murphy
2022-07-01  8:18         ` Robin Murphy
2019-06-12  0:28 ` [PATCH v2 6/7] iommu/vt-d: Cleanup after delegating DMA domain to generic iommu Lu Baolu
2019-06-12  0:28   ` Lu Baolu
2019-06-12  0:28 ` [PATCH v2 7/7] iommu/vt-d: Consolidate domain_init() to avoid duplication Lu Baolu
2019-06-12  0:28   ` Lu Baolu
2019-07-18 23:16   ` Alex Williamson
2019-07-18 23:16     ` Alex Williamson
2019-07-19  8:27     ` Lu Baolu
2019-07-19  8:27       ` Lu Baolu
2019-07-19 15:19       ` Alex Williamson
2019-07-19 15:19         ` Alex Williamson
2019-07-20  1:15         ` Lu Baolu
2019-07-20  1:15           ` Lu Baolu
2019-07-22 14:22           ` Joerg Roedel
2019-07-22 14:22             ` Joerg Roedel
2019-07-25  1:41             ` Lu Baolu
2019-07-25  1:41               ` Lu Baolu
2019-06-12  8:37 ` [PATCH v2 0/7] iommu/vt-d: Fixes and cleanups for linux-next Joerg Roedel
2019-06-12  8:37   ` Joerg Roedel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ffca1789-1e96-ae01-74a0-942fecb9caac@linux.intel.com \
    --to=baolu.lu@linux.intel.com \
    --cc=ashok.raj@intel.com \
    --cc=cai@lca.pw \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jacob.jun.pan@intel.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=robin.murphy@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.