From: Karolina Drobnik <karolina.drobnik@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
intel-gfx@lists.freedesktop.org,
Yunfei Wang <yf.wang@mediatek.com>,
Miles Chen <miles.chen@mediatek.com>,
Robin Murphy <robin.murphy@arm.com>,
Ning Li <ning.li@mediatek.com>
Subject: Re: [Intel-gfx] [topic/core-for-CI] Revert "iommu/dma: Fix race condition during iova_domain initialization"
Date: Thu, 15 Sep 2022 07:02:27 +0200 [thread overview]
Message-ID: <77a699fd-82d0-87fe-fb61-09b786f44151@intel.com> (raw)
In-Reply-To: <20220914150121.y6ucj4mav65mt7we@ldmartin-desk2.lan>
On 14.09.2022 17:01, Lucas De Marchi wrote:
> On Wed, Sep 14, 2022 at 02:40:45PM +0200, Karolina Drobnik wrote:
>> This reverts commit ac9a5d522bb80be50ea84965699e1c8257d745ce.
>>
>> This change introduces a regression on Alder Lake that completely
>> blocks testing. To enable CI and avoid possible circular locking
>> warning, revert the patch.
>
> We are already on rc5. Are iommu authors involved aware of this issue?
> We could do this in our "for CI only" branch, but it's equally important
> that this is fixed for 6.0
I planned to reach out to them after merging this revert on "CI only"
branch (hence the topic tag) with more justification. And yes, I'm fully
aware we're quite late in the cycle, so that's also why I went with this
patch first.
Many thanks,
Karolina
> Cc'ing them.
>
> thanks
> Lucas De Marchi
>
>>
>> kernel log:
>>
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 6.0.0-rc5-CI_DRM_12132-g6c93e979e542+ #1 Not tainted
>> ------------------------------------------------------
>> cpuhp/0/15 is trying to acquire lock:
>> ffff8881013df278 (&(&priv->bus_notifier)->rwsem){++++}-{3:3}, at:
>> blocking_notifier_call_chain+0x20/0x50
>> but task is already holding lock:
>> ffffffff826490c0 (cpuhp_state-up){+.+.}-{0:0}, at:
>> cpuhp_thread_fun+0x48/0x1f0
>> which lock already depends on the new loc
>> the existing dependency chain (in reverse order) is:
>> -> #3 (cpuhp_state-up){+.+.}-{0:0}:
>> lock_acquire+0xd3/0x310
>> cpuhp_thread_fun+0xa6/0x1f0
>> smpboot_thread_fn+0x1b5/0x260
>> kthread+0xed/0x120
>> ret_from_fork+0x1f/0x30
>> -> #2 (cpu_hotplug_lock){++++}-{0:0}:
>> lock_acquire+0xd3/0x310
>> __cpuhp_state_add_instance+0x43/0x1c0
>> iova_domain_init_rcaches+0x199/0x1c0
>> iommu_setup_dma_ops+0x130/0x440
>> bus_iommu_probe+0x26a/0x2d0
>> bus_set_iommu+0x82/0xd0
>> intel_iommu_init+0xe33/0x1039
>> pci_iommu_init+0x9/0x31
>> do_one_initcall+0x53/0x2f0
>> kernel_init_freeable+0x18f/0x1e1
>> kernel_init+0x11/0x120
>> ret_from_fork+0x1f/0x30
>> -> #1 (&domain->iova_cookie->mutex){+.+.}-{3:3}:
>> lock_acquire+0xd3/0x310
>> __mutex_lock+0x97/0xf10
>> iommu_setup_dma_ops+0xd7/0x440
>> iommu_probe_device+0xa4/0x180
>> iommu_bus_notifier+0x2d/0x40
>> notifier_call_chain+0x31/0x90
>> blocking_notifier_call_chain+0x3a/0x50
>> device_add+0x3c1/0x900
>> pci_device_add+0x255/0x580
>> pci_scan_single_device+0xa6/0xd0
>> pci_scan_slot+0x7a/0x1b0
>> pci_scan_child_bus_extend+0x35/0x2a0
>> vmd_probe+0x5cd/0x970
>> pci_device_probe+0x95/0x110
>> really_probe+0xd6/0x350
>> __driver_probe_device+0x73/0x170
>> driver_probe_device+0x1a/0x90
>> __driver_attach+0xbc/0x190
>> bus_for_each_dev+0x72/0xc0
>> bus_add_driver+0x1bb/0x210
>> driver_register+0x66/0xc0
>> do_one_initcall+0x53/0x2f0
>> kernel_init_freeable+0x18f/0x1e1
>> kernel_init+0x11/0x120
>> ret_from_fork+0x1f/0x30
>> -> #0 (&(&priv->bus_notifier)->rwsem){++++}-{3:3}:
>> validate_chain+0xb3f/0x2000
>> __lock_acquire+0x5a4/0xb70
>> lock_acquire+0xd3/0x310
>> down_read+0x39/0x140
>> blocking_notifier_call_chain+0x20/0x50
>> device_add+0x3c1/0x900
>> platform_device_add+0x108/0x240
>> coretemp_cpu_online+0xe1/0x15e [coretemp]
>> cpuhp_invoke_callback+0x181/0x8a0
>> cpuhp_thread_fun+0x188/0x1f0
>> smpboot_thread_fn+0x1b5/0x260
>> kthread+0xed/0x120
>> ret_from_fork+0x1f/0x30
>> other info that might help us debug thi
>> Chain exists of &(&priv->bus_notifier)->rwsem -->
>> cpu_hotplug_lock --> cpuhp_state-
>> Possible unsafe locking scenari
>> CPU0 CPU1
>> ---- ----
>> lock(cpuhp_state-up);
>> lock(cpu_hotplug_lock);
>> lock(cpuhp_state-up);
>> lock(&(&priv->bus_notifier)->rwsem);
>> *** DEADLOCK *
>> 2 locks held by cpuhp/0/15:
>> #0: ffffffff82648f10 (cpu_hotplug_lock){++++}-{0:0}, at:
>> cpuhp_thread_fun+0x48/0x1f0
>> #1: ffffffff826490c0 (cpuhp_state-up){+.+.}-{0:0}, at:
>> cpuhp_thread_fun+0x48/0x1f0
>> stack backtrace:
>> CPU: 0 PID: 15 Comm: cpuhp/0 Not tainted
>> 6.0.0-rc5-CI_DRM_12132-g6c93e979e542+ #1
>> Hardware name: Intel Corporation Alder Lake Client
>> Platform/AlderLake-P DDR4 RVP, BIOS ADLPFWI1.R00.3135.A00.2203251419
>> 03/25/2022
>> Call Trace:
>> <TASK>
>> dump_stack_lvl+0x56/0x7f
>> check_noncircular+0x132/0x150
>> validate_chain+0xb3f/0x2000
>> __lock_acquire+0x5a4/0xb70
>> lock_acquire+0xd3/0x310
>> ? blocking_notifier_call_chain+0x20/0x50
>> down_read+0x39/0x140
>> ? blocking_notifier_call_chain+0x20/0x50
>> blocking_notifier_call_chain+0x20/0x50
>> device_add+0x3c1/0x900
>> ? dev_set_name+0x4e/0x70
>> platform_device_add+0x108/0x240
>> coretemp_cpu_online+0xe1/0x15e [coretemp]
>> ? create_core_data+0x550/0x550 [coretemp]
>> cpuhp_invoke_callback+0x181/0x8a0
>> cpuhp_thread_fun+0x188/0x1f0
>> ? smpboot_thread_fn+0x1e/0x260
>> smpboot_thread_fn+0x1b5/0x260
>> ? sort_range+0x20/0x20
>> kthread+0xed/0x120
>> ? kthread_complete_and_exit+0x20/0x20
>> ret_from_fork+0x1f/0x30
>> </TASK>
>>
>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6641
>>
>> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>> drivers/iommu/dma-iommu.c | 17 ++++-------------
>> 1 file changed, 4 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 17dd683b2fce..9616b473e4c7 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -65,7 +65,6 @@ struct iommu_dma_cookie {
>>
>> /* Domain for flush queue callback; NULL if flush queue not in use */
>> struct iommu_domain *fq_domain;
>> - struct mutex mutex;
>> };
>>
>> static DEFINE_STATIC_KEY_FALSE(iommu_deferred_attach_enabled);
>> @@ -312,7 +311,6 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
>> if (!domain->iova_cookie)
>> return -ENOMEM;
>>
>> - mutex_init(&domain->iova_cookie->mutex);
>> return 0;
>> }
>>
>> @@ -563,33 +561,26 @@ static int iommu_dma_init_domain(struct
>> iommu_domain *domain, dma_addr_t base,
>> }
>>
>> /* start_pfn is always nonzero for an already-initialised domain */
>> - mutex_lock(&cookie->mutex);
>> if (iovad->start_pfn) {
>> if (1UL << order != iovad->granule ||
>> base_pfn != iovad->start_pfn) {
>> pr_warn("Incompatible range for DMA domain\n");
>> - ret = -EFAULT;
>> - goto done_unlock;
>> + return -EFAULT;
>> }
>>
>> - ret = 0;
>> - goto done_unlock;
>> + return 0;
>> }
>>
>> init_iova_domain(iovad, 1UL << order, base_pfn);
>> ret = iova_domain_init_rcaches(iovad);
>> if (ret)
>> - goto done_unlock;
>> + return ret;
>>
>> /* If the FQ fails we can simply fall back to strict mode */
>> if (domain->type == IOMMU_DOMAIN_DMA_FQ && iommu_dma_init_fq(domain))
>> domain->type = IOMMU_DOMAIN_DMA;
>>
>> - ret = iova_reserve_iommu_regions(dev, domain);
>> -
>> -done_unlock:
>> - mutex_unlock(&cookie->mutex);
>> - return ret;
>> + return iova_reserve_iommu_regions(dev, domain);
>> }
>>
>> /**
>> --
>> 2.25.1
>>
next prev parent reply other threads:[~2022-09-15 5:02 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-14 12:40 [Intel-gfx] [topic/core-for-CI] Revert "iommu/dma: Fix race condition during iova_domain initialization" Karolina Drobnik
2022-09-14 13:39 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
2022-09-14 15:01 ` [Intel-gfx] [topic/core-for-CI] " Lucas De Marchi
2022-09-14 15:54 ` Robin Murphy
2022-09-16 12:24 ` Karolina Drobnik
2022-09-16 20:32 ` Lucas De Marchi
2022-09-19 14:01 ` Karolina Drobnik
2022-09-21 17:15 ` Janusz Krzysztofik
2022-09-15 5:02 ` Karolina Drobnik [this message]
2022-09-15 7:05 ` [Intel-gfx] ✓ Fi.CI.IGT: success for " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=77a699fd-82d0-87fe-fb61-09b786f44151@intel.com \
--to=karolina.drobnik@intel.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=iommu@lists.linux.dev \
--cc=jroedel@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=lucas.demarchi@intel.com \
--cc=miles.chen@mediatek.com \
--cc=ning.li@mediatek.com \
--cc=robin.murphy@arm.com \
--cc=yf.wang@mediatek.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox