Intel-XE Archive on lore.kernel.org
From: Baolu Lu <baolu.lu@linux.intel.com>
To: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com>
Cc: "intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"Kurmi, Suresh Kumar" <suresh.kumar.kurmi@intel.com>,
	"Saarinen, Jani" <jani.saarinen@intel.com>,
	"De Marchi, Lucas" <lucas.demarchi@intel.com>
Subject: Re: Regression on drm-tip
Date: Mon, 17 Mar 2025 12:04:40 +0800
Message-ID: <ec6e3bb6-8093-4082-b09f-26068693b83c@linux.intel.com>
In-Reply-To: <SJ1PR11MB6129A28720CF33982397E777B9DC2@SJ1PR11MB6129.namprd11.prod.outlook.com>

On 3/16/25 18:01, Borah, Chaitanya Kumar wrote:
> 
>> -----Original Message-----
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Sunday, March 16, 2025 1:33 PM
>> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>
>> Cc: intel-gfx@lists.freedesktop.org; intel-xe@lists.freedesktop.org;
>> iommu@lists.linux.dev; Kurmi, Suresh Kumar
>> <suresh.kumar.kurmi@intel.com>; Saarinen, Jani <jani.saarinen@intel.com>;
>> De Marchi, Lucas <lucas.demarchi@intel.com>
>> Subject: Re: Regression on drm-tip
>>
>> On 3/16/25 15:27, Borah, Chaitanya Kumar wrote:
>>>> -----Original Message-----
>>>> From: Baolu Lu <baolu.lu@linux.intel.com>
>>>> Sent: Sunday, March 16, 2025 8:04 AM
>>>> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>
>>>> Cc: intel-gfx@lists.freedesktop.org; intel-xe@lists.freedesktop.org;
>>>> iommu@lists.linux.dev
>>>> Subject: Re: Regression on drm-tip
>>>>
>>>> On 3/14/25 17:04, Borah, Chaitanya Kumar wrote:
>>>>>> -----Original Message-----
>>>>>> From: Baolu Lu <baolu.lu@linux.intel.com>
>>>>>> Sent: Thursday, March 13, 2025 7:53 PM
>>>>>> To: Borah, Chaitanya Kumar <chaitanya.kumar.borah@intel.com>
>>>>>> Cc: baolu.lu@linux.intel.com; intel-gfx@lists.freedesktop.org;
>>>>>> intel-xe@lists.freedesktop.org; iommu@lists.linux.dev
>>>>>> Subject: Re: Regression on drm-tip
>>>>>>
>>>>>> On 2025/3/13 16:51, Borah, Chaitanya Kumar wrote:
>>>>>>> Hello Lu,
>>>>>>>
>>>>>>> Hope you are doing well. I am Chaitanya from the Linux graphics
>>>>>>> team at Intel.
>>>>>>> This mail is regarding a regression we are seeing in our CI
>>>>>>> runs[1] on the drm-tip repository.
>>>>>>>
>>>>>>> <4>[    2.856622] WARNING: possible circular locking dependency detected
>>>>>>> <4>[    2.856631] 6.14.0-rc5-CI_DRM_16217-gc55ef90b69d3+ #1 Tainted: G          I
>>>>>>> <4>[    2.856642] ------------------------------------------------------
>>>>>>> <4>[    2.856650] swapper/0/1 is trying to acquire lock:
>>>>>>> <4>[    2.856657] ffffffff8360ecc8 (iommu_probe_device_lock){+.+.}-{3:3}, at: iommu_probe_device+0x1d/0x70
>>>>>>> <4>[    2.856679]
>>>>>>>                       but task is already holding lock:
>>>>>>> <4>[    2.856686] ffff888102ab6fa8 (&device->physical_node_lock){+.+.}-{3:3}, at: intel_iommu_init+0xea1/0x1220
>>>>>>>
>>>>>>> The detailed log can be found in [2].
>>>>>>>
>>>>>>> After bisecting the tree, the following patch [3] seems to be the
>>>>>>> first "bad" commit
>>>>>>>
>>>>>>> commit b150654f74bf0df8e6a7936d5ec51400d9ec06d8
>>>>>>> Author: Lu Baolu <baolu.lu@linux.intel.com>
>>>>>>> Date:   Fri Feb 28 18:27:26 2025 +0800
>>>>>>>
>>>>>>>         iommu/vt-d: Fix suspicious RCU usage
>>>>>>>
>>>>>>> We also verified that if we revert the patch the issue is not seen.
>>>>>>>
>>>>>>> Could you please check why the patch causes this regression and
>>>>>>> provide a fix if necessary?
>>>>>>
>>>>>> Can you please run a quick test to check if the following fix works?
>>>>>>
>>>>>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>>>>>> index e540092d664d..06debeaec643 100644
>>>>>> --- a/drivers/iommu/intel/dmar.c
>>>>>> +++ b/drivers/iommu/intel/dmar.c
>>>>>> @@ -2051,8 +2051,13 @@ int enable_drhd_fault_handling(unsigned int cpu)
>>>>>>                     if (iommu->irq || iommu->node != cpu_to_node(cpu))
>>>>>>                             continue;
>>>>>>
>>>>>> +               /*
>>>>>> +                * Call dmar_alloc_hwirq() with dmar_global_lock held,
>>>>>> +                * could cause possible lock race condition.
>>>>>> +                */
>>>>>> +               up_read(&dmar_global_lock);
>>>>>>                     ret = dmar_set_interrupt(iommu);
>>>>>> -
>>>>>> +               down_read(&dmar_global_lock);
>>>>>>                     if (ret) {
>>>>>>                             pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
>>>>>>                                    (unsigned long long)drhd->reg_base_addr, ret);
>>>>>>
>>>>>> Thanks,
>>>>>> baolu
>>>>> We still see the issue with this change.
>>>> I am attempting to reproduce this issue with my MTL machine. I pulled
>>>> the test branch from:
>>>>
>>>> https://anongit.freedesktop.org/git/drm-tip.git
>>>>
>>>> and built the test kernel image using the configuration file from:
>>>>
>>>> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_16217/kconfig.txt
>>>>
>>>> But I did not observe the lockdep splat mentioned above after booting.
>>>>
>>>> Is there anything I might have missed?
>>>>
>>> +Suresh, Jani, Lucas
>>>
>>> We are seeing this only on Skylake and Kaby Lake in our CI runs.
>> If so, will below change make any difference?
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index 85aa66ef4d61..ec2f385ae25b 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -3049,6 +3049,7 @@ static int __init probe_acpi_namespace_devices(void)
>>                           if (dev->bus != &acpi_bus_type)
>>                                   continue;
>>
>> +                       up_read(&dmar_global_lock);
>>                           adev = to_acpi_device(dev);
>>                           mutex_lock(&adev->physical_node_lock);
>>                           list_for_each_entry(pn,
>> @@ -3058,6 +3059,7 @@ static int __init probe_acpi_namespace_devices(void)
>>                                           break;
>>                           }
>>                           mutex_unlock(&adev->physical_node_lock);
>> +                       down_read(&dmar_global_lock);
>>
>>                           if (ret)
>>                                   return ret;
>>
> Thank you for the change. This seems to be working. Can we expect a fix patch soon?

Sure. I have posted a fix patch here:

https://lore.kernel.org/linux-iommu/20250317035714.1041549-1-baolu.lu@linux.intel.com/

Thanks,
baolu
