From: Baolu Lu <baolu.lu@linux.intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>,
"Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com>
Cc: baolu.lu@linux.intel.com,
"intel-gfx@lists.freedesktop.org"
<intel-gfx@lists.freedesktop.org>,
"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
Lucas De Marchi <lucas.demarchi@intel.com>,
"Kurmi, Suresh Kumar" <suresh.kumar.kurmi@intel.com>,
"Saarinen, Jani" <jani.saarinen@intel.com>,
matthew.auld@intel.com, iommu@lists.linux.dev
Subject: Re: REGRESSION on linux-next (next-20251106)
Date: Tue, 18 Nov 2025 18:30:22 +0800
Message-ID: <da9af809-9248-4cc4-ae4a-e64a03e43c13@linux.intel.com>
In-Reply-To: <20251118012944.GA60885@nvidia.com>
On 11/18/2025 9:29 AM, Jason Gunthorpe wrote:
> On Mon, Nov 10, 2025 at 12:06:30PM +0530, Borah, Chaitanya Kumar wrote:
>> Hello Jason,
>>
>> Hope you are doing well. I am Chaitanya from the linux graphics team in
>> Intel.
>>
>> This mail is regarding a regression we are seeing in our CI runs[1] on
>> linux-next repository.
>>
>> Since the version next-20251106 [2], we are seeing our tests timing out
>> presumably caused by a GPU Hang.
>>
>> `````````````````````````````````````````````````````````````````````````````````
>> <6> [490.872058] i915 0000:00:02.0: [drm] Got hung context on vcs0 with
>> active request 939:2 [0x1004] not yet started
>> <6> [490.875244] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:4:baffffff
>> <7> [496.424189] i915 0000:00:02.0: [drm:intel_guc_context_reset_process_msg
>> [i915]] GT1: GUC: Got context reset notification: 0x1004 on vcs0, exiting =
>> no, banned = no
>> <6> [496.921551] i915 0000:00:02.0: [drm] Got hung context on vcs0 with
>> active request 939:2 [0x1004] not yet started
>> <6> [496.924799] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:4:baffffff
>> <4> [499.946641] [IGT] Per-test timeout exceeded. Killing the current test
>> with SIGQUIT.
>> `````````````````````````````````````````````````````````````````````````````````
>> Details log can be found in [3].
> Chaitanya, can you check these two debugging patches:
>
> https://github.com/jgunthorpe/linux/commits/for-borah
>
> 10635ad3ff26a0 DEBUGGING: Force flush the whole cpu cache for the page table on every map operation
> 2789602b882499 DEBUGGING: Force flush the whole iotlb on every map operation
>
> Please run a test with each of them applied *individually* and report
> back what changes in the test. The "cpu cache" one may oops or
> something, we are just looking to see if it gets past the error Kevin
> pointed to:
>
> <7>[ 67.231149] [IGT] gem_exec_gttfill: starting subtest basic
> [..]
> <5>[ 68.824598] i915 0000:00:02.0: Using 46-bit DMA addresses
> <3>[ 68.825482] i915 0000:00:02.0: [drm] *ERROR* GT0: GUC: CT: Failed to process request 6000 (-EOPNOTSUPP)
>
> I could not test these patches so they may not work at all..
I applied and tested both debugging patches separately, but the failures
persist. I also tried flushing all TLBs by adding flush_tlb_all() to the
IOMMU mapping path; that doesn't help either.
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2d2f64ce2bc6..59a00235032b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3484,6 +3484,8 @@ static int intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	flush_tlb_all();
+
 	if (dmar_domain->iotlb_sync_map)
 		cache_tag_flush_range_np(dmar_domain, iova, iova + size - 1);
Thanks,
baolu