Re: [PATCH v5 0/2] Fix TLB invalidate issues with Broadwell [preempt-rt regression]


From: Paul Gortmaker <paul.gortmaker@windriver.com>
To: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Matthew Brost <matthew.brost@intel.com>,
	Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>,
	Andi Shyti <andi.shyti@linux.intel.com>,
	David Airlie <airlied@linux.ie>,
	Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>,
	<dri-devel@lists.freedesktop.org>,
	Lucas De Marchi <lucas.demarchi@intel.com>,
	<linux-kernel@vger.kernel.org>,
	Chris Wilson <chris@chris-wilson.co.uk>,
	Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>,
	Bruce Chang <yu.bruce.chang@intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Dave Airlie <airlied@redhat.com>,
	<intel-gfx@lists.freedesktop.org>,
	John Harrison <John.C.Harrison@Intel.com>,
	Clark Williams <clrkwllms@kernel.org>,
	<linux-rt-users@vger.kernel.org>
Subject: Re: [PATCH v5 0/2] Fix TLB invalidate issues with Broadwell [preempt-rt regression]
Date: Fri, 16 Sep 2022 14:19:37 -0400
Message-ID: <20220916181934.GA16961@windriver.com>
In-Reply-To: <cover.1657639152.git.mchehab@kernel.org>

[[PATCH v5 0/2] Fix TLB invalidate issues with Broadwell] On 12/07/2022 (Tue 16:21) Mauro Carvalho Chehab wrote:

> i915 selftest hangcheck is causing i915 driver timeouts, as reported
> by Intel CI bot:
> 
> http://gfx-ci.fi.intel.com/cibuglog-ng/issuefilterassoc/24297?query_key=42a999f48fa6ecce068bc8126c069be7c31153b4

[...]

> After that, the machine just silently hangs.
> 
> Bisecting the issue, the patch that introduced the regression is:
> 
>     7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> 
> Reverting it fixes the issues but introduces other problems, as the TLB
> won't be invalidated anymore. So, instead, let's fix the root cause.
> 
> It turns out that the TLB flush logic ends up conflicting with i915 reset,
> which is called during selftest hangcheck. So, TLB invalidation should
> be serialized together with i915 reset.
> 
> Tested on an Intel NUC5i7RYB with an i7-5557U Broadwell CPU.

It turns out that this breaks PM-suspend operations on preempt-rt, across
multiple versions, because of the linux-stable backports.  This happens
because the uncore->lock, a spinlock_t that becomes a sleeping lock on
-rt, is now taken in atomic context (with preemption disabled).

As the uncore->lock is widely used, conversion to a raw lock seems
inappropriate at first glance, and hence some alternate solution will
likely be required.
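
To make the failure mode concrete, here is a minimal userspace sketch in
C.  It is a model only, not i915 or kernel code.  It assumes the shape
the trace below suggests: the reset path disables preemption and then
takes a spinlock_t, which on PREEMPT_RT is backed by a sleeping lock.
All names in it (model_lock, model_reset, and so on) are hypothetical
and exist purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model: these mimic kernel concepts but are NOT kernel APIs. */
static int preempt_count;     /* > 0 means "atomic": sleeping is not allowed */
static int sleep_violations;  /* sleeping-lock acquisitions made while atomic */

static void model_preempt_disable(void)
{
	preempt_count++;
}

static void model_preempt_enable(void)
{
	assert(preempt_count > 0);  /* unbalanced enable would be a model bug */
	preempt_count--;
}

/* raw == true models a raw spinlock (never sleeps);
 * raw == false models a spinlock_t on PREEMPT_RT (may sleep). */
struct model_lock {
	bool raw;
	bool held;
};

static void model_lock_acquire(struct model_lock *l)
{
	/* Stand-in for the might-sleep check that produces the BUG splat. */
	if (!l->raw && preempt_count > 0) {
		sleep_violations++;
		fprintf(stderr,
			"BUG: sleeping lock taken with preempt_count=%d\n",
			preempt_count);
	}
	l->held = true;
}

static void model_lock_release(struct model_lock *l)
{
	l->held = false;
}

/* Assumed shape of the problem path: preemption is disabled first, then
 * the lock is taken.  Returns the number of violations this call caused. */
static int model_reset(struct model_lock *uncore_lock)
{
	int before = sleep_violations;

	model_preempt_disable();
	model_lock_acquire(uncore_lock);
	model_lock_release(uncore_lock);
	model_preempt_enable();

	return sleep_violations - before;
}
```

In this model, model_reset() on a lock with raw == false reports one
violation, while a lock with raw == true reports none.  That is the
sense in which a raw lock would silence the splat, but a raw lock keeps
preemption off for the whole hold time, which is why a blanket
conversion of a widely used lock looks unattractive.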

Below is an example of the regression on v5.15-rt, with backport:

  commit 0ee5874dad61d2b154a9e3db196fc33e8208ce1b
  Author: Chris Wilson <chris@chris-wilson.co.uk>
  Date:   Tue Jul 12 16:21:32 2022 +0100

    drm/i915/gt: Serialize GRDOM access between multiple engine resets

    [ Upstream commit b24dcf1dc507f69ed3b5c66c2b6a0209ae80d4d4 ]

Reverting the engine reset serialization change avoids the PM-suspend
regression and is a temporary workaround for -rt users, but of course
leaves the original TLB issue exposed.

  BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 45092, name: kworker/u8:4
  preempt_count: 1, expected: 0
  RCU nest depth: 0, expected: 0
  INFO: lockdep is turned off.
  Preemption disabled at:
  [<ffffffffc0636522>] __intel_gt_reset+0x92/0x100 [i915]
  CPU: 3 PID: 45092 Comm: kworker/u8:4 Tainted: G        W  O      5.15.59-rt48-preempt-rt #1
  Hardware name: Intel(R) Client Systems NUC7i5DNKE/NUC7i5DNB, BIOS DNKBLi5v.86A.0064.2019.0523.1933 05/23/2019
  Workqueue: events_unbound async_run_entry_fn
  Call Trace:
   <TASK>
   show_stack+0x52/0x5c
   dump_stack_lvl+0x5b/0x86
   dump_stack+0x10/0x16
   __might_resched.cold+0xf7/0x12f
   ? __gen6_reset_engines.constprop.0+0x80/0x80 [i915]
   rt_spin_lock+0x4e/0xf0
   ? gen8_reset_engines+0x2e/0x1e0 [i915]
   gen8_reset_engines+0x2e/0x1e0 [i915]
   ? __gen6_reset_engines.constprop.0+0x80/0x80 [i915]
   __intel_gt_reset+0x9d/0x100 [i915]
   gt_sanitize+0x16c/0x190 [i915]
   intel_gt_suspend_late+0x3d/0xc0 [i915]
   i915_gem_suspend_late+0x57/0x130 [i915]
   i915_drm_suspend_late+0x38/0x110 [i915]
   i915_pm_suspend_late+0x1d/0x30 [i915]
   pm_generic_suspend_late+0x28/0x40
   pci_pm_suspend_late+0x37/0x50
   ? pci_pm_poweroff_late+0x50/0x50
   dpm_run_callback.cold+0x3c/0xa8
   __device_suspend_late+0xa4/0x1e0
   async_suspend_late+0x20/0xa0
   async_run_entry_fn+0x28/0xc0
   process_one_work+0x239/0x6c0
   worker_thread+0x58/0x3e0
   kthread+0x1a9/0x1d0
   ? process_one_work+0x6c0/0x6c0
   ? set_kthread_struct+0x50/0x50
   ret_from_fork+0x1f/0x30
   </TASK>
  PM: late suspend of devices complete after 26.497 msecs

Paul.
--

> 
> v5:
> - Added a missing SoB on patch 2.
> - No other changes.
> 
> v4:
> - No functional changes. All changes are at the patch descriptions:
>   - collected acked-by/reviewed-by;
>   - use the same e-mail on Author and SoB on patch 1.
> 
> v3:
> - Removed the logic that would check if the engine is awake before doing
>   TLB flush invalidation as backporting PM logic up to Kernel 4.x could be
>   too painful. After getting this one merged, I'll submit a separate patch
>   with the PM awake logic.
> 
> v2:
> 
> - Reduced to bare minimum fixes, as this should be backported deeply
>   into stable.
> 
> Chris Wilson (2):
>   drm/i915/gt: Serialize GRDOM access between multiple engine resets
>   drm/i915/gt: Serialize TLB invalidates with GT resets
> 
>  drivers/gpu/drm/i915/gt/intel_gt.c    | 15 ++++++++++-
>  drivers/gpu/drm/i915/gt/intel_reset.c | 37 ++++++++++++++++++++-------
>  2 files changed, 42 insertions(+), 10 deletions(-)
> 
> -- 
> 2.36.1
> 
>
