All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Badal Nilawar <badal.nilawar@intel.com>
Cc: <intel-gfx@lists.freedesktop.org>, <anshuman.gupta@intel.com>,
	<janusz.krzysztofik@intel.com>
Subject: Re: [PATCH] drm/i915/hwmon: Remove i915_hwmon_unregister() during driver unbind
Date: Wed, 27 Mar 2024 12:37:29 -0700	[thread overview]
Message-ID: <85h6gr5uva.wl-ashutosh.dixit@intel.com> (raw)
In-Reply-To: <20240326124838.3049215-1-badal.nilawar@intel.com>

On Tue, 26 Mar 2024 05:48:38 -0700, Badal Nilawar wrote:
>

Hi Badal,

> i915_hwmon and its resources are managed resources of i915 dev.
> During i915 driver unregister flow the function i915_hwmon_unregister()
> explicitly makes i915_hwmon resource NULL. This happen before
> hwmon is actually unregistered. Doing so may cause UAF if hwmon
> sysfs is being accessed:

I don't agree with this patch. For even if we remove setting to NULL, it
doesn't explain what we see, which is that the devm_kzalloc'd i915->hwmon
has been *freed* (since ddat (i915->hwmon->ddat) is 0x6b6b6b6b6b6b6dbf):

	(gdb) list *hwm_energy+0x2b
	0x161f8b is in hwm_energy (drivers/gpu/drm/i915/i915_hwmon.c:134).
	129             struct hwm_energy_info *ei = &ddat->ei;
	130             intel_wakeref_t wakeref;
	131             i915_reg_t rgaddr;
	132             u32 reg_val;
	133
	134             if (ddat->gt_n >= 0)
	135                     rgaddr = hwmon->rg.energy_status_tile;
	136             else
	137                     rgaddr = hwmon->rg.energy_status_all;
	138

If we don't have i915_hwmon_unregister equivalent in xe, that's because xe
has not implemented equivalents of i915_hwmon_power_max_disable and
i915_hwmon_power_max_restore, where the check is used.

Also, we should verify any fix before sending a patch out, either via local
repro (say use a script to read hwmon energy file while running the
selftests) or CI. Though in this case CI doesn't show any failures but I am
wondering if that is only by chance.

Fwiu, the issue is somehow being caused by ddat being *freed* before hwmon
sysfs has disappeared, at module unload time (since selftests use module
unload), allowing prometheus-node process to call into the energy read
sysfs and cause the crash.

Thanks.
--
Ashutosh


>
> <7> [518.386591] i915 0000:03:00.0: [drm] intel_gt_set_wedged called from intel_gt_set_wedged_on_fini+0xd/0x30 [i915]
> <7> [518.471128] i915 0000:03:00.0: [drm:drm_client_release] drm_fb_helper
> <4> [518.501476] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6dbf: 0000 [#1] PREEMPT SMP NOPTI
> <4> [518.512264] CPU: 6 PID: 679 Comm: prometheus-node Tainted: G     U             6.9.0-rc1-CI_DRM_14482-g4a8fabcf2f1a+ #1
> <4> [518.522951] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 05/27/2023
> <4> [518.536217] RIP: 0010:hwm_energy+0x2b/0x100 [i915]
> <4> [518.541159] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 e4 f0 48 83 ec 10 4c 8b 77 08 4c 8b 2f 8b 7f 34 48 89 74 24 08 85 ff 78 2b <45> 8b bd 54 02 00 00 49 8b 7e 18 e8 35 e4 ea ff 49 89 c4 48 85 c0
> <4> [518.559746] RSP: 0018:ffffc900077efd00 EFLAGS: 00010202
> <4> [518.564943] RAX: 0000000000000000 RBX: ffff8881e3078428 RCX: 0000000000000000
> <4> [518.572025] RDX: 0000000000000001 RSI: ffffc900077efda0 RDI: 000000006b6b6b6b
> <4> [518.579107] RBP: ffffc900077efd40 R08: ffffc900077efda0 R09: 0000000000000001
> <4> [518.586186] R10: 0000000000000001 R11: ffff88810c19aa00 R12: ffff888243e1a010
> <4> [518.593264] R13: 6b6b6b6b6b6b6b6b R14: 6b6b6b6b6b6b6b6b R15: ffff8881e3078428
> <4> [518.600343] FS:  00007f9def400700(0000) GS:ffff88888d100000(0000) knlGS:0000000000000000
> <4> [518.608373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [518.614077] CR2: 0000564f19bff288 CR3: 0000000119f94000 CR4: 0000000000f50ef0
> <4> [518.621158] PKRU: 55555554
> <4> [518.623858] Call Trace:
> <4> [518.626303]  <TASK>
> <4> [518.628400]  ? __die_body+0x1a/0x60
> <4> [518.631881]  ? die_addr+0x38/0x60
> <4> [518.635182]  ? exc_general_protection+0x1a1/0x400
> <4> [518.639862]  ? asm_exc_general_protection+0x26/0x30
> <4> [518.644710]  ? hwm_energy+0x2b/0x100 [i915]
> <4> [518.649007]  hwm_read+0x9a/0x310 [i915]
> <4> [518.652953]  hwmon_attr_show+0x36/0x140
> <4> [518.656775]  dev_attr_show+0x15/0x60
>
> Fixes: b3b088e28183 ("drm/i915/hwmon: Add HWMON infrastructure")
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/10366
> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_driver.c | 2 --
>  drivers/gpu/drm/i915/i915_hwmon.c  | 5 -----
>  2 files changed, 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index 4b9233c07a22..a95b455964b7 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -660,8 +660,6 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
>	for_each_gt(gt, dev_priv, i)
>		intel_gt_driver_unregister(gt);
>
> -	i915_hwmon_unregister(dev_priv);
> -
>	i915_perf_unregister(dev_priv);
>	i915_pmu_unregister(dev_priv);
>
> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
> index c9169e56b9a1..91f171752d34 100644
> --- a/drivers/gpu/drm/i915/i915_hwmon.c
> +++ b/drivers/gpu/drm/i915/i915_hwmon.c
> @@ -841,8 +841,3 @@ void i915_hwmon_register(struct drm_i915_private *i915)
>			ddat_gt->hwmon_dev = hwmon_dev;
>	}
>  }
> -
> -void i915_hwmon_unregister(struct drm_i915_private *i915)
> -{
> -	fetch_and_zero(&i915->hwmon);
> -}
> --
> 2.25.1
>

      parent reply	other threads:[~2024-03-27 19:37 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-26 12:48 [PATCH] drm/i915/hwmon: Remove i915_hwmon_unregister() during driver unbind Badal Nilawar
2024-03-26 21:28 ` Krzysztofik, Janusz
2024-03-27  4:30   ` Nilawar, Badal
2024-03-27  9:15     ` Krzysztofik, Janusz
2024-03-27 20:37       ` Dixit, Ashutosh
2024-03-28 10:06         ` Krzysztofik, Janusz
2024-03-29  3:47         ` Dixit, Ashutosh
2024-03-27  1:14 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2024-03-27  1:28 ` ✗ Fi.CI.BAT: failure " Patchwork
2024-03-27 19:37 ` Dixit, Ashutosh [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=85h6gr5uva.wl-ashutosh.dixit@intel.com \
    --to=ashutosh.dixit@intel.com \
    --cc=anshuman.gupta@intel.com \
    --cc=badal.nilawar@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=janusz.krzysztofik@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.