Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
Cc: intel-gfx@lists.freedesktop.org,
	Andi Shyti <andi.shyti@intel.com>,
	Andrzej Hajda <andrzej.hajda@intel.com>
Subject: Re: [Intel-gfx] [PATCH 6/8] drm/i915/gt: Fix memory leaks in per-gt sysfs
Date: Fri, 13 May 2022 10:28:18 +0100	[thread overview]
Message-ID: <5fd75764-8f2c-800f-076c-ba8feb538fad@linux.intel.com> (raw)
In-Reply-To: <871qwy6mzd.wl-ashutosh.dixit@intel.com>


On 13/05/2022 06:05, Dixit, Ashutosh wrote:
> On Thu, 12 May 2022 00:48:08 -0700, Tvrtko Ursulin wrote:
> 
> Hi Tvrtko,
> 
>> On 12/05/2022 00:15, Dixit, Ashutosh wrote:
>>> On Tue, 10 May 2022 03:41:57 -0700, Andrzej Hajda wrote:
>>>> On 10.05.2022 11:48, Tvrtko Ursulin wrote:
>>>>> On 10/05/2022 10:39, Andrzej Hajda wrote:
>>>>>> On 10.05.2022 10:18, Tvrtko Ursulin wrote:
>>>>>>>>>
>>>>>>>>> Was there closure/agreement on the matter of whether or not there is
>>>>>>>>> a potential race between "kfree(gt)" and sysfs access (last put from
>>>>>>>>> sysfs that is)? I've noticed Andrzej and Ashutosh were discussing it
>>>>>>>>> but did not read all the details.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Not really :)
>>>>>>>> IMO docs are against this practice, Ashutosh shows examples of this
>>>>>>>> practice in code and according to his analysis it is safe.
>>>>>>>> I gave up looking for contradictions :) Either it is OK, kobject is
>>>>>>>> not fully shared object, docs are obsolete and needs update, either
>>>>>>>> the patch is wrong.
>>>>>>>> Anyway finally I tend to accept this solution, I failed to prove it is
>>>>>>>> wrong :)
>>>>>>>
>>>>>>> Like a question of whether hotunplug can be triggered while userspace
>>>>>>> is sitting in a sysfs hook? Final kfree then has to be delayed until
>>>>>>> userspace exists.
>>>>>>>
>>>>>>> Btw where is the "kfree(gt)" for the tiles on the PCI remove path? I
>>>>>>> can't find it.. Do we have a leak?
>>>>>>
>>>>>> intel_gt_tile_cleanup ?
>>>>>
>>>>> Called from intel_gt_release_all, whose only caller is the failure path
>>>>> of i915_driver_probe. Feels like something is missing?
>>>>
>>>> This is final proof this patch is safe - no kfree, no UAF :)
>>>>
>>>> Apparently it is broken in internal branch as well.
>>>> Should I take care of it?
>>>
>>> See Daniele's comment here:
>>>
>>> https://patchwork.freedesktop.org/patch/478856/?series=101551&rev=1
>>
>> Yeah we found that same leak yesterday, or the day before in this thread.
>>
>>> We clean up the gt sysfs during PCI device remove (i915_driver_remove ->
>>> i915_driver_unregister -> intel_gt_driver_unregister ->
>>> intel_gt_sysfs_unregister (added in this patch)). But from Daniele's mail
>>> it appears that "kfree(gt)" can only be done from i915_driver_release().
>>>
>>> So as long as i915_driver_release() always happens after
>>> i915_driver_remove() (which seems to be the case though I couldn't figure
>>> out why (i.e. who is putting the final reference of the drm device)) there
>>> is no UAF and no race. Thanks!
>>
>> No worried by the unknown?
> 
> Well if release() happens before or during remove() then (as is clear from
> Daniele's mail) we have a much bigger problem than sysfs on our hands and
> will see UAF crashes during device remove/unbind. But as far as we know no
> such crashes have been reported.
> 
>> I had a quick look whether core_hotunplug tests for sysfs interactions
>> but couldn't spot it. What I had in mind is userspace stuck in a sysfs
>> hook (say read into a userfaultfd buffer) with device hotunplug in
>> parallel. Maybe it is all handled already, not claiming that it isn't.
> 
> This is the 20 year old issue mentioned by Andrzej here:
> https://lwn.net/Articles/36850/
> 
> So I thought I'll try this out today and see what actually happens to
> settle this. And you will see it makes perfect sense. So this is what I
> did:
> 
> * Change IGT to add a 20 second sleep after opening a sysfs file
> * In that 20 second period, with an open fd, unbind the device using:
> 	echo -n "0000:03:00.0" > /sys/bus/pci/drivers/i915/unbind
>    And also rmmod i915.
> 
> So this is what we see when we do this:
> * As soon as the device is unbound, the complete i915 sysfs tree (under
>    /sys/card/drm/card0) is cleanly removed (even with the open fd in IGT).
> * The fd open in IGT is now orphan/invalid, so when IGT resumes and tries
>    to use that fd IGT crashes.
> * So no problem with device unbind but if IGT is still hanging around rmmod
>    fails (saying module is in use, most likely due to the still open drm fd)
>    but after IGT is completely killed rmmod is also fine.
> 
> So this confirms all this is correctly handled.

I was unsure what does "IGT crashes" exactly meant so I went to try it 
out myself. It's -ENODEV from read(2) it receives so it all indeed seems 
handled fine.

Although hotunplug seems generally very unhealthy, at least on 
5.18.0-rc8 I tested on.. I'll send my subtest to the mailing list in 
case it is consider useful to have it.

Regards,

Tvrtko

> 
> Separately, note that kobject_put's introduced in this patch are only
> needed for freeing the memory allocated for the kobject's themselves (or
> their containing struct's). kobject_put's don't play a role in cleaning up
> the sysfs hierarchy itself (which will get cleaned up even without the
> kobject_put's). Further, child kobject's take a reference on their parents
> so child kobjects need a kobject_put before the parent kobject_put to free
> the memory allocated for the parent (i.e. doing a kobject_put on the parent
> will not automatically free all the child kobjects).
> 
> Thanks.
> --
> Ashutosh

  reply	other threads:[~2022-05-13  9:28 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-29 19:56 [Intel-gfx] [PATCH v4 0/8] drm/i915: Media freq factor and per-gt enhancements/fixes Ashutosh Dixit
2022-04-29 19:56 ` [Intel-gfx] [PATCH 1/8] drm/i915: Introduce has_media_ratio_mode Ashutosh Dixit
2022-04-29 19:56 ` [Intel-gfx] [PATCH 2/8] drm/i915/gt: Add media freq factor to per-gt sysfs Ashutosh Dixit
2022-05-10  7:24   ` Tvrtko Ursulin
2022-05-12  4:25     ` Dixit, Ashutosh
2022-04-29 19:56 ` [Intel-gfx] [PATCH 3/8] drm/i915/pcode: Extend pcode functions for multiple gt's Ashutosh Dixit
2022-05-02 12:54   ` Rodrigo Vivi
2022-05-10  6:51   ` Andi Shyti
2022-05-10  7:34   ` Tvrtko Ursulin
2022-05-10  7:43     ` Jani Nikula
2022-05-11  5:26       ` Dixit, Ashutosh
2022-05-11  8:18         ` Tvrtko Ursulin
2022-05-12  4:28           ` Dixit, Ashutosh
2022-04-29 19:56 ` [Intel-gfx] [PATCH 4/8] drm/i915/pcode: Add a couple of pcode helpers Ashutosh Dixit
2022-04-29 19:56 ` [Intel-gfx] [PATCH 5/8] drm/i915/gt: Add media RP0/RPn to per-gt sysfs Ashutosh Dixit
2022-05-10  7:37   ` Tvrtko Ursulin
2022-05-12  4:25     ` Dixit, Ashutosh
2022-04-29 19:56 ` [Intel-gfx] [PATCH 6/8] drm/i915/gt: Fix memory leaks in " Ashutosh Dixit
2022-05-10  6:02   ` Andi Shyti
2022-05-10  7:28   ` Tvrtko Ursulin
2022-05-10  7:58     ` Andrzej Hajda
2022-05-10  8:18       ` Tvrtko Ursulin
2022-05-10  9:39         ` Andrzej Hajda
2022-05-10  9:48           ` Tvrtko Ursulin
2022-05-10 10:41             ` Andrzej Hajda
2022-05-10 13:25               ` Tvrtko Ursulin
2022-05-11 23:15               ` Dixit, Ashutosh
2022-05-12  7:48                 ` Tvrtko Ursulin
2022-05-13  5:05                   ` Dixit, Ashutosh
2022-05-13  9:28                     ` Tvrtko Ursulin [this message]
2022-04-29 19:56 ` [Intel-gfx] [PATCH 7/8] drm/i915/gt: Expose per-gt RPS defaults in sysfs Ashutosh Dixit
2022-05-10  7:53   ` Tvrtko Ursulin
2022-05-10 10:58     ` Andi Shyti
2022-05-26 19:10       ` Dixit, Ashutosh
2022-05-26 19:09     ` Dixit, Ashutosh
2022-04-29 19:56 ` [Intel-gfx] [PATCH 8/8] drm/i915/gt: Expose default value for media_freq_factor in per-gt sysfs Ashutosh Dixit
2022-04-29 20:37 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Media freq factor and per-gt enhancements/fixes (rev4) Patchwork
2022-04-29 20:37 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-04-29 21:10 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-04-29 23:38 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2022-04-30  0:44   ` Dixit, Ashutosh
2022-04-30  5:28     ` Vudum, Lakshminarayana
2022-04-30  5:09 ` [Intel-gfx] ✓ Fi.CI.IGT: success " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2022-04-13 18:11 [Intel-gfx] [PATCH 0/8] drm/i915: Media freq factor and per-gt enhancements/fixes Ashutosh Dixit
2022-04-13 18:11 ` [Intel-gfx] [PATCH 6/8] drm/i915/gt: Fix memory leaks in per-gt sysfs Ashutosh Dixit
2022-04-13 19:14   ` Dixit, Ashutosh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5fd75764-8f2c-800f-076c-ba8feb538fad@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=andi.shyti@intel.com \
    --cc=andrzej.hajda@intel.com \
    --cc=ashutosh.dixit@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox