Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 7/9] drm/i915/gt: Fix memory leaks in per-gt sysfs
Date: Thu, 28 Apr 2022 21:25:53 -0700	[thread overview]
Message-ID: <871qxg5xda.wl-ashutosh.dixit@intel.com> (raw)
In-Reply-To: <ce0f7b16-b06b-4eb2-9c47-e7d4092a6c4f@intel.com>

On Thu, 28 Apr 2022 07:36:14 -0700, Andrzej Hajda wrote:
> On 27.04.2022 22:46, Dixit, Ashutosh wrote:
> > On Sun, 24 Apr 2022 15:36:23 -0700, Andi Shyti wrote:
> >> Hi Andrzej and Ashutosh,
> >>
> >>>>>> b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> >>>>>> index 937b2e1a305e..4c72b4f983a6 100644
> >>>>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> >>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> >>>>>> @@ -222,6 +222,9 @@ struct intel_gt {
> >>>>>>	} mocs;
> >>>>>>		struct intel_pxp pxp;
> >>>>>> +
> >>>>>> +	/* gt/gtN sysfs */
> >>>>>> +	struct kobject sysfs_gtn;
> >>>>> If you put kobject as a part of intel_gt what assures you that lifetime of
> >>>>> kobject is shorter than intel_gt? Ie its refcounter is 0 on removal of
> >>>>> intel_gt?
> >>>> Because we are explicitly doing a kobject_put() in
> >>>> intel_gt_sysfs_unregister(). Which is exactly what we are *not* doing in
> >>>> the previous code.
> >>>>
> >>>> Let me explain a bit about the previous code (but feel free to skip since
> >>>> the patch should speak for itself):
> >>>> * Previously we kzalloc a 'struct kobj_gt'
> >>>> * But we don't save a pointer to the 'struct kobj_gt' so we don't have the
> >>>>     pointer to the kobject to be able to do a kobject_put() on it later
> >>>> * Therefore we need to store the pointer in 'struct intel_gt'
> >>>> * But if we have to put the pointer in 'struct intel_gt' we might as well
> >>>>     put the kobject as part of 'struct intel_gt' and that also removes the
> >>>>     need to have a 'struct kobj_gt' (kobj_to_gt() can just use container_of()
> >>>>     to get gt from kobj).
> >>>> * So I think this patch simpler/cleaner than the original code if you take
> >>>>     the requirement for kobject_put() into account.
> >> This is my oversight. This was something I completely forgot to
> >> fix but it was my intention to do and actually I had some fixes
> >> ongoing. But because this patch took too long to get in I
> >> completely forgot about it (Sujaritha was actually the first who
> >> pointed this out).
> >>
> >> Thanks, Ashutosh for taking this.
> >>
> >>> I fully agree that previous code is incorrect but I am not convinced current
> >>> code is correct.
> >>> If some objects are kref-counted it means usually they can have multiple
> >>> concurrent users and kobject_put does not work as traditional
> >>> destructor/cleanup/unregister.
> >>> So in this particular case after calling kobject_init_and_add sysfs core can
> >>> get multiple references on the object. Later, during driver unregistration
> >>> kobject_put is called, but if the object is still in use by sysfs core, the
> >>> object will not be destroyed/released. If the driver unregistration
> >>> continues memory will be freed, leaving sysfs-core (or other users) with
> >>> dangling pointers. Unless there is some additional synchronization mechanism
> >>> I am not aware of.
> >> Thanks Andrzej for summarizing this and what you said is actually
> >> what happens. I had a similar solution developed and I had wrong
> >> pointer reference happening.
> > Hi Andrzej/Andi,
> >
> > I did do some research into kobject's and such before writing this patch
> > and based on that I believe the patch is correct. Presenting some evidence
> > below.
> >
> > The patch is verified by:
> >
> > a. Putting a printk in the release() method when it exists (it does for
> >     sysfs_gtn kobject)
> > b. Enabling dynamic prints for lib/kobject.c
> >
> > For example, with the following:
> >
> > # echo 'file kobject.c +p' > /sys/kernel/debug/dynamic_debug/control
> > # echo -n "0000:03:00.0" > /sys/bus/pci/drivers/i915/unbind
> >
> > We see this in dmesg (see kobject_cleanup() called from kobject_put()):
> >
> > [ 1034.930007] kobject: '.defaults' (ffff88817130a640): kobject_cleanup, parent ffff8882262b5778
> > [ 1034.930020] kobject: '.defaults' (ffff88817130a640): auto cleanup kobject_del
> > [ 1034.930336] kobject: '.defaults' (ffff88817130a640): calling ktype release
> > [ 1034.930340] kobject: (ffff88817130a640): dynamic_kobj_release
> > [ 1034.930354] kobject: '.defaults': free name
> > [ 1034.930366] kobject: 'gt0' (ffff8882262b5778): kobject_cleanup, parent ffff88817130a240
> > [ 1034.930371] kobject: 'gt0' (ffff8882262b5778): auto cleanup kobject_del
> > [ 1034.931930] kobject: 'gt0' (ffff8882262b5778): calling ktype release
> > [ 1034.931936] kobject: 'gt0': free name
> > [ 1034.958004] kobject: 'i915_0000_03_00.0' (ffff88810e1f8800): fill_kobj_path: path = '/devices/i915_0000_03_00.0'
> > [ 1034.958155] kobject: 'i915_0000_03_00.0' (ffff88810e1f8800): kobject_cleanup, parent 0000000000000000
> > [ 1034.958162] kobject: 'i915_0000_03_00.0' (ffff88810e1f8800): calling ktype release
> > [ 1034.958188] kobject: 'i915_0000_03_00.0': free name
> > [ 1034.958729] kobject: 'gt' (ffff88817130a240): kobject_cleanup, parent ffff8881160c5000
> > [ 1034.958736] kobject: 'gt' (ffff88817130a240): auto cleanup kobject_del
> > [ 1034.958762] kobject: 'gt' (ffff88817130a240): calling ktype release
> > [ 1034.958767] kobject: (ffff88817130a240): dynamic_kobj_release
> > [ 1034.958778] kobject: 'gt': free name
> >
> > We have the following directory structure (one of the patches is creating
> > /sys/class/drm/card0/gt/gt0/.defaults):
> >
> >        /sys/class/drm/card0/gt
> >                             |-gt0
> >                                |-.defaults
> >
> > And we see from dmesg .defaults, gt0 and gt kobjects being cleaned up in
> > that order.
> >
> > Looking at lib/kobject.c there are several interesting things:
> >
> > * Three subsystems are involved: kobject, sysfs and kernfs.
> >
> > * A child kobject takes a reference on the parent, so we must do a
> >    kobject_put() on the child before doing kobject_put() on the parent
> >    (creating a child kobject creates a corresponding sub-directory in sysfs).
> >
> > * Adding files to a sysfs directory does not take a reference on the
> >    kobject, only on the parent kernfs_node.
> >
> > * Since we do call sysfs_create_group() (for RC6) ordinarily we will need
> >    to call sysfs_remove_group() but this does not seem to be needed because
> >    we are not creating a directory for the group (by providing a name for
> >    the group). So sysfs_create_group() is equivalent to sysfs_create_files().
> >    So it seems we don't need sysfs_remove_group().
> >
> > * Similarly it appears files created by sysfs_create_files() do not need to
> >    be removed by sysfs_remove_files() because __kobject_del() and
> >    sysfs_remove_dir() called from kobject_cleanup() do that for us (the
> >    comment in kobject_cleanup() says "remove from sysfs if the caller did
> >    not do it").
> >
> > Based on the above it is clear that no one except a child kobject takes a
> > reference on the parent kobject and as long as we kobject_put() them in the
> > correct order (as we seem to be doing based on dmesg trace above) we should
> > be ok.
> >
> > Also what is followed in this patch is a fairly standard coding
> > pattern. Further, in case of any errors we generally see failure to unload
> > the module etc. and none of these things are being observed, module reload
> > works fine.
> >
> > I hope these points are helpful in completing review of the patch.
>
> See [1], it is quite old, so maybe it is not valid anymore, but I see no
> code proving sth has changed.

Hi Andrzej,

A lot has changed since that article from 2003 (for 2.5 kernel). For
instance there is kernfs (as I mention above):

	https://lwn.net/Articles/571590/

A process having a sysfs file open today in my view will result in the
following:
* It will take a reference on kernfs_node (not on kobject as was the case
  in kernel 2.5 in [1])
* An open file will prevent the module from being unloaded (not the kernel
  crashing as in 2.5 in [1])

So this is what I would expect with today's kernel. I am not seeing
anything we've done here which violates anything in [1] or [2].

> Also current doc says also [2] similar things, especially:
> "Once you registered your kobject via kobject_add(), you must never use
> kfree() to free it directly"

Correct, we are using kobject_put(), not kfree'ing the kobject.

Thanks.
--
Ashutosh

> [1]: https://lwn.net/Articles/36850/
> [2]: https://elixir.bootlin.com/linux/v5.18-rc4/source/Documentation/core-api/kobject.rst#L246

  reply	other threads:[~2022-04-29  4:25 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-13 18:11 [Intel-gfx] [PATCH 0/8] drm/i915: Media freq factor and per-gt enhancements/fixes Ashutosh Dixit
2022-04-13 18:11 ` [Intel-gfx] [PATCH 1/8] drm/i915: Introduce has_media_ratio_mode Ashutosh Dixit
2022-04-15 10:26   ` Rodrigo Vivi
2022-04-13 18:11 ` [Intel-gfx] [PATCH 2/8] drm/i915/gt: Add media freq factor to per-gt sysfs Ashutosh Dixit
2022-04-13 18:11 ` [Intel-gfx] [PATCH 3/8] drm/i915/pcode: Extend pcode functions for multiple gt's Ashutosh Dixit
2022-04-14 13:28   ` Jani Nikula
2022-04-14 22:31     ` Dixit, Ashutosh
2022-04-15 10:21       ` Rodrigo Vivi
2022-04-20  5:54         ` Dixit, Ashutosh
2022-04-20 16:32           ` Vivi, Rodrigo
2022-04-26  7:42             ` Jani Nikula
2022-04-13 18:11 ` [Intel-gfx] [PATCH 4/8] drm/i915/pcode: Add a couple of pcode helpers Ashutosh Dixit
2022-04-15 10:31   ` Rodrigo Vivi
2022-04-19  1:23     ` Dixit, Ashutosh
2022-04-13 18:11 ` [Intel-gfx] [PATCH 5/8] drm/i915/gt: Add media RP0/RPn to per-gt sysfs Ashutosh Dixit
2022-04-25  9:39   ` Kamil Konieczny
2022-04-26  0:43     ` Dixit, Ashutosh
2022-04-13 18:11 ` [Intel-gfx] [PATCH 6/8] drm/i915/gt: Fix memory leaks in " Ashutosh Dixit
2022-04-13 19:14   ` Dixit, Ashutosh
2022-04-13 18:11 ` [Intel-gfx] [PATCH 7/8] drm/i915/gt: Expose per-gt RPS defaults in sysfs Ashutosh Dixit
2022-04-13 18:11 ` [Intel-gfx] [PATCH 8/8] drm/i915/gt: Expose default value for media_freq_factor in per-gt sysfs Ashutosh Dixit
2022-04-14  0:38 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Media freq factor and per-gt enhancements/fixes Patchwork
2022-04-14  0:38 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-04-14  1:00 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2022-04-14  5:57   ` Dixit, Ashutosh
2022-04-14  7:11     ` Vudum, Lakshminarayana
2022-04-14  6:43 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-04-14  9:27 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2022-04-20  5:21 ` [Intel-gfx] [PATCH v2 0/9] " Ashutosh Dixit
2022-04-20  5:21   ` [Intel-gfx] [PATCH 1/9] drm/i915: Introduce has_media_ratio_mode Ashutosh Dixit
2022-04-20  5:21   ` [Intel-gfx] [PATCH 2/9] drm/i915/gt: Add media freq factor to per-gt sysfs Ashutosh Dixit
2022-04-21 20:57     ` Rodrigo Vivi
2022-04-26  0:29       ` Dixit, Ashutosh
2022-04-20  5:21   ` [Intel-gfx] [PATCH 3/9] drm/i915/pcode: Extend pcode functions for multiple gt's Ashutosh Dixit
2022-04-20  5:21   ` [Intel-gfx] [PATCH 4/9] drm/i915/gt: Convert callers to user per-gt pcode functions Ashutosh Dixit
2022-04-20  5:21   ` [Intel-gfx] [PATCH 5/9] drm/i915/pcode: Add a couple of pcode helpers Ashutosh Dixit
2022-04-20  5:21   ` [Intel-gfx] [PATCH 6/9] drm/i915/gt: Add media RP0/RPn to per-gt sysfs Ashutosh Dixit
2022-04-20  5:21   ` [Intel-gfx] [PATCH 7/9] drm/i915/gt: Fix memory leaks in " Ashutosh Dixit
2022-04-20 12:17     ` Andrzej Hajda
2022-04-20 16:12       ` Dixit, Ashutosh
2022-04-20 19:51         ` Andrzej Hajda
2022-04-24 22:36           ` Andi Shyti
2022-04-27 20:46             ` Dixit, Ashutosh
2022-04-28 14:36               ` Andrzej Hajda
2022-04-29  4:25                 ` Dixit, Ashutosh [this message]
2022-05-02  6:22                   ` Andrzej Hajda
2022-05-03  4:29                     ` Dixit, Ashutosh
2022-04-20  5:21   ` [Intel-gfx] [PATCH 8/9] drm/i915/gt: Expose per-gt RPS defaults in sysfs Ashutosh Dixit
2022-04-20  5:21   ` [Intel-gfx] [PATCH 9/9] drm/i915/gt: Expose default value for media_freq_factor in per-gt sysfs Ashutosh Dixit
2022-04-20  6:39 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: Media freq factor and per-gt enhancements/fixes Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2022-04-20  6:25 [Intel-gfx] [PATCH v2 0/9] " Ashutosh Dixit
2022-04-20  6:25 ` [Intel-gfx] [PATCH 7/9] drm/i915/gt: Fix memory leaks in per-gt sysfs Ashutosh Dixit
2022-04-20 16:23   ` Dixit, Ashutosh
2022-04-24 22:30   ` Andi Shyti
2022-04-26 20:21     ` Dixit, Ashutosh
2022-04-27 11:45       ` Andi Shyti
2022-04-27 20:50         ` Dixit, Ashutosh
2022-04-29  0:39 [Intel-gfx] [PATCH v3 0/9] drm/i915: Media freq factor and per-gt enhancements/fixes Ashutosh Dixit
2022-04-29  0:39 ` [Intel-gfx] [PATCH 7/9] drm/i915/gt: Fix memory leaks in per-gt sysfs Ashutosh Dixit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871qxg5xda.wl-ashutosh.dixit@intel.com \
    --to=ashutosh.dixit@intel.com \
    --cc=andrzej.hajda@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox