public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "David E. Box" <david.e.box@linux.intel.com>
To: "Hans de Goede" <hdegoede@redhat.com>,
	"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	platform-driver-x86@vger.kernel.org,
	rajvi.jingar@linux.intel.com
Subject: Re: [PATCH V6 01/20] platform/x86/intel/vsec: Fix xa_alloc memory leak
Date: Mon, 04 Dec 2023 11:57:22 -0800	[thread overview]
Message-ID: <198aadd7a91152393ec56d421fa042d30378af40.camel@linux.intel.com> (raw)
In-Reply-To: <f2a3bab9-296c-43a1-9b7e-944c5044feaf@redhat.com>

Hi Hans,

On Mon, 2023-12-04 at 14:51 +0100, Hans de Goede wrote:
> Hi,
> 
> On 11/30/23 12:02, Ilpo Järvinen wrote:
> > On Wed, 29 Nov 2023, David E. Box wrote:
> > 
> > > Commit 936874b77dd0 ("platform/x86/intel/vsec: Add PCI error recovery
> > > support to Intel PMT") added an xarray to track the list of vsec devices
> > > to
> > > be recovered after a PCI error. But it did not provide cleanup for the
> > > list
> > > leading to a memory leak that was caught by kmemleak.  Do xa_alloc()
> > > before
> > > devm_add_action_or_reset() so that the list may be cleaned up with
> > > xa_erase() in the release function.
> > > 
> > > Fixes: 936874b77dd0 ("platform/x86/intel/vsec: Add PCI error recovery
> > > support to Intel PMT")
> > > Signed-off-by: David E. Box <david.e.box@linux.intel.com>
> > > ---
> > > 
> > > V6 - Move xa_alloc() before ida_alloc() to reduce mutex use during error
> > >      recovery.
> > >    - Fix return value after id_alloc() fail
> > >    - Add Fixes tag
> > >    - Add more detail to changelog
> > > 
> > > V5 - New patch
> > > 
> > >  drivers/platform/x86/intel/vsec.c | 24 ++++++++++++++----------
> > >  drivers/platform/x86/intel/vsec.h |  1 +
> > >  2 files changed, 15 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/drivers/platform/x86/intel/vsec.c
> > > b/drivers/platform/x86/intel/vsec.c
> > > index c1f9e4471b28..2d568466b4e2 100644
> > > --- a/drivers/platform/x86/intel/vsec.c
> > > +++ b/drivers/platform/x86/intel/vsec.c
> > > @@ -120,6 +120,8 @@ static void intel_vsec_dev_release(struct device *dev)
> > >  {
> > >         struct intel_vsec_device *intel_vsec_dev = dev_to_ivdev(dev);
> > >  
> > > +       xa_erase(&auxdev_array, intel_vsec_dev->id);
> > > +
> > >         mutex_lock(&vsec_ida_lock);
> > >         ida_free(intel_vsec_dev->ida, intel_vsec_dev->auxdev.id);
> > >         mutex_unlock(&vsec_ida_lock);
> > > @@ -135,19 +137,27 @@ int intel_vsec_add_aux(struct pci_dev *pdev, struct
> > > device *parent,
> > >         struct auxiliary_device *auxdev = &intel_vsec_dev->auxdev;
> > >         int ret, id;
> > >  
> > > -       mutex_lock(&vsec_ida_lock);
> > > -       ret = ida_alloc(intel_vsec_dev->ida, GFP_KERNEL);
> > > -       mutex_unlock(&vsec_ida_lock);
> > > +       ret = xa_alloc(&auxdev_array, &intel_vsec_dev->id, intel_vsec_dev,
> > > +                      PMT_XA_LIMIT, GFP_KERNEL);
> > >         if (ret < 0) {
> > >                 kfree(intel_vsec_dev->resource);
> > >                 kfree(intel_vsec_dev);
> > >                 return ret;
> > >         }
> > >  
> > > +       mutex_lock(&vsec_ida_lock);
> > > +       id = ida_alloc(intel_vsec_dev->ida, GFP_KERNEL);
> > > +       mutex_unlock(&vsec_ida_lock);
> > > +       if (id < 0) {
> > > +               kfree(intel_vsec_dev->resource);
> > > +               kfree(intel_vsec_dev);
> > > +               return id;
> > 
> > Thanks, this looks much better this way around but it is missing 
> > xa_alloc() rollback now that the order is reversed.
> > 
> > Once that is fixed,
> > 
> > Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> 
> I have fixed this up, adding the missing:
> 
>         xa_erase(&auxdev_array, intel_vsec_dev->id);
> 
> to this error-exit path while merging this.

Thanks. Does that include the rest of the series which was all reviewed by Ilpo?

> 
> While looking into this I did find one other thing which
> worries me (different issue, needs a separate fix):
> 
> intel_vsec_pci_slot_reset() uses
> 
>                 devm_release_action(&pdev->dev, intel_vsec_remove_aux,
>                                     &intel_vsec_dev->auxdev);
> 
> and seems to assume that after this intel_vsec_remove_aux()
> has run for the auxdev-s. *But this is not the case*
> 
> devm_release_action() only removes the action from the list
> of devres resources tied to the parent PCI device.
> 
> It does *NOT* call the registered action function,
> so intel_vsec_remove_aux() is NOT called here.
> 
> And then on re-probing the device as is done in
> intel_vsec_pci_slot_reset() a second set of aux
> devices with the same parent will be created AFAICT.
> 
> So it seems that this also needs an explicit
> intel_vsec_remove_aux() call for each auxdev!
> 
> ###
> 
> This makes me wonder if the PCI error handling here
> and specifically the reset code was ever tested ?

I recall the code was tested using error injection to cause the slot reset to
occur and reprobe was confirmed. I'll have to find out the specific test so that
we can check it again with the proposed fix and ensure no leaks.

> 
> ###
> 
> Note that simply forcing a reprobe using device_reprobe()
> will cause all the aux-devices to also get removed through
> the action on driver-unbind without ever needing
> the auxdev_array at all!

Okay. That would be a lot simpler.

> 
> I guess that you want the removal to happen before
> the pci_restore_state(pdev) state though, so that
> simply relying on the removal on driver unbind
> is not an option ?

I'm not sure this is needed given the simplicity of the device. The array was
added only to track the list of devices and reprobe the one that was reset.
We'll look at changing this to do driver_reprobe() instead. Thanks.

David



  reply	other threads:[~2023-12-04 19:57 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-29 22:21 [PATCH V6 00/20] intel_pmc: Add telemetry API to read counters David E. Box
2023-11-29 22:21 ` [PATCH V6 01/20] platform/x86/intel/vsec: Fix xa_alloc memory leak David E. Box
2023-11-30 11:02   ` Ilpo Järvinen
2023-12-04 13:51     ` Hans de Goede
2023-12-04 19:57       ` David E. Box [this message]
2023-12-04 20:13         ` Hans de Goede
2023-11-29 22:21 ` [PATCH V6 02/20] platform/x86/intel/vsec: Remove unnecessary return David E. Box
2023-11-30 11:03   ` Ilpo Järvinen
2023-11-29 22:21 ` [PATCH V6 03/20] platform/x86/intel/vsec: Move structures to header David E. Box
2023-11-29 22:21 ` [PATCH V6 04/20] platform/x86/intel/vsec: remove platform_info from vsec device structure David E. Box
2023-11-29 22:21 ` [PATCH V6 05/20] platform/x86/intel/vsec: Use cleanup.h David E. Box
2023-11-29 22:21 ` [PATCH V6 06/20] platform/x86/intel/vsec: Assign auxdev parent by argument David E. Box
2023-11-29 22:21 ` [PATCH V6 07/20] platform/x86/intel/vsec: Add intel_vsec_register David E. Box
2023-11-30 11:23   ` Ilpo Järvinen
2023-11-29 22:21 ` [PATCH V6 08/20] platform/x86/intel/vsec: Add base address field David E. Box
2023-11-29 22:21 ` [PATCH V6 09/20] platform/x86/intel/pmt: Add header to struct intel_pmt_entry David E. Box
2023-11-29 22:21 ` [PATCH V6 10/20] platform/x86/intel/pmt: telemetry: Export API to read telemetry David E. Box
2023-11-29 22:21 ` [PATCH V6 11/20] platform/x86:intel/pmc: Call pmc_get_low_power_modes from platform init David E. Box
2023-11-29 22:21 ` [PATCH V6 12/20] platform/x86/intel/pmc: Allow pmc_core_ssram_init to fail David E. Box
2023-11-29 22:21 ` [PATCH V6 13/20] platform/x86/intel/pmc: Cleanup SSRAM discovery David E. Box
2023-11-29 22:21 ` [PATCH V6 14/20] platform/x86/intel/pmc/mtl: Use return value from pmc_core_ssram_init() David E. Box
2023-11-29 22:21 ` [PATCH V6 15/20] platform/x86/intel/pmc: Find and register PMC telemetry entries David E. Box
2023-11-29 22:21 ` [PATCH V6 16/20] platform/x86/intel/pmc: Display LPM requirements for multiple PMCs David E. Box
2023-11-29 22:21 ` [PATCH V6 17/20] platform/x86/intel/pmc: Retrieve LPM information using Intel PMT David E. Box
2023-11-29 22:21 ` [PATCH V6 18/20] platform/x86/intel/pmc: Read low power mode requirements for MTL-M and MTL-P David E. Box
2023-11-29 22:21 ` [PATCH V6 19/20] platform/x86/intel/pmc: Add debug attribute for Die C6 counter David E. Box
2023-11-29 22:21 ` [PATCH V6 20/20] platform/x86/intel/pmc: Show Die C6 counter on Meteor Lake David E. Box
2023-12-04 13:55 ` [PATCH V6 00/20] intel_pmc: Add telemetry API to read counters Hans de Goede

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=198aadd7a91152393ec56d421fa042d30378af40.camel@linux.intel.com \
    --to=david.e.box@linux.intel.com \
    --cc=hdegoede@redhat.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=platform-driver-x86@vger.kernel.org \
    --cc=rajvi.jingar@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox