LinuxPPC-Dev Archive on lore.kernel.org

* Re: [PATCH 08/13] drm: remove drm_fb_helper_modinit
From: Christoph Hellwig @ 2021-01-21  8:28 UTC
  To: Daniel Vetter
  Cc: Petr Mladek, Jiri Kosina, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst,
	Linux Kernel Mailing List, Maxime Ripard, Michal Marek,
	Joe Lawrence, dri-devel, Thomas Zimmermann, Jessica Yu,
	Frederic Barrat, live-patching, Miroslav Benes, linuxppc-dev,
	Christoph Hellwig
In-Reply-To: <CAKMK7uFo3epNAUdcp0vvW=VyWMMTZghGyRTPbz_Z37S6nem_2A@mail.gmail.com>

On Thu, Jan 21, 2021 at 09:25:40AM +0100, Daniel Vetter wrote:
> On Thu, Jan 21, 2021 at 8:55 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > drm_fb_helper_modinit has a lot of boilerplate for what is very
> > simple functionality.  Just open-code it in the only caller using
> > IS_ENABLED and IS_MODULE.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> I didn't spot any dependencies with your series, should I just merge
> this through drm trees? Or do you want an ack?

I'd prefer an ACK - module_loaded() is only introduced earlier in this
series.
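
For readers unfamiliar with the Kconfig helpers the patch relies on, here is a
rough userspace sketch of how IS_ENABLED() and IS_MODULE() turn config symbols
into ordinary 0/1 expressions. It is modeled on include/linux/kconfig.h but
simplified: the helper macro names, the CONFIG_ values defined at the top, and
should_request_fbcon() are assumptions made for illustration, and a plain
parameter stands in for module_loaded("fbcon").

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed configuration for this sketch only: fbdev emulation built in (=y),
 * fbcon built as a module (=m), CONFIG_EXPERT not set. Kconfig emits
 * "#define CONFIG_FOO 1" for =y and "#define CONFIG_FOO_MODULE 1" for =m.
 */
#define CONFIG_DRM_FBDEV_EMULATION 1
#define CONFIG_FRAMEBUFFER_CONSOLE_MODULE 1

/*
 * Simplified take on the helpers in include/linux/kconfig.h; helper names
 * are local to this sketch. If the symbol expands to 1, the placeholder
 * inserts "0," and shifts the literal 1 into the second argument slot;
 * otherwise the second argument is the literal 0.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_1(x) IS_DEFINED_2(x)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)

#define IS_BUILTIN(option) IS_DEFINED_1(option)
#define IS_MODULE(option) IS_DEFINED_1(option##_MODULE)
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

/* The results are compile-time constants, so dead branches can be dropped. */
static_assert(IS_MODULE(CONFIG_FRAMEBUFFER_CONSOLE) &&
	      !IS_BUILTIN(CONFIG_FRAMEBUFFER_CONSOLE),
	      "fbcon is =m in this sketch");

/*
 * Same shape as the condition open-coded in drm_kms_helper_init();
 * fbcon_loaded stands in for module_loaded("fbcon").
 */
static bool should_request_fbcon(bool fbcon_loaded)
{
	return IS_ENABLED(CONFIG_DRM_FBDEV_EMULATION) &&
	       IS_MODULE(CONFIG_FRAMEBUFFER_CONSOLE) &&
	       !IS_ENABLED(CONFIG_EXPERT) &&
	       !fbcon_loaded;
}
```

Unlike an #if block, the whole condition is still parsed and type-checked in
every configuration, which is presumably part of the appeal of open-coding it.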

* Re: [PATCH 08/13] drm: remove drm_fb_helper_modinit
From: Daniel Vetter @ 2021-01-21  8:25 UTC
  To: Christoph Hellwig
  Cc: Petr Mladek, Jiri Kosina, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst,
	Linux Kernel Mailing List, Maxime Ripard, Michal Marek,
	Joe Lawrence, dri-devel, Thomas Zimmermann, Jessica Yu,
	Frederic Barrat, live-patching, Miroslav Benes, linuxppc-dev
In-Reply-To: <20210121074959.313333-9-hch@lst.de>

On Thu, Jan 21, 2021 at 8:55 AM Christoph Hellwig <hch@lst.de> wrote:
>
> drm_fb_helper_modinit has a lot of boilerplate for what is very
> simple functionality.  Just open-code it in the only caller using
> IS_ENABLED and IS_MODULE.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I didn't spot any dependencies with your series, should I just merge
this through drm trees? Or do you want an ack?
-Daniel

> ---
>  drivers/gpu/drm/drm_crtc_helper_internal.h | 10 ---------
>  drivers/gpu/drm/drm_fb_helper.c            | 16 -------------
>  drivers/gpu/drm/drm_kms_helper_common.c    | 26 +++++++++++-----------
>  3 files changed, 13 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_crtc_helper_internal.h b/drivers/gpu/drm/drm_crtc_helper_internal.h
> index 25ce42e799952c..61e09f8a8d0ff0 100644
> --- a/drivers/gpu/drm/drm_crtc_helper_internal.h
> +++ b/drivers/gpu/drm/drm_crtc_helper_internal.h
> @@ -32,16 +32,6 @@
>  #include <drm/drm_encoder.h>
>  #include <drm/drm_modes.h>
>
> -/* drm_fb_helper.c */
> -#ifdef CONFIG_DRM_FBDEV_EMULATION
> -int drm_fb_helper_modinit(void);
> -#else
> -static inline int drm_fb_helper_modinit(void)
> -{
> -       return 0;
> -}
> -#endif
> -
>  /* drm_dp_aux_dev.c */
>  #ifdef CONFIG_DRM_DP_AUX_CHARDEV
>  int drm_dp_aux_dev_init(void);
> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
> index ce6d63ca75c32a..0b9f1ae1b7864c 100644
> --- a/drivers/gpu/drm/drm_fb_helper.c
> +++ b/drivers/gpu/drm/drm_fb_helper.c
> @@ -2499,19 +2499,3 @@ void drm_fbdev_generic_setup(struct drm_device *dev,
>         drm_client_register(&fb_helper->client);
>  }
>  EXPORT_SYMBOL(drm_fbdev_generic_setup);
> -
> -/* The Kconfig DRM_KMS_HELPER selects FRAMEBUFFER_CONSOLE (if !EXPERT)
> - * but the module doesn't depend on any fb console symbols.  At least
> - * attempt to load fbcon to avoid leaving the system without a usable console.
> - */
> -int __init drm_fb_helper_modinit(void)
> -{
> -#if defined(CONFIG_FRAMEBUFFER_CONSOLE_MODULE) && !defined(CONFIG_EXPERT)
> -       const char name[] = "fbcon";
> -
> -       if (!module_loaded(name))
> -               request_module_nowait(name);
> -#endif
> -       return 0;
> -}
> -EXPORT_SYMBOL(drm_fb_helper_modinit);
> diff --git a/drivers/gpu/drm/drm_kms_helper_common.c b/drivers/gpu/drm/drm_kms_helper_common.c
> index 221a8528c9937a..b694a7da632eae 100644
> --- a/drivers/gpu/drm/drm_kms_helper_common.c
> +++ b/drivers/gpu/drm/drm_kms_helper_common.c
> @@ -64,19 +64,19 @@ MODULE_PARM_DESC(edid_firmware,
>
>  static int __init drm_kms_helper_init(void)
>  {
> -       int ret;
> -
> -       /* Call init functions from specific kms helpers here */
> -       ret = drm_fb_helper_modinit();
> -       if (ret < 0)
> -               goto out;
> -
> -       ret = drm_dp_aux_dev_init();
> -       if (ret < 0)
> -               goto out;
> -
> -out:
> -       return ret;
> +       /*
> +        * The Kconfig DRM_KMS_HELPER selects FRAMEBUFFER_CONSOLE (if !EXPERT)
> +        * but the module doesn't depend on any fb console symbols.  At least
> +        * attempt to load fbcon to avoid leaving the system without a usable
> +        * console.
> +        */
> +       if (IS_ENABLED(CONFIG_DRM_FBDEV_EMULATION) &&
> +           IS_MODULE(CONFIG_FRAMEBUFFER_CONSOLE) &&
> +           !IS_ENABLED(CONFIG_EXPERT) &&
> +           !module_loaded("fbcon"))
> +               request_module_nowait("fbcon");
> +
> +       return drm_dp_aux_dev_init();
>  }
>
>  static void __exit drm_kms_helper_exit(void)
> --
> 2.29.2
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 08/13] drm: remove drm_fb_helper_modinit
From: Daniel Vetter @ 2021-01-21  8:39 UTC
  To: Christoph Hellwig
  Cc: Petr Mladek, Jiri Kosina, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst,
	Linux Kernel Mailing List, Maxime Ripard, Michal Marek,
	Joe Lawrence, dri-devel, Thomas Zimmermann, Jessica Yu,
	Frederic Barrat, live-patching, Miroslav Benes, linuxppc-dev
In-Reply-To: <20210121082820.GA25719@lst.de>

On Thu, Jan 21, 2021 at 9:28 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Thu, Jan 21, 2021 at 09:25:40AM +0100, Daniel Vetter wrote:
> > On Thu, Jan 21, 2021 at 8:55 AM Christoph Hellwig <hch@lst.de> wrote:
> > >
> > > > drm_fb_helper_modinit has a lot of boilerplate for what is very
> > > > simple functionality.  Just open-code it in the only caller using
> > > > IS_ENABLED and IS_MODULE.
> > >
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> >
> > I didn't spot any dependencies with your series, should I just merge
> > this through drm trees? Or do you want an ack?
>
> I'd prefer an ACK - module_loaded() is only introduced earlier in this
> series.

I was looking for that but didn't find the hunk touching drm somehow ...

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 01/13] powerpc/powernv: remove get_cxl_module
From: Andrew Donnellan @ 2021-01-21  9:09 UTC
  To: Christoph Hellwig, Frederic Barrat, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter,
	Jessica Yu, Josh Poimboeuf, Jiri Kosina, Miroslav Benes,
	Petr Mladek, Joe Lawrence
  Cc: Michal Marek, linux-kbuild, Masahiro Yamada, linux-kernel,
	dri-devel, live-patching, linuxppc-dev
In-Reply-To: <20210121074959.313333-2-hch@lst.de>

On 21/1/21 6:49 pm, Christoph Hellwig wrote:
> The static inline get_cxl_module function is entirely unused,
> remove it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

The one user of this was removed in 8bf6b91a5125a ("Revert
"powerpc/powernv: Add support for the cxl kernel api on the real phb"").

Thanks for picking this up.

Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>

-- 
Andrew Donnellan              OzLabs, ADL Canberra
ajd@linux.ibm.com             IBM Australia Limited

* Re: [PATCH 1/2] crypto: talitos - Work around SEC6 ERRATA (AES-CTR mode data size error)
From: Christophe Leroy @ 2021-01-21  9:54 UTC
  To: Ard Biesheuvel
  Cc: Linux Crypto Mailing List, Linux Kernel Mailing List,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), Herbert Xu,
	David S. Miller
In-Reply-To: <CAMj1kXHz8LdDgfOcifcB-MBMM9-TbymOU_psT3JBFQfyvQ=EjQ@mail.gmail.com>



On 21/01/2021 at 08:31, Ard Biesheuvel wrote:
> On Thu, 21 Jan 2021 at 06:35, Christophe Leroy
> <christophe.leroy@csgroup.eu> wrote:
>>
>>
>>
>> On 20/01/2021 at 23:23, Ard Biesheuvel wrote:
>>> On Wed, 20 Jan 2021 at 19:59, Christophe Leroy
>>> <christophe.leroy@csgroup.eu> wrote:
>>>>
>>>> Talitos Security Engine AESU considers any input
>>>> data size that is not a multiple of 16 bytes to be an error.
>>>> This is not a problem in general, except for Counter mode
>>>> that is a stream cipher and can have an input of any size.
>>>>
>>>> Test Manager for ctr(aes) fails on 4th test vector which has
>>>> a length of 499 while all previous vectors which have a 16 bytes
>>>> multiple length succeed.
>>>>
>>>> As suggested by Freescale, round up the input data length to the
>>>> nearest 16 bytes.
>>>>
>>>> Fixes: 5e75ae1b3cef ("crypto: talitos - add new crypto modes")
>>>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>>>
>>> Doesn't this cause the hardware to write outside the given buffer?
>>
>>
>> Only the input length is modified. Not the output length.
>>
>> The ERRATA says:
>>
>> The input data length (in the descriptor) can be rounded up to the nearest 16B. Set the
>> data-in length (in the descriptor) to include X bytes of data beyond the payload. Set the
>> data-out length to only output the relevant payload (don't need to output the padding).
>> SEC reads from memory are not destructive, so the extra bytes included in the AES-CTR
>> operation can be whatever bytes are contiguously trailing the payload.
> 
> So what happens if the input is not 16 byte aligned, and rounding it
> up causes it to extend across a page boundary into a page that is not
> mapped by the IOMMU/SMMU?
> 

What is the IOMMU/SMMU?

The mpc8xx, mpc82xx and mpc83xx that embed the Talitos Security Engine don't have such a thing; the
security engine uses DMA and has direct access to the memory bus for reading and writing.

Christophe

* Re: [PATCH 02/13] module: add a module_loaded helper
From: Christophe Leroy @ 2021-01-21  9:59 UTC
  To: linuxppc-dev
In-Reply-To: <20210121074959.313333-3-hch@lst.de>



On 21/01/2021 at 08:49, Christoph Hellwig wrote:
> Add a helper that takes module_mutex and uses find_module to check if a
> given module is loaded.  This provides a better abstraction for the two
> callers, and allows unexporting module_mutex and find_module.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/gpu/drm/drm_fb_helper.c |  7 +------
>   include/linux/module.h          |  3 +++
>   kernel/module.c                 | 14 ++++++++++++--
>   kernel/trace/trace_kprobe.c     |  4 +---
>   4 files changed, 17 insertions(+), 11 deletions(-)
> 

> diff --git a/include/linux/module.h b/include/linux/module.h
> index 7a0bcb5b1ffccd..b4654f8a408134 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -589,6 +589,9 @@ static inline bool within_module(unsigned long addr, const struct module *mod)
>   /* Search for module by name: must hold module_mutex. */
>   struct module *find_module(const char *name);
>   
> +/* Check if a module is loaded. */
> +bool module_loaded(const char *name);

Maybe module_is_loaded() would be a better name.

* Re: [PATCH 02/13] module: add a module_loaded helper
From: Christophe Leroy @ 2021-01-21 10:00 UTC
  To: Christoph Hellwig, Frederic Barrat, Andrew Donnellan,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Daniel Vetter, Jessica Yu, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek, Joe Lawrence
  Cc: Michal Marek, linux-kbuild, Masahiro Yamada, linux-kernel,
	dri-devel, live-patching, linuxppc-dev
In-Reply-To: <20210121074959.313333-3-hch@lst.de>




On 21/01/2021 at 08:49, Christoph Hellwig wrote:
 > Add a helper that takes module_mutex and uses find_module to check if a
 > given module is loaded.  This provides a better abstraction for the two
 > callers, and allows unexporting module_mutex and find_module.
 >
 > Signed-off-by: Christoph Hellwig <hch@lst.de>
 > ---
 >   drivers/gpu/drm/drm_fb_helper.c |  7 +------
 >   include/linux/module.h          |  3 +++
 >   kernel/module.c                 | 14 ++++++++++++--
 >   kernel/trace/trace_kprobe.c     |  4 +---
 >   4 files changed, 17 insertions(+), 11 deletions(-)
 >

 > diff --git a/include/linux/module.h b/include/linux/module.h
 > index 7a0bcb5b1ffccd..b4654f8a408134 100644
 > --- a/include/linux/module.h
 > +++ b/include/linux/module.h
 > @@ -589,6 +589,9 @@ static inline bool within_module(unsigned long addr, const struct module *mod)
 >   /* Search for module by name: must hold module_mutex. */
 >   struct module *find_module(const char *name);
 >   +/* Check if a module is loaded. */
 > +bool module_loaded(const char *name);

Maybe module_is_loaded() would be a better name.

* Re: [PATCH 1/2] crypto: talitos - Work around SEC6 ERRATA (AES-CTR mode data size error)
From: Ard Biesheuvel @ 2021-01-21 10:02 UTC
  To: Christophe Leroy
  Cc: Linux Crypto Mailing List, Linux Kernel Mailing List,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), Herbert Xu,
	David S. Miller
In-Reply-To: <ecdd07b3-afca-7e26-b6b6-3a3a985bc5a1@csgroup.eu>

On Thu, 21 Jan 2021 at 10:54, Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
>
>
>
> On 21/01/2021 at 08:31, Ard Biesheuvel wrote:
> > On Thu, 21 Jan 2021 at 06:35, Christophe Leroy
> > <christophe.leroy@csgroup.eu> wrote:
> >>
> >>
> >>
> >> On 20/01/2021 at 23:23, Ard Biesheuvel wrote:
> >>> On Wed, 20 Jan 2021 at 19:59, Christophe Leroy
> >>> <christophe.leroy@csgroup.eu> wrote:
> >>>>
> >>>> Talitos Security Engine AESU considers any input
> >>>> data size that is not a multiple of 16 bytes to be an error.
> >>>> This is not a problem in general, except for Counter mode
> >>>> that is a stream cipher and can have an input of any size.
> >>>>
> >>>> Test Manager for ctr(aes) fails on 4th test vector which has
> >>>> a length of 499 while all previous vectors which have a 16 bytes
> >>>> multiple length succeed.
> >>>>
> >>>> As suggested by Freescale, round up the input data length to the
> >>>> nearest 16 bytes.
> >>>>
> >>>> Fixes: 5e75ae1b3cef ("crypto: talitos - add new crypto modes")
> >>>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> >>>
> >>> Doesn't this cause the hardware to write outside the given buffer?
> >>
> >>
> >> Only the input length is modified. Not the output length.
> >>
> >> The ERRATA says:
> >>
> >> The input data length (in the descriptor) can be rounded up to the nearest 16B. Set the
> >> data-in length (in the descriptor) to include X bytes of data beyond the payload. Set the
> >> data-out length to only output the relevant payload (don't need to output the padding).
> >> SEC reads from memory are not destructive, so the extra bytes included in the AES-CTR
> >> operation can be whatever bytes are contiguously trailing the payload.
> >
> > So what happens if the input is not 16 byte aligned, and rounding it
> > up causes it to extend across a page boundary into a page that is not
> > mapped by the IOMMU/SMMU?
> >
>
> What is the IOMMU/SMMU ?
>
> The mpc8xx, mpc82xx and mpc83xx which embed the Talitos Security Engine don't have such thing, the
> security engine uses DMA and has direct access to the memory bus for reading and writing.
>

OK, good. So the only case where this could break is when the DMA
access spills over into a page that does not exist, and I suppose this
could only happen if the transfer involves a buffer located at the
very top of DRAM, right?

* Re: [PATCH 1/2] crypto: talitos - Work around SEC6 ERRATA (AES-CTR mode data size error)
From: Christophe Leroy @ 2021-01-21 10:12 UTC
  To: Ard Biesheuvel
  Cc: Linux Crypto Mailing List, Linux Kernel Mailing List,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), Herbert Xu,
	David S. Miller
In-Reply-To: <CAMj1kXFNZba9T45RaB_W58Z+4sdAyUDVJM_ZZPk+Y6Mf9DZUQw@mail.gmail.com>



On 21/01/2021 at 11:02, Ard Biesheuvel wrote:
> On Thu, 21 Jan 2021 at 10:54, Christophe Leroy
> <christophe.leroy@csgroup.eu> wrote:
>>
>>
>>
>> On 21/01/2021 at 08:31, Ard Biesheuvel wrote:
>>> On Thu, 21 Jan 2021 at 06:35, Christophe Leroy
>>> <christophe.leroy@csgroup.eu> wrote:
>>>>
>>>>
>>>>
>>>> On 20/01/2021 at 23:23, Ard Biesheuvel wrote:
>>>>> On Wed, 20 Jan 2021 at 19:59, Christophe Leroy
>>>>> <christophe.leroy@csgroup.eu> wrote:
>>>>>>
>>>>>> Talitos Security Engine AESU considers any input
>>>>>> data size that is not a multiple of 16 bytes to be an error.
>>>>>> This is not a problem in general, except for Counter mode
>>>>>> that is a stream cipher and can have an input of any size.
>>>>>>
>>>>>> Test Manager for ctr(aes) fails on 4th test vector which has
>>>>>> a length of 499 while all previous vectors which have a 16 bytes
>>>>>> multiple length succeed.
>>>>>>
>>>>>> As suggested by Freescale, round up the input data length to the
>>>>>> nearest 16 bytes.
>>>>>>
>>>>>> Fixes: 5e75ae1b3cef ("crypto: talitos - add new crypto modes")
>>>>>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>>>>>
>>>>> Doesn't this cause the hardware to write outside the given buffer?
>>>>
>>>>
>>>> Only the input length is modified. Not the output length.
>>>>
>>>> The ERRATA says:
>>>>
>>>> The input data length (in the descriptor) can be rounded up to the nearest 16B. Set the
>>>> data-in length (in the descriptor) to include X bytes of data beyond the payload. Set the
>>>> data-out length to only output the relevant payload (don't need to output the padding).
>>>> SEC reads from memory are not destructive, so the extra bytes included in the AES-CTR
>>>> operation can be whatever bytes are contiguously trailing the payload.
>>>
>>> So what happens if the input is not 16 byte aligned, and rounding it
>>> up causes it to extend across a page boundary into a page that is not
>>> mapped by the IOMMU/SMMU?
>>>
>>
>> What is the IOMMU/SMMU ?
>>
>> The mpc8xx, mpc82xx and mpc83xx which embed the Talitos Security Engine don't have such thing, the
>> security engine uses DMA and has direct access to the memory bus for reading and writing.
>>
> 
> OK, good. So the only case where this could break is when the DMA
> access spills over into a page that does not exist, and I suppose this
> could only happen if the transfer involves a buffer located at the
> very top of DRAM, right?
> 

Right.

Christophe
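
The workaround and the one remaining corner case agreed on in this exchange
can be sketched in plain C. This is a userspace illustration, not the talitos
driver code: the function names and the model of DRAM as a single physical
range [0, dram_size) are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AES_BLOCK_SIZE 16u

/*
 * Round the data-in length up to the next multiple of 16 bytes, as the
 * erratum text suggests. The data-out length stays at the payload size,
 * so nothing is written beyond the caller's buffer.
 */
static size_t ctr_datain_len(size_t payload_len)
{
	return (payload_len + AES_BLOCK_SIZE - 1) & ~(size_t)(AES_BLOCK_SIZE - 1);
}

/*
 * Hypothetical check for the corner case discussed above: without an
 * IOMMU, the padded read only misbehaves if it runs past the end of DRAM.
 */
static bool padded_read_fits(uint64_t buf_phys, size_t payload_len,
			     uint64_t dram_size)
{
	size_t padded = ctr_datain_len(payload_len);

	/* The payload itself is assumed to lie entirely inside DRAM. */
	assert(buf_phys <= dram_size && payload_len <= dram_size - buf_phys);
	return padded <= dram_size - buf_phys;
}
```

With the 499-byte test vector mentioned in the commit message,
ctr_datain_len() yields 512, so the engine reads 13 bytes past the payload.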

* RE: [PATCH 02/13] module: add a module_loaded helper
From: David Laight @ 2021-01-21 10:13 UTC
  To: 'Christophe Leroy', linuxppc-dev@lists.ozlabs.org
In-Reply-To: <9052b54a-e05a-1534-9e0f-c73c8b3509bd@csgroup.eu>

From: Christophe Leroy
> Sent: 21 January 2021 10:00
> 
> On 21/01/2021 at 08:49, Christoph Hellwig wrote:
> > Add a helper that takes module_mutex and uses find_module to check if a
> > given module is loaded.  This provides a better abstraction for the two
> > callers, and allows unexporting module_mutex and find_module.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >   drivers/gpu/drm/drm_fb_helper.c |  7 +------
> >   include/linux/module.h          |  3 +++
> >   kernel/module.c                 | 14 ++++++++++++--
> >   kernel/trace/trace_kprobe.c     |  4 +---
> >   4 files changed, 17 insertions(+), 11 deletions(-)
> >
> 
> > diff --git a/include/linux/module.h b/include/linux/module.h
> > index 7a0bcb5b1ffccd..b4654f8a408134 100644
> > --- a/include/linux/module.h
> > +++ b/include/linux/module.h
> > @@ -589,6 +589,9 @@ static inline bool within_module(unsigned long addr, const struct module *mod)
> >   /* Search for module by name: must hold module_mutex. */
> >   struct module *find_module(const char *name);
> >
> > +/* Check if a module is loaded. */
> > +bool module_loaded(const char *name);
> 
> Maybe module_is_loaded() would be a better name.

I can't see the original patch.

What is the point of the function?
By the time it returns, the information is stale, so it is mostly useless.

Surely you need to use try_module_get() instead?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
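
The concern raised here is the usual check-then-act problem. A toy userspace
model of the difference between a snapshot query and taking a reference might
look like the following. Everything in it (struct toy_module and the toy_*
functions) is hypothetical and single-threaded; it only mimics the general
idea behind module_loaded() and try_module_get(), not the kernel's
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a loadable module; not the kernel's struct module. */
struct toy_module {
	bool live;	/* still loaded? */
	int refcnt;	/* references held by users */
};

/*
 * Snapshot query in the spirit of module_loaded(): nothing pins the
 * module, so the answer can be outdated as soon as it is returned.
 */
static bool toy_module_loaded(const struct toy_module *mod)
{
	return mod->live;
}

/*
 * In the spirit of try_module_get(): succeeds only while the module is
 * live, and the reference taken blocks unloading until it is dropped.
 */
static bool toy_try_get(struct toy_module *mod)
{
	if (!mod->live)
		return false;
	mod->refcnt++;
	return true;
}

static void toy_put(struct toy_module *mod)
{
	assert(mod->refcnt > 0);
	mod->refcnt--;
}

/* Unloading is refused while references are outstanding. */
static bool toy_unload(struct toy_module *mod)
{
	if (mod->refcnt > 0)
		return false;
	mod->live = false;
	return true;
}
```

Whether a stale answer matters depends on the caller. The drm caller shown
earlier in this archive only uses the result to decide whether to issue
request_module_nowait("fbcon"), which is a best-effort hint either way.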

* Re: [PATCH 02/13] module: add a module_loaded helper
From: Christophe Leroy @ 2021-01-21 10:17 UTC
  To: David Laight, linuxppc-dev@lists.ozlabs.org, Christoph Hellwig
In-Reply-To: <39a4c883684c418ba324c3db702802b6@AcuMS.aculab.com>

On 21/01/2021 at 11:13, David Laight wrote:
> From: Christophe Leroy

Sorry I hit "Reply to the list" instead of "Reply all"

Cc Christoph H. who is the author.

>> Sent: 21 January 2021 10:00
>>
>> On 21/01/2021 at 08:49, Christoph Hellwig wrote:
>>> Add a helper that takes module_mutex and uses find_module to check if a
>>> given module is loaded.  This provides a better abstraction for the two
>>> callers, and allows unexporting module_mutex and find_module.
>>>
>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>> ---
>>>    drivers/gpu/drm/drm_fb_helper.c |  7 +------
>>>    include/linux/module.h          |  3 +++
>>>    kernel/module.c                 | 14 ++++++++++++--
>>>    kernel/trace/trace_kprobe.c     |  4 +---
>>>    4 files changed, 17 insertions(+), 11 deletions(-)
>>>
>>
>>> diff --git a/include/linux/module.h b/include/linux/module.h
>>> index 7a0bcb5b1ffccd..b4654f8a408134 100644
>>> --- a/include/linux/module.h
>>> +++ b/include/linux/module.h
>>> @@ -589,6 +589,9 @@ static inline bool within_module(unsigned long addr, const struct module *mod)
>>>    /* Search for module by name: must hold module_mutex. */
>>>    struct module *find_module(const char *name);
>>>
>>> +/* Check if a module is loaded. */
>>> +bool module_loaded(const char *name);
>>
>> Maybe module_is_loaded() would be a better name.
> 
> I can't see the original patch.

You can find it here:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210121074959.313333-3-hch@lst.de/

> 
> What is the point of the function.
> By the time it returns the information is stale - so mostly useless.
> 
> Surely you need to use try_module_get() instead?
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 

* Re: [PATCH] PCI: dwc: layerscape: convert to builtin_platform_driver()
From: Geert Uytterhoeven @ 2021-01-21 11:01 UTC
  To: Saravana Kannan
  Cc: Roy Zang, Lorenzo Pieralisi, PCI, LKML, Minghuan Lian,
	Michael Walle, linux-arm-kernel, Greg Kroah-Hartman,
	Bjorn Helgaas, linuxppc-dev, Mingkai Hu
In-Reply-To: <CAGETcx8_6Hp+MWFOhRohXwdWFSfCc7A=zpb5QYNHZE5zv0bDig@mail.gmail.com>

Hi Saravana,

On Thu, Jan 21, 2021 at 1:05 AM Saravana Kannan <saravanak@google.com> wrote:
> On Wed, Jan 20, 2021 at 3:53 PM Michael Walle <michael@walle.cc> wrote:
> > On 2021-01-20 20:47, Saravana Kannan wrote:
> > > On Wed, Jan 20, 2021 at 11:28 AM Michael Walle <michael@walle.cc>
> > > wrote:
> > >>
> > >> [RESEND, fat-fingered the buttons of my mail client and converted
> > >> all CCs to BCCs :(]
> > >>
> > >> On 2021-01-20 20:02, Saravana Kannan wrote:
> > >> > On Wed, Jan 20, 2021 at 6:24 AM Rob Herring <robh@kernel.org> wrote:
> > >> >>
> > >> >> On Wed, Jan 20, 2021 at 4:53 AM Michael Walle <michael@walle.cc>
> > >> >> wrote:
> > >> >> >
> > >> >> > fw_devlink will defer the probe until all suppliers are ready. We can't
> > >> >> > use builtin_platform_driver_probe() because it doesn't retry after probe
> > >> >> > deferral. Convert it to builtin_platform_driver().
> > >> >>
> > >> >> If builtin_platform_driver_probe() doesn't work with fw_devlink, then
> > >> >> shouldn't it be fixed or removed?
> > >> >
> > >> > I was actually thinking about this too. The problem with fixing
> > >> > builtin_platform_driver_probe() to behave like
> > >> > builtin_platform_driver() is that these probe functions could be
> > >> > marked with __init. But there are also only 20 instances of
> > >> > builtin_platform_driver_probe() in the kernel:
> > >> > $ git grep ^builtin_platform_driver_probe | wc -l
> > >> > 20
> > >> >
> > >> > So it might be easier to just fix them to not use
> > >> > builtin_platform_driver_probe().
> > >> >
> > >> > Michael,
> > >> >
> > >> > Any chance you'd be willing to help me by converting all these to
> > >> > builtin_platform_driver() and delete builtin_platform_driver_probe()?
> > >>
> > >> If it's just moving the probe function to the _driver struct and
> > >> removing the __init annotations, I could look into that.
> > >
> > > Yup. That's pretty much it AFAICT.
> > >
> > > builtin_platform_driver_probe() also makes sure the driver doesn't ask
> > > for async probe, etc. But I doubt anyone is actually setting async
> > > flags and still using builtin_platform_driver_probe().
> >
> > Doesn't module_platform_driver_probe() have the same problem? And there
> > are ~80 drivers which use that.
>
> Yeah. The biggest problem with all of these is the __init markers.
> Maybe someone familiar with coccinelle can help?

And dropping them will increase memory usage.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
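
The difference at the heart of this thread, a probe routine that is tried
exactly once versus one that can be retried after a deferral, can be modeled
in a few lines of userspace C. This is only a toy sketch: struct toy_dev and
the register_* functions are hypothetical stand-ins, and the real driver core
behaves in far more detail than this. The one value taken from the kernel is
EPROBE_DEFER (517).

```c
#include <assert.h>
#include <stdbool.h>

#define EPROBE_DEFER 517	/* same value as the kernel's error code */

/* Toy device whose supplier only becomes ready at some later point. */
struct toy_dev {
	bool supplier_ready;
	bool bound;
};

static int toy_probe(struct toy_dev *dev)
{
	assert(dev);
	if (!dev->supplier_ready)
		return -EPROBE_DEFER;
	dev->bound = true;
	return 0;
}

/*
 * One-shot registration, in the spirit of builtin_platform_driver_probe():
 * the probe routine is tried once and never again, which is what allows
 * it to live in __init memory that is freed after boot.
 */
static int register_one_shot(struct toy_dev *dev)
{
	return toy_probe(dev);
}

/*
 * Registration with retry, in the spirit of builtin_platform_driver():
 * a deferred probe is attempted again on later passes. Returns the pass
 * on which the device was bound, or -EPROBE_DEFER if it never was.
 */
static int register_with_retry(struct toy_dev *dev, int ready_after_pass)
{
	for (int pass = 0; pass < 8; pass++) {
		if (pass == ready_after_pass)
			dev->supplier_ready = true;
		if (toy_probe(dev) != -EPROBE_DEFER)
			return pass;
	}
	return -EPROBE_DEFER;
}
```

The retrying variant needs the probe routine to stay resident, which is the
memory cost Geert points out above.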

* Re: [PATCH] powerpc/64: prevent replayed interrupt handlers from running softirqs
From: Michael Ellerman @ 2021-01-21 12:50 UTC
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20210120075005.1678565-1-npiggin@gmail.com>

Nicholas Piggin <npiggin@gmail.com> writes:
> Running softirqs enables interrupts, which can then end up recursing
> into the irq soft-mask code we're adjusting, including replaying
> interrupts itself, which might be theoretically unbounded.
>
> This abridged trace shows how this can occur:
>
> ! NIP replay_soft_interrupts
>   LR  interrupt_exit_kernel_prepare
>   Call Trace:
>     interrupt_exit_kernel_prepare (unreliable)
>     interrupt_return
>   --- interrupt: ea0 at __rb_reserve_next
>   NIP __rb_reserve_next
>   LR __rb_reserve_next
>   Call Trace:
>     ring_buffer_lock_reserve
>     trace_function
>     function_trace_call
>     ftrace_call
>     __do_softirq
>     irq_exit
>     timer_interrupt
> !   replay_soft_interrupts
>     interrupt_exit_kernel_prepare
>     interrupt_return
>   --- interrupt: ea0 at arch_local_irq_restore
>
> Fix this by disabling bhs (softirqs) around the interrupt replay.
>
> I don't know that commit 3282a3da25bd ("powerpc/64: Implement soft
> interrupt replay in C") actually introduced the problem. Prior to that
> change, the interrupt replay seems like it should still be subject to
> this recursion, however it's done after all the irq state has been fixed
> up at the end of the replay, so it seems reasonable to fix back to this
> commit.

This seems very unhappy for me (on P9 bare metal):

[    0.038571] Mountpoint-cache hash table entries: 131072 (order: 4, 1048576 bytes, linear)
[    0.040194] ------------[ cut here ]------------
[    0.040228] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:176 __local_bh_enable_ip+0x150/0x210
[    0.040263] Modules linked in:
[    0.040280] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-rc2-00008-g4899f32e4f2a #1
[    0.040321] NIP:  c000000000114bc0 LR: c0000000000172a0 CTR: c00000000002a020
[    0.040360] REGS: c00000000177f670 TRAP: 0700   Not tainted  (5.11.0-rc2-00008-g4899f32e4f2a)
[    0.040410] MSR:  9000000002021033 <SF,HV,VEC,ME,IR,DR,RI,LE>  CR: 28000224  XER: 20040000
[    0.040472] CFAR: c000000000114ae8 IRQMASK: 3
               GPR00: c0000000000172a0 c00000000177f910 c000000001783900 c000000000017290
               GPR04: 0000000000000200 4000000000000000 0000000000000002 00000001312d0000
               GPR08: 0000000000000000 c0000000016f3480 0000000000000202 0000000000000000
               GPR12: c00000000002a020 c0000000023a0000 0000000000000000 0000000000000000
               GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
               GPR20: 0000000000000001 00000000100051c6 0000000000000000 0000000000000009
               GPR24: 0000000000000e60 0000000000000900 0000000000000500 0000000000000a00
               GPR28: 0000000000000f00 0000000000000002 0000000000000003 0000000000000200
[    0.040824] NIP [c000000000114bc0] __local_bh_enable_ip+0x150/0x210
[    0.040863] LR [c0000000000172a0] replay_soft_interrupts+0x2e0/0x340
[    0.040904] Call Trace:
[    0.040926] [c00000000177f910] [0000000000000500] 0x500 (unreliable)
[    0.040962] [c00000000177f950] [c0000000000172a0] replay_soft_interrupts+0x2e0/0x340
[    0.041008] [c00000000177fb50] [c000000000017370] arch_local_irq_restore+0x70/0xe0
[    0.041042] [c00000000177fb80] [c000000000476514] kmem_cache_alloc+0x474/0x520
[    0.041066] [c00000000177fc00] [c0000000004e394c] __d_alloc+0x4c/0x2e0
[    0.041109] [c00000000177fc50] [c0000000004e40ac] d_make_root+0x3c/0xa0
[    0.041142] [c00000000177fc80] [c000000000679ce0] ramfs_fill_super+0x80/0xb0
[    0.041186] [c00000000177fcb0] [c0000000004c1b04] get_tree_nodev+0xb4/0x130
[    0.041230] [c00000000177fcf0] [c000000000679578] ramfs_get_tree+0x28/0x40
[    0.041282] [c00000000177fd10] [c0000000004bee78] vfs_get_tree+0x48/0x120
[    0.041325] [c00000000177fd80] [c0000000004f7fe0] vfs_kern_mount.part.0+0xd0/0x130
[    0.041368] [c00000000177fdc0] [c000000001366700] mnt_init+0x1c8/0x2fc
[    0.041420] [c00000000177fe60] [c000000001366178] vfs_caches_init+0x110/0x138
[    0.041454] [c00000000177fee0] [c000000001331020] start_kernel+0x6d8/0x780
[    0.041497] [c00000000177ff90] [c00000000000d354] start_here_common+0x1c/0x5c8
[    0.041539] Instruction dump:
[    0.041555] e9490002 394a0001 91490000 e90d0028 3d42ffcc 394a4730 7d0a42aa e9490002
[    0.041608] 2c280000 394affff 91490000 4082ff30 <0fe00000> 892d0988 39400001 994d0988
[    0.041660] irq event stamp: 555
[    0.041674] hardirqs last  enabled at (553): [<c00000000047654c>] kmem_cache_alloc+0x4ac/0x520
[    0.041707] hardirqs last disabled at (554): [<c000000000017368>] arch_local_irq_restore+0x68/0xe0
[    0.041750] softirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.041778] softirqs last disabled at (555): [<c000000000016fd0>] replay_soft_interrupts+0x10/0x340
[    0.041824] ---[ end trace aa6f9769e07e43db ]---


And lots and lots of these, or similar:


[   14.369838] =============================
[   14.369839] WARNING: suspicious RCU usage
[   14.369841] 5.11.0-rc2-00008-g4899f32e4f2a #1 Tainted: G        W
[   14.369843] -----------------------------
[   14.369844] include/linux/rcupdate.h:692 rcu_read_unlock() used illegally while idle!
[   14.369846]
               other info that might help us debug this:

[   14.369848]
               rcu_scheduler_active = 2, debug_locks = 1
[   14.369850] RCU used illegally from extended quiescent state!
[   14.369851] 2 locks held by swapper/32/0:
[   14.369853]  #0: c0000000015e6fc0 (rcu_callback){....}-{0:0}, at: rcu_core+0x2e0/0x990
[   14.369864]  #1: c0000000015e6f30 (rcu_read_lock){....}-{1:3}, at: kmem_cache_free+0x7cc/0x7e0
[   14.369874]
               stack backtrace:
[   14.369876] CPU: 32 PID: 0 Comm: swapper/32 Tainted: G        W         5.11.0-rc2-00008-g4899f32e4f2a #1
[   14.369879] Call Trace:
[   14.369880] [c000001fff557c10] [c0000000008630b8] dump_stack+0xec/0x144 (unreliable)
[   14.369886] [c000001fff557c60] [c0000000001ad2d0] lockdep_rcu_suspicious+0x124/0x144
[   14.369890] [c000001fff557cf0] [c00000000047783c] kmem_cache_free+0x2ac/0x7e0
[   14.369894] [c000001fff557db0] [c0000000004bdeac] file_free_rcu+0x5c/0xa0
[   14.369898] [c000001fff557de0] [c0000000001e214c] rcu_core+0x33c/0x990
[   14.369902] [c000001fff557e90] [c000000000f496d0] __do_softirq+0x180/0x688
[   14.369906] [c000001fff557f90] [c0000000000307bc] call_do_softirq+0x14/0x24
[   14.369911] [c000000002e1fab0] [c000000000017418] do_softirq_own_stack+0x38/0x50
[   14.369916] [c000000002e1fad0] [c000000000114a60] do_softirq+0x120/0x130
[   14.369920] [c000000002e1fb00] [c000000000114c64] __local_bh_enable_ip+0x1f4/0x210
[   14.369924] [c000000002e1fb40] [c0000000000172a0] replay_soft_interrupts+0x2e0/0x340
[   14.369928] [c000000002e1fd40] [c000000000017370] arch_local_irq_restore+0x70/0xe0
[   14.369933] [c000000002e1fd70] [c000000000c87184] snooze_loop+0x64/0x2e4
[   14.369937] [c000000002e1fdb0] [c000000000c84204] cpuidle_enter_state+0x2e4/0x550
[   14.369941] [c000000002e1fe10] [c000000000c8450c] cpuidle_enter+0x4c/0x70
[   14.369946] [c000000002e1fe50] [c00000000016892c] call_cpuidle+0x4c/0x90
[   14.369949] [c000000002e1fe70] [c000000000168d74] do_idle+0x2f4/0x380
[   14.369953] [c000000002e1ff10] [c000000000169208] cpu_startup_entry+0x38/0x40
[   14.369957] [c000000002e1ff40] [c000000000053484] start_secondary+0x2a4/0x2b0
[   14.369961] [c000000002e1ff90] [c00000000000d254] start_secondary_prolog+0x10/0x14


cheers

^ permalink raw reply

* Re: [PATCH 5/6] powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE
From: Nathan Lynch @ 2021-01-21 15:17 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev; +Cc: tyreld, brking, ajd, aneesh.kumar
In-Reply-To: <7988dce5-6cf3-df79-1276-7bc91ce7c8b2@ozlabs.ru>

Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> On 20/01/2021 12:17, Nathan Lynch wrote:
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>>> On 16/01/2021 02:56, Nathan Lynch wrote:
>>>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>>>>> On 15/01/2021 09:00, Nathan Lynch wrote:
>>>>>> +#define RTAS_WORK_AREA_SIZE   4096
>>>>>> +
>>>>>> +/* Work areas allocated for user space access. */
>>>>>> +#define RTAS_USER_REGION_SIZE (RTAS_WORK_AREA_SIZE * 16)
>>>>>
>>>>> This is still 64K but with no clarity as to why. There are 16 of
>>>>> something, what is it?
>>>>
>>>> There are 16 4KB work areas in the region. I can name it
>>>> RTAS_NR_USER_WORK_AREAS or similar.
>>>
>>>
>>> Why 16? PAPR (then add "per PAPR") or we just like 16 ("should be
>>> enough")?
>> 
>> PAPR doesn't know anything about the user region; it's a Linux
>> construct. It's been 64KB since pre-git days and I'm not sure what the
>> original reason is. At this point, maintaining a kernel-user ABI seems
>> like enough justification for the value.
>
> I am not arguing against keeping the numbers, but you are replacing one
> magic number with another, and for neither is it obvious where it
> came from.

When I wrote it I viewed it as changing one of the factors in (64 *
1024) to a named constant that better expresses how the region is used
and adjusting the remaining factor to arrive at the same end result. I
considered it a net improvement even if we're not sure how 64K was
arrived at in the first place, although I suspect it was chosen to
support multiple concurrent users, and to be compatible with both 4K
and 64K page sizes. Then again 64K pages came a bit after this was
introduced.
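
A minimal sketch of that factoring, for illustration only. It assumes
the RTAS_NR_USER_WORK_AREAS name floated earlier in this thread (it is
not in the posted patch), and work_area_offset() and region_pages() are
hypothetical helpers, not kernel or librtas functions:

```c
#include <assert.h>
#include <stddef.h>

#define RTAS_WORK_AREA_SIZE     4096
/* Assumed name, suggested earlier in the thread; not in the posted patch. */
#define RTAS_NR_USER_WORK_AREAS 16
#define RTAS_USER_REGION_SIZE   (RTAS_WORK_AREA_SIZE * RTAS_NR_USER_WORK_AREAS)

/* The factored form must arrive at the same value as (64 * 1024). */
_Static_assert(RTAS_USER_REGION_SIZE == 64 * 1024,
	       "user region must stay 64KB");

/* Hypothetical helper: byte offset of the n-th 4KB work area. */
static size_t work_area_offset(unsigned int n)
{
	assert(n < RTAS_NR_USER_WORK_AREAS);
	return (size_t)n * RTAS_WORK_AREA_SIZE;
}

/* Hypothetical helper: whole pages spanned by the region. */
static size_t region_pages(size_t page_size)
{
	return RTAS_USER_REGION_SIZE / page_size;
}
```

With these definitions the region is 16 pages at a 4K page size and
exactly one page at 64K, which matches the compatibility guess above.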

The change that introduced RTAS_RMOBUF_MAX (here renamed to
RTAS_USER_REGION_SIZE) does not explain how the value was derived:

================
Author: Andrew Morton <akpm@osdl.org>
Date:   Sun Jan 18 18:17:30 2004 -0800

    [PATCH] ppc64: add rtas syscall, from John Rose

    From: Anton Blanchard <anton@samba.org>

    Added RTAS syscall.  Reserved lowmem rtas_rmo_buf for userspace use.  Created
    "rmo_buffer" proc file to export bounds of rtas_rmo_buf.

[...]

diff --git a/include/asm-ppc64/rtas.h b/include/asm-ppc64/rtas.h
index 42a0b484077c..d9e426161044 100644
--- a/include/asm-ppc64/rtas.h
+++ b/include/asm-ppc64/rtas.h
@@ -19,6 +19,9 @@
 #define RTAS_UNKNOWN_SERVICE (-1)
 #define RTAS_INSTANTIATE_MAX (1UL<<30) /* Don't instantiate rtas at/above this value */

+/* Buffer size for ppc_rtas system call. */
+#define RTAS_RMOBUF_MAX (64 * 1024)
+
================

The comment "Buffer size for ppc_rtas system call" (removed by my
change) is not really appropriate because 1. not all sys_rtas
invocations use the buffer, and 2. no callers use the entire buffer.

> Is 16 the max number of concurrently running sys_rtas system 
> calls? Does the userspace ensure there is no more than 16?

No and no; not all calls to sys_rtas need to use a work area. However,
librtas uses record locking to arbitrate access to the user region, and
the unit of allocation is 4KB. This is a reasonable choice: many RTAS
calls which take a work area require 4KB alignment. But some do not
(ibm,get-system-parameter), and librtas conceivably could be made to
perform finer-grained allocations.

It's not the kernel's concern how librtas partitions the user region, so
I'm inclined to leave the (64 * 1024) expression alone now. Thanks for
your review.

> btw where is that userspace code? I thought
> https://github.com/power-ras/ppc64-diag.git but no. Thanks,

librtas, of which ppc64-diag and powerpc-utils are users:

https://github.com/ibm-power-utilities/librtas

^ permalink raw reply related

* Re: [PATCH 6/6] powerpc/rtas: constrain user region allocation to RMA
From: Nathan Lynch @ 2021-01-21 15:27 UTC (permalink / raw)
  To: Michael Ellerman, Alexey Kardashevskiy, linuxppc-dev
  Cc: tyreld, brking, ajd, aneesh.kumar
In-Reply-To: <87ft2vrew6.fsf@mpe.ellerman.id.au>

Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>>> On 16/01/2021 02:38, Nathan Lynch wrote:
>>>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>>>>> On 15/01/2021 09:00, Nathan Lynch wrote:
>>>>>> Memory locations passed as arguments from the OS to RTAS usually need
>>>>>> to be addressable in 32-bit mode and must reside in the Real Mode
>>>>>> Area. On PAPR guests, the RMA starts at logical address 0 and is the
>>>>>> first logical memory block reported in the LPAR’s device tree.
>>>>>>
>>>>>> On powerpc targets with RTAS, Linux makes available to user space a
>>>>>> region of memory suitable for arguments to be passed to RTAS via
>>>>>> sys_rtas(). This region (rtas_rmo_buf) is allocated via the memblock
>>>>>> API during boot in order to ensure that it satisfies the requirements
>>>>>> described above.
>>>>>>
>>>>>> With radix MMU, the upper limit supplied to the memblock allocation
>>>>>> can exceed the bounds of the first logical memory block, since
>>>>>> ppc64_rma_size is ULONG_MAX and RTAS_INSTANTIATE_MAX is 1GB. (512MB is
>>>>>> a common size of the first memory block according to a small sample of
>>>>>> LPARs I have checked.) This leads to failures when user space invokes
>>>>>> an RTAS function that uses a work area, such as
>>>>>> ibm,configure-connector.
>>>>>>
>>>>>> Alter the determination of the upper limit for rtas_rmo_buf's
>>>>>> allocation to consult the device tree directly, ensuring placement
>>>>>> within the RMA regardless of the MMU in use.
>>>>>
>>>>> Can we tie this with RTAS (which also needs to be in RMA) and simply add
>>>>> extra 64K in prom_instantiate_rtas() and advertise this address
>>>>> (ALIGN_UP(rtas-base + rtas-size, PAGE_SIZE)) to the user space? We do
>>>>> not need this RMO area before that point.
>>>> 
>>>> Can you explain more about what advantage that would bring? I'm not
>>>> seeing it. It's a more significant change than what I've written
>>>> here.
>>>
>>>
>>> We already allocate space for RTAS and (like RMO) it needs to be in RMA, 
>>> and RMO is useless without RTAS. We can reuse RTAS allocation code for 
>>> RMO like this:
>>
>> When you say RMO I assume you are referring to rtas_rmo_buf? (I don't
>> think it is well-named.)
> ...
>
> RMO (Real mode offset) is the old term we used to use to refer to what
> is now called the RMA (Real mode area). There are still many references
> to RMO in Linux, but they almost certainly all refer to what we now call
> the RMA.

Yes... but I think in this discussion Alexey was using RMO to stand in
for rtas_rmo_buf, which was what I was trying to clarify.


>>> Maybe store it in the FDT as "linux,rmo-base" next to "linux,rtas-base",
>>> for clarity, as sharing symbols between prom and main kernel is a bit
>>> tricky.
>>>
>>> The benefit is that we do not do the same thing (== find 64K in RMA)
>>> in 2 different ways, and if the RMO allocated my way is broken we'll
>>> know it much sooner, as RTAS itself will break too.
>>
>> Implementation details aside... I'll grant that combining the
>> allocations into one in prom_init reduces some duplication in the sense
>> that both are subject to the same constraints (mostly - the RTAS data
>> area must not cross a 256MB boundary, while the user region may). But
>> they really are distinct concerns. The RTAS private data area is
>> specified in the platform architecture, the OS is obligated to allocate
>> it and pass it to instantiate-rtas, etc etc. However the user region
>> (rtas_rmo_buf) is purely a Linux construct which is there to support
>> sys_rtas.
>>
>> Now, there are multiple sites in the kernel proper that must allocate
>> memory suitable for passing to RTAS. Obviously there is value in
>> consolidating the logic for that purpose in one place, so I'll work on
>> adding that in v2. OK?
>
> I don't think we want to move any allocations into prom_init.c unless we
> have to.
>
> It's best thought of as a trampoline, that runs before the kernel
> proper, to transition from live OF to a flat DT environment. One thing
> that must be done as part of that is instantiating RTAS, because it's
> basically a runtime copy of the live OF. But any other allocs are for
> Linux to handle later, IMHO.

Agreed.
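
For readers following the quoted patch description, the limit problem
it describes can be sketched roughly as below. This is an illustrative
model, not the patch itself: it assumes the old upper limit is the
minimum of ppc64_rma_size and RTAS_INSTANTIATE_MAX, and old_limit(),
new_limit() and limit_within_rma() are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

#define RTAS_INSTANTIATE_MAX (1UL << 30)	/* 1GB, as in rtas.h */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Assumed pre-patch behaviour: limit derived from ppc64_rma_size. */
static unsigned long old_limit(unsigned long ppc64_rma_size)
{
	return min_ul(ppc64_rma_size, RTAS_INSTANTIATE_MAX);
}

/* Patched idea: limit derived from the first logical memory block. */
static unsigned long new_limit(unsigned long first_lmb_size)
{
	return min_ul(first_lmb_size, RTAS_INSTANTIATE_MAX);
}

/* True when every address below the limit lies inside the RMA. */
static int limit_within_rma(unsigned long limit, unsigned long first_lmb_size)
{
	assert(first_lmb_size > 0);
	return limit <= first_lmb_size;
}
```

With radix (ppc64_rma_size == ULONG_MAX) and a 512MB first block, the
old limit comes out at 1GB and can place the buffer outside the RMA;
the device-tree-derived limit cannot.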

^ permalink raw reply

* Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool
From: Rob Herring @ 2021-01-21 15:48 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Heikki Krogerus, Peter Zijlstra, Grant Likely, Paul Mackerras,
	Frank Rowand, Ingo Molnar, Marek Szyprowski, Stefano Stabellini,
	Saravana Kannan, Heinrich Schuchardt, Joerg Roedel,
	Wysocki, Rafael J, Christoph Hellwig, Bartosz Golaszewski,
	xen-devel, Thierry Reding, devicetree, Will Deacon,
	Konrad Rzeszutek Wilk, Dan Williams, Nicolas Boichat,
	Claire Chang, Boris Ostrovsky, Andy Shevchenko, Juergen Gross,
	Greg Kroah-Hartman, Randy Dunlap, linux-kernel@vger.kernel.org,
	Tomasz Figa, Linux IOMMU, linuxppc-dev, Thiago Jung Bauermann
In-Reply-To: <c0d631de-8840-4f6e-aebf-41bb8449f78c@arm.com>

On Wed, Jan 20, 2021 at 7:10 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2021-01-20 21:31, Rob Herring wrote:
> > On Wed, Jan 20, 2021 at 11:30 AM Robin Murphy <robin.murphy@arm.com> wrote:
> >>
> >> On 2021-01-20 16:53, Rob Herring wrote:
> >>> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> >>>> Introduce the new compatible string, restricted-dma-pool, for restricted
> >>>> DMA. One can specify the address and length of the restricted DMA memory
> >>>> region by restricted-dma-pool in the device tree.
> >>>
> >>> If this goes into DT, I think we should be able to use dma-ranges for
> >>> this purpose instead. Normally, 'dma-ranges' is for physical bus
> >>> restrictions, but there's no reason it can't be used for policy or to
> >>> express restrictions the firmware has enabled.
> >>
> >> There would still need to be some way to tell SWIOTLB to pick up the
> >> corresponding chunk of memory and to prevent the kernel from using it
> >> for anything else, though.
> >
> > Don't we already have that problem if dma-ranges had a very small
> > range? We just get lucky because the restriction is generally much
> > more RAM than needed.
>
> Not really - if a device has a naturally tiny addressing capability that
> doesn't even cover ZONE_DMA32 where the regular SWIOTLB buffer will be
> allocated then it's unlikely to work well, but that's just crap system
> design. Yes, memory pressure in ZONE_DMA{32} is particularly problematic
> for such limited devices, but it's irrelevant to the issue at hand here.

Yesterday's crap system design is today's security feature. Couldn't
this feature make crap system design work better?

> What we have here is a device that's not allowed to see *kernel* memory
> at all. It's been artificially constrained to a particular region by a
> TZASC or similar, and the only data which should ever be placed in that

May have been constrained, but that's entirely optional.

In the optional case where the setup is entirely up to the OS, I don't
think this belongs in the DT at all. Perhaps that should be solved
first.

> region is data intended for that device to see. That way if it tries to
> go rogue it physically can't start slurping data intended for other
> devices or not mapped for DMA at all. The bouncing is an important part
> of this - I forget the title off-hand but there was an interesting paper
> a few years ago which demonstrated that even with an IOMMU, streaming
> DMA of in-place buffers could reveal enough adjacent data from the same
> page to mount an attack on the system. Memory pressure should be
> immaterial since the size of each bounce pool carveout will presumably
> be tuned for the needs of the given device.
>
> > In any case, wouldn't finding all the dma-ranges do this? We're
> > already walking the tree to find the max DMA address now.
>
> If all you can see are two "dma-ranges" properties, how do you propose
> to tell that one means "this is the extent of what I can address, please
> set my masks and dma-range-map accordingly and try to allocate things
> where I can reach them" while the other means "take this output range
> away from the page allocator and hook it up as my dedicated bounce pool,
> because it is Serious Security Time"? Especially since getting that
> choice wrong either way would be a Bad Thing.

Either we have some heuristic based on the size or we add some hint.
The point is let's build on what we already have for defining DMA
accessible memory in DT rather than some parallel mechanism.

Rob

^ permalink raw reply

* Re: [PATCH] powerpc/mm: Limit allocation of SWIOTLB on server machines
From: Konrad Rzeszutek Wilk @ 2021-01-21 15:54 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: linuxppc-dev, Satheesh Rajendran, Ram Pai, linux-kernel
In-Reply-To: <87bldzlzu2.fsf@manicouagan.localdomain>

On Fri, Jan 08, 2021 at 09:27:01PM -0300, Thiago Jung Bauermann wrote:
> 
> Ram Pai <linuxram@us.ibm.com> writes:
> 
> > On Wed, Dec 23, 2020 at 09:06:01PM -0300, Thiago Jung Bauermann wrote:
> >> 
> >> Hi Ram,
> >> 
> >> Thanks for reviewing this patch.
> >> 
> >> Ram Pai <linuxram@us.ibm.com> writes:
> >> 
> >> > On Fri, Dec 18, 2020 at 03:21:03AM -0300, Thiago Jung Bauermann wrote:
> >> >> On server-class POWER machines, we don't need the SWIOTLB unless we're a
> >> >> secure VM. Nevertheless, if CONFIG_SWIOTLB is enabled we unconditionally
> >> >> allocate it.
> >> >> 
> >> >> In most cases this is harmless, but on a few machine configurations (e.g.,
> >> >> POWER9 powernv systems with 4 GB area reserved for crashdump kernel) it can
> >> >> happen that memblock can't find a 64 MB chunk of memory for the SWIOTLB and
> >> >> fails with a scary-looking WARN_ONCE:
> >> >> 
> >> >>  ------------[ cut here ]------------
> >> >>  memblock: bottom-up allocation failed, memory hotremove may be affected
> >> >>  WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x328/0x340
> >> >>  Modules linked in:
> >> >>  CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-rc2-orig+ #6
> >> >>  NIP:  c000000000442f38 LR: c000000000442f34 CTR: c0000000001e0080
> >> >>  REGS: c000000001def900 TRAP: 0700   Not tainted  (5.10.0-rc2-orig+)
> >> >>  MSR:  9000000002021033 <SF,HV,VEC,ME,IR,DR,RI,LE>  CR: 28022222  XER: 20040000
> >> >>  CFAR: c00000000014b7b4 IRQMASK: 1
> >> >>  GPR00: c000000000442f34 c000000001defba0 c000000001deff00 0000000000000047
> >> >>  GPR04: 00000000ffff7fff c000000001def828 c000000001def820 0000000000000000
> >> >>  GPR08: 0000001ffc3e0000 c000000001b75478 c000000001b75478 0000000000000001
> >> >>  GPR12: 0000000000002000 c000000002030000 0000000000000000 0000000000000000
> >> >>  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000002030000
> >> >>  GPR20: 0000000000000000 0000000000010000 0000000000010000 c000000001defc10
> >> >>  GPR24: c000000001defc08 c000000001c91868 c000000001defc18 c000000001c91890
> >> >>  GPR28: 0000000000000000 ffffffffffffffff 0000000004000000 00000000ffffffff
> >> >>  NIP [c000000000442f38] memblock_find_in_range_node+0x328/0x340
> >> >>  LR [c000000000442f34] memblock_find_in_range_node+0x324/0x340
> >> >>  Call Trace:
> >> >>  [c000000001defba0] [c000000000442f34] memblock_find_in_range_node+0x324/0x340 (unreliable)
> >> >>  [c000000001defc90] [c0000000015ac088] memblock_alloc_range_nid+0xec/0x1b0
> >> >>  [c000000001defd40] [c0000000015ac1f8] memblock_alloc_internal+0xac/0x110
> >> >>  [c000000001defda0] [c0000000015ac4d0] memblock_alloc_try_nid+0x94/0xcc
> >> >>  [c000000001defe30] [c00000000159c3c8] swiotlb_init+0x78/0x104
> >> >>  [c000000001defea0] [c00000000158378c] mem_init+0x4c/0x98
> >> >>  [c000000001defec0] [c00000000157457c] start_kernel+0x714/0xac8
> >> >>  [c000000001deff90] [c00000000000d244] start_here_common+0x1c/0x58
> >> >>  Instruction dump:
> >> >>  2c230000 4182ffd4 ea610088 ea810090 4bfffe84 39200001 3d42fff4 3c62ff60
> >> >>  3863c560 992a8bfc 4bd0881d 60000000 <0fe00000> ea610088 4bfffd94 60000000
> >> >>  random: get_random_bytes called from __warn+0x128/0x184 with crng_init=0
> >> >>  ---[ end trace 0000000000000000 ]---
> >> >>  software IO TLB: Cannot allocate buffer
> >> >> 
> >> >> Unless this is a secure VM the message can actually be ignored, because the
> >> >> SWIOTLB isn't needed. Therefore, let's avoid the SWIOTLB in those cases.
> >> >
> >> > The above warn_on is conveying a genuine warning. Should it be silenced?
> >> 
> >> Not sure I understand your point. This patch doesn't silence the
> >> warning, it avoids the problem it is warning about.
> >
> > Sorry, I should have explained it better. My point is...  
> >
> > 	If CONFIG_SWIOTLB is enabled, it means that the kernel is
> > 	promising the bounce buffering capability. I know, currently we
> > 	do not have any kernel subsystems that use bounce buffers on
> > 	non-secure-pseries-kernel or powernv-kernel.  But that does not
> > 	mean, there won't be any. In case there is such a third-party
> > 	module needing bounce buffering, it won't be able to operate,
> > 	because of the proposed change in your patch.
> >
> > 	Is that a good thing or a bad thing, I do not know. I will let
> > 	the experts opine.
> 
> Ping? Does anyone else have an opinion on this? The other option I can
> think of is changing the crashkernel code to not reserve so much memory
> below 4 GB. Other people are considering this option, but it's not
> planned for the near future.

That seems a more suitable solution regardless, but there is always
the danger of not being enough or being too big.

There were some autocrashkernel allocation patches going around
for x86 and ARM that perhaps could be re-used?

> 
> Also, there's a patch currently in linux-next which removes the scary
> warning because of unrelated reasons:
> 
> https://lore.kernel.org/lkml/20201217201214.3414100-2-guro@fb.com
> 
> So assuming that the patch above goes in and keeping the assumption that
> the swiotlb won't be needed in the powernv machines where I've seen the
> warning happen, we can just leave things as they are now.

If that solves the problem, then that is OK.


> 
> -- 
> Thiago Jung Bauermann
> IBM Linux Technology Center

^ permalink raw reply

* [PATCH] lib/sstep: Fix incorrect return from analyze_instr()
From: Ananth N Mavinakayanahalli @ 2021-01-21 16:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: naveen.n.rao, paulus, sandipan, ravi.bangoria

We currently just percolate the return value from analyse_instr()
to the caller of emulate_step(), especially if it is a -1.

For one particular case (opcode = 4) for instructions that
aren't currently emulated, we are returning 'should not be
single-stepped' while we should have returned 0 which says
'did not emulate, may have to single-step'.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/lib/sstep.c |   49 +++++++++++++++++++++++++---------------------
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 5a425a4a1d88..a3a0373843cd 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1445,34 +1445,39 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 
 #ifdef __powerpc64__
 	case 4:
-		if (!cpu_has_feature(CPU_FTR_ARCH_300))
-			return -1;
-
-		switch (word & 0x3f) {
-		case 48:	/* maddhd */
-			asm volatile(PPC_MADDHD(%0, %1, %2, %3) :
-				     "=r" (op->val) : "r" (regs->gpr[ra]),
-				     "r" (regs->gpr[rb]), "r" (regs->gpr[rc]));
-			goto compute_done;
+		/*
+		 * There are very many instructions with this primary opcode
+		 * introduced in the ISA as early as v2.03. However, the ones
+		 * we currently emulate were all introduced with ISA 3.0
+		 */
+		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+			switch (word & 0x3f) {
+			case 48:	/* maddhd */
+				asm volatile(PPC_MADDHD(%0, %1, %2, %3) :
+					     "=r" (op->val) : "r" (regs->gpr[ra]),
+					     "r" (regs->gpr[rb]), "r" (regs->gpr[rc]));
+				goto compute_done;
 
-		case 49:	/* maddhdu */
-			asm volatile(PPC_MADDHDU(%0, %1, %2, %3) :
-				     "=r" (op->val) : "r" (regs->gpr[ra]),
-				     "r" (regs->gpr[rb]), "r" (regs->gpr[rc]));
-			goto compute_done;
+			case 49:	/* maddhdu */
+				asm volatile(PPC_MADDHDU(%0, %1, %2, %3) :
+					     "=r" (op->val) : "r" (regs->gpr[ra]),
+					     "r" (regs->gpr[rb]), "r" (regs->gpr[rc]));
+				goto compute_done;
 
-		case 51:	/* maddld */
-			asm volatile(PPC_MADDLD(%0, %1, %2, %3) :
-				     "=r" (op->val) : "r" (regs->gpr[ra]),
-				     "r" (regs->gpr[rb]), "r" (regs->gpr[rc]));
-			goto compute_done;
+			case 51:	/* maddld */
+				asm volatile(PPC_MADDLD(%0, %1, %2, %3) :
+					     "=r" (op->val) : "r" (regs->gpr[ra]),
+					     "r" (regs->gpr[rb]), "r" (regs->gpr[rc]));
+				goto compute_done;
+			}
 		}
 
 		/*
-		 * There are other instructions from ISA 3.0 with the same
-		 * primary opcode which do not have emulation support yet.
+		 * Rest of the instructions with this primary opcode do not
+		 * have emulation support yet.
 		 */
-		return -1;
+		op->type = UNKNOWN;
+		return 0;
 #endif
 
 	case 7:		/* mulli */



^ permalink raw reply related

* Re: [PATCH 02/13] module: add a module_loaded helper
From: Christoph Hellwig @ 2021-01-21 17:11 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Petr Mladek, Jiri Kosina, Andrew Donnellan, linux-kbuild,
	David Airlie, Masahiro Yamada, Josh Poimboeuf, Maarten Lankhorst,
	linux-kernel, Maxime Ripard, live-patching, Michal Marek,
	Joe Lawrence, dri-devel, Thomas Zimmermann, Jessica Yu,
	Frederic Barrat, Daniel Vetter, Miroslav Benes, linuxppc-dev,
	Christoph Hellwig
In-Reply-To: <844a7fc3-2cba-46d2-fd4e-e5fe16b08573@csgroup.eu>

On Thu, Jan 21, 2021 at 11:00:20AM +0100, Christophe Leroy wrote:
> > +bool module_loaded(const char *name);
>
> Maybe module_is_loaded() would be a better name.

Fine with me.

^ permalink raw reply

* Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool
From: Robin Murphy @ 2021-01-21 17:29 UTC (permalink / raw)
  To: Rob Herring
  Cc: Heikki Krogerus, Peter Zijlstra, Grant Likely, Paul Mackerras,
	Frank Rowand, Ingo Molnar, Marek Szyprowski, Stefano Stabellini,
	Saravana Kannan, Heinrich Schuchardt, Joerg Roedel,
	Wysocki, Rafael J, Christoph Hellwig, Bartosz Golaszewski,
	xen-devel, Thierry Reding, devicetree, Will Deacon,
	Konrad Rzeszutek Wilk, Dan Williams, Nicolas Boichat,
	Claire Chang, Boris Ostrovsky, Andy Shevchenko, Juergen Gross,
	Greg Kroah-Hartman, Randy Dunlap, linux-kernel@vger.kernel.org,
	Tomasz Figa, Linux IOMMU, linuxppc-dev, Thiago Jung Bauermann
In-Reply-To: <CAL_JsqLv-FaiY_k+wS=iXG5AtccsXSBtvTfEGHvsN-VNqXdwpA@mail.gmail.com>

On 2021-01-21 15:48, Rob Herring wrote:
> On Wed, Jan 20, 2021 at 7:10 PM Robin Murphy <robin.murphy@arm.com>
> wrote:
>> 
>> On 2021-01-20 21:31, Rob Herring wrote:
>>> On Wed, Jan 20, 2021 at 11:30 AM Robin Murphy
>>> <robin.murphy@arm.com> wrote:
>>>> 
>>>> On 2021-01-20 16:53, Rob Herring wrote:
>>>>> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang
>>>>> wrote:
>>>>>> Introduce the new compatible string, restricted-dma-pool,
>>>>>> for restricted DMA. One can specify the address and length
>>>>>> of the restricted DMA memory region by restricted-dma-pool
>>>>>> in the device tree.
>>>>> 
>>>>> If this goes into DT, I think we should be able to use
>>>>> dma-ranges for this purpose instead. Normally, 'dma-ranges'
>>>>> is for physical bus restrictions, but there's no reason it
>>>>> can't be used for policy or to express restrictions the
>>>>> firmware has enabled.
>>>> 
>>>> There would still need to be some way to tell SWIOTLB to pick
>>>> up the corresponding chunk of memory and to prevent the kernel
>>>> from using it for anything else, though.
>>> 
>>> Don't we already have that problem if dma-ranges had a very
>>> small range? We just get lucky because the restriction is
>>> generally much more RAM than needed.
>> 
>> Not really - if a device has a naturally tiny addressing capability
>> that doesn't even cover ZONE_DMA32 where the regular SWIOTLB buffer
>> will be allocated then it's unlikely to work well, but that's just
>> crap system design. Yes, memory pressure in ZONE_DMA{32} is
>> particularly problematic for such limited devices, but it's
>> irrelevant to the issue at hand here.
> 
> Yesterday's crap system design is today's security feature. Couldn't 
> this feature make crap system design work better?

Indeed! Say you bring out your shiny new "Strawberry Flan 4" machine
with all the latest connectivity, but tragically its PCIe can only
address 25% of the RAM. So you decide to support deploying it in two
configurations: one where it runs normally for best performance, and
another "secure" one where it dedicates that quarter of RAM as a 
restricted DMA pool for any PCIe devices - that way, even if that hotel 
projector you plug in turns out to be a rogue Thunderbolt endpoint, it 
can never snarf your private keys off your eMMC out of the page cache.

(Yes, it is the thinnest of strawmen, but it sets the scene for the
point you raised...)

...which is that in both cases the dma-ranges will still be identical. 
So how is the kernel going to know whether to steal that whole area from 
memblock before anything else can allocate from it, or not?

I don't disagree that even in Claire's original intended case it would 
be semantically correct to describe the hardware-firewalled region with 
dma-ranges. It just turns out not to be necessary, and you're already 
arguing for not adding anything in DT that doesn't need to be.

>> What we have here is a device that's not allowed to see *kernel*
>> memory at all. It's been artificially constrained to a particular
>> region by a TZASC or similar, and the only data which should ever
>> be placed in that
> 
> May have been constrained, but that's entirely optional.
> 
> In the optional case where the setup is entirely up to the OS, I
> don't think this belongs in the DT at all. Perhaps that should be
> solved first.

Yes! Let's definitely consider that case! Say you don't have any 
security or physical limitations but want to use a bounce pool for some 
device anyway because reasons (perhaps copying streaming DMA data to a 
better guaranteed alignment gives an overall performance win). Now the 
*only* relevant thing to communicate to the kernel is to, ahem, reserve 
a large chunk of memory, and use it for this special purpose. Isn't that 
literally what reserved-memory bindings are for?

>> region is data intended for that device to see. That way if it
>> tries to go rogue it physically can't start slurping data intended
>> for other devices or not mapped for DMA at all. The bouncing is an
>> important part of this - I forget the title off-hand but there was
>> an interesting paper a few years ago which demonstrated that even
>> with an IOMMU, streaming DMA of in-place buffers could reveal
>> enough adjacent data from the same page to mount an attack on the
>> system. Memory pressure should be immaterial since the size of each
>> bounce pool carveout will presumably be tuned for the needs of the
>> given device.
>> 
>>> In any case, wouldn't finding all the dma-ranges do this? We're 
>>> already walking the tree to find the max DMA address now.
>> 
>> If all you can see are two "dma-ranges" properties, how do you
>> propose to tell that one means "this is the extent of what I can
>> address, please set my masks and dma-range-map accordingly and try
>> to allocate things where I can reach them" while the other means
>> "take this output range away from the page allocator and hook it up
>> as my dedicated bounce pool, because it is Serious Security Time"?
>> Especially since getting that choice wrong either way would be a
>> Bad Thing.
> 
> Either we have some heuristic based on the size or we add some hint. 
> The point is let's build on what we already have for defining DMA 
> accessible memory in DT rather than some parallel mechanism.

The point I'm trying to bang home is that it's really not about the DMA 
accessibility, it's about the purpose of the memory itself. Even when 
DMA accessibility *is* relevant it's already implied by that purpose, 
from the point of view of the implementation. The only difference it 
might make is to the end user if they want to ascertain whether the 
presence of such a pool represents protection against an untrusted 
device or just some DMA optimisation tweak.

Robin.

^ permalink raw reply

* [PATCH 2/2] ima: Free IMA measurement buffer after kexec syscall
From: Lakshmi Ramasubramanian @ 2021-01-21 17:30 UTC (permalink / raw)
  To: zohar, bauerman, dmitry.kasatkin, ebiederm, gregkh, sashal,
	tyhicks
  Cc: linux-integrity, linuxppc-dev, linux-kernel
In-Reply-To: <20210121173003.18324-1-nramas@linux.microsoft.com>

IMA allocates kernel virtual memory to carry forward the measurement
list, from the current kernel to the next kernel on kexec system call,
in ima_add_kexec_buffer() function.  This buffer is not freed before
completing the kexec system call, resulting in a memory leak.

Add ima_buffer field in "struct kimage" to store the virtual address
of the buffer allocated for the IMA measurement list.
Free the memory allocated for the IMA measurement list in
kimage_file_post_load_cleanup() function.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list")
---
 include/linux/kexec.h              | 5 +++++
 kernel/kexec_file.c                | 5 +++++
 security/integrity/ima/ima_kexec.c | 2 ++
 3 files changed, 12 insertions(+)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 9e93bef52968..5f61389f5f36 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -300,6 +300,11 @@ struct kimage {
 	/* Information for loading purgatory */
 	struct purgatory_info purgatory_info;
 #endif
+
+#ifdef CONFIG_IMA_KEXEC
+	/* Virtual address of IMA measurement buffer for kexec syscall */
+	void *ima_buffer;
+#endif
 };
 
 /* kexec interface functions */
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index b02086d70492..5c3447cf7ad5 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -166,6 +166,11 @@ void kimage_file_post_load_cleanup(struct kimage *image)
 	vfree(pi->sechdrs);
 	pi->sechdrs = NULL;
 
+#ifdef CONFIG_IMA_KEXEC
+	vfree(image->ima_buffer);
+	image->ima_buffer = NULL;
+#endif /* CONFIG_IMA_KEXEC */
+
 	/* See if architecture has anything to cleanup post load */
 	arch_kimage_file_post_load_cleanup(image);
 
diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c
index 212145008a01..8eadd0674629 100644
--- a/security/integrity/ima/ima_kexec.c
+++ b/security/integrity/ima/ima_kexec.c
@@ -130,6 +130,8 @@ void ima_add_kexec_buffer(struct kimage *image)
 		return;
 	}
 
+	image->ima_buffer = kexec_buffer;
+
 	pr_debug("kexec measurement buffer for the loaded kernel at 0x%lx.\n",
 		 kbuf.mem);
 }
-- 
2.30.0
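
The ownership hand-off described in this patch can be sketched in user space as follows. This is only an illustration under stated assumptions, not the kernel code: `struct demo_image`, `demo_add_buffer()` and `demo_post_load_cleanup()` are hypothetical stand-ins for `struct kimage`, `ima_add_kexec_buffer()` and `kimage_file_post_load_cleanup()`, and `malloc()`/`free()` stand in for the vmalloc-backed allocation and `vfree()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kimage; only the new field is modelled. */
struct demo_image {
	void *ima_buffer;	/* owned by the image once the load succeeds */
};

/* Stand-in for ima_add_kexec_buffer(): allocate, then hand ownership over. */
static int demo_add_buffer(struct demo_image *image, size_t size)
{
	void *buf = malloc(size);

	if (!buf)
		return -1;

	image->ima_buffer = buf;	/* remember the address so cleanup can free it */
	return 0;
}

/* Stand-in for kimage_file_post_load_cleanup(): free and clear the pointer. */
static void demo_post_load_cleanup(struct demo_image *image)
{
	free(image->ima_buffer);	/* free(NULL) is a no-op, as is vfree(NULL) */
	image->ima_buffer = NULL;	/* makes a repeated cleanup call harmless */
}
```

Resetting the pointer to NULL after freeing is what lets the cleanup function run more than once without a double free.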


^ permalink raw reply related

* [PATCH 1/2] ima: Free IMA measurement buffer on error
From: Lakshmi Ramasubramanian @ 2021-01-21 17:30 UTC (permalink / raw)
  To: zohar, bauerman, dmitry.kasatkin, ebiederm, gregkh, sashal,
	tyhicks
  Cc: linux-integrity, linuxppc-dev, linux-kernel

IMA allocates kernel virtual memory to carry forward the measurement
list, from the current kernel to the next kernel on kexec system call,
in ima_add_kexec_buffer() function.  In error code paths this memory
is not freed, resulting in a memory leak.

Free the memory allocated for the IMA measurement list in
the error code paths in ima_add_kexec_buffer() function.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list")
---
 security/integrity/ima/ima_kexec.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c
index 121de3e04af2..212145008a01 100644
--- a/security/integrity/ima/ima_kexec.c
+++ b/security/integrity/ima/ima_kexec.c
@@ -119,12 +119,14 @@ void ima_add_kexec_buffer(struct kimage *image)
 	ret = kexec_add_buffer(&kbuf);
 	if (ret) {
 		pr_err("Error passing over kexec measurement buffer.\n");
+		vfree(kexec_buffer);
 		return;
 	}
 
 	ret = arch_ima_add_kexec_buffer(image, kbuf.mem, kexec_segment_size);
 	if (ret) {
 		pr_err("Error passing over kexec measurement buffer.\n");
+		vfree(kexec_buffer);
 		return;
 	}
 
-- 
2.30.0
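
A minimal user-space sketch of the error-path rule this patch applies: every early return taken after the allocation must release the buffer. The names here (`demo_alloc()`, `demo_free()`, `demo_add_kexec_buffer()`, the `fail_at` parameter and the `demo_live_allocs` counter) are hypothetical and exist only to make the leak observable; the two failure points loosely correspond to the `kexec_add_buffer()` and `arch_ima_add_kexec_buffer()` calls in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int demo_live_allocs;	/* number of buffers allocated but not yet freed */

static void *demo_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		free(p);
		demo_live_allocs--;
	}
}

/*
 * fail_at == 1 simulates a failure of the first step, fail_at == 2 a
 * failure of the second step, and 0 means both steps succeed.
 */
static int demo_add_kexec_buffer(int fail_at, void **out)
{
	void *buf = demo_alloc(64);

	if (!buf)
		return -1;

	if (fail_at == 1) {
		demo_free(buf);	/* without this the buffer would leak */
		return -1;
	}

	if (fail_at == 2) {
		demo_free(buf);	/* same rule for the second failure point */
		return -1;
	}

	*out = buf;		/* success: the caller now owns the buffer */
	return 0;
}
```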


^ permalink raw reply related

* RE: [PATCH 02/13] module: add a module_loaded helper
From: David Laight @ 2021-01-21 17:44 UTC (permalink / raw)
  To: 'Christoph Hellwig', Christophe Leroy
  Cc: Petr Mladek, Michal Marek, Andrew Donnellan, Jessica Yu,
	linux-kbuild@vger.kernel.org, David Airlie, Masahiro Yamada,
	Jiri Kosina, Maarten Lankhorst, linux-kernel@vger.kernel.org,
	Maxime Ripard, Joe Lawrence, dri-devel@lists.freedesktop.org,
	Thomas Zimmermann, Josh Poimboeuf, Frederic Barrat,
	live-patching@vger.kernel.org, Daniel Vetter, Miroslav Benes,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20210121171119.GA29916@lst.de>

> 
> On Thu, Jan 21, 2021 at 11:00:20AM +0100, Christophe Leroy wrote:
> > > +bool module_loaded(const char *name);
> >
> > Maybe module_is_loaded() would be a better name.
> 
> Fine with me.

It does look like both callers aren't 'unsafe'.
But it is a bit strange returning a stale value.

It did make me look a bit more deeply at try_module_get().
For THIS_MODULE (eg to get an extra reference for a kthread)
I doubt it can ever fail.

But the other cases require a 'module' structure be passed in.
ISTM that unloading the module could invalidate the pointer
that has just been read from some other structure.

I'm guessing that some other lock must always be held.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
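
The "get a reference unless the object is already going away" idea discussed above can be sketched with C11 atomics. This is a user-space illustration, not the kernel's try_module_get(): `struct demo_module`, `demo_try_get()` and `demo_put()` are hypothetical names, and a count of zero is assumed to mean "going away". The sketch does not address the pointer-lifetime concern raised in the message; the caller is still assumed to hold some outer lock (or equivalent) that keeps the structure itself valid while `demo_try_get()` runs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_module {
	atomic_int refcnt;	/* 0 is assumed to mean the module is going away */
};

/* Take a reference only if the count has not already dropped to zero. */
static bool demo_try_get(struct demo_module *mod)
{
	int old = atomic_load(&mod->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&mod->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously taken. */
static void demo_put(struct demo_module *mod)
{
	atomic_fetch_sub(&mod->refcnt, 1);
}
```

Once the count reaches zero, `demo_try_get()` never raises it again, which is why a stale "is loaded" answer cannot be turned into a usable reference.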

^ permalink raw reply

* Re: [PATCH net] ibmvnic: device remove has higher precedence over reset
From: Dany Madden @ 2021-01-21 18:24 UTC (permalink / raw)
  To: Lijun Pan
  Cc: gregkh, julietk, netdev, Uwe Kleine-König, paulus, kernel,
	kuba, sukadev, linuxppc-dev, davem
In-Reply-To: <20210121062005.53271-1-ljp@linux.ibm.com>

On 2021-01-20 22:20, Lijun Pan wrote:
> Returning -EBUSY in ibmvnic_remove() does not actually hold the
> removal procedure since driver core doesn't care for the return
> value (see __device_release_driver() in drivers/base/dd.c
> calling dev->bus->remove()) though vio_bus_remove
> (in arch/powerpc/platforms/pseries/vio.c) records the
> return value and passes it on. [1]
> 
> During the device removal procedure, we should not schedule
> any new reset (ibmvnic_reset check for REMOVING and exit),
> and should rely on the flush_work and flush_delayed_work
> to complete the pending resets, specifically we need to
> let __ibmvnic_reset() keep running while in REMOVING state since
> flush_work and flush_delayed_work shall call __ibmvnic_reset finally.
> So we skip the checking for REMOVING in __ibmvnic_reset.
> 
> [1]
> https://lore.kernel.org/linuxppc-dev/20210117101242.dpwayq6wdgfdzirl@pengutronix.de/T/#m48f5befd96bc9842ece2a3ad14f4c27747206a53
> Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> Fixes: 7d7195a026ba ("ibmvnic: Do not process device remove during
> device reset")
> Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
> ---
> v1 versus RFC:
>   1/ articulate why remove the REMOVING checking in __ibmvnic_reset
>   and why keep the current checking for REMOVING in ibmvnic_reset.
>   2/ The locking issue mentioned by Uwe are being addressed separately
>      by	https://lists.openwall.net/netdev/2021/01/08/89
>   3/ This patch does not have merge conflict with 2/
> 
>  drivers/net/ethernet/ibm/ibmvnic.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c
> b/drivers/net/ethernet/ibm/ibmvnic.c
> index aed985e08e8a..11f28fd03057 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -2235,8 +2235,7 @@ static void __ibmvnic_reset(struct work_struct 
> *work)
>  	while (rwi) {
>  		spin_lock_irqsave(&adapter->state_lock, flags);
> 
> -		if (adapter->state == VNIC_REMOVING ||
> -		    adapter->state == VNIC_REMOVED) {
> +		if (adapter->state == VNIC_REMOVED) {

If we do get here, we would crash because ibmvnic_remove() happened. It 
frees the adapter struct already.

>  			spin_unlock_irqrestore(&adapter->state_lock, flags);
>  			kfree(rwi);
>  			rc = EBUSY;
> @@ -5372,11 +5371,6 @@ static int ibmvnic_remove(struct vio_dev *dev)
>  	unsigned long flags;
> 
>  	spin_lock_irqsave(&adapter->state_lock, flags);
> -	if (test_bit(0, &adapter->resetting)) {
> -		spin_unlock_irqrestore(&adapter->state_lock, flags);
> -		return -EBUSY;
> -	}
> -
>  	adapter->state = VNIC_REMOVING;
>  	spin_unlock_irqrestore(&adapter->state_lock, flags);

^ permalink raw reply

* Re: [PATCH net] ibmvnic: device remove has higher precedence over reset
From: Lijun Pan @ 2021-01-21 18:46 UTC (permalink / raw)
  To: Dany Madden
  Cc: gregkh, julietk, netdev, Uwe Kleine-König, Jakub Kicinski,
	Lijun Pan, kernel, paulus, sukadev, linuxppc-dev, davem
In-Reply-To: <c34816a13d857b7f5d1a25991b58ec63@imap.linux.ibm.com>

On Thu, Jan 21, 2021 at 12:42 PM Dany Madden <drt@linux.ibm.com> wrote:
>
> On 2021-01-20 22:20, Lijun Pan wrote:
> > Returning -EBUSY in ibmvnic_remove() does not actually hold the
> > removal procedure since driver core doesn't care for the return
> > value (see __device_release_driver() in drivers/base/dd.c
> > calling dev->bus->remove()) though vio_bus_remove
> > (in arch/powerpc/platforms/pseries/vio.c) records the
> > return value and passes it on. [1]
> >
> > During the device removal procedure, we should not schedule
> > any new reset (ibmvnic_reset check for REMOVING and exit),
> > and should rely on the flush_work and flush_delayed_work
> > to complete the pending resets, specifically we need to
> > let __ibmvnic_reset() keep running while in REMOVING state since
> > flush_work and flush_delayed_work shall call __ibmvnic_reset finally.
> > So we skip the checking for REMOVING in __ibmvnic_reset.
> >
> > [1]
> > https://lore.kernel.org/linuxppc-dev/20210117101242.dpwayq6wdgfdzirl@pengutronix.de/T/#m48f5befd96bc9842ece2a3ad14f4c27747206a53
> > Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> > Fixes: 7d7195a026ba ("ibmvnic: Do not process device remove during
> > device reset")
> > Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
> > ---
> > v1 versus RFC:
> >   1/ articulate why remove the REMOVING checking in __ibmvnic_reset
> >   and why keep the current checking for REMOVING in ibmvnic_reset.
> >   2/ The locking issue mentioned by Uwe are being addressed separately
> >      by       https://lists.openwall.net/netdev/2021/01/08/89
> >   3/ This patch does not have merge conflict with 2/
> >
> >  drivers/net/ethernet/ibm/ibmvnic.c | 8 +-------
> >  1 file changed, 1 insertion(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c
> > b/drivers/net/ethernet/ibm/ibmvnic.c
> > index aed985e08e8a..11f28fd03057 100644
> > --- a/drivers/net/ethernet/ibm/ibmvnic.c
> > +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> > @@ -2235,8 +2235,7 @@ static void __ibmvnic_reset(struct work_struct
> > *work)
> >       while (rwi) {
> >               spin_lock_irqsave(&adapter->state_lock, flags);
> >
> > -             if (adapter->state == VNIC_REMOVING ||
> > -                 adapter->state == VNIC_REMOVED) {
> > +             if (adapter->state == VNIC_REMOVED) {
>
> If we do get here, we would crash because ibmvnic_remove() happened. It
> frees the adapter struct already.

Not exactly. viodev is gone; netdev is done; ibmvnic_adapter is still there.

Lijun
>
> >                       spin_unlock_irqrestore(&adapter->state_lock, flags);
> >                       kfree(rwi);
> >                       rc = EBUSY;
> > @@ -5372,11 +5371,6 @@ static int ibmvnic_remove(struct vio_dev *dev)
> >       unsigned long flags;
> >
> >       spin_lock_irqsave(&adapter->state_lock, flags);
> > -     if (test_bit(0, &adapter->resetting)) {
> > -             spin_unlock_irqrestore(&adapter->state_lock, flags);
> > -             return -EBUSY;
> > -     }
> > -
> >       adapter->state = VNIC_REMOVING;
> >       spin_unlock_irqrestore(&adapter->state_lock, flags);
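
The ordering the patch description argues for can be modelled with a small single-threaded sketch. It is a simplification under stated assumptions: `struct demo_adapter` and the `demo_*` functions are hypothetical, the `state_lock` and the real work queue are omitted, and the final transition to a REMOVED state at the end of removal is assumed rather than shown in the quoted diff.

```c
#include <assert.h>

enum demo_state { DEMO_OPEN, DEMO_REMOVING, DEMO_REMOVED };

struct demo_adapter {
	enum demo_state state;
	int pending;	/* queued reset work items */
	int processed;	/* reset work items that actually ran */
};

/* Models ibmvnic_reset(): refuse to queue new work once removal has begun. */
static int demo_schedule_reset(struct demo_adapter *a)
{
	if (a->state == DEMO_REMOVING || a->state == DEMO_REMOVED)
		return -1;
	a->pending++;
	return 0;
}

/* Models __ibmvnic_reset() after the patch: keep draining while REMOVING. */
static void demo_run_resets(struct demo_adapter *a)
{
	while (a->pending > 0) {
		if (a->state == DEMO_REMOVED) {
			a->pending = 0;	/* drop whatever is left */
			break;
		}
		a->pending--;
		a->processed++;
	}
}

/* Models ibmvnic_remove(): mark REMOVING, flush queued work, then finish. */
static void demo_remove(struct demo_adapter *a)
{
	a->state = DEMO_REMOVING;
	demo_run_resets(a);	/* stands in for flush_work()/flush_delayed_work() */
	a->state = DEMO_REMOVED;
}
```

In this model, resets queued before removal still complete during the flush, while any reset requested after removal has started is rejected at scheduling time.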

^ permalink raw reply
