* Re: [PATCH v2 3/9] cxl/pci: Remove pci request/release regions
From: Dan Williams @ 2021-09-28 14:42 UTC
To: Ben Widawsky
Cc: Andrew Donnellan, Linux PCI, linuxppc-dev, linux-cxl,
open list:DMA MAPPING HELPERS, Bjorn Helgaas, David E. Box,
Frederic Barrat, Lu Baolu, David Woodhouse, Kan Liang
In-Reply-To: <20210923172647.72738-4-ben.widawsky@intel.com>
On Thu, Sep 23, 2021 at 10:26 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> Quoting Dan, "... the request + release regions should probably just be
> dropped. It's not like any of the register enumeration would collide
> with someone else who already has the registers mapped. The collision
> only comes when the registers are mapped for their final usage, and that
> will have more precision in the request."
Looks good to me:
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>
> Recommended-by: Dan Williams <dan.j.williams@intel.com>
This isn't one of the canonical tags:
Documentation/process/submitting-patches.rst
I'll change this to Suggested-by:
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> ---
> drivers/cxl/pci.c | 5 -----
> 1 file changed, 5 deletions(-)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index ccc7c2573ddc..7256c236fdb3 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -453,9 +453,6 @@ static int cxl_pci_setup_regs(struct cxl_mem *cxlm)
> return -ENXIO;
> }
>
> - if (pci_request_mem_regions(pdev, pci_name(pdev)))
> - return -ENODEV;
> -
> /* Get the size of the Register Locator DVSEC */
> pci_read_config_dword(pdev, regloc + PCI_DVSEC_HEADER1, &regloc_size);
> regloc_size = FIELD_GET(PCI_DVSEC_HEADER1_LENGTH_MASK, regloc_size);
> @@ -499,8 +496,6 @@ static int cxl_pci_setup_regs(struct cxl_mem *cxlm)
> n_maps++;
> }
>
> - pci_release_mem_regions(pdev);
> -
> for (i = 0; i < n_maps; i++) {
> ret = cxl_map_regs(cxlm, &maps[i]);
> if (ret)
> --
> 2.33.0
>
* Re: [PATCH v2 2/9] cxl/pci: Remove dev_dbg for unknown register blocks
From: Dan Williams @ 2021-09-28 14:37 UTC
To: Ben Widawsky
Cc: Andrew Donnellan, Linux PCI, linuxppc-dev, linux-cxl,
open list:DMA MAPPING HELPERS, Bjorn Helgaas, David E. Box,
Frederic Barrat, Lu Baolu, David Woodhouse, Kan Liang
In-Reply-To: <20210923172647.72738-3-ben.widawsky@intel.com>
On Thu, Sep 23, 2021 at 10:27 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> While interesting to driver developers, the dev_dbg message doesn't do
> much except clutter up logs. This information should be attainable
> through sysfs, and someday lspci like utilities. This change
> additionally helps reduce the LOC in a subsequent patch to refactor some
> of cxl_pci register mapping.
Looks good to me:
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
* Re: [PATCH v8 1/2] powerpc/pseries: Interface to represent PAPR firmware attributes
From: Pratik Sampat @ 2021-09-28 14:11 UTC
To: Greg KH
Cc: farosas, pratik.r.sampat, linuxppc-dev, kvm-ppc, linux-kernel,
paulus, linux-kselftest, kjain, shuah
In-Reply-To: <YVMfonwjmbgL/ZCX@kroah.com>
On 28/09/21 7:28 pm, Greg KH wrote:
> On Tue, Sep 28, 2021 at 06:13:18PM +0530, Pratik Sampat wrote:
>> Hello Greg,
>>
>> Thank you for your review.
>>
>> On 28/09/21 5:38 pm, Greg KH wrote:
>>> On Tue, Sep 28, 2021 at 05:21:01PM +0530, Pratik R. Sampat wrote:
>>>> Adds a generic interface to represent the energy and frequency related
>>>> PAPR attributes on the system using the new H_CALL
>>>> "H_GET_ENERGY_SCALE_INFO".
>>>>
>>>> H_GET_EM_PARMS H_CALL was previously responsible for exporting this
>>>> information in the lparcfg, however the H_GET_EM_PARMS H_CALL
>>>> will be deprecated P10 onwards.
>>>>
>>>> The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
>>>> hcall(
>>>> uint64 H_GET_ENERGY_SCALE_INFO, // Get energy scale info
>>>> uint64 flags, // Per the flag request
>>>> uint64 firstAttributeId,// The attribute id
>>>> uint64 bufferAddress, // Guest physical address of the output buffer
>>>> uint64 bufferSize // The size in bytes of the output buffer
>>>> );
>>>>
>>>> This H_CALL can query either all the attributes at once with
>>>> firstAttributeId = 0, flags = 0 as well as query only one attribute
>>>> at a time with firstAttributeId = id, flags = 1.
>>>>
>>>> The output buffer consists of the following
>>>> 1. number of attributes - 8 bytes
>>>> 2. array offset to the data location - 8 bytes
>>>> 3. version info - 1 byte
>>>> 4. A data array of size num attributes, which contains the following:
>>>> a. attribute ID - 8 bytes
>>>> b. attribute value in number - 8 bytes
>>>> c. attribute name in string - 64 bytes
>>>> d. attribute value in string - 64 bytes
>>>>
>>>> The new H_CALL exports information in direct string value format, hence
>>>> a new interface has been introduced in
>>>> /sys/firmware/papr/energy_scale_info to export this information to
>>>> userspace in an extensible pass-through format.
>>>>
>>>> The H_CALL returns the name, numeric value and string value (if exists)
>>>>
>>>> The format of exposing the sysfs information is as follows:
>>>> /sys/firmware/papr/energy_scale_info/
>>>> |-- <id>/
>>>> |-- desc
>>>> |-- value
>>>> |-- value_desc (if exists)
>>>> |-- <id>/
>>>> |-- desc
>>>> |-- value
>>>> |-- value_desc (if exists)
>>>> ...
>>>>
>>>> The energy information that is exported is useful for userspace tools
>>>> such as powerpc-utils. Currently these tools infer the
>>>> "power_mode_data" value in the lparcfg, which in turn is obtained from
>>>> the to be deprecated H_GET_EM_PARMS H_CALL.
>>>> On future platforms, such userspace utilities will have to look at the
>>>> data returned from the new H_CALL being populated in this new sysfs
>>>> interface and report this information directly without the need of
>>>> interpretation.
>>>>
>>>> Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com>
>>>> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
>>>> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
>>>> Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
>>>> ---
>>>> .../sysfs-firmware-papr-energy-scale-info | 26 ++
>>>> arch/powerpc/include/asm/hvcall.h | 24 +-
>>>> arch/powerpc/kvm/trace_hv.h | 1 +
>>>> arch/powerpc/platforms/pseries/Makefile | 3 +-
>>>> .../pseries/papr_platform_attributes.c | 312 ++++++++++++++++++
>>>> 5 files changed, 364 insertions(+), 2 deletions(-)
>>>> create mode 100644 Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
>>>> create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c
>>>>
>>>> diff --git a/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
>>>> new file mode 100644
>>>> index 000000000000..139a576c7c9d
>>>> --- /dev/null
>>>> +++ b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
>>>> @@ -0,0 +1,26 @@
>>>> +What: /sys/firmware/papr/energy_scale_info
>>>> +Date: June 2021
>>>> +Contact: Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
>>>> +Description: Directory hosting a set of platform attributes like
>>>> + energy/frequency on Linux running as a PAPR guest.
>>>> +
>>>> + Each file in a directory contains a platform
>>>> + attribute hierarchy pertaining to performance/
>>>> + energy-savings mode and processor frequency.
>>>> +
>>>> +What: /sys/firmware/papr/energy_scale_info/<id>
>>>> + /sys/firmware/papr/energy_scale_info/<id>/desc
>>>> + /sys/firmware/papr/energy_scale_info/<id>/value
>>>> + /sys/firmware/papr/energy_scale_info/<id>/value_desc
>>>> +Date: June 2021
>>>> +Contact: Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
>>>> +Description: Energy, frequency attributes directory for POWERVM servers
>>>> +
>>>> + This directory provides energy, frequency, folding information. It
>>>> + contains below sysfs attributes:
>>>> +
>>>> + - desc: String description of the attribute <id>
>>>> +
>>>> + - value: Numeric value of attribute <id>
>>>> +
>>>> + - value_desc: String value of attribute <id>
>>> Can you just make 4 different entries in this file, making it easier to
>>> parse and extend over time?
>> Do you mean I only create one file per attribute and populate it with 4
>> different entries as follows?
>>
>> # cat /sys/firmware/papr/energy_scale_info/<id>
>> id:
>> desc:
>> value:
>> value_desc:
> No, I mean in this documentation file, have 4 different "What:" entries,
> don't lump 4 of them together into one larger Description for no reason
> like you did here.
>
> The sysfs files themselves are fine.
Ah okay, I understand what you're saying. I just need to make 4 different
entries in the documentation.
Thanks for that clarification.
>>>> +struct papr_attr {
>>>> + u64 id;
>>>> + struct kobj_attribute kobj_attr;
>>> Why does an attribute have to be part of this structure?
>> I bundled both an attribute as well as its ID in a structure because each
>> attribute's value could only be queried from the firmware with the corresponding
>> ID.
>> It seemed to be logically connected and that's why I had them in the structure.
>> Are you suggesting we maintain them separately and don't need the coupling?
> The id is connected to the kobject, not the attribute, right?
> Attributes do not have uniqueness like this normally.
>
>
>>>> +static struct papr_ops_info {
>>>> + const char *attr_name;
>>>> + ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *kobj_attr,
>>>> + char *buf);
>>>> +} ops_info[MAX_ATTRS] = {
>>>> + { "desc", papr_show_desc },
>>>> + { "value", papr_show_value },
>>>> + { "value_desc", papr_show_value_desc },
>>> What is wrong with just using the __ATTR_RO() macro and then having an
>>> array of attributes in a single group? That should be a lot simpler
>>> overall, right?
>> If I understand this correctly, you mean I can have an array of attributes in a
>> flat single group?
> Yes.
>
>> I suppose that would be simpler, given your earlier suggestion to wrap
>> attribute values up in a single file per attribute.
>>
>> However, the intent of grouping and keeping files separate was that each sysfs
>> file has only one value to display.
> That is correct, and not a problem here at all.
>
>> I can change it to using an array of attributes in a single group too if you
>> believe that is the right way to go instead.
> You have 3 variables for your attributes:
>
> static struct kobj_attribute papr_desc = __ATTR_RO(desc);
> static struct kobj_attribute papr_value = __ATTR_RO(value);
> static struct kobj_attribute papr_value_desc = __ATTR_RO(value_desc);
>
> and then your attribute group:
> static struct attribute *papr_attrs[] = {
> &papr_desc.attr,
> &papr_value.attr,
> &papr_value_desc.attr,
> NULL,
> };
>
> ATTRIBUTE_GROUPS(papr);
>
> Then take that papr_groups and register that with the kobject when
> needed.
>
> But, you seem to only be having a whole kobject for a subdirectory,
> right? No need for that, just name your attribute group, so instead of
>
> ATTRIBUTE_GROUPS(papr);
>
> do:
> static const struct attribute_group papr_group = {
> .name = "Your Subdirectory Name here",
> .attrs = papr_attrs,
> };
>
> Hope this helps,
Yes, this does!
I understand now that a whole kobject for a sub-directory is futile.
The approach you suggested for having papr_groups register with the kobject
whenever needed is cleaner.
Thanks for the help, I'll rework my current logic according to that.
Pratik
> greg k-h
* Re: [PATCH v8 1/2] powerpc/pseries: Interface to represent PAPR firmware attributes
From: Greg KH @ 2021-09-28 13:58 UTC
To: Pratik Sampat
Cc: farosas, pratik.r.sampat, linuxppc-dev, kvm-ppc, linux-kernel,
paulus, linux-kselftest, kjain, shuah
In-Reply-To: <289d2081-7ae8-f76a-5180-49bc6061a05c@linux.ibm.com>
On Tue, Sep 28, 2021 at 06:13:18PM +0530, Pratik Sampat wrote:
> Hello Greg,
>
> Thank you for your review.
>
> On 28/09/21 5:38 pm, Greg KH wrote:
> > On Tue, Sep 28, 2021 at 05:21:01PM +0530, Pratik R. Sampat wrote:
> > > Adds a generic interface to represent the energy and frequency related
> > > PAPR attributes on the system using the new H_CALL
> > > "H_GET_ENERGY_SCALE_INFO".
> > >
> > > H_GET_EM_PARMS H_CALL was previously responsible for exporting this
> > > information in the lparcfg, however the H_GET_EM_PARMS H_CALL
> > > will be deprecated P10 onwards.
> > >
> > > The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
> > > hcall(
> > > uint64 H_GET_ENERGY_SCALE_INFO, // Get energy scale info
> > > uint64 flags, // Per the flag request
> > > uint64 firstAttributeId,// The attribute id
> > > uint64 bufferAddress, // Guest physical address of the output buffer
> > > uint64 bufferSize // The size in bytes of the output buffer
> > > );
> > >
> > > This H_CALL can query either all the attributes at once with
> > > firstAttributeId = 0, flags = 0 as well as query only one attribute
> > > at a time with firstAttributeId = id, flags = 1.
> > >
> > > The output buffer consists of the following
> > > 1. number of attributes - 8 bytes
> > > 2. array offset to the data location - 8 bytes
> > > 3. version info - 1 byte
> > > 4. A data array of size num attributes, which contains the following:
> > > a. attribute ID - 8 bytes
> > > b. attribute value in number - 8 bytes
> > > c. attribute name in string - 64 bytes
> > > d. attribute value in string - 64 bytes
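The buffer layout listed above can be walked in plain C. The following is a minimal userspace sketch, not the driver's code: the esi_* names are hypothetical, big-endian byte order is an assumption, and the 17-byte header and 144-byte entry sizes are simply the sums of the field sizes in the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ESI_HDR_LEN  17u	/* 8 + 8 + 1 bytes, per the list above */
#define ESI_ATTR_LEN 144u	/* 8 + 8 + 64 + 64 bytes */
#define ESI_STR_LEN  64u

/* Decoded form of one array entry (hypothetical type). */
struct esi_attr {
	uint64_t id;
	uint64_t value;
	char desc[ESI_STR_LEN + 1];		/* +1 so the copy is always terminated */
	char value_desc[ESI_STR_LEN + 1];
};

/* Read a 64-bit big-endian integer; the byte order is an assumption. */
uint64_t esi_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Attribute count claimed by the header; 0 if there is no room for a header. */
uint64_t esi_num_attrs(const uint8_t *buf, size_t len)
{
	assert(buf);
	return len < ESI_HDR_LEN ? 0 : esi_be64(buf);
}

/*
 * Decode entry idx into out. Returns 0 on success, -1 if the header or the
 * requested entry does not fit inside the buffer.
 */
int esi_get_attr(const uint8_t *buf, size_t len, uint64_t idx,
		 struct esi_attr *out)
{
	uint64_t num, off;
	const uint8_t *e;

	assert(buf && out);
	if (len < ESI_HDR_LEN)
		return -1;
	num = esi_be64(buf);		/* field 1: number of attributes */
	off = esi_be64(buf + 8);	/* field 2: offset of the data array */
	if (idx >= num || off > len || idx >= (len - off) / ESI_ATTR_LEN)
		return -1;

	e = buf + off + idx * ESI_ATTR_LEN;
	out->id = esi_be64(e);
	out->value = esi_be64(e + 8);
	memcpy(out->desc, e + 16, ESI_STR_LEN);
	out->desc[ESI_STR_LEN] = '\0';
	memcpy(out->value_desc, e + 16 + ESI_STR_LEN, ESI_STR_LEN);
	out->value_desc[ESI_STR_LEN] = '\0';
	return 0;
}
```

Locating the array through the offset field rather than a fixed header size means the sketch would keep working if a later header version grew.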
> > >
> > > The new H_CALL exports information in direct string value format, hence
> > > a new interface has been introduced in
> > > /sys/firmware/papr/energy_scale_info to export this information to
> > > userspace in an extensible pass-through format.
> > >
> > > The H_CALL returns the name, numeric value and string value (if exists)
> > >
> > > The format of exposing the sysfs information is as follows:
> > > /sys/firmware/papr/energy_scale_info/
> > > |-- <id>/
> > > |-- desc
> > > |-- value
> > > |-- value_desc (if exists)
> > > |-- <id>/
> > > |-- desc
> > > |-- value
> > > |-- value_desc (if exists)
> > > ...
> > >
> > > The energy information that is exported is useful for userspace tools
> > > such as powerpc-utils. Currently these tools infer the
> > > "power_mode_data" value in the lparcfg, which in turn is obtained from
> > > the to be deprecated H_GET_EM_PARMS H_CALL.
> > > On future platforms, such userspace utilities will have to look at the
> > > data returned from the new H_CALL being populated in this new sysfs
> > > interface and report this information directly without the need of
> > > interpretation.
> > >
> > > Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com>
> > > Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> > > Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
> > > Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
> > > ---
> > > .../sysfs-firmware-papr-energy-scale-info | 26 ++
> > > arch/powerpc/include/asm/hvcall.h | 24 +-
> > > arch/powerpc/kvm/trace_hv.h | 1 +
> > > arch/powerpc/platforms/pseries/Makefile | 3 +-
> > > .../pseries/papr_platform_attributes.c | 312 ++++++++++++++++++
> > > 5 files changed, 364 insertions(+), 2 deletions(-)
> > > create mode 100644 Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
> > > create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c
> > >
> > > diff --git a/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
> > > new file mode 100644
> > > index 000000000000..139a576c7c9d
> > > --- /dev/null
> > > +++ b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
> > > @@ -0,0 +1,26 @@
> > > +What: /sys/firmware/papr/energy_scale_info
> > > +Date: June 2021
> > > +Contact: Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
> > > +Description: Directory hosting a set of platform attributes like
> > > + energy/frequency on Linux running as a PAPR guest.
> > > +
> > > + Each file in a directory contains a platform
> > > + attribute hierarchy pertaining to performance/
> > > + energy-savings mode and processor frequency.
> > > +
> > > +What: /sys/firmware/papr/energy_scale_info/<id>
> > > + /sys/firmware/papr/energy_scale_info/<id>/desc
> > > + /sys/firmware/papr/energy_scale_info/<id>/value
> > > + /sys/firmware/papr/energy_scale_info/<id>/value_desc
> > > +Date: June 2021
> > > +Contact: Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
> > > +Description: Energy, frequency attributes directory for POWERVM servers
> > > +
> > > + This directory provides energy, frequency, folding information. It
> > > + contains below sysfs attributes:
> > > +
> > > + - desc: String description of the attribute <id>
> > > +
> > > + - value: Numeric value of attribute <id>
> > > +
> > > + - value_desc: String value of attribute <id>
> > Can you just make 4 different entries in this file, making it easier to
> > parse and extend over time?
>
> Do you mean I only create one file per attribute and populate it with 4
> different entries as follows?
>
> # cat /sys/firmware/papr/energy_scale_info/<id>
> id:
> desc:
> value:
> value_desc:
No, I mean in this documentation file, have 4 different "What:" entries,
don't lump 4 of them together into one larger Description for no reason
like you did here.
The sysfs files themselves are fine.
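For reference, a userspace consumer of the one-value-per-file layout could look roughly like the sketch below. The helper names are hypothetical, and rendering <id> as a decimal number is an assumption; the patch description only says "<id>".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Base directory named in the patch description. */
#define PAPR_ESI_DIR "/sys/firmware/papr/energy_scale_info"

/*
 * Build "<base>/<id>/<attr>" into out. Returns 0 on success, -1 if the
 * result would not fit. Printing <id> in decimal is an assumption.
 */
int esi_attr_path(char *out, size_t len, const char *base,
		  unsigned long long id, const char *attr)
{
	int n;

	assert(out && base && attr);
	n = snprintf(out, len, "%s/%llu/%s", base, id, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read one attribute file into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file is missing or unreadable.
 * value_desc is optional, so a missing file is not necessarily an error.
 */
int esi_read_attr(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path && buf && len > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Because each file holds a single value, the reader needs no parsing beyond trimming the newline.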
> > > +struct papr_attr {
> > > + u64 id;
> > > + struct kobj_attribute kobj_attr;
> > Why does an attribute have to be part of this structure?
>
> I bundled both an attribute as well as its ID in a structure because each
> attribute's value could only be queried from the firmware with the corresponding
> ID.
> It seemed to be logically connected and that's why I had them in the structure.
> Are you suggesting we maintain them separately and don't need the coupling?
The id is connected to the kobject, not the attribute, right?
Attributes do not have uniqueness like this normally.
> > > +static struct papr_ops_info {
> > > + const char *attr_name;
> > > + ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *kobj_attr,
> > > + char *buf);
> > > +} ops_info[MAX_ATTRS] = {
> > > + { "desc", papr_show_desc },
> > > + { "value", papr_show_value },
> > > + { "value_desc", papr_show_value_desc },
> > What is wrong with just using the __ATTR_RO() macro and then having an
> > array of attributes in a single group? That should be a lot simpler
> > overall, right?
>
> If I understand this correctly, you mean I can have an array of attributes in a
> flat single group?
Yes.
> I suppose that would be simpler, given your earlier suggestion to wrap
> attribute values up in a single file per attribute.
>
> However, the intent of grouping and keeping files separate was that each sysfs
> file has only one value to display.
That is correct, and not a problem here at all.
> I can change it to using an array of attributes in a single group too if you
> believe that is the right way to go instead.
You have 3 variables for your attributes:
static struct kobj_attribute papr_desc = __ATTR_RO(desc);
static struct kobj_attribute papr_value = __ATTR_RO(value);
static struct kobj_attribute papr_value_desc = __ATTR_RO(value_desc);
and then your attribute group:
static struct attribute *papr_attrs[] = {
&papr_desc.attr,
&papr_value.attr,
&papr_value_desc.attr,
NULL,
};
ATTRIBUTE_GROUPS(papr);
Then take that papr_groups and register that with the kobject when
needed.
But, you seem to only be having a whole kobject for a subdirectory,
right? No need for that, just name your attribute group, so instead of
ATTRIBUTE_GROUPS(papr);
do:
static const struct attribute_group papr_group = {
.name = "Your Subdirectory Name here",
.attrs = papr_attrs,
};
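The shape of a named attribute group can be modelled in userspace to show how the pieces fit together. This is only a sketch: the struct definitions are simplified stand-ins, not the kernel's, MODEL_ATTR_RO is a hypothetical imitation of __ATTR_RO(), and the show callbacks return made-up strings. The point it illustrates is that .attrs is a NULL-terminated array of pointers and that a non-NULL .name selects the subdirectory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-ins for the kernel types; these are NOT the real definitions. */
struct attribute {
	const char *name;
};

struct kobj_attribute {
	struct attribute attr;
	int (*show)(char *buf, size_t len);
};

struct attribute_group {
	const char *name;		/* non-NULL: files land in a subdirectory */
	struct attribute **attrs;	/* NULL-terminated array of pointers */
};

/* Placeholder show callbacks; the strings are made up for illustration. */
int desc_show(char *buf, size_t len)
{
	return snprintf(buf, len, "example attribute\n");
}

int value_show(char *buf, size_t len)
{
	return snprintf(buf, len, "%d\n", 42);
}

int value_desc_show(char *buf, size_t len)
{
	return snprintf(buf, len, "example value\n");
}

/* Imitates __ATTR_RO(name): a file name plus the name##_show callback. */
#define MODEL_ATTR_RO(_name) \
	{ .attr = { .name = #_name }, .show = _name##_show }

static struct kobj_attribute papr_desc = MODEL_ATTR_RO(desc);
static struct kobj_attribute papr_value = MODEL_ATTR_RO(value);
static struct kobj_attribute papr_value_desc = MODEL_ATTR_RO(value_desc);

static struct attribute *papr_attrs[] = {
	&papr_desc.attr,
	&papr_value.attr,
	&papr_value_desc.attr,
	NULL,
};

const struct attribute_group papr_group = {
	.name = "1",	/* hypothetical subdirectory name */
	.attrs = papr_attrs,
};

/* Count entries by walking to the NULL terminator, as a consumer would. */
size_t group_attr_count(const struct attribute_group *grp)
{
	size_t n = 0;

	assert(grp && grp->attrs);
	while (grp->attrs[n])
		n++;
	return n;
}
```

In the real kernel the group would be handed to something like sysfs_create_group() on the parent kobject; that step is left out here because it cannot run outside the kernel.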
Hope this helps,
greg k-h
* Re: [RFC PATCH 3/8] s390: add CPU field to struct thread_info
From: Ard Biesheuvel @ 2021-09-28 13:52 UTC
To: Linux Kernel Mailing List, Heiko Carstens, Christian Borntraeger,
Vasily Gorbik
Cc: Peter Zijlstra, Catalin Marinas, Paul Mackerras, linux-riscv,
Will Deacon, open list:S390, Russell King, Ingo Molnar, Albert Ou,
Kees Cook, Arnd Bergmann, Keith Packard, Borislav Petkov,
Andy Lutomirski, Paul Walmsley, Thomas Gleixner, Linux ARM,
Linus Torvalds, Palmer Dabbelt,
open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20210914121036.3975026-4-ardb@kernel.org>
On Tue, 14 Sept 2021 at 14:11, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> The CPU field will be moved back into thread_info even when
> THREAD_INFO_IN_TASK is enabled, so add it back to s390's definition of
> struct thread_info.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/s390/include/asm/thread_info.h | 1 +
> 1 file changed, 1 insertion(+)
>
Heiko, Christian, Vasily,
Do you have any objections to this change? If you don't, could you
please ack it so it can be taken through another tree (or if that is
problematic for you, could you please propose another way of merging
these changes?)
Thanks,
Ard.
> diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h
> index e6674796aa6f..b2ffcb4fe000 100644
> --- a/arch/s390/include/asm/thread_info.h
> +++ b/arch/s390/include/asm/thread_info.h
> @@ -37,6 +37,7 @@
> struct thread_info {
> unsigned long flags; /* low level flags */
> unsigned long syscall_work; /* SYSCALL_WORK_ flags */
> + unsigned int cpu; /* current CPU */
> };
>
> /*
> --
> 2.30.2
>
* Re: [RFC PATCH 4/8] powerpc: add CPU field to struct thread_info
From: Ard Biesheuvel @ 2021-09-28 13:18 UTC
To: Michael Ellerman
Cc: Peter Zijlstra, Catalin Marinas, Paul Mackerras, linux-riscv,
Will Deacon, open list:S390, Arnd Bergmann, Russell King,
Christian Borntraeger, Ingo Molnar, Albert Ou, Kees Cook,
Vasily Gorbik, Heiko Carstens, Keith Packard, Borislav Petkov,
Andy Lutomirski, Paul Walmsley, Thomas Gleixner, Linux ARM,
open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
Linux Kernel Mailing List, Palmer Dabbelt, Linus Torvalds
In-Reply-To: <87pmst1rn9.fsf@mpe.ellerman.id.au>
On Tue, 28 Sept 2021 at 02:16, Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Michael Ellerman <mpe@ellerman.id.au> writes:
> > Ard Biesheuvel <ardb@kernel.org> writes:
> >> On Tue, 14 Sept 2021 at 14:11, Ard Biesheuvel <ardb@kernel.org> wrote:
> >>>
> >>> The CPU field will be moved back into thread_info even when
> >>> THREAD_INFO_IN_TASK is enabled, so add it back to powerpc's definition
> >>> of struct thread_info.
> >>>
> >>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> >>
> >> Michael,
> >>
> >> Do you have any objections or issues with this patch or the subsequent
> >> ones cleaning up the task CPU kludge for ppc32? Christophe indicated
> >> that he was happy with it.
> >
> > No objections, it looks good to me, thanks for cleaning up that horror :)
> >
> > It didn't apply cleanly to master so I haven't tested it at all, if you can point me at a
> > git tree with the dependencies I'd be happy to run some tests over it.
>
> Actually I realised I can just drop the last patch.
>
> So that looks fine, passes my standard quick build & boot on qemu tests,
> and builds with/without stack protector enabled.
>
Thanks.
Do you have any opinion on how this series should be merged? Kees Cook
is willing to take them via his cross-arch tree, or you could carry
them if you prefer. Taking it via multiple trees at the same time is
going to be tricky, or take two cycles, which I'd prefer to avoid.
--
Ard.
* [PATCH v5 4/4] docs: ABI: sysfs-bus-nvdimm: Document sysfs event format entries for nvdimm pmu
From: Kajol Jain @ 2021-09-28 12:48 UTC
To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
ira.weiny, vishal.l.verma
Cc: santosh, maddy, rnsastry, aneesh.kumar, atrajeev, kjain, vaibhav,
tglx
Details are added for the event, cpumask and format attributes
in the ABI documentation.
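The format entry documented in this patch, event = "config:0-4", says the event ID occupies the low five bits of perf_event_attr.config. A minimal sketch of that packing follows; the macro and function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define EVENT_SHIFT 0u
#define EVENT_BITS  5u	/* "config:0-4" spans five bits */
#define EVENT_MASK  (((UINT64_C(1) << EVENT_BITS) - 1) << EVENT_SHIFT)

/* Extract the event ID from a config word. */
uint64_t nmem_event_id(uint64_t config)
{
	return (config & EVENT_MASK) >> EVENT_SHIFT;
}

/* Build a config word from an event ID; bits outside the field are dropped. */
uint64_t nmem_config(uint64_t event_id)
{
	return (event_id << EVENT_SHIFT) & EVENT_MASK;
}
```

With this encoding the documented example ctl_res_cnt = "event=0x1" corresponds to a config value of 0x1.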
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
Documentation/ABI/testing/sysfs-bus-nvdimm | 35 ++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-bus-nvdimm b/Documentation/ABI/testing/sysfs-bus-nvdimm
index bff84a16812a..64004d5e4840 100644
--- a/Documentation/ABI/testing/sysfs-bus-nvdimm
+++ b/Documentation/ABI/testing/sysfs-bus-nvdimm
@@ -6,3 +6,38 @@ Description:
The libnvdimm sub-system implements a common sysfs interface for
platform nvdimm resources. See Documentation/driver-api/nvdimm/.
+
+What: /sys/bus/event_source/devices/nmemX/format
+Date: September 2021
+KernelVersion: 5.16
+Contact: Kajol Jain <kjain@linux.ibm.com>
+Description: (RO) Attribute group to describe the magic bits
+ that go into perf_event_attr.config for a particular pmu.
+ (See ABI/testing/sysfs-bus-event_source-devices-format).
+
+ Each attribute under this group defines a bit range of the
+ perf_event_attr.config. Supported attribute is listed
+ below::
+ event = "config:0-4" - event ID
+
+ For example::
+ ctl_res_cnt = "event=0x1"
+
+What: /sys/bus/event_source/devices/nmemX/events
+Date: September 2021
+KernelVersion: 5.16
+Contact: Kajol Jain <kjain@linux.ibm.com>
+Description: (RO) Attribute group to describe performance monitoring events
+ for the nvdimm memory device. Each attribute in this group
+ describes a single performance monitoring event supported by
+ this nvdimm pmu. The name of the file is the name of the event.
+ (See ABI/testing/sysfs-bus-event_source-devices-events). A
+ listing of the events supported by a given nvdimm provider type
+ can be found in Documentation/driver-api/nvdimm/$provider.
+
+What: /sys/bus/event_source/devices/nmemX/cpumask
+Date: September 2021
+KernelVersion: 5.16
+Contact: Kajol Jain <kjain@linux.ibm.com>
+Description: (RO) This sysfs file exposes the cpumask which is designated to
+ retrieve nvdimm pmu event counter data.
--
2.26.2
* [PATCH v5 3/4] powerpc/papr_scm: Add perf interface support
From: Kajol Jain @ 2021-09-28 12:48 UTC
To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
ira.weiny, vishal.l.verma
Cc: santosh, maddy, rnsastry, aneesh.kumar, atrajeev, kjain, vaibhav,
tglx
Performance monitoring support for papr-scm nvdimm devices
via perf interface is added which includes addition of pmu
functions like add/del/read/event_init for nvdimm_pmu structure.
A new parameter 'priv' is added to the pdev_archdata structure to save
nvdimm_pmu device pointer, to handle the unregistering of pmu device.
papr_scm_pmu_register function populates the nvdimm_pmu structure
with name, capabilities, cpumask along with event handling
functions. Finally the populated nvdimm_pmu structure is passed to
register the pmu device. Event handling functions internally use
hcall to get events and counter data.
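The counter bookkeeping done by the add and read callbacks follows the usual "remember the last raw value, accumulate the difference" pattern. Below is a userspace sketch of that pattern using C11 atomics in place of the kernel's local64_* helpers; the model_* names are hypothetical, and the raw values would come from the hcall in the real driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Userspace stand-in for the two counters kept per perf event. */
struct model_event {
	_Atomic uint64_t prev_count;	/* last raw value read from the device */
	_Atomic uint64_t count;		/* accumulated delta reported to perf */
};

void model_event_init(struct model_event *ev)
{
	assert(ev);
	atomic_init(&ev->prev_count, 0);
	atomic_init(&ev->count, 0);
}

/* add(): remember the current raw value as the baseline. */
void model_event_start(struct model_event *ev, uint64_t raw)
{
	atomic_store(&ev->prev_count, raw);
}

/* read(): swap in the new raw value and add the difference to the total. */
void model_event_update(struct model_event *ev, uint64_t raw)
{
	uint64_t prev = atomic_exchange(&ev->prev_count, raw);

	/* Unsigned subtraction stays correct if the raw counter wraps once. */
	atomic_fetch_add(&ev->count, raw - prev);
}
```

Since del simply calls read, the final delta is folded into the total before the event is removed.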
Result on a power9 machine with 2 nvdimm devices:
Ex: List all events with the perf list command:
# perf list nmem
nmem0/cache_rh_cnt/ [Kernel PMU event]
nmem0/cache_wh_cnt/ [Kernel PMU event]
nmem0/cri_res_util/ [Kernel PMU event]
nmem0/ctl_res_cnt/ [Kernel PMU event]
nmem0/ctl_res_tm/ [Kernel PMU event]
nmem0/fast_w_cnt/ [Kernel PMU event]
nmem0/host_l_cnt/ [Kernel PMU event]
nmem0/host_l_dur/ [Kernel PMU event]
nmem0/host_s_cnt/ [Kernel PMU event]
nmem0/host_s_dur/ [Kernel PMU event]
nmem0/med_r_cnt/ [Kernel PMU event]
nmem0/med_r_dur/ [Kernel PMU event]
nmem0/med_w_cnt/ [Kernel PMU event]
nmem0/med_w_dur/ [Kernel PMU event]
nmem0/mem_life/ [Kernel PMU event]
nmem0/poweron_secs/ [Kernel PMU event]
...
nmem1/mem_life/ [Kernel PMU event]
nmem1/poweron_secs/ [Kernel PMU event]
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
arch/powerpc/include/asm/device.h | 5 +
arch/powerpc/platforms/pseries/papr_scm.c | 225 ++++++++++++++++++++++
2 files changed, 230 insertions(+)
diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 219559d65864..47ed639f3b8f 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -48,6 +48,11 @@ struct dev_archdata {
struct pdev_archdata {
u64 dma_mask;
+ /*
+ * Pointer to nvdimm_pmu structure, to handle the unregistering
+ * of pmu device
+ */
+ void *priv;
};
#endif /* _ASM_POWERPC_DEVICE_H */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f48e87ac89c9..bdf2620db461 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -19,6 +19,7 @@
#include <asm/papr_pdsm.h>
#include <asm/mce.h>
#include <asm/unaligned.h>
+#include <linux/perf_event.h>
#define BIND_ANY_ADDR (~0ul)
@@ -68,6 +69,8 @@
#define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
#define PAPR_SCM_PERF_STATS_VERSION 0x1
+#define to_nvdimm_pmu(_pmu) container_of(_pmu, struct nvdimm_pmu, pmu)
+
/* Struct holding a single performance metric */
struct papr_scm_perf_stat {
u8 stat_id[8];
@@ -120,6 +123,9 @@ struct papr_scm_priv {
/* length of the stat buffer as expected by phyp */
size_t stat_buffer_len;
+
+ /* array to have event_code and stat_id mappings */
+ char **nvdimm_events_map;
};
static int papr_scm_pmem_flush(struct nd_region *nd_region,
@@ -340,6 +346,218 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
return 0;
}
+static int papr_scm_pmu_get_value(struct perf_event *event, struct device *dev, u64 *count)
+{
+ struct papr_scm_perf_stat *stat;
+ struct papr_scm_perf_stats *stats;
+ struct papr_scm_priv *p = (struct papr_scm_priv *)dev->driver_data;
+ int rc, size;
+
+ /* Allocate request buffer enough to hold single performance stat */
+ size = sizeof(struct papr_scm_perf_stats) +
+ sizeof(struct papr_scm_perf_stat);
+
+ if (!p || !p->nvdimm_events_map)
+ return -EINVAL;
+
+ stats = kzalloc(size, GFP_KERNEL);
+ if (!stats)
+ return -ENOMEM;
+
+ stat = &stats->scm_statistic[0];
+ memcpy(&stat->stat_id,
+ p->nvdimm_events_map[event->attr.config],
+ sizeof(stat->stat_id));
+ stat->stat_val = 0;
+
+ rc = drc_pmem_query_stats(p, stats, 1);
+ if (rc < 0) {
+ kfree(stats);
+ return rc;
+ }
+
+ *count = be64_to_cpu(stat->stat_val);
+ kfree(stats);
+ return 0;
+}
+
+static int papr_scm_pmu_event_init(struct perf_event *event)
+{
+ struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+ struct papr_scm_priv *p;
+
+ if (!nd_pmu)
+ return -EINVAL;
+
+ /* test the event attr type for PMU enumeration */
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* it does not support event sampling mode */
+ if (is_sampling_event(event))
+ return -EOPNOTSUPP;
+
+ /* no branch sampling */
+ if (has_branch_stack(event))
+ return -EOPNOTSUPP;
+
+ p = (struct papr_scm_priv *)nd_pmu->dev->driver_data;
+ if (!p)
+ return -EINVAL;
+
+ /* Invalid eventcode */
+ if (event->attr.config == 0 || event->attr.config > 16)
+ return -EINVAL;
+
+ return 0;
+}
+
+static int papr_scm_pmu_add(struct perf_event *event, int flags)
+{
+ u64 count;
+ int rc;
+ struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+ if (!nd_pmu)
+ return -EINVAL;
+
+ if (flags & PERF_EF_START) {
+ rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &count);
+ if (rc)
+ return rc;
+
+ local64_set(&event->hw.prev_count, count);
+ }
+
+ return 0;
+}
+
+static void papr_scm_pmu_read(struct perf_event *event)
+{
+ u64 prev, now;
+ int rc;
+ struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+ if (!nd_pmu)
+ return;
+
+ rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &now);
+ if (rc)
+ return;
+
+ prev = local64_xchg(&event->hw.prev_count, now);
+ local64_add(now - prev, &event->count);
+}
+
+static void papr_scm_pmu_del(struct perf_event *event, int flags)
+{
+ papr_scm_pmu_read(event);
+}
+
+static int papr_scm_pmu_check_events(struct papr_scm_priv *p, struct nvdimm_pmu *nd_pmu)
+{
+ struct papr_scm_perf_stat *stat;
+ struct papr_scm_perf_stats *stats;
+ char *statid;
+ int index, rc, count;
+ u32 available_events;
+
+ if (!p->stat_buffer_len)
+ return -ENOENT;
+
+ available_events = (p->stat_buffer_len - sizeof(struct papr_scm_perf_stats))
+ / sizeof(struct papr_scm_perf_stat);
+
+ /* Allocate the buffer for phyp where stats are written */
+ stats = kzalloc(p->stat_buffer_len, GFP_KERNEL);
+ if (!stats) {
+ rc = -ENOMEM;
+ return rc;
+ }
+
+ /* Allocate memory for nvdimm_events_map, plus one slot for the NULL terminator */
+ p->nvdimm_events_map = kcalloc(available_events + 1, sizeof(char *), GFP_KERNEL);
+ if (!p->nvdimm_events_map) {
+ rc = -ENOMEM;
+ goto out_stats;
+ }
+
+ /* Called to get list of events supported */
+ rc = drc_pmem_query_stats(p, stats, 0);
+ if (rc)
+ goto out_nvdimm_events_map;
+
+ for (index = 0, stat = stats->scm_statistic, count = 0;
+ index < available_events; index++, ++stat) {
+ statid = kzalloc(strlen(stat->stat_id) + 1, GFP_KERNEL);
+ if (!statid) {
+ rc = -ENOMEM;
+ goto out_nvdimm_events_map;
+ }
+
+ strcpy(statid, stat->stat_id);
+ p->nvdimm_events_map[count] = statid;
+ count++;
+ }
+ p->nvdimm_events_map[count] = NULL;
+ kfree(stats);
+ return 0;
+
+out_nvdimm_events_map:
+ kfree(p->nvdimm_events_map);
+out_stats:
+ kfree(stats);
+ return rc;
+}
+
+static void papr_scm_pmu_register(struct papr_scm_priv *p)
+{
+ struct nvdimm_pmu *nd_pmu;
+ int rc, nodeid;
+
+ nd_pmu = kzalloc(sizeof(*nd_pmu), GFP_KERNEL);
+ if (!nd_pmu) {
+ rc = -ENOMEM;
+ goto pmu_err_print;
+ }
+
+ rc = papr_scm_pmu_check_events(p, nd_pmu);
+ if (rc)
+ goto pmu_check_events_err;
+
+ nd_pmu->pmu.task_ctx_nr = perf_invalid_context;
+ nd_pmu->pmu.name = nvdimm_name(p->nvdimm);
+ nd_pmu->pmu.event_init = papr_scm_pmu_event_init;
+ nd_pmu->pmu.read = papr_scm_pmu_read;
+ nd_pmu->pmu.add = papr_scm_pmu_add;
+ nd_pmu->pmu.del = papr_scm_pmu_del;
+
+ nd_pmu->pmu.capabilities = PERF_PMU_CAP_NO_INTERRUPT |
+ PERF_PMU_CAP_NO_EXCLUDE;
+
+ /* Update the cpumask variable */
+ nodeid = dev_to_node(&p->pdev->dev);
+ nd_pmu->arch_cpumask = *cpumask_of_node(nodeid);
+
+ rc = register_nvdimm_pmu(nd_pmu, p->pdev);
+ if (rc)
+ goto pmu_register_err;
+
+ /*
+ * Set archdata.priv value to nvdimm_pmu structure, to handle the
+ * unregistering of pmu device.
+ */
+ p->pdev->archdata.priv = nd_pmu;
+ return;
+
+pmu_register_err:
+ kfree(p->nvdimm_events_map);
+pmu_check_events_err:
+ kfree(nd_pmu);
+pmu_err_print:
+ dev_info(&p->pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc);
+}
+
/*
* Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
* health information.
@@ -1236,6 +1454,7 @@ static int papr_scm_probe(struct platform_device *pdev)
goto err2;
platform_set_drvdata(pdev, p);
+ papr_scm_pmu_register(p);
return 0;
@@ -1254,6 +1473,12 @@ static int papr_scm_remove(struct platform_device *pdev)
nvdimm_bus_unregister(p->bus);
drc_pmem_unbind(p);
+
+ if (pdev->archdata.priv)
+ unregister_nvdimm_pmu(pdev->archdata.priv);
+
+ pdev->archdata.priv = NULL;
+ kfree(p->nvdimm_events_map);
kfree(p->bus_desc.provider_name);
kfree(p);
--
2.26.2
^ permalink raw reply related
* [PATCH v5 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats
From: Kajol Jain @ 2021-09-28 12:47 UTC (permalink / raw)
To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
ira.weiny, vishal.l.verma
Cc: santosh, maddy, kernel test robot, rnsastry, aneesh.kumar,
atrajeev, kjain, vaibhav, tglx
Add a common interface for performance stats reporting on nvdimm
devices. The interface defines the supported event list, the config
fields for the event attributes, and their corresponding bit values,
all of which are exported via sysfs.
It also adds pmu register/unregister functions, cpu hotplug support,
and macros for handling event additions via sysfs. Attribute groups
for format, cpumask and events are added to the pmu structure.
Users can access the perf events exposed via the nvdimm pmu with the
standard perf tool.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
[Make hotplug function static as reported by kernel test robot]
Reported-by: kernel test robot <lkp@intel.com>
---
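A minimal user-space sketch of the event encoding described above, assuming
the 5-bit event field from the "config:0-4" format string and the 0x1 to 0x10
code range listed in nd_perf.c. The helper names (nd_event_code,
nd_event_code_valid, nd_event_format) and the two macros are hypothetical and
not part of this patch; only the "event=0x%02llx" string mirrors
nvdimm_events_sysfs_show().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names; the mask mirrors the "config:0-4" format string. */
#define ND_EVENT_MASK 0x1fULL	/* bits 0-4 of the config value */
#define ND_EVENT_MAX  0x10ULL	/* FAST_W_CNT, the highest code in the patch */

/* Extract the event code from a raw config value. */
static uint64_t nd_event_code(uint64_t config)
{
	return config & ND_EVENT_MASK;
}

/* Return 1 if the code names one of the 16 events defined in the patch. */
static int nd_event_code_valid(uint64_t code)
{
	return code >= 0x1 && code <= ND_EVENT_MAX;
}

/* Format an event code the same way the sysfs show callback does. */
static int nd_event_format(char *buf, size_t len, uint64_t code)
{
	return snprintf(buf, len, "event=0x%02llx\n", (unsigned long long)code);
}
```

Under these assumptions, a code of 0x10 is formatted as "event=0x10", and a
code of 0 or anything above 0x10 is treated as invalid.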
drivers/nvdimm/Makefile | 1 +
drivers/nvdimm/nd_perf.c | 328 +++++++++++++++++++++++++++++++++++++++
include/linux/nd.h | 21 +++
3 files changed, 350 insertions(+)
create mode 100644 drivers/nvdimm/nd_perf.c
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 29203f3d3069..25dba6095612 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -18,6 +18,7 @@ nd_e820-y := e820.o
libnvdimm-y := core.o
libnvdimm-y += bus.o
libnvdimm-y += dimm_devs.o
+libnvdimm-y += nd_perf.o
libnvdimm-y += dimm.o
libnvdimm-y += region_devs.o
libnvdimm-y += region.o
diff --git a/drivers/nvdimm/nd_perf.c b/drivers/nvdimm/nd_perf.c
new file mode 100644
index 000000000000..314415894acf
--- /dev/null
+++ b/drivers/nvdimm/nd_perf.c
@@ -0,0 +1,328 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * nd_perf.c: NVDIMM Device Performance Monitoring Unit support
+ *
+ * Perf interface to expose nvdimm performance stats.
+ *
+ * Copyright (C) 2021 IBM Corporation
+ */
+
+#define pr_fmt(fmt) "nvdimm_pmu: " fmt
+
+#include <linux/nd.h>
+
+#define EVENT(_name, _code) enum{_name = _code}
+
+/*
+ * NVDIMM Events codes.
+ */
+
+/* Controller Reset Count */
+EVENT(CTL_RES_CNT, 0x1);
+/* Controller Reset Elapsed Time */
+EVENT(CTL_RES_TM, 0x2);
+/* Power-on Seconds */
+EVENT(POWERON_SECS, 0x3);
+/* Life Remaining */
+EVENT(MEM_LIFE, 0x4);
+/* Critical Resource Utilization */
+EVENT(CRI_RES_UTIL, 0x5);
+/* Host Load Count */
+EVENT(HOST_L_CNT, 0x6);
+/* Host Store Count */
+EVENT(HOST_S_CNT, 0x7);
+/* Host Store Duration */
+EVENT(HOST_S_DUR, 0x8);
+/* Host Load Duration */
+EVENT(HOST_L_DUR, 0x9);
+/* Media Read Count */
+EVENT(MED_R_CNT, 0xa);
+/* Media Write Count */
+EVENT(MED_W_CNT, 0xb);
+/* Media Read Duration */
+EVENT(MED_R_DUR, 0xc);
+/* Media Write Duration */
+EVENT(MED_W_DUR, 0xd);
+/* Cache Read Hit Count */
+EVENT(CACHE_RH_CNT, 0xe);
+/* Cache Write Hit Count */
+EVENT(CACHE_WH_CNT, 0xf);
+/* Fast Write Count */
+EVENT(FAST_W_CNT, 0x10);
+
+NVDIMM_EVENT_ATTR(ctl_res_cnt, CTL_RES_CNT);
+NVDIMM_EVENT_ATTR(ctl_res_tm, CTL_RES_TM);
+NVDIMM_EVENT_ATTR(poweron_secs, POWERON_SECS);
+NVDIMM_EVENT_ATTR(mem_life, MEM_LIFE);
+NVDIMM_EVENT_ATTR(cri_res_util, CRI_RES_UTIL);
+NVDIMM_EVENT_ATTR(host_l_cnt, HOST_L_CNT);
+NVDIMM_EVENT_ATTR(host_s_cnt, HOST_S_CNT);
+NVDIMM_EVENT_ATTR(host_s_dur, HOST_S_DUR);
+NVDIMM_EVENT_ATTR(host_l_dur, HOST_L_DUR);
+NVDIMM_EVENT_ATTR(med_r_cnt, MED_R_CNT);
+NVDIMM_EVENT_ATTR(med_w_cnt, MED_W_CNT);
+NVDIMM_EVENT_ATTR(med_r_dur, MED_R_DUR);
+NVDIMM_EVENT_ATTR(med_w_dur, MED_W_DUR);
+NVDIMM_EVENT_ATTR(cache_rh_cnt, CACHE_RH_CNT);
+NVDIMM_EVENT_ATTR(cache_wh_cnt, CACHE_WH_CNT);
+NVDIMM_EVENT_ATTR(fast_w_cnt, FAST_W_CNT);
+
+static struct attribute *nvdimm_events_attr[] = {
+ NVDIMM_EVENT_PTR(CTL_RES_CNT),
+ NVDIMM_EVENT_PTR(CTL_RES_TM),
+ NVDIMM_EVENT_PTR(POWERON_SECS),
+ NVDIMM_EVENT_PTR(MEM_LIFE),
+ NVDIMM_EVENT_PTR(CRI_RES_UTIL),
+ NVDIMM_EVENT_PTR(HOST_L_CNT),
+ NVDIMM_EVENT_PTR(HOST_S_CNT),
+ NVDIMM_EVENT_PTR(HOST_S_DUR),
+ NVDIMM_EVENT_PTR(HOST_L_DUR),
+ NVDIMM_EVENT_PTR(MED_R_CNT),
+ NVDIMM_EVENT_PTR(MED_W_CNT),
+ NVDIMM_EVENT_PTR(MED_R_DUR),
+ NVDIMM_EVENT_PTR(MED_W_DUR),
+ NVDIMM_EVENT_PTR(CACHE_RH_CNT),
+ NVDIMM_EVENT_PTR(CACHE_WH_CNT),
+ NVDIMM_EVENT_PTR(FAST_W_CNT),
+ NULL
+};
+
+static struct attribute_group nvdimm_pmu_events_group = {
+ .name = "events",
+ .attrs = nvdimm_events_attr,
+};
+
+PMU_FORMAT_ATTR(event, "config:0-4");
+
+static struct attribute *nvdimm_pmu_format_attr[] = {
+ &format_attr_event.attr,
+ NULL,
+};
+
+static struct attribute_group nvdimm_pmu_format_group = {
+ .name = "format",
+ .attrs = nvdimm_pmu_format_attr,
+};
+
+ssize_t nvdimm_events_sysfs_show(struct device *dev,
+ struct device_attribute *attr, char *page)
+{
+ struct perf_pmu_events_attr *pmu_attr;
+
+ pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);
+
+ return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
+}
+
+static ssize_t nvdimm_pmu_cpumask_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct pmu *pmu = dev_get_drvdata(dev);
+ struct nvdimm_pmu *nd_pmu;
+
+ nd_pmu = container_of(pmu, struct nvdimm_pmu, pmu);
+
+ return cpumap_print_to_pagebuf(true, buf, cpumask_of(nd_pmu->cpu));
+}
+
+static int nvdimm_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
+{
+ struct nvdimm_pmu *nd_pmu;
+ u32 target;
+ int nodeid;
+ const struct cpumask *cpumask;
+
+ nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+ /* Clear it, in case the given cpu is set in nd_pmu->arch_cpumask */
+ cpumask_test_and_clear_cpu(cpu, &nd_pmu->arch_cpumask);
+
+ /*
+ * If given cpu is not same as current designated cpu for
+ * counter access, just return.
+ */
+ if (cpu != nd_pmu->cpu)
+ return 0;
+
+ /* Check for any active cpu in nd_pmu->arch_cpumask */
+ target = cpumask_any(&nd_pmu->arch_cpumask);
+
+ /*
+ * In case we don't have any active cpu in nd_pmu->arch_cpumask,
+ * check in given cpu's numa node list.
+ */
+ if (target >= nr_cpu_ids) {
+ nodeid = cpu_to_node(cpu);
+ cpumask = cpumask_of_node(nodeid);
+ target = cpumask_any_but(cpumask, cpu);
+ }
+ nd_pmu->cpu = target;
+
+ /* Migrate nvdimm pmu events to the new target cpu if valid */
+ if (target >= 0 && target < nr_cpu_ids)
+ perf_pmu_migrate_context(&nd_pmu->pmu, cpu, target);
+
+ return 0;
+}
+
+static int nvdimm_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+ struct nvdimm_pmu *nd_pmu;
+
+ nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+ if (nd_pmu->cpu >= nr_cpu_ids)
+ nd_pmu->cpu = cpu;
+
+ return 0;
+}
+
+static int create_cpumask_attr_group(struct nvdimm_pmu *nd_pmu)
+{
+ struct perf_pmu_events_attr *pmu_events_attr;
+ struct attribute **attrs_group;
+ struct attribute_group *nvdimm_pmu_cpumask_group;
+
+ pmu_events_attr = kzalloc(sizeof(*pmu_events_attr), GFP_KERNEL);
+ if (!pmu_events_attr)
+ return -ENOMEM;
+
+ attrs_group = kzalloc(2 * sizeof(struct attribute *), GFP_KERNEL);
+ if (!attrs_group) {
+ kfree(pmu_events_attr);
+ return -ENOMEM;
+ }
+
+ /* Allocate memory for cpumask attribute group */
+ nvdimm_pmu_cpumask_group = kzalloc(sizeof(*nvdimm_pmu_cpumask_group), GFP_KERNEL);
+ if (!nvdimm_pmu_cpumask_group) {
+ kfree(pmu_events_attr);
+ kfree(attrs_group);
+ return -ENOMEM;
+ }
+
+ sysfs_attr_init(&pmu_events_attr->attr.attr);
+ pmu_events_attr->attr.attr.name = "cpumask";
+ pmu_events_attr->attr.attr.mode = 0444;
+ pmu_events_attr->attr.show = nvdimm_pmu_cpumask_show;
+ attrs_group[0] = &pmu_events_attr->attr.attr;
+ attrs_group[1] = NULL;
+
+ nvdimm_pmu_cpumask_group->attrs = attrs_group;
+ nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR] = nvdimm_pmu_cpumask_group;
+ return 0;
+}
+
+static int nvdimm_pmu_cpu_hotplug_init(struct nvdimm_pmu *nd_pmu)
+{
+ int nodeid, rc;
+ const struct cpumask *cpumask;
+
+ /*
+ * In case of the cpu hotplug feature, arch specific code
+ * can provide the required cpumask, which can be used
+ * to get the designated cpu for counter access.
+ * Check for any active cpu in nd_pmu->arch_cpumask.
+ */
+ if (!cpumask_empty(&nd_pmu->arch_cpumask)) {
+ nd_pmu->cpu = cpumask_any(&nd_pmu->arch_cpumask);
+ } else {
+ /* pick active cpu from the cpumask of device numa node. */
+ nodeid = dev_to_node(nd_pmu->dev);
+ cpumask = cpumask_of_node(nodeid);
+ nd_pmu->cpu = cpumask_any(cpumask);
+ }
+
+ rc = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "perf/nvdimm:online",
+ nvdimm_pmu_cpu_online, nvdimm_pmu_cpu_offline);
+
+ if (rc < 0)
+ return rc;
+
+ nd_pmu->cpuhp_state = rc;
+
+ /* Register the pmu instance for cpu hotplug */
+ rc = cpuhp_state_add_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+ if (rc) {
+ cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+ return rc;
+ }
+
+ /* Create cpumask attribute group */
+ rc = create_cpumask_attr_group(nd_pmu);
+ if (rc) {
+ cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+ cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+ return rc;
+ }
+
+ return 0;
+}
+
+static void nvdimm_pmu_free_hotplug_memory(struct nvdimm_pmu *nd_pmu)
+{
+ cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+ cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+
+ if (nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR])
+ kfree(nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR]->attrs);
+ kfree(nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR]);
+}
+
+int register_nvdimm_pmu(struct nvdimm_pmu *nd_pmu, struct platform_device *pdev)
+{
+ int rc;
+
+ if (!nd_pmu || !pdev)
+ return -EINVAL;
+
+ /* event functions like add/del/read/event_init and pmu name should not be NULL */
+ if (WARN_ON_ONCE(!(nd_pmu->pmu.event_init && nd_pmu->pmu.add &&
+ nd_pmu->pmu.del && nd_pmu->pmu.read && nd_pmu->pmu.name)))
+ return -EINVAL;
+
+ nd_pmu->pmu.attr_groups = kzalloc((NVDIMM_PMU_NULL_ATTR + 1) *
+ sizeof(struct attribute_group *), GFP_KERNEL);
+ if (!nd_pmu->pmu.attr_groups)
+ return -ENOMEM;
+
+ /*
+ * Add platform_device->dev pointer to nvdimm_pmu to access
+ * device data in events functions.
+ */
+ nd_pmu->dev = &pdev->dev;
+
+ /* Fill attribute groups for the nvdimm pmu device */
+ nd_pmu->pmu.attr_groups[NVDIMM_PMU_FORMAT_ATTR] = &nvdimm_pmu_format_group;
+ nd_pmu->pmu.attr_groups[NVDIMM_PMU_EVENT_ATTR] = &nvdimm_pmu_events_group;
+ nd_pmu->pmu.attr_groups[NVDIMM_PMU_NULL_ATTR] = NULL;
+
+ /* Fill attribute group for cpumask */
+ rc = nvdimm_pmu_cpu_hotplug_init(nd_pmu);
+ if (rc) {
+ pr_info("cpu hotplug feature failed for device: %s\n", nd_pmu->pmu.name);
+ kfree(nd_pmu->pmu.attr_groups);
+ return rc;
+ }
+
+ rc = perf_pmu_register(&nd_pmu->pmu, nd_pmu->pmu.name, -1);
+ if (rc) {
+ nvdimm_pmu_free_hotplug_memory(nd_pmu);
+ kfree(nd_pmu->pmu.attr_groups);
+ return rc;
+ }
+
+ pr_info("%s NVDIMM performance monitor support registered\n",
+ nd_pmu->pmu.name);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(register_nvdimm_pmu);
+
+void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu)
+{
+ perf_pmu_unregister(&nd_pmu->pmu);
+ nvdimm_pmu_free_hotplug_memory(nd_pmu);
+ kfree(nd_pmu);
+}
+EXPORT_SYMBOL_GPL(unregister_nvdimm_pmu);
diff --git a/include/linux/nd.h b/include/linux/nd.h
index f5ed4db2d859..fa4370607bdb 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -9,6 +9,7 @@
#include <linux/device.h>
#include <linux/badblocks.h>
#include <linux/perf_event.h>
+#include <linux/platform_device.h>
enum nvdimm_event {
NVDIMM_REVALIDATE_POISON,
@@ -24,6 +25,19 @@ enum nvdimm_claim_class {
NVDIMM_CCLASS_UNKNOWN,
};
+#define NVDIMM_EVENT_VAR(_id) event_attr_##_id
+#define NVDIMM_EVENT_PTR(_id) (&event_attr_##_id.attr.attr)
+
+#define NVDIMM_EVENT_ATTR(_name, _id) \
+ PMU_EVENT_ATTR(_name, NVDIMM_EVENT_VAR(_id), _id, \
+ nvdimm_events_sysfs_show)
+
+/* Event attribute array index */
+#define NVDIMM_PMU_FORMAT_ATTR 0
+#define NVDIMM_PMU_EVENT_ATTR 1
+#define NVDIMM_PMU_CPUMASK_ATTR 2
+#define NVDIMM_PMU_NULL_ATTR 3
+
/**
* struct nvdimm_pmu - data structure for nvdimm perf driver
* @pmu: pmu data structure for nvdimm performance stats.
@@ -43,6 +57,13 @@ struct nvdimm_pmu {
struct cpumask arch_cpumask;
};
+extern ssize_t nvdimm_events_sysfs_show(struct device *dev,
+ struct device_attribute *attr,
+ char *page);
+
+int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct platform_device *pdev);
+void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu);
+
struct nd_device_driver {
struct device_driver drv;
unsigned long type;
--
2.26.2
^ permalink raw reply related
* [PATCH v5 1/4] drivers/nvdimm: Add nvdimm pmu structure
From: Kajol Jain @ 2021-09-28 12:47 UTC (permalink / raw)
To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
ira.weiny, vishal.l.verma
Cc: santosh, maddy, rnsastry, aneesh.kumar, atrajeev, kjain, vaibhav,
tglx
Add a structure called nvdimm_pmu for performance stats reporting
support of nvdimm devices. It holds device pmu data such as the pmu
data structure for performance stats, the nvdimm device pointer, and
cpumask attributes.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
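A rough user-space sketch of why struct pmu is embedded in nvdimm_pmu: a
callback that only receives a struct pmu pointer can recover the enclosing
structure. The types below are simplified stand-ins, nd_pmu_from_pmu is a
hypothetical name, and the container_of() definition is a plain-C
approximation of the kernel macro.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct pmu; only the layout matters. */
struct pmu {
	const char *name;
};

/* Reduced version of the structure added by this patch. */
struct nvdimm_pmu {
	struct pmu pmu;
	int cpu;
};

/* Plain-C approximation of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing nvdimm_pmu from a pointer to its embedded pmu. */
static struct nvdimm_pmu *nd_pmu_from_pmu(struct pmu *pmu)
{
	return container_of(pmu, struct nvdimm_pmu, pmu);
}
```

Given a struct nvdimm_pmu nd, nd_pmu_from_pmu(&nd.pmu) returns &nd, so fields
such as the designated cpu stay reachable from the perf callbacks.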
include/linux/nd.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/include/linux/nd.h b/include/linux/nd.h
index ee9ad76afbba..f5ed4db2d859 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -8,6 +8,7 @@
#include <linux/ndctl.h>
#include <linux/device.h>
#include <linux/badblocks.h>
+#include <linux/perf_event.h>
enum nvdimm_event {
NVDIMM_REVALIDATE_POISON,
@@ -23,6 +24,25 @@ enum nvdimm_claim_class {
NVDIMM_CCLASS_UNKNOWN,
};
+/**
+ * struct nvdimm_pmu - data structure for nvdimm perf driver
+ * @pmu: pmu data structure for nvdimm performance stats.
+ * @dev: nvdimm device pointer.
+ * @cpu: designated cpu for counter access.
+ * @node: node for cpu hotplug notifier link.
+ * @cpuhp_state: state for cpu hotplug notification.
+ * @arch_cpumask: cpumask to get designated cpu for counter access.
+ */
+struct nvdimm_pmu {
+ struct pmu pmu;
+ struct device *dev;
+ int cpu;
+ struct hlist_node node;
+ enum cpuhp_state cpuhp_state;
+ /* cpumask provided by arch/platform specific code */
+ struct cpumask arch_cpumask;
+};
+
struct nd_device_driver {
struct device_driver drv;
unsigned long type;
--
2.26.2
^ permalink raw reply related
* [PATCH] powerpc: fix unbalanced node refcount in check_kvm_guest()
From: Nathan Lynch @ 2021-09-28 12:45 UTC (permalink / raw)
To: linuxppc-dev; +Cc: srikar, npiggin
When check_kvm_guest() succeeds in looking up a /hypervisor OF node, it
returns without performing a matching put for the lookup, leaving the
node's reference count elevated.
Add the necessary call to of_node_put(), rearranging the code slightly to
avoid repetition or goto.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 107c55005fbd ("powerpc/pseries: Add KVM guest doorbell restrictions")
---
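A small user-space model of the get/put balance described above, for
illustration only. struct node, node_get(), node_put(), kvm_guest_flag and
check_guest_sketch() are hypothetical stand-ins for the OF node, the lookup,
of_node_put(), the static branch and check_kvm_guest(); they are not kernel
APIs.

```c
#include <string.h>

/* Hypothetical stand-in for a reference-counted device tree node. */
struct node {
	int refcount;
	const char *compatible;
};

/* Models a lookup that returns the node with its refcount raised. */
static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Models the matching put. */
static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

static int kvm_guest_flag;

/* Same shape as the fixed function: one put covers both outcomes. */
static int check_guest_sketch(struct node *hyper)
{
	struct node *n = node_get(hyper);

	if (!n)
		return 0;

	if (strcmp(n->compatible, "linux,kvm") == 0)
		kvm_guest_flag = 1;

	node_put(n);
	return 0;
}
```

Because the compatible check no longer returns early, the single put runs
whether or not the node matches, and the count ends where it started.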
arch/powerpc/kernel/firmware.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kernel/firmware.c b/arch/powerpc/kernel/firmware.c
index c7022c41cc31..20328f72f9f2 100644
--- a/arch/powerpc/kernel/firmware.c
+++ b/arch/powerpc/kernel/firmware.c
@@ -31,11 +31,10 @@ int __init check_kvm_guest(void)
if (!hyper_node)
return 0;
- if (!of_device_is_compatible(hyper_node, "linux,kvm"))
- return 0;
-
- static_branch_enable(&kvm_guest);
+ if (of_device_is_compatible(hyper_node, "linux,kvm"))
+ static_branch_enable(&kvm_guest);
+ of_node_put(hyper_node);
return 0;
}
core_initcall(check_kvm_guest); // before kvm_guest_init()
--
2.31.1
^ permalink raw reply related
* Re: [PATCH v8 1/2] powerpc/pseries: Interface to represent PAPR firmware attributes
From: Pratik Sampat @ 2021-09-28 12:43 UTC (permalink / raw)
To: Greg KH
Cc: farosas, pratik.r.sampat, linuxppc-dev, kvm-ppc, linux-kernel,
paulus, linux-kselftest, kjain, shuah
In-Reply-To: <YVMFvyGwfH+rxYPz@kroah.com>
Hello Greg,
Thank you for your review.
On 28/09/21 5:38 pm, Greg KH wrote:
> On Tue, Sep 28, 2021 at 05:21:01PM +0530, Pratik R. Sampat wrote:
>> Adds a generic interface to represent the energy and frequency related
>> PAPR attributes on the system using the new H_CALL
>> "H_GET_ENERGY_SCALE_INFO".
>>
>> H_GET_EM_PARMS H_CALL was previously responsible for exporting this
>> information in the lparcfg, however the H_GET_EM_PARMS H_CALL
>> will be deprecated P10 onwards.
>>
>> The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
>> hcall(
>> uint64 H_GET_ENERGY_SCALE_INFO, // Get energy scale info
>> uint64 flags, // Per the flag request
>> uint64 firstAttributeId,// The attribute id
>> uint64 bufferAddress, // Guest physical address of the output buffer
>> uint64 bufferSize // The size in bytes of the output buffer
>> );
>>
>> This H_CALL can query either all the attributes at once with
>> firstAttributeId = 0, flags = 0 as well as query only one attribute
>> at a time with firstAttributeId = id, flags = 1.
>>
>> The output buffer consists of the following
>> 1. number of attributes - 8 bytes
>> 2. array offset to the data location - 8 bytes
>> 3. version info - 1 byte
>> 4. A data array of size num attributes, which contains the following:
>> a. attribute ID - 8 bytes
>> b. attribute value in number - 8 bytes
>> c. attribute name in string - 64 bytes
>> d. attribute value in string - 64 bytes
>>
>> The new H_CALL exports information in direct string value format, hence
>> a new interface has been introduced in
>> /sys/firmware/papr/energy_scale_info to export this information to
>> userspace in an extensible pass-through format.
>>
>> The H_CALL returns the name, numeric value and string value (if exists)
>>
>> The format of exposing the sysfs information is as follows:
>> /sys/firmware/papr/energy_scale_info/
>> |-- <id>/
>> |-- desc
>> |-- value
>> |-- value_desc (if exists)
>> |-- <id>/
>> |-- desc
>> |-- value
>> |-- value_desc (if exists)
>> ...
>>
>> The energy information that is exported is useful for userspace tools
>> such as powerpc-utils. Currently these tools infer the
>> "power_mode_data" value in the lparcfg, which in turn is obtained from
>> the to be deprecated H_GET_EM_PARMS H_CALL.
>> On future platforms, such userspace utilities will have to look at the
>> data returned from the new H_CALL being populated in this new sysfs
>> interface and report this information directly without the need of
>> interpretation.
>>
>> Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com>
>> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
>> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
>> Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
>> ---
>> .../sysfs-firmware-papr-energy-scale-info | 26 ++
>> arch/powerpc/include/asm/hvcall.h | 24 +-
>> arch/powerpc/kvm/trace_hv.h | 1 +
>> arch/powerpc/platforms/pseries/Makefile | 3 +-
>> .../pseries/papr_platform_attributes.c | 312 ++++++++++++++++++
>> 5 files changed, 364 insertions(+), 2 deletions(-)
>> create mode 100644 Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
>> create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c
>>
>> diff --git a/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
>> new file mode 100644
>> index 000000000000..139a576c7c9d
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
>> @@ -0,0 +1,26 @@
>> +What: /sys/firmware/papr/energy_scale_info
>> +Date: June 2021
>> +Contact: Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
>> +Description: Directory hosting a set of platform attributes like
>> + energy/frequency on Linux running as a PAPR guest.
>> +
>> + Each file in a directory contains a platform
>> + attribute hierarchy pertaining to performance/
>> + energy-savings mode and processor frequency.
>> +
>> +What: /sys/firmware/papr/energy_scale_info/<id>
>> + /sys/firmware/papr/energy_scale_info/<id>/desc
>> + /sys/firmware/papr/energy_scale_info/<id>/value
>> + /sys/firmware/papr/energy_scale_info/<id>/value_desc
>> +Date: June 2021
>> +Contact: Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
>> +Description: Energy, frequency attributes directory for POWERVM servers
>> +
>> + This directory provides energy, frequency, folding information. It
>> + contains below sysfs attributes:
>> +
>> + - desc: String description of the attribute <id>
>> +
>> + - value: Numeric value of attribute <id>
>> +
>> + - value_desc: String value of attribute <id>
> Can you just make 4 different entries in this file, making it easier to
> parse and extend over time?
Do you mean I only create one file per attribute and populate it with 4
different entries as follows?
# cat /sys/firmware/papr/energy_scale_info/<id>
id:
desc:
value:
value_desc:
>
>> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
>> index 9bcf345cb208..38980fef7a3d 100644
>> --- a/arch/powerpc/include/asm/hvcall.h
>> +++ b/arch/powerpc/include/asm/hvcall.h
>> @@ -323,7 +323,8 @@
>> #define H_SCM_PERFORMANCE_STATS 0x418
>> #define H_RPT_INVALIDATE 0x448
>> #define H_SCM_FLUSH 0x44C
>> -#define MAX_HCALL_OPCODE H_SCM_FLUSH
>> +#define H_GET_ENERGY_SCALE_INFO 0x450
>> +#define MAX_HCALL_OPCODE H_GET_ENERGY_SCALE_INFO
>>
>> /* Scope args for H_SCM_UNBIND_ALL */
>> #define H_UNBIND_SCOPE_ALL (0x1)
>> @@ -641,6 +642,27 @@ struct hv_gpci_request_buffer {
>> uint8_t bytes[HGPCI_MAX_DATA_BYTES];
>> } __packed;
>>
>> +#define ESI_VERSION 0x1
>> +#define MAX_ESI_ATTRS 10
>> +#define MAX_BUF_SZ (sizeof(struct h_energy_scale_info_hdr) + \
>> + (sizeof(struct energy_scale_attribute) * MAX_ESI_ATTRS))
>> +
>> +struct energy_scale_attribute {
>> + __be64 id;
>> + __be64 value;
>> + unsigned char desc[64];
>> + unsigned char value_desc[64];
>> +} __packed;
>> +
>> +struct h_energy_scale_info_hdr {
>> + __be64 num_attrs;
>> + __be64 array_offset;
>> + __u8 data_header_version;
>> +} __packed;
>> +
>> +/* /sys/firmware/papr */
>> +extern struct kobject *papr_kobj;
>> +
>> #endif /* __ASSEMBLY__ */
>> #endif /* __KERNEL__ */
>> #endif /* _ASM_POWERPC_HVCALL_H */
>> diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
>> index 830a126e095d..38cd0ed0a617 100644
>> --- a/arch/powerpc/kvm/trace_hv.h
>> +++ b/arch/powerpc/kvm/trace_hv.h
>> @@ -115,6 +115,7 @@
>> {H_VASI_STATE, "H_VASI_STATE"}, \
>> {H_ENABLE_CRQ, "H_ENABLE_CRQ"}, \
>> {H_GET_EM_PARMS, "H_GET_EM_PARMS"}, \
>> + {H_GET_ENERGY_SCALE_INFO, "H_GET_ENERGY_SCALE_INFO"}, \
>> {H_SET_MPP, "H_SET_MPP"}, \
>> {H_GET_MPP, "H_GET_MPP"}, \
>> {H_HOME_NODE_ASSOCIATIVITY, "H_HOME_NODE_ASSOCIATIVITY"}, \
>> diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
>> index 4cda0ef87be0..c4c19f6a5975 100644
>> --- a/arch/powerpc/platforms/pseries/Makefile
>> +++ b/arch/powerpc/platforms/pseries/Makefile
>> @@ -6,7 +6,8 @@ obj-y := lpar.o hvCall.o nvram.o reconfig.o \
>> of_helpers.o \
>> setup.o iommu.o event_sources.o ras.o \
>> firmware.o power.o dlpar.o mobility.o rng.o \
>> - pci.o pci_dlpar.o eeh_pseries.o msi.o
>> + pci.o pci_dlpar.o eeh_pseries.o msi.o \
>> + papr_platform_attributes.o
>> obj-$(CONFIG_SMP) += smp.o
>> obj-$(CONFIG_SCANLOG) += scanlog.o
>> obj-$(CONFIG_KEXEC_CORE) += kexec.o
>> diff --git a/arch/powerpc/platforms/pseries/papr_platform_attributes.c b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
>> new file mode 100644
>> index 000000000000..84ddce52e519
>> --- /dev/null
>> +++ b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
>> @@ -0,0 +1,312 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * Platform energy and frequency attributes driver
>> + *
>> + * This driver creates a sys file at /sys/firmware/papr/ which encapsulates a
>> + * directory structure containing files in keyword - value pairs that specify
>> + * energy and frequency configuration of the system.
>> + *
>> + * The format of exposing the sysfs information is as follows:
>> + * /sys/firmware/papr/energy_scale_info/
>> + * |-- <id>/
>> + * |-- desc
>> + * |-- value
>> + * |-- value_desc (if exists)
>> + * |-- <id>/
>> + * |-- desc
>> + * |-- value
>> + * |-- value_desc (if exists)
>> + *
>> + * Copyright 2021 IBM Corp.
>> + */
>> +
>> +#include <asm/hvcall.h>
>> +#include <asm/machdep.h>
>> +
>> +#include "pseries.h"
>> +
>> +/*
>> + * Flag attributes to fetch either all or one attribute from the HCALL
>> + * flag = BE(0) => fetch all attributes with firstAttributeId = 0
>> + * flag = BE(1) => fetch a single attribute with firstAttributeId = id
>> + */
>> +#define ESI_FLAGS_ALL 0
>> +#define ESI_FLAGS_SINGLE PPC_BIT(0)
>> +
>> +#define MAX_ATTRS 3
>> +
>> +struct papr_attr {
>> + u64 id;
>> + struct kobj_attribute kobj_attr;
> Why does an attribute have to be part of this structure?
I bundled an attribute together with its ID in one structure because each
attribute's value can only be queried from the firmware with the corresponding
ID.
They seemed logically connected, which is why I kept them in the same structure.
Are you suggesting we maintain them separately and don't need the coupling?
>> +};
>> +struct papr_group {
>> + struct attribute_group pg;
>> + struct papr_attr pgattrs[MAX_ATTRS];
>> +} *pgs;
>> +
>> +/* /sys/firmware/papr */
>> +struct kobject *papr_kobj;
>> +/* /sys/firmware/papr/energy_scale_info */
>> +struct kobject *esi_kobj;
>> +
>> +/*
>> + * Extract and export the description of the energy scale attributes
>> + */
>> +static ssize_t papr_show_desc(struct kobject *kobj,
>> + struct kobj_attribute *kobj_attr,
>> + char *buf)
>> +{
>> + struct papr_attr *pattr = container_of(kobj_attr, struct papr_attr,
>> + kobj_attr);
>> + struct h_energy_scale_info_hdr *t_hdr;
>> + struct energy_scale_attribute *t_esi;
>> + char *t_buf;
>> + int ret = 0;
>> +
>> + t_buf = kmalloc(MAX_BUF_SZ, GFP_KERNEL);
>> + if (t_buf == NULL)
>> + return -ENOMEM;
>> +
>> + ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, ESI_FLAGS_SINGLE,
>> + pattr->id, virt_to_phys(t_buf),
>> + MAX_BUF_SZ);
>> +
>> + if (ret != H_SUCCESS) {
>> + pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
>> + goto out;
>> + }
>> +
>> + t_hdr = (struct h_energy_scale_info_hdr *) t_buf;
>> + t_esi = (struct energy_scale_attribute *)
>> + (t_buf + be64_to_cpu(t_hdr->array_offset));
>> +
>> + ret = snprintf(buf, sizeof(t_esi->desc), "%s\n", t_esi->desc);
>> + if (ret < 0)
>> + ret = -EIO;
>> +out:
>> + kfree(t_buf);
>> +
>> + return ret;
>> +}
>> +
>> +/*
>> + * Extract and export the numeric value of the energy scale attributes
>> + */
>> +static ssize_t papr_show_value(struct kobject *kobj,
>> + struct kobj_attribute *kobj_attr,
>> + char *buf)
>> +{
>> + struct papr_attr *pattr = container_of(kobj_attr, struct papr_attr,
>> + kobj_attr);
>> + struct h_energy_scale_info_hdr *t_hdr;
>> + struct energy_scale_attribute *t_esi;
>> + char *t_buf;
>> + int ret = 0;
>> +
>> + t_buf = kmalloc(MAX_BUF_SZ, GFP_KERNEL);
>> + if (t_buf == NULL)
>> + return -ENOMEM;
>> +
>> + ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, ESI_FLAGS_SINGLE,
>> + pattr->id, virt_to_phys(t_buf),
>> + MAX_BUF_SZ);
>> +
>> + if (ret != H_SUCCESS) {
>> + pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
>> + goto out;
>> + }
>> +
>> + t_hdr = (struct h_energy_scale_info_hdr *) t_buf;
>> + t_esi = (struct energy_scale_attribute *)
>> + (t_buf + be64_to_cpu(t_hdr->array_offset));
>> +
>> + ret = snprintf(buf, sizeof(t_esi->value), "%llu\n",
>> + be64_to_cpu(t_esi->value));
> sysfs_emit() for when writing out to a sysfs file please. Same
> elsewhere in this file.
Sure, I can use sysfs_emit for writing to a sysfs file.
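One side effect of the quoted snprintf() call is that its bound is the size of
the 8-byte value field rather than the size of the output buffer, so longer
numbers would be cut short. A user-space illustration follows (not the kernel
code under review; both function names are hypothetical, and sysfs_emit()
itself is not used here):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Mirrors the quoted pattern: the bound is the size of the 8-byte field,
 * so at most 7 characters plus the terminator are stored.
 */
static int format_with_field_bound(char *out, uint64_t value)
{
	return snprintf(out, sizeof(uint64_t), "%llu\n",
			(unsigned long long)value);
}

/* Bounds the write by the real size of the output buffer instead. */
static int format_with_buffer_bound(char *out, size_t out_len, uint64_t value)
{
	return snprintf(out, out_len, "%llu\n", (unsigned long long)value);
}
```

With a value of 1234567890, the first helper stores only "1234567", while the
second stores the full number followed by a newline.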
>> + if (ret < 0)
>> + ret = -EIO;
>> +out:
>> + kfree(t_buf);
>> +
>> + return ret;
>> +}
>> +
>> +/*
>> + * Extract and export the value description in string format of the energy
>> + * scale attributes
>> + */
>> +static ssize_t papr_show_value_desc(struct kobject *kobj,
>> + struct kobj_attribute *kobj_attr,
>> + char *buf)
>> +{
>> + struct papr_attr *pattr = container_of(kobj_attr, struct papr_attr,
>> + kobj_attr);
>> + struct h_energy_scale_info_hdr *t_hdr;
>> + struct energy_scale_attribute *t_esi;
>> + char *t_buf;
>> + int ret = 0;
>> +
>> + t_buf = kmalloc(MAX_BUF_SZ, GFP_KERNEL);
>> + if (t_buf == NULL)
>> + return -ENOMEM;
>> +
>> + ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, ESI_FLAGS_SINGLE,
>> + pattr->id, virt_to_phys(t_buf),
>> + MAX_BUF_SZ);
>> +
>> + if (ret != H_SUCCESS) {
>> + pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
>> + goto out;
>> + }
>> +
>> + t_hdr = (struct h_energy_scale_info_hdr *) t_buf;
>> + t_esi = (struct energy_scale_attribute *)
>> + (t_buf + be64_to_cpu(t_hdr->array_offset));
>> +
>> + ret = snprintf(buf, sizeof(t_esi->value_desc), "%s\n",
>> + t_esi->value_desc);
>> + if (ret < 0)
>> + ret = -EIO;
>> +out:
>> + kfree(t_buf);
>> +
>> + return ret;
>> +}
>> +
>> +static struct papr_ops_info {
>> + const char *attr_name;
>> + ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *kobj_attr,
>> + char *buf);
>> +} ops_info[MAX_ATTRS] = {
>> + { "desc", papr_show_desc },
>> + { "value", papr_show_value },
>> + { "value_desc", papr_show_value_desc },
> What is wrong with just using the __ATTR_RO() macro and then having an
> array of attributes in a single group? That should be a lot simpler
> overall, right?
If I understand this correctly, you mean I can have an array of attributes in a
single flat group?
I suppose that would be simpler, given your earlier suggestion to wrap
attribute values up in a single file per attribute.
However, the intent of grouping and keeping the files separate was that each sysfs
file has only one value to display.
I can change it to use an array of attributes in a single group too if you
believe that is the right way to go instead.
>> +};
>> +
>> +static void add_attr(u64 id, int index, struct papr_attr *attr)
>> +{
>> + attr->id = id;
>> + sysfs_attr_init(&attr->kobj_attr.attr);
>> + attr->kobj_attr.attr.name = ops_info[index].attr_name;
>> + attr->kobj_attr.attr.mode = 0444;
>> + attr->kobj_attr.show = ops_info[index].show;
> If you do the above, no need for this function at all.
>
>> +}
>> +
>> +static int add_attr_group(u64 id, struct papr_group *pg, bool show_val_desc)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < MAX_ATTRS; i++) {
>> + if (!strcmp(ops_info[i].attr_name, "value_desc") &&
>> + !show_val_desc) {
>> + continue;
>> + }
>> + add_attr(id, i, &pg->pgattrs[i]);
>> + pg->pg.attrs[i] = &pg->pgattrs[i].kobj_attr.attr;
>> + }
>> +
>> + return sysfs_create_group(esi_kobj, &pg->pg);
> Again, if you just have a list of attributes, there's no need for this
> function either.
>
> I think this can be a lot simpler than you are currently making it.
I agree, if the groups are eliminated, then all the complexity of adding
attribute groups vanishes as well.
Thanks for your feedback again.
Pratik
> thanks,
>
> greg k-h
* [PATCH v5 0/4] Add perf interface to expose nvdimm
From: Kajol Jain @ 2021-09-28 12:41 UTC (permalink / raw)
To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
ira.weiny, vishal.l.verma
Cc: santosh, maddy, rnsastry, aneesh.kumar, atrajeev, kjain, vaibhav,
tglx
This patchset adds performance stats reporting support for nvdimm.
The added interface includes support for pmu register/unregister
functions. A structure called nvdimm_pmu is added to carry
arch/platform specific data such as the cpumask, the nvdimm device
pointer and pmu event functions like event_init/add/read/del.
Users can use the standard perf tool to access the perf events
exposed via the pmu.
The interface also defines the supported event list, the config fields for the
event attributes and their corresponding bit values, which are exported
via sysfs. Patch 3 exposes IBM pseries platform nmem* device
performance stats using this interface.
Result from a power9 pseries lpar with 2 nvdimm devices:
Ex: List all events with perf list
command:# perf list nmem
nmem0/cache_rh_cnt/ [Kernel PMU event]
nmem0/cache_wh_cnt/ [Kernel PMU event]
nmem0/cri_res_util/ [Kernel PMU event]
nmem0/ctl_res_cnt/ [Kernel PMU event]
nmem0/ctl_res_tm/ [Kernel PMU event]
nmem0/fast_w_cnt/ [Kernel PMU event]
nmem0/host_l_cnt/ [Kernel PMU event]
nmem0/host_l_dur/ [Kernel PMU event]
nmem0/host_s_cnt/ [Kernel PMU event]
nmem0/host_s_dur/ [Kernel PMU event]
nmem0/med_r_cnt/ [Kernel PMU event]
nmem0/med_r_dur/ [Kernel PMU event]
nmem0/med_w_cnt/ [Kernel PMU event]
nmem0/med_w_dur/ [Kernel PMU event]
nmem0/mem_life/ [Kernel PMU event]
nmem0/poweron_secs/ [Kernel PMU event]
...
nmem1/mem_life/ [Kernel PMU event]
nmem1/poweron_secs/ [Kernel PMU event]
Patch1:
Introduces the nvdimm_pmu structure
Patch2:
Adds a common interface to add arch/platform specific data,
including the nvdimm device pointer and pmu data along with
pmu event functions. It also defines the supported event list
and adds attribute groups for format, events and cpumask.
It also adds code for cpu hotplug support.
Patch3:
Adds code in arch/powerpc/platforms/pseries/papr_scm.c to expose
the nmem* pmu. It fills in the nvdimm_pmu structure with the pmu name,
capabilities, cpumask and event functions and then registers
the pmu by adding callbacks to register_nvdimm_pmu.
Patch4:
Sysfs documentation patch
Changelog
---
v4 -> v5:
- Remove multiple variables defined in the nvdimm_pmu structure, including
name and pmu functions (event_init/add/del/read), as they were only
used to copy them again into the pmu variable. We now do
this step directly in arch specific code, as suggested by Dan Williams.
- Remove the attribute group field from the nvdimm pmu structure and
define these attribute groups in the common interface, which
includes format, event list along with cpumask, as suggested by
Dan Williams.
Since static definitions for the attribute groups needed in the
common interface were added, remove the corresponding code from papr.
- Add nvdimm pmu event list with event codes in the common interface.
- Remove Acked-by/Reviewed-by/Tested-by tags as code is refactored
to handle review comments from Dan.
- Make nvdimm_pmu_free_hotplug_memory function static as reported
by kernel test robot, also add corresponding Reported-by tag.
- Link to the patchset v4: https://lkml.org/lkml/2021/9/3/45
v3 -> v4
- Rebase code on top of current papr_scm code without any logical
changes.
- Added Acked-by tag from Peter Zijlstra and Reviewed-by tag
from Madhavan Srinivasan.
- Link to the patchset v3: https://lkml.org/lkml/2021/6/17/605
v2 -> v3
- Added Tested-by tag.
- Fix nvdimm mailing list in the ABI Documentation.
- Link to the patchset v2: https://lkml.org/lkml/2021/6/14/25
v1 -> v2
- Fix hotplug code by adding a pmu migration call
in case the current designated cpu goes offline, as
pointed out by Peter Zijlstra.
- Removed the return -1 part from the cpu hotplug offline
function.
- Link to the patchset v1: https://lkml.org/lkml/2021/6/8/500
Kajol Jain (4):
drivers/nvdimm: Add nvdimm pmu structure
drivers/nvdimm: Add perf interface to expose nvdimm performance stats
powerpc/papr_scm: Add perf interface support
docs: ABI: sysfs-bus-nvdimm: Document sysfs event format entries for
nvdimm pmu
Documentation/ABI/testing/sysfs-bus-nvdimm | 35 +++
arch/powerpc/include/asm/device.h | 5 +
arch/powerpc/platforms/pseries/papr_scm.c | 225 ++++++++++++++
drivers/nvdimm/Makefile | 1 +
drivers/nvdimm/nd_perf.c | 328 +++++++++++++++++++++
include/linux/nd.h | 41 +++
6 files changed, 635 insertions(+)
create mode 100644 drivers/nvdimm/nd_perf.c
--
2.26.2
* [PATCH] powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload
From: Vasant Hegde @ 2021-09-28 12:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Vasant Hegde
Commit 587164cd introduced a new opal message type (OPAL_MSG_PRD2) and added
an opal notifier for it, but I missed unregistering the notifier in the module
unload path. This results in the call trace below if you unload and then reload
the opal_prd module.
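The failure mode and the fix can be sketched in a small userspace model. This is only an illustration under stated assumptions: the enum, the boolean slots and the model_* functions are hypothetical stand-ins, and the model reports a stale notifier as an error, whereas the real kernel oopses because the stale notifier block lives in freed module memory.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two OPAL message types and their notifier slots. */
enum msg_type { MSG_PRD, MSG_PRD2, MSG_TYPE_MAX };

static bool notifier_registered[MSG_TYPE_MAX];
static bool miscdev_registered;

/* A notifier that is still on the chain from an earlier load is the bug being modelled. */
static int notifier_register(enum msg_type type)
{
	if (notifier_registered[type])
		return -1;
	notifier_registered[type] = true;
	return 0;
}

static void notifier_unregister(enum msg_type type)
{
	notifier_registered[type] = false;
}

/* Probe model: when a later step fails, undo every earlier step in reverse order. */
static int model_probe(bool fail_miscdev)
{
	int rc;

	rc = notifier_register(MSG_PRD);
	if (rc)
		return rc;

	rc = notifier_register(MSG_PRD2);
	if (rc)
		goto err_prd;

	if (fail_miscdev) {
		rc = -1;
		goto err_prd2;
	}
	miscdev_registered = true;
	return 0;

err_prd2:
	notifier_unregister(MSG_PRD2);
err_prd:
	notifier_unregister(MSG_PRD);
	return rc;
}

/* Remove model: both notifiers are dropped, so a later probe starts clean. */
static void model_remove(void)
{
	miscdev_registered = false;
	notifier_unregister(MSG_PRD);
	notifier_unregister(MSG_PRD2);
}
```

With both notifiers released in the remove and error paths, a probe, remove, probe sequence succeeds in the model, which corresponds to the modprobe -r / modprobe sequence described above.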
Sample calltrace (modprobe -r opal_prd; modprobe opal_prd)
[ 213.335261] BUG: Unable to handle kernel data access on read at 0xc0080000192200e0
[ 213.335287] Faulting instruction address: 0xc00000000018d1cc
[ 213.335301] Oops: Kernel access of bad area, sig: 11 [#1]
[ 213.335313] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[ 213.335736] CPU: 66 PID: 7446 Comm: modprobe Kdump: loaded Tainted: G E 5.14.0prd #759
[ 213.335772] NIP: c00000000018d1cc LR: c00000000018d2a8 CTR: c0000000000cde10
[ 213.335805] REGS: c0000003c4c0f0a0 TRAP: 0300 Tainted: G E (5.14.0prd)
[ 213.335848] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 24224824 XER: 20040000
[ 213.335893] CFAR: c00000000018d2a4 DAR: c0080000192200e0 DSISR: 40000000 IRQMASK: 1
[ 213.335893] GPR00: c00000000018d2a8 c0000003c4c0f340 c000000001995300 c000000001a5ad08
[ 213.335893] GPR04: c00800000e3700d0 c0000003c4c0f434 c0000000010a8c08 6e616d6500000000
[ 213.335893] GPR08: 0000000000000000 c0080000192200d0 0000000000000001 c00800000e351020
[ 213.335893] GPR12: c0000000000cde10 c000000ffffecb80 c0000003c4c0fd00 0000000000000000
[ 213.335893] GPR16: 0000000000000990 c00800000d950000 c00800000d950990 c00000000103fd10
[ 213.335893] GPR20: c0000003c4c0fbc0 0000000000000001 c0000003c4c0fbc0 c00800000e370498
[ 213.335893] GPR24: 0000000000000000 c000000000dab5c8 c000000001a5ad18 c000000001a5ac80
[ 213.335893] GPR28: 0000000000000008 0000000000000001 c000000001a5ad00 c000000001a5ad00
[ 213.336170] NIP [c00000000018d1cc] notifier_chain_register+0x2c/0xc0
[ 213.336205] LR [c00000000018d2a8] atomic_notifier_chain_register+0x48/0x80
[ 213.336238] Call Trace:
[ 213.336255] [c0000003c4c0f340] [c000000002090610] 0xc000000002090610 (unreliable)
[ 213.336281] [c0000003c4c0f3a0] [c00000000018d2b8] atomic_notifier_chain_register+0x58/0x80
[ 213.336309] [c0000003c4c0f3f0] [c0000000000cde8c] opal_message_notifier_register+0x7c/0x1e0
[ 213.336345] [c0000003c4c0f4b0] [c00800000e3508ac] opal_prd_probe+0x84/0x150 [opal_prd]
[ 213.336382] [c0000003c4c0f530] [c00000000097acc8] platform_probe+0x78/0x130
[ 213.336416] [c0000003c4c0f5b0] [c000000000976520] really_probe+0x110/0x5d0
[ 213.336467] [c0000003c4c0f630] [c000000000976b5c] __driver_probe_device+0x17c/0x230
[ 213.336512] [c0000003c4c0f6b0] [c000000000976c70] driver_probe_device+0x60/0x130
[ 213.336556] [c0000003c4c0f700] [c00000000097746c] __driver_attach+0xfc/0x220
[ 213.336592] [c0000003c4c0f780] [c000000000972e68] bus_for_each_dev+0xa8/0x130
[ 213.336627] [c0000003c4c0f7e0] [c000000000975b04] driver_attach+0x34/0x50
[ 213.336661] [c0000003c4c0f800] [c000000000974e20] bus_add_driver+0x1b0/0x300
[ 213.336696] [c0000003c4c0f890] [c000000000978468] driver_register+0x98/0x1a0
[ 213.336732] [c0000003c4c0f900] [c00000000097a878] __platform_driver_register+0x38/0x50
[ 213.336768] [c0000003c4c0f920] [c00800000e350dc0] opal_prd_driver_init+0x34/0x50 [opal_prd]
[ 213.336804] [c0000003c4c0f940] [c000000000012410] do_one_initcall+0x60/0x2d0
[ 213.336821] [c0000003c4c0fa10] [c000000000262c8c] do_init_module+0x7c/0x320
[ 213.336845] [c0000003c4c0fa90] [c000000000266544] load_module+0x3394/0x3650
[ 213.336880] [c0000003c4c0fc90] [c000000000266b34] __do_sys_finit_module+0xd4/0x160
[ 213.336914] [c0000003c4c0fdb0] [c000000000031e00] system_call_exception+0x140/0x290
[ 213.336958] [c0000003c4c0fe10] [c00000000000c764] system_call_common+0xf4/0x258
Fixes: 587164cd ("powerpc/powernv: Add new opal message type")
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
arch/powerpc/platforms/powernv/opal-prd.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/powerpc/platforms/powernv/opal-prd.c b/arch/powerpc/platforms/powernv/opal-prd.c
index a191f4c60ce7..83305615db9e 100644
--- a/arch/powerpc/platforms/powernv/opal-prd.c
+++ b/arch/powerpc/platforms/powernv/opal-prd.c
@@ -393,6 +393,7 @@ static int opal_prd_probe(struct platform_device *pdev)
rc = opal_message_notifier_register(OPAL_MSG_PRD2, &opal_prd_event_nb);
if (rc) {
pr_err("Couldn't register PRD2 event notifier\n");
+ opal_message_notifier_unregister(OPAL_MSG_PRD, &opal_prd_event_nb);
return rc;
}
@@ -401,6 +402,8 @@ static int opal_prd_probe(struct platform_device *pdev)
pr_err("failed to register miscdev\n");
opal_message_notifier_unregister(OPAL_MSG_PRD,
&opal_prd_event_nb);
+ opal_message_notifier_unregister(OPAL_MSG_PRD2,
+ &opal_prd_event_nb);
return rc;
}
@@ -411,6 +414,7 @@ static int opal_prd_remove(struct platform_device *pdev)
{
misc_deregister(&opal_prd_dev);
opal_message_notifier_unregister(OPAL_MSG_PRD, &opal_prd_event_nb);
+ opal_message_notifier_unregister(OPAL_MSG_PRD2, &opal_prd_event_nb);
return 0;
}
--
2.31.1
* Re: [PATCH v8 1/2] powerpc/pseries: Interface to represent PAPR firmware attributes
From: Greg KH @ 2021-09-28 12:08 UTC (permalink / raw)
To: Pratik R. Sampat
Cc: farosas, pratik.r.sampat, linuxppc-dev, kvm-ppc, linux-kernel,
paulus, linux-kselftest, kjain, shuah
In-Reply-To: <20210928115102.57117-2-psampat@linux.ibm.com>
On Tue, Sep 28, 2021 at 05:21:01PM +0530, Pratik R. Sampat wrote:
> Adds a generic interface to represent the energy and frequency related
> PAPR attributes on the system using the new H_CALL
> "H_GET_ENERGY_SCALE_INFO".
>
> H_GET_EM_PARMS H_CALL was previously responsible for exporting this
> information in the lparcfg, however the H_GET_EM_PARMS H_CALL
> will be deprecated P10 onwards.
>
> The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
> hcall(
> uint64 H_GET_ENERGY_SCALE_INFO, // Get energy scale info
> uint64 flags, // Per the flag request
> uint64 firstAttributeId,// The attribute id
> uint64 bufferAddress, // Guest physical address of the output buffer
> uint64 bufferSize // The size in bytes of the output buffer
> );
>
> This H_CALL can query either all the attributes at once with
> firstAttributeId = 0, flags = 0 as well as query only one attribute
> at a time with firstAttributeId = id, flags = 1.
>
> The output buffer consists of the following
> 1. number of attributes - 8 bytes
> 2. array offset to the data location - 8 bytes
> 3. version info - 1 byte
> 4. A data array of size num attributes, which contains the following:
> a. attribute ID - 8 bytes
> b. attribute value in number - 8 bytes
> c. attribute name in string - 64 bytes
> d. attribute value in string - 64 bytes
>
> The new H_CALL exports information in direct string value format, hence
> a new interface has been introduced in
> /sys/firmware/papr/energy_scale_info to export this information to
> userspace in an extensible pass-through format.
>
> The H_CALL returns the name, numeric value and string value (if exists)
>
> The format of exposing the sysfs information is as follows:
> /sys/firmware/papr/energy_scale_info/
> |-- <id>/
> |-- desc
> |-- value
> |-- value_desc (if exists)
> |-- <id>/
> |-- desc
> |-- value
> |-- value_desc (if exists)
> ...
>
> The energy information that is exported is useful for userspace tools
> such as powerpc-utils. Currently these tools infer the
> "power_mode_data" value in the lparcfg, which in turn is obtained from
> the to be deprecated H_GET_EM_PARMS H_CALL.
> On future platforms, such userspace utilities will have to look at the
> data returned from the new H_CALL being populated in this new sysfs
> interface and report this information directly without the need of
> interpretation.
>
> Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com>
> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
> Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
> ---
> .../sysfs-firmware-papr-energy-scale-info | 26 ++
> arch/powerpc/include/asm/hvcall.h | 24 +-
> arch/powerpc/kvm/trace_hv.h | 1 +
> arch/powerpc/platforms/pseries/Makefile | 3 +-
> .../pseries/papr_platform_attributes.c | 312 ++++++++++++++++++
> 5 files changed, 364 insertions(+), 2 deletions(-)
> create mode 100644 Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
> create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c
>
> diff --git a/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
> new file mode 100644
> index 000000000000..139a576c7c9d
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
> @@ -0,0 +1,26 @@
> +What: /sys/firmware/papr/energy_scale_info
> +Date: June 2021
> +Contact: Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
> +Description: Directory hosting a set of platform attributes like
> + energy/frequency on Linux running as a PAPR guest.
> +
> + Each file in a directory contains a platform
> + attribute hierarchy pertaining to performance/
> + energy-savings mode and processor frequency.
> +
> +What: /sys/firmware/papr/energy_scale_info/<id>
> + /sys/firmware/papr/energy_scale_info/<id>/desc
> + /sys/firmware/papr/energy_scale_info/<id>/value
> + /sys/firmware/papr/energy_scale_info/<id>/value_desc
> +Date: June 2021
> +Contact: Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
> +Description: Energy, frequency attributes directory for POWERVM servers
> +
> + This directory provides energy, frequency, folding information. It
> + contains below sysfs attributes:
> +
> + - desc: String description of the attribute <id>
> +
> + - value: Numeric value of attribute <id>
> +
> + - value_desc: String value of attribute <id>
Can you just make 4 different entries in this file, making it easier to
parse and extend over time?
> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> index 9bcf345cb208..38980fef7a3d 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -323,7 +323,8 @@
> #define H_SCM_PERFORMANCE_STATS 0x418
> #define H_RPT_INVALIDATE 0x448
> #define H_SCM_FLUSH 0x44C
> -#define MAX_HCALL_OPCODE H_SCM_FLUSH
> +#define H_GET_ENERGY_SCALE_INFO 0x450
> +#define MAX_HCALL_OPCODE H_GET_ENERGY_SCALE_INFO
>
> /* Scope args for H_SCM_UNBIND_ALL */
> #define H_UNBIND_SCOPE_ALL (0x1)
> @@ -641,6 +642,27 @@ struct hv_gpci_request_buffer {
> uint8_t bytes[HGPCI_MAX_DATA_BYTES];
> } __packed;
>
> +#define ESI_VERSION 0x1
> +#define MAX_ESI_ATTRS 10
> +#define MAX_BUF_SZ (sizeof(struct h_energy_scale_info_hdr) + \
> + (sizeof(struct energy_scale_attribute) * MAX_ESI_ATTRS))
> +
> +struct energy_scale_attribute {
> + __be64 id;
> + __be64 value;
> + unsigned char desc[64];
> + unsigned char value_desc[64];
> +} __packed;
> +
> +struct h_energy_scale_info_hdr {
> + __be64 num_attrs;
> + __be64 array_offset;
> + __u8 data_header_version;
> +} __packed;
> +
> +/* /sys/firmware/papr */
> +extern struct kobject *papr_kobj;
> +
> #endif /* __ASSEMBLY__ */
> #endif /* __KERNEL__ */
> #endif /* _ASM_POWERPC_HVCALL_H */
> diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
> index 830a126e095d..38cd0ed0a617 100644
> --- a/arch/powerpc/kvm/trace_hv.h
> +++ b/arch/powerpc/kvm/trace_hv.h
> @@ -115,6 +115,7 @@
> {H_VASI_STATE, "H_VASI_STATE"}, \
> {H_ENABLE_CRQ, "H_ENABLE_CRQ"}, \
> {H_GET_EM_PARMS, "H_GET_EM_PARMS"}, \
> + {H_GET_ENERGY_SCALE_INFO, "H_GET_ENERGY_SCALE_INFO"}, \
> {H_SET_MPP, "H_SET_MPP"}, \
> {H_GET_MPP, "H_GET_MPP"}, \
> {H_HOME_NODE_ASSOCIATIVITY, "H_HOME_NODE_ASSOCIATIVITY"}, \
> diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
> index 4cda0ef87be0..c4c19f6a5975 100644
> --- a/arch/powerpc/platforms/pseries/Makefile
> +++ b/arch/powerpc/platforms/pseries/Makefile
> @@ -6,7 +6,8 @@ obj-y := lpar.o hvCall.o nvram.o reconfig.o \
> of_helpers.o \
> setup.o iommu.o event_sources.o ras.o \
> firmware.o power.o dlpar.o mobility.o rng.o \
> - pci.o pci_dlpar.o eeh_pseries.o msi.o
> + pci.o pci_dlpar.o eeh_pseries.o msi.o \
> + papr_platform_attributes.o
> obj-$(CONFIG_SMP) += smp.o
> obj-$(CONFIG_SCANLOG) += scanlog.o
> obj-$(CONFIG_KEXEC_CORE) += kexec.o
> diff --git a/arch/powerpc/platforms/pseries/papr_platform_attributes.c b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
> new file mode 100644
> index 000000000000..84ddce52e519
> --- /dev/null
> +++ b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
> @@ -0,0 +1,312 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Platform energy and frequency attributes driver
> + *
> + * This driver creates a sys file at /sys/firmware/papr/ which encapsulates a
> + * directory structure containing files in keyword - value pairs that specify
> + * energy and frequency configuration of the system.
> + *
> + * The format of exposing the sysfs information is as follows:
> + * /sys/firmware/papr/energy_scale_info/
> + * |-- <id>/
> + * |-- desc
> + * |-- value
> + * |-- value_desc (if exists)
> + * |-- <id>/
> + * |-- desc
> + * |-- value
> + * |-- value_desc (if exists)
> + *
> + * Copyright 2021 IBM Corp.
> + */
> +
> +#include <asm/hvcall.h>
> +#include <asm/machdep.h>
> +
> +#include "pseries.h"
> +
> +/*
> + * Flag attributes to fetch either all or one attribute from the HCALL
> + * flag = BE(0) => fetch all attributes with firstAttributeId = 0
> + * flag = BE(1) => fetch a single attribute with firstAttributeId = id
> + */
> +#define ESI_FLAGS_ALL 0
> +#define ESI_FLAGS_SINGLE PPC_BIT(0)
> +
> +#define MAX_ATTRS 3
> +
> +struct papr_attr {
> + u64 id;
> + struct kobj_attribute kobj_attr;
Why does an attribute have to be part of this structure?
> +};
> +struct papr_group {
> + struct attribute_group pg;
> + struct papr_attr pgattrs[MAX_ATTRS];
> +} *pgs;
> +
> +/* /sys/firmware/papr */
> +struct kobject *papr_kobj;
> +/* /sys/firmware/papr/energy_scale_info */
> +struct kobject *esi_kobj;
> +
> +/*
> + * Extract and export the description of the energy scale attributes
> + */
> +static ssize_t papr_show_desc(struct kobject *kobj,
> + struct kobj_attribute *kobj_attr,
> + char *buf)
> +{
> + struct papr_attr *pattr = container_of(kobj_attr, struct papr_attr,
> + kobj_attr);
> + struct h_energy_scale_info_hdr *t_hdr;
> + struct energy_scale_attribute *t_esi;
> + char *t_buf;
> + int ret = 0;
> +
> + t_buf = kmalloc(MAX_BUF_SZ, GFP_KERNEL);
> + if (t_buf == NULL)
> + return -ENOMEM;
> +
> + ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, ESI_FLAGS_SINGLE,
> + pattr->id, virt_to_phys(t_buf),
> + MAX_BUF_SZ);
> +
> + if (ret != H_SUCCESS) {
> + pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
> + goto out;
> + }
> +
> + t_hdr = (struct h_energy_scale_info_hdr *) t_buf;
> + t_esi = (struct energy_scale_attribute *)
> + (t_buf + be64_to_cpu(t_hdr->array_offset));
> +
> + ret = snprintf(buf, sizeof(t_esi->desc), "%s\n", t_esi->desc);
> + if (ret < 0)
> + ret = -EIO;
> +out:
> + kfree(t_buf);
> +
> + return ret;
> +}
> +
> +/*
> + * Extract and export the numeric value of the energy scale attributes
> + */
> +static ssize_t papr_show_value(struct kobject *kobj,
> + struct kobj_attribute *kobj_attr,
> + char *buf)
> +{
> + struct papr_attr *pattr = container_of(kobj_attr, struct papr_attr,
> + kobj_attr);
> + struct h_energy_scale_info_hdr *t_hdr;
> + struct energy_scale_attribute *t_esi;
> + char *t_buf;
> + int ret = 0;
> +
> + t_buf = kmalloc(MAX_BUF_SZ, GFP_KERNEL);
> + if (t_buf == NULL)
> + return -ENOMEM;
> +
> + ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, ESI_FLAGS_SINGLE,
> + pattr->id, virt_to_phys(t_buf),
> + MAX_BUF_SZ);
> +
> + if (ret != H_SUCCESS) {
> + pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
> + goto out;
> + }
> +
> + t_hdr = (struct h_energy_scale_info_hdr *) t_buf;
> + t_esi = (struct energy_scale_attribute *)
> + (t_buf + be64_to_cpu(t_hdr->array_offset));
> +
> + ret = snprintf(buf, sizeof(t_esi->value), "%llu\n",
> + be64_to_cpu(t_esi->value));
sysfs_emit() for when writing out to a sysfs file please. Same
elsewhere in this file.
> + if (ret < 0)
> + ret = -EIO;
> +out:
> + kfree(t_buf);
> +
> + return ret;
> +}
> +
> +/*
> + * Extract and export the value description in string format of the energy
> + * scale attributes
> + */
> +static ssize_t papr_show_value_desc(struct kobject *kobj,
> + struct kobj_attribute *kobj_attr,
> + char *buf)
> +{
> + struct papr_attr *pattr = container_of(kobj_attr, struct papr_attr,
> + kobj_attr);
> + struct h_energy_scale_info_hdr *t_hdr;
> + struct energy_scale_attribute *t_esi;
> + char *t_buf;
> + int ret = 0;
> +
> + t_buf = kmalloc(MAX_BUF_SZ, GFP_KERNEL);
> + if (t_buf == NULL)
> + return -ENOMEM;
> +
> + ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, ESI_FLAGS_SINGLE,
> + pattr->id, virt_to_phys(t_buf),
> + MAX_BUF_SZ);
> +
> + if (ret != H_SUCCESS) {
> + pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
> + goto out;
> + }
> +
> + t_hdr = (struct h_energy_scale_info_hdr *) t_buf;
> + t_esi = (struct energy_scale_attribute *)
> + (t_buf + be64_to_cpu(t_hdr->array_offset));
> +
> + ret = snprintf(buf, sizeof(t_esi->value_desc), "%s\n",
> + t_esi->value_desc);
> + if (ret < 0)
> + ret = -EIO;
> +out:
> + kfree(t_buf);
> +
> + return ret;
> +}
> +
> +static struct papr_ops_info {
> + const char *attr_name;
> + ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *kobj_attr,
> + char *buf);
> +} ops_info[MAX_ATTRS] = {
> + { "desc", papr_show_desc },
> + { "value", papr_show_value },
> + { "value_desc", papr_show_value_desc },
What is wrong with just using the __ATTR_RO() macro and then having an
array of attributes in a single group? That should be a lot simpler
overall, right?
> +};
> +
> +static void add_attr(u64 id, int index, struct papr_attr *attr)
> +{
> + attr->id = id;
> + sysfs_attr_init(&attr->kobj_attr.attr);
> + attr->kobj_attr.attr.name = ops_info[index].attr_name;
> + attr->kobj_attr.attr.mode = 0444;
> + attr->kobj_attr.show = ops_info[index].show;
If you do the above, no need for this function at all.
> +}
> +
> +static int add_attr_group(u64 id, struct papr_group *pg, bool show_val_desc)
> +{
> + int i;
> +
> + for (i = 0; i < MAX_ATTRS; i++) {
> + if (!strcmp(ops_info[i].attr_name, "value_desc") &&
> + !show_val_desc) {
> + continue;
> + }
> + add_attr(id, i, &pg->pgattrs[i]);
> + pg->pg.attrs[i] = &pg->pgattrs[i].kobj_attr.attr;
> + }
> +
> + return sysfs_create_group(esi_kobj, &pg->pg);
Again, if you just have a list of attributes, there's no need for this
function either.
I think this can be a lot simpler than you are currently making it.
thanks,
greg k-h
* [PATCH v8 0/2] Interface to represent PAPR firmware attributes
From: Pratik R. Sampat @ 2021-09-28 11:51 UTC (permalink / raw)
To: mpe, benh, paulus, shuah, farosas, kjain, linuxppc-dev, kvm-ppc,
linux-kselftest, linux-kernel, psampat, pratik.r.sampat
RFC: https://lkml.org/lkml/2021/6/4/791
PATCH v1: https://lkml.org/lkml/2021/6/16/805
PATCH v2: https://lkml.org/lkml/2021/7/6/138
PATCH v3: https://lkml.org/lkml/2021/7/12/2799
PATCH v4: https://lkml.org/lkml/2021/7/16/532
PATCH v5: https://lkml.org/lkml/2021/7/19/247
PATCH v6: https://lkml.org/lkml/2021/7/20/36
PATCH v7: https://lkml.org/lkml/2021/7/23/26
Changelog v7-->v8
1. Rebased and tested against 5.15
2. Added a selftest to check if the energy and frequency attributes
exist and their files are populated
I have also implemented a POC using this interface for the powerpc-utils'
ppc64_cpu --frequency command-line tool to utilize this information
in userspace.
The POC for the new interface has been sent to the powerpc-utils mailing
list for early review: https://groups.google.com/g/powerpc-utils-devel/c/r4i7JnlyQ8s
Sample output from the powerpc-utils tool is as follows:
# ppc64_cpu --frequency
Power and Performance Mode: XXXX
Idle Power Saver Status : XXXX
Processor Folding Status : XXXX --> Printed if Idle power save status is supported
Platform reported frequencies --> Frequencies reported from the platform's H_CALL i.e PAPR interface
min : NNNN GHz
max : NNNN GHz
static : NNNN GHz
Tool Computed frequencies
min : NNNN GHz (cpu XX)
max : NNNN GHz (cpu XX)
avg : NNNN GHz
Pratik R. Sampat (2):
powerpc/pseries: Interface to represent PAPR firmware attributes
selftest/powerpc: Add PAPR sysfs attributes sniff test
.../sysfs-firmware-papr-energy-scale-info | 26 ++
arch/powerpc/include/asm/hvcall.h | 24 +-
arch/powerpc/kvm/trace_hv.h | 1 +
arch/powerpc/platforms/pseries/Makefile | 3 +-
.../pseries/papr_platform_attributes.c | 312 ++++++++++++++++++
tools/testing/selftests/powerpc/Makefile | 1 +
.../powerpc/papr_attributes/.gitignore | 2 +
.../powerpc/papr_attributes/Makefile | 7 +
.../powerpc/papr_attributes/attr_test.c | 107 ++++++
9 files changed, 481 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c
create mode 100644 tools/testing/selftests/powerpc/papr_attributes/.gitignore
create mode 100644 tools/testing/selftests/powerpc/papr_attributes/Makefile
create mode 100644 tools/testing/selftests/powerpc/papr_attributes/attr_test.c
--
2.31.1
* [PATCH v8 2/2] selftest/powerpc: Add PAPR sysfs attributes sniff test
From: Pratik R. Sampat @ 2021-09-28 11:51 UTC (permalink / raw)
To: mpe, benh, paulus, shuah, farosas, kjain, linuxppc-dev, kvm-ppc,
linux-kselftest, linux-kernel, psampat, pratik.r.sampat
In-Reply-To: <20210928115102.57117-1-psampat@linux.ibm.com>
Include a testcase to check that the sysfs directories for the energy and
frequency related attributes exist and that their attribute files are populated.
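The "exists and is populated" check described here boils down to opening a file and reading one byte. The test as posted repeats the fopen()/fgetc() pair inline and never closes the streams; one possible way to factor it, shown purely as a sketch with a hypothetical helper name, is:

```c
#include <stdio.h>

/*
 * Return 1 if the file at path can be opened and holds at least one byte,
 * 0 otherwise. The stream is always closed before returning.
 */
static int file_has_data(const char *path)
{
	FILE *f = fopen(path, "r");
	int c;

	if (!f)
		return 0;
	c = fgetc(f);
	fclose(f);
	return c != EOF;
}
```

A caller would build the path for desc, value or value_desc and treat a 0 return as a test failure.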
Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com>
---
tools/testing/selftests/powerpc/Makefile | 1 +
.../powerpc/papr_attributes/.gitignore | 2 +
.../powerpc/papr_attributes/Makefile | 7 ++
.../powerpc/papr_attributes/attr_test.c | 107 ++++++++++++++++++
4 files changed, 117 insertions(+)
create mode 100644 tools/testing/selftests/powerpc/papr_attributes/.gitignore
create mode 100644 tools/testing/selftests/powerpc/papr_attributes/Makefile
create mode 100644 tools/testing/selftests/powerpc/papr_attributes/attr_test.c
diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile
index 0830e63818c1..c68c872efb23 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -30,6 +30,7 @@ SUB_DIRS = alignment \
eeh \
vphn \
math \
+ papr_attributes \
ptrace \
security
diff --git a/tools/testing/selftests/powerpc/papr_attributes/.gitignore b/tools/testing/selftests/powerpc/papr_attributes/.gitignore
new file mode 100644
index 000000000000..9c8cb54c8b28
--- /dev/null
+++ b/tools/testing/selftests/powerpc/papr_attributes/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+attr_test
\ No newline at end of file
diff --git a/tools/testing/selftests/powerpc/papr_attributes/Makefile b/tools/testing/selftests/powerpc/papr_attributes/Makefile
new file mode 100644
index 000000000000..135886f200ad
--- /dev/null
+++ b/tools/testing/selftests/powerpc/papr_attributes/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+TEST_GEN_PROGS := attr_test
+
+top_srcdir = ../../../../..
+include ../../lib.mk
+
+$(TEST_GEN_PROGS): ../harness.c ../utils.c
diff --git a/tools/testing/selftests/powerpc/papr_attributes/attr_test.c b/tools/testing/selftests/powerpc/papr_attributes/attr_test.c
new file mode 100644
index 000000000000..905e2cbb3863
--- /dev/null
+++ b/tools/testing/selftests/powerpc/papr_attributes/attr_test.c
@@ -0,0 +1,107 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * PAPR Energy attributes sniff test
+ * This checks if the papr folders and contents are populated relating to
+ * the energy and frequency attributes
+ *
+ * Copyright 2021, Pratik Rajesh Sampat, IBM Corp.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <dirent.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <stdlib.h>
+
+#include "utils.h"
+
+enum energy_freq_attrs {
+ POWER_PERFORMANCE_MODE = 1,
+ IDLE_POWER_SAVER_STATUS = 2,
+ MIN_FREQ = 3,
+ STAT_FREQ = 4,
+ MAX_FREQ = 6,
+ PROC_FOLDING_STATUS = 8
+};
+
+enum type {
+ INVALID,
+ STR_VAL,
+ NUM_VAL
+};
+
+int value_type(int id)
+{
+ int val_type;
+
+ switch(id) {
+ case POWER_PERFORMANCE_MODE:
+ case IDLE_POWER_SAVER_STATUS:
+ val_type = STR_VAL;
+ break;
+ case MIN_FREQ:
+ case STAT_FREQ:
+ case MAX_FREQ:
+ case PROC_FOLDING_STATUS:
+ val_type = NUM_VAL;
+ break;
+ default:
+ val_type = INVALID;
+ }
+
+ return val_type;
+}
+
+int verify_energy_info()
+{
+ const char *path = "/sys/firmware/papr/energy_scale_info";
+ struct dirent *entry;
+ struct stat s;
+ DIR *dirp;
+
+ if (stat(path, &s) || !S_ISDIR(s.st_mode))
+ return -1;
+ dirp = opendir(path);
+
+ while ((entry = readdir(dirp)) != NULL) {
+ char file_name[64];
+ int id, attr_type;
+ FILE *f;
+
+ if (strcmp(entry->d_name,".") == 0 ||
+ strcmp(entry->d_name,"..") == 0)
+ continue;
+
+ id = atoi(entry->d_name);
+ attr_type = value_type(id);
+ if (attr_type == INVALID)
+ return -1;
+
+ /* Check if the files exist and have data in them */
+ sprintf(file_name, "%s/%d/desc", path, id);
+ f = fopen(file_name, "r");
+ if (!f || fgetc(f) == EOF)
+ return -1;
+
+ sprintf(file_name, "%s/%d/value", path, id);
+ f = fopen(file_name, "r");
+ if (!f || fgetc(f) == EOF)
+ return -1;
+
+ if (attr_type == STR_VAL) {
+ sprintf(file_name, "%s/%d/value_desc", path, id);
+ f = fopen(file_name, "r");
+ if (!f || fgetc(f) == EOF)
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+int main(void)
+{
+ return test_harness(verify_energy_info, "papr_attributes");
+}
--
2.31.1
* [PATCH v8 1/2] powerpc/pseries: Interface to represent PAPR firmware attributes
From: Pratik R. Sampat @ 2021-09-28 11:51 UTC (permalink / raw)
To: mpe, benh, paulus, shuah, farosas, kjain, linuxppc-dev, kvm-ppc,
linux-kselftest, linux-kernel, psampat, pratik.r.sampat
In-Reply-To: <20210928115102.57117-1-psampat@linux.ibm.com>
Add a generic interface to represent the energy- and frequency-related
PAPR attributes on the system using the new H_CALL
"H_GET_ENERGY_SCALE_INFO".
The H_GET_EM_PARMS H_CALL was previously responsible for exporting this
information in the lparcfg; however, the H_GET_EM_PARMS H_CALL
will be deprecated from P10 onwards.
The H_GET_ENERGY_SCALE_INFO H_CALL has the following call format:
hcall(
uint64 H_GET_ENERGY_SCALE_INFO, // Get energy scale info
uint64 flags, // Per the flag request
uint64 firstAttributeId,// The attribute id
uint64 bufferAddress, // Guest physical address of the output buffer
uint64 bufferSize // The size in bytes of the output buffer
);
This H_CALL can either query all the attributes at once with
firstAttributeId = 0, flags = 0, or query only one attribute
at a time with firstAttributeId = id, flags = 1.
The output buffer consists of the following:
1. number of attributes - 8 bytes
2. array offset to the data location - 8 bytes
3. version info - 1 byte
4. A data array of size num attributes, which contains the following:
a. attribute ID - 8 bytes
b. attribute value in number - 8 bytes
c. attribute name in string - 64 bytes
d. attribute value in string - 64 bytes
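As an illustration of the layout listed above, here is a minimal userspace
sketch that walks such a buffer. It is not part of the patch: the names
esi_be64(), esi_attr() and the ESI_* size macros are hypothetical, and the
sketch assumes the header is packed to 17 bytes (8 + 8 + 1) and that each
array entry is 144 bytes (8 + 8 + 64 + 64), with all integers big-endian.

```c
#include <stddef.h>
#include <stdint.h>

/* Sizes derived from the layout in the commit message (assumed packed). */
#define ESI_HDR_SIZE   (8 + 8 + 1)          /* num attrs, array offset, version */
#define ESI_STR_LEN    64
#define ESI_ATTR_SIZE  (8 + 8 + ESI_STR_LEN + ESI_STR_LEN)

/* Decode a big-endian 64-bit integer from a byte buffer. */
uint64_t esi_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Return a pointer to attribute number idx inside buf, or NULL if the
 * header or the requested entry does not fit in the len bytes provided.
 * The array is located through the header's offset field rather than
 * assumed to follow the header directly.
 */
const unsigned char *esi_attr(const unsigned char *buf, size_t len,
			      uint64_t idx)
{
	uint64_t num, off;

	if (len < ESI_HDR_SIZE)
		return NULL;

	num = esi_be64(buf);        /* 1. number of attributes */
	off = esi_be64(buf + 8);    /* 2. array offset to the data location */

	if (idx >= num || off > len)
		return NULL;
	if ((len - off) / ESI_ATTR_SIZE <= idx)
		return NULL;

	return buf + off + idx * ESI_ATTR_SIZE;
}

/* Field accessors for one array entry returned by esi_attr(). */
uint64_t esi_attr_id(const unsigned char *a)    { return esi_be64(a); }
uint64_t esi_attr_value(const unsigned char *a) { return esi_be64(a + 8); }
const char *esi_attr_desc(const unsigned char *a)
{
	return (const char *)(a + 16);
}
const char *esi_attr_value_desc(const unsigned char *a)
{
	return (const char *)(a + 16 + ESI_STR_LEN);
}
```

The bounds checks are there because the number of attributes and the array
offset both come from the buffer itself; a caller would typically loop idx
from 0 until esi_attr() returns NULL.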
The new H_CALL exports information in direct string value format, hence
a new interface has been introduced in
/sys/firmware/papr/energy_scale_info to export this information to
userspace in an extensible pass-through format.
The H_CALL returns the name, numeric value and string value (if it exists).
The sysfs information is exposed in the following format:
/sys/firmware/papr/energy_scale_info/
|-- <id>/
|-- desc
|-- value
|-- value_desc (if exists)
|-- <id>/
|-- desc
|-- value
|-- value_desc (if exists)
...
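For illustration, a userspace tool could read one attribute from the tree
above roughly as follows. This is only a sketch under assumptions: the helper
names esi_attr_path() and esi_read_attr() are hypothetical and not part of
the patch, and the base directory is passed in so that the code can be
exercised against a copy of the tree as well.

```c
#include <stdio.h>
#include <string.h>

#define ESI_SYSFS_BASE "/sys/firmware/papr/energy_scale_info"

/* Build "<base>/<id>/<name>"; return 0 on success, -1 if it does not fit. */
int esi_attr_path(char *out, size_t outlen, const char *base,
		  unsigned long long id, const char *name)
{
	int n = snprintf(out, outlen, "%s/%llu/%s", base, id, name);

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/*
 * Read the first line of <base>/<id>/<name> into out, without the
 * trailing newline. Returns 0 on success and -1 on any error, which
 * includes a value_desc file that does not exist for this id.
 */
int esi_read_attr(const char *base, unsigned long long id, const char *name,
		  char *out, size_t outlen)
{
	char path[256];
	FILE *f;
	int ret = -1;

	if (outlen == 0 || esi_attr_path(path, sizeof(path), base, id, name))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (fgets(out, (int)outlen, f)) {
		out[strcspn(out, "\n")] = '\0';
		ret = 0;
	}
	fclose(f);

	return ret;
}
```

A caller would pass ESI_SYSFS_BASE together with "desc", "value" or
"value_desc", and treat a failure on "value_desc" as "no string value for
this attribute" rather than as a hard error.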
The energy information that is exported is useful for userspace tools
such as powerpc-utils. Currently these tools infer the
"power_mode_data" value in the lparcfg, which in turn is obtained from
the to-be-deprecated H_GET_EM_PARMS H_CALL.
On future platforms, such userspace utilities will have to look at the
data returned from the new H_CALL being populated in this new sysfs
interface and report this information directly without the need of
interpretation.
Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
---
.../sysfs-firmware-papr-energy-scale-info | 26 ++
arch/powerpc/include/asm/hvcall.h | 24 +-
arch/powerpc/kvm/trace_hv.h | 1 +
arch/powerpc/platforms/pseries/Makefile | 3 +-
.../pseries/papr_platform_attributes.c | 312 ++++++++++++++++++
5 files changed, 364 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c
diff --git a/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
new file mode 100644
index 000000000000..139a576c7c9d
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
@@ -0,0 +1,26 @@
+What: /sys/firmware/papr/energy_scale_info
+Date: June 2021
+Contact: Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
+Description: Directory hosting a set of platform attributes like
+ energy/frequency on Linux running as a PAPR guest.
+
+ Each file in a directory contains a platform
+ attribute hierarchy pertaining to performance/
+ energy-savings mode and processor frequency.
+
+What: /sys/firmware/papr/energy_scale_info/<id>
+ /sys/firmware/papr/energy_scale_info/<id>/desc
+ /sys/firmware/papr/energy_scale_info/<id>/value
+ /sys/firmware/papr/energy_scale_info/<id>/value_desc
+Date: June 2021
+Contact: Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
+Description: Energy, frequency attributes directory for POWERVM servers
+
+ This directory provides energy, frequency, folding information. It
+ contains below sysfs attributes:
+
+ - desc: String description of the attribute <id>
+
+ - value: Numeric value of attribute <id>
+
+ - value_desc: String value of attribute <id>
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 9bcf345cb208..38980fef7a3d 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -323,7 +323,8 @@
#define H_SCM_PERFORMANCE_STATS 0x418
#define H_RPT_INVALIDATE 0x448
#define H_SCM_FLUSH 0x44C
-#define MAX_HCALL_OPCODE H_SCM_FLUSH
+#define H_GET_ENERGY_SCALE_INFO 0x450
+#define MAX_HCALL_OPCODE H_GET_ENERGY_SCALE_INFO
/* Scope args for H_SCM_UNBIND_ALL */
#define H_UNBIND_SCOPE_ALL (0x1)
@@ -641,6 +642,27 @@ struct hv_gpci_request_buffer {
uint8_t bytes[HGPCI_MAX_DATA_BYTES];
} __packed;
+#define ESI_VERSION 0x1
+#define MAX_ESI_ATTRS 10
+#define MAX_BUF_SZ (sizeof(struct h_energy_scale_info_hdr) + \
+ (sizeof(struct energy_scale_attribute) * MAX_ESI_ATTRS))
+
+struct energy_scale_attribute {
+ __be64 id;
+ __be64 value;
+ unsigned char desc[64];
+ unsigned char value_desc[64];
+} __packed;
+
+struct h_energy_scale_info_hdr {
+ __be64 num_attrs;
+ __be64 array_offset;
+ __u8 data_header_version;
+} __packed;
+
+/* /sys/firmware/papr */
+extern struct kobject *papr_kobj;
+
#endif /* __ASSEMBLY__ */
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_HVCALL_H */
diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
index 830a126e095d..38cd0ed0a617 100644
--- a/arch/powerpc/kvm/trace_hv.h
+++ b/arch/powerpc/kvm/trace_hv.h
@@ -115,6 +115,7 @@
{H_VASI_STATE, "H_VASI_STATE"}, \
{H_ENABLE_CRQ, "H_ENABLE_CRQ"}, \
{H_GET_EM_PARMS, "H_GET_EM_PARMS"}, \
+ {H_GET_ENERGY_SCALE_INFO, "H_GET_ENERGY_SCALE_INFO"}, \
{H_SET_MPP, "H_SET_MPP"}, \
{H_GET_MPP, "H_GET_MPP"}, \
{H_HOME_NODE_ASSOCIATIVITY, "H_HOME_NODE_ASSOCIATIVITY"}, \
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 4cda0ef87be0..c4c19f6a5975 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -6,7 +6,8 @@ obj-y := lpar.o hvCall.o nvram.o reconfig.o \
of_helpers.o \
setup.o iommu.o event_sources.o ras.o \
firmware.o power.o dlpar.o mobility.o rng.o \
- pci.o pci_dlpar.o eeh_pseries.o msi.o
+ pci.o pci_dlpar.o eeh_pseries.o msi.o \
+ papr_platform_attributes.o
obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_SCANLOG) += scanlog.o
obj-$(CONFIG_KEXEC_CORE) += kexec.o
diff --git a/arch/powerpc/platforms/pseries/papr_platform_attributes.c b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
new file mode 100644
index 000000000000..84ddce52e519
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/papr_platform_attributes.c
@@ -0,0 +1,312 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Platform energy and frequency attributes driver
+ *
+ * This driver creates a sys file at /sys/firmware/papr/ which encapsulates a
+ * directory structure containing files in keyword - value pairs that specify
+ * energy and frequency configuration of the system.
+ *
+ * The format of exposing the sysfs information is as follows:
+ * /sys/firmware/papr/energy_scale_info/
+ * |-- <id>/
+ * |-- desc
+ * |-- value
+ * |-- value_desc (if exists)
+ * |-- <id>/
+ * |-- desc
+ * |-- value
+ * |-- value_desc (if exists)
+ *
+ * Copyright 2021 IBM Corp.
+ */
+
+#include <asm/hvcall.h>
+#include <asm/machdep.h>
+
+#include "pseries.h"
+
+/*
+ * Flag attributes to fetch either all or one attribute from the HCALL
+ * flag = BE(0) => fetch all attributes with firstAttributeId = 0
+ * flag = BE(1) => fetch a single attribute with firstAttributeId = id
+ */
+#define ESI_FLAGS_ALL 0
+#define ESI_FLAGS_SINGLE PPC_BIT(0)
+
+#define MAX_ATTRS 3
+
+struct papr_attr {
+ u64 id;
+ struct kobj_attribute kobj_attr;
+};
+struct papr_group {
+ struct attribute_group pg;
+ struct papr_attr pgattrs[MAX_ATTRS];
+} *pgs;
+
+/* /sys/firmware/papr */
+struct kobject *papr_kobj;
+/* /sys/firmware/papr/energy_scale_info */
+struct kobject *esi_kobj;
+
+/*
+ * Extract and export the description of the energy scale attributes
+ */
+static ssize_t papr_show_desc(struct kobject *kobj,
+ struct kobj_attribute *kobj_attr,
+ char *buf)
+{
+ struct papr_attr *pattr = container_of(kobj_attr, struct papr_attr,
+ kobj_attr);
+ struct h_energy_scale_info_hdr *t_hdr;
+ struct energy_scale_attribute *t_esi;
+ char *t_buf;
+ int ret = 0;
+
+ t_buf = kmalloc(MAX_BUF_SZ, GFP_KERNEL);
+ if (t_buf == NULL)
+ return -ENOMEM;
+
+ ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, ESI_FLAGS_SINGLE,
+ pattr->id, virt_to_phys(t_buf),
+ MAX_BUF_SZ);
+
+ if (ret != H_SUCCESS) {
+ pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
+ goto out;
+ }
+
+ t_hdr = (struct h_energy_scale_info_hdr *) t_buf;
+ t_esi = (struct energy_scale_attribute *)
+ (t_buf + be64_to_cpu(t_hdr->array_offset));
+
+ ret = snprintf(buf, sizeof(t_esi->desc), "%s\n", t_esi->desc);
+ if (ret < 0)
+ ret = -EIO;
+out:
+ kfree(t_buf);
+
+ return ret;
+}
+
+/*
+ * Extract and export the numeric value of the energy scale attributes
+ */
+static ssize_t papr_show_value(struct kobject *kobj,
+ struct kobj_attribute *kobj_attr,
+ char *buf)
+{
+ struct papr_attr *pattr = container_of(kobj_attr, struct papr_attr,
+ kobj_attr);
+ struct h_energy_scale_info_hdr *t_hdr;
+ struct energy_scale_attribute *t_esi;
+ char *t_buf;
+ int ret = 0;
+
+ t_buf = kmalloc(MAX_BUF_SZ, GFP_KERNEL);
+ if (t_buf == NULL)
+ return -ENOMEM;
+
+ ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, ESI_FLAGS_SINGLE,
+ pattr->id, virt_to_phys(t_buf),
+ MAX_BUF_SZ);
+
+ if (ret != H_SUCCESS) {
+ pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
+ goto out;
+ }
+
+ t_hdr = (struct h_energy_scale_info_hdr *) t_buf;
+ t_esi = (struct energy_scale_attribute *)
+ (t_buf + be64_to_cpu(t_hdr->array_offset));
+
+ ret = snprintf(buf, sizeof(t_esi->value), "%llu\n",
+ be64_to_cpu(t_esi->value));
+ if (ret < 0)
+ ret = -EIO;
+out:
+ kfree(t_buf);
+
+ return ret;
+}
+
+/*
+ * Extract and export the value description in string format of the energy
+ * scale attributes
+ */
+static ssize_t papr_show_value_desc(struct kobject *kobj,
+ struct kobj_attribute *kobj_attr,
+ char *buf)
+{
+ struct papr_attr *pattr = container_of(kobj_attr, struct papr_attr,
+ kobj_attr);
+ struct h_energy_scale_info_hdr *t_hdr;
+ struct energy_scale_attribute *t_esi;
+ char *t_buf;
+ int ret = 0;
+
+ t_buf = kmalloc(MAX_BUF_SZ, GFP_KERNEL);
+ if (t_buf == NULL)
+ return -ENOMEM;
+
+ ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, ESI_FLAGS_SINGLE,
+ pattr->id, virt_to_phys(t_buf),
+ MAX_BUF_SZ);
+
+ if (ret != H_SUCCESS) {
+ pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
+ goto out;
+ }
+
+ t_hdr = (struct h_energy_scale_info_hdr *) t_buf;
+ t_esi = (struct energy_scale_attribute *)
+ (t_buf + be64_to_cpu(t_hdr->array_offset));
+
+ ret = snprintf(buf, sizeof(t_esi->value_desc), "%s\n",
+ t_esi->value_desc);
+ if (ret < 0)
+ ret = -EIO;
+out:
+ kfree(t_buf);
+
+ return ret;
+}
+
+static struct papr_ops_info {
+ const char *attr_name;
+ ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *kobj_attr,
+ char *buf);
+} ops_info[MAX_ATTRS] = {
+ { "desc", papr_show_desc },
+ { "value", papr_show_value },
+ { "value_desc", papr_show_value_desc },
+};
+
+static void add_attr(u64 id, int index, struct papr_attr *attr)
+{
+ attr->id = id;
+ sysfs_attr_init(&attr->kobj_attr.attr);
+ attr->kobj_attr.attr.name = ops_info[index].attr_name;
+ attr->kobj_attr.attr.mode = 0444;
+ attr->kobj_attr.show = ops_info[index].show;
+}
+
+static int add_attr_group(u64 id, struct papr_group *pg, bool show_val_desc)
+{
+ int i;
+
+ for (i = 0; i < MAX_ATTRS; i++) {
+ if (!strcmp(ops_info[i].attr_name, "value_desc") &&
+ !show_val_desc) {
+ continue;
+ }
+ add_attr(id, i, &pg->pgattrs[i]);
+ pg->pg.attrs[i] = &pg->pgattrs[i].kobj_attr.attr;
+ }
+
+ return sysfs_create_group(esi_kobj, &pg->pg);
+}
+
+static int __init papr_init(void)
+{
+ struct h_energy_scale_info_hdr *esi_hdr;
+ struct energy_scale_attribute *esi_attrs;
+ uint64_t num_attrs;
+ int ret, idx, i;
+ char *esi_buf;
+
+ if (!firmware_has_feature(FW_FEATURE_LPAR))
+ return -ENXIO;
+
+ esi_buf = kmalloc(MAX_BUF_SZ, GFP_KERNEL);
+ if (esi_buf == NULL)
+ return -ENOMEM;
+ /*
+ * hcall(
+ * uint64 H_GET_ENERGY_SCALE_INFO, // Get energy scale info
+ * uint64 flags, // Per the flag request
+ * uint64 firstAttributeId, // The attribute id
+ * uint64 bufferAddress, // Guest physical address of the output buffer
+ * uint64 bufferSize); // The size in bytes of the output buffer
+ */
+ ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, ESI_FLAGS_ALL, 0,
+ virt_to_phys(esi_buf), MAX_BUF_SZ);
+ if (ret != H_SUCCESS) {
+ pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
+ goto out;
+ }
+
+ esi_hdr = (struct h_energy_scale_info_hdr *) esi_buf;
+ if (esi_hdr->data_header_version != ESI_VERSION) {
+ pr_warn("H_GET_ENERGY_SCALE_INFO VER MISMATCH - EXP: 0x%x, REC: 0x%x",
+ ESI_VERSION, esi_hdr->data_header_version);
+ }
+
+ num_attrs = be64_to_cpu(esi_hdr->num_attrs);
+ esi_attrs = (struct energy_scale_attribute *)
+ (esi_buf + be64_to_cpu(esi_hdr->array_offset));
+
+ pgs = kcalloc(num_attrs, sizeof(*pgs), GFP_KERNEL);
+ if (!pgs)
+ goto out;
+
+ papr_kobj = kobject_create_and_add("papr", firmware_kobj);
+ if (!papr_kobj) {
+ pr_warn("kobject_create_and_add papr failed\n");
+ goto out_pgs;
+ }
+
+ esi_kobj = kobject_create_and_add("energy_scale_info", papr_kobj);
+ if (!esi_kobj) {
+ pr_warn("kobject_create_and_add energy_scale_info failed\n");
+ goto out_kobj;
+ }
+
+ for (idx = 0; idx < num_attrs; idx++) {
+ bool show_val_desc = true;
+
+ pgs[idx].pg.attrs = kcalloc(MAX_ATTRS + 1,
+ sizeof(*pgs[idx].pg.attrs),
+ GFP_KERNEL);
+ if (!pgs[idx].pg.attrs) {
+ goto out_pgattrs;
+ }
+
+ pgs[idx].pg.name = kasprintf(GFP_KERNEL, "%lld",
+ be64_to_cpu(esi_attrs[idx].id));
+ if (pgs[idx].pg.name == NULL) {
+ goto out_pgattrs;
+ }
+ /* Do not add the value description if it does not exist */
+ if (strnlen(esi_attrs[idx].value_desc,
+ sizeof(esi_attrs[idx].value_desc)) == 0)
+ show_val_desc = false;
+
+ if (add_attr_group(be64_to_cpu(esi_attrs[idx].id), &pgs[idx],
+ show_val_desc)) {
+ pr_warn("Failed to create papr attribute group %s\n",
+ pgs[idx].pg.name);
+ goto out_pgattrs;
+ }
+ }
+
+ kfree(esi_buf);
+ return 0;
+
+out_pgattrs:
+ for (i = 0; i < idx ; i++) {
+ kfree(pgs[i].pg.attrs);
+ kfree(pgs[i].pg.name);
+ }
+ kobject_put(esi_kobj);
+out_kobj:
+ kobject_put(papr_kobj);
+out_pgs:
+ kfree(pgs);
+out:
+ kfree(esi_buf);
+
+ return -ENOMEM;
+}
+
+machine_device_initcall(pseries, papr_init);
--
2.31.1
^ permalink raw reply related
* Re: [PATCH v4 4/8] PCI: replace pci_dev::driver usage that gets the driver name
From: Uwe Kleine-König @ 2021-09-28 10:31 UTC (permalink / raw)
To: Simon Horman
Cc: Alexander Duyck, oss-drivers, Paul Mackerras, Herbert Xu,
Ido Schimmel, Rafał Miłecki, Jesse Brandeburg,
Bjorn Helgaas, linux-pci, Jakub Kicinski, Yisen Zhuang,
Uwe Kleine-König, Vadym Kochan, Michael Buesch, Jiri Pirko,
Salil Mehta, netdev, linux-wireless, linux-kernel, Taras Chornyi,
Zhou Wang, linux-crypto, kernel, Oliver O'Halloran,
linuxppc-dev, David S. Miller
In-Reply-To: <20210928100127.GA16801@corigine.com>
[-- Attachment #1: Type: text/plain, Size: 1983 bytes --]
On Tue, Sep 28, 2021 at 12:01:28PM +0200, Simon Horman wrote:
> On Mon, Sep 27, 2021 at 10:43:22PM +0200, Uwe Kleine-König wrote:
> > From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> >
> > struct pci_dev::driver holds (apart from a constant offset) the same
> > data as struct pci_dev::dev->driver. With the goal to remove struct
> > pci_dev::driver to get rid of data duplication replace getting the
> > driver name by dev_driver_string() which implicitly makes use of struct
> > pci_dev::dev->driver.
> >
> > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
>
> ...
>
> > diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
> > index 0685ece1f155..23dfb599c828 100644
> > --- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
> > +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
> > @@ -202,7 +202,7 @@ nfp_get_drvinfo(struct nfp_app *app, struct pci_dev *pdev,
> > {
> > char nsp_version[ETHTOOL_FWVERS_LEN] = {};
> >
> > - strlcpy(drvinfo->driver, pdev->driver->name, sizeof(drvinfo->driver));
> > + strlcpy(drvinfo->driver, dev_driver_string(&pdev->dev), sizeof(drvinfo->driver));
>
> I'd slightly prefer to maintain lines under 80 columns wide.
> But not nearly strongly enough to engage in a long debate about it.
:-)
Looking at the output of
git grep strlcpy.\*sizeof
I wonder if it would be sensible to introduce something like
#define strlcpy_array(arr, src) (strlcpy(arr, src, sizeof(arr)) + __must_be_array(arr))
but I'm not sure this is possible without a long debate either (and this
line is over 80 chars wide, too :-).
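To make the idea concrete outside the kernel tree, here is a userspace sketch
of such a macro. It is an approximation under assumptions, not kernel code:
sketch_strlcpy() and SKETCH_MUST_BE_ARRAY() are hypothetical stand-ins for
the kernel's strlcpy() and __must_be_array(), and the array check relies on
GCC/Clang extensions (__typeof__, __builtin_types_compatible_p and a struct
with only an unnamed bit-field).

```c
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for strlcpy(): copy at most size - 1 bytes, always
 * NUL-terminate when size > 0, and return strlen(src).
 */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/*
 * Evaluates to 0 when a is a real array. When a is a pointer, its type
 * matches &(a)[0], the bit-field width becomes -1, and the build fails.
 */
#define SKETCH_MUST_BE_ARRAY(a) \
	(sizeof(struct { int : -!!__builtin_types_compatible_p( \
		__typeof__(a), __typeof__(&(a)[0])); }))

/* The destination size is taken from the array itself. */
#define strlcpy_array(arr, src) \
	(sketch_strlcpy(arr, src, sizeof(arr)) + SKETCH_MUST_BE_ARRAY(arr))
```

With this, passing a char pointer instead of a char array as the destination
is rejected at compile time, which is the mistake the explicit sizeof()
argument cannot catch.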
> In any case, for the NFP portion of this patch.
>
> Acked-by: Simon Horman <simon.horman@corigine.com>
Thanks
Uwe
--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | https://www.pengutronix.de/ |
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Re: [next-20210827][ppc][multipathd] INFO: task hung in dm_table_add_target
From: Abdul Haleem @ 2021-09-28 10:23 UTC (permalink / raw)
To: Christoph Hellwig
Cc: axboe, sachinp, jack, linux-scsi, linux-kernel, dm-devel,
linux-next, dougmill, Brian King, linuxppc-dev
In-Reply-To: <20210901133618.GA16687@lst.de>
[-- Attachment #1: Type: text/plain, Size: 586 bytes --]
On 9/1/21 7:06 PM, Christoph Hellwig wrote:
> On Wed, Sep 01, 2021 at 04:47:26PM +0530, Abdul Haleem wrote:
>> Greeting's
>>
>> multiple task hung while adding the vfc disk back to the multipath on my
>> powerpc box running linux-next kernel
> Can you retry to reproduce this with lockdep enabled to see if there
> is anything interesting holding this lock?
LOCKDEP was earlier enabled by default
# cat .config | grep LOCKDEP
CONFIG_LOCKDEP_SUPPORT=y
BTW, I recreated this again on the 5.15.0-rc2 mainline kernel and am attaching the logs.
--
Regard's
Abdul Haleem
IBM Linux Technology Center
[-- Attachment #2: hungtasklogs.txt --]
[-- Type: text/plain, Size: 26095 bytes --]
device-mapper: multipath: 253:1: Reinstating path 8:16.
device-mapper: multipath: 253:0: Failing path 8:0.
device-mapper: multipath: 253:0: Failing path 8:32.
device-mapper: multipath: 253:0: Failing path 8:192.
device-mapper: multipath: 253:0: Failing path 8:208.
device-mapper: multipath: 253:0: Reinstating path 8:0.
device-mapper: multipath: 253:0: Reinstating path 8:32.
device-mapper: multipath: 253:0: Reinstating path 8:192.
device-mapper: multipath: 253:0: Reinstating path 8:208.
INFO: task multipathd:881519 blocked for more than 122 seconds.
Not tainted 5.15.0-rc2+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:multipathd state:D stack: 0 pid:881519 ppid: 1 flags:0x00040082
Call Trace:
[c000000096eff2b0] [c0000000ae18dd10] 0xc0000000ae18dd10 (unreliable)
[c000000096eff4a0] [c00000000001ea68] __switch_to+0x288/0x4a0
[c000000096eff500] [c000000000e07bfc] __schedule+0x30c/0x9f0
[c000000096eff5c0] [c000000000e08348] schedule+0x68/0x120
[c000000096eff5f0] [c000000000e08930] schedule_preempt_disabled+0x20/0x30
[c000000096eff610] [c000000000e0aedc] __mutex_lock.isra.11+0x36c/0x700
[c000000096eff6a0] [c000000000788e0c] bd_link_disk_holder+0x3c/0x280
[c000000096eff6f0] [c008000000fb5848] dm_get_table_device+0x1f0/0x2d0 [dm_mod]
[c000000096eff790] [c008000000fb9ce8] dm_get_device+0x130/0x2f0 [dm_mod]
[c000000096eff840] [c0080000011553b4] multipath_ctr+0x9cc/0x1000 [dm_multipath]
[c000000096eff9c0] [c008000000fba704] dm_table_add_target+0x1ac/0x420 [dm_mod]
[c000000096effa80] [c008000000fc0a04] table_load+0x16c/0x4c0 [dm_mod]
[c000000096effb30] [c008000000fc3734] ctl_ioctl+0x28c/0x7e0 [dm_mod]
[c000000096effd40] [c008000000fc3ca8] dm_ctl_ioctl+0x20/0x40 [dm_mod]
[c000000096effd60] [c000000000545db8] sys_ioctl+0xf8/0x150
[c000000096effdb0] [c000000000031074] system_call_exception+0x174/0x370
[c000000096effe10] [c00000000000c74c] system_call_common+0xec/0x250
--- interrupt: c00 at 0x7fffb86ac010
NIP: 00007fffb86ac010 LR: 00007fffb8a86924 CTR: 0000000000000000
REGS: c000000096effe80 TRAP: 0c00 Not tainted (5.15.0-rc2+)
MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 24042204 XER: 00000000
IRQMASK: 0
GPR00: 0000000000000036 00007fffb7cec3a0 00007fffb8797300 0000000000000005
GPR04: 00000000c138fd09 00007fffb0069c90 00007fffb8a8a118 00007fffb7cea298
GPR08: 0000000000000005 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fffb7cf6300 00007fffb0069c90 00007fffb8a89e80
GPR16: 00007fffb8a89e80 00007fffb8a89e80 00007fffb8ac3670 0000000000000000
GPR20: 00007fffb8ac2040 00007fffb8a93460 00007fffb0069cc0 000001001f65ab80
GPR24: 00007fffb8a89e80 00007fffb8a89e80 00007fffb8a89e80 0000000000000000
GPR28: 00007fffb8a89e80 00007fffb8a89e80 0000000000000000 00007fffb8a89e80
NIP [00007fffb86ac010] 0x7fffb86ac010
LR [00007fffb8a86924] 0x7fffb8a86924
--- interrupt: c00
INFO: task systemd-udevd:881738 blocked for more than 122 seconds.
Not tainted 5.15.0-rc2+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:systemd-udevd state:D stack: 0 pid:881738 ppid: 708 flags:0x00042482
Call Trace:
[c0000006b317b280] [c0000000007640a4] bio_associate_blkg+0x44/0xb0 (unreliable)
[c0000006b317b470] [c00000000001ea68] __switch_to+0x288/0x4a0
[c0000006b317b4d0] [c000000000e07bfc] __schedule+0x30c/0x9f0
[c0000006b317b590] [c000000000e08348] schedule+0x68/0x120
[c0000006b317b5c0] [c000000000e08adc] io_schedule+0x2c/0x50
[c0000006b317b5f0] [c0000000003ea624] __lock_page+0x1e4/0x430
[c0000006b317b6d0] [c000000000407fc8] truncate_inode_pages_range+0x338/0x8b0
[c0000006b317b850] [c000000000725714] kill_bdev.isra.14+0x44/0x60
[c0000006b317b880] [c0000000007261f4] blkdev_flush_mapping+0x54/0x260
[c0000006b317b960] [c000000000726488] blkdev_put_whole+0x88/0x90
[c0000006b317b9a0] [c00000000072714c] blkdev_put+0x1cc/0x280
[c0000006b317ba00] [c000000000727e9c] blkdev_close+0x3c/0x60
[c0000006b317ba30] [c000000000525694] __fput+0xc4/0x350
[c0000006b317ba80] [c000000000191128] task_work_run+0xf8/0x170
[c0000006b317bad0] [c000000000161c34] do_exit+0x4a4/0xd30
[c0000006b317bba0] [c000000000162594] do_group_exit+0x64/0xe0
[c0000006b317bbe0] [c000000000177fb8] get_signal+0x258/0xce0
[c0000006b317bcd0] [c0000000000219d4] do_notify_resume+0x114/0x480
[c0000006b317bd80] [c000000000030e40] interrupt_exit_user_prepare_main+0x1a0/0x260
[c0000006b317bde0] [c0000000000312e0] syscall_exit_prepare+0x70/0x150
[c0000006b317be10] [c00000000000c758] system_call_common+0xf8/0x250
--- interrupt: c00 at 0x7fff8c6c8674
NIP: 00007fff8c6c8674 LR: 00007fff8c757430 CTR: 0000000000000000
REGS: c0000006b317be80 TRAP: 0c00 Not tainted (5.15.0-rc2+)
MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 28002242 XER: 00000000
IRQMASK: 0
GPR00: 0000000000000003 00007fffe0bff620 00007fff8c6f7f00 0000000000010000
GPR04: 00007fff89980038 0000000000040000 00007fff8c6a0190 00007fff8c6a01a0
GPR08: 00007fff8c6a0160 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fff8cc46480 0000000000000000 0000000000bab10c
GPR16: 0cb1ba0000000000 00007fff8c79b700 0000000000000000 0000000000000002
GPR20: 0000000000040000 000001002710e800 00000100270680f8 00007fff8c798d28
GPR24: 00007fff8c7c03a4 00007fff8c7c03a4 00007fff89980028 000001002710e850
GPR28: 0000000000040000 0000000c7ff80000 00007fff89980010 000001002710e800
NIP [00007fff8c6c8674] 0x7fff8c6c8674
LR [00007fff8c757430] 0x7fff8c757430
--- interrupt: c00
INFO: task multipathd:881519 blocked for more than 245 seconds.
Not tainted 5.15.0-rc2+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:multipathd state:D stack: 0 pid:881519 ppid: 1 flags:0x00040082
Call Trace:
[c000000096eff2b0] [c0000000ae18dd10] 0xc0000000ae18dd10 (unreliable)
[c000000096eff4a0] [c00000000001ea68] __switch_to+0x288/0x4a0
[c000000096eff500] [c000000000e07bfc] __schedule+0x30c/0x9f0
[c000000096eff5c0] [c000000000e08348] schedule+0x68/0x120
[c000000096eff5f0] [c000000000e08930] schedule_preempt_disabled+0x20/0x30
[c000000096eff610] [c000000000e0aedc] __mutex_lock.isra.11+0x36c/0x700
[c000000096eff6a0] [c000000000788e0c] bd_link_disk_holder+0x3c/0x280
[c000000096eff6f0] [c008000000fb5848] dm_get_table_device+0x1f0/0x2d0 [dm_mod]
[c000000096eff790] [c008000000fb9ce8] dm_get_device+0x130/0x2f0 [dm_mod]
[c000000096eff840] [c0080000011553b4] multipath_ctr+0x9cc/0x1000 [dm_multipath]
[c000000096eff9c0] [c008000000fba704] dm_table_add_target+0x1ac/0x420 [dm_mod]
[c000000096effa80] [c008000000fc0a04] table_load+0x16c/0x4c0 [dm_mod]
[c000000096effb30] [c008000000fc3734] ctl_ioctl+0x28c/0x7e0 [dm_mod]
[c000000096effd40] [c008000000fc3ca8] dm_ctl_ioctl+0x20/0x40 [dm_mod]
[c000000096effd60] [c000000000545db8] sys_ioctl+0xf8/0x150
[c000000096effdb0] [c000000000031074] system_call_exception+0x174/0x370
[c000000096effe10] [c00000000000c74c] system_call_common+0xec/0x250
--- interrupt: c00 at 0x7fffb86ac010
NIP: 00007fffb86ac010 LR: 00007fffb8a86924 CTR: 0000000000000000
REGS: c000000096effe80 TRAP: 0c00 Not tainted (5.15.0-rc2+)
MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 24042204 XER: 00000000
IRQMASK: 0
GPR00: 0000000000000036 00007fffb7cec3a0 00007fffb8797300 0000000000000005
GPR04: 00000000c138fd09 00007fffb0069c90 00007fffb8a8a118 00007fffb7cea298
GPR08: 0000000000000005 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fffb7cf6300 00007fffb0069c90 00007fffb8a89e80
GPR16: 00007fffb8a89e80 00007fffb8a89e80 00007fffb8ac3670 0000000000000000
GPR20: 00007fffb8ac2040 00007fffb8a93460 00007fffb0069cc0 000001001f65ab80
GPR24: 00007fffb8a89e80 00007fffb8a89e80 00007fffb8a89e80 0000000000000000
GPR28: 00007fffb8a89e80 00007fffb8a89e80 0000000000000000 00007fffb8a89e80
NIP [00007fffb86ac010] 0x7fffb86ac010
LR [00007fffb8a86924] 0x7fffb8a86924
--- interrupt: c00
INFO: task systemd-udevd:881738 blocked for more than 245 seconds.
Not tainted 5.15.0-rc2+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:systemd-udevd state:D stack: 0 pid:881738 ppid: 708 flags:0x00042482
Call Trace:
[c0000006b317b280] [c0000000007640a4] bio_associate_blkg+0x44/0xb0 (unreliable)
[c0000006b317b470] [c00000000001ea68] __switch_to+0x288/0x4a0
[c0000006b317b4d0] [c000000000e07bfc] __schedule+0x30c/0x9f0
[c0000006b317b590] [c000000000e08348] schedule+0x68/0x120
[c0000006b317b5c0] [c000000000e08adc] io_schedule+0x2c/0x50
[c0000006b317b5f0] [c0000000003ea624] __lock_page+0x1e4/0x430
[c0000006b317b6d0] [c000000000407fc8] truncate_inode_pages_range+0x338/0x8b0
[c0000006b317b850] [c000000000725714] kill_bdev.isra.14+0x44/0x60
[c0000006b317b880] [c0000000007261f4] blkdev_flush_mapping+0x54/0x260
[c0000006b317b960] [c000000000726488] blkdev_put_whole+0x88/0x90
[c0000006b317b9a0] [c00000000072714c] blkdev_put+0x1cc/0x280
[c0000006b317ba00] [c000000000727e9c] blkdev_close+0x3c/0x60
[c0000006b317ba30] [c000000000525694] __fput+0xc4/0x350
[c0000006b317ba80] [c000000000191128] task_work_run+0xf8/0x170
[c0000006b317bad0] [c000000000161c34] do_exit+0x4a4/0xd30
[c0000006b317bba0] [c000000000162594] do_group_exit+0x64/0xe0
[c0000006b317bbe0] [c000000000177fb8] get_signal+0x258/0xce0
[c0000006b317bcd0] [c0000000000219d4] do_notify_resume+0x114/0x480
[c0000006b317bd80] [c000000000030e40] interrupt_exit_user_prepare_main+0x1a0/0x260
[c0000006b317bde0] [c0000000000312e0] syscall_exit_prepare+0x70/0x150
[c0000006b317be10] [c00000000000c758] system_call_common+0xf8/0x250
--- interrupt: c00 at 0x7fff8c6c8674
NIP: 00007fff8c6c8674 LR: 00007fff8c757430 CTR: 0000000000000000
REGS: c0000006b317be80 TRAP: 0c00 Not tainted (5.15.0-rc2+)
MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 28002242 XER: 00000000
IRQMASK: 0
GPR00: 0000000000000003 00007fffe0bff620 00007fff8c6f7f00 0000000000010000
GPR04: 00007fff89980038 0000000000040000 00007fff8c6a0190 00007fff8c6a01a0
GPR08: 00007fff8c6a0160 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fff8cc46480 0000000000000000 0000000000bab10c
GPR16: 0cb1ba0000000000 00007fff8c79b700 0000000000000000 0000000000000002
GPR20: 0000000000040000 000001002710e800 00000100270680f8 00007fff8c798d28
GPR24: 00007fff8c7c03a4 00007fff8c7c03a4 00007fff89980028 000001002710e850
GPR28: 0000000000040000 0000000c7ff80000 00007fff89980010 000001002710e800
NIP [00007fff8c6c8674] 0x7fff8c6c8674
LR [00007fff8c757430] 0x7fff8c757430
--- interrupt: c00
INFO: task multipathd:881519 blocked for more than 368 seconds.
Not tainted 5.15.0-rc2+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:multipathd state:D stack: 0 pid:881519 ppid: 1 flags:0x00040082
Call Trace:
[c000000096eff2b0] [c0000000ae18dd10] 0xc0000000ae18dd10 (unreliable)
[c000000096eff4a0] [c00000000001ea68] __switch_to+0x288/0x4a0
[c000000096eff500] [c000000000e07bfc] __schedule+0x30c/0x9f0
[c000000096eff5c0] [c000000000e08348] schedule+0x68/0x120
[c000000096eff5f0] [c000000000e08930] schedule_preempt_disabled+0x20/0x30
[c000000096eff610] [c000000000e0aedc] __mutex_lock.isra.11+0x36c/0x700
[c000000096eff6a0] [c000000000788e0c] bd_link_disk_holder+0x3c/0x280
[c000000096eff6f0] [c008000000fb5848] dm_get_table_device+0x1f0/0x2d0 [dm_mod]
[c000000096eff790] [c008000000fb9ce8] dm_get_device+0x130/0x2f0 [dm_mod]
[c000000096eff840] [c0080000011553b4] multipath_ctr+0x9cc/0x1000 [dm_multipath]
[c000000096eff9c0] [c008000000fba704] dm_table_add_target+0x1ac/0x420 [dm_mod]
[c000000096effa80] [c008000000fc0a04] table_load+0x16c/0x4c0 [dm_mod]
[c000000096effb30] [c008000000fc3734] ctl_ioctl+0x28c/0x7e0 [dm_mod]
[c000000096effd40] [c008000000fc3ca8] dm_ctl_ioctl+0x20/0x40 [dm_mod]
[c000000096effd60] [c000000000545db8] sys_ioctl+0xf8/0x150
[c000000096effdb0] [c000000000031074] system_call_exception+0x174/0x370
[c000000096effe10] [c00000000000c74c] system_call_common+0xec/0x250
--- interrupt: c00 at 0x7fffb86ac010
NIP: 00007fffb86ac010 LR: 00007fffb8a86924 CTR: 0000000000000000
REGS: c000000096effe80 TRAP: 0c00 Not tainted (5.15.0-rc2+)
MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 24042204 XER: 00000000
IRQMASK: 0
GPR00: 0000000000000036 00007fffb7cec3a0 00007fffb8797300 0000000000000005
GPR04: 00000000c138fd09 00007fffb0069c90 00007fffb8a8a118 00007fffb7cea298
GPR08: 0000000000000005 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fffb7cf6300 00007fffb0069c90 00007fffb8a89e80
GPR16: 00007fffb8a89e80 00007fffb8a89e80 00007fffb8ac3670 0000000000000000
GPR20: 00007fffb8ac2040 00007fffb8a93460 00007fffb0069cc0 000001001f65ab80
GPR24: 00007fffb8a89e80 00007fffb8a89e80 00007fffb8a89e80 0000000000000000
GPR28: 00007fffb8a89e80 00007fffb8a89e80 0000000000000000 00007fffb8a89e80
NIP [00007fffb86ac010] 0x7fffb86ac010
LR [00007fffb8a86924] 0x7fffb8a86924
--- interrupt: c00
INFO: task systemd-udevd:881738 blocked for more than 368 seconds.
Not tainted 5.15.0-rc2+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:systemd-udevd state:D stack: 0 pid:881738 ppid: 708 flags:0x00042482
Call Trace:
[c0000006b317b280] [c0000000007640a4] bio_associate_blkg+0x44/0xb0 (unreliable)
[c0000006b317b470] [c00000000001ea68] __switch_to+0x288/0x4a0
[c0000006b317b4d0] [c000000000e07bfc] __schedule+0x30c/0x9f0
[c0000006b317b590] [c000000000e08348] schedule+0x68/0x120
[c0000006b317b5c0] [c000000000e08adc] io_schedule+0x2c/0x50
[c0000006b317b5f0] [c0000000003ea624] __lock_page+0x1e4/0x430
[c0000006b317b6d0] [c000000000407fc8] truncate_inode_pages_range+0x338/0x8b0
[c0000006b317b850] [c000000000725714] kill_bdev.isra.14+0x44/0x60
[c0000006b317b880] [c0000000007261f4] blkdev_flush_mapping+0x54/0x260
[c0000006b317b960] [c000000000726488] blkdev_put_whole+0x88/0x90
[c0000006b317b9a0] [c00000000072714c] blkdev_put+0x1cc/0x280
[c0000006b317ba00] [c000000000727e9c] blkdev_close+0x3c/0x60
[c0000006b317ba30] [c000000000525694] __fput+0xc4/0x350
[c0000006b317ba80] [c000000000191128] task_work_run+0xf8/0x170
[c0000006b317bad0] [c000000000161c34] do_exit+0x4a4/0xd30
[c0000006b317bba0] [c000000000162594] do_group_exit+0x64/0xe0
[c0000006b317bbe0] [c000000000177fb8] get_signal+0x258/0xce0
[c0000006b317bcd0] [c0000000000219d4] do_notify_resume+0x114/0x480
[c0000006b317bd80] [c000000000030e40] interrupt_exit_user_prepare_main+0x1a0/0x260
[c0000006b317bde0] [c0000000000312e0] syscall_exit_prepare+0x70/0x150
[c0000006b317be10] [c00000000000c758] system_call_common+0xf8/0x250
--- interrupt: c00 at 0x7fff8c6c8674
NIP: 00007fff8c6c8674 LR: 00007fff8c757430 CTR: 0000000000000000
REGS: c0000006b317be80 TRAP: 0c00 Not tainted (5.15.0-rc2+)
MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 28002242 XER: 00000000
IRQMASK: 0
GPR00: 0000000000000003 00007fffe0bff620 00007fff8c6f7f00 0000000000010000
GPR04: 00007fff89980038 0000000000040000 00007fff8c6a0190 00007fff8c6a01a0
GPR08: 00007fff8c6a0160 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fff8cc46480 0000000000000000 0000000000bab10c
GPR16: 0cb1ba0000000000 00007fff8c79b700 0000000000000000 0000000000000002
GPR20: 0000000000040000 000001002710e800 00000100270680f8 00007fff8c798d28
GPR24: 00007fff8c7c03a4 00007fff8c7c03a4 00007fff89980028 000001002710e850
GPR28: 0000000000040000 0000000c7ff80000 00007fff89980010 000001002710e800
NIP [00007fff8c6c8674] 0x7fff8c6c8674
LR [00007fff8c757430] 0x7fff8c757430
--- interrupt: c00
INFO: task multipathd:881519 blocked for more than 491 seconds.
Not tainted 5.15.0-rc2+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:multipathd state:D stack: 0 pid:881519 ppid: 1 flags:0x00040082
Call Trace:
[c000000096eff2b0] [c0000000ae18dd10] 0xc0000000ae18dd10 (unreliable)
[c000000096eff4a0] [c00000000001ea68] __switch_to+0x288/0x4a0
[c000000096eff500] [c000000000e07bfc] __schedule+0x30c/0x9f0
[c000000096eff5c0] [c000000000e08348] schedule+0x68/0x120
[c000000096eff5f0] [c000000000e08930] schedule_preempt_disabled+0x20/0x30
[c000000096eff610] [c000000000e0aedc] __mutex_lock.isra.11+0x36c/0x700
[c000000096eff6a0] [c000000000788e0c] bd_link_disk_holder+0x3c/0x280
[c000000096eff6f0] [c008000000fb5848] dm_get_table_device+0x1f0/0x2d0 [dm_mod]
[c000000096eff790] [c008000000fb9ce8] dm_get_device+0x130/0x2f0 [dm_mod]
[c000000096eff840] [c0080000011553b4] multipath_ctr+0x9cc/0x1000 [dm_multipath]
[c000000096eff9c0] [c008000000fba704] dm_table_add_target+0x1ac/0x420 [dm_mod]
[c000000096effa80] [c008000000fc0a04] table_load+0x16c/0x4c0 [dm_mod]
[c000000096effb30] [c008000000fc3734] ctl_ioctl+0x28c/0x7e0 [dm_mod]
[c000000096effd40] [c008000000fc3ca8] dm_ctl_ioctl+0x20/0x40 [dm_mod]
[c000000096effd60] [c000000000545db8] sys_ioctl+0xf8/0x150
[c000000096effdb0] [c000000000031074] system_call_exception+0x174/0x370
[c000000096effe10] [c00000000000c74c] system_call_common+0xec/0x250
--- interrupt: c00 at 0x7fffb86ac010
NIP: 00007fffb86ac010 LR: 00007fffb8a86924 CTR: 0000000000000000
REGS: c000000096effe80 TRAP: 0c00 Not tainted (5.15.0-rc2+)
MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 24042204 XER: 00000000
IRQMASK: 0
GPR00: 0000000000000036 00007fffb7cec3a0 00007fffb8797300 0000000000000005
GPR04: 00000000c138fd09 00007fffb0069c90 00007fffb8a8a118 00007fffb7cea298
GPR08: 0000000000000005 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fffb7cf6300 00007fffb0069c90 00007fffb8a89e80
GPR16: 00007fffb8a89e80 00007fffb8a89e80 00007fffb8ac3670 0000000000000000
GPR20: 00007fffb8ac2040 00007fffb8a93460 00007fffb0069cc0 000001001f65ab80
GPR24: 00007fffb8a89e80 00007fffb8a89e80 00007fffb8a89e80 0000000000000000
GPR28: 00007fffb8a89e80 00007fffb8a89e80 0000000000000000 00007fffb8a89e80
NIP [00007fffb86ac010] 0x7fffb86ac010
LR [00007fffb8a86924] 0x7fffb8a86924
--- interrupt: c00
INFO: task systemd-udevd:881738 blocked for more than 491 seconds.
Not tainted 5.15.0-rc2+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:systemd-udevd state:D stack: 0 pid:881738 ppid: 708 flags:0x00042482
Call Trace:
[c0000006b317b280] [c0000000007640a4] bio_associate_blkg+0x44/0xb0 (unreliable)
[c0000006b317b470] [c00000000001ea68] __switch_to+0x288/0x4a0
[c0000006b317b4d0] [c000000000e07bfc] __schedule+0x30c/0x9f0
[c0000006b317b590] [c000000000e08348] schedule+0x68/0x120
[c0000006b317b5c0] [c000000000e08adc] io_schedule+0x2c/0x50
[c0000006b317b5f0] [c0000000003ea624] __lock_page+0x1e4/0x430
[c0000006b317b6d0] [c000000000407fc8] truncate_inode_pages_range+0x338/0x8b0
[c0000006b317b850] [c000000000725714] kill_bdev.isra.14+0x44/0x60
[c0000006b317b880] [c0000000007261f4] blkdev_flush_mapping+0x54/0x260
[c0000006b317b960] [c000000000726488] blkdev_put_whole+0x88/0x90
[c0000006b317b9a0] [c00000000072714c] blkdev_put+0x1cc/0x280
[c0000006b317ba00] [c000000000727e9c] blkdev_close+0x3c/0x60
[c0000006b317ba30] [c000000000525694] __fput+0xc4/0x350
[c0000006b317ba80] [c000000000191128] task_work_run+0xf8/0x170
[c0000006b317bad0] [c000000000161c34] do_exit+0x4a4/0xd30
[c0000006b317bba0] [c000000000162594] do_group_exit+0x64/0xe0
[c0000006b317bbe0] [c000000000177fb8] get_signal+0x258/0xce0
[c0000006b317bcd0] [c0000000000219d4] do_notify_resume+0x114/0x480
[c0000006b317bd80] [c000000000030e40] interrupt_exit_user_prepare_main+0x1a0/0x260
[c0000006b317bde0] [c0000000000312e0] syscall_exit_prepare+0x70/0x150
[c0000006b317be10] [c00000000000c758] system_call_common+0xf8/0x250
--- interrupt: c00 at 0x7fff8c6c8674
NIP: 00007fff8c6c8674 LR: 00007fff8c757430 CTR: 0000000000000000
REGS: c0000006b317be80 TRAP: 0c00 Not tainted (5.15.0-rc2+)
MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 28002242 XER: 00000000
IRQMASK: 0
GPR00: 0000000000000003 00007fffe0bff620 00007fff8c6f7f00 0000000000010000
GPR04: 00007fff89980038 0000000000040000 00007fff8c6a0190 00007fff8c6a01a0
GPR08: 00007fff8c6a0160 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fff8cc46480 0000000000000000 0000000000bab10c
GPR16: 0cb1ba0000000000 00007fff8c79b700 0000000000000000 0000000000000002
GPR20: 0000000000040000 000001002710e800 00000100270680f8 00007fff8c798d28
GPR24: 00007fff8c7c03a4 00007fff8c7c03a4 00007fff89980028 000001002710e850
GPR28: 0000000000040000 0000000c7ff80000 00007fff89980010 000001002710e800
NIP [00007fff8c6c8674] 0x7fff8c6c8674
LR [00007fff8c757430] 0x7fff8c757430
--- interrupt: c00
INFO: task multipathd:881519 blocked for more than 614 seconds.
Not tainted 5.15.0-rc2+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:multipathd state:D stack: 0 pid:881519 ppid: 1 flags:0x00040082
Call Trace:
[c000000096eff2b0] [c0000000ae18dd10] 0xc0000000ae18dd10 (unreliable)
[c000000096eff4a0] [c00000000001ea68] __switch_to+0x288/0x4a0
[c000000096eff500] [c000000000e07bfc] __schedule+0x30c/0x9f0
[c000000096eff5c0] [c000000000e08348] schedule+0x68/0x120
[c000000096eff5f0] [c000000000e08930] schedule_preempt_disabled+0x20/0x30
[c000000096eff610] [c000000000e0aedc] __mutex_lock.isra.11+0x36c/0x700
[c000000096eff6a0] [c000000000788e0c] bd_link_disk_holder+0x3c/0x280
[c000000096eff6f0] [c008000000fb5848] dm_get_table_device+0x1f0/0x2d0 [dm_mod]
[c000000096eff790] [c008000000fb9ce8] dm_get_device+0x130/0x2f0 [dm_mod]
[c000000096eff840] [c0080000011553b4] multipath_ctr+0x9cc/0x1000 [dm_multipath]
[c000000096eff9c0] [c008000000fba704] dm_table_add_target+0x1ac/0x420 [dm_mod]
[c000000096effa80] [c008000000fc0a04] table_load+0x16c/0x4c0 [dm_mod]
[c000000096effb30] [c008000000fc3734] ctl_ioctl+0x28c/0x7e0 [dm_mod]
[c000000096effd40] [c008000000fc3ca8] dm_ctl_ioctl+0x20/0x40 [dm_mod]
[c000000096effd60] [c000000000545db8] sys_ioctl+0xf8/0x150
[c000000096effdb0] [c000000000031074] system_call_exception+0x174/0x370
[c000000096effe10] [c00000000000c74c] system_call_common+0xec/0x250
--- interrupt: c00 at 0x7fffb86ac010
NIP: 00007fffb86ac010 LR: 00007fffb8a86924 CTR: 0000000000000000
REGS: c000000096effe80 TRAP: 0c00 Not tainted (5.15.0-rc2+)
MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 24042204 XER: 00000000
IRQMASK: 0
GPR00: 0000000000000036 00007fffb7cec3a0 00007fffb8797300 0000000000000005
GPR04: 00000000c138fd09 00007fffb0069c90 00007fffb8a8a118 00007fffb7cea298
GPR08: 0000000000000005 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fffb7cf6300 00007fffb0069c90 00007fffb8a89e80
GPR16: 00007fffb8a89e80 00007fffb8a89e80 00007fffb8ac3670 0000000000000000
GPR20: 00007fffb8ac2040 00007fffb8a93460 00007fffb0069cc0 000001001f65ab80
GPR24: 00007fffb8a89e80 00007fffb8a89e80 00007fffb8a89e80 0000000000000000
GPR28: 00007fffb8a89e80 00007fffb8a89e80 0000000000000000 00007fffb8a89e80
NIP [00007fffb86ac010] 0x7fffb86ac010
LR [00007fffb8a86924] 0x7fffb8a86924
--- interrupt: c00
INFO: task systemd-udevd:881738 blocked for more than 614 seconds.
Not tainted 5.15.0-rc2+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:systemd-udevd state:D stack: 0 pid:881738 ppid: 708 flags:0x00042482
Call Trace:
[c0000006b317b280] [c0000000007640a4] bio_associate_blkg+0x44/0xb0 (unreliable)
[c0000006b317b470] [c00000000001ea68] __switch_to+0x288/0x4a0
[c0000006b317b4d0] [c000000000e07bfc] __schedule+0x30c/0x9f0
[c0000006b317b590] [c000000000e08348] schedule+0x68/0x120
[c0000006b317b5c0] [c000000000e08adc] io_schedule+0x2c/0x50
[c0000006b317b5f0] [c0000000003ea624] __lock_page+0x1e4/0x430
[c0000006b317b6d0] [c000000000407fc8] truncate_inode_pages_range+0x338/0x8b0
[c0000006b317b850] [c000000000725714] kill_bdev.isra.14+0x44/0x60
[c0000006b317b880] [c0000000007261f4] blkdev_flush_mapping+0x54/0x260
[c0000006b317b960] [c000000000726488] blkdev_put_whole+0x88/0x90
[c0000006b317b9a0] [c00000000072714c] blkdev_put+0x1cc/0x280
[c0000006b317ba00] [c000000000727e9c] blkdev_close+0x3c/0x60
[c0000006b317ba30] [c000000000525694] __fput+0xc4/0x350
[c0000006b317ba80] [c000000000191128] task_work_run+0xf8/0x170
[c0000006b317bad0] [c000000000161c34] do_exit+0x4a4/0xd30
[c0000006b317bba0] [c000000000162594] do_group_exit+0x64/0xe0
[c0000006b317bbe0] [c000000000177fb8] get_signal+0x258/0xce0
[c0000006b317bcd0] [c0000000000219d4] do_notify_resume+0x114/0x480
[c0000006b317bd80] [c000000000030e40] interrupt_exit_user_prepare_main+0x1a0/0x260
[c0000006b317bde0] [c0000000000312e0] syscall_exit_prepare+0x70/0x150
[c0000006b317be10] [c00000000000c758] system_call_common+0xf8/0x250
--- interrupt: c00 at 0x7fff8c6c8674
NIP: 00007fff8c6c8674 LR: 00007fff8c757430 CTR: 0000000000000000
REGS: c0000006b317be80 TRAP: 0c00 Not tainted (5.15.0-rc2+)
MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 28002242 XER: 00000000
IRQMASK: 0
GPR00: 0000000000000003 00007fffe0bff620 00007fff8c6f7f00 0000000000010000
GPR04: 00007fff89980038 0000000000040000 00007fff8c6a0190 00007fff8c6a01a0
GPR08: 00007fff8c6a0160 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fff8cc46480 0000000000000000 0000000000bab10c
GPR16: 0cb1ba0000000000 00007fff8c79b700 0000000000000000 0000000000000002
GPR20: 0000000000040000 000001002710e800 00000100270680f8 00007fff8c798d28
GPR24: 00007fff8c7c03a4 00007fff8c7c03a4 00007fff89980028 000001002710e850
GPR28: 0000000000040000 0000000c7ff80000 00007fff89980010 000001002710e800
NIP [00007fff8c6c8674] 0x7fff8c6c8674
LR [00007fff8c757430] 0x7fff8c757430
--- interrupt: c00
EXT4-fs (dm-5): error count since last fsck: 3
EXT4-fs (dm-5): initial error at time 1632612344: ext4_map_blocks:593: inode 8: block 1081345
EXT4-fs (dm-5): last error at time 1632628929: ext4_map_blocks:593: inode 2: block 9251
EXT4-fs error (device dm-5): ext4_map_blocks:593: inode #2: block 9251: comm scp: lblock 0 mapped to illegal pblock 9251 (length 1)
EXT4-fs (dm-5): error count since last fsck: 4
EXT4-fs (dm-5): initial error at time 1632612344: ext4_map_blocks:593: inode 8: block 1081345
EXT4-fs (dm-5): last error at time 1632724166: ext4_map_blocks:593: inode 2: block 9251
^ permalink raw reply
* [PATCH kernel] powerpc/pseries/dma: Add support for 2M IOMMU page size
From: Alexey Kardashevskiy @ 2021-09-28 10:15 UTC (permalink / raw)
To: linuxppc-dev
Cc: Leonardo Bras, Brian J King, Alexey Kardashevskiy, Travis Pizel,
Frederic Barrat, Leonardo Augusto Guimaraes Garcia,
Murilo Vicentini
The upcoming PAPR spec adds a 2M page size, bit 23 right after the 16G page
size in the "ibm,query-pe-dma-window" call.
This adds support for the new page size. Since the new page size is out
of sorted order, this changes the loop to not assume that shift[] is
sorted.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
This might not work if PHYP keeps rejecting new window requests for less
than 32768 TCEs. This is needed:
https://github.com/aik/linux/commit/8cc8fa5ba5b3b4a18efbc9d81d9e5b85ca7c8a95
---
arch/powerpc/platforms/pseries/iommu.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index c741689a5165..237bf405b0cb 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1159,14 +1159,15 @@ static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
/* Return largest page shift based on "IO Page Sizes" output of ibm,query-pe-dma-window. */
static int iommu_get_page_shift(u32 query_page_size)
{
- /* Supported IO page-sizes according to LoPAR */
+ /* Supported IO page-sizes according to LoPAR, note that 2M is out of order */
const int shift[] = {
__builtin_ctzll(SZ_4K), __builtin_ctzll(SZ_64K), __builtin_ctzll(SZ_16M),
__builtin_ctzll(SZ_32M), __builtin_ctzll(SZ_64M), __builtin_ctzll(SZ_128M),
- __builtin_ctzll(SZ_256M), __builtin_ctzll(SZ_16G)
+ __builtin_ctzll(SZ_256M), __builtin_ctzll(SZ_16G), __builtin_ctzll(SZ_2M)
};
int i = ARRAY_SIZE(shift) - 1;
+ int ret = 0;
/*
* On LoPAR, ibm,query-pe-dma-window outputs "IO Page Sizes" using a bit field:
@@ -1176,11 +1177,10 @@ static int iommu_get_page_shift(u32 query_page_size)
*/
for (; i >= 0 ; i--) {
if (query_page_size & (1 << i))
- return shift[i];
+ ret = max(ret, shift[i]);
}
- /* No valid page size found. */
- return 0;
+ return ret;
}
static struct property *ddw_property_create(const char *propname, u32 liobn, u64 dma_addr,
--
2.30.2
^ permalink raw reply related
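The loop change in the 2M IOMMU page size patch above can be sketched outside the kernel as a small standalone C function. The names largest_page_shift and page_shift_by_bit are hypothetical. The shift values and their order are copied from the shift[] array in the patch, and bit i of the mask is tested with 1 << i as the patch does. This is a minimal sketch under those assumptions, not the kernel code itself.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Page shifts indexed by bit position in the "IO Page Sizes" mask.
 * Order follows the patch: 2M comes last, so the table is NOT sorted.
 */
static const int page_shift_by_bit[] = {
	12, /* 4K   */
	16, /* 64K  */
	24, /* 16M  */
	25, /* 32M  */
	26, /* 64M  */
	27, /* 128M */
	28, /* 256M */
	34, /* 16G  */
	21, /* 2M (out of order) */
};

/* Hypothetical helper: return the largest advertised page shift, or 0 if none. */
int largest_page_shift(uint32_t query_page_size)
{
	size_t n = sizeof(page_shift_by_bit) / sizeof(page_shift_by_bit[0]);
	int best = 0;

	/* Visit every bit and keep the maximum; no early return on the first hit. */
	for (size_t i = 0; i < n; i++) {
		if ((query_page_size & (UINT32_C(1) << i)) && page_shift_by_bit[i] > best)
			best = page_shift_by_bit[i];
	}
	return best;
}
```

With a mask of 0x180 (16G and 2M both advertised) this returns 34, whereas a loop that returns on the first set bit from the highest index would pick 21.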
* Re: [PATCH v2 4/4] s390: Use generic version of arch_is_kernel_initmem_freed()
From: Heiko Carstens @ 2021-09-28 9:45 UTC (permalink / raw)
To: Christophe Leroy
Cc: linux-arch, linux-s390, arnd, linux-kernel, linux-mm,
Andrew Morton, linuxppc-dev, Gerald Schaefer
In-Reply-To: <d4a15dc0e699e6a60858bff4d183a9b1aea90433.1632813331.git.christophe.leroy@csgroup.eu>
On Tue, Sep 28, 2021 at 09:15:37AM +0200, Christophe Leroy wrote:
> Generic version of arch_is_kernel_initmem_freed() now does the same
> as s390 version.
>
> Remove the s390 version.
>
> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
> v2: No change
> ---
> arch/s390/include/asm/sections.h | 12 ------------
> arch/s390/mm/init.c | 3 ---
> 2 files changed, 15 deletions(-)
Looks good. Thanks for cleaning this up!
Acked-by: Heiko Carstens <hca@linux.ibm.com>
^ permalink raw reply
* Re: [PATCH 1/4] crypto: nintendo-aes - add a new AES driver
From: Geert Uytterhoeven @ 2021-09-28 9:00 UTC (permalink / raw)
To: Joel Stanley
Cc: devicetree, Herbert Xu, Emmanuel Gil Peyrot,
Linux Kernel Mailing List, Rob Herring, Paul Mackerras,
Linux Crypto Mailing List, Jonathan Neuschäfer, linuxppc-dev,
David S. Miller, Ash Logan
In-Reply-To: <CACPK8Xc+J0PbCdgheRxJbOVZ=OyyfsCA=cwkneMoboJLzC8TZQ@mail.gmail.com>
On Wed, Sep 22, 2021 at 4:12 AM Joel Stanley <joel@jms.id.au> wrote:
> On Tue, 21 Sept 2021 at 21:47, Emmanuel Gil Peyrot
> <linkmauve@linkmauve.fr> wrote:
> >
> > This engine implements AES in CBC mode, using 128-bit keys only. It is
> > present on both the Wii and the Wii U, and is apparently identical in
> > both consoles.
> >
> > The hardware is capable of firing an interrupt when the operation is
> > done, but this driver currently uses a busy loop, I’m not too sure
> > whether it would be preferable to switch, nor how to achieve that.
> >
> > It also supports a mode where no operation is done, and thus could be
> > used as a DMA copy engine, but I don’t know how to expose that to the
> > kernel or whether it would even be useful.
> >
> > In my testing, on a Wii U, this driver reaches 80.7 MiB/s, while the
> > aes-generic driver only reaches 30.9 MiB/s, so it is a quite welcome
> > speedup.
> >
> > This driver was written based on reversed documentation, see:
> > https://wiibrew.org/wiki/Hardware/AES
> >
> > Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
> > Tested-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr> # on Wii U
> > ---
> > drivers/crypto/Kconfig | 11 ++
> > drivers/crypto/Makefile | 1 +
> > drivers/crypto/nintendo-aes.c | 273 ++++++++++++++++++++++++++++++++++
> > 3 files changed, 285 insertions(+)
> > create mode 100644 drivers/crypto/nintendo-aes.c
> >
> > diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
> > index 9a4c275a1335..adc94ad7462d 100644
> > --- a/drivers/crypto/Kconfig
> > +++ b/drivers/crypto/Kconfig
> > @@ -871,4 +871,15 @@ config CRYPTO_DEV_SA2UL
> >
> > source "drivers/crypto/keembay/Kconfig"
> >
> > +config CRYPTO_DEV_NINTENDO
> > + tristate "Support for the Nintendo Wii U AES engine"
> > + depends on WII || WIIU || COMPILE_TEST
>
> This current setup will allow the driver to be compile tested for
> non-powerpc, which will fail on the dcbf instructions.
>
> Perhaps use this instead:
>
> depends on WII || WIIU || (COMPILE_TEST && PPC)
Or:
depends on PPC
depends on WII || WIIU || COMPILE_TEST
to distinguish between hard and soft dependencies.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* Re: [PATCH v4 4/8] PCI: replace pci_dev::driver usage that gets the driver name
From: Kalle Valo @ 2021-09-28 8:28 UTC (permalink / raw)
To: Uwe Kleine-König
Cc: linux-pci, Alexander Duyck, oss-drivers, Paul Mackerras,
Herbert Xu, Rafał Miłecki, Jesse Brandeburg,
Bjorn Helgaas, Ido Schimmel, Jakub Kicinski, Yisen Zhuang,
Vadym Kochan, Uwe Kleine-König, Michael Buesch, Jiri Pirko,
Salil Mehta, netdev, linux-wireless, linux-kernel, Taras Chornyi,
Zhou Wang, linux-crypto, kernel, Simon Horman,
Oliver O'Halloran, linuxppc-dev, David S. Miller
In-Reply-To: <20210927204326.612555-5-uwe@kleine-koenig.org>
Uwe Kleine-König <uwe@kleine-koenig.org> writes:
> From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
>
> struct pci_dev::driver holds (apart from a constant offset) the same
> data as struct pci_dev::dev->driver. With the goal to remove struct
> pci_dev::driver to get rid of data duplication replace getting the
> driver name by dev_driver_string() which implicitly makes use of struct
> pci_dev::dev->driver.
>
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> ---
> arch/powerpc/include/asm/ppc-pci.h | 9 ++++++++-
> drivers/bcma/host_pci.c | 7 ++++---
For bcma:
Acked-by: Kalle Valo <kvalo@codeaurora.org>
--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply
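The "apart from a constant offset" remark in the quoted commit message above can be illustrated with mock structures. All demo_* names are hypothetical stand-ins, not the kernel's definitions, and demo_driver_name only mimics the driver-name case of dev_driver_string() (the real helper has further fallbacks). A sketch, assuming the PCI driver embeds its generic driver as a member:

```c
#include <stddef.h>

/* Mock types: the bus-specific driver embeds the generic driver. */
struct demo_device_driver { const char *name; };
struct demo_pci_driver { int id; struct demo_device_driver driver; };
struct demo_device { struct demo_device_driver *driver; };
struct demo_pci_dev { struct demo_device dev; };

/* Recover the enclosing structure from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Driver name via the generic device only; empty string when unbound. */
const char *demo_driver_name(const struct demo_pci_dev *pdev)
{
	return pdev->dev.driver ? pdev->dev.driver->name : "";
}

/* The PCI driver pointer is derivable, so storing it separately is redundant. */
struct demo_pci_driver *demo_to_pci_driver(struct demo_device_driver *drv)
{
	return drv ? demo_container_of(drv, struct demo_pci_driver, driver) : NULL;
}
```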
* [PATCH v2 2/4] mm: Make generic arch_is_kernel_initmem_freed() do what it says
From: Christophe Leroy @ 2021-09-28 7:15 UTC (permalink / raw)
To: Andrew Morton, arnd
Cc: linux-arch, linux-s390, linux-kernel, linux-mm, Gerald Schaefer,
linuxppc-dev
In-Reply-To: <ffa99e8e91e756b081427b27e408f275b7d43df7.1632813331.git.christophe.leroy@csgroup.eu>
Commit 7a5da02de8d6 ("locking/lockdep: check for freed initmem in
static_obj()") added arch_is_kernel_initmem_freed() which is supposed
to report whether an object is part of already freed init memory.
For the time being, the generic version of arch_is_kernel_initmem_freed()
always reports 'false', although free_initmem() is generically called
on all architectures.
Therefore, change the generic version of arch_is_kernel_initmem_freed()
to check whether free_initmem() has been called. If so, then check
if a given address falls into init memory.
In order to use init_section_contains(), the function is
moved to the end of asm-generic/sections.h
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v2: Change to using the new SYSTEM_FREEING_INITMEM state
---
include/asm-generic/sections.h | 31 +++++++++++++++++--------------
1 file changed, 17 insertions(+), 14 deletions(-)
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index d16302d3eb59..13f301239007 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -80,20 +80,6 @@ static inline int arch_is_kernel_data(unsigned long addr)
}
#endif
-/*
- * Check if an address is part of freed initmem. This is needed on architectures
- * with virt == phys kernel mapping, for code that wants to check if an address
- * is part of a static object within [_stext, _end]. After initmem is freed,
- * memory can be allocated from it, and such allocations would then have
- * addresses within the range [_stext, _end].
- */
-#ifndef arch_is_kernel_initmem_freed
-static inline int arch_is_kernel_initmem_freed(unsigned long addr)
-{
- return 0;
-}
-#endif
-
/**
* memory_contains - checks if an object is contained within a memory region
* @begin: virtual address of the beginning of the memory region
@@ -172,4 +158,21 @@ static inline bool is_kernel_rodata(unsigned long addr)
addr < (unsigned long)__end_rodata;
}
+/*
+ * Check if an address is part of freed initmem. This is needed on architectures
+ * with virt == phys kernel mapping, for code that wants to check if an address
+ * is part of a static object within [_stext, _end]. After initmem is freed,
+ * memory can be allocated from it, and such allocations would then have
+ * addresses within the range [_stext, _end].
+ */
+#ifndef arch_is_kernel_initmem_freed
+static inline int arch_is_kernel_initmem_freed(unsigned long addr)
+{
+ if (system_state < SYSTEM_FREEING_INITMEM)
+ return 0;
+
+ return init_section_contains((void *)addr, 1);
+}
+#endif
+
#endif /* _ASM_GENERIC_SECTIONS_H_ */
--
2.31.1
^ permalink raw reply related
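The check introduced by the arch_is_kernel_initmem_freed() patch above can be modelled in user space. The demo_* names, the enum values, and the explicit begin/end parameters are hypothetical stand-ins (the kernel uses the global system_state and its own init section symbols); only the two-step logic is taken from the patch. A minimal sketch under those assumptions:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's system states; only the ordering matters here. */
enum demo_system_state {
	DEMO_SYSTEM_BOOTING,
	DEMO_SYSTEM_SCHEDULING,
	DEMO_SYSTEM_FREEING_INITMEM,
	DEMO_SYSTEM_RUNNING,
};

/* True if [addr, addr + size) lies inside the region [begin, end). */
static bool demo_memory_contains(uintptr_t begin, uintptr_t end,
				 uintptr_t addr, size_t size)
{
	return addr >= begin && addr + size <= end;
}

/*
 * Before init memory starts being freed, nothing can be "freed initmem".
 * Afterwards, any address inside the init section counts as freed.
 */
bool demo_is_initmem_freed(enum demo_system_state state,
			   uintptr_t init_begin, uintptr_t init_end,
			   uintptr_t addr)
{
	if (state < DEMO_SYSTEM_FREEING_INITMEM)
		return false;

	return demo_memory_contains(init_begin, init_end, addr, 1);
}
```

The same address therefore gives different answers depending on the boot phase, which is what lets callers such as lockdep's static_obj() stop treating recycled init memory as static data.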