LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH V2] powerpc/perf: Enable PMU counters post partition migration if PMU is active
From: Nathan Lynch @ 2021-10-21 17:33 UTC (permalink / raw)
  To: Nicholas Piggin, Athira Rajeev, mpe; +Cc: kjain, maddy, linuxppc-dev, rnsastry
In-Reply-To: <1634812863.5e9oss88pa.astroid@bobo.none>

Nicholas Piggin <npiggin@gmail.com> writes:
> Excerpts from Athira Rajeev's message of July 11, 2021 10:25 pm:
>> During Live Partition Migration (LPM), it is observed that perf
>> counter values reports zero post migration completion. However
>> 'perf stat' with workload continues to show counts post migration
>> since PMU gets disabled/enabled during sched switches. But incase
>> of system/cpu wide monitoring, zero counts were reported with 'perf
>> stat' after migration completion.
>> 
>> Example:
>>  ./perf stat -e r1001e -I 1000
>>            time             counts unit events
>>      1.001010437         22,137,414      r1001e
>>      2.002495447         15,455,821      r1001e
>> <<>> As seen in next below logs, the counter values shows zero
>>         after migration is completed.
>> <<>>
>>     86.142535370    129,392,333,440      r1001e
>>     87.144714617                  0      r1001e
>>     88.146526636                  0      r1001e
>>     89.148085029                  0      r1001e
>> 
>> Here PMU is enabled during start of perf session and counter
>> values are read at intervals. Counters are only disabled at the
>> end of session. The powerpc mobility code presently does not handle
>> disabling and enabling back of PMU counters during partition
>> migration. Also since the PMU register values are not saved/restored
>> during migration, PMU registers like Monitor Mode Control Register 0
>> (MMCR0), Monitor Mode Control Register 1 (MMCR1) will not contain
>> the value it was programmed with. Hence PMU counters will not be
>> enabled correctly post migration.
>> 
>> Fix this in mobility code by handling disabling and enabling of
>> PMU in all cpu's before and after migration. Patch introduces two
>> functions 'mobility_pmu_disable' and 'mobility_pmu_enable'.
>> mobility_pmu_disable() is called before the processor threads goes
>> to suspend state so as to disable the PMU counters. And disable is
>> done only if there are any active events running on that cpu.
>> mobility_pmu_enable() is called after the processor threads are
>> back online to enable back the PMU counters.
>> 
>> Since the performance Monitor counters ( PMCs) are not
>> saved/restored during LPM, results in PMC value being zero and the
>> 'event->hw.prev_count' being non-zero value. This causes problem
>
> Interesting. Are they defined to not be migrated, or may not be 
> migrated?

PAPR may be silent on this... at least I haven't found anything yet. But
I'm not very familiar with perf counters.

How much assurance do we have that hardware events we've programmed on
the source can be reliably re-enabled on the destination, with the same
semantics? Aren't there some model-specific counters that don't make
sense to handle this way?


>> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
>> index 9dc97d2..cea72d7 100644
>> --- a/arch/powerpc/include/asm/rtas.h
>> +++ b/arch/powerpc/include/asm/rtas.h
>> @@ -380,5 +380,13 @@ static inline void rtas_initialize(void) { }
>>  static inline void read_24x7_sys_info(void) { }
>>  #endif
>>  
>> +#ifdef CONFIG_PPC_PERF_CTRS
>> +void mobility_pmu_disable(void);
>> +void mobility_pmu_enable(void);
>> +#else
>> +static inline void mobility_pmu_disable(void) { }
>> +static inline void mobility_pmu_enable(void) { }
>> +#endif
>> +
>>  #endif /* __KERNEL__ */
>>  #endif /* _POWERPC_RTAS_H */
>
> It's not implemented in rtas, maybe consider putting this into a perf 
> header?

+1


^ permalink raw reply

* Re: [PATCH V2] powerpc/perf: Enable PMU counters post partition migration if PMU is active
From: Nathan Lynch @ 2021-10-21 17:17 UTC (permalink / raw)
  To: Athira Rajeev; +Cc: maddy, rnsastry, kjain, Nicholas Piggin, linuxppc-dev
In-Reply-To: <1626006357-1611-1-git-send-email-atrajeev@linux.vnet.ibm.com>

Athira Rajeev <atrajeev@linux.vnet.ibm.com> writes:
> During Live Partition Migration (LPM), it is observed that perf
> counter values reports zero post migration completion. However
> 'perf stat' with workload continues to show counts post migration
> since PMU gets disabled/enabled during sched switches. But incase
> of system/cpu wide monitoring, zero counts were reported with 'perf
> stat' after migration completion.
>
> Example:
>  ./perf stat -e r1001e -I 1000
>            time             counts unit events
>      1.001010437         22,137,414      r1001e
>      2.002495447         15,455,821      r1001e
> <<>> As seen in next below logs, the counter values shows zero
>         after migration is completed.
> <<>>
>     86.142535370    129,392,333,440      r1001e
>     87.144714617                  0      r1001e
>     88.146526636                  0      r1001e
>     89.148085029                  0      r1001e

Confirmed in my environment:

    51.099987985            300,338      cache-misses
    52.101839374            296,586      cache-misses
    53.116089796            263,150      cache-misses
    54.117949249            232,290      cache-misses
    55.602029375     68,700,421,711      cache-misses
    56.610073969                  0      cache-misses
    57.614732000                  0      cache-misses

I wonder what it means that there is a very unlikely huge value before
the counter stops working -- I believe your example has this phenomenon
too.


> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index e83e089..ff7a77c 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -476,6 +476,8 @@ static int do_join(void *arg)
>  retry:
>  	/* Must ensure MSR.EE off for H_JOIN. */
>  	hard_irq_disable();
> +	/* Disable PMU before suspend */
> +	mobility_pmu_disable();
>  	hvrc = plpar_hcall_norets(H_JOIN);
>  
>  	switch (hvrc) {
> @@ -530,6 +532,8 @@ static int do_join(void *arg)
>  	 * reset the watchdog.
>  	 */
>  	touch_nmi_watchdog();
> +	/* Enable PMU after resuming */
> +	mobility_pmu_enable();
>  	return ret;
>  }

We should minimize calls into other subsystems from this context (the
callback function we've passed to stop_machine); it's fairly sensitive.
Can this be moved out to pseries_migrate_partition() or similar?

^ permalink raw reply

* Re: [PATCH v4 5/8] PCI/DPC: Converge EDR and DPC Path of clearing AER registers
From: Bjorn Helgaas @ 2021-10-21 17:11 UTC (permalink / raw)
  To: Naveen Naidu
  Cc: Oza Pawandeep, linux-pci, Sinan Kaya, linux-kernel, Keith Busch,
	oohall, bhelgaas, linuxppc-dev, linux-kernel-mentees
In-Reply-To: <20211021165330.lcqajtwej4s7oadt@theprophet>

On Thu, Oct 21, 2021 at 10:23:30PM +0530, Naveen Naidu wrote:
> On 20/10, Bjorn Helgaas wrote:
> > On Tue, Oct 05, 2021 at 10:48:12PM +0530, Naveen Naidu wrote:

> > > In EDR path, AER status registers are cleared irrespective of whether
> > > the error was an RP PIO or unmasked uncorrectable error. But in DPC, the
> > > AER status registers are cleared only when it's an unmasked uncorrectable
> > > error.
> > > 
> > > This leads to two different behaviours for the same task (handling of
> > > DPC errors) in FFS systems and when native OS has control.
> > 
> > FFS?
> 
> Firmware First Systems

I assumed that's what it was, but it's helpful to use the same terms
used by the specs to make things easier to find.  I don't think it's
actually the case that "Firmware First" necessary applies to the
entire system, since the ACPI FIRMWARE_FIRST flag is a per-error
source thing, not a per-system thing.

^ permalink raw reply

* Re: [PATCH v4 4/8] PCI/DPC: Use pci_aer_clear_status() in dpc_process_error()
From: Bjorn Helgaas @ 2021-10-21 17:05 UTC (permalink / raw)
  To: Naveen Naidu
  Cc: linux-pci, linux-kernel, oohall, bhelgaas, linuxppc-dev,
	linux-kernel-mentees
In-Reply-To: <20211021163611.rsfmberxw6dqn5gx@theprophet>

On Thu, Oct 21, 2021 at 10:06:11PM +0530, Naveen Naidu wrote:
> On 20/10, Bjorn Helgaas wrote:
> > On Tue, Oct 05, 2021 at 10:48:11PM +0530, Naveen Naidu wrote:
> > > dpc_process_error() clears both AER fatal and non fatal status
> > > registers. Instead of clearing each status registers via a different
> > > function call use pci_aer_clear_status().
> > > 
> > > This helps clean up the code a bit.
> > > 
> > > Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
> > > ---
> > >  drivers/pci/pcie/dpc.c | 3 +--
> > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > > index df3f3a10f8bc..faf4a1e77fab 100644
> > > --- a/drivers/pci/pcie/dpc.c
> > > +++ b/drivers/pci/pcie/dpc.c
> > > @@ -288,8 +288,7 @@ void dpc_process_error(struct pci_dev *pdev)
> > >  		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
> > >  		 aer_get_device_error_info(pdev, &info)) {
> > >  		aer_print_error(pdev, &info);
> > > -		pci_aer_clear_nonfatal_status(pdev);
> > > -		pci_aer_clear_fatal_status(pdev);
> > > +		pci_aer_clear_status(pdev);
> > 
> > The commit log suggests that this is a simple cleanup that doesn't
> > change any behavior, but that's not quite true:
> > 
> >   - The new code would clear PCI_ERR_ROOT_STATUS, but the old code
> >     does not.
> > 
> >   - The old code masks the status bits with the severity bits before
> >     clearing, but the new code does not.
> > 
> > The commit log needs to show why these changes are what we want.
> >
> 
> Reading through the code again, I realize how wrong(stupid) I was when
> making this patch. I was thinking that:
> 
>   pci_aer_clear_status() = pci_aer_clear_fatal_status() + pci_aer_clear_nonfatal_status()
> 
> Now I understand, that it is not at all the case. I apologize for the
> mistake. I'll make sure to be meticulous while reading functions and not
> just assume their behaviour just from their function names.

No problem, one could argue that the collection of pci_aer_clear_*()
functions that do slightly different things is itself a defect.

^ permalink raw reply

* Re: [PATCH v4 1/8] PCI/AER: Remove ID from aer_agent_string[]
From: Bjorn Helgaas @ 2021-10-21 17:03 UTC (permalink / raw)
  To: Naveen Naidu
  Cc: linux-pci, linux-kernel, oohall, bhelgaas, linuxppc-dev,
	linux-kernel-mentees
In-Reply-To: <20211021163021.r4ekhfol42ftw5zw@theprophet>

On Thu, Oct 21, 2021 at 10:00:21PM +0530, Naveen Naidu wrote:
> On 20/10, Bjorn Helgaas wrote:
> > On Tue, Oct 05, 2021 at 10:48:08PM +0530, Naveen Naidu wrote:
> > > Currently, we do not print the "id" field in the AER error logs. Yet the
> > > aer_agent_string[] has the word "id" in it. The AER error log looks
> > > like:
> > > 
> > >   pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
> > > 
> > > Without the "id" field in the error log, The aer_agent_string[]
> > > (eg: "Receiver ID") does not make sense. A user reading the
> > > aer_agent_string[] in the log, might inadvertently look for an "id"
> > > field and not finding it might lead to confusion.
> > > 
> > > Remove the "ID" from the aer_agent_string[].
> > > 
> > > The following are sample dummy errors inject via aer-inject.
> > 
> > I like this, and the problem it fixes was my fault because
> > these "ID" strings should have been removed by 010caed4ccb6.
> > 
> > If it's straightforward enough, it would be nice to have the
> > aer-inject command line here in the commit log to make it easier
> > for people to play with this.
> >
> 
> Thank you for the review. Do you mean something like:
> 
> The following sample dummy errors are injected via aer-inject via the
> following steps:
> 
>   1. The steps to compile the aer-inject tool is mentioned in (Section
>      4. Software error inject) of the document [1]
> 
>      [1]: https://www.kernel.org/doc/Documentation/PCI/pcieaer-howto.txt
> 
>      Make sure to place the aer-inject executable at the home directory
>      of the qemu system or at any other place.
> 
>   2. Emulate a PCIE architecture using qemu, A sample looks like
>      following:
>      
> 		qemu-system-x86_64 -kernel ../linux/arch/x86_64/boot/bzImage \
>         -initrd  buildroot-build/images/rootfs.cpio.gz \
>         -append "console=ttyS0"  \
>         -enable-kvm -nographic \
>         -M q35 \
>         -device pcie-root-port,bus=pcie.0,id=rp1,slot=1 \
>         -device pcie-pci-bridge,id=br1,bus=rp1 \
>         -device e1000,bus=br1,addr=8
>        
>     Note that the PCIe features are available only when using the 
>     'q35' Machine [2]
>     [2]: https://github.com/qemu/qemu/blob/master/docs/pcie.txt
> 
>   3. Once the qemu system starts up, create a sample aer-file or use any
>      example aer file from [3]
> 
>      [3]:
>      https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git/tree/examples
> 
>   4. Inject any aer-error using
>       
>       ./aer-inject aer-file
> 
> This does look a tad bit longer for a commit log so I am unsure if you
> would like to have it there. If you are okay with it, I would be happy
> to add it to that :)

Yes, that's kind of long.  Something like this
https://git.kernel.org/linus/d95f20c4f070 would be enough for the
commit log, especially since you've now provided all the details in
the email thread, where we can find them via the Link: tag.

Bjorn

^ permalink raw reply

* Re: [PATCH v4 5/8] PCI/DPC: Converge EDR and DPC Path of clearing AER registers
From: Naveen Naidu @ 2021-10-21 16:53 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Oza Pawandeep, linux-pci, Sinan Kaya, linux-kernel, Keith Busch,
	oohall, bhelgaas, linuxppc-dev, linux-kernel-mentees
In-Reply-To: <20211021020934.GA2658296@bhelgaas>

On 20/10, Bjorn Helgaas wrote:
> [+cc Keith, Sinan, Oza]
> 
> On Tue, Oct 05, 2021 at 10:48:12PM +0530, Naveen Naidu wrote:
> > In the EDR path, AER registers are cleared *after* DPC error event is
> > processed. The process stack in EDR is:
> > 
> >   edr_handle_event()
> >     dpc_process_error()
> >     pci_aer_raw_clear_status()
> >     pcie_do_recovery()
> > 
> > But in DPC path, AER status registers are cleared *while* processing
> > the error. The process stack in DPC is:
> > 
> >   dpc_handler()
> >     dpc_process_error()
> >       pci_aer_clear_status()
> >     pcie_do_recovery()
> 
> These are accurate but they both include dpc_process_error(), so we
> need a hint to show why the one here is different from the one in the
> EDR path, e.g.,
> 
>   dpc_handler
>     dpc_process_error
>       if (reason == 0)
>         pci_aer_clear_status    # uncorrectable errors only
>     pcie_do_recovery
> 
> > In EDR path, AER status registers are cleared irrespective of whether
> > the error was an RP PIO or unmasked uncorrectable error. But in DPC, the
> > AER status registers are cleared only when it's an unmasked uncorrectable
> > error.
> > 
> > This leads to two different behaviours for the same task (handling of
> > DPC errors) in FFS systems and when native OS has control.
> 
> FFS?
>

Firmware First Systems

> I'd really like to have a specific example of how a user would observe
> this difference.  I know you probably don't have two systems to
> compare like that, but maybe we can work it out manually.
> 

Apologies again! Reading through the code again and the specification, I
realize that my understanding was very incorrect at the time of making
this patch. I grossly oversimplified EDR and DPC when I was learning
about it.

I'll drop this patch when I send the v5 for the series.

Apologies again ^^'

> I guess you're saying the problem is in the native DPC handling, and
> we don't clear the AER status registers for ERR_NONFATAL,
> ERR_NONFATAL, etc., right?
> 

But yes, I did have this question though (I wasn't able to find the
answers to it when reading the spec). Why do we not clear the entire
ERR_NONFATAL and ERR_FATAL registers in the DPC path just like EDR does
using the pci_aer_raw_clear_status() before going to pcie_do_recovery()

I am sure I might have missed something in the spec. I guess I'll
look/re-read these bits again.

Thanks for the review :)

> I think the current behavior is from 8aefa9b0d910 ("PCI/DPC: Print AER
> status in DPC event handling"), where Keith explicitly mentions those
> cases.  The commit log here should connect back to that and explain
> whether something has changed.
> 
> I cc'd Keith and the reviewers of that change in case any of them have
> time to dig into this again.
> 
> > Bring the same semantics for clearing the AER status register in EDR
> > path and DPC path.
> > 
> > Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
> > ---
> >  drivers/pci/pcie/dpc.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > index faf4a1e77fab..68899a3db126 100644
> > --- a/drivers/pci/pcie/dpc.c
> > +++ b/drivers/pci/pcie/dpc.c
> > @@ -288,7 +288,6 @@ void dpc_process_error(struct pci_dev *pdev)
> >  		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
> >  		 aer_get_device_error_info(pdev, &info)) {
> >  		aer_print_error(pdev, &info);
> > -		pci_aer_clear_status(pdev);
> >  	}
> >  }
> >  
> > @@ -297,6 +296,7 @@ static irqreturn_t dpc_handler(int irq, void *context)
> >  	struct pci_dev *pdev = context;
> >  
> >  	dpc_process_error(pdev);
> > +	pci_aer_clear_status(pdev);
> >  
> >  	/* We configure DPC so it only triggers on ERR_FATAL */
> >  	pcie_do_recovery(pdev, pci_channel_io_frozen, dpc_reset_link);
> > -- 
> > 2.25.1
> > 
> > _______________________________________________
> > Linux-kernel-mentees mailing list
> > Linux-kernel-mentees@lists.linuxfoundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply

* Re: [PATCH v4 4/8] PCI/DPC: Use pci_aer_clear_status() in dpc_process_error()
From: Naveen Naidu @ 2021-10-21 16:36 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, oohall, bhelgaas, linuxppc-dev,
	linux-kernel-mentees
In-Reply-To: <20211021014029.GA2657684@bhelgaas>

On 20/10, Bjorn Helgaas wrote:
> On Tue, Oct 05, 2021 at 10:48:11PM +0530, Naveen Naidu wrote:
> > dpc_process_error() clears both AER fatal and non fatal status
> > registers. Instead of clearing each status registers via a different
> > function call use pci_aer_clear_status().
> > 
> > This helps clean up the code a bit.
> > 
> > Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
> > ---
> >  drivers/pci/pcie/dpc.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > index df3f3a10f8bc..faf4a1e77fab 100644
> > --- a/drivers/pci/pcie/dpc.c
> > +++ b/drivers/pci/pcie/dpc.c
> > @@ -288,8 +288,7 @@ void dpc_process_error(struct pci_dev *pdev)
> >  		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
> >  		 aer_get_device_error_info(pdev, &info)) {
> >  		aer_print_error(pdev, &info);
> > -		pci_aer_clear_nonfatal_status(pdev);
> > -		pci_aer_clear_fatal_status(pdev);
> > +		pci_aer_clear_status(pdev);
> 
> The commit log suggests that this is a simple cleanup that doesn't
> change any behavior, but that's not quite true:
> 
>   - The new code would clear PCI_ERR_ROOT_STATUS, but the old code
>     does not.
> 
>   - The old code masks the status bits with the severity bits before
>     clearing, but the new code does not.
> 
> The commit log needs to show why these changes are what we want.
>

Reading through the code again, I realize how wrong(stupid) I was when
making this patch. I was thinking that:

  pci_aer_clear_status() = pci_aer_clear_fatal_status() + pci_aer_clear_nonfatal_status()

Now I understand, that it is not at all the case. I apologize for the
mistake. I'll make sure to be meticulous while reading functions and not
just assume their behaviour just from their function names.

I'll drop this patch in the next version of the patch series I make.

Apologies again ^^'

> >  	}
> >  }
> >  
> > -- 
> > 2.25.1
> > 
> > _______________________________________________
> > Linux-kernel-mentees mailing list
> > Linux-kernel-mentees@lists.linuxfoundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply

* Re: [PATCH v4 3/8] PCI/DPC: Initialize info->id in dpc_process_error()
From: Naveen Naidu @ 2021-10-21 16:31 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, oohall, bhelgaas, linuxppc-dev,
	linux-kernel-mentees
In-Reply-To: <20211021012800.GA2656128@bhelgaas>

On 20/10, Bjorn Helgaas wrote:
> On Tue, Oct 05, 2021 at 10:48:10PM +0530, Naveen Naidu wrote:
> > In the dpc_process_error() path, info->id isn't initialized before being
> > passed to aer_print_error(). In the corresponding AER path, it is
> > initialized in aer_isr_one_error().
> > 
> > The error message shown during Coverity Scan is:
> > 
> >   Coverity #1461602
> >   CID 1461602 (#1 of 1): Uninitialized scalar variable (UNINIT)
> >   8. uninit_use_in_call: Using uninitialized value info.id when calling aer_print_error.
> > 
> > Initialize the "info->id" before passing it to aer_print_error()
> > 
> > Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling")
> > Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
> > ---
> >  drivers/pci/pcie/dpc.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > index c556e7beafe3..df3f3a10f8bc 100644
> > --- a/drivers/pci/pcie/dpc.c
> > +++ b/drivers/pci/pcie/dpc.c
> > @@ -262,14 +262,14 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev *dev,
> >  
> >  void dpc_process_error(struct pci_dev *pdev)
> >  {
> > -	u16 cap = pdev->dpc_cap, status, source, reason, ext_reason;
> > +	u16 cap = pdev->dpc_cap, status, reason, ext_reason;
> >  	struct aer_err_info info;
> >  
> >  	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
> > -	pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &source);
> > +	pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &info.id);
> >  
> >  	pci_info(pdev, "containment event, status:%#06x source:%#06x\n",
> > -		 status, source);
> > +		 status, info.id);
> >  
> >  	reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN) >> 1;
> 
> Per PCIe r5.0, sec 7.9.15.5, the Source ID is defined only when the
> Trigger Reason indicates ERR_NONFATAL or ERR_FATAL.  So I think we
> need to extract this reason before reading PCI_EXP_DPC_SOURCE_ID,
> e.g.,
> 
>   reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN) >> 1;
>   if (reason == 1 || reason == 2)
>     pci_read_config_word(pdev, cap + PCI_EXP_DPC_SOURCE_ID, &info.id);
>   else
>     info.id = 0;
>

Thank you for the review, I'll make this change when I send a v5 for the
patch series.

> >  	ext_reason = (status & PCI_EXP_DPC_STATUS_TRIGGER_RSN_EXT) >> 5;
> > -- 
> > 2.25.1
> > 
> > _______________________________________________
> > Linux-kernel-mentees mailing list
> > Linux-kernel-mentees@lists.linuxfoundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply

* Re: [PATCH v4 1/8] PCI/AER: Remove ID from aer_agent_string[]
From: Naveen Naidu @ 2021-10-21 16:30 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, oohall, bhelgaas, linuxppc-dev,
	linux-kernel-mentees
In-Reply-To: <20211021012826.GA2655655@bhelgaas>

On 20/10, Bjorn Helgaas wrote:
> On Tue, Oct 05, 2021 at 10:48:08PM +0530, Naveen Naidu wrote:
> > Currently, we do not print the "id" field in the AER error logs. Yet the
> > aer_agent_string[] has the word "id" in it. The AER error log looks
> > like:
> > 
> >   pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
> > 
> > Without the "id" field in the error log, The aer_agent_string[]
> > (eg: "Receiver ID") does not make sense. A user reading the
> > aer_agent_string[] in the log, might inadvertently look for an "id"
> > field and not finding it might lead to confusion.
> > 
> > Remove the "ID" from the aer_agent_string[].
> > 
> > The following are sample dummy errors inject via aer-inject.
> 
> I like this, and the problem it fixes was my fault because
> these "ID" strings should have been removed by 010caed4ccb6.
> 
> If it's straightforward enough, it would be nice to have the
> aer-inject command line here in the commit log to make it easier
> for people to play with this.
>

Thank you for the review. Do you mean something like:

The following sample dummy errors are injected via aer-inject via the
following steps:

  1. The steps to compile the aer-inject tool is mentioned in (Section
     4. Software error inject) of the document [1]

     [1]: https://www.kernel.org/doc/Documentation/PCI/pcieaer-howto.txt

     Make sure to place the aer-inject executable at the home directory
     of the qemu system or at any other place.

  2. Emulate a PCIE architecture using qemu, A sample looks like
     following:
     
		qemu-system-x86_64 -kernel ../linux/arch/x86_64/boot/bzImage \
        -initrd  buildroot-build/images/rootfs.cpio.gz \
        -append "console=ttyS0"  \
        -enable-kvm -nographic \
        -M q35 \
        -device pcie-root-port,bus=pcie.0,id=rp1,slot=1 \
        -device pcie-pci-bridge,id=br1,bus=rp1 \
        -device e1000,bus=br1,addr=8
       
    Note that the PCIe features are available only when using the 
    'q35' Machine [2]
    [2]: https://github.com/qemu/qemu/blob/master/docs/pcie.txt

  3. Once the qemu system starts up, create a sample aer-file or use any
     example aer file from [3]

     [3]:
     https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git/tree/examples

  4. Inject any aer-error using
      
      ./aer-inject aer-file

This does look a tad bit longer for a commit log so I am unsure if you
would like to have it there. If you are okay with it, I would be happy
to add it to that :)

> > Before
> > =======
> > 
> > In 010caed4ccb6 ("PCI/AER: Decode Error Source Requester ID"),
> > the "id" field was removed from the AER error logs, so currently AER
> > logs look like:
> > 
> >   pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03:0
> >   pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID) <--- no id field
> >   pcieport 0000:00:03.0:   device [1b36:000c] error status/mask=00000040/0000e000
> >   pcieport 0000:00:03.0:    [ 6] BadTLP
> > 
> > After
> > ======
> > 
> >   pcieport 0000:00:03.0: AER: Corrected error received: 0000:00:03.0
> >   pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver)
> >   pcieport 0000:00:03.0:   device [1b36:000c] error status/mask=00000040/0000e000
> >   pcieport 0000:00:03.0:    [ 6] BadTLP
> > 
> > Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
> > ---
> >  drivers/pci/pcie/aer.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index 9784fdcf3006..241ff361b43c 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -516,10 +516,10 @@ static const char *aer_uncorrectable_error_string[] = {
> >  };
> >  
> >  static const char *aer_agent_string[] = {
> > -	"Receiver ID",
> > -	"Requester ID",
> > -	"Completer ID",
> > -	"Transmitter ID"
> > +	"Receiver",
> > +	"Requester",
> > +	"Completer",
> > +	"Transmitter"
> >  };
> >  
> >  #define aer_stats_dev_attr(name, stats_array, strings_array,		\
> > @@ -703,7 +703,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
> >  	const char *level;
> >  
> >  	if (!info->status) {
> > -		pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
> > +		pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent)\n",
> >  			aer_error_severity_string[info->severity]);
> >  		goto out;
> >  	}
> > -- 
> > 2.25.1
> > 
> > _______________________________________________
> > Linux-kernel-mentees mailing list
> > Linux-kernel-mentees@lists.linuxfoundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

^ permalink raw reply

* Re: [PATCH 07/20] signal/powerpc: On swapcontext failure force SIGSEGV
From: Kees Cook @ 2021-10-21 16:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-arch, linuxppc-dev, linux-kernel, Oleg Nesterov,
	Paul Mackerras, Al Viro, Linus Torvalds
In-Reply-To: <20211020174406.17889-7-ebiederm@xmission.com>

On Wed, Oct 20, 2021 at 12:43:53PM -0500, Eric W. Biederman wrote:
> If the register state may be partial and corrupted instead of calling
> do_exit, call force_sigsegv(SIGSEGV).  Which properly kills the
> process with SIGSEGV and does not let any more userspace code execute,
> instead of just killing one thread of the process and potentially
> confusing everything.
> 
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
> Fixes: 756f1ae8a44e ("PPC32: Rework signal code and add a swapcontext system call.")
> Fixes: 04879b04bf50 ("[PATCH] ppc64: VMX (Altivec) support & signal32 rework, from Ben Herrenschmidt")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

This looks right to me.

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply

* [PATCH v3 19/25] PCI/DPC: Use RESPONSE_IS_PCI_ERROR() to check read from hardware
From: Naveen Naidu @ 2021-10-21 15:07 UTC (permalink / raw)
  To: bhelgaas
  Cc: open list:PCI ENHANCED ERROR HANDLING EEH FOR POWERPC, linux-pci,
	skhan, linux-kernel, Naveen Naidu, Oliver O'Halloran,
	linux-kernel-mentees
In-Reply-To: <cover.1634825082.git.naveennaidu479@gmail.com>

An MMIO read from a PCI device that doesn't exist or doesn't respond
causes a PCI error.  There's no real data to return to satisfy the
CPU read, so most hardware fabricates ~0 data.

Use RESPONSE_IS_PCI_ERROR() to check the response we get when we read
data from hardware.

This helps unify PCI error response checking and make error checks
consistent and easier to find.

Compile tested only.

Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
 drivers/pci/pcie/dpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index c556e7beafe3..4a051a096075 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -79,7 +79,7 @@ static bool dpc_completed(struct pci_dev *pdev)
 	u16 status;
 
 	pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS, &status);
-	if ((status != 0xffff) && (status & PCI_EXP_DPC_STATUS_TRIGGER))
+	if ((!RESPONSE_IS_PCI_ERROR(status)) && (status & PCI_EXP_DPC_STATUS_TRIGGER))
 		return false;
 
 	if (test_bit(PCI_DPC_RECOVERING, &pdev->priv_flags))
@@ -312,7 +312,7 @@ static irqreturn_t dpc_irq(int irq, void *context)
 
 	pci_read_config_word(pdev, cap + PCI_EXP_DPC_STATUS, &status);
 
-	if (!(status & PCI_EXP_DPC_STATUS_INTERRUPT) || status == (u16)(~0))
+	if (!(status & PCI_EXP_DPC_STATUS_INTERRUPT) || RESPONSE_IS_PCI_ERROR(status))
 		return IRQ_NONE;
 
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
-- 
2.25.1


^ permalink raw reply related

* [PATCH v3 02/25] PCI: Set error response in config access defines when ops->read() fails
From: Naveen Naidu @ 2021-10-21 15:07 UTC (permalink / raw)
  To: bhelgaas
  Cc: Krzysztof Wilczyński, linux-hyperv, Heiko Stuebner,
	linux-pci, Shawn Lin, Binghui Wang, Kuppuswamy Sathyanarayanan,
	Amey Narkhede, Oliver O'Halloran, Thomas Petazzoni,
	Lorenzo Pieralisi, Toan Le, K. Y. Srinivasan, Nirmal Patel,
	Marek Vasut, Rob Herring, Wei Liu, linux-samsung-soc,
	Marc Zyngier, Naveen Naidu, Joyce Ooi, Dexuan Cui, Jianjun Wang,
	linux-rockchip, maintainer:BROADCOM IPROC ARM ARCHITECTURE,
	Jonathan Derrick, Xiaowei Song, linux-kernel-mentees,
	Robert Richter, Sean V Kelley, Ray Jui, Haiyang Zhang, Ryder Lee,
	linux-mediatek, skhan, Matthias Brugger, Stephen Hemminger,
	linux-arm-kernel, Qiuxu Zhuo, Scott Branden, linuxppc-dev,
	Yoshihiro Shimoda, Krzysztof Kozlowski, linux-kernel,
	linux-renesas-soc, Lukas Wunner, Jingoo Han, Shawn Guo,
	Pali Rohár
In-Reply-To: <cover.1634825082.git.naveennaidu479@gmail.com>

Make PCI_OP_READ and PCI_USER_READ_CONFIG set the data value with error
response (~0), when the PCI device read by a host controller fails.

This ensures that the controller drivers no longer need to fabricate
(~0) value when they detect error. It also  gurantees that the error
response (~0) is always set when the controller drivers fails to read a
config register from a device.

This makes error response fabrication consistent and helps in removal of
a lot of repeated code.

Suggested-by: Rob Herring <robh@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
 drivers/pci/access.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 46935695cfb9..0f732ba2f71a 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -42,7 +42,10 @@ int noinline pci_bus_read_config_##size \
 	if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER;	\
 	pci_lock_config(flags);						\
 	res = bus->ops->read(bus, devfn, pos, len, &data);		\
-	*value = (type)data;						\
+	if (res)							\
+		SET_PCI_ERROR_RESPONSE(value);				\
+	else								\
+		*value = (type)data;					\
 	pci_unlock_config(flags);					\
 	return res;							\
 }
@@ -228,7 +231,10 @@ int pci_user_read_config_##size						\
 	ret = dev->bus->ops->read(dev->bus, dev->devfn,			\
 					pos, sizeof(type), &data);	\
 	raw_spin_unlock_irq(&pci_lock);				\
-	*val = (type)data;						\
+	if (ret)							\
+		SET_PCI_ERROR_RESPONSE(val);				\
+	else								\
+		*val = (type)data;					\
 	return pcibios_err_to_errno(ret);				\
 }									\
 EXPORT_SYMBOL_GPL(pci_user_read_config_##size);
-- 
2.25.1


^ permalink raw reply related

* [PATCH v3 01/25] PCI: Add PCI_ERROR_RESPONSE and it's related definitions
From: Naveen Naidu @ 2021-10-21 15:07 UTC (permalink / raw)
  To: bhelgaas
  Cc: Krzysztof Wilczyński, linux-hyperv, Heiko Stuebner,
	linux-pci, Shawn Lin, Binghui Wang, Kuppuswamy Sathyanarayanan,
	Amey Narkhede, Oliver O'Halloran, Thomas Petazzoni,
	Lorenzo Pieralisi, Toan Le, K. Y. Srinivasan, Nirmal Patel,
	Marek Vasut, Rob Herring, Wei Liu, linux-samsung-soc,
	Marc Zyngier, Naveen Naidu, Joyce Ooi, Dexuan Cui, Jianjun Wang,
	linux-rockchip, maintainer:BROADCOM IPROC ARM ARCHITECTURE,
	Jonathan Derrick, Xiaowei Song, linux-kernel-mentees,
	Robert Richter, Sean V Kelley, Ray Jui, Haiyang Zhang, Ryder Lee,
	linux-mediatek, skhan, Matthias Brugger, Stephen Hemminger,
	linux-arm-kernel, Qiuxu Zhuo, Scott Branden, linuxppc-dev,
	Yoshihiro Shimoda, Krzysztof Kozlowski, linux-kernel,
	linux-renesas-soc, Lukas Wunner, Jingoo Han, Shawn Guo,
	Pali Rohár
In-Reply-To: <cover.1634825082.git.naveennaidu479@gmail.com>

An MMIO read from a PCI device that doesn't exist or doesn't respond
causes a PCI error.  There's no real data to return to satisfy the
CPU read, so most hardware fabricates ~0 data.

Add a PCI_ERROR_RESPONSE definition for that and use it where
appropriate to make these checks consistent and easier to find.

Also add helper definitions SET_PCI_ERROR_RESPONSE and
RESPONSE_IS_PCI_ERROR to make the code more readable.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
---
 include/linux/pci.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index cd8aa6fce204..689c8277c584 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -154,6 +154,15 @@ enum pci_interrupt_pin {
 /* The number of legacy PCI INTx interrupts */
 #define PCI_NUM_INTX	4
 
+/*
+ * Reading from a device that doesn't respond typically returns ~0.  A
+ * successful read from a device may also return ~0, so you need additional
+ * information to reliably identify errors.
+ */
+#define PCI_ERROR_RESPONSE     (~0ULL)
+#define SET_PCI_ERROR_RESPONSE(val)    (*(val) = ((typeof(*(val))) PCI_ERROR_RESPONSE))
+#define RESPONSE_IS_PCI_ERROR(val) ((val) == ((typeof(val)) PCI_ERROR_RESPONSE))
+
 /*
  * pci_power_t values must match the bits in the Capabilities PME_Support
  * and Control/Status PowerState fields in the Power Management capability.
-- 
2.25.1


^ permalink raw reply related

* [PATCH v3 00/25] Unify PCI error response checking
From: Naveen Naidu @ 2021-10-21 15:07 UTC (permalink / raw)
  To: bhelgaas
  Cc: Krzysztof Wilczyński, linux-hyperv, Heiko Stuebner,
	linux-pci, Shawn Lin, Binghui Wang, Kuppuswamy Sathyanarayanan,
	Amey Narkhede, Oliver O'Halloran, Thomas Petazzoni,
	Lorenzo Pieralisi, Toan Le, K. Y. Srinivasan, Nirmal Patel,
	Marek Vasut, Rob Herring, Wei Liu, linux-samsung-soc,
	Marc Zyngier, Naveen Naidu, Joyce Ooi, Dexuan Cui,
	Kishon Vijay Abraham I, Jianjun Wang, linux-rockchip,
	maintainer:BROADCOM IPROC ARM ARCHITECTURE, Jonathan Derrick,
	Xiaowei Song, linux-kernel-mentees, Robert Richter, Sean V Kelley,
	Ray Jui, Haiyang Zhang, Ryder Lee, linux-mediatek, skhan,
	Matthias Brugger, Stephen Hemminger, linux-arm-kernel, Qiuxu Zhuo,
	Scott Branden, linuxppc-dev, Yoshihiro Shimoda,
	Krzysztof Kozlowski, linux-kernel, linux-renesas-soc,
	Lukas Wunner, Jingoo Han, Shawn Guo, Pali Rohár

An MMIO read from a PCI device that doesn't exist or doesn't respond
causes a PCI error.  There's no real data to return to satisfy the 
CPU read, so most hardware fabricates ~0 data.

This patch series adds PCI_ERROR_RESPONSE definition and other helper
definition SET_PCI_ERROR_RESPONSE and RESPONSE_IS_PCI_ERROR and uses it
where appropriate to make these checks consistent and easier to find.

This helps unify PCI error response checking and make error check
consistent and easier to find.

This series also ensures that the error response fabrication now happens
in the PCI_OP_READ and PCI_USER_READ_CONFIG. This removes the
responsibility from controller drivers to do the error response setting. 

Patch 1:
    - Adds the PCI_ERROR_RESPONSE and other related defintions
    - All other patches are dependent on this patch. This patch needs to
      be applied first, before the others

Patch 2:
    - Error fabrication happens in PCI_OP_READ and PCI_USER_READ_CONFIG
      whenever the data read via the controller driver fails.
    - This patch needs to be applied before, Patch 4/24 to Patch 15/24 are
      applied.

Patch 3:
    - Uses SET_PCI_ERROR_RESPONSE() when device is not found 

Patch 4 - 15:
    - Removes redundant error fabrication that happens in controller 
      drivers when the read from a PCI device fails.
    - These patches are dependent on Patch 2/24 of the series.
    - These can be applied in any order.

Patch 16 - 22:
    - Uses RESPONSE_IS_PCI_ERROR() to check the reads from hardware
    - Patches can be applied in any order.

Patch 23 - 25:
    - Edits the comments to include PCI_ERROR_RESPONSE alsong with
      0xFFFFFFFF, so that it becomes easier to grep for faulty 
      hardware reads.

Changelog
=========
v3:
   - Change RESPONSE_IS_PCI_ERROR macro definition
   - Fix the macros, Add () around macro parameters
   - Fix alignment issue in Patch 2/24
   - Add proper receipients for all the patches

v2:
    - Instead of using SET_PCI_ERROR_RESPONSE in all controller drivers
      to fabricate error response, only use them in PCI_OP_READ and
      PCI_USER_READ_CONFIG

Naveen Naidu (25):
  [Patch 1/25] PCI: Add PCI_ERROR_RESPONSE and it's related definitions
  [Patch 2/25] PCI: Set error response in config access defines when ops->read() fails
  [Patch 3/25] PCI: Use SET_PCI_ERROR_RESPONSE() when device not found
  [Patch 4/25] PCI: Remove redundant error fabrication when device read fails
  [Patch 5/25] PCI: thunder: Remove redundant error fabrication when device read fails
  [Patch 6/25] PCI: iproc: Remove redundant error fabrication when device read fails
  [Patch 7/25] PCI: mediatek: Remove redundant error fabrication when device read fails
  [Patch 8/25] PCI: exynos: Remove redundant error fabrication when device read fails
  [Patch 9/25] PCI: histb: Remove redundant error fabrication when device read fails
  [Patch 10/25] PCI: kirin: Remove redundant error fabrication when device read fails
  [Patch 11/25] PCI: aardvark: Remove redundant error fabrication when device read fails
  [Patch 12/25] PCI: mvebu: Remove redundant error fabrication when device read fails
  [Patch 13/25] PCI: altera: Remove redundant error fabrication when device read fails
  [Patch 14/25] PCI: rcar: Remove redundant error fabrication when device read fails
  [Patch 15/25] PCI: rockchip: Remove redundant error fabrication when device read fails
  [Patch 16/25] PCI/ERR: Use RESPONSE_IS_PCI_ERROR() to check read from hardware
  [Patch 17/25] PCI: vmd: Use RESPONSE_IS_PCI_ERROR() to check read from hardware
  [Patch 18/25] PCI: pciehp: Use RESPONSE_IS_PCI_ERROR() to check read from hardware
  [Patch 19/25] PCI/DPC: Use RESPONSE_IS_PCI_ERROR() to check read from hardware
  [Patch 20/25] PCI/PME: Use RESPONSE_IS_PCI_ERROR() to check read from hardware
  [Patch 21/25] PCI: cpqphp: Use RESPONSE_IS_PCI_ERROR() to check read from hardware
  [Patch 22/25] PCI: Use PCI_ERROR_RESPONSE to specify hardware error
  [Patch 23/25] PCI: keystone: Use PCI_ERROR_RESPONSE to specify hardware error
  [Patch 24/25] PCI: hv: Use PCI_ERROR_RESPONSE to specify hardware read error
  [Patch 25/25] PCI: xgene: Use PCI_ERROR_RESPONSE to specify hardware error

 drivers/pci/access.c                        | 32 +++++++-------
 drivers/pci/controller/dwc/pci-exynos.c     |  4 +-
 drivers/pci/controller/dwc/pci-keystone.c   |  4 +-
 drivers/pci/controller/dwc/pcie-histb.c     |  4 +-
 drivers/pci/controller/dwc/pcie-kirin.c     |  4 +-
 drivers/pci/controller/pci-aardvark.c       | 10 +----
 drivers/pci/controller/pci-hyperv.c         |  2 +-
 drivers/pci/controller/pci-mvebu.c          |  8 +---
 drivers/pci/controller/pci-thunder-ecam.c   | 46 +++++++--------------
 drivers/pci/controller/pci-thunder-pem.c    |  4 +-
 drivers/pci/controller/pci-xgene.c          |  8 ++--
 drivers/pci/controller/pcie-altera.c        |  4 +-
 drivers/pci/controller/pcie-iproc.c         |  4 +-
 drivers/pci/controller/pcie-mediatek.c      | 11 +----
 drivers/pci/controller/pcie-rcar-host.c     |  4 +-
 drivers/pci/controller/pcie-rockchip-host.c |  4 +-
 drivers/pci/controller/vmd.c                |  2 +-
 drivers/pci/hotplug/cpqphp_ctrl.c           |  4 +-
 drivers/pci/hotplug/pciehp_hpc.c            | 10 ++---
 drivers/pci/pci.c                           | 10 ++---
 drivers/pci/pcie/dpc.c                      |  4 +-
 drivers/pci/pcie/pme.c                      |  4 +-
 drivers/pci/probe.c                         | 10 ++---
 include/linux/pci.h                         |  9 ++++
 24 files changed, 85 insertions(+), 121 deletions(-)

-- 
2.25.1


^ permalink raw reply

* Re: coherency issue observed after hotplug on POWER8
From: Krzysztof Kozlowski @ 2021-10-21 15:01 UTC (permalink / raw)
  To: Naveen N. Rao, Thadeu Lima de Souza Cascardo, linuxppc-dev
In-Reply-To: <1632500323.sp1p885nv8.naveen@linux.ibm.com>

On 24/09/2021 19:17, Naveen N. Rao wrote:
> Hi Cascardo,
> Thanks for reporting this.
> 
> 
> Thadeu Lima de Souza Cascardo wrote:
>> Hi, there.
>>
>> We have been investigating an issue we have observed on POWER8 POWERNV systems.
>> When running the kernel selftests reuseport_bpf_cpu after a CPU hotplug, we see
>> crashes, in different forms. [1]
> 
> Just to re-confirm: you are only seeing this on P8 powernv, and not in a 
> P8 guest/LPAR? I haven't been able to reproduce this on a firestone -- 
> can you share more details about your power8 machine?
> 
> Also, do you only see this with ubuntu kernels, or are you also able to 
> reproduce this with the upstream tree?

Let me just covert this part of your email:

Upstream trees (5.11, 5.13, 5.14). See also:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1927076/comments/28

I could not reproduce it on Power8 LPAR. Neither on Power9 QEMU guest.

Reproduced on few machines:
IBM, POWER8NVL, 8335-GTB
POWER8, 8001-22C and 8335-GTA

lspcpu for the last one:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1927076/comments/15


Best regards,
Krzysztof

^ permalink raw reply

* Re: [PATCH 21/20] signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
From: Eric W. Biederman @ 2021-10-21 13:33 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Rich Felker, Linux-sh list, Linux Kernel Mailing List,
	Max Filippov, Paul Mackerras, Greentime Hu, H Peter Anvin,
	sparclinux, Vincent Chen, Linux-Arch, linux-s390, Yoshinori Sato,
	Christian Borntraeger, Ingo Molnar,
	open list:TENSILICA XTENSA PORT (xtensa), Kees Cook,
	Vasily Gorbik, Heiko Carstens, Openrisc, Borislav Petkov, Al Viro,
	Andy Lutomirski, Oleg Nesterov, Thomas Gleixner, Chris Zankel,
	Jonas Bonn, Nick Hu, Greg Kroah-Hartman, Linus Torvalds,
	open list:BROADCOM NVRAM DRIVER, Thomas Bogendoerfer,
	linuxppc-dev, David Miller, Maciej Rozycki
In-Reply-To: <CAMuHMdURxrdjsXq7+q-AWTwxVUdmddOj2vvNHv6M=WtsU5nRvg@mail.gmail.com>

Geert Uytterhoeven <geert@linux-m68k.org> writes:

> Hi Eric,
>
> Patch 21/20?

In reviewing another part of the patchset Linus asked if force_sigsegv
could go away.  It can't completely but I can get this far.

Given that it is just a cleanup it makes most sense to me as an
additional patch on top of what is already here.


> On Wed, Oct 20, 2021 at 11:52 PM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Now that force_fatal_sig exists it is unnecessary and a bit confusing
>> to use force_sigsegv in cases where the simpler force_fatal_sig is
>> wanted.  So change every instance we can to make the code clearer.
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
>>  arch/m68k/kernel/traps.c        | 2 +-
>
> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Thank you.

Eric

^ permalink raw reply

* Re: [PATCH v2] powerpc/pseries/mobility: ignore ibm, platform-facilities updates
From: Nathan Lynch @ 2021-10-21 12:18 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: tyreld, cheloha, ldufour, linuxppc-dev
In-Reply-To: <87zgr3expl.fsf@dja-thinkpad.axtens.net>

Daniel Axtens <dja@axtens.net> writes:
>> On VMs with NX encryption, compression, and/or RNG offload, these
>> capabilities are described by nodes in the ibm,platform-facilities device
>> tree hierarchy:
>>
>>   $ tree -d /sys/firmware/devicetree/base/ibm,platform-facilities/
>>   /sys/firmware/devicetree/base/ibm,platform-facilities/
>>   ├── ibm,compression-v1
>>   ├── ibm,random-v1
>>   └── ibm,sym-encryption-v1
>>
>>   3 directories
>>
>> The acceleration functions that these nodes describe are not disrupted by
>> live migration, not even temporarily.
>>
>> But the post-migration ibm,update-nodes sequence firmware always sends
>> "delete" messages for this hierarchy, followed by an "add" directive to
>> reconstruct it via ibm,configure-connector (log with debugging statements
>> enabled in mobility.c):
>>
>>   mobility: removing node /ibm,platform-facilities/ibm,random-v1:4294967285
>>   mobility: removing node /ibm,platform-facilities/ibm,compression-v1:4294967284
>>   mobility: removing node /ibm,platform-facilities/ibm,sym-encryption-v1:4294967283
>>   mobility: removing node /ibm,platform-facilities:4294967286
>>   ...
>>   mobility: added node /ibm,platform-facilities:4294967286
>>
>> Note we receive a single "add" message for the entire hierarchy, and what
>> we receive from the ibm,configure-connector sequence is the top-level
>> platform-facilities node along with its three children. The debug message
>> simply reports the parent node and not the whole subtree.
>
> If I understand correctly, (and again, this is not my area at all!) we
> still have to go out to the firmware and call the
> ibm,configure-connector sequence in order to figure out that the node
> we're supposed to add is the ibm,platform-facilites node, right? We
> can't save enough information at delete time to avoid the trip out to
> firmware?

That is right... but maybe I don't understand your angle here. Unsure
what avoiding the configure-connector sequence for the nodes would buy
us.


>> Until that can be realized we have a confirmed use-after-free and the
>> possibility of memory corruption. So add a limited workaround that
>> discriminates on the node type, ignoring adds and removes. This should be
>> amenable to backporting in the meantime.
>
> Yeah it's an unpleasant situation to find ourselves in. It's a bit icky
> but as I think you said in a previous email, at least this isn't worse:
> in the common case it should now succeed and and if properties change
> significantly it will still fail.
>
> My one question (from more of a security point of view) is:
>  1) Say you start using the facilities with a particular set of
>     parameters.
>
>  2) Say you then get migrated and the parameters change.
>
>  3) If you keep using the platform facilities as if the original
>     properties are still valid, can this cause any Interesting,
>     unexpected or otherwise Bad consequences? Are we going to end up
>     (for example) scribbling over random memory somehow?

If drivers are safely handling errors from H_COP_OP etc, then no. (I
know, this looks like a Well That Would Be a Driver Bug dismissal, but
that's not my attitude.) And again this is a case where the change
cannot make things worse.

In the current design of the pseries LPM implementation, user space and
other normal system activity resume as soon as we return from the
stop_machine() call which suspends the partition, executing concurrently
with any device tree updates. So even if we had code in place to
correctly resolve the DT changes and the drivers were able to respond to
the changes, there would still be a window of exposure to the kind of
problem you describe: the changed characteristics, if any, of the
destination obtain as soon as execution resumes, regardless of when the
OS initiates the update-nodes sequence.

The way out of that mess is to use the Linux suspend framework, or
otherwise prevent user space from executing until the destination
system's characteristics have been appropriately propagated out to the
necessary drivers etc. I'm trying to get there.


> Apart from that, the code seems to do what it says, it seems to solve a
> real problem, the error and memory handling makes sense, you _put the DT
> nodes that you _get, the comments are helpful and descriptive, and it
> passes the automated tests on patchwork/snowpatch.

I appreciate your review!

^ permalink raw reply

* [GIT PULL] Please pull powerpc/linux.git powerpc-5.15-5 tag
From: Michael Ellerman @ 2021-10-21 11:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: nathanl, linuxppc-dev, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Linus,

Please pull some more powerpc fixes for 5.15:

The following changes since commit cdeb5d7d890e14f3b70e8087e745c4a6a7d9f337:

  KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest (2021-10-16 00:40:03 +1100)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.15-5

for you to fetch changes up to 787252a10d9422f3058df9a4821f389e5326c440:

  powerpc/smp: do not decrement idle task preempt count in CPU offline (2021-10-20 21:38:01 +1100)

- ------------------------------------------------------------------
powerpc fixes for 5.15 #5

Fix a bug exposed by a previous fix, where running guests with certain SMT topologies
could crash the host on Power8.

Fix atomic sleep warnings when re-onlining CPUs, when PREEMPT is enabled.

Thanks to: Nathan Lynch, Srikar Dronamraju, Valentin Schneider.

- ------------------------------------------------------------------
Michael Ellerman (1):
      powerpc/idle: Don't corrupt back chain when going idle

Nathan Lynch (1):
      powerpc/smp: do not decrement idle task preempt count in CPU offline


 arch/powerpc/kernel/idle_book3s.S | 10 ++++++----
 arch/powerpc/kernel/smp.c         |  2 --
 2 files changed, 6 insertions(+), 6 deletions(-)
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmFxT3IACgkQUevqPMjh
pYD0Mg//fh9BEVcfCtGJskqpbppkLJL8Pk/npKdiXZ1//ESTnSSXp0SfjHwnKW8H
R+FYUomAdB4pis6lfUUlVFHcADPrf1C55IF6f4pP7mLWGKuscMTjDRmgBOkgEreY
pciP+aGkNWu6Lmzoz1ZEqYr1mZW6TX3/Os9BabFUNze4gzTT6Y4U+/QOYrt5VQZB
SAnzyfjOq0c9HDP3OFVn9xUGkOpikRA2rT/0lKVFs5CPkqmLv82i/slz9SwE96kX
Zfi9CCJ3ule0RgysYg33QpAzZfQiLATAJBLk+Wlyl9SAFQ8w+cOhFtJmHryGFPz5
n5JopbE2lECJxw5fhasLwraDZzTd84xHvx1xpl2nIQEzrRlpV+Kq2c7SEbNROakL
rp/xmnBjfo9wMVwjo7x20arqj+o5XBs7yW04gMXV5yJVMUjgn288LAZzTe9nNegk
drzfrVHyvNKCoEWZr0egUDazSuh1kiWC9srmZ3IB2Cx2AGXYvXk+MsftoIqP8Nu6
gs9+NRwNiroaX3aujlcZHC4J7QwGMUZo6Dvs2e0VX+kvGlvnrP8+7pQ0eSoSJlVv
DfeTZa630yD0TdPVqWXHHeOUqbza/bpeHUxdFwSYdlr4DLrr3XkfaO/VQ6XJmTHa
9aZq3y7ALgQGvXHUIJYH4upZYoK1BhNqCJ0A2w+8uqKlunlXx8w=
=ES/7
-----END PGP SIGNATURE-----

^ permalink raw reply

* Re: [PATCH v2] powerpc/smp: do not decrement idle task preempt count in CPU offline
From: Michael Ellerman @ 2021-10-21 11:07 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev
  Cc: srikar, peterz, linux-kernel, mingo, clg, valentin.schneider
In-Reply-To: <20211015173902.2278118-1-nathanl@linux.ibm.com>

On Fri, 15 Oct 2021 12:39:02 -0500, Nathan Lynch wrote:
> With PREEMPT_COUNT=y, when a CPU is offlined and then onlined again, we
> get:
> 
> BUG: scheduling while atomic: swapper/1/0/0x00000000
> no locks held by swapper/1/0.
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0-rc2+ #100
> Call Trace:
>  dump_stack_lvl+0xac/0x108
>  __schedule_bug+0xac/0xe0
>  __schedule+0xcf8/0x10d0
>  schedule_idle+0x3c/0x70
>  do_idle+0x2d8/0x4a0
>  cpu_startup_entry+0x38/0x40
>  start_secondary+0x2ec/0x3a0
>  start_secondary_prolog+0x10/0x14
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/smp: do not decrement idle task preempt count in CPU offline
      https://git.kernel.org/powerpc/c/787252a10d9422f3058df9a4821f389e5326c440

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/idle: Don't corrupt back chain when going idle
From: Michael Ellerman @ 2021-10-21 11:07 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: npiggin, kvm-ppc
In-Reply-To: <20211020094826.3222052-1-mpe@ellerman.id.au>

On Wed, 20 Oct 2021 20:48:26 +1100, Michael Ellerman wrote:
> In isa206_idle_insn_mayloss() we store various registers into the stack
> red zone, which is allowed.
> 
> However inside the IDLE_STATE_ENTER_SEQ_NORET macro we save r2 again,
> to 0(r1), which corrupts the stack back chain.
> 
> We used to do the same in isa206_idle_insn_mayloss() itself, but we
> fixed that in 73287caa9210 ("powerpc64/idle: Fix SP offsets when saving
> GPRs"), however we missed that the macro also corrupts the back chain.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/idle: Don't corrupt back chain when going idle
      https://git.kernel.org/powerpc/c/496c5fe25c377ddb7815c4ce8ecfb676f051e9b6

cheers

^ permalink raw reply

* [PATCH] powerpc: mpc8349emitx: Make mcu_gpiochip_remove() return void
From: Uwe Kleine-König @ 2021-10-21 10:56 UTC (permalink / raw)
  To: Scott Wood, Michael Ellerman; +Cc: Paul Mackerras, linuxppc-dev, kernel

Up to now mcu_gpiochip_remove() returns zero unconditionally. Make it
return void instead which makes it easier to see in the callers that
there is no error to handle.

Also the return value of i2c remove callbacks is ignored anyway.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
---
 arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c b/arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c
index 409481016928..bb789f33c70e 100644
--- a/arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c
+++ b/arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c
@@ -135,11 +135,10 @@ static int mcu_gpiochip_add(struct mcu *mcu)
 	return gpiochip_add_data(gc, mcu);
 }
 
-static int mcu_gpiochip_remove(struct mcu *mcu)
+static void mcu_gpiochip_remove(struct mcu *mcu)
 {
 	kfree(mcu->gc.label);
 	gpiochip_remove(&mcu->gc);
-	return 0;
 }
 
 static int mcu_probe(struct i2c_client *client)
@@ -198,9 +197,7 @@ static int mcu_remove(struct i2c_client *client)
 		glob_mcu = NULL;
 	}
 
-	ret = mcu_gpiochip_remove(mcu);
-	if (ret)
-		return ret;
+	mcu_gpiochip_remove(mcu);
 	kfree(mcu);
 	return 0;
 }
-- 
2.30.2


^ permalink raw reply related

* Re: [PATCH V2] powerpc/perf: Enable PMU counters post partition migration if PMU is active
From: Nicholas Piggin @ 2021-10-21 10:54 UTC (permalink / raw)
  To: Athira Rajeev, mpe; +Cc: kjain, maddy, linuxppc-dev, rnsastry
In-Reply-To: <1626006357-1611-1-git-send-email-atrajeev@linux.vnet.ibm.com>

Excerpts from Athira Rajeev's message of July 11, 2021 10:25 pm:
> During Live Partition Migration (LPM), it is observed that perf
> counter values reports zero post migration completion. However
> 'perf stat' with workload continues to show counts post migration
> since PMU gets disabled/enabled during sched switches. But incase
> of system/cpu wide monitoring, zero counts were reported with 'perf
> stat' after migration completion.
> 
> Example:
>  ./perf stat -e r1001e -I 1000
>            time             counts unit events
>      1.001010437         22,137,414      r1001e
>      2.002495447         15,455,821      r1001e
> <<>> As seen in next below logs, the counter values shows zero
>         after migration is completed.
> <<>>
>     86.142535370    129,392,333,440      r1001e
>     87.144714617                  0      r1001e
>     88.146526636                  0      r1001e
>     89.148085029                  0      r1001e
> 
> Here PMU is enabled during start of perf session and counter
> values are read at intervals. Counters are only disabled at the
> end of session. The powerpc mobility code presently does not handle
> disabling and enabling back of PMU counters during partition
> migration. Also since the PMU register values are not saved/restored
> during migration, PMU registers like Monitor Mode Control Register 0
> (MMCR0), Monitor Mode Control Register 1 (MMCR1) will not contain
> the value it was programmed with. Hence PMU counters will not be
> enabled correctly post migration.
> 
> Fix this in mobility code by handling disabling and enabling of
> PMU in all cpu's before and after migration. Patch introduces two
> functions 'mobility_pmu_disable' and 'mobility_pmu_enable'.
> mobility_pmu_disable() is called before the processor threads goes
> to suspend state so as to disable the PMU counters. And disable is
> done only if there are any active events running on that cpu.
> mobility_pmu_enable() is called after the processor threads are
> back online to enable back the PMU counters.
> 
> Since the performance Monitor counters ( PMCs) are not
> saved/restored during LPM, results in PMC value being zero and the
> 'event->hw.prev_count' being non-zero value. This causes problem

Interesting. Are they defined to not be migrated, or may not be 
migrated?

I wonder what QEMU migration does with PMU registers.

> during updation of event->count since we always accumulate
> (event->hw.prev_count - PMC value) in event->count.  If
> event->hw.prev_count is greater PMC value, event->count becomes
> negative. Fix this by re-initialising 'prev_count' also for all
> events while enabling back the events. A new variable 'migrate' is
> introduced in 'struct cpu_hw_event' to achieve this for LPM cases
> in power_pmu_enable. Use the 'migrate' value to clear the PMC
> index (stored in event->hw.idx) for all events so that event
> count settings will get re-initialised correctly.
> 
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> [ Fixed compilation error reported by kernel test robot ]
> Reported-by: kernel test robot <lkp@intel.com>
> ---
> Change from v1 -> v2:
>  - Moved the mobility_pmu_enable and mobility_pmu_disable
>    declarations under CONFIG_PPC_PERF_CTRS in rtas.h.
>    Also included 'asm/rtas.h' in core-book3s to fix the
>    compilation warning reported by kernel test robot.
> 
>  arch/powerpc/include/asm/rtas.h           |  8 ++++++
>  arch/powerpc/perf/core-book3s.c           | 44 ++++++++++++++++++++++++++++---
>  arch/powerpc/platforms/pseries/mobility.c |  4 +++
>  3 files changed, 53 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 9dc97d2..cea72d7 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -380,5 +380,13 @@ static inline void rtas_initialize(void) { }
>  static inline void read_24x7_sys_info(void) { }
>  #endif
>  
> +#ifdef CONFIG_PPC_PERF_CTRS
> +void mobility_pmu_disable(void);
> +void mobility_pmu_enable(void);
> +#else
> +static inline void mobility_pmu_disable(void) { }
> +static inline void mobility_pmu_enable(void) { }
> +#endif
> +
>  #endif /* __KERNEL__ */
>  #endif /* _POWERPC_RTAS_H */

It's not implemented in rtas, maybe consider putting this into a perf 
header?

> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index bb0ee71..90da7fa 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -18,6 +18,7 @@
>  #include <asm/ptrace.h>
>  #include <asm/code-patching.h>
>  #include <asm/interrupt.h>
> +#include <asm/rtas.h>
>  
>  #ifdef CONFIG_PPC64
>  #include "internal.h"
> @@ -58,6 +59,7 @@ struct cpu_hw_events {
>  
>  	/* Store the PMC values */
>  	unsigned long pmcs[MAX_HWEVENTS];
> +	int migrate;
>  };
>  
>  static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
> @@ -1335,6 +1337,22 @@ static void power_pmu_disable(struct pmu *pmu)
>  }
>  
>  /*
> + * Called from powerpc mobility code
> + * before migration to disable counters
> + * if the PMU is active.
> + */
> +void mobility_pmu_disable(void)
> +{
> +	struct cpu_hw_events *cpuhw;
> +
> +	cpuhw = this_cpu_ptr(&cpu_hw_events);
> +	if (cpuhw->n_events != 0) {
> +		power_pmu_disable(NULL);
> +		cpuhw->migrate = 1;
> +	}

Rather than add this migrate logic into power_pmu_enable(), would you be 
able to do the work in the mobility callbacks instead? Something like
this:

void mobility_pmu_disable(void)
{
        struct cpu_hw_events *cpuhw;

        cpuhw = this_cpu_ptr(&cpu_hw_events);
        if (cpuhw->n_events != 0) {
                int i;

                power_pmu_disable(NULL);
                /*
                 * Read off any pre-existing events because the register
                 * values may not be migrated.
                 */
                for (i = 0; i < cpuhw->n_events; ++i) {
                        event = cpuhw->event[i];
                        if (event->hw.idx) {
                                power_pmu_read(event);
                                event->hw.idx = 0;
                        }
                }
        }
}

void mobility_pmu_enable(void)
{
        struct cpu_hw_events *cpuhw;
 
        cpuhw = this_cpu_ptr(&cpu_hw_events);
        cpuhw->n_added = cpuhw->n_events;
        power_pmu_enable(NULL);
}

> +}
> +
> +/*
>   * Re-enable all events if disable == 0.
>   * If we were previously disabled and events were added, then
>   * put the new config on the PMU.
> @@ -1379,8 +1397,10 @@ static void power_pmu_enable(struct pmu *pmu)
>  	 * no need to recalculate MMCR* settings and reset the PMCs.
>  	 * Just reenable the PMU with the current MMCR* settings
>  	 * (possibly updated for removal of events).
> +	 * While reenabling PMU during partition migration, continue
> +	 * with normal flow.
>  	 */
> -	if (!cpuhw->n_added) {
> +	if (!cpuhw->n_added && !cpuhw->migrate) {
>  		mtspr(SPRN_MMCRA, cpuhw->mmcr.mmcra & ~MMCRA_SAMPLE_ENABLE);
>  		mtspr(SPRN_MMCR1, cpuhw->mmcr.mmcr1);
>  		if (ppmu->flags & PPMU_ARCH_31)
> @@ -1434,11 +1454,15 @@ static void power_pmu_enable(struct pmu *pmu)
>  	/*
>  	 * Read off any pre-existing events that need to move
>  	 * to another PMC.
> +	 * While enabling PMU during partition migration,
> +	 * skip power_pmu_read since all event count settings
> +	 * needs to be re-initialised after migration.
>  	 */
>  	for (i = 0; i < cpuhw->n_events; ++i) {
>  		event = cpuhw->event[i];
> -		if (event->hw.idx && event->hw.idx != hwc_index[i] + 1) {
> -			power_pmu_read(event);
> +		if ((event->hw.idx && event->hw.idx != hwc_index[i] + 1) || (cpuhw->migrate)) {
> +			if (!cpuhw->migrate)
> +				power_pmu_read(event);
>  			write_pmc(event->hw.idx, 0);
>  			event->hw.idx = 0;
>  		}
> @@ -1506,6 +1530,20 @@ static void power_pmu_enable(struct pmu *pmu)
>  	local_irq_restore(flags);
>  }
>  
> +/*
> + * Called from powerpc mobility code
> + * during migration completion to
> + * enable back PMU counters.
> + */
> +void mobility_pmu_enable(void)
> +{
> +	struct cpu_hw_events *cpuhw;
> +
> +	cpuhw = this_cpu_ptr(&cpu_hw_events);
> +	power_pmu_enable(NULL);
> +	cpuhw->migrate = 0;
> +}
> +
>  static int collect_events(struct perf_event *group, int max_count,
>  			  struct perf_event *ctrs[], u64 *events,
>  			  unsigned int *flags)
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index e83e089..ff7a77c 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -476,6 +476,8 @@ static int do_join(void *arg)
>  retry:
>  	/* Must ensure MSR.EE off for H_JOIN. */
>  	hard_irq_disable();
> +	/* Disable PMU before suspend */
> +	mobility_pmu_disable();
>  	hvrc = plpar_hcall_norets(H_JOIN);
>  
>  	switch (hvrc) {

Does the retry case cause mobility_pmu_disable to be called twice 
without mobility_pmu_enable called? Would be neater if it was
balanced.


> @@ -530,6 +532,8 @@ static int do_join(void *arg)
>  	 * reset the watchdog.
>  	 */
>  	touch_nmi_watchdog();
> +	/* Enable PMU after resuming */
> +	mobility_pmu_enable();
>  	return ret;
>  }

Not really a big deal but while you're updating it, you could leave 
touch_nmi_watchdog() at the very end.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v11 0/3] make hvc pass dma capable memory to its backend
From: Xianting Tian @ 2021-10-21  8:56 UTC (permalink / raw)
  To: Greg KH
  Cc: arnd, amit, jirislaby, shile.zhang, linux-kernel, virtualization,
	linuxppc-dev, osandov
In-Reply-To: <YXEmW071C+GlmXqw@kroah.com>

I am very glad to get this reply:)

Thank you and other experts' kindly help and guide, which improved me a lot.

在 2021/10/21 下午4:35, Greg KH 写道:
> On Fri, Oct 15, 2021 at 10:46:55AM +0800, Xianting Tian wrote:
>> Dear all,
>>
>> This patch series make hvc framework pass DMA capable memory to
>> put_chars() of hvc backend(eg, virtio-console), and revert commit
>> c4baad5029 ("virtio-console: avoid DMA from stack”)
> Thanks for sticking with this, looks much better now, all now queued up.
>
> greg k-h

^ permalink raw reply

* Re: [PATCH v11 0/3] make hvc pass dma capable memory to its backend
From: Greg KH @ 2021-10-21  8:35 UTC (permalink / raw)
  To: Xianting Tian
  Cc: arnd, amit, jirislaby, shile.zhang, linux-kernel, virtualization,
	linuxppc-dev, osandov
In-Reply-To: <20211015024658.1353987-1-xianting.tian@linux.alibaba.com>

On Fri, Oct 15, 2021 at 10:46:55AM +0800, Xianting Tian wrote:
> Dear all,
> 
> This patch series make hvc framework pass DMA capable memory to
> put_chars() of hvc backend(eg, virtio-console), and revert commit
> c4baad5029 ("virtio-console: avoid DMA from stack”)

Thanks for sticking with this, looks much better now, all now queued up.

greg k-h

^ permalink raw reply

* Re: [PATCH 21/20] signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
From: Philippe Mathieu-Daudé @ 2021-10-21  8:32 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Rich Felker, linux-xtensa, Oleg Nesterov, Max Filippov,
	Paul Mackerras, H Peter Anvin, sparclinux, Vincent Chen,
	Thomas Gleixner, linux-arch, linux-s390, Yoshinori Sato, linux-sh,
	Christian Borntraeger, Ingo Molnar,
	open list:BROADCOM NVRAM DRIVER, Jonas Bonn, Kees Cook,
	Vasily Gorbik, Heiko Carstens, Stefan Kristiansson, openrisc,
	Borislav Petkov, Al Viro, Andy Lutomirski, Stafford Horne,
	Chris Zankel, Thomas Bogendoerfer, Nick Hu, linuxppc-dev,
	open list, Greg Kroah-Hartman, Maciej Rozycki, Linus Torvalds,
	David Miller, Greentime Hu
In-Reply-To: <877de7jrev.fsf@disp2133>

On Wed, Oct 20, 2021 at 11:52 PM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
>
> Now that force_fatal_sig exists it is unnecessary and a bit confusing
> to use force_sigsegv in cases where the simpler force_fatal_sig is
> wanted.  So change every instance we can to make the code clearer.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  arch/arc/kernel/process.c       | 2 +-
>  arch/m68k/kernel/traps.c        | 2 +-
>  arch/powerpc/kernel/signal_32.c | 2 +-
>  arch/powerpc/kernel/signal_64.c | 4 ++--
>  arch/s390/kernel/traps.c        | 2 +-
>  arch/um/kernel/trap.c           | 2 +-
>  arch/x86/kernel/vm86_32.c       | 2 +-
>  fs/exec.c                       | 2 +-
>  8 files changed, 9 insertions(+), 9 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox