LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] powerpc: always enable queued spinlocks for 64s, disable for others
From: Nicholas Piggin @ 2020-12-22  3:28 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev
In-Reply-To: <769ec5dd-8e74-56cb-a3fe-3b657bb3d14c@csgroup.eu>

Excerpts from Christophe Leroy's message of December 21, 2020 4:04 pm:
> 
> 
> Le 21/12/2020 à 04:22, Nicholas Piggin a écrit :
>> Queued spinlocks have shown to have good performance and fairness
>> properties even on smaller (2 socket) POWER systems. This selects
>> them automatically for 64s. For other platforms they are de-selected,
>> the standard spinlock is far simpler and smaller code, and single
>> chips with a handful of cores is unlikely to show any improvement.
>> 
>> CONFIG_EXPERT still allows this to be changed, e.g., to help debug
>> performance or correctness issues.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   arch/powerpc/Kconfig | 8 +++-----
>>   1 file changed, 3 insertions(+), 5 deletions(-)
>> 
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index ae7391627054..1f9f9e64d638 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -255,6 +255,7 @@ config PPC
>>   	select PCI_MSI_ARCH_FALLBACKS		if PCI_MSI
>>   	select PCI_SYSCALL			if PCI
>>   	select PPC_DAWR				if PPC64
>> +	select PPC_QUEUED_SPINLOCKS		if !EXPERT && PPC_BOOK3S_64 && SMP
> 
> The condition is a bit complicated, and it doesn't set it to Y by default when EXPERT is selected.

Yeah, I don't know how to do that (switch people's oldconfig from =N to 
=Y) otherwise (without renaming the option). I think it's enough though, 
experts should have said yes already :)

> 
>>   	select RTC_LIB
>>   	select SPARSE_IRQ
>>   	select SYSCTL_EXCEPTION_TRACE
>> @@ -506,16 +507,13 @@ config HOTPLUG_CPU
>>   config PPC_QUEUED_SPINLOCKS
>>   	bool "Queued spinlocks"
>>   	depends on SMP
>> +	depends on EXPERT || PPC_BOOK3S_64
>> +
> 
> I would do:
> 
>     config PPC_QUEUED_SPINLOCKS
>   	bool "Queued spinlocks" if EXPERT
>   	depends on SMP
> 	default PPC_BOOK3S_64

That's nicer.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH] powerpc:Don't print raw EIP/LR hex values in dump_stack() and show_regs()
From: Segher Boessenkool @ 2020-12-21 17:12 UTC (permalink / raw)
  To: David Laight
  Cc: ravi.bangoria@linux.ibm.com, mikey@neuling.org,
	yanaijie@huawei.com, wangle6@huawei.com,
	linuxppc-dev@lists.ozlabs.org, haren@linux.ibm.com,
	linux-kernel@vger.kernel.org, paulus@samba.org, npiggin@gmail.com,
	aneesh.kumar@linux.ibm.com, Xiaoming Ni
In-Reply-To: <ad814ccf34c14c76b45e50b6e7741c3a@AcuMS.aculab.com>

On Mon, Dec 21, 2020 at 04:42:23PM +0000, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 21 December 2020 16:32
> > 
> > On Mon, Dec 21, 2020 at 04:17:21PM +0100, Christophe Leroy wrote:
> > > Le 21/12/2020 à 04:27, Xiaoming Ni a écrit :
> > > >Since the commit 2b0e86cc5de6 ("powerpc/fsl_booke/32: implement KASLR
> > > >infrastructure"), the powerpc system is ready to support KASLR.
> > > >To reduces the risk of invalidating address randomization, don't print the
> > > >EIP/LR hex values in dump_stack() and show_regs().
> > 
> > > I think your change is not enough to hide EIP address, see below a dump
> > > with you patch, you get "Faulting instruction address: 0xc03a0c14"
> > 
> > As far as I can see the patch does nothing to the GPR printout.  Often
> > GPRs contain code addresses.  As one example, the LR is moved via a GPR
> > (often GPR0, but not always) for storing on the stack.
> > 
> > So this needs more work.
> 
> If the dump_stack() is from an oops you need the real EIP value
> on order to stand any chance of making headway.

Or at least the function name + offset, yes.

> Otherwise you might just as well just print 'borked - tough luck'.

Yes.  ASLR is a house of cards.  But that isn't constructive wrt this
patch :-)


Segher

^ permalink raw reply

* RE: [PATCH] powerpc:Don't print raw EIP/LR hex values in dump_stack() and show_regs()
From: David Laight @ 2020-12-21 16:42 UTC (permalink / raw)
  To: 'Segher Boessenkool', Christophe Leroy
  Cc: ravi.bangoria@linux.ibm.com, mikey@neuling.org,
	yanaijie@huawei.com, wangle6@huawei.com,
	linuxppc-dev@lists.ozlabs.org, haren@linux.ibm.com,
	linux-kernel@vger.kernel.org, npiggin@gmail.com, paulus@samba.org,
	aneesh.kumar@linux.ibm.com, Xiaoming Ni
In-Reply-To: <20201221163130.GZ2672@gate.crashing.org>

From: Segher Boessenkool
> Sent: 21 December 2020 16:32
> 
> On Mon, Dec 21, 2020 at 04:17:21PM +0100, Christophe Leroy wrote:
> > Le 21/12/2020 à 04:27, Xiaoming Ni a écrit :
> > >Since the commit 2b0e86cc5de6 ("powerpc/fsl_booke/32: implement KASLR
> > >infrastructure"), the powerpc system is ready to support KASLR.
> > >To reduces the risk of invalidating address randomization, don't print the
> > >EIP/LR hex values in dump_stack() and show_regs().
> 
> > I think your change is not enough to hide EIP address, see below a dump
> > with you patch, you get "Faulting instruction address: 0xc03a0c14"
> 
> As far as I can see the patch does nothing to the GPR printout.  Often
> GPRs contain code addresses.  As one example, the LR is moved via a GPR
> (often GPR0, but not always) for storing on the stack.
> 
> So this needs more work.

If the dump_stack() is from an oops you need the real EIP value
on order to stand any chance of making headway.

Otherwise you might just as well just print 'borked - tough luck'.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply

* Re: [PATCH] powerpc:Don't print raw EIP/LR hex values in dump_stack() and show_regs()
From: Segher Boessenkool @ 2020-12-21 16:31 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Xiaoming Ni, ravi.bangoria, mikey, yanaijie, haren, linux-kernel,
	npiggin, wangle6, paulus, aneesh.kumar, linuxppc-dev
In-Reply-To: <2279fc96-1f10-0c3f-64d9-734f18758620@csgroup.eu>

On Mon, Dec 21, 2020 at 04:17:21PM +0100, Christophe Leroy wrote:
> Le 21/12/2020 à 04:27, Xiaoming Ni a écrit :
> >Since the commit 2b0e86cc5de6 ("powerpc/fsl_booke/32: implement KASLR
> >infrastructure"), the powerpc system is ready to support KASLR.
> >To reduces the risk of invalidating address randomization, don't print the
> >EIP/LR hex values in dump_stack() and show_regs().

> I think your change is not enough to hide EIP address, see below a dump 
> with you patch, you get "Faulting instruction address: 0xc03a0c14"

As far as I can see the patch does nothing to the GPR printout.  Often
GPRs contain code addresses.  As one example, the LR is moved via a GPR
(often GPR0, but not always) for storing on the stack.

So this needs more work.


Segher

^ permalink raw reply

* Re: [PATCH] powerpc:Don't print raw EIP/LR hex values in dump_stack() and show_regs()
From: Christophe Leroy @ 2020-12-21 15:17 UTC (permalink / raw)
  To: Xiaoming Ni, linux-kernel, benh, mpe, paulus, linuxppc-dev,
	yanaijie, npiggin, ravi.bangoria, mikey, aneesh.kumar, haren
  Cc: wangle6
In-Reply-To: <20201221032758.12143-1-nixiaoming@huawei.com>



Le 21/12/2020 à 04:27, Xiaoming Ni a écrit :
> Since the commit 2b0e86cc5de6 ("powerpc/fsl_booke/32: implement KASLR
> infrastructure"), the powerpc system is ready to support KASLR.
> To reduces the risk of invalidating address randomization, don't print the
> EIP/LR hex values in dump_stack() and show_regs().

I think this change is worth providing more details. Explain how the change improves debugging.

> 
> This patch follows x86 and arm64's lead:
>      commit a25ffd3a6302a6 ("arm64: traps: Don't print stack or raw
>       PC/LR values in backtraces")
>      commit bb5e5ce545f203 ("x86/dumpstack: Remove kernel text
>       addresses from stack dump")

I think your change is not enough to hide EIP address, see below a dump with you patch, you get 
"Faulting instruction address: 0xc03a0c14"

Also, you replace a 8 digits value by a potentially long ling. Should should therefore put NIP and 
LR on different lines, and put CTR somewhere else (next to XER ?)

Now, the NIP and LR before the REGS and after the GPRs are exactly the same. No need to duplicate.

root@vgoip:~# busybox echo ACCESS_USERSPACE > /sys/kernel/debug/provoke-crash/DI
RECT
[  198.698395] lkdtm: Performing direct entry ACCESS_USERSPACE
[  198.698742] lkdtm: attempting bad read at 77ce8000
[  198.698844] Kernel attempted to read user page (77ce8000) - exploit attempt? (uid: 0)
[  198.706489] BUG: Unable to handle kernel data access on read at 0x77ce8000
[  198.713274] Faulting instruction address: 0xc03a0c14
[  198.718187] Oops: Kernel access of bad area, sig: 11 [#1]
[  198.723516] BE PAGE_SIZE=16K PREEMPT CMPC885
[  198.731272] CPU: 0 PID: 370 Comm: busybox Not tainted 5.10.0-s3k-dev-13158-g8e389861badd #4386
[  198.739785] NIP: lkdtm_ACCESS_USERSPACE+0xbc/0x110 LR: lkdtm_ACCESS_USERSPACE+0xbc/0x110 CTR: 
00000000
[  198.748994] REGS: cad83d30 TRAP: 0300   Not tainted  (5.10.0-s3k-dev-13158-g8e389861badd)
[  198.757081] MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 24002222  XER: 00000000
[  198.763881] DAR: 77ce8000 DSISR: 88000000
[  198.763881] GPR00: c03a0c14 cad83df0 c28a8000 00000026 c093e518 00000001 00000027 00000027
[  198.763881] GPR08: c09a26dc 00000000 00000000 3ffff000 24002228 100d3dd6 100a2ffc 00000000
[  198.763881] GPR16: 100cd280 100b0000 107e42ec 107e54b5 100d0000 100d0000 00000000 100a2fdc
[  198.763881] GPR24: ffffffef ffffffff cad83f10 00000011 c078298c c2930000 c084b980 77ce8000
[  198.802865] NIP lkdtm_ACCESS_USERSPACE+0xbc/0x110
[  198.807514] LR lkdtm_ACCESS_USERSPACE+0xbc/0x110
[  198.812077] Call Trace:
[  198.814485]  [cad83df0] lkdtm_ACCESS_USERSPACE+0xbc/0x110 (unreliable)
[  198.820940]  [cad83e20] direct_entry+0xe0/0x164
[  198.825415]  [cad83e50] full_proxy_write+0x78/0xbc
[  198.830148]  [cad83e80] vfs_write+0x12c/0x478
[  198.834452]  [cad83f00] ksys_write+0x78/0x128
[  198.838754]  [cad83f30] ret_from_syscall+0x0/0x34
[  198.843401] --- interrupt: c01 at 0xfd51d0c
[  198.847532] NIP: 0xfd51d0c LR: 0x10008404 CTR: 0fcff380
[  198.852696] REGS: cad83f40 TRAP: 0c01   Not tainted  (5.10.0-s3k-dev-13158-g8e389861badd)
[  198.860784] MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 44002424  XER: 00000000
[  198.867928]
[  198.867928] GPR00: 00000004 7fde21f0 77cf34d0 00000001 10640008 00000011 00000000 0fd5b36c
[  198.867928] GPR08: 00000000 00024000 00000000 00000009 84022222
[  198.884193] NIP 0xfd51d0c
[  198.886775] LR 0x10008404
[  198.889357] --- interrupt: c01
[  198.892373] Instruction dump:
[  198.895298] 7d295279 39400000 40820078 80010034 83e1002c 7c0803a6 38210030 4e800020
[  198.903214] 3c60c085 7fe4fb78 3863c874 4bcc5805 <813f0000> 3c60c085 3d29c0df 3929c0de
[  198.911316] ---[ end trace ba4047052f99b7bf ]---
[  198.915865]
Segmentation fault




> 
> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
> ---
>   arch/powerpc/kernel/process.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index a66f435dabbf..913cf1ea702e 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1455,8 +1455,8 @@ static void __show_regs(struct pt_regs *regs)
>   {
>   	int i, trap;
>   
> -	printk("NIP:  "REG" LR: "REG" CTR: "REG"\n",
> -	       regs->nip, regs->link, regs->ctr);
> +	printk("NIP: %pS LR: %pS CTR: "REG"\n",
> +	       (void *)regs->nip, (void *)regs->link, regs->ctr);
>   	printk("REGS: %px TRAP: %04lx   %s  (%s)\n",
>   	       regs, regs->trap, print_tainted(), init_utsname()->release);
>   	printk("MSR:  "REG" ", regs->msr);
> @@ -1493,8 +1493,8 @@ static void __show_regs(struct pt_regs *regs)
>   	 * above info out without failing
>   	 */
>   	if (IS_ENABLED(CONFIG_KALLSYMS)) {
> -		printk("NIP ["REG"] %pS\n", regs->nip, (void *)regs->nip);
> -		printk("LR ["REG"] %pS\n", regs->link, (void *)regs->link);
> +		printk("NIP %pS\n", (void *)regs->nip);
> +		printk("LR %pS\n", (void *)regs->link);
>   	}
>   }
>   
> @@ -2160,8 +2160,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
>   		newsp = stack[0];
>   		ip = stack[STACK_FRAME_LR_SAVE];
>   		if (!firstframe || ip != lr) {
> -			printk("%s["REG"] ["REG"] %pS",
> -				loglvl, sp, ip, (void *)ip);
> +			printk("%s ["REG"] %pS",
> +				loglvl, sp, (void *)ip);
>   			ret_addr = ftrace_graph_ret_addr(current,
>   						&ftrace_idx, ip, stack);
>   			if (ret_addr != ip)
> 

^ permalink raw reply

* Re: [RFC Qemu PATCH v2 1/2] spapr: drc: Add support for async hcalls at the drc level
From: Greg Kurz @ 2020-12-21 14:37 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: xiaoguangrong.eric, linux-nvdimm, aneesh.kumar, mst, qemu-devel,
	kvm-ppc, shivaprasadbhat, qemu-ppc, bharata, imammedo,
	linuxppc-dev, david
In-Reply-To: <20201221130853.15c8ddfd@bahia.lan>

On Mon, 21 Dec 2020 13:08:53 +0100
Greg Kurz <groug@kaod.org> wrote:

> Hi Shiva,
> 
> On Mon, 30 Nov 2020 09:16:39 -0600
> Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:
> 
> > The patch adds support for async hcalls at the DRC level for the
> > spapr devices. To be used by spapr-scm devices in the patch/es to follow.
> > 
> > Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> > ---
> 
> The overall idea looks good but I think you should consider using
> a thread pool to implement it. See below.
> 

Some more comments.

> >  hw/ppc/spapr_drc.c         |  149 ++++++++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr_drc.h |   25 +++++++
> >  2 files changed, 174 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > index 77718cde1f..4ecd04f686 100644
> > --- a/hw/ppc/spapr_drc.c
> > +++ b/hw/ppc/spapr_drc.c
> > @@ -15,6 +15,7 @@
> >  #include "qapi/qmp/qnull.h"
> >  #include "cpu.h"
> >  #include "qemu/cutils.h"
> > +#include "qemu/guest-random.h"
> >  #include "hw/ppc/spapr_drc.h"
> >  #include "qom/object.h"
> >  #include "migration/vmstate.h"
> > @@ -421,6 +422,148 @@ void spapr_drc_detach(SpaprDrc *drc)
> >      spapr_drc_release(drc);
> >  }
> >  
> > +
> > +/*
> > + * @drc : device DRC targetting which the async hcalls to be made.
> > + *
> > + * All subsequent requests to run/query the status should use the
> > + * unique token returned here.
> > + */
> > +uint64_t spapr_drc_get_new_async_hcall_token(SpaprDrc *drc)
> > +{
> > +    Error *err = NULL;
> > +    uint64_t token;
> > +    SpaprDrcDeviceAsyncHCallState *tmp, *next, *state;
> > +
> > +    state = g_malloc0(sizeof(*state));
> > +    state->pending = true;
> > +
> > +    qemu_mutex_lock(&drc->async_hcall_states_lock);
> > +retry:
> > +    if (qemu_guest_getrandom(&token, sizeof(token), &err) < 0) {
> > +        error_report_err(err);
> > +        g_free(state);
> > +        qemu_mutex_unlock(&drc->async_hcall_states_lock);
> > +        return 0;
> > +    }
> > +
> > +    if (!token) /* Token should be non-zero */
> > +        goto retry;
> > +
> > +    if (!QLIST_EMPTY(&drc->async_hcall_states)) {
> > +        QLIST_FOREACH_SAFE(tmp, &drc->async_hcall_states, node, next) {
> > +            if (tmp->continue_token == token) {
> > +                /* If the token already in use, get a new one */
> > +                goto retry;
> > +            }
> > +        }
> > +    }
> > +
> > +    state->continue_token = token;
> > +    QLIST_INSERT_HEAD(&drc->async_hcall_states, state, node);
> > +
> > +    qemu_mutex_unlock(&drc->async_hcall_states_lock);
> > +
> > +    return state->continue_token;
> > +}
> > +
> > +static void *spapr_drc_async_hcall_runner(void *opaque)
> > +{
> > +    int response = -1;

It feels wrong since the return value of func() is supposed to be
opaque to this function. And anyway it isn't needed since response
is always set a few lines below.

> > +    SpaprDrcDeviceAsyncHCallState *state = opaque;
> > +
> > +    /*
> > +     * state is freed only after this thread finishes(after pthread_join()),
> > +     * don't worry about it becoming NULL.
> > +     */
> > +
> > +    response = state->func(state->data);
> > +
> > +    state->hcall_ret = response;
> > +    state->pending = 0;

s/0/false/

> > +
> > +    return NULL;
> > +}
> > +
> > +/*
> > + * @drc  : device DRC targetting which the async hcalls to be made.
> > + * token : The continue token to be used for tracking as recived from
> > + *         spapr_drc_get_new_async_hcall_token
> > + * @func() : the worker function which needs to be executed asynchronously
> > + * @data : data to be passed to the asynchronous function. Worker is supposed
> > + *         to free/cleanup the data that is passed here
> 
> It'd be cleaner to pass a completion callback and have free/cleanup handled there.
> 
> > + */
> > +void spapr_drc_run_async_hcall(SpaprDrc *drc, uint64_t token,
> > +                               SpaprDrcAsyncHcallWorkerFunc *func, void *data)
> > +{
> > +    SpaprDrcDeviceAsyncHCallState *state;
> > +
> > +    qemu_mutex_lock(&drc->async_hcall_states_lock);
> > +    QLIST_FOREACH(state, &drc->async_hcall_states, node) {
> > +        if (state->continue_token == token) {
> > +            state->func = func;
> > +            state->data = data;
> > +            qemu_thread_create(&state->thread, "sPAPR Async HCALL",
> > +                               spapr_drc_async_hcall_runner, state,
> > +                               QEMU_THREAD_JOINABLE);
> 
> qemu_thread_create() exits on failure, it shouldn't be called on
> a guest triggerable path, eg. a buggy guest could call it up to
> the point that pthread_create() returns EAGAIN.
> 
> Please use a thread pool (see thread_pool_submit_aio()). This takes care
> of all the thread housekeeping for you in a safe way, and it provides a
> completion callback API. The implementation could then be just about
> having two lists: one for pending requests (fed here) and one for
> completed requests (fed by the completion callback).
> 
> > +            break;
> > +        }
> > +    }
> > +    qemu_mutex_unlock(&drc->async_hcall_states_lock);
> > +}
> > +
> > +/*
> > + * spapr_drc_finish_async_hcalls
> > + *      Waits for all pending async requests to complete
> > + *      thier execution and free the states
> > + */
> > +static void spapr_drc_finish_async_hcalls(SpaprDrc *drc)
> > +{
> > +    SpaprDrcDeviceAsyncHCallState *state, *next;
> > +
> > +    if (QLIST_EMPTY(&drc->async_hcall_states)) {
> > +        return;
> > +    }
> > +

This is called during machine reset and there won't be contention
in the large majority of cases. If the list is empty QLIST_FOREACH_SAFE()
won't iterate. So I don't think special casing the empty list brings much.

> > +    qemu_mutex_lock(&drc->async_hcall_states_lock);
> > +    QLIST_FOREACH_SAFE(state, &drc->async_hcall_states, node, next) {
> > +        qemu_thread_join(&state->thread);
> 
> With a thread-pool, you'd just need to aio_poll() until the pending list
> is empty and then clear the completed list.
> 
> > +        QLIST_REMOVE(state, node);
> > +        g_free(state);
> > +    }
> > +    qemu_mutex_unlock(&drc->async_hcall_states_lock);
> > +}
> > +
> > +/*
> > + * spapr_drc_get_async_hcall_status
> > + *      Fetches the status of the hcall worker and returns H_BUSY
> > + *      if the worker is still running.
> > + */
> > +int spapr_drc_get_async_hcall_status(SpaprDrc *drc, uint64_t token)
> > +{
> > +    int ret = H_PARAMETER;
> > +    SpaprDrcDeviceAsyncHCallState *state, *node;
> > +
> > +    qemu_mutex_lock(&drc->async_hcall_states_lock);
> > +    QLIST_FOREACH_SAFE(state, &drc->async_hcall_states, node, node) {
> > +        if (state->continue_token == token) {
> > +            if (state->pending) {
> > +                ret = H_BUSY;
> > +                break;
> > +            } else {
> > +                ret = state->hcall_ret;
> > +                qemu_thread_join(&state->thread);
> 
> Like for qemu_thread_create(), the guest shouldn't be responsible for
> thread housekeeping. Getting the hcall status should just be about
> finding the token in the pending or completed lists.
> 
> > +                QLIST_REMOVE(state, node);
> > +                g_free(state);
> > +                break;
> > +            }
> > +        }
> > +    }
> > +    qemu_mutex_unlock(&drc->async_hcall_states_lock);
> > +
> > +    return ret;
> > +}
> > +
> >  void spapr_drc_reset(SpaprDrc *drc)
> >  {
> >      SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > @@ -448,6 +591,7 @@ void spapr_drc_reset(SpaprDrc *drc)
> >          drc->ccs_offset = -1;
> >          drc->ccs_depth = -1;
> >      }
> > +    spapr_drc_finish_async_hcalls(drc);
> >  }
> >  
> >  static bool spapr_drc_unplug_requested_needed(void *opaque)
> > @@ -558,6 +702,7 @@ SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
> >      drc->owner = owner;
> >      prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
> >                                  spapr_drc_index(drc));
> > +
> 
> Unrelated change.
> 
> >      object_property_add_child(owner, prop_name, OBJECT(drc));
> >      object_unref(OBJECT(drc));
> >      qdev_realize(DEVICE(drc), NULL, NULL);
> > @@ -577,6 +722,10 @@ static void spapr_dr_connector_instance_init(Object *obj)
> >      object_property_add(obj, "fdt", "struct", prop_get_fdt,
> >                          NULL, NULL, NULL);
> >      drc->state = drck->empty_state;
> > +
> > +    qemu_mutex_init(&drc->async_hcall_states_lock);
> > +    QLIST_INIT(&drc->async_hcall_states);
> > +
> 
> Empty line not needed.
> 
> >  }
> >  
> >  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> > index 165b281496..77f6e4386c 100644
> > --- a/include/hw/ppc/spapr_drc.h
> > +++ b/include/hw/ppc/spapr_drc.h
> > @@ -18,6 +18,7 @@
> >  #include "sysemu/runstate.h"
> >  #include "hw/qdev-core.h"
> >  #include "qapi/error.h"
> > +#include "block/thread-pool.h"
> >  
> >  #define TYPE_SPAPR_DR_CONNECTOR "spapr-dr-connector"
> >  #define SPAPR_DR_CONNECTOR_GET_CLASS(obj) \
> > @@ -168,6 +169,21 @@ typedef enum {
> >      SPAPR_DRC_STATE_PHYSICAL_CONFIGURED = 8,
> >  } SpaprDrcState;
> >  
> > +typedef struct SpaprDrc SpaprDrc;
> > +
> > +typedef int SpaprDrcAsyncHcallWorkerFunc(void *opaque);
> > +typedef struct SpaprDrcDeviceAsyncHCallState {
> > +    uint64_t continue_token;
> > +    bool pending;
> > +
> > +    int hcall_ret;
> > +    SpaprDrcAsyncHcallWorkerFunc *func;
> > +    void *data;
> > +
> > +    QemuThread thread;
> > +
> > +    QLIST_ENTRY(SpaprDrcDeviceAsyncHCallState) node;
> > +} SpaprDrcDeviceAsyncHCallState;
> >  typedef struct SpaprDrc {
> >      /*< private >*/
> >      DeviceState parent;
> > @@ -182,6 +198,10 @@ typedef struct SpaprDrc {
> >      int ccs_offset;
> >      int ccs_depth;
> >  
> > +    /* async hcall states */
> > +    QemuMutex async_hcall_states_lock;
> > +    QLIST_HEAD(, SpaprDrcDeviceAsyncHCallState) async_hcall_states;
> > +
> >      /* device pointer, via link property */
> >      DeviceState *dev;
> >      bool unplug_requested;
> > @@ -241,6 +261,11 @@ void spapr_drc_detach(SpaprDrc *drc);
> >  /* Returns true if a hot plug/unplug request is pending */
> >  bool spapr_drc_transient(SpaprDrc *drc);
> >  
> > +uint64_t spapr_drc_get_new_async_hcall_token(SpaprDrc *drc);
> > +void spapr_drc_run_async_hcall(SpaprDrc *drc, uint64_t token,
> > +                               SpaprDrcAsyncHcallWorkerFunc, void *data);
> > +int spapr_drc_get_async_hcall_status(SpaprDrc *drc, uint64_t token);
> > +
> >  static inline bool spapr_drc_unplug_requested(SpaprDrc *drc)
> >  {
> >      return drc->unplug_requested;
> > 
> > 
> > 
> 
> 


^ permalink raw reply

* Re: [RFC Qemu PATCH v2 0/2] spapr: nvdimm: Asynchronus flush hcall support
From: Greg Kurz @ 2020-12-21 13:07 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: xiaoguangrong.eric, mst, aneesh.kumar, linux-nvdimm, qemu-devel,
	kvm-ppc, shivaprasadbhat, qemu-ppc, bharata, imammedo,
	linuxppc-dev, david
In-Reply-To: <160674929554.2492771.17651548703390170573.stgit@lep8c.aus.stglabs.ibm.com>

On Mon, 30 Nov 2020 09:16:14 -0600
Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:

> The nvdimm devices are expected to ensure write persistent during power
> failure kind of scenarios.
> 
> The libpmem has architecture specific instructions like dcbf on power
> to flush the cache data to backend nvdimm device during normal writes.
> 
> Qemu - virtual nvdimm devices are memory mapped. The dcbf in the guest
> doesn't traslate to actual flush to the backend file on the host in case
> of file backed vnvdimms. This is addressed by virtio-pmem in case of x86_64
> by making asynchronous flushes.
> 
> On PAPR, issue is addressed by adding a new hcall to
> request for an explicit asynchronous flush requests from the guest ndctl
> driver when the backend nvdimm cannot ensure write persistence with dcbf
> alone. So, the approach here is to convey when the asynchronous flush is
> required in a device tree property. The guest makes the hcall when the
> property is found, instead of relying on dcbf.
> 
> The first patch adds the necessary asynchronous hcall support infrastructure
> code at the DRC level. Second patch implements the hcall using the
> infrastructure.
> 
> Hcall semantics are in review and not final.
> 
> A new device property sync-dax is added to the nvdimm device. When the 
> sync-dax is off(default), the asynchronous hcalls will be called.
> 
> With respect to save from new qemu to restore on old qemu, having the
> sync-dax by default off(when not specified) causes IO errors in guests as
> the async-hcall would not be supported on old qemu. The new hcall
> implementation being supported only on the new  pseries machine version,
> the current machine version checks may be sufficient to prevent
> such migration. Please suggest what should be done.
> 

First, all requests that are still not completed from the guest POV,
ie. the hcall hasn't returned H_SUCCESS yet, are state that we should
migrate in theory. In this case, I guess we rather want to drain all
pending requests on the source in some pre-save handler.

Then, as explained in another mail, you should enforce stable behavior
for existing machine types with some hw_compat magic.

> The below demonstration shows the map_sync behavior with sync-dax on & off.
> (https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c)
> 
> The pmem0 is from nvdimm with With sync-dax=on, and pmem1 is from nvdimm with syn-dax=off, mounted as
> /dev/pmem0 on /mnt1 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota)
> /dev/pmem1 on /mnt2 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota)
> 
> [root@atest-guest ~]# ./mapsync /mnt1/newfile    ----> When sync-dax=off
> [root@atest-guest ~]# ./mapsync /mnt2/newfile    ----> when sync-dax=on
> Failed to mmap  with Operation not supported
> 
> ---
> v1 - https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg06330.html
> Changes from v1
>       - Fixed a missed-out unlock
>       - using QLIST_FOREACH instead of QLIST_FOREACH_SAFE while generating token
> 
> Shivaprasad G Bhat (2):
>       spapr: drc: Add support for async hcalls at the drc level
>       spapr: nvdimm: Implement async flush hcalls
> 
> 
>  hw/mem/nvdimm.c            |    1
>  hw/ppc/spapr_drc.c         |  146 ++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_nvdimm.c      |   79 ++++++++++++++++++++++++
>  include/hw/mem/nvdimm.h    |   10 +++
>  include/hw/ppc/spapr.h     |    3 +
>  include/hw/ppc/spapr_drc.h |   25 ++++++++
>  6 files changed, 263 insertions(+), 1 deletion(-)
> 
> --
> Signature
> 
> 


^ permalink raw reply

* Re: [RFC Qemu PATCH v2 2/2] spapr: nvdimm: Implement async flush hcalls
From: Greg Kurz @ 2020-12-21 13:07 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: xiaoguangrong.eric, mst, aneesh.kumar, linux-nvdimm, qemu-devel,
	kvm-ppc, shivaprasadbhat, qemu-ppc, bharata, imammedo,
	linuxppc-dev, david
In-Reply-To: <160674940727.2492771.7855399693883710135.stgit@lep8c.aus.stglabs.ibm.com>

On Mon, 30 Nov 2020 09:17:24 -0600
Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:

> When the persistent memory beacked by a file, a cpu cache flush instruction
> is not sufficient to ensure the stores are correctly flushed to the media.
> 
> The patch implements the async hcalls for flush operation on demand from the
> guest kernel.
> 
> The device option sync-dax is by default off and enables explicit asynchronous
> flush requests from guest. It can be disabled by setting syn-dax=on.
> 
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> ---
>  hw/mem/nvdimm.c         |    1 +
>  hw/ppc/spapr_nvdimm.c   |   79 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/mem/nvdimm.h |   10 ++++++
>  include/hw/ppc/spapr.h  |    3 +-
>  4 files changed, 92 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> index 03c2201b56..37a4db0135 100644
> --- a/hw/mem/nvdimm.c
> +++ b/hw/mem/nvdimm.c
> @@ -220,6 +220,7 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
>  
>  static Property nvdimm_properties[] = {
>      DEFINE_PROP_BOOL(NVDIMM_UNARMED_PROP, NVDIMMDevice, unarmed, false),
> +    DEFINE_PROP_BOOL(NVDIMM_SYNC_DAX_PROP, NVDIMMDevice, sync_dax, false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
> index a833a63b5e..557e36aa98 100644
> --- a/hw/ppc/spapr_nvdimm.c
> +++ b/hw/ppc/spapr_nvdimm.c
> @@ -22,6 +22,7 @@
>   * THE SOFTWARE.
>   */
>  #include "qemu/osdep.h"
> +#include "qemu/cutils.h"
>  #include "qapi/error.h"
>  #include "hw/ppc/spapr_drc.h"
>  #include "hw/ppc/spapr_nvdimm.h"
> @@ -155,6 +156,11 @@ static int spapr_dt_nvdimm(SpaprMachineState *spapr, void *fdt,
>                               "operating-system")));
>      _FDT(fdt_setprop(fdt, child_offset, "ibm,cache-flush-required", NULL, 0));
>  
> +    if (!nvdimm->sync_dax) {

So this is done unconditionally for all machine types. This means that a
guest started on a newer QEMU cannot be migrated to an older QEMU. This
is annoying because people legitimately expect an existing machine type
to be migratable to any QEMU version that supports it.

This means that something like the following should be added in hw_compat_5_2[]
to fix the property for pre-6.0 machine types:

    { "nvdimm", "sync-dax", "on" },

> +        _FDT(fdt_setprop(fdt, child_offset, "ibm,async-flush-required",
> +                         NULL, 0));
> +    }
> +
>      return child_offset;
>  }
>  
> @@ -370,6 +376,78 @@ static target_ulong h_scm_bind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
>      return H_SUCCESS;
>  }
>  
> +typedef struct SCMAsyncFlushData {
> +    int fd;
> +    uint64_t token;
> +} SCMAsyncFlushData;
> +
> +static int flush_worker_cb(void *opaque)
> +{
> +    int ret = H_SUCCESS;
> +    SCMAsyncFlushData *req_data = opaque;
> +
> +    /* flush raw backing image */
> +    if (qemu_fdatasync(req_data->fd) < 0) {
> +        error_report("papr_scm: Could not sync nvdimm to backend file: %s",
> +                     strerror(errno));
> +        ret = H_HARDWARE;
> +    }
> +
> +    g_free(req_data);
> +
> +    return ret;
> +}
> +
> +static target_ulong h_scm_async_flush(PowerPCCPU *cpu, SpaprMachineState *spapr,
> +                                      target_ulong opcode, target_ulong *args)
> +{
> +    int ret;
> +    uint32_t drc_index = args[0];
> +    uint64_t continue_token = args[1];
> +    SpaprDrc *drc = spapr_drc_by_index(drc_index);
> +    PCDIMMDevice *dimm;
> +    HostMemoryBackend *backend = NULL;
> +    SCMAsyncFlushData *req_data = NULL;
> +
> +    if (!drc || !drc->dev ||
> +        spapr_drc_type(drc) != SPAPR_DR_CONNECTOR_TYPE_PMEM) {
> +        return H_PARAMETER;
> +    }
> +
> +    if (continue_token != 0) {
> +        ret = spapr_drc_get_async_hcall_status(drc, continue_token);
> +        if (ret == H_BUSY) {
> +            args[0] = continue_token;
> +            return H_LONG_BUSY_ORDER_1_SEC;
> +        }
> +
> +        return ret;
> +    }
> +
> +    dimm = PC_DIMM(drc->dev);
> +    backend = MEMORY_BACKEND(dimm->hostmem);
> +
> +    req_data = g_malloc0(sizeof(SCMAsyncFlushData));
> +    req_data->fd = memory_region_get_fd(&backend->mr);
> +
> +    continue_token = spapr_drc_get_new_async_hcall_token(drc);
> +    if (!continue_token) {
> +        g_free(req_data);
> +        return H_P2;
> +    }
> +    req_data->token = continue_token;
> +
> +    spapr_drc_run_async_hcall(drc, continue_token, &flush_worker_cb, req_data);
> +
> +    ret = spapr_drc_get_async_hcall_status(drc, continue_token);
> +    if (ret == H_BUSY) {
> +        args[0] = req_data->token;
> +        return ret;
> +    }
> +
> +    return ret;
> +}
> +
>  static target_ulong h_scm_unbind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
>                                       target_ulong opcode, target_ulong *args)
>  {
> @@ -486,6 +564,7 @@ static void spapr_scm_register_types(void)
>      spapr_register_hypercall(H_SCM_BIND_MEM, h_scm_bind_mem);
>      spapr_register_hypercall(H_SCM_UNBIND_MEM, h_scm_unbind_mem);
>      spapr_register_hypercall(H_SCM_UNBIND_ALL, h_scm_unbind_all);
> +    spapr_register_hypercall(H_SCM_ASYNC_FLUSH, h_scm_async_flush);
>  }
>  
>  type_init(spapr_scm_register_types)
> diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
> index c699842dd0..9e8795766e 100644
> --- a/include/hw/mem/nvdimm.h
> +++ b/include/hw/mem/nvdimm.h
> @@ -51,6 +51,7 @@ OBJECT_DECLARE_TYPE(NVDIMMDevice, NVDIMMClass, NVDIMM)
>  #define NVDIMM_LABEL_SIZE_PROP "label-size"
>  #define NVDIMM_UUID_PROP       "uuid"
>  #define NVDIMM_UNARMED_PROP    "unarmed"
> +#define NVDIMM_SYNC_DAX_PROP    "sync-dax"
>  
>  struct NVDIMMDevice {
>      /* private */
> @@ -85,6 +86,15 @@ struct NVDIMMDevice {
>       */
>      bool unarmed;
>  
> +    /*
> +     * On PPC64,
> +     * The 'off' value results in the async-flush-required property set
> +     * in the device tree for pseries machines. When 'off', the guest
> +     * initiates explicity flush requests to the backend device ensuring
> +     * write persistence.
> +     */
> +    bool sync_dax;
> +
>      /*
>       * The PPC64 - spapr requires each nvdimm device have a uuid.
>       */
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 2e89e36cfb..6d7110b7dc 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -535,8 +535,9 @@ struct SpaprMachineState {
>  #define H_SCM_BIND_MEM          0x3EC
>  #define H_SCM_UNBIND_MEM        0x3F0
>  #define H_SCM_UNBIND_ALL        0x3FC
> +#define H_SCM_ASYNC_FLUSH       0x4A0
>  
> -#define MAX_HCALL_OPCODE        H_SCM_UNBIND_ALL
> +#define MAX_HCALL_OPCODE        H_SCM_ASYNC_FLUSH
>  
>  /* The hcalls above are standardized in PAPR and implemented by pHyp
>   * as well.
> 
> 
> 


^ permalink raw reply

* Re: [RFC Qemu PATCH v2 1/2] spapr: drc: Add support for async hcalls at the drc level
From: Greg Kurz @ 2020-12-21 12:08 UTC (permalink / raw)
  To: Shivaprasad G Bhat
  Cc: xiaoguangrong.eric, mst, aneesh.kumar, linux-nvdimm, qemu-devel,
	kvm-ppc, shivaprasadbhat, qemu-ppc, bharata, imammedo,
	linuxppc-dev, david
In-Reply-To: <160674938210.2492771.1728601884822491679.stgit@lep8c.aus.stglabs.ibm.com>

Hi Shiva,

On Mon, 30 Nov 2020 09:16:39 -0600
Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:

> The patch adds support for async hcalls at the DRC level for the
> spapr devices. To be used by spapr-scm devices in the patch/es to follow.
> 
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> ---

The overall idea looks good but I think you should consider using
a thread pool to implement it. See below.

>  hw/ppc/spapr_drc.c         |  149 ++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_drc.h |   25 +++++++
>  2 files changed, 174 insertions(+)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 77718cde1f..4ecd04f686 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -15,6 +15,7 @@
>  #include "qapi/qmp/qnull.h"
>  #include "cpu.h"
>  #include "qemu/cutils.h"
> +#include "qemu/guest-random.h"
>  #include "hw/ppc/spapr_drc.h"
>  #include "qom/object.h"
>  #include "migration/vmstate.h"
> @@ -421,6 +422,148 @@ void spapr_drc_detach(SpaprDrc *drc)
>      spapr_drc_release(drc);
>  }
>  
> +
> +/*
> + * @drc : device DRC targetting which the async hcalls to be made.
> + *
> + * All subsequent requests to run/query the status should use the
> + * unique token returned here.
> + */
> +uint64_t spapr_drc_get_new_async_hcall_token(SpaprDrc *drc)
> +{
> +    Error *err = NULL;
> +    uint64_t token;
> +    SpaprDrcDeviceAsyncHCallState *tmp, *next, *state;
> +
> +    state = g_malloc0(sizeof(*state));
> +    state->pending = true;
> +
> +    qemu_mutex_lock(&drc->async_hcall_states_lock);
> +retry:
> +    if (qemu_guest_getrandom(&token, sizeof(token), &err) < 0) {
> +        error_report_err(err);
> +        g_free(state);
> +        qemu_mutex_unlock(&drc->async_hcall_states_lock);
> +        return 0;
> +    }
> +
> +    if (!token) /* Token should be non-zero */
> +        goto retry;
> +
> +    if (!QLIST_EMPTY(&drc->async_hcall_states)) {
> +        QLIST_FOREACH_SAFE(tmp, &drc->async_hcall_states, node, next) {
> +            if (tmp->continue_token == token) {
> +                /* If the token already in use, get a new one */
> +                goto retry;
> +            }
> +        }
> +    }
> +
> +    state->continue_token = token;
> +    QLIST_INSERT_HEAD(&drc->async_hcall_states, state, node);
> +
> +    qemu_mutex_unlock(&drc->async_hcall_states_lock);
> +
> +    return state->continue_token;
> +}
> +
> +static void *spapr_drc_async_hcall_runner(void *opaque)
> +{
> +    int response = -1;
> +    SpaprDrcDeviceAsyncHCallState *state = opaque;
> +
> +    /*
> +     * state is freed only after this thread finishes(after pthread_join()),
> +     * don't worry about it becoming NULL.
> +     */
> +
> +    response = state->func(state->data);
> +
> +    state->hcall_ret = response;
> +    state->pending = 0;
> +
> +    return NULL;
> +}
> +
> +/*
> + * @drc  : device DRC targetting which the async hcalls to be made.
> + * token : The continue token to be used for tracking as recived from
> + *         spapr_drc_get_new_async_hcall_token
> + * @func() : the worker function which needs to be executed asynchronously
> + * @data : data to be passed to the asynchronous function. Worker is supposed
> + *         to free/cleanup the data that is passed here

It'd be cleaner to pass a completion callback and have free/cleanup handled there.

> + */
> +void spapr_drc_run_async_hcall(SpaprDrc *drc, uint64_t token,
> +                               SpaprDrcAsyncHcallWorkerFunc *func, void *data)
> +{
> +    SpaprDrcDeviceAsyncHCallState *state;
> +
> +    qemu_mutex_lock(&drc->async_hcall_states_lock);
> +    QLIST_FOREACH(state, &drc->async_hcall_states, node) {
> +        if (state->continue_token == token) {
> +            state->func = func;
> +            state->data = data;
> +            qemu_thread_create(&state->thread, "sPAPR Async HCALL",
> +                               spapr_drc_async_hcall_runner, state,
> +                               QEMU_THREAD_JOINABLE);

qemu_thread_create() exits on failure, it shouldn't be called on
a guest triggerable path, eg. a buggy guest could call it up to
the point that pthread_create() returns EAGAIN.

Please use a thread pool (see thread_pool_submit_aio()). This takes care
of all the thread housekeeping for you in a safe way, and it provides a
completion callback API. The implementation could then be just about
having two lists: one for pending requests (fed here) and one for
completed requests (fed by the completion callback).

> +            break;
> +        }
> +    }
> +    qemu_mutex_unlock(&drc->async_hcall_states_lock);
> +}
> +
> +/*
> + * spapr_drc_finish_async_hcalls
> + *      Waits for all pending async requests to complete
> + *      thier execution and free the states
> + */
> +static void spapr_drc_finish_async_hcalls(SpaprDrc *drc)
> +{
> +    SpaprDrcDeviceAsyncHCallState *state, *next;
> +
> +    if (QLIST_EMPTY(&drc->async_hcall_states)) {
> +        return;
> +    }
> +
> +    qemu_mutex_lock(&drc->async_hcall_states_lock);
> +    QLIST_FOREACH_SAFE(state, &drc->async_hcall_states, node, next) {
> +        qemu_thread_join(&state->thread);

With a thread-pool, you'd just need to aio_poll() until the pending list
is empty and then clear the completed list.

> +        QLIST_REMOVE(state, node);
> +        g_free(state);
> +    }
> +    qemu_mutex_unlock(&drc->async_hcall_states_lock);
> +}
> +
> +/*
> + * spapr_drc_get_async_hcall_status
> + *      Fetches the status of the hcall worker and returns H_BUSY
> + *      if the worker is still running.
> + */
> +int spapr_drc_get_async_hcall_status(SpaprDrc *drc, uint64_t token)
> +{
> +    int ret = H_PARAMETER;
> +    SpaprDrcDeviceAsyncHCallState *state, *node;
> +
> +    qemu_mutex_lock(&drc->async_hcall_states_lock);
> +    QLIST_FOREACH_SAFE(state, &drc->async_hcall_states, node, node) {
> +        if (state->continue_token == token) {
> +            if (state->pending) {
> +                ret = H_BUSY;
> +                break;
> +            } else {
> +                ret = state->hcall_ret;
> +                qemu_thread_join(&state->thread);

Like for qemu_thread_create(), the guest shouldn't be responsible for
thread housekeeping. Getting the hcall status should just be about
finding the token in the pending or completed lists.

> +                QLIST_REMOVE(state, node);
> +                g_free(state);
> +                break;
> +            }
> +        }
> +    }
> +    qemu_mutex_unlock(&drc->async_hcall_states_lock);
> +
> +    return ret;
> +}
> +
>  void spapr_drc_reset(SpaprDrc *drc)
>  {
>      SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> @@ -448,6 +591,7 @@ void spapr_drc_reset(SpaprDrc *drc)
>          drc->ccs_offset = -1;
>          drc->ccs_depth = -1;
>      }
> +    spapr_drc_finish_async_hcalls(drc);
>  }
>  
>  static bool spapr_drc_unplug_requested_needed(void *opaque)
> @@ -558,6 +702,7 @@ SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
>      drc->owner = owner;
>      prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
>                                  spapr_drc_index(drc));
> +

Unrelated change.

>      object_property_add_child(owner, prop_name, OBJECT(drc));
>      object_unref(OBJECT(drc));
>      qdev_realize(DEVICE(drc), NULL, NULL);
> @@ -577,6 +722,10 @@ static void spapr_dr_connector_instance_init(Object *obj)
>      object_property_add(obj, "fdt", "struct", prop_get_fdt,
>                          NULL, NULL, NULL);
>      drc->state = drck->empty_state;
> +
> +    qemu_mutex_init(&drc->async_hcall_states_lock);
> +    QLIST_INIT(&drc->async_hcall_states);
> +

Empty line not needed.

>  }
>  
>  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index 165b281496..77f6e4386c 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -18,6 +18,7 @@
>  #include "sysemu/runstate.h"
>  #include "hw/qdev-core.h"
>  #include "qapi/error.h"
> +#include "block/thread-pool.h"
>  
>  #define TYPE_SPAPR_DR_CONNECTOR "spapr-dr-connector"
>  #define SPAPR_DR_CONNECTOR_GET_CLASS(obj) \
> @@ -168,6 +169,21 @@ typedef enum {
>      SPAPR_DRC_STATE_PHYSICAL_CONFIGURED = 8,
>  } SpaprDrcState;
>  
> +typedef struct SpaprDrc SpaprDrc;
> +
> +typedef int SpaprDrcAsyncHcallWorkerFunc(void *opaque);
> +typedef struct SpaprDrcDeviceAsyncHCallState {
> +    uint64_t continue_token;
> +    bool pending;
> +
> +    int hcall_ret;
> +    SpaprDrcAsyncHcallWorkerFunc *func;
> +    void *data;
> +
> +    QemuThread thread;
> +
> +    QLIST_ENTRY(SpaprDrcDeviceAsyncHCallState) node;
> +} SpaprDrcDeviceAsyncHCallState;
>  typedef struct SpaprDrc {
>      /*< private >*/
>      DeviceState parent;
> @@ -182,6 +198,10 @@ typedef struct SpaprDrc {
>      int ccs_offset;
>      int ccs_depth;
>  
> +    /* async hcall states */
> +    QemuMutex async_hcall_states_lock;
> +    QLIST_HEAD(, SpaprDrcDeviceAsyncHCallState) async_hcall_states;
> +
>      /* device pointer, via link property */
>      DeviceState *dev;
>      bool unplug_requested;
> @@ -241,6 +261,11 @@ void spapr_drc_detach(SpaprDrc *drc);
>  /* Returns true if a hot plug/unplug request is pending */
>  bool spapr_drc_transient(SpaprDrc *drc);
>  
> +uint64_t spapr_drc_get_new_async_hcall_token(SpaprDrc *drc);
> +void spapr_drc_run_async_hcall(SpaprDrc *drc, uint64_t token,
> +                               SpaprDrcAsyncHcallWorkerFunc, void *data);
> +int spapr_drc_get_async_hcall_status(SpaprDrc *drc, uint64_t token);
> +
>  static inline bool spapr_drc_unplug_requested(SpaprDrc *drc)
>  {
>      return drc->unplug_requested;
> 
> 
> 


^ permalink raw reply

* Re: [PATCH] KVM: PPC: fix comparison to bool warning
From: Michael Ellerman @ 2020-12-21 11:03 UTC (permalink / raw)
  To: xiakaixu1987@gmail.com, paulus; +Cc: Kaixu Xia, linuxppc-dev, kvm-ppc
In-Reply-To: <1604764178-8087-1-git-send-email-kaixuxia@tencent.com>

On Sat, 7 Nov 2020 23:49:38 +0800, xiakaixu1987@gmail.com wrote:
> Fix the following coccicheck warning:
> 
> ./arch/powerpc/kvm/booke.c:503:6-16: WARNING: Comparison to bool
> ./arch/powerpc/kvm/booke.c:505:6-17: WARNING: Comparison to bool
> ./arch/powerpc/kvm/booke.c:507:6-16: WARNING: Comparison to bool

Applied to powerpc/next.

[1/1] KVM: PPC: fix comparison to bool warning
      https://git.kernel.org/powerpc/c/a300bf8c5f24bdeaa84925d1e0ec6221cbdc7597

cheers

^ permalink raw reply

* Re: [PATCH] KVM: PPC: Book3S: Assign boolean values to a bool variable
From: Michael Ellerman @ 2020-12-21 11:03 UTC (permalink / raw)
  To: xiakaixu1987@gmail.com, paulus
  Cc: Kaixu Xia, linuxppc-dev, linux-kernel, kvm-ppc
In-Reply-To: <1604730382-5810-1-git-send-email-kaixuxia@tencent.com>

On Sat, 7 Nov 2020 14:26:22 +0800, xiakaixu1987@gmail.com wrote:
> Fix the following coccinelle warnings:
> 
> ./arch/powerpc/kvm/book3s_xics.c:476:3-15: WARNING: Assignment of 0/1 to bool variable
> ./arch/powerpc/kvm/book3s_xics.c:504:3-15: WARNING: Assignment of 0/1 to bool variable

Applied to powerpc/next.

[1/1] KVM: PPC: Book3S: Assign boolean values to a bool variable
      https://git.kernel.org/powerpc/c/13751f8747519fe3bdc738fa6d802fbd94a85ac4

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/ps3: use dma_mapping_error()
From: Michael Ellerman @ 2020-12-21 11:03 UTC (permalink / raw)
  To: linuxppc-dev, Vincent Stehlé, linux-kernel
  Cc: Geoff Levand, Geert Uytterhoeven
In-Reply-To: <20201213182622.23047-1-vincent.stehle@laposte.net>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 356 bytes --]

On Sun, 13 Dec 2020 19:26:22 +0100, Vincent Stehlé wrote:
> The DMA address returned by dma_map_single() should be checked with
> dma_mapping_error(). Fix the ps3stor_setup() function accordingly.

Applied to powerpc/next.

[1/1] powerpc/ps3: use dma_mapping_error()
      https://git.kernel.org/powerpc/c/d0edaa28a1f7830997131cbce87b6c52472825d1

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/configs: Add ppc64le_allnoconfig target
From: Michael Ellerman @ 2020-12-21 11:03 UTC (permalink / raw)
  To: linuxppc-dev, Michael Ellerman; +Cc: dja
In-Reply-To: <20201125031551.2112715-1-mpe@ellerman.id.au>

On Wed, 25 Nov 2020 14:15:51 +1100, Michael Ellerman wrote:
> Add a phony target for ppc64le_allnoconfig, which tests some
> combinations of CONFIG symbols that aren't covered by any of our
> defconfigs.

Applied to powerpc/next.

[1/1] powerpc/configs: Add ppc64le_allnoconfig target
      https://git.kernel.org/powerpc/c/5d82344795dbd3fcd74c974ab60b2845970dc5e3

cheers

^ permalink raw reply

* Re: [PATCH] powerpc: Add config fragment for disabling -Werror
From: Michael Ellerman @ 2020-12-21 11:03 UTC (permalink / raw)
  To: linuxppc-dev, Michael Ellerman
In-Reply-To: <20201023040002.3313371-1-mpe@ellerman.id.au>

On Fri, 23 Oct 2020 15:00:02 +1100, Michael Ellerman wrote:
> This makes it easy to disable building with -Werror:
> 
>   $ make defconfig
>   $ grep WERROR .config
>   # CONFIG_PPC_DISABLE_WERROR is not set
>   CONFIG_PPC_WERROR=y
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: Add config fragment for disabling -Werror
      https://git.kernel.org/powerpc/c/c15d1f9d03a0f4f68bf52dffdd541c8054e6de35

cheers

^ permalink raw reply

* Re: [PATCH v2 1/1] powerpc/kvm: Fix mask size for emulated msgsndp
From: Michael Ellerman @ 2020-12-21 11:03 UTC (permalink / raw)
  To: Paul Mackerras, Leonardo Bras, Benjamin Herrenschmidt,
	Michael Ellerman
  Cc: linuxppc-dev, linux-kernel, kvm-ppc
In-Reply-To: <20201208215707.31149-1-leobras.c@gmail.com>

On Tue, 8 Dec 2020 18:57:08 -0300, Leonardo Bras wrote:
> According to ISAv3.1 and ISAv3.0b, the msgsndp is described to split RB in:
> msgtype <- (RB) 32:36
> payload <- (RB) 37:63
> t       <- (RB) 57:63
> 
> The current way of getting 'msgtype', and 't' is missing their MSB:
> msgtype: ((arg >> 27) & 0xf) : Gets (RB) 33:36, missing bit 32
> t:       (arg &= 0x3f)       : Gets (RB) 58:63, missing bit 57
> 
> [...]

Applied to powerpc/next.

[1/1] KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp
      https://git.kernel.org/powerpc/c/87fb4978ef8f7e3d6f51ea8e259638c4e96f2fc0

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/memhotplug: quieting some DLPAR operations
From: Michael Ellerman @ 2020-12-21 11:03 UTC (permalink / raw)
  To: linuxppc-dev, Laurent Dufour; +Cc: nathanl, cheloha, paulus, linux-kernel
In-Reply-To: <20201211145954.90143-1-ldufour@linux.ibm.com>

On Fri, 11 Dec 2020 15:59:54 +0100, Laurent Dufour wrote:
> When attempting to remove by index a set of LMB a lot of messages are
> displayed on the console, even when everything goes fine:
> 
>  pseries-hotplug-mem: Attempting to hot-remove LMB, drc index 8000002d
>  Offlined Pages 4096
>  pseries-hotplug-mem: Memory at 2d0000000 was hot-removed
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/pseries/memhotplug: Quieten some DLPAR operations
      https://git.kernel.org/powerpc/c/20e9de85edae3a5866f29b6cce87c9ec66d62a1b

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range()
From: Michael Ellerman @ 2020-12-21 11:03 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Michael Ellerman, qcai,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <56feccd7b6fcd98e353361a233fa7bb8e67c3164.1607780469.git.christophe.leroy@csgroup.eu>

On Sat, 12 Dec 2020 13:41:25 +0000 (UTC), Christophe Leroy wrote:
> Commit 7bfe54b5f165 ("powerpc/mm: Refactor the floor/ceiling check in
> hugetlb range freeing functions") inadvertely removed the mask
> applied to start parameter in those two functions, leading to the
> following crash on power9.
> 
> [ 7703.114640][T58070] LTP: starting hugemmap05_1 (hugemmap05 -m)
> [ 7703.157792][   C99] ------------[ cut here ]------------
> [ 7703.158279][   C99] kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:387!
> [ 7703.158306][   C99] Oops: Exception in kernel mode, sig: 5 [#1]
> [ 7703.158330][   C99] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 NUMA PowerNV
> [ 7703.158343][   C99] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_spapr_tce vfio vfio_spapr_eeh loop kvm_hv kvm ip_tables x_tables sd_mod ahci libahci tg3 libata firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_del_mod]
> [ 7703.158435][   C99] CPU: 99 PID: 308 Comm: ksoftirqd/99 Tainted: G           O      5.10.0-rc7-next-20201211 #1
> [ 7703.158464][   C99] NIP:  c00000000005dbec LR: c0000000003352f4 CTR: 0000000000000000
> [ 7703.158489][   C99] REGS: c00020000bb6f830 TRAP: 0700   Tainted: G           O       (5.10.0-rc7-next-20201211)
> [ 7703.158528][   C99] MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002284  XER: 20040000
> [ 7703.158570][   C99] GPR00: c0000000003352f4 c00020000bb6fad0 c000000007f70b00 c0002000385b3ff0
> [ 7703.158570][   C99] GPR04: 0000000000000000 0000000000000003 c00020000bb6f8b4 0000000000000001
> [ 7703.158570][   C99] GPR08: 0000000000000001 0000000000000009 0000000000000008 0000000000000002
> [ 7703.158570][   C99] GPR12: 0000000024002488 c000201fff649c00 c000000007f2a20c 0000000000000000
> [ 7703.158570][   C99] GPR16: 0000000000000007 0000000000000000 c000000000194d10 c000000000194d10
> [ 7703.158570][   C99] GPR24: 0000000000000014 0000000000000015 c000201cc6e72398 c000000007fac4b4
> [ 7703.158570][   C99] GPR28: c000000007f2bf80 c000000007fac2f8 0000000000000008 c000200033870000
> [ 7703.158766][   C99] NIP [c00000000005dbec] __tlb_remove_table+0x1dc/0x1e0
> pgtable_free at arch/powerpc/mm/book3s64/pgtable.c:387
> (inlined by) __tlb_remove_table at arch/powerpc/mm/book3s64/pgtable.c:405
> [ 7703.158805][   C99] LR [c0000000003352f4] tlb_remove_table_rcu+0x54/0xa0
> [ 7703.158853][   C99] Call Trace:
> [ 7703.158872][   C99] [c00020000bb6fad0] [c00000000005db4c] __tlb_remove_table+0x13c/0x1e0 (unreliable)
> [ 7703.158890][   C99] [c00020000bb6fb00] [c0000000003352f4] tlb_remove_table_rcu+0x54/0xa0
> __tlb_remove_table_free at mm/mmu_gather.c:101
> (inlined by) tlb_remove_table_rcu at mm/mmu_gather.c:156
> [ 7703.158927][   C99] [c00020000bb6fb30] [c000000000194d7c] rcu_core+0x35c/0xbb0
> rcu_do_batch at kernel/rcu/tree.c:2502
> (inlined by) rcu_core at kernel/rcu/tree.c:2737
> [ 7703.158966][   C99] [c00020000bb6fbf0] [c00000000095a3d0] __do_softirq+0x480/0x704
> [ 7703.159006][   C99] [c00020000bb6fd10] [c0000000000cc1f4] run_ksoftirqd+0x74/0xd0
> run_ksoftirqd at kernel/softirq.c:651
> (inlined by) run_ksoftirqd at kernel/softirq.c:642
> [ 7703.159046][   C99] [c00020000bb6fd30] [c0000000001040c8] smpboot_thread_fn+0x278/0x320
> [ 7703.159096][   C99] [c00020000bb6fda0] [c0000000000fc8a4] kthread+0x1c4/0x1d0
> [ 7703.159145][   C99] [c00020000bb6fe10] [c00000000000d9fc] ret_from_kernel_thread+0x5c/0x80
> [ 7703.159183][   C99] Instruction dump:
> [ 7703.159204][   C99] 60000000 7c0802a6 3c82f8b4 7fe3fb78 38847470 f8010040 482b4fc5 60000000
> [ 7703.159248][   C99] 0fe00000 7c0802a6 fbe10028 f8010040 <0fe00000> 3c4c07f1 38422f10 7c0802a6
> [ 7703.159293][   C99] ---[ end trace 1d92a5231ba6a0d5 ]---
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range()
      https://git.kernel.org/powerpc/c/2198d4934ee8b81341a84c9ec8bb25b4b0d02522

cheers

^ permalink raw reply

* Re: [PATCH] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
From: Michael Ellerman @ 2020-12-21 11:03 UTC (permalink / raw)
  To: Christophe Leroy, Paul Mackerras, Benjamin Herrenschmidt,
	Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <a1d31f84ddb0926813b17fcd5cc7f3fa7b4deac2.1602759123.git.christophe.leroy@csgroup.eu>

On Thu, 15 Oct 2020 10:52:20 +0000 (UTC), Christophe Leroy wrote:
> 	ppc-linux-objdump -d vmlinux | grep -e "<csum_partial>" -e "<__csum_partial>"
> 
> With gcc9 I get:
> 
> 	c0017ef8 <__csum_partial>:
> 	c00182fc:	4b ff fb fd 	bl      c0017ef8 <__csum_partial>
> 	c0018478:	4b ff fa 80 	b       c0017ef8 <__csum_partial>
> 	c03e8458:	4b c2 fa a0 	b       c0017ef8 <__csum_partial>
> 	c03e8518:	4b c2 f9 e1 	bl      c0017ef8 <__csum_partial>
> 	c03ef410:	4b c2 8a e9 	bl      c0017ef8 <__csum_partial>
> 	c03f0b24:	4b c2 73 d5 	bl      c0017ef8 <__csum_partial>
> 	c04279a4:	4b bf 05 55 	bl      c0017ef8 <__csum_partial>
> 	c0429820:	4b be e6 d9 	bl      c0017ef8 <__csum_partial>
> 	c0429944:	4b be e5 b5 	bl      c0017ef8 <__csum_partial>
> 	c042b478:	4b be ca 81 	bl      c0017ef8 <__csum_partial>
> 	c042b554:	4b be c9 a5 	bl      c0017ef8 <__csum_partial>
> 	c045f15c:	4b bb 8d 9d 	bl      c0017ef8 <__csum_partial>
> 	c0492190:	4b b8 5d 69 	bl      c0017ef8 <__csum_partial>
> 	c0492310:	4b b8 5b e9 	bl      c0017ef8 <__csum_partial>
> 	c0495594:	4b b8 29 65 	bl      c0017ef8 <__csum_partial>
> 	c049c420:	4b b7 ba d9 	bl      c0017ef8 <__csum_partial>
> 	c049c870:	4b b7 b6 89 	bl      c0017ef8 <__csum_partial>
> 	c049c930:	4b b7 b5 c9 	bl      c0017ef8 <__csum_partial>
> 	c04a9ca0:	4b b6 e2 59 	bl      c0017ef8 <__csum_partial>
> 	c04bdde4:	4b b5 a1 15 	bl      c0017ef8 <__csum_partial>
> 	c04be480:	4b b5 9a 79 	bl      c0017ef8 <__csum_partial>
> 	c04be710:	4b b5 97 e9 	bl      c0017ef8 <__csum_partial>
> 	c04c969c:	4b b4 e8 5d 	bl      c0017ef8 <__csum_partial>
> 	c04ca2fc:	4b b4 db fd 	bl      c0017ef8 <__csum_partial>
> 	c04cf5bc:	4b b4 89 3d 	bl      c0017ef8 <__csum_partial>
> 	c04d0440:	4b b4 7a b9 	bl      c0017ef8 <__csum_partial>
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
      https://git.kernel.org/powerpc/c/328e7e487a464aad024fbde6663b7859df082b7b

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/powernv: Rate limit opal-elog read failure message
From: Michael Ellerman @ 2020-12-21 11:03 UTC (permalink / raw)
  To: Andrew Donnellan, linuxppc-dev; +Cc: aneesh.kumar, mahesh, hegdevasant
In-Reply-To: <20201211021140.28402-1-ajd@linux.ibm.com>

On Fri, 11 Dec 2020 13:11:41 +1100, Andrew Donnellan wrote:
> Sometimes we can't read an error log from OPAL, and we print an error
> message accordingly. But the OPAL userspace tools seem to like retrying a
> lot, in which case we flood the kernel log with a lot of messages.
> 
> Change pr_err() to pr_err_ratelimited() to help with this.

Applied to powerpc/next.

[1/1] powerpc/powernv: Rate limit opal-elog read failure message
      https://git.kernel.org/powerpc/c/c88017cf2af614409da69934c1738ed5ff2f7022

cheers

^ permalink raw reply

* Re: [PATCH V3] powerpc/perf: Fix Threshold Event Counter Multiplier width for P10
From: Michael Ellerman @ 2020-12-21 11:03 UTC (permalink / raw)
  To: Athira Rajeev, mpe; +Cc: maddy, linuxppc-dev
In-Reply-To: <1608022578-1532-1-git-send-email-atrajeev@linux.vnet.ibm.com>

On Tue, 15 Dec 2020 03:56:18 -0500, Athira Rajeev wrote:
> Threshold Event Counter Multiplier (TECM) is part of Monitor Mode
> Control Register A (MMCRA). This field along with Threshold Event
> Counter Exponent (TECE) is used to get threshould counter value.
> In Power10, this is a 8bit field, so patch fixes the
> current code to modify the MMCRA[TECM] extraction macro to
> handle this change. ISA v3.1 says this is a 7 bit field but
> POWER10 it's actually 8 bits which will hopefully be fixed
> in ISA v3.1 update.

Applied to powerpc/next.

[1/1] powerpc/perf: Fix Threshold Event Counter Multiplier width for P10
      https://git.kernel.org/powerpc/c/ef0e3b650f8ddc54bb70868852f50642ee3ae765

cheers

^ permalink raw reply

* Re: [PATCH 01/23] kernel: irq: irqdescs: warn on spurious IRQ
From: Andy Shevchenko @ 2020-12-21  9:27 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Mark Rutland, Rich Felker, Catalin Marinas, Linux-SH,
	Alexander Shishkin, Linus Walleij, linux-mips, James Bottomley,
	Paul Mackerras, H. Peter Anvin, sparclinux, linux-ia64,
	Will Deacon, gerg, Linux-Arch, linux-s390, linux-c6x-dev,
	Yoshinori Sato, Jiri Olsa, Helge Deller,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Bartosz Golaszewski, Tony Lindgren, Geert Uytterhoeven, msalter,
	Arnd Bergmann, linux-alpha, jacquiot.aurelien,
	open list:GPIO SUBSYSTEM, linux-m68k, Borislav Petkov,
	Namhyung Kim, Thomas Gleixner, Linux OMAP Mailing List,
	Thomas Bogendoerfer, linux-parisc,
	open list:LINUX FOR POWERPC PA SEMI PWRFICIENT,
	Linux Kernel Mailing List, Marc Zyngier,
	Enrico Weigelt, metux IT consult, David S. Miller
In-Reply-To: <87ft3zyaqa.fsf@mpe.ellerman.id.au>

On Mon, Dec 21, 2020 at 7:44 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
> Andy Shevchenko <andy.shevchenko@gmail.com> writes:
> > On Fri, Dec 18, 2020 at 4:37 PM Enrico Weigelt, metux IT consult
> > <info@metux.net> wrote:
> >
> >> +               if (printk_ratelimit())
> >> +                       pr_warn("spurious IRQ: irq=%d hwirq=%d nr_irqs=%d\n",
> >> +                               irq, hwirq, nr_irqs);
> >
> > Perhaps you missed pr_warn_ratelimit() macro which is already in the
> > kernel for a long time.
>
> pr_warn_ratelimited() which calls printk_ratelimited().

I stand corrected.
Right, that's what I had in mind (actually didn't know that there are variants).

Thanks!

> And see the comment above printk_ratelimit():
>
> /*
>  * Please don't use printk_ratelimit(), because it shares ratelimiting state
>  * with all other unrelated printk_ratelimit() callsites.  Instead use
>  * printk_ratelimited() or plain old __ratelimit().
>  */


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* Re: [PATCH v2] powerpc/perf/hv-24x7: Dont create sysfs event files for dummy events
From: kajoljain @ 2020-12-21  9:02 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: suka, maddy, atrajeev
In-Reply-To: <87im8vyawb.fsf@mpe.ellerman.id.au>



On 12/21/20 11:10 AM, Michael Ellerman wrote:
> Kajol Jain <kjain@linux.ibm.com> writes:
>> hv_24x7 performance monitoring unit creates list of supported events
>> from the event catalog obtained via HCALL. hv_24x7 catalog could also
>> contain invalid or dummy events (with names like FREE_* or CPM_FREE_*
>> and RESERVED*). These events do not have any hardware counters
>> backing them. So patch adds a check to string compare the event names
>> to filter out them.
>>
>> Result in power9 machine:
>>
>> Before this patch:
>> .....
>>   hv_24x7/PM_XLINK2_OUT_ODD_CYC,chip=?/              [Kernel PMU event]
>>   hv_24x7/PM_XLINK2_OUT_ODD_DATA_COUNT,chip=?/       [Kernel PMU event]
>>   hv_24x7/PM_XLINK2_OUT_ODD_TOTAL_UTIL,chip=?/       [Kernel PMU event]
>>   hv_24x7/PM_XTS_ATR_DEMAND_CHECKOUT,chip=?/         [Kernel PMU event]
>>   hv_24x7/PM_XTS_ATR_DEMAND_CHECKOUT_MISS,chip=?/    [Kernel PMU event]
>>   hv_24x7/PM_XTS_ATSD_SENT,chip=?/                   [Kernel PMU event]
>>   hv_24x7/PM_XTS_ATSD_TLBI_RCV,chip=?/               [Kernel PMU event]
>>   hv_24x7/RESERVED_NEST1,chip=?/                     [Kernel PMU event]
>>   hv_24x7/RESERVED_NEST10,chip=?/                    [Kernel PMU event]
>>   hv_24x7/RESERVED_NEST11,chip=?/                    [Kernel PMU event]
>>   hv_24x7/RESERVED_NEST12,chip=?/                    [Kernel PMU event]
>>   hv_24x7/RESERVED_NEST13,chip=?/                    [Kernel PMU event]
>> ......
>>
>> Dmesg:
>> [    0.000362] printk: console [hvc0] enabled
>> [    0.815452] hv-24x7: read 1530 catalog entries, created 537 event attrs
>> (0 failures), 275 descs
>>
>> After this patch:
>> ......
>>   hv_24x7/PM_XLINK2_OUT_ODD_AVLBL_CYC,chip=?/        [Kernel PMU event]
>>   hv_24x7/PM_XLINK2_OUT_ODD_CYC,chip=?/              [Kernel PMU event]
>>   hv_24x7/PM_XLINK2_OUT_ODD_DATA_COUNT,chip=?/       [Kernel PMU event]
>>   hv_24x7/PM_XLINK2_OUT_ODD_TOTAL_UTIL,chip=?/       [Kernel PMU event]
>>   hv_24x7/PM_XTS_ATR_DEMAND_CHECKOUT,chip=?/         [Kernel PMU event]
>>   hv_24x7/PM_XTS_ATR_DEMAND_CHECKOUT_MISS,chip=?/    [Kernel PMU event]
>>   hv_24x7/PM_XTS_ATSD_SENT,chip=?/                   [Kernel PMU event]
>>   hv_24x7/PM_XTS_ATSD_TLBI_RCV,chip=?/               [Kernel PMU event]
>>   hv_24x7/TOD,chip=?/                                [Kernel PMU event]
>> ......
>>
>> Demsg:
>> [    0.000357] printk: console [hvc0] enabled
>> [    0.808592] hv-24x7: read 1530 catalog entries, created 509 event attrs
>> (0 failures), 275 descs
>>
>> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
>> ---
>>  arch/powerpc/perf/hv-24x7.c | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>>
>> ---
>> Changelog
>> v1 -> v2
>> - Include "RESERVED*" as part of the invalid event check as
>>   suggested by Madhavan Srinivasan
>> - Add new helper function "ignore_event" to check invalid/dummy
>>   events as suggested by Michael Ellerman
>> - Remove pr_info to print each invalid event as suggested by
>>   Michael Ellerman
>> ---
>> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
>> index 6e7e820508df..1a6004d88f98 100644
>> --- a/arch/powerpc/perf/hv-24x7.c
>> +++ b/arch/powerpc/perf/hv-24x7.c
>> @@ -764,6 +764,16 @@ static ssize_t catalog_event_len_validate(struct hv_24x7_event_data *event,
>>  	return ev_len;
>>  }
>>  
>> +/*
>> + * Return true incase of invalid or dummy events with names like FREE_* or CPM_FREE_*
>> + * and RESERVED*
>> + */
>> +static bool ignore_event(const char *name)
>> +{
>> +	return (strstr(name, "FREE_") || !strncmp(name, "RESERVED", 8)) ?
>> +			true : false;
> 
> That's FREE_ anywhere in the string, which seems a bit loose.
> 
> Do we have any documentation or anything that tells us that any event
> with "FREE_" in the name will always be invalid?

Hi Michael,
      We don't have any such document which says any event with "FREE_"
in the name will be invalid. So I will replace strstr check with strcmp
to look for events with names have "FREE" or "CPM_FREE" at start.

Thanks,
Kajol Jain

> 
> cheers
> 

^ permalink raw reply

* Re: [PATCH 23/23] powerpc/pseries/eeh: Make pseries_send_allow_unfreeze() static
From: Christophe Leroy @ 2020-12-21  8:49 UTC (permalink / raw)
  To: Cédric Le Goater, linuxppc-dev; +Cc: Frederic Barrat
In-Reply-To: <20201221074222.403894-24-clg@kaod.org>



Le 21/12/2020 à 08:42, Cédric Le Goater a écrit :
> It fixes these W=1 compile error :
> 
> ../arch/powerpc/platforms/pseries/eeh_pseries.c:697:5: error: no previous prototype for ‘pseries_send_allow_unfreeze’ [-Werror=missing-prototypes]
>    697 | int pseries_send_allow_unfreeze(struct pci_dn *pdn,
>        |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Cc: Frederic Barrat <fbarrat@linux.ibm.com>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>   arch/powerpc/platforms/pseries/eeh_pseries.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
> index de45ceb634f9..cb615dbd35e7 100644
> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> @@ -694,8 +694,8 @@ static int pseries_eeh_write_config(struct eeh_dev *edev, int where, int size, u
>   }
>   
>   #ifdef CONFIG_PCI_IOV
> -int pseries_send_allow_unfreeze(struct pci_dn *pdn,
> -				u16 *vf_pe_array, int cur_vfs)
> +static int pseries_send_allow_unfreeze(struct pci_dn *pdn,
> +				       u16 *vf_pe_array, int cur_vfs)

Doesn't this fit on one line ?

>   {
>   	int rc;
>   	int ibm_allow_unfreeze = rtas_token("ibm,open-sriov-allow-unfreeze");
> 

^ permalink raw reply

* Re: [PATCH 17/23] powerpc/watchdog: Declare soft_nmi_interrupt() prototype
From: Christophe Leroy @ 2020-12-21  8:48 UTC (permalink / raw)
  To: Cédric Le Goater, linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20201221074222.403894-18-clg@kaod.org>



Le 21/12/2020 à 08:42, Cédric Le Goater a écrit :
> It fixes this W=1 compile error :
> 
> ../arch/powerpc/kernel/watchdog.c:250:6: error: no previous prototype for ‘soft_nmi_interrupt’ [-Werror=missing-prototypes]
>    250 | void soft_nmi_interrupt(struct pt_regs *regs)
>        |      ^~~~~~~~~~~~~~~~~~
> 
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>   arch/powerpc/include/asm/asm-prototypes.h | 1 +

This is a misuse of asm/asm-prototypes.h

This file is for prototypes of ASM functions.

See discussion at 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/1463534212-4879-2-git-send-email-dja@axtens.net/


>   arch/powerpc/kernel/watchdog.c            | 1 +
>   2 files changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
> index d0b832cbbec8..0f39eefbd5a5 100644
> --- a/arch/powerpc/include/asm/asm-prototypes.h
> +++ b/arch/powerpc/include/asm/asm-prototypes.h
> @@ -84,6 +84,7 @@ void machine_check_exception(struct pt_regs *regs);
>   void emulation_assist_interrupt(struct pt_regs *regs);
>   long do_slb_fault(struct pt_regs *regs, unsigned long ea);
>   void do_bad_slb_fault(struct pt_regs *regs, unsigned long ea, long err);
> +void soft_nmi_interrupt(struct pt_regs *regs);
>   
>   /* signals, syscalls and interrupts */
>   long sys_swapcontext(struct ucontext __user *old_ctx,
> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
> index af3c15a1d41e..855716f563ac 100644
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -27,6 +27,7 @@
>   #include <linux/smp.h>
>   
>   #include <asm/paca.h>
> +#include <asm/asm-prototypes.h>
>   
>   /*
>    * The powerpc watchdog ensures that each CPU is able to service timers.
> 

^ permalink raw reply

* Re: [PATCH 13/23] powerpc/mm: Move hpte_insert_repeating() prototype
From: Christophe Leroy @ 2020-12-21  8:16 UTC (permalink / raw)
  To: Cédric Le Goater, linuxppc-dev; +Cc: Aneesh Kumar K.V
In-Reply-To: <20201221074222.403894-14-clg@kaod.org>



Le 21/12/2020 à 08:42, Cédric Le Goater a écrit :
> It fixes this W=1 compile error :
> 
> ../arch/powerpc/mm/book3s64/hash_utils.c:1867:6: error: no previous prototype for ‘hpte_insert_repeating’ [-Werror=missing-prototypes]
>   1867 | long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
>        |      ^~~~~~~~~~~~~~~~~~~~~
> 
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>   arch/powerpc/include/asm/book3s/64/mmu-hash.h | 5 +++++
>   arch/powerpc/mm/book3s64/hash_hugetlbpage.c   | 4 ----
>   2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> index a94fd4e0c182..76ff95950309 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> @@ -454,6 +454,11 @@ static inline unsigned long hpt_hash(unsigned long vpn,
>   #define HPTE_NOHPTE_UPDATE	0x2
>   #define HPTE_USE_KERNEL_KEY	0x4
>   
> +extern long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
> +				  unsigned long pa, unsigned long rlags,
> +				  unsigned long vflags, int psize, int ssize);
> +
> +

Don't copy the 'extern' keyword. It is useless for function prototypes.

Then you could probably fit on only two lines (nowadays 100 chars are allowed per line)


>   extern int __hash_page_4K(unsigned long ea, unsigned long access,
>   			  unsigned long vsid, pte_t *ptep, unsigned long trap,
>   			  unsigned long flags, int ssize, int subpage_prot);
> diff --git a/arch/powerpc/mm/book3s64/hash_hugetlbpage.c b/arch/powerpc/mm/book3s64/hash_hugetlbpage.c
> index b5e9fff8c217..a688e1324ae5 100644
> --- a/arch/powerpc/mm/book3s64/hash_hugetlbpage.c
> +++ b/arch/powerpc/mm/book3s64/hash_hugetlbpage.c
> @@ -16,10 +16,6 @@
>   unsigned int hpage_shift;
>   EXPORT_SYMBOL(hpage_shift);
>   
> -extern long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
> -				  unsigned long pa, unsigned long rlags,
> -				  unsigned long vflags, int psize, int ssize);
> -
>   int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
>   		     pte_t *ptep, unsigned long trap, unsigned long flags,
>   		     int ssize, unsigned int shift, unsigned int mmu_psize)
> 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox