LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH v2 3/6] powerpc/eeh: Improve debug messages around device addition
From: Oliver O'Halloran @ 2019-06-20  3:45 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Sam Bobroff, linuxppc-dev, Tyrel Datwyler
In-Reply-To: <e8f68068-bb62-6d2e-f484-d6a111811fbc@ozlabs.ru>

On Thu, Jun 20, 2019 at 12:40 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>
> On 19/06/2019 14:27, Sam Bobroff wrote:
> > On Tue, Jun 11, 2019 at 03:47:58PM +1000, Alexey Kardashevskiy wrote:
> >>
> >> On 07/05/2019 14:30, Sam Bobroff wrote:
> >>> Also remove useless comment.
> >>>
> >>> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
> >>> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>> ---
> *snip*
> >
> > I can see that edev will be non-NULL here, but that pr_debug() pattern
> > (using the PDN information to form the PCI address) is quite common
> > across the EEH code, so I think rather than changing a couple of
> > specific cases, I should do a separate cleanup patch and introduce
> > something like pdn_debug(pdn, "...."). What do you think?
>
> I'd switch them all to already existing dev_dbg/pci_debug rather than
> adding pdn_debug as imho it should not have been used in the first place
> really...
>
> > (I don't know exactly when edev->pdev can be NULL.)
>
> ... and if you switch to dev_dbg/pci_debug, I think quite soon you'll
> know if it can or cannot be NULL :)

As far as I can tell edev->pdev is NULL in two cases:

1. Before eeh_device_add_late() has been called on the pdev. The late
part of the add maps the pdev to an edev and sets the pdev's edev
pointer and vis a vis.
2. While recoverying EEH unaware devices. Unaware devices are
destroyed and rescanned and the edev->pdev pointer is cleared by
pcibios_device_release()

In most of these cases it should be safe to use the pci_*() functions
rather than making a new one up for printing pdns. In the cases where
we might not have a PCI dev i'd make a new set of prints that take an
EEH dev rather than a pci_dn since i'd like pci_dn to die sooner
rather than later.

Oliver

^ permalink raw reply

* Re: [PATCH v2 3/6] powerpc/eeh: Improve debug messages around device addition
From: Alexey Kardashevskiy @ 2019-06-20  2:40 UTC (permalink / raw)
  To: Sam Bobroff; +Cc: oohall, linuxppc-dev, tyreld
In-Reply-To: <20190619042706.GA24143@tungsten.ozlabs.ibm.com>



On 19/06/2019 14:27, Sam Bobroff wrote:
> On Tue, Jun 11, 2019 at 03:47:58PM +1000, Alexey Kardashevskiy wrote:
>>
>>
>> On 07/05/2019 14:30, Sam Bobroff wrote:
>>> Also remove useless comment.
>>>
>>> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
>>> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>  arch/powerpc/kernel/eeh.c                    |  2 +-
>>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 14 ++++++++----
>>>  arch/powerpc/platforms/pseries/eeh_pseries.c | 23 +++++++++++++++-----
>>>  3 files changed, 28 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>>> index 8d3c36a1f194..b14d89547895 100644
>>> --- a/arch/powerpc/kernel/eeh.c
>>> +++ b/arch/powerpc/kernel/eeh.c
>>> @@ -1291,7 +1291,7 @@ void eeh_add_device_late(struct pci_dev *dev)
>>>  	pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
>>>  	edev = pdn_to_eeh_dev(pdn);
>>>  	if (edev->pdev == dev) {
>>> -		pr_debug("EEH: Already referenced !\n");
>>> +		pr_debug("EEH: Device %s already referenced!\n", pci_name(dev));
>>>  		return;
>>>  	}
>>>  
>>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> index 6fc1a463b796..0e374cdba961 100644
>>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> @@ -50,10 +50,7 @@ void pnv_pcibios_bus_add_device(struct pci_dev *pdev)
>>>  	if (!pdev->is_virtfn)
>>>  		return;
>>>  
>>> -	/*
>>> -	 * The following operations will fail if VF's sysfs files
>>> -	 * aren't created or its resources aren't finalized.
>>> -	 */
>>> +	pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
>>
>>
>> dev_dbg() seems more appropriate.
> 
> Oh! It does, or even pci_debug() :-)
> 
> I'll change it if I need to do another version, otherwise I'll clean it
> up later.
> 
>>>  	eeh_add_device_early(pdn);
>>>  	eeh_add_device_late(pdev);
>>>  	eeh_sysfs_add_device(pdev);
>>> @@ -397,6 +394,10 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
>>>  	int ret;
>>>  	int config_addr = (pdn->busno << 8) | (pdn->devfn);
>>>  
>>> +	pr_debug("%s: probing %04x:%02x:%02x.%01x\n",
>>> +		__func__, hose->global_number, pdn->busno,
>>> +		PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
>>> +
>>>  	/*
>>>  	 * When probing the root bridge, which doesn't have any
>>>  	 * subordinate PCI devices. We don't have OF node for
>>> @@ -491,6 +492,11 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
>>>  	/* Save memory bars */
>>>  	eeh_save_bars(edev);
>>>  
>>> +	pr_debug("%s: EEH enabled on %02x:%02x.%01x PHB#%x-PE#%x\n",
>>> +		__func__, pdn->busno, PCI_SLOT(pdn->devfn),
>>> +		PCI_FUNC(pdn->devfn), edev->pe->phb->global_number,
>>> +		edev->pe->addr);
>>> +
>>>  	return NULL;
>>>  }
>>>  
>>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>> index 7aa50258dd42..ae06878fbdea 100644
>>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>> @@ -65,6 +65,8 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
>>>  	if (!pdev->is_virtfn)
>>>  		return;
>>>  
>>> +	pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
>>> +
>>>  	pdn->device_id  =  pdev->device;
>>>  	pdn->vendor_id  =  pdev->vendor;
>>>  	pdn->class_code =  pdev->class;
>>> @@ -251,6 +253,10 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
>>>  	int enable = 0;
>>>  	int ret;
>>>  
>>> +	pr_debug("%s: probing %04x:%02x:%02x.%01x\n",
>>> +		__func__, pdn->phb->global_number, pdn->busno,
>>> +		PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
>>> +
>>>  	/* Retrieve OF node and eeh device */
>>>  	edev = pdn_to_eeh_dev(pdn);
>>>  	if (!edev || edev->pe)
>>> @@ -294,7 +300,12 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
>>>  
>>>  	/* Enable EEH on the device */
>>>  	ret = eeh_ops->set_option(&pe, EEH_OPT_ENABLE);
>>> -	if (!ret) {
>>> +	if (ret) {
>>> +		pr_debug("%s: EEH failed to enable on %02x:%02x.%01x PHB#%x-PE#%x (code %d)\n",
>>> +			__func__, pdn->busno, PCI_SLOT(pdn->devfn),
>>> +			PCI_FUNC(pdn->devfn), pe.phb->global_number,
>>> +			pe.addr, ret);
>>> +	} else {
>>
>>
>> edev!=NULL here so you could do dev_dbg(&edev->pdev->dev,...) and skip
>> PCI_SLOT/PCI_FUNC. Or is (edev!=NULL && edev->pdev==NULL) possible (it
>> could be, just asking)?
> 
> I can see that edev will be non-NULL here, but that pr_debug() pattern
> (using the PDN information to form the PCI address) is quite common
> across the EEH code, so I think rather than changing a couple of
> specific cases, I should do a separate cleanup patch and introduce
> something like pdn_debug(pdn, "...."). What do you think?


I'd switch them all to already existing dev_dbg/pci_debug rather than
adding pdn_debug as imho it should not have been used in the first place
really...

> (I don't know exactly when edev->pdev can be NULL.)

... and if you switch to dev_dbg/pci_debug, I think quite soon you'll
know if it can or cannot be NULL :)



> 
>>
>>>  		/* Retrieve PE address */
>>>  		edev->pe_config_addr = eeh_ops->get_pe_addr(&pe);
>>>  		pe.addr = edev->pe_config_addr;
>>> @@ -310,11 +321,6 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
>>>  		if (enable) {
>>>  			eeh_add_flag(EEH_ENABLED);
>>>  			eeh_add_to_parent_pe(edev);
>>> -
>>> -			pr_debug("%s: EEH enabled on %02x:%02x.%01x PHB#%x-PE#%x\n",
>>> -				__func__, pdn->busno, PCI_SLOT(pdn->devfn),
>>> -				PCI_FUNC(pdn->devfn), pe.phb->global_number,
>>> -				pe.addr);
>>>  		} else if (pdn->parent && pdn_to_eeh_dev(pdn->parent) &&
>>>  			   (pdn_to_eeh_dev(pdn->parent))->pe) {
>>>  			/* This device doesn't support EEH, but it may have an
>>> @@ -323,6 +329,11 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
>>>  			edev->pe_config_addr = pdn_to_eeh_dev(pdn->parent)->pe_config_addr;
>>>  			eeh_add_to_parent_pe(edev);
>>>  		}
>>> +		pr_debug("%s: EEH %s on %02x:%02x.%01x PHB#%x-PE#%x (code %d)\n",
>>> +			__func__, (enable ? "enabled" : "unsupported"),
>>> +			pdn->busno, PCI_SLOT(pdn->devfn),
>>> +			PCI_FUNC(pdn->devfn), pe.phb->global_number,
>>> +			pe.addr, ret);
>>
>> Same here. I understand though this one is a cut-n-paste :)
>>
>>
>>>  	}
>>>  
>>>  	/* Save memory bars */
>>>
>>
>> -- 
>> Alexey

-- 
Alexey

^ permalink raw reply

* Re: [PATCH v2 01/10] powerpc/8xx: move CPM1 related files from sysdev/ to platforms/8xx
From: kbuild test robot @ 2019-06-20  2:17 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linux-kernel, oss, Paul Mackerras, kbuild-all, linuxppc-dev
In-Reply-To: <1930b7e67ab361e68f356e7fdeba403e32ce3ad0.1560416986.git.christophe.leroy@c-s.fr>

[-- Attachment #1: Type: text/plain, Size: 1310 bytes --]

Hi Christophe,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.2-rc5 next-20190619]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Christophe-Leroy/powerpc-8xx-move-CPM1-related-files-from-sysdev-to-platforms-8xx/20190613-184514
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-tqm8555_defconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> make[3]: *** No rule to make target 'arch/powerpc/sysdev/cpm_gpio.o', needed by 'arch/powerpc/sysdev/built-in.a'.
   make[3]: Target '__build' not remade because of errors.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 16157 bytes --]

^ permalink raw reply

* [PATCH 2/3] KVM: PPC: Book3S HV: Signed extend decrementer value if not using large decr
From: Suraj Jitindar Singh @ 2019-06-20  1:46 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: clg, kvm-ppc, sjitindarsingh
In-Reply-To: <20190620014651.7645-1-sjitindarsingh@gmail.com>

On POWER9 the decrementer can operate in large decrementer mode where
the decrementer is 56 bits and signed extended to 64 bits. When not
operating in this mode the decrementer behaves as a 32 bit decrementer
which is NOT signed extended (as on POWER8).

Currently when reading a guest decrementer value we don't take into
account whether the large decrementer is enabled or not, and this means
the value will be incorrect when the guest is not using the large
decrementer. Fix this by sign extending the value read when the guest
isn't using the large decrementer.

Fixes: 95a6432ce903 "KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests"

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/kvm/book3s_hv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d3684509da35..719fd2529eec 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3607,6 +3607,8 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,

 	vcpu->arch.slb_max = 0;
 	dec = mfspr(SPRN_DEC);
+	if (!(lpcr & LPCR_LD)) /* Sign extend if not using large decrementer */
+		dec = (s32) dec;
 	tb = mftb();
 	vcpu->arch.dec_expires = dec + tb;
 	vcpu->cpu = -1;
-- 
2.13.6

^ permalink raw reply related

* [PATCH 3/3] KVM: PPC: Book3S HV: Clear pending decr exceptions on nested guest entry
From: Suraj Jitindar Singh @ 2019-06-20  1:46 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: clg, kvm-ppc, sjitindarsingh
In-Reply-To: <20190620014651.7645-1-sjitindarsingh@gmail.com>

If we enter an L1 guest with a pending decrementer exception then this
is cleared on guest exit if the guest has writtien a positive value into
the decrementer (indicating that it handled the decrementer exception)
since there is no other way to detect that the guest has handled the
pending exception and that it should be dequeued. In the event that the
L1 guest tries to run a nested (L2) guest immediately after this and the
L2 guest decrementer is negative (which is loaded by L1 before making
the H_ENTER_NESTED hcall), then the pending decrementer exception
isn't cleared and the L2 entry is blocked since L1 has a pending
exception, even though L1 may have already handled the exception and
written a positive value for it's decrementer. This results in a loop of
L1 trying to enter the L2 guest and L0 blocking the entry since L1 has
an interrupt pending with the outcome being that L2 never gets to run
and hangs.

Fix this by clearing any pending decrementer exceptions when L1 makes
the H_ENTER_NESTED hcall since it won't do this if it's decrementer has
gone negative, and anyway it's decrementer has been communicated to L0
in the hdec_expires field and L0 will return control to L1 when this
goes negative by delivering an H_DECREMENTER exception.

Fixes: 95a6432ce903 "KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests"

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/kvm/book3s_hv.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 719fd2529eec..4a5eb29b952f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4128,8 +4128,15 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run,

 	preempt_enable();

-	/* cancel pending decrementer exception if DEC is now positive */
-	if (get_tb() < vcpu->arch.dec_expires && kvmppc_core_pending_dec(vcpu))
+	/*
+	 * cancel pending decrementer exception if DEC is now positive, or if
+	 * entering a nested guest in which case the decrementer is now owned
+	 * by L2 and the L1 decrementer is provided in hdec_expires
+	 */
+	if (kvmppc_core_pending_dec(vcpu) &&
+			((get_tb() < vcpu->arch.dec_expires) ||
+			 (trap == BOOK3S_INTERRUPT_SYSCALL &&
+			  kvmppc_get_gpr(vcpu, 3) == H_ENTER_NESTED)))
 		kvmppc_core_dequeue_dec(vcpu);

 	trace_kvm_guest_exit(vcpu);
-- 
2.13.6

^ permalink raw reply related

* [PATCH 1/3] KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
From: Suraj Jitindar Singh @ 2019-06-20  1:46 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: clg, kvm-ppc, sjitindarsingh

When a guest vcpu moves from one physical thread to another it is
necessary for the host to perform a tlb flush on the previous core if
another vcpu from the same guest is going to run there. This is because the
guest may use the local form of the tlb invalidation instruction meaning
stale tlb entries would persist where it previously ran. This is handled
on guest entry in kvmppc_check_need_tlb_flush() which calls
flush_guest_tlb() to perform the tlb flush.

Previously the generic radix__local_flush_tlb_lpid_guest() function was
used, however the functionality was reimplemented in flush_guest_tlb()
to avoid the trace_tlbie() call as the flushing may be done in real
mode. The reimplementation in flush_guest_tlb() was missing an erat
invalidation after flushing the tlb.

This lead to observable memory corruption in the guest due to the
caching of stale translations. Fix this by adding the erat invalidation.

Fixes: 70ea13f6e609 "KVM: PPC: Book3S HV: Flush TLB on secondary radix threads"

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
---
 arch/powerpc/kvm/book3s_hv_builtin.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 6035d24f1d1d..a46286f73eec 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -833,6 +833,7 @@ static void flush_guest_tlb(struct kvm *kvm)
 		}
 	}
 	asm volatile("ptesync": : :"memory");
+	asm volatile(PPC_INVALIDATE_ERAT : : :"memory");
 }

 void kvmppc_check_need_tlb_flush(struct kvm *kvm, int pcpu,
-- 
2.13.6

^ permalink raw reply related

* Re: [PATCH 3/4] powerpc/powernv: remove dead NPU DMA code
From: Alexey Kardashevskiy @ 2019-06-20  1:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Paul Mackerras, linuxppc-dev, linux-kernel
In-Reply-To: <20190619072837.GA6858@lst.de>



On 19/06/2019 17:28, Christoph Hellwig wrote:
> On Wed, Jun 19, 2019 at 10:34:54AM +1000, Alexey Kardashevskiy wrote:
>>
>>
>> On 23/05/2019 17:49, Christoph Hellwig wrote:
>>> None of these routines were ever used since they were added to the
>>> kernel.
>>
>>
>> It is still being used exactly in the way as it was explained before in
>> previous respins. Thanks.
> 
> Please point to the in-kernel user, because that is the only relevant
> one.  This is not just my opinion but we had a clear discussion on that
> at least years kernel summit.


There is no in-kernel user which still does not mean that the code is
dead. If it is irrelevant - put this to the commit log instead of saying
it is dead; also if there was a clear outcome from that discussion, then
please point me to that, I do not get to attend these discussions. Thanks,


-- 
Alexey

^ permalink raw reply

* Re: [PATCH 0/2] Fix handling of h_set_dawr
From: Suraj Jitindar Singh @ 2019-06-20  1:45 UTC (permalink / raw)
  To: Cédric Le Goater, linuxppc-dev; +Cc: mikey, kvm-ppc
In-Reply-To: <87e219c8-1db7-9976-03ce-5a566a8df7ab@kaod.org>

On Mon, 2019-06-17 at 11:06 +0200, Cédric Le Goater wrote:
> On 17/06/2019 09:16, Suraj Jitindar Singh wrote:
> > Series contains 2 patches to fix the host in kernel handling of the
> > hcall
> > h_set_dawr.
> > 
> > First patch from Michael Neuling is just a resend added here for
> > clarity.
> > 
> > Michael Neuling (1):
> >   KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()
> > 
> > Suraj Jitindar Singh (1):
> >   KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr
> > in
> >     real mode
> 
> 
> 
> Reviewed-by: Cédric Le Goater <clg@kaod.org>
> 
> and 
> 
> Tested-by: Cédric Le Goater <clg@kaod.org>
> 
> 
> but I see slowdowns in nested as if the IPIs were not delivered. Have
> we
> touch this part in 5.2 ? 

Hi,

I've seen the same and tracked it down to decrementer exceptions not
being delivered when the guest is using large decrementer. I've got a
patch I'm about to send so I'll CC you.

Another option is to disable the large decrementer with:
-machine pseries,cap-large-decr=false

Thanks,
Suraj

> 
> Thanks,
> 
> C.
> 

^ permalink raw reply

* Re: [PATCH v5 2/2] powerpc: Fix compile issue with force DAWR
From: Christophe Leroy @ 2019-06-19 18:03 UTC (permalink / raw)
  To: Michael Neuling, mpe; +Cc: Mathieu Malaterre, hch, linuxppc-dev
In-Reply-To: <3426e38c9028694f2ea55f6adaf3b679a1bce19f.camel@neuling.org>



Le 19/06/2019 à 03:11, Michael Neuling a écrit :
> On Tue, 2019-06-18 at 18:28 +0200, Christophe Leroy wrote:
>>
>> Le 04/06/2019 à 05:00, Michael Neuling a écrit :
>>> If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail
>>> at linking with:
>>>     arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined
>>> reference to `dawr_force_enable'
>>>
>>> This was caused by commit c1fe190c0672 ("powerpc: Add force enable of
>>> DAWR on P9 option").
>>>
>>> This moves a bunch of code around to fix this. It moves a lot of the
>>> DAWR code in a new file and creates a new CONFIG_PPC_DAWR to enable
>>> compiling it.
>>
>> After looking at all this once more, I'm just wondering: why are we
>> creating stuff specific to DAWR ?
>>
>> In the old days, we only add DABR, and everything was named on DABR.
>> When DAWR was introduced some years ago we renamed stuff like do_dabr()
>> to do_break() so that we could regroup things together. And now we are
>> taking dawr() out of the rest. Why not keep dabr() stuff and dawr()
>> stuff all together in something dedicated to breakpoints, and try to
>> regroup all breakpoint stuff in a single place ? I see some
>> breakpointing stuff done in kernel/process.c and other things done in
>> hw_breakpoint.c, to common functions call from one file to the other,
>> preventing GCC to fully optimise, etc ...
>>
>> Also, behing this thinking, I have the idea that we could easily
>> implement 512 bytes breakpoints on the 8xx too. The 8xx have neither
>> DABR nor DAWR, but is using a set of comparators. And as you can see in
>> the 8xx version of __set_dabr() in kernel/process.c, we emulate the DABR
>> behaviour by setting two comparators. By using the same comparators with
>> a different setup, we should be able to implement breakpoints on larger
>> ranges of address.
> 
> Christophe
> 
> I agree that their are opportunities to refactor this code and I appreciate your
> efforts in making this code better but...
> 
> We have a problem here of not being able to compile an odd ball case that almost
> no one ever hits (it was just an odd mpe CI case). We're up to v5 of a simple
> fix which is just silly.
> 
> So let's get this fix in and move on to the whole bunch of refactoring we can do
> in this code which is already documented in the github issue tracking.
> 

Agreed.

I've filed the following issue to keep that in mind: 
https://github.com/linuxppc/issues/issues/251

Thanks
Christophe

^ permalink raw reply

* Re: [PATCH 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel
From: Naveen N. Rao @ 2019-06-19 17:14 UTC (permalink / raw)
  To: Masami Hiramatsu, Ingo Molnar, Michael Ellerman, Nicholas Piggin,
	Steven Rostedt
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1560939496.ovo51ph4i4.astroid@bobo.none>

Nicholas Piggin wrote:
> Naveen N. Rao's on June 19, 2019 7:53 pm:
>> Nicholas Piggin wrote:
>>> Michael Ellerman's on June 19, 2019 3:14 pm:
>>>> 
>>>> I'm also not convinced the ordering between the two patches is
>>>> guaranteed by the ISA, given that there's possibly no isync on the other
>>>> CPU.
>>> 
>>> Will they go through a context synchronizing event?
>>> 
>>> synchronize_rcu_tasks() should ensure a thread is scheduled away, but
>>> I'm not actually sure it guarantees CSI if it's kernel->kernel. Could
>>> do a smp_call_function to do the isync on each CPU to be sure.
>> 
>> Good point. Per 
>> Documentation/RCU/Design/Requirements/Requirements.html#Tasks RCU:
>> "The solution, in the form of Tasks RCU, is to have implicit read-side 
>> critical sections that are delimited by voluntary context switches, that 
>> is, calls to schedule(), cond_resched(), and synchronize_rcu_tasks(). In 
>> addition, transitions to and from userspace execution also delimit 
>> tasks-RCU read-side critical sections."
>> 
>> I suppose transitions to/from userspace, as well as calls to schedule() 
>> result in context synchronizing instruction being executed. But, if some 
>> tasks call cond_resched() and synchronize_rcu_tasks(), we probably won't 
>> have a CSI executed.
>> 
>> Also:
>> "In CONFIG_PREEMPT=n kernels, trampolines cannot be preempted, so these 
>> APIs map to call_rcu(), synchronize_rcu(), and rcu_barrier(), 
>> respectively."
>> 
>> In this scenario as well, I think we won't have a CSI executed in case 
>> of cond_resched().
>> 
>> Should we enhance patch_instruction() to handle that?
> 
> Well, not sure. Do we have many post-boot callers of it? Should
> they take care of their own synchronization requirements?

Kprobes and ftrace are the two users (along with anything else that may 
use jump labels).

Looking at this from the CMODX perspective: the main example quoted of 
an erratic behavior is when any variant of the patched instruction 
causes an exception.

With ftrace, I think we are ok since we only ever patch a 'nop' or a 
'bl' (and the 'mflr' now), none of which should cause an exception. As 
such, the existing patch_instruction() should suffice.

However, with kprobes, we patch a 'trap' (or a branch in case of 
optprobes) on most instructions. I wonder if we should be issuing an 
'isync' on all cpus in this case. Or, even if that is sufficient or 
necessary.


Thanks,
Naveen



^ permalink raw reply

* Re: [PATCH 2/3] powerpc/64s/radix: ioremap use ioremap_page_range
From: Christophe Leroy @ 2019-06-19 16:25 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <1560915874.eudrz3r20a.astroid@bobo.none>

Le 19/06/2019 à 05:59, Nicholas Piggin a écrit :
> Christophe Leroy's on June 11, 2019 4:46 pm:
>>
>>
>> Le 10/06/2019 à 05:08, Nicholas Piggin a écrit :
> I would like to remove the early ioremap or make it into its own
> function. Re-implement map_kernel_page with ioremap_page_range,
> allow page tables that don't use slab to avoid the early check,
> unbolt the hptes mapped in early boot, etc.

Getting early ioremap out of the picture is a very good idea, it will 
help making things more common between all platform types. Today we face 
the fact that PPC32 allocates early io from the top of memory while 
PPC64 allocates it from the bottom of memory.

Any idea on how to proceed ?

Christophe

^ permalink raw reply

* [PATCH v3] KVM: PPC: Report single stepping capability
From: Fabiano Rosas @ 2019-06-19 16:01 UTC (permalink / raw)
  To: kvm-ppc; +Cc: kvm, rkrcmar, aik, pbonzini, linuxppc-dev, david

When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request
the next instruction to be single stepped via the
KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure.

This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order
to inform userspace about the state of single stepping support.

We currently don't have support for guest single stepping implemented
in Book3S HV so the capability is only present for Book3S PR and
BookE.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---

v1 -> v2:
 - add capability description to Documentation/virtual/kvm/api.txt

v2 -> v3:
 - be explicit in the commit message about when the capability is
   present
 - remove unnecessary check for CONFIG_BOOKE

 Documentation/virtual/kvm/api.txt | 3 +++
 arch/powerpc/kvm/powerpc.c        | 2 ++
 include/uapi/linux/kvm.h          | 1 +
 3 files changed, 6 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index ba6c42c576dd..a77643bfa917 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2969,6 +2969,9 @@ can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and
 KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number
 indicating the number of supported registers.

+For ppc, the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability indicates whether
+the single-step debug event (KVM_GUESTDBG_SINGLESTEP) is supported.
+
 When debug events exit the main run loop with the reason
 KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run
 structure containing architecture specific debug information.
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6d704ad2472b..bd0a73eaf7ba 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -527,6 +527,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_IMMEDIATE_EXIT:
 		r = 1;
 		break;
+	case KVM_CAP_PPC_GUEST_DEBUG_SSTEP:
+		/* fall through */
 	case KVM_CAP_PPC_PAIRED_SINGLES:
 	case KVM_CAP_PPC_OSI:
 	case KVM_CAP_PPC_GET_PVINFO:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2fe12b40d503..cad9fcd90f39 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -993,6 +993,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_SVE 170
 #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
+#define KVM_CAP_PPC_GUEST_DEBUG_SSTEP 173

 #ifdef KVM_CAP_IRQ_ROUTING

--
2.20.1


^ permalink raw reply related

* Re: [RFC PATCH v0] powerpc: Fix BUG_ON during memory unplug on radix
From: Bharata B Rao @ 2019-06-19 14:40 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, aneesh.kumar, npiggin
In-Reply-To: <87tvcm0wr5.fsf@linux.ibm.com>

On Wed, Jun 19, 2019 at 02:36:54PM +0530, Aneesh Kumar K.V wrote:
> Bharata B Rao <bharata@linux.ibm.com> writes:
> 
> > We hit the following BUG_ON when memory hotplugged before reboot
> > is unplugged after reboot:
> >
> > kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
> >
> >  remove_pagetable+0x594/0x6a0
> >  (unreliable)
> >  remove_pagetable+0x94/0x6a0
> >  vmemmap_free+0x394/0x410
> >  sparse_remove_one_section+0x26c/0x2e8
> >  __remove_pages+0x428/0x540
> >  arch_remove_memory+0xd0/0x170
> >  __remove_memory+0xd4/0x1a0
> >  dlpar_remove_lmb+0xbc/0x110
> >  dlpar_memory+0xa80/0xd20
> >  handle_dlpar_errorlog+0xa8/0x160
> >  pseries_hp_work_fn+0x2c/0x60
> >  process_one_work+0x46c/0x860
> >  worker_thread+0x364/0x5e0
> >  kthread+0x1b0/0x1c0
> >  ret_from_kernel_thread+0x5c/0x68
> >
> > This occurs because, during reboot-after-hotplug, the hotplugged
> > memory range gets initialized as regular memory and page
> > tables are setup using memblock allocator. This means that we
> > wouldn't have initialized the PMD or PTE fragment count for
> > those PMD or PTE pages.
> >
> > Fixing this includes 3 aspects:
> >
> > - Walk the init_mm page tables from mem_init() and initialize
> >   the PMD and PTE fragment counts appropriately.
> > - When we do early allocation of PMD (and PGD as well) pages,
> >   allocate in page size PAGE_SIZE granularity so that we are
> >   sure that the complete page is available for us to set the
> >   fragment count which is part of struct page.
> 
> 
> That is an important change now. For early page table we now allocate
> PAGE_SIZE tables and hencec we consider then as pages with fragment
> count 1. You also may want to explain here why.

Sure will make this clear in my next version.

> I guess the challenge is
> due to the fact that we can't clearly control how the rest of the page
> will get used and we are not sure they all will be allocated for backing
> page table pages.
> 
> > - When PMD or PTE page is freed, check if it comes from memblock
> >   allocator and free it appropriately.
> >
> > Reported-by: Srikanth Aithal <sraithal@linux.vnet.ibm.com>
> > Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
> > ---
> >  arch/powerpc/include/asm/book3s/64/radix.h |  1 +
> >  arch/powerpc/include/asm/sparsemem.h       |  1 +
> >  arch/powerpc/mm/book3s64/pgtable.c         | 12 +++-
> >  arch/powerpc/mm/book3s64/radix_pgtable.c   | 67 +++++++++++++++++++++-
> >  arch/powerpc/mm/mem.c                      |  5 ++
> >  arch/powerpc/mm/pgtable-frag.c             |  5 +-
> >  6 files changed, 87 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> > index 574eca33f893..4320f2790e8d 100644
> > --- a/arch/powerpc/include/asm/book3s/64/radix.h
> > +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> > @@ -285,6 +285,7 @@ static inline unsigned long radix__get_tree_size(void)
> >  #ifdef CONFIG_MEMORY_HOTPLUG
> >  int radix__create_section_mapping(unsigned long start, unsigned long end, int nid);
> >  int radix__remove_section_mapping(unsigned long start, unsigned long end);
> > +void radix__fixup_pgtable_fragments(void);
> >  #endif /* CONFIG_MEMORY_HOTPLUG */
> >  #endif /* __ASSEMBLY__ */
> >  #endif
> > diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
> > index 3192d454a733..e662f9232d35 100644
> > --- a/arch/powerpc/include/asm/sparsemem.h
> > +++ b/arch/powerpc/include/asm/sparsemem.h
> > @@ -15,6 +15,7 @@
> >  #ifdef CONFIG_MEMORY_HOTPLUG
> >  extern int create_section_mapping(unsigned long start, unsigned long end, int nid);
> >  extern int remove_section_mapping(unsigned long start, unsigned long end);
> > +void fixup_pgtable_fragments(void);
> >
> >  #ifdef CONFIG_PPC_BOOK3S_64
> >  extern int resize_hpt_for_hotplug(unsigned long new_mem_size);
> > diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> > index 01bc9663360d..7efe9cc16b39 100644
> > --- a/arch/powerpc/mm/book3s64/pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/pgtable.c
> > @@ -186,6 +186,13 @@ int __meminit remove_section_mapping(unsigned long start, unsigned long end)
> >
> >  	return hash__remove_section_mapping(start, end);
> >  }
> > +
> > +void fixup_pgtable_fragments(void)
> > +{
> > +	if (radix_enabled())
> > +		radix__fixup_pgtable_fragments();
> > +}
> > +
> >  #endif /* CONFIG_MEMORY_HOTPLUG */
> >
> >  void __init mmu_partition_table_init(void)
> > @@ -320,7 +327,10 @@ void pmd_fragment_free(unsigned long *pmd)
> >  	BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
> >  	if (atomic_dec_and_test(&page->pt_frag_refcount)) {
> >  		pgtable_pmd_page_dtor(page);
> > -		__free_page(page);
> > +		if (PageReserved(page))
> > +			free_reserved_page(page);
> > +		else
> > +			__free_page(page);
> >  	}
> >  }
> >
> > diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
> > index 273ae66a9a45..402e8da28cab 100644
> > --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> > @@ -32,6 +32,69 @@
> >  unsigned int mmu_pid_bits;
> >  unsigned int mmu_base_pid;
> >
> > +static void fixup_pmd_fragments(pmd_t *pmd)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < PTRS_PER_PMD; i++, pmd++) {
> > +		pte_t *pte;
> > +		struct page *page;
> > +
> > +		if (pmd_none(*pmd))
> > +			continue;
> > +		if (pmd_huge(*pmd))
> > +			continue;
> > +
> > +		pte = pte_offset_kernel(pmd, 0);
> > +		if (!pte_none(*pte)) {
> > +			page = virt_to_page(pte);
> > +			atomic_inc(&page->pt_frag_refcount);
> > +		}
> > +	}
> > +}
> 
> 
> That naming is confusing. You are fixing up fragemts used for level 4
> table here. I would call them pte fragments? 

Sure, will rename.

> 
> 
> > +
> > +static void fixup_pud_fragments(pud_t *pud)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < PTRS_PER_PUD; i++, pud++) {
> > +		pmd_t *pmd;
> > +		struct page *page;
> > +
> > +		if (pud_none(*pud))
> > +			continue;
> > +		if (pud_huge(*pud))
> > +			continue;
> > +
> > +		pmd = pmd_offset(pud, 0);
> > +		if (!pmd_none(*pmd)) {
> > +			page = virt_to_page(pmd);
> > +			atomic_inc(&page->pt_frag_refcount);
> > +		}
> > +		fixup_pmd_fragments(pmd);
> > +	}
> > +}
> > +
> 
> 
> How do we handle pud table free. That is going to be tricky for general
> allocation they are allocated out of slab and we free them back to slab.
> With pages allocated out of memblock, we need to special case them? 

Yes. With out addressing the freeing of pud table, we can't say that
memory unplug on radix is working fully. However with this fixing of
frag counts, it does work for a few cases.

May be we handle the pud case separately?

Regards,
Bharata.


^ permalink raw reply

* Re: [RFC PATCH v0] powerpc: Fix BUG_ON during memory unplug on radix
From: Bharata B Rao @ 2019-06-19 14:36 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, aneesh.kumar
In-Reply-To: <1560939185.n3y8722qvc.astroid@bobo.none>

On Wed, Jun 19, 2019 at 08:17:01PM +1000, Nicholas Piggin wrote:
> Bharata B Rao's on June 19, 2019 5:45 pm:
> > We hit the following BUG_ON when memory hotplugged before reboot
> > is unplugged after reboot:
> > 
> > kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
> > 
> >  remove_pagetable+0x594/0x6a0
> >  (unreliable)
> >  remove_pagetable+0x94/0x6a0
> >  vmemmap_free+0x394/0x410
> >  sparse_remove_one_section+0x26c/0x2e8
> >  __remove_pages+0x428/0x540
> >  arch_remove_memory+0xd0/0x170
> >  __remove_memory+0xd4/0x1a0
> >  dlpar_remove_lmb+0xbc/0x110
> >  dlpar_memory+0xa80/0xd20
> >  handle_dlpar_errorlog+0xa8/0x160
> >  pseries_hp_work_fn+0x2c/0x60
> >  process_one_work+0x46c/0x860
> >  worker_thread+0x364/0x5e0
> >  kthread+0x1b0/0x1c0
> >  ret_from_kernel_thread+0x5c/0x68
> > 
> > This occurs because, during reboot-after-hotplug, the hotplugged
> > memory range gets initialized as regular memory and page
> > tables are setup using memblock allocator. This means that we
> > wouldn't have initialized the PMD or PTE fragment count for
> > those PMD or PTE pages.
> > 
> > Fixing this includes 3 aspects:
> > 
> > - Walk the init_mm page tables from mem_init() and initialize
> >   the PMD and PTE fragment counts appropriately.
> > - When we do early allocation of PMD (and PGD as well) pages,
> >   allocate in page size PAGE_SIZE granularity so that we are
> >   sure that the complete page is available for us to set the
> >   fragment count which is part of struct page.
> > - When PMD or PTE page is freed, check if it comes from memblock
> >   allocator and free it appropriately.
> > 
> > Reported-by: Srikanth Aithal <sraithal@linux.vnet.ibm.com>
> > Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
> > ---
> >  arch/powerpc/include/asm/book3s/64/radix.h |  1 +
> >  arch/powerpc/include/asm/sparsemem.h       |  1 +
> >  arch/powerpc/mm/book3s64/pgtable.c         | 12 +++-
> >  arch/powerpc/mm/book3s64/radix_pgtable.c   | 67 +++++++++++++++++++++-
> >  arch/powerpc/mm/mem.c                      |  5 ++
> >  arch/powerpc/mm/pgtable-frag.c             |  5 +-
> >  6 files changed, 87 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> > index 574eca33f893..4320f2790e8d 100644
> > --- a/arch/powerpc/include/asm/book3s/64/radix.h
> > +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> > @@ -285,6 +285,7 @@ static inline unsigned long radix__get_tree_size(void)
> >  #ifdef CONFIG_MEMORY_HOTPLUG
> >  int radix__create_section_mapping(unsigned long start, unsigned long end, int nid);
> >  int radix__remove_section_mapping(unsigned long start, unsigned long end);
> > +void radix__fixup_pgtable_fragments(void);
> >  #endif /* CONFIG_MEMORY_HOTPLUG */
> >  #endif /* __ASSEMBLY__ */
> >  #endif
> > diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
> > index 3192d454a733..e662f9232d35 100644
> > --- a/arch/powerpc/include/asm/sparsemem.h
> > +++ b/arch/powerpc/include/asm/sparsemem.h
> > @@ -15,6 +15,7 @@
> >  #ifdef CONFIG_MEMORY_HOTPLUG
> >  extern int create_section_mapping(unsigned long start, unsigned long end, int nid);
> >  extern int remove_section_mapping(unsigned long start, unsigned long end);
> > +void fixup_pgtable_fragments(void);
> >  
> >  #ifdef CONFIG_PPC_BOOK3S_64
> >  extern int resize_hpt_for_hotplug(unsigned long new_mem_size);
> > diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> > index 01bc9663360d..7efe9cc16b39 100644
> > --- a/arch/powerpc/mm/book3s64/pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/pgtable.c
> > @@ -186,6 +186,13 @@ int __meminit remove_section_mapping(unsigned long start, unsigned long end)
> >  
> >  	return hash__remove_section_mapping(start, end);
> >  }
> > +
> > +void fixup_pgtable_fragments(void)
> > +{
> > +	if (radix_enabled())
> > +		radix__fixup_pgtable_fragments();
> > +}
> > +
> >  #endif /* CONFIG_MEMORY_HOTPLUG */
> >  
> >  void __init mmu_partition_table_init(void)
> > @@ -320,7 +327,10 @@ void pmd_fragment_free(unsigned long *pmd)
> >  	BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
> >  	if (atomic_dec_and_test(&page->pt_frag_refcount)) {
> >  		pgtable_pmd_page_dtor(page);
> > -		__free_page(page);
> > +		if (PageReserved(page))
> > +			free_reserved_page(page);
> 
> Hmm. Rather than adding this special case here, I wonder if you can
> just go along in your fixup walk and convert all these pages to
> non-reserved pages?
> 
> ClearPageReserved ; init_page_count ; adjust_managed_page_count ; 
> should do the trick, right?

Yes, that should. We are anyway fixing the frag count during
the walk, might as well do all the above too and avoid the special
case in the free path.

Regards,
Bharata.


^ permalink raw reply

* Re: [PATCH] ocxl: Update for AFU descriptor template version 1.1
From: christophe lombard @ 2019-06-19 14:31 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev, alastair, ajd, clombard; +Cc: groug
In-Reply-To: <20190605111545.19762-1-fbarrat@linux.ibm.com>

On 05/06/2019 13:15, Frederic Barrat wrote:
> From: Alastair D'Silva <alastair@d-silva.org>
> 
> The OpenCAPI discovery and configuration specification has been
> updated and introduces version 1.1 of the AFU descriptor template,
> with new fields to better define the memory layout of an OpenCAPI
> adapter.
> 
> The ocxl driver doesn't do much yet to support LPC memory but as we
> start seeing (non-LPC) AFU images using the new template, this patch
> updates the config space parsing code to avoid spitting a warning.
> 
> Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> 

The content of the patch sounds good. Thanks.

Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>


^ permalink raw reply

* Re: [PATCH] powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac
From: Mathieu Malaterre @ 2019-06-19 14:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: aaro.koskinen, LKML, Paul Mackerras, linuxppc-dev,
	Christoph Hellwig, Larry Finger
In-Reply-To: <a5fc355e44fb5edea41274329f7c5d04a8dff6fc.camel@kernel.crashing.org>

On Wed, Jun 19, 2019 at 4:18 PM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> On Wed, 2019-06-19 at 22:32 +1000, Michael Ellerman wrote:
> > Christoph Hellwig <hch@lst.de> writes:
> > > Any chance this could get picked up to fix the regression?
> >
> > Was hoping Ben would Ack it. He's still powermac maintainer :)
> >
> > I guess he OK'ed it in the other thread, will add it to my queue.
>
> Yeah ack. If I had written it myself, I would have made the DMA bits a
> variable and only set it down to 30 if I see that device in the DT
> early on, but I can't be bothered now, if it works, ship it :-)
>
> Note: The patch affects all ppc32, though I don't think it will cause
> any significant issue on those who don't need it.

Thanks, that answer my earlier question.

> Cheers,
> Ben.
>
> > cheers
> >
> > > On Thu, Jun 13, 2019 at 10:24:46AM +0200, Christoph Hellwig wrote:
> > > > With the strict dma mask checking introduced with the switch to
> > > > the generic DMA direct code common wifi chips on 32-bit
> > > > powerbooks
> > > > stopped working.  Add a 30-bit ZONE_DMA to the 32-bit pmac builds
> > > > to allow them to reliably allocate dma coherent memory.
> > > >
> > > > Fixes: 65a21b71f948 ("powerpc/dma: remove
> > > > dma_nommu_dma_supported")
> > > > Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
> > > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > > ---
> > > >  arch/powerpc/include/asm/page.h         | 7 +++++++
> > > >  arch/powerpc/mm/mem.c                   | 3 ++-
> > > >  arch/powerpc/platforms/powermac/Kconfig | 1 +
> > > >  3 files changed, 10 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/powerpc/include/asm/page.h
> > > > b/arch/powerpc/include/asm/page.h
> > > > index b8286a2013b4..0d52f57fca04 100644
> > > > --- a/arch/powerpc/include/asm/page.h
> > > > +++ b/arch/powerpc/include/asm/page.h
> > > > @@ -319,6 +319,13 @@ struct vm_area_struct;
> > > >  #endif /* __ASSEMBLY__ */
> > > >  #include <asm/slice.h>
> > > >
> > > > +/*
> > > > + * Allow 30-bit DMA for very limited Broadcom wifi chips on many
> > > > powerbooks.
> > > > + */
> > > > +#ifdef CONFIG_PPC32
> > > > +#define ARCH_ZONE_DMA_BITS 30
> > > > +#else
> > > >  #define ARCH_ZONE_DMA_BITS 31
> > > > +#endif
> > > >
> > > >  #endif /* _ASM_POWERPC_PAGE_H */
> > > > diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> > > > index cba29131bccc..2540d3b2588c 100644
> > > > --- a/arch/powerpc/mm/mem.c
> > > > +++ b/arch/powerpc/mm/mem.c
> > > > @@ -248,7 +248,8 @@ void __init paging_init(void)
> > > >          (long int)((top_of_ram - total_ram) >> 20));
> > > >
> > > >  #ifdef CONFIG_ZONE_DMA
> > > > - max_zone_pfns[ZONE_DMA] = min(max_low_pfn, 0x7fffffffUL
> > > > >> PAGE_SHIFT);
> > > > + max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
> > > > +                 ((1UL << ARCH_ZONE_DMA_BITS) - 1) >>
> > > > PAGE_SHIFT);
> > > >  #endif
> > > >   max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
> > > >  #ifdef CONFIG_HIGHMEM
> > > > diff --git a/arch/powerpc/platforms/powermac/Kconfig
> > > > b/arch/powerpc/platforms/powermac/Kconfig
> > > > index f834a19ed772..c02d8c503b29 100644
> > > > --- a/arch/powerpc/platforms/powermac/Kconfig
> > > > +++ b/arch/powerpc/platforms/powermac/Kconfig
> > > > @@ -7,6 +7,7 @@ config PPC_PMAC
> > > >   select PPC_INDIRECT_PCI if PPC32
> > > >   select PPC_MPC106 if PPC32
> > > >   select PPC_NATIVE
> > > > + select ZONE_DMA if PPC32
> > > >   default y
> > > >
> > > >  config PPC_PMAC64
> > > > --
> > > > 2.20.1
> > >
> > > ---end quoted text---
>

^ permalink raw reply

* Re: [PATCH] powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac
From: Benjamin Herrenschmidt @ 2019-06-19 13:31 UTC (permalink / raw)
  To: Michael Ellerman, Christoph Hellwig, paulus
  Cc: aaro.koskinen, linuxppc-dev, linux-kernel, Larry.Finger
In-Reply-To: <87tvcldacn.fsf@concordia.ellerman.id.au>

On Wed, 2019-06-19 at 22:32 +1000, Michael Ellerman wrote:
> Christoph Hellwig <hch@lst.de> writes:
> > Any chance this could get picked up to fix the regression?
> 
> Was hoping Ben would Ack it. He's still powermac maintainer :)
> 
> I guess he OK'ed it in the other thread, will add it to my queue.

Yeah ack. If I had written it myself, I would have made the DMA bits a
variable and only set it down to 30 if I see that device in the DT
early on, but I can't be bothered now, if it works, ship it :-)

Note: The patch affects all ppc32, though I don't think it will cause
any significant issue on those who don't need it.

Cheers,
Ben.

> cheers
> 
> > On Thu, Jun 13, 2019 at 10:24:46AM +0200, Christoph Hellwig wrote:
> > > With the strict dma mask checking introduced with the switch to
> > > the generic DMA direct code common wifi chips on 32-bit
> > > powerbooks
> > > stopped working.  Add a 30-bit ZONE_DMA to the 32-bit pmac builds
> > > to allow them to reliably allocate dma coherent memory.
> > > 
> > > Fixes: 65a21b71f948 ("powerpc/dma: remove
> > > dma_nommu_dma_supported")
> > > Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  arch/powerpc/include/asm/page.h         | 7 +++++++
> > >  arch/powerpc/mm/mem.c                   | 3 ++-
> > >  arch/powerpc/platforms/powermac/Kconfig | 1 +
> > >  3 files changed, 10 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/powerpc/include/asm/page.h
> > > b/arch/powerpc/include/asm/page.h
> > > index b8286a2013b4..0d52f57fca04 100644
> > > --- a/arch/powerpc/include/asm/page.h
> > > +++ b/arch/powerpc/include/asm/page.h
> > > @@ -319,6 +319,13 @@ struct vm_area_struct;
> > >  #endif /* __ASSEMBLY__ */
> > >  #include <asm/slice.h>
> > >  
> > > +/*
> > > + * Allow 30-bit DMA for very limited Broadcom wifi chips on many
> > > powerbooks.
> > > + */
> > > +#ifdef CONFIG_PPC32
> > > +#define ARCH_ZONE_DMA_BITS 30
> > > +#else
> > >  #define ARCH_ZONE_DMA_BITS 31
> > > +#endif
> > >  
> > >  #endif /* _ASM_POWERPC_PAGE_H */
> > > diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> > > index cba29131bccc..2540d3b2588c 100644
> > > --- a/arch/powerpc/mm/mem.c
> > > +++ b/arch/powerpc/mm/mem.c
> > > @@ -248,7 +248,8 @@ void __init paging_init(void)
> > >  	       (long int)((top_of_ram - total_ram) >> 20));
> > >  
> > >  #ifdef CONFIG_ZONE_DMA
> > > -	max_zone_pfns[ZONE_DMA]	= min(max_low_pfn, 0x7fffffffUL
> > > >> PAGE_SHIFT);
> > > +	max_zone_pfns[ZONE_DMA]	= min(max_low_pfn,
> > > +			((1UL << ARCH_ZONE_DMA_BITS) - 1) >>
> > > PAGE_SHIFT);
> > >  #endif
> > >  	max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
> > >  #ifdef CONFIG_HIGHMEM
> > > diff --git a/arch/powerpc/platforms/powermac/Kconfig
> > > b/arch/powerpc/platforms/powermac/Kconfig
> > > index f834a19ed772..c02d8c503b29 100644
> > > --- a/arch/powerpc/platforms/powermac/Kconfig
> > > +++ b/arch/powerpc/platforms/powermac/Kconfig
> > > @@ -7,6 +7,7 @@ config PPC_PMAC
> > >  	select PPC_INDIRECT_PCI if PPC32
> > >  	select PPC_MPC106 if PPC32
> > >  	select PPC_NATIVE
> > > +	select ZONE_DMA if PPC32
> > >  	default y
> > >  
> > >  config PPC_PMAC64
> > > -- 
> > > 2.20.1
> > 
> > ---end quoted text---


^ permalink raw reply

* [RFC 00/11] opencapi: enable card reset and link retraining
From: Frederic Barrat @ 2019-06-19 13:28 UTC (permalink / raw)
  To: linuxppc-dev, andrew.donnellan, clombard
  Cc: aik, arbab, oohall, groug, alastair

This is the linux part of the work to use the PCI hotplug framework to
control an opencapi card so that it can be reset and re-read after
flashing a new FPGA image.

It needs support in skiboot:
http://patchwork.ozlabs.org/project/skiboot/list/?series=114803
On an old skiboot, it will do nothing.

A virtual PCI slot is created for the opencapi adapter, and its state
can be controlled through the pnv-php hotplug driver:

  echo 0|1 > /sys/bus/pci/slots/OPENCAPI-<...>/power

Note that the power to the card is not really turned off, as the card
needs to stay on to be flashed with a new image. Instead the card is
placed in reset.

The first part of the series mostly deals with the pci/ioda state, as
the devices can now go away and the state needs to be cleaned up.

The second part is modifications to the hotplug driver on powernv, so
that a virtual slot is created for the opencapi adapters found in the
device tree

Frederic Barrat (11):
  powerpc/powernv/ioda: Fix ref count for devices with their own PE
  powerpc/powernv/ioda: Protect PE list
  powerpc/powernv/ioda: set up PE on opencapi device when enabling
  powerpc/powernv/ioda: Release opencapi device
  powerpc/powernv/ioda: Find opencapi slot for a device node
  pci/hotplug/pnv-php: Remove erroneous warning
  pci/hotplug/pnv-php: Improve error msg on power state change failure
  pci/hotplug/pnv-php: Register opencapi slots
  pci/hotplug/pnv-php: Relax check when disabling slot
  pci/hotplug/pnv-php: Wrap warnings in macro
  ocxl: Add PCI hotplug dependency to Kconfig

 arch/powerpc/include/asm/pnv-pci.h        |   1 +
 arch/powerpc/platforms/powernv/pci-ioda.c | 106 ++++++++++++++--------
 arch/powerpc/platforms/powernv/pci.c      |  10 +-
 drivers/misc/ocxl/Kconfig                 |   1 +
 drivers/pci/hotplug/pnv_php.c             |  66 ++++++++------
 5 files changed, 115 insertions(+), 69 deletions(-)

-- 
2.21.0

^ permalink raw reply

* [RFC 07/11] pci/hotplug/pnv-php: Improve error msg on power state change failure
From: Frederic Barrat @ 2019-06-19 13:28 UTC (permalink / raw)
  To: linuxppc-dev, andrew.donnellan, clombard
  Cc: aik, arbab, oohall, groug, alastair
In-Reply-To: <20190619132840.27634-1-fbarrat@linux.ibm.com>

When changing the slot state, if opal hits an error and tells as such
in the asynchronous reply, the warning "Wrong msg" is logged, which is
rather confusing. Instead we can reuse the better message which is
already used when we couldn't submit the asynchronous opal request
initially.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---
 drivers/pci/hotplug/pnv_php.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 5b5cbf1e636d..5cdd2a3a4dd9 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -336,18 +336,19 @@ int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
 	ret = pnv_pci_set_power_state(php_slot->id, state, &msg);
 	if (ret > 0) {
 		if (be64_to_cpu(msg.params[1]) != php_slot->dn->phandle	||
-		    be64_to_cpu(msg.params[2]) != state			||
-		    be64_to_cpu(msg.params[3]) != OPAL_SUCCESS) {
+		    be64_to_cpu(msg.params[2]) != state) {
 			pci_warn(php_slot->pdev, "Wrong msg (%lld, %lld, %lld)\n",
 				 be64_to_cpu(msg.params[1]),
 				 be64_to_cpu(msg.params[2]),
 				 be64_to_cpu(msg.params[3]));
 			return -ENOMSG;
 		}
+		if (be64_to_cpu(msg.params[3]) != OPAL_SUCCESS) {
+			ret = -ENODEV;
+			goto error;
+		}
 	} else if (ret < 0) {
-		pci_warn(php_slot->pdev, "Error %d powering %s\n",
-			 ret, (state == OPAL_PCI_SLOT_POWER_ON) ? "on" : "off");
-		return ret;
+		goto error;
 	}
 
 	if (state == OPAL_PCI_SLOT_POWER_OFF || state == OPAL_PCI_SLOT_OFFLINE)
@@ -356,6 +357,11 @@ int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
 		ret = pnv_php_add_devtree(php_slot);
 
 	return ret;
+
+error:
+	pci_warn(php_slot->pdev, "Error %d powering %s\n",
+		ret, (state == OPAL_PCI_SLOT_POWER_ON) ? "on" : "off");
+	return ret;
 }
 EXPORT_SYMBOL_GPL(pnv_php_set_slot_power_state);
 
-- 
2.21.0


^ permalink raw reply related

* [RFC 11/11] ocxl: Add PCI hotplug dependency to Kconfig
From: Frederic Barrat @ 2019-06-19 13:28 UTC (permalink / raw)
  To: linuxppc-dev, andrew.donnellan, clombard
  Cc: aik, arbab, oohall, groug, alastair
In-Reply-To: <20190619132840.27634-1-fbarrat@linux.ibm.com>

The PCI hotplug framework is used to update the devices when a new
image is written to the FPGA.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---
 drivers/misc/ocxl/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig
index 7fb6d39d4c5a..13a5d9f30369 100644
--- a/drivers/misc/ocxl/Kconfig
+++ b/drivers/misc/ocxl/Kconfig
@@ -12,6 +12,7 @@ config OCXL
 	tristate "OpenCAPI coherent accelerator support"
 	depends on PPC_POWERNV && PCI && EEH
 	select OCXL_BASE
+	select HOTPLUG_PCI_POWERNV
 	default m
 	help
 	  Select this option to enable the ocxl driver for Open
-- 
2.21.0


^ permalink raw reply related

* [RFC 06/11] pci/hotplug/pnv-php: Remove erroneous warning
From: Frederic Barrat @ 2019-06-19 13:28 UTC (permalink / raw)
  To: linuxppc-dev, andrew.donnellan, clombard
  Cc: aik, arbab, oohall, groug, alastair
In-Reply-To: <20190619132840.27634-1-fbarrat@linux.ibm.com>

On powernv, when removing a device through hotplug, the following
warning is logged:

     Invalid refcount <.> on <...>

It may be incorrect, the refcount may be set to a higher value than 1
and be valid. of_detach_node() can drop more than one reference. As it
doesn't seem trivial to assert the correct value, let's remove the
warning.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---
 drivers/pci/hotplug/pnv_php.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 6758fd7c382e..5b5cbf1e636d 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -151,17 +151,11 @@ static void pnv_php_rmv_pdns(struct device_node *dn)
 static void pnv_php_detach_device_nodes(struct device_node *parent)
 {
 	struct device_node *dn;
-	int refcount;
 
 	for_each_child_of_node(parent, dn) {
 		pnv_php_detach_device_nodes(dn);
 
 		of_node_put(dn);
-		refcount = kref_read(&dn->kobj.kref);
-		if (refcount != 1)
-			pr_warn("Invalid refcount %d on <%pOF>\n",
-				refcount, dn);
-
 		of_detach_node(dn);
 	}
 }
-- 
2.21.0


^ permalink raw reply related

* [RFC 04/11] powerpc/powernv/ioda: Release opencapi device
From: Frederic Barrat @ 2019-06-19 13:28 UTC (permalink / raw)
  To: linuxppc-dev, andrew.donnellan, clombard
  Cc: aik, arbab, oohall, groug, alastair
In-Reply-To: <20190619132840.27634-1-fbarrat@linux.ibm.com>

With hotplug, an opencapi device can now go away. It needs to be
released, mostly to clean up its PE state. We were previously not
defining any device callback. We can reuse the standard PCI release
callback, it does a bit too much for an opencapi device, but it's
harmless, and only needs minor tuning.

Also separate the undo of the PELT-V code in a separate function, it
is not needed for NPU devices and it improves a bit the readability of
the code.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 58 +++++++++++++++--------
 1 file changed, 38 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 2cf06fb98978..33054d00b2c5 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -186,7 +186,7 @@ static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe)
 	unsigned int pe_num = pe->pe_number;
 
 	WARN_ON(pe->pdev);
-	WARN_ON(pe->npucomp); /* NPUs are not supposed to be freed */
+	WARN_ON(pe->npucomp); /* NPUs for nvlink are not supposed to be freed */
 	kfree(pe->npucomp);
 	memset(pe, 0, sizeof(struct pnv_ioda_pe));
 	clear_bit(pe_num, phb->ioda.pe_alloc);
@@ -775,6 +775,33 @@ static int pnv_ioda_set_peltv(struct pnv_phb *phb,
 	return 0;
 }
 
+static void pnv_ioda_unset_peltv(struct pnv_phb *phb,
+				 struct pnv_ioda_pe *pe,
+				 struct pci_dev *parent)
+{
+	int64_t rc;
+
+	while (parent) {
+		struct pci_dn *pdn = pci_get_pdn(parent);
+		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
+			rc = opal_pci_set_peltv(phb->opal_id, pdn->pe_number,
+						pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);
+			/* XXX What to do in case of error ? */
+		}
+		parent = parent->bus->self;
+	}
+
+	opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number,
+				OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
+
+	/* Disassociate PE in PELT */
+	rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number,
+				pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);
+	if (rc)
+		pe_warn(pe, "OPAL error %lld remove self from PELTV\n", rc);
+
+}
+
 static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 {
 	struct pci_dev *parent;
@@ -825,25 +852,13 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 	for (rid = pe->rid; rid < rid_end; rid++)
 		phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
 
-	/* Release from all parents PELT-V */
-	while (parent) {
-		struct pci_dn *pdn = pci_get_pdn(parent);
-		if (pdn && pdn->pe_number != IODA_INVALID_PE) {
-			rc = opal_pci_set_peltv(phb->opal_id, pdn->pe_number,
-						pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);
-			/* XXX What to do in case of error ? */
-		}
-		parent = parent->bus->self;
-	}
-
-	opal_pci_eeh_freeze_clear(phb->opal_id, pe->pe_number,
-				  OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
+	/*
+	 * Release from all parents PELT-V. NPUs don't have a PELTV
+	 * table
+	 */
+	if (phb->type != PNV_PHB_NPU_NVLINK && phb->type != PNV_PHB_NPU_OCAPI)
+		pnv_ioda_unset_peltv(phb, pe, parent);
 
-	/* Disassociate PE in PELT */
-	rc = opal_pci_set_peltv(phb->opal_id, pe->pe_number,
-				pe->pe_number, OPAL_REMOVE_PE_FROM_DOMAIN);
-	if (rc)
-		pe_warn(pe, "OPAL error %lld remove self from PELTV\n", rc);
 	rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid,
 			     bcomp, dcomp, fcomp, OPAL_UNMAP_PE);
 	if (rc)
@@ -3528,6 +3543,8 @@ static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
 	case PNV_PHB_IODA2:
 		pnv_pci_ioda2_release_pe_dma(pe);
 		break;
+	case PNV_PHB_NPU_OCAPI:
+		break;
 	default:
 		WARN_ON(1);
 	}
@@ -3580,7 +3597,7 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
 	pe = &phb->ioda.pe_array[pdn->pe_number];
 	pdn->pe_number = IODA_INVALID_PE;
 
-	WARN_ON(--pe->device_count < 0);
+	WARN_ON((pe->flags != PNV_IODA_PE_DEV) && (--pe->device_count < 0));
 	if (pe->device_count == 0)
 		pnv_ioda_release_pe(pe);
 }
@@ -3629,6 +3646,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
 
 static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
 	.enable_device_hook	= pnv_ocapi_enable_device_hook,
+	.release_device		= pnv_pci_release_device,
 	.window_alignment	= pnv_pci_window_alignment,
 	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
 	.shutdown		= pnv_pci_ioda_shutdown,
-- 
2.21.0


^ permalink raw reply related

* [RFC 08/11] pci/hotplug/pnv-php: Register opencapi slots
From: Frederic Barrat @ 2019-06-19 13:28 UTC (permalink / raw)
  To: linuxppc-dev, andrew.donnellan, clombard
  Cc: aik, arbab, oohall, groug, alastair
In-Reply-To: <20190619132840.27634-1-fbarrat@linux.ibm.com>

Add the opencapi PHBs to the list of PHBs being scanned to look for
slots.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---
 drivers/pci/hotplug/pnv_php.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 5cdd2a3a4dd9..f9c624334ef7 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -954,7 +954,8 @@ static int __init pnv_php_init(void)
 	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
 	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
 		pnv_php_register(dn);
-
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-npu2-opencapi-phb")
+		pnv_php_register_one(dn);
 	return 0;
 }
 
@@ -964,6 +965,8 @@ static void __exit pnv_php_exit(void)
 
 	for_each_compatible_node(dn, NULL, "ibm,ioda2-phb")
 		pnv_php_unregister(dn);
+	for_each_compatible_node(dn, NULL, "ibm,ioda2-npu2-opencapi-phb")
+		pnv_php_unregister(dn);
 }
 
 module_init(pnv_php_init);
-- 
2.21.0


^ permalink raw reply related

* [RFC 10/11] pci/hotplug/pnv-php: Wrap warnings in macro
From: Frederic Barrat @ 2019-06-19 13:28 UTC (permalink / raw)
  To: linuxppc-dev, andrew.donnellan, clombard
  Cc: aik, arbab, oohall, groug, alastair
In-Reply-To: <20190619132840.27634-1-fbarrat@linux.ibm.com>

An opencapi slot doesn't have an associated bridge device. It's not
needed for operation, but any warning is displayed through pci_warn()
which uses the pci_dev struct of the assocated bridge device. So wrap
those warning so that a different trace mechanism can be used if it's
an opencapi slot.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---
 drivers/pci/hotplug/pnv_php.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 74b62a8e11e7..08ac8f0df06c 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -18,6 +18,9 @@
 #define DRIVER_AUTHOR	"Gavin Shan, IBM Corporation"
 #define DRIVER_DESC	"PowerPC PowerNV PCI Hotplug Driver"
 
+#define SLOT_WARN(slot, x...) \
+	(slot->pdev ? pci_warn(slot->pdev, x) : dev_warn(&slot->bus->dev, x))
+
 struct pnv_php_event {
 	bool			added;
 	struct pnv_php_slot	*php_slot;
@@ -265,7 +268,7 @@ static int pnv_php_add_devtree(struct pnv_php_slot *php_slot)
 
 	ret = pnv_pci_get_device_tree(php_slot->dn->phandle, fdt1, 0x10000);
 	if (ret) {
-		pci_warn(php_slot->pdev, "Error %d getting FDT blob\n", ret);
+		SLOT_WARN(php_slot, "Error %d getting FDT blob\n", ret);
 		goto free_fdt1;
 	}
 
@@ -279,7 +282,7 @@ static int pnv_php_add_devtree(struct pnv_php_slot *php_slot)
 	dt = of_fdt_unflatten_tree(fdt, php_slot->dn, NULL);
 	if (!dt) {
 		ret = -EINVAL;
-		pci_warn(php_slot->pdev, "Cannot unflatten FDT\n");
+		SLOT_WARN(php_slot, "Cannot unflatten FDT\n");
 		goto free_fdt;
 	}
 
@@ -289,7 +292,7 @@ static int pnv_php_add_devtree(struct pnv_php_slot *php_slot)
 	ret = pnv_php_populate_changeset(&php_slot->ocs, php_slot->dn);
 	if (ret) {
 		pnv_php_reverse_nodes(php_slot->dn);
-		pci_warn(php_slot->pdev, "Error %d populating changeset\n",
+		SLOT_WARN(php_slot, "Error %d populating changeset\n",
 			 ret);
 		goto free_dt;
 	}
@@ -297,7 +300,7 @@ static int pnv_php_add_devtree(struct pnv_php_slot *php_slot)
 	php_slot->dn->child = NULL;
 	ret = of_changeset_apply(&php_slot->ocs);
 	if (ret) {
-		pci_warn(php_slot->pdev, "Error %d applying changeset\n", ret);
+		SLOT_WARN(php_slot, "Error %d applying changeset\n", ret);
 		goto destroy_changeset;
 	}
 
@@ -337,7 +340,7 @@ int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
 	if (ret > 0) {
 		if (be64_to_cpu(msg.params[1]) != php_slot->dn->phandle	||
 		    be64_to_cpu(msg.params[2]) != state) {
-			pci_warn(php_slot->pdev, "Wrong msg (%lld, %lld, %lld)\n",
+			SLOT_WARN(php_slot, "Wrong msg (%lld, %lld, %lld)\n",
 				 be64_to_cpu(msg.params[1]),
 				 be64_to_cpu(msg.params[2]),
 				 be64_to_cpu(msg.params[3]));
@@ -359,7 +362,7 @@ int pnv_php_set_slot_power_state(struct hotplug_slot *slot,
 	return ret;
 
 error:
-	pci_warn(php_slot->pdev, "Error %d powering %s\n",
+	SLOT_WARN(php_slot, "Error %d powering %s\n",
 		ret, (state == OPAL_PCI_SLOT_POWER_ON) ? "on" : "off");
 	return ret;
 }
@@ -378,7 +381,7 @@ static int pnv_php_get_power_state(struct hotplug_slot *slot, u8 *state)
 	 */
 	ret = pnv_pci_get_power_state(php_slot->id, &power_state);
 	if (ret) {
-		pci_warn(php_slot->pdev, "Error %d getting power status\n",
+		SLOT_WARN(php_slot, "Error %d getting power status\n",
 			 ret);
 	} else {
 		*state = power_state;
@@ -402,7 +405,7 @@ static int pnv_php_get_adapter_state(struct hotplug_slot *slot, u8 *state)
 		*state = presence;
 		ret = 0;
 	} else {
-		pci_warn(php_slot->pdev, "Error %d getting presence\n", ret);
+		SLOT_WARN(php_slot, "Error %d getting presence\n", ret);
 	}
 
 	return ret;
@@ -637,7 +640,7 @@ static int pnv_php_register_slot(struct pnv_php_slot *php_slot)
 	ret = pci_hp_register(&php_slot->slot, php_slot->bus,
 			      php_slot->slot_no, php_slot->name);
 	if (ret) {
-		pci_warn(php_slot->pdev, "Error %d registering slot\n", ret);
+		SLOT_WARN(php_slot, "Error %d registering slot\n", ret);
 		return ret;
 	}
 
@@ -690,7 +693,7 @@ static int pnv_php_enable_msix(struct pnv_php_slot *php_slot)
 	/* Enable MSIx */
 	ret = pci_enable_msix_exact(pdev, &entry, 1);
 	if (ret) {
-		pci_warn(pdev, "Error %d enabling MSIx\n", ret);
+		SLOT_WARN(php_slot, "Error %d enabling MSIx\n", ret);
 		return ret;
 	}
 
@@ -734,7 +737,7 @@ static irqreturn_t pnv_php_interrupt(int irq, void *data)
 		   (sts & PCI_EXP_SLTSTA_PDC)) {
 		ret = pnv_pci_get_presence_state(php_slot->id, &presence);
 		if (ret) {
-			pci_warn(pdev, "PCI slot [%s] error %d getting presence (0x%04x), to retry the operation.\n",
+			SLOT_WARN(php_slot, "PCI slot [%s] error %d getting presence (0x%04x), to retry the operation.\n",
 				 php_slot->name, ret, sts);
 			return IRQ_HANDLED;
 		}
@@ -764,7 +767,7 @@ static irqreturn_t pnv_php_interrupt(int irq, void *data)
 	 */
 	event = kzalloc(sizeof(*event), GFP_ATOMIC);
 	if (!event) {
-		pci_warn(pdev, "PCI slot [%s] missed hotplug event 0x%04x\n",
+		SLOT_WARN(php_slot, "PCI slot [%s] missed hotplug event 0x%04x\n",
 			 php_slot->name, sts);
 		return IRQ_HANDLED;
 	}
@@ -789,7 +792,7 @@ static void pnv_php_init_irq(struct pnv_php_slot *php_slot, int irq)
 	/* Allocate workqueue */
 	php_slot->wq = alloc_workqueue("pciehp-%s", 0, 0, php_slot->name);
 	if (!php_slot->wq) {
-		pci_warn(pdev, "Cannot alloc workqueue\n");
+		SLOT_WARN(php_slot, "Cannot alloc workqueue\n");
 		pnv_php_disable_irq(php_slot, true);
 		return;
 	}
@@ -813,7 +816,7 @@ static void pnv_php_init_irq(struct pnv_php_slot *php_slot, int irq)
 			  php_slot->name, php_slot);
 	if (ret) {
 		pnv_php_disable_irq(php_slot, true);
-		pci_warn(pdev, "Error %d enabling IRQ %d\n", ret, irq);
+		SLOT_WARN(php_slot, "Error %d enabling IRQ %d\n", ret, irq);
 		return;
 	}
 
@@ -849,7 +852,7 @@ static void pnv_php_enable_irq(struct pnv_php_slot *php_slot)
 
 	ret = pci_enable_device(pdev);
 	if (ret) {
-		pci_warn(pdev, "Error %d enabling device\n", ret);
+		SLOT_WARN(php_slot, "Error %d enabling device\n", ret);
 		return;
 	}
 
-- 
2.21.0


^ permalink raw reply related

* [RFC 09/11] pci/hotplug/pnv-php: Relax check when disabling slot
From: Frederic Barrat @ 2019-06-19 13:28 UTC (permalink / raw)
  To: linuxppc-dev, andrew.donnellan, clombard
  Cc: aik, arbab, oohall, groug, alastair
In-Reply-To: <20190619132840.27634-1-fbarrat@linux.ibm.com>

The driver only allows to disable a slot in the POPULATED
state. However, if an error occurs while enabling the slot, say
because the link couldn't be trained, then the POPULATED state may not
be reached, yet the power state of the slot is on. So allow to disable
a slot in the REGISTERED state. Removing the devices will do nothing
since it's not populated, and we'll set the power state of the slot
back to off.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---
 drivers/pci/hotplug/pnv_php.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index f9c624334ef7..74b62a8e11e7 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -523,7 +523,13 @@ static int pnv_php_disable_slot(struct hotplug_slot *slot)
 	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
 	int ret;
 
-	if (php_slot->state != PNV_PHP_STATE_POPULATED)
+	/*
+	 * Allow to disable a slot already in the registered state to
+	 * cover cases where the slot couldn't be enabled and never
+	 * reached the populated state
+	 */
+	if (php_slot->state != PNV_PHP_STATE_POPULATED &&
+		php_slot->state != PNV_PHP_STATE_REGISTERED)
 		return 0;
 
 	/* Remove all devices behind the slot */
-- 
2.21.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox