* Re: [PATCH v4 09/10] Powerpc/smp: Create coregroup domain
From: Valentin Schneider @ 2020-07-31 1:05 UTC
To: Srikar Dronamraju
Cc: Nathan Lynch, Gautham R Shenoy, Michael Neuling, Peter Zijlstra,
LKML, Nicholas Piggin, Morten Rasmussen, Oliver O'Halloran,
Jordan Niethe, linuxppc-dev, Ingo Molnar
In-Reply-To: <20200729061355.GA14603@linux.vnet.ibm.com>
(+Cc Morten)
On 29/07/20 07:13, Srikar Dronamraju wrote:
> * Valentin Schneider <valentin.schneider@arm.com> [2020-07-28 16:03:11]:
>
> Hi Valentin,
>
> Thanks for looking into the patches.
>
>> On 27/07/20 06:32, Srikar Dronamraju wrote:
>> > Add percpu coregroup maps and masks to create coregroup domain.
>> > If a coregroup doesn't exist, the coregroup domain will be degenerated
>> > in favour of SMT/CACHE domain.
>> >
>>
>> So there's at least one arm64 platform out there with the same "pairs of
>> cores share L2" thing (Ampere eMAG), and that lives quite happily with the
>> default scheduler topology (SMT/MC/DIE). Each pair of cores gets its MC
>> domain, and the whole system is covered by DIE.
>>
>> Now arguably it's not a perfect representation; DIE doesn't have
>> SD_SHARE_PKG_RESOURCES so the highest level sd_llc can point to is MC. That
>> will impact all callsites using cpus_share_cache(): in the eMAG case, only
>> pairs of cores will be seen as sharing cache, even though *all* cores share
>> the same L3.
>>
>
> Okay, it's good to know that we have a chip which is similar to P9 in
> topology.
>
>> I'm trying to paint a picture of what the P9 topology looks like (the one
>> you showcase in your cover letter) to see if there are any similarities;
>> from what I gather in [1], wikichips and your cover letter, with P9 you can
>> have something like this in a single DIE (somewhat unsure about L3 setup;
>> it looks to be distributed?)
>>
>> +---------------------------------------------------------------------+
>> |                                 L3                                  |
>> +---------------+-+---------------+-+---------------+-+---------------+
>> |      L2       | |      L2       | |      L2       | |      L2       |
>> +------+-+------+ +------+-+------+ +------+-+------+ +------+-+------+
>> |  L1  | |  L1  | |  L1  | |  L1  | |  L1  | |  L1  | |  L1  | |  L1  |
>> +------+ +------+ +------+ +------+ +------+ +------+ +------+ +------+
>> |4 CPUs| |4 CPUs| |4 CPUs| |4 CPUs| |4 CPUs| |4 CPUs| |4 CPUs| |4 CPUs|
>> +------+ +------+ +------+ +------+ +------+ +------+ +------+ +------+
>>
>> Which would lead to (ignoring the whole SMT CPU numbering shenanigans)
>>
>> NUMA     [                                                ...
>> DIE      [                                             ]
>> MC       [         ] [         ] [         ] [         ]
>> BIGCORE  [         ] [         ] [         ] [         ]
>> SMT      [   ] [   ] [   ] [   ] [   ] [   ] [   ] [   ]
>>          00-03 04-07 08-11 12-15 16-19 20-23 24-27 28-31 <other node here>
>>
>
> What you have summed up is exactly what a P9 topology looks like. I don't
> think I could have explained it better than this.
>
Yay!
>> This however has MC == BIGCORE; what is it that lets you have different
>> spans for these two domains? If it's not too much to ask, I'd love to have
>> a P9 topology diagram.
>>
>> [1]: 20200722081822.GG9290@linux.vnet.ibm.com
>
> At this time the current topology would be good enough, i.e. BIGCORE would
> always be equal to MC. However, in future we could have chips that have
> fewer or more CPUs in the LLC than in a BIGCORE, or we could have
> granular or split L3 caches within a DIE. In such a case BIGCORE != MC.
>
Right, that one's fair enough.
> Also in the current P9 itself, two neighbouring core-pairs form a quad.
> Cache latency within a quad is better than a latency to a distant core-pair.
> Cache latency within a core pair is way better than latency within a quad.
> So if we have only 4 threads running on a DIE all of them accessing the same
> cache-lines, then we could probably benefit if all the tasks were to run
> within the quad aka MC/Coregroup.
>
Did you test this? WRT load balance we do try to balance "load" over the
different domain spans, so if you represent quads as their own MC domain,
you would AFAICT end up spreading tasks over the quads (rather than packing
them) when balancing at e.g. DIE level. The desired behaviour might be
hackable with some more ASYM_PACKING, but I'm not sure I should be
suggesting that :-)
> I have found some latency-sensitive benchmarks that benefit from
> grouping at the quad level (using kernel hacks and not backed by
> firmware changes). Gautham also found similar results in his experiments,
> but he only used binding within the stock kernel.
>
IIUC you reflect this "fabric quirk" (i.e. coregroups) using this DT
binding thing.
That's also where things get interesting (for me) because I experienced
something similar on another arm64 platform (ThunderX1). This was more
about cache bandwidth than cache latency, but IMO it's in the same bag of
fabric quirks. I blabbered a bit about this at last LPC [1], but kind of
gave up on it given the TX1 was the only (arm64) platform where I could get
both significant and reproducible results.
Now, if you folks are seeing this on completely different hardware and have
"real" workloads that truly benefit from this kind of domain partitioning,
this might be another incentive to try and sort of generalize this. That's
outside the scope of your series, but your findings give me some hope!
I think what I had in mind back then was that if enough folks cared about
it, we might get some bits added to the ACPI spec; something along the
lines of proximity domains for the caches described in PPTT, IOW a cache
distance matrix. I don't really know what it'll take to get there, but I
figured I'd dump this in case someone's listening :-)
> I am not setting SD_SHARE_PKG_RESOURCES in MC/Coregroup sd_flags as the MC
> domain need not be the LLC domain for Power.
From what I understood your MC domain does seem to map to LLC; but in any
case, shouldn't you set that flag at least for BIGCORE (i.e. L2)? AIUI with
your changes your sd_llc is gonna be SMT, and that's not going to be a very
big mask. IMO you do want to correctly reflect your LLC situation via this
flag to make cpus_share_cache() work properly.
[1]: https://linuxplumbersconf.org/event/4/contributions/484/
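
The sd_llc concern above can be illustrated with a small user-space model. This is only a sketch: `Domain`, `build_hierarchy` and the flag's bit value are hypothetical names chosen for illustration, and the walk (bottom-up, stopping at the first level without the flag) is a simplification of how the scheduler picks the domain that cpus_share_cache() relies on.

```python
from dataclasses import dataclass

# Illustrative bit value only; not the kernel's actual definition.
SD_SHARE_PKG_RESOURCES = 0x1


@dataclass
class Domain:
    name: str
    span: frozenset  # CPUs covered by this level, as seen from CPU 0
    flags: int = 0


def llc_span(domains):
    """Walk bottom-up and return the span of the highest contiguous level
    that carries SD_SHARE_PKG_RESOURCES (None if the lowest level lacks it)."""
    span = None
    for dom in domains:
        if not dom.flags & SD_SHARE_PKG_RESOURCES:
            break
        span = dom.span
    return span


def shares_cache_with_cpu0(domains, cpu):
    """Model of cpus_share_cache(0, cpu) for the hierarchy of CPU 0."""
    span = llc_span(domains)
    return cpu == 0 or (span is not None and cpu in span)


def build_hierarchy(flagged):
    """Hypothetical P9-like hierarchy for CPU 0, following the diagram above."""
    levels = [
        ("SMT", range(0, 4)),
        ("BIGCORE", range(0, 8)),
        ("MC", range(0, 8)),
        ("DIE", range(0, 32)),
    ]
    return [
        Domain(name, frozenset(cpus),
               SD_SHARE_PKG_RESOURCES if name in flagged else 0)
        for name, cpus in levels
    ]


smt_only = build_hierarchy({"SMT"})
with_bigcore = build_hierarchy({"SMT", "BIGCORE"})
print(sorted(llc_span(smt_only)))      # CPUs seen as sharing cache with CPU 0
print(sorted(llc_span(with_bigcore)))
```

With the flag only on SMT, CPU 4 (the sibling small core sharing L2) is not seen as sharing cache with CPU 0; setting it on BIGCORE as well widens the mask to the whole core pair.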
* Re: [PATCH v2] powerpc/vio: drop bus_type from parent device
From: Michael Ellerman @ 2020-07-31 0:53 UTC
To: Greg KH
Cc: Stephen Rothwell, Thadeu Lima de Souza Cascardo, Peter Rajnoha,
linuxppc-dev
In-Reply-To: <20200730053716.GA3862178@kroah.com>
Greg KH <gregkh@linuxfoundation.org> writes:
> On Thu, Jul 30, 2020 at 11:28:38AM +1000, Michael Ellerman wrote:
>> [ Added Peter & Greg to Cc ]
>>
>> Thadeu Lima de Souza Cascardo <cascardo@canonical.com> writes:
>> > Commit df44b479654f62b478c18ee4d8bc4e9f897a9844 ("kobject: return error
>> > code if writing /sys/.../uevent fails") started returning failure when
>> > writing to /sys/devices/vio/uevent.
>> >
>> > This causes an early udevadm trigger to fail. On some installer versions of
>> > Ubuntu, this will cause init to exit, thus panicking the system very early
>> > during boot.
>> >
>> > Removing the bus_type from the parent device will remove some of the extra
>> > empty files from /sys/devices/vio/, but will keep the rest of the layout
>> > for vio devices, keeping them under /sys/devices/vio/.
>>
>> What exactly does it change?
>>
>> I'm finding it hard to evaluate if this change is going to cause a
>> regression somehow.
>>
>> I'm also not clear on why removing the bus type is correct, apart from
>> whether it fixes the bug you're seeing.
>>
>> > It has been tested that uevents for vio devices don't change after this
>> > fix, they still contain MODALIAS.
>> >
>> > Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
>> > Fixes: df44b479654f ("kobject: return error code if writing /sys/.../uevent fails")
>>
>> AFAICS there haven't been any other fixes for that commit. Do we know
>> why it is only vio that was affected? (possibly because it's a fake bus
>> to begin with?)
>
> So there was an error previously, the core was ignoring it, and now it
> isn't and to fix that you want to remove describing what bus a device is
> on?
>
> Huh???
Right.
Not to mention there are existing unfixed kernels out there, so whatever
userspace is crashing will need to be fixed for those anyway.
>> > diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c
>> > index 37f1f25ba804..a94dab3972a0 100644
>> > --- a/arch/powerpc/platforms/pseries/vio.c
>> > +++ b/arch/powerpc/platforms/pseries/vio.c
>> > @@ -36,7 +36,6 @@ static struct vio_dev vio_bus_device = { /* fake "parent" device */
>> > .name = "vio",
>> > .type = "",
>> > .dev.init_name = "vio",
>> > - .dev.bus = &vio_bus_type,
>> > };
>
> Wait, a static 'struct device'? You all are playing with fire there.
> That's a reference counted object, and should never be declared like
> that at all.
Since 2005 :)
AC33c9bcf1 ("[PATCH] ppc64: tidy up vio devices fake parent")
> I see you register it, but never unregister it, why? Why is it even
> needed?
I don't remember, if I ever knew.
The code says:
/*
* The fake parent of all vio devices, just to give us
* a nice directory
*/
err = device_register(&vio_bus_device.dev);
But I suspect that may no longer be true.
ie. the devices show up in /sys/bus/vio/devices because they have
dev.bus = vio_bus_type, the fake parent doesn't seem to determine the
location.
> And if you remove the bus type of it, it will show up in a different
> part of sysfs, so I think this patch will show a user-visible change,
> right?
Yes I think so. But because it's a fake device to begin with that's
possibly OK.
I think we really need to get to the bottom of whether we need that
device at all, it seems like it might be left over cruft from the
ancient past.
I'll try and find time to work it out.
cheers
* Re: OF: Can't handle multiple dma-ranges with different offsets
From: Chris Packham @ 2020-07-31 0:10 UTC
To: robh+dt@kernel.org, frowand.list@gmail.com, mpe@ellerman.id.au,
benh@kernel.crashing.org, paulus@samba.org,
christophe.leroy@c-s.fr
Cc: devicetree@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
linux-kernel@vger.kernel.org
In-Reply-To: <961bc990-c815-1a19-c349-8b03065d5aab@alliedtelesis.co.nz>
On 23/07/20 10:11 am, Chris Packham wrote:
>
> On 22/07/20 4:19 pm, Chris Packham wrote:
>> Hi,
>>
>> I've just fired up linux kernel v5.7 on a p2040 based system and I'm
>> getting the following new warning
>>
>> OF: Can't handle multiple dma-ranges with different offsets on
>> node(/pcie@ffe202000)
>> OF: Can't handle multiple dma-ranges with different offsets on
>> node(/pcie@ffe202000)
>>
>> The warning itself was added in commit 9d55bebd9816 ("of/address:
>> Support multiple 'dma-ranges' entries") but I gather it's pointing
>> out something about the dts. My board's dts is based heavily on
>> p2041rdb.dts and the relevant pci2 section is identical (reproduced
>> below for reference).
>>
>> pci2: pcie@ffe202000 {
>> reg = <0xf 0xfe202000 0 0x1000>;
>> ranges = <0x02000000 0 0xe0000000 0xc 0x40000000 0 0x20000000
>> 0x01000000 0 0x00000000 0xf 0xf8020000 0 0x00010000>;
>> pcie@0 {
>> ranges = <0x02000000 0 0xe0000000
>> 0x02000000 0 0xe0000000
>> 0 0x20000000
>>
>> 0x01000000 0 0x00000000
>> 0x01000000 0 0x00000000
>> 0 0x00010000>;
>> };
>> };
>>
>> I haven't noticed any ill effect (aside from the scary message). I'm
>> not sure if there's something missing in the dts or in the code that
>> checks the ranges. Any guidance would be appreciated.
>
> I've also just checked the T2080RDB on v5.7.9 which shows a similar issue
>
> OF: Can't handle multiple dma-ranges with different offsets on
> node(/pcie@ffe250000)
> OF: Can't handle multiple dma-ranges with different offsets on
> node(/pcie@ffe250000)
> pcieport 0000:00:00.0: Invalid size 0xfffff9 for dma-range
> pcieport 0000:00:00.0: AER: enabled with IRQ 21
> OF: Can't handle multiple dma-ranges with different offsets on
> node(/pcie@ffe270000)
> OF: Can't handle multiple dma-ranges with different offsets on
> node(/pcie@ffe270000)
> pcieport 0001:00:00.0: Invalid size 0xfffff9 for dma-range
> pcieport 0001:00:00.0: AER: enabled with IRQ 23
I've been doing a bit more digging. The dma-ranges property is not in
the dts/dtb. It's actually inserted by u-boot via ft_fsl_pci_setup().
Here's some output from my T2080RDB
[root@linuxbox ~]# xxd -g4 /sys/firmware/devicetree/base/pcie@ffe240000/dma-ranges
0000000: 02000000 00000000 df000007 0000000f ................
0000010: fe000000 00000000 00fffff9 42000000 ............B...
0000020: 00000000 00000000 00000000 00000000 ................
0000030: 00000000 df000007 43000000 00000010 ........C.......
0000040: 00000000 00000000 00000000 00000001 ................
0000050: 00000000 ....
I'm still wondering how best to deal with this. Hopefully without
needing to deploy a u-boot update.
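
To make the dump above easier to read, here is a rough decoding sketch. It assumes each dma-ranges entry is 7 cells: a 3-cell PCI child address (flags, high, low), a 2-cell parent (CPU) address and a 2-cell size. That layout is an assumption about this platform, not something stated in the dump itself; `parse_dma_ranges` and the dictionary keys are hypothetical names. The first entry's size does decode to 0xfffff9, which matches the "Invalid size" message quoted earlier.

```python
# Words copied from the xxd output above (big-endian 32-bit cells).
HEXDUMP = """
02000000 00000000 df000007 0000000f
fe000000 00000000 00fffff9 42000000
00000000 00000000 00000000 00000000
00000000 df000007 43000000 00000010
00000000 00000000 00000000 00000001
00000000
"""

# Assumed layout: 3 cells PCI address + 2 cells CPU address + 2 cells size.
CELLS_PER_ENTRY = 3 + 2 + 2


def parse_dma_ranges(dump):
    """Split the cell stream into (flags, pci, cpu, size, offset) entries."""
    cells = [int(word, 16) for word in dump.split()]
    if len(cells) % CELLS_PER_ENTRY:
        raise ValueError("cell count does not fit the assumed entry layout")
    entries = []
    for i in range(0, len(cells), CELLS_PER_ENTRY):
        flags, pci_hi, pci_lo, cpu_hi, cpu_lo, size_hi, size_lo = \
            cells[i:i + CELLS_PER_ENTRY]
        pci = (pci_hi << 32) | pci_lo
        cpu = (cpu_hi << 32) | cpu_lo
        entries.append({
            "flags": flags,
            "pci": pci,
            "cpu": cpu,
            "size": (size_hi << 32) | size_lo,
            # CPU-to-bus offset as an unsigned 64-bit value
            "offset": (cpu - pci) % (1 << 64),
        })
    return entries


entries = parse_dma_ranges(HEXDUMP)
for e in entries:
    print(f"flags={e['flags']:#010x} pci={e['pci']:#x} cpu={e['cpu']:#x} "
          f"size={e['size']:#x} offset={e['offset']:#x}")
```

Under that layout the three entries each have a different CPU-to-bus offset, which would be consistent with the "multiple dma-ranges with different offsets" warning.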
* [PATCH] KVM: PPC: Book3S HV: fix a oops in kvmppc_uvmem_page_free()
From: Ram Pai @ 2020-07-30 23:25 UTC
To: kvm-ppc, linuxppc-dev
Cc: ldufour, linuxram, cclaudio, bharata, sathnaga, aneesh.kumar,
sukadev, bauerman, david
Observed the following oops while stress-testing with multiple secure VMs
on a distro kernel. However, this issue theoretically exists in the
5.5 kernel and later.
This issue occurs when the total number of requested device-PFNs exceeds
the total number of available device-PFNs. PFN migration fails to
allocate a device-PFN, which causes migrate_vma_finalize() to trigger
kvmppc_uvmem_page_free() on a page that is not associated with any
device-PFN. kvmppc_uvmem_page_free() blindly tries to access the
contents of the private data, which can be NULL, leading to the following
kernel fault.
--------------------------------------------------------------------------
Unable to handle kernel paging request for data at address 0x00000011
Faulting instruction address: 0xc00800000e36e110
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
....
MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
CR: 24424822 XER: 00000000
CFAR: c000000000e3d764 DAR: 0000000000000011 DSISR: 40000000 IRQMASK: 0
GPR00: c00800000e36e0a4 c000001f1d59f610 c00800000e38a400 0000000000000000
GPR04: c000001fa5000000 fffffffffffffffe ffffffffffffffff c000201fffeaf300
GPR08: 00000000000001f0 0000000000000000 0000000000000f80 c00800000e373608
GPR12: c000000000e3d710 c000201fffeaf300 0000000000000001 00007fef87360000
GPR16: 00007fff97db4410 c000201c3b66a578 ffffffffffffffff 0000000000000000
GPR20: 0000000119db9ad0 000000000000000a fffffffffffffffc 0000000000000001
GPR24: c000201c3b660000 c000001f1d59f7a0 c0000000004cffb0 0000000000000001
GPR28: 0000000000000000 c00a001ff003e000 c00800000e386150 0000000000000f80
NIP [c00800000e36e110] kvmppc_uvmem_page_free+0xc8/0x210 [kvm_hv]
LR [c00800000e36e0a4] kvmppc_uvmem_page_free+0x5c/0x210 [kvm_hv]
Call Trace:
[c000000000512010] free_devmap_managed_page+0xd0/0x100
[c0000000003f71d0] put_devmap_managed_page+0xa0/0xc0
[c0000000004d24bc] migrate_vma_finalize+0x32c/0x410
[c00800000e36e828] kvmppc_svm_page_in.constprop.5+0xa0/0x460 [kvm_hv]
[c00800000e36eddc] kvmppc_uv_migrate_mem_slot.isra.2+0x1f4/0x230 [kvm_hv]
[c00800000e36fa98] kvmppc_h_svm_init_done+0x90/0x170 [kvm_hv]
[c00800000e35bb14] kvmppc_pseries_do_hcall+0x1ac/0x10a0 [kvm_hv]
[c00800000e35edf4] kvmppc_vcpu_run_hv+0x83c/0x1060 [kvm_hv]
[c00800000e95eb2c] kvmppc_vcpu_run+0x34/0x48 [kvm]
[c00800000e95a2dc] kvm_arch_vcpu_ioctl_run+0x374/0x830 [kvm]
[c00800000e9433b4] kvm_vcpu_ioctl+0x45c/0x7c0 [kvm]
[c0000000005451d0] do_vfs_ioctl+0xe0/0xaa0
[c000000000545d64] sys_ioctl+0xc4/0x160
[c00000000000b408] system_call+0x5c/0x70
Instruction dump:
a12d1174 2f890000 409e0158 a1271172 3929ffff b1271172 7c2004ac 39200000
913e0140 39200000 e87d0010 f93d0010 <89230011> e8c30000 e9030008 2f890000
--------------------------------------------------------------------------
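
As a cross-check of the dump above, the faulting word in the instruction dump (<89230011>) can be decoded by hand. The sketch below splits it as a PowerPC D-form instruction; `decode_dform` is a hypothetical helper, and the only facts taken from the oops are the instruction word, GPR03 and the DAR.

```python
def decode_dform(word):
    """Split a 32-bit PowerPC D-form instruction into its fields."""
    opcode = (word >> 26) & 0x3F   # primary opcode, top 6 bits
    rt = (word >> 21) & 0x1F       # target register
    ra = (word >> 16) & 0x1F       # base register
    disp = word & 0xFFFF           # 16-bit displacement
    if disp & 0x8000:              # sign-extend
        disp -= 0x10000
    return opcode, rt, ra, disp


LBZ = 34  # primary opcode of lbz (load byte and zero)

opcode, rt, ra, disp = decode_dform(0x89230011)
gpr3 = 0x0000000000000000  # GPR03 from the register dump above
effective_address = (gpr3 + disp) & 0xFFFFFFFFFFFFFFFF

print(f"opcode={opcode} rt=r{rt} ra=r{ra} disp={disp}")
print(f"effective address = {effective_address:#x}")
```

The word decodes to a byte load at displacement 17 from r3, and r3 is zero in the dump, so the effective address is 0x11, the same value reported in DAR. That fits a field being read through a NULL base pointer.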
Fix the oops by returning early when the page has no private data.
Fixes: ca9f49 ("KVM: PPC: Book3S HV: Support for running secure guests")
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
arch/powerpc/kvm/book3s_hv_uvmem.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 2806983..f4002bf 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -1018,13 +1018,15 @@ static void kvmppc_uvmem_page_free(struct page *page)
{
unsigned long pfn = page_to_pfn(page) -
(kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT);
- struct kvmppc_uvmem_page_pvt *pvt;
+ struct kvmppc_uvmem_page_pvt *pvt = page->zone_device_data;
+
+ if (!pvt)
+ return;
spin_lock(&kvmppc_uvmem_bitmap_lock);
bitmap_clear(kvmppc_uvmem_bitmap, pfn, 1);
spin_unlock(&kvmppc_uvmem_bitmap_lock);
- pvt = page->zone_device_data;
page->zone_device_data = NULL;
if (pvt->remove_gfn)
kvmppc_gfn_remove(pvt->gpa >> PAGE_SHIFT, pvt->kvm);
--
1.8.3.1
* [PATCH] KVM: PPC: Book3S HV: Define H_PAGE_IN_NONSHARED for H_SVM_PAGE_IN hcall
From: Ram Pai @ 2020-07-30 23:21 UTC
To: Julia Lawall
Cc: ldufour, linux-doc, corbet, kvm-ppc, bharata, sathnaga, sukadev,
linuxppc-dev, david
In-Reply-To: <alpine.DEB.2.22.394.2007301231140.2548@hadrien>
The H_SVM_PAGE_IN hcall takes a flag parameter. This parameter specifies the
way in which a page will be treated. H_PAGE_IN_SHARED indicates
that the page will be shared with the Secure VM, and H_PAGE_IN_NONSHARED
indicates that the page will not be shared but its contents will
be copied.
However H_PAGE_IN_NONSHARED is not defined in the header file, though
it is defined and documented in the API captured in
Documentation/powerpc/ultravisor.rst
Define H_PAGE_IN_NONSHARED in the header file.
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
arch/powerpc/include/asm/hvcall.h | 4 +++-
arch/powerpc/kvm/book3s_hv_uvmem.c | 3 ++-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index e90c073..43e3f8d 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -343,7 +343,9 @@
#define H_COPY_TOFROM_GUEST 0xF80C
/* Flags for H_SVM_PAGE_IN */
-#define H_PAGE_IN_SHARED 0x1
+#define H_PAGE_IN_NONSHARED 0x0 /* Page is not shared with the UV */
+#define H_PAGE_IN_SHARED 0x1 /* Page is shared with UV */
+#define H_PAGE_IN_MASK 0x1
/* Platform-specific hcalls used by the Ultravisor */
#define H_SVM_PAGE_IN 0xEF00
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 2dde0fb..2806983 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -947,12 +947,13 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
if (page_shift != PAGE_SHIFT)
return H_P3;
- if (flags & ~H_PAGE_IN_SHARED)
+ if (flags & ~H_PAGE_IN_MASK)
return H_P2;
if (flags & H_PAGE_IN_SHARED)
return kvmppc_share_page(kvm, gpa, page_shift);
+ /* handle H_PAGE_IN_NONSHARED */
ret = H_PARAMETER;
srcu_idx = srcu_read_lock(&kvm->srcu);
mmap_read_lock(kvm->mm);
--
1.8.3.1
--
Ram Pai
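
The flag check in the diff above can be modelled in a few lines. This is only a sketch of the validation order: the constant values are taken from the hvcall.h hunk, while `classify_page_in_flags` and its string results are hypothetical stand-ins for the real return paths.

```python
# Values from the hvcall.h hunk above.
H_PAGE_IN_NONSHARED = 0x0
H_PAGE_IN_SHARED = 0x1
H_PAGE_IN_MASK = 0x1


def classify_page_in_flags(flags):
    """Mirror the order of checks in kvmppc_h_svm_page_in()."""
    if flags & ~H_PAGE_IN_MASK:
        return "H_P2"          # unknown bits set: reject
    if flags & H_PAGE_IN_SHARED:
        return "shared"        # handled by the share-page path
    # H_PAGE_IN_NONSHARED is 0, so it cannot be tested with '&';
    # it is simply the fall-through case.
    return "nonshared"


for value in (0x0, 0x1, 0x2, 0x3):
    print(hex(value), classify_page_in_flags(value))
```

Because the non-shared flag is zero, defining it mainly documents the fall-through case; the mask is what actually rejects unknown bits.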
* Re: [PATCH V5 0/4] powerpc/perf: Add support for perf extended regs in powerpc
From: Jiri Olsa @ 2020-07-30 19:50 UTC
To: Athira Rajeev
Cc: Ravi Bangoria, Michael Neuling, maddy, Arnaldo Carvalho de Melo,
Jiri Olsa, kjain, linuxppc-dev
In-Reply-To: <27D1CE26-A506-4CFF-B1C2-E0545F26E637@linux.vnet.ibm.com>
On Thu, Jul 30, 2020 at 01:24:40PM +0530, Athira Rajeev wrote:
>
>
> > On 27-Jul-2020, at 10:46 PM, Athira Rajeev <atrajeev@linux.vnet.ibm.com> wrote:
> >
> > Patch set to add support for perf extended register capability in
> > powerpc. The capability flag PERF_PMU_CAP_EXTENDED_REGS is used to
> > indicate that the PMU supports extended registers. The generic code
> > defines the mask of extended registers as 0 for unsupported architectures.
> >
> > Patches 1 and 2 are the kernel side changes needed to include
> > base support for extended regs in powerpc and in power10.
> > Patches 3 and 4 are the perf tools side changes needed to support the
> > extended registers.
> >
>
> Hi Arnaldo, Jiri
>
> please let me know if you have any comments/suggestions on this patch series to add support for perf extended regs.
hi,
can't really tell for powerpc, but in general
perf tool changes look ok
jirka
* Re: [PATCH v4 00/10] Coregroup support on Powerpc
From: Srikar Dronamraju @ 2020-07-30 17:22 UTC
To: Michael Ellerman
Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran, Michael Neuling,
Michael Ellerman, Peter Zijlstra, Jordan Niethe, Anton Blanchard,
LKML, Ingo Molnar, Nick Piggin, linuxppc-dev, Valentin Schneider
In-Reply-To: <20200727053230.19753-1-srikar@linux.vnet.ibm.com>
* Srikar Dronamraju <srikar@linux.vnet.ibm.com> [2020-07-27 11:02:20]:
> Changelog v3 ->v4:
> v3: https://lore.kernel.org/lkml/20200723085116.4731-1-srikar@linux.vnet.ibm.com/t/#u
>
Here is a summary of some of the testing done with the coregroup v4 patchset.
It includes ebizzy, schbench, perf bench sched pipe and topology verification.
On the left side are results from the powerpc/next tree and on the right are the
results with the patchset applied. Topology verification clearly shows that
there is no change in topology with and without the patches on all 3 classes
of systems that were tested.
On PowerPc/Next On Powerpc/next + Coregroup Support v4 patchset
Power 9 PowerNV (2 Node/ 160 Cpu System)
---------------------------------
ebizzy (Throughput of 100 iterations of 30 seconds higher throughput is better)
N Min Max Median Avg Stddev N Min Max Median Avg Stddev
100 993884 1276090 1173476 1165914 54867.201 100 910470 1279820 1171095 1162091 67363.28
schbench (latency hence lower is better)
Latency percentiles (usec) Latency percentiles (usec)
50.0th: 455 50.0th: 454
75.0th: 533 75.0th: 543
90.0th: 683 90.0th: 701
95.0th: 743 95.0th: 737
*99.0th: 815 *99.0th: 805
99.5th: 839 99.5th: 835
99.9th: 913 99.9th: 893
min=0, max=1011 min=0, max=2833
perf bench sched pipe (lesser time and higher ops/sec is better)
# Running 'sched/pipe' benchmark: # Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes # Executed 1000000 pipe operations between two processes
Total time: 6.083 [sec] Total time: 6.303 [sec]
6.083576 usecs/op 6.303318 usecs/op
164377 ops/sec 158646 ops/sec
Power 9 LPAR (2 Node/ 128 Cpu System)
---------------------------------
ebizzy (Throughput of 100 iterations of 30 seconds higher throughput is better)
N Min Max Median Avg Stddev N Min Max Median Avg Stddev
100 1058029 1295393 1200414 1188306.7 56786.538 100 943264 1287619 1180522 1168473.2 64469.955
schbench (latency hence lower is better)
Latency percentiles (usec) Latency percentiles (usec)
50.0000th: 34 50.0000th: 39
75.0000th: 46 75.0000th: 52
90.0000th: 53 90.0000th: 68
95.0000th: 56 95.0000th: 77
*99.0000th: 61 *99.0000th: 89
99.5000th: 63 99.5000th: 94
99.9000th: 81 99.9000th: 169
min=0, max=8405 min=0, max=23674
perf bench sched pipe (lesser time and higher ops/sec is better)
# Running 'sched/pipe' benchmark: # Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes # Executed 1000000 pipe operations between two processes
Total time: 8.768 [sec] Total time: 5.217 [sec]
8.768400 usecs/op 5.217625 usecs/op
114045 ops/sec 191658 ops/sec
Power 8 LPAR (8 Node/ 256 Cpu System)
---------------------------------
ebizzy (Throughput of 100 iterations of 30 seconds higher throughput is better)
N Min Max Median Avg Stddev N Min Max Median Avg Stddev
100 1267615 1965234 1707423 1689137.6 144363.29 100 1175357 1924262 1691104 1664792.1 145876.4
schbench (latency hence lower is better)
Latency percentiles (usec) Latency percentiles (usec)
50.0th: 37 50.0th: 36
75.0th: 51 75.0th: 48
90.0th: 59 90.0th: 55
95.0th: 63 95.0th: 59
*99.0th: 71 *99.0th: 67
99.5th: 75 99.5th: 72
99.9th: 105 99.9th: 170
min=0, max=18560 min=0, max=27031
perf bench sched pipe (lesser time and higher ops/sec is better)
# Running 'sched/pipe' benchmark: # Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes # Executed 1000000 pipe operations between two processes
Total time: 6.013 [sec] Total time: 5.930 [sec]
6.013963 usecs/op 5.930724 usecs/op
166279 ops/sec 168613 ops/sec
Topology verification on Power9
Power9/ PowerNV / SMT4
tail /proc/cpuinfo
---------------------
cpu : POWER9, altivec supported
clock : 3600.000000MHz
revision : 2.2 (pvr 004e 1202)
timebase : 512000000
platform : PowerNV
model : 9006-22P
machine : PowerNV 9006-22P
firmware : OPAL
MMU : Radix
On PowerPc/Next On Powerpc/next + Coregroup Support v4 patchset
lscpu lscpu
------ ------
Architecture: ppc64le Architecture: ppc64le
Byte Order: Little Endian Byte Order: Little Endian
CPU(s): 160 CPU(s): 160
On-line CPU(s) list: 0-159 On-line CPU(s) list: 0-159
Thread(s) per core: 4 Thread(s) per core: 4
Core(s) per socket: 20 Core(s) per socket: 20
Socket(s): 2 Socket(s): 2
NUMA node(s): 2 NUMA node(s): 2
Model: 2.2 (pvr 004e 1202) Model: 2.2 (pvr 004e 1202)
Model name: POWER9, altivec supported Model name: POWER9, altivec supported
CPU max MHz: 3800.0000 CPU max MHz: 3800.0000
CPU min MHz: 2166.0000 CPU min MHz: 2166.0000
L1d cache: 32K L1d cache: 32K
L1i cache: 32K L1i cache: 32K
L2 cache: 512K L2 cache: 512K
L3 cache: 10240K L3 cache: 10240K
NUMA node0 CPU(s): 0-79 NUMA node0 CPU(s): 0-79
NUMA node8 CPU(s): 80-159 NUMA node8 CPU(s): 80-159
grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name
----------------------------------------------------- -----------------------------------------------------
/proc/sys/kernel/sched_domain/cpu0/domain0/name:SMT /proc/sys/kernel/sched_domain/cpu0/domain0/name:SMT
/proc/sys/kernel/sched_domain/cpu0/domain1/name:CACHE /proc/sys/kernel/sched_domain/cpu0/domain1/name:CACHE
/proc/sys/kernel/sched_domain/cpu0/domain2/name:DIE /proc/sys/kernel/sched_domain/cpu0/domain2/name:DIE
/proc/sys/kernel/sched_domain/cpu0/domain3/name:NUMA /proc/sys/kernel/sched_domain/cpu0/domain3/name:NUMA
grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags
------------------------------------------------------ ------------------------------------------------------
/proc/sys/kernel/sched_domain/cpu0/domain0/flags:2391 /proc/sys/kernel/sched_domain/cpu0/domain0/flags:2391
/proc/sys/kernel/sched_domain/cpu0/domain1/flags:2327 /proc/sys/kernel/sched_domain/cpu0/domain1/flags:2327
/proc/sys/kernel/sched_domain/cpu0/domain2/flags:2071 /proc/sys/kernel/sched_domain/cpu0/domain2/flags:2071
/proc/sys/kernel/sched_domain/cpu0/domain3/flags:12801 /proc/sys/kernel/sched_domain/cpu0/domain3/flags:12801
On PowerPc/Next
head /proc/schedstat
--------------------
version 15
timestamp 4295043536
cpu0 0 0 0 0 0 0 9597119314 2408913694 11897
domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 00000000,00000000,0000ffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain3 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 4941435230 11106132 1583
domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
On Powerpc/next + Coregroup Support v4 patchset
head /proc/schedstat
--------------------
version 15
timestamp 4296311826
cpu0 0 0 0 0 0 0 3353674045024 3781680865826 297483
domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 00000000,00000000,0000ffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain3 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 3337873293332 4231590033856 229090
domain0 00000000,00000000,00000000,00000000,0000000f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Post sudo ppc64_cpu --smt=1 Post sudo ppc64_cpu --smt=1
--------------------- ---------------------
grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name grep . /proc/sys/kernel/sched_domain/cpu0/domain*/name
----------------------------------------------------- -----------------------------------------------------
/proc/sys/kernel/sched_domain/cpu0/domain0/name:CACHE /proc/sys/kernel/sched_domain/cpu0/domain0/name:CACHE
/proc/sys/kernel/sched_domain/cpu0/domain1/name:DIE /proc/sys/kernel/sched_domain/cpu0/domain1/name:DIE
/proc/sys/kernel/sched_domain/cpu0/domain2/name:NUMA /proc/sys/kernel/sched_domain/cpu0/domain2/name:NUMA
grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags grep . /proc/sys/kernel/sched_domain/cpu0/domain*/flags
------------------------------------------------------ ------------------------------------------------------
/proc/sys/kernel/sched_domain/cpu0/domain0/flags:2327 /proc/sys/kernel/sched_domain/cpu0/domain0/flags:2327
/proc/sys/kernel/sched_domain/cpu0/domain1/flags:2071 /proc/sys/kernel/sched_domain/cpu0/domain1/flags:2071
/proc/sys/kernel/sched_domain/cpu0/domain2/flags:12801 /proc/sys/kernel/sched_domain/cpu0/domain2/flags:12801
On Powerpc/next
head /proc/schedstat
--------------------
version 15
timestamp 4295046242
cpu0 0 0 0 0 0 0 10978610020 2658997390 13068
domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu4 0 0 0 0 0 0 5408663896 95701034 7697
domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
On Powerpc/next + Coregroup Support v4 patchset
head /proc/schedstat
--------------------
version 15
timestamp 4296314905
cpu0 0 0 0 0 0 0 3355392013536 3781975150576 298723
domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu4 0 0 0 0 0 0 3351637920996 4427329763050 256776
domain0 00000000,00000000,00000000,00000000,00000011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00001111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 91111111,11111111,11111111,11111111,11111111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Similar verification was done on Power 8 (8 Node 256 CPU LPAR) and Power 9 (2
node 128 Cpu LPAR) and they showed the topology before and after the patch to be
identical. If interested, I could provide the same.
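
For anyone comparing the /proc/schedstat domain masks above by eye, a small helper can turn them into CPU lists. This is a sketch under the assumption that each comma-separated group is a 32-bit hex word with the most significant word first; `parse_cpumask` and `format_cpulist` are hypothetical names.

```python
def parse_cpumask(mask):
    """Convert a comma-separated hex cpumask into a sorted list of CPU ids."""
    value = int(mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def format_cpulist(cpus):
    """Collapse a sorted CPU list into 'a-b,c' range notation."""
    ranges = []
    for cpu in cpus:
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current range
        else:
            ranges.append([cpu, cpu])    # start a new range
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


# Masks copied from the schedstat output above.
smt4_domain0 = "00000000,00000000,00000000,00000000,0000000f"
smt4_domain2 = "00000000,00000000,0000ffff,ffffffff,ffffffff"
smt1_domain0 = "00000000,00000000,00000000,00000000,00000011"

print(format_cpulist(parse_cpumask(smt4_domain0)))
print(format_cpulist(parse_cpumask(smt4_domain2)))
print(format_cpulist(parse_cpumask(smt1_domain0)))
```

The DIE-level mask in SMT4 mode decodes to CPUs 0-79, which lines up with the NUMA node0 CPU list reported by lscpu.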
* Re: Documentation/powerpc: Ultravisor API
From: Ram Pai @ 2020-07-30 16:48 UTC (permalink / raw)
To: Julia Lawall; +Cc: sukadev, linuxppc-dev, linux-doc, corbet
In-Reply-To: <alpine.DEB.2.22.394.2007301231140.2548@hadrien>
On Thu, Jul 30, 2020 at 12:35:38PM +0200, Julia Lawall wrote:
> The file Documentation/powerpc/ultravisor.rst contains:
>
> Only valid value(s) in ``flags`` are:
>
> * H_PAGE_IN_SHARED which indicates that the page is to be shared
> with the Ultravisor.
>
> * H_PAGE_IN_NONSHARED indicates that the UV is not anymore
> interested in the page. Applicable if the page is a shared page.
>
> The flag H_PAGE_IN_SHARED exists in the Linux kernel
> (arch/powerpc/include/asm/hvcall.h), but the flag H_PAGE_IN_NONSHARED does
> not. Should the documentation be changed in some way?
Currently the code treats H_PAGE_IN_NONSHARED as !H_PAGE_IN_SHARED.
We need to patch the kernel to define the flag explicitly.
I will submit a patch for this.
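To illustrate the implicit convention described above, here is a rough
sketch only. The flag value, the explicit H_PAGE_IN_NONSHARED definition and
both helper names are assumptions made for the example, not what the eventual
patch will necessarily contain:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for illustration; see arch/powerpc/include/asm/hvcall.h. */
#define H_PAGE_IN_SHARED	0x1UL
/* Hypothetical explicit definition: "not shared" is simply no flag set. */
#define H_PAGE_IN_NONSHARED	0x0UL

/* Hypothetical helper: does the caller ask for a shared page? */
static bool page_in_is_shared(unsigned long flags)
{
	return (flags & H_PAGE_IN_SHARED) != 0;
}

/* Hypothetical helper: reject any bit other than the documented ones. */
static bool page_in_flags_valid(unsigned long flags)
{
	return (flags & ~H_PAGE_IN_SHARED) == 0;
}
```

With a definition like this, callers could pass H_PAGE_IN_NONSHARED by name
instead of relying on the absence of H_PAGE_IN_SHARED.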
Thanks,
RP
* Re: [PATCH -next] PCI: rpadlpar: Make some functions static
From: Bjorn Helgaas @ 2020-07-30 16:16 UTC (permalink / raw)
To: Wei Yongjun
Cc: Tyrel Datwyler, linux-pci, Hulk Robot, Bjorn Helgaas,
linuxppc-dev
In-Reply-To: <20200721151735.41181-1-weiyongjun1@huawei.com>
On Tue, Jul 21, 2020 at 11:17:35PM +0800, Wei Yongjun wrote:
> The sparse tool reports build warnings as follows:
>
> drivers/pci/hotplug/rpadlpar_core.c:355:5: warning:
> symbol 'dlpar_remove_pci_slot' was not declared. Should it be static?
> drivers/pci/hotplug/rpadlpar_core.c:461:12: warning:
> symbol 'rpadlpar_io_init' was not declared. Should it be static?
> drivers/pci/hotplug/rpadlpar_core.c:473:6: warning:
> symbol 'rpadlpar_io_exit' was not declared. Should it be static?
>
> Those functions are not used outside of this file, so mark them
> static.
> Also mark rpadlpar_io_exit() as __exit.
>
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Applied to pci/hotplug for v5.9, thanks!
> ---
> drivers/pci/hotplug/rpadlpar_core.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
> index c5eb509c72f0..f979b7098acf 100644
> --- a/drivers/pci/hotplug/rpadlpar_core.c
> +++ b/drivers/pci/hotplug/rpadlpar_core.c
> @@ -352,7 +352,7 @@ static int dlpar_remove_vio_slot(char *drc_name, struct device_node *dn)
> * -ENODEV Not a valid drc_name
> * -EIO Internal PCI Error
> */
> -int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
> +static int dlpar_remove_pci_slot(char *drc_name, struct device_node *dn)
> {
> struct pci_bus *bus;
> struct slot *slot;
> @@ -458,7 +458,7 @@ static inline int is_dlpar_capable(void)
> return (int) (rc != RTAS_UNKNOWN_SERVICE);
> }
>
> -int __init rpadlpar_io_init(void)
> +static int __init rpadlpar_io_init(void)
> {
>
> if (!is_dlpar_capable()) {
> @@ -470,7 +470,7 @@ int __init rpadlpar_io_init(void)
> return dlpar_sysfs_init();
> }
>
> -void rpadlpar_io_exit(void)
> +static void __exit rpadlpar_io_exit(void)
> {
> dlpar_sysfs_exit();
> }
>
* [powerpc:next] BUILD SUCCESS cf1ae052e073c7ef6cf1a783a6427f7228253bd3
From: kernel test robot @ 2020-07-30 16:12 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
branch HEAD: cf1ae052e073c7ef6cf1a783a6427f7228253bd3 powerpc/powernv/sriov: Remove unused but set variable 'phb'
elapsed time: 1486m
configs tested: 54
configs skipped: 1
The following configs have been built successfully.
More configs may be tested in the coming days.
arm defconfig
arm64 allyesconfig
arm64 defconfig
arm allyesconfig
arm allmodconfig
ia64 allmodconfig
ia64 defconfig
ia64 allyesconfig
m68k allmodconfig
m68k defconfig
m68k allyesconfig
nios2 defconfig
arc allyesconfig
nds32 allnoconfig
c6x allyesconfig
nds32 defconfig
nios2 allyesconfig
csky defconfig
alpha defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
arc defconfig
sh allmodconfig
parisc defconfig
s390 allyesconfig
parisc allyesconfig
s390 defconfig
i386 allyesconfig
sparc allyesconfig
sparc defconfig
i386 defconfig
mips allyesconfig
mips allmodconfig
powerpc defconfig
powerpc allyesconfig
powerpc allmodconfig
powerpc allnoconfig
i386 randconfig-a016-20200730
i386 randconfig-a012-20200730
i386 randconfig-a014-20200730
i386 randconfig-a015-20200730
i386 randconfig-a011-20200730
i386 randconfig-a013-20200730
riscv allyesconfig
riscv allnoconfig
riscv defconfig
riscv allmodconfig
x86_64 rhel
x86_64 allyesconfig
x86_64 rhel-7.6-kselftests
x86_64 defconfig
x86_64 rhel-8.3
x86_64 kexec
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
* Re: [PATCH v2] powerpc/vio: drop bus_type from parent device
From: Thadeu Lima de Souza Cascardo @ 2020-07-30 15:35 UTC (permalink / raw)
To: Greg KH; +Cc: Peter Rajnoha, linuxppc-dev
In-Reply-To: <20200730053716.GA3862178@kroah.com>
On Thu, Jul 30, 2020 at 07:37:16AM +0200, Greg KH wrote:
> On Thu, Jul 30, 2020 at 11:28:38AM +1000, Michael Ellerman wrote:
> > [ Added Peter & Greg to Cc ]
> >
> > Thadeu Lima de Souza Cascardo <cascardo@canonical.com> writes:
> > > Commit df44b479654f62b478c18ee4d8bc4e9f897a9844 ("kobject: return error
> > > code if writing /sys/.../uevent fails") started returning failure when
> > > writing to /sys/devices/vio/uevent.
> > >
> > > This causes an early udevadm trigger to fail. On some installer versions of
> > > Ubuntu, this will cause init to exit, thus panicking the system very early
> > > during boot.
> > >
> > > Removing the bus_type from the parent device will remove some of the extra
> > > empty files from /sys/devices/vio/, but will keep the rest of the layout
> > > for vio devices, keeping them under /sys/devices/vio/.
> >
> > What exactly does it change?
> >
> > I'm finding it hard to evaluate if this change is going to cause a
> > regression somehow.
> >
> > I'm also not clear on why removing the bus type is correct, apart from
> > whether it fixes the bug you're seeing.
> >
> > > It has been tested that uevents for vio devices don't change after this
> > > fix, they still contain MODALIAS.
> > >
> > > Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> > > Fixes: df44b479654f ("kobject: return error code if writing /sys/.../uevent fails")
> >
> > AFAICS there haven't been any other fixes for that commit. Do we know
> > why it is only vio that was affected? (possibly because it's a fake bus
> > to begin with?)
>
> So there was an error previously, the core was ignoring it, and now it
> isn't and to fix that you want to remove describing what bus a device is
> on?
>
> Huh???
>
> >
> > cheers
> >
> > > diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c
> > > index 37f1f25ba804..a94dab3972a0 100644
> > > --- a/arch/powerpc/platforms/pseries/vio.c
> > > +++ b/arch/powerpc/platforms/pseries/vio.c
> > > @@ -36,7 +36,6 @@ static struct vio_dev vio_bus_device = { /* fake "parent" device */
> > > .name = "vio",
> > > .type = "",
> > > .dev.init_name = "vio",
> > > - .dev.bus = &vio_bus_type,
> > > };
>
> Wait, a static 'struct device'? You all are playing with fire there.
> That's a reference counted object, and should never be declared like
> that at all.
>
> I see you register it, but never unregister it, why? Why is it even
> needed?
>
> And if you remove the bus type of it, it will show up in a different
> part of sysfs, so I think this patch will show a user-visible change,
> right?
>
> thanks,
>
> greg k-h
As the comment says, it's a "fake parent device". There is a user-visible
change, which is removing some attributes from the object, but it's still
showing up on the same path.
Returning an error code like df44b479654f does is also a user-visible change,
and it breaks installer images, which panic early during boot.
I could investigate an alternative here, which would be to not fail when writing
to uevent for this specific fake device.
Cascardo.
* Re: [PATCH] powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
From: Vladis Dronov @ 2020-07-30 15:34 UTC (permalink / raw)
To: Michael Ellerman
Cc: Aneesh Kumar K . V, Paul Mackerras, linuxppc-dev, linux-kernel
In-Reply-To: <87ft995hv8.fsf@mpe.ellerman.id.au>
Hello, Michael,
----- Original Message -----
> From: "Michael Ellerman" <mpe@ellerman.id.au>
> Subject: Re: [PATCH] powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
>
...
> >> > So what changed? These functions were inlined with older compilers, but
> >> > not anymore?
> >>
> >> Yes, exactly. Gcc-10 does not inline them anymore. If this is because of my
> >> build system, this can happen to others also.
> >>
> >> The same thing was fixed by Linus in e99332e7b4cd ("gcc-10: mark more functions
> >> __init to avoid section mismatch warnings").
> >
> > It sounds like this is part of "-finline-functions was retuned" on
> > <https://gcc.gnu.org/gcc-10/changes.html>? So everyone should see it
> > (no matter what config or build system), and it is a good thing too :-)
>
> I haven't seen it in my GCC 10 builds, so there must be some other
> subtlety. Probably it depends on details of the .config.
>
I just hit this while building the latest upstream for ppc64le with a derivative
of the RHEL-8 config. It may come down to a compiler/linker setting, such as -O2
versus -O3.
> cheers
Best regards,
Vladis Dronov | Red Hat, Inc. | The Core Kernel | Senior Software Engineer
* [powerpc:fixes-test] BUILD SUCCESS 909adfc66b9a1db21b5e8733e9ebfa6cd5135d74
From: kernel test robot @ 2020-07-30 15:30 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test
branch HEAD: 909adfc66b9a1db21b5e8733e9ebfa6cd5135d74 powerpc/64s/hash: Fix hash_preload running with interrupts enabled
elapsed time: 4429m
configs tested: 102
configs skipped: 3
The following configs have been built successfully.
More configs may be tested in the coming days.
arm defconfig
arm64 allyesconfig
arm64 defconfig
arm allyesconfig
arm allmodconfig
sh r7785rp_defconfig
mips tb0226_defconfig
mips loongson3_defconfig
um kunit_defconfig
nds32 alldefconfig
arm imx_v4_v5_defconfig
mips gcw0_defconfig
mips fuloong2e_defconfig
arm pxa255-idp_defconfig
s390 defconfig
arm prima2_defconfig
arm footbridge_defconfig
mips nlm_xlr_defconfig
ia64 allmodconfig
ia64 defconfig
ia64 allyesconfig
m68k defconfig
m68k allmodconfig
m68k allyesconfig
nios2 defconfig
arc allyesconfig
nds32 allnoconfig
c6x allyesconfig
nds32 defconfig
nios2 allyesconfig
csky defconfig
alpha defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
arc defconfig
sh allmodconfig
parisc defconfig
s390 allyesconfig
parisc allyesconfig
i386 allyesconfig
sparc allyesconfig
sparc defconfig
i386 defconfig
mips allyesconfig
mips allmodconfig
powerpc allyesconfig
powerpc allmodconfig
powerpc allnoconfig
powerpc defconfig
x86_64 randconfig-a005-20200727
x86_64 randconfig-a004-20200727
x86_64 randconfig-a003-20200727
x86_64 randconfig-a006-20200727
x86_64 randconfig-a002-20200727
x86_64 randconfig-a001-20200727
i386 randconfig-a003-20200728
i386 randconfig-a004-20200728
i386 randconfig-a005-20200728
i386 randconfig-a002-20200728
i386 randconfig-a006-20200728
i386 randconfig-a001-20200728
i386 randconfig-a003-20200727
i386 randconfig-a005-20200727
i386 randconfig-a004-20200727
i386 randconfig-a006-20200727
i386 randconfig-a002-20200727
i386 randconfig-a001-20200727
x86_64 randconfig-a014-20200728
x86_64 randconfig-a012-20200728
x86_64 randconfig-a015-20200728
x86_64 randconfig-a016-20200728
x86_64 randconfig-a013-20200728
x86_64 randconfig-a011-20200728
i386 randconfig-a016-20200728
i386 randconfig-a012-20200728
i386 randconfig-a013-20200728
i386 randconfig-a014-20200728
i386 randconfig-a011-20200728
i386 randconfig-a015-20200728
i386 randconfig-a016-20200727
i386 randconfig-a013-20200727
i386 randconfig-a012-20200727
i386 randconfig-a015-20200727
i386 randconfig-a011-20200727
i386 randconfig-a014-20200727
i386 randconfig-a016-20200730
i386 randconfig-a012-20200730
i386 randconfig-a014-20200730
i386 randconfig-a015-20200730
i386 randconfig-a011-20200730
i386 randconfig-a013-20200730
riscv allyesconfig
riscv allnoconfig
riscv defconfig
riscv allmodconfig
x86_64 rhel
x86_64 allyesconfig
x86_64 rhel-7.6-kselftests
x86_64 defconfig
x86_64 rhel-8.3
x86_64 kexec
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
* Re: [PATCH] powerpc/pseries: explicitly reschedule during drmem_lmb list traversal
From: Nathan Lynch @ 2020-07-30 15:01 UTC (permalink / raw)
To: Michael Ellerman, Laurent Dufour; +Cc: tyreld, cheloha, linuxppc-dev
In-Reply-To: <87lfj16cql.fsf@mpe.ellerman.id.au>
Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> Laurent Dufour <ldufour@linux.ibm.com> writes:
>>> Le 28/07/2020 à 19:37, Nathan Lynch a écrit :
>>>> The drmem lmb list can have hundreds of thousands of entries, and
>>>> unfortunately lookups take the form of linear searches. As long as
>>>> this is the case, traversals have the potential to monopolize the CPU
>>>> and provoke lockup reports, workqueue stalls, and the like unless
>>>> they explicitly yield.
>>>>
>>>> Rather than placing cond_resched() calls within various
>>>> for_each_drmem_lmb() loop blocks in the code, put it in the iteration
>>>> expression of the loop macro itself so users can't omit it.
>>>
>>> Is that not too much to call cond_resched() on every LMB?
>>>
>>> Could that be less frequent, every 10, or 100, I don't really know ?
>>
>> Everything done within for_each_drmem_lmb is relatively heavyweight
>> already. E.g. calling dlpar_remove_lmb()/dlpar_add_lmb() can take dozens
>> of milliseconds. I don't think cond_resched() is an expensive check in
>> this context.
>
> Hmm, mostly.
>
> But there are quite a few cases like drmem_update_dt_v1():
>
> for_each_drmem_lmb(lmb) {
> dr_cell->base_addr = cpu_to_be64(lmb->base_addr);
> dr_cell->drc_index = cpu_to_be32(lmb->drc_index);
> dr_cell->aa_index = cpu_to_be32(lmb->aa_index);
> dr_cell->flags = cpu_to_be32(drmem_lmb_flags(lmb));
>
> dr_cell++;
> }
>
> Which will compile to a pretty tight loop at the moment.
>
> Or drmem_update_dt_v2() which has two loops over all lmbs.
>
> And although the actual TIF check is cheap the function call to do it is
> not free.
>
> So I worry this is going to make some of those long loops take even
> longer.
That's fair, and I was wrong - some of the loop bodies are relatively
simple, not doing allocations or taking locks, etc.
One way to deal with this is to keep for_each_drmem_lmb() as-is and add a new
iterator that can reschedule, e.g. for_each_drmem_lmb_slow().
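A userspace sketch of that two-iterator idea, purely for illustration: the
array, the macro names and model_cond_resched() are stand-ins invented for
this example and do not reflect the actual drmem definitions. The yield call
sits in the iteration expression, so users of the "slow" variant cannot omit
it, while the plain variant stays a tight loop:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_N_LMBS 8

struct model_lmb {
	unsigned long base_addr;
};

static struct model_lmb model_lmbs[MODEL_N_LMBS];
static unsigned long model_yield_calls;

/* Stand-in for cond_resched(); it only counts how often it is called. */
static void model_cond_resched(void)
{
	model_yield_calls++;
}

/* Plain iterator: nothing but the pointer increment per step. */
#define for_each_model_lmb(lmb)					\
	for ((lmb) = model_lmbs;				\
	     (lmb) < model_lmbs + MODEL_N_LMBS;			\
	     (lmb)++)

/* Rescheduling iterator: yields once per step via the comma operator. */
#define for_each_model_lmb_slow(lmb)				\
	for ((lmb) = model_lmbs;				\
	     (lmb) < model_lmbs + MODEL_N_LMBS;			\
	     (model_cond_resched(), (lmb)++))

/* Walk all entries with the plain iterator; returns entries visited. */
static size_t model_walk_fast(void)
{
	struct model_lmb *lmb;
	size_t visited = 0;

	for_each_model_lmb(lmb)
		visited++;
	return visited;
}

/* Walk all entries with the rescheduling iterator. */
static size_t model_walk_slow(void)
{
	struct model_lmb *lmb;
	size_t visited = 0;

	for_each_model_lmb_slow(lmb)
		visited++;
	return visited;
}
```

Simple loops that only copy fields would use the plain form, and the
heavyweight add/remove paths would use the rescheduling one.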
On the other hand... it's probably not too strong to say that the
drmem/hotplug code is in crisis with respect to correctness and
algorithmic complexity, so those are my overriding concerns right
now. Yes, this change will pessimize loops that are reinitializing the
entire drmem_lmb array on every DLPAR operation, but:
1. it doesn't make any user of for_each_drmem_lmb() less correct;
2. why is this code doing that in the first place, other than to
accommodate a poor data structure choice?
The duration of the system calls where this code runs is measured in
minutes or hours on large configurations because of all the behaviors
that are at best O(n) with the amount of memory assigned to the
partition. For simplicity's sake I'd rather defer lower-level
performance considerations like this until the drmem data structures'
awful lookup properties are fixed -- hopefully in the 5.10 timeframe.
Thoughts?
* [powerpc:next-test] BUILD SUCCESS 2e6bd221d96fcfd9bd1eed5cd9c008e7959daed7
From: kernel test robot @ 2020-07-30 14:42 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: 2e6bd221d96fcfd9bd1eed5cd9c008e7959daed7 powerpc/kexec_file: Enable early kernel OPAL calls
elapsed time: 1395m
configs tested: 52
configs skipped: 1
The following configs have been built successfully.
More configs may be tested in the coming days.
arm defconfig
arm64 allyesconfig
arm64 defconfig
arm allyesconfig
arm allmodconfig
ia64 allmodconfig
ia64 defconfig
ia64 allyesconfig
m68k allmodconfig
m68k defconfig
m68k allyesconfig
nds32 defconfig
nios2 allyesconfig
csky defconfig
alpha defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
arc defconfig
sh allmodconfig
parisc defconfig
s390 allyesconfig
parisc allyesconfig
s390 defconfig
i386 allyesconfig
sparc allyesconfig
sparc defconfig
i386 defconfig
nios2 defconfig
arc allyesconfig
nds32 allnoconfig
c6x allyesconfig
mips allyesconfig
mips allmodconfig
powerpc defconfig
powerpc allyesconfig
powerpc allmodconfig
powerpc allnoconfig
i386 randconfig-a016-20200730
i386 randconfig-a012-20200730
i386 randconfig-a014-20200730
i386 randconfig-a015-20200730
riscv allyesconfig
riscv allnoconfig
riscv defconfig
riscv allmodconfig
x86_64 rhel
x86_64 allyesconfig
x86_64 rhel-7.6-kselftests
x86_64 defconfig
x86_64 rhel-8.3
x86_64 kexec
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
* [PATCHv4 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents
From: Pingfan Liu @ 2020-07-30 13:33 UTC (permalink / raw)
To: linuxppc-dev
Cc: Nathan Lynch, kexec, Pingfan Liu, Nathan Fontenot, Hari Bathini
In-Reply-To: <1596116005-27511-1-git-send-email-kernelfans@gmail.com>
A bug can be reproduced on pseries with the following steps on RHEL:
-1. drmgr -c mem -r -q 5
-2. echo c > /proc/sysrq-trigger
And then, the failure looks like:
kdump: saving to /sysroot//var/crash/127.0.0.1-2020-01-16-02:06:14/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
Checking for memory holes : [ 0.0 %] /
Checking for memory holes : [100.0 %] |
Excluding unnecessary pages : [100.0 %] \
Copying data : [ 0.3 %] - eta: 38s
[ 44.337636] hash-mmu: mm: Hashing failure ! EA=0x7fffba400000 access=0x8000000000000004 current=makedumpfile
[ 44.337663] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc000000050000504
[ 44.337677] hash-mmu: mm: Hashing failure ! EA=0x7fffba400000 access=0x8000000000000004 current=makedumpfile
[ 44.337692] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 psize 2 pte=0xc000000050000504
[ 44.337708] makedumpfile[469]: unhandled signal 7 at 00007fffba400000 nip 00007fffbbc4d7fc lr 000000011356ca3c code 2
[ 44.338548] Core dump to |/bin/false pipe failed
/lib/kdump-lib-initramfs.sh: line 98: 469 Bus error $CORE_COLLECTOR /proc/vmcore $_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore-incomplete
kdump: saving vmcore failed
* Root cause *
Analysis shows that in the current implementation, when an lmb is
hot-removed, the KOBJ_REMOVE event is emitted before the dt is updated,
because __remove_memory() is called before drmem_update_dt().
As a result, in the kdump kernel, read_from_oldmem() resorts to
pSeries_lpar_hpte_insert() to install the hpte, which fails with -2 because
the pfn does not exist. Finally, low_hash_fault() raises SIGBUS to the
process, which shows up as the "Bus error" above.
From a listener/publisher point of view, the publisher notifies the
listener before the data is ready. udev launches kexec-tools (due to
KOBJ_REMOVE), which loads a stale dt before it has been updated. In the
capture kernel, makedumpfile then accesses memory based on the stale dt
info and hits SIGBUS on an lmb that no longer exists.
* Fix *
This bug was introduced by commit 063b8b1251fd
("powerpc/pseries/memory-hotplug: Only update DT once per memory DLPAR
request"), which combined all the dt updates into one.
The fix must not introduce quadratic runtime complexity through the
following call pattern:
dlpar_memory_add_by_count
for_each_drmem_lmb <--
dlpar_add_lmb
drmem_update_dt(_v1|_v2)
for_each_drmem_lmb <--
The dt should still be updated only once, just before the last memory
online/offline event is emitted to user space. Achieve this by tracking the
number of lmbs added or removed.
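The intended ordering can be modelled in a few lines of userspace C. This is
a sketch under assumptions only: every name here is hypothetical, the "dt" is
reduced to a single counter, and, unlike a literal reading of the diff below,
the model treats a one-lmb request as its own last step:

```c
#include <assert.h>
#include <stdbool.h>

struct dt_model {
	int assigned;		/* lmbs currently assigned */
	int dt_assigned;	/* lmb count last written to the dt */
	int dt_updates;		/* how many times the dt was rewritten */
	int events;		/* how many remove events were emitted */
	bool last_event_saw_fresh_dt;
};

static void dt_model_init(struct dt_model *m, int assigned)
{
	m->assigned = assigned;
	m->dt_assigned = assigned;
	m->dt_updates = 0;
	m->events = 0;
	m->last_event_saw_fresh_dt = false;
}

/* Remove one lmb: bookkeeping, optional dt update, then the event. */
static void dt_model_remove_one(struct dt_model *m, bool dt_update)
{
	m->assigned--;
	if (dt_update) {
		m->dt_assigned = m->assigned;
		m->dt_updates++;
	}
	/* The KOBJ_REMOVE uevent would reach user space at this point. */
	m->events++;
	m->last_event_saw_fresh_dt = (m->dt_assigned == m->assigned);
}

/* Remove several lmbs, rewriting the dt only for the last one. */
static void dt_model_remove_by_count(struct dt_model *m, int to_remove)
{
	int removed;

	for (removed = 0; removed < to_remove; removed++) {
		bool is_last = (removed == to_remove - 1);

		dt_model_remove_one(m, is_last);
	}
}
```

The dt is rewritten once per request rather than once per lmb, and the
listener reacting to the final event finds it up to date.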
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: kexec@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
---
v3 -> v4: resolve a quadratic runtime complexity issue.
This series is applied on next-test branch
arch/powerpc/platforms/pseries/hotplug-memory.c | 88 ++++++++++++++++++-------
1 file changed, 66 insertions(+), 22 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 1a3ac3b..e07d5b1 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -350,13 +350,13 @@ static bool lmb_is_removable(struct drmem_lmb *lmb)
return true;
}
-static int dlpar_add_lmb(struct drmem_lmb *);
+static int dlpar_add_lmb(struct drmem_lmb *lmb, bool dt_update);
-static int dlpar_remove_lmb(struct drmem_lmb *lmb)
+static int dlpar_remove_lmb(struct drmem_lmb *lmb, bool dt_update)
{
unsigned long block_sz;
phys_addr_t base_addr;
- int rc, nid;
+ int rc, ret, nid;
if (!lmb_is_removable(lmb))
return -EINVAL;
@@ -372,6 +372,11 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
invalidate_lmb_associativity_index(lmb);
lmb_clear_nid(lmb);
lmb->flags &= ~DRCONF_MEM_ASSIGNED;
+ if (dt_update) {
+ ret = drmem_update_dt();
+ if (ret)
+ pr_warn("%s fail to update dt, but continue\n", __func__);
+ }
__remove_memory(nid, base_addr, block_sz);
@@ -387,6 +392,7 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
int lmbs_removed = 0;
int lmbs_available = 0;
int rc;
+ bool dt_update = false;
pr_info("Attempting to hot-remove %d LMB(s)\n", lmbs_to_remove);
@@ -409,7 +415,7 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
}
for_each_drmem_lmb(lmb) {
- rc = dlpar_remove_lmb(lmb);
+ rc = dlpar_remove_lmb(lmb, dt_update);
if (rc)
continue;
@@ -421,16 +427,27 @@ static int dlpar_memory_remove_by_count(u32 lmbs_to_remove)
lmbs_removed++;
if (lmbs_removed == lmbs_to_remove)
break;
+ /* combine dt updating */
+ else if (lmbs_removed == lmbs_to_remove - 1)
+ dt_update = true;
}
if (lmbs_removed != lmbs_to_remove) {
+ bool rollback_dt_update = false;
+
pr_err("Memory hot-remove failed, adding LMB's back\n");
for_each_drmem_lmb(lmb) {
if (!drmem_lmb_reserved(lmb))
continue;
- rc = dlpar_add_lmb(lmb);
+ /*
+ * Even if dlpar_remove_lmb() fails to update dt, it is
+ * harmless to update dt here.
+ */
+ if (--lmbs_removed == 0 && dt_update)
+ rollback_dt_update = true;
+ rc = dlpar_add_lmb(lmb, rollback_dt_update);
if (rc)
pr_err("Failed to add LMB back, drc index %x\n",
lmb->drc_index);
@@ -468,7 +485,7 @@ static int dlpar_memory_remove_by_index(u32 drc_index)
for_each_drmem_lmb(lmb) {
if (lmb->drc_index == drc_index) {
lmb_found = 1;
- rc = dlpar_remove_lmb(lmb);
+ rc = dlpar_remove_lmb(lmb, true);
if (!rc)
dlpar_release_drc(lmb->drc_index);
@@ -493,6 +510,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
struct drmem_lmb *lmb, *start_lmb, *end_lmb;
int lmbs_available = 0;
int rc;
+ bool dt_update = false;
pr_info("Attempting to hot-remove %u LMB(s) at %x\n",
lmbs_to_remove, drc_index);
@@ -519,7 +537,9 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
continue;
- rc = dlpar_remove_lmb(lmb);
+ if (lmb == end_lmb)
+ dt_update = true;
+ rc = dlpar_remove_lmb(lmb, dt_update);
if (rc)
break;
@@ -527,14 +547,17 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
}
if (rc) {
- pr_err("Memory indexed-count-remove failed, adding any removed LMBs\n");
+ bool rollback_dt_update = false;
+ pr_err("Memory indexed-count-remove failed, adding any removed LMBs\n");
for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
if (!drmem_lmb_reserved(lmb))
continue;
-
- rc = dlpar_add_lmb(lmb);
+ /* Since in removing path, dt is only updated if lmb == end_lmb */
+ if (lmb == end_lmb)
+ rollback_dt_update = true;
+ rc = dlpar_add_lmb(lmb, rollback_dt_update);
if (rc)
pr_err("Failed to add LMB, drc index %x\n",
lmb->drc_index);
@@ -572,7 +595,7 @@ static inline int dlpar_memory_remove(struct pseries_hp_errorlog *hp_elog)
{
return -EOPNOTSUPP;
}
-static int dlpar_remove_lmb(struct drmem_lmb *lmb)
+static int dlpar_remove_lmb(struct drmem_lmb *lmb, bool dt_update)
{
return -EOPNOTSUPP;
}
@@ -591,10 +614,10 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
}
#endif /* CONFIG_MEMORY_HOTREMOVE */
-static int dlpar_add_lmb(struct drmem_lmb *lmb)
+static int dlpar_add_lmb(struct drmem_lmb *lmb, bool dt_update)
{
unsigned long block_sz;
- int rc;
+ int rc, ret;
if (lmb->flags & DRCONF_MEM_ASSIGNED)
return -EINVAL;
@@ -607,6 +630,11 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
lmb_set_nid(lmb);
lmb->flags |= DRCONF_MEM_ASSIGNED;
+ if (dt_update) {
+ ret = drmem_update_dt();
+ if (ret)
+ pr_warn("%s fail to update dt, but continue\n", __func__);
+ }
block_sz = memory_block_size_bytes();
@@ -625,7 +653,11 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
invalidate_lmb_associativity_index(lmb);
lmb_clear_nid(lmb);
lmb->flags &= ~DRCONF_MEM_ASSIGNED;
-
+ if (dt_update) {
+ ret = drmem_update_dt();
+ if (ret)
+ pr_warn("%s fail to update dt during rollback, but continue\n", __func__);
+ }
__remove_memory(nid, base_addr, block_sz);
}
@@ -638,6 +670,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
int lmbs_available = 0;
int lmbs_added = 0;
int rc;
+ bool dt_update = false;
pr_info("Attempting to hot-add %d LMB(s)\n", lmbs_to_add);
@@ -664,7 +697,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
if (rc)
continue;
- rc = dlpar_add_lmb(lmb);
+ rc = dlpar_add_lmb(lmb, dt_update);
if (rc) {
dlpar_release_drc(lmb->drc_index);
continue;
@@ -678,16 +711,23 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
lmbs_added++;
if (lmbs_added == lmbs_to_add)
break;
+ else if (lmbs_added == lmbs_to_add - 1)
+ dt_update = true;
}
if (lmbs_added != lmbs_to_add) {
+ bool rollback_dt_update = false;
+
pr_err("Memory hot-add failed, removing any added LMBs\n");
for_each_drmem_lmb(lmb) {
if (!drmem_lmb_reserved(lmb))
continue;
- rc = dlpar_remove_lmb(lmb);
+ if (--lmbs_added == 0 && dt_update)
+ rollback_dt_update = true;
+
+ rc = dlpar_remove_lmb(lmb, rollback_dt_update);
if (rc)
pr_err("Failed to remove LMB, drc index %x\n",
lmb->drc_index);
@@ -725,7 +765,7 @@ static int dlpar_memory_add_by_index(u32 drc_index)
lmb_found = 1;
rc = dlpar_acquire_drc(lmb->drc_index);
if (!rc) {
- rc = dlpar_add_lmb(lmb);
+ rc = dlpar_add_lmb(lmb, true);
if (rc)
dlpar_release_drc(lmb->drc_index);
}
@@ -751,6 +791,7 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
struct drmem_lmb *lmb, *start_lmb, *end_lmb;
int lmbs_available = 0;
int rc;
+ bool dt_update = false;
pr_info("Attempting to hot-add %u LMB(s) at index %x\n",
lmbs_to_add, drc_index);
@@ -781,7 +822,9 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
if (rc)
break;
- rc = dlpar_add_lmb(lmb);
+ if (lmb == end_lmb)
+ dt_update = true;
+ rc = dlpar_add_lmb(lmb, dt_update);
if (rc) {
dlpar_release_drc(lmb->drc_index);
break;
@@ -794,10 +837,14 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
pr_err("Memory indexed-count-add failed, removing any added LMBs\n");
for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
+ bool rollback_dt_update = false;
+
if (!drmem_lmb_reserved(lmb))
continue;
- rc = dlpar_remove_lmb(lmb);
+ if (lmb == end_lmb)
+ rollback_dt_update = true;
+ rc = dlpar_remove_lmb(lmb, rollback_dt_update);
if (rc)
pr_err("Failed to remove LMB, drc index %x\n",
lmb->drc_index);
@@ -877,9 +924,6 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
break;
}
- if (!rc)
- rc = drmem_update_dt();
-
unlock_device_hotplug();
return rc;
}
--
2.7.5
* [PATCHv4 1/2] powerpc/pseries: group lmb operation and memblock's
From: Pingfan Liu @ 2020-07-30 13:33 UTC (permalink / raw)
To: linuxppc-dev
Cc: Nathan Lynch, kexec, Pingfan Liu, Nathan Fontenot, Hari Bathini
This patch prepares for the next patch, which swaps the order of the
KOBJ_ADD/REMOVE uevents and the dt update.
The dt update should come after the lmb operations and before
__remove_memory()/__add_memory(). Accordingly, group all lmb operations
before the memblock ones.
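A small userspace model of the reordering, as a sketch only: the struct, the
flag value and the helper names are all made up for illustration. The point
is that the node id and base address must be saved in locals before the lmb
bookkeeping clears them, since the memory-removal step now runs afterwards:

```c
#include <assert.h>

#define MODEL_MEM_ASSIGNED	0x8u	/* hypothetical flag bit */
#define MODEL_NO_NID		(-1)

struct model_lmb {
	unsigned long base_addr;
	int nid;
	unsigned int flags;
};

struct model_removed_range {
	int nid;
	unsigned long base_addr;
	unsigned long size;
};

static struct model_removed_range model_last_removed;

/* Stand-in for __remove_memory(); it only records its arguments. */
static void model_remove_memory(int nid, unsigned long base_addr,
				unsigned long size)
{
	model_last_removed.nid = nid;
	model_last_removed.base_addr = base_addr;
	model_last_removed.size = size;
}

static void model_remove_lmb(struct model_lmb *lmb, unsigned long block_sz)
{
	/* Snapshot what the memory-removal step will need. */
	unsigned long base_addr = lmb->base_addr;
	int nid = lmb->nid;

	/* Finish all lmb bookkeeping first. */
	lmb->nid = MODEL_NO_NID;
	lmb->flags &= ~MODEL_MEM_ASSIGNED;

	/* A dt update would slot in here, ahead of the uevent. */

	/* Only then remove the memory, using the saved values. */
	model_remove_memory(nid, base_addr, block_sz);
}
```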
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: kexec@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
---
v3 -> v4: improve commit log
arch/powerpc/platforms/pseries/hotplug-memory.c | 26 ++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 5d545b7..1a3ac3b 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -355,7 +355,8 @@ static int dlpar_add_lmb(struct drmem_lmb *);
static int dlpar_remove_lmb(struct drmem_lmb *lmb)
{
unsigned long block_sz;
- int rc;
+ phys_addr_t base_addr;
+ int rc, nid;
if (!lmb_is_removable(lmb))
return -EINVAL;
@@ -364,17 +365,19 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
if (rc)
return rc;
+ base_addr = lmb->base_addr;
+ nid = lmb->nid;
block_sz = pseries_memory_block_size();
- __remove_memory(lmb->nid, lmb->base_addr, block_sz);
-
- /* Update memory regions for memory remove */
- memblock_remove(lmb->base_addr, block_sz);
-
invalidate_lmb_associativity_index(lmb);
lmb_clear_nid(lmb);
lmb->flags &= ~DRCONF_MEM_ASSIGNED;
+ __remove_memory(nid, base_addr, block_sz);
+
+ /* Update memory regions for memory remove */
+ memblock_remove(base_addr, block_sz);
+
return 0;
}
@@ -603,6 +606,8 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
}
lmb_set_nid(lmb);
+ lmb->flags |= DRCONF_MEM_ASSIGNED;
+
block_sz = memory_block_size_bytes();
/* Add the memory */
@@ -614,11 +619,14 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
rc = dlpar_online_lmb(lmb);
if (rc) {
- __remove_memory(lmb->nid, lmb->base_addr, block_sz);
+ int nid = lmb->nid;
+ phys_addr_t base_addr = lmb->base_addr;
+
invalidate_lmb_associativity_index(lmb);
lmb_clear_nid(lmb);
- } else {
- lmb->flags |= DRCONF_MEM_ASSIGNED;
+ lmb->flags &= ~DRCONF_MEM_ASSIGNED;
+
+ __remove_memory(nid, base_addr, block_sz);
}
return rc;
--
2.7.5
* [PATCH] soc: fsl: Remove bogus packed attributes from qman.h
From: Herbert Xu @ 2020-07-30 12:52 UTC (permalink / raw)
To: Li Yang, linuxppc-dev, linux-arm-kernel
There are two __packed attributes in qman.h that are both unnecessary
and causing compiler warnings because they're conflicting with
explicit alignment requirements set on members within the structure.
This patch removes them both.
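As a rough illustration of why packing can be unnecessary for a structure of
this shape, here is a hypothetical layout (the struct names, field names and
sizes below are assumptions for the example, not a copy of qman.h). Every
member already sits at its natural offset, so the compiler adds no padding
and the explicitly aligned member keeps its alignment:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte descriptor with an explicit alignment requirement. */
struct demo_fd {
	uint32_t a;
	uint32_t b;
	uint64_t c;
} __attribute__((aligned(8)));

/* Hypothetical ring entry, declared without any packed attribute. */
struct demo_entry {
	uint8_t verb;
	uint8_t stat;
	uint16_t seqnum;
	uint8_t tok;
	uint8_t reserved2[3];
	uint32_t fqid;
	uint32_t context_b;
	struct demo_fd fd;	/* offset 16, a multiple of 8 */
	uint8_t reserved4[32];
};

/* Sum of the member sizes; equals sizeof() when there is no padding. */
static size_t demo_entry_field_bytes(void)
{
	return 1 + 1 + 2 + 1 + 3 + 4 + 4 + sizeof(struct demo_fd) + 32;
}
```

If the sum of the member sizes equals sizeof(struct demo_entry), a packed
attribute would not change the layout and could only lower the alignment the
compiler is allowed to assume.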
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/include/soc/fsl/qman.h b/include/soc/fsl/qman.h
index cfe00e08e85b..d81ff185dc0b 100644
--- a/include/soc/fsl/qman.h
+++ b/include/soc/fsl/qman.h
@@ -256,7 +256,7 @@ struct qm_dqrr_entry {
__be32 context_b;
struct qm_fd fd;
u8 __reserved4[32];
-} __packed;
+};
#define QM_DQRR_VERB_VBIT 0x80
#define QM_DQRR_VERB_MASK 0x7f /* where the verb contains; */
#define QM_DQRR_VERB_FRAME_DEQUEUE 0x60 /* "this format" */
@@ -289,7 +289,7 @@ union qm_mr_entry {
__be32 tag;
struct qm_fd fd;
u8 __reserved1[32];
- } __packed ern;
+ } ern;
struct {
u8 verb;
u8 fqs; /* Frame Queue Status */
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
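As a minimal sketch of why a packed attribute can be redundant on a structure of this kind: when every member already sits at its natural offset, the compiler inserts no padding, so packing has nothing to remove and only lowers the alignment that an aligned member asks for. The types below (demo_fd, demo_entry) and the helper demo_entry_padding() are hypothetical stand-ins, not the real qman.h layout, and the exact compiler warning is not quoted in the patch above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a member type that carries an explicit
 * alignment requirement (GCC/Clang attribute syntax). */
struct demo_fd {
	uint32_t words[4];
} __attribute__((aligned(8)));

/* Hypothetical 64-byte entry: every member lands on its natural
 * boundary, so no padding is needed and no packed attribute either. */
struct demo_entry {
	uint8_t verb;
	uint8_t stat;
	uint16_t seqnum;
	uint32_t context_b;	/* offset 4 */
	struct demo_fd fd;	/* offset 8, already 8-byte aligned */
	uint8_t reserved[40];
};

/* Compile-time checks of the layout without any packed attribute. */
_Static_assert(offsetof(struct demo_entry, fd) == 8, "fd is 8-byte aligned");
_Static_assert(sizeof(struct demo_entry) == 64, "entry has the expected size");

/* Bytes of compiler-inserted padding: total size minus member sizes. */
size_t demo_entry_padding(void)
{
	size_t members = 2 * sizeof(uint8_t) + sizeof(uint16_t) +
			 sizeof(uint32_t) + sizeof(struct demo_fd) + 40;

	return sizeof(struct demo_entry) - members;
}
```

If the layout ever did need padding, a static assertion like the ones above would fail at build time, which is arguably a clearer guard than packing the whole structure.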
* Re: [PATCH] powerpc/fadump: Fix build error with CONFIG_PRESERVE_FA_DUMP=y
From: Michael Ellerman @ 2020-07-30 12:51 UTC (permalink / raw)
To: linuxppc-dev, Michael Ellerman
In-Reply-To: <20200727070341.595634-1-mpe@ellerman.id.au>
On Mon, 27 Jul 2020 17:03:41 +1000, Michael Ellerman wrote:
> skiroot_defconfig fails:
>
> arch/powerpc/kernel/fadump.c:48:17: error: ‘cpus_in_fadump’ defined but not used
> 48 | static atomic_t cpus_in_fadump;
>
> Fix it by moving the definition into the #ifdef where it's used.
Applied to powerpc/next.
[1/1] powerpc/fadump: Fix build error with CONFIG_PRESERVE_FA_DUMP=y
https://git.kernel.org/powerpc/c/5f987caec521cbb00d4ba2dc641ac8074626b762
cheers
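The fix described in the quoted commit message follows a common pattern, sketched here under stated assumptions: a file-scope static is defined inside the same #ifdef as its only user, so a configuration that compiles the user out never sees a "defined but not used" variable. DEMO_FEATURE, demo_cpus_in_dump and demo_enter_dump() are hypothetical names for illustration; this is not the actual fadump.c code.

```c
/* Hypothetical switch standing in for a Kconfig option; remove this
 * define to build with the feature compiled out. */
#define DEMO_FEATURE 1

#ifdef DEMO_FEATURE
/* The counter lives next to its only user, inside the same #ifdef. */
static int demo_cpus_in_dump;

/* Count one more CPU entering the (hypothetical) dump path. */
int demo_enter_dump(void)
{
	return ++demo_cpus_in_dump;
}
#else
/* Feature compiled out: no counter is defined, so nothing is unused. */
int demo_enter_dump(void)
{
	return 0;
}
#endif
```

Had the counter been defined above the #ifdef instead, a build without DEMO_FEATURE would leave it unreferenced, which is the kind of situation the quoted error message describes.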
* Re: [PATCH -next] powerpc/powernv/sriov: Remove unused but set variable 'phb'
From: Michael Ellerman @ 2020-07-30 12:50 UTC (permalink / raw)
To: Alexey Kardashevskiy, Wei Yongjun, Hulk Robot, Michael Ellerman,
Oliver O'Halloran
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200727171112.2781-1-weiyongjun1@huawei.com>
On Tue, 28 Jul 2020 01:11:12 +0800, Wei Yongjun wrote:
> GCC reports the following warning:
>
> arch/powerpc/platforms/powernv/pci-sriov.c:602:25: warning:
> variable 'phb' set but not used [-Wunused-but-set-variable]
> 602 | struct pnv_phb *phb;
> | ^~~
>
> [...]
Applied to powerpc/next.
[1/1] powerpc/powernv/sriov: Remove unused but set variable 'phb'
https://git.kernel.org/powerpc/c/cf1ae052e073c7ef6cf1a783a6427f7228253bd3
cheers
* Re: [PATCH v2 0/6] Improvements to pkey tests
From: Michael Ellerman @ 2020-07-30 12:50 UTC (permalink / raw)
To: mpe, Sandipan Das; +Cc: fweimer, linuxppc-dev, linuxram, bauerman, aneesh.kumar
In-Reply-To: <cover.1595821792.git.sandipan@linux.ibm.com>
On Mon, 27 Jul 2020 09:30:34 +0530, Sandipan Das wrote:
> Based on recent bugs found in the pkey infrastructure, this
> improves the test for execute-disabled pkeys and adds a new
> test for detecting inconsistencies with the pkey reported by
> the signal information upon getting a fault.
>
> Previous versions can be found at:
> v1: https://lore.kernel.org/linuxppc-dev/cover.1594897099.git.sandipan@linux.ibm.com/
>
> [...]
Applied to powerpc/next.
[1/6] selftests/powerpc: Move pkey helpers to headers
https://git.kernel.org/powerpc/c/128d3d0210076232b7d54c361082c8ee17e4b669
[2/6] selftests/powerpc: Add pkey helpers for rights
https://git.kernel.org/powerpc/c/264d7fccc4711328a19f07e6bd57aee4c68803aa
[3/6] selftests/powerpc: Harden test for execute-disabled pkeys
https://git.kernel.org/powerpc/c/03634bbf5d8a6f2d97e6150a1b8ff03675badac3
[4/6] selftests/powerpc: Add helper to exit on failure
https://git.kernel.org/powerpc/c/e3333c599482245d08002725cc1b353e4963fa26
[5/6] selftests/powerpc: Add wrapper for gettid
https://git.kernel.org/powerpc/c/743f3544fffb9662aaf550c8358a8c1b6fcae707
[6/6] selftests/powerpc: Add test for pkey siginfo verification
https://git.kernel.org/powerpc/c/c27f2fd1705a7e19ef2dc2b986c0d1cde3c3dbe7
cheers
* Re: [PATCH] selftests/powerpc: Squash spurious errors due to device removal
From: Michael Ellerman @ 2020-07-30 12:50 UTC (permalink / raw)
To: Oliver O'Halloran, linuxppc-dev
In-Reply-To: <20200727010127.23698-1-oohall@gmail.com>
On Mon, 27 Jul 2020 11:01:27 +1000, Oliver O'Halloran wrote:
> For drivers that don't have the error handling callbacks we implement
> recovery by removing the device and re-probing it. This causes the sysfs
> directory for the PCI device to be removed which causes the following
> spurious error to be printed when checking the PE state:
>
> Breaking 0005:03:00.0...
> ./eeh-basic.sh: line 13: can't open /sys/bus/pci/devices/0005:03:00.0/eeh_pe_state: no such file
> 0005:03:00.0, waited 0/60
> 0005:03:00.0, waited 1/60
> 0005:03:00.0, waited 2/60
> 0005:03:00.0, waited 3/60
> 0005:03:00.0, waited 4/60
> 0005:03:00.0, waited 5/60
> 0005:03:00.0, waited 6/60
> 0005:03:00.0, waited 7/60
> 0005:03:00.0, Recovered after 8 seconds
>
> [...]
Applied to powerpc/next.
[1/1] selftests/powerpc: Squash spurious errors due to device removal
https://git.kernel.org/powerpc/c/5f8cf6475828b600ff6d000e580c961ac839cc61
cheers
* Re: [PATCH v3 0/3] powerpc/pseries: IPI doorbell improvements
From: Michael Ellerman @ 2020-07-30 12:50 UTC (permalink / raw)
To: linuxppc-dev, Nicholas Piggin
Cc: Anton Blanchard, Cédric Le Goater, kvm-ppc, David Gibson
In-Reply-To: <20200726035155.1424103-1-npiggin@gmail.com>
On Sun, 26 Jul 2020 13:51:52 +1000, Nicholas Piggin wrote:
> Since v2:
> - Fixed ppc32 compile error
> - Tested-by from Cedric
>
> Nicholas Piggin (3):
> powerpc: inline doorbell sending functions
> powerpc/pseries: Use doorbells even if XIVE is available
> powerpc/pseries: Add KVM guest doorbell restrictions
>
> [...]
Applied to powerpc/next.
[1/3] powerpc: Inline doorbell sending functions
https://git.kernel.org/powerpc/c/1f0ce497433f8944045ee1baae218e31a0d295ee
[2/3] powerpc/pseries: Use doorbells even if XIVE is available
https://git.kernel.org/powerpc/c/5b06d1679f2fe874ef49ea11324cd893ec9e2da8
[3/3] powerpc/pseries: Add KVM guest doorbell restrictions
https://git.kernel.org/powerpc/c/107c55005fbd5243ee31fb13b6f166cde9e3ade1
cheers
* Re: [PATCH] powerpc/build: vdso linker warning for orphan sections
From: Michael Ellerman @ 2020-07-30 12:50 UTC (permalink / raw)
To: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20200303012748.4190929-1-npiggin@gmail.com>
On Tue, 3 Mar 2020 11:27:48 +1000, Nicholas Piggin wrote:
>
Applied to powerpc/next.
[1/1] powerpc/build: vdso linker warning for orphan sections
https://git.kernel.org/powerpc/c/f2af201002a8bc22500c04cc474ea480bf361351
cheers
* Re: [PATCH v2 1/5] selftests/powerpc: Add test of stack expansion logic
From: Michael Ellerman @ 2020-07-30 12:50 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev; +Cc: linux-kernel, dja
In-Reply-To: <20200724092528.1578671-1-mpe@ellerman.id.au>
On Fri, 24 Jul 2020 19:25:24 +1000, Michael Ellerman wrote:
> We have custom stack expansion checks that it turns out are extremely
> badly tested and contain bugs, surprise. So add some tests that
> exercise the code and capture the current boundary conditions.
>
> The signal test currently fails on 64-bit kernels because the 2048
> byte allowance for the signal frame is too small; we will fix that in
> a subsequent patch.
Applied to powerpc/next.
[1/5] selftests/powerpc: Add test of stack expansion logic
https://git.kernel.org/powerpc/c/c9938a9dac95be7650218cdd8e9d1f882e7b5691
[2/5] powerpc: Allow 4224 bytes of stack expansion for the signal frame
https://git.kernel.org/powerpc/c/63dee5df43a31f3844efabc58972f0a206ca4534
[3/5] selftests/powerpc: Update the stack expansion test
https://git.kernel.org/powerpc/c/9ee571d84bf8cfdd587a1acbf3490ca90fc40c9d
[4/5] powerpc/mm: Remove custom stack expansion checking
https://git.kernel.org/powerpc/c/773b3e53df5b84e73bf64998e4019f50a6662ad1
[5/5] selftests/powerpc: Remove powerpc special cases from stack expansion test
https://git.kernel.org/powerpc/c/73da08f6966b81feb429af4fb3229da4cf21d6d9
cheers