LinuxPPC-Dev Archive on lore.kernel.org
* [PATCH 06/15] rsxx: stop using ->queuedata
From: Christoph Hellwig @ 2020-05-08 16:15 UTC
  To: Jens Axboe
  Cc: linux-bcache, linux-xtensa, linux-raid, Sergey Senozhatsky,
	linux-nvdimm, Geoff Levand, linux-kernel, Jim Paris, linux-block,
	Minchan Kim, linux-m68k, Philip Kelleher, linuxppc-dev,
	Joshua Morris, Nitin Gupta, drbd-dev
In-Reply-To: <20200508161517.252308-1-hch@lst.de>

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/rsxx/dev.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index 8ffa8260dcafe..6dde80b096c62 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -133,7 +133,7 @@ static void bio_dma_done_cb(struct rsxx_cardinfo *card,
 
 static blk_qc_t rsxx_make_request(struct request_queue *q, struct bio *bio)
 {
-	struct rsxx_cardinfo *card = q->queuedata;
+	struct rsxx_cardinfo *card = bio->bi_disk->private_data;
 	struct rsxx_bio_meta *bio_meta;
 	blk_status_t st = BLK_STS_IOERR;
 
@@ -282,8 +282,6 @@ int rsxx_setup_dev(struct rsxx_cardinfo *card)
 		card->queue->limits.discard_alignment   = RSXX_HW_BLK_SIZE;
 	}
 
-	card->queue->queuedata = card;
-
 	snprintf(card->gendisk->disk_name, sizeof(card->gendisk->disk_name),
 		 "rsxx%d", card->disk_id);
 	card->gendisk->major = card->major;
@@ -304,7 +302,6 @@ void rsxx_destroy_dev(struct rsxx_cardinfo *card)
 	card->gendisk = NULL;
 
 	blk_cleanup_queue(card->queue);
-	card->queue->queuedata = NULL;
 	unregister_blkdev(card->major, DRIVER_NAME);
 }
 
-- 
2.26.2

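Stripped of kernel details, this change moves where the driver looks up its per-card state. Instead of a pointer stashed in the request queue's ->queuedata, it reads the gendisk's private_data through the bio, relying on the driver having already stored the card there. Below is a userspace model of that lookup. The structs are simplified, hypothetical stand-ins for the kernel's struct gendisk and struct bio and keep only the fields used here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's
 * struct gendisk and struct bio: only the fields used here. */
struct gendisk {
	void *private_data;		/* driver-owned pointer, set at disk setup */
};

struct bio {
	struct gendisk *bi_disk;	/* disk the I/O is aimed at */
};

/* Hypothetical driver state, standing in for struct rsxx_cardinfo. */
struct card {
	int disk_id;
};

/* Setup step: the driver records its state on the disk, so nothing
 * has to be stored in the request queue's ->queuedata. */
static void card_setup_disk(struct card *card, struct gendisk *disk)
{
	disk->private_data = card;
}

/* make_request path: recover the driver state from the bio's disk
 * instead of from q->queuedata. */
static struct card *card_from_bio(const struct bio *bio)
{
	assert(bio != NULL && bio->bi_disk != NULL);
	return bio->bi_disk->private_data;
}
```

The disk, not the queue, owns the driver pointer, so the queue needs no ->queuedata setup or teardown. The simdisk patch applies the same change.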


* [PATCH 02/15] simdisk: stop using ->queuedata
From: Christoph Hellwig @ 2020-05-08 16:15 UTC
  To: Jens Axboe
  Cc: linux-bcache, linux-xtensa, linux-raid, Sergey Senozhatsky,
	linux-nvdimm, Geoff Levand, linux-kernel, Jim Paris, linux-block,
	Minchan Kim, linux-m68k, Philip Kelleher, linuxppc-dev,
	Joshua Morris, Nitin Gupta, drbd-dev
In-Reply-To: <20200508161517.252308-1-hch@lst.de>

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/xtensa/platforms/iss/simdisk.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/xtensa/platforms/iss/simdisk.c b/arch/xtensa/platforms/iss/simdisk.c
index 49322b66cda93..31b5020077a05 100644
--- a/arch/xtensa/platforms/iss/simdisk.c
+++ b/arch/xtensa/platforms/iss/simdisk.c
@@ -103,7 +103,7 @@ static void simdisk_transfer(struct simdisk *dev, unsigned long sector,
 
 static blk_qc_t simdisk_make_request(struct request_queue *q, struct bio *bio)
 {
-	struct simdisk *dev = q->queuedata;
+	struct simdisk *dev = bio->bi_disk->private_data;
 	struct bio_vec bvec;
 	struct bvec_iter iter;
 	sector_t sector = bio->bi_iter.bi_sector;
@@ -273,8 +273,6 @@ static int __init simdisk_setup(struct simdisk *dev, int which,
 		goto out_alloc_queue;
 	}
 
-	dev->queue->queuedata = dev;
-
 	dev->gd = alloc_disk(SIMDISK_MINORS);
 	if (dev->gd == NULL) {
 		pr_err("alloc_disk failed\n");
-- 
2.26.2



* Re: [PATCH v7 2/5] seq_buf: Export seq_buf_printf() to external modules
From: Borislav Petkov @ 2020-05-08 16:09 UTC
  To: Vaibhav Jain
  Cc: Cezary Rojewski, linux-nvdimm, linux-kernel, Steven Rostedt,
	Piotr Maziarz, Aneesh Kumar K . V, linuxppc-dev
In-Reply-To: <87blmy8wm8.fsf@linux.ibm.com>

On Fri, May 08, 2020 at 05:30:31PM +0530, Vaibhav Jain wrote:
> I am referring to Kernel Loadable Modules with MODULE_LICENSE("GPL")
> here.

And what does "external" refer to? Because if it is out-of-tree, we
don't export symbols for out-of-tree modules.

Looks like you're exporting it for that papr_scm.c thing, which is fine.
But that is not "external".

So?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v7 2/5] seq_buf: Export seq_buf_printf() to external modules
From: Joe Perches @ 2020-05-08 14:52 UTC
  To: Vaibhav Jain, Borislav Petkov
  Cc: Cezary Rojewski, linux-nvdimm, linux-kernel, Steven Rostedt,
	Piotr Maziarz, Aneesh Kumar K . V, linuxppc-dev
In-Reply-To: <87blmy8wm8.fsf@linux.ibm.com>

On Fri, 2020-05-08 at 17:30 +0530, Vaibhav Jain wrote:
> Hi Boris,
> 
> Borislav Petkov <bp@alien8.de> writes:
> 
> > On Fri, May 08, 2020 at 04:19:19PM +0530, Vaibhav Jain wrote:
> > > 'seq_buf' provides a very useful abstraction for writing to a string
> > > buffer without needing to worry about it over-flowing. However even
> > > though the API has been stable for couple of years now its stills not
> > > exported to external modules limiting its usage.
> > > 
> > > Hence this patch proposes update to 'seq_buf.c' to mark
> > > seq_buf_printf() which is part of the seq_buf API to be exported to
> > > external GPL modules. This symbol will be used in later parts of this
> > 
> > What is "external GPL modules"?
> I am referring to Kernel Loadable Modules with MODULE_LICENSE("GPL")
> here.

Any reason why these Kernel Loadable Modules with MODULE_LICENSE("GPL")
are not in the kernel tree?

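The seq_buf property quoted above, writing into a string buffer "without needing to worry about it over-flowing", can be illustrated with a small userspace model. This is a sketch under assumptions: struct sbuf, sbuf_init() and sbuf_printf() are hypothetical names. Only the return convention (0 when the text fits, -1 on overflow) is meant to resemble seq_buf_printf() as we understand lib/seq_buf.c. The kernel tracks overflow differently, so this is not its implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct seq_buf: a caller-supplied buffer,
 * its size, the current write length and a sticky overflow flag. */
struct sbuf {
	char *buffer;
	size_t size;
	size_t len;
	int overflowed;
};

static void sbuf_init(struct sbuf *s, char *buf, size_t size)
{
	assert(buf != NULL && size > 0);
	s->buffer = buf;
	s->size = size;
	s->len = 0;
	s->overflowed = 0;
	buf[0] = '\0';
}

/*
 * Append formatted text. vsnprintf() never writes past the space that
 * is left, so the buffer cannot overflow. If the text does not fit,
 * the truncated prefix is kept, the overflow is recorded and -1 is
 * returned. Returns 0 when the whole string fit.
 */
static int sbuf_printf(struct sbuf *s, const char *fmt, ...)
{
	size_t room = s->size - s->len;	/* always >= 1 (room for NUL) */
	va_list args;
	int n;

	if (s->overflowed)
		return -1;

	va_start(args, fmt);
	n = vsnprintf(s->buffer + s->len, room, fmt, args);
	va_end(args);

	if (n < 0) {
		s->buffer[s->len] = '\0';	/* discard any partial output */
		return -1;
	}
	if ((size_t)n >= room) {
		s->len = s->size - 1;		/* truncated, NUL-terminated */
		s->overflowed = 1;
		return -1;
	}
	s->len += (size_t)n;
	return 0;
}
```

Callers can keep appending and check the result once at the end, which is the convenience that makes the API attractive to modules such as papr_scm.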



* [PATCH -next] soc: fsl: qbman: Remove unused inline function qm_eqcr_get_ci_stashing
From: YueHaibing @ 2020-05-08 14:08 UTC
  To: leoyang.li, roy.pledge
  Cc: YueHaibing, linuxppc-dev, linux-kernel, linux-arm-kernel

There are no callers in-tree anymore.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/soc/fsl/qbman/qman.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index 1e164e03410a..9888a7061873 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -449,11 +449,6 @@ static inline int qm_eqcr_init(struct qm_portal *portal,
 	return 0;
 }
 
-static inline unsigned int qm_eqcr_get_ci_stashing(struct qm_portal *portal)
-{
-	return (qm_in(portal, QM_REG_CFG) >> 28) & 0x7;
-}
-
 static inline void qm_eqcr_finish(struct qm_portal *portal)
 {
 	struct qm_eqcr *eqcr = &portal->eqcr;
-- 
2.17.1




* ioremap() called early from pnv_pci_init_ioda_phb()
From: Qian Cai @ 2020-05-08 14:39 UTC
  To: Christophe Leroy, Michael Ellerman; +Cc: linuxppc-dev, LKML

Booting POWER9 PowerNV produces this message:

"ioremap() called early from pnv_pci_init_ioda_phb+0x420/0xdfc. Use early_ioremap() instead"

but using the patch below results in leaks, because early_iounmap() is never called anywhere. However, it looks to me like it was by design that the phb->regs mapping stays around forever, since it is used in pnv_ioda_get_inval_reg(). So is the check_early_ioremap_leak() initcall just too strict?

--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -36,6 +36,7 @@
 #include <asm/firmware.h>
 #include <asm/pnv-pci.h>
 #include <asm/mmzone.h>
+#include <asm/early_ioremap.h>
 
 #include <misc/cxl-base.h>
 
@@ -3827,7 +3828,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
        /* Get registers */
        if (!of_address_to_resource(np, 0, &r)) {
                phb->regs_phys = r.start;
-               phb->regs = ioremap(r.start, resource_size(&r));
+               phb->regs = early_ioremap(r.start, resource_size(&r));
                if (phb->regs == NULL)
                        pr_err("  Failed to map registers !\n");

[   23.080069][    T1] ------------[ cut here ]------------
[   23.080089][    T1] Debug warning: early ioremap leak of 10 areas detected.
[   23.080089][    T1] please boot with early_ioremap_debug and report the dmesg.
[   23.080157][    T1] WARNING: CPU: 4 PID: 1 at mm/early_ioremap.c:99 check_early_ioremap_leak+0xd4/0x108
[   23.080171][    T1] Modules linked in:
[   23.080192][    T1] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc4-next-20200508+ #4
[   23.080214][    T1] NIP:  c00000000103f2d8 LR: c00000000103f2d4 CTR: 0000000000000000
[   23.080226][    T1] REGS: c00000003df0f860 TRAP: 0700   Not tainted  (5.7.0-rc4-next-20200508+)
[   23.080259][    T1] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 48000222  XER: 20040000
[   23.080296][    T1] CFAR: c00000000010d5a8 IRQMASK: 0 
[   23.080296][    T1] GPR00: c00000000103f2d4 c00000003df0faf0 c000000001689400 0000000000000072 
[   23.080296][    T1] GPR04: 0000000000000006 0000000000000000 c00000003df0f7e4 0000000000000004 
[   23.080296][    T1] GPR08: 0000001ffbb60000 0000000000000000 c00000003dee6680 0000000000000002 
[   23.080296][    T1] GPR12: 0000000000000000 c000001fffffae00 c000000001057860 c0000000010578b0 
[   23.080296][    T1] GPR16: c000000001002d38 c0000000014f0660 c0000000014f0680 c0000000014f06a0 
[   23.080296][    T1] GPR20: c0000000014f06c0 c0000000014f06e0 c0000000014f0700 c0000000014f0720 
[   23.080296][    T1] GPR24: c000000000c4bc30 c000000486b82000 c0000000015a0fe0 c0000000015a0fc0 
[   23.080296][    T1] GPR28: 0000000000000010 0000000000000010 c000000001061e30 000000000000000a 
[   23.080507][    T1] NIP [c00000000103f2d8] check_early_ioremap_leak+0xd4/0x108
[   23.080530][    T1] LR [c00000000103f2d4] check_early_ioremap_leak+0xd0/0x108
[   23.080552][    T1] Call Trace:
[   23.080571][    T1] [c00000003df0faf0] [c00000000103f2d4] check_early_ioremap_leak+0xd0/0x108 (unreliable)
[   23.080607][    T1] [c00000003df0fb80] [c00000000001130c] do_one_initcall+0xcc/0x660
[   23.080648][    T1] [c00000003df0fc80] [c000000001004c18] kernel_init_freeable+0x480/0x568
[   23.080681][    T1] [c00000003df0fdb0] [c000000000012180] kernel_init+0x24/0x194
[   23.080713][    T1] [c00000003df0fe20] [c00000000000cb28] ret_from_kernel_thread+0x5c/0x74

This is from the early_ioremap_debug dmesg.

[    0.000000][    T0] ------------[ cut here ]------------
[    0.000000][    T0] __early_ioremap(0x000600c3c0010000, 00010000) [0] => 00000000 + ffffffffffbe0000
[    0.000000][    T0] WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:162 __early_ioremap+0x2d8/0x408
[    0.000000][    T0] Modules linked in:
[    0.000000][    T0] CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-rc4-next-20200508+ #4
[    0.000000][    T0] NIP:  c00000000103f5e4 LR: c00000000103f5e0 CTR: c0000000001e77f0
[    0.000000][    T0] REGS: c00000000168f980 TRAP: 0700   Not tainted  (5.7.0-rc4-next-20200508+)
[    0.000000][    T0] MSR:  9000000000021033 <SF,HV,ME,IR,DR,RI,LE>  CR: 28000248  XER: 20040000
[    0.000000][    T0] CFAR: c00000000010d5a8 IRQMASK: 1 
[    0.000000][    T0] GPR00: c00000000103f5e0 c00000000168fc10 c000000001689400 0000000000000050 
[    0.000000][    T0] GPR04: c00000000152f6f8 0000000000000000 c00000000168f904 0000000000000000 
[    0.000000][    T0] GPR08: 0000000000000000 0000000000000000 c00000000162f600 0000000000000002 
[    0.000000][    T0] GPR12: c0000000001e77f0 c000000005b30000 0000000000000000 0000000000000000 
[    0.000000][    T0] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000001000 
[    0.000000][    T0] GPR20: 0000000000000000 80000000000001ae 0000000000000000 0000000000000000 
[    0.000000][    T0] GPR24: 0000000000010000 c000000001061da8 0000000000000008 0000000000000008 
[    0.000000][    T0] GPR28: 0000000000000000 c000000001061db0 0000000000000000 c000000001061eb8 
[    0.000000][    T0] NIP [c00000000103f5e4] __early_ioremap+0x2d8/0x408
[    0.000000][    T0] LR [c00000000103f5e0] __early_ioremap+0x2d4/0x408
[    0.000000][    T0] Call Trace:
[    0.000000][    T0] [c00000000168fc10] [c00000000103f5e0] __early_ioremap+0x2d4/0x408 (unreliable)
[    0.000000][    T0] [c00000000168fcf0] [c00000000101d010] pnv_pci_init_ioda_phb+0x420/0xdfc
[    0.000000][    T0] [c00000000168fe10] [c00000000101c9b8] pnv_pci_init+0x12c/0x264
[    0.000000][    T0] [c00000000168fe40] [c000000001016c40] pnv_setup_arch+0x2e4/0x330
[    0.000000][    T0] [c00000000168fe80] [c000000001009dd0] setup_arch+0x3a0/0x3ec
[    0.000000][    T0] [c00000000168fef0] [c000000001003ed0] start_kernel+0xb0/0x978
[    0.000000][    T0] [c00000000168ff90] [c00000000000c790] start_here_common+0x1c/0x8c
[    0.000000][    T0] Instruction dump:
[    0.000000][    T0] 7d39ba14 3c82ff3c 3c62ff54 7f06c378 7f88e378 7fc7f378 38a10068 e9290110 
[    0.000000][    T0] 38849e90 3863e8f0 4b0cdf65 60000000 <0fe00000> 2bbe000f 409d0018 3c62fff1 
[    0.000000][    T0] irq event stamp: 0
[    0.000000][    T0] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.000000][    T0] hardirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.000000][    T0] softirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.000000][    T0] softirqs last disabled at (0): [<0000000000000000>] 0x0

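Qian's question comes down to whether a leak check can tell a forgotten mapping from one that is meant to persist. Below is a toy userspace model of slot bookkeeping, as we understand the check. The slot count and all names are hypothetical and not taken from mm/early_ioremap.c. It shows that a "slot still in use" counter treats both cases the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot count; the real number of early_ioremap slots
 * is arch-configured and is not taken from the report above. */
#define NR_EARLY_SLOTS 16

struct early_slot {
	uint64_t phys;
	size_t size;
	int in_use;
};

static struct early_slot slots[NR_EARLY_SLOTS];

/* Claim a free slot for a mapping; returns the slot index or -1. */
static int toy_early_map(uint64_t phys, size_t size)
{
	for (int i = 0; i < NR_EARLY_SLOTS; i++) {
		if (!slots[i].in_use) {
			slots[i] = (struct early_slot){ phys, size, 1 };
			return i;
		}
	}
	return -1;
}

static void toy_early_unmap(int slot)
{
	assert(slot >= 0 && slot < NR_EARLY_SLOTS && slots[slot].in_use);
	slots[slot].in_use = 0;
}

/* Late check: every slot still in use counts as a leak. A mapping
 * intended to live forever (like phb->regs) looks exactly the same
 * as one that was simply never unmapped. */
static int toy_count_leaks(void)
{
	int leaks = 0;

	for (int i = 0; i < NR_EARLY_SLOTS; i++)
		leaks += slots[i].in_use;
	return leaks;
}
```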


* [PATCH -next] soc: fsl: dpio: remove set but not used variable 'addr_cena'
From: YueHaibing @ 2020-05-08 14:10 UTC
  To: Roy.Pledge, leoyang.li, youri.querry_1
  Cc: YueHaibing, linuxppc-dev, linux-kernel, linux-arm-kernel

drivers/soc/fsl/dpio//qbman-portal.c:650:11: warning: variable 'addr_cena' set but not used [-Wunused-but-set-variable]
  uint64_t addr_cena;
           ^~~~~~~~~

It is never used, so remove it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/soc/fsl/dpio/qbman-portal.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/soc/fsl/dpio/qbman-portal.c b/drivers/soc/fsl/dpio/qbman-portal.c
index e2e9fbb58a72..0ce859b60888 100644
--- a/drivers/soc/fsl/dpio/qbman-portal.c
+++ b/drivers/soc/fsl/dpio/qbman-portal.c
@@ -647,7 +647,6 @@ int qbman_swp_enqueue_multiple_direct(struct qbman_swp *s,
 	const uint32_t *cl = (uint32_t *)d;
 	uint32_t eqcr_ci, eqcr_pi, half_mask, full_mask;
 	int i, num_enqueued = 0;
-	uint64_t addr_cena;
 
 	spin_lock(&s->access_spinlock);
 	half_mask = (s->eqcr.pi_ci_mask>>1);
@@ -700,7 +699,6 @@ int qbman_swp_enqueue_multiple_direct(struct qbman_swp *s,
 
 	/* Flush all the cacheline without load/store in between */
 	eqcr_pi = s->eqcr.pi;
-	addr_cena = (size_t)s->addr_cena;
 	for (i = 0; i < num_enqueued; i++)
 		eqcr_pi++;
 	s->eqcr.pi = eqcr_pi & full_mask;
-- 
2.17.1




* [PATCH -next] soc: fsl: dpio: Remove unused inline function qbman_write_eqcr_am_rt_register
From: YueHaibing @ 2020-05-08 14:09 UTC
  To: Roy.Pledge, leoyang.li, youri.querry_1
  Cc: YueHaibing, linuxppc-dev, linux-kernel, linux-arm-kernel

There are no callers in-tree anymore since commit
3b2abda7d28c ("soc: fsl: dpio: Replace QMAN array mode with ring mode enqueue")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/soc/fsl/dpio/qbman-portal.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/drivers/soc/fsl/dpio/qbman-portal.c b/drivers/soc/fsl/dpio/qbman-portal.c
index 804b8ba9bf5c..e2e9fbb58a72 100644
--- a/drivers/soc/fsl/dpio/qbman-portal.c
+++ b/drivers/soc/fsl/dpio/qbman-portal.c
@@ -572,18 +572,6 @@ void qbman_eq_desc_set_qd(struct qbman_eq_desc *d, u32 qdid,
 #define EQAR_VB(eqar)      ((eqar) & 0x80)
 #define EQAR_SUCCESS(eqar) ((eqar) & 0x100)
 
-static inline void qbman_write_eqcr_am_rt_register(struct qbman_swp *p,
-						   u8 idx)
-{
-	if (idx < 16)
-		qbman_write_register(p, QBMAN_CINH_SWP_EQCR_AM_RT + idx * 4,
-				     QMAN_RT_MODE);
-	else
-		qbman_write_register(p, QBMAN_CINH_SWP_EQCR_AM_RT2 +
-				     (idx - 16) * 4,
-				     QMAN_RT_MODE);
-}
-
 #define QB_RT_BIT ((u32)0x100)
 /**
  * qbman_swp_enqueue_direct() - Issue an enqueue command
-- 
2.17.1




* Re: [PATCH v7 2/5] seq_buf: Export seq_buf_printf() to external modules
From: Michael Ellerman @ 2020-05-08 13:44 UTC
  To: Borislav Petkov, Vaibhav Jain
  Cc: Santosh Sivaraj, linux-nvdimm, Aneesh Kumar K . V,
	Cezary Rojewski, linux-kernel, Steven Rostedt, Piotr Maziarz,
	Oliver O'Halloran, Dan Williams, linuxppc-dev
In-Reply-To: <20200508113100.GA19436@zn.tnic>

Borislav Petkov <bp@alien8.de> writes:
> On Fri, May 08, 2020 at 04:19:19PM +0530, Vaibhav Jain wrote:
>> 'seq_buf' provides a very useful abstraction for writing to a string
>> buffer without needing to worry about it over-flowing. However even
>> though the API has been stable for couple of years now its stills not
>> exported to external modules limiting its usage.
>> 
>> Hence this patch proposes update to 'seq_buf.c' to mark
>> seq_buf_printf() which is part of the seq_buf API to be exported to
>> external GPL modules. This symbol will be used in later parts of this
>
> What is "external GPL modules"?

A module that has MODULE_LICENSE("GPL") ?

cheers


* Re: [PATCH v2 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
From: David Hildenbrand @ 2020-05-08 13:42 UTC
  To: Srikar Dronamraju, Michal Hocko
  Cc: Linus Torvalds, linux-kernel, linux-mm, Mel Gorman,
	Kirill A. Shutemov, Andrew Morton, linuxppc-dev,
	Christopher Lameter, Vlastimil Babka
In-Reply-To: <3bfe7469-1d8c-baa4-6d9d-f4786492eaa8@redhat.com>

On 08.05.20 15:39, David Hildenbrand wrote:
> On 08.05.20 15:03, Srikar Dronamraju wrote:
>> * Michal Hocko <mhocko@kernel.org> [2020-05-04 11:37:12]:
>>
>>>>>
>>>>> Have you tested on something else than ppc? Each arch does the NUMA
>>>>> setup separately and this is a big mess. E.g. x86 marks even memory less
>>>>> nodes (see init_memory_less_node) as online.
>>>>>
>>>>
>>>> while I have predominantly tested on ppc, I did test on X86 with CONFIG_NUMA
>>>> enabled/disabled on both single node and multi node machines.
>>>> However, I dont have a cpuless/memoryless x86 system.
>>>
>>> This should be able to emulate inside kvm, I believe.
>>>
>>
>> I did try but somehow not able to get cpuless / memoryless node in a x86 kvm
>> guest.
> 
> I use the following
> 
> #! /bin/bash
> sudo x86_64-softmmu/qemu-system-x86_64 \
>     --enable-kvm \
>     -m 4G,maxmem=20G,slots=2 \
>     -smp sockets=2,cores=2 \
>     -numa node,nodeid=0,cpus=0-1,mem=4G -numa node,nodeid=1,cpus=2-3,mem=0G \

Sorry, this line has to be

-numa node,nodeid=0,cpus=0-3,mem=4G -numa node,nodeid=1,mem=0G \

>     -kernel /home/dhildenb/git/linux/arch/x86_64/boot/bzImage \
>     -append "console=ttyS0 rd.shell rd.luks=0 rd.lvm=0 rd.md=0 rd.dm=0" \
>     -initrd /boot/initramfs-5.2.8-200.fc30.x86_64.img \
>     -machine pc,nvdimm \
>     -nographic \
>     -nodefaults \
>     -chardev stdio,id=serial \
>     -device isa-serial,chardev=serial \
>     -chardev socket,id=monitor,path=/var/tmp/monitor,server,nowait \
>     -mon chardev=monitor,mode=readline
> 
> to get a cpu-less and memory-less node 1. Never tried with node 0.
> 


-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
From: David Hildenbrand @ 2020-05-08 13:39 UTC
  To: Srikar Dronamraju, Michal Hocko
  Cc: Linus Torvalds, linux-kernel, linux-mm, Mel Gorman,
	Kirill A. Shutemov, Andrew Morton, linuxppc-dev,
	Christopher Lameter, Vlastimil Babka
In-Reply-To: <20200508130304.GA1961@linux.vnet.ibm.com>

On 08.05.20 15:03, Srikar Dronamraju wrote:
> * Michal Hocko <mhocko@kernel.org> [2020-05-04 11:37:12]:
> 
>>>>
>>>> Have you tested on something else than ppc? Each arch does the NUMA
>>>> setup separately and this is a big mess. E.g. x86 marks even memory less
>>>> nodes (see init_memory_less_node) as online.
>>>>
>>>
>>> while I have predominantly tested on ppc, I did test on X86 with CONFIG_NUMA
>>> enabled/disabled on both single node and multi node machines.
>>> However, I dont have a cpuless/memoryless x86 system.
>>
>> This should be able to emulate inside kvm, I believe.
>>
> 
> I did try but somehow not able to get cpuless / memoryless node in a x86 kvm
> guest.

I use the following

#! /bin/bash
sudo x86_64-softmmu/qemu-system-x86_64 \
    --enable-kvm \
    -m 4G,maxmem=20G,slots=2 \
    -smp sockets=2,cores=2 \
    -numa node,nodeid=0,cpus=0-1,mem=4G -numa node,nodeid=1,cpus=2-3,mem=0G \
    -kernel /home/dhildenb/git/linux/arch/x86_64/boot/bzImage \
    -append "console=ttyS0 rd.shell rd.luks=0 rd.lvm=0 rd.md=0 rd.dm=0" \
    -initrd /boot/initramfs-5.2.8-200.fc30.x86_64.img \
    -machine pc,nvdimm \
    -nographic \
    -nodefaults \
    -chardev stdio,id=serial \
    -device isa-serial,chardev=serial \
    -chardev socket,id=monitor,path=/var/tmp/monitor,server,nowait \
    -mon chardev=monitor,mode=readline

to get a cpu-less and memory-less node 1. Never tried with node 0.

-- 
Thanks,

David / dhildenb



* Re: [PATCH v4 04/16] powerpc/64s/exceptions: machine check reconcile irq state
From: Michael Ellerman @ 2020-05-08 13:39 UTC
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20200508043408.886394-5-npiggin@gmail.com>

Nicholas Piggin <npiggin@gmail.com> writes:

> pseries fwnmi machine check code pops the soft-irq checks in rtas_call
> (after the previous patch to remove rtas_token from this call path).
             ^
             I changed this to "next" which I think is what you meant?

cheers

> Rather than play whack a mole with these and forever having fragile
> code, it seems better to have the early machine check handler perform
> the same kind of reconcile as the other NMI interrupts.
>
>   WARNING: CPU: 0 PID: 493 at arch/powerpc/kernel/irq.c:343
>   CPU: 0 PID: 493 Comm: a Tainted: G        W
>   NIP:  c00000000001ed2c LR: c000000000042c40 CTR: 0000000000000000
>   REGS: c0000001fffd38b0 TRAP: 0700   Tainted: G        W
>   MSR:  8000000000021003 <SF,ME,RI,LE>  CR: 28000488  XER: 00000000
>   CFAR: c00000000001ec90 IRQMASK: 0
>   GPR00: c000000000043820 c0000001fffd3b40 c0000000012ba300 0000000000000000
>   GPR04: 0000000048000488 0000000000000000 0000000000000000 00000000deadbeef
>   GPR08: 0000000000000080 0000000000000000 0000000000000000 0000000000001001
>   GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000
>   GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>   GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>   GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>   GPR28: 0000000000000000 0000000000000001 c000000001360810 0000000000000000
>   NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100
>   LR [c000000000042c40] unlock_rtas+0x30/0x90
>   Call Trace:
>   [c0000001fffd3b40] [c000000001360810] 0xc000000001360810 (unreliable)
>   [c0000001fffd3b60] [c000000000043820] rtas_call+0x1c0/0x280
>   [c0000001fffd3bb0] [c0000000000dc328] fwnmi_release_errinfo+0x38/0x70
>   [c0000001fffd3c10] [c0000000000dcd8c] pseries_machine_check_realmode+0x1dc/0x540
>   [c0000001fffd3cd0] [c00000000003fe04] machine_check_early+0x54/0x70
>   [c0000001fffd3d00] [c000000000008384] machine_check_early_common+0x134/0x1f0
>   --- interrupt: 200 at 0x13f1307c8
>       LR = 0x7fff888b8528
>   Instruction dump:
>   60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000
>   60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index a42b73efb1a9..072772803b7c 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1116,11 +1116,30 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>  	li	r10,MSR_RI
>  	mtmsrd	r10,1
>  
> +	/*
> +	 * Set IRQS_ALL_DISABLED and save PACAIRQHAPPENED (see
> +	 * system_reset_common)
> +	 */
> +	li	r10,IRQS_ALL_DISABLED
> +	stb	r10,PACAIRQSOFTMASK(r13)
> +	lbz	r10,PACAIRQHAPPENED(r13)
> +	std	r10,RESULT(r1)
> +	ori	r10,r10,PACA_IRQ_HARD_DIS
> +	stb	r10,PACAIRQHAPPENED(r13)
> +
>  	addi	r3,r1,STACK_FRAME_OVERHEAD
>  	bl	machine_check_early
>  	std	r3,RESULT(r1)	/* Save result */
>  	ld	r12,_MSR(r1)
>  
> +	/*
> +	 * Restore soft mask settings.
> +	 */
> +	ld	r10,RESULT(r1)
> +	stb	r10,PACAIRQHAPPENED(r13)
> +	ld	r10,SOFTE(r1)
> +	stb	r10,PACAIRQSOFTMASK(r13)
> +
>  #ifdef CONFIG_PPC_P7_NAP
>  	/*
>  	 * Check if thread was in power saving mode. We come here when any
> -- 
> 2.23.0


* Re: [PATCH v3 1/3] powerpc/numa: Set numa_node for all possible cpus
From: Srikar Dronamraju @ 2020-05-08 13:21 UTC
  To: Christopher Lameter
  Cc: Gautham R Shenoy, Michal Hocko, Linus Torvalds, linux-kernel,
	linux-mm, Mel Gorman, Kirill A. Shutemov, Andrew Morton,
	linuxppc-dev, Vlastimil Babka
In-Reply-To: <alpine.DEB.2.21.2005022254170.28355@www.lameter.com>

* Christopher Lameter <cl@linux.com> [2020-05-02 22:55:16]:

> On Fri, 1 May 2020, Srikar Dronamraju wrote:
> 
> > -	for_each_present_cpu(cpu)
> > -		numa_setup_cpu(cpu);
> > +	for_each_possible_cpu(cpu) {
> > +		/*
> > +		 * Powerpc with CONFIG_NUMA always used to have a node 0,
> > +		 * even if it was memoryless or cpuless. For all cpus that
> > +		 * are possible but not present, cpu_to_node() would point
> > +		 * to node 0. To remove a cpuless, memoryless dummy node,
> > +		 * powerpc need to make sure all possible but not present
> > +		 * cpu_to_node are set to a proper node.
> > +		 */
> > +		if (cpu_present(cpu))
> > +			numa_setup_cpu(cpu);
> > +		else
> > +			set_cpu_numa_node(cpu, first_online_node);
> > +	}
> >  }
> 
> 
> Can this be folded into numa_setup_cpu?
> 
> This looks more like numa_setup_cpu needs to change?
> 

We can fold this into numa_setup_cpu().

However, until now we could be sure that numa_setup_cpu() would be called only
for a present CPU. That assumption will change. There would also be an
additional (inconsequential) check every time a CPU is hotplugged in.

If Michael Ellerman is okay with the change, I can fold it in.

-- 
Thanks and Regards
Srikar Dronamraju

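Christopher's suggestion, folding the not-present case into numa_setup_cpu(), can be sketched in userspace as below. Every name here is hypothetical, and the firmware-node table is an assumed stand-in for however powerpc looks up a present CPU's node. The sketch shows that the helper must then cope with CPUs that are not present, which is the change in assumptions Srikar describes.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 8

/* Hypothetical per-CPU tables standing in for cpu_present(), the
 * node reported for each present CPU, and cpu_to_node(). */
static bool toy_cpu_present[TOY_NR_CPUS];
static int toy_firmware_node[TOY_NR_CPUS];
static int toy_cpu_node[TOY_NR_CPUS];

/*
 * Folded variant: the helper itself handles possible-but-not-present
 * CPUs, pointing them at the first online node instead of leaving
 * them on a default node 0.
 */
static void toy_numa_setup_cpu(int cpu, int first_online_node)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	if (toy_cpu_present[cpu])
		toy_cpu_node[cpu] = toy_firmware_node[cpu];
	else
		toy_cpu_node[cpu] = first_online_node;
}

/* Boot-time loop over every possible CPU, as in the patch. */
static void toy_setup_all_possible(int first_online_node)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_numa_setup_cpu(cpu, first_online_node);
}
```

With the helper handling both cases, the caller's loop stays a plain for-each-possible-CPU loop. The cost is the extra presence check whenever the helper runs for a hotplugged CPU.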

* [PATCH v2] powerpc/spufs: Rework fcheck() usage
From: Michael Ellerman @ 2020-05-08 13:06 UTC
  To: linuxppc-dev; +Cc: hch, jk

Currently the spu coredump code triggers an RCU warning:

  =============================
  WARNING: suspicious RCU usage
  5.7.0-rc3-01755-g7cd49f0b7ec7 #1 Not tainted
  -----------------------------
  include/linux/fdtable.h:95 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by spu-coredump/1343:
   #0: c0000007fa22f430 (sb_writers#2){.+.+}-{0:0}, at: .do_coredump+0x1010/0x13c8

  stack backtrace:
  CPU: 0 PID: 1343 Comm: spu-coredump Not tainted 5.7.0-rc3-01755-g7cd49f0b7ec7 #1
  Call Trace:
    .dump_stack+0xec/0x15c (unreliable)
    .lockdep_rcu_suspicious+0x120/0x144
    .coredump_next_context+0x148/0x158
    .spufs_coredump_extra_notes_size+0x54/0x190
    .elf_coredump_extra_notes_size+0x34/0x50
    .elf_core_dump+0xe48/0x19d0
    .do_coredump+0xe50/0x13c8
    .get_signal+0x864/0xd88
    .do_notify_resume+0x158/0x3c8
    .interrupt_exit_user_prepare+0x19c/0x208
    interrupt_return+0x14/0x1c0

This comes from fcheck_files() via fcheck().

It's pretty clearly documented that fcheck() must be wrapped with
rcu_read_lock(); adding that fixes the RCU warning.

hch points out that once we've released the RCU read lock the file may
be closed and freed, which would leave us with a pointer to a freed
spu_context.

To avoid that, take a reference to the spu_context while we hold the
RCU read lock, and drop that reference later once we're done with the
context.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/platforms/cell/spufs/coredump.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

v2: Take a reference and hold it until we're done.

diff --git a/arch/powerpc/platforms/cell/spufs/coredump.c b/arch/powerpc/platforms/cell/spufs/coredump.c
index 8b3296b62f65..37c155254cd5 100644
--- a/arch/powerpc/platforms/cell/spufs/coredump.c
+++ b/arch/powerpc/platforms/cell/spufs/coredump.c
@@ -82,13 +82,20 @@ static int match_context(const void *v, struct file *file, unsigned fd)
  */
 static struct spu_context *coredump_next_context(int *fd)
 {
+	struct spu_context *ctx;
 	struct file *file;
 	int n = iterate_fd(current->files, *fd, match_context, NULL);
 	if (!n)
 		return NULL;
 	*fd = n - 1;
+
+	rcu_read_lock();
 	file = fcheck(*fd);
-	return SPUFS_I(file_inode(file))->i_ctx;
+	ctx = SPUFS_I(file_inode(file))->i_ctx;
+	get_spu_context(ctx);
+	rcu_read_unlock();
+
+	return ctx;
 }
 
 int spufs_coredump_extra_notes_size(void)
@@ -99,17 +106,23 @@ int spufs_coredump_extra_notes_size(void)
 	fd = 0;
 	while ((ctx = coredump_next_context(&fd)) != NULL) {
 		rc = spu_acquire_saved(ctx);
-		if (rc)
+		if (rc) {
+			put_spu_context(ctx);
 			break;
+		}
+
 		rc = spufs_ctx_note_size(ctx, fd);
 		spu_release_saved(ctx);
-		if (rc < 0)
+		if (rc < 0) {
+			put_spu_context(ctx);
 			break;
+		}
 
 		size += rc;
 
 		/* start searching the next fd next time */
 		fd++;
+		put_spu_context(ctx);
 	}
 
 	return size;
-- 
2.25.1


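The get/put discipline introduced in v2 can be modelled in userspace. The sketch below uses hypothetical names and does not model RCU itself. It only checks the invariant the patch relies on: the lookup takes a reference, and every path out of the loop body drops it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted context standing in for struct spu_context. */
struct toy_ctx {
	int refcount;
	int fail_acquire;	/* simulates spu_acquire_saved() failing */
	int note_size;
};

static void toy_get(struct toy_ctx *ctx)
{
	ctx->refcount++;
}

static void toy_put(struct toy_ctx *ctx)
{
	assert(ctx->refcount > 0);
	ctx->refcount--;
}

/* Models coredump_next_context(): the lookup and the get happen
 * together (in the kernel, inside rcu_read_lock()). Returns the
 * context at *idx with a reference held, or NULL when done. */
static struct toy_ctx *toy_next(struct toy_ctx *ctxs, int n, int *idx)
{
	if (*idx >= n)
		return NULL;
	toy_get(&ctxs[*idx]);
	return &ctxs[*idx];
}

/* Every exit from the loop body drops the reference, so counts balance. */
static int toy_notes_size(struct toy_ctx *ctxs, int n)
{
	struct toy_ctx *ctx;
	int idx = 0, size = 0;

	while ((ctx = toy_next(ctxs, n, &idx)) != NULL) {
		if (ctx->fail_acquire) {
			toy_put(ctx);
			break;
		}
		size += ctx->note_size;
		idx++;
		toy_put(ctx);
	}
	return size;
}
```

After the loop, every refcount is back to its starting value whether the loop ended normally or broke out on an error.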

* Re: [PATCH v3 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
From: Srikar Dronamraju @ 2020-05-08 13:05 UTC
  To: Christopher Lameter
  Cc: Gautham R Shenoy, Michal Hocko, Linus Torvalds, linux-kernel,
	linux-mm, Mel Gorman, Kirill A. Shutemov, Andrew Morton,
	linuxppc-dev, Vlastimil Babka
In-Reply-To: <alpine.DEB.2.21.2005022304190.28355@www.lameter.com>

* Christopher Lameter <cl@linux.com> [2020-05-02 23:05:28]:

> On Fri, 1 May 2020, Srikar Dronamraju wrote:
> 
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -116,8 +116,10 @@ EXPORT_SYMBOL(latent_entropy);
> >   */
> >  nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
> >  	[N_POSSIBLE] = NODE_MASK_ALL,
> > +#ifdef CONFIG_NUMA
> > +	[N_ONLINE] = NODE_MASK_NONE,
> 
> Hmmm.... I would have expected that you would have added something early
> in boot that would mark the current node (whatever is is) online instead?

Do correct me, but these are static structure initializations in
page_alloc.c. Wouldn't these happen well before NUMA initialization?
I think we already mark nodes as online as soon as we detect them.

-- 
Thanks and Regards
Srikar Dronamraju

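Srikar's point is about ordering. A designated initializer like the one in the node_states[] table is fixed in the kernel image before any boot code runs, and nodes are marked online later, during NUMA detection. Below is a toy model with hypothetical names and a 64-bit mask in place of nodemask_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy node-state bitmaps (one bit per node). */
enum toy_node_state { TOY_N_POSSIBLE, TOY_N_ONLINE, TOY_NR_STATES };

#define TOY_MASK_ALL	UINT64_MAX
#define TOY_MASK_NONE	UINT64_C(0)

/* Static initialization: already in place before any code runs.
 * With the change under discussion, no node starts out online. */
static uint64_t toy_node_states[TOY_NR_STATES] = {
	[TOY_N_POSSIBLE] = TOY_MASK_ALL,
	[TOY_N_ONLINE] = TOY_MASK_NONE,
};

/* Later, arch NUMA setup marks each node it detects as online. */
static void toy_node_set_online(int nid)
{
	assert(nid >= 0 && nid < 64);
	toy_node_states[TOY_N_ONLINE] |= UINT64_C(1) << nid;
}

static int toy_node_online(int nid)
{
	assert(nid >= 0 && nid < 64);
	return (int)((toy_node_states[TOY_N_ONLINE] >> nid) & 1);
}
```

In this model, node 0 comes online only if detection finds it, which is the behaviour the patch is after for a memoryless, cpuless node 0.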

* [tip: perf/core] perf metricgroups: Enhance JSON/metric infrastructure to handle "?"
From: tip-bot2 for Kajol Jain @ 2020-05-08 13:05 UTC
  To: linux-tip-commits
  Cc: Mark Rutland, Madhavan Srinivasan, Peter Zijlstra, Jin Yao,
	Jiri Olsa, Kan Liang, Andi Kleen, x86, Alexander Shishkin,
	Anju T Sudhakar, Mamatha Inamdar, Sukadev Bhattiprolu,
	Ravi Bangoria, Kajol Jain, Arnaldo Carvalho de Melo, Joe Mario,
	Namhyung Kim, Thomas Gleixner, Michael Petlan, Greg Kroah-Hartman,
	LKML, linuxppc-dev
In-Reply-To: <20200401203340.31402-5-kjain@linux.ibm.com>

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     1e1a873dc67fc748cc319a27603f33db91027730
Gitweb:        https://git.kernel.org/tip/1e1a873dc67fc748cc319a27603f33db91027730
Author:        Kajol Jain <kjain@linux.ibm.com>
AuthorDate:    Thu, 02 Apr 2020 02:03:37 +05:30
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Thu, 30 Apr 2020 10:48:33 -03:00

perf metricgroups: Enhance JSON/metric infrastructure to handle "?"

Patch enhances current metric infrastructure to handle "?" in the metric
expression. The "?" can be use for parameters whose value not known
while creating metric events and which can be replace later at runtime
to the proper value. It also add flexibility to create multiple events
out of single metric event added in JSON file.

Patch adds function 'arch_get_runtimeparam' which is a arch specific
function, returns the count of metric events need to be created.  By
default it return 1.

This infrastructure is needed for hv_24x7 socket/chip level events.
"hv_24x7" chip level events need the specific chip-id for which the data
is requested. The function 'arch_get_runtimeparam' is implemented in
header.c and extracts the number of sockets from the sysfs file "sockets"
under "/sys/devices/hv_24x7/interface/".

With this patch, we create as many metric events as defined by
runtime_param.

For that, a loop is added in the function 'metricgroup__add_metric',
which creates multiple events at run time depending on the return value
of 'arch_get_runtimeparam' and merges those events into 'group_list'.

To achieve that, we pass this parameter value to the `expr__find_other`
function and replace the "?" present in the metric expression with this
value.

Since our JSON file contains a single metric event, we create multiple
events out of it.

To show which data count belongs to which parameter value, we also print
the parameter value in the generic_metric function.

For example,

  command:# ./perf stat  -M PowerBUS_Frequency -C 0 -I 1000
    1.000101867  9,356,933  hv_24x7/pm_pb_cyc,chip=0/ #  2.3 GHz  PowerBUS_Frequency_0
    1.000101867  9,366,134  hv_24x7/pm_pb_cyc,chip=1/ #  2.3 GHz  PowerBUS_Frequency_1
    2.000314878  9,365,868  hv_24x7/pm_pb_cyc,chip=0/ #  2.3 GHz  PowerBUS_Frequency_0
    2.000314878  9,366,092  hv_24x7/pm_pb_cyc,chip=1/ #  2.3 GHz  PowerBUS_Frequency_1

Here, the _0 and _1 suffixes after PowerBUS_Frequency give the parameter value.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20200401203340.31402-5-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/powerpc/util/header.c |  8 +++++++-
 tools/perf/tests/expr.c               |  8 +++----
 tools/perf/util/expr.c                | 11 +++++-----
 tools/perf/util/expr.h                |  5 +++--
 tools/perf/util/expr.l                | 27 ++++++++++++++++++-------
 tools/perf/util/metricgroup.c         | 28 +++++++++++++++++++++++---
 tools/perf/util/metricgroup.h         |  2 ++-
 tools/perf/util/stat-shadow.c         | 17 ++++++++++------
 8 files changed, 79 insertions(+), 27 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/header.c b/tools/perf/arch/powerpc/util/header.c
index 3b4cdfc..d487007 100644
--- a/tools/perf/arch/powerpc/util/header.c
+++ b/tools/perf/arch/powerpc/util/header.c
@@ -7,6 +7,8 @@
 #include <string.h>
 #include <linux/stringify.h>
 #include "header.h"
+#include "metricgroup.h"
+#include <api/fs/fs.h>
 
 #define mfspr(rn)       ({unsigned long rval; \
 			 asm volatile("mfspr %0," __stringify(rn) \
@@ -44,3 +46,9 @@ get_cpuid_str(struct perf_pmu *pmu __maybe_unused)
 
 	return bufp;
 }
+
+int arch_get_runtimeparam(void)
+{
+	int count;
+	return sysfs__read_int("/devices/hv_24x7/interface/sockets", &count) < 0 ? 1 : count;
+}
diff --git a/tools/perf/tests/expr.c b/tools/perf/tests/expr.c
index ea10fc4..516504c 100644
--- a/tools/perf/tests/expr.c
+++ b/tools/perf/tests/expr.c
@@ -10,7 +10,7 @@ static int test(struct expr_parse_ctx *ctx, const char *e, double val2)
 {
 	double val;
 
-	if (expr__parse(&val, ctx, e))
+	if (expr__parse(&val, ctx, e, 1))
 		TEST_ASSERT_VAL("parse test failed", 0);
 	TEST_ASSERT_VAL("unexpected value", val == val2);
 	return 0;
@@ -44,15 +44,15 @@ int test__expr(struct test *t __maybe_unused, int subtest __maybe_unused)
 		return ret;
 
 	p = "FOO/0";
-	ret = expr__parse(&val, &ctx, p);
+	ret = expr__parse(&val, &ctx, p, 1);
 	TEST_ASSERT_VAL("division by zero", ret == -1);
 
 	p = "BAR/";
-	ret = expr__parse(&val, &ctx, p);
+	ret = expr__parse(&val, &ctx, p, 1);
 	TEST_ASSERT_VAL("missing operand", ret == -1);
 
 	TEST_ASSERT_VAL("find other",
-			expr__find_other("FOO + BAR + BAZ + BOZO", "FOO", &other, &num_other) == 0);
+			expr__find_other("FOO + BAR + BAZ + BOZO", "FOO", &other, &num_other, 1) == 0);
 	TEST_ASSERT_VAL("find other", num_other == 3);
 	TEST_ASSERT_VAL("find other", !strcmp(other[0], "BAR"));
 	TEST_ASSERT_VAL("find other", !strcmp(other[1], "BAZ"));
diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
index c3382d5..aa631e3 100644
--- a/tools/perf/util/expr.c
+++ b/tools/perf/util/expr.c
@@ -27,10 +27,11 @@ void expr__ctx_init(struct expr_parse_ctx *ctx)
 
 static int
 __expr__parse(double *val, struct expr_parse_ctx *ctx, const char *expr,
-	      int start)
+	      int start, int runtime)
 {
 	struct expr_scanner_ctx scanner_ctx = {
 		.start_token = start,
+		.runtime = runtime,
 	};
 	YY_BUFFER_STATE buffer;
 	void *scanner;
@@ -54,9 +55,9 @@ __expr__parse(double *val, struct expr_parse_ctx *ctx, const char *expr,
 	return ret;
 }
 
-int expr__parse(double *final_val, struct expr_parse_ctx *ctx, const char *expr)
+int expr__parse(double *final_val, struct expr_parse_ctx *ctx, const char *expr, int runtime)
 {
-	return __expr__parse(final_val, ctx, expr, EXPR_PARSE) ? -1 : 0;
+	return __expr__parse(final_val, ctx, expr, EXPR_PARSE, runtime) ? -1 : 0;
 }
 
 static bool
@@ -74,13 +75,13 @@ already_seen(const char *val, const char *one, const char **other,
 }
 
 int expr__find_other(const char *expr, const char *one, const char ***other,
-		     int *num_other)
+		     int *num_other, int runtime)
 {
 	int err, i = 0, j = 0;
 	struct expr_parse_ctx ctx;
 
 	expr__ctx_init(&ctx);
-	err = __expr__parse(NULL, &ctx, expr, EXPR_OTHER);
+	err = __expr__parse(NULL, &ctx, expr, EXPR_OTHER, runtime);
 	if (err)
 		return -1;
 
diff --git a/tools/perf/util/expr.h b/tools/perf/util/expr.h
index 0938ad1..87d627b 100644
--- a/tools/perf/util/expr.h
+++ b/tools/perf/util/expr.h
@@ -17,12 +17,13 @@ struct expr_parse_ctx {
 
 struct expr_scanner_ctx {
 	int start_token;
+	int runtime;
 };
 
 void expr__ctx_init(struct expr_parse_ctx *ctx);
 void expr__add_id(struct expr_parse_ctx *ctx, const char *id, double val);
-int expr__parse(double *final_val, struct expr_parse_ctx *ctx, const char *expr);
+int expr__parse(double *final_val, struct expr_parse_ctx *ctx, const char *expr, int runtime);
 int expr__find_other(const char *expr, const char *one, const char ***other,
-		int *num_other);
+		int *num_other, int runtime);
 
 #endif
diff --git a/tools/perf/util/expr.l b/tools/perf/util/expr.l
index 2582c24..74b9b59 100644
--- a/tools/perf/util/expr.l
+++ b/tools/perf/util/expr.l
@@ -35,7 +35,7 @@ static int value(yyscan_t scanner, int base)
  * Allow @ instead of / to be able to specify pmu/event/ without
  * conflicts with normal division.
  */
-static char *normalize(char *str)
+static char *normalize(char *str, int runtime)
 {
 	char *ret = str;
 	char *dst = str;
@@ -45,6 +45,19 @@ static char *normalize(char *str)
 			*dst++ = '/';
 		else if (*str == '\\')
 			*dst++ = *++str;
+		 else if (*str == '?') {
+			char *paramval;
+			int i = 0;
+			int size = asprintf(&paramval, "%d", runtime);
+
+			if (size < 0)
+				*dst++ = '0';
+			else {
+				while (i < size)
+					*dst++ = paramval[i++];
+				free(paramval);
+			}
+		}
 		else
 			*dst++ = *str;
 		str++;
@@ -54,16 +67,16 @@ static char *normalize(char *str)
 	return ret;
 }
 
-static int str(yyscan_t scanner, int token)
+static int str(yyscan_t scanner, int token, int runtime)
 {
 	YYSTYPE *yylval = expr_get_lval(scanner);
 	char *text = expr_get_text(scanner);
 
-	yylval->str = normalize(strdup(text));
+	yylval->str = normalize(strdup(text), runtime);
 	if (!yylval->str)
 		return EXPR_ERROR;
 
-	yylval->str = normalize(yylval->str);
+	yylval->str = normalize(yylval->str, runtime);
 	return token;
 }
 %}
@@ -72,8 +85,8 @@ number		[0-9]+
 
 sch		[-,=]
 spec		\\{sch}
-sym		[0-9a-zA-Z_\.:@]+
-symbol		{spec}*{sym}*{spec}*{sym}*
+sym		[0-9a-zA-Z_\.:@?]+
+symbol		{spec}*{sym}*{spec}*{sym}*{spec}*{sym}
 
 %%
 	struct expr_scanner_ctx *sctx = expr_get_extra(yyscanner);
@@ -93,7 +106,7 @@ if		{ return IF; }
 else		{ return ELSE; }
 #smt_on		{ return SMT_ON; }
 {number}	{ return value(yyscanner, 10); }
-{symbol}	{ return str(yyscanner, ID); }
+{symbol}	{ return str(yyscanner, ID, sctx->runtime); }
 "|"		{ return '|'; }
 "^"		{ return '^'; }
 "&"		{ return '&'; }
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 7ad81c8..b071df3 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -90,6 +90,7 @@ struct egroup {
 	const char *metric_name;
 	const char *metric_expr;
 	const char *metric_unit;
+	int runtime;
 };
 
 static struct evsel *find_evsel_group(struct evlist *perf_evlist,
@@ -202,6 +203,7 @@ static int metricgroup__setup_events(struct list_head *groups,
 		expr->metric_name = eg->metric_name;
 		expr->metric_unit = eg->metric_unit;
 		expr->metric_events = metric_events;
+		expr->runtime = eg->runtime;
 		list_add(&expr->nd, &me->head);
 	}
 
@@ -485,15 +487,20 @@ static bool metricgroup__has_constraint(struct pmu_event *pe)
 	return false;
 }
 
+int __weak arch_get_runtimeparam(void)
+{
+	return 1;
+}
+
 static int __metricgroup__add_metric(struct strbuf *events,
-			struct list_head *group_list, struct pmu_event *pe)
+		struct list_head *group_list, struct pmu_event *pe, int runtime)
 {
 
 	const char **ids;
 	int idnum;
 	struct egroup *eg;
 
-	if (expr__find_other(pe->metric_expr, NULL, &ids, &idnum) < 0)
+	if (expr__find_other(pe->metric_expr, NULL, &ids, &idnum, runtime) < 0)
 		return -EINVAL;
 
 	if (events->len > 0)
@@ -513,6 +520,7 @@ static int __metricgroup__add_metric(struct strbuf *events,
 	eg->metric_name = pe->metric_name;
 	eg->metric_expr = pe->metric_expr;
 	eg->metric_unit = pe->unit;
+	eg->runtime = runtime;
 	list_add_tail(&eg->nd, group_list);
 
 	return 0;
@@ -540,7 +548,21 @@ static int metricgroup__add_metric(const char *metric, struct strbuf *events,
 
 			pr_debug("metric expr %s for %s\n", pe->metric_expr, pe->metric_name);
 
-			ret = __metricgroup__add_metric(events,	group_list, pe);
+			if (!strstr(pe->metric_expr, "?")) {
+				ret = __metricgroup__add_metric(events, group_list, pe, 1);
+			} else {
+				int j, count;
+
+				count = arch_get_runtimeparam();
+
+				/* This loop is added to create multiple
+				 * events depend on count value and add
+				 * those events to group_list.
+				 */
+
+				for (j = 0; j < count; j++)
+					ret = __metricgroup__add_metric(events, group_list, pe, j);
+			}
 			if (ret == -ENOMEM)
 				break;
 		}
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index 475c7f9..6b09eb3 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -22,6 +22,7 @@ struct metric_expr {
 	const char *metric_name;
 	const char *metric_unit;
 	struct evsel **metric_events;
+	int runtime;
 };
 
 struct metric_event *metricgroup__lookup(struct rblist *metric_events,
@@ -34,4 +35,5 @@ int metricgroup__parse_groups(const struct option *opt,
 void metricgroup__print(bool metrics, bool groups, char *filter,
 			bool raw, bool details);
 bool metricgroup__has_metric(const char *metric);
+int arch_get_runtimeparam(void);
 #endif
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 1ad5c5b..518fbb3 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -336,7 +336,7 @@ void perf_stat__collect_metric_expr(struct evlist *evsel_list)
 		metric_events = counter->metric_events;
 		if (!metric_events) {
 			if (expr__find_other(counter->metric_expr, counter->name,
-						&metric_names, &num_metric_names) < 0)
+						&metric_names, &num_metric_names, 1) < 0)
 				continue;
 
 			metric_events = calloc(sizeof(struct evsel *),
@@ -723,6 +723,7 @@ static void generic_metric(struct perf_stat_config *config,
 			   char *name,
 			   const char *metric_name,
 			   const char *metric_unit,
+			   int runtime,
 			   double avg,
 			   int cpu,
 			   struct perf_stat_output_ctx *out,
@@ -777,7 +778,7 @@ static void generic_metric(struct perf_stat_config *config,
 	}
 
 	if (!metric_events[i]) {
-		if (expr__parse(&ratio, &pctx, metric_expr) == 0) {
+		if (expr__parse(&ratio, &pctx, metric_expr, runtime) == 0) {
 			char *unit;
 			char metric_bf[64];
 
@@ -786,9 +787,13 @@ static void generic_metric(struct perf_stat_config *config,
 					&unit, &scale) >= 0) {
 					ratio *= scale;
 				}
-
-				scnprintf(metric_bf, sizeof(metric_bf),
+				if (strstr(metric_expr, "?"))
+					scnprintf(metric_bf, sizeof(metric_bf),
+					  "%s  %s_%d", unit, metric_name, runtime);
+				else
+					scnprintf(metric_bf, sizeof(metric_bf),
 					  "%s  %s", unit, metric_name);
+
 				print_metric(config, ctxp, NULL, "%8.1f",
 					     metric_bf, ratio);
 			} else {
@@ -1022,7 +1027,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
 			print_metric(config, ctxp, NULL, NULL, name, 0);
 	} else if (evsel->metric_expr) {
 		generic_metric(config, evsel->metric_expr, evsel->metric_events, evsel->name,
-				evsel->metric_name, NULL, avg, cpu, out, st);
+				evsel->metric_name, NULL, 1, avg, cpu, out, st);
 	} else if (runtime_stat_n(st, STAT_NSECS, 0, cpu) != 0) {
 		char unit = 'M';
 		char unit_buf[10];
@@ -1051,7 +1056,7 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
 				out->new_line(config, ctxp);
 			generic_metric(config, mexp->metric_expr, mexp->metric_events,
 					evsel->name, mexp->metric_name,
-					mexp->metric_unit, avg, cpu, out, st);
+					mexp->metric_unit, mexp->runtime, avg, cpu, out, st);
 		}
 	}
 	if (num == 0)

^ permalink raw reply related

* [tip: perf/core] perf vendor events power9: Add hv_24x7 socket/chip level metric events
From: tip-bot2 for Kajol Jain @ 2020-05-08 13:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Mark Rutland, Madhavan Srinivasan, Peter Zijlstra, Jin Yao,
	Jiri Olsa, Kan Liang, Andi Kleen, x86, Alexander Shishkin,
	Anju T Sudhakar, Mamatha Inamdar, Sukadev Bhattiprolu,
	Ravi Bangoria, Kajol Jain, Arnaldo Carvalho de Melo, Joe Mario,
	Namhyung Kim, Thomas Gleixner, Michael Petlan, Greg Kroah-Hartman,
	LKML, linuxppc-dev
In-Reply-To: <20200401203340.31402-8-kjain@linux.ibm.com>

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     354575c00d61c174e0ff070f56cf3cdbe6d23f9e
Gitweb:        https://git.kernel.org/tip/354575c00d61c174e0ff070f56cf3cdbe6d23f9e
Author:        Kajol Jain <kjain@linux.ibm.com>
AuthorDate:    Thu, 02 Apr 2020 02:03:40 +05:30
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Thu, 30 Apr 2020 10:48:33 -03:00

perf vendor events power9: Add hv_24x7 socket/chip level metric events

The hv_24x7 feature in IBM® POWER9™ processor-based servers provides the
facility to continuously collect large numbers of hardware performance
metrics efficiently and accurately.

This patch adds an hv_24x7 metric file for various socket/chip
resources.

Result:

power9 platform:

  command:# ./perf stat --metric-only -M Memory_RD_BW_Chip -C 0 -I 1000

     1.000096188          0.9           0.3
     2.000285720          0.5           0.1
     3.000424990          0.4           0.1

  command:# ./perf stat --metric-only -M PowerBUS_Frequency -C 0 -I 1000

     1.000097981          2.3           2.3
     2.000291713          2.3           2.3
     3.000421719          2.3           2.3
     4.000550912          2.3           2.3
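As a rough sanity check of the PowerBUS_Frequency values, assuming perf multiplies the raw PM_PB_CYC count for each 1000 ms interval by the numeric part of "ScaleUnit" (2.5e-7) and reports the result in its unit part (GHz), a per-chip count of 9,356,933 (from the example output in the metric-infrastructure commit above) gives:

```latex
f \approx 9{,}356{,}933 \times 2.5 \times 10^{-7}\ \text{GHz} \approx 2.34\ \text{GHz}
```

which the "%8.1f" format prints as 2.3.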

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20200401203340.31402-8-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/pmu-events/arch/powerpc/power9/nest_metrics.json | 19 +++++++-
 1 file changed, 19 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power9/nest_metrics.json

diff --git a/tools/perf/pmu-events/arch/powerpc/power9/nest_metrics.json b/tools/perf/pmu-events/arch/powerpc/power9/nest_metrics.json
new file mode 100644
index 0000000..c121e52
--- /dev/null
+++ b/tools/perf/pmu-events/arch/powerpc/power9/nest_metrics.json
@@ -0,0 +1,19 @@
+[
+    {
+        "MetricExpr": "(hv_24x7@PM_MCS01_128B_RD_DISP_PORT01\\,chip\\=?@ + hv_24x7@PM_MCS01_128B_RD_DISP_PORT23\\,chip\\=?@ + hv_24x7@PM_MCS23_128B_RD_DISP_PORT01\\,chip\\=?@ + hv_24x7@PM_MCS23_128B_RD_DISP_PORT23\\,chip\\=?@)",
+        "MetricName": "Memory_RD_BW_Chip",
+        "MetricGroup": "Memory_BW",
+        "ScaleUnit": "1.6e-2MB"
+    },
+    {
+	"MetricExpr": "(hv_24x7@PM_MCS01_128B_WR_DISP_PORT01\\,chip\\=?@ + hv_24x7@PM_MCS01_128B_WR_DISP_PORT23\\,chip\\=?@ + hv_24x7@PM_MCS23_128B_WR_DISP_PORT01\\,chip\\=?@ + hv_24x7@PM_MCS23_128B_WR_DISP_PORT23\\,chip\\=?@ )",
+        "MetricName": "Memory_WR_BW_Chip",
+        "MetricGroup": "Memory_BW",
+        "ScaleUnit": "1.6e-2MB"
+    },
+    {
+	"MetricExpr": "(hv_24x7@PM_PB_CYC\\,chip\\=?@ )",
+        "MetricName": "PowerBUS_Frequency",
+        "ScaleUnit": "2.5e-7GHz"
+    }
+]

^ permalink raw reply related

* [tip: perf/core] perf tests expr: Added test for runtime param in metric expression
From: tip-bot2 for Kajol Jain @ 2020-05-08 13:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Mark Rutland, Madhavan Srinivasan, Peter Zijlstra, Jin Yao,
	Jiri Olsa, Kan Liang, Andi Kleen, x86, Alexander Shishkin,
	Anju T Sudhakar, Mamatha Inamdar, Sukadev Bhattiprolu,
	Ravi Bangoria, Kajol Jain, Arnaldo Carvalho de Melo, Joe Mario,
	Namhyung Kim, Thomas Gleixner, Michael Petlan, Greg Kroah-Hartman,
	LKML, linuxppc-dev
In-Reply-To: <20200401203340.31402-6-kjain@linux.ibm.com>

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     9022608ec5babbb0fa631234098d52895e7e34d8
Gitweb:        https://git.kernel.org/tip/9022608ec5babbb0fa631234098d52895e7e34d8
Author:        Kajol Jain <kjain@linux.ibm.com>
AuthorDate:    Thu, 02 Apr 2020 02:03:38 +05:30
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Thu, 30 Apr 2020 10:48:33 -03:00

perf tests expr: Added test for runtime param in metric expression

Add a test case for parsing "?" in a metric expression.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20200401203340.31402-6-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/tests/expr.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/perf/tests/expr.c b/tools/perf/tests/expr.c
index 516504c..f9e8e56 100644
--- a/tools/perf/tests/expr.c
+++ b/tools/perf/tests/expr.c
@@ -59,6 +59,14 @@ int test__expr(struct test *t __maybe_unused, int subtest __maybe_unused)
 	TEST_ASSERT_VAL("find other", !strcmp(other[2], "BOZO"));
 	TEST_ASSERT_VAL("find other", other[3] == NULL);
 
+	TEST_ASSERT_VAL("find other",
+			expr__find_other("EVENT1\\,param\\=?@ + EVENT2\\,param\\=?@", NULL,
+				   &other, &num_other, 3) == 0);
+	TEST_ASSERT_VAL("find other", num_other == 2);
+	TEST_ASSERT_VAL("find other", !strcmp(other[0], "EVENT1,param=3/"));
+	TEST_ASSERT_VAL("find other", !strcmp(other[1], "EVENT2,param=3/"));
+	TEST_ASSERT_VAL("find other", other[2] == NULL);
+
 	for (i = 0; i < num_other; i++)
 		zfree(&other[i]);
 	free((void *)other);

^ permalink raw reply related

* [tip: perf/core] perf tools: Enable Hz/hz printing for --metric-only option
From: tip-bot2 for Kajol Jain @ 2020-05-08 13:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Mark Rutland, Madhavan Srinivasan, Peter Zijlstra, Jin Yao,
	Jiri Olsa, Kan Liang, Andi Kleen, x86, Alexander Shishkin,
	Anju T Sudhakar, Mamatha Inamdar, Sukadev Bhattiprolu,
	Ravi Bangoria, Kajol Jain, Arnaldo Carvalho de Melo, Joe Mario,
	Namhyung Kim, Thomas Gleixner, Michael Petlan, Greg Kroah-Hartman,
	LKML, linuxppc-dev
In-Reply-To: <20200401203340.31402-7-kjain@linux.ibm.com>

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     3351c6da896bf521b118bfbb699fbda8f2a816b3
Gitweb:        https://git.kernel.org/tip/3351c6da896bf521b118bfbb699fbda8f2a816b3
Author:        Kajol Jain <kjain@linux.ibm.com>
AuthorDate:    Thu, 02 Apr 2020 02:03:39 +05:30
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Thu, 30 Apr 2020 10:48:33 -03:00

perf tools: Enable Hz/hz printing for --metric-only option

Commit 54b5091606c18 ("perf stat: Implement --metric-only mode") added
the function 'valid_only_metric()', which drops metrics whose "ScaleUnit"
contains "Hz" or "hz". This patch enables them again, since hv_24x7
supports a couple of frequency events.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20200401203340.31402-7-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/stat-display.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 9e757d1..679aaa6 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -237,8 +237,6 @@ static bool valid_only_metric(const char *unit)
 	if (!unit)
 		return false;
 	if (strstr(unit, "/sec") ||
-	    strstr(unit, "hz") ||
-	    strstr(unit, "Hz") ||
 	    strstr(unit, "CPUs utilized"))
 		return false;
 	return true;

^ permalink raw reply related

* Re: [PATCH v2 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
From: Srikar Dronamraju @ 2020-05-08 13:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linus Torvalds, linux-kernel, linux-mm, Mel Gorman,
	Kirill A. Shutemov, Andrew Morton, linuxppc-dev,
	Christopher Lameter, Vlastimil Babka
In-Reply-To: <20200504093712.GL22838@dhcp22.suse.cz>

* Michal Hocko <mhocko@kernel.org> [2020-05-04 11:37:12]:

> > > 
> > > Have you tested on something else than ppc? Each arch does the NUMA
> > > setup separately and this is a big mess. E.g. x86 marks even memory less
> > > nodes (see init_memory_less_node) as online.
> > > 
> > 
> > while I have predominantly tested on ppc, I did test on X86 with CONFIG_NUMA
> > enabled/disabled on both single node and multi node machines.
> > However, I dont have a cpuless/memoryless x86 system.
> 
> This should be able to emulate inside kvm, I believe.
> 

I did try, but somehow I was not able to get a cpuless/memoryless node in
an x86 KVM guest.

Also, I don't see how to enable HAVE_MEMORYLESS_NODES on an x86 system.
# git grep -w HAVE_MEMORYLESS_NODES | cat
arch/ia64/Kconfig:config HAVE_MEMORYLESS_NODES
arch/powerpc/Kconfig:config HAVE_MEMORYLESS_NODES
#
I force-enabled it, but it got disabled during the kernel build.
Maybe I am missing something.

> > 
> > So we have a redundant page hinting numa faults which we can avoid.
> 
> interesting. Does this lead to any observable differences? Btw. it would
> be really great to describe how the online state influences the numa
> balancing.
> 

If numa_balancing is enabled, there is a check on whether the number of
online nodes is 1. If it is, numa_balancing gets disabled; otherwise it
stays as is. In this case, without the patch, both the actual node
(node nr > 0) and node 0 were marked online.

Here are 2 sample NUMA programs.

numa01.sh is a set of 2 processes, each running as many threads as there
are CPUs; each thread does 50 loops of operations on 3GB of
process-shared memory.

numa02.sh is a single process with as many threads as there are CPUs;
each thread does 800 loops of operations on 32MB of thread-local memory.

Without the patch:
Testcase         Time:  Min      Max      Avg      StdDev
./numa01.sh      Real:  149.62   149.66   149.64   0.02
./numa01.sh      Sys:   3.21     3.71     3.46     0.25
./numa01.sh      User:  4755.13  4758.15  4756.64  1.51
./numa02.sh      Real:  24.98    25.02    25.00    0.02
./numa02.sh      Sys:   0.51     0.59     0.55     0.04
./numa02.sh      User:  790.28   790.88   790.58   0.30

With the patch:
Testcase         Time:  Min      Max      Avg      StdDev  %Change
./numa01.sh      Real:  149.44   149.46   149.45   0.01    0.127133%
./numa01.sh      Sys:   0.71     0.89     0.80     0.09    332.5%
./numa01.sh      User:  4754.19  4754.48  4754.33  0.15    0.0485873%
./numa02.sh      Real:  24.97    24.98    24.98    0.00    0.0800641%
./numa02.sh      Sys:   0.26     0.41     0.33     0.08    66.6667%
./numa02.sh      User:  789.75   790.28   790.01   0.27    0.072151%

numa01.sh
param                   no_patch    with_patch  %Change
-----                   ----------  ----------  -------
numa_hint_faults        1131164     0           -100%
numa_hint_faults_local  1131164     0           -100%
numa_hit                213696      214244      0.256439%
numa_local              213696      214244      0.256439%
numa_pte_updates        1131294     0           -100%
pgfault                 1380845     241424      -82.5162%
pgmajfault              75          60          -20%

numa02.sh
param                   no_patch    with_patch  %Change
-----                   ----------  ----------  -------
numa_hint_faults        111878      0           -100%
numa_hint_faults_local  111878      0           -100%
numa_hit                41854       43220       3.26373%
numa_local              41854       43220       3.26373%
numa_pte_updates        113926      0           -100%
pgfault                 163662      51210       -68.7099%
pgmajfault              56          52          -7.14286%
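Reproducing a few %Change entries from the numbers above suggests the two kinds of table use different baselines: the time table divides by the with-patch value, while the vmstat tables divide by the no_patch value:

```latex
\text{Sys (numa01.sh)}:\quad \frac{3.46 - 0.80}{0.80} = 3.325 \;\rightarrow\; 332.5\,\%
\qquad
\text{pgfault (numa01.sh)}:\quad \frac{241424 - 1380845}{1380845} \approx -0.8252 \;\rightarrow\; -82.52\,\%
```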

Observations:
Real time and user time don't change much. However, system time drops
noticeably, because of the NUMA hinting faults: with the patch, we see no
NUMA hinting faults at all.

> > 2. Few people have complained about existence of this dummy node when
> > parsing lscpu and numactl o/p. They somehow start to think that the tools
> > are reporting incorrectly or the kernel is not able to recognize resources
> > connected to the node.
> 
> Please be more specific.

Take the following numactl output as an example:
available: 2 nodes (0,7)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 7 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 7 size: 16238 MB
node 7 free: 15449 MB
node distances:
node   0   7 
  0:  10  20 
  7:  20  10 

We know node 0 can be special, but users may not see it that way.

When users parse numactl/lscpu output or the /sys directory, they find 2
online nodes. None of the resources of one node (node 0) are available,
yet it is online. Meanwhile, other nodes (nodes 1-6), which also have no
resources, are not online. So they tend to think the kernel has been
unable to online some of the resources, or that the resources have gone
bad. Please do note that on hypervisors like PowerVM, admins don't have
control over which nodes the resources are allocated to.

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply

* Re: [PATCH 2/3] dts: ppc: t4240rdb: add uie_unsupported property to drop warning
From: Alexandre Belloni @ 2020-05-08 11:50 UTC (permalink / raw)
  To: Biwen Li
  Cc: linux-rtc, a.zummo, devicetree, Biwen Li, linux-kernel,
	leoyang.li, robh+dt, linuxppc-dev
In-Reply-To: <20200508054925.48237-2-biwen.li@oss.nxp.com>

On 08/05/2020 13:49:24+0800, Biwen Li wrote:
> From: Biwen Li <biwen.li@nxp.com>
> 
> This adds uie_unsupported property to drop warning as follows:
>     - $ hwclock.util-linux
>       hwclock.util-linux: select() to /dev/rtc0
>       to wait for clock tick timed out
> 
> My case:
>     - RTC ds1374's INT pin is connected to VCC on T4240RDB,
>       then the RTC cannot inform cpu about the alarm interrupt
> 
> Signed-off-by: Biwen Li <biwen.li@nxp.com>
> ---
>  arch/powerpc/boot/dts/fsl/t4240rdb.dts | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/boot/dts/fsl/t4240rdb.dts b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
> index a56a705d41f7..ccdd10202e56 100644
> --- a/arch/powerpc/boot/dts/fsl/t4240rdb.dts
> +++ b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
> @@ -144,7 +144,11 @@
>  			rtc@68 {
>  				compatible = "dallas,ds1374";
>  				reg = <0x68>;
> -				interrupts = <0x1 0x1 0 0>;

removing the interrupt should be enough to solve your issue

> +				// The ds1374's INT pin isn't
> +				// connected to cpu's INT pin,
> +				// so the rtc cannot synchronize
> +				// clock tick per second.
> +				uie_unsupported;
>  			};
>  		};
>  
> -- 
> 2.17.1
> 

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [PATCH 1/3] rtc: ds1374: add uie_unsupported property to drop warning
From: Alexandre Belloni @ 2020-05-08 11:49 UTC (permalink / raw)
  To: Biwen Li
  Cc: linux-rtc, a.zummo, devicetree, Biwen Li, linux-kernel,
	leoyang.li, robh+dt, linuxppc-dev
In-Reply-To: <20200508054925.48237-1-biwen.li@oss.nxp.com>

Hi,

On 08/05/2020 13:49:23+0800, Biwen Li wrote:
> From: Biwen Li <biwen.li@nxp.com>
> 
> Add uie_unsupported property to drop warning as follows:
>     - $ hwclock.util-linux
>       hwclock.util-linux: select() to /dev/rtc0
>       to wait for clock tick timed out
> 
> My case:
>     - The ds1374 RTC's INT pin is connected to VCC on the T4240RDB,
>       so the RTC cannot signal the alarm interrupt to the CPU
> 
> Signed-off-by: Biwen Li <biwen.li@nxp.com>
> ---
>  drivers/rtc/rtc-ds1374.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/rtc/rtc-ds1374.c b/drivers/rtc/rtc-ds1374.c
> index 9c51a12cf70f..e530e887a17e 100644
> --- a/drivers/rtc/rtc-ds1374.c
> +++ b/drivers/rtc/rtc-ds1374.c
> @@ -651,6 +651,10 @@ static int ds1374_probe(struct i2c_client *client,
>  	if (ret)
>  		return ret;
>  
> +	if (of_property_read_bool(client->dev.of_node,
> +						 "uie_unsupported"))
> +		ds1374->rtc->uie_unsupported = true;
> +

This is not how this is supposed to work: either the RTC supports UIE or
it doesn't. This is not board dependent and certainly doesn't require an
(undocumented) DT property.

>  #ifdef CONFIG_RTC_DRV_DS1374_WDT
>  	save_client = client;
>  	ret = misc_register(&ds1374_miscdev);
> -- 
> 2.17.1
> 

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
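
A sketch of the direction suggested in these replies, under stated assumptions: instead of a board-specific DT flag, the driver keys off whether an interrupt is actually wired up (for a DT-instantiated I2C device, the `interrupts` property is what ends up in `client->irq`) and reports -EINVAL from its alarm path when there is none, leaving the fallback to the caller. rtc-ds1374 already appears to bail out of its alarm callbacks when `client->irq <= 0`. The structs and functions below are hypothetical userspace stand-ins, not the kernel's `struct i2c_client` or the driver's real callbacks.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct i2c_client. */
struct fake_i2c_client {
	int irq; /* > 0 only if the DT node had a usable "interrupts" property */
};

/* Hypothetical alarm state kept by the fake device. */
struct fake_alarm {
	bool enabled;
	long time; /* seconds; illustrative representation only */
};

/* Capability follows from the hardware description, not a new DT flag. */
static bool has_wired_irq(const struct fake_i2c_client *client)
{
	return client->irq > 0;
}

/*
 * Alarm callback in the style described above: with no IRQ, refuse with
 * -EINVAL instead of accepting an alarm that can never fire, and let the
 * caller (the RTC core, in the kernel case) decide how to cope.
 */
int fake_set_alarm(const struct fake_i2c_client *client,
		   struct fake_alarm *hw_alarm, const struct fake_alarm *req)
{
	if (!has_wired_irq(client))
		return -EINVAL;
	*hw_alarm = *req;
	return 0;
}
```

The same DT on a board that does wire INT to the CPU simply keeps its `interrupts` property, so no per-board driver knob is needed.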

* Re: [PATCH net 11/16] net: ethernet: marvell: mvneta: fix fixed-link phydev leaks
From: Greg Kroah-Hartman @ 2020-05-08 12:02 UTC
  To: Johan Hovold
  Cc: Andrew Lunn, lkft-triage, Frank Rowand, Sasha Levin,
	Florian Fainelli, Naresh Kamboju,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Grygorii Strashko, Rob Herring, linux-mediatek, Lars Persson,
	Matthias Brugger, linux-omap, Thomas Petazzoni, Fugang Duan,
	Sergei Shtylyov, Netdev, open list, linux- stable,
	linux-renesas-soc, nios2-dev, linuxppc-dev, David S. Miller
In-Reply-To: <20200508062119.GE25962@localhost>

On Fri, May 08, 2020 at 08:21:19AM +0200, Johan Hovold wrote:
> On Fri, May 08, 2020 at 03:35:02AM +0530, Naresh Kamboju wrote:
> > On Thu, 7 May 2020 at 16:43, Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > <trim>
> > > > >
> > > > > Greg, 3f65047c853a ("of_mdio: add helper to deregister fixed-link
> > > > > PHYs") needs to be backported as well for these.
> > > > >
> > > > > Original series can be found here:
> > > > >
> > > > >     https://lkml.kernel.org/r/1480357509-28074-1-git-send-email-johan@kernel.org
> > > >
> > > > Ah, thanks for that, I thought I dropped all of the ones that caused
> > > > build errors, but missed the above one.  I'll go take the whole series
> > > > instead.
> > >
> > > This should now all be fixed up, thanks.
> > 
> > While building the kernel image for the arm architecture on the stable-rc 4.4
> > branch, the following build error was found.
> > 
> > of_mdio: add helper to deregister fixed-link PHYs
> > commit 3f65047c853a2a5abcd8ac1984af3452b5df4ada upstream.
> > 
> > Add helper to deregister fixed-link PHYs registered using
> > of_phy_register_fixed_link().
> > 
> > Convert the two drivers that care to deregister their fixed-link PHYs to
> > use the new helper, but note that most drivers currently fail to do so.
> > 
> > Signed-off-by: Johan Hovold <johan@kernel.org>
> > Signed-off-by: David S. Miller <davem@davemloft.net>
> > [only take helper function for 4.4.y - gregkh]
> > 
> >  # make -sk KBUILD_BUILD_USER=TuxBuild -C/linux -j16 ARCH=arm
> > CROSS_COMPILE=arm-linux-gnueabihf- HOSTCC=gcc CC="sccache
> > arm-linux-gnueabihf-gcc" O=build zImage
> > #
> > ../drivers/of/of_mdio.c: In function ‘of_phy_deregister_fixed_link’:
> > ../drivers/of/of_mdio.c:379:2: error: implicit declaration of function ‘fixed_phy_unregister’; did you mean ‘fixed_phy_register’? [-Werror=implicit-function-declaration]
> >   379 | fixed_phy_unregister(phydev);
> >       | ^~~~~~~~~~~~~~~~~~~~
> >       | fixed_phy_register
> > ../drivers/of/of_mdio.c:381:22: error: ‘struct phy_device’ has no member named ‘mdio’; did you mean ‘mdix’?
> >   381 | put_device(&phydev->mdio.dev); /* of_phy_find_device() */
> >       | ^~~~
> >       | mdix
> 
> Another dependency: 5bcbe0f35fb1 ("phy: fixed: Fix removal of phys.")
> 
> Greg, these patches are from four years ago, so I can't really remember if
> there are other dependencies or reasons against backporting them (the
> missing stable tags are per Dave's preference), sorry.
> 
> The cover letter also mentions another dependency, but that may just
> have been some context conflict.
> 
> Perhaps you had better drop these unless you want to review them more closely.

Good idea, I've dropped them all for now, sorry for the noise.

greg k-h
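
The second build error above is a struct-layout mismatch rather than a missing symbol. A minimal sketch with hypothetical, heavily trimmed mock structs (not the real kernel definitions): in later kernels `struct phy_device` embeds a `struct mdio_device mdio` that owns the `struct device`, which is what `put_device(&phydev->mdio.dev)` relies on, whereas the 4.4 layout is understood to embed the `struct device` directly, so there is no `mdio` member to name.

```c
/* Trimmed mock of struct device: only a name for identification. */
struct mock_device {
	const char *name;
};

/* Newer layout: the device lives inside an embedded mdio_device. */
struct mock_mdio_device {
	struct mock_device dev;
	int addr;
};

struct mock_phy_device_new {
	struct mock_mdio_device mdio;
	int phy_id;
};

/* 4.4-era layout (as understood here): the device is embedded directly. */
struct mock_phy_device_old {
	struct mock_device dev;
	int addr;
	int phy_id;
};

/* What the upstream helper's put_device(&phydev->mdio.dev) dereferences. */
struct mock_device *phy_dev_new(struct mock_phy_device_new *phydev)
{
	return &phydev->mdio.dev;
}

/* With the older layout, the same object is reached as &phydev->dev. */
struct mock_device *phy_dev_old(struct mock_phy_device_old *phydev)
{
	return &phydev->dev;
}
```

A hand-adapted 4.4 version would presumably need `&phydev->dev` plus the `fixed_phy_unregister()` dependency Johan points out, which is the kind of extra review that dropping the series avoids.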

* Re: [PATCH v7 2/5] seq_buf: Export seq_buf_printf() to external modules
From: Vaibhav Jain @ 2020-05-08 12:00 UTC
  To: Borislav Petkov
  Cc: Cezary Rojewski, linux-nvdimm, linux-kernel, Steven Rostedt,
	Piotr Maziarz, Aneesh Kumar K . V, linuxppc-dev
In-Reply-To: <20200508113100.GA19436@zn.tnic>

Hi Boris,

Borislav Petkov <bp@alien8.de> writes:

> On Fri, May 08, 2020 at 04:19:19PM +0530, Vaibhav Jain wrote:
>> 'seq_buf' provides a very useful abstraction for writing to a string
>> buffer without needing to worry about it overflowing. However, even
>> though the API has been stable for a couple of years now, it's still not
>> exported to external modules, which limits its usage.
>> 
>> Hence this patch proposes an update to 'seq_buf.c' to mark
>> seq_buf_printf(), which is part of the seq_buf API, as exported to
>> external GPL modules. This symbol will be used in later parts of this
>
> What is "external GPL modules"?
I am referring to loadable kernel modules declared with
MODULE_LICENSE("GPL"), i.e. modules that may use symbols exported with
EXPORT_SYMBOL_GPL().
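
To illustrate the abstraction being exported: a seq_buf tracks a fixed-size buffer plus a running length, and a printf that would not fit marks the buffer as overflowed instead of writing past the end. The userspace sketch below is modeled loosely on the semantics of lib/seq_buf.c (overflow is recorded by pushing the length past the size); the `sb_*` names and `struct sb` are hypothetical, and this is not the kernel implementation. Inside the kernel, a GPL module would use the real seq_buf API once seq_buf_printf() is exported.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace analogue of struct seq_buf. */
struct sb {
	char *buffer;
	size_t size; /* capacity in bytes */
	size_t len;  /* bytes used; len > size means "overflowed" */
};

void sb_init(struct sb *s, char *buf, size_t size)
{
	s->buffer = buf;
	s->size = size;
	s->len = 0;
	if (size)
		buf[0] = '\0';
}

bool sb_has_overflowed(const struct sb *s)
{
	return s->len > s->size;
}

/* Append formatted text; on overflow, mark the buffer and return -1. */
int sb_printf(struct sb *s, const char *fmt, ...)
{
	va_list args;
	int n;

	if (s->len < s->size) {
		va_start(args, fmt);
		n = vsnprintf(s->buffer + s->len, s->size - s->len, fmt, args);
		va_end(args);
		/* Keep the write only if it fit, leaving room for the NUL. */
		if (n >= 0 && s->len + (size_t)n < s->size) {
			s->len += (size_t)n;
			return 0;
		}
	}
	s->len = s->size + 1; /* sticky overflow marker */
	return -1;
}
```

The caller never computes remaining space by hand; it can check sb_has_overflowed() once at the end, which is the convenience the patch wants to make available to modules.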

>
> -- 
> Regards/Gruss,
>     Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

* RE: [PATCH 2/3] dts: ppc: t4240rdb: add uie_unsupported property to drop warning
From: Biwen Li (OSS) @ 2020-05-08 11:59 UTC
  To: Alexandre Belloni, Biwen Li (OSS)
  Cc: linux-rtc@vger.kernel.org, a.zummo@towertech.it,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, Leo Li,
	robh+dt@kernel.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20200508115052.GL34497@piout.net>

> 
> On 08/05/2020 13:49:24+0800, Biwen Li wrote:
> > From: Biwen Li <biwen.li@nxp.com>
> >
> > This adds the uie_unsupported property to silence the following warning:
> >     - $ hwclock.util-linux
> >       hwclock.util-linux: select() to /dev/rtc0
> >       to wait for clock tick timed out
> >
> > My case:
> >     - The ds1374 RTC's INT pin is connected to VCC on the T4240RDB,
> >       so the RTC cannot signal the alarm interrupt to the CPU
> >
> > Signed-off-by: Biwen Li <biwen.li@nxp.com>
> > ---
> >  arch/powerpc/boot/dts/fsl/t4240rdb.dts | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/boot/dts/fsl/t4240rdb.dts b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
> > index a56a705d41f7..ccdd10202e56 100644
> > --- a/arch/powerpc/boot/dts/fsl/t4240rdb.dts
> > +++ b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
> > @@ -144,7 +144,11 @@
> >  			rtc@68 {
> >  				compatible = "dallas,ds1374";
> >  				reg = <0x68>;
> > -				interrupts = <0x1 0x1 0 0>;
> 
> removing the interrupt should be enough to solve your issue
Okay, got it. Thanks.
> 
> > +				// The ds1374's INT pin isn't
> > +				// connected to cpu's INT pin,
> > +				// so the rtc cannot synchronize
> > +				// clock tick per second.
> > +				uie_unsupported;
> >  			};
> >  		};
> >
> > --
> > 2.17.1
> >
> 
> --
> Alexandre Belloni, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
