* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  [not found] ` <20170530181724.27197-3-hannes@cmpxchg.org>
@ 2017-05-31  9:12 ` Heiko Carstens
  2017-05-31 11:39 ` Heiko Carstens
  0 siblings, 1 reply; 10+ messages in thread
From: Heiko Carstens @ 2017-05-31 9:12 UTC (permalink / raw)
To: Johannes Weiner
Cc: Josef Bacik, Michal Hocko, Vladimir Davydov, Andrew Morton,
    Rik van Riel, linux-mm, cgroups, linux-kernel, kernel-team, linux-s390

On Tue, May 30, 2017 at 02:17:20PM -0400, Johannes Weiner wrote:
> To re-implement slab cache vs. page cache balancing, we'll need the
> slab counters at the lruvec level, which, ever since lru reclaim was
> moved from the zone to the node, is the intersection of the node, not
> the zone, and the memcg.
>
> We could retain the per-zone counters for when the page allocator
> dumps its memory information on failures, and have counters on both
> levels - which on all but NUMA node 0 is usually redundant. But let's
> keep it simple for now and just move them. If anybody complains we can
> restore the per-zone counters.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

This patch causes an early boot crash on s390 (linux-next as of today).
CONFIG_NUMA on/off doesn't make any difference. I haven't looked any
further into this yet, maybe you have an idea?

Kernel BUG at 00000000002b0362 [verbose debug info unavailable]
addressing exception: 0005 ilc:3 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #16
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
task: 0000000000d75d00 task.stack: 0000000000d60000
Krnl PSW : 0404200180000000 00000000002b0362 (mod_node_page_state+0x62/0x158)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000001 000000003d81f000 0000000000000000 0000000000000006
           0000000000000001 0000000000f29b52 0000000000000041 0000000000000000
           0000000000000007 0000000000000040 000000003fe81000 000003d100ffa000
           0000000000ee1cd0 0000000000979040 0000000000300abc 0000000000d63c90
Krnl Code: 00000000002b0350: e31003900004  lg   %r1,912
           00000000002b0356: e320f0a80004  lg   %r2,168(%r15)
          #00000000002b035c: e31120000090  llgc %r1,0(%r1,%r2)
          >00000000002b0362: b9060011      lgbr %r1,%r1
           00000000002b0366: e32003900004  lg   %r2,912
           00000000002b036c: e3c280000090  llgc %r12,0(%r2,%r8)
           00000000002b0372: b90600ac      lgbr %r10,%r12
           00000000002b0376: b904002a      lgr  %r2,%r10
Call Trace:
([<0000000000000000>] (null))
 [<0000000000300abc>] new_slab+0x35c/0x628
 [<000000000030740c>] __kmem_cache_create+0x33c/0x638
 [<0000000000e99c0e>] create_boot_cache+0xae/0xe0
 [<0000000000e9e12c>] kmem_cache_init+0x5c/0x138
 [<0000000000e7999c>] start_kernel+0x24c/0x440
 [<0000000000100020>] _stext+0x20/0x80
Last Breaking-Event-Address:
 [<0000000000300ab6>] new_slab+0x356/0x628

Kernel panic - not syncing: Fatal exception: panic_on_oops

> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 5548f9686016..e57e06e6df4c 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -129,11 +129,11 @@ static ssize_t node_read_meminfo(struct device *dev,
>  		       nid, K(node_page_state(pgdat, NR_UNSTABLE_NFS)),
>  		       nid, K(sum_zone_node_page_state(nid, NR_BOUNCE)),
>  		       nid, K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
> -		       nid, K(sum_zone_node_page_state(nid, NR_SLAB_RECLAIMABLE) +
> -			      sum_zone_node_page_state(nid, NR_SLAB_UNRECLAIMABLE)),
> -		       nid, K(sum_zone_node_page_state(nid, NR_SLAB_RECLAIMABLE)),
> +		       nid, K(node_page_state(pgdat, NR_SLAB_RECLAIMABLE) +
> +			      node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)),
> +		       nid, K(node_page_state(pgdat, NR_SLAB_RECLAIMABLE)),
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -		       nid, K(sum_zone_node_page_state(nid, NR_SLAB_UNRECLAIMABLE)),
> +		       nid, K(node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)),
>  		       nid, K(node_page_state(pgdat, NR_ANON_THPS) *
>  				       HPAGE_PMD_NR),
>  		       nid, K(node_page_state(pgdat, NR_SHMEM_THPS) *
> @@ -141,7 +141,7 @@ static ssize_t node_read_meminfo(struct device *dev,
>  		       nid, K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED) *
>  				       HPAGE_PMD_NR));
>  #else
>  		       nid, K(sum_zone_node_page_state(nid, NR_SLAB_UNRECLAIMABLE)));
> +		       nid, K(node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE)));
>  #endif
>  	n += hugetlb_report_node_meminfo(nid, buf + n);
>  	return n;
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index ebaccd4e7d8c..eacadee83964 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -125,8 +125,6 @@ enum zone_stat_item {
>  	NR_ZONE_UNEVICTABLE,
>  	NR_ZONE_WRITE_PENDING,	/* Count of dirty, writeback and unstable pages */
>  	NR_MLOCK,		/* mlock()ed pages found and moved off LRU */
> -	NR_SLAB_RECLAIMABLE,
> -	NR_SLAB_UNRECLAIMABLE,
>  	NR_PAGETABLE,		/* used for pagetables */
>  	NR_KERNEL_STACK_KB,	/* measured in KiB */
>  	/* Second 128 byte cacheline */
> @@ -152,6 +150,8 @@ enum node_stat_item {
>  	NR_INACTIVE_FILE,	/*  "     "     "   "       "         */
>  	NR_ACTIVE_FILE,		/*  "     "     "   "       "         */
>  	NR_UNEVICTABLE,		/*  "     "     "   "       "         */
> +	NR_SLAB_RECLAIMABLE,
> +	NR_SLAB_UNRECLAIMABLE,
>  	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
>  	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
>  	WORKINGSET_REFAULT,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f9e450c6b6e4..5f89cfaddc4b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4601,8 +4601,6 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>  			" present:%lukB"
>  			" managed:%lukB"
>  			" mlocked:%lukB"
> -			" slab_reclaimable:%lukB"
> -			" slab_unreclaimable:%lukB"
>  			" kernel_stack:%lukB"
>  			" pagetables:%lukB"
>  			" bounce:%lukB"
> @@ -4624,8 +4622,6 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>  			K(zone->present_pages),
>  			K(zone->managed_pages),
>  			K(zone_page_state(zone, NR_MLOCK)),
> -			K(zone_page_state(zone, NR_SLAB_RECLAIMABLE)),
> -			K(zone_page_state(zone, NR_SLAB_UNRECLAIMABLE)),
>  			zone_page_state(zone, NR_KERNEL_STACK_KB),
>  			K(zone_page_state(zone, NR_PAGETABLE)),
>  			K(zone_page_state(zone, NR_BOUNCE)),
> diff --git a/mm/slab.c b/mm/slab.c
> index 2a31ee3c5814..b55853399559 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1425,10 +1425,10 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
> 
>  	nr_pages = (1 << cachep->gfporder);
>  	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
> -		add_zone_page_state(page_zone(page),
> +		add_node_page_state(page_pgdat(page),
>  			NR_SLAB_RECLAIMABLE, nr_pages);
>  	else
> -		add_zone_page_state(page_zone(page),
> +		add_node_page_state(page_pgdat(page),
>  			NR_SLAB_UNRECLAIMABLE, nr_pages);
> 
>  	__SetPageSlab(page);
> @@ -1459,10 +1459,10 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
>  	kmemcheck_free_shadow(page, order);
> 
>  	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
> -		sub_zone_page_state(page_zone(page),
> +		sub_node_page_state(page_pgdat(page),
>  				NR_SLAB_RECLAIMABLE, nr_freed);
>  	else
> -		sub_zone_page_state(page_zone(page),
> +		sub_node_page_state(page_pgdat(page),
>  				NR_SLAB_UNRECLAIMABLE, nr_freed);
> 
>  	BUG_ON(!PageSlab(page));
> diff --git a/mm/slub.c b/mm/slub.c
> index 57e5156f02be..673e72698d9b 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1615,7 +1615,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  	if (!page)
>  		return NULL;
> 
> -	mod_zone_page_state(page_zone(page),
> +	mod_node_page_state(page_pgdat(page),
>  		(s->flags & SLAB_RECLAIM_ACCOUNT) ?
>  		NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
>  		1 << oo_order(oo));
> @@ -1655,7 +1655,7 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
> 
>  	kmemcheck_free_shadow(page, compound_order(page));
> 
> -	mod_zone_page_state(page_zone(page),
> +	mod_node_page_state(page_pgdat(page),
>  		(s->flags & SLAB_RECLAIM_ACCOUNT) ?
>  		NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
>  		-pages);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c5f9d1673392..5d187ee618c0 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3815,7 +3815,7 @@ int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
>  	 * unmapped file backed pages.
>  	 */
>  	if (node_pagecache_reclaimable(pgdat) <= pgdat->min_unmapped_pages &&
> -	    sum_zone_node_page_state(pgdat->node_id, NR_SLAB_RECLAIMABLE) <= pgdat->min_slab_pages)
> +	    node_page_state(pgdat, NR_SLAB_RECLAIMABLE) <= pgdat->min_slab_pages)
>  		return NODE_RECLAIM_FULL;
> 
>  	/*
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 76f73670200a..a64f1c764f17 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -928,8 +928,6 @@ const char * const vmstat_text[] = {
>  	"nr_zone_unevictable",
>  	"nr_zone_write_pending",
>  	"nr_mlock",
> -	"nr_slab_reclaimable",
> -	"nr_slab_unreclaimable",
>  	"nr_page_table_pages",
>  	"nr_kernel_stack",
>  	"nr_bounce",
> @@ -952,6 +950,8 @@ const char * const vmstat_text[] = {
>  	"nr_inactive_file",
>  	"nr_active_file",
>  	"nr_unevictable",
> +	"nr_slab_reclaimable",
> +	"nr_slab_unreclaimable",
>  	"nr_isolated_anon",
>  	"nr_isolated_file",
>  	"workingset_refault",
> -- 
> 2.12.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-05-31  9:12 ` [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters Heiko Carstens
@ 2017-05-31 11:39 ` Heiko Carstens
  2017-05-31 17:11 ` Yury Norov
  0 siblings, 1 reply; 10+ messages in thread
From: Heiko Carstens @ 2017-05-31 11:39 UTC (permalink / raw)
To: Johannes Weiner, Josef Bacik, Michal Hocko, Vladimir Davydov,
    Andrew Morton, Rik van Riel, linux-mm, cgroups, linux-kernel,
    kernel-team, linux-s390

On Wed, May 31, 2017 at 11:12:56AM +0200, Heiko Carstens wrote:
> On Tue, May 30, 2017 at 02:17:20PM -0400, Johannes Weiner wrote:
> > To re-implement slab cache vs. page cache balancing, we'll need the
> > slab counters at the lruvec level, which, ever since lru reclaim was
> > moved from the zone to the node, is the intersection of the node, not
> > the zone, and the memcg.
> >
> > We could retain the per-zone counters for when the page allocator
> > dumps its memory information on failures, and have counters on both
> > levels - which on all but NUMA node 0 is usually redundant. But let's
> > keep it simple for now and just move them. If anybody complains we can
> > restore the per-zone counters.
> >
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>
> This patch causes an early boot crash on s390 (linux-next as of today).
> CONFIG_NUMA on/off doesn't make any difference. I haven't looked any
> further into this yet, maybe you have an idea?
>
> Kernel BUG at 00000000002b0362 [verbose debug info unavailable]
> addressing exception: 0005 ilc:3 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #16
> Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
> task: 0000000000d75d00 task.stack: 0000000000d60000
> Krnl PSW : 0404200180000000 00000000002b0362 (mod_node_page_state+0x62/0x158)
>            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 RI:0 EA:3
> Krnl GPRS: 0000000000000001 000000003d81f000 0000000000000000 0000000000000006
>            0000000000000001 0000000000f29b52 0000000000000041 0000000000000000
>            0000000000000007 0000000000000040 000000003fe81000 000003d100ffa000
>            0000000000ee1cd0 0000000000979040 0000000000300abc 0000000000d63c90
> Krnl Code: 00000000002b0350: e31003900004  lg   %r1,912
>            00000000002b0356: e320f0a80004  lg   %r2,168(%r15)
>           #00000000002b035c: e31120000090  llgc %r1,0(%r1,%r2)
>           >00000000002b0362: b9060011      lgbr %r1,%r1
>            00000000002b0366: e32003900004  lg   %r2,912
>            00000000002b036c: e3c280000090  llgc %r12,0(%r2,%r8)
>            00000000002b0372: b90600ac      lgbr %r10,%r12
>            00000000002b0376: b904002a      lgr  %r2,%r10
> Call Trace:
> ([<0000000000000000>] (null))
>  [<0000000000300abc>] new_slab+0x35c/0x628
>  [<000000000030740c>] __kmem_cache_create+0x33c/0x638
>  [<0000000000e99c0e>] create_boot_cache+0xae/0xe0
>  [<0000000000e9e12c>] kmem_cache_init+0x5c/0x138
>  [<0000000000e7999c>] start_kernel+0x24c/0x440
>  [<0000000000100020>] _stext+0x20/0x80
> Last Breaking-Event-Address:
>  [<0000000000300ab6>] new_slab+0x356/0x628

FWIW, it looks like your patch only triggers a bug that was introduced
with a different change that somehow messes around with the pages used
to setup the kernel page tables. I'll look into this.
* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-05-31 11:39 ` Heiko Carstens
@ 2017-05-31 17:11 ` Yury Norov
  2017-06-01 10:07 ` Michael Ellerman
  0 siblings, 1 reply; 10+ messages in thread
From: Yury Norov @ 2017-05-31 17:11 UTC (permalink / raw)
To: Heiko Carstens
Cc: Johannes Weiner, Josef Bacik, Michal Hocko, Vladimir Davydov,
    Andrew Morton, Rik van Riel, linux-mm, cgroups, linux-kernel,
    kernel-team, linux-s390

On Wed, May 31, 2017 at 01:39:00PM +0200, Heiko Carstens wrote:
> On Wed, May 31, 2017 at 11:12:56AM +0200, Heiko Carstens wrote:
> > On Tue, May 30, 2017 at 02:17:20PM -0400, Johannes Weiner wrote:
> > > To re-implement slab cache vs. page cache balancing, we'll need the
> > > slab counters at the lruvec level, which, ever since lru reclaim was
> > > moved from the zone to the node, is the intersection of the node, not
> > > the zone, and the memcg.
> > >
> > > We could retain the per-zone counters for when the page allocator
> > > dumps its memory information on failures, and have counters on both
> > > levels - which on all but NUMA node 0 is usually redundant. But let's
> > > keep it simple for now and just move them. If anybody complains we can
> > > restore the per-zone counters.
> > >
> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> >
> > This patch causes an early boot crash on s390 (linux-next as of today).
> > CONFIG_NUMA on/off doesn't make any difference. I haven't looked any
> > further into this yet, maybe you have an idea?

The same on arm64.

Yury
* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-05-31 17:11 ` Yury Norov
@ 2017-06-01 10:07 ` Michael Ellerman
  2017-06-05 18:35 ` Johannes Weiner
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Ellerman @ 2017-06-01 10:07 UTC (permalink / raw)
To: Yury Norov, Heiko Carstens
Cc: Johannes Weiner, Josef Bacik, Michal Hocko, Vladimir Davydov,
    Andrew Morton, Rik van Riel, linux-mm, cgroups, linux-kernel,
    kernel-team, linux-s390

Yury Norov <ynorov@caviumnetworks.com> writes:

> On Wed, May 31, 2017 at 01:39:00PM +0200, Heiko Carstens wrote:
>> On Wed, May 31, 2017 at 11:12:56AM +0200, Heiko Carstens wrote:
>> > On Tue, May 30, 2017 at 02:17:20PM -0400, Johannes Weiner wrote:
>> > > To re-implement slab cache vs. page cache balancing, we'll need the
>> > > slab counters at the lruvec level, which, ever since lru reclaim was
>> > > moved from the zone to the node, is the intersection of the node, not
>> > > the zone, and the memcg.
>> > >
>> > > We could retain the per-zone counters for when the page allocator
>> > > dumps its memory information on failures, and have counters on both
>> > > levels - which on all but NUMA node 0 is usually redundant. But let's
>> > > keep it simple for now and just move them. If anybody complains we can
>> > > restore the per-zone counters.
>> > >
>> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>> >
>> > This patch causes an early boot crash on s390 (linux-next as of today).
>> > CONFIG_NUMA on/off doesn't make any difference. I haven't looked any
>> > further into this yet, maybe you have an idea?
>
> The same on arm64.

And powerpc.

cheers
* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-06-01 10:07 ` Michael Ellerman
@ 2017-06-05 18:35 ` Johannes Weiner
  2017-06-05 21:38 ` Andrew Morton
  2017-06-06  4:31 ` Michael Ellerman
  0 siblings, 2 replies; 10+ messages in thread
From: Johannes Weiner @ 2017-06-05 18:35 UTC (permalink / raw)
To: Michael Ellerman
Cc: Yury Norov, Heiko Carstens, Josef Bacik, Michal Hocko,
    Vladimir Davydov, Andrew Morton, Rik van Riel, linux-mm, cgroups,
    linux-kernel, kernel-team, linux-s390

On Thu, Jun 01, 2017 at 08:07:28PM +1000, Michael Ellerman wrote:
> Yury Norov <ynorov@caviumnetworks.com> writes:
>
> > On Wed, May 31, 2017 at 01:39:00PM +0200, Heiko Carstens wrote:
> >> On Wed, May 31, 2017 at 11:12:56AM +0200, Heiko Carstens wrote:
> >> > On Tue, May 30, 2017 at 02:17:20PM -0400, Johannes Weiner wrote:
> >> > > To re-implement slab cache vs. page cache balancing, we'll need the
> >> > > slab counters at the lruvec level, which, ever since lru reclaim was
> >> > > moved from the zone to the node, is the intersection of the node, not
> >> > > the zone, and the memcg.
> >> > >
> >> > > We could retain the per-zone counters for when the page allocator
> >> > > dumps its memory information on failures, and have counters on both
> >> > > levels - which on all but NUMA node 0 is usually redundant. But let's
> >> > > keep it simple for now and just move them. If anybody complains we can
> >> > > restore the per-zone counters.
> >> > >
> >> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> >> >
> >> > This patch causes an early boot crash on s390 (linux-next as of today).
> >> > CONFIG_NUMA on/off doesn't make any difference. I haven't looked any
> >> > further into this yet, maybe you have an idea?
> >
> > The same on arm64.
>
> And powerpc.

It looks like we need the following on top.

I can't reproduce the crash, but it's verifiable with WARN_ONs in the
vmstat functions that the nodestat array isn't properly initialized
when slab bootstraps:

---
* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-06-05 18:35 ` Johannes Weiner
@ 2017-06-05 21:38 ` Andrew Morton
  2017-06-07 16:20 ` Johannes Weiner
  2017-06-06  4:31 ` Michael Ellerman
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2017-06-05 21:38 UTC (permalink / raw)
To: Johannes Weiner
Cc: Michael Ellerman, Yury Norov, Heiko Carstens, Josef Bacik,
    Michal Hocko, Vladimir Davydov, Rik van Riel, linux-mm, cgroups,
    linux-kernel, kernel-team, linux-s390

On Mon, 5 Jun 2017 14:35:11 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5107,6 +5107,7 @@ static void build_zonelists(pg_data_t *pgdat)
>   */
>  static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
>  static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
> +static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
>  static void setup_zone_pageset(struct zone *zone);

There's a few kb there. It just sits evermore unused after boot?
* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-06-05 21:38 ` Andrew Morton
@ 2017-06-07 16:20 ` Johannes Weiner
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Weiner @ 2017-06-07 16:20 UTC (permalink / raw)
To: Andrew Morton
Cc: Michael Ellerman, Yury Norov, Heiko Carstens, Josef Bacik,
    Michal Hocko, Vladimir Davydov, Rik van Riel, linux-mm, cgroups,
    linux-kernel, kernel-team, linux-s390

On Mon, Jun 05, 2017 at 02:38:31PM -0700, Andrew Morton wrote:
> On Mon, 5 Jun 2017 14:35:11 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5107,6 +5107,7 @@ static void build_zonelists(pg_data_t *pgdat)
> >   */
> >  static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
> >  static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
> > +static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
> >  static void setup_zone_pageset(struct zone *zone);
>
> There's a few kb there. It just sits evermore unused after boot?

It's not the greatest, but it's nothing new. All the node stats we
have now used to be in the zone, i.e. the then bigger boot_pageset,
before we moved them to the node level. It just re-adds static boot
time space for them now.

Of course, if somebody has an idea on how to elegantly reuse that
memory after boot, that'd be cool. But we've lived with that footprint
for the longest time, so I don't think it's a showstopper.
* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-06-05 18:35 ` Johannes Weiner
  2017-06-05 21:38 ` Andrew Morton
@ 2017-06-06  4:31 ` Michael Ellerman
  2017-06-06 11:15 ` Michael Ellerman
  1 sibling, 1 reply; 10+ messages in thread
From: Michael Ellerman @ 2017-06-06 4:31 UTC (permalink / raw)
To: Johannes Weiner
Cc: Yury Norov, Heiko Carstens, Josef Bacik, Michal Hocko,
    Vladimir Davydov, Andrew Morton, Rik van Riel, linux-mm, cgroups,
    linux-kernel, kernel-team, linux-s390

Johannes Weiner <hannes@cmpxchg.org> writes:

> From 89ed86b5b538d8debd3c29567d7e1d31257fa577 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@cmpxchg.org>
> Date: Mon, 5 Jun 2017 14:12:15 -0400
> Subject: [PATCH] mm: vmstat: move slab statistics from zone to node counters
>  fix
>
> Unable to handle kernel paging request at virtual address 2e116007
> pgd = c0004000
> [2e116007] *pgd=00000000
> Internal error: Oops: 5 [#1] SMP ARM
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #200
> Hardware name: Generic DRA74X (Flattened Device Tree)
> task: c0d0adc0 task.stack: c0d00000
> PC is at __mod_node_page_state+0x2c/0xc8
> LR is at __per_cpu_offset+0x0/0x8
> pc : [<c0271de8>]  lr : [<c0d07da4>]  psr: 600000d3
> sp : c0d01eec  ip : 00000000  fp : c15782f4
> r10: 00000000  r9 : c1591280  r8 : 00004000
> r7 : 00000001  r6 : 00000006  r5 : 2e116000  r4 : 00000007
> r3 : 00000007  r2 : 00000001  r1 : 00000006  r0 : c0dc27c0
> Flags: nZCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 8000406a  DAC: 00000051
> Process swapper (pid: 0, stack limit = 0xc0d00218)
> Stack: (0xc0d01eec to 0xc0d02000)
> 1ee0:                            600000d3 c0dc27c0 c0271efc 00000001 c0d58864
> 1f00: ef470000 00008000 00004000 c029fbb0 01000000 c1572b5c 00002000 00000000
> 1f20: 00000001 00000001 00008000 c029f584 00000000 c0d58864 00008000 00008000
> 1f40: 01008000 c0c23790 c15782f4 a00000d3 c0d58864 c02a0364 00000000 c0819388
> 1f60: c0d58864 000000c0 01000000 c1572a58 c0aa57a4 00000080 00002000 c0dca000
> 1f80: efffe980 c0c53a48 00000000 c0c23790 c1572a58 c0c59e48 c0c59de8 c1572b5c
> 1fa0: c0dca000 c0c257a4 00000000 ffffffff c0dca000 c0d07940 c0dca000 c0c00a9c
> 1fc0: ffffffff ffffffff 00000000 c0c00680 00000000 c0c53a48 c0dca214 c0d07958
> 1fe0: c0c53a44 c0d0caa4 8000406a 412fc0f2 00000000 8000807c 00000000 00000000
> [<c0271de8>] (__mod_node_page_state) from [<c0271efc>] (mod_node_page_state+0x2c/0x4c)
> [<c0271efc>] (mod_node_page_state) from [<c029fbb0>] (cache_alloc_refill+0x5b8/0x828)
> [<c029fbb0>] (cache_alloc_refill) from [<c02a0364>] (kmem_cache_alloc+0x24c/0x2d0)
> [<c02a0364>] (kmem_cache_alloc) from [<c0c23790>] (create_kmalloc_cache+0x20/0x8c)
> [<c0c23790>] (create_kmalloc_cache) from [<c0c257a4>] (kmem_cache_init+0xac/0x11c)
> [<c0c257a4>] (kmem_cache_init) from [<c0c00a9c>] (start_kernel+0x1b8/0x3c0)
> [<c0c00a9c>] (start_kernel) from [<8000807c>] (0x8000807c)
> Code: e79e5103 e28c3001 e0833001 e1a04003 (e19440d5)
> ---[ end trace 0000000000000000 ]---

Just to be clear that's not my call trace.

> The zone counters work earlier than the node counters because the
> zones have special boot pagesets, whereas the nodes do not.
>
> Add boot nodestats against which we account until the dynamic per-cpu
> allocator is available.

This isn't working for me. I applied it on top of next-20170605, I
still get an oops:

$ qemu-system-ppc64 -M pseries -m 1G -kernel build/vmlinux -vga none -nographic
SLOF **********************************************************************
QEMU Starting
 ...
Linux version 4.12.0-rc3-gcc-5.4.1-next-20170605-dirty (michael@ka3.ozlabs.ibm.com) (gcc version 5.4.1 20170214 (Custom 2af61cd06c9fd8f5) ) #352 SMP Tue Jun 6 14:09:57 AEST 2017
...
PID hash table entries: 4096 (order: -1, 32768 bytes)
Memory: 1014592K/1048576K available (9920K kernel code, 1536K rwdata, 2608K rodata, 832K init, 1420K bss, 33984K reserved, 0K cma-reserved)
Unable to handle kernel paging request for data at address 0x00000338
Faulting instruction address: 0xc0000000002cf338
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-gcc-5.4.1-next-20170605-dirty #352
task: c000000000d11080 task.stack: c000000000e24000
NIP: c0000000002cf338 LR: c0000000002cf0dc CTR: 0000000000000000
REGS: c000000000e279a0 TRAP: 0380  Not tainted  (4.12.0-rc3-gcc-5.4.1-next-20170605-dirty)
MSR: 8000000002001033 <SF,VEC,ME,IR,DR,RI,LE>  CR: 22482242  XER: 00000000
CFAR: c0000000002cf6a0 SOFTE: 0
GPR00: c0000000002cf0dc c000000000e27c20 c000000000e28300 c00000003ffc6300
GPR04: c000000000e556f8 0000000000000000 000000003f120000 0000000000000000
GPR08: c000000000ed3058 0000000000000330 0000000000000000 ffffffffffffff80
GPR12: 0000000028402824 c00000000fd40000 0000000000000060 0000000000f540a8
GPR16: 0000000000f540d8 fffffffffffffffd 000000003dc54ee0 0000000000000014
GPR20: c000000000b90e60 c000000000b90e90 0000000000002000 0000000000000000
GPR24: 0000000000000401 0000000000000000 0000000000000001 c00000003e000000
GPR28: 0000000080010400 f0000000000f8000 0000000000000006 c000000000cb4270
NIP [c0000000002cf338] new_slab+0x338/0x770
LR [c0000000002cf0dc] new_slab+0xdc/0x770
Call Trace:
[c000000000e27c20] [c0000000002cf0dc] new_slab+0xdc/0x770 (unreliable)
[c000000000e27cf0] [c0000000002d6bb4] __kmem_cache_create+0x1a4/0x6a0
[c000000000e27e00] [c000000000c73098] create_boot_cache+0x98/0xdc
[c000000000e27e80] [c000000000c77608] kmem_cache_init+0x5c/0x160
[c000000000e27f00] [c000000000c43ec8] start_kernel+0x290/0x51c
[c000000000e27f90] [c00000000000b070] start_here_common+0x1c/0x4ac
Instruction dump:
419e0388 893d0007 3d02000b 3908ad58 79291f24 7c68482a 60000000 3d230001
e9299a42 39290066 79291f24 7d2a4a14 <eb890008> e93c0080 7fa34800 409e03b0
---[ end trace 0000000000000000 ]---

cheers
* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-06-06  4:31 ` Michael Ellerman
@ 2017-06-06 11:15 ` Michael Ellerman
  2017-06-06 14:33 ` Johannes Weiner
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Ellerman @ 2017-06-06 11:15 UTC (permalink / raw)
To: Johannes Weiner
Cc: Yury Norov, Heiko Carstens, Josef Bacik, Michal Hocko,
    Vladimir Davydov, Andrew Morton, Rik van Riel, linux-mm, cgroups,
    linux-kernel, kernel-team, linux-s390

Michael Ellerman <mpe@ellerman.id.au> writes:

> Johannes Weiner <hannes@cmpxchg.org> writes:
>> From 89ed86b5b538d8debd3c29567d7e1d31257fa577 Mon Sep 17 00:00:00 2001
>> From: Johannes Weiner <hannes@cmpxchg.org>
>> Date: Mon, 5 Jun 2017 14:12:15 -0400
>> Subject: [PATCH] mm: vmstat: move slab statistics from zone to node counters
>>  fix
>>
>> Unable to handle kernel paging request at virtual address 2e116007
>> pgd = c0004000
>> [2e116007] *pgd=00000000
>> Internal error: Oops: 5 [#1] SMP ARM
>> Modules linked in:
>> CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc3-00153-gb6bc6724488a #200
>> Hardware name: Generic DRA74X (Flattened Device Tree)
>> task: c0d0adc0 task.stack: c0d00000
>> PC is at __mod_node_page_state+0x2c/0xc8
>> LR is at __per_cpu_offset+0x0/0x8
>> pc : [<c0271de8>]  lr : [<c0d07da4>]  psr: 600000d3
>> sp : c0d01eec  ip : 00000000  fp : c15782f4
>> r10: 00000000  r9 : c1591280  r8 : 00004000
>> r7 : 00000001  r6 : 00000006  r5 : 2e116000  r4 : 00000007
>> r3 : 00000007  r2 : 00000001  r1 : 00000006  r0 : c0dc27c0
>> Flags: nZCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment none
>> Control: 10c5387d  Table: 8000406a  DAC: 00000051
>> Process swapper (pid: 0, stack limit = 0xc0d00218)
>> Stack: (0xc0d01eec to 0xc0d02000)
>> 1ee0:                            600000d3 c0dc27c0 c0271efc 00000001 c0d58864
>> 1f00: ef470000 00008000 00004000 c029fbb0 01000000 c1572b5c 00002000 00000000
>> 1f20: 00000001 00000001 00008000 c029f584 00000000 c0d58864 00008000 00008000
>> 1f40: 01008000 c0c23790 c15782f4 a00000d3 c0d58864 c02a0364 00000000 c0819388
>> 1f60: c0d58864 000000c0 01000000 c1572a58 c0aa57a4 00000080 00002000 c0dca000
>> 1f80: efffe980 c0c53a48 00000000 c0c23790 c1572a58 c0c59e48 c0c59de8 c1572b5c
>> 1fa0: c0dca000 c0c257a4 00000000 ffffffff c0dca000 c0d07940 c0dca000 c0c00a9c
>> 1fc0: ffffffff ffffffff 00000000 c0c00680 00000000 c0c53a48 c0dca214 c0d07958
>> 1fe0: c0c53a44 c0d0caa4 8000406a 412fc0f2 00000000 8000807c 00000000 00000000
>> [<c0271de8>] (__mod_node_page_state) from [<c0271efc>] (mod_node_page_state+0x2c/0x4c)
>> [<c0271efc>] (mod_node_page_state) from [<c029fbb0>] (cache_alloc_refill+0x5b8/0x828)
>> [<c029fbb0>] (cache_alloc_refill) from [<c02a0364>] (kmem_cache_alloc+0x24c/0x2d0)
>> [<c02a0364>] (kmem_cache_alloc) from [<c0c23790>] (create_kmalloc_cache+0x20/0x8c)
>> [<c0c23790>] (create_kmalloc_cache) from [<c0c257a4>] (kmem_cache_init+0xac/0x11c)
>> [<c0c257a4>] (kmem_cache_init) from [<c0c00a9c>] (start_kernel+0x1b8/0x3c0)
>> [<c0c00a9c>] (start_kernel) from [<8000807c>] (0x8000807c)
>> Code: e79e5103 e28c3001 e0833001 e1a04003 (e19440d5)
>> ---[ end trace 0000000000000000 ]---
>
> Just to be clear that's not my call trace.
>
>> The zone counters work earlier than the node counters because the
>> zones have special boot pagesets, whereas the nodes do not.
>>
>> Add boot nodestats against which we account until the dynamic per-cpu
>> allocator is available.
>
> This isn't working for me. I applied it on top of next-20170605, I still
> get an oops:

But today's linux-next is OK. So I must have missed a fix when testing
this in isolation.

commit d94b69d9a3f8139e6d5f5d03c197d8004de3905a
Author:     Johannes Weiner <hannes@cmpxchg.org>
AuthorDate: Tue Jun 6 09:19:50 2017 +1000
Commit:     Stephen Rothwell <sfr@canb.auug.org.au>
CommitDate: Tue Jun 6 09:19:50 2017 +1000

    mm: vmstat: move slab statistics from zone to node counters fix

    Unable to handle kernel paging request at virtual address 2e116007
    pgd = c0004000
    [2e116007] *pgd=00000000
    Internal error: Oops: 5 [#1] SMP ARM

    ...

Booted to userspace:

$ uname -a
Linux buildroot 4.12.0-rc4-gcc-5.4.1-00130-gd94b69d9a3f8 #354 SMP Tue Jun 6 20:44:42 AEST 2017 ppc64le GNU/Linux

cheers
* Re: [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters
  2017-06-06 11:15 ` Michael Ellerman
@ 2017-06-06 14:33 ` Johannes Weiner
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Weiner @ 2017-06-06 14:33 UTC (permalink / raw)
To: Michael Ellerman
Cc: Yury Norov, Heiko Carstens, Josef Bacik, Michal Hocko,
    Vladimir Davydov, Andrew Morton, Rik van Riel, linux-mm, cgroups,
    linux-kernel, kernel-team, linux-s390

On Tue, Jun 06, 2017 at 09:15:48PM +1000, Michael Ellerman wrote:
> But today's linux-next is OK. So I must have missed a fix when testing
> this in isolation.
>
> commit d94b69d9a3f8139e6d5f5d03c197d8004de3905a
> Author:     Johannes Weiner <hannes@cmpxchg.org>
> AuthorDate: Tue Jun 6 09:19:50 2017 +1000
> Commit:     Stephen Rothwell <sfr@canb.auug.org.au>
> CommitDate: Tue Jun 6 09:19:50 2017 +1000
>
>     mm: vmstat: move slab statistics from zone to node counters fix
>
>     Unable to handle kernel paging request at virtual address 2e116007
>     pgd = c0004000
>     [2e116007] *pgd=00000000
>     Internal error: Oops: 5 [#1] SMP ARM
>
>     ...
>
> Booted to userspace:
>
> $ uname -a
> Linux buildroot 4.12.0-rc4-gcc-5.4.1-00130-gd94b69d9a3f8 #354 SMP Tue Jun 6 20:44:42 AEST 2017 ppc64le GNU/Linux

Thanks for verifying!
Thread overview: 10+ messages
[not found] <20170530181724.27197-1-hannes@cmpxchg.org>
[not found] ` <20170530181724.27197-3-hannes@cmpxchg.org>
2017-05-31 9:12 ` [PATCH 2/6] mm: vmstat: move slab statistics from zone to node counters Heiko Carstens
2017-05-31 11:39 ` Heiko Carstens
2017-05-31 17:11 ` Yury Norov
2017-06-01 10:07 ` Michael Ellerman
2017-06-05 18:35 ` Johannes Weiner
2017-06-05 21:38 ` Andrew Morton
2017-06-07 16:20 ` Johannes Weiner
2017-06-06 4:31 ` Michael Ellerman
2017-06-06 11:15 ` Michael Ellerman
2017-06-06 14:33 ` Johannes Weiner