All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux-MM <linux-mm@kvack.org>,
	Linux-RT-Users <linux-rt-users@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: Re: [PATCH 2/6] mm/page_alloc: Convert per-cpu list protection to local_lock
Date: Wed, 31 Mar 2021 12:01:37 +0100	[thread overview]
Message-ID: <20210331110137.GA3697@techsingularity.net> (raw)
In-Reply-To: <877dln640j.ffs@nanos.tec.linutronix.de>

On Wed, Mar 31, 2021 at 11:55:56AM +0200, Thomas Gleixner wrote:
> On Mon, Mar 29 2021 at 13:06, Mel Gorman wrote:
> > There is a lack of clarity of what exactly local_irq_save/local_irq_restore
> > protects in page_alloc.c . It conflates the protection of per-cpu page
> > allocation structures with per-cpu vmstat deltas.
> >
> > This patch protects the PCP structure using local_lock which
> > for most configurations is identical to IRQ enabling/disabling.
> > The scope of the lock is still wider than it should be but this is
> > decreased in later patches. The per-cpu vmstat deltas are protected by
> > preempt_disable/preempt_enable where necessary instead of relying on
> > IRQ disable/enable.
> 
> Yes, this goes into the right direction and I really appreciate the
> scoped protection for clarity sake.
> 

Thanks.

> >  #ifdef CONFIG_MEMORY_HOTREMOVE
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 8a8f1a26b231..01b74ff73549 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -887,6 +887,7 @@ void cpu_vm_stats_fold(int cpu)
> >  
> >  		pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
> >  
> > +		preempt_disable();
> 
> What's the reason for the preempt_disable() here? A comment would be
> appreciated.
> 

Very good question because it's protecting vm_stat_diff and
vm_numa_stat_diff in different contexts and not quite correctly at this
point of the series. By the end of the series vm_numa_stat_diff is a
simple counter and does not need special protection.

Right now, it's protecting against a read and clear of vm_stat_diff
in two contexts -- cpu_vm_stats_fold and drain_zonestats but it's only
defensive. cpu_vm_stats_fold is only called when a CPU is going dead and
drain_zonestats is called from memory hotplug context. The protection is
necessary only if a new drain_zonestats caller was added without taking
the RMW of vm_stat_diff into account which may never happen.

This whole problem with preemption could be avoided altogether if
this_cpu_xchg was used similar to what is done elsewhere in vmstat
so.... this?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 64429ca4957f..9528304ce24d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8969,8 +8969,9 @@ void zone_pcp_reset(struct zone *zone)
 	struct per_cpu_zonestat *pzstats;
 
 	/*
-	 * No race with drain_pages. drain_zonestat disables preemption
-	 * and drain_pages relies on the pcp local_lock.
+	 * No race with drain_pages. drain_zonestat is only concerned with
+	 * vm_*_stat_diff which is updated with this_cpu_xchg and drain_pages
+	 * only cares about the PCP lists protected by local_lock.
 	 */
 	if (zone->per_cpu_pageset != &boot_pageset) {
 		for_each_online_cpu(cpu) {
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 01b74ff73549..34ff61a145d2 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -887,13 +887,11 @@ void cpu_vm_stats_fold(int cpu)
 
 		pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
 
-		preempt_disable();
 		for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 			if (pzstats->vm_stat_diff[i]) {
 				int v;
 
-				v = pzstats->vm_stat_diff[i];
-				pzstats->vm_stat_diff[i] = 0;
+				v = this_cpu_xchg(pzstats->vm_stat_diff[i], 0);
 				atomic_long_add(v, &zone->vm_stat[i]);
 				global_zone_diff[i] += v;
 			}
@@ -903,13 +901,11 @@ void cpu_vm_stats_fold(int cpu)
 			if (pzstats->vm_numa_stat_diff[i]) {
 				int v;
 
-				v = pzstats->vm_numa_stat_diff[i];
-				pzstats->vm_numa_stat_diff[i] = 0;
+				v = this_cpu_xchg(pzstats->vm_numa_stat_diff[i], 0);
 				atomic_long_add(v, &zone->vm_numa_stat[i]);
 				global_numa_diff[i] += v;
 			}
 #endif
-		preempt_enable();
 	}
 
 	for_each_online_pgdat(pgdat) {
@@ -943,10 +939,9 @@ void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *pzstats)
 {
 	int i;
 
-	preempt_disable();
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 		if (pzstats->vm_stat_diff[i]) {
-			int v = pzstats->vm_stat_diff[i];
+			int v = this_cpu_xchg(pzstats->vm_stat_diff[i], 0);
 			pzstats->vm_stat_diff[i] = 0;
 			atomic_long_add(v, &zone->vm_stat[i]);
 			atomic_long_add(v, &vm_zone_stat[i]);
@@ -955,14 +950,12 @@ void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *pzstats)
 #ifdef CONFIG_NUMA
 	for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++)
 		if (pzstats->vm_numa_stat_diff[i]) {
-			int v = pzstats->vm_numa_stat_diff[i];
+			int v = this_cpu_xchg(pzstats->vm_numa_stat_diff[i], 0);
 
-			pzstats->vm_numa_stat_diff[i] = 0;
 			atomic_long_add(v, &zone->vm_numa_stat[i]);
 			atomic_long_add(v, &vm_numa_stat[i]);
 		}
 #endif
-	preempt_enable();
 }
 #endif
 
 
-- 
Mel Gorman
SUSE Labs

  reply	other threads:[~2021-03-31 11:02 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-29 12:06 [RFC PATCH 0/6] Use local_lock for pcp protection and reduce stat overhead Mel Gorman
2021-03-29 12:06 ` [PATCH 1/6] mm/page_alloc: Split per cpu page lists and zone stats Mel Gorman
2021-03-29 12:06 ` [PATCH 2/6] mm/page_alloc: Convert per-cpu list protection to local_lock Mel Gorman
2021-03-31  9:55   ` Thomas Gleixner
2021-03-31  9:55     ` Thomas Gleixner
2021-03-31 11:01     ` Mel Gorman [this message]
2021-03-31 17:42       ` Thomas Gleixner
2021-03-31 17:46         ` Thomas Gleixner
2021-03-31 20:42         ` Mel Gorman
2021-03-29 12:06 ` [PATCH 3/6] mm/vmstat: Convert NUMA statistics to basic NUMA counters Mel Gorman
2021-03-29 12:06 ` [PATCH 4/6] mm/vmstat: Inline NUMA event counter updates Mel Gorman
2021-03-29 12:06 ` [PATCH 5/6] mm/page_alloc: Batch the accounting updates in the bulk allocator Mel Gorman
2021-03-29 12:06 ` [PATCH 6/6] mm/page_alloc: Reduce duration that IRQs are disabled for VM counters Mel Gorman
2021-03-30 18:51 ` [RFC PATCH 0/6] Use local_lock for pcp protection and reduce stat overhead Jesper Dangaard Brouer
2021-03-31  7:38   ` Mel Gorman
2021-03-31  8:17     ` Jesper Dangaard Brouer
2021-03-31  8:52 ` Mel Gorman
2021-03-31  9:51   ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210331110137.GA3697@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=bigeasy@linutronix.de \
    --cc=brouer@redhat.com \
    --cc=chuck.lever@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.