From: Marcelo Tosatti <mtosatti@redhat.com>
To: Christoph Lameter <cl@linux.com>
Cc: Aaron Tomlin <atomlin@atomlin.com>,
Frederic Weisbecker <frederic@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Vlastimil Babka <vbabka@suse.cz>, Michal Hocko <mhocko@suse.com>,
Marcelo Tosatti <mtosatti@redhat.com>
Subject: [PATCH v3 3/3] mm/vmstat: do not refresh stats for isolated CPUs
Date: Mon, 05 Jun 2023 15:56:30 -0300
Message-ID: <20230605190132.087124739@redhat.com>
In-Reply-To: <20230605185627.923698377@redhat.com>

The schedule_work_on API uses the workqueue mechanism to queue a
work item for a specific CPU. A kernel thread (kworker) running on
that target CPU then executes the work item. Therefore, when the
schedule_work_on API is used, the kworker thread must be scheduled
in on the target CPU before the work function can execute.

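As a rough userspace analogy only (this is not kernel code), the sketch
below models a kworker as a POSIX thread that owns one pending work item.
The names fake_kworker, fake_schedule_work_on, fake_flush_work and
record_cpu are hypothetical stand-ins for the kworker, schedule_work_on()
and flush_work(). It shows the property described above: the queued
function runs in the worker's context only once that thread is scheduled,
and the flushing caller blocks until then.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_FAKE_CPUS 8

/* Hypothetical stand-in for the kworker bound to one CPU (toy analogue). */
struct fake_kworker {
	pthread_t thread;
	pthread_mutex_t lock;
	pthread_cond_t cond;
	void (*fn)(int cpu);	/* queued work function, NULL until queued */
	bool done;		/* set by the worker once fn has returned */
	int cpu;		/* simulated CPU this worker stands for */
};

static void *fake_kworker_main(void *arg)
{
	struct fake_kworker *w = arg;
	void (*fn)(int cpu);

	/* Sleep until a work item has been queued. */
	pthread_mutex_lock(&w->lock);
	while (w->fn == NULL)
		pthread_cond_wait(&w->cond, &w->lock);
	fn = w->fn;
	pthread_mutex_unlock(&w->lock);

	/* The work function runs only once this thread is scheduled. */
	fn(w->cpu);

	/* Report completion to anyone waiting in fake_flush_work(). */
	pthread_mutex_lock(&w->lock);
	w->done = true;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Create the worker thread; returns 0 on success, like pthread_create(). */
static int fake_kworker_start(struct fake_kworker *w, int cpu)
{
	assert(cpu >= 0 && cpu < MAX_FAKE_CPUS);
	w->fn = NULL;
	w->done = false;
	w->cpu = cpu;
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	return pthread_create(&w->thread, NULL, fake_kworker_main, w);
}

/* Analogue of schedule_work_on(): hand fn to the worker and wake it. */
static void fake_schedule_work_on(struct fake_kworker *w, void (*fn)(int cpu))
{
	pthread_mutex_lock(&w->lock);
	w->fn = fn;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Analogue of flush_work(): block until the queued fn has completed. */
static void fake_flush_work(struct fake_kworker *w)
{
	pthread_mutex_lock(&w->lock);
	while (!w->done)
		pthread_cond_wait(&w->cond, &w->lock);
	pthread_mutex_unlock(&w->lock);

	/* This toy worker exits after one item, so reap it and clean up. */
	pthread_join(w->thread, NULL);
	pthread_mutex_destroy(&w->lock);
	pthread_cond_destroy(&w->cond);
}

/* Example work function: record which simulated CPU ran it. */
static int ran_on[MAX_FAKE_CPUS];

static void record_cpu(int cpu)
{
	ran_on[cpu] = 1;
}
```

Build with -pthread. Starting one worker per simulated CPU, queuing
record_cpu on each and then flushing each marks the matching entries of
ran_on. If a worker thread never got CPU time, fake_flush_work() would
simply keep blocking, which is the situation behind the hang described
below.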
Time-sensitive applications, such as SoftPLCs
(https://tum-esi.github.io/publications-list/PDF/2022-ETFA-How_Real_Time_Are_Virtual_PLCs.pdf),
have their response times affected by such interruptions.

The /proc/sys/vm/stat_refresh file was originally introduced
with the goal to:
"Provide /proc/sys/vm/stat_refresh to force an immediate update of
per-cpu into global vmstats: useful to avoid a sleep(2) or whatever
before checking counts when testing. Originally added to work around a
bug which left counts stranded indefinitely on a cpu going idle (an
inaccuracy magnified when small below-batch numbers represent "huge"
amounts of memory), but I believe that bug is now fixed: nonetheless,
this is still a useful knob."
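
To make "counts stranded on a cpu" concrete, the toy model below assumes
a simplified batching scheme: each CPU accumulates a small signed delta
and adds it to the global counter only when its magnitude exceeds a fixed
threshold. The names fake_vmstat, fake_mod_state, fake_refresh and
FAKE_THRESHOLD are hypothetical and do not correspond to the real
mm/vmstat.c code; the sketch only illustrates why a forced fold is useful.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_FAKE_CPUS 4
#define FAKE_THRESHOLD 32	/* hypothetical per-CPU batch size */

struct fake_vmstat {
	long global;			/* value readers see */
	int diff[NR_FAKE_CPUS];		/* per-CPU deltas not yet folded */
};

/* Update on one CPU; fold into global only once the batch is exceeded. */
static void fake_mod_state(struct fake_vmstat *s, int cpu, int delta)
{
	int x;

	assert(cpu >= 0 && cpu < NR_FAKE_CPUS);
	x = s->diff[cpu] + delta;
	if (abs(x) > FAKE_THRESHOLD) {
		s->global += x;
		x = 0;
	}
	s->diff[cpu] = x;
}

/* Forced fold, in the spirit of stat_refresh: drain every CPU's delta. */
static void fake_refresh(struct fake_vmstat *s)
{
	for (int cpu = 0; cpu < NR_FAKE_CPUS; cpu++) {
		s->global += s->diff[cpu];
		s->diff[cpu] = 0;
	}
}
```

In this model, after updates of 10 on CPU 0, 20 on CPU 1 and then 30 on
CPU 0, the global value reads 40 while the true total is 60: the 20
stays stranded on CPU 1 until fake_refresh() folds it.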

Besides potentially interrupting a time-sensitive application, the
refresh work can also cause system hangs if a task with SCHED_FIFO or
SCHED_RR priority runs on the isolated CPU:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=978688

To avoid the problems above, do not schedule the work that
synchronizes per-CPU mm counters on isolated CPUs. Since returning
an error could break existing userspace applications, accesses to
/proc/sys/vm/stat_refresh do not fail because isolated CPUs were
skipped; the counters of those CPUs are simply not refreshed.

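The policy can be illustrated with a toy model (again not kernel code):
a refresh that folds per-CPU deltas only for CPUs not marked isolated
and always reports success. The struct and function names, and the
plain bool array standing in for the cpu_is_isolated() check, are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_FAKE_CPUS 4

/* Hypothetical global counter plus per-CPU deltas not yet folded in. */
struct fake_vmstat {
	long global;
	int diff[NR_FAKE_CPUS];
};

/*
 * Toy version of the skip-isolated policy: fold deltas for non-isolated
 * CPUs only, leave isolated CPUs untouched, and return 0 either way so
 * that callers see no new error.
 */
static int fake_refresh_skip_isolated(struct fake_vmstat *s,
				      const bool isolated[NR_FAKE_CPUS])
{
	assert(s != NULL && isolated != NULL);
	for (int cpu = 0; cpu < NR_FAKE_CPUS; cpu++) {
		if (isolated[cpu])
			continue;	/* no work would be queued on this CPU */
		s->global += s->diff[cpu];
		s->diff[cpu] = 0;
	}
	return 0;
}
```

With global = 100, deltas {5, 7, -3, 11} and CPUs 2 and 3 isolated, the
call returns 0, global becomes 112, and the -3 and 11 remain on the
isolated CPUs. In this model the value read after a refresh can differ
from the true total by whatever deltas the isolated CPUs still hold.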
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
v3: improve changelog (Michal Hocko)
v2: opencode schedule_on_each_cpu (Michal Hocko)
Index: linux-vmstat-remote/mm/vmstat.c
===================================================================
--- linux-vmstat-remote.orig/mm/vmstat.c
+++ linux-vmstat-remote/mm/vmstat.c
@@ -1881,8 +1881,13 @@ int vmstat_refresh(struct ctl_table *tab
 			   void *buffer, size_t *lenp, loff_t *ppos)
 {
 	long val;
-	int err;
 	int i;
+	int cpu;
+	struct work_struct __percpu *works;
+
+	works = alloc_percpu(struct work_struct);
+	if (!works)
+		return -ENOMEM;
 
 	/*
 	 * The regular update, every sysctl_stat_interval, may come later
@@ -1896,9 +1901,24 @@ int vmstat_refresh(struct ctl_table *tab
 	 * transiently negative values, report an error here if any of
 	 * the stats is negative, so we know to go looking for imbalance.
 	 */
-	err = schedule_on_each_cpu(refresh_vm_stats);
-	if (err)
-		return err;
+	cpus_read_lock();
+	for_each_online_cpu(cpu) {
+		struct work_struct *work;
+
+		if (cpu_is_isolated(cpu))
+			continue;
+		work = per_cpu_ptr(works, cpu);
+		INIT_WORK(work, refresh_vm_stats);
+		schedule_work_on(cpu, work);
+	}
+
+	for_each_online_cpu(cpu) {
+		if (cpu_is_isolated(cpu))
+			continue;
+		flush_work(per_cpu_ptr(works, cpu));
+	}
+	cpus_read_unlock();
+	free_percpu(works);
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
 		/*
 		 * Skip checking stats known to go negative occasionally.

Thread overview: 7+ messages
2023-06-05 18:56 [PATCH v3 0/3] vmstat bug fixes for nohz_full and isolated CPUs Marcelo Tosatti
2023-06-05 18:56 ` [PATCH v3 1/3] vmstat: allow_direct_reclaim should use zone_page_state_snapshot Marcelo Tosatti
2023-06-05 18:56 ` [PATCH v3 2/3] vmstat: skip periodic vmstat update for isolated CPUs Marcelo Tosatti
2023-06-05 18:56 ` Marcelo Tosatti [this message]
2023-06-05 19:20 ` [PATCH v3 3/3] mm/vmstat: do not refresh stats " Michal Hocko
2023-06-05 19:53 ` Marcelo Tosatti
2023-06-05 20:22 ` Michal Hocko