* [PATCH] mm: introduce clear all vm events counters
@ 2021-03-01 10:49 Pintu Kumar
  2021-03-01 12:13 ` Matthew Wilcox
  2021-03-01 15:11 ` Johannes Weiner
  0 siblings, 2 replies; 7+ messages in thread
From: Pintu Kumar @ 2021-03-01 10:49 UTC (permalink / raw)
  To: linux-kernel, akpm, linux-mm, linux-fsdevel, linux-doc,
	jaewon31.kim, yuzhao, shakeelb, guro, mchehab+huawei, xi.fengfei,
	pintu, lokeshgidra, hannes, nigupta, famzheng, andrew.a.klychkov,
	bigeasy, ping.ping, vbabka, yzaikin, keescook, mcgrof, corbet
  Cc: pintu.ping
At times there is a need to regularly monitor vm counters while we
reproduce some issue, or it could be as simple as gathering some system
statistics when we run some scenario and every time we like to start from
beginning.
The current steps are:
Dump /proc/vmstat
Run some scenario
Dump /proc/vmstat again
Generate some data or graph
reboot and repeat again
So, each time we wanted to capture fresh data a reboot is required.
Thus, in this patch I introduce a way to clear/reset all vm counters to 0
without rebooting the system.
With this patch the steps can be:
Dump /proc/vmstat
Run some scenario
Dump /proc/vmstat again
Generate some data or graph
Do: echo 1 > /proc/sys/vm/clear_all_vm_events
Then repeat again
Thus, when 1 is written to this file, it resets all the vm counters present
under /proc/vmstat to its default value or zero.
The next time when you read /proc/vmstat, each node is filled with default
or current value.
There could be few other use cases where this can be useful.
Signed-off-by: Pintu Kumar <pintu@codeaurora.org>
Signed-off-by: Pintu Agarwal <ping.ping@gmail.com>
---
 Documentation/admin-guide/sysctl/vm.rst |  8 ++++++++
 include/linux/vmstat.h                  |  7 +++++++
 kernel/sysctl.c                         |  9 +++++++++
 mm/vmstat.c                             | 26 ++++++++++++++++++++++++++
 4 files changed, 50 insertions(+)
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index e35a3f2..54382ab 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -26,6 +26,7 @@ Currently, these files are in /proc/sys/vm:
 
 - admin_reserve_kbytes
 - block_dump
+- clear_all_vm_events
 - compact_memory
 - compaction_proactiveness
 - compact_unevictable_allowed
@@ -112,6 +113,13 @@ block_dump
 block_dump enables block I/O debugging when set to a nonzero value. More
 information on block I/O debugging is in Documentation/admin-guide/laptops/laptop-mode.rst.
 
+clear_all_vm_events
+===================
+
+When 1 is written to this file, it resets all the vm counters present under
+/proc/vmstat to its default value or zeros.
+The next time when you read /proc/vmstat, each node is filled with default or
+current value.
 
 compact_memory
 ==============
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 773135f..8ab9be5 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -11,6 +11,8 @@
 #include <linux/mmdebug.h>
 
 extern int sysctl_stat_interval;
+extern int sysctl_clearvmevents_handler(struct ctl_table *table, int write,
+	void __user *buffer, size_t *length, loff_t *ppos);
 
 #ifdef CONFIG_NUMA
 #define ENABLE_NUMA_STAT   1
@@ -83,6 +85,8 @@ static inline void count_vm_events(enum vm_event_item item, long delta)
 
 extern void all_vm_events(unsigned long *);
 
+extern void clear_all_vm_events(void);
+
 extern void vm_events_fold_cpu(int cpu);
 
 #else
@@ -103,6 +107,9 @@ static inline void __count_vm_events(enum vm_event_item item, long delta)
 static inline void all_vm_events(unsigned long *ret)
 {
 }
+static inline void clear_all_vm_events(void)
+{
+}
 static inline void vm_events_fold_cpu(int cpu)
 {
 }
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c9fbdd8..25299a4 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -200,6 +200,8 @@ static int min_extfrag_threshold;
 static int max_extfrag_threshold = 1000;
 #endif
 
+static int sysctl_clear_vm_events;
+
 #endif /* CONFIG_SYSCTL */
 
 #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_SYSCTL)
@@ -2891,6 +2893,13 @@ static struct ctl_table vm_table[] = {
 
 #endif /* CONFIG_COMPACTION */
 	{
+		.procname	= "clear_all_vm_events",
+		.data		= &sysctl_clear_vm_events,
+		.maxlen		= sizeof(int),
+		.mode		= 0200,
+		.proc_handler	= sysctl_clearvmevents_handler,
+	},
+	{
 		.procname	= "min_free_kbytes",
 		.data		= &min_free_kbytes,
 		.maxlen		= sizeof(min_free_kbytes),
diff --git a/mm/vmstat.c b/mm/vmstat.c
index f894216..26b8f81 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -136,6 +136,32 @@ void all_vm_events(unsigned long *ret)
 }
 EXPORT_SYMBOL_GPL(all_vm_events);
 
+void clear_all_vm_events(void)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		struct vm_event_state *this = &per_cpu(vm_event_states, cpu);
+		int sz = NR_VM_EVENT_ITEMS * sizeof(unsigned long);
+
+		memset(this->event, 0, sz);
+	}
+}
+EXPORT_SYMBOL_GPL(clear_all_vm_events);
+
+/*
+ * This is the node to reset all vm events in /proc/vmstat
+ * via /proc/sys/vm/clear_all_vm_events
+ */
+int sysctl_clearvmevents_handler(struct ctl_table *table, int write,
+	void __user *buffer, size_t *length, loff_t *ppos)
+{
+	if (write)
+		clear_all_vm_events();
+
+	return 0;
+}
+
 /*
  * Fold the foreign cpu events into our own.
  *
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply related	[flat|nested] 7+ messages in thread
* Re: [PATCH] mm: introduce clear all vm events counters
  2021-03-01 10:49 [PATCH] mm: introduce clear all vm events counters Pintu Kumar
@ 2021-03-01 12:13 ` Matthew Wilcox
  2021-03-01 13:55   ` pintu
  2021-03-01 15:11 ` Johannes Weiner
  1 sibling, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2021-03-01 12:13 UTC (permalink / raw)
  To: Pintu Kumar
  Cc: linux-kernel, akpm, linux-mm, linux-fsdevel, linux-doc,
	jaewon31.kim, yuzhao, shakeelb, guro, mchehab+huawei, xi.fengfei,
	lokeshgidra, hannes, nigupta, famzheng, andrew.a.klychkov,
	bigeasy, ping.ping, vbabka, yzaikin, keescook, mcgrof, corbet,
	pintu.ping
On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
> +EXPORT_SYMBOL_GPL(clear_all_vm_events);
What module uses this function?
^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH] mm: introduce clear all vm events counters
  2021-03-01 12:13 ` Matthew Wilcox
@ 2021-03-01 13:55   ` pintu
  0 siblings, 0 replies; 7+ messages in thread
From: pintu @ 2021-03-01 13:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, akpm, linux-mm, linux-fsdevel, linux-doc,
	jaewon31.kim, yuzhao, shakeelb, guro, mchehab+huawei, xi.fengfei,
	lokeshgidra, hannes, nigupta, famzheng, andrew.a.klychkov,
	bigeasy, ping.ping, vbabka, yzaikin, keescook, mcgrof, corbet,
	pintu.ping
On 2021-03-01 17:43, Matthew Wilcox wrote:
> On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
>> +EXPORT_SYMBOL_GPL(clear_all_vm_events);
> 
> What module uses this function?
oh sorry, I will remove the EXPORT
Thanks for the review.
Regards,
Pintu
^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH] mm: introduce clear all vm events counters
  2021-03-01 10:49 [PATCH] mm: introduce clear all vm events counters Pintu Kumar
  2021-03-01 12:13 ` Matthew Wilcox
@ 2021-03-01 15:11 ` Johannes Weiner
  2021-03-02 10:30   ` pintu
  1 sibling, 1 reply; 7+ messages in thread
From: Johannes Weiner @ 2021-03-01 15:11 UTC (permalink / raw)
  To: Pintu Kumar
  Cc: linux-kernel, akpm, linux-mm, linux-fsdevel, linux-doc,
	jaewon31.kim, yuzhao, shakeelb, guro, mchehab+huawei, xi.fengfei,
	lokeshgidra, nigupta, famzheng, andrew.a.klychkov, bigeasy,
	ping.ping, vbabka, yzaikin, keescook, mcgrof, corbet, pintu.ping
On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
> At times there is a need to regularly monitor vm counters while we
> reproduce some issue, or it could be as simple as gathering some system
> statistics when we run some scenario and every time we like to start from
> beginning.
> The current steps are:
> Dump /proc/vmstat
> Run some scenario
> Dump /proc/vmstat again
> Generate some data or graph
> reboot and repeat again
You can subtract the first vmstat dump from the second to get the
event delta for the scenario run. That's what I do, and I'd assume
most people are doing. Am I missing something?
^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH] mm: introduce clear all vm events counters
  2021-03-01 15:11 ` Johannes Weiner
@ 2021-03-02 10:30   ` pintu
  2021-03-02 15:56     ` Johannes Weiner
  0 siblings, 1 reply; 7+ messages in thread
From: pintu @ 2021-03-02 10:30 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-kernel, akpm, linux-mm, linux-fsdevel, linux-doc,
	jaewon31.kim, yuzhao, shakeelb, guro, mchehab+huawei, xi.fengfei,
	lokeshgidra, nigupta, famzheng, andrew.a.klychkov, bigeasy,
	ping.ping, vbabka, yzaikin, keescook, mcgrof, corbet, pintu.ping
On 2021-03-01 20:41, Johannes Weiner wrote:
> On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
>> At times there is a need to regularly monitor vm counters while we
>> reproduce some issue, or it could be as simple as gathering some 
>> system
>> statistics when we run some scenario and every time we like to start 
>> from
>> beginning.
>> The current steps are:
>> Dump /proc/vmstat
>> Run some scenario
>> Dump /proc/vmstat again
>> Generate some data or graph
>> reboot and repeat again
> 
> You can subtract the first vmstat dump from the second to get the
> event delta for the scenario run. That's what I do, and I'd assume
> most people are doing. Am I missing something?
Thanks so much for your comments.
Yes in most cases it works.
But I guess there are sometimes where we need to compare with fresh data 
(just like reboot) at least for some of the counters.
Suppose we wanted to monitor pgalloc_normal and pgfree.
Or, suppose we want to monitor until the field becomes non-zero..
Or, how certain values are changing compared to fresh reboot.
Or, suppose we want to reset all counters after boot and start capturing 
fresh stats.
Some of the counters could be growing too large and too fast. Will there 
be chances of overflow ?
Then resetting using this could help without rebooting.
Suppose system came back from hibernation, and we want to reset all 
counters again (to capture fresh data) ?
Here I am sharing one output (from qemu-arm32 with 256MB RAM) just to 
indicate what could be changed:
Scenario: Generate OOM kill case and check oom_kill counter
BEFORE	AFTER	proc/vmstat
------  ------  -----------------------
49991	49916	nr_free_pages
4467	4481	nr_zone_inactive_anon
68	68	nr_zone_active_anon
3189	3067	nr_zone_inactive_file
223	444	nr_zone_active_file
0	0	nr_zone_unevictable
122	136	nr_zone_write_pending
0	0	nr_mlock
139	139	nr_page_table_pages
0	0	nr_bounce
0	0	nr_zspages
4032	4032	nr_free_cma
4467	4481	nr_inactive_anon
68	68	nr_active_anon
3189	3067	nr_inactive_file
223	444	nr_active_file
0	0	nr_unevictable
1177	1178	nr_slab_reclaimable
1889	1889	nr_slab_unreclaimable
0	0	nr_isolated_anon
0	0	nr_isolated_file
176	163	workingset_nodes
0	0	workingset_refault_anon
3295	3369	workingset_refault_file
0	0	workingset_activate_anon
4	4	workingset_activate_file
0	0	workingset_restore_anon
4	4	workingset_restore_file
0	0	workingset_nodereclaim
4436	4436	nr_anon_pages
2636	2678	nr_mapped
3559	3645	nr_file_pages
122	136	nr_dirty
0	0	nr_writeback
0	0	nr_writeback_temp
126	126	nr_shmem
0	0	nr_shmem_hugepages
0	0	nr_shmem_pmdmapped
0	0	nr_file_hugepages
0	0	nr_file_pmdmapped
0	0	nr_anon_transparent_hugepages
1	1	nr_vmscan_write
1	1	nr_vmscan_immediate_reclaim
1024	1038	nr_dirtied
902	902	nr_written
0	0	nr_kernel_misc_reclaimable
0	0	nr_foll_pin_acquired
0	0	nr_foll_pin_released
616	616	nr_kernel_stack
10529	10533	nr_dirty_threshold
5258	5260	nr_dirty_background_threshold
50714	256	pgpgin
3932	16	pgpgout
0	0	pswpin
0	0	pswpout
86828	122	pgalloc_normal
0	0	pgalloc_movable
0	0	allocstall_normal
22	0	allocstall_movable
0	0	pgskip_normal
0	0	pgskip_movable
139594	34	pgfree
4998	155	pgactivate
5738	0	pgdeactivate
0	0	pglazyfree
82113	122	pgfault
374	2	pgmajfault
0	0	pglazyfreed
7695	0	pgrefill
2718	20	pgreuse
9261	0	pgsteal_kswapd
173	0	pgsteal_direct
12627	0	pgscan_kswapd
283	0	pgscan_direct
2	0	pgscan_direct_throttle
0	0	pgscan_anon
12910	0	pgscan_file
0	0	pgsteal_anon
9434	0	pgsteal_file
0	0	pginodesteal
7008	0	slabs_scanned
109	0	kswapd_inodesteal
16	0	kswapd_low_wmark_hit_quickly
24	0	kswapd_high_wmark_hit_quickly
43	0	pageoutrun
1	0	pgrotated
0	0	drop_pagecache
0	0	drop_slab
1	0	oom_kill
1210	0	pgmigrate_success
0	0	pgmigrate_fail
0	0	thp_migration_success
0	0	thp_migration_fail
0	0	thp_migration_split
1509	0	compact_migrate_scanned
9015	0	compact_free_scanned
3911	0	compact_isolated
0	0	compact_stall
0	0	compact_fail
0	0	compact_success
3	0	compact_daemon_wake
1509	0	compact_daemon_migrate_scanned
9015	0	compact_daemon_free_scanned
0	0	unevictable_pgs_culled
0	0	unevictable_pgs_scanned
0	0	unevictable_pgs_rescued
0	0	unevictable_pgs_mlocked
0	0	unevictable_pgs_munlocked
0	0	unevictable_pgs_cleared
0	0	unevictable_pgs_stranded
0	0	balloon_inflate
0	0	balloon_deflate
0	0	balloon_migrate
0	0	swap_ra
0	0	swap_ra_hit
0	0	nr_unstable
Thanks,
Pintu
^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH] mm: introduce clear all vm events counters
  2021-03-02 10:30   ` pintu
@ 2021-03-02 15:56     ` Johannes Weiner
  2021-03-03 12:25       ` pintu
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Weiner @ 2021-03-02 15:56 UTC (permalink / raw)
  To: pintu
  Cc: linux-kernel, akpm, linux-mm, linux-fsdevel, linux-doc,
	jaewon31.kim, yuzhao, shakeelb, guro, mchehab+huawei, xi.fengfei,
	lokeshgidra, nigupta, famzheng, andrew.a.klychkov, bigeasy,
	ping.ping, vbabka, yzaikin, keescook, mcgrof, corbet, pintu.ping
On Tue, Mar 02, 2021 at 04:00:34PM +0530, pintu@codeaurora.org wrote:
> On 2021-03-01 20:41, Johannes Weiner wrote:
> > On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
> > > At times there is a need to regularly monitor vm counters while we
> > > reproduce some issue, or it could be as simple as gathering some
> > > system
> > > statistics when we run some scenario and every time we like to start
> > > from
> > > beginning.
> > > The current steps are:
> > > Dump /proc/vmstat
> > > Run some scenario
> > > Dump /proc/vmstat again
> > > Generate some data or graph
> > > reboot and repeat again
> > 
> > You can subtract the first vmstat dump from the second to get the
> > event delta for the scenario run. That's what I do, and I'd assume
> > most people are doing. Am I missing something?
> 
> Thanks so much for your comments.
> Yes in most cases it works.
> 
> But I guess there are sometimes where we need to compare with fresh data
> (just like reboot) at least for some of the counters.
> Suppose we wanted to monitor pgalloc_normal and pgfree.
Hopefully these would already be balanced out pretty well before you
run a test, or there is a risk that whatever outstanding allocations
there are can cause a large number of frees during your test that
don't match up to your recorded allocation events. Resetting to zero
doesn't eliminate the risk of such background noise.
> Or, suppose we want to monitor until the field becomes non-zero..
> Or, how certain values are changing compared to fresh reboot.
> Or, suppose we want to reset all counters after boot and start capturing
> fresh stats.
Again, there simply is no mathematical difference between
	reset events to 0
	run test
	look at events - 0
and
	read events baseline
	run test
	look at events - baseline
> Some of the counters could be growing too large and too fast. Will there be
> chances of overflow ?
> Then resetting using this could help without rebooting.
Overflows are just a fact of life on 32 bit systems. However, they can
also be trivially handled - you can always subtract a ulong start
state from a ulong end state and get a reliable delta of up to 2^32
events, whether the end state has overflowed or not.
The bottom line is that the benefit of this patch adds a minor
convenience for something that can already be done in userspace. But
the downside is that there would be one more possible source of noise
for kernel developers to consider when looking at a bug report. Plus
the extra code and user interface that need to be maintained.
I don't think we should merge this patch.
^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH] mm: introduce clear all vm events counters
  2021-03-02 15:56     ` Johannes Weiner
@ 2021-03-03 12:25       ` pintu
  0 siblings, 0 replies; 7+ messages in thread
From: pintu @ 2021-03-03 12:25 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-kernel, akpm, linux-mm, linux-fsdevel, linux-doc,
	jaewon31.kim, yuzhao, shakeelb, guro, mchehab+huawei, xi.fengfei,
	lokeshgidra, nigupta, famzheng, andrew.a.klychkov, bigeasy,
	ping.ping, vbabka, yzaikin, keescook, mcgrof, corbet, pintu.ping
On 2021-03-02 21:26, Johannes Weiner wrote:
> On Tue, Mar 02, 2021 at 04:00:34PM +0530, pintu@codeaurora.org wrote:
>> On 2021-03-01 20:41, Johannes Weiner wrote:
>> > On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
>> > > At times there is a need to regularly monitor vm counters while we
>> > > reproduce some issue, or it could be as simple as gathering some
>> > > system
>> > > statistics when we run some scenario and every time we like to start
>> > > from
>> > > beginning.
>> > > The current steps are:
>> > > Dump /proc/vmstat
>> > > Run some scenario
>> > > Dump /proc/vmstat again
>> > > Generate some data or graph
>> > > reboot and repeat again
>> >
>> > You can subtract the first vmstat dump from the second to get the
>> > event delta for the scenario run. That's what I do, and I'd assume
>> > most people are doing. Am I missing something?
>> 
>> Thanks so much for your comments.
>> Yes in most cases it works.
>> 
>> But I guess there are sometimes where we need to compare with fresh 
>> data
>> (just like reboot) at least for some of the counters.
>> Suppose we wanted to monitor pgalloc_normal and pgfree.
> 
> Hopefully these would already be balanced out pretty well before you
> run a test, or there is a risk that whatever outstanding allocations
> there are can cause a large number of frees during your test that
> don't match up to your recorded allocation events. Resetting to zero
> doesn't eliminate the risk of such background noise.
> 
>> Or, suppose we want to monitor until the field becomes non-zero..
>> Or, how certain values are changing compared to fresh reboot.
>> Or, suppose we want to reset all counters after boot and start 
>> capturing
>> fresh stats.
> 
> Again, there simply is no mathematical difference between
> 
> 	reset events to 0
> 	run test
> 	look at events - 0
> 
> and
> 
> 	read events baseline
> 	run test
> 	look at events - baseline
> 
>> Some of the counters could be growing too large and too fast. Will 
>> there be
>> chances of overflow ?
>> Then resetting using this could help without rebooting.
> 
> Overflows are just a fact of life on 32 bit systems. However, they can
> also be trivially handled - you can always subtract a ulong start
> state from a ulong end state and get a reliable delta of up to 2^32
> events, whether the end state has overflowed or not.
> 
> The bottom line is that the benefit of this patch adds a minor
> convenience for something that can already be done in userspace. But
> the downside is that there would be one more possible source of noise
> for kernel developers to consider when looking at a bug report. Plus
> the extra code and user interface that need to be maintained.
> 
> I don't think we should merge this patch.
Okay no problem.Thank you so much for your review and feedback.
Yes I agree the benefits are minor but I thought might be useful for 
someone somewhere.
I worked on it and found it easy and convinient and thus proposed it.
If others feel not important I am okay to drop it.
Thanks once again to all who helped to review it.
^ permalink raw reply	[flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-03-04  0:20 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-03-01 10:49 [PATCH] mm: introduce clear all vm events counters Pintu Kumar
2021-03-01 12:13 ` Matthew Wilcox
2021-03-01 13:55   ` pintu
2021-03-01 15:11 ` Johannes Weiner
2021-03-02 10:30   ` pintu
2021-03-02 15:56     ` Johannes Weiner
2021-03-03 12:25       ` pintu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).