public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] proc/stat: Separate out individual irq counts into /proc/stat_irqs
@ 2018-04-19 19:08 Alexey Dobriyan
  2018-04-19 19:28 ` Waiman Long
  0 siblings, 1 reply; 18+ messages in thread
From: Alexey Dobriyan @ 2018-04-19 19:08 UTC (permalink / raw)
  To: longman; +Cc: linux-kernel, rdunlap

> Therefore, application performance can be impacted if the application
> reads /proc/stat rather frequently.

[nods]
Text interfaces can be designed in a very stupid way.

> For example, reading /proc/stat in a certain 2-socket Skylake server
> took about 4.6ms because it had over 5k irqs.

Is this top(1)? What is this application doing?
If it needs percpu usage stats, then maybe /proc/stat should be
converted away from single_open() so that core seq_file code doesn't
generate everything at once.

> -
> -	/* sum again ? it could be updated? */
> -	for_each_irq_nr(j)
> -		seq_put_decimal_ull(p, " ", kstat_irqs_usr(j));
> -

This is direct userspace breakage.

Proper fix is to start strategic switch away from /proc.
It is a fun toy but its time has come.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [PATCH] proc/stat: Separate out individual irq counts into /proc/stat_irqs
@ 2018-04-19 17:09 Waiman Long
  2018-04-19 17:38 ` Randy Dunlap
  2018-04-19 19:43 ` Andrew Morton
  0 siblings, 2 replies; 18+ messages in thread
From: Waiman Long @ 2018-04-19 17:09 UTC (permalink / raw)
  To: Andrew Morton, Kate Stewart, Thomas Gleixner, Greg Kroah-Hartman
  Cc: linux-kernel, Waiman Long

It was found that reading /proc/stat could be time consuming on
systems with a lot of irqs. For example, reading /proc/stat in a
certain 2-socket Skylake server took about 4.6ms because it had over
5k irqs. In that particular case, the majority of the CPU cycles for
reading /proc/stat was spent in the kstat_irqs() function.  Therefore,
application performance can be impacted if the application reads
/proc/stat rather frequently.

The "intr" line within /proc/stat contains a sum total of all the irqs
that have happened followed by a list of irq counts for each individual
irq number. In many cases, the first number is good enough. The
individual irq counts may not provide that much more information.

In order to avoid this kind of performance issue, all these individual
irq counts are now separated into a new /proc/stat_irqs file. The
sum total irq count will stay in /proc/stat and be duplicated in
/proc/stat_irqs. Applications that need to look up individual irq counts
will now have to look into /proc/stat_irqs instead of /proc/stat.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/proc/stat.c | 48 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 41 insertions(+), 7 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 59749df..79e3c03 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -155,11 +155,6 @@ static int show_stat(struct seq_file *p, void *v)
 		seq_putc(p, '\n');
 	}
 	seq_put_decimal_ull(p, "intr ", (unsigned long long)sum);
-
-	/* sum again ? it could be updated? */
-	for_each_irq_nr(j)
-		seq_put_decimal_ull(p, " ", kstat_irqs_usr(j));
-
 	seq_printf(p,
 		"\nctxt %llu\n"
 		"btime %llu\n"
@@ -181,15 +176,46 @@ static int show_stat(struct seq_file *p, void *v)
 	return 0;
 }
 
+/*
+ * Showing individual irq counts can be expensive if there are a lot of
+ * irqs. So it is done in a separate procfs file to reduce performance
+ * overhead of reading other statistical counts.
+ */
+static int show_stat_irqs(struct seq_file *p, void *v)
+{
+	int i, j;
+	u64 sum = 0;
+
+	for_each_possible_cpu(i) {
+		sum += kstat_cpu_irqs_sum(i);
+		sum += arch_irq_stat_cpu(i);
+	}
+	sum += arch_irq_stat();
+
+	seq_put_decimal_ull(p, "intr ", (unsigned long long)sum);
+
+	for_each_irq_nr(j)
+		seq_put_decimal_ull(p, " ", kstat_irqs_usr(j));
+
+	seq_putc(p, '\n');
+
+	return 0;
+}
+
 static int stat_open(struct inode *inode, struct file *file)
 {
 	size_t size = 1024 + 128 * num_online_cpus();
 
-	/* minimum size to display an interrupt count : 2 bytes */
-	size += 2 * nr_irqs;
 	return single_open_size(file, show_stat, NULL, size);
 }
 
+static int stat_irqs_open(struct inode *inode, struct file *file)
+{
+	size_t size = 1024 + 16 * nr_irqs;
+
+	return single_open_size(file, show_stat_irqs, NULL, size);
+}
+
 static const struct file_operations proc_stat_operations = {
 	.open		= stat_open,
 	.read		= seq_read,
@@ -197,9 +223,17 @@ static int stat_open(struct inode *inode, struct file *file)
 	.release	= single_release,
 };
 
+static const struct file_operations proc_stat_irqs_operations = {
+	.open		= stat_irqs_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 static int __init proc_stat_init(void)
 {
 	proc_create("stat", 0, NULL, &proc_stat_operations);
+	proc_create("stat_irqs", 0, NULL, &proc_stat_irqs_operations);
 	return 0;
 }
 fs_initcall(proc_stat_init);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-05-02  0:02 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-04-19 19:08 [PATCH] proc/stat: Separate out individual irq counts into /proc/stat_irqs Alexey Dobriyan
2018-04-19 19:28 ` Waiman Long
2018-04-19 19:55   ` Alexey Dobriyan
     [not found]     ` <eb7c1569-e445-5cbb-6d10-2694b625232a@redhat.com>
2018-04-19 20:39       ` Alexey Dobriyan
2018-04-19 20:58         ` Waiman Long
2018-04-19 23:23           ` Joel Fernandes (Google)
2018-04-21 20:34             ` Alexey Dobriyan
2018-04-21 20:36               ` Alexey Dobriyan
2018-04-24  5:54                 ` David Rientjes
2018-04-24  6:18                   ` Alexey Dobriyan
2018-05-02  0:02                     ` Andrew Morton
2018-04-19 21:05         ` Waiman Long
  -- strict thread matches above, loose matches on Subject: below --
2018-04-19 17:09 Waiman Long
2018-04-19 17:38 ` Randy Dunlap
2018-04-19 18:44   ` Waiman Long
2018-04-19 19:43 ` Andrew Morton
2018-04-19 19:57   ` Waiman Long
2018-04-19 20:02   ` Alexey Dobriyan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox