public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 4.9 0/4] Fix softirq time accounting issues on 4.9
@ 2018-09-07  1:42 Ivan Delalande
  2018-09-07  1:42 ` [PATCH 4.9 1/4] time: Introduce jiffies64_to_nsecs() Ivan Delalande
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Ivan Delalande @ 2018-09-07  1:42 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Frederic Weisbecker, Ingo Molnar, Rabin Vincent, Edward Chron,
	stable, Ivan Delalande

Hi Greg,

This series fixes issues we've seen with softirq time accounting in 4.9:
  - when ksoftirqd is running at 100% on a CPU, none of the values
    reported by /proc/stat for that CPU will change, sometimes for
    dozens of seconds,
  - large deviations in the total number of ticks accumulated over a
    fixed time for a CPU, probably because of the first issue hitting
    for shorter periods.

We found out that something pretty similar had been reported 9 months
ago, see the reference link below. In that discussion, Rabin Vincent had
made a 4.9 specific patch which fixes our first issue, but we were still
seeing some deviation from the total number of ticks (up to 1.7% from
expected, where we had only 0.2% on older kernels), and you had also
asked for a direct backport from the mainline series, if possible.

As mentioned in that thread, a lot of changes (probably 50+) went into
4.11 to remove cputime, but we could get something working with only the
4 attached patches to fix these two issues. Three of these patches apply
without change, and the second one in the series ("sched/cputime:
Convert kcpustat to nsecs") needed a minor change as a cast had been
added in 527b0a76f41d ("sched/cpuacct: Avoid %lld seq_printf warning")
to fix a build warning on s390. I guess we could also include that patch
in this series, let me know if this is the preferred way to handle this.

We ran our tests on 3.18, 4.4 and 4.9 and confirmed that only 4.9 would
need this series, and that this series indeed restores the behavior we
were seeing on those older kernels.

Thanks!

Reference: http://lkml.kernel.org/r/%3C1513159876-5125-1-git-send-email-rabin.vincent@axis.com


Frederic Weisbecker (4):
  time: Introduce jiffies64_to_nsecs()
  sched/cputime: Convert kcpustat to nsecs
  sched/cputime: Increment kcpustat directly on irqtime account
  sched/cputime: Fix ksoftirqd cputime accounting regression

 arch/s390/appldata/appldata_os.c   | 16 +++----
 drivers/cpufreq/cpufreq.c          |  6 +--
 drivers/cpufreq/cpufreq_governor.c |  2 +-
 drivers/cpufreq/cpufreq_stats.c    |  1 -
 drivers/macintosh/rack-meter.c     |  2 +-
 fs/proc/stat.c                     | 68 +++++++++++++--------------
 fs/proc/uptime.c                   |  7 +--
 include/linux/jiffies.h            |  2 +
 kernel/sched/cpuacct.c             |  2 +-
 kernel/sched/cputime.c             | 75 +++++++++++++-----------------
 kernel/sched/sched.h               | 12 +++--
 kernel/time/time.c                 | 10 ++++
 kernel/time/timeconst.bc           |  6 +++
 13 files changed, 109 insertions(+), 100 deletions(-)

-- 
2.18.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 4.9 1/4] time: Introduce jiffies64_to_nsecs()
  2018-09-07  1:42 [PATCH 4.9 0/4] Fix softirq time accounting issues on 4.9 Ivan Delalande
@ 2018-09-07  1:42 ` Ivan Delalande
  2018-09-07  1:42 ` [PATCH 4.9 2/4] sched/cputime: Convert kcpustat to nsecs Ivan Delalande
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Ivan Delalande @ 2018-09-07  1:42 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Frederic Weisbecker, Ingo Molnar, Rabin Vincent, Edward Chron,
	stable, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Heiko Carstens, Martin Schwidefsky, Tony Luck, Fenghua Yu,
	Peter Zijlstra, Rik van Riel, Stanislaw Gruszka, Wanpeng Li,
	Ivan Delalande

From: Frederic Weisbecker <fweisbec@gmail.com>

commit 07e5f5e353aaa61696c8353d87050994a0c4648a upstream.

This will be needed for the cputime_t to nsec conversion.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ivan Delalande <colona@arista.com>
---
 include/linux/jiffies.h  |  2 ++
 kernel/time/time.c       | 10 ++++++++++
 kernel/time/timeconst.bc |  6 ++++++
 3 files changed, 18 insertions(+)

diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index c2a0f0072274..734377ad42e9 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -292,6 +292,8 @@ static inline u64 jiffies_to_nsecs(const unsigned long j)
 	return (u64)jiffies_to_usecs(j) * NSEC_PER_USEC;
 }
 
+extern u64 jiffies64_to_nsecs(u64 j);
+
 extern unsigned long __msecs_to_jiffies(const unsigned int m);
 #if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
 /*
diff --git a/kernel/time/time.c b/kernel/time/time.c
index 39468651a064..a5b6d98ea7b1 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -704,6 +704,16 @@ u64 nsec_to_clock_t(u64 x)
 #endif
 }
 
+u64 jiffies64_to_nsecs(u64 j)
+{
+#if !(NSEC_PER_SEC % HZ)
+	return (NSEC_PER_SEC / HZ) * j;
+# else
+	return div_u64(j * HZ_TO_NSEC_NUM, HZ_TO_NSEC_DEN);
+#endif
+}
+EXPORT_SYMBOL(jiffies64_to_nsecs);
+
 /**
  * nsecs_to_jiffies64 - Convert nsecs in u64 to jiffies64
  *
diff --git a/kernel/time/timeconst.bc b/kernel/time/timeconst.bc
index c48688904f9f..f83bbb81600b 100644
--- a/kernel/time/timeconst.bc
+++ b/kernel/time/timeconst.bc
@@ -98,6 +98,12 @@ define timeconst(hz) {
 		print "#define HZ_TO_USEC_DEN\t\t", hz/cd, "\n"
 		print "#define USEC_TO_HZ_NUM\t\t", hz/cd, "\n"
 		print "#define USEC_TO_HZ_DEN\t\t", 1000000/cd, "\n"
+
+		cd=gcd(hz,1000000000)
+		print "#define HZ_TO_NSEC_NUM\t\t", 1000000000/cd, "\n"
+		print "#define HZ_TO_NSEC_DEN\t\t", hz/cd, "\n"
+		print "#define NSEC_TO_HZ_NUM\t\t", hz/cd, "\n"
+		print "#define NSEC_TO_HZ_DEN\t\t", 1000000000/cd, "\n"
 		print "\n"
 
 		print "#endif /* KERNEL_TIMECONST_H */\n"
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4.9 2/4] sched/cputime: Convert kcpustat to nsecs
  2018-09-07  1:42 [PATCH 4.9 0/4] Fix softirq time accounting issues on 4.9 Ivan Delalande
  2018-09-07  1:42 ` [PATCH 4.9 1/4] time: Introduce jiffies64_to_nsecs() Ivan Delalande
@ 2018-09-07  1:42 ` Ivan Delalande
  2018-10-12 14:42   ` Greg Kroah-Hartman
  2018-09-07  1:42 ` [PATCH 4.9 3/4] sched/cputime: Increment kcpustat directly on irqtime account Ivan Delalande
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Ivan Delalande @ 2018-09-07  1:42 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Frederic Weisbecker, Ingo Molnar, Rabin Vincent, Edward Chron,
	stable, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Heiko Carstens, Martin Schwidefsky, Tony Luck, Fenghua Yu,
	Peter Zijlstra, Rik van Riel, Stanislaw Gruszka, Wanpeng Li,
	Ivan Delalande

From: Frederic Weisbecker <fweisbec@gmail.com>

commit 7fb1327ee9b92fca27662f9b9d60c7c3376d6c69 upstream.

Kernel CPU stats are stored in cputime_t which is an architecture
defined type, and hence a bit opaque and requiring accessors and mutators
for any operation.

Converting them to nsecs simplifies the code and is one step toward
the removal of cputime_t in the core code.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[colona: minor conflict as 527b0a76f41d ("sched/cpuacct: Avoid %lld seq_printf
warning") is missing from v4.9]
Signed-off-by: Ivan Delalande <colona@arista.com>
---
 arch/s390/appldata/appldata_os.c   | 16 +++----
 drivers/cpufreq/cpufreq.c          |  6 +--
 drivers/cpufreq/cpufreq_governor.c |  2 +-
 drivers/cpufreq/cpufreq_stats.c    |  1 -
 drivers/macintosh/rack-meter.c     |  2 +-
 fs/proc/stat.c                     | 68 +++++++++++++++---------------
 fs/proc/uptime.c                   |  7 +--
 kernel/sched/cpuacct.c             |  2 +-
 kernel/sched/cputime.c             | 22 +++++-----
 9 files changed, 61 insertions(+), 65 deletions(-)

diff --git a/arch/s390/appldata/appldata_os.c b/arch/s390/appldata/appldata_os.c
index 69b23b25ac34..08b9e942a262 100644
--- a/arch/s390/appldata/appldata_os.c
+++ b/arch/s390/appldata/appldata_os.c
@@ -113,21 +113,21 @@ static void appldata_get_os_data(void *data)
 	j = 0;
 	for_each_online_cpu(i) {
 		os_data->os_cpu[j].per_cpu_user =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_USER]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_USER]);
 		os_data->os_cpu[j].per_cpu_nice =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_NICE]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_NICE]);
 		os_data->os_cpu[j].per_cpu_system =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM]);
 		os_data->os_cpu[j].per_cpu_idle =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IDLE]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IDLE]);
 		os_data->os_cpu[j].per_cpu_irq =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IRQ]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IRQ]);
 		os_data->os_cpu[j].per_cpu_softirq =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]);
 		os_data->os_cpu[j].per_cpu_iowait =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IOWAIT]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IOWAIT]);
 		os_data->os_cpu[j].per_cpu_steal =
-			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_STEAL]);
+			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_STEAL]);
 		os_data->os_cpu[j].cpu_id = i;
 		j++;
 	}
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index af5eff6835a8..d6d91e8afa9e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -132,7 +132,7 @@ static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
 	u64 cur_wall_time;
 	u64 busy_time;
 
-	cur_wall_time = jiffies64_to_cputime64(get_jiffies_64());
+	cur_wall_time = jiffies64_to_nsecs(get_jiffies_64());
 
 	busy_time = kcpustat_cpu(cpu).cpustat[CPUTIME_USER];
 	busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SYSTEM];
@@ -143,9 +143,9 @@ static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
 
 	idle_time = cur_wall_time - busy_time;
 	if (wall)
-		*wall = cputime_to_usecs(cur_wall_time);
+		*wall = div_u64(cur_wall_time, NSEC_PER_USEC);
 
-	return cputime_to_usecs(idle_time);
+	return div_u64(idle_time, NSEC_PER_USEC);
 }
 
 u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy)
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 642dd0f183a8..38d1a8216084 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -152,7 +152,7 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
 		if (ignore_nice) {
 			u64 cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
 
-			idle_time += cputime_to_usecs(cur_nice - j_cdbs->prev_cpu_nice);
+			idle_time += div_u64(cur_nice - j_cdbs->prev_cpu_nice, NSEC_PER_USEC);
 			j_cdbs->prev_cpu_nice = cur_nice;
 		}
 
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index 06d3abdffd3a..b084708fd113 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -13,7 +13,6 @@
 #include <linux/cpufreq.h>
 #include <linux/module.h>
 #include <linux/slab.h>
-#include <linux/cputime.h>
 
 static DEFINE_SPINLOCK(cpufreq_stats_lock);
 
diff --git a/drivers/macintosh/rack-meter.c b/drivers/macintosh/rack-meter.c
index 25852e399ab2..13b16f34256c 100644
--- a/drivers/macintosh/rack-meter.c
+++ b/drivers/macintosh/rack-meter.c
@@ -91,7 +91,7 @@ static inline cputime64_t get_cpu_idle_time(unsigned int cpu)
 	if (rackmeter_ignore_nice)
 		retval += kcpustat_cpu(cpu).cpustat[CPUTIME_NICE];
 
-	return retval;
+	return nsecs_to_cputime64(retval);
 }
 
 static void rackmeter_setup_i2s(struct rackmeter *rm)
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index d700c42b3572..44475a44cbf1 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -21,23 +21,23 @@
 
 #ifdef arch_idle_time
 
-static cputime64_t get_idle_time(int cpu)
+static u64 get_idle_time(int cpu)
 {
-	cputime64_t idle;
+	u64 idle;
 
 	idle = kcpustat_cpu(cpu).cpustat[CPUTIME_IDLE];
 	if (cpu_online(cpu) && !nr_iowait_cpu(cpu))
-		idle += arch_idle_time(cpu);
+		idle += cputime_to_nsecs(arch_idle_time(cpu));
 	return idle;
 }
 
-static cputime64_t get_iowait_time(int cpu)
+static u64 get_iowait_time(int cpu)
 {
-	cputime64_t iowait;
+	u64 iowait;
 
 	iowait = kcpustat_cpu(cpu).cpustat[CPUTIME_IOWAIT];
 	if (cpu_online(cpu) && nr_iowait_cpu(cpu))
-		iowait += arch_idle_time(cpu);
+		iowait += cputime_to_nsecs(arch_idle_time(cpu));
 	return iowait;
 }
 
@@ -45,32 +45,32 @@ static cputime64_t get_iowait_time(int cpu)
 
 static u64 get_idle_time(int cpu)
 {
-	u64 idle, idle_time = -1ULL;
+	u64 idle, idle_usecs = -1ULL;
 
 	if (cpu_online(cpu))
-		idle_time = get_cpu_idle_time_us(cpu, NULL);
+		idle_usecs = get_cpu_idle_time_us(cpu, NULL);
 
-	if (idle_time == -1ULL)
+	if (idle_usecs == -1ULL)
 		/* !NO_HZ or cpu offline so we can rely on cpustat.idle */
 		idle = kcpustat_cpu(cpu).cpustat[CPUTIME_IDLE];
 	else
-		idle = usecs_to_cputime64(idle_time);
+		idle = idle_usecs * NSEC_PER_USEC;
 
 	return idle;
 }
 
 static u64 get_iowait_time(int cpu)
 {
-	u64 iowait, iowait_time = -1ULL;
+	u64 iowait, iowait_usecs = -1ULL;
 
 	if (cpu_online(cpu))
-		iowait_time = get_cpu_iowait_time_us(cpu, NULL);
+		iowait_usecs = get_cpu_iowait_time_us(cpu, NULL);
 
-	if (iowait_time == -1ULL)
+	if (iowait_usecs == -1ULL)
 		/* !NO_HZ or cpu offline so we can rely on cpustat.iowait */
 		iowait = kcpustat_cpu(cpu).cpustat[CPUTIME_IOWAIT];
 	else
-		iowait = usecs_to_cputime64(iowait_time);
+		iowait = iowait_usecs * NSEC_PER_USEC;
 
 	return iowait;
 }
@@ -115,16 +115,16 @@ static int show_stat(struct seq_file *p, void *v)
 	}
 	sum += arch_irq_stat();
 
-	seq_put_decimal_ull(p, "cpu  ", cputime64_to_clock_t(user));
-	seq_put_decimal_ull(p, " ", cputime64_to_clock_t(nice));
-	seq_put_decimal_ull(p, " ", cputime64_to_clock_t(system));
-	seq_put_decimal_ull(p, " ", cputime64_to_clock_t(idle));
-	seq_put_decimal_ull(p, " ", cputime64_to_clock_t(iowait));
-	seq_put_decimal_ull(p, " ", cputime64_to_clock_t(irq));
-	seq_put_decimal_ull(p, " ", cputime64_to_clock_t(softirq));
-	seq_put_decimal_ull(p, " ", cputime64_to_clock_t(steal));
-	seq_put_decimal_ull(p, " ", cputime64_to_clock_t(guest));
-	seq_put_decimal_ull(p, " ", cputime64_to_clock_t(guest_nice));
+	seq_put_decimal_ull(p, "cpu  ", nsec_to_clock_t(user));
+	seq_put_decimal_ull(p, " ", nsec_to_clock_t(nice));
+	seq_put_decimal_ull(p, " ", nsec_to_clock_t(system));
+	seq_put_decimal_ull(p, " ", nsec_to_clock_t(idle));
+	seq_put_decimal_ull(p, " ", nsec_to_clock_t(iowait));
+	seq_put_decimal_ull(p, " ", nsec_to_clock_t(irq));
+	seq_put_decimal_ull(p, " ", nsec_to_clock_t(softirq));
+	seq_put_decimal_ull(p, " ", nsec_to_clock_t(steal));
+	seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest));
+	seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest_nice));
 	seq_putc(p, '\n');
 
 	for_each_online_cpu(i) {
@@ -140,16 +140,16 @@ static int show_stat(struct seq_file *p, void *v)
 		guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
 		guest_nice = kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
 		seq_printf(p, "cpu%d", i);
-		seq_put_decimal_ull(p, " ", cputime64_to_clock_t(user));
-		seq_put_decimal_ull(p, " ", cputime64_to_clock_t(nice));
-		seq_put_decimal_ull(p, " ", cputime64_to_clock_t(system));
-		seq_put_decimal_ull(p, " ", cputime64_to_clock_t(idle));
-		seq_put_decimal_ull(p, " ", cputime64_to_clock_t(iowait));
-		seq_put_decimal_ull(p, " ", cputime64_to_clock_t(irq));
-		seq_put_decimal_ull(p, " ", cputime64_to_clock_t(softirq));
-		seq_put_decimal_ull(p, " ", cputime64_to_clock_t(steal));
-		seq_put_decimal_ull(p, " ", cputime64_to_clock_t(guest));
-		seq_put_decimal_ull(p, " ", cputime64_to_clock_t(guest_nice));
+		seq_put_decimal_ull(p, " ", nsec_to_clock_t(user));
+		seq_put_decimal_ull(p, " ", nsec_to_clock_t(nice));
+		seq_put_decimal_ull(p, " ", nsec_to_clock_t(system));
+		seq_put_decimal_ull(p, " ", nsec_to_clock_t(idle));
+		seq_put_decimal_ull(p, " ", nsec_to_clock_t(iowait));
+		seq_put_decimal_ull(p, " ", nsec_to_clock_t(irq));
+		seq_put_decimal_ull(p, " ", nsec_to_clock_t(softirq));
+		seq_put_decimal_ull(p, " ", nsec_to_clock_t(steal));
+		seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest));
+		seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest_nice));
 		seq_putc(p, '\n');
 	}
 	seq_put_decimal_ull(p, "intr ", (unsigned long long)sum);
diff --git a/fs/proc/uptime.c b/fs/proc/uptime.c
index 33de567c25af..7981c4ffe787 100644
--- a/fs/proc/uptime.c
+++ b/fs/proc/uptime.c
@@ -5,23 +5,20 @@
 #include <linux/seq_file.h>
 #include <linux/time.h>
 #include <linux/kernel_stat.h>
-#include <linux/cputime.h>
 
 static int uptime_proc_show(struct seq_file *m, void *v)
 {
 	struct timespec uptime;
 	struct timespec idle;
-	u64 idletime;
 	u64 nsec;
 	u32 rem;
 	int i;
 
-	idletime = 0;
+	nsec = 0;
 	for_each_possible_cpu(i)
-		idletime += (__force u64) kcpustat_cpu(i).cpustat[CPUTIME_IDLE];
+		nsec += (__force u64) kcpustat_cpu(i).cpustat[CPUTIME_IDLE];
 
 	get_monotonic_boottime(&uptime);
-	nsec = cputime64_to_jiffies64(idletime) * TICK_NSEC;
 	idle.tv_sec = div_u64_rem(nsec, NSEC_PER_SEC, &rem);
 	idle.tv_nsec = rem;
 	seq_printf(m, "%lu.%02lu %lu.%02lu\n",
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index bc0b309c3f19..4c882791c10c 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -297,7 +297,7 @@ static int cpuacct_stats_show(struct seq_file *sf, void *v)
 	for (stat = 0; stat < CPUACCT_STAT_NSTATS; stat++) {
 		seq_printf(sf, "%s %lld\n",
 			   cpuacct_stat_desc[stat],
-			   cputime64_to_clock_t(val[stat]));
+			   nsec_to_clock_t(val[stat]));
 	}
 
 	return 0;
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5ebee3164e64..c07f0cfea07e 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -75,9 +75,9 @@ static cputime_t irqtime_account_update(u64 irqtime, int idx, cputime_t maxtime)
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
 	cputime_t irq_cputime;
 
-	irq_cputime = nsecs_to_cputime64(irqtime) - cpustat[idx];
+	irq_cputime = nsecs_to_cputime64(irqtime - cpustat[idx]);
 	irq_cputime = min(irq_cputime, maxtime);
-	cpustat[idx] += irq_cputime;
+	cpustat[idx] += cputime_to_nsecs(irq_cputime);
 
 	return irq_cputime;
 }
@@ -143,7 +143,7 @@ void account_user_time(struct task_struct *p, cputime_t cputime,
 	index = (task_nice(p) > 0) ? CPUTIME_NICE : CPUTIME_USER;
 
 	/* Add user time to cpustat. */
-	task_group_account_field(p, index, (__force u64) cputime);
+	task_group_account_field(p, index, cputime_to_nsecs(cputime));
 
 	/* Account for user time used */
 	acct_account_cputime(p);
@@ -168,11 +168,11 @@ static void account_guest_time(struct task_struct *p, cputime_t cputime,
 
 	/* Add guest time to cpustat. */
 	if (task_nice(p) > 0) {
-		cpustat[CPUTIME_NICE] += (__force u64) cputime;
-		cpustat[CPUTIME_GUEST_NICE] += (__force u64) cputime;
+		cpustat[CPUTIME_NICE] += cputime_to_nsecs(cputime);
+		cpustat[CPUTIME_GUEST_NICE] += cputime_to_nsecs(cputime);
 	} else {
-		cpustat[CPUTIME_USER] += (__force u64) cputime;
-		cpustat[CPUTIME_GUEST] += (__force u64) cputime;
+		cpustat[CPUTIME_USER] += cputime_to_nsecs(cputime);
+		cpustat[CPUTIME_GUEST] += cputime_to_nsecs(cputime);
 	}
 }
 
@@ -193,7 +193,7 @@ void __account_system_time(struct task_struct *p, cputime_t cputime,
 	account_group_system_time(p, cputime);
 
 	/* Add system time to cpustat. */
-	task_group_account_field(p, index, (__force u64) cputime);
+	task_group_account_field(p, index, cputime_to_nsecs(cputime));
 
 	/* Account for system time used */
 	acct_account_cputime(p);
@@ -234,7 +234,7 @@ void account_steal_time(cputime_t cputime)
 {
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
 
-	cpustat[CPUTIME_STEAL] += (__force u64) cputime;
+	cpustat[CPUTIME_STEAL] += cputime_to_nsecs(cputime);
 }
 
 /*
@@ -247,9 +247,9 @@ void account_idle_time(cputime_t cputime)
 	struct rq *rq = this_rq();
 
 	if (atomic_read(&rq->nr_iowait) > 0)
-		cpustat[CPUTIME_IOWAIT] += (__force u64) cputime;
+		cpustat[CPUTIME_IOWAIT] += cputime_to_nsecs(cputime);
 	else
-		cpustat[CPUTIME_IDLE] += (__force u64) cputime;
+		cpustat[CPUTIME_IDLE] += cputime_to_nsecs(cputime);
 }
 
 /*
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4.9 3/4] sched/cputime: Increment kcpustat directly on irqtime account
  2018-09-07  1:42 [PATCH 4.9 0/4] Fix softirq time accounting issues on 4.9 Ivan Delalande
  2018-09-07  1:42 ` [PATCH 4.9 1/4] time: Introduce jiffies64_to_nsecs() Ivan Delalande
  2018-09-07  1:42 ` [PATCH 4.9 2/4] sched/cputime: Convert kcpustat to nsecs Ivan Delalande
@ 2018-09-07  1:42 ` Ivan Delalande
  2018-09-07  1:42 ` [PATCH 4.9 4/4] sched/cputime: Fix ksoftirqd cputime accounting regression Ivan Delalande
  2018-09-13 23:13 ` [PATCH 4.9 0/4] Fix softirq time accounting issues on 4.9 Ivan Delalande
  4 siblings, 0 replies; 9+ messages in thread
From: Ivan Delalande @ 2018-09-07  1:42 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Frederic Weisbecker, Ingo Molnar, Rabin Vincent, Edward Chron,
	stable, Benjamin Herrenschmidt, Fenghua Yu, Heiko Carstens,
	Linus Torvalds, Martin Schwidefsky, Michael Ellerman,
	Paul Mackerras, Peter Zijlstra, Rik van Riel, Stanislaw Gruszka,
	Thomas Gleixner, Tony Luck, Wanpeng Li, Ivan Delalande

From: Frederic Weisbecker <fweisbec@gmail.com>

commit a499a5a14dbd1d0315a96fc62a8798059325e9e6 upstream.

The irqtime is accounted is nsecs and stored in
cpu_irq_time.hardirq_time and cpu_irq_time.softirq_time. Once the
accumulated amount reaches a new jiffy, this one gets accounted to the
kcpustat.

This was necessary when kcpustat was stored in cputime_t, which could at
worst have jiffies granularity. But now kcpustat is stored in nsecs
so this whole discretization game with temporary irqtime storage has
become unnecessary.

We can now directly account the irqtime to the kcpustat.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-17-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ivan Delalande <colona@arista.com>
---
 kernel/sched/cputime.c | 50 ++++++++++++++----------------------------
 kernel/sched/sched.h   |  7 +++---
 2 files changed, 21 insertions(+), 36 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index c07f0cfea07e..2c2060c99896 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -44,6 +44,7 @@ void disable_sched_clock_irqtime(void)
 void irqtime_account_irq(struct task_struct *curr)
 {
 	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
+	u64 *cpustat = kcpustat_this_cpu->cpustat;
 	s64 delta;
 	int cpu;
 
@@ -61,49 +62,35 @@ void irqtime_account_irq(struct task_struct *curr)
 	 * in that case, so as not to confuse scheduler with a special task
 	 * that do not consume any time, but still wants to run.
 	 */
-	if (hardirq_count())
-		irqtime->hardirq_time += delta;
-	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
-		irqtime->softirq_time += delta;
+	if (hardirq_count()) {
+		cpustat[CPUTIME_IRQ] += delta;
+		irqtime->tick_delta += delta;
+	} else if (in_serving_softirq() && curr != this_cpu_ksoftirqd()) {
+		cpustat[CPUTIME_SOFTIRQ] += delta;
+		irqtime->tick_delta += delta;
+	}
 
 	u64_stats_update_end(&irqtime->sync);
 }
 EXPORT_SYMBOL_GPL(irqtime_account_irq);
 
-static cputime_t irqtime_account_update(u64 irqtime, int idx, cputime_t maxtime)
+static cputime_t irqtime_tick_accounted(cputime_t maxtime)
 {
-	u64 *cpustat = kcpustat_this_cpu->cpustat;
-	cputime_t irq_cputime;
-
-	irq_cputime = nsecs_to_cputime64(irqtime - cpustat[idx]);
-	irq_cputime = min(irq_cputime, maxtime);
-	cpustat[idx] += cputime_to_nsecs(irq_cputime);
-
-	return irq_cputime;
-}
+	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
+	cputime_t delta;
 
-static cputime_t irqtime_account_hi_update(cputime_t maxtime)
-{
-	return irqtime_account_update(__this_cpu_read(cpu_irqtime.hardirq_time),
-				      CPUTIME_IRQ, maxtime);
-}
+	delta = nsecs_to_cputime(irqtime->tick_delta);
+	delta = min(delta, maxtime);
+	irqtime->tick_delta -= cputime_to_nsecs(delta);
 
-static cputime_t irqtime_account_si_update(cputime_t maxtime)
-{
-	return irqtime_account_update(__this_cpu_read(cpu_irqtime.softirq_time),
-				      CPUTIME_SOFTIRQ, maxtime);
+	return delta;
 }
 
 #else /* CONFIG_IRQ_TIME_ACCOUNTING */
 
 #define sched_clock_irqtime	(0)
 
-static cputime_t irqtime_account_hi_update(cputime_t dummy)
-{
-	return 0;
-}
-
-static cputime_t irqtime_account_si_update(cputime_t dummy)
+static cputime_t irqtime_tick_accounted(cputime_t dummy)
 {
 	return 0;
 }
@@ -290,10 +277,7 @@ static inline cputime_t account_other_time(cputime_t max)
 	accounted = steal_account_process_time(max);
 
 	if (accounted < max)
-		accounted += irqtime_account_hi_update(max - accounted);
-
-	if (accounted < max)
-		accounted += irqtime_account_si_update(max - accounted);
+		accounted += irqtime_tick_accounted(max - accounted);
 
 	return accounted;
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f564a1d2c9d5..187c556196af 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -4,6 +4,7 @@
 #include <linux/sched/rt.h>
 #include <linux/u64_stats_sync.h>
 #include <linux/sched/deadline.h>
+#include <linux/kernel_stat.h>
 #include <linux/binfmts.h>
 #include <linux/mutex.h>
 #include <linux/spinlock.h>
@@ -1742,8 +1743,7 @@ static inline void nohz_balance_exit_idle(unsigned int cpu) { }
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 struct irqtime {
-	u64			hardirq_time;
-	u64			softirq_time;
+	u64			tick_delta;
 	u64			irq_start_time;
 	struct u64_stats_sync	sync;
 };
@@ -1753,12 +1753,13 @@ DECLARE_PER_CPU(struct irqtime, cpu_irqtime);
 static inline u64 irq_time_read(int cpu)
 {
 	struct irqtime *irqtime = &per_cpu(cpu_irqtime, cpu);
+	u64 *cpustat = kcpustat_cpu(cpu).cpustat;
 	unsigned int seq;
 	u64 total;
 
 	do {
 		seq = __u64_stats_fetch_begin(&irqtime->sync);
-		total = irqtime->softirq_time + irqtime->hardirq_time;
+		total = cpustat[CPUTIME_SOFTIRQ] + cpustat[CPUTIME_IRQ];
 	} while (__u64_stats_fetch_retry(&irqtime->sync, seq));
 
 	return total;
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4.9 4/4] sched/cputime: Fix ksoftirqd cputime accounting regression
  2018-09-07  1:42 [PATCH 4.9 0/4] Fix softirq time accounting issues on 4.9 Ivan Delalande
                   ` (2 preceding siblings ...)
  2018-09-07  1:42 ` [PATCH 4.9 3/4] sched/cputime: Increment kcpustat directly on irqtime account Ivan Delalande
@ 2018-09-07  1:42 ` Ivan Delalande
  2018-09-13 23:13 ` [PATCH 4.9 0/4] Fix softirq time accounting issues on 4.9 Ivan Delalande
  4 siblings, 0 replies; 9+ messages in thread
From: Ivan Delalande @ 2018-09-07  1:42 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Frederic Weisbecker, Ingo Molnar, Rabin Vincent, Edward Chron,
	stable, Linus Torvalds, Peter Zijlstra, Stanislaw Gruszka,
	Thomas Gleixner, Wanpeng Li, Ivan Delalande

From: Frederic Weisbecker <fweisbec@gmail.com>

commit 25e2d8c1b9e327ed260edd13169cc22bc7a78bc6 upstream.

irq_time_read() returns the irqtime minus the ksoftirqd time. This
is necessary because irq_time_read() is used to substract the IRQ time
from the sum_exec_runtime of a task. If we were to include the softirq
time of ksoftirqd, this task would substract its own CPU time everytime
it updates ksoftirqd->sum_exec_runtime which would therefore never
progress.

But this behaviour got broken by:

  a499a5a14db ("sched/cputime: Increment kcpustat directly on irqtime account")

... which now includes ksoftirqd softirq time in the time returned by
irq_time_read().

This has resulted in wrong ksoftirqd cputime reported to userspace
through /proc/stat and thus "top" not showing ksoftirqd when it should
after intense networking load.

ksoftirqd->stime happens to be correct but it gets scaled down by
sum_exec_runtime through task_cputime_adjusted().

To fix this, just account the strict IRQ time in a separate counter and
use it to report the IRQ time.

Reported-and-tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1493129448-5356-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ivan Delalande <colona@arista.com>
---
 kernel/sched/cputime.c | 27 ++++++++++++++++-----------
 kernel/sched/sched.h   |  9 +++++++--
 2 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 2c2060c99896..448d6426fa5f 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -37,6 +37,18 @@ void disable_sched_clock_irqtime(void)
 	sched_clock_irqtime = 0;
 }
 
+static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
+				  enum cpu_usage_stat idx)
+{
+	u64 *cpustat = kcpustat_this_cpu->cpustat;
+
+	u64_stats_update_begin(&irqtime->sync);
+	cpustat[idx] += delta;
+	irqtime->total += delta;
+	irqtime->tick_delta += delta;
+	u64_stats_update_end(&irqtime->sync);
+}
+
 /*
  * Called before incrementing preempt_count on {soft,}irq_enter
  * and before decrementing preempt_count on {soft,}irq_exit.
@@ -44,7 +56,6 @@ void disable_sched_clock_irqtime(void)
 void irqtime_account_irq(struct task_struct *curr)
 {
 	struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
-	u64 *cpustat = kcpustat_this_cpu->cpustat;
 	s64 delta;
 	int cpu;
 
@@ -55,22 +66,16 @@ void irqtime_account_irq(struct task_struct *curr)
 	delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
 	irqtime->irq_start_time += delta;
 
-	u64_stats_update_begin(&irqtime->sync);
 	/*
 	 * We do not account for softirq time from ksoftirqd here.
 	 * We want to continue accounting softirq time to ksoftirqd thread
 	 * in that case, so as not to confuse scheduler with a special task
 	 * that do not consume any time, but still wants to run.
 	 */
-	if (hardirq_count()) {
-		cpustat[CPUTIME_IRQ] += delta;
-		irqtime->tick_delta += delta;
-	} else if (in_serving_softirq() && curr != this_cpu_ksoftirqd()) {
-		cpustat[CPUTIME_SOFTIRQ] += delta;
-		irqtime->tick_delta += delta;
-	}
-
-	u64_stats_update_end(&irqtime->sync);
+	if (hardirq_count())
+		irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
+	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
+		irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
 }
 EXPORT_SYMBOL_GPL(irqtime_account_irq);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 187c556196af..923cc35e8490 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1743,6 +1743,7 @@ static inline void nohz_balance_exit_idle(unsigned int cpu) { }
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 struct irqtime {
+	u64			total;
 	u64			tick_delta;
 	u64			irq_start_time;
 	struct u64_stats_sync	sync;
@@ -1750,16 +1751,20 @@ struct irqtime {
 
 DECLARE_PER_CPU(struct irqtime, cpu_irqtime);
 
+/*
+ * Returns the irqtime minus the softirq time computed by ksoftirqd.
+ * Otherwise ksoftirqd's sum_exec_runtime is substracted its own runtime
+ * and never move forward.
+ */
 static inline u64 irq_time_read(int cpu)
 {
 	struct irqtime *irqtime = &per_cpu(cpu_irqtime, cpu);
-	u64 *cpustat = kcpustat_cpu(cpu).cpustat;
 	unsigned int seq;
 	u64 total;
 
 	do {
 		seq = __u64_stats_fetch_begin(&irqtime->sync);
-		total = cpustat[CPUTIME_SOFTIRQ] + cpustat[CPUTIME_IRQ];
+		total = irqtime->total;
 	} while (__u64_stats_fetch_retry(&irqtime->sync, seq));
 
 	return total;
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 4.9 0/4] Fix softirq time accounting issues on 4.9
  2018-09-07  1:42 [PATCH 4.9 0/4] Fix softirq time accounting issues on 4.9 Ivan Delalande
                   ` (3 preceding siblings ...)
  2018-09-07  1:42 ` [PATCH 4.9 4/4] sched/cputime: Fix ksoftirqd cputime accounting regression Ivan Delalande
@ 2018-09-13 23:13 ` Ivan Delalande
  2018-09-14  6:57   ` Greg Kroah-Hartman
  4 siblings, 1 reply; 9+ messages in thread
From: Ivan Delalande @ 2018-09-13 23:13 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Frederic Weisbecker, Ingo Molnar, Rabin Vincent, Edward Chron,
	stable

Hey Greg,

Did you get a chance to look at this series? Thanks a lot!

On Thu, Sep 06, 2018 at 06:42:14PM -0700, Ivan Delalande wrote:
> This series fixes issues we've seen with softirq time accounting in 4.9:
>   - when ksoftirqd is running at 100% on a CPU, none of the values
>     reported by /proc/stat for that CPU will change, sometimes for
>     dozens of seconds,
>   - large deviations in the total number of ticks accumulated over a
>     fixed time for a CPU, probably because of the first issue hitting
>     for shorter periods.
> 
> We found out that something pretty similar had been reported 9 months
> ago, see the reference link below. In that discussion, Rabin Vincent had
> made a 4.9 specific patch which fixes our first issue, but we were still
> seeing some deviation from the total number of ticks (up to 1.7% from
> expected, where we had only 0.2% on older kernels), and you had also
> asked for a direct backport from the mainline series, if possible.
> 
> As mentioned in that thread, a lot of changes (probably 50+) went into
> 4.11 to remove cputime, but we could get something working with only the
> 4 attached patches to fix these two issues. Three of these patches apply
> without change, and the second one in the series ("sched/cputime:
> Convert kcpustat to nsecs") needed a minor change as a cast had been
> added in 527b0a76f41d ("sched/cpuacct: Avoid %lld seq_printf warning")
> to fix a build warning on s390. I guess we could also include that patch
> in this series, let me know if this is the preferred way to handle this.
> 
> We ran our tests on 3.18, 4.4 and 4.9 and confirmed that only 4.9 would
> need this series, and that this series indeed restores the behavior we
> were seeing on those older kernels.
> 
> Thanks!
> 
> Reference: http://lkml.kernel.org/r/%3C1513159876-5125-1-git-send-email-rabin.vincent@axis.com
> 
> 
> Frederic Weisbecker (4):
>   time: Introduce jiffies64_to_nsecs()
>   sched/cputime: Convert kcpustat to nsecs
>   sched/cputime: Increment kcpustat directly on irqtime account
>   sched/cputime: Fix ksoftirqd cputime accounting regression
> 
>  arch/s390/appldata/appldata_os.c   | 16 +++----
>  drivers/cpufreq/cpufreq.c          |  6 +--
>  drivers/cpufreq/cpufreq_governor.c |  2 +-
>  drivers/cpufreq/cpufreq_stats.c    |  1 -
>  drivers/macintosh/rack-meter.c     |  2 +-
>  fs/proc/stat.c                     | 68 +++++++++++++--------------
>  fs/proc/uptime.c                   |  7 +--
>  include/linux/jiffies.h            |  2 +
>  kernel/sched/cpuacct.c             |  2 +-
>  kernel/sched/cputime.c             | 75 +++++++++++++-----------------
>  kernel/sched/sched.h               | 12 +++--
>  kernel/time/time.c                 | 10 ++++
>  kernel/time/timeconst.bc           |  6 +++
>  13 files changed, 109 insertions(+), 100 deletions(-)

-- 
Ivan Delalande
Arista Networks

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 4.9 0/4] Fix softirq time accounting issues on 4.9
  2018-09-13 23:13 ` [PATCH 4.9 0/4] Fix softirq time accounting issues on 4.9 Ivan Delalande
@ 2018-09-14  6:57   ` Greg Kroah-Hartman
  2018-10-11 14:11     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 9+ messages in thread
From: Greg Kroah-Hartman @ 2018-09-14  6:57 UTC (permalink / raw)
  To: Ivan Delalande
  Cc: Frederic Weisbecker, Ingo Molnar, Rabin Vincent, Edward Chron,
	stable

On Thu, Sep 13, 2018 at 04:13:17PM -0700, Ivan Delalande wrote:
> Hey Greg,
> 
> Did you get a chance to look at this series? Thanks a lot!

Not yet, sorry, I was digging myself out from under the patch backlog of
all of the other stuff sent before this series.

It isn't lost, don't worry, give it a chance.  It's also almost a 2 year
old kernel you are using here, so there can't be a lot of rush :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 4.9 0/4] Fix softirq time accounting issues on 4.9
  2018-09-14  6:57   ` Greg Kroah-Hartman
@ 2018-10-11 14:11     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 9+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-11 14:11 UTC (permalink / raw)
  To: Ivan Delalande
  Cc: Frederic Weisbecker, Ingo Molnar, Rabin Vincent, Edward Chron,
	stable

On Fri, Sep 14, 2018 at 08:57:48AM +0200, Greg Kroah-Hartman wrote:
> On Thu, Sep 13, 2018 at 04:13:17PM -0700, Ivan Delalande wrote:
> > Hey Greg,
> > 
> > Did you get a chance to look at this series? Thanks a lot!
> 
> Not yet, sorry, I was digging myself out from under the patch backlog of
> all of the other stuff sent before this series.
> 
> It isn't lost, don't worry, give it a chance.  It's also almost a 2 year
> old kernel you are using here, so there can't be a lot of rush :)

All now queued up, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 4.9 2/4] sched/cputime: Convert kcpustat to nsecs
  2018-09-07  1:42 ` [PATCH 4.9 2/4] sched/cputime: Convert kcpustat to nsecs Ivan Delalande
@ 2018-10-12 14:42   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 9+ messages in thread
From: Greg Kroah-Hartman @ 2018-10-12 14:42 UTC (permalink / raw)
  To: Ivan Delalande
  Cc: Frederic Weisbecker, Ingo Molnar, Rabin Vincent, Edward Chron,
	stable, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Heiko Carstens, Martin Schwidefsky, Tony Luck, Fenghua Yu,
	Peter Zijlstra, Rik van Riel, Stanislaw Gruszka, Wanpeng Li

On Thu, Sep 06, 2018 at 06:42:16PM -0700, Ivan Delalande wrote:
> From: Frederic Weisbecker <fweisbec@gmail.com>
> 
> commit 7fb1327ee9b92fca27662f9b9d60c7c3376d6c69 upstream.
> 
> Kernel CPU stats are stored in cputime_t which is an architecture
> defined type, and hence a bit opaque and requiring accessors and mutators
> for any operation.
> 
> Converting them to nsecs simplifies the code and is one step toward
> the removal of cputime_t in the core code.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Stanislaw Gruszka <sgruszka@redhat.com>
> Cc: Wanpeng Li <wanpeng.li@hotmail.com>
> Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> [colona: minor conflict as 527b0a76f41d ("sched/cpuacct: Avoid %lld seq_printf
> warning") is missing from v4.9]
> Signed-off-by: Ivan Delalande <colona@arista.com>
> ---
>  arch/s390/appldata/appldata_os.c   | 16 +++----
>  drivers/cpufreq/cpufreq.c          |  6 +--
>  drivers/cpufreq/cpufreq_governor.c |  2 +-
>  drivers/cpufreq/cpufreq_stats.c    |  1 -
>  drivers/macintosh/rack-meter.c     |  2 +-
>  fs/proc/stat.c                     | 68 +++++++++++++++---------------
>  fs/proc/uptime.c                   |  7 +--
>  kernel/sched/cpuacct.c             |  2 +-
>  kernel/sched/cputime.c             | 22 +++++-----
>  9 files changed, 61 insertions(+), 65 deletions(-)
> 
> diff --git a/arch/s390/appldata/appldata_os.c b/arch/s390/appldata/appldata_os.c
> index 69b23b25ac34..08b9e942a262 100644
> --- a/arch/s390/appldata/appldata_os.c
> +++ b/arch/s390/appldata/appldata_os.c
> @@ -113,21 +113,21 @@ static void appldata_get_os_data(void *data)
>  	j = 0;
>  	for_each_online_cpu(i) {
>  		os_data->os_cpu[j].per_cpu_user =
> -			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_USER]);
> +			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_USER]);
>  		os_data->os_cpu[j].per_cpu_nice =
> -			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_NICE]);
> +			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_NICE]);
>  		os_data->os_cpu[j].per_cpu_system =
> -			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM]);
> +			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM]);
>  		os_data->os_cpu[j].per_cpu_idle =
> -			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IDLE]);
> +			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IDLE]);
>  		os_data->os_cpu[j].per_cpu_irq =
> -			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IRQ]);
> +			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IRQ]);
>  		os_data->os_cpu[j].per_cpu_softirq =
> -			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]);
> +			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]);
>  		os_data->os_cpu[j].per_cpu_iowait =
> -			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IOWAIT]);
> +			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IOWAIT]);
>  		os_data->os_cpu[j].per_cpu_steal =
> -			cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_STEAL]);
> +			nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_STEAL]);
>  		os_data->os_cpu[j].cpu_id = i;
>  		j++;
>  	}
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index af5eff6835a8..d6d91e8afa9e 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -132,7 +132,7 @@ static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
>  	u64 cur_wall_time;
>  	u64 busy_time;
>  
> -	cur_wall_time = jiffies64_to_cputime64(get_jiffies_64());
> +	cur_wall_time = jiffies64_to_nsecs(get_jiffies_64());
>  
>  	busy_time = kcpustat_cpu(cpu).cpustat[CPUTIME_USER];
>  	busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SYSTEM];
> @@ -143,9 +143,9 @@ static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
>  
>  	idle_time = cur_wall_time - busy_time;
>  	if (wall)
> -		*wall = cputime_to_usecs(cur_wall_time);
> +		*wall = div_u64(cur_wall_time, NSEC_PER_USEC);
>  
> -	return cputime_to_usecs(idle_time);
> +	return div_u64(idle_time, NSEC_PER_USEC);
>  }
>  
>  u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy)
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 642dd0f183a8..38d1a8216084 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -152,7 +152,7 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
>  		if (ignore_nice) {
>  			u64 cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
>  
> -			idle_time += cputime_to_usecs(cur_nice - j_cdbs->prev_cpu_nice);
> +			idle_time += div_u64(cur_nice - j_cdbs->prev_cpu_nice, NSEC_PER_USEC);
>  			j_cdbs->prev_cpu_nice = cur_nice;
>  		}
>  
> diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
> index 06d3abdffd3a..b084708fd113 100644
> --- a/drivers/cpufreq/cpufreq_stats.c
> +++ b/drivers/cpufreq/cpufreq_stats.c
> @@ -13,7 +13,6 @@
>  #include <linux/cpufreq.h>
>  #include <linux/module.h>
>  #include <linux/slab.h>
> -#include <linux/cputime.h>
>  
>  static DEFINE_SPINLOCK(cpufreq_stats_lock);
>  
> diff --git a/drivers/macintosh/rack-meter.c b/drivers/macintosh/rack-meter.c
> index 25852e399ab2..13b16f34256c 100644
> --- a/drivers/macintosh/rack-meter.c
> +++ b/drivers/macintosh/rack-meter.c
> @@ -91,7 +91,7 @@ static inline cputime64_t get_cpu_idle_time(unsigned int cpu)
>  	if (rackmeter_ignore_nice)
>  		retval += kcpustat_cpu(cpu).cpustat[CPUTIME_NICE];
>  
> -	return retval;
> +	return nsecs_to_cputime64(retval);

Turns out this patch breaks the build here on powerpc :(

And I can't easily figure out how to fix it up.

So I'm going to drop this series from the 4.9 tree right now.  Can you
fix this up and resend?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-10-12 22:15 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-09-07  1:42 [PATCH 4.9 0/4] Fix softirq time accounting issues on 4.9 Ivan Delalande
2018-09-07  1:42 ` [PATCH 4.9 1/4] time: Introduce jiffies64_to_nsecs() Ivan Delalande
2018-09-07  1:42 ` [PATCH 4.9 2/4] sched/cputime: Convert kcpustat to nsecs Ivan Delalande
2018-10-12 14:42   ` Greg Kroah-Hartman
2018-09-07  1:42 ` [PATCH 4.9 3/4] sched/cputime: Increment kcpustat directly on irqtime account Ivan Delalande
2018-09-07  1:42 ` [PATCH 4.9 4/4] sched/cputime: Fix ksoftirqd cputime accounting regression Ivan Delalande
2018-09-13 23:13 ` [PATCH 4.9 0/4] Fix softirq time accounting issues on 4.9 Ivan Delalande
2018-09-14  6:57   ` Greg Kroah-Hartman
2018-10-11 14:11     ` Greg Kroah-Hartman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox