netdev.vger.kernel.org archive mirror
* Serious performance degradation in Linux 4.15
@ 2018-02-09 17:59 Jon Maloy
  2018-02-10 14:01 ` Peter Zijlstra
  2018-02-12 15:16 ` Peter Zijlstra
  0 siblings, 2 replies; 11+ messages in thread
From: Jon Maloy @ 2018-02-09 17:59 UTC (permalink / raw)
  To: netdev@vger.kernel.org, peterz@infradead.org, mingo@kernel.org
  Cc: David Miller (davem@davemloft.net)

The two commits
d153b153446f7 ("sched/core: Fix wake_affine() performance regression") and
f2cdd9cc6c97 ("sched/core: Address more wake_affine() regressions")
are causing a serious performance degradation in Linux 4.15.

The effect is worst on TIPC, but even TCP is affected, as the figures below show. 


Command for TCP:
"netperf TCP_STREAM (netperf -n 4 -f m -c 4 -C 4 -P 1 -H 10.0.0.1 -t TCP_STREAM -l 10 -- -O THROUGHPUT)"

v4.15-rc1 without f2cdd9cc6c97e, d153b153446f7:   1293.67
v4.15 with the two commits:                       1104.58

i.e., a degradation of ~15 % for TCP

Command for TIPC:
"netperf TIPC_STREAM (netperf -n 4 -f m -c 4 -C 4 -P 1 -H 10.0.0.1 -t TCP_STREAM -l 10 -- -O THROUGHPUT)"

v4.15-rc1 without f2cdd9cc6c97e, d153b153446f7:   786.22
v4.15 with the two commits:                       223.18

i.e., a degradation of 71 % 

This is really bad, and I hope you have a plan for restoring the previous level of performance in some form.
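
One way to reproduce the comparison (a sketch only; it assumes a local kernel git tree, a netserver already listening on the 10.0.0.1 peer, and that the reverts still apply cleanly):

# Build a v4.15 kernel with the two commits reverted, to compare against a
# stock v4.15 build (illustrative; not necessarily how the numbers above
# were produced).
git checkout -b wake-affine-revert v4.15
git revert --no-edit f2cdd9cc6c97 d153b153446f7
make -j"$(nproc)" && make modules_install install

# After booting each kernel, run the same netperf command from the client:
netperf -n 4 -f m -c 4 -C 4 -P 1 -H 10.0.0.1 -t TCP_STREAM -l 10 -- -O THROUGHPUT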

BR
Jon Maloy







Benchmark                                        v4.15-rc1    latest    diff (%)
netperf TCP_STREAM                                 1293.67   1104.58    -14.62%
  (netperf -n 4 -f m -c 4 -C 4 -P 1 -H 10.0.0.1 -t TCP_STREAM -l 10 -- -O THROUGHPUT)
benchmark TIPC                                      786.67    215.67    -72.58%
  (client_bench -c 1 -m 65000 -t)
netperf TIPC_STREAM                                 786.22    223.18    -71.61%
  (netperf -n 4 -f m -c 4 -C 4 -P 1 -H 10.0.0.1 -t TIPC_STREAM -l 10 -- -O THROUGHPUT)


* Re: Serious performance degradation in Linux 4.15
  2018-02-09 17:59 Serious performance degradation in Linux 4.15 Jon Maloy
@ 2018-02-10 14:01 ` Peter Zijlstra
  2018-02-12 15:16 ` Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2018-02-10 14:01 UTC (permalink / raw)
  To: Jon Maloy
  Cc: netdev@vger.kernel.org, mingo@kernel.org,
	David Miller (davem@davemloft.net)

On Fri, Feb 09, 2018 at 05:59:12PM +0000, Jon Maloy wrote:
> The two commits
> d153b153446f7 ("sched/core: Fix wake_affine() performance regression") and
> f2cdd9cc6c97 ("sched/core: Address more wake_affine() regressions")
> are causing a serious performance degradation in Linux 4.15.
> 
> The effect is worst on TIPC, but even TCP is affected, as the figures below show. 

I did run a whole bunch of netperf and didn't see anything like that.
0-day also didn't report anything, and it too runs netperf.

I'll try and see if I can reproduce somewhere next week.


* Re: Serious performance degradation in Linux 4.15
  2018-02-09 17:59 Serious performance degradation in Linux 4.15 Jon Maloy
  2018-02-10 14:01 ` Peter Zijlstra
@ 2018-02-12 15:16 ` Peter Zijlstra
  2018-02-13  8:14   ` Jon Maloy
  2018-02-14 22:46   ` Matt Fleming
  1 sibling, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2018-02-12 15:16 UTC (permalink / raw)
  To: Jon Maloy
  Cc: netdev@vger.kernel.org, mingo@kernel.org,
	David Miller (davem@davemloft.net), Mike Galbraith, Matt Fleming

On Fri, Feb 09, 2018 at 05:59:12PM +0000, Jon Maloy wrote:
> Command for TCP:
> "netperf TCP_STREAM  (netperf -n 4 -f m -c 4 -C 4 -P 1 -H 10.0.0.1 -t TCP_STREAM -l 10 -- -O THROUGHPUT)"
> Command for TIPC:
> "netperf TIPC_STREAM (netperf -n 4 -f m -c 4 -C 4 -P 1 -H 10.0.0.1 -t TCP_STREAM -l 10 -- -O THROUGHPUT)"

That looks like identical tests to me. And my netperf (debian testing)
doesn't appear to have -t TIPC_STREAM.

Please try a coherent report and I'll have another look. Don't (again)
forget to mention what kind of setup you're running this on.


On my IVB-EP (2 sockets, 10 cores, 2 threads), performance cpufreq,
PTI=n RETPOLINE=n, I get:


# Count the online CPUs so the instance counts below scale with the machine.
CPUS=`grep -c ^processor /proc/cpuinfo`

for test in TCP_STREAM
do
        # Run 1, CPUS/4, CPUS/2, CPUS and 2*CPUS concurrent netperf instances.
        for i in 1 $((CPUS/4)) $((CPUS/2)) $((CPUS)) $((CPUS*2))
        do
                echo -n $test-$i ": "

                (
                  for ((j=0; j<i; j++))
                  do
                        netperf -t $test -4 -c -C -l 60 -P0 | head -1 &
                  done

                  wait
                # Average the throughput column (field 5) over all instances.
                ) | awk '{ n++; v+=$5; } END { print "Avg: " v/n }'
        done
done



NO_WA_OLD WA_IDLE WA_WEIGHT:

TCP_STREAM-1 : Avg: 44139.8
TCP_STREAM-10 : Avg: 27301.6
TCP_STREAM-20 : Avg: 12701.5
TCP_STREAM-40 : Avg: 5711.62
TCP_STREAM-80 : Avg: 2870.16


WA_OLD NO_WA_IDLE NO_WA_WEIGHT:

TCP_STREAM-1 : Avg: 25293.1
TCP_STREAM-10 : Avg: 28196.3
TCP_STREAM-20 : Avg: 12463.7
TCP_STREAM-40 : Avg: 5566.83
TCP_STREAM-80 : Avg: 2630.03
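
For anyone wanting to flip these combinations themselves: with the patch below applied (and CONFIG_SCHED_DEBUG=y plus debugfs mounted at /sys/kernel/debug), the feature bits can be toggled at runtime through sched_features. A quick sketch:

echo WA_OLD > /sys/kernel/debug/sched_features         # enable the old wake_affine path added below
echo NO_WA_IDLE > /sys/kernel/debug/sched_features     # prefix NO_ to turn a feature off
echo NO_WA_WEIGHT > /sys/kernel/debug/sched_features
cat /sys/kernel/debug/sched_features                   # list the current flags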

---
 include/linux/sched/topology.h |  4 ++
 kernel/sched/fair.c            | 99 +++++++++++++++++++++++++++++++++++++-----
 kernel/sched/features.h        |  2 +
 3 files changed, 93 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 26347741ba50..2cb74343c252 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -72,6 +72,10 @@ struct sched_domain_shared {
 	atomic_t	ref;
 	atomic_t	nr_busy_cpus;
 	int		has_idle_cores;
+
+	unsigned long	nr_running;
+	unsigned long	load;
+	unsigned long	capacity;
 };
 
 struct sched_domain {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5eb3ffc9be84..4a561311241a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5680,6 +5680,68 @@ static int wake_wide(struct task_struct *p)
 	return 1;
 }
 
+struct llc_stats {
+	unsigned long nr_running;
+	unsigned long load;
+	unsigned long capacity;
+	int		has_capacity;
+};
+
+static bool get_llc_stats(struct llc_stats *stats, int cpu)
+{
+	struct sched_domain_shared *sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+
+	if (!sds)
+		return false;
+
+	stats->nr_running = READ_ONCE(sds->nr_running);
+	stats->load	  = READ_ONCE(sds->load);
+	stats->capacity	  = READ_ONCE(sds->capacity);
+	stats->has_capacity = stats->nr_running < per_cpu(sd_llc_size, cpu);
+
+	return true;
+}
+
+static int
+wake_affine_old(struct sched_domain *sd, struct task_struct *p,
+		int this_cpu, int prev_cpu, int sync)
+{
+	struct llc_stats prev_stats, this_stats;
+	s64 this_eff_load, prev_eff_load;
+	unsigned long task_load;
+
+	if (!get_llc_stats(&prev_stats, prev_cpu) ||
+	    !get_llc_stats(&this_stats, this_cpu))
+		return nr_cpumask_bits;
+
+	if (sync) {
+		unsigned long current_load = task_h_load(current);
+		if (current_load > this_stats.load)
+			return this_cpu;
+
+		this_stats.load -= current_load;
+	}
+
+	if (prev_stats.has_capacity && prev_stats.nr_running < this_stats.nr_running+1)
+		return nr_cpumask_bits;
+
+	if (this_stats.has_capacity && this_stats.nr_running+1 < prev_stats.nr_running)
+		return this_cpu;
+
+	task_load = task_h_load(p);
+
+	this_eff_load = 100;
+	this_eff_load *= prev_stats.capacity;
+
+	prev_eff_load = 100 + (sd->imbalance_pct - 100) / 2;
+	prev_eff_load *= this_stats.capacity;
+
+	this_eff_load *= this_stats.load + task_load;
+	prev_eff_load *= prev_stats.load - task_load;
+
+	return this_eff_load <= prev_eff_load ? this_cpu : nr_cpumask_bits;
+}
+
 /*
  * The purpose of wake_affine() is to quickly determine on which CPU we can run
  * soonest. For the purpose of speed we only consider the waking and previous
@@ -5756,6 +5818,9 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 	int this_cpu = smp_processor_id();
 	int target = nr_cpumask_bits;
 
+	if (sched_feat(WA_OLD))
+		target = wake_affine_old(sd, p, this_cpu, prev_cpu, sync);
+
 	if (sched_feat(WA_IDLE))
 		target = wake_affine_idle(this_cpu, prev_cpu, sync);
 
@@ -6209,18 +6274,20 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 		return prev;
 
 	/* Check a recently used CPU as a potential idle candidate */
-	recent_used_cpu = p->recent_used_cpu;
-	if (recent_used_cpu != prev &&
-	    recent_used_cpu != target &&
-	    cpus_share_cache(recent_used_cpu, target) &&
-	    idle_cpu(recent_used_cpu) &&
-	    cpumask_test_cpu(p->recent_used_cpu, &p->cpus_allowed)) {
-		/*
-		 * Replace recent_used_cpu with prev as it is a potential
-		 * candidate for the next wake.
-		 */
-		p->recent_used_cpu = prev;
-		return recent_used_cpu;
+	if (sched_feat(SIS_RECENT)) {
+		recent_used_cpu = p->recent_used_cpu;
+		if (recent_used_cpu != prev &&
+		    recent_used_cpu != target &&
+		    cpus_share_cache(recent_used_cpu, target) &&
+		    idle_cpu(recent_used_cpu) &&
+		    cpumask_test_cpu(p->recent_used_cpu, &p->cpus_allowed)) {
+			/*
+			 * Replace recent_used_cpu with prev as it is a potential
+			 * candidate for the next wake.
+			 */
+			p->recent_used_cpu = prev;
+			return recent_used_cpu;
+		}
 	}
 
 	sd = rcu_dereference(per_cpu(sd_llc, target));
@@ -7961,6 +8028,7 @@ static inline enum fbq_type fbq_classify_rq(struct rq *rq)
  */
 static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sds)
 {
+	struct sched_domain_shared *shared = env->sd->shared;
 	struct sched_domain *child = env->sd->child;
 	struct sched_group *sg = env->sd->groups;
 	struct sg_lb_stats *local = &sds->local_stat;
@@ -8032,6 +8100,13 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		if (env->dst_rq->rd->overload != overload)
 			env->dst_rq->rd->overload = overload;
 	}
+
+	if (!shared)
+		return;
+
+	WRITE_ONCE(shared->nr_running, sds->total_running);
+	WRITE_ONCE(shared->load, sds->total_load);
+	WRITE_ONCE(shared->capacity, sds->total_capacity);
 }
 
 /**
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 9552fd5854bf..bdb0a66caaae 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -57,6 +57,7 @@ SCHED_FEAT(TTWU_QUEUE, true)
  */
 SCHED_FEAT(SIS_AVG_CPU, false)
 SCHED_FEAT(SIS_PROP, true)
+SCHED_FEAT(SIS_RECENT, true)
 
 /*
  * Issue a WARN when we do multiple update_rq_clock() calls
@@ -82,6 +83,7 @@ SCHED_FEAT(RT_RUNTIME_SHARE, true)
 SCHED_FEAT(LB_MIN, false)
 SCHED_FEAT(ATTACH_AGE_LOAD, true)
 
+SCHED_FEAT(WA_OLD, false)
 SCHED_FEAT(WA_IDLE, true)
 SCHED_FEAT(WA_WEIGHT, true)
 SCHED_FEAT(WA_BIAS, true)
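
To give the above a spin, a minimal sketch (assuming the diff is saved as wake-affine-old.patch at the top of a tree close to the base it was generated against):

patch -p1 < wake-affine-old.patch
make -j"$(nproc)" && make modules_install install
# reboot into the patched kernel, then toggle WA_OLD / SIS_RECENT via sched_features as above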


* RE: Serious performance degradation in Linux 4.15
  2018-02-12 15:16 ` Peter Zijlstra
@ 2018-02-13  8:14   ` Jon Maloy
  2018-02-14 22:46   ` Matt Fleming
  1 sibling, 0 replies; 11+ messages in thread
From: Jon Maloy @ 2018-02-13  8:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: netdev@vger.kernel.org, mingo@kernel.org,
	David Miller (davem@davemloft.net), Mike Galbraith, Matt Fleming

The person who reported this is on vacation right now. I will be back with more detailed info in two weeks.

///jon


* Re: Serious performance degradation in Linux 4.15
  2018-02-12 15:16 ` Peter Zijlstra
  2018-02-13  8:14   ` Jon Maloy
@ 2018-02-14 22:46   ` Matt Fleming
  2018-02-15  8:38     ` Peter Zijlstra
                       ` (3 more replies)
  1 sibling, 4 replies; 11+ messages in thread
From: Matt Fleming @ 2018-02-14 22:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jon Maloy, netdev@vger.kernel.org, mingo@kernel.org,
	David Miller (davem@davemloft.net), Mike Galbraith

On Mon, 12 Feb, at 04:16:42PM, Peter Zijlstra wrote:
> On Fri, Feb 09, 2018 at 05:59:12PM +0000, Jon Maloy wrote:
> > Command for TCP:
> > "netperf TCP_STREAM  (netperf -n 4 -f m -c 4 -C 4 -P 1 -H 10.0.0.1 -t TCP_STREAM -l 10 -- -O THROUGHPUT)"
> > Command for TIPC:
> > "netperf TIPC_STREAM (netperf -n 4 -f m -c 4 -C 4 -P 1 -H 10.0.0.1 -t TCP_STREAM -l 10 -- -O THROUGHPUT)"
> 
> That looks like identical tests to me. And my netperf (debian testing)
> doesn't appear to have -t TIPC_STREAM.
> 
> Please try a coherent report and I'll have another look. Don't (again)
> forget to mention what kind of setup you're running this on.
> 
> 
> On my IVB-EP (2 sockets, 10 cores, 2 threads), performance cpufreq,
> PTI=n RETPOLINE=n, I get:

Here's some more numbers. This is with RETPOLINE=y but you'll see it
doesn't make much of a difference. Oh, this is also with powersave
cpufreq governor.

The 'tip+' column is tip/master, commit ca96ad6978c3 ("Merge branch 'x86/mm'")

The 'tip-plus-patch+' column is tip/master plus Peter's patch from
20180212151642.GU25201@hirez.programming.kicks-ass.net


netperf-tcp
                            4.15.0-rc1                 4.15.0             4.16.0-rc1             4.16.0-rc1
                               vanilla                vanilla                   tip+        tip-plus-patch+
Min       64        1804.73 (   0.00%)      951.28 ( -47.29%)      956.77 ( -46.99%)      936.19 ( -48.13%)
Min       128       3352.00 (   0.00%)     1847.80 ( -44.87%)     1831.41 ( -45.36%)     1808.88 ( -46.04%)
Min       256       5619.02 (   0.00%)     3327.27 ( -40.79%)     3287.00 ( -41.50%)     3311.33 ( -41.07%)
Min       1024     17325.58 (   0.00%)    11053.24 ( -36.20%)    11098.91 ( -35.94%)    10892.59 ( -37.13%)
Min       2048     27564.59 (   0.00%)    18311.31 ( -33.57%)    18649.89 ( -32.34%)    18327.69 ( -33.51%)
Min       3312     33677.30 (   0.00%)    25254.43 ( -25.01%)    24897.65 ( -26.07%)    25464.71 ( -24.39%)
Min       4096     35624.64 (   0.00%)    28186.09 ( -20.88%)    27317.58 ( -23.32%)    27046.46 ( -24.08%)
Min       8192     42950.87 (   0.00%)    33407.18 ( -22.22%)    34133.19 ( -20.53%)    33429.82 ( -22.17%)
Min       16384    46798.74 (   0.00%)    40020.99 ( -14.48%)    40761.81 ( -12.90%)    40370.88 ( -13.74%)
Hmean     64        1818.68 (   0.00%)      959.16 ( -47.26%)      962.40 ( -47.08%)      954.96 ( -47.49%)
Hmean     128       3405.06 (   0.00%)     1860.21 ( -45.37%)     1844.12 ( -45.84%)     1849.44 ( -45.69%)
Hmean     256       5777.53 (   0.00%)     3371.67 ( -41.64%)     3341.43 ( -42.17%)     3360.35 ( -41.84%)
Hmean     1024     17679.46 (   0.00%)    11326.96 ( -35.93%)    11192.24 ( -36.69%)    11219.22 ( -36.54%)
Hmean     2048     27764.04 (   0.00%)    18864.94 ( -32.05%)    18833.51 ( -32.17%)    18740.31 ( -32.50%)
Hmean     3312     35253.65 (   0.00%)    25444.33 ( -27.82%)    25700.57 ( -27.10%)    25610.63 ( -27.35%)
Hmean     4096     36479.20 (   0.00%)    28636.63 ( -21.50%)    28073.90 ( -23.04%)    27856.51 ( -23.64%)
Hmean     8192     43386.27 (   0.00%)    34771.52 ( -19.86%)    35213.44 ( -18.84%)    34603.90 ( -20.24%)
Hmean     16384    47487.74 (   0.00%)    41329.50 ( -12.97%)    41096.73 ( -13.46%)    40787.33 ( -14.11%)
Stddev    64          12.42 (   0.00%)        6.35 (  48.87%)        5.77 (  53.54%)       12.21 (   1.73%)
Stddev    128         45.84 (   0.00%)        9.25 (  79.82%)       13.49 (  70.57%)       23.86 (  47.95%)
Stddev    256         90.59 (   0.00%)       30.55 (  66.28%)       37.07 (  59.08%)       28.66 (  68.36%)
Stddev    1024       322.33 (   0.00%)      164.75 (  48.89%)      119.05 (  63.07%)      265.42 (  17.65%)
Stddev    2048       153.04 (   0.00%)      424.98 (-177.70%)      176.40 ( -15.26%)      242.90 ( -58.72%)
Stddev    3312      1024.93 (   0.00%)      182.58 (  82.19%)      585.07 (  42.92%)      108.93 (  89.37%)
Stddev    4096       696.34 (   0.00%)      433.20 (  37.79%)      626.42 (  10.04%)      712.05 (  -2.26%)
Stddev    8192       478.31 (   0.00%)      808.23 ( -68.98%)      794.39 ( -66.08%)      698.27 ( -45.99%)
Stddev    16384      720.05 (   0.00%)      816.70 ( -13.42%)      412.26 (  42.75%)      325.43 (  54.81%)
CoeffVar  64           0.68 (   0.00%)        0.66 (   3.05%)        0.60 (  12.20%)        1.28 ( -87.13%)
CoeffVar  128          1.35 (   0.00%)        0.50 (  63.06%)        0.73 (  45.66%)        1.29 (   4.17%)
CoeffVar  256          1.57 (   0.00%)        0.91 (  42.21%)        1.11 (  29.24%)        0.85 (  45.59%)
CoeffVar  1024         1.82 (   0.00%)        1.45 (  20.22%)        1.06 (  41.65%)        2.36 ( -29.74%)
CoeffVar  2048         0.55 (   0.00%)        2.25 (-308.53%)        0.94 ( -69.91%)        1.30 (-135.12%)
CoeffVar  3312         2.91 (   0.00%)        0.72 (  75.30%)        2.28 (  21.68%)        0.43 (  85.36%)
CoeffVar  4096         1.91 (   0.00%)        1.51 (  20.74%)        2.23 ( -16.88%)        2.55 ( -33.88%)
CoeffVar  8192         1.10 (   0.00%)        2.32 (-110.77%)        2.25 (-104.56%)        2.02 ( -82.99%)
CoeffVar  16384        1.52 (   0.00%)        1.98 ( -30.31%)        1.00 (  33.83%)        0.80 (  47.37%)
Max       64        1832.51 (   0.00%)      966.09 ( -47.28%)      970.35 ( -47.05%)      967.15 ( -47.22%)
Max       128       3476.62 (   0.00%)     1873.20 ( -46.12%)     1865.28 ( -46.35%)     1869.10 ( -46.24%)
Max       256       5839.83 (   0.00%)     3402.61 ( -41.73%)     3379.67 ( -42.13%)     3383.69 ( -42.06%)
Max       1024     18031.63 (   0.00%)    11482.14 ( -36.32%)    11396.22 ( -36.80%)    11463.71 ( -36.42%)
Max       2048     27912.65 (   0.00%)    19343.06 ( -30.70%)    19095.51 ( -31.59%)    18969.02 ( -32.04%)
Max       3312     36142.68 (   0.00%)    25749.54 ( -28.76%)    26503.65 ( -26.67%)    25767.14 ( -28.71%)
Max       4096     37481.84 (   0.00%)    29189.76 ( -22.12%)    28875.41 ( -22.96%)    28973.52 ( -22.70%)
Max       8192     44101.03 (   0.00%)    35471.04 ( -19.57%)    35890.95 ( -18.62%)    35178.96 ( -20.23%)
Max       16384    48321.50 (   0.00%)    42086.21 ( -12.90%)    41793.29 ( -13.51%)    41152.43 ( -14.84%)

Peter, if you want to run this test yourself you can do:

 1. git clone https://github.com/gorman/mmmtests.git
 2. cd mmtests
 3. ./run-mmtests.sh --config=configs/config-global-dhp__network-netperf-unbound `uname -r`


* Re: Serious performance degradation in Linux 4.15
  2018-02-14 22:46   ` Matt Fleming
@ 2018-02-15  8:38     ` Peter Zijlstra
  2018-02-16 10:09     ` Peter Zijlstra
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2018-02-15  8:38 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Jon Maloy, netdev@vger.kernel.org, mingo@kernel.org,
	David Miller (davem@davemloft.net), Mike Galbraith

On Wed, Feb 14, 2018 at 10:46:20PM +0000, Matt Fleming wrote:
> Here's some more numbers. This is with RETPOLINE=y but you'll see it
> doesn't make much of a difference. Oh, this is also with powersave
> cpufreq governor.

Hurmph, I'll go have a look when I can boot tip/master again :/

But didn't you bench those patches before we merged them? I can't
remember you reporting this..


* Re: Serious performance degradation in Linux 4.15
  2018-02-14 22:46   ` Matt Fleming
  2018-02-15  8:38     ` Peter Zijlstra
@ 2018-02-16 10:09     ` Peter Zijlstra
  2018-02-16 10:17     ` Peter Zijlstra
  2018-02-16 14:38     ` Matt Fleming
  3 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2018-02-16 10:09 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Jon Maloy, netdev@vger.kernel.org, mingo@kernel.org,
	David Miller (davem@davemloft.net), Mike Galbraith

On Wed, Feb 14, 2018 at 10:46:20PM +0000, Matt Fleming wrote:
> Peter, if you want to run this test yourself you can do:
> 
>  1. git clone https://github.com/gorman/mmmtests.git

root@ivb-ep:/usr/local/src# git clone https://github.com/gorman/mmmtests.git
Cloning into 'mmmtests'...
Username for 'https://github.com':


I'm thinking you meant this:

  https://github.com/gormanm/mmtests.git

right? (also, I still hate the github webthing and you made me look at
it :-)


* Re: Serious performance degradation in Linux 4.15
  2018-02-14 22:46   ` Matt Fleming
  2018-02-15  8:38     ` Peter Zijlstra
  2018-02-16 10:09     ` Peter Zijlstra
@ 2018-02-16 10:17     ` Peter Zijlstra
  2018-02-16 10:49       ` Mel Gorman
  2018-02-16 14:38     ` Matt Fleming
  3 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2018-02-16 10:17 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Jon Maloy, netdev@vger.kernel.org, mingo@kernel.org,
	David Miller (davem@davemloft.net), Mike Galbraith, Mel Gorman

On Wed, Feb 14, 2018 at 10:46:20PM +0000, Matt Fleming wrote:
>  3. ./run-mmtests.sh --config=configs/config-global-dhp__network-netperf-unbound `uname -r`

Not a success.. firstly it attempts to install packages without asking
and then horribly fails at it..


root@ivb-ep:/usr/local/src/mmtests# ./run-mmtests.sh --config=configs/config-global-dhp__network-netperf-unbound `uname -r`
Reading package lists... Done
Building dependency tree
Reading state information... Done
Use 'apt autoremove' to remove them.
The following NEW packages will be installed:
  binutils-dev
0 upgraded, 1 newly installed, 0 to remove and 1 not upgraded.
Need to get 2,339 kB of archives.
After this operation, 21.1 MB of additional disk space will be used.
Get:1 http://ftp.nl.debian.org/debian testing/main amd64 binutils-dev amd64 2.30-4 [2,339 kB]
Fetched 2,339 kB in 1s (3,451 kB/s)
Selecting previously unselected package binutils-dev.
(Reading database ... 126177 files and directories currently installed.)
Preparing to unpack .../binutils-dev_2.30-4_amd64.deb ...
Unpacking binutils-dev (2.30-4) ...
Setting up binutils-dev (2.30-4) ...
W: --force-yes is deprecated, use one of the options starting with --allow instead.
Installed binutils-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Unable to locate package oprofile
Failed to install package oprofile for distro debian at /usr/local/src/mmtests/bin/install-depends line 121.
Reading package lists... Done
Building dependency tree
Reading state information... Done
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Unable to locate package perl-Time-HiRes
Failed to install package perl-Time-HiRes for distro debian at /usr/local/src/mmtests/bin/install-depends line 121.
Reading package lists... Done
Building dependency tree
Reading state information... Done
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Unable to locate package hwloc-lstopo
Failed to install package hwloc-lstopo for distro debian at /usr/local/src/mmtests/bin/install-depends line 121.
Reading package lists... Done
Building dependency tree
Reading state information... Done
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Unable to locate package cpupower
Failed to install package cpupower for distro debian at /usr/local/src/mmtests/bin/install-depends line 121.
Using default SHELLPACK_DATA
grep: /usr/local/src/mmtests/shellpacks/shellpack-bench-netperf-udp: No such file or directory
Starting test netperf-udp
/usr/local/src/mmtests/bin/unbuffer: 8: exec: tclsh: not found
ls: cannot access '/proc/16070/fd/0': No such file or directory
ls: cannot access '/proc/16070/fd/0': No such file or directory

after which I killed it dead...


This is one dodgy script which I'll not touch again. Please provide a
small bash script that wraps the right netperf magic and I'll try and
run that.


* Re: Serious performance degradation in Linux 4.15
  2018-02-16 10:17     ` Peter Zijlstra
@ 2018-02-16 10:49       ` Mel Gorman
  0 siblings, 0 replies; 11+ messages in thread
From: Mel Gorman @ 2018-02-16 10:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matt Fleming, Jon Maloy, netdev@vger.kernel.org, mingo@kernel.org,
	David Miller (davem@davemloft.net), Mike Galbraith

On Fri, Feb 16, 2018 at 11:17:01AM +0100, Peter Zijlstra wrote:
> On Wed, Feb 14, 2018 at 10:46:20PM +0000, Matt Fleming wrote:
> >  3. ./run-mmtests.sh --config=configs/config-global-dhp__network-netperf-unbound `uname -r`
> 
> Not a success.. firstly it attempts to install packages without asking
> and then horribly fails at it..
> 

The automatic package installation only really works well for openSUSE/SLE.
bin/install-depends can map openSUSE package names to other distros, but
it's incomplete at best.

In this case, it's the monitoring scripts that are failing because they
rely on expect. You could try running with --no-monitor.
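
Something along these lines, i.e. the same invocation as before with the monitoring disabled (untested sketch on my side):

cd /usr/local/src/mmtests
./run-mmtests.sh --no-monitor --config=configs/config-global-dhp__network-netperf-unbound `uname -r`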

-- 
Mel Gorman
SUSE Labs


* Re: Serious performance degradation in Linux 4.15
  2018-02-14 22:46   ` Matt Fleming
                       ` (2 preceding siblings ...)
  2018-02-16 10:17     ` Peter Zijlstra
@ 2018-02-16 14:38     ` Matt Fleming
  2018-02-16 16:48       ` Peter Zijlstra
  3 siblings, 1 reply; 11+ messages in thread
From: Matt Fleming @ 2018-02-16 14:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jon Maloy, netdev@vger.kernel.org, mingo@kernel.org,
	David Miller (davem@davemloft.net), Mike Galbraith

On Wed, 14 Feb, at 10:46:20PM, Matt Fleming wrote:
> On Mon, 12 Feb, at 04:16:42PM, Peter Zijlstra wrote:
> > On Fri, Feb 09, 2018 at 05:59:12PM +0000, Jon Maloy wrote:
> > > Command for TCP:
> > > "netperf TCP_STREAM  (netperf -n 4 -f m -c 4 -C 4 -P 1 -H 10.0.0.1 -t TCP_STREAM -l 10 -- -O THROUGHPUT)"
> > > Command for TIPC:
> > > "netperf TIPC_STREAM (netperf -n 4 -f m -c 4 -C 4 -P 1 -H 10.0.0.1 -t TCP_STREAM -l 10 -- -O THROUGHPUT)"
> > 
> > That looks like identical tests to me. And my netperf (debian testing)
> > doesn't appear to have -t TIPC_STREAM.
> > 
> > Please try a coherent report and I'll have another look. Don't (again)
> > forget to mention what kind of setup you're running this on.
> > 
> > 
> > On my IVB-EP (2 sockets, 10 cores, 2 threads), performance cpufreq,
> > PTI=n RETPOLINE=n, I get:
> 
> Here's some more numbers. This is with RETPOLINE=y but you'll see it
> doesn't make much of a difference. Oh, this is also with powersave
> cpufreq governor.

Feh, I was wrong. The differences in performance I see are entirely
due to CONFIG_RETPOLINE and CONFIG_PAGE_TABLE_ISOLATION being enabled.
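
In case it helps anyone cross-checking their own runs, a quick sketch for confirming whether those mitigations are actually active on a booted kernel (assumes the vulnerabilities sysfs entries introduced in 4.15, and a distro that installs the kernel config under /boot):

grep -E 'CONFIG_RETPOLINE|CONFIG_PAGE_TABLE_ISOLATION' /boot/config-"$(uname -r)"
cat /sys/devices/system/cpu/vulnerabilities/meltdown      # "Mitigation: PTI" when page-table isolation is on
cat /sys/devices/system/cpu/vulnerabilities/spectre_v2    # shows whether retpolines are in use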


* Re: Serious performance degradation in Linux 4.15
  2018-02-16 14:38     ` Matt Fleming
@ 2018-02-16 16:48       ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2018-02-16 16:48 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Jon Maloy, netdev@vger.kernel.org, mingo@kernel.org,
	David Miller (davem@davemloft.net), Mike Galbraith

On Fri, Feb 16, 2018 at 02:38:39PM +0000, Matt Fleming wrote:
> On Wed, 14 Feb, at 10:46:20PM, Matt Fleming wrote:

> > Here's some more numbers. This is with RETPOLINE=y but you'll see it
> > doesn't make much of a difference. Oh, this is also with powersave
> > cpufreq governor.
> 
> Feh, I was wrong. The differences in performance I see are entirely
> due to CONFIG_RETPOLINE and CONFIG_PAGE_TABLE_ISOLATION being enabled.

OK, so I'm not the only crazy person unable to reproduce this :-)

Let's wait for more details from the original submitter.


Thread overview: 11+ messages
2018-02-09 17:59 Serious performance degradation in Linux 4.15 Jon Maloy
2018-02-10 14:01 ` Peter Zijlstra
2018-02-12 15:16 ` Peter Zijlstra
2018-02-13  8:14   ` Jon Maloy
2018-02-14 22:46   ` Matt Fleming
2018-02-15  8:38     ` Peter Zijlstra
2018-02-16 10:09     ` Peter Zijlstra
2018-02-16 10:17     ` Peter Zijlstra
2018-02-16 10:49       ` Mel Gorman
2018-02-16 14:38     ` Matt Fleming
2018-02-16 16:48       ` Peter Zijlstra
