[PATCH] cpu: Fix numbers in Performance of Mechanisms table in qqz

* [PATCH] cpu: Fix numbers in Performance of Mechanisms table in qqz
@ 2016-06-11  6:39 Akira Yokosawa
  2016-06-15 23:51 ` Paul E. McKenney
  0 siblings, 1 reply; 5+ messages in thread
From: Akira Yokosawa @ 2016-06-11  6:39 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook, Akira Yokosawa

From 1ff97081c713ff51d5bc2e15f8ba7649427fac4f Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sat, 11 Jun 2016 15:28:20 +0900
Subject: [PATCH] cpu: Fix numbers in Performance of Mechanisms table in qqz

The numbers in the 'Comms Fabric' and 'Global Comms' rows of
Table D.1 seem wrong.

Their costs are given in nanoseconds, so they should match those
given in Table 3.1.

Also, their ratios should be calculated as cost(ns)/0.36(ns).

This commit fixes those numbers.

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 cpu/overheads.tex | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/cpu/overheads.tex b/cpu/overheads.tex
index 311c43e..0cf83f3 100644
--- a/cpu/overheads.tex
+++ b/cpu/overheads.tex
@@ -224,11 +224,11 @@ global agreement.
 	\hline
 	CAS cache miss		&          95.9	&         266.4 \\
 	\hline
-	Comms Fabric		&       4,500\textcolor{white}{.0}
-						&	7,500\textcolor{white}{.0} \\
+	Comms Fabric		&       3,000\textcolor{white}{.0}
+						&	8,330\textcolor{white}{.0} \\
 	\hline
-	Global Comms		& 195,000,000\textcolor{white}{.0}
-						& 324,000,000\textcolor{white}{.0} \\
+	Global Comms		& 130,000,000\textcolor{white}{.0}
+						& 361,000,000\textcolor{white}{.0} \\
 \end{tabular}
 \caption{Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
 \label{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
-- 
1.9.1



* Re: [PATCH] cpu: Fix numbers in Performance of Mechanisms table in qqz
  2016-06-11  6:39 [PATCH] cpu: Fix numbers in Performance of Mechanisms table in qqz Akira Yokosawa
@ 2016-06-15 23:51 ` Paul E. McKenney
  2016-06-18  2:48   ` Akira Yokosawa
  0 siblings, 1 reply; 5+ messages in thread
From: Paul E. McKenney @ 2016-06-15 23:51 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook

On Sat, Jun 11, 2016 at 03:39:31PM +0900, Akira Yokosawa wrote:
> From 1ff97081c713ff51d5bc2e15f8ba7649427fac4f Mon Sep 17 00:00:00 2001
> From: Akira Yokosawa <akiyks@gmail.com>
> Date: Sat, 11 Jun 2016 15:28:20 +0900
> Subject: [PATCH] cpu: Fix numbers in Performance of Mechanisms table in qqz
> 
> The numbers in the 'Comms Fabric' and 'Global Comms' rows of
> Table D.1 seem wrong.
> 
> Their costs are given in nanoseconds, so they should match those
> given in Table 3.1.
> 
> Also, their ratios should be calculated as cost(ns)/0.36(ns).
> 
> This commit fixes those numbers.

Great catch, but let's talk about the fix.

The 130ms is the time required for light to circumnavigate the earth
in a vacuum.  The 195ms is that for light to circumnavigate the earth
in glass, as in an optical fiber.  There is some variation due to
different refractive indexes of different types of fiber.
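
As a rough sanity check on these two figures, the sketch below recomputes
them in Python. The Earth circumference (about 40,075 km) and the fiber
refractive index (1.47) are assumptions picked for illustration, not
values stated in this thread; real fiber varies, as noted above.

```python
# Rough check of the speed-of-light latencies discussed above.
# Assumed inputs (not from the thread): Earth's equatorial circumference
# and a representative refractive index for optical fiber.
C_VACUUM_KM_PER_S = 299_792.458    # speed of light in a vacuum
EARTH_CIRCUMFERENCE_KM = 40_075.0  # approximate equatorial circumference
FIBER_REFRACTIVE_INDEX = 1.47      # assumed; real fiber varies around this


def circumnavigation_ms(refractive_index=1.0):
    """Time in milliseconds for light to travel once around the Earth."""
    speed_km_per_s = C_VACUUM_KM_PER_S / refractive_index
    return EARTH_CIRCUMFERENCE_KM / speed_km_per_s * 1000.0


vacuum_ms = circumnavigation_ms()
fiber_ms = circumnavigation_ms(FIBER_REFRACTIVE_INDEX)
print(f"vacuum: {vacuum_ms:.1f} ms, fiber: {fiber_ms:.1f} ms")
```

Under these assumptions the results come out a few milliseconds above
the rounded 130ms and 195ms, and their ratio is simply the refractive
index.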

So making both of those entries be 195,000,000 seems like the right
approach.  An even better approach would be to use the ping time to
some system on the other side of the world, but I am coming up empty.
Pinging google.com gets me 216.58.217.46 for about 47ms.  If you are
far away from West Coast USA, please see what you get.  Not that I
would put it past Google to spread a single IP address worldwide to
defeat this, but worth a try...

On the Comms Fabric number, I suspect that I just googled this at
two different times and got two different numbers.  Not surprising,
as different products would be optimized differently at different
times.

So how about https://en.wikipedia.org/wiki/InfiniBand?  Choose the
value in effect when Nehalem was released for the QQ table, and choose
the value in effect when the AMD Opteron 844 was released for the
table in Chapter 3.

Seem reasonable?

Oh, and also add a LaTeX comment saying where the data came from!  ;-)

							Thanx, Paul

> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
> ---
>  cpu/overheads.tex | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/cpu/overheads.tex b/cpu/overheads.tex
> index 311c43e..0cf83f3 100644
> --- a/cpu/overheads.tex
> +++ b/cpu/overheads.tex
> @@ -224,11 +224,11 @@ global agreement.
>  	\hline
>  	CAS cache miss		&          95.9	&         266.4 \\
>  	\hline
> -	Comms Fabric		&       4,500\textcolor{white}{.0}
> -						&	7,500\textcolor{white}{.0} \\
> +	Comms Fabric		&       3,000\textcolor{white}{.0}
> +						&	8,330\textcolor{white}{.0} \\
>  	\hline
> -	Global Comms		& 195,000,000\textcolor{white}{.0}
> -						& 324,000,000\textcolor{white}{.0} \\
> +	Global Comms		& 130,000,000\textcolor{white}{.0}
> +						& 361,000,000\textcolor{white}{.0} \\
>  \end{tabular}
>  \caption{Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
>  \label{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
> -- 
> 1.9.1
> 


* Re: [PATCH] cpu: Fix numbers in Performance of Mechanisms table in qqz
  2016-06-15 23:51 ` Paul E. McKenney
@ 2016-06-18  2:48   ` Akira Yokosawa
  2016-06-18  2:50     ` [PATCH v2] cpu: Fix numbers in Performance of Mechanisms tables Akira Yokosawa
  0 siblings, 1 reply; 5+ messages in thread
From: Akira Yokosawa @ 2016-06-18  2:48 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook, Akira Yokosawa

On 2016/06/15 16:51:38 -0700, Paul E. McKenney wrote:
> On Sat, Jun 11, 2016 at 03:39:31PM +0900, Akira Yokosawa wrote:
>> From 1ff97081c713ff51d5bc2e15f8ba7649427fac4f Mon Sep 17 00:00:00 2001
>> From: Akira Yokosawa <akiyks@gmail.com>
>> Date: Sat, 11 Jun 2016 15:28:20 +0900
>> Subject: [PATCH] cpu: Fix numbers in Performance of Mechanisms table in qqz
>>
>> The numbers in the 'Comms Fabric' and 'Global Comms' rows of
>> Table D.1 seem wrong.
>>
>> Their costs are given in nanoseconds, so they should match those
>> given in Table 3.1.
>>
>> Also, their ratios should be calculated as cost(ns)/0.36(ns).
>>
>> This commit fixes those numbers.
> 
> Great catch, but let's talk about the fix.
> 
> The 130ms is the time required for light to circumnavigate the earth
> in a vacuum.  The 195ms is that for light to circumnavigate the earth
> in glass, as in an optical fiber.  There is some variation due to
> different refractive indexes of different types of fiber.
> 
> So making both of those entries be 195,000,000 seems like the right
> approach.  An even better approach would be to use the ping time to
> some system on the other side of the world, but I am coming up empty.
> Pinging google.com gets me 216.58.217.46 for about 47ms.  If you are
> far away from West Coast USA, please see what you get.  Not that I
> would put it past Google to spread a single IP address worldwide to
> defeat this, but worth a try...
> 
> On the Comms Fabric number, I suspect that I just googled this at
> two different times and got two different numbers.  Not surprising,
> as different products would be optimized differently at different
> times.
> 
> So how about https://en.wikipedia.org/wiki/InfiniBand?  Choose the
> value in effect when Nehalem was released for the QQ table, and choose
> the value in effect when the AMD Opteron 844 was released for the
> table in Chapter 3.
> 
> Seem reasonable?
> 
> Oh, and also add a LaTeX comment saying where the data came from!  ;-)

Hi,

I took some time collecting latency data, and composed v2 of the
patch. Will send it in reply to this mail.

                                              Thanks, Akira

> 
> 							Thanx, Paul
> 
>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
>> ---
>>  cpu/overheads.tex | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/cpu/overheads.tex b/cpu/overheads.tex
>> index 311c43e..0cf83f3 100644
>> --- a/cpu/overheads.tex
>> +++ b/cpu/overheads.tex
>> @@ -224,11 +224,11 @@ global agreement.
>>  	\hline
>>  	CAS cache miss		&          95.9	&         266.4 \\
>>  	\hline
>> -	Comms Fabric		&       4,500\textcolor{white}{.0}
>> -						&	7,500\textcolor{white}{.0} \\
>> +	Comms Fabric		&       3,000\textcolor{white}{.0}
>> +						&	8,330\textcolor{white}{.0} \\
>>  	\hline
>> -	Global Comms		& 195,000,000\textcolor{white}{.0}
>> -						& 324,000,000\textcolor{white}{.0} \\
>> +	Global Comms		& 130,000,000\textcolor{white}{.0}
>> +						& 361,000,000\textcolor{white}{.0} \\
>>  \end{tabular}
>>  \caption{Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
>>  \label{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
>> -- 
>> 1.9.1
>>
> 
> 



* [PATCH v2] cpu: Fix numbers in Performance of Mechanisms tables
  2016-06-18  2:48   ` Akira Yokosawa
@ 2016-06-18  2:50     ` Akira Yokosawa
  2016-06-18  3:57       ` Paul E. McKenney
  0 siblings, 1 reply; 5+ messages in thread
From: Akira Yokosawa @ 2016-06-18  2:50 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook, Akira Yokosawa

From 3b2c58c7e7f7abe4303383437502513f948d7401 Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sat, 18 Jun 2016 10:38:57 +0900
Subject: [PATCH v2] cpu: Fix numbers in Performance of Mechanisms tables

Numbers given in 'Comms Fabric' and 'Global Comms' rows in
Table D.1 seem inconsistent.

'Comms Fabric' latency in Table 3.1 is 3 microseconds.
The latency of InfiniBand DDR, which was available in 2005 (at the
time of AMD Opteron 844), is 2.5 microseconds.
'Comms Fabric' latency in Table D.1 is 4.5 microseconds.
The latency of InfiniBand QDR, which was available in 2009 (at the
time of Intel X5550 (Nehalem)), is 1.3 microseconds.
These latencies are for one-way communication.
In the other rows of the tables, costs are for at least one round
trip, so we need to double these numbers for consistency.

For 'Comms Fabric', it would be better to use 5 microseconds in
Table 3.1 and 2.6 microseconds in Table D.1.

Of course, these numbers are for best cases. Actual latency would
depend on the topology and the configuration of the fabric.

'Global Comms' latency in Table 3.1 is 130 ms.
This is based on the speed of light in a vacuum.
On the other hand, 'Global Comms' latency in Table D.1 is 195 ms.
This is based on the speed of light in optical fiber.
The number in Table D.1 is more realistic, and we should use it
in both tables.

This commit fixes these inconsistencies and modifies the related
explanation in the text accordingly.

Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 cpu/overheads.tex | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/cpu/overheads.tex b/cpu/overheads.tex
index 311c43e..bfdd711 100644
--- a/cpu/overheads.tex
+++ b/cpu/overheads.tex
@@ -126,12 +126,12 @@ This simplified sequence is just the beginning of a discipline called
 	\hline
 	CAS cache miss		&         306.0	&         510.0 \\
 	\hline
-	Comms Fabric		&       3,000\textcolor{white}{.0}
-						&       5,000\textcolor{white}{.0}
+	Comms Fabric		&       5,000\textcolor{white}{.0}
+						&       8,330\textcolor{white}{.0}
 								\\
 	\hline
-	Global Comms		& 130,000,000\textcolor{white}{.0}
-						& 216,000,000\textcolor{white}{.0}
+	Global Comms		& 195,000,000\textcolor{white}{.0}
+						& 325,000,000\textcolor{white}{.0}
 								\\
 \end{tabular}
 \caption{Performance of Synchronization Mechanisms on 4-CPU 1.8GHz AMD Opteron 844 System}
@@ -224,11 +224,11 @@ global agreement.
 	\hline
 	CAS cache miss		&          95.9	&         266.4 \\
 	\hline
-	Comms Fabric		&       4,500\textcolor{white}{.0}
-						&	7,500\textcolor{white}{.0} \\
+	Comms Fabric		&       2,600\textcolor{white}{.0}
+						&	7,220\textcolor{white}{.0} \\
 	\hline
 	Global Comms		& 195,000,000\textcolor{white}{.0}
-						& 324,000,000\textcolor{white}{.0} \\
+						& 542,000,000\textcolor{white}{.0} \\
 \end{tabular}
 \caption{Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
 \label{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
@@ -264,15 +264,19 @@ I/O operations are even more expensive.
 As shown in the ``Comms Fabric'' row,
 high performance (and expensive!) communications fabric, such as
 InfiniBand or any number of proprietary interconnects, has a latency
-of roughly three microseconds, during which time five \emph{thousand}
-instructions might have been executed.
+of roughly five microseconds for an end-to-end round trip, during which
+time more than eight \emph{thousand} instructions might have been executed.
 Standards-based communications networks often require some sort of
 protocol processing, which further increases the latency.
 Of course, geographic distance also increases latency, with the
-theoretical speed-of-light latency around the world coming to
-roughly 130 \emph{milliseconds}, or more than 200 million clock
+speed-of-light through optical fiber latency around the world coming to
+roughly 195 \emph{milliseconds}, or more than 300 million clock
 cycles, as shown in the ``Global Comms'' row.

+% Reference of Infiniband latency:
+% http://www.hpcadvisorycouncil.com/events/2014/swiss-workshop/presos/Day_1/1_Mellanox.pdf
+%     page 6/76 'Leading Interconnect, Leading Performance'
+
 \QuickQuiz{}
 	These numbers are insanely large!
 	How can I possibly get my head around them?
-- 
1.9.1
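
The ratio entries in the v2 patch above can be rechecked with a short
Python sketch. The 0.36 ns clock period comes from the first patch
description; the 0.6 ns period for Table 3.1 is inferred from that
table's old rows (3,000 / 5,000) and should be read as an assumption.
The helper names are hypothetical, and rounding to three significant
digits is a guess at how the table entries were produced.

```python
import math

# Clock periods in nanoseconds. 0.36 is quoted in the first patch;
# 0.6 is inferred from the old Table 3.1 rows (an assumption).
CLOCK_PERIOD_NS = {"Table 3.1": 0.6, "Table D.1": 0.36}

# One-way InfiniBand latencies from the v2 commit message, in nanoseconds.
ONE_WAY_FABRIC_NS = {"Table 3.1": 2_500, "Table D.1": 1_300}
GLOBAL_COMMS_NS = 195_000_000  # light around the world in optical fiber


def round_sig(value, digits=3):
    """Round a positive value to the given number of significant digits."""
    exponent = math.floor(math.log10(value)) - (digits - 1)
    factor = 10.0 ** exponent
    return int(round(value / factor) * factor)


def ratio(cost_ns, table):
    """Cost expressed in clock periods of the given table's CPU."""
    return round_sig(cost_ns / CLOCK_PERIOD_NS[table])


for table in CLOCK_PERIOD_NS:
    fabric_ns = 2 * ONE_WAY_FABRIC_NS[table]  # round trip = 2 x one-way
    print(table, "Comms Fabric", fabric_ns, ratio(fabric_ns, table))
    print(table, "Global Comms", GLOBAL_COMMS_NS, ratio(GLOBAL_COMMS_NS, table))
```

With these inputs the sketch reproduces the new entries: 8,330 and
325,000,000 for Table 3.1, and 7,220 and 542,000,000 for Table D.1.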




* Re: [PATCH v2] cpu: Fix numbers in Performance of Mechanisms tables
  2016-06-18  2:50     ` [PATCH v2] cpu: Fix numbers in Performance of Mechanisms tables Akira Yokosawa
@ 2016-06-18  3:57       ` Paul E. McKenney
  0 siblings, 0 replies; 5+ messages in thread
From: Paul E. McKenney @ 2016-06-18  3:57 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook

On Sat, Jun 18, 2016 at 11:50:53AM +0900, Akira Yokosawa wrote:
> From 3b2c58c7e7f7abe4303383437502513f948d7401 Mon Sep 17 00:00:00 2001
> From: Akira Yokosawa <akiyks@gmail.com>
> Date: Sat, 18 Jun 2016 10:38:57 +0900
> Subject: [PATCH v2] cpu: Fix numbers in Performance of Mechanisms tables
> 
> Numbers given in 'Comms Fabric' and 'Global Comms' rows in
> Table D.1 seem inconsistent.
> 
> 'Comms Fabric' latency in Table 3.1 is 3 microseconds.
> The latency of InfiniBand DDR, which was available in 2005 (at the
> time of AMD Opteron 844), is 2.5 microseconds.
> 'Comms Fabric' latency in Table D.1 is 4.5 microseconds.
> The latency of InfiniBand QDR, which was available in 2009 (at the
> time of Intel X5550 (Nehalem)), is 1.3 microseconds.
> These latencies are for one-way communication.
> In the other rows of the tables, costs are for at least one round
> trip, so we need to double these numbers for consistency.
> 
> For 'Comms Fabric', it would be better to use 5 microseconds in
> Table 3.1 and 2.6 microseconds in Table D.1.
> 
> Of course, these numbers are for best cases. Actual latency would
> depend on the topology and the configuration of the fabric.
> 
> 'Global Comms' latency in Table 3.1 is 130 ms.
> This is based on the speed of light in a vacuum.
> On the other hand, 'Global Comms' latency in Table D.1 is 195 ms.
> This is based on the speed of light in optical fiber.
> The number in Table D.1 is more realistic, and we should use it
> in both tables.
> 
> This commit fixes these inconsistencies and modifies the related
> explanation in the text accordingly.
> 
> Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

Nice!!! Applied and pushed.

							Thanx, Paul

> ---
>  cpu/overheads.tex | 26 +++++++++++++++-----------
>  1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/cpu/overheads.tex b/cpu/overheads.tex
> index 311c43e..bfdd711 100644
> --- a/cpu/overheads.tex
> +++ b/cpu/overheads.tex
> @@ -126,12 +126,12 @@ This simplified sequence is just the beginning of a discipline called
>  	\hline
>  	CAS cache miss		&         306.0	&         510.0 \\
>  	\hline
> -	Comms Fabric		&       3,000\textcolor{white}{.0}
> -						&       5,000\textcolor{white}{.0}
> +	Comms Fabric		&       5,000\textcolor{white}{.0}
> +						&       8,330\textcolor{white}{.0}
>  								\\
>  	\hline
> -	Global Comms		& 130,000,000\textcolor{white}{.0}
> -						& 216,000,000\textcolor{white}{.0}
> +	Global Comms		& 195,000,000\textcolor{white}{.0}
> +						& 325,000,000\textcolor{white}{.0}
>  								\\
>  \end{tabular}
>  \caption{Performance of Synchronization Mechanisms on 4-CPU 1.8GHz AMD Opteron 844 System}
> @@ -224,11 +224,11 @@ global agreement.
>  	\hline
>  	CAS cache miss		&          95.9	&         266.4 \\
>  	\hline
> -	Comms Fabric		&       4,500\textcolor{white}{.0}
> -						&	7,500\textcolor{white}{.0} \\
> +	Comms Fabric		&       2,600\textcolor{white}{.0}
> +						&	7,220\textcolor{white}{.0} \\
>  	\hline
>  	Global Comms		& 195,000,000\textcolor{white}{.0}
> -						& 324,000,000\textcolor{white}{.0} \\
> +						& 542,000,000\textcolor{white}{.0} \\
>  \end{tabular}
>  \caption{Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
>  \label{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
> @@ -264,15 +264,19 @@ I/O operations are even more expensive.
>  As shown in the ``Comms Fabric'' row,
>  high performance (and expensive!) communications fabric, such as
>  InfiniBand or any number of proprietary interconnects, has a latency
> -of roughly three microseconds, during which time five \emph{thousand}
> -instructions might have been executed.
> +of roughly five microseconds for an end-to-end round trip, during which
> +time more than eight \emph{thousand} instructions might have been executed.
>  Standards-based communications networks often require some sort of
>  protocol processing, which further increases the latency.
>  Of course, geographic distance also increases latency, with the
> -theoretical speed-of-light latency around the world coming to
> -roughly 130 \emph{milliseconds}, or more than 200 million clock
> +speed-of-light through optical fiber latency around the world coming to
> +roughly 195 \emph{milliseconds}, or more than 300 million clock
>  cycles, as shown in the ``Global Comms'' row.
> 
> +% Reference of Infiniband latency:
> +% http://www.hpcadvisorycouncil.com/events/2014/swiss-workshop/presos/Day_1/1_Mellanox.pdf
> +%     page 6/76 'Leading Interconnect, Leading Performance'
> +
>  \QuickQuiz{}
>  	These numbers are insanely large!
>  	How can I possibly get my head around them?
> -- 
> 1.9.1
> 
> 


