* [PATCH] cpu: Fix numbers in Performance of Mechanisms table in qqz
From: Akira Yokosawa @ 2016-06-11 6:39 UTC
To: paulmck; +Cc: perfbook, Akira Yokosawa

From 1ff97081c713ff51d5bc2e15f8ba7649427fac4f Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sat, 11 Jun 2016 15:28:20 +0900
Subject: [PATCH] cpu: Fix numbers in Performance of Mechanisms table in qqz

Numbers given in 'Comms Fabric' and 'Global Comms' rows in
Table D.1 seem wrong.

Their costs are given in ns unit, so they should match those given
in Table 3.1.

Also, their ratio should be calculated by cost(ns)/0.36(ns).

This commit fixes those numbers.

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 cpu/overheads.tex | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/cpu/overheads.tex b/cpu/overheads.tex
index 311c43e..0cf83f3 100644
--- a/cpu/overheads.tex
+++ b/cpu/overheads.tex
@@ -224,11 +224,11 @@ global agreement.
 	\hline
 	CAS cache miss	& 95.9 & 266.4 \\
 	\hline
-	Comms Fabric	& 4,500\textcolor{white}{.0}
-			& 7,500\textcolor{white}{.0} \\
+	Comms Fabric	& 3,000\textcolor{white}{.0}
+			& 8,330\textcolor{white}{.0} \\
 	\hline
-	Global Comms	& 195,000,000\textcolor{white}{.0}
-			& 324,000,000\textcolor{white}{.0} \\
+	Global Comms	& 130,000,000\textcolor{white}{.0}
+			& 361,000,000\textcolor{white}{.0} \\
 	\end{tabular}
 	\caption{Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
 	\label{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
-- 
1.9.1
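The ratio column in these tables is the cost in nanoseconds divided by the CPU clock period, so each row can be checked mechanically. The Python sketch below is a hypothetical helper, not part of perfbook. It assumes the clock periods implied by each table's own 'CAS cache miss' row (95.9/266.4, about 0.36 ns, for the Nehalem table and 306.0/510.0 = 0.6 ns for the Opteron table) and an arbitrary 1% tolerance.

```python
# Sketch: recompute the "Ratio" column of the overheads tables as
# cost_ns / clock_period_ns and flag rows whose listed ratio disagrees.
# Assumption: each table's clock period is the one implied by its own
# "CAS cache miss" row (95.9 ns / 266.4 and 306.0 ns / 510.0).

NEHALEM_PERIOD_NS = 95.9 / 266.4   # ~0.36 ns, 2.8 GHz X5550 (Table D.1)
OPTERON_PERIOD_NS = 306.0 / 510.0  # 0.6 ns, 1.8 GHz Opteron 844 (Table 3.1)


def cycles(cost_ns: float, period_ns: float) -> float:
    """Convert a latency in nanoseconds into clock cycles."""
    return cost_ns / period_ns


def ratio_ok(cost_ns: float, listed: float, period_ns: float,
             rel_tol: float = 0.01) -> bool:
    """True if the listed ratio is within rel_tol of cost_ns / period_ns."""
    expected = cycles(cost_ns, period_ns)
    return abs(expected - listed) <= rel_tol * expected


# Table D.1 rows (cost in ns, listed ratio) before the patch and after v1.
BEFORE = {
    "CAS cache miss": (95.9, 266.4),
    "Comms Fabric": (4_500, 7_500),
    "Global Comms": (195_000_000, 324_000_000),
}
AFTER_V1 = {
    "CAS cache miss": (95.9, 266.4),
    "Comms Fabric": (3_000, 8_330),
    "Global Comms": (130_000_000, 361_000_000),
}

if __name__ == "__main__":
    for label, rows in (("before", BEFORE), ("v1", AFTER_V1)):
        for name, (cost, listed) in rows.items():
            ok = ratio_ok(cost, listed, NEHALEM_PERIOD_NS)
            print(f"{label:6s} {name:14s} {'ok' if ok else 'MISMATCH'}")
```

Under these assumptions, the old 'Comms Fabric' and 'Global Comms' rows of Table D.1 fail the check at 0.36 ns but pass at 0.6 ns (4,500/0.6 = 7,500 and 195,000,000/0.6 = 325,000,000). This is consistent with their ratios having been computed using the Opteron clock period. The v1 values pass at 0.36 ns.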
* Re: [PATCH] cpu: Fix numbers in Performance of Mechanisms table in qqz
From: Paul E. McKenney @ 2016-06-15 23:51 UTC
To: Akira Yokosawa; +Cc: perfbook

On Sat, Jun 11, 2016 at 03:39:31PM +0900, Akira Yokosawa wrote:
> From 1ff97081c713ff51d5bc2e15f8ba7649427fac4f Mon Sep 17 00:00:00 2001
> From: Akira Yokosawa <akiyks@gmail.com>
> Date: Sat, 11 Jun 2016 15:28:20 +0900
> Subject: [PATCH] cpu: Fix numbers in Performance of Mechanisms table in qqz
>
> Numbers given in 'Comms Fabric' and 'Global Comms' rows in
> Table D.1 seem wrong.
>
> Their costs are given in ns unit, so they should match those given
> in Table 3.1.
>
> Also, their ratio should be calculated by cost(ns)/0.36(ns).
>
> This commit fixes those numbers.

Great catch, but let's talk about the fix.

The 130ms is the time required for light to circumnavigate the earth
in a vacuum. The 195ms is that for light to circumnavigate the earth
in glass, as in an optical fiber. There is some variation due to
different refractive indexes of different types of fiber.

So making both of those entries be 195,000,000 seems like the right
approach. An even better approach would be to use the ping time to
some system on the other side of the world, but I am coming up empty.
Pinging google.com gets me 216.58.217.46 for about 47ms. If you are
far away from West Coast USA, please see what you get. Not that I
would put it past Google to spread a single IP address worldwide to
defeat this, but worth a try...

On the Comms Fabric number, I suspect that I just googled this at
two different times and got two different numbers. Not surprising,
as different products would be optimized differently at different
times.

So how about https://en.wikipedia.org/wiki/InfiniBand? Choose the
value in effect when Nehalem was released for the QQ table, and choose
the value in effect when the AMD Opteron 844 was released for the
table in Chapter 3.

Seem reasonable?

Oh, and also add a Latex comment saying where the data came from! ;-)

							Thanx, Paul
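Paul's 130 ms and 195 ms figures can be reproduced with back-of-the-envelope arithmetic. A minimal Python sketch, assuming an equatorial circumference of 40,075 km and a fiber refractive index of exactly 1.5 (both are assumptions; as Paul notes, real fibers vary):

```python
# Back-of-the-envelope check of the "Global Comms" latency: time for light
# to travel once around the Earth, in a vacuum and in optical fiber.
# Assumptions (not from the thread): equatorial circumference of 40,075 km
# and a fiber refractive index of 1.5; real fibers differ somewhat.

C_VACUUM_KM_PER_S = 299_792.458
EARTH_CIRCUMFERENCE_KM = 40_075.0
FIBER_INDEX = 1.5


def travel_time_ms(distance_km: float, refractive_index: float = 1.0) -> float:
    """Propagation time in milliseconds through a medium of the given index."""
    speed_km_per_s = C_VACUUM_KM_PER_S / refractive_index
    return distance_km / speed_km_per_s * 1_000.0


vacuum_ms = travel_time_ms(EARTH_CIRCUMFERENCE_KM)
fiber_ms = travel_time_ms(EARTH_CIRCUMFERENCE_KM, FIBER_INDEX)

if __name__ == "__main__":
    print(f"vacuum: {vacuum_ms:.1f} ms, fiber: {fiber_ms:.1f} ms")
    # Express the fiber figure in 0.36 ns Nehalem clock cycles.
    print(f"fiber in cycles at 0.36 ns: {fiber_ms * 1e6 / 0.36:,.0f}")
```

This yields about 134 ms in a vacuum and about 200 ms in fiber. The 130 ms and 195 ms table entries are within a few percent of these, and 195/130 is exactly the assumed index of 1.5.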
* Re: [PATCH] cpu: Fix numbers in Performance of Mechanisms table in qqz
From: Akira Yokosawa @ 2016-06-18 2:48 UTC
To: paulmck; +Cc: perfbook, Akira Yokosawa

On 2016/06/15 16:51:38 -0700, Paul E. McKenney wrote:
> On Sat, Jun 11, 2016 at 03:39:31PM +0900, Akira Yokosawa wrote:
>> [...]
>
> Great catch, but let's talk about the fix.
>
> [...]
>
> So how about https://en.wikipedia.org/wiki/InfiniBand? Choose the
> value in effect when Nehalem was released for the QQ table, and choose
> the value in effect when the AMD Opteron 844 was released for the
> table in Chapter 3.
>
> Seem reasonable?
>
> Oh, and also add a Latex comment saying where the data came from! ;-)

Hi,

I took some time collecting latency data, and composed v2 of the patch.
Will send it in reply to this mail.

Thanks, Akira
* [PATCH v2] cpu: Fix numbers in Performance of Mechanisms tables
From: Akira Yokosawa @ 2016-06-18 2:50 UTC
To: paulmck; +Cc: perfbook, Akira Yokosawa

From 3b2c58c7e7f7abe4303383437502513f948d7401 Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sat, 18 Jun 2016 10:38:57 +0900
Subject: [PATCH v2] cpu: Fix numbers in Performance of Mechanisms tables

Numbers given in the 'Comms Fabric' and 'Global Comms' rows of
Table D.1 seem inconsistent with those in Table 3.1.

'Comms Fabric' latency in Table 3.1 is 3 microseconds.
The latency of InfiniBand DDR, which was available in 2005 (at the
time of the AMD Opteron 844), is 2.5 microseconds.
'Comms Fabric' latency in Table D.1 is 4.5 microseconds.
The latency of InfiniBand QDR, which was available in 2009 (at the
time of the Intel X5550 (Nehalem)), is 1.3 microseconds.
These latencies are for one-way communication.
In the other rows of the tables, costs are for at least one round
trip, so we need to double these numbers for consistency.

For 'Comms Fabric', it would be better to use 5 microseconds in
Table 3.1 and 2.6 microseconds in Table D.1.

Of course, these numbers are for best cases. Actual latency would
depend on the topology and the configuration of the fabric.

'Global Comms' latency in Table 3.1 is 130 ms.
This is based on the speed of light in a vacuum.
On the other hand, 'Global Comms' latency in Table D.1 is 195 ms.
This is based on the speed of light in optical fiber.
The number in Table D.1 is more realistic, and we should use it
in both tables.

This commit fixes these inconsistencies and modifies the related
explanation in the text accordingly.

Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 cpu/overheads.tex | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/cpu/overheads.tex b/cpu/overheads.tex
index 311c43e..bfdd711 100644
--- a/cpu/overheads.tex
+++ b/cpu/overheads.tex
@@ -126,12 +126,12 @@ This simplified sequence is just the beginning of a discipline called
 	\hline
 	CAS cache miss	& 306.0 & 510.0 \\
 	\hline
-	Comms Fabric	& 3,000\textcolor{white}{.0}
-			& 5,000\textcolor{white}{.0}
+	Comms Fabric	& 5,000\textcolor{white}{.0}
+			& 8,330\textcolor{white}{.0}
 			\\
 	\hline
-	Global Comms	& 130,000,000\textcolor{white}{.0}
-			& 216,000,000\textcolor{white}{.0}
+	Global Comms	& 195,000,000\textcolor{white}{.0}
+			& 325,000,000\textcolor{white}{.0} \\
 			\\
 	\end{tabular}
 	\caption{Performance of Synchronization Mechanisms on 4-CPU 1.8GHz AMD Opteron 844 System}
@@ -224,11 +224,11 @@ global agreement.
 	\hline
 	CAS cache miss	& 95.9 & 266.4 \\
 	\hline
-	Comms Fabric	& 4,500\textcolor{white}{.0}
-			& 7,500\textcolor{white}{.0} \\
+	Comms Fabric	& 2,600\textcolor{white}{.0}
+			& 7,220\textcolor{white}{.0} \\
 	\hline
 	Global Comms	& 195,000,000\textcolor{white}{.0}
-			& 324,000,000\textcolor{white}{.0} \\
+			& 542,000,000\textcolor{white}{.0} \\
 	\end{tabular}
 	\caption{Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
 	\label{tab:cpu:Performance of Synchronization Mechanisms on 16-CPU 2.8GHz Intel X5550 (Nehalem) System}
@@ -264,15 +264,19 @@ I/O operations are even more expensive.
 As shown in the ``Comms Fabric'' row,
 high performance (and expensive!) communications fabric, such as
 InfiniBand or any number of proprietary interconnects, has a latency
-of roughly three microseconds, during which time five \emph{thousand}
-instructions might have been executed.
+of roughly five microseconds for an end-to-end round trip, during which
+time more than eight \emph{thousand} instructions might have been executed.
 Standards-based communications networks often require some sort of
 protocol processing, which further increases the latency.
 Of course, geographic distance also increases latency, with the
-theoretical speed-of-light latency around the world coming to
-roughly 130 \emph{milliseconds}, or more than 200 million clock
+speed-of-light through optical fiber latency around the world coming to
+roughly 195 \emph{milliseconds}, or more than 300 million clock
 cycles, as shown in the ``Global Comms'' row.
 
+% Reference of Infiniband latency:
+% http://www.hpcadvisorycouncil.com/events/2014/swiss-workshop/presos/Day_1/1_Mellanox.pdf
+% page 6/76 'Leading Interconnect, Leading Performance'
+
 \QuickQuiz{}
 	These numbers are insanely large!
 	How can I possibly get my head around them?
-- 
1.9.1
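The v2 numbers follow from two steps: double each one-way InfiniBand latency to get a round trip, then divide by the clock period to get the ratio. Below is a Python sketch of that arithmetic. It assumes clock periods of 0.6 ns for the Opteron 844 (implied by Table 3.1's 'CAS cache miss' row) and 0.36 ns for Nehalem (from the v1 commit message). It also assumes the tables round to three significant figures, which is a guess at their convention.

```python
# Sketch of the arithmetic behind the v2 table entries.
# Assumptions: clock periods of 0.6 ns (Opteron 844, as implied by Table
# 3.1's "CAS cache miss" row) and 0.36 ns (Nehalem, per the v1 commit
# message), and rounding to three significant figures.
import math

OPTERON_PERIOD_NS = 0.6
NEHALEM_PERIOD_NS = 0.36

IB_DDR_ONE_WAY_US = 2.5        # InfiniBand DDR, circa 2005 (Opteron 844 era)
IB_QDR_ONE_WAY_US = 1.3        # InfiniBand QDR, circa 2009 (Nehalem era)
GLOBAL_FIBER_NS = 195_000_000  # light around the Earth through fiber


def round_sig(x: float, sig: int = 3) -> float:
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))


def round_trip_ns(one_way_us: float) -> float:
    """Double a one-way latency (microseconds) into a round trip (ns)."""
    return 2 * one_way_us * 1_000


def table_entry(cost_ns: float, period_ns: float) -> tuple:
    """Return (cost_ns, ratio) as they would appear in the table."""
    return round_sig(cost_ns), round_sig(cost_ns / period_ns)


table_3_1 = {
    "Comms Fabric": table_entry(round_trip_ns(IB_DDR_ONE_WAY_US), OPTERON_PERIOD_NS),
    "Global Comms": table_entry(GLOBAL_FIBER_NS, OPTERON_PERIOD_NS),
}
table_d_1 = {
    "Comms Fabric": table_entry(round_trip_ns(IB_QDR_ONE_WAY_US), NEHALEM_PERIOD_NS),
    "Global Comms": table_entry(GLOBAL_FIBER_NS, NEHALEM_PERIOD_NS),
}

if __name__ == "__main__":
    for name, table in (("Table 3.1", table_3_1), ("Table D.1", table_d_1)):
        for row, (cost, ratio) in table.items():
            print(f"{name} {row:13s} {cost:>15,.0f} ns {ratio:>15,.0f} cycles")
```

With these inputs the sketch reproduces the entries in the v2 diff: 5,000 ns and 8,330 cycles for 'Comms Fabric' in Table 3.1, 2,600 ns and 7,220 cycles in Table D.1, and 325,000,000 and 542,000,000 cycles for 'Global Comms'. The 8,330 and 325,000,000 figures also support the text's "more than eight thousand" instructions and "more than 300 million" clock cycles.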
* Re: [PATCH v2] cpu: Fix numbers in Performance of Mechanisms tables
From: Paul E. McKenney @ 2016-06-18 3:57 UTC
To: Akira Yokosawa; +Cc: perfbook

On Sat, Jun 18, 2016 at 11:50:53AM +0900, Akira Yokosawa wrote:
> From 3b2c58c7e7f7abe4303383437502513f948d7401 Mon Sep 17 00:00:00 2001
> Subject: [PATCH v2] cpu: Fix numbers in Performance of Mechanisms tables
>
> [...]
>
> Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

Nice!!! Applied and pushed.

							Thanx, Paul