* [PATCH 1/2] count: fix typos
From: SeongJae Park @ 2016-03-07 23:42 UTC
To: paulmck; +Cc: perfbook, SeongJae Park
This commit fixes typos in the `COUNTING` section, such as misspellings
and missing or unnecessarily inserted words and characters.
---
count/count.tex | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/count/count.tex b/count/count.tex
index 83b695f..8f69a10 100644
--- a/count/count.tex
+++ b/count/count.tex
@@ -672,7 +672,7 @@ at the beginning of this chapter.
to reduce.
However, the worst case is unchanged because although the
counter \emph{could} move in either direction, the worst
- case is whenthe read operation completes immediately,
+ case is when the read operation completes immediately,
but then is delayed for $\Delta$ time units, during which
time all the changes in the counter's value move it in
the same direction, again giving us an absolute error
@@ -851,7 +851,7 @@ comes at the cost of the additional thread running \co{eventual()}.
the per-thread \co{counter} variables
might need to be limited to 32 bits in order to sum them accurately,
but with a 64-bit \co{global_count} variable to avoid overflow.
- In this case, it is necessary to zero the per-thead
+ In this case, it is necessary to zero the per-thread
\co{counter} variables periodically in order to avoid overflow.
It is extremely important to note that this zeroing cannot
be delayed too long or overflow of the smaller per-thread
@@ -1648,7 +1648,7 @@ Then line~34 releases \co{gblcnt_mutex}, and line~35 returns success.
Lines~38-50 show \co{read_count()}, which returns the aggregate value
of the counter.
-It acquires \co{gblcnt_mutex} on line~43 and releases it on line 48,
+It acquires \co{gblcnt_mutex} on line~43 and releases it on line~48,
excluding global operations from \co{add_count()} and \co{sub_count()},
and, as we will see, also excluding thread creation and exit.
Line~44 initializes local variable \co{sum} to the value of
@@ -2262,7 +2262,7 @@ then line~25 returns failure.
Otherwise, line~28 adds \co{delta} to the global counter, line~29
spreads counts to the local state if appropriate, line~30 releases
-\co{gblcnt_mutex} (again, as noted earlier), and finally, line 31
+\co{gblcnt_mutex} (again, as noted earlier), and finally, line~31
returns success.
Lines~34-63 of
@@ -2305,7 +2305,7 @@ Line~9 acquires \co{gblcnt_mutex} and line~16 releases it.
Line~10 initializes local variable \co{sum} to the value of
\co{globalcount}, and the loop spanning lines~11-15 adds the
per-thread counters to this sum, isolating each per-thread counter
-using \co{split_ctrandmax} on line 13.
+using \co{split_ctrandmax} on line~13.
Finally, line~17 returns the sum.
\begin{figure}[tbp]
@@ -2432,7 +2432,7 @@ line~30 subtracts this thread's \co{countermax} from \co{globalreserve}.
\QuickQuiz{}
What stops a thread from simply refilling its
\co{ctrandmax} variable immediately after
- \co{flush_local_count()} on line 14 of
+ \co{flush_local_count()} on line~14 of
Figure~\ref{fig:count:Atomic Limit Counter Utility Functions 1}
empties it?
\QuickQuizAnswer{
@@ -2449,7 +2449,7 @@ line~30 subtracts this thread's \co{countermax} from \co{globalreserve}.
What prevents concurrent execution of the fastpath of either
\co{add_count()} or \co{sub_count()} from interfering with
the \co{ctrandmax} variable while
- \co{flush_local_count()} is accessing it on line 27 of
+ \co{flush_local_count()} is accessing it on line~27 of
Figure~\ref{fig:count:Atomic Limit Counter Utility Functions 1}
empties it?
\QuickQuizAnswer{
@@ -2542,7 +2542,7 @@ Even though per-thread state will now be manipulated only by the
corresponding thread, there will still need to be synchronization
with the signal handlers.
This synchronization is provided by the state machine shown in
-Figure~\ref{fig:count:Signal-Theft State Machine}
+Figure~\ref{fig:count:Signal-Theft State Machine}.
The state machine starts out in the IDLE state, and when \co{add_count()}
or \co{sub_count()} find that the combination of the local thread's count
and the global count cannot accommodate the request, the corresponding
@@ -3576,7 +3576,7 @@ Summarizing the summary:
counters' partitioned updates and non-partitioned reads), but also
across time (as in
Section~\ref{sec:count:Approximate Limit Counters}'s and
- Section~\ref{sec:count:Exact Limit Counters}'s and
+ Section~\ref{sec:count:Exact Limit Counters}'s
limit counters running fast when far from
the limit, but slowly when close to the limit).
\item Partitioning across time often batches updates locally
@@ -3585,7 +3585,7 @@ Summarizing the summary:
improving performance and scalability.
All the algorithms shown in
Tables~\ref{tab:count:Statistical Counter Performance on Power-6}
- and~\ref{tab:count:Limit Counter Performance on Power-6}?
+ and~\ref{tab:count:Limit Counter Performance on Power-6}
make heavy use of batching.
\item Read-only code paths should remain read-only: Spurious
synchronization writes to shared memory kill performance
--
1.9.1
* [PATCH 2/2] count: fix a word to fit in context
From: SeongJae Park @ 2016-03-07 23:42 UTC
To: paulmck; +Cc: perfbook, SeongJae Park
A word in a quick quiz in the `Parallel Counting Performance` subsection
does not fit its context. The quiz says that the 'count_stat.c' row of
Table 'Statistical Counter Performance on Power-6' shows that the
_update_ side scales linearly with the number of threads. However, the
table shows read performance on 1 core and on 32 cores, not update
performance across different numbers of cores. Also, the surrounding
text discusses read-side performance, not update-side performance. This
commit fixes the word to fit the context by changing it to _read_-side.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
count/count.tex | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/count/count.tex b/count/count.tex
index 8f69a10..3a3ca80 100644
--- a/count/count.tex
+++ b/count/count.tex
@@ -3297,7 +3297,7 @@ courtesy of eventual consistency.
\QuickQuiz{}
On the \url{count_stat.c} row of
Table~\ref{tab:count:Statistical Counter Performance on Power-6},
- we see that the update side scales linearly with the number of
+ we see that the read-side scales linearly with the number of
threads.
How is that possible given that the more threads there are,
the more per-thread counters must be summed up?
--
1.9.1
* Re: [PATCH 1/2] count: fix typos
From: Paul E. McKenney @ 2016-03-08 0:08 UTC
To: SeongJae Park; +Cc: perfbook
On Tue, Mar 08, 2016 at 08:42:31AM +0900, SeongJae Park wrote:
> This commit fixes typos in the `COUNTING` section, such as misspellings
> and missing or unnecessarily inserted words and characters.
Could you please resend with your Signed-off-by? Looks good otherwise.
Thanx, Paul
* Re: [PATCH 1/2] count: fix typos
From: SeongJae Park @ 2016-03-08 0:11 UTC
To: Paul McKenney; +Cc: perfbook
On Tue, Mar 8, 2016 at 9:08 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Tue, Mar 08, 2016 at 08:42:31AM +0900, SeongJae Park wrote:
>> This commit fixes typos in the `COUNTING` section, such as misspellings
>> and missing or unnecessarily inserted words and characters.
>
> Could you please resend with your Signed-off-by? Looks good otherwise.
Oops, I forgot it while squashing commits. I will send an edited version
right now.
* [PATCH 1/2] count: fix typos
From: SeongJae Park @ 2016-03-08 0:13 UTC
To: paulmck; +Cc: perfbook, SeongJae Park
This commit fixes typos in the `COUNTING` section, such as misspellings
and missing or unnecessarily inserted words and characters.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
count/count.tex | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/count/count.tex b/count/count.tex
index 83b695f..8f69a10 100644
--- a/count/count.tex
+++ b/count/count.tex
@@ -672,7 +672,7 @@ at the beginning of this chapter.
to reduce.
However, the worst case is unchanged because although the
counter \emph{could} move in either direction, the worst
- case is whenthe read operation completes immediately,
+ case is when the read operation completes immediately,
but then is delayed for $\Delta$ time units, during which
time all the changes in the counter's value move it in
the same direction, again giving us an absolute error
@@ -851,7 +851,7 @@ comes at the cost of the additional thread running \co{eventual()}.
the per-thread \co{counter} variables
might need to be limited to 32 bits in order to sum them accurately,
but with a 64-bit \co{global_count} variable to avoid overflow.
- In this case, it is necessary to zero the per-thead
+ In this case, it is necessary to zero the per-thread
\co{counter} variables periodically in order to avoid overflow.
It is extremely important to note that this zeroing cannot
be delayed too long or overflow of the smaller per-thread
@@ -1648,7 +1648,7 @@ Then line~34 releases \co{gblcnt_mutex}, and line~35 returns success.
Lines~38-50 show \co{read_count()}, which returns the aggregate value
of the counter.
-It acquires \co{gblcnt_mutex} on line~43 and releases it on line 48,
+It acquires \co{gblcnt_mutex} on line~43 and releases it on line~48,
excluding global operations from \co{add_count()} and \co{sub_count()},
and, as we will see, also excluding thread creation and exit.
Line~44 initializes local variable \co{sum} to the value of
@@ -2262,7 +2262,7 @@ then line~25 returns failure.
Otherwise, line~28 adds \co{delta} to the global counter, line~29
spreads counts to the local state if appropriate, line~30 releases
-\co{gblcnt_mutex} (again, as noted earlier), and finally, line 31
+\co{gblcnt_mutex} (again, as noted earlier), and finally, line~31
returns success.
Lines~34-63 of
@@ -2305,7 +2305,7 @@ Line~9 acquires \co{gblcnt_mutex} and line~16 releases it.
Line~10 initializes local variable \co{sum} to the value of
\co{globalcount}, and the loop spanning lines~11-15 adds the
per-thread counters to this sum, isolating each per-thread counter
-using \co{split_ctrandmax} on line 13.
+using \co{split_ctrandmax} on line~13.
Finally, line~17 returns the sum.
\begin{figure}[tbp]
@@ -2432,7 +2432,7 @@ line~30 subtracts this thread's \co{countermax} from \co{globalreserve}.
\QuickQuiz{}
What stops a thread from simply refilling its
\co{ctrandmax} variable immediately after
- \co{flush_local_count()} on line 14 of
+ \co{flush_local_count()} on line~14 of
Figure~\ref{fig:count:Atomic Limit Counter Utility Functions 1}
empties it?
\QuickQuizAnswer{
@@ -2449,7 +2449,7 @@ line~30 subtracts this thread's \co{countermax} from \co{globalreserve}.
What prevents concurrent execution of the fastpath of either
\co{add_count()} or \co{sub_count()} from interfering with
the \co{ctrandmax} variable while
- \co{flush_local_count()} is accessing it on line 27 of
+ \co{flush_local_count()} is accessing it on line~27 of
Figure~\ref{fig:count:Atomic Limit Counter Utility Functions 1}
empties it?
\QuickQuizAnswer{
@@ -2542,7 +2542,7 @@ Even though per-thread state will now be manipulated only by the
corresponding thread, there will still need to be synchronization
with the signal handlers.
This synchronization is provided by the state machine shown in
-Figure~\ref{fig:count:Signal-Theft State Machine}
+Figure~\ref{fig:count:Signal-Theft State Machine}.
The state machine starts out in the IDLE state, and when \co{add_count()}
or \co{sub_count()} find that the combination of the local thread's count
and the global count cannot accommodate the request, the corresponding
@@ -3576,7 +3576,7 @@ Summarizing the summary:
counters' partitioned updates and non-partitioned reads), but also
across time (as in
Section~\ref{sec:count:Approximate Limit Counters}'s and
- Section~\ref{sec:count:Exact Limit Counters}'s and
+ Section~\ref{sec:count:Exact Limit Counters}'s
limit counters running fast when far from
the limit, but slowly when close to the limit).
\item Partitioning across time often batches updates locally
@@ -3585,7 +3585,7 @@ Summarizing the summary:
improving performance and scalability.
All the algorithms shown in
Tables~\ref{tab:count:Statistical Counter Performance on Power-6}
- and~\ref{tab:count:Limit Counter Performance on Power-6}?
+ and~\ref{tab:count:Limit Counter Performance on Power-6}
make heavy use of batching.
\item Read-only code paths should remain read-only: Spurious
synchronization writes to shared memory kill performance
--
1.9.1
* [PATCH 2/2] count: fix a word to fit in context
From: SeongJae Park @ 2016-03-08 0:13 UTC
To: paulmck; +Cc: perfbook, SeongJae Park
A word in a quick quiz in the `Parallel Counting Performance` subsection
does not fit its context. The quiz says that the 'count_stat.c' row of
Table 'Statistical Counter Performance on Power-6' shows that the
_update_ side scales linearly with the number of threads. However, the
table shows read performance on 1 core and on 32 cores, not update
performance across different numbers of cores. Also, the surrounding
text discusses read-side performance, not update-side performance. This
commit fixes the word to fit the context by changing it to _read_-side.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
count/count.tex | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/count/count.tex b/count/count.tex
index 8f69a10..3a3ca80 100644
--- a/count/count.tex
+++ b/count/count.tex
@@ -3297,7 +3297,7 @@ courtesy of eventual consistency.
\QuickQuiz{}
On the \url{count_stat.c} row of
Table~\ref{tab:count:Statistical Counter Performance on Power-6},
- we see that the update side scales linearly with the number of
+ we see that the read-side scales linearly with the number of
threads.
How is that possible given that the more threads there are,
the more per-thread counters must be summed up?
--
1.9.1
* Re: [PATCH 1/2] count: fix typos
From: Paul E. McKenney @ 2016-03-08 17:04 UTC
To: SeongJae Park; +Cc: perfbook
On Tue, Mar 08, 2016 at 09:13:38AM +0900, SeongJae Park wrote:
> This commit fixes typos in the `COUNTING` section, such as misspellings
> and missing or unnecessarily inserted words and characters.
>
> Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Applied both, thank you!
Thanx, Paul