* [PATCH 0/4] datastruct: Minor fixes
From: SeongJae Park @ 2022-12-26 18:16 UTC
To: paulmck; +Cc: perfbook, SeongJae Park
From: SeongJae Park <sj38.park@gmail.com>
Hi Paul,
Hope you're having great holidays.
This patchset contains minor fixes for datastruct/ that were found during
its Korean translation[1].
[1] https://github.com/sjp38/perfbook-ko_KR
Thanks,
SJ
SeongJae Park (4):
datastruct: Remove unnecessary space
datastruct: Add missed unbreakable spaces
datastruct: Enclose NULL with \co{}
datastruct: Put \cref{} content in a single line
datastruct/datastruct.tex | 35 ++++++++++++++++-------------------
1 file changed, 16 insertions(+), 19 deletions(-)
--
2.17.1
* [PATCH 1/4] datastruct: Remove unnecessary space
From: SeongJae Park @ 2022-12-26 18:16 UTC
To: paulmck; +Cc: perfbook, SeongJae Park
From: SeongJae Park <sj38.park@gmail.com>
A sentence in datastruct has an unnecessary extra space between words.
Remove it.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
datastruct/datastruct.tex | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index ed404e5a..99c92d9a 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -34,7 +34,7 @@ which improves both performance and scalability.
Because this chapter cannot delve into the details of every concurrent
data structure,
\cref{sec:datastruct:Other Data Structures}
-surveys a few of the important ones.
+surveys a few of the important ones.
Although the best performance and scalability results from design rather
than after-the-fact micro-optimization, micro-optimization is nevertheless
necessary for the absolute best possible performance and scalability,
--
2.17.1
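Because TeX collapses runs of interword spaces, an extra space like the one removed above never shows up in the PDF; it only clutters the source. Such runs can be found mechanically. The following is a minimal sketch, not part of perfbook's tooling: the function name `find_double_spaces` is hypothetical, and it assumes a plain line-by-line scan is good enough (it would also flag intentional alignment, for example inside verbatim listings).

```python
import re

# Two or more spaces with a non-space character on both sides, so that
# leading indentation and trailing whitespace are not reported.
DOUBLE_SPACE = re.compile(r"(?<=\S) {2,}(?=\S)")

def find_double_spaces(lines):
    """Return (line_number, column) pairs for interior runs of 2+ spaces."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for m in DOUBLE_SPACE.finditer(line):
            hits.append((lineno, m.start()))
    return hits
```

For example, `find_double_spaces(open("datastruct/datastruct.tex", encoding="utf-8"))` would list the offending positions.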
* [PATCH 2/4] datastruct: Add missed unbreakable spaces
From: SeongJae Park @ 2022-12-26 18:16 UTC
To: paulmck; +Cc: perfbook, SeongJae Park
From: SeongJae Park <sj38.park@gmail.com>
Add missing unbreakable spaces for 'CPUs' and 'elements'.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
datastruct/datastruct.tex | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index 99c92d9a..40ea6995 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -664,7 +664,7 @@ shows the same data on a linear scale.
This drops the global-locking trace into the x-axis, but allows the
non-ideal performance of RCU and hazard pointers to be more readily
discerned.
-Both show a change in slope at 224 CPUs, and this is due to hardware
+Both show a change in slope at 224~CPUs, and this is due to hardware
multithreading.
At 32 and fewer CPUs, each thread has a core to itself.
In this regime, RCU does better than does hazard pointers because the
@@ -672,11 +672,11 @@ latter's read-side \IXpl{memory barrier} result in dead time within the core.
In short, RCU is better able to utilize a core from a single hardware
thread than is hazard pointers.
-This situation changes above 224 CPUs.
+This situation changes above 224~CPUs.
Because RCU is using more than half of each core's resources from a
single hardware thread, RCU gains relatively little benefit from the
second hardware thread in each core.
-The slope of the hazard-pointers trace also decreases at 224 CPUs, but
+The slope of the hazard-pointers trace also decreases at 224~CPUs, but
less dramatically,
because the second hardware thread is able to fill in the time
that the first hardware thread is stalled due to \IXh{memory-barrier}{latency}.
@@ -775,8 +775,8 @@ to about half again faster than that of either QSBR or RCU\@.
Still unconvinced?
Then look at the log-log plot in
- \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
- which shows performance for 448 CPUs as a function of the
+ \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448~CPUs; Varying Table Size},
+ which shows performance for 448~CPUs as a function of the
hash-table size, that is, number of buckets and maximum number
of elements.
A hash-table of size 1,024 has 1,024~buckets and contains
@@ -785,14 +785,13 @@ to about half again faster than that of either QSBR or RCU\@.
Because this is a read-only benchmark, the actual occupancy is
always equal to the average occupancy.
- This figure shows near-ideal performance below about 8,000
- elements, that is, when the hash table comprises less than
- 1\,MB of data.
+ This figure shows near-ideal performance below about 8,000~elements,
+ that is, when the hash table comprises less than 1\,MB of data.
This near-ideal performance is consistent with that for the
pre-BSD routing table shown in
\cref{fig:defer:Pre-BSD Routing Table Protected by RCU}
on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU},
- even at 448 CPUs.
+ even at 448~CPUs.
However, the performance drops significantly (this is a log-log
plot) at about 8,000~elements, which is where the 1,048,576-byte
L2 cache overflows.
@@ -835,7 +834,7 @@ data structure represented by the pre-BSD routing table.
\QuickQuiz{
The memory system is a serious bottleneck on this big system.
- Why bother putting 448 CPUs on a system without giving them
+ Why bother putting 448~CPUs on a system without giving them
enough memory bandwidth to do something useful???
}\QuickQuizAnswer{
It would indeed be a bad idea to use this large and expensive
@@ -905,10 +904,10 @@ concurrency control to begin with.
\Cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
therefore shows the effect of updates on readers.
At the extreme left-hand side of this graph, all but one of the CPUs
-are doing lookups, while to the right all 448 CPUs are doing updates.
+are doing lookups, while to the right all 448~CPUs are doing updates.
For all four implementations, the number of lookups per millisecond
decreases as the number of updating CPUs increases, of course reaching
-zero lookups per millisecond when all 448 CPUs are updating.
+zero lookups per millisecond when all 448~CPUs are updating.
Both hazard pointers and RCU do well compared to per-bucket locking
because their readers do not increase update-side lock contention.
RCU does well relative to hazard pointers as the number of updaters
@@ -931,7 +930,7 @@ showed the effect of increasing update rates on lookups,
\cref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
shows the effect of increasing update rates on the updates themselves.
Again, at the left-hand side of the figure all but one of the CPUs are
-doing lookups and at the right-hand side of the figure all 448 CPUs are
+doing lookups and at the right-hand side of the figure all 448~CPUs are
doing updates.
Hazard pointers and RCU start off with a significant advantage because,
unlike bucket locking, readers do not exclude updaters.
--
2.17.1
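The ties (`~`) added above keep a number and its unit on the same output line. A hedged sketch of how the remaining untied pairs might be located, assuming Python 3 and that only the units `CPUs` and `elements` matter (the function and pattern names are hypothetical). It deliberately skips the arguments of `\cref`, `\Cref`, `\cpageref`, and `\label`, because those keys must match the corresponding `\label` exactly; as the review of this patch in the thread shows, adding `~` inside `\cref{}` broke the build.

```python
import re

# A number followed by a unit word, separated by ordinary whitespace
# (possibly a line break plus indentation) instead of a ~ tie.
UNTIED = re.compile(r"\b\d[\d,]*\s+(?:CPUs|elements)\b")
# Label/reference keys (no nested braces assumed); never add ties there.
REF_ARG = re.compile(r"\\(?:[cC]ref|[cC]pageref|label)\{[^}]*\}")

def find_untied_units(text):
    """Return (line_number, matched_text) for number-unit pairs lacking a tie."""
    protected = [m.span() for m in REF_ARG.finditer(text)]
    hits = []
    for m in UNTIED.finditer(text):
        if any(start <= m.start() < end for start, end in protected):
            continue  # inside a label or reference key; leave it alone
        lineno = text.count("\n", 0, m.start()) + 1
        hits.append((lineno, m.group(0)))
    return hits
```

Scanning the whole text rather than single lines is what catches pairs split across a line break, such as the `8,000` / `elements` case fixed above.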
* [PATCH 3/4] datastruct: Enclose NULL with \co{}
From: SeongJae Park @ 2022-12-26 18:16 UTC
To: paulmck; +Cc: perfbook, SeongJae Park
From: SeongJae Park <sj38.park@gmail.com>
Every 'NULL' in datastruct is enclosed with \co{} except one. Fix that
inconsistent exception.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
datastruct/datastruct.tex | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index 40ea6995..9dffaf37 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -1532,7 +1532,7 @@ and \clnref{acq_oldcur} acquires that bucket's spinlock.
\co{hashtab_add()} and \co{hashtab_del()}?
In other words, what prevents \co{hashtab_add()}
and \co{hashtab_del()} from dereferencing
- a NULL pointer loaded from \co{->ht_new}?
+ a \co{NULL} pointer loaded from \co{->ht_new}?
\end{fcvref}
}\QuickQuizAnswer{
\begin{fcvref}[ln:datastruct:hash_resize:resize]
--
2.17.1
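A bare `NULL` like the one fixed here can be spotted by deleting every `\co{...}` span and then looking for `NULL` in what remains. A minimal sketch under stated assumptions: the function name `find_bare_null` is hypothetical, `\co{}` arguments are assumed to contain no nested braces, and everything after an unescaped `%` is treated as a comment.

```python
import re

CO_SPAN = re.compile(r"\\co\{[^{}]*\}")   # \co{...} without nested braces
COMMENT = re.compile(r"(?<!\\)%.*")       # unescaped % up to end of line
BARE_NULL = re.compile(r"\bNULL\b")       # NULL as a whole word

def find_bare_null(lines):
    """Return line numbers that contain NULL outside any \\co{} span."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        text = COMMENT.sub("", line)   # drop TeX comments first
        text = CO_SPAN.sub("", text)   # then drop properly marked-up code
        if BARE_NULL.search(text):
            hits.append(lineno)
    return hits
```

Removing whole `\co{}` spans, rather than only matching the literal `\co{NULL}`, also accepts `NULL` inside longer code fragments such as `\co{p == NULL}`.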
* [PATCH 4/4] datastruct: Put \cref{} content in a single line
From: SeongJae Park @ 2022-12-26 18:16 UTC
To: paulmck; +Cc: perfbook, SeongJae Park
From: SeongJae Park <sj38.park@gmail.com>
Every other \cref{} argument is on a single line, which helps grep-like
scripting. However, two \cref{}s are broken across two lines. Put each
of them on a single line for consistency and easier grepping.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
datastruct/datastruct.tex | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index 9dffaf37..4c7f9fe2 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -1551,8 +1551,7 @@ and \clnref{acq_oldcur} acquires that bucket's spinlock.
\co{hashtab_del()} functions must be enclosed
in RCU read-side critical sections, courtesy of
\co{hashtab_lock_mod()} and \co{hashtab_unlock_mod()} in
- \cref{lst:datastruct:Resizable Hash-Table Update-Side Concurrency
- Control}.
+ \cref{lst:datastruct:Resizable Hash-Table Update-Side Concurrency Control}.
\end{fcvref}
}\QuickQuizEnd
@@ -1584,8 +1583,7 @@ the old hash table, and finally \clnref{ret_success} returns success.
\begin{fcvref}[ln:datastruct:hash_resize:lock_unlock_mod]
Together with the \co{READ_ONCE()}
on \clnref{l:ifresized} in \co{hashtab_lock_mod()}
- of \cref{lst:datastruct:Resizable Hash-Table Update-Side
- Concurrency Control},
+ of \cref{lst:datastruct:Resizable Hash-Table Update-Side Concurrency Control},
it tells the compiler that the non-initialization accesses
to \co{->ht_resize_cur} must remain because reads
from \co{->ht_resize_cur} really can race with writes,
--
2.17.1
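TeX reads the line break inside a split `\cref{}` argument, together with the next line's leading indentation, as a single space, so the split labels above still resolved; the cost falls only on line-oriented tools, because a single-line `grep` for the full label misses them. A minimal sketch of a check for such splits, assuming Python 3 (the function name `find_split_refs` is hypothetical). It is approximate: any `}` later on the same line is taken to be the closing brace.

```python
import re

# Opening of \cref{, \Cref{, \cpageref{, or \Cpageref{.
REF_OPEN = re.compile(r"\\[cC](?:page)?ref\{")

def find_split_refs(lines):
    """Return line numbers where a reference argument is not closed on that line."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for m in REF_OPEN.finditer(line):
            # Approximation: any later '}' on this line counts as the closer.
            if "}" not in line[m.end():]:
                hits.append(lineno)
                break
    return hits
```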
* Re: [PATCH 2/4] datastruct: Add missed unbreakable spaces
From: Akira Yokosawa @ 2022-12-26 23:41 UTC
To: SeongJae Park, paulmck; +Cc: perfbook, SeongJae Park, Akira Yokosawa
Hi,
On Mon, 26 Dec 2022 10:16:32 -0800, SeongJae Park wrote:
> From: SeongJae Park <sj38.park@gmail.com>
>
> Add missing unbreakable spaces for 'CPUs' and 'elements'.
>
> Signed-off-by: SeongJae Park <sj38.park@gmail.com>
> ---
> datastruct/datastruct.tex | 25 ++++++++++++-------------
> 1 file changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> index 99c92d9a..40ea6995 100644
> --- a/datastruct/datastruct.tex
> +++ b/datastruct/datastruct.tex
[...]
> @@ -775,8 +775,8 @@ to about half again faster than that of either QSBR or RCU\@.
>
> Still unconvinced?
> Then look at the log-log plot in
> - \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
> - which shows performance for 448 CPUs as a function of the
> + \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448~CPUs; Varying Table Size},
> + which shows performance for 448~CPUs as a function of the
> hash-table size, that is, number of buckets and maximum number
> of elements.
> A hash-table of size 1,024 has 1,024~buckets and contains
This hunk caused an error for me.
-----
l.6047 ...r's Zoo at 448~CPUs; Varying Table Size}
,
?
! Emergency stop.
<to be read again>
\protect
l.6047 ...r's Zoo at 448~CPUs; Varying Table Size}
,
End of file on the terminal!
-----
Please remove the unbreakable space in \cref{}.
Thanks, Akira
[...]
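The log above stops while reading the `~` inside the reference key. A plausible reading (an assumption, not a diagnosis stated in the thread) is that `~` is an active character whose expansion is not plain text, whereas label keys are used as internal names built from plain characters. Either way, the rule stated above, no unbreakable space inside `\cref{}`, is easy to check before building. A minimal sketch with a hypothetical function name, assuming keys contain no nested braces:

```python
import re

# Argument of a label or cross-reference command (no nested braces assumed).
REF_ARG = re.compile(r"\\(?:[cC]ref|[cC]pageref|label)\{([^}]*)\}")

def find_ties_in_keys(text):
    """Return (line_number, key) for label/reference keys containing '~'."""
    hits = []
    for m in REF_ARG.finditer(text):
        if "~" in m.group(1):
            lineno = text.count("\n", 0, m.start()) + 1
            hits.append((lineno, m.group(1)))
    return hits
```

Ties in running text, such as `448~CPUs` outside any key, are left alone, which matches the v2 approach of keeping them everywhere except inside `\cref{}`.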
* Re: [PATCH 2/4] datastruct: Add missed unbreakable spaces
From: Paul E. McKenney @ 2022-12-27 0:26 UTC
To: Akira Yokosawa; +Cc: SeongJae Park, perfbook, SeongJae Park
On Tue, Dec 27, 2022 at 08:41:10AM +0900, Akira Yokosawa wrote:
> Hi,
>
> On Mon, 26 Dec 2022 10:16:32 -0800, SeongJae Park wrote:
> > From: SeongJae Park <sj38.park@gmail.com>
> >
> > Add missing unbreakable spaces for 'CPUs' and 'elements'.
> >
> > Signed-off-by: SeongJae Park <sj38.park@gmail.com>
I queued and pushed, 1/4, 3/4, and 4/4, thank you!
Please do send an updated version of 2/4.
Thanx, Paul
> > ---
> > datastruct/datastruct.tex | 25 ++++++++++++-------------
> > 1 file changed, 12 insertions(+), 13 deletions(-)
> >
> > diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> > index 99c92d9a..40ea6995 100644
> > --- a/datastruct/datastruct.tex
> > +++ b/datastruct/datastruct.tex
> [...]
> > @@ -775,8 +775,8 @@ to about half again faster than that of either QSBR or RCU\@.
> >
> > Still unconvinced?
> > Then look at the log-log plot in
> > - \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
> > - which shows performance for 448 CPUs as a function of the
> > + \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448~CPUs; Varying Table Size},
> > + which shows performance for 448~CPUs as a function of the
> > hash-table size, that is, number of buckets and maximum number
> > of elements.
> > A hash-table of size 1,024 has 1,024~buckets and contains
>
> This hunk caused an error for me.
>
> -----
> l.6047 ...r's Zoo at 448~CPUs; Varying Table Size}
> ,
> ?
> ! Emergency stop.
> <to be read again>
> \protect
> l.6047 ...r's Zoo at 448~CPUs; Varying Table Size}
> ,
> End of file on the terminal!
> -----
>
> Please remove the unbreakable space in \cref{}.
>
> Thanks, Akira
>
> [...]
>
* Re: [PATCH 2/4] datastruct: Add missed unbreakable spaces
From: SeongJae Park @ 2022-12-27 16:04 UTC
To: Paul E. McKenney; +Cc: Akira Yokosawa, SeongJae Park, perfbook, SeongJae Park
On Mon, 26 Dec 2022 16:26:50 -0800 "Paul E. McKenney" <paulmck@kernel.org> wrote:
> On Tue, Dec 27, 2022 at 08:41:10AM +0900, Akira Yokosawa wrote:
> > Hi,
> >
> > On Mon, 26 Dec 2022 10:16:32 -0800, SeongJae Park wrote:
> > > From: SeongJae Park <sj38.park@gmail.com>
> > >
> > > Add missing unbreakable spaces for 'CPUs' and 'elements'.
> > >
> > > Signed-off-by: SeongJae Park <sj38.park@gmail.com>
>
> I queued and pushed, 1/4, 3/4, and 4/4, thank you!
>
> Please do send an updated version of 2/4.
Thank you, and sorry for my mistake. I will send the updated version right
now.
Thanks,
SJ
>
> Thanx, Paul
>
> > > ---
> > > datastruct/datastruct.tex | 25 ++++++++++++-------------
> > > 1 file changed, 12 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> > > index 99c92d9a..40ea6995 100644
> > > --- a/datastruct/datastruct.tex
> > > +++ b/datastruct/datastruct.tex
> > [...]
> > > @@ -775,8 +775,8 @@ to about half again faster than that of either QSBR or RCU\@.
> > >
> > > Still unconvinced?
> > > Then look at the log-log plot in
> > > - \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
> > > - which shows performance for 448 CPUs as a function of the
> > > + \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448~CPUs; Varying Table Size},
> > > + which shows performance for 448~CPUs as a function of the
> > > hash-table size, that is, number of buckets and maximum number
> > > of elements.
> > > A hash-table of size 1,024 has 1,024~buckets and contains
> >
> > This hunk caused an error for me.
> >
> > -----
> > l.6047 ...r's Zoo at 448~CPUs; Varying Table Size}
> > ,
> > ?
> > ! Emergency stop.
> > <to be read again>
> > \protect
> > l.6047 ...r's Zoo at 448~CPUs; Varying Table Size}
> > ,
> > End of file on the terminal!
> > -----
> >
> > Please remove the unbreakable space in \cref{}.
> >
> > Thanks, Akira
> >
> > [...]
* [PATCH v2] datastruct: Add missed unbreakable spaces
From: SeongJae Park @ 2022-12-27 16:06 UTC
To: paulmck; +Cc: akiyks, perfbook, SeongJae Park
Add missing unbreakable spaces for 'CPUs' and 'elements'.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
Changes from v1
- Fix the build error by removing the unbreakable space from \cref{}
datastruct/datastruct.tex | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index 99c92d9a..c095b846 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -664,7 +664,7 @@ shows the same data on a linear scale.
This drops the global-locking trace into the x-axis, but allows the
non-ideal performance of RCU and hazard pointers to be more readily
discerned.
-Both show a change in slope at 224 CPUs, and this is due to hardware
+Both show a change in slope at 224~CPUs, and this is due to hardware
multithreading.
At 32 and fewer CPUs, each thread has a core to itself.
In this regime, RCU does better than does hazard pointers because the
@@ -672,11 +672,11 @@ latter's read-side \IXpl{memory barrier} result in dead time within the core.
In short, RCU is better able to utilize a core from a single hardware
thread than is hazard pointers.
-This situation changes above 224 CPUs.
+This situation changes above 224~CPUs.
Because RCU is using more than half of each core's resources from a
single hardware thread, RCU gains relatively little benefit from the
second hardware thread in each core.
-The slope of the hazard-pointers trace also decreases at 224 CPUs, but
+The slope of the hazard-pointers trace also decreases at 224~CPUs, but
less dramatically,
because the second hardware thread is able to fill in the time
that the first hardware thread is stalled due to \IXh{memory-barrier}{latency}.
@@ -776,7 +776,7 @@ to about half again faster than that of either QSBR or RCU\@.
Still unconvinced?
Then look at the log-log plot in
\cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
- which shows performance for 448 CPUs as a function of the
+ which shows performance for 448~CPUs as a function of the
hash-table size, that is, number of buckets and maximum number
of elements.
A hash-table of size 1,024 has 1,024~buckets and contains
@@ -785,14 +785,13 @@ to about half again faster than that of either QSBR or RCU\@.
Because this is a read-only benchmark, the actual occupancy is
always equal to the average occupancy.

- This figure shows near-ideal performance below about 8,000
- elements, that is, when the hash table comprises less than
- 1\,MB of data.
+ This figure shows near-ideal performance below about 8,000~elements,
+ that is, when the hash table comprises less than 1\,MB of data.
This near-ideal performance is consistent with that for the
pre-BSD routing table shown in
\cref{fig:defer:Pre-BSD Routing Table Protected by RCU}
on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU},
- even at 448 CPUs.
+ even at 448~CPUs.
However, the performance drops significantly (this is a log-log
plot) at about 8,000~elements, which is where the 1,048,576-byte
L2 cache overflows.
@@ -835,7 +834,7 @@ data structure represented by the pre-BSD routing table.

\QuickQuiz{
The memory system is a serious bottleneck on this big system.
- Why bother putting 448 CPUs on a system without giving them
+ Why bother putting 448~CPUs on a system without giving them
enough memory bandwidth to do something useful???
}\QuickQuizAnswer{
It would indeed be a bad idea to use this large and expensive
@@ -905,10 +904,10 @@ concurrency control to begin with.
\Cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
therefore shows the effect of updates on readers.
At the extreme left-hand side of this graph, all but one of the CPUs
-are doing lookups, while to the right all 448 CPUs are doing updates.
+are doing lookups, while to the right all 448~CPUs are doing updates.
For all four implementations, the number of lookups per millisecond
decreases as the number of updating CPUs increases, of course reaching
-zero lookups per millisecond when all 448 CPUs are updating.
+zero lookups per millisecond when all 448~CPUs are updating.
Both hazard pointers and RCU do well compared to per-bucket locking
because their readers do not increase update-side lock contention.
RCU does well relative to hazard pointers as the number of updaters
@@ -931,7 +930,7 @@ showed the effect of increasing update rates on lookups,
\cref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
shows the effect of increasing update rates on the updates themselves.
Again, at the left-hand side of the figure all but one of the CPUs are
-doing lookups and at the right-hand side of the figure all 448 CPUs are
+doing lookups and at the right-hand side of the figure all 448~CPUs are
doing updates.
Hazard pointers and RCU start off with a significant advantage because,
unlike bucket locking, readers do not exclude updaters.
--
2.17.1
* Re: [PATCH v2] datastruct: Add missed unbreakable spaces
2022-12-27 16:06 ` SeongJae Park
@ 2022-12-27 18:29 ` Paul E. McKenney
2022-12-27 23:26 ` Akira Yokosawa
0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2022-12-27 18:29 UTC (permalink / raw)
To: SeongJae Park; +Cc: akiyks, perfbook
On Tue, Dec 27, 2022 at 08:06:19AM -0800, SeongJae Park wrote:
> Add missing unbreakable spaces for 'CPUs' and 'elements'.
>
> Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Works for me, thank you!
I have queued this, and if Akira (who tests with a much wider variety
of environments than I do) does not object, then I will push it out.
Thanx, Paul
> ---
> Changes from v1
> - Fix build error by removing unbreakable space from \cref{}
>
> datastruct/datastruct.tex | 23 +++++++++++------------
> 1 file changed, 11 insertions(+), 12 deletions(-)
>
> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> index 99c92d9a..c095b846 100644
> --- a/datastruct/datastruct.tex
> +++ b/datastruct/datastruct.tex
> @@ -664,7 +664,7 @@ shows the same data on a linear scale.
> This drops the global-locking trace into the x-axis, but allows the
> non-ideal performance of RCU and hazard pointers to be more readily
> discerned.
> -Both show a change in slope at 224 CPUs, and this is due to hardware
> +Both show a change in slope at 224~CPUs, and this is due to hardware
> multithreading.
> At 32 and fewer CPUs, each thread has a core to itself.
> In this regime, RCU does better than does hazard pointers because the
> @@ -672,11 +672,11 @@ latter's read-side \IXpl{memory barrier} result in dead time within the core.
> In short, RCU is better able to utilize a core from a single hardware
> thread than is hazard pointers.
>
> -This situation changes above 224 CPUs.
> +This situation changes above 224~CPUs.
> Because RCU is using more than half of each core's resources from a
> single hardware thread, RCU gains relatively little benefit from the
> second hardware thread in each core.
> -The slope of the hazard-pointers trace also decreases at 224 CPUs, but
> +The slope of the hazard-pointers trace also decreases at 224~CPUs, but
> less dramatically,
> because the second hardware thread is able to fill in the time
> that the first hardware thread is stalled due to \IXh{memory-barrier}{latency}.
> @@ -776,7 +776,7 @@ to about half again faster than that of either QSBR or RCU\@.
> Still unconvinced?
> Then look at the log-log plot in
> \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
> - which shows performance for 448 CPUs as a function of the
> + which shows performance for 448~CPUs as a function of the
> hash-table size, that is, number of buckets and maximum number
> of elements.
> A hash-table of size 1,024 has 1,024~buckets and contains
> @@ -785,14 +785,13 @@ to about half again faster than that of either QSBR or RCU\@.
> Because this is a read-only benchmark, the actual occupancy is
> always equal to the average occupancy.
>
> - This figure shows near-ideal performance below about 8,000
> - elements, that is, when the hash table comprises less than
> - 1\,MB of data.
> + This figure shows near-ideal performance below about 8,000~elements,
> + that is, when the hash table comprises less than 1\,MB of data.
> This near-ideal performance is consistent with that for the
> pre-BSD routing table shown in
> \cref{fig:defer:Pre-BSD Routing Table Protected by RCU}
> on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU},
> - even at 448 CPUs.
> + even at 448~CPUs.
> However, the performance drops significantly (this is a log-log
> plot) at about 8,000~elements, which is where the 1,048,576-byte
> L2 cache overflows.
> @@ -835,7 +834,7 @@ data structure represented by the pre-BSD routing table.
>
> \QuickQuiz{
> The memory system is a serious bottleneck on this big system.
> - Why bother putting 448 CPUs on a system without giving them
> + Why bother putting 448~CPUs on a system without giving them
> enough memory bandwidth to do something useful???
> }\QuickQuizAnswer{
> It would indeed be a bad idea to use this large and expensive
> @@ -905,10 +904,10 @@ concurrency control to begin with.
> \Cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
> therefore shows the effect of updates on readers.
> At the extreme left-hand side of this graph, all but one of the CPUs
> -are doing lookups, while to the right all 448 CPUs are doing updates.
> +are doing lookups, while to the right all 448~CPUs are doing updates.
> For all four implementations, the number of lookups per millisecond
> decreases as the number of updating CPUs increases, of course reaching
> -zero lookups per millisecond when all 448 CPUs are updating.
> +zero lookups per millisecond when all 448~CPUs are updating.
> Both hazard pointers and RCU do well compared to per-bucket locking
> because their readers do not increase update-side lock contention.
> RCU does well relative to hazard pointers as the number of updaters
> @@ -931,7 +930,7 @@ showed the effect of increasing update rates on lookups,
> \cref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
> shows the effect of increasing update rates on the updates themselves.
> Again, at the left-hand side of the figure all but one of the CPUs are
> -doing lookups and at the right-hand side of the figure all 448 CPUs are
> +doing lookups and at the right-hand side of the figure all 448~CPUs are
> doing updates.
> Hazard pointers and RCU start off with a significant advantage because,
> unlike bucket locking, readers do not exclude updaters.
> --
> 2.17.1
>
* Re: [PATCH v2] datastruct: Add missed unbreakable spaces
2022-12-27 18:29 ` Paul E. McKenney
@ 2022-12-27 23:26 ` Akira Yokosawa
2022-12-28 0:40 ` Paul E. McKenney
0 siblings, 1 reply; 18+ messages in thread
From: Akira Yokosawa @ 2022-12-27 23:26 UTC (permalink / raw)
To: paulmck, SeongJae Park; +Cc: perfbook, Akira Yokosawa
Hi,
On Date: Tue, 27 Dec 2022 10:29:20 -0800, Paul E. McKenney wrote:
> On Tue, Dec 27, 2022 at 08:06:19AM -0800, SeongJae Park wrote:
>> Add missing unbreakable spaces for 'CPUs' and 'elements'.
>>
>> Signed-off-by: SeongJae Park <sj38.park@gmail.com>
>
> Works for me, thank you!
>
> I have queued this, and if Akira (who tests with a much wider variety
> of environments than I do) does not object, then I will push it out.
>
> Thanx, Paul
>
>> ---
>> Changes from v1
>> - Fix build error by removing unbreakable space from \cref{}
Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
Thanks, Akira
>>
>> datastruct/datastruct.tex | 23 +++++++++++------------
>> 1 file changed, 11 insertions(+), 12 deletions(-)
>>
>> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
>> index 99c92d9a..c095b846 100644
>> --- a/datastruct/datastruct.tex
>> +++ b/datastruct/datastruct.tex
>> @@ -664,7 +664,7 @@ shows the same data on a linear scale.
>> This drops the global-locking trace into the x-axis, but allows the
>> non-ideal performance of RCU and hazard pointers to be more readily
>> discerned.
>> -Both show a change in slope at 224 CPUs, and this is due to hardware
>> +Both show a change in slope at 224~CPUs, and this is due to hardware
>> multithreading.
>> At 32 and fewer CPUs, each thread has a core to itself.
>> In this regime, RCU does better than does hazard pointers because the
>> @@ -672,11 +672,11 @@ latter's read-side \IXpl{memory barrier} result in dead time within the core.
>> In short, RCU is better able to utilize a core from a single hardware
>> thread than is hazard pointers.
>>
>> -This situation changes above 224 CPUs.
>> +This situation changes above 224~CPUs.
>> Because RCU is using more than half of each core's resources from a
>> single hardware thread, RCU gains relatively little benefit from the
>> second hardware thread in each core.
>> -The slope of the hazard-pointers trace also decreases at 224 CPUs, but
>> +The slope of the hazard-pointers trace also decreases at 224~CPUs, but
>> less dramatically,
>> because the second hardware thread is able to fill in the time
>> that the first hardware thread is stalled due to \IXh{memory-barrier}{latency}.
>> @@ -776,7 +776,7 @@ to about half again faster than that of either QSBR or RCU\@.
>> Still unconvinced?
>> Then look at the log-log plot in
>> \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
>> - which shows performance for 448 CPUs as a function of the
>> + which shows performance for 448~CPUs as a function of the
>> hash-table size, that is, number of buckets and maximum number
>> of elements.
>> A hash-table of size 1,024 has 1,024~buckets and contains
>> @@ -785,14 +785,13 @@ to about half again faster than that of either QSBR or RCU\@.
>> Because this is a read-only benchmark, the actual occupancy is
>> always equal to the average occupancy.
>>
>> - This figure shows near-ideal performance below about 8,000
>> - elements, that is, when the hash table comprises less than
>> - 1\,MB of data.
>> + This figure shows near-ideal performance below about 8,000~elements,
>> + that is, when the hash table comprises less than 1\,MB of data.
>> This near-ideal performance is consistent with that for the
>> pre-BSD routing table shown in
>> \cref{fig:defer:Pre-BSD Routing Table Protected by RCU}
>> on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU},
>> - even at 448 CPUs.
>> + even at 448~CPUs.
>> However, the performance drops significantly (this is a log-log
>> plot) at about 8,000~elements, which is where the 1,048,576-byte
>> L2 cache overflows.
>> @@ -835,7 +834,7 @@ data structure represented by the pre-BSD routing table.
>>
>> \QuickQuiz{
>> The memory system is a serious bottleneck on this big system.
>> - Why bother putting 448 CPUs on a system without giving them
>> + Why bother putting 448~CPUs on a system without giving them
>> enough memory bandwidth to do something useful???
>> }\QuickQuizAnswer{
>> It would indeed be a bad idea to use this large and expensive
>> @@ -905,10 +904,10 @@ concurrency control to begin with.
>> \Cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
>> therefore shows the effect of updates on readers.
>> At the extreme left-hand side of this graph, all but one of the CPUs
>> -are doing lookups, while to the right all 448 CPUs are doing updates.
>> +are doing lookups, while to the right all 448~CPUs are doing updates.
>> For all four implementations, the number of lookups per millisecond
>> decreases as the number of updating CPUs increases, of course reaching
>> -zero lookups per millisecond when all 448 CPUs are updating.
>> +zero lookups per millisecond when all 448~CPUs are updating.
>> Both hazard pointers and RCU do well compared to per-bucket locking
>> because their readers do not increase update-side lock contention.
>> RCU does well relative to hazard pointers as the number of updaters
>> @@ -931,7 +930,7 @@ showed the effect of increasing update rates on lookups,
>> \cref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
>> shows the effect of increasing update rates on the updates themselves.
>> Again, at the left-hand side of the figure all but one of the CPUs are
>> -doing lookups and at the right-hand side of the figure all 448 CPUs are
>> +doing lookups and at the right-hand side of the figure all 448~CPUs are
>> doing updates.
>> Hazard pointers and RCU start off with a significant advantage because,
>> unlike bucket locking, readers do not exclude updaters.
>> --
>> 2.17.1
>>
* Re: [PATCH v2] datastruct: Add missed unbreakable spaces
2022-12-27 23:26 ` Akira Yokosawa
@ 2022-12-28 0:40 ` Paul E. McKenney
0 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-12-28 0:40 UTC (permalink / raw)
To: Akira Yokosawa; +Cc: SeongJae Park, perfbook
On Wed, Dec 28, 2022 at 08:26:23AM +0900, Akira Yokosawa wrote:
> Hi,
>
> On Date: Tue, 27 Dec 2022 10:29:20 -0800, Paul E. McKenney wrote:
> > On Tue, Dec 27, 2022 at 08:06:19AM -0800, SeongJae Park wrote:
> >> Add missing unbreakable spaces for 'CPUs' and 'elements'.
> >>
> >> Signed-off-by: SeongJae Park <sj38.park@gmail.com>
> >
> > Works for me, thank you!
> >
> > I have queued this, and if Akira (who tests with a much wider variety
> > of environments than I do) does not object, then I will push it out.
> >
> > Thanx, Paul
> >
> >> ---
> >> Changes from v1
> >> - Fix build error by removing unbreakable space from \cref{}
>
> Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
And pushed, thank you both!
Thanx, Paul
> Thanks, Akira
> >>
> >> datastruct/datastruct.tex | 23 +++++++++++------------
> >> 1 file changed, 11 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> >> index 99c92d9a..c095b846 100644
> >> --- a/datastruct/datastruct.tex
> >> +++ b/datastruct/datastruct.tex
> >> @@ -664,7 +664,7 @@ shows the same data on a linear scale.
> >> This drops the global-locking trace into the x-axis, but allows the
> >> non-ideal performance of RCU and hazard pointers to be more readily
> >> discerned.
> >> -Both show a change in slope at 224 CPUs, and this is due to hardware
> >> +Both show a change in slope at 224~CPUs, and this is due to hardware
> >> multithreading.
> >> At 32 and fewer CPUs, each thread has a core to itself.
> >> In this regime, RCU does better than does hazard pointers because the
> >> @@ -672,11 +672,11 @@ latter's read-side \IXpl{memory barrier} result in dead time within the core.
> >> In short, RCU is better able to utilize a core from a single hardware
> >> thread than is hazard pointers.
> >>
> >> -This situation changes above 224 CPUs.
> >> +This situation changes above 224~CPUs.
> >> Because RCU is using more than half of each core's resources from a
> >> single hardware thread, RCU gains relatively little benefit from the
> >> second hardware thread in each core.
> >> -The slope of the hazard-pointers trace also decreases at 224 CPUs, but
> >> +The slope of the hazard-pointers trace also decreases at 224~CPUs, but
> >> less dramatically,
> >> because the second hardware thread is able to fill in the time
> >> that the first hardware thread is stalled due to \IXh{memory-barrier}{latency}.
> >> @@ -776,7 +776,7 @@ to about half again faster than that of either QSBR or RCU\@.
> >> Still unconvinced?
> >> Then look at the log-log plot in
> >> \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
> >> - which shows performance for 448 CPUs as a function of the
> >> + which shows performance for 448~CPUs as a function of the
> >> hash-table size, that is, number of buckets and maximum number
> >> of elements.
> >> A hash-table of size 1,024 has 1,024~buckets and contains
> >> @@ -785,14 +785,13 @@ to about half again faster than that of either QSBR or RCU\@.
> >> Because this is a read-only benchmark, the actual occupancy is
> >> always equal to the average occupancy.
> >>
> >> - This figure shows near-ideal performance below about 8,000
> >> - elements, that is, when the hash table comprises less than
> >> - 1\,MB of data.
> >> + This figure shows near-ideal performance below about 8,000~elements,
> >> + that is, when the hash table comprises less than 1\,MB of data.
> >> This near-ideal performance is consistent with that for the
> >> pre-BSD routing table shown in
> >> \cref{fig:defer:Pre-BSD Routing Table Protected by RCU}
> >> on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU},
> >> - even at 448 CPUs.
> >> + even at 448~CPUs.
> >> However, the performance drops significantly (this is a log-log
> >> plot) at about 8,000~elements, which is where the 1,048,576-byte
> >> L2 cache overflows.
> >> @@ -835,7 +834,7 @@ data structure represented by the pre-BSD routing table.
> >>
> >> \QuickQuiz{
> >> The memory system is a serious bottleneck on this big system.
> >> - Why bother putting 448 CPUs on a system without giving them
> >> + Why bother putting 448~CPUs on a system without giving them
> >> enough memory bandwidth to do something useful???
> >> }\QuickQuizAnswer{
> >> It would indeed be a bad idea to use this large and expensive
> >> @@ -905,10 +904,10 @@ concurrency control to begin with.
> >> \Cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
> >> therefore shows the effect of updates on readers.
> >> At the extreme left-hand side of this graph, all but one of the CPUs
> >> -are doing lookups, while to the right all 448 CPUs are doing updates.
> >> +are doing lookups, while to the right all 448~CPUs are doing updates.
> >> For all four implementations, the number of lookups per millisecond
> >> decreases as the number of updating CPUs increases, of course reaching
> >> -zero lookups per millisecond when all 448 CPUs are updating.
> >> +zero lookups per millisecond when all 448~CPUs are updating.
> >> Both hazard pointers and RCU do well compared to per-bucket locking
> >> because their readers do not increase update-side lock contention.
> >> RCU does well relative to hazard pointers as the number of updaters
> >> @@ -931,7 +930,7 @@ showed the effect of increasing update rates on lookups,
> >> \cref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
> >> shows the effect of increasing update rates on the updates themselves.
> >> Again, at the left-hand side of the figure all but one of the CPUs are
> >> -doing lookups and at the right-hand side of the figure all 448 CPUs are
> >> +doing lookups and at the right-hand side of the figure all 448~CPUs are
> >> doing updates.
> >> Hazard pointers and RCU start off with a significant advantage because,
> >> unlike bucket locking, readers do not exclude updaters.
> >> --
> >> 2.17.1
> >>
Thread overview: 18+ messages
2022-12-26 18:16 [PATCH 0/4] datastruct: Minor fixes SeongJae Park
2022-12-26 18:16 ` [PATCH 1/4] datastruct: Remove unnecessary space SeongJae Park
2022-12-26 18:16 ` [PATCH 2/4] datastruct: Add missed unbreakable spaces SeongJae Park
2022-12-26 23:41 ` Akira Yokosawa
2022-12-27 0:26 ` Paul E. McKenney
2022-12-27 16:04 ` SeongJae Park
2022-12-27 16:06 ` [PATCH v2] " SeongJae Park
2022-12-27 16:06 ` SeongJae Park
2022-12-27 18:29 ` Paul E. McKenney
2022-12-27 23:26 ` Akira Yokosawa
2022-12-28 0:40 ` Paul E. McKenney
2022-12-26 18:16 ` [PATCH 3/4] datastruct: Enclose NULL with \co{} SeongJae Park
2022-12-26 18:16 ` [PATCH 4/4] datastruct: Put \cref{} content in a single line SeongJae Park
2022-12-26 18:16 ` [PATCH 0/4] datastruct: Minor fixes SeongJae Park
2022-12-26 18:16 ` [PATCH 1/4] datastruct: Remove unnecessary space SeongJae Park
2022-12-26 18:16 ` [PATCH 2/4] datastruct: Add missed unbreakable spaces SeongJae Park
2022-12-26 18:16 ` [PATCH 3/4] datastruct: Enclose NULL with \co{} SeongJae Park
2022-12-26 18:16 ` [PATCH 4/4] datastruct: Put \cref{} content in a single line SeongJae Park