* [PATCH 1/4] datastruct: Remove unnecessary space
2022-12-26 18:16 [PATCH 0/4] datastruct: Minor fixes SeongJae Park
@ 2022-12-26 18:16 ` SeongJae Park
2022-12-26 18:16 ` [PATCH 2/4] datastruct: Add missed unbreakable spaces SeongJae Park
` (7 subsequent siblings)
8 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2022-12-26 18:16 UTC
To: paulmck; +Cc: perfbook, SeongJae Park
From: SeongJae Park <sj38.park@gmail.com>
A sentence in datastruct has unnecessary extra space between words.
Remove it.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
datastruct/datastruct.tex | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index ed404e5a..99c92d9a 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -34,7 +34,7 @@ which improves both performance and scalability.
Because this chapter cannot delve into the details of every concurrent
data structure,
\cref{sec:datastruct:Other Data Structures}
-surveys a few of the important ones.
+surveys a few of the important ones.
Although the best performance and scalability results from design rather
than after-the-fact micro-optimization, micro-optimization is nevertheless
necessary for the absolute best possible performance and scalability,
--
2.17.1
* [PATCH 2/4] datastruct: Add missed unbreakable spaces
2022-12-26 18:16 [PATCH 0/4] datastruct: Minor fixes SeongJae Park
2022-12-26 18:16 ` [PATCH 1/4] datastruct: Remove unnecessary space SeongJae Park
@ 2022-12-26 18:16 ` SeongJae Park
2022-12-26 23:41 ` Akira Yokosawa
2022-12-26 18:16 ` [PATCH 3/4] datastruct: Enclose NULL with \co{} SeongJae Park
` (6 subsequent siblings)
8 siblings, 1 reply; 18+ messages in thread
From: SeongJae Park @ 2022-12-26 18:16 UTC
To: paulmck; +Cc: perfbook, SeongJae Park
From: SeongJae Park <sj38.park@gmail.com>
Add missing unbreakable spaces for 'CPUs' and 'elements'.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
datastruct/datastruct.tex | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index 99c92d9a..40ea6995 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -664,7 +664,7 @@ shows the same data on a linear scale.
This drops the global-locking trace into the x-axis, but allows the
non-ideal performance of RCU and hazard pointers to be more readily
discerned.
-Both show a change in slope at 224 CPUs, and this is due to hardware
+Both show a change in slope at 224~CPUs, and this is due to hardware
multithreading.
At 32 and fewer CPUs, each thread has a core to itself.
In this regime, RCU does better than does hazard pointers because the
@@ -672,11 +672,11 @@ latter's read-side \IXpl{memory barrier} result in dead time within the core.
In short, RCU is better able to utilize a core from a single hardware
thread than is hazard pointers.
-This situation changes above 224 CPUs.
+This situation changes above 224~CPUs.
Because RCU is using more than half of each core's resources from a
single hardware thread, RCU gains relatively little benefit from the
second hardware thread in each core.
-The slope of the hazard-pointers trace also decreases at 224 CPUs, but
+The slope of the hazard-pointers trace also decreases at 224~CPUs, but
less dramatically,
because the second hardware thread is able to fill in the time
that the first hardware thread is stalled due to \IXh{memory-barrier}{latency}.
@@ -775,8 +775,8 @@ to about half again faster than that of either QSBR or RCU\@.
Still unconvinced?
Then look at the log-log plot in
- \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
- which shows performance for 448 CPUs as a function of the
+ \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448~CPUs; Varying Table Size},
+ which shows performance for 448~CPUs as a function of the
hash-table size, that is, number of buckets and maximum number
of elements.
A hash-table of size 1,024 has 1,024~buckets and contains
@@ -785,14 +785,13 @@ to about half again faster than that of either QSBR or RCU\@.
Because this is a read-only benchmark, the actual occupancy is
always equal to the average occupancy.
- This figure shows near-ideal performance below about 8,000
- elements, that is, when the hash table comprises less than
- 1\,MB of data.
+ This figure shows near-ideal performance below about 8,000~elements,
+ that is, when the hash table comprises less than 1\,MB of data.
This near-ideal performance is consistent with that for the
pre-BSD routing table shown in
\cref{fig:defer:Pre-BSD Routing Table Protected by RCU}
on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU},
- even at 448 CPUs.
+ even at 448~CPUs.
However, the performance drops significantly (this is a log-log
plot) at about 8,000~elements, which is where the 1,048,576-byte
L2 cache overflows.
@@ -835,7 +834,7 @@ data structure represented by the pre-BSD routing table.
\QuickQuiz{
The memory system is a serious bottleneck on this big system.
- Why bother putting 448 CPUs on a system without giving them
+ Why bother putting 448~CPUs on a system without giving them
enough memory bandwidth to do something useful???
}\QuickQuizAnswer{
It would indeed be a bad idea to use this large and expensive
@@ -905,10 +904,10 @@ concurrency control to begin with.
\Cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
therefore shows the effect of updates on readers.
At the extreme left-hand side of this graph, all but one of the CPUs
-are doing lookups, while to the right all 448 CPUs are doing updates.
+are doing lookups, while to the right all 448~CPUs are doing updates.
For all four implementations, the number of lookups per millisecond
decreases as the number of updating CPUs increases, of course reaching
-zero lookups per millisecond when all 448 CPUs are updating.
+zero lookups per millisecond when all 448~CPUs are updating.
Both hazard pointers and RCU do well compared to per-bucket locking
because their readers do not increase update-side lock contention.
RCU does well relative to hazard pointers as the number of updaters
@@ -931,7 +930,7 @@ showed the effect of increasing update rates on lookups,
\cref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
shows the effect of increasing update rates on the updates themselves.
Again, at the left-hand side of the figure all but one of the CPUs are
-doing lookups and at the right-hand side of the figure all 448 CPUs are
+doing lookups and at the right-hand side of the figure all 448~CPUs are
doing updates.
Hazard pointers and RCU start off with a significant advantage because,
unlike bucket locking, readers do not exclude updaters.
--
2.17.1
* Re: [PATCH 2/4] datastruct: Add missed unbreakable spaces
2022-12-26 18:16 ` [PATCH 2/4] datastruct: Add missed unbreakable spaces SeongJae Park
@ 2022-12-26 23:41 ` Akira Yokosawa
2022-12-27 0:26 ` Paul E. McKenney
0 siblings, 1 reply; 18+ messages in thread
From: Akira Yokosawa @ 2022-12-26 23:41 UTC
To: SeongJae Park, paulmck; +Cc: perfbook, SeongJae Park, Akira Yokosawa
Hi,
On Mon, 26 Dec 2022 10:16:32 -0800, SeongJae Park wrote:
> From: SeongJae Park <sj38.park@gmail.com>
>
> Add missing unbreakable spaces for 'CPUs' and 'elements'.
>
> Signed-off-by: SeongJae Park <sj38.park@gmail.com>
> ---
> datastruct/datastruct.tex | 25 ++++++++++++-------------
> 1 file changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> index 99c92d9a..40ea6995 100644
> --- a/datastruct/datastruct.tex
> +++ b/datastruct/datastruct.tex
[...]
> @@ -775,8 +775,8 @@ to about half again faster than that of either QSBR or RCU\@.
>
> Still unconvinced?
> Then look at the log-log plot in
> - \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
> - which shows performance for 448 CPUs as a function of the
> + \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448~CPUs; Varying Table Size},
> + which shows performance for 448~CPUs as a function of the
> hash-table size, that is, number of buckets and maximum number
> of elements.
> A hash-table of size 1,024 has 1,024~buckets and contains
This hunk caused an error for me.
-----
l.6047 ...r's Zoo at 448~CPUs; Varying Table Size}
,
?
! Emergency stop.
<to be read again>
\protect
l.6047 ...r's Zoo at 448~CPUs; Varying Table Size}
,
End of file on the terminal!
-----
Please remove the unbreakable space in \cref{}.
Thanks, Akira
[...]
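For reference, a minimal sketch of the distinction behind this failure,
assuming a plain article document with only cleveref loaded (the label name
and caption below are hypothetical, not taken from perfbook).  The tie `~'
is typeset markup, so it belongs in running text and captions, whereas the
argument of \label{} and \cref{} is a lookup key that should contain only
literal characters.  The log above suggests that the key was being expanded
(note the \protect), which is the likely reason the build stopped.

```latex
\documentclass{article}
\usepackage{cleveref}
\begin{document}

\begin{figure}
  % A tie is fine here because the caption is typeset text.
  \caption{Performance at 448~CPUs}
  % The label is a key, so it keeps an ordinary space.
  \label{fig:perf at 448 CPUs}
\end{figure}

% The key passed to \cref must match the label character for character;
% the tie goes only in the surrounding prose.
See \cref{fig:perf at 448 CPUs},
which shows performance for 448~CPUs.

% Not recommended: a tie inside the key, as in the v1 hunk above.
% \cref{fig:perf at 448~CPUs}

\end{document}
```

Even in a setup where the tie did not abort the build, the key would no
longer match the \label{} text, so the reference would be left unresolved.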
* Re: [PATCH 2/4] datastruct: Add missed unbreakable spaces
2022-12-26 23:41 ` Akira Yokosawa
@ 2022-12-27 0:26 ` Paul E. McKenney
2022-12-27 16:04 ` SeongJae Park
2022-12-27 16:06 ` [PATCH v2] " SeongJae Park
0 siblings, 2 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-12-27 0:26 UTC
To: Akira Yokosawa; +Cc: SeongJae Park, perfbook, SeongJae Park
On Tue, Dec 27, 2022 at 08:41:10AM +0900, Akira Yokosawa wrote:
> Hi,
>
> On Mon, 26 Dec 2022 10:16:32 -0800, SeongJae Park wrote:
> > From: SeongJae Park <sj38.park@gmail.com>
> >
> > Add missing unbreakable spaces for 'CPUs' and 'elements'.
> >
> > Signed-off-by: SeongJae Park <sj38.park@gmail.com>
I queued and pushed 1/4, 3/4, and 4/4, thank you!
Please do send an updated version of 2/4.
Thanx, Paul
> > ---
> > datastruct/datastruct.tex | 25 ++++++++++++-------------
> > 1 file changed, 12 insertions(+), 13 deletions(-)
> >
> > diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> > index 99c92d9a..40ea6995 100644
> > --- a/datastruct/datastruct.tex
> > +++ b/datastruct/datastruct.tex
> [...]
> > @@ -775,8 +775,8 @@ to about half again faster than that of either QSBR or RCU\@.
> >
> > Still unconvinced?
> > Then look at the log-log plot in
> > - \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
> > - which shows performance for 448 CPUs as a function of the
> > + \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448~CPUs; Varying Table Size},
> > + which shows performance for 448~CPUs as a function of the
> > hash-table size, that is, number of buckets and maximum number
> > of elements.
> > A hash-table of size 1,024 has 1,024~buckets and contains
>
> This hunk caused an error for me.
>
> -----
> l.6047 ...r's Zoo at 448~CPUs; Varying Table Size}
> ,
> ?
> ! Emergency stop.
> <to be read again>
> \protect
> l.6047 ...r's Zoo at 448~CPUs; Varying Table Size}
> ,
> End of file on the terminal!
> -----
>
> Please remove the unbreakable space in \cref{}.
>
> Thanks, Akira
>
> [...]
>
* Re: [PATCH 2/4] datastruct: Add missed unbreakable spaces
2022-12-27 0:26 ` Paul E. McKenney
@ 2022-12-27 16:04 ` SeongJae Park
2022-12-27 16:06 ` [PATCH v2] " SeongJae Park
1 sibling, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2022-12-27 16:04 UTC
To: Paul E. McKenney; +Cc: Akira Yokosawa, SeongJae Park, perfbook, SeongJae Park
On Mon, 26 Dec 2022 16:26:50 -0800 "Paul E. McKenney" <paulmck@kernel.org> wrote:
> On Tue, Dec 27, 2022 at 08:41:10AM +0900, Akira Yokosawa wrote:
> > Hi,
> >
> > On Mon, 26 Dec 2022 10:16:32 -0800, SeongJae Park wrote:
> > > From: SeongJae Park <sj38.park@gmail.com>
> > >
> > > Add missing unbreakable spaces for 'CPUs' and 'elements'.
> > >
> > > Signed-off-by: SeongJae Park <sj38.park@gmail.com>
>
> I queued and pushed 1/4, 3/4, and 4/4, thank you!
>
> Please do send an updated version of 2/4.
Thank you, and sorry for my mistake. I will send the updated version right
now.
Thanks,
SJ
>
> Thanx, Paul
>
> > > ---
> > > datastruct/datastruct.tex | 25 ++++++++++++-------------
> > > 1 file changed, 12 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> > > index 99c92d9a..40ea6995 100644
> > > --- a/datastruct/datastruct.tex
> > > +++ b/datastruct/datastruct.tex
> > [...]
> > > @@ -775,8 +775,8 @@ to about half again faster than that of either QSBR or RCU\@.
> > >
> > > Still unconvinced?
> > > Then look at the log-log plot in
> > > - \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
> > > - which shows performance for 448 CPUs as a function of the
> > > + \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448~CPUs; Varying Table Size},
> > > + which shows performance for 448~CPUs as a function of the
> > > hash-table size, that is, number of buckets and maximum number
> > > of elements.
> > > A hash-table of size 1,024 has 1,024~buckets and contains
> >
> > This hunk caused an error for me.
> >
> > -----
> > l.6047 ...r's Zoo at 448~CPUs; Varying Table Size}
> > ,
> > ?
> > ! Emergency stop.
> > <to be read again>
> > \protect
> > l.6047 ...r's Zoo at 448~CPUs; Varying Table Size}
> > ,
> > End of file on the terminal!
> > -----
> >
> > Please remove the unbreakable space in \cref{}.
> >
> > Thanks, Akira
> >
> > [...]
* [PATCH v2] datastruct: Add missed unbreakable spaces
2022-12-27 0:26 ` Paul E. McKenney
2022-12-27 16:04 ` SeongJae Park
@ 2022-12-27 16:06 ` SeongJae Park
2022-12-27 16:06 ` SeongJae Park
1 sibling, 1 reply; 18+ messages in thread
From: SeongJae Park @ 2022-12-27 16:06 UTC
To: paulmck; +Cc: akiyks, perfbook, SeongJae Park
Add missing unbreakable spaces for 'CPUs' and 'elements'.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
Changes from v1
- Fix build error by removing unbreakable space from \cref{}
datastruct/datastruct.tex | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index 99c92d9a..c095b846 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -664,7 +664,7 @@ shows the same data on a linear scale.
This drops the global-locking trace into the x-axis, but allows the
non-ideal performance of RCU and hazard pointers to be more readily
discerned.
-Both show a change in slope at 224 CPUs, and this is due to hardware
+Both show a change in slope at 224~CPUs, and this is due to hardware
multithreading.
At 32 and fewer CPUs, each thread has a core to itself.
In this regime, RCU does better than does hazard pointers because the
@@ -672,11 +672,11 @@ latter's read-side \IXpl{memory barrier} result in dead time within the core.
In short, RCU is better able to utilize a core from a single hardware
thread than is hazard pointers.
-This situation changes above 224 CPUs.
+This situation changes above 224~CPUs.
Because RCU is using more than half of each core's resources from a
single hardware thread, RCU gains relatively little benefit from the
second hardware thread in each core.
-The slope of the hazard-pointers trace also decreases at 224 CPUs, but
+The slope of the hazard-pointers trace also decreases at 224~CPUs, but
less dramatically,
because the second hardware thread is able to fill in the time
that the first hardware thread is stalled due to \IXh{memory-barrier}{latency}.
@@ -776,7 +776,7 @@ to about half again faster than that of either QSBR or RCU\@.
Still unconvinced?
Then look at the log-log plot in
\cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
- which shows performance for 448 CPUs as a function of the
+ which shows performance for 448~CPUs as a function of the
hash-table size, that is, number of buckets and maximum number
of elements.
A hash-table of size 1,024 has 1,024~buckets and contains
@@ -785,14 +785,13 @@ to about half again faster than that of either QSBR or RCU\@.
Because this is a read-only benchmark, the actual occupancy is
always equal to the average occupancy.
- This figure shows near-ideal performance below about 8,000
- elements, that is, when the hash table comprises less than
- 1\,MB of data.
+ This figure shows near-ideal performance below about 8,000~elements,
+ that is, when the hash table comprises less than 1\,MB of data.
This near-ideal performance is consistent with that for the
pre-BSD routing table shown in
\cref{fig:defer:Pre-BSD Routing Table Protected by RCU}
on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU},
- even at 448 CPUs.
+ even at 448~CPUs.
However, the performance drops significantly (this is a log-log
plot) at about 8,000~elements, which is where the 1,048,576-byte
L2 cache overflows.
@@ -835,7 +834,7 @@ data structure represented by the pre-BSD routing table.
\QuickQuiz{
The memory system is a serious bottleneck on this big system.
- Why bother putting 448 CPUs on a system without giving them
+ Why bother putting 448~CPUs on a system without giving them
enough memory bandwidth to do something useful???
}\QuickQuizAnswer{
It would indeed be a bad idea to use this large and expensive
@@ -905,10 +904,10 @@ concurrency control to begin with.
\Cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
therefore shows the effect of updates on readers.
At the extreme left-hand side of this graph, all but one of the CPUs
-are doing lookups, while to the right all 448 CPUs are doing updates.
+are doing lookups, while to the right all 448~CPUs are doing updates.
For all four implementations, the number of lookups per millisecond
decreases as the number of updating CPUs increases, of course reaching
-zero lookups per millisecond when all 448 CPUs are updating.
+zero lookups per millisecond when all 448~CPUs are updating.
Both hazard pointers and RCU do well compared to per-bucket locking
because their readers do not increase update-side lock contention.
RCU does well relative to hazard pointers as the number of updaters
@@ -931,7 +930,7 @@ showed the effect of increasing update rates on lookups,
\cref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
shows the effect of increasing update rates on the updates themselves.
Again, at the left-hand side of the figure all but one of the CPUs are
-doing lookups and at the right-hand side of the figure all 448 CPUs are
+doing lookups and at the right-hand side of the figure all 448~CPUs are
doing updates.
Hazard pointers and RCU start off with a significant advantage because,
unlike bucket locking, readers do not exclude updaters.
--
2.17.1
* Re: [PATCH v2] datastruct: Add missed unbreakable spaces
2022-12-27 16:06 ` SeongJae Park
@ 2022-12-27 18:29 ` Paul E. McKenney
2022-12-27 23:26 ` Akira Yokosawa
0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2022-12-27 18:29 UTC
To: SeongJae Park; +Cc: akiyks, perfbook
On Tue, Dec 27, 2022 at 08:06:19AM -0800, SeongJae Park wrote:
> Add missing unbreakable spaces for 'CPUs' and 'elements'.
>
> Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Works for me, thank you!
I have queued this, and if Akira (who tests with a much wider variety
of environments than I do) does not object, then I will push it out.
Thanx, Paul
> ---
> Changes from v1
> - Fix build error by removing unbreakable space from \cref{}
>
> datastruct/datastruct.tex | 23 +++++++++++------------
> 1 file changed, 11 insertions(+), 12 deletions(-)
>
> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> index 99c92d9a..c095b846 100644
> --- a/datastruct/datastruct.tex
> +++ b/datastruct/datastruct.tex
> @@ -664,7 +664,7 @@ shows the same data on a linear scale.
> This drops the global-locking trace into the x-axis, but allows the
> non-ideal performance of RCU and hazard pointers to be more readily
> discerned.
> -Both show a change in slope at 224 CPUs, and this is due to hardware
> +Both show a change in slope at 224~CPUs, and this is due to hardware
> multithreading.
> At 32 and fewer CPUs, each thread has a core to itself.
> In this regime, RCU does better than does hazard pointers because the
> @@ -672,11 +672,11 @@ latter's read-side \IXpl{memory barrier} result in dead time within the core.
> In short, RCU is better able to utilize a core from a single hardware
> thread than is hazard pointers.
>
> -This situation changes above 224 CPUs.
> +This situation changes above 224~CPUs.
> Because RCU is using more than half of each core's resources from a
> single hardware thread, RCU gains relatively little benefit from the
> second hardware thread in each core.
> -The slope of the hazard-pointers trace also decreases at 224 CPUs, but
> +The slope of the hazard-pointers trace also decreases at 224~CPUs, but
> less dramatically,
> because the second hardware thread is able to fill in the time
> that the first hardware thread is stalled due to \IXh{memory-barrier}{latency}.
> @@ -776,7 +776,7 @@ to about half again faster than that of either QSBR or RCU\@.
> Still unconvinced?
> Then look at the log-log plot in
> \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
> - which shows performance for 448 CPUs as a function of the
> + which shows performance for 448~CPUs as a function of the
> hash-table size, that is, number of buckets and maximum number
> of elements.
> A hash-table of size 1,024 has 1,024~buckets and contains
> @@ -785,14 +785,13 @@ to about half again faster than that of either QSBR or RCU\@.
> Because this is a read-only benchmark, the actual occupancy is
> always equal to the average occupancy.
>
> - This figure shows near-ideal performance below about 8,000
> - elements, that is, when the hash table comprises less than
> - 1\,MB of data.
> + This figure shows near-ideal performance below about 8,000~elements,
> + that is, when the hash table comprises less than 1\,MB of data.
> This near-ideal performance is consistent with that for the
> pre-BSD routing table shown in
> \cref{fig:defer:Pre-BSD Routing Table Protected by RCU}
> on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU},
> - even at 448 CPUs.
> + even at 448~CPUs.
> However, the performance drops significantly (this is a log-log
> plot) at about 8,000~elements, which is where the 1,048,576-byte
> L2 cache overflows.
> @@ -835,7 +834,7 @@ data structure represented by the pre-BSD routing table.
>
> \QuickQuiz{
> The memory system is a serious bottleneck on this big system.
> - Why bother putting 448 CPUs on a system without giving them
> + Why bother putting 448~CPUs on a system without giving them
> enough memory bandwidth to do something useful???
> }\QuickQuizAnswer{
> It would indeed be a bad idea to use this large and expensive
> @@ -905,10 +904,10 @@ concurrency control to begin with.
> \Cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
> therefore shows the effect of updates on readers.
> At the extreme left-hand side of this graph, all but one of the CPUs
> -are doing lookups, while to the right all 448 CPUs are doing updates.
> +are doing lookups, while to the right all 448~CPUs are doing updates.
> For all four implementations, the number of lookups per millisecond
> decreases as the number of updating CPUs increases, of course reaching
> -zero lookups per millisecond when all 448 CPUs are updating.
> +zero lookups per millisecond when all 448~CPUs are updating.
> Both hazard pointers and RCU do well compared to per-bucket locking
> because their readers do not increase update-side lock contention.
> RCU does well relative to hazard pointers as the number of updaters
> @@ -931,7 +930,7 @@ showed the effect of increasing update rates on lookups,
> \cref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
> shows the effect of increasing update rates on the updates themselves.
> Again, at the left-hand side of the figure all but one of the CPUs are
> -doing lookups and at the right-hand side of the figure all 448 CPUs are
> +doing lookups and at the right-hand side of the figure all 448~CPUs are
> doing updates.
> Hazard pointers and RCU start off with a significant advantage because,
> unlike bucket locking, readers do not exclude updaters.
> --
> 2.17.1
>
* Re: [PATCH v2] datastruct: Add missed unbreakable spaces
2022-12-27 18:29 ` Paul E. McKenney
@ 2022-12-27 23:26 ` Akira Yokosawa
2022-12-28 0:40 ` Paul E. McKenney
0 siblings, 1 reply; 18+ messages in thread
From: Akira Yokosawa @ 2022-12-27 23:26 UTC
To: paulmck, SeongJae Park; +Cc: perfbook, Akira Yokosawa
Hi,
On Tue, 27 Dec 2022 10:29:20 -0800, Paul E. McKenney wrote:
> On Tue, Dec 27, 2022 at 08:06:19AM -0800, SeongJae Park wrote:
>> Add missing unbreakable spaces for 'CPUs' and 'elements'.
>>
>> Signed-off-by: SeongJae Park <sj38.park@gmail.com>
>
> Works for me, thank you!
>
> I have queued this, and if Akira (who tests with a much wider variety
> of environments than I do) does not object, then I will push it out.
>
> Thanx, Paul
>
>> ---
>> Changes from v1
>> - Fix build error by removing unbreakable space from \cref{}
Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
Thanks, Akira
>>
>> datastruct/datastruct.tex | 23 +++++++++++------------
>> 1 file changed, 11 insertions(+), 12 deletions(-)
>>
>> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
>> index 99c92d9a..c095b846 100644
>> --- a/datastruct/datastruct.tex
>> +++ b/datastruct/datastruct.tex
>> @@ -664,7 +664,7 @@ shows the same data on a linear scale.
>> This drops the global-locking trace into the x-axis, but allows the
>> non-ideal performance of RCU and hazard pointers to be more readily
>> discerned.
>> -Both show a change in slope at 224 CPUs, and this is due to hardware
>> +Both show a change in slope at 224~CPUs, and this is due to hardware
>> multithreading.
>> At 32 and fewer CPUs, each thread has a core to itself.
>> In this regime, RCU does better than does hazard pointers because the
>> @@ -672,11 +672,11 @@ latter's read-side \IXpl{memory barrier} result in dead time within the core.
>> In short, RCU is better able to utilize a core from a single hardware
>> thread than is hazard pointers.
>>
>> -This situation changes above 224 CPUs.
>> +This situation changes above 224~CPUs.
>> Because RCU is using more than half of each core's resources from a
>> single hardware thread, RCU gains relatively little benefit from the
>> second hardware thread in each core.
>> -The slope of the hazard-pointers trace also decreases at 224 CPUs, but
>> +The slope of the hazard-pointers trace also decreases at 224~CPUs, but
>> less dramatically,
>> because the second hardware thread is able to fill in the time
>> that the first hardware thread is stalled due to \IXh{memory-barrier}{latency}.
>> @@ -776,7 +776,7 @@ to about half again faster than that of either QSBR or RCU\@.
>> Still unconvinced?
>> Then look at the log-log plot in
>> \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
>> - which shows performance for 448 CPUs as a function of the
>> + which shows performance for 448~CPUs as a function of the
>> hash-table size, that is, number of buckets and maximum number
>> of elements.
>> A hash-table of size 1,024 has 1,024~buckets and contains
>> @@ -785,14 +785,13 @@ to about half again faster than that of either QSBR or RCU\@.
>> Because this is a read-only benchmark, the actual occupancy is
>> always equal to the average occupancy.
>>
>> - This figure shows near-ideal performance below about 8,000
>> - elements, that is, when the hash table comprises less than
>> - 1\,MB of data.
>> + This figure shows near-ideal performance below about 8,000~elements,
>> + that is, when the hash table comprises less than 1\,MB of data.
>> This near-ideal performance is consistent with that for the
>> pre-BSD routing table shown in
>> \cref{fig:defer:Pre-BSD Routing Table Protected by RCU}
>> on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU},
>> - even at 448 CPUs.
>> + even at 448~CPUs.
>> However, the performance drops significantly (this is a log-log
>> plot) at about 8,000~elements, which is where the 1,048,576-byte
>> L2 cache overflows.
>> @@ -835,7 +834,7 @@ data structure represented by the pre-BSD routing table.
>>
>> \QuickQuiz{
>> The memory system is a serious bottleneck on this big system.
>> - Why bother putting 448 CPUs on a system without giving them
>> + Why bother putting 448~CPUs on a system without giving them
>> enough memory bandwidth to do something useful???
>> }\QuickQuizAnswer{
>> It would indeed be a bad idea to use this large and expensive
>> @@ -905,10 +904,10 @@ concurrency control to begin with.
>> \Cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
>> therefore shows the effect of updates on readers.
>> At the extreme left-hand side of this graph, all but one of the CPUs
>> -are doing lookups, while to the right all 448 CPUs are doing updates.
>> +are doing lookups, while to the right all 448~CPUs are doing updates.
>> For all four implementations, the number of lookups per millisecond
>> decreases as the number of updating CPUs increases, of course reaching
>> -zero lookups per millisecond when all 448 CPUs are updating.
>> +zero lookups per millisecond when all 448~CPUs are updating.
>> Both hazard pointers and RCU do well compared to per-bucket locking
>> because their readers do not increase update-side lock contention.
>> RCU does well relative to hazard pointers as the number of updaters
>> @@ -931,7 +930,7 @@ showed the effect of increasing update rates on lookups,
>> \cref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
>> shows the effect of increasing update rates on the updates themselves.
>> Again, at the left-hand side of the figure all but one of the CPUs are
>> -doing lookups and at the right-hand side of the figure all 448 CPUs are
>> +doing lookups and at the right-hand side of the figure all 448~CPUs are
>> doing updates.
>> Hazard pointers and RCU start off with a significant advantage because,
>> unlike bucket locking, readers do not exclude updaters.
>> --
>> 2.17.1
>>
* Re: [PATCH v2] datastruct: Add missed unbreakable spaces
2022-12-27 23:26 ` Akira Yokosawa
@ 2022-12-28 0:40 ` Paul E. McKenney
0 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-12-28 0:40 UTC (permalink / raw)
To: Akira Yokosawa; +Cc: SeongJae Park, perfbook
On Wed, Dec 28, 2022 at 08:26:23AM +0900, Akira Yokosawa wrote:
> Hi,
>
> On Date: Tue, 27 Dec 2022 10:29:20 -0800, Paul E. McKenney wrote:
> > On Tue, Dec 27, 2022 at 08:06:19AM -0800, SeongJae Park wrote:
> >> Add missing unbreakable spaces for 'CPUs' and 'elements'.
> >>
> >> Signed-off-by: SeongJae Park <sj38.park@gmail.com>
> >
> > Works for me, thank you!
> >
> > I have queued this, and if Akira (who tests with a much wider variety
> > of environments than I do) does not object, then I will push it out.
> >
> > Thanx, Paul
> >
> >> ---
> >> Changes from v1
> >> - Fix build error by removing unbreakable space from \cref{}
>
> Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
And pushed, thank you both!
Thanx, Paul
> Thanks, Akira
> >>
> >> datastruct/datastruct.tex | 23 +++++++++++------------
> >> 1 file changed, 11 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> >> index 99c92d9a..c095b846 100644
> >> --- a/datastruct/datastruct.tex
> >> +++ b/datastruct/datastruct.tex
> >> @@ -664,7 +664,7 @@ shows the same data on a linear scale.
> >> This drops the global-locking trace into the x-axis, but allows the
> >> non-ideal performance of RCU and hazard pointers to be more readily
> >> discerned.
> >> -Both show a change in slope at 224 CPUs, and this is due to hardware
> >> +Both show a change in slope at 224~CPUs, and this is due to hardware
> >> multithreading.
> >> At 32 and fewer CPUs, each thread has a core to itself.
> >> In this regime, RCU does better than does hazard pointers because the
> >> @@ -672,11 +672,11 @@ latter's read-side \IXpl{memory barrier} result in dead time within the core.
> >> In short, RCU is better able to utilize a core from a single hardware
> >> thread than is hazard pointers.
> >>
> >> -This situation changes above 224 CPUs.
> >> +This situation changes above 224~CPUs.
> >> Because RCU is using more than half of each core's resources from a
> >> single hardware thread, RCU gains relatively little benefit from the
> >> second hardware thread in each core.
> >> -The slope of the hazard-pointers trace also decreases at 224 CPUs, but
> >> +The slope of the hazard-pointers trace also decreases at 224~CPUs, but
> >> less dramatically,
> >> because the second hardware thread is able to fill in the time
> >> that the first hardware thread is stalled due to \IXh{memory-barrier}{latency}.
> >> @@ -776,7 +776,7 @@ to about half again faster than that of either QSBR or RCU\@.
> >> Still unconvinced?
> >> Then look at the log-log plot in
> >> \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
> >> - which shows performance for 448 CPUs as a function of the
> >> + which shows performance for 448~CPUs as a function of the
> >> hash-table size, that is, number of buckets and maximum number
> >> of elements.
> >> A hash-table of size 1,024 has 1,024~buckets and contains
> >> @@ -785,14 +785,13 @@ to about half again faster than that of either QSBR or RCU\@.
> >> Because this is a read-only benchmark, the actual occupancy is
> >> always equal to the average occupancy.
> >>
> >> - This figure shows near-ideal performance below about 8,000
> >> - elements, that is, when the hash table comprises less than
> >> - 1\,MB of data.
> >> + This figure shows near-ideal performance below about 8,000~elements,
> >> + that is, when the hash table comprises less than 1\,MB of data.
> >> This near-ideal performance is consistent with that for the
> >> pre-BSD routing table shown in
> >> \cref{fig:defer:Pre-BSD Routing Table Protected by RCU}
> >> on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU},
> >> - even at 448 CPUs.
> >> + even at 448~CPUs.
> >> However, the performance drops significantly (this is a log-log
> >> plot) at about 8,000~elements, which is where the 1,048,576-byte
> >> L2 cache overflows.
> >> @@ -835,7 +834,7 @@ data structure represented by the pre-BSD routing table.
> >>
> >> \QuickQuiz{
> >> The memory system is a serious bottleneck on this big system.
> >> - Why bother putting 448 CPUs on a system without giving them
> >> + Why bother putting 448~CPUs on a system without giving them
> >> enough memory bandwidth to do something useful???
> >> }\QuickQuizAnswer{
> >> It would indeed be a bad idea to use this large and expensive
> >> @@ -905,10 +904,10 @@ concurrency control to begin with.
> >> \Cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
> >> therefore shows the effect of updates on readers.
> >> At the extreme left-hand side of this graph, all but one of the CPUs
> >> -are doing lookups, while to the right all 448 CPUs are doing updates.
> >> +are doing lookups, while to the right all 448~CPUs are doing updates.
> >> For all four implementations, the number of lookups per millisecond
> >> decreases as the number of updating CPUs increases, of course reaching
> >> -zero lookups per millisecond when all 448 CPUs are updating.
> >> +zero lookups per millisecond when all 448~CPUs are updating.
> >> Both hazard pointers and RCU do well compared to per-bucket locking
> >> because their readers do not increase update-side lock contention.
> >> RCU does well relative to hazard pointers as the number of updaters
> >> @@ -931,7 +930,7 @@ showed the effect of increasing update rates on lookups,
> >> \cref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
> >> shows the effect of increasing update rates on the updates themselves.
> >> Again, at the left-hand side of the figure all but one of the CPUs are
> >> -doing lookups and at the right-hand side of the figure all 448 CPUs are
> >> +doing lookups and at the right-hand side of the figure all 448~CPUs are
> >> doing updates.
> >> Hazard pointers and RCU start off with a significant advantage because,
> >> unlike bucket locking, readers do not exclude updaters.
> >> --
> >> 2.17.1
> >>
* [PATCH 3/4] datastruct: Enclose NULL with \co{}
2022-12-26 18:16 [PATCH 0/4] datastruct: Minor fixes SeongJae Park
2022-12-26 18:16 ` [PATCH 1/4] datastruct: Remove unnecessary space SeongJae Park
2022-12-26 18:16 ` [PATCH 2/4] datastruct: Add missed unbreakable spaces SeongJae Park
@ 2022-12-26 18:16 ` SeongJae Park
2022-12-26 18:16 ` [PATCH 4/4] datastruct: Put \cref{} content in a single line SeongJae Park
` (5 subsequent siblings)
8 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2022-12-26 18:16 UTC (permalink / raw)
To: paulmck; +Cc: perfbook, SeongJae Park
From: SeongJae Park <sj38.park@gmail.com>
Every 'NULL' in datastruct is enclosed with \co{} except one. Remove the
inconsistent exception.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
datastruct/datastruct.tex | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index 40ea6995..9dffaf37 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -1532,7 +1532,7 @@ and \clnref{acq_oldcur} acquires that bucket's spinlock.
\co{hashtab_add()} and \co{hashtab_del()}?
In other words, what prevents \co{hashtab_add()}
and \co{hashtab_del()} from dereferencing
- a NULL pointer loaded from \co{->ht_new}?
+ a \co{NULL} pointer loaded from \co{->ht_new}?
\end{fcvref}
}\QuickQuizAnswer{
\begin{fcvref}[ln:datastruct:hash_resize:resize]
--
2.17.1
* [PATCH 4/4] datastruct: Put \cref{} content in a single line
2022-12-26 18:16 [PATCH 0/4] datastruct: Minor fixes SeongJae Park
` (2 preceding siblings ...)
2022-12-26 18:16 ` [PATCH 3/4] datastruct: Enclose NULL with \co{} SeongJae Park
@ 2022-12-26 18:16 ` SeongJae Park
2022-12-26 18:16 ` [PATCH 0/4] datastruct: Minor fixes SeongJae Park
` (4 subsequent siblings)
8 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2022-12-26 18:16 UTC (permalink / raw)
To: paulmck; +Cc: perfbook, SeongJae Park
From: SeongJae Park <sj38.park@gmail.com>
The content of almost every \cref{} is on a single line, which helps
grep-like scripting. However, two \cref{}s are broken across two lines.
Put each on a single line for consistency and easier grepping.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
datastruct/datastruct.tex | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index 9dffaf37..4c7f9fe2 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -1551,8 +1551,7 @@ and \clnref{acq_oldcur} acquires that bucket's spinlock.
\co{hashtab_del()} functions must be enclosed
in RCU read-side critical sections, courtesy of
\co{hashtab_lock_mod()} and \co{hashtab_unlock_mod()} in
- \cref{lst:datastruct:Resizable Hash-Table Update-Side Concurrency
- Control}.
+ \cref{lst:datastruct:Resizable Hash-Table Update-Side Concurrency Control}.
\end{fcvref}
}\QuickQuizEnd
@@ -1584,8 +1583,7 @@ the old hash table, and finally \clnref{ret_success} returns success.
\begin{fcvref}[ln:datastruct:hash_resize:lock_unlock_mod]
Together with the \co{READ_ONCE()}
on \clnref{l:ifresized} in \co{hashtab_lock_mod()}
- of \cref{lst:datastruct:Resizable Hash-Table Update-Side
- Concurrency Control},
+ of \cref{lst:datastruct:Resizable Hash-Table Update-Side Concurrency Control},
it tells the compiler that the non-initialization accesses
to \co{->ht_resize_cur} must remain because reads
from \co{->ht_resize_cur} really can race with writes,
--
2.17.1
* [PATCH 0/4] datastruct: Minor fixes
2022-12-26 18:16 [PATCH 0/4] datastruct: Minor fixes SeongJae Park
` (3 preceding siblings ...)
2022-12-26 18:16 ` [PATCH 4/4] datastruct: Put \cref{} content in a single line SeongJae Park
@ 2022-12-26 18:16 ` SeongJae Park
2022-12-26 18:16 ` [PATCH 1/4] datastruct: Remove unnecessary space SeongJae Park
` (3 subsequent siblings)
8 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2022-12-26 18:16 UTC (permalink / raw)
To: paulmck; +Cc: perfbook, SeongJae Park
From: SeongJae Park <sj38.park@gmail.com>
Hi Paul,
Hope you're having great holidays.
This patchset contains minor fixes for datastruct/ that were found during
its Korean translation[1].
[1] https://github.com/sjp38/perfbook-ko_KR
Thanks,
SJ
SeongJae Park (4):
datastruct: Remove unnecessary space
datastruct: Add missed unbreakable spaces
datastruct: Enclose NULL with \co{}
datastruct: Put \cref{} content in a single line
datastruct/datastruct.tex | 35 ++++++++++++++++-------------------
1 file changed, 16 insertions(+), 19 deletions(-)
--
2.17.1
* [PATCH 1/4] datastruct: Remove unnecessary space
2022-12-26 18:16 [PATCH 0/4] datastruct: Minor fixes SeongJae Park
` (4 preceding siblings ...)
2022-12-26 18:16 ` [PATCH 0/4] datastruct: Minor fixes SeongJae Park
@ 2022-12-26 18:16 ` SeongJae Park
2022-12-26 18:16 ` [PATCH 2/4] datastruct: Add missed unbreakable spaces SeongJae Park
` (2 subsequent siblings)
8 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2022-12-26 18:16 UTC (permalink / raw)
To: paulmck; +Cc: perfbook, SeongJae Park
From: SeongJae Park <sj38.park@gmail.com>
A sentence in datastruct has an unnecessary extra space between words.
Remove it.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
datastruct/datastruct.tex | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index ed404e5a..99c92d9a 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -34,7 +34,7 @@ which improves both performance and scalability.
Because this chapter cannot delve into the details of every concurrent
data structure,
\cref{sec:datastruct:Other Data Structures}
-surveys a few of the important ones.
+surveys a few of the important ones.
Although the best performance and scalability results from design rather
than after-the-fact micro-optimization, micro-optimization is nevertheless
necessary for the absolute best possible performance and scalability,
--
2.17.1
* [PATCH 2/4] datastruct: Add missed unbreakable spaces
2022-12-26 18:16 [PATCH 0/4] datastruct: Minor fixes SeongJae Park
` (5 preceding siblings ...)
2022-12-26 18:16 ` [PATCH 1/4] datastruct: Remove unnecessary space SeongJae Park
@ 2022-12-26 18:16 ` SeongJae Park
2022-12-26 18:16 ` [PATCH 3/4] datastruct: Enclose NULL with \co{} SeongJae Park
2022-12-26 18:16 ` [PATCH 4/4] datastruct: Put \cref{} content in a single line SeongJae Park
8 siblings, 0 replies; 18+ messages in thread
From: SeongJae Park @ 2022-12-26 18:16 UTC (permalink / raw)
To: paulmck; +Cc: perfbook, SeongJae Park
From: SeongJae Park <sj38.park@gmail.com>
Add missing unbreakable spaces for 'CPUs' and 'elements'.
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
datastruct/datastruct.tex | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index 99c92d9a..40ea6995 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -664,7 +664,7 @@ shows the same data on a linear scale.
This drops the global-locking trace into the x-axis, but allows the
non-ideal performance of RCU and hazard pointers to be more readily
discerned.
-Both show a change in slope at 224 CPUs, and this is due to hardware
+Both show a change in slope at 224~CPUs, and this is due to hardware
multithreading.
At 32 and fewer CPUs, each thread has a core to itself.
In this regime, RCU does better than does hazard pointers because the
@@ -672,11 +672,11 @@ latter's read-side \IXpl{memory barrier} result in dead time within the core.
In short, RCU is better able to utilize a core from a single hardware
thread than is hazard pointers.
-This situation changes above 224 CPUs.
+This situation changes above 224~CPUs.
Because RCU is using more than half of each core's resources from a
single hardware thread, RCU gains relatively little benefit from the
second hardware thread in each core.
-The slope of the hazard-pointers trace also decreases at 224 CPUs, but
+The slope of the hazard-pointers trace also decreases at 224~CPUs, but
less dramatically,
because the second hardware thread is able to fill in the time
that the first hardware thread is stalled due to \IXh{memory-barrier}{latency}.
@@ -775,8 +775,8 @@ to about half again faster than that of either QSBR or RCU\@.
Still unconvinced?
Then look at the log-log plot in
- \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
- which shows performance for 448 CPUs as a function of the
+ \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448~CPUs; Varying Table Size},
+ which shows performance for 448~CPUs as a function of the
hash-table size, that is, number of buckets and maximum number
of elements.
A hash-table of size 1,024 has 1,024~buckets and contains
@@ -785,14 +785,13 @@ to about half again faster than that of either QSBR or RCU\@.
Because this is a read-only benchmark, the actual occupancy is
always equal to the average occupancy.
- This figure shows near-ideal performance below about 8,000
- elements, that is, when the hash table comprises less than
- 1\,MB of data.
+ This figure shows near-ideal performance below about 8,000~elements,
+ that is, when the hash table comprises less than 1\,MB of data.
This near-ideal performance is consistent with that for the
pre-BSD routing table shown in
\cref{fig:defer:Pre-BSD Routing Table Protected by RCU}
on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU},
- even at 448 CPUs.
+ even at 448~CPUs.
However, the performance drops significantly (this is a log-log
plot) at about 8,000~elements, which is where the 1,048,576-byte
L2 cache overflows.
@@ -835,7 +834,7 @@ data structure represented by the pre-BSD routing table.
\QuickQuiz{
The memory system is a serious bottleneck on this big system.
- Why bother putting 448 CPUs on a system without giving them
+ Why bother putting 448~CPUs on a system without giving them
enough memory bandwidth to do something useful???
}\QuickQuizAnswer{
It would indeed be a bad idea to use this large and expensive
@@ -905,10 +904,10 @@ concurrency control to begin with.
\Cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
therefore shows the effect of updates on readers.
At the extreme left-hand side of this graph, all but one of the CPUs
-are doing lookups, while to the right all 448 CPUs are doing updates.
+are doing lookups, while to the right all 448~CPUs are doing updates.
For all four implementations, the number of lookups per millisecond
decreases as the number of updating CPUs increases, of course reaching
-zero lookups per millisecond when all 448 CPUs are updating.
+zero lookups per millisecond when all 448~CPUs are updating.
Both hazard pointers and RCU do well compared to per-bucket locking
because their readers do not increase update-side lock contention.
RCU does well relative to hazard pointers as the number of updaters
@@ -931,7 +930,7 @@ showed the effect of increasing update rates on lookups,
\cref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
shows the effect of increasing update rates on the updates themselves.
Again, at the left-hand side of the figure all but one of the CPUs are
-doing lookups and at the right-hand side of the figure all 448 CPUs are
+doing lookups and at the right-hand side of the figure all 448~CPUs are
doing updates.
Hazard pointers and RCU start off with a significant advantage because,
unlike bucket locking, readers do not exclude updaters.
--
2.17.1
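
Note on the tie (~) usage in this patch: the hunks above add a tie
between a number and the word it counts ("224~CPUs", "8,000~elements") so
that the two cannot be split across a line break. The v2 changelog in this
thread reports that the one tie this version added inside a \cref{} key
caused a build error and was removed. A minimal sketch of the distinction
follows. It assumes the cleveref package, and the section title and label
key are hypothetical, not taken from perfbook. The likely reason for the
failure (an assumption, not stated in the thread) is that the key in
\cref{} must match the existing \label{} exactly and that "~" is an active
character in LaTeX.

```latex
\documentclass{article}
\usepackage{cleveref}  % provides \cref

\begin{document}

\section{Results at 448 CPUs}
% Hypothetical label key: plain spaces only, no tie.
\label{sec:Results at 448 CPUs}

% In running text, the tie keeps the number and its noun together.
The slope changes at 224~CPUs, as discussed in
% The key below is an identifier, not typeset text, so it is copied
% verbatim from the \label and contains no tie.
\cref{sec:Results at 448 CPUs}.

\end{document}
```

In short, ties belong in typeset text, while reference keys are left
exactly as they were declared.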