* [PATCH 00/10] Tweaks to follow guidelines in style guide
From: Akira Yokosawa @ 2017-10-05 15:47 UTC
To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa
From 2890e0069882321553c16aac213d4bb8d0a06fb7 Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Wed, 5 Oct 2017 22:57:46 +0900
Subject: [PATCH 00/10] Tweaks to follow guidelines in style guide
Hi Paul,
This patch set consists of minor tweaks that apply suggestions which
have been in the style guide for a while.
Patches #1 -- #5 are trivial changes.
Patch #6 attempts to improve consistency in denoting the POWER series
of CPUs by defining a macro "\Power{}".
Patch #7 substitutes "GCC" for "gcc". There are a few exceptions, as
mentioned in the commit log.
Patch #8 substitutes "IRQ" for "irq" in the same way. You might like
to skip this one, as I see "irq" more often than "IRQ" in the Linux
documentation.
Patch #9 is somewhat invasive. It switches the Times font to that of
the "newtxtext" and "newtxmath" packages. The reason for the change
is to have access to both upright and slanted glyphs of Greek letters.
Recent versions of these font packages give better-looking results,
especially in math mode. As noted in the commit log, newtxmath in
TeX Live 2013/Debian has a few issues which have been fixed in later
versions. It also switches the font choice for the experimental target "1csf".
Patch #10 updates the style guide to reflect the changes made in this
patch set.
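For illustration, the markup conventions applied by patches #1 -- #4
can be sketched in a minimal standalone document. This is only a
sketch under assumptions: the "article" class and the "shortcuts"
option of "extdash" are assumed here and are not taken from perfbook's
actual preamble, while the \euler definition is the one added to
perfbook.tex in patch #2.

```latex
\documentclass{article}
% Assumption: "shortcuts" enables \=/ ; perfbook may load extdash differently.
\usepackage[shortcuts]{extdash}
% Upright Euler's number, as defined in patch #2.
\DeclareRobustCommand{\euler}{\ensuremath{\mathrm{e}}}
\begin{document}
% Narrow space in front of the percent symbol (patches #1, #3, #5).
A bug with a 10\,\% chance of occurring in a given run.

% Non-breakable hyphen for axis names (patch #4).
Rotate about the Bloch-sphere Z\=/axis.

% Upright e and \ln for the natural logarithm (patch #2).
$F_0 = \euler^{-\lambda}$, so $\lambda = -\ln 0.01 \approx 4.6$.
\end{document}
```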
Thanks, Akira
--
Akira Yokosawa (10):
debugging: Insert narrow space in front of percent symbol
debugging: Use upright font for Euler's number
future/QC: Insert narrow space in front of percent symbol
future/QC: Use non-breakable hyphen for axis names
treewide: Insert narrow space in front of percent symbol
treewide: Use \Power{} macro for POWER CPU family
treewide: Call GNU C compiler as "GCC"
treewide: Use "IRQ" instead of "irq" used as abbreviation
future/QC: Use upright glyph for math constant and descriptive suffix
styleguide: Reflect recent style improvements
FAQ-BUILD.txt | 4 +-
Makefile | 2 +-
SMPdesign/SMPdesign.tex | 2 +-
SMPdesign/beyond.tex | 14 ++---
advsync/advsync.tex | 2 +-
appendix/styleguide/styleguide.tex | 64 ++++++++---------------
appendix/toyrcu/toyrcu.tex | 26 +++++-----
count/count.tex | 28 +++++-----
cpu/hwfreelunch.tex | 4 +-
datastruct/datastruct.tex | 2 +-
debugging/debugging.tex | 104 ++++++++++++++++++-------------------
defer/rcuapi.tex | 6 +--
defer/rcuusage.tex | 4 +-
formal/dyntickrcu.tex | 52 +++++++++----------
formal/formal.tex | 2 +-
formal/spinhint.tex | 2 +-
future/QC.tex | 92 ++++++++++++++++----------------
future/htm.tex | 2 +-
future/tm.tex | 4 +-
intro/intro.tex | 8 +--
memorder/memorder.tex | 32 ++++++------
perfbook.tex | 20 +++++--
rt/rt.tex | 22 ++++----
toolsoftrade/toolsoftrade.tex | 24 ++++-----
24 files changed, 257 insertions(+), 265 deletions(-)
--
2.7.4
* [PATCH 01/10] debugging: Insert narrow space in front of percent symbol
From: Akira Yokosawa @ 2017-10-05 15:48 UTC
To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa
From b67d6df0f0621907e81f419784f6b63b09619e9a Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sat, 30 Sep 2017 16:20:34 +0900
Subject: [PATCH 01/10] debugging: Insert narrow space in front of percent symbol
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
debugging/debugging.tex | 82 ++++++++++++++++++++++++-------------------------
1 file changed, 41 insertions(+), 41 deletions(-)
diff --git a/debugging/debugging.tex b/debugging/debugging.tex
index 0199720..5747656 100644
--- a/debugging/debugging.tex
+++ b/debugging/debugging.tex
@@ -1025,19 +1025,19 @@ We therefore start with discrete tests.
\subsection{Statistics for Discrete Testing}
\label{sec:debugging:Statistics for Discrete Testing}
-Suppose that the bug had a 10\% chance of occurring in
+Suppose that the bug had a 10\,\% chance of occurring in
a given run and that we do five runs.
How do we compute that probability of at least one run failing?
One way is as follows:
\begin{enumerate}
-\item Compute the probability of a given run succeeding, which is 90\%.
+\item Compute the probability of a given run succeeding, which is 90\,\%.
\item Compute the probability of all five runs succeeding, which
- is 0.9 raised to the fifth power, or about 59\%.
+ is 0.9 raised to the fifth power, or about 59\,\%.
\item There are only two possibilities: either all five runs succeed,
or at least one fails.
Therefore, the probability of at least one failure is
- 59\% taken away from 100\%, or 41\%.
+ 59\,\% taken away from 100\,\%, or 41\,\%.
\end{enumerate}
However, many people find it easier to work with a formula than a series
@@ -1060,7 +1060,7 @@ The probability of failure is $1-S_n$, or:
\QuickQuiz{}
Say what???
When I plug the earlier example of five tests each with a
- 10\% failure rate into the formula, I get 59,050\% and that
+ 10\,\% failure rate into the formula, I get 59,050\,\% and that
just doesn't make sense!!!
\QuickQuizAnswer{
You are right, that makes no sense at all.
@@ -1068,27 +1068,27 @@ The probability of failure is $1-S_n$, or:
Remember that a probability is a number between zero and one,
so that you need to divide a percentage by 100 to get a
probability.
- So 10\% is a probability of 0.1, which gets a probability
- of 0.4095, which rounds to 41\%, which quite sensibly
+ So 10\,\% is a probability of 0.1, which gets a probability
+ of 0.4095, which rounds to 41\,\%, which quite sensibly
matches the earlier result.
} \QuickQuizEnd
-So suppose that a given test has been failing 10\% of the time.
-How many times do you have to run the test to be 99\% sure that
+So suppose that a given test has been failing 10\,\% of the time.
+How many times do you have to run the test to be 99\,\% sure that
your supposed fix has actually improved matters?
Another way to ask this question is ``How many times would we need
-to run the test to cause the probability of failure to rise above 99\%?''
+to run the test to cause the probability of failure to rise above 99\,\%?''
After all, if we were to run the test enough times that the probability
-of seeing at least one failure becomes 99\%, if there are no failures,
-there is only 1\% probability of this being due to dumb luck.
+of seeing at least one failure becomes 99\,\%, if there are no failures,
+there is only 1\,\% probability of this being due to dumb luck.
And if we plug $f=0.1$ into
Equation~\ref{eq:debugging:Binomial Failure Rate} and vary $n$,
-we find that 43 runs gives us a 98.92\% chance of at least one test failing
-given the original 10\% per-test failure rate,
-while 44 runs gives us a 99.03\% chance of at least one test failing.
+we find that 43 runs gives us a 98.92\,\% chance of at least one test failing
+given the original 10\,\% per-test failure rate,
+while 44 runs gives us a 99.03\,\% chance of at least one test failing.
So if we run the test on our fix 44 times and see no failures, there
-is a 99\% probability that our fix was actually a real improvement.
+is a 99\,\% probability that our fix was actually a real improvement.
But repeatedly plugging numbers into
Equation~\ref{eq:debugging:Binomial Failure Rate}
@@ -1110,7 +1110,7 @@ Finally the number of tests required is given by:
Plugging $f=0.1$ and $F_n=0.99$ into
Equation~\ref{eq:debugging:Binomial Number of Tests Required}
gives 43.7, meaning that we need 44 consecutive successful test
-runs to be 99\% certain that our fix was a real improvement.
+runs to be 99\,\% certain that our fix was a real improvement.
This matches the number obtained by the previous method, which
is reassuring.
@@ -1135,9 +1135,9 @@ is reassuring.
Figure~\ref{fig:debugging:Number of Tests Required for 99 Percent Confidence Given Failure Rate}
shows a plot of this function.
Not surprisingly, the less frequently each test run fails, the more
-test runs are required to be 99\% confident that the bug has been
+test runs are required to be 99\,\% confident that the bug has been
fixed.
-If the bug caused the test to fail only 1\% of the time, then a
+If the bug caused the test to fail only 1\,\% of the time, then a
mind-boggling 458 test runs are required.
As the failure probability decreases, the number of test runs required
increases, going to infinity as the failure probability goes to zero.
@@ -1145,18 +1145,18 @@ increases, going to infinity as the failure probability goes to zero.
The moral of this story is that when you have found a rarely occurring
bug, your testing job will be much easier if you can come up with
a carefully targeted test with a much higher failure rate.
-For example, if your targeted test raised the failure rate from 1\%
-to 30\%, then the number of runs required for 99\% confidence
+For example, if your targeted test raised the failure rate from 1\,\%
+to 30\,\%, then the number of runs required for 99\,\% confidence
would drop from 458 test runs to a mere thirteen test runs.
-But these thirteen test runs would only give you 99\% confidence that
+But these thirteen test runs would only give you 99\,\% confidence that
your fix had produced ``some improvement''.
-Suppose you instead want to have 99\% confidence that your fix reduced
+Suppose you instead want to have 99\,\% confidence that your fix reduced
the failure rate by an order of magnitude.
How many failure-free test runs are required?
-An order of magnitude improvement from a 30\% failure rate would be
-a 3\% failure rate.
+An order of magnitude improvement from a 30\,\% failure rate would be
+a 3\,\% failure rate.
Plugging these numbers into
Equation~\ref{eq:debugging:Binomial Number of Tests Required} yields:
@@ -1178,14 +1178,14 @@ Section~\ref{sec:debugging:Hunting Heisenbugs}.
But suppose that you have a continuous test that fails about three
times every ten hours, and that you fix the bug that you believe was
causing the failure.
-How long do you have to run this test without failure to be 99\% certain
+How long do you have to run this test without failure to be 99\,\% certain
that you reduced the probability of failure?
Without doing excessive violence to statistics, we could simply
-redefine a one-hour run to be a discrete test that has a 30\%
+redefine a one-hour run to be a discrete test that has a 30\,\%
probability of failure.
Then the results of in the previous section tell us that if the test
-runs for 13 hours without failure, there is a 99\% probability that
+runs for 13 hours without failure, there is a 99\,\% probability that
our fix actually improved the program's reliability.
A dogmatic statistician might not approve of this approach, but the
@@ -1216,10 +1216,10 @@ this book~\cite{McKenney2014ParallelProgramming-e1}.
Let's try reworking the example from
Section~\ref{sec:debugging:Abusing Statistics for Discrete Testing}
using the Poisson distribution.
-Recall that this example involved a test with a 30\% failure rate per
+Recall that this example involved a test with a 30\,\% failure rate per
hour, and that the question was how long the test would need to run
error-free
-on a alleged fix to be 99\% certain that the fix actually reduced the
+on a alleged fix to be 99\,\% certain that the fix actually reduced the
failure rate.
In this case, $\lambda$ is zero, so that
Equation~\ref{eq:debugging:Poisson Probability} reduces to:
@@ -1236,17 +1236,17 @@ to 0.01 and solving for $\lambda$, resulting in:
\end{equation}
Because we get $0.3$ failures per hour, the number of hours required
-is $4.6/0.3 = 14.3$, which is within 10\% of the 13 hours
+is $4.6/0.3 = 14.3$, which is within 10\,\% of the 13 hours
calculated using the method in
Section~\ref{sec:debugging:Abusing Statistics for Discrete Testing}.
-Given that you normally won't know your failure rate to within 10\%,
+Given that you normally won't know your failure rate to within 10\,\%,
this indicates that the method in
Section~\ref{sec:debugging:Abusing Statistics for Discrete Testing}
is a good and sufficient substitute for the Poisson distribution in
a great many situations.
More generally, if we have $n$ failures per unit time, and we want to
-be P\% certain that a fix reduced the failure rate, we can use the
+be P\,\% certain that a fix reduced the failure rate, we can use the
following formula:
\begin{equation}
@@ -1257,7 +1257,7 @@ following formula:
\QuickQuiz{}
Suppose that a bug causes a test failure three times per hour
on average.
- How long must the test run error-free to provide 99.9\%
+ How long must the test run error-free to provide 99.9\,\%
confidence that the fix significantly reduced the probability
of failure?
\QuickQuizAnswer{
@@ -1268,7 +1268,7 @@ following formula:
T = - \frac{1}{3} \log \frac{100 - 99.9}{100} = 2.3
\end{equation}
- If the test runs without failure for 2.3 hours, we can be 99.9\%
+ If the test runs without failure for 2.3 hours, we can be 99.9\,\%
certain that the fix reduced the probability of failure.
} \QuickQuizEnd
@@ -1616,7 +1616,7 @@ delay might be counted as a near miss.\footnote{
For example, a low-probability bug in RCU priority boosting occurred
roughly once every hundred hours of focused rcutorture testing.
Because it would take almost 500 hours of failure-free testing to be
-99\% certain that the bug's probability had been significantly reduced,
+99\,\% certain that the bug's probability had been significantly reduced,
the \co{git bisect} process
to find the failure would be painfully slow---or would require an extremely
large test farm.
@@ -1782,12 +1782,12 @@ much a bug as is incorrectness.
Although I do heartily salute your spirit and aspirations,
you are forgetting that there may be high costs due to delays
in the program's completion.
- For an extreme example, suppose that a 40\% performance shortfall
+ For an extreme example, suppose that a 40\,\% performance shortfall
from a single-threaded application is causing one person to die
each day.
Suppose further that in a day you could hack together a
quick and dirty
- parallel program that ran 50\% faster on an eight-CPU system
+ parallel program that ran 50\,\% faster on an eight-CPU system
than the sequential version, but that an optimal parallel
program would require four months of painstaking design, coding,
debugging, and tuning.
@@ -2265,7 +2265,7 @@ This script takes three optional arguments as follows:
\item [\tco{--relerr}\nf{:}] Relative measurement error. The script assumes
that values that differ by less than this error are for all
intents and purposes equal.
- This defaults to 0.01, which is equivalent to 1\%.
+ This defaults to 0.01, which is equivalent to 1\,\%.
\item [\tco{--trendbreak}\nf{:}] Ratio of inter-element spacing constituting
a break in the trend of the data.
For example, if the average spacing in the data accepted so far
@@ -2322,7 +2322,7 @@ Lines~44-52 then compute and print the statistics for the data set.
\QuickQuizAnswer{
Because mean and standard deviation were not designed to do this job.
To see this, try applying mean and standard deviation to the
- following data set, given a 1\% relative error in measurement:
+ following data set, given a 1\,\% relative error in measurement:
\begin{quote}
49,548.4 49,549.4 49,550.2 49,550.9 49,550.9 49,551.0
@@ -2452,7 +2452,7 @@ about a billion instances throughout the world?
In that case, a bug that would be encountered once every million years
will be encountered almost three times per day across the installed
base.
-A test with a 50\% chance of encountering this bug in a one-hour run
+A test with a 50\,\% chance of encountering this bug in a one-hour run
would need to increase that bug's probability of occurrence by more
than nine orders of magnitude, which poses a severe challenge to
today's testing methodologies.
--
2.7.4
* [PATCH 02/10] debugging: Use upright font for Euler's number
From: Akira Yokosawa @ 2017-10-05 15:49 UTC
To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa
From 69881c2d7792c59d8dfbed3a799186a95ef835fb Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sat, 30 Sep 2017 17:37:45 +0900
Subject: [PATCH 02/10] debugging: Use upright font for Euler's number
Also use \ln for natural logarithm.
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
debugging/debugging.tex | 24 ++++++++++++------------
perfbook.tex | 2 ++
2 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/debugging/debugging.tex b/debugging/debugging.tex
index 5747656..7a5f71d 100644
--- a/debugging/debugging.tex
+++ b/debugging/debugging.tex
@@ -1116,7 +1116,7 @@ is reassuring.
\QuickQuiz{}
In Equation~\ref{eq:debugging:Binomial Number of Tests Required},
- are the logarithms base-10, base-2, or base-$e$?
+ are the logarithms base-10, base-2, or base-$\euler$?
\QuickQuizAnswer{
It does not matter.
You will get the same answer no matter what base of logarithms
@@ -1201,7 +1201,7 @@ The fundamental formula for failure probabilities is the Poisson
distribution:
\begin{equation}
- F_m = \frac{\lambda^m}{m!} e^{-\lambda}
+ F_m = \frac{\lambda^m}{m!} \euler^{-\lambda}
\label{eq:debugging:Poisson Probability}
\end{equation}
@@ -1225,14 +1225,14 @@ In this case, $\lambda$ is zero, so that
Equation~\ref{eq:debugging:Poisson Probability} reduces to:
\begin{equation}
- F_0 = e^{-\lambda}
+ F_0 = \euler^{-\lambda}
\end{equation}
Solving this requires setting $F_0$
to 0.01 and solving for $\lambda$, resulting in:
\begin{equation}
- \lambda = - \log 0.01 = 4.6
+ \lambda = - \ln 0.01 = 4.6
\end{equation}
Because we get $0.3$ failures per hour, the number of hours required
@@ -1246,11 +1246,11 @@ is a good and sufficient substitute for the Poisson distribution in
a great many situations.
More generally, if we have $n$ failures per unit time, and we want to
-be P\,\% certain that a fix reduced the failure rate, we can use the
+be $P$\,\% certain that a fix reduced the failure rate, we can use the
following formula:
\begin{equation}
- T = - \frac{1}{n} \log \frac{100 - P}{100}
+ T = - \frac{1}{n} \ln \frac{100 - P}{100}
\label{eq:debugging:Error-Free Test Duration}
\end{equation}
@@ -1287,14 +1287,14 @@ Equation~\ref{eq:debugging:Poisson Probability} as follows:
\begin{equation}
F_0 + F_1 + \dots + F_{m - 1} + F_m =
- \sum_{i=0}^m \frac{\lambda^i}{i!} e^{-\lambda}
+ \sum_{i=0}^m \frac{\lambda^i}{i!} \euler^{-\lambda}
\end{equation}
This is the Poisson cumulative distribution function, which can be
written more compactly as:
\begin{equation}
- F_{i \le m} = \sum_{i=0}^m \frac{\lambda^i}{i!} e^{-\lambda}
+ F_{i \le m} = \sum_{i=0}^m \frac{\lambda^i}{i!} \euler^{-\lambda}
\label{eq:debugging:Possion CDF}
\end{equation}
@@ -1341,18 +1341,18 @@ that the fix actually had some relationship to the bug.\footnote{
Indeed it should.
And it does.
- To see this, note that $e^{-\lambda}$ does not depend on $i$,
+ To see this, note that $\euler^{-\lambda}$ does not depend on $i$,
which means that it can be pulled out of the summation as follows:
\begin{equation}
- e^{-\lambda} \sum_{i=0}^\infty \frac{\lambda^i}{i!}
+ \euler^{-\lambda} \sum_{i=0}^\infty \frac{\lambda^i}{i!}
\end{equation}
The remaining summation is exactly the Taylor series for
- $e^\lambda$, yielding:
+ $\euler^\lambda$, yielding:
\begin{equation}
- e^{-\lambda} e^\lambda
+ \euler^{-\lambda} \euler^\lambda
\end{equation}
The two exponentials are reciprocals, and therefore cancel,
diff --git a/perfbook.tex b/perfbook.tex
index 84e48eb..da9cfa8 100644
--- a/perfbook.tex
+++ b/perfbook.tex
@@ -137,6 +137,8 @@
\newcommand{\nf}[1]{\textnormal{#1}} % to return to normal font
\newcommand{\qop}[1]{{\sffamily #1}} % QC operator such as H, T, S, etc.
+\DeclareRobustCommand{\euler}{\ensuremath{\mathrm{e}}}
+
\newcommand{\Epigraph}[2]{\epigraphhead[65]{\rmfamily\epigraph{#1}{#2}}}
\input{ushyphex} % Hyphenation exceptions for US English from hyphenex package
--
2.7.4
* [PATCH 03/10] future/QC: Insert narrow space in front of percent symbol
From: Akira Yokosawa @ 2017-10-05 15:51 UTC
To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa
From 85760945ceceff79e0d43156ad428043141c38b0 Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sat, 30 Sep 2017 18:01:14 +0900
Subject: [PATCH 03/10] future/QC: Insert narrow space in front of percent symbol
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
future/QC.tex | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/future/QC.tex b/future/QC.tex
index e5ff74e..437a4d4 100644
--- a/future/QC.tex
+++ b/future/QC.tex
@@ -309,10 +309,10 @@ A qubit is said to:
\item Collapse to a zero ($\ket{0}$) or a one ($\ket{1}$) if measured,
with probability being a function of the relative distance from
$\ket{0}$ and $\ket{1}$, but projected onto the Z-axis.
- Thus, a qubit on the equator of the Bloch sphere has a 50\%
+ Thus, a qubit on the equator of the Bloch sphere has a 50\,\%
probability of being measured as a one or as a zero, while
a qubit on the 45\textdegree-north latitude would have
- a 14\% chance of being measured as one and 86\% chance
+ a 14\,\% chance of being measured as one and 86\,\% chance
of being measured as zero.
This situation naturally causes developers to prefer a line
segment---or a classic-computing bit---over a sphere.
@@ -335,7 +335,7 @@ are as follows:
positive X-axis intersects the Bloch sphere, and rotates $\ket{1}$
to the point at which the negative X-axis intersects the Bloch
sphere.
- Either way, we get a qubit that is 50\% one and 50\% zero.
+ Either way, we get a qubit that is 50\,\% one and 50\,\% zero.
\item[\qop{S}\,:]
Rotate 90\degree{} ($\frac{\pi}{2}$ radians) about the
Bloch-sphere Z-axis, which has no effect on qubits in the
@@ -1260,9 +1260,9 @@ be extremely valuable in reducing costs (and environmental impacts)
of logistics, but current classic heuristics can find near-optimal
solutions for hundreds of cities~\cite{Martin:1992:LMC:2307953.2308141}
and polynomial-time algorithms that are guaranteed to find routes
-that are no more than 40\% longer than optimal for arbitrarily
+that are no more than 40\,\% longer than optimal for arbitrarily
large numbers of cities~\cite{Sebo:2014:STN:2688265.2688281},
-improving on the 50\% bound located a few decades
+improving on the 50\,\% bound located a few decades
earlier~\cite{NicosChristofides1976TSP-FiftyPercent}.
As of 2006 TSP solvers were finding optimal solutions to
85,900-city problems~\cite{DLApplegate2007TSPtextbook}.
--
2.7.4
* [PATCH 04/10] future/QC: Use non-breakable hyphen for axis names
From: Akira Yokosawa @ 2017-10-05 15:52 UTC
To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa
From 827ccdeef99cec23f937b92720992f38b492e502 Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sat, 30 Sep 2017 18:08:03 +0900
Subject: [PATCH 04/10] future/QC: Use non-breakable hyphen for axis names
The shortcut "\=/" is provided by the "extdash" package,
as described in the style guide.
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
future/QC.tex | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/future/QC.tex b/future/QC.tex
index 437a4d4..daf0086 100644
--- a/future/QC.tex
+++ b/future/QC.tex
@@ -308,7 +308,7 @@ A qubit is said to:
Figure~\ref{fig:future:Qubit as Bloch Sphere}.
\item Collapse to a zero ($\ket{0}$) or a one ($\ket{1}$) if measured,
with probability being a function of the relative distance from
- $\ket{0}$ and $\ket{1}$, but projected onto the Z-axis.
+ $\ket{0}$ and $\ket{1}$, but projected onto the Z\=/axis.
Thus, a qubit on the equator of the Bloch sphere has a 50\,\%
probability of being measured as a one or as a zero, while
a qubit on the 45\textdegree-north latitude would have
@@ -332,37 +332,37 @@ are as follows:
Rotate 180\degree{} ($\pi$ radians) about the Bloch-sphere
X-Z axis, that is, about the 45\degree{} line on the
X-Z plane. This rotates $\ket{0}$ to the point at which the
- positive X-axis intersects the Bloch sphere, and rotates $\ket{1}$
- to the point at which the negative X-axis intersects the Bloch
+ positive X\=/axis intersects the Bloch sphere, and rotates $\ket{1}$
+ to the point at which the negative X\=/axis intersects the Bloch
sphere.
Either way, we get a qubit that is 50\,\% one and 50\,\% zero.
\item[\qop{S}\,:]
Rotate 90\degree{} ($\frac{\pi}{2}$ radians) about the
- Bloch-sphere Z-axis, which has no effect on qubits in the
+ Bloch-sphere Z\=/axis, which has no effect on qubits in the
$\ket{0}$ or $\ket{1}$ states.
\item[\qop{S}$^{\bm{\dagger}}$:]
Rotate $-90\degree$ ($-\frac{\pi}{2}$ radians) about the
- Bloch-sphere Z-axis, which has no effect on qubits in the
+ Bloch-sphere Z\=/axis, which has no effect on qubits in the
$\ket{0}$ or $\ket{1}$ states.
This operator is the inverse of \qop{S}.
\item[\qop{T}\,:]
Rotate 45\degree{} ($\frac{\pi}{4}$ radians) about the
- Bloch-sphere Z-axis, which has no effect on qubits in the
+ Bloch-sphere Z\=/axis, which has no effect on qubits in the
$\ket{0}$ or $\ket{1}$ states.
\item[\qop{T}$^{\bm{\dagger}}$:]
Rotate $-45\degree$ ($-\frac{\pi}{4}$ radians) about the
- Bloch-sphere Z-axis, which has no effect on qubits in the
+ Bloch-sphere Z\=/axis, which has no effect on qubits in the
$\ket{0}$ or $\ket{1}$ states.
This operator is the inverse of \qop{T}.
\item[\qop{X}\,:]
Rotate 180\degree{} ($\pi$ radians) about the Bloch-sphere
- X-axis, which takes $\ket{0}$ to $\ket{1}$ and vice versa.
+ X\=/axis, which takes $\ket{0}$ to $\ket{1}$ and vice versa.
\item[\qop{Y}\,:]
Rotate 180\degree{} ($\pi$ radians) about the Bloch-sphere
- Y-axis, which also takes $\ket{0}$ to $\ket{1}$ and vice versa.
+ Y\=/axis, which also takes $\ket{0}$ to $\ket{1}$ and vice versa.
\item[\qop{Z}\,:]
Rotate 180\degree{} ($\pi$ radians) about the Bloch-sphere
- Z-axis, which has no effect on qubits in the $\ket{0}$ or
+ Z\=/axis, which has no effect on qubits in the $\ket{0}$ or
$\ket{1}$ states.
\end{description}
--
2.7.4
* [PATCH 05/10] treewide: Insert narrow space in front of percent symbol
From: Akira Yokosawa @ 2017-10-05 15:53 UTC
To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa
From ffbf7756c160eaa59e8a93c1bdd09c1497dfe449 Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sun, 1 Oct 2017 12:17:43 +0900
Subject: [PATCH 05/10] treewide: Insert narrow space in front of percent symbol
In SMPdesign/beyond.tex, there are two cases where "percent" is
spelled out in compound words.
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
SMPdesign/SMPdesign.tex | 2 +-
SMPdesign/beyond.tex | 14 +++++++-------
advsync/advsync.tex | 2 +-
count/count.tex | 4 ++--
cpu/hwfreelunch.tex | 4 ++--
defer/rcuusage.tex | 4 ++--
formal/dyntickrcu.tex | 2 +-
formal/spinhint.tex | 2 +-
future/htm.tex | 2 +-
future/tm.tex | 4 ++--
intro/intro.tex | 6 +++---
rt/rt.tex | 8 ++++----
12 files changed, 27 insertions(+), 27 deletions(-)
diff --git a/SMPdesign/SMPdesign.tex b/SMPdesign/SMPdesign.tex
index 1936d27..81219cb 100644
--- a/SMPdesign/SMPdesign.tex
+++ b/SMPdesign/SMPdesign.tex
@@ -1186,7 +1186,7 @@ which fortunately is usually quite easy to do in actual
practice~\cite{McKenney01e}, especially given today's large memories.
For example, in most systems, it is quite reasonable to set
\co{TARGET_POOL_SIZE} to 100, in which case allocations and frees
-are guaranteed to be confined to per-thread pools at least 99\% of
+are guaranteed to be confined to per-thread pools at least 99\,\% of
the time.
As can be seen from the figure, the situations where the common-case
diff --git a/SMPdesign/beyond.tex b/SMPdesign/beyond.tex
index 7ba351e..1fb2a6b 100644
--- a/SMPdesign/beyond.tex
+++ b/SMPdesign/beyond.tex
@@ -401,8 +401,8 @@ large algorithmic superlinear speedups.
\end{figure}
Further investigation showed that
-PART sometimes visited fewer than 2\% of the maze's cells,
-while SEQ and PWQ never visited fewer than about 9\%.
+PART sometimes visited fewer than 2\,\% of the maze's cells,
+while SEQ and PWQ never visited fewer than about 9\,\%.
The reason for this difference is shown by
Figure~\ref{fig:SMPdesign:Reason for Small Visit Percentages}.
If the thread traversing the solution from the upper left reaches
@@ -473,11 +473,11 @@ optimizations are quite attractive.
Cache alignment and padding often improves performance by reducing
false sharing.
However, for these maze-solution algorithms, aligning and padding the
-maze-cell array \emph{degrades} performance by up to 42\% for 1000x1000 mazes.
+maze-cell array \emph{degrades} performance by up to 42\,\% for 1000x1000 mazes.
Cache locality is more important than avoiding
false sharing, especially for large mazes.
For smaller 20-by-20 or 50-by-50 mazes, aligning and padding can produce
-up to a 40\% performance improvement for PART,
+up to a 40\,\% performance improvement for PART,
but for these small sizes, SEQ performs better anyway because there
is insufficient time for PART to make up for the overhead of
thread creation and destruction.
@@ -508,7 +508,7 @@ context-switch overhead and visit percentage.
As can be seen in
Figure~\ref{fig:SMPdesign:Partitioned Coroutines},
this coroutine algorithm (COPART) is quite effective, with the performance
-on one thread being within about 30\% of PART on two threads
+on one thread being within about 30\,\% of PART on two threads
(\path{maze_2seq.c}).
\subsection{Performance Comparison II}
@@ -532,7 +532,7 @@ Figures~\ref{fig:SMPdesign:Varying Maze Size vs. SEQ}
and~\ref{fig:SMPdesign:Varying Maze Size vs. COPART}
show the effects of varying maze size, comparing both PWQ and PART
running on two threads
-against either SEQ or COPART, respectively, with 90\%-confidence
+against either SEQ or COPART, respectively, with 90\=/percent\-/confidence
error bars.
PART shows superlinear scalability against SEQ and modest scalability
against COPART for 100-by-100 and larger mazes.
@@ -565,7 +565,7 @@ a thread is connected to both beginning and end).
PWQ performs quite poorly, but
PART hits breakeven at two threads and again at five threads, achieving
modest speedups beyond five threads.
-Theoretical energy efficiency breakeven is within the 90\% confidence
+Theoretical energy efficiency breakeven is within the 90\=/percent\-/confidence
interval for seven and eight threads.
The reasons for the peak at two threads are (1) the lower complexity
of termination detection in the two-thread case and (2) the fact that
diff --git a/advsync/advsync.tex b/advsync/advsync.tex
index 98e6986..adf1dc9 100644
--- a/advsync/advsync.tex
+++ b/advsync/advsync.tex
@@ -85,7 +85,7 @@ basis of real-time programming:
bound.
\item Real-time forward-progress guarantees are sometimes
probabilistic, as in the soft-real-time guarantee that
- ``at least 99.9\% of the time, scheduling latency must
+ ``at least 99.9\,\% of the time, scheduling latency must
be less than 100 microseconds.''
In contrast, NBS's forward-progress
guarantees have traditionally been unconditional.
diff --git a/count/count.tex b/count/count.tex
index f1645ee..73b6866 100644
--- a/count/count.tex
+++ b/count/count.tex
@@ -55,7 +55,7 @@ counting.
whatever ``true value'' might mean in this context.
However, the value read out should maintain roughly the same
absolute error over time.
- For example, a 1\% error might be just fine when the count
+ For example, a 1\,\% error might be just fine when the count
is on the order of a million or so, but might be absolutely
unacceptable once the count reaches a trillion.
See Section~\ref{sec:count:Statistical Counters}.
@@ -204,7 +204,7 @@ On my dual-core laptop, a short run invoked \co{inc_count()}
100,014,000 times, but the final value of the counter was only
52,909,118.
Although approximate values do have their place in computing,
-accuracies far greater than 50\% are almost always necessary.
+accuracies far greater than 50\,\% are almost always necessary.
\QuickQuiz{}
But doesn't the \co{++} operator produce an x86 add-to-memory
diff --git a/cpu/hwfreelunch.tex b/cpu/hwfreelunch.tex
index b449ba2..152f691 100644
--- a/cpu/hwfreelunch.tex
+++ b/cpu/hwfreelunch.tex
@@ -193,13 +193,13 @@ excellent bragging rights, if nothing else!
Although the speed of light would be a hard limit, the fact is that
semiconductor devices are limited by the speed of electricity rather
than that of light, given that electric waves in semiconductor materials
-move at between 3\% and 30\% of the speed of light in a vacuum.
+move at between 3\,\% and 30\,\% of the speed of light in a vacuum.
The use of copper connections on silicon devices is one way to increase
the speed of electricity, and it is quite possible that additional
advances will push closer still to the actual speed of light.
In addition, there have been some experiments with tiny optical fibers
as interconnects within and between chips, based on the fact that
-the speed of light in glass is more than 60\% of the speed of light
+the speed of light in glass is more than 60\,\% of the speed of light
in a vacuum.
One obstacle to such optical fibers is the inefficiency conversion
between electricity and light and vice versa, resulting in both
diff --git a/defer/rcuusage.tex b/defer/rcuusage.tex
index af4faff..74be9fc 100644
--- a/defer/rcuusage.tex
+++ b/defer/rcuusage.tex
@@ -193,7 +193,7 @@ ideal synchronization-free workload, as desired.
each search is taking on average about 13~nanoseconds,
which is short enough for small differences in code
generation to make their presence felt.
- The difference ranges from about 1.5\% to about 11.1\%, which is
+ The difference ranges from about 1.5\,\% to about 11.1\,\%, which is
quite small when you consider that the RCU QSBR code can handle
concurrent updates and the ``ideal'' code cannot.
@@ -775,7 +775,7 @@ again showing data taken on a 16-CPU 3\,GHz Intel x86 system.
Most likely NUMA effects.
However, there is substantial variance in the values measured for the
refcnt line, as can be seen by the error bars.
- In fact, standard deviations range in excess of 10\% of measured
+ In fact, standard deviations range in excess of 10\,\% of measured
values in some cases.
The dip in overhead therefore might well be a statistical aberration.
} \QuickQuizEnd
diff --git a/formal/dyntickrcu.tex b/formal/dyntickrcu.tex
index 80fa3e7..ec3c78c 100644
--- a/formal/dyntickrcu.tex
+++ b/formal/dyntickrcu.tex
@@ -1748,7 +1748,7 @@ states, passing without errors.
\end{quote}
This means that any attempt to optimize the production of code should
- place at least 66\% of its emphasis on optimizing the debugging process,
+ place at least 66\,\% of its emphasis on optimizing the debugging process,
even at the expense of increasing the time and effort spent coding.
Incremental coding and testing is one way to optimize the debugging
process, at the expense of some increase in coding effort.
diff --git a/formal/spinhint.tex b/formal/spinhint.tex
index a40d2c3..27df639 100644
--- a/formal/spinhint.tex
+++ b/formal/spinhint.tex
@@ -416,7 +416,7 @@ Given a source file \path{qrcu.spin}, one can use the following commands:
run \co{top} in one window and \co{./pan} in another. Keep the
focus on the \co{./pan} window so that you can quickly kill
execution if need be. As soon as CPU time drops much below
- 100\%, kill \co{./pan}. If you have removed focus from the
+ 100\,\%, kill \co{./pan}. If you have removed focus from the
window running \co{./pan}, you may wait a long time for the
windowing system to grab enough memory to do anything for
you.
diff --git a/future/htm.tex b/future/htm.tex
index e26ee2a..0c3801d 100644
--- a/future/htm.tex
+++ b/future/htm.tex
@@ -1185,7 +1185,7 @@ by Siakavaras et al.~\cite{Siakavaras2017CombiningHA},
is to use RCU for read-only traversals and HTM
only for the actual updates themselves.
This combination outperformed other transactional-memory techniques by
-up to 220\%, a speedup similar to that observed by
+up to 220\,\%, a speedup similar to that observed by
Howard and Walpole~\cite{PhilHoward2011RCUTMRBTree}
when they combined RCU with STM.
In both cases, the weak atomicity is implemented in software rather than
diff --git a/future/tm.tex b/future/tm.tex
index ec5373d..8420331 100644
--- a/future/tm.tex
+++ b/future/tm.tex
@@ -711,8 +711,8 @@ representing the lock as part of the transaction, and everything works
out perfectly.
In practice, a number of non-obvious complications~\cite{Volos2008TRANSACT}
can arise, depending on implementation details of the TM system.
-These complications can be resolved, but at the cost of a 45\% increase in
-overhead for locks acquired outside of transactions and a 300\% increase
+These complications can be resolved, but at the cost of a 45\,\% increase in
+overhead for locks acquired outside of transactions and a 300\,\% increase
in overhead for locks acquired within transactions.
Although these overheads might be acceptable for transactional
programs containing small amounts of locking, they are often completely
diff --git a/intro/intro.tex b/intro/intro.tex
index ca991bd..8bed518 100644
--- a/intro/intro.tex
+++ b/intro/intro.tex
@@ -414,7 +414,7 @@ To see this, consider that the price of early computers was tens
of millions of dollars at
a time when engineering salaries were but a few thousand dollars a year.
If dedicating a team of ten engineers to such a machine would improve
-its performance, even by only 10\%, then their salaries
+its performance, even by only 10\,\%, then their salaries
would be repaid many times over.
One such machine was the CSIRAC, the oldest still-intact stored-program
@@ -863,11 +863,11 @@ been extremely narrowly focused, and hence unable to demonstrate any
general results.
Furthermore, given that the normal range of programmer productivity
spans more than an order of magnitude, it is unrealistic to expect
-an affordable study to be capable of detecting (say) a 10\% difference
+an affordable study to be capable of detecting (say) a 10\,\% difference
in productivity.
Although the multiple-order-of-magnitude differences that such studies
\emph{can} reliably detect are extremely valuable, the most impressive
-improvements tend to be based on a long series of 10\% improvements.
+improvements tend to be based on a long series of 10\,\% improvements.
We must therefore take a different approach.
diff --git a/rt/rt.tex b/rt/rt.tex
index 2f5d4fe..21e7117 100644
--- a/rt/rt.tex
+++ b/rt/rt.tex
@@ -48,7 +48,7 @@ are clearly required.
We might therefore say that a given soft real-time application must meet
its response-time requirements at least some fraction of the time, for
example, we might say that it must execute in less than 20 microseconds
-99.9\% of the time.
+99.9\,\% of the time.
This of course raises the question of what is to be done when the application
fails to meet its response-time requirements.
@@ -267,7 +267,7 @@ or even avoiding interrupts altogether in favor of polling.
Overloading can also degrade response times due to queueing effects,
so it is not unusual for real-time systems to overprovision CPU bandwidth,
-so that a running system has (say) 80\% idle time.
+so that a running system has (say) 80\,\% idle time.
This approach also applies to storage and networking devices.
In some cases, separate storage and networking hardware might be reserved
for the sole use of high-priority portions of the real-time application.
@@ -351,7 +351,7 @@ on the hardware and software implementing those operations.
For each such operation, these constraints might include a maximum
response time (and possibly also a minimum response time) and a
probability of meeting that response time.
-A probability of 100\% indicates that the corresponding operation
+A probability of 100\,\% indicates that the corresponding operation
must provide hard real-time service.
In some cases, both the response times and the required probabilities of
@@ -1583,7 +1583,7 @@ These constraints include:
latencies are provided only to the highest-priority threads.
\item Sufficient bandwidth to support the workload.
An implementation rule supporting this constraint might be
- ``There will be at least 50\% idle time on all CPUs
+ ``There will be at least 50\,\% idle time on all CPUs
during normal operation,''
or, more formally, ``The offered load will be sufficiently low
to allow the workload to be schedulable at all times.''
--
2.7.4
* [PATCH 06/10] treewide: Use \Power{} macro for POWER CPU family
2017-10-05 15:47 [PATCH 00/10] Tweaks to follow guidelines in style guide Akira Yokosawa
` (4 preceding siblings ...)
2017-10-05 15:53 ` [PATCH 05/10] treewide: Insert narrow space in front of percent symbol Akira Yokosawa
@ 2017-10-05 15:54 ` Akira Yokosawa
2017-10-05 15:55 ` [PATCH 07/10] treewide: Call GNU C compiler as "GCC" Akira Yokosawa
` (4 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Akira Yokosawa @ 2017-10-05 15:54 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa
From c7255fa8b6fc7835c0eb6ab524aed3349cea1dca Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sun, 1 Oct 2017 12:40:18 +0900
Subject: [PATCH 06/10] treewide: Use \Power{} macro for POWER CPU family
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
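Note: a minimal sketch of how the \Power{} macro expands, assuming the
one-argument definition this patch adds to perfbook.tex. The sample
phrases are illustrative only and the standalone article wrapper is an
assumption made so the fragment compiles by itself.

```latex
\documentclass{article}
% Definition as added to perfbook.tex by this patch.
\newcommand{\Power}[1]{POWER#1}
\begin{document}
\Power{5} CPU            % typesets "POWER5 CPU"
\Power{} family          % typesets "POWER family"; {} is the (empty) argument
\Power{}'s cumulativity  % typesets "POWER's cumulativity"
% Pitfall: the argument is mandatory, so without braces the macro
% consumes the next token: "\Power systems" typesets "POWERsystems".
\end{document}
```

Since the braces are always present, the space after the closing brace
is an ordinary interword space and needs no extra protection.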
appendix/toyrcu/toyrcu.tex | 26 +++++++++++++-------------
count/count.tex | 6 +++---
intro/intro.tex | 2 +-
memorder/memorder.tex | 30 +++++++++++++++---------------
perfbook.tex | 1 +
toolsoftrade/toolsoftrade.tex | 4 ++--
6 files changed, 35 insertions(+), 34 deletions(-)
diff --git a/appendix/toyrcu/toyrcu.tex b/appendix/toyrcu/toyrcu.tex
index db45fad..2c65f74 100644
--- a/appendix/toyrcu/toyrcu.tex
+++ b/appendix/toyrcu/toyrcu.tex
@@ -73,7 +73,7 @@ Of course, only one RCU reader may be in its read-side critical section
at a time, which almost entirely defeats the purpose of RCU.
In addition, the lock operations in \co{rcu_read_lock()} and
\co{rcu_read_unlock()} are extremely heavyweight,
-with read-side overhead ranging from about 100~nanoseconds on a single Power5
+with read-side overhead ranging from about 100~nanoseconds on a single \Power{5}
CPU up to more than 17~\emph{microseconds} on a 64-CPU system.
Worse yet,
these same lock operations permit \co{rcu_read_lock()}
@@ -216,7 +216,7 @@ with a single global lock.
Furthermore, the read-side overhead, though high at roughly 140 nanoseconds,
remains at about 140 nanoseconds regardless of the number of CPUs.
However, the update-side overhead ranges from about 600 nanoseconds
-on a single Power5 CPU
+on a single \Power{5} CPU
up to more than 100 \emph{microseconds} on 64 CPUs.
\QuickQuiz{}
@@ -368,7 +368,7 @@ However, this implementations still has some serious shortcomings.
First, the atomic operations in \co{rcu_read_lock()} and
\co{rcu_read_unlock()} are still quite heavyweight,
with read-side overhead ranging from about 100~nanoseconds on
-a single Power5 CPU up to almost 40~\emph{microseconds}
+a single \Power{5} CPU up to almost 40~\emph{microseconds}
on a 64-CPU system.
This means that the RCU read-side critical sections
have to be extremely long in order to get any real
@@ -718,9 +718,9 @@ In fact, they are more complex than those
of the single-counter variant shown in
Figure~\ref{fig:app:toyrcu:RCU Implementation Using Single Global Reference Counter},
with the read-side primitives consuming about 150~nanoseconds on a single
-Power5 CPU and almost 40~\emph{microseconds} on a 64-CPU system.
+\Power{5} CPU and almost 40~\emph{microseconds} on a 64-CPU system.
The update-side \co{synchronize_rcu()} primitive is more costly as
-well, ranging from about 200~nanoseconds on a single Power5 CPU to
+well, ranging from about 200~nanoseconds on a single \Power{5} CPU to
more than 40~\emph{microseconds} on a 64-CPU system.
This means that the RCU read-side critical sections
have to be extremely long in order to get any real
@@ -963,9 +963,9 @@ environments.
That said, the read-side primitives scale very nicely, requiring about
115~nanoseconds regardless of whether running on a single-CPU or a 64-CPU
-Power5 system.
+\Power{5} system.
As noted above, the \co{synchronize_rcu()} primitive does not scale,
-ranging in overhead from almost a microsecond on a single Power5 CPU
+ranging in overhead from almost a microsecond on a single \Power{5} CPU
up to almost 200~microseconds on a 64-CPU system.
This implementation could conceivably form the basis for a
production-quality user-level RCU implementation.
@@ -1340,9 +1340,9 @@ destruction will not be reordered into the preceding loop.
This approach achieves much better read-side performance, incurring
roughly 63~nanoseconds of overhead regardless of the number of
-Power5 CPUs.
+\Power{5} CPUs.
Updates incur more overhead, ranging from about 500~nanoseconds on
-a single Power5 CPU to more than 100~\emph{microseconds} on 64
+a single \Power{5} CPU to more than 100~\emph{microseconds} on 64
such CPUs.
\QuickQuiz{}
@@ -1542,9 +1542,9 @@ This approach achieves read-side performance almost equal to that
shown in
Section~\ref{sec:app:toyrcu:RCU Based on Free-Running Counter}, incurring
roughly 65~nanoseconds of overhead regardless of the number of
-Power5 CPUs.
+\Power{5} CPUs.
Updates again incur more overhead, ranging from about 600~nanoseconds on
-a single Power5 CPU to more than 100~\emph{microseconds} on 64
+a single \Power{5} CPU to more than 100~\emph{microseconds} on 64
such CPUs.
\QuickQuiz{}
@@ -1866,11 +1866,11 @@ This implementation has blazingly fast read-side primitives, with
an \co{rcu_read_lock()}-\co{rcu_read_unlock()} round trip incurring
an overhead of roughly 50~\emph{picoseconds}.
The \co{synchronize_rcu()} overhead ranges from about 600~nanoseconds
-on a single-CPU Power5 system up to more than 100~microseconds on
+on a single-CPU \Power{5} system up to more than 100~microseconds on
a 64-CPU system.
\QuickQuiz{}
- To be sure, the clock frequencies of Power
+ To be sure, the clock frequencies of \Power{}
systems in 2008 were quite high, but even a 5\,GHz clock
frequency is insufficient to allow
loops to be executed in 50~picoseconds!
diff --git a/count/count.tex b/count/count.tex
index 73b6866..a38aba1 100644
--- a/count/count.tex
+++ b/count/count.tex
@@ -3330,7 +3330,7 @@ will expand on these lessons.
\path{count_end_rcu.c} & \ref{sec:together:RCU and Per-Thread-Variable-Based Statistical Counters} &
5.7 ns & 354 ns & 501 ns \\
\end{tabular}
-\caption{Statistical Counter Performance on Power-6}
+\caption{Statistical Counter Performance on \Power{6}}
\label{tab:count:Statistical Counter Performance on Power-6}
\end{table*}
@@ -3410,14 +3410,14 @@ courtesy of eventual consistency.
\path{count_lim_sig.c} & \ref{sec:count:Signal-Theft Limit Counter Implementation} &
Y & 10.2 ns & 370 ns & 54,000 ns \\
\end{tabular}
-\caption{Limit Counter Performance on Power-6}
+\caption{Limit Counter Performance on \Power{6}}
\label{tab:count:Limit Counter Performance on Power-6}
\end{table*}
Figure~\ref{tab:count:Limit Counter Performance on Power-6}
shows the performance of the parallel limit-counting algorithms.
Exact enforcement of the limits incurs a substantial performance
-penalty, although on this 4.7\,GHz Power-6 system that penalty can be reduced
+penalty, although on this 4.7\,GHz \Power{6} system that penalty can be reduced
by substituting signals for atomic operations.
All of these implementations suffer from read-side lock contention
in the face of concurrent readers.
diff --git a/intro/intro.tex b/intro/intro.tex
index 8bed518..293a02f 100644
--- a/intro/intro.tex
+++ b/intro/intro.tex
@@ -77,7 +77,7 @@ that of a bicycle, courtesy of Moore's Law.
Papers calling out the advantages of multicore CPUs were published
as early as 1996~\cite{Olukotun96}.
IBM introduced simultaneous multi-threading
-into its high-end POWER family in 2000, and multicore in 2001.
+into its high-end \Power{} family in 2000, and multicore in 2001.
Intel introduced hyperthreading into its commodity Pentium line in
November 2000, and both AMD and Intel introduced
dual-core CPUs in 2005.
diff --git a/memorder/memorder.tex b/memorder/memorder.tex
index 7dc3fb4..944c17a 100644
--- a/memorder/memorder.tex
+++ b/memorder/memorder.tex
@@ -314,7 +314,7 @@ synchronization primitives (such as locking and RCU)
that are responsible for maintaining the illusion of ordering through use of
\emph{memory barriers} (for example, \co{smp_mb()} in the Linux kernel).
These memory barriers can be explicit instructions, as they are on
-ARM, POWER, Itanium, and Alpha, or they can be implied by other instructions,
+ARM, \Power{}, Itanium, and Alpha, or they can be implied by other instructions,
as they often are on x86.
Since these standard synchronization primitives preserve the illusion of
ordering, your path of least resistance is to simply use these primitives,
@@ -827,7 +827,7 @@ if the shared variable had changed before entry into the loop.
This allows us to plot each CPU's view of the value of \co{state.variable}
over a 532-nanosecond time period, as shown in
Figure~\ref{fig:memorder:A Variable With Multiple Simultaneous Values}.
-This data was collected in 2006 on 1.5\,GHz POWER5 system with 8 cores,
+This data was collected in 2006 on 1.5\,GHz \Power{5} system with 8 cores,
each containing a pair of hardware threads.
CPUs~1, 2, 3, and~4 recorded the values, while CPU~0 controlled the test.
The timebase counter period was about 5.32\,ns, sufficiently fine-grained
@@ -2043,7 +2043,7 @@ communicated to \co{P1()} long before it was communicated to \co{P2()}.
\QuickQuizAnswer{
You need to face the fact that it really can trigger.
Akira Yokosawa used the \co{litmus7} tool to run this litmus test
- on a Power8 system.
+ on a \Power{8} system.
Out of 1,000,000,000 runs, 4 triggered the \co{exists} clause.
Thus, triggering the \co{exists} clause is not merely a one-in-a-million
occurrence, but rather a one-in-a-hundred-million occurrence.
@@ -3707,7 +3707,7 @@ dependencies.
\rotatebox{90}{PA-RISC CPUs}
\end{picture}
& \begin{picture}(6,60)(0,0)
- \rotatebox{90}{POWER}
+ \rotatebox{90}{\Power{}}
\end{picture}
& \begin{picture}(6,60)(0,0)
\rotatebox{90}{SPARC TSO}
@@ -4134,7 +4134,7 @@ For more on Alpha, see its reference manual~\cite{ALPHA2002}.
The ARM family of CPUs is extremely popular in embedded applications,
particularly for power-constrained applications such as cellphones.
-Its memory model is similar to that of Power
+Its memory model is similar to that of \Power{}
(see Section~\ref{sec:memorder:POWER / PowerPC}, but ARM uses a
different set of memory-barrier instructions~\cite{ARMv7A:2010}:
@@ -4144,7 +4144,7 @@ different set of memory-barrier instructions~\cite{ARMv7A:2010}:
subsequent operations of the same type.
The ``type'' of operations can be all operations or can be
restricted to only writes (similar to the Alpha \co{wmb}
- and the POWER \co{eieio} instructions).
+ and the \Power{} \co{eieio} instructions).
In addition, ARM allows cache coherence to have one of three
scopes: single processor, a subset of the processors
(``inner'') and global (``outer'').
@@ -4168,7 +4168,7 @@ None of these instructions exactly match the semantics of Linux's
\co{DMB}.
The \co{DMB} and \co{DSB} instructions have a recursive definition
of accesses ordered before and after the barrier, which has an effect
-similar to that of POWER's cumulativity.
+similar to that of \Power{}'s cumulativity.
ARM also implements control dependencies, so that if a conditional
branch depends on a load, then any store executed after that conditional
@@ -4292,7 +4292,7 @@ memory barriers.
\subsection{MIPS}
The MIPS memory model~\cite[Table 6.6]{MIPSvII-A-2015}
-appears to resemble that of ARM, Itanium, and Power,
+appears to resemble that of ARM, Itanium, and \Power{},
being weakly ordered by default, but respecting dependencies.
MIPS has a wide variety of memory-barrier instructions, but ties them
not to hardware considerations, but rather to the use cases provided
@@ -4325,7 +4325,7 @@ in a manner similar to the ARM64 additions:
Informal discussions with MIPS architects indicates that MIPS has a
definition of transitivity or cumulativity similar to that of
-ARM and Power.
+ARM and \Power{}.
However, it appears that different MIPS implementations can have
different memory-ordering properties, so it is important to consult
the documentation for the specific MIPS implementation you are using.
@@ -4339,10 +4339,10 @@ no code, however, they do use the gcc {\tt memory} attribute to disable
compiler optimizations that would reorder code across the memory
barrier.
-\subsection{POWER / PowerPC}
+\subsection{\Power{} / PowerPC}
\label{sec:memorder:POWER / PowerPC}
-The POWER and PowerPC\textsuperscript{\textregistered}
+The \Power{} and PowerPC\textsuperscript{\textregistered}
CPU families have a wide variety of memory-barrier
instructions~\cite{PowerPC94,MichaelLyons05a}:
\begin{description}
@@ -4388,7 +4388,7 @@ The \co{smp_mb()} instruction is also defined to be the {\tt sync}
instruction, but both \co{smp_rmb()} and \co{rmb()} are defined to
be the lighter-weight {\tt lwsync} instruction.
-Power features ``cumulativity'', which can be used to obtain
+\Power{} features ``cumulativity'', which can be used to obtain
transitivity.
When used properly, any code seeing the results of an earlier
code fragment will also see the accesses that this earlier code
@@ -4396,11 +4396,11 @@ fragment itself saw.
Much more detail is available from
McKenney and Silvera~\cite{PaulEMcKenneyN2745r2009}.
-Power respects control dependencies in much the same way that ARM
-does, with the exception that the Power \co{isync} instruction
+\Power{} respects control dependencies in much the same way that ARM
+does, with the exception that the \Power{} \co{isync} instruction
is substituted for the ARM \co{ISB} instruction.
-Many members of the POWER architecture have incoherent instruction
+Many members of the \Power{} architecture have incoherent instruction
caches, so that a store to memory will not necessarily be reflected
in the instruction cache.
Thankfully, few people write self-modifying code these days, but JITs
diff --git a/perfbook.tex b/perfbook.tex
index da9cfa8..cc4f4b0 100644
--- a/perfbook.tex
+++ b/perfbook.tex
@@ -138,6 +138,7 @@
\newcommand{\qop}[1]{{\sffamily #1}} % QC operator such as H, T, S, etc.
\DeclareRobustCommand{\euler}{\ensuremath{\mathrm{e}}}
+\newcommand{\Power}[1]{POWER#1}
\newcommand{\Epigraph}[2]{\epigraphhead[65]{\rmfamily\epigraph{#1}{#2}}}
diff --git a/toolsoftrade/toolsoftrade.tex b/toolsoftrade/toolsoftrade.tex
index 9cf3312..97a37d3 100644
--- a/toolsoftrade/toolsoftrade.tex
+++ b/toolsoftrade/toolsoftrade.tex
@@ -1038,7 +1038,7 @@ Line~39 moves the lock-acquisition count to this thread's element of the
\end{figure}
Figure~\ref{fig:toolsoftrade:Reader-Writer Lock Scalability}
-shows the results of running this test on a 64-core Power-5 system
+shows the results of running this test on a 64-core \Power{5} system
with two hardware threads per core for a total of 128 software-visible
CPUs.
The \co{thinktime} parameter was zero for all these tests, and the
@@ -1137,7 +1137,7 @@ This situation will only get worse as you add CPUs.
} \QuickQuizEnd
\QuickQuiz{}
- Power-5 is several years old, and new hardware should
+ \Power{5} is several years old, and new hardware should
be faster.
So why should anyone worry about reader-writer locks being slow?
\QuickQuizAnswer{
--
2.7.4
* [PATCH 07/10] treewide: Call GNU C compiler as "GCC"
2017-10-05 15:47 [PATCH 00/10] Tweaks to follow guidelines in style guide Akira Yokosawa
` (5 preceding siblings ...)
2017-10-05 15:54 ` [PATCH 06/10] treewide: Use \Power{} macro for POWER CPU family Akira Yokosawa
@ 2017-10-05 15:55 ` Akira Yokosawa
2017-10-05 15:56 ` [PATCH 08/10] treewide: Use "IRQ" instead of "irq" used as abbreviation Akira Yokosawa
` (3 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Akira Yokosawa @ 2017-10-05 15:55 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa
From 051dc90e73bbd57412c054f482d6ad401f3b1228 Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sun, 1 Oct 2017 16:29:14 +0900
Subject: [PATCH 07/10] treewide: Call GNU C compiler as "GCC"
Exceptions to the simple substitution:
The gcc compiler -> The GNU C compiler
the gcc xxxx facility -> GCC's xxxx facility
gcc extensions -> GNU extensions
"GNU C" and "GCC" are provided by the macros "\GNUC" and "\GCC", respectively.
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
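Note: a minimal sketch of how the new macros behave in running text,
assuming the definitions this patch adds to perfbook.tex. The sample
phrases are illustrative, and \texttt stands in for the book's own
\co{} macro so that the fragment compiles by itself.

```latex
\documentclass{article}
% Definitions as added to perfbook.tex by this patch.
\newcommand{\GNUC}{GNU~C}
\newcommand{\GCC}{GCC}
\begin{document}
% A parameterless control word swallows the spaces that follow it,
% so a control space ("\ ") is needed before the next word.
\GCC\ provides a \texttt{\_\_thread} storage class.
% No control space is needed before an apostrophe or punctuation.
\GCC's \texttt{aligned} attribute is used here.
Projects such as LLVM, \GCC, and the Linux kernel are far larger.
The \GNUC\ compiler accepts GNU extensions.
\end{document}
```

Writing "\GCC often" without the control space would typeset
"GCCoften", which is why the hunks below use "\GCC\ " before a word.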
count/count.tex | 18 +++++++++---------
datastruct/datastruct.tex | 2 +-
formal/formal.tex | 2 +-
memorder/memorder.tex | 2 +-
perfbook.tex | 3 +++
toolsoftrade/toolsoftrade.tex | 20 ++++++++++----------
6 files changed, 25 insertions(+), 22 deletions(-)
diff --git a/count/count.tex b/count/count.tex
index a38aba1..a213558 100644
--- a/count/count.tex
+++ b/count/count.tex
@@ -213,7 +213,7 @@ accuracies far greater than 50\,\% are almost always necessary.
\QuickQuizAnswer{
Although the \co{++} operator \emph{could} be atomic, there
is no requirement that it be so.
- And indeed, \co{gcc} often
+ And indeed, \GCC\ often
chooses to load the value to a register, increment
the register, then store the value to memory, which is
decidedly non-atomic.
@@ -486,7 +486,7 @@ thread (presumably cache aligned and padded to avoid false sharing).
It can, and in this toy implementation, it does.
But it is not that hard to come up with an alternative
implementation that permits an arbitrary number of threads,
- for example, using the \co{gcc} \co{__thread} facility,
+ for example, using \GCC's \co{__thread} facility,
as shown in
Section~\ref{sec:count:Per-Thread-Variable-Based Implementation}.
} \QuickQuizEnd
@@ -535,11 +535,11 @@ using the \co{for_each_thread()} primitive to iterate over the list of
currently running threads, and using the \co{per_thread()} primitive
to fetch the specified thread's counter.
Because the hardware can fetch and store a properly aligned \co{long}
-atomically, and because gcc is kind enough to make use of this capability,
+atomically, and because \GCC\ is kind enough to make use of this capability,
normal loads suffice, and no special atomic instructions are required.
\QuickQuiz{}
- What other choice does gcc have, anyway???
+ What other choice does \GCC\ have, anyway???
\QuickQuizAnswer{
According to the C standard, the effects of fetching a variable
that might be concurrently modified by some other thread are
@@ -548,7 +548,7 @@ normal loads suffice, and no special atomic instructions are required.
given that C must support (for example) eight-bit architectures
which are incapable of atomically loading a \co{long}.
An upcoming version of the C standard aims to fill this gap,
- but until then, we depend on the kindness of the gcc developers.
+ but until then, we depend on the kindness of the \GCC\ developers.
Alternatively, use of volatile accesses such as those provided
by \co{ACCESS_ONCE()}~\cite{JonCorbet2012ACCESS:ONCE}
@@ -987,7 +987,7 @@ comes at the cost of the additional thread running \co{eventual()}.
\label{fig:count:Per-Thread Statistical Counters}
\end{figure}
-Fortunately, gcc provides an \co{__thread} storage class that provides
+Fortunately, \GCC\ provides an \co{__thread} storage class that provides
per-thread storage.
This can be used as shown in
Figure~\ref{fig:count:Per-Thread Statistical Counters} (\path{count_end.c})
@@ -1005,13 +1005,13 @@ value of the counter and exiting threads.
\QuickQuiz{}
Why do we need an explicit array to find the other threads'
counters?
- Why doesn't gcc provide a \co{per_thread()} interface, similar
+ Why doesn't \GCC\ provide a \co{per_thread()} interface, similar
to the Linux kernel's \co{per_cpu()} primitive, to allow
threads to more easily access each others' per-thread variables?
\QuickQuizAnswer{
Why indeed?
- To be fair, gcc faces some challenges that the Linux kernel
+ To be fair, \GCC\ faces some challenges that the Linux kernel
gets to ignore.
When a user-level thread exits, its per-thread variables all
disappear, which complicates the problem of per-thread-variable
@@ -2862,7 +2862,7 @@ line~33 sends the thread a signal.
\QuickQuiz{}
The code in
Figure~\ref{fig:count:Signal-Theft Limit Counter Value-Migration Functions},
- works with gcc and POSIX.
+ works with \GCC\ and POSIX.
What would be required to make it also conform to the ISO C standard?
\QuickQuizAnswer{
The \co{theft} variable must be of type \co{sig_atomic_t}
diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
index fad7668..8b8dd0a 100644
--- a/datastruct/datastruct.tex
+++ b/datastruct/datastruct.tex
@@ -2086,7 +2086,7 @@ performance and scalability.
One way to solve this problem on systems with 64-byte cache line is shown in
Figure~\ref{fig:datastruct:Alignment for 64-Byte Cache Lines}.
-Here a gcc \co{aligned} attribute is used to force the \co{->counter}
+Here \GCC's \co{aligned} attribute is used to force the \co{->counter}
and the \co{ht_elem} structure into separate cache lines.
This would allow CPUs to traverse the hash bucket list at full speed
despite the frequent incrementing.
diff --git a/formal/formal.tex b/formal/formal.tex
index f629190..e4bf3bd 100644
--- a/formal/formal.tex
+++ b/formal/formal.tex
@@ -127,7 +127,7 @@ The larger overarching software construct is of course validated by testing.
Furthermore, although the L4 microkernel is a large software
artifact from the viewpoint of formal verification, it is tiny
compared to a great number of projects, including LLVM,
- gcc, the Linux kernel, Hadoop, MongoDB, and a great many others.
+ \GCC, the Linux kernel, Hadoop, MongoDB, and a great many others.
Although formal verification is finally starting to show some
promise, including more-recent L4 verifications involving greater
diff --git a/memorder/memorder.tex b/memorder/memorder.tex
index 944c17a..ba54fee 100644
--- a/memorder/memorder.tex
+++ b/memorder/memorder.tex
@@ -4335,7 +4335,7 @@ the documentation for the specific MIPS implementation you are using.
Although the PA-RISC architecture permits full reordering of loads and
stores, actual CPUs run fully ordered~\cite{GerryKane96a}.
This means that the Linux kernel's memory-ordering primitives generate
-no code, however, they do use the gcc {\tt memory} attribute to disable
+no code, however, they do use \GCC's {\tt memory} attribute to disable
compiler optimizations that would reorder code across the memory
barrier.
diff --git a/perfbook.tex b/perfbook.tex
index cc4f4b0..dc28079 100644
--- a/perfbook.tex
+++ b/perfbook.tex
@@ -139,6 +139,9 @@
\DeclareRobustCommand{\euler}{\ensuremath{\mathrm{e}}}
\newcommand{\Power}[1]{POWER#1}
+\newcommand{\GNUC}{GNU~C}
+\newcommand{\GCC}{GCC}
+%\newcommand{\GCC}{\co{gcc}} % For those who prefer "gcc"
\newcommand{\Epigraph}[2]{\epigraphhead[65]{\rmfamily\epigraph{#1}{#2}}}
diff --git a/toolsoftrade/toolsoftrade.tex b/toolsoftrade/toolsoftrade.tex
index 97a37d3..bd43879 100644
--- a/toolsoftrade/toolsoftrade.tex
+++ b/toolsoftrade/toolsoftrade.tex
@@ -481,7 +481,7 @@ in the following section.
broken???
\QuickQuizAnswer{
Ah, but the Linux kernel is written in a carefully selected
- superset of the C language that includes special gcc
+ superset of the C language that includes special GNU
extensions, such as asms, that permit safe execution even
in presence of data races.
In addition, the Linux kernel does not run on a number of
@@ -1001,7 +1001,7 @@ rights to assume that the value of \co{goflag} would never change.
\QuickQuiz{}
Would it ever be necessary to use \co{READ_ONCE()} when accessing
a per-thread variable, for example, a variable declared using
- the \co{gcc} \co{__thread} storage class?
+ \GCC's \co{__thread} storage class?
\QuickQuizAnswer{
It depends.
If the per-thread variable was accessed only from its thread,
@@ -1156,7 +1156,7 @@ cases, for example when the readers must do high-latency file or network I/O.
There are alternatives, some of which will be presented in
Chapters~\ref{chp:Counting} and \ref{chp:Deferred Processing}.
-\subsection{Atomic Operations (gcc Classic)}
+\subsection{Atomic Operations (\GCC\ Classic)}
\label{sec:toolsoftrade:Atomic Operations (gcc Classic)}
Given that
@@ -1175,7 +1175,7 @@ If a pair of threads concurrently execute \co{__sync_fetch_and_add()} on
the same variable, the resulting value of the variable will include
the result of both additions.
-The {\sf gcc} compiler offers a number of additional atomic operations,
+The \GNUC\ compiler offers a number of additional atomic operations,
including \co{__sync_fetch_and_sub()},
\co{__sync_fetch_and_or()},
\co{__sync_fetch_and_and()},
@@ -1250,7 +1250,7 @@ avoids optimizing away a given memory read, in which case the
Figure~\ref{fig:toolsoftrade:Demonstration of Exclusive Locks}.
Similarly, the \co{WRITE_ONCE()} primitive may be used to prevent the
compiler from optimizing away a given memory write.
-These last three primitives are not provided directly by gcc,
+These last three primitives are not provided directly by \GCC,
but may be implemented straightforwardly as follows:
\vspace{5pt}
@@ -1307,7 +1307,7 @@ is vaguely similar to the Linux kernel's ``\co{READ_ONCE()}''.\footnote{
One restriction of the C11 atomics is that they apply only to special
atomic types, which can be problematic.
-The gcc compiler therefore provides atomic intrinsics, including
+The \GNUC\ compiler therefore provides atomic intrinsics, including
\co{__atomic_load()},
\co{__atomic_load_n()},
\co{__atomic_store()},
@@ -1339,14 +1339,14 @@ to key,
variable corresponding to the specified key,
and \co{pthread_getspecific()} to return that value.
-A number of compilers (including gcc) provide a \co{__thread} specifier
+A number of compilers (including \GCC) provide a \co{__thread} specifier
that may be used in a variable definition to designate that variable
as being per-thread.
The name of the variable may then be used normally to access the
value of the current thread's instance of that variable.
Of course, \co{__thread} is much easier to use than the POSIX
thead-specific data, and so \co{__thread} is usually preferred for
-code that is to be built only with gcc or other compilers supporting
+code that is to be built only with \GCC\ or other compilers supporting
\co{__thread}.
Fortunately, the C11 standard introduced a \co{_Thread_local} keyword
@@ -1365,7 +1365,7 @@ are supported.
It is still quite common to find these operations implemented in
assembly language, either for historical reasons or to obtain better
performance in specialized circumstances.
-For example, the gcc \co{__sync_} family of primitives all provide full
+For example, \GCC's \co{__sync_} family of primitives all provide full
memory-ordering semantics, which in the past motivated many developers
to create their own implementations for situations where the full memory
ordering semantics are not required.
@@ -1380,7 +1380,7 @@ code, the code samples in this book start with a call to \co{smp_init()},
which initializes a mapping from \co{pthread_t} to consecutive integers.
The userspace RCU library similarly requires a call to \co{rcu_init()}.
Although these calls can be hidden in environments (such as that of
-gcc) that support constructors,
+\GCC) that support constructors,
most of the RCU flavors supported by the userspace RCU library
also require each thread invoke \co{rcu_register_thread()} upon thread
creation and \co{rcu_unregister_thread()} before thread exit.
--
2.7.4
* [PATCH 08/10] treewide: Use "IRQ" instead of "irq" used as abbreviation
2017-10-05 15:47 [PATCH 00/10] Tweaks to follow guidelines in style guide Akira Yokosawa
` (6 preceding siblings ...)
2017-10-05 15:55 ` [PATCH 07/10] treewide: Call GNU C compiler as "GCC" Akira Yokosawa
@ 2017-10-05 15:56 ` Akira Yokosawa
2017-10-05 15:59 ` [PATCH 09/10] future/QC: Use upright glyph for math constant and descriptive suffix Akira Yokosawa
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Akira Yokosawa @ 2017-10-05 15:56 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa
From 07346d744227b79263c044034a03fc56c032dd0b Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sun, 1 Oct 2017 18:44:13 +0900
Subject: [PATCH 08/10] treewide: Use "IRQ" instead of "irq" used as abbreviation
"IRQ" is defined as a macro "\IRQ" in preamble for ease of customization.
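As a minimal sketch of the intended usage (the two sentences are adapted
from hunks in this patch; the standalone document around them is only an
assumption for illustration): a control word swallows the space that
follows it, so "\IRQ\ " is written before a following word and "\IRQ s"
for the plural.

```latex
\documentclass{article}
\newcommand{\IRQ}{IRQ}
%\newcommand{\IRQ}{irq} % For those who prefer "irq"
\begin{document}
% "\IRQ\ " keeps the interword space after the macro.
The first \IRQ\ handler cannot complete until the NMI handler returns.
% "\IRQ s" attaches the plural "s" directly to the macro's expansion.
Both \IRQ s and NMIs use the same code path.
\end{document}
```

Switching to the commented-out definition changes every occurrence at
once, which is the customization the macro is meant to allow.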
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
defer/rcuapi.tex | 6 +++---
formal/dyntickrcu.tex | 50 +++++++++++++++++++++++++-------------------------
perfbook.tex | 2 ++
rt/rt.tex | 14 +++++++-------
4 files changed, 37 insertions(+), 35 deletions(-)
diff --git a/defer/rcuapi.tex b/defer/rcuapi.tex
index 158ebff..d60a0dd 100644
--- a/defer/rcuapi.tex
+++ b/defer/rcuapi.tex
@@ -93,7 +93,7 @@ Read side overhead &
Preempt disable/enable (free on non-\tco{PREEMPT}) &
BH disable/enable &
Preempt disable/enable (free on non-\tco{PREEMPT}) &
- Simple instructions, irq disable/enable &
+ Simple instructions, \IRQ\ disable/enable &
Simple instructions, preempt disable/enable, memory barriers \\
\hline
Asynchronous update-side overhead &
@@ -420,12 +420,12 @@ and also by their scope, as follows:
(\co{softirq}) handlers.
RCU BH is global in scope.
\item RCU Sched: read-side critical sections must guarantee forward
- progress against everything except for NMI and irq handlers,
+ progress against everything except for NMI and \IRQ\ handlers,
including \co{softirq} handlers.
RCU Sched is global in scope.
\item RCU (both classic and real-time): read-side critical sections
must guarantee forward progress against everything except for
- NMI handlers, irq handlers, \co{softirq} handlers, and (in the
+ NMI handlers, \IRQ\ handlers, \co{softirq} handlers, and (in the
real-time case) higher-priority real-time tasks.
RCU is global in scope.
\item SRCU: read-side critical sections need not guarantee
diff --git a/formal/dyntickrcu.tex b/formal/dyntickrcu.tex
index ec3c78c..2ae41ae 100644
--- a/formal/dyntickrcu.tex
+++ b/formal/dyntickrcu.tex
@@ -1829,7 +1829,7 @@ This effort provided some lessons (re)learned:
is buggy.
\item {\bf Use of atomic instructions can simplify verification.}
Unfortunately, use of the \co{cmpxchg} atomic instruction
- would also slow down the critical irq fastpath, so they
+ would also slow down the critical \IRQ\ fastpath, so they
are not appropriate in this case.
\item {\bf The need for complex formal verification often indicates
a need to re-think your design.}
@@ -1842,10 +1842,10 @@ the dynticks problem, which is presented in the next section.
\label{sec:formal:Simplicity Avoids Formal Verification}
The complexity of the dynticks interface for preemptible RCU is primarily
-due to the fact that both irqs and NMIs use the same code path and the
+due to the fact that both \IRQ s and NMIs use the same code path and the
same state variables.
This leads to the notion of providing separate code paths and variables
-for irqs and NMIs, as has been done for
+for \IRQ s and NMIs, as has been done for
hierarchical RCU~\cite{PaulEMcKenney2008HierarchicalRCU}
as indirectly suggested by
Manfred Spraul~\cite{ManfredSpraul2008StateMachineRCU}.
@@ -1884,7 +1884,7 @@ and efficiently share dynticks state.
In what follows, they can be thought of as independent per-CPU variables.
The \co{dynticks_nesting}, \co{dynticks}, and \co{dynticks_snap} variables
-are for the irq code paths, and the \co{dynticks_nmi} and
+are for the \IRQ\ code paths, and the \co{dynticks_nmi} and
\co{dynticks_nmi_snap} variables are for the NMI code paths, although
the NMI code path will also reference (but not modify) the
\co{dynticks_nesting} variable.
@@ -1895,18 +1895,18 @@ These variables are used as follows:
This counts the number of reasons that the corresponding
CPU should be monitored for RCU read-side critical sections.
If the CPU is in dynticks-idle mode, then this counts the
- irq nesting level, otherwise it is one greater than the
- irq nesting level.
+ \IRQ\ nesting level, otherwise it is one greater than the
+ \IRQ\ nesting level.
\item [\tco{dynticks}]
This counter's value is even if the corresponding CPU is
- in dynticks-idle mode and there are no irq handlers currently
+ in dynticks-idle mode and there are no \IRQ\ handlers currently
running on that CPU, otherwise the counter's value is odd.
In other words, if this counter's value is odd, then the
corresponding CPU might be in an RCU read-side critical section.
\item [\tco{dynticks_nmi}]
This counter's value is odd if the corresponding CPU is
in an NMI handler, but only if the NMI arrived while this
- CPU was in dyntick-idle mode with no irq handlers running.
+ CPU was in dyntick-idle mode with no \IRQ\ handlers running.
Otherwise, the counter's value will be even.
\item [\tco{dynticks_snap}]
This will be a snapshot of the \co{dynticks} counter, but
@@ -1924,11 +1924,11 @@ passed through a quiescent state during that interval.
\QuickQuiz{}
But what happens if an NMI handler starts running before
- an irq handler completes, and if that NMI handler continues
- running until a second irq handler starts?
+ an \IRQ\ handler completes, and if that NMI handler continues
+ running until a second \IRQ\ handler starts?
\QuickQuizAnswer{
This cannot happen within the confines of a single CPU.
- The first irq handler cannot complete until the NMI handler
+ The first \IRQ\ handler cannot complete until the NMI handler
returns.
Therefore, if each of the \co{dynticks} and \co{dynticks_nmi}
variables have taken on an even value during a given time
@@ -1985,7 +1985,7 @@ These two functions are invoked from process context.
Line~6 ensures that any prior memory accesses (which might
include accesses from RCU read-side critical sections) are seen
by other CPUs before those marking entry to dynticks-idle mode.
-Lines~7 and~12 disable and reenable irqs.
+Lines~7 and~12 disable and reenable \IRQ s.
Line~8 acquires a pointer to the current CPU's \co{rcu_dynticks}
structure, and
line~9 increments the current CPU's \co{dynticks} counter, which
@@ -2038,7 +2038,7 @@ Figure~\ref{fig:formal:NMIs From Dynticks-Idle Mode}
shows the \co{rcu_nmi_enter()} and \co{rcu_nmi_exit()} functions,
which inform RCU of NMI entry and exit, respectively, from dynticks-idle
mode.
-However, if the NMI arrives during an irq handler, then RCU will already
+However, if the NMI arrives during an \IRQ\ handler, then RCU will already
be on the lookout for RCU read-side critical sections from this CPU,
so lines~6 and~7 of \co{rcu_nmi_enter()} and lines~18 and~19
of \co{rcu_nmi_exit()} silently return if \co{dynticks} is odd.
@@ -2091,7 +2091,7 @@ respectively.
Figure~\ref{fig:formal:Interrupts From Dynticks-Idle Mode}
shows \co{rcu_irq_enter()} and \co{rcu_irq_exit()}, which
-inform RCU of entry to and exit from, respectively, irq context.
+inform RCU of entry to and exit from, respectively, \IRQ\ context.
Line~6 of \co{rcu_irq_enter()} increments \co{dynticks_nesting},
and if this variable was already non-zero, line~7 silently returns.
Otherwise, line~8 increments \co{dynticks}, which will then have
@@ -2099,18 +2099,18 @@ an odd value, consistent with the fact that this CPU can now
execute RCU read-side critical sections.
Line~10 therefore executes a memory barrier to ensure that
the increment of \co{dynticks} is seen before any
-RCU read-side critical sections that the subsequent irq handler
+RCU read-side critical sections that the subsequent \IRQ\ handler
might execute.
Line~18 of \co{rcu_irq_exit()} decrements \co{dynticks_nesting}, and
if the result is non-zero, line~19 silently returns.
Otherwise, line~20 executes a memory barrier to ensure that the
increment of \co{dynticks} on line~21 is seen after any RCU
-read-side critical sections that the prior irq handler might have executed.
+read-side critical sections that the prior \IRQ\ handler might have executed.
Line~22 verifies that \co{dynticks} is now even, consistent with
the fact that no RCU read-side critical sections may appear in
dynticks-idle mode.
-Lines~23-25 check to see if the prior irq handlers enqueued any
+Lines~23-25 check to see if the prior \IRQ\ handlers enqueued any
RCU callbacks, forcing this CPU out of dynticks-idle mode via
a reschedule API if so.
@@ -2159,7 +2159,7 @@ Figures~\ref{fig:formal:Entering and Exiting Dynticks-Idle Mode},
Lines~11 and~12 record the snapshots for later calls to
\co{rcu_implicit_dynticks_qs()},
and lines~13 and~14 check to see if the CPU is in dynticks-idle mode with
-neither irqs nor NMIs in progress (in other words, both snapshots
+neither \IRQ s nor NMIs in progress (in other words, both snapshots
have even values), hence in an extended quiescent state.
If so, lines~15 and~16 count this event, and line~17 returns
true if the CPU was in a quiescent state.
@@ -2225,15 +2225,15 @@ waiting for a CPU that is offline.
This is still pretty complicated.
Why not just have a \co{cpumask_t} that has a bit set for
each CPU that is in dyntick-idle mode, clearing the bit
- when entering an irq or NMI handler, and setting it upon
+ when entering an \IRQ\ or NMI handler, and setting it upon
exit?
\QuickQuizAnswer{
Although this approach would be functionally correct, it
- would result in excessive irq entry/exit overhead on
+ would result in excessive \IRQ\ entry/exit overhead on
large machines.
In contrast, the approach laid out in this section allows
- each CPU to touch only per-CPU data on irq and NMI entry/exit,
- resulting in much lower irq entry/exit overhead, especially
+ each CPU to touch only per-CPU data on \IRQ\ and NMI entry/exit,
+ resulting in much lower \IRQ\ entry/exit overhead, especially
on large machines.
} \QuickQuizEnd
@@ -2243,9 +2243,9 @@ waiting for a CPU that is offline.
A slight shift in viewpoint resulted in a substantial simplification
of the dynticks interface for RCU.
The key change leading to this simplification was minimizing of
-sharing between irq and NMI contexts.
+sharing between \IRQ\ and NMI contexts.
The only sharing in this simplified interface is references from NMI
-context to irq variables (the \co{dynticks} variable).
+context to \IRQ\ variables (the \co{dynticks} variable).
This type of sharing is benign, because the NMI functions never update
this variable, so that its value remains constant through the lifetime
of the NMI handler.
@@ -2254,6 +2254,6 @@ understood one at a time, in happy contrast to the situation
described in
Section~\ref{sec:formal:Promela Parable: dynticks and Preemptible RCU},
where an NMI might change shared state at any point during execution of
-the irq functions.
+the \IRQ\ functions.
Verification can be a good thing, but simplicity is even better.
diff --git a/perfbook.tex b/perfbook.tex
index dc28079..906d71b 100644
--- a/perfbook.tex
+++ b/perfbook.tex
@@ -142,6 +142,8 @@
\newcommand{\GNUC}{GNU~C}
\newcommand{\GCC}{GCC}
%\newcommand{\GCC}{\co{gcc}} % For those who prefer "gcc"
+\newcommand{\IRQ}{IRQ}
+%\newcommand{\IRQ}{irq} % For those who prefer "irq"
\newcommand{\Epigraph}[2]{\epigraphhead[65]{\rmfamily\epigraph{#1}{#2}}}
diff --git a/rt/rt.tex b/rt/rt.tex
index 21e7117..f1c0ae1 100644
--- a/rt/rt.tex
+++ b/rt/rt.tex
@@ -995,20 +995,20 @@ indefinitely, thus indefinitely degrading real-time latencies.
One way of addressing this problem is the use of threaded interrupts shown in
Figure~\ref{fig:rt:Threaded Interrupt Handler}.
-Interrupt handlers run in the context of a preemptible IRQ thread,
+Interrupt handlers run in the context of a preemptible \IRQ\ thread,
which runs at a configurable priority.
The device interrupt handler then runs for only a short time, just
-long enough to make the IRQ thread aware of the new event.
+long enough to make the \IRQ\ thread aware of the new event.
As shown in the figure, threaded interrupts can greatly improve
real-time latencies, in part because interrupt handlers running in
-the context of the IRQ thread may be preempted by high-priority real-time
+the context of the \IRQ\ thread may be preempted by high-priority real-time
threads.
However, there is no such thing as a free lunch, and there are downsides
to threaded interrupts.
One downside is increased interrupt latency.
Instead of immediately running the interrupt handler, the handler's execution
-is deferred until the IRQ thread gets around to running it.
+is deferred until the \IRQ\ thread gets around to running it.
Of course, this is not a problem unless the device generating the interrupt
is on the real-time application's critical path.
@@ -1025,16 +1025,16 @@ which can be caused by, among other things, locks acquired by
preemptible interrupt handlers~\cite{LuiSha1990PriorityInheritance}.
Suppose that a low-priority thread holds a lock, but is preempted by
a group of medium-priority threads, at least one such thread per CPU.
-If an interrupt occurs, a high-priority IRQ thread will preempt one
+If an interrupt occurs, a high-priority \IRQ\ thread will preempt one
of the medium-priority threads, but only until it decides to acquire
the lock held by the low-priority thread.
Unfortunately, the low-priority thread cannot release the lock until
it starts running, which the medium-priority threads prevent it from
doing.
-So the high-priority IRQ thread cannot acquire the lock until after one
+So the high-priority \IRQ\ thread cannot acquire the lock until after one
of the medium-priority threads releases its CPU.
In short, the medium-priority threads are indirectly blocking the
-high-priority IRQ threads, a classic case of priority inversion.
+high-priority \IRQ\ threads, a classic case of priority inversion.
Note that this priority inversion could not happen with non-threaded
interrupts because the low-priority thread would have to disable interrupts
--
2.7.4
* [PATCH 09/10] future/QC: Use upright glyph for math constant and descriptive suffix
2017-10-05 15:47 [PATCH 00/10] Tweaks to follow guidelines in style guide Akira Yokosawa
` (7 preceding siblings ...)
2017-10-05 15:56 ` [PATCH 08/10] treewide: Use "IRQ" instead of "irq" used as abbreviation Akira Yokosawa
@ 2017-10-05 15:59 ` Akira Yokosawa
2017-10-05 16:00 ` [PATCH 10/10] styleguide: Reflect recent style improvements Akira Yokosawa
2017-10-05 20:48 ` [PATCH 00/10] Tweaks to follow guidelines in style guide Paul E. McKenney
10 siblings, 0 replies; 12+ messages in thread
From: Akira Yokosawa @ 2017-10-05 15:59 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa
From 22146e245ef551489d65834ac4f766176f53bfaf Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sat, 30 Sep 2017 20:22:13 +0900
Subject: [PATCH 09/10] future/QC: Use upright glyph for math constant and descriptive suffix
To have access to a larger set of Greek glyphs and improved math mode
typesetting, substitute "newtxtext" and "newtxmath" packages for
the "mathptmx" package.
Uppercase Greek letters are now slanted by default.
To specify upright Greek letters, you can use commands such as
"\upDelta" and "\uppi" provided by the newtxmath package. In QC.tex,
"pi" represents the mathematical constant and "Delta" represents the
difference operator, so upright glyphs should be used in both cases.
The NIST style guide also recommends that descriptive suffixes be
upright. To avoid repetitive use of the \mathrm{} command, macros "\TLo",
"\THi", and "\CPf" are defined locally in QC.tex.
Also use mathcal font for Big O.[1]
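The following is a minimal sketch of how these pieces fit together,
assuming a TeX installation that provides newtxtext and newtxmath. The
package options and macro definitions are copied from this patch; the
standalone document and its sentences are illustrative only.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[defaultsups]{newtxtext}
\usepackage[slantedGreek]{newtxmath} % uppercase Greek slanted by default
% Descriptive suffixes are set upright.
\newcommand{\TLo}{T_\mathrm{L}}
\newcommand{\THi}{T_\mathrm{H}}
\newcommand{\CPf}{C_\mathrm{P}}
% Big O in the mathcal font; this overrides the text command \O.
\DeclareRobustCommand{\O}[1]{\ensuremath{\mathcal{O}(#1)}}
\begin{document}
% Upright pi for the constant.
Rotate 90 degrees ($\frac{\uppi}{2}$ radians) about the Z axis.
% Upright Delta for the difference operator.
\begin{equation}
	\upDelta E \geq \frac{\hbar}{2 \upDelta t}
\end{equation}
\begin{equation}
	\CPf = \frac{\TLo}{\THi - \TLo}
\end{equation}
Grover's algorithm searches the list in $\O{\sqrt N}$ time.
\end{document}
```

With "slantedGreek" in effect, a plain "\Delta" would come out slanted,
which is why the difference operator needs "\upDelta" explicitly.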
NOTE 1: For target "1csf", we now use the "newtxsf" package, which
also provides a larger set of Greek glyphs. However, it is not available
on TeX Live 2013/Debian. Furthermore, its upright font in math mode
differs from the one used in text mode, so math-mode figures can be told
apart from text-mode figures in this target. The difference looks
acceptable. The font choice for this target can be changed should a
better font combination be found.
NOTE 2: On TeX Live 2013/Debian, newtxmath has a few spacing issues.
They are fixed on TeX Live 2015/Debian, which is available on
Ubuntu Xenial. Both newtxtext and newtxmath are actively updated.
See https://www.ctan.org/pkg/newtx.
[1]: https://texblog.org/2014/06/24/big-o-and-related-notations-in-latex/
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
FAQ-BUILD.txt | 4 +--
Makefile | 2 +-
appendix/styleguide/styleguide.tex | 6 ++--
future/QC.tex | 62 ++++++++++++++++++++------------------
perfbook.tex | 12 +++++---
5 files changed, 46 insertions(+), 40 deletions(-)
diff --git a/FAQ-BUILD.txt b/FAQ-BUILD.txt
index 1fa581d..e277d95 100644
--- a/FAQ-BUILD.txt
+++ b/FAQ-BUILD.txt
@@ -114,9 +114,9 @@
directory.
Following is a list of links to optional packages as of
- March 2017:
+ October 2017:
https://www.ctan.org/pkg/newtxtt
https://www.ctan.org/pkg/nimbus15
https://www.ctan.org/pkg/inconsolata
- https://www.ctan.org/pkg/mathastext
+ https://www.ctan.org/pkg/newtxsf
diff --git a/Makefile b/Makefile
index 8799313..d8d23c7 100644
--- a/Makefile
+++ b/Makefile
@@ -134,7 +134,7 @@ perfbook-msnt.tex: perfbook.tex
perfbook-1csf.tex: perfbook-1c.tex
sed -e 's/setboolean{sansserif}{false}/setboolean{sansserif}{true}/' \
-e 's/%msfontstub/\\usepackage[var0]{inconsolata}[2013\/07\/17]/' < $< > $@
- @echo "## This target requires recent version (>= 1.3i) of mathastext. ##"
+ @echo "## This target requires math font package newtxsf. ##"
# Rules related to perfbook_html are removed as of May, 2016
diff --git a/appendix/styleguide/styleguide.tex b/appendix/styleguide/styleguide.tex
index bb100a9..bec700c 100644
--- a/appendix/styleguide/styleguide.tex
+++ b/appendix/styleguide/styleguide.tex
@@ -940,7 +940,7 @@ with the help of ``booktabs'' and ``xcolor'' packages.
\begin{tabular}{lrrr}\toprule
Situation
& $T$ (K)
- & $C_P$ & \parbox[b]{.75in}{\raggedleft Power per watt\par waste heat (W)} \\
+ & $\CPf$ & \parbox[b]{.75in}{\raggedleft Power per watt\par waste heat (W)} \\
\midrule
Dry Ice
& $195$
@@ -1320,7 +1320,7 @@ with dashed horizontal and vertical rules of the arydshln package.
\begin{tabular}{l:r:r:r}\toprule
Situation
& $T$ (K)
- & $C_P$ & \parbox[b]{.75in}{\raggedleft Power per watt\par waste heat (W)} \\
+ & $\CPf$ & \parbox[b]{.75in}{\raggedleft Power per watt\par waste heat (W)} \\
\hline
Dry Ice
& $195$
@@ -1356,7 +1356,7 @@ Table~\ref{tab:app:styleguide:Refrigeration Power Consumption (arydshln-2)}.
\begin{tabular}{lrrr}\toprule
Situation
& $T$ (K)
- & $C_P$ & \parbox[b]{.75in}{\raggedleft Power per watt\par waste heat (W)} \\
+ & $\CPf$ & \parbox[b]{.75in}{\raggedleft Power per watt\par waste heat (W)} \\
\midrule
Dry Ice
& $195$
diff --git a/future/QC.tex b/future/QC.tex
index daf0086..349c4ed 100644
--- a/future/QC.tex
+++ b/future/QC.tex
@@ -329,7 +329,7 @@ are as follows:
\begin{description}
\item[\qop{H}\,:]
- Rotate 180\degree{} ($\pi$ radians) about the Bloch-sphere
+ Rotate 180\degree{} ($\uppi$ radians) about the Bloch-sphere
X-Z axis, that is, about the 45\degree{} line on the
X-Z plane. This rotates $\ket{0}$ to the point at which the
positive X\=/axis intersects the Bloch sphere, and rotates $\ket{1}$
@@ -337,31 +337,31 @@ are as follows:
sphere.
Either way, we get a qubit that is 50\,\% one and 50\,\% zero.
\item[\qop{S}\,:]
- Rotate 90\degree{} ($\frac{\pi}{2}$ radians) about the
+ Rotate 90\degree{} ($\frac{\uppi}{2}$ radians) about the
Bloch-sphere Z\=/axis, which has no effect on qubits in the
$\ket{0}$ or $\ket{1}$ states.
\item[\qop{S}$^{\bm{\dagger}}$:]
- Rotate $-90\degree$ ($-\frac{\pi}{2}$ radians) about the
+ Rotate $-90\degree$ ($-\frac{\uppi}{2}$ radians) about the
Bloch-sphere Z\=/axis, which has no effect on qubits in the
$\ket{0}$ or $\ket{1}$ states.
This operator is the inverse of \qop{S}.
\item[\qop{T}\,:]
- Rotate 45\degree{} ($\frac{\pi}{4}$ radians) about the
+ Rotate 45\degree{} ($\frac{\uppi}{4}$ radians) about the
Bloch-sphere Z\=/axis, which has no effect on qubits in the
$\ket{0}$ or $\ket{1}$ states.
\item[\qop{T}$^{\bm{\dagger}}$:]
- Rotate $-45\degree$ ($-\frac{\pi}{4}$ radians) about the
+ Rotate $-45\degree$ ($-\frac{\uppi}{4}$ radians) about the
Bloch-sphere Z\=/axis, which has no effect on qubits in the
$\ket{0}$ or $\ket{1}$ states.
This operator is the inverse of \qop{T}.
\item[\qop{X}\,:]
- Rotate 180\degree{} ($\pi$ radians) about the Bloch-sphere
+ Rotate 180\degree{} ($\uppi$ radians) about the Bloch-sphere
X\=/axis, which takes $\ket{0}$ to $\ket{1}$ and vice versa.
\item[\qop{Y}\,:]
- Rotate 180\degree{} ($\pi$ radians) about the Bloch-sphere
+ Rotate 180\degree{} ($\uppi$ radians) about the Bloch-sphere
Y\=/axis, which also takes $\ket{0}$ to $\ket{1}$ and vice versa.
\item[\qop{Z}\,:]
- Rotate 180\degree{} ($\pi$ radians) about the Bloch-sphere
+ Rotate 180\degree{} ($\uppi$ radians) about the Bloch-sphere
Z\=/axis, which has no effect on qubits in the $\ket{0}$ or
$\ket{1}$ states.
\end{description}
@@ -606,11 +606,11 @@ However, because of its thermodynamic reversibiltiy,
QC is governed by an even lower limit:
\begin{equation}
- \Delta E \geq \frac{\hbar}{2 \Delta t}
+ \upDelta E \geq \frac{\hbar}{2 \upDelta t}
\end{equation}
-Here $\Delta E$ is the energy required to change the qubit in Joules,
-$\Delta t$ is the time taken to change the qubit in seconds, and
+Here $\upDelta E$ is the energy required to change the qubit in Joules,
+$\upDelta t$ is the time taken to change the qubit in seconds, and
$\hbar$ is Planck's constant, which is $6.62 \times 10^{-34}$\,J$\cdot$s.
For the 50-nanosecond switching times of IBM's Quantum Experience
hardware, this limit is $5.52 \times 10^{-27}$\,J, more than an order
@@ -631,12 +631,16 @@ program.
Unfortunately, it is not just the amount of heat generated that is
important, but also the temperature at which this heat is generated.
+\newcommand{\TLo}{T_\mathrm{L}}
+\newcommand{\THi}{T_\mathrm{H}}
+\newcommand{\CPf}{C_\mathrm{P}}
+
The thermodynamic theoretical limit on the ability of a refrigerator
-to transport heat from a low temperature ($T_L$) to a high temperature
-($T_H$) is given by the coefficient of performance ($C_P$):
+to transport heat from a low temperature ($\TLo$) to a high temperature
+($\THi$) is given by the coefficient of performance ($\CPf$):
\begin{equation}
- C_P = \frac{T_L}{T_H - T_L}
+ \CPf = \frac{\TLo}{\THi - \TLo}
\end{equation}
\begin{table}
@@ -664,9 +668,9 @@ fancifully illustrated in
Table~\ref{tab:future:The Three Laws of Thermodynamics}.
The nominal temperature for IBM~Q is 15~millikelvins, which certainly
-qualifies as a low $T_L$.
-Let's assume $T_H$ is 293\,K (room temperature),
-in which case $C_P$ is $0.000051$.
+qualifies as a low $\TLo$.
+Let's assume $\THi$ is 293\,K (room temperature),
+in which case $\CPf$ is $0.000051$.
This in turn means that it requires \emph{at least} one watt of
power into the refrigeration unit to transport $0.000051$~watts
of waste heat from the 15~millikelvin IBM~Q out to room temperature.
@@ -692,7 +696,7 @@ at low temperatures.\footnote{
& & & Power per watt \\
Situation
& $T$ (K)
- & $C_P$ & waste heat (W) \\
+ & $\CPf$ & waste heat (W) \\
\hline
\hline
Dry Ice
@@ -996,27 +1000,27 @@ it to be not too early to start thinking in terms of replacing RSA.
\label{sec:future:Grover's Search Algorithm}
Grover's algorithm searches an unordered list of $N$ items
-in $O(\sqrt N)$ time.
+in $\O{\sqrt N}$ time.
This is mainly intended for implicit search for solutions as opposed
to searching through data.
To see why, keep in mind that before any data can be searched,
that data list must be downloaded into the QC system, and that
-this download will have computational complexity $O(n)$, where
+this download will have computational complexity $\O{n}$, where
$n$ is the number of data items.
The competing classical system can use this time to sort the data
or to construct any desired index over the data, and the computational
-complexity of these operations can be considered to be $O(n \log_2 n)$,
+complexity of these operations can be considered to be $\O{n \log_2 n}$,
after which the classical
-system can carry out the search in $O(\log N)$ time, which
-is much faster than the $O(\sqrt N)$ time promised by
+system can carry out the search in $\O{\log N}$ time, which
+is much faster than the $\O{\sqrt N}$ time promised by
Grover's algorithm.
\QuickQuiz{}
- What do you mean $O(n)$ for classic-computing sorting/indexing
- and $O(n \log_2 n)$ for classic-computing search?
- Hash tables do $O(n)$ and $O(1)$ respectively!!!
+ What do you mean $\O{n}$ for classic-computing sorting/indexing
+ and $\O{n \log_2 n}$ for classic-computing search?
+ Hash tables do $\O{n}$ and $\O{1}$ respectively!!!
\QuickQuizAnswer{
- Fixed-size hash table lookups are $O(n)$, not $O(1)$.
+ Fixed-size hash table lookups are $\O{n}$, not $\O{1}$.
And for a resizing hash table, fairness dictates that the overhead
of resizing be properly accounted for.
@@ -1049,7 +1053,7 @@ computing:
Of course, one can pick $n$ and $m$ to favor either approach.
It makes little sense to choose small $m$ because the winner of that
-race is a simple $O(n)$ sequential scan.
+race is a simple $\O{n}$ sequential scan.
More interesting scenarios use larger values of $m$.
The first scenario looks at
@@ -1185,7 +1189,7 @@ That said, this analysis has some limitations:
\item Explicit lists are assumed.
Implicit lists might well favor quantum computing.
\item Traditional sorting and indexing is assumed to result in
- the traditional $O(\log N)$ computational complexity for
+ the traditional $\O{\log N}$ computational complexity for
classic-computing search.
\item Quantum computing is assumed to be capable of handling
very large data sets.
diff --git a/perfbook.tex b/perfbook.tex
index 906d71b..f36b7ca 100644
--- a/perfbook.tex
+++ b/perfbook.tex
@@ -7,9 +7,9 @@
% A more pleasant font
\usepackage{lmodern}
\usepackage[T1]{fontenc} % use postscript type 1 fonts
+\usepackage[defaultsups]{newtxtext} % use nice, standard fonts for roman
\usepackage{textcomp} % use symbols in TS1 encoding
-\usepackage{mathptmx} % use nice, standard fonts for roman
-\usepackage[scaled=.92]{helvet} % and sans serif
+\renewcommand*\ttdefault{lmtt}
%msfontstub
% Improves the text layout
@@ -85,9 +85,10 @@
\IfSansSerif{
\renewcommand{\familydefault}{\sfdefault}
\normalfont
-\usepackage[italic]{mathastext}[2016/01/06]
-\renewcommand{\path}[1]{\nolinkurl{#1}} % workaround of interference with mathastext
-}{}
+\usepackage[slantedGreek,scaled=.96]{newtxsf}
+}{
+\usepackage[slantedGreek]{newtxmath} % math package to be used with newtxtext
+}
\newcommand{\LstLineNo}{\makebox[5ex][r]{\arabic{VerbboxLineNo}\hspace{2ex}}}
@@ -138,6 +139,7 @@
\newcommand{\qop}[1]{{\sffamily #1}} % QC operator such as H, T, S, etc.
\DeclareRobustCommand{\euler}{\ensuremath{\mathrm{e}}}
+\DeclareRobustCommand{\O}[1]{\ensuremath{\mathcal{O}(#1)}}
\newcommand{\Power}[1]{POWER#1}
\newcommand{\GNUC}{GNU~C}
\newcommand{\GCC}{GCC}
--
2.7.4
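For reference, here is a minimal standalone sketch of the \O{} macro that this patch adds to perfbook.tex. It uses the same definition and font packages as the hunks above. The surrounding document (article class, sample sentences, and the N = 2^20 figures) is an illustrative assumption and is not taken from perfbook. Note that the definition replaces LaTeX's built-in \O (the letter Ø).

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[defaultsups]{newtxtext}   % text font, as in the patch
\usepackage[slantedGreek]{newtxmath}  % matching math font, as in the patch

% Same definition as in the patch. It replaces LaTeX's built-in \O.
\DeclareRobustCommand{\O}[1]{\ensuremath{\mathcal{O}(#1)}}

\begin{document}
Loading $n$ items costs $\O{n}$ and sorting them costs $\O{n \log_2 n}$.
A classical search then takes $\O{\log N}$ time, versus $\O{\sqrt N}$
for Grover's algorithm.
% Illustrative figures only: for N = 2^20, log_2 N = 20 and sqrt N = 1024.
For $N = 2^{20}$, $\log_2 N = 20$ while $\sqrt{N} = 1024$.
\end{document}
```

Keeping the notation in one macro means a later change of glyph (for example, a plain italic O) would need only a one-line edit.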
* [PATCH 10/10] styleguide: Reflect recent style improvements
2017-10-05 15:47 [PATCH 00/10] Tweaks to follow guidelines in style guide Akira Yokosawa
` (8 preceding siblings ...)
2017-10-05 15:59 ` [PATCH 09/10] future/QC: Use upright glyph for math constant and descriptive suffix Akira Yokosawa
@ 2017-10-05 16:00 ` Akira Yokosawa
2017-10-05 20:48 ` [PATCH 00/10] Tweaks to follow guidelines in style guide Paul E. McKenney
10 siblings, 0 replies; 12+ messages in thread
From: Akira Yokosawa @ 2017-10-05 16:00 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa
From 2890e0069882321553c16aac213d4bb8d0a06fb7 Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Wed, 4 Oct 2017 08:18:50 +0900
Subject: [PATCH 10/10] styleguide: Reflect recent style improvements
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
appendix/styleguide/styleguide.tex | 58 ++++++++++++--------------------------
1 file changed, 18 insertions(+), 40 deletions(-)
diff --git a/appendix/styleguide/styleguide.tex b/appendix/styleguide/styleguide.tex
index bec700c..d7ee97d 100644
--- a/appendix/styleguide/styleguide.tex
+++ b/appendix/styleguide/styleguide.tex
@@ -181,29 +181,14 @@ Example:
$45\degree$, rather than $45\,\degree$.
\end{quote}
-\subsection{NIST Guide Yet To Be Followed}
-\label{sec:app:styleguide:NIST Guides Yet To Be Followed}
-
-There are a few cases where NIST style guide is not followed.
-Other English conventions are followed in such cases.
-NIST rules of
-Sections~\ref{sec:app:styleguide:Percent Symbol}
-and~\ref{sec:app:styleguide:Font Style}
-are deemed acceptable by the editor.
-Contributions in those areas should be welcome.
-
\subsubsection{Percent Symbol}
\label{sec:app:styleguide:Percent Symbol}
NIST style guide treats the percent symbol (\%) as the same as SI unit
symbols.
-In this textbook, no space is placed in front of a percent symbol.
\begin{quote}
-\begin{tabular}{ll}
- NIST guide:& 50\,\% possibility\\
- Current convention:& 50\% possibility\\
-\end{tabular}
+ 50\,\% possibility, rather than 50\% possibility.
\end{quote}
\subsubsection{Font Style}
@@ -235,12 +220,15 @@ For example,
$\mathrm{e}^x$
\end{quote}
-In this textbook, this rule is not much considered as of this writing.
-Most letters in math mode are italic regardless of what they
-represent. Exceptions are uppercase Greek letters, which are upright
-in math mode by default.\footnote{
- See \url{https://tex.stackexchange.com/questions/119248/}
- for the historical reason.}
+%\footnote{
+% See \url{https://tex.stackexchange.com/questions/119248/}
+% for the historical reason.}
+
+\subsection{NIST Guide Yet To Be Followed}
+\label{sec:app:styleguide:NIST Guides Yet To Be Followed}
+
+There are a few cases where NIST style guide is not followed.
+Other English conventions are followed in such cases.
\subsubsection{Digit Grouping}
\label{sec:app:styleguide:Digit Grouping}
@@ -670,7 +658,14 @@ Example with an en dash:
\label{sec:app:styleguide:Numerical Minus Sign}
Numerical minus signs should be coded as math mode minus signs,
-namely \qco{$-$}. For example,
+namely \qco{$-$}.\footnote{This rule assumes that math mode uses the
+ same upright glyph as text mode. Our default font choice meets
+ the assumption.
+\IfSansSerif{
+ One of the experimental targets ``1csf'' \emph{does} use a different font
+ for math mode figures as of October 2017.}{}
+}
+For example,
\begin{quote}
$-30$, rather than -30.
@@ -1399,28 +1394,11 @@ for examples of tables with complex headings.
Other improvement candidates are listed in the source of this
section as comments.
-% Capitalize initialism:
-% Gnu Compiler Collection = GCC
-% gcc should be used as a command name in \co{gcc}
-% When mentioning GCC's C language, use `GNU C'
-%
% Trademarks:
% As the Legal page covers trademarks, there is no need to
% use trademark symbol in the text. They seems to have been
% imported from original publications.
%
-% Power or POWER?
-% IBM's trademark page at https://www.ibm.com/legal/us/en/copytrade.shtml#section-P
-% lists the following.
-% PowerPC
-% Power Architecture
-% Power
-% POWER
-% POWER5
-% POWER6
-%
-% not Power5, POWER 5, nor Power-5
-%
% Ugly line break by \co{}
% __
% atomic_store()
--
2.7.4
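As a quick illustration of three rules the style guide covers after this patch, the following standalone sketch shows the recommended coding next to the discouraged one. The \euler definition is copied from perfbook.tex. The document wrapper and the "rather than $e^x$" contrast are assumptions added for illustration only.

```latex
\documentclass{article}
% \euler as defined in perfbook.tex
\DeclareRobustCommand{\euler}{\ensuremath{\mathrm{e}}}

\begin{document}
% Percent symbol: narrow space, as for SI unit symbols.
50\,\% possibility, rather than 50\% possibility.

% Math constant: upright glyph.
$\euler^x$, rather than $e^x$.

% Numerical minus sign: math mode, not a text hyphen.
$-30$, rather than -30.
\end{document}
```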
* Re: [PATCH 00/10] Tweaks to follow guidelines in style guide
2017-10-05 15:47 [PATCH 00/10] Tweaks to follow guidelines in style guide Akira Yokosawa
` (9 preceding siblings ...)
2017-10-05 16:00 ` [PATCH 10/10] styleguide: Reflect recent style improvements Akira Yokosawa
@ 2017-10-05 20:48 ` Paul E. McKenney
10 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2017-10-05 20:48 UTC (permalink / raw)
To: Akira Yokosawa; +Cc: perfbook
On Fri, Oct 06, 2017 at 12:47:21AM +0900, Akira Yokosawa wrote:
> From 2890e0069882321553c16aac213d4bb8d0a06fb7 Mon Sep 17 00:00:00 2001
> From: Akira Yokosawa <akiyks@gmail.com>
> Date: Wed, 5 Oct 2017 22:57:46 +0900
> Subject: [PATCH 00/10] Tweaks to follow guidelines in style guide
>
> Hi Paul,
>
> This patch set consists of minor tweaks in regard to the suggestions
> having been presented in style guide for a while.
>
> Patches #1 -- #5 are trivial changes.
>
> Patch #6 attempts to improve consistency in denoting POWER series CPU
> by defining a macro "\Power{}".
>
> Patch #7 substitutes "GCC" for "gcc". There are a few exceptions as
> mentioned in commit log.
>
> Patch #8 substitutes "IRQ" for "irq" in the same way. You might like
> to skip this one, as I see "irq" more often than "IRQ" in Linux
> documentation.
>
> Patch #9 is somewhat invasive. It switches Times font to that of
> "newtxtext" and "newtxmath" packages. The reason for the change
> is to have access to both upright and slanted glyphs of Greek letters.
> Recent versions of these font packages give better looking result,
> especially in math mode. As noted in the commit log, newtxmath in
> TeX Live 2013/Debian has a few issues which have been fixed in later
> versions. It also switches font choice for the experimental target "1csf".
>
> Patch #10 updates style guide to reflect the changes made in this
> patch set.
They look fine, so I applied them, thank you! We might want to go with
"irq", but let's see how people react. Easy to change, aside from
beginnings of sentences!
Thanx, Paul
> Thanks, Akira
> --
> Akira Yokosawa (10):
> debugging: Insert narrow space in front of percent symbol
> debugging: Use upright font for Euler's number
> future/QC: Insert narrow space in front of percent symbol
> future/QC: Use non-breakable hyphen for axis names
> treewide: Insert narrow space in front of percent symbol
> treewide: Use \Power{} macro for POWER CPU family
> treewide: Call GNU C compiler as "GCC"
> treewide: Use "IRQ" instead of "irq" used as abbreviation
> future/QC: Use upright glyph for math constant and descriptive suffix
> styleguide: Reflect recent style improvements
>
> FAQ-BUILD.txt | 4 +-
> Makefile | 2 +-
> SMPdesign/SMPdesign.tex | 2 +-
> SMPdesign/beyond.tex | 14 ++---
> advsync/advsync.tex | 2 +-
> appendix/styleguide/styleguide.tex | 64 ++++++++---------------
> appendix/toyrcu/toyrcu.tex | 26 +++++-----
> count/count.tex | 28 +++++-----
> cpu/hwfreelunch.tex | 4 +-
> datastruct/datastruct.tex | 2 +-
> debugging/debugging.tex | 104 ++++++++++++++++++-------------------
> defer/rcuapi.tex | 6 +--
> defer/rcuusage.tex | 4 +-
> formal/dyntickrcu.tex | 52 +++++++++----------
> formal/formal.tex | 2 +-
> formal/spinhint.tex | 2 +-
> future/QC.tex | 92 ++++++++++++++++----------------
> future/htm.tex | 2 +-
> future/tm.tex | 4 +-
> intro/intro.tex | 8 +--
> memorder/memorder.tex | 32 ++++++------
> perfbook.tex | 20 +++++--
> rt/rt.tex | 22 ++++----
> toolsoftrade/toolsoftrade.tex | 24 ++++-----
> 24 files changed, 257 insertions(+), 265 deletions(-)
>
> --
> 2.7.4
>
>