Discussions of the Parallel Programming book
 help / color / mirror / Atom feed
* [PATCH 1/2] Convert code snippets to 'listing' env (together, advsync, rt, future)
@ 2017-10-13 22:57 Akira Yokosawa
  2017-10-13 22:58 ` [PATCH 2/2] Convert code snippets to 'listing' env (appendix) Akira Yokosawa
  2017-10-13 23:37 ` [PATCH 1/2] Convert code snippets to 'listing' env (together, advsync, rt, future) Paul E. McKenney
  0 siblings, 2 replies; 3+ messages in thread
From: Akira Yokosawa @ 2017-10-13 22:57 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

From c8849b06b1563d9d82dfaf0f120932c4998c8b03 Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sat, 14 Oct 2017 00:12:52 +0900
Subject: [PATCH 1/2] Convert code snippets to 'listing' env (together, advsync, rt, future)

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 advsync/advsync.tex   | 10 ++++-----
 future/htm.tex        | 16 +++++++--------
 rt/rt.tex             | 34 +++++++++++++++----------------
 together/applyrcu.tex | 56 +++++++++++++++++++++++++--------------------------
 together/refcnt.tex   | 38 +++++++++++++++++-----------------
 5 files changed, 77 insertions(+), 77 deletions(-)

diff --git a/advsync/advsync.tex b/advsync/advsync.tex
index adf1dc9..3b04aff 100644
--- a/advsync/advsync.tex
+++ b/advsync/advsync.tex
@@ -188,7 +188,7 @@ algorithm as wait-free.
 This algorithm is probably the most heavily used NBS algorithm in
 the Linux kernel.

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
  1 static inline bool
@@ -217,14 +217,14 @@ the Linux kernel.
 \centering
 \theverbbox
 \caption{NBS Enqueue Algorithm}
-\label{fig:count:NBS Enqueue Algorithm}
-\end{figure}
+\label{lst:count:NBS Enqueue Algorithm}
+\end{listing}

 Another common NBS algorithm is the atomic queue where elements are
 enqueued using an atomic exchange instruction~\cite{MagedMichael1993JPDC},
 followed by a store into the \co{->next} pointer of the new element's
 predecessor, as shown in
-Figure~\ref{fig:count:NBS Enqueue Algorithm},
+Listing~\ref{lst:count:NBS Enqueue Algorithm},
 which shows the userspace-RCU library
 implementation~\cite{MathieuDesnoyers2009URCU}.
 Line~9 updates the tail pointer to reference the new element while
@@ -240,7 +240,7 @@ Although mutual exclusion is required to dequeue a single element
 removal of the entire contents of the queue.
 What is not possible is to dequeue any given element in a non-blocking
 manner: The enqueuer might have failed between lines~9 and~10 of the
-figure, so that the element in question is only partially enqueued.
+listing, so that the element in question is only partially enqueued.
 This results in a half-NBS algorithm where enqueues are NBS but
 dequeues are blocking.
 This algorithm is nevertheless used in practice, in part because
diff --git a/future/htm.tex b/future/htm.tex
index 80d3e1e..7c9f610 100644
--- a/future/htm.tex
+++ b/future/htm.tex
@@ -733,7 +733,7 @@ Therefore, the medium-priority threads are in effect blocking the
 high-priority process, which is the rationale for the name ``priority
 inversion.''

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 void boostee(void)
@@ -765,8 +765,8 @@ inversion.''
 \centering
 \theverbbox
 \caption{Exploiting Priority Boosting}
-\label{fig:future:Exploiting Priority Boosting}
-\end{figure}
+\label{lst:future:Exploiting Priority Boosting}
+\end{listing}

 One way to avoid priority inversion is \emph{priority inheritance},
 in which a high-priority thread blocked on a lock temporarily donates
@@ -774,10 +774,10 @@ its priority to the lock's holder, which is also called \emph{priority
 boosting}.
 However, priority boosting can be used for things other than avoiding
 priority inversion, as shown in
-Figure~\ref{fig:future:Exploiting Priority Boosting}.
-Lines~1-12 of this figure show a low-priority process that must
+Listing~\ref{lst:future:Exploiting Priority Boosting}.
+Lines~1-12 of this listing show a low-priority process that must
 nevertheless run every millisecond or so, while lines~14-24 of
-this same figure show a high-priority process that uses priority
+this same listing show a high-priority process that uses priority
 boosting to ensure that \co{boostee()} runs periodically as needed.

 The \co{boostee()} function arranges this by always holding one of
@@ -786,7 +786,7 @@ the two \co{boost_lock[]} locks, so that lines~20-21 of

 \QuickQuiz{}
 	But the \co{boostee()} function in
-	Figure~\ref{fig:future:Exploiting Priority Boosting}
+	Listing~\ref{lst:future:Exploiting Priority Boosting}
 	alternatively acquires its locks in reverse order!
 	Won't this result in deadlock?
 \QuickQuizAnswer{
@@ -816,7 +816,7 @@ this thread might never again get a chance to run.

 And if the \co{boostee()} thread is not holding the lock, then
 the \co{booster()} thread's empty critical section on lines~20 and~21 of
-Figure~\ref{fig:future:Exploiting Priority Boosting}
+Listing~\ref{lst:future:Exploiting Priority Boosting}
 will become an empty transaction that has no effect, so that
 \co{boostee()} never runs.
 This example illustrates some of the subtle consequences of
diff --git a/rt/rt.tex b/rt/rt.tex
index f1c0ae1..d9c32aa 100644
--- a/rt/rt.tex
+++ b/rt/rt.tex
@@ -1150,7 +1150,7 @@ sections~\cite{DinakarGuniguntala2008IBMSysJ}.
 Otherwise, long RCU read-side critical sections would result in
 excessive real-time latencies.

-\begin{figure}[tb]
+\begin{listing}[tb]
 { \scriptsize
 \begin{verbbox}
  1 void __rcu_read_lock(void)
@@ -1180,8 +1180,8 @@ excessive real-time latencies.
 \centering
 \theverbbox
 \caption{Preemptible Linux-Kernel RCU}
-\label{fig:rt:Preemptible Linux-Kernel RCU}
-\end{figure}
+\label{lst:rt:Preemptible Linux-Kernel RCU}
+\end{listing}

 A preemptible RCU implementation was therefore added to the Linux kernel.
 This implementation avoids the need to individually track the state of
@@ -1194,7 +1194,7 @@ of the current grace period and
 while in one of those pre-existing critical sections have removed
 themselves from their lists.
 A simplified version of this implementation is shown in
-Figure~\ref{fig:rt:Preemptible Linux-Kernel RCU}.
+Listing~\ref{lst:rt:Preemptible Linux-Kernel RCU}.
 The \co{__rcu_read_lock()} function spans lines~1-5 and
 the \co{__rcu_read_unlock()} function spans lines~7-22.

@@ -1241,7 +1241,7 @@ count on line~20.
 \QuickQuiz{}
 	Suppose that preemption occurs just after the load from
 	\co{t->rcu_read_unlock_special.s} on line~17 of
-	Figure~\ref{fig:rt:Preemptible Linux-Kernel RCU}.
+	Listing~\ref{lst:rt:Preemptible Linux-Kernel RCU}.
 	Mightn't that result in the task failing to invoke
 	\co{rcu_read_unlock_special()}, thus failing to remove itself
 	from the list of tasks blocking the current grace period,
@@ -1460,7 +1460,7 @@ real-time use.
 OSADL runs long-term tests of systems, so referring to their
 website (\url{http://osadl.org/}) can be helpful.

-\begin{figure}[tb]
+\begin{listing}[tb]
 { \scriptsize
 \begin{verbbox}
  1 cd /sys/kernel/debug/tracing
@@ -1473,8 +1473,8 @@ website (\url{http://osadl.org/}) can be helpful.
 \centering
 \theverbbox
 \caption{Locating Sources of OS Jitter}
-\label{fig:rt:Locating Sources of OS Jitter}
-\end{figure}
+\label{lst:rt:Locating Sources of OS Jitter}
+\end{listing}

 Unfortunately, this list of OS-jitter sources can never be complete,
 as it will change with each new version of the kernel.
@@ -1482,7 +1482,7 @@ This makes it necessary to be able to track down additional sources
 of OS jitter.
 Given a CPU $N$ running a CPU-bound usermode thread, the
 commands shown in
-Figure~\ref{fig:rt:Locating Sources of OS Jitter}
+Listing~\ref{lst:rt:Locating Sources of OS Jitter}
 will produce a list of all the times that this CPU entered the kernel.
 Of course, the \co{N} on line~5 must be replaced with the
 number of the CPU in question, and the \co{1} on line~2 may be increased
@@ -1704,7 +1704,7 @@ or about 9,000 degrees of rotation per second, which translates to
 We therefore need to schedule the fuel injection to within a time
 interval of about 100 microseconds.

-\begin{figure}[tb]
+\begin{listing}[tb]
 { \scriptsize
 \begin{verbbox}
  1 if (clock_gettime(CLOCK_REALTIME, &timestart) != 0) {
@@ -1724,15 +1724,15 @@ interval of about 100 microseconds.
 \centering
 \theverbbox
 \caption{Timed-Wait Test Program}
-\label{fig:rt:Timed-Wait Test Program}
-\end{figure}
+\label{lst:rt:Timed-Wait Test Program}
+\end{listing}

 Suppose that a timed wait was to be used to initiate fuel injection,
 although if you are building an engine, I hope you supply a rotation
 sensor.
 We need to test the timed-wait functionality, perhaps using the test program
 shown in
-Figure~\ref{fig:rt:Timed-Wait Test Program}.
+Listing~\ref{lst:rt:Timed-Wait Test Program}.
 Unfortunately, if we run this program, we can get unacceptable timer
 jitter, even in a -rt kernel.

@@ -1789,7 +1789,7 @@ in fields imaginatively named \co{a}, \co{b}, and \co{c}.
 Otherwise, \co{cur_cal} points to a dynamically allocated
 structure providing the current calibration values.

-\begin{figure}[tb]
+\begin{listing}[tb]
 { \scriptsize
 \begin{verbbox}
  1 struct calibration {
@@ -1832,10 +1832,10 @@ structure providing the current calibration values.
 \centering
 \theverbbox
 \caption{Real-Time Calibration Using RCU}
-\label{fig:rt:Real-Time Calibration Using RCU}
-\end{figure}
+\label{lst:rt:Real-Time Calibration Using RCU}
+\end{listing}

-Figure~\ref{fig:rt:Real-Time Calibration Using RCU}
+Listing~\ref{lst:rt:Real-Time Calibration Using RCU}
 shows how RCU can be used to solve this problem.
 Lookups are deterministic, as shown in \co{calc_control()}
 on lines~9-15, consistent with real-time requirements.
diff --git a/together/applyrcu.tex b/together/applyrcu.tex
index d477ab5..5e7efff 100644
--- a/together/applyrcu.tex
+++ b/together/applyrcu.tex
@@ -108,7 +108,7 @@ held constant, ensuring that \co{read_count()} sees consistent data.

 \subsubsection{Implementation}

-\begin{figure}[bp]
+\begin{listing}[bp]
 { \scriptsize
 \begin{verbbox}
   1 struct countarray {
@@ -186,11 +186,11 @@ held constant, ensuring that \co{read_count()} sees consistent data.
 \centering
 \theverbbox
 \caption{RCU and Per-Thread Statistical Counters}
-\label{fig:together:RCU and Per-Thread Statistical Counters}
-\end{figure}
+\label{lst:together:RCU and Per-Thread Statistical Counters}
+\end{listing}

 Lines~1-4 of
-Figure~\ref{fig:together:RCU and Per-Thread Statistical Counters}
+Listing~\ref{lst:together:RCU and Per-Thread Statistical Counters}
 show the \co{countarray} structure, which contains a
 \co{->total} field for the count from previously exited threads,
 and a \co{counterp[]} array of pointers to the per-thread
@@ -235,7 +235,7 @@ Line~43 picks up the current thread's index, line~45 acquires
 \QuickQuiz{}
 	Hey!!!
 	Line~46 of
-	Figure~\ref{fig:together:RCU and Per-Thread Statistical Counters}
+	Listing~\ref{lst:together:RCU and Per-Thread Statistical Counters}
 	modifies a value in a pre-existing \co{countarray} structure!
 	Didn't you say that this structure, once made available to
 	\co{read_count()}, remained constant???
@@ -278,7 +278,7 @@ Line~69 can then safely free the old \co{countarray} structure.

 \QuickQuiz{}
 	Wow!
-	Figure~\ref{fig:together:RCU and Per-Thread Statistical Counters}
+	Listing~\ref{lst:together:RCU and Per-Thread Statistical Counters}
 	contains 69 lines of code, compared to only 42 in
 	Listing~\ref{lst:count:Per-Thread Statistical Counters}.
 	Is this extra complexity really worth it?
@@ -303,7 +303,7 @@ Line~69 can then safely free the old \co{countarray} structure.
 	Listing~\ref{lst:count:Per-Thread Statistical Counters}, but
 	with all the scalability and performance benefits of the
 	implementation shown in
-	Figure~\ref{fig:together:RCU and Per-Thread Statistical Counters}!
+	Listing~\ref{lst:together:RCU and Per-Thread Statistical Counters}!
 } \QuickQuizEnd

 Use of RCU enables exiting threads to wait until other threads are
@@ -389,7 +389,7 @@ tradeoff.
 \subsection{Array and Length}
 \label{sec:together:Array and Length}

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
  1 struct foo {
@@ -401,11 +401,11 @@ tradeoff.
 \centering
 \theverbbox
 \caption{RCU-Protected Variable-Length Array}
-\label{fig:together:RCU-Protected Variable-Length Array}
-\end{figure}
+\label{lst:together:RCU-Protected Variable-Length Array}
+\end{listing}

 Suppose we have an RCU-protected variable-length array, as shown in
-Figure~\ref{fig:together:RCU-Protected Variable-Length Array}.
+Listing~\ref{lst:together:RCU-Protected Variable-Length Array}.
 The length of the array \co{->a[]} can change dynamically, and at any
 given time, its length is given by the field \co{->length}.
 Of course, this introduces the following race condition:
@@ -429,7 +429,7 @@ covered in Chapter~\ref{chp:memorder:Memory Ordering}.
 This works, but incurs read-side overhead and, perhaps worse, requires
 use of explicit memory barriers.

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
  1 struct foo_a {
@@ -445,12 +445,12 @@ use of explicit memory barriers.
 \centering
 \theverbbox
 \caption{Improved RCU-Protected Variable-Length Array}
-\label{fig:together:Improved RCU-Protected Variable-Length Array}
-\end{figure}
+\label{lst:together:Improved RCU-Protected Variable-Length Array}
+\end{listing}

 A better approach is to put the value and the array into the same structure,
 as shown in
-Figure~\ref{fig:together:Improved RCU-Protected Variable-Length Array}.
+Listing~\ref{lst:together:Improved RCU-Protected Variable-Length Array}.
 Allocating a new array (\co{foo_a} structure) then automatically provides
 a new place for the array length.
 This means that if any CPU picks up a reference to \co{->fa}, it is
@@ -480,7 +480,7 @@ A more general version of this approach is presented in the next section.
 \subsection{Correlated Fields}
 \label{sec:together:Correlated Fields}

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
  1 struct animal {
@@ -496,12 +496,12 @@ A more general version of this approach is presented in the next section.
 \centering
 \theverbbox
 \caption{Uncorrelated Measurement Fields}
-\label{fig:together:Uncorrelated Measurement Fields}
-\end{figure}
+\label{lst:together:Uncorrelated Measurement Fields}
+\end{listing}

 Suppose that each of Sch\"odinger's animals is represented by the
 data element shown in
-Figure~\ref{fig:together:Uncorrelated Measurement Fields}.
+Listing~\ref{lst:together:Uncorrelated Measurement Fields}.
 The \co{meas_1}, \co{meas_2}, and \co{meas_3} fields are a set
 of correlated measurements that are updated periodically.
 It is critically important that readers see these three values from
@@ -521,7 +521,7 @@ measurement values, but it requires copying a large structure due
 to the \co{->photo[]} field.
 This copying might incur unacceptably large overhead.

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
  1 struct measurement {
@@ -541,11 +541,11 @@ This copying might incur unacceptably large overhead.
 \centering
 \theverbbox
 \caption{Correlated Measurement Fields}
-\label{fig:together:Correlated Measurement Fields}
-\end{figure}
+\label{lst:together:Correlated Measurement Fields}
+\end{listing}

 Another approach is to insert a level of indirection, as shown in
-Figure~\ref{fig:together:Correlated Measurement Fields}.
+Listing~\ref{lst:together:Correlated Measurement Fields}.
 When a new measurement is taken, a new \co{measurement} structure
 is allocated, filled in with the measurements, and the \co{animal}
 structure's \co{->mp} field is updated to point to this new
@@ -555,13 +555,13 @@ can be freed.

 \QuickQuiz{}
 	But cant't the approach shown in
-	Figure~\ref{fig:together:Correlated Measurement Fields}
+	Listing~\ref{lst:together:Correlated Measurement Fields}
 	result in extra cache misses, in turn resulting in additional
 	read-side overhead?
 \QuickQuizAnswer{
 	Indeed it can.

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
  1 struct measurement {
@@ -582,11 +582,11 @@ can be freed.
 \centering
 \theverbbox
 \caption{Localized Correlated Measurement Fields}
-\label{fig:together:Localized Correlated Measurement Fields}
-\end{figure}
+\label{lst:together:Localized Correlated Measurement Fields}
+\end{listing}

 	One way to avoid this cache-miss overhead is shown in
-	Figure~\ref{fig:together:Localized Correlated Measurement Fields}:
+	Listing~\ref{lst:together:Localized Correlated Measurement Fields}:
 	Simply embed an instance of a \co{measurement} structure
 	named \co{meas}
 	into the \co{animal} structure, and point the \co{->mp}
diff --git a/together/refcnt.tex b/together/refcnt.tex
index 626c375..8a7fa1d 100644
--- a/together/refcnt.tex
+++ b/together/refcnt.tex
@@ -149,12 +149,12 @@ of compiler optimizations.
 This is the method of choice when the lock is required to protect
 other operations in addition to the reference count, but where
 a reference to the object must be held after the lock is released.
-Figure~\ref{fig:together:Simple Reference-Count API} shows a simple
+Listing~\ref{lst:together:Simple Reference-Count API} shows a simple
 API that might be used to implement simple non-atomic reference
 counting---although simple reference counting is almost always
 open-coded instead.

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 struct sref {
@@ -188,8 +188,8 @@ open-coded instead.
 \centering
 \theverbbox
 \caption{Simple Reference-Count API}
-\label{fig:together:Simple Reference-Count API}
-\end{figure}
+\label{lst:together:Simple Reference-Count API}
+\end{listing}

 \subsubsection{Atomic Counting}
 \label{sec:together:Atomic Counting}
@@ -203,7 +203,7 @@ Any CPU that hands the object off must first acquire a new reference
 on behalf of the recipient object.
 In the Linux kernel, the \co{kref} primitives are used to implement
 this style of reference counting, as shown in
-Figure~\ref{fig:together:Linux Kernel kref API}.
+Listing~\ref{lst:together:Linux Kernel kref API}.

 Atomic counting is required
 because locking is not used to protect all reference-count operations,
@@ -243,10 +243,10 @@ RCU read-side critical sections.
 	there cannot possibly be any CPU that is permitted
 	to acquire a new reference.
 	This same fact allows the non-atomic check in line~22
-	of Figure~\ref{fig:together:Linux Kernel kref API}.
+	of Listing~\ref{lst:together:Linux Kernel kref API}.
 } \QuickQuizEnd

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 struct kref {
@@ -282,12 +282,12 @@ RCU read-side critical sections.
 \centering
 \theverbbox
 \caption{Linux Kernel \tco{kref} API}
-\label{fig:together:Linux Kernel kref API}
-\end{figure}
+\label{lst:together:Linux Kernel kref API}
+\end{listing}

 The \co{kref} structure itself, consisting of a single atomic
 data item, is shown in lines~1-3 of
-Figure~\ref{fig:together:Linux Kernel kref API}.
+Listing~\ref{lst:together:Linux Kernel kref API}.
 The \co{kref_init()} function on lines~5-8 initializes the counter
 to the value ``1''.
 Note that the \co{atomic_set()} primitive is a simple
@@ -314,7 +314,7 @@ Otherwise, \co{kref_sub()} returns zero, informing the caller that
 \QuickQuiz{}
 	Suppose that just after the \co{atomic_sub_and_test()}
 	on line~22 of
-	Figure~\ref{fig:together:Linux Kernel kref API} is invoked,
+	Listing~\ref{lst:together:Linux Kernel kref API} is invoked,
 	that some other CPU invokes \co{kref_get()}.
 	Doesn't this result in that other CPU now having an illegal
 	reference to a released object?
@@ -357,9 +357,9 @@ layer to track the destination caches that are used in packet routing.
 The actual implementation is quite a bit more involved; this section
 focuses on the aspects of \co{struct dst_entry} reference-count
 handling that matches this use case,
-shown in Figure~\ref{fig:together:Linux Kernel dst-clone API}.
+shown in Listing~\ref{lst:together:Linux Kernel dst-clone API}.

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 static inline
@@ -384,8 +384,8 @@ shown in Figure~\ref{fig:together:Linux Kernel dst-clone API}.
 \centering
 \theverbbox
 \caption{Linux Kernel \tco{dst_clone} API}
-\label{fig:together:Linux Kernel dst-clone API}
-\end{figure}
+\label{lst:together:Linux Kernel dst-clone API}
+\end{listing}

 The \co{dst_clone()} primitive may be used if the caller
 already has a reference to the specified \co{dst_entry},
@@ -443,9 +443,9 @@ as shown below.
 The Linux kernel's \co{fget()} and \co{fput()} primitives
 use this style of reference counting.
 Simplified versions of these functions are shown in
-Figure~\ref{fig:together:Linux Kernel fget/fput API}.
+Listing~\ref{lst:together:Linux Kernel fget/fput API}.

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \fontsize{6.5pt}{7.5pt}\selectfont
 \begin{verbbox}
   1 struct file *fget(unsigned int fd)
@@ -494,8 +494,8 @@ Figure~\ref{fig:together:Linux Kernel fget/fput API}.
 \centering
 \theverbbox
 \caption{Linux Kernel \tco{fget}/\tco{fput} API}
-\label{fig:together:Linux Kernel fget/fput API}
-\end{figure}
+\label{lst:together:Linux Kernel fget/fput API}
+\end{listing}

 Line~4 of \co{fget()} fetches the pointer to the current
 process's file-descriptor table, which might well be shared
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH 2/2] Convert code snippets to 'listing' env (appendix)
  2017-10-13 22:57 [PATCH 1/2] Convert code snippets to 'listing' env (together, advsync, rt, future) Akira Yokosawa
@ 2017-10-13 22:58 ` Akira Yokosawa
  2017-10-13 23:37 ` [PATCH 1/2] Convert code snippets to 'listing' env (together, advsync, rt, future) Paul E. McKenney
  1 sibling, 0 replies; 3+ messages in thread
From: Akira Yokosawa @ 2017-10-13 22:58 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa

From a7719c90d14629a667fd82d9ecbd9c78a1ddefd8 Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>
Date: Sat, 14 Oct 2017 07:46:18 +0900
Subject: [PATCH 2/2] Convert code snippets to 'listing' env (appendix)

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
 appendix/questions/after.tex |  34 +++----
 appendix/toyrcu/toyrcu.tex   | 230 +++++++++++++++++++++----------------------
 2 files changed, 132 insertions(+), 132 deletions(-)

diff --git a/appendix/questions/after.tex b/appendix/questions/after.tex
index 9944bcf..cea0791 100644
--- a/appendix/questions/after.tex
+++ b/appendix/questions/after.tex
@@ -12,15 +12,15 @@ and ``c''.
 The producer loops recording the current time
 (in seconds since 1970 in decimal),
 then updating the values of ``a'', ``b'', and ``c'',
-as shown in Figure~\ref{fig:app:questions:After Producer Function}.
+as shown in Listing~\ref{lst:app:questions:After Producer Function}.
 The consumer code loops, also recording the current time, but also
 copying the producer's timestamp along with the fields ``a'',
 ``b'', and ``c'', as shown in
-Figure~\ref{fig:app:questions:After Consumer Function}.
+Listing~\ref{lst:app:questions:After Consumer Function}.
 At the end of the run, the consumer outputs a list of anomalous recordings,
 e.g., where time has appeared to go backwards.

-\begin{figure}[htbp]
+\begin{listing}[htbp]
 { \scriptsize
 \begin{verbbox}
   1 /* WARNING: BUGGY CODE. */
@@ -47,10 +47,10 @@ e.g., where time has appeared to go backwards.
 \centering
 \theverbbox
 \caption{``After'' Producer Function}
-\label{fig:app:questions:After Producer Function}
-\end{figure}
+\label{lst:app:questions:After Producer Function}
+\end{listing}

-\begin{figure}[htbp]
+\begin{listing}[htbp]
 { \scriptsize
 \begin{verbbox}
   1 /* WARNING: BUGGY CODE. */
@@ -110,8 +110,8 @@ e.g., where time has appeared to go backwards.
 \centering
 \theverbbox
 \caption{``After'' Consumer Function}
-\label{fig:app:questions:After Consumer Function}
-\end{figure}
+\label{lst:app:questions:After Consumer Function}
+\end{listing}

 \QuickQuiz{}
 	What SMP coding errors can you see in these examples?
@@ -165,14 +165,14 @@ instructions in that time.
 One possible reason is given by the following sequence of events:
 \begin{enumerate}
 \item	Consumer obtains timestamp
-	(Figure~\ref{fig:app:questions:After Consumer Function}, line~13).
+	(Listing~\ref{lst:app:questions:After Consumer Function}, line~13).
 \item	Consumer is preempted.
 \item	An arbitrary amount of time passes.
 \item	Producer obtains timestamp
-	(Figure~\ref{fig:app:questions:After Producer Function}, line~10).
+	(Listing~\ref{lst:app:questions:After Producer Function}, line~10).
 \item	Consumer starts running again, and picks up the producer's
 	timestamp
-	(Figure~\ref{fig:app:questions:After Consumer Function}, line~14).
+	(Listing~\ref{lst:app:questions:After Consumer Function}, line~14).
 \end{enumerate}

 In this scenario, the producer's timestamp might be an arbitrary
@@ -185,16 +185,16 @@ Simply use SMP primitives as designed.

 In this example, the easiest fix is to use locking, for example,
 acquire a lock in the producer before line~10 in
-Figure~\ref{fig:app:questions:After Producer Function} and in
+Listing~\ref{lst:app:questions:After Producer Function} and in
 the consumer before line~13 in
-Figure~\ref{fig:app:questions:After Consumer Function}.
+Listing~\ref{lst:app:questions:After Consumer Function}.
 This lock must also be released after line~13 in
-Figure~\ref{fig:app:questions:After Producer Function} and
+Listing~\ref{lst:app:questions:After Producer Function} and
 after line~17 in
-Figure~\ref{fig:app:questions:After Consumer Function}.
+Listing~\ref{lst:app:questions:After Consumer Function}.
 These locks cause the code segments in lines~10-13 of
-Figure~\ref{fig:app:questions:After Producer Function} and in lines~13-17 of
-Figure~\ref{fig:app:questions:After Consumer Function} to {\em exclude}
+Listing~\ref{lst:app:questions:After Producer Function} and in lines~13-17 of
+Listing~\ref{lst:app:questions:After Consumer Function} to {\em exclude}
 each other, in other words, to run atomically with respect to each other.
 This is represented in
 Figure~\ref{fig:app:questions:Effect of Locking on Snapshot Collection}:
diff --git a/appendix/toyrcu/toyrcu.tex b/appendix/toyrcu/toyrcu.tex
index 2c65f74..cd6541c 100644
--- a/appendix/toyrcu/toyrcu.tex
+++ b/appendix/toyrcu/toyrcu.tex
@@ -31,7 +31,7 @@ provides a summary and a list of desirable RCU properties.
 \section{Lock-Based RCU}
 \label{sec:app:toyrcu:Lock-Based RCU}

-\begin{figure}[bp]
+\begin{listing}[bp]
 { \scriptsize
 \begin{verbbox}
   1 static void rcu_read_lock(void)
@@ -54,12 +54,12 @@ provides a summary and a list of desirable RCU properties.
 \centering
 \theverbbox
 \caption{Lock-Based RCU Implementation}
-\label{fig:app:toyrcu:Lock-Based RCU Implementation}
-\end{figure}
+\label{lst:app:toyrcu:Lock-Based RCU Implementation}
+\end{listing}

 Perhaps the simplest RCU implementation leverages locking, as
 shown in
-Figure~\ref{fig:app:toyrcu:Lock-Based RCU Implementation}
+Listing~\ref{lst:app:toyrcu:Lock-Based RCU Implementation}
 (\path{rcu_lock.h} and \path{rcu_lock.c}).
 In this implementation, \co{rcu_read_lock()} acquires a global
 spinlock, \co{rcu_read_unlock()} releases it, and
@@ -86,11 +86,11 @@ preventing grace-period sharing.

 \QuickQuiz{}
 	Why wouldn't any deadlock in the RCU implementation in
-	Figure~\ref{fig:app:toyrcu:Lock-Based RCU Implementation}
+	Listing~\ref{lst:app:toyrcu:Lock-Based RCU Implementation}
 	also be a deadlock in any other RCU implementation?
 \QuickQuizAnswer{
 %
-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 void foo(void)
@@ -117,11 +117,11 @@ preventing grace-period sharing.
 \centering
 \theverbbox
 \caption{Deadlock in Lock-Based RCU Implementation}
-\label{fig:app:toyrcu:Deadlock in Lock-Based RCU Implementation}
-\end{figure}
+\label{lst:app:toyrcu:Deadlock in Lock-Based RCU Implementation}
+\end{listing}
 %
 	Suppose the functions \co{foo()} and \co{bar()} in
-	Figure~\ref{fig:app:toyrcu:Deadlock in Lock-Based RCU Implementation}
+	Listing~\ref{lst:app:toyrcu:Deadlock in Lock-Based RCU Implementation}
 	are invoked concurrently from different CPUs.
 	Then \co{foo()} will acquire \co{my_lock()} on line~3,
 	while \co{bar()} will acquire \co{rcu_gp_lock} on
@@ -141,7 +141,7 @@ preventing grace-period sharing.
 \QuickQuiz{}
 	Why not simply use reader-writer locks in the RCU implementation
 	in
-	Figure~\ref{fig:app:toyrcu:Lock-Based RCU Implementation}
+	Listing~\ref{lst:app:toyrcu:Lock-Based RCU Implementation}
 	in order to allow RCU readers to proceed in parallel?
 \QuickQuizAnswer{
 	One could in fact use reader-writer locks in this manner.
@@ -169,7 +169,7 @@ in the next section.
 \section{Per-Thread Lock-Based RCU}
 \label{sec:app:toyrcu:Per-Thread Lock-Based RCU}

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 static void rcu_read_lock(void)
@@ -196,10 +196,10 @@ in the next section.
 \centering
 \theverbbox
 \caption{Per-Thread Lock-Based RCU Implementation}
-\label{fig:app:toyrcu:Per-Thread Lock-Based RCU Implementation}
-\end{figure}
+\label{lst:app:toyrcu:Per-Thread Lock-Based RCU Implementation}
+\end{listing}

-Figure~\ref{fig:app:toyrcu:Per-Thread Lock-Based RCU Implementation}
+Listing~\ref{lst:app:toyrcu:Per-Thread Lock-Based RCU Implementation}
 (\path{rcu_lock_percpu.h} and \path{rcu_lock_percpu.c})
 shows an implementation based on one lock per thread.
 The \co{rcu_read_lock()} and \co{rcu_read_unlock()} functions
@@ -222,7 +222,7 @@ up to more than 100 \emph{microseconds} on 64 CPUs.
 \QuickQuiz{}
 	Wouldn't it be cleaner to acquire all the locks, and then
 	release them all in the loop from lines~15-18 of
-	Figure~\ref{fig:app:toyrcu:Per-Thread Lock-Based RCU Implementation}?
+	Listing~\ref{lst:app:toyrcu:Per-Thread Lock-Based RCU Implementation}?
 	After all, with this change, there would be a point in time
 	when there were no readers, simplifying things greatly.
 \QuickQuizAnswer{
@@ -232,7 +232,7 @@ up to more than 100 \emph{microseconds} on 64 CPUs.

 \QuickQuiz{}
 	Is the implementation shown in
-	Figure~\ref{fig:app:toyrcu:Per-Thread Lock-Based RCU Implementation}
+	Listing~\ref{lst:app:toyrcu:Per-Thread Lock-Based RCU Implementation}
 	free from deadlocks?
 	Why or why not?
 \QuickQuizAnswer{
@@ -266,7 +266,7 @@ up to more than 100 \emph{microseconds} on 64 CPUs.

 \QuickQuiz{}
 	Isn't one advantage of the RCU algorithm shown in
-	Figure~\ref{fig:app:toyrcu:Per-Thread Lock-Based RCU Implementation}
+	Listing~\ref{lst:app:toyrcu:Per-Thread Lock-Based RCU Implementation}
 	that it uses only primitives that are widely available,
 	for example, in POSIX pthreads?
 \QuickQuizAnswer{
@@ -289,7 +289,7 @@ the shortcomings of the lock-based implementation.
 \section{Simple Counter-Based RCU}
 \label{sec:app:toyrcu:Simple Counter-Based RCU}

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 atomic_t rcu_refcnt;
@@ -319,11 +319,11 @@ the shortcomings of the lock-based implementation.
 \centering
 \theverbbox
 \caption{RCU Implementation Using Single Global Reference Counter}
-\label{fig:app:toyrcu:RCU Implementation Using Single Global Reference Counter}
-\end{figure}
+\label{lst:app:toyrcu:RCU Implementation Using Single Global Reference Counter}
+\end{listing}

 A slightly more sophisticated RCU implementation is shown in
-Figure~\ref{fig:app:toyrcu:RCU Implementation Using Single Global Reference Counter}
+Listing~\ref{lst:app:toyrcu:RCU Implementation Using Single Global Reference Counter}
 (\path{rcu_rcg.h} and \path{rcu_rcg.c}).
 This implementation makes use of a global reference counter
 \co{rcu_refcnt} defined on line~1.
@@ -410,7 +410,7 @@ is of course unacceptable in production settings.
 	Why not simply make \co{rcu_read_lock()} wait when a concurrent
 	\co{synchronize_rcu()} has been waiting too long in
 	the RCU implementation in
-	Figure~\ref{fig:app:toyrcu:RCU Implementation Using Single Global Reference Counter}?
+	Listing~\ref{lst:app:toyrcu:RCU Implementation Using Single Global Reference Counter}?
 	Wouldn't that prevent \co{synchronize_rcu()} from starving?
 \QuickQuizAnswer{
 	Although this would in fact eliminate the starvation, it would
@@ -435,7 +435,7 @@ scheme that is more favorable to writers.
 \section{Starvation-Free Counter-Based RCU}
 \label{sec:app:toyrcu:Starvation-Free Counter-Based RCU}

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 DEFINE_SPINLOCK(rcu_gp_lock);
@@ -448,10 +448,10 @@ scheme that is more favorable to writers.
 \centering
 \theverbbox
 \caption{RCU Global Reference-Count Pair Data}
-\label{fig:app:toyrcu:RCU Global Reference-Count Pair Data}
-\end{figure}
+\label{lst:app:toyrcu:RCU Global Reference-Count Pair Data}
+\end{listing}

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 static void rcu_read_lock(void)
@@ -487,10 +487,10 @@ scheme that is more favorable to writers.
 \centering
 \theverbbox
 \caption{RCU Read-Side Using Global Reference-Count Pair}
-\label{fig:app:toyrcu:RCU Read-Side Using Global Reference-Count Pair}
-\end{figure}
+\label{lst:app:toyrcu:RCU Read-Side Using Global Reference-Count Pair}
+\end{listing}

-Figure~\ref{fig:app:toyrcu:RCU Read-Side Using Global Reference-Count Pair}
+Listing~\ref{lst:app:toyrcu:RCU Read-Side Using Global Reference-Count Pair}
 (\path{rcu_rcgp.h})
 shows the read-side primitives of an RCU implementation that uses a pair
 of reference counters (\co{rcu_refcnt[]}),
@@ -500,7 +500,7 @@ a per-thread nesting counter \co{rcu_nesting},
 a per-thread snapshot of the global index (\co{rcu_read_idx}),
 and a global lock (\co{rcu_gp_lock}),
 which are themselves shown in
-Figure~\ref{fig:app:toyrcu:RCU Global Reference-Count Pair Data}.
+Listing~\ref{lst:app:toyrcu:RCU Global Reference-Count Pair Data}.

 \paragraph{Design}

@@ -554,7 +554,7 @@ These additional measures use the per-thread \co{rcu_nesting} variable
 to track nesting.

 To make all this work, line~6 of \co{rcu_read_lock()} in
-Figure~\ref{fig:app:toyrcu:RCU Read-Side Using Global Reference-Count Pair}
+Listing~\ref{lst:app:toyrcu:RCU Read-Side Using Global Reference-Count Pair}
 picks up the
 current thread's instance of \co{rcu_nesting}, and if line~7 finds
 that this is the outermost \co{rcu_read_lock()},
@@ -577,7 +577,7 @@ the selected element of \co{rcu_refcnt}.
 Regardless of the nesting level, line~27 decrements this thread's
 instance of \co{rcu_nesting}.

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 void synchronize_rcu(void)
@@ -606,10 +606,10 @@ instance of \co{rcu_nesting}.
 \centering
 \theverbbox
 \caption{RCU Update Using Global Reference-Count Pair}
-\label{fig:app:toyrcu:RCU Update Using Global Reference-Count Pair}
-\end{figure}
+\label{lst:app:toyrcu:RCU Update Using Global Reference-Count Pair}
+\end{listing}

-Figure~\ref{fig:app:toyrcu:RCU Update Using Global Reference-Count Pair}
+Listing~\ref{lst:app:toyrcu:RCU Update Using Global Reference-Count Pair}
 (\path{rcu_rcpg.c})
 shows the corresponding \co{synchronize_rcu()} implementation.
 Lines~6 and 19 acquire and release \co{rcu_gp_lock} in order to
@@ -628,7 +628,7 @@ checking of \co{rcu_refcnt}.

 \QuickQuiz{}
 	Why the memory barrier on line~5 of \co{synchronize_rcu()} in
-	Figure~\ref{fig:app:toyrcu:RCU Update Using Global Reference-Count Pair}
+	Listing~\ref{lst:app:toyrcu:RCU Update Using Global Reference-Count Pair}
 	given that there is a spin-lock acquisition immediately after?
 \QuickQuizAnswer{
 	The spin-lock acquisition only guarantees that the spin-lock's
@@ -644,7 +644,7 @@ checking of \co{rcu_refcnt}.

 	Exercise for the reader: use a tool such as Promela/spin
 	to determine which (if any) of the memory barriers in
-	Figure~\ref{fig:app:toyrcu:RCU Update Using Global Reference-Count Pair}
+	Listing~\ref{lst:app:toyrcu:RCU Update Using Global Reference-Count Pair}
 	are really needed.
 	See Chapter~\ref{chp:Formal Verification}
 	for information on using these tools.
@@ -653,17 +653,17 @@ checking of \co{rcu_refcnt}.

 \QuickQuiz{}
 	Why is the counter flipped twice in
-	Figure~\ref{fig:app:toyrcu:RCU Update Using Global Reference-Count Pair}?
+	Listing~\ref{lst:app:toyrcu:RCU Update Using Global Reference-Count Pair}?
 	Shouldn't a single flip-and-wait cycle be sufficient?
 \QuickQuizAnswer{
 	Both flips are absolutely required.
 	To see this, consider the following sequence of events:
 	\begin{enumerate}
 	\item	Line~8 of \co{rcu_read_lock()} in
-		Figure~\ref{fig:app:toyrcu:RCU Read-Side Using Global Reference-Count Pair}
+		Listing~\ref{lst:app:toyrcu:RCU Read-Side Using Global Reference-Count Pair}
 		picks up \co{rcu_idx}, finding its value to be zero.
 	\item	Line~8 of \co{synchronize_rcu()} in
-		Figure~\ref{fig:app:toyrcu:RCU Update Using Global Reference-Count Pair}
+		Listing~\ref{lst:app:toyrcu:RCU Update Using Global Reference-Count Pair}
 		complements the value of \co{rcu_idx}, setting its
 		value to one.
 	\item	Lines~10-13 of \co{synchronize_rcu()} find that the
@@ -706,7 +706,7 @@ checking of \co{rcu_refcnt}.

 This implementation avoids the update-starvation issues that could
 occur in the single-counter implementation shown in
-Figure~\ref{fig:app:toyrcu:RCU Implementation Using Single Global Reference Counter}.
+Listing~\ref{lst:app:toyrcu:RCU Implementation Using Single Global Reference Counter}.

 \paragraph{Discussion}

@@ -716,7 +716,7 @@ and \co{rcu_read_unlock()}
 are still quite heavyweight.
 In fact, they are more complex than those
 of the single-counter variant shown in
-Figure~\ref{fig:app:toyrcu:RCU Implementation Using Single Global Reference Counter},
+Listing~\ref{lst:app:toyrcu:RCU Implementation Using Single Global Reference Counter},
 with the read-side primitives consuming about 150~nanoseconds on a single
 \Power{5} CPU and almost 40~\emph{microseconds} on a 64-CPU system.
 The update-side \co{synchronize_rcu()} primitive is more costly as
@@ -748,7 +748,7 @@ sharing.
 	Given that atomic increment and decrement are so expensive,
 	why not just use non-atomic increment on line~10 and a
 	non-atomic decrement on line~25 of
-	Figure~\ref{fig:app:toyrcu:RCU Read-Side Using Global Reference-Count Pair}?
+	Listing~\ref{lst:app:toyrcu:RCU Read-Side Using Global Reference-Count Pair}?
 \QuickQuizAnswer{
 	Using non-atomic operations would cause increments and decrements
 	to be lost, in turn causing the implementation to fail.
@@ -769,7 +769,7 @@ scheme that provides greatly improved read-side performance and scalability.
 \section{Scalable Counter-Based RCU}
 \label{sec:app:toyrcu:Scalable Counter-Based RCU}

-\begin{figure}[tb]
+\begin{listing}[tb]
 { \scriptsize
 \begin{verbbox}
   1 DEFINE_SPINLOCK(rcu_gp_lock);
@@ -782,10 +782,10 @@ scheme that provides greatly improved read-side performance and scalability.
 \centering
 \theverbbox
 \caption{RCU Per-Thread Reference-Count Pair Data}
-\label{fig:app:toyrcu:RCU Per-Thread Reference-Count Pair Data}
-\end{figure}
+\label{lst:app:toyrcu:RCU Per-Thread Reference-Count Pair Data}
+\end{listing}

-\begin{figure}[tb]
+\begin{listing}[tb]
 { \scriptsize
 \begin{verbbox}
   1 static void rcu_read_lock(void)
@@ -821,18 +821,18 @@ scheme that provides greatly improved read-side performance and scalability.
 \centering
 \theverbbox
 \caption{RCU Read-Side Using Per-Thread Reference-Count Pair}
-\label{fig:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair}
-\end{figure}
+\label{lst:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair}
+\end{listing}

-Figure~\ref{fig:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair}
+Listing~\ref{lst:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair}
 (\path{rcu_rcpl.h})
 shows the read-side primitives of an RCU implementation that uses per-thread
 pairs of reference counters.
 This implementation is quite similar to that shown in
-Figure~\ref{fig:app:toyrcu:RCU Read-Side Using Global Reference-Count Pair},
+Listing~\ref{lst:app:toyrcu:RCU Read-Side Using Global Reference-Count Pair},
 the only difference being that \co{rcu_refcnt} is now a per-thread
 array (as shown in
-Figure~\ref{fig:app:toyrcu:RCU Per-Thread Reference-Count Pair Data}).
+Listing~\ref{lst:app:toyrcu:RCU Per-Thread Reference-Count Pair Data}).
 As with the algorithm in the previous section, use of this two-element
 array prevents readers from starving updaters.
 One benefit of per-thread \co{rcu_refcnt[]} array is that the
@@ -857,7 +857,7 @@ perform atomic operations.
 	But thankfully, it seems that no one runs Linux on 8-bit systems.
 } \QuickQuizEnd

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 static void flip_counter_and_wait(int i)
@@ -891,15 +891,15 @@ perform atomic operations.
 \centering
 \theverbbox
 \caption{RCU Update Using Per-Thread Reference-Count Pair}
-\label{fig:app:toyrcu:RCU Update Using Per-Thread Reference-Count Pair}
-\end{figure}
+\label{lst:app:toyrcu:RCU Update Using Per-Thread Reference-Count Pair}
+\end{listing}

-Figure~\ref{fig:app:toyrcu:RCU Update Using Per-Thread Reference-Count Pair}
+Listing~\ref{lst:app:toyrcu:RCU Update Using Per-Thread Reference-Count Pair}
 (\path{rcu_rcpl.c})
 shows the implementation of \co{synchronize_rcu()}, along with a helper
 function named \co{flip_counter_and_wait()}.
 The \co{synchronize_rcu()} function resembles that shown in
-Figure~\ref{fig:app:toyrcu:RCU Update Using Global Reference-Count Pair},
+Listing~\ref{lst:app:toyrcu:RCU Update Using Global Reference-Count Pair},
 except that the repeated counter flip is replaced by a pair of calls
 on lines~22 and 23 to the new helper function.

@@ -976,7 +976,7 @@ concurrent RCU updates.
 \section{Scalable Counter-Based RCU With Shared Grace Periods}
 \label{sec:app:toyrcu:Scalable Counter-Based RCU With Shared Grace Periods}

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 DEFINE_SPINLOCK(rcu_gp_lock);
@@ -989,10 +989,10 @@ concurrent RCU updates.
 \centering
 \theverbbox
 \caption{RCU Read-Side Using Per-Thread Reference-Count Pair and Shared Update Data}
-\label{fig:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair and Shared Update Data}
-\end{figure}
+\label{lst:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair and Shared Update Data}
+\end{listing}

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 static void rcu_read_lock(void)
@@ -1028,28 +1028,28 @@ concurrent RCU updates.
 \centering
 \theverbbox
 \caption{RCU Read-Side Using Per-Thread Reference-Count Pair and Shared Update}
-\label{fig:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair and Shared Update}
-\end{figure}
+\label{lst:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair and Shared Update}
+\end{listing}

-Figure~\ref{fig:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair and Shared Update}
+Listing~\ref{lst:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair and Shared Update}
 (\path{rcu_rcpls.h})
 shows the read-side primitives for an RCU implementation using per-thread
 reference count pairs, as before, but permitting updates to share
 grace periods.
 The main difference from the earlier implementation shown in
-Figure~\ref{fig:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair}
+Listing~\ref{lst:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair}
 is that \co{rcu_idx} is now a \co{long} that counts freely,
 so that line~8 of
-Figure~\ref{fig:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair and Shared Update}
+Listing~\ref{lst:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair and Shared Update}
 must mask off the low-order bit.
 We also switched from using \co{atomic_read()} and \co{atomic_set()}
 to using \co{READ_ONCE()}.
 The data is also quite similar, as shown in
-Figure~\ref{fig:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair and Shared Update Data},
+Listing~\ref{lst:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair and Shared Update Data},
 with \co{rcu_idx} now being a \co{long} instead of an
 \co{atomic_t}.

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 static void flip_counter_and_wait(int ctr)
@@ -1094,15 +1094,15 @@ with \co{rcu_idx} now being a \co{long} instead of an
 \centering
 \theverbbox
 \caption{RCU Shared Update Using Per-Thread Reference-Count Pair}
-\label{fig:app:toyrcu:RCU Shared Update Using Per-Thread Reference-Count Pair}
-\end{figure}
+\label{lst:app:toyrcu:RCU Shared Update Using Per-Thread Reference-Count Pair}
+\end{listing}

-Figure~\ref{fig:app:toyrcu:RCU Shared Update Using Per-Thread Reference-Count Pair}
+Listing~\ref{lst:app:toyrcu:RCU Shared Update Using Per-Thread Reference-Count Pair}
 (\path{rcu_rcpls.c})
 shows the implementation of \co{synchronize_rcu()} and its helper
 function \co{flip_counter_and_wait()}.
 These are similar to those in
-Figure~\ref{fig:app:toyrcu:RCU Update Using Per-Thread Reference-Count Pair}.
+Listing~\ref{lst:app:toyrcu:RCU Update Using Per-Thread Reference-Count Pair}.
 The differences in \co{flip_counter_and_wait()} include:
 \begin{enumerate}
 \item	Line~6 uses \co{WRITE_ONCE()} instead of \co{atomic_set()},
@@ -1188,7 +1188,7 @@ RCU implementation being used in production in real-life applications.
 } \QuickQuizEnd

 Referring back to
-Figure~\ref{fig:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair and Shared Update},
+Listing~\ref{lst:app:toyrcu:RCU Read-Side Using Per-Thread Reference-Count Pair and Shared Update},
 we see that there is one global-variable access and no fewer than four
 accesses to thread-local variables.
 Given the relatively high cost of thread-local accesses on systems
@@ -1202,7 +1202,7 @@ thread-local accesses to one, as is done in the next section.
 \section{RCU Based on Free-Running Counter}
 \label{sec:app:toyrcu:RCU Based on Free-Running Counter}

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 DEFINE_SPINLOCK(rcu_gp_lock);
@@ -1214,10 +1214,10 @@ thread-local accesses to one, as is done in the next section.
 \centering
 \theverbbox
 \caption{Data for Free-Running Counter Using RCU}
-\label{fig:app:toyrcu:Data for Free-Running Counter Using RCU}
-\end{figure}
+\label{lst:app:toyrcu:Data for Free-Running Counter Using RCU}
+\end{listing}

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
  1 static void rcu_read_lock(void)
@@ -1257,14 +1257,14 @@ thread-local accesses to one, as is done in the next section.
 \centering
 \theverbbox
 \caption{Free-Running Counter Using RCU}
-\label{fig:app:toyrcu:Free-Running Counter Using RCU}
-\end{figure}
+\label{lst:app:toyrcu:Free-Running Counter Using RCU}
+\end{listing}

-Figure~\ref{fig:app:toyrcu:Free-Running Counter Using RCU}
+Listing~\ref{lst:app:toyrcu:Free-Running Counter Using RCU}
 (\path{rcu.h} and \path{rcu.c})
 shows an RCU implementation based on a single global free-running counter
 that takes on only even-numbered values, with data shown in
-Figure~\ref{fig:app:toyrcu:Data for Free-Running Counter Using RCU}.
+Listing~\ref{lst:app:toyrcu:Data for Free-Running Counter Using RCU}.
 The resulting \co{rcu_read_lock()} implementation is extremely
 straightforward.
 Lines~3 and~4 simply add one to the global free-running \co{rcu_gp_ctr}
@@ -1284,7 +1284,7 @@ of \co{synchronize_rcu()} will know to ignore it.
 \QuickQuiz{}
 	If any even value is sufficient to tell \co{synchronize_rcu()}
 	to ignore a given task, why don't lines~10 and~11 of
-	Figure~\ref{fig:app:toyrcu:Free-Running Counter Using RCU}
+	Listing~\ref{lst:app:toyrcu:Free-Running Counter Using RCU}
 	simply assign zero to \co{rcu_reader_gp}?
 \QuickQuizAnswer{
 	Assigning zero (or any other even-numbered constant)
@@ -1323,7 +1323,7 @@ destruction will not be reordered into the preceding loop.

 \QuickQuiz{}
 	Why are the memory barriers on lines~19 and~31 of
-	Figure~\ref{fig:app:toyrcu:Free-Running Counter Using RCU}
+	Listing~\ref{lst:app:toyrcu:Free-Running Counter Using RCU}
 	needed?
 	Aren't the memory barriers inherent in the locking
 	primitives on lines~20 and~30 sufficient?
@@ -1349,7 +1349,7 @@ such CPUs.
 	Couldn't the update-side batching optimization described in
 	Section~\ref{sec:app:toyrcu:Scalable Counter-Based RCU With Shared Grace Periods}
 	be applied to the implementation shown in
-	Figure~\ref{fig:app:toyrcu:Free-Running Counter Using RCU}?
+	Listing~\ref{lst:app:toyrcu:Free-Running Counter Using RCU}?
 \QuickQuizAnswer{
 	Indeed it could, with a few modifications.
 	This work is left as an exercise for the reader.
@@ -1360,7 +1360,7 @@ addition to the high update-side overhead noted earlier.
 First, it is no longer permissible to nest RCU read-side critical
 sections, a topic that is taken up in the next section.
 Second, if a reader is preempted at line~3 of
-Figure~\ref{fig:app:toyrcu:Free-Running Counter Using RCU} after fetching from
+Listing~\ref{lst:app:toyrcu:Free-Running Counter Using RCU} after fetching from
 \co{rcu_gp_ctr} but before storing to \co{rcu_reader_gp},
 and if the \co{rcu_gp_ctr} counter then runs through more than half
 but less than all of its possible values, then \co{synchronize_rcu()}
@@ -1371,7 +1371,7 @@ variables.

 \QuickQuiz{}
 	Is the possibility of readers being preempted in
-	lines~3-4 of Figure~\ref{fig:app:toyrcu:Free-Running Counter Using RCU}
+	lines~3-4 of Listing~\ref{lst:app:toyrcu:Free-Running Counter Using RCU}
 	a real problem, in other words, is there a real sequence
 	of events that could lead to failure?
 	If not, why not?
@@ -1392,7 +1392,7 @@ variables.
 \section{Nestable RCU Based on Free-Running Counter}
 \label{sec:app:toyrcu:Nestable RCU Based on Free-Running Counter}

-\begin{figure}[tb]
+\begin{listing}[tb]
 { \scriptsize
 \begin{verbbox}
   1 DEFINE_SPINLOCK(rcu_gp_lock);
@@ -1406,10 +1406,10 @@ variables.
 \centering
 \theverbbox
 \caption{Data for Nestable RCU Using a Free-Running Counter}
-\label{fig:app:toyrcu:Data for Nestable RCU Using a Free-Running Counter}
-\end{figure}
+\label{lst:app:toyrcu:Data for Nestable RCU Using a Free-Running Counter}
+\end{listing}

-\begin{figure}[tb]
+\begin{listing}[tb]
 { \scriptsize
 \begin{verbbox}
  1 static void rcu_read_lock(void)
@@ -1458,16 +1458,16 @@ variables.
 \centering
 \theverbbox
 \caption{Nestable RCU Using a Free-Running Counter}
-\label{fig:app:toyrcu:Nestable RCU Using a Free-Running Counter}
-\end{figure}
+\label{lst:app:toyrcu:Nestable RCU Using a Free-Running Counter}
+\end{listing}

-Figure~\ref{fig:app:toyrcu:Nestable RCU Using a Free-Running Counter}
+Listing~\ref{lst:app:toyrcu:Nestable RCU Using a Free-Running Counter}
 (\path{rcu_nest.h} and \path{rcu_nest.c})
 show an RCU implementation based on a single global free-running counter,
 but that permits nesting of RCU read-side critical sections.
 This nestability is accomplished by reserving the low-order bits of the
 global \co{rcu_gp_ctr} to count nesting, using the definitions shown in
-Figure~\ref{fig:app:toyrcu:Data for Nestable RCU Using a Free-Running Counter}.
+Listing~\ref{lst:app:toyrcu:Data for Nestable RCU Using a Free-Running Counter}.
 This is a generalization of the scheme in
 Section~\ref{sec:app:toyrcu:RCU Based on Free-Running Counter},
 which can be thought of as having a single low-order bit reserved
@@ -1573,7 +1573,7 @@ overhead.

 \QuickQuiz{}
 	Given the algorithm shown in
-	Figure~\ref{fig:app:toyrcu:Nestable RCU Using a Free-Running Counter},
+	Listing~\ref{lst:app:toyrcu:Nestable RCU Using a Free-Running Counter},
 	how could you double the time required to overflow the global
 	\co{rcu_gp_ctr}?
 \QuickQuizAnswer{
@@ -1585,7 +1585,7 @@ overhead.

 \QuickQuiz{}
 	Again, given the algorithm shown in
-	Figure~\ref{fig:app:toyrcu:Nestable RCU Using a Free-Running Counter},
+	Listing~\ref{lst:app:toyrcu:Nestable RCU Using a Free-Running Counter},
 	is counter overflow fatal?
 	Why or why not?
 	If it is fatal, what can be done to fix it?
@@ -1675,7 +1675,7 @@ overhead.
 \section{RCU Based on Quiescent States}
 \label{sec:app:toyrcu:RCU Based on Quiescent States}

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 DEFINE_SPINLOCK(rcu_gp_lock);
@@ -1686,10 +1686,10 @@ overhead.
 \centering
 \theverbbox
 \caption{Data for Quiescent-State-Based RCU}
-\label{fig:app:toyrcu:Data for Quiescent-State-Based RCU}
-\end{figure}
+\label{lst:app:toyrcu:Data for Quiescent-State-Based RCU}
+\end{listing}

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 static void rcu_read_lock(void)
@@ -1725,15 +1725,15 @@ overhead.
 \centering
 \theverbbox
 \caption{Quiescent-State-Based RCU Read Side}
-\label{fig:app:toyrcu:Quiescent-State-Based RCU Read Side}
-\end{figure}
+\label{lst:app:toyrcu:Quiescent-State-Based RCU Read Side}
+\end{listing}

-Figure~\ref{fig:app:toyrcu:Quiescent-State-Based RCU Read Side}
+Listing~\ref{lst:app:toyrcu:Quiescent-State-Based RCU Read Side}
 (\path{rcu_qs.h})
 shows the read-side primitives used to construct a user-level
 implementation of RCU based on quiescent states, with the data shown in
-Figure~\ref{fig:app:toyrcu:Data for Quiescent-State-Based RCU}.
-As can be seen from lines~1-7 in the figure, the \co{rcu_read_lock()}
+Listing~\ref{lst:app:toyrcu:Data for Quiescent-State-Based RCU}.
+As can be seen from lines~1-7 in the listing, the \co{rcu_read_lock()}
 and \co{rcu_read_unlock()} primitives do nothing, and can in fact
 be expected to be inlined and optimized away, as they are in
 server builds of the Linux kernel.
@@ -1741,7 +1741,7 @@ This is due to the fact that quiescent-state-based RCU implementations
 \emph{approximate} the extents of RCU read-side critical sections
 using the aforementioned quiescent states.
 Each of these quiescent states contains a call to
-\co{rcu_quiescent_state()}, which is shown from lines~9-15 in the figure.
+\co{rcu_quiescent_state()}, which is shown from lines~9-15 in the listing.
 Threads entering extended quiescent states (for example, when blocking)
 may instead call \co{rcu_thread_offline()} (lines~17-23) when entering
 an extended quiescent state and then call
@@ -1751,7 +1751,7 @@ and \co{rcu_thread_offline()} is analogous to \co{rcu_read_unlock()}.
 In addition, \co{rcu_quiescent_state()} can be thought of as a
 \co{rcu_thread_online()} immediately followed by a
 \co{rcu_thread_offline()}.\footnote{
-	Although the code in the figure is consistent with
+	Although the code in the listing is consistent with
 	\co{rcu_quiescent_state()}
 	being the same as \co{rcu_thread_online()} immediately followed by
 	\co{rcu_thread_offline()}, this relationship is obscured by
@@ -1779,7 +1779,7 @@ re-ordered with the lines~12-13.

 \QuickQuiz{}
 	Doesn't the additional memory barrier shown on line~14 of
-	Figure~\ref{fig:app:toyrcu:Quiescent-State-Based RCU Read Side}
+	Listing~\ref{lst:app:toyrcu:Quiescent-State-Based RCU Read Side}
 	greatly increase the overhead of \co{rcu_quiescent_state}?
 \QuickQuizAnswer{
 	Indeed it does!
@@ -1812,7 +1812,7 @@ ignore this thread.

 \QuickQuiz{}
 	Why are the two memory barriers on lines~19 and 22 of
-	Figure~\ref{fig:app:toyrcu:Quiescent-State-Based RCU Read Side}
+	Listing~\ref{lst:app:toyrcu:Quiescent-State-Based RCU Read Side}
 	needed?
 \QuickQuizAnswer{
 	The memory barrier on line~19 prevents any RCU read-side
@@ -1828,7 +1828,7 @@ The \co{rcu_thread_online()} function simply invokes
 \co{rcu_quiescent_state()}, thus marking the end of the extended
 quiescent state.

-\begin{figure}[tbp]
+\begin{listing}[tbp]
 { \scriptsize
 \begin{verbbox}
   1 void synchronize_rcu(void)
@@ -1854,10 +1854,10 @@ quiescent state.
 \centering
 \theverbbox
 \caption{RCU Update Side Using Quiescent States}
-\label{fig:app:toyrcu:RCU Update Side Using Quiescent States}
-\end{figure}
+\label{lst:app:toyrcu:RCU Update Side Using Quiescent States}
+\end{listing}

-Figure~\ref{fig:app:toyrcu:RCU Update Side Using Quiescent States}
+Listing~\ref{lst:app:toyrcu:RCU Update Side Using Quiescent States}
 (\path{rcu_qs.c})
 shows the implementation of \co{synchronize_rcu()}, which is
 quite similar to that of the preceding sections.
@@ -1895,8 +1895,8 @@ certain types of library functions.
 	Why would the fact that the code is in a library make
 	any difference for how easy it is to use the RCU
 	implementation shown in
-	Figures~\ref{fig:app:toyrcu:Quiescent-State-Based RCU Read Side} and
-	\ref{fig:app:toyrcu:RCU Update Side Using Quiescent States}?
+	Listings~\ref{lst:app:toyrcu:Quiescent-State-Based RCU Read Side} and
+	\ref{lst:app:toyrcu:RCU Update Side Using Quiescent States}?
 \QuickQuizAnswer{
 	A library function has absolutely no control over the caller,
 	and thus cannot force the caller to invoke \co{rcu_quiescent_state()}
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH 1/2] Convert code snippets to 'listing' env (together, advsync, rt, future)
  2017-10-13 22:57 [PATCH 1/2] Convert code snippets to 'listing' env (together, advsync, rt, future) Akira Yokosawa
  2017-10-13 22:58 ` [PATCH 2/2] Convert code snippets to 'listing' env (appendix) Akira Yokosawa
@ 2017-10-13 23:37 ` Paul E. McKenney
  1 sibling, 0 replies; 3+ messages in thread
From: Paul E. McKenney @ 2017-10-13 23:37 UTC (permalink / raw)
  To: Akira Yokosawa; +Cc: perfbook

On Sat, Oct 14, 2017 at 07:57:24AM +0900, Akira Yokosawa wrote:
> >From c8849b06b1563d9d82dfaf0f120932c4998c8b03 Mon Sep 17 00:00:00 2001
> From: Akira Yokosawa <akiyks@gmail.com>
> Date: Sat, 14 Oct 2017 00:12:52 +0900
> Subject: [PATCH 1/2] Convert code snippets to 'listing' env (together, advsync, rt, future)
> 
> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

These also look good, applied and pushed, thank you!

							Thanx, Paul

> ---
>  advsync/advsync.tex   | 10 ++++-----
>  future/htm.tex        | 16 +++++++--------
>  rt/rt.tex             | 34 +++++++++++++++----------------
>  together/applyrcu.tex | 56 +++++++++++++++++++++++++--------------------------
>  together/refcnt.tex   | 38 +++++++++++++++++-----------------
>  5 files changed, 77 insertions(+), 77 deletions(-)
> 
> diff --git a/advsync/advsync.tex b/advsync/advsync.tex
> index adf1dc9..3b04aff 100644
> --- a/advsync/advsync.tex
> +++ b/advsync/advsync.tex
> @@ -188,7 +188,7 @@ algorithm as wait-free.
>  This algorithm is probably the most heavily used NBS algorithm in
>  the Linux kernel.
> 
> -\begin{figure}[tbp]
> +\begin{listing}[tbp]
>  { \scriptsize
>  \begin{verbbox}
>   1 static inline bool
> @@ -217,14 +217,14 @@ the Linux kernel.
>  \centering
>  \theverbbox
>  \caption{NBS Enqueue Algorithm}
> -\label{fig:count:NBS Enqueue Algorithm}
> -\end{figure}
> +\label{lst:count:NBS Enqueue Algorithm}
> +\end{listing}
> 
>  Another common NBS algorithm is the atomic queue where elements are
>  enqueued using an atomic exchange instruction~\cite{MagedMichael1993JPDC},
>  followed by a store into the \co{->next} pointer of the new element's
>  predecessor, as shown in
> -Figure~\ref{fig:count:NBS Enqueue Algorithm},
> +Listing~\ref{lst:count:NBS Enqueue Algorithm},
>  which shows the userspace-RCU library
>  implementation~\cite{MathieuDesnoyers2009URCU}.
>  Line~9 updates the tail pointer to reference the new element while
> @@ -240,7 +240,7 @@ Although mutual exclusion is required to dequeue a single element
>  removal of the entire contents of the queue.
>  What is not possible is to dequeue any given element in a non-blocking
>  manner: The enqueuer might have failed between lines~9 and~10 of the
> -figure, so that the element in question is only partially enqueued.
> +listing, so that the element in question is only partially enqueued.
>  This results in a half-NBS algorithm where enqueues are NBS but
>  dequeues are blocking.
>  This algorithm is nevertheless used in practice, in part because
> diff --git a/future/htm.tex b/future/htm.tex
> index 80d3e1e..7c9f610 100644
> --- a/future/htm.tex
> +++ b/future/htm.tex
> @@ -733,7 +733,7 @@ Therefore, the medium-priority threads are in effect blocking the
>  high-priority process, which is the rationale for the name ``priority
>  inversion.''
> 
> -\begin{figure}[tbp]
> +\begin{listing}[tbp]
>  { \scriptsize
>  \begin{verbbox}
>    1 void boostee(void)
> @@ -765,8 +765,8 @@ inversion.''
>  \centering
>  \theverbbox
>  \caption{Exploiting Priority Boosting}
> -\label{fig:future:Exploiting Priority Boosting}
> -\end{figure}
> +\label{lst:future:Exploiting Priority Boosting}
> +\end{listing}
> 
>  One way to avoid priority inversion is \emph{priority inheritance},
>  in which a high-priority thread blocked on a lock temporarily donates
> @@ -774,10 +774,10 @@ its priority to the lock's holder, which is also called \emph{priority
>  boosting}.
>  However, priority boosting can be used for things other than avoiding
>  priority inversion, as shown in
> -Figure~\ref{fig:future:Exploiting Priority Boosting}.
> -Lines~1-12 of this figure show a low-priority process that must
> +Listing~\ref{lst:future:Exploiting Priority Boosting}.
> +Lines~1-12 of this listing show a low-priority process that must
>  nevertheless run every millisecond or so, while lines~14-24 of
> -this same figure show a high-priority process that uses priority
> +this same listing show a high-priority process that uses priority
>  boosting to ensure that \co{boostee()} runs periodically as needed.
> 
>  The \co{boostee()} function arranges this by always holding one of
> @@ -786,7 +786,7 @@ the two \co{boost_lock[]} locks, so that lines~20-21 of
> 
>  \QuickQuiz{}
>  	But the \co{boostee()} function in
> -	Figure~\ref{fig:future:Exploiting Priority Boosting}
> +	Listing~\ref{lst:future:Exploiting Priority Boosting}
>  	alternatively acquires its locks in reverse order!
>  	Won't this result in deadlock?
>  \QuickQuizAnswer{
> @@ -816,7 +816,7 @@ this thread might never again get a chance to run.
> 
>  And if the \co{boostee()} thread is not holding the lock, then
>  the \co{booster()} thread's empty critical section on lines~20 and~21 of
> -Figure~\ref{fig:future:Exploiting Priority Boosting}
> +Listing~\ref{lst:future:Exploiting Priority Boosting}
>  will become an empty transaction that has no effect, so that
>  \co{boostee()} never runs.
>  This example illustrates some of the subtle consequences of
> diff --git a/rt/rt.tex b/rt/rt.tex
> index f1c0ae1..d9c32aa 100644
> --- a/rt/rt.tex
> +++ b/rt/rt.tex
> @@ -1150,7 +1150,7 @@ sections~\cite{DinakarGuniguntala2008IBMSysJ}.
>  Otherwise, long RCU read-side critical sections would result in
>  excessive real-time latencies.
> 
> -\begin{figure}[tb]
> +\begin{listing}[tb]
>  { \scriptsize
>  \begin{verbbox}
>   1 void __rcu_read_lock(void)
> @@ -1180,8 +1180,8 @@ excessive real-time latencies.
>  \centering
>  \theverbbox
>  \caption{Preemptible Linux-Kernel RCU}
> -\label{fig:rt:Preemptible Linux-Kernel RCU}
> -\end{figure}
> +\label{lst:rt:Preemptible Linux-Kernel RCU}
> +\end{listing}
> 
>  A preemptible RCU implementation was therefore added to the Linux kernel.
>  This implementation avoids the need to individually track the state of
> @@ -1194,7 +1194,7 @@ of the current grace period and
>  while in one of those pre-existing critical sections have removed
>  themselves from their lists.
>  A simplified version of this implementation is shown in
> -Figure~\ref{fig:rt:Preemptible Linux-Kernel RCU}.
> +Listing~\ref{lst:rt:Preemptible Linux-Kernel RCU}.
>  The \co{__rcu_read_lock()} function spans lines~1-5 and
>  the \co{__rcu_read_unlock()} function spans lines~7-22.
> 
> @@ -1241,7 +1241,7 @@ count on line~20.
>  \QuickQuiz{}
>  	Suppose that preemption occurs just after the load from
>  	\co{t->rcu_read_unlock_special.s} on line~17 of
> -	Figure~\ref{fig:rt:Preemptible Linux-Kernel RCU}.
> +	Listing~\ref{lst:rt:Preemptible Linux-Kernel RCU}.
>  	Mightn't that result in the task failing to invoke
>  	\co{rcu_read_unlock_special()}, thus failing to remove itself
>  	from the list of tasks blocking the current grace period,
> @@ -1460,7 +1460,7 @@ real-time use.
>  OSADL runs long-term tests of systems, so referring to their
>  website (\url{http://osadl.org/}) can be helpful.
> 
> -\begin{figure}[tb]
> +\begin{listing}[tb]
>  { \scriptsize
>  \begin{verbbox}
>   1 cd /sys/kernel/debug/tracing
> @@ -1473,8 +1473,8 @@ website (\url{http://osadl.org/}) can be helpful.
>  \centering
>  \theverbbox
>  \caption{Locating Sources of OS Jitter}
> -\label{fig:rt:Locating Sources of OS Jitter}
> -\end{figure}
> +\label{lst:rt:Locating Sources of OS Jitter}
> +\end{listing}
> 
>  Unfortunately, this list of OS-jitter sources can never be complete,
>  as it will change with each new version of the kernel.
> @@ -1482,7 +1482,7 @@ This makes it necessary to be able to track down additional sources
>  of OS jitter.
>  Given a CPU $N$ running a CPU-bound usermode thread, the
>  commands shown in
> -Figure~\ref{fig:rt:Locating Sources of OS Jitter}
> +Listing~\ref{lst:rt:Locating Sources of OS Jitter}
>  will produce a list of all the times that this CPU entered the kernel.
>  Of course, the \co{N} on line~5 must be replaced with the
>  number of the CPU in question, and the \co{1} on line~2 may be increased
> @@ -1704,7 +1704,7 @@ or about 9,000 degrees of rotation per second, which translates to
>  We therefore need to schedule the fuel injection to within a time
>  interval of about 100 microseconds.
> 
> -\begin{figure}[tb]
> +\begin{listing}[tb]
>  { \scriptsize
>  \begin{verbbox}
>   1 if (clock_gettime(CLOCK_REALTIME, &timestart) != 0) {
> @@ -1724,15 +1724,15 @@ interval of about 100 microseconds.
>  \centering
>  \theverbbox
>  \caption{Timed-Wait Test Program}
> -\label{fig:rt:Timed-Wait Test Program}
> -\end{figure}
> +\label{lst:rt:Timed-Wait Test Program}
> +\end{listing}
> 
>  Suppose that a timed wait was to be used to initiate fuel injection,
>  although if you are building an engine, I hope you supply a rotation
>  sensor.
>  We need to test the timed-wait functionality, perhaps using the test program
>  shown in
> -Figure~\ref{fig:rt:Timed-Wait Test Program}.
> +Listing~\ref{lst:rt:Timed-Wait Test Program}.
>  Unfortunately, if we run this program, we can get unacceptable timer
>  jitter, even in a -rt kernel.
> 
> @@ -1789,7 +1789,7 @@ in fields imaginatively named \co{a}, \co{b}, and \co{c}.
>  Otherwise, \co{cur_cal} points to a dynamically allocated
>  structure providing the current calibration values.
> 
> -\begin{figure}[tb]
> +\begin{listing}[tb]
>  { \scriptsize
>  \begin{verbbox}
>   1 struct calibration {
> @@ -1832,10 +1832,10 @@ structure providing the current calibration values.
>  \centering
>  \theverbbox
>  \caption{Real-Time Calibration Using RCU}
> -\label{fig:rt:Real-Time Calibration Using RCU}
> -\end{figure}
> +\label{lst:rt:Real-Time Calibration Using RCU}
> +\end{listing}
> 
> -Figure~\ref{fig:rt:Real-Time Calibration Using RCU}
> +Listing~\ref{lst:rt:Real-Time Calibration Using RCU}
>  shows how RCU can be used to solve this problem.
>  Lookups are deterministic, as shown in \co{calc_control()}
>  on lines~9-15, consistent with real-time requirements.
> diff --git a/together/applyrcu.tex b/together/applyrcu.tex
> index d477ab5..5e7efff 100644
> --- a/together/applyrcu.tex
> +++ b/together/applyrcu.tex
> @@ -108,7 +108,7 @@ held constant, ensuring that \co{read_count()} sees consistent data.
> 
>  \subsubsection{Implementation}
> 
> -\begin{figure}[bp]
> +\begin{listing}[bp]
>  { \scriptsize
>  \begin{verbbox}
>    1 struct countarray {
> @@ -186,11 +186,11 @@ held constant, ensuring that \co{read_count()} sees consistent data.
>  \centering
>  \theverbbox
>  \caption{RCU and Per-Thread Statistical Counters}
> -\label{fig:together:RCU and Per-Thread Statistical Counters}
> -\end{figure}
> +\label{lst:together:RCU and Per-Thread Statistical Counters}
> +\end{listing}
> 
>  Lines~1-4 of
> -Figure~\ref{fig:together:RCU and Per-Thread Statistical Counters}
> +Listing~\ref{lst:together:RCU and Per-Thread Statistical Counters}
>  show the \co{countarray} structure, which contains a
>  \co{->total} field for the count from previously exited threads,
>  and a \co{counterp[]} array of pointers to the per-thread
> @@ -235,7 +235,7 @@ Line~43 picks up the current thread's index, line~45 acquires
>  \QuickQuiz{}
>  	Hey!!!
>  	Line~46 of
> -	Figure~\ref{fig:together:RCU and Per-Thread Statistical Counters}
> +	Listing~\ref{lst:together:RCU and Per-Thread Statistical Counters}
>  	modifies a value in a pre-existing \co{countarray} structure!
>  	Didn't you say that this structure, once made available to
>  	\co{read_count()}, remained constant???
> @@ -278,7 +278,7 @@ Line~69 can then safely free the old \co{countarray} structure.
> 
>  \QuickQuiz{}
>  	Wow!
> -	Figure~\ref{fig:together:RCU and Per-Thread Statistical Counters}
> +	Listing~\ref{lst:together:RCU and Per-Thread Statistical Counters}
>  	contains 69 lines of code, compared to only 42 in
>  	Listing~\ref{lst:count:Per-Thread Statistical Counters}.
>  	Is this extra complexity really worth it?
> @@ -303,7 +303,7 @@ Line~69 can then safely free the old \co{countarray} structure.
>  	Listing~\ref{lst:count:Per-Thread Statistical Counters}, but
>  	with all the scalability and performance benefits of the
>  	implementation shown in
> -	Figure~\ref{fig:together:RCU and Per-Thread Statistical Counters}!
> +	Listing~\ref{lst:together:RCU and Per-Thread Statistical Counters}!
>  } \QuickQuizEnd
> 
>  Use of RCU enables exiting threads to wait until other threads are
> @@ -389,7 +389,7 @@ tradeoff.
>  \subsection{Array and Length}
>  \label{sec:together:Array and Length}
> 
> -\begin{figure}[tbp]
> +\begin{listing}[tbp]
>  { \scriptsize
>  \begin{verbbox}
>   1 struct foo {
> @@ -401,11 +401,11 @@ tradeoff.
>  \centering
>  \theverbbox
>  \caption{RCU-Protected Variable-Length Array}
> -\label{fig:together:RCU-Protected Variable-Length Array}
> -\end{figure}
> +\label{lst:together:RCU-Protected Variable-Length Array}
> +\end{listing}
> 
>  Suppose we have an RCU-protected variable-length array, as shown in
> -Figure~\ref{fig:together:RCU-Protected Variable-Length Array}.
> +Listing~\ref{lst:together:RCU-Protected Variable-Length Array}.
>  The length of the array \co{->a[]} can change dynamically, and at any
>  given time, its length is given by the field \co{->length}.
>  Of course, this introduces the following race condition:
> @@ -429,7 +429,7 @@ covered in Chapter~\ref{chp:memorder:Memory Ordering}.
>  This works, but incurs read-side overhead and, perhaps worse, requires
>  use of explicit memory barriers.
> 
> -\begin{figure}[tbp]
> +\begin{listing}[tbp]
>  { \scriptsize
>  \begin{verbbox}
>   1 struct foo_a {
> @@ -445,12 +445,12 @@ use of explicit memory barriers.
>  \centering
>  \theverbbox
>  \caption{Improved RCU-Protected Variable-Length Array}
> -\label{fig:together:Improved RCU-Protected Variable-Length Array}
> -\end{figure}
> +\label{lst:together:Improved RCU-Protected Variable-Length Array}
> +\end{listing}
> 
>  A better approach is to put the value and the array into the same structure,
>  as shown in
> -Figure~\ref{fig:together:Improved RCU-Protected Variable-Length Array}.
> +Listing~\ref{lst:together:Improved RCU-Protected Variable-Length Array}.
>  Allocating a new array (\co{foo_a} structure) then automatically provides
>  a new place for the array length.
>  This means that if any CPU picks up a reference to \co{->fa}, it is
> @@ -480,7 +480,7 @@ A more general version of this approach is presented in the next section.
>  \subsection{Correlated Fields}
>  \label{sec:together:Correlated Fields}
> 
> -\begin{figure}[tbp]
> +\begin{listing}[tbp]
>  { \scriptsize
>  \begin{verbbox}
>   1 struct animal {
> @@ -496,12 +496,12 @@ A more general version of this approach is presented in the next section.
>  \centering
>  \theverbbox
>  \caption{Uncorrelated Measurement Fields}
> -\label{fig:together:Uncorrelated Measurement Fields}
> -\end{figure}
> +\label{lst:together:Uncorrelated Measurement Fields}
> +\end{listing}
> 
>  Suppose that each of Sch\"odinger's animals is represented by the
>  data element shown in
> -Figure~\ref{fig:together:Uncorrelated Measurement Fields}.
> +Listing~\ref{lst:together:Uncorrelated Measurement Fields}.
>  The \co{meas_1}, \co{meas_2}, and \co{meas_3} fields are a set
>  of correlated measurements that are updated periodically.
>  It is critically important that readers see these three values from
> @@ -521,7 +521,7 @@ measurement values, but it requires copying a large structure due
>  to the \co{->photo[]} field.
>  This copying might incur unacceptably large overhead.
> 
> -\begin{figure}[tbp]
> +\begin{listing}[tbp]
>  { \scriptsize
>  \begin{verbbox}
>   1 struct measurement {
> @@ -541,11 +541,11 @@ This copying might incur unacceptably large overhead.
>  \centering
>  \theverbbox
>  \caption{Correlated Measurement Fields}
> -\label{fig:together:Correlated Measurement Fields}
> -\end{figure}
> +\label{lst:together:Correlated Measurement Fields}
> +\end{listing}
> 
>  Another approach is to insert a level of indirection, as shown in
> -Figure~\ref{fig:together:Correlated Measurement Fields}.
> +Listing~\ref{lst:together:Correlated Measurement Fields}.
>  When a new measurement is taken, a new \co{measurement} structure
>  is allocated, filled in with the measurements, and the \co{animal}
>  structure's \co{->mp} field is updated to point to this new
> @@ -555,13 +555,13 @@ can be freed.
> 
>  \QuickQuiz{}
>  	But cant't the approach shown in
> -	Figure~\ref{fig:together:Correlated Measurement Fields}
> +	Listing~\ref{lst:together:Correlated Measurement Fields}
>  	result in extra cache misses, in turn resulting in additional
>  	read-side overhead?
>  \QuickQuizAnswer{
>  	Indeed it can.
> 
> -\begin{figure}[tbp]
> +\begin{listing}[tbp]
>  { \scriptsize
>  \begin{verbbox}
>   1 struct measurement {
> @@ -582,11 +582,11 @@ can be freed.
>  \centering
>  \theverbbox
>  \caption{Localized Correlated Measurement Fields}
> -\label{fig:together:Localized Correlated Measurement Fields}
> -\end{figure}
> +\label{lst:together:Localized Correlated Measurement Fields}
> +\end{listing}
> 
>  	One way to avoid this cache-miss overhead is shown in
> -	Figure~\ref{fig:together:Localized Correlated Measurement Fields}:
> +	Listing~\ref{lst:together:Localized Correlated Measurement Fields}:
>  	Simply embed an instance of a \co{measurement} structure
>  	named \co{meas}
>  	into the \co{animal} structure, and point the \co{->mp}
> diff --git a/together/refcnt.tex b/together/refcnt.tex
> index 626c375..8a7fa1d 100644
> --- a/together/refcnt.tex
> +++ b/together/refcnt.tex
> @@ -149,12 +149,12 @@ of compiler optimizations.
>  This is the method of choice when the lock is required to protect
>  other operations in addition to the reference count, but where
>  a reference to the object must be held after the lock is released.
> -Figure~\ref{fig:together:Simple Reference-Count API} shows a simple
> +Listing~\ref{lst:together:Simple Reference-Count API} shows a simple
>  API that might be used to implement simple non-atomic reference
>  counting---although simple reference counting is almost always
>  open-coded instead.
> 
> -\begin{figure}[tbp]
> +\begin{listing}[tbp]
>  { \scriptsize
>  \begin{verbbox}
>    1 struct sref {
> @@ -188,8 +188,8 @@ open-coded instead.
>  \centering
>  \theverbbox
>  \caption{Simple Reference-Count API}
> -\label{fig:together:Simple Reference-Count API}
> -\end{figure}
> +\label{lst:together:Simple Reference-Count API}
> +\end{listing}
> 
>  \subsubsection{Atomic Counting}
>  \label{sec:together:Atomic Counting}
> @@ -203,7 +203,7 @@ Any CPU that hands the object off must first acquire a new reference
>  on behalf of the recipient object.
>  In the Linux kernel, the \co{kref} primitives are used to implement
>  this style of reference counting, as shown in
> -Figure~\ref{fig:together:Linux Kernel kref API}.
> +Listing~\ref{lst:together:Linux Kernel kref API}.
> 
>  Atomic counting is required
>  because locking is not used to protect all reference-count operations,
> @@ -243,10 +243,10 @@ RCU read-side critical sections.
>  	there cannot possibly be any CPU that is permitted
>  	to acquire a new reference.
>  	This same fact allows the non-atomic check in line~22
> -	of Figure~\ref{fig:together:Linux Kernel kref API}.
> +	of Listing~\ref{lst:together:Linux Kernel kref API}.
>  } \QuickQuizEnd
> 
> -\begin{figure}[tbp]
> +\begin{listing}[tbp]
>  { \scriptsize
>  \begin{verbbox}
>    1 struct kref {
> @@ -282,12 +282,12 @@ RCU read-side critical sections.
>  \centering
>  \theverbbox
>  \caption{Linux Kernel \tco{kref} API}
> -\label{fig:together:Linux Kernel kref API}
> -\end{figure}
> +\label{lst:together:Linux Kernel kref API}
> +\end{listing}
> 
>  The \co{kref} structure itself, consisting of a single atomic
>  data item, is shown in lines~1-3 of
> -Figure~\ref{fig:together:Linux Kernel kref API}.
> +Listing~\ref{lst:together:Linux Kernel kref API}.
>  The \co{kref_init()} function on lines~5-8 initializes the counter
>  to the value ``1''.
>  Note that the \co{atomic_set()} primitive is a simple
> @@ -314,7 +314,7 @@ Otherwise, \co{kref_sub()} returns zero, informing the caller that
>  \QuickQuiz{}
>  	Suppose that just after the \co{atomic_sub_and_test()}
>  	on line~22 of
> -	Figure~\ref{fig:together:Linux Kernel kref API} is invoked,
> +	Listing~\ref{lst:together:Linux Kernel kref API} is invoked,
>  	that some other CPU invokes \co{kref_get()}.
>  	Doesn't this result in that other CPU now having an illegal
>  	reference to a released object?
> @@ -357,9 +357,9 @@ layer to track the destination caches that are used in packet routing.
>  The actual implementation is quite a bit more involved; this section
>  focuses on the aspects of \co{struct dst_entry} reference-count
>  handling that matches this use case,
> -shown in Figure~\ref{fig:together:Linux Kernel dst-clone API}.
> +shown in Listing~\ref{lst:together:Linux Kernel dst-clone API}.
> 
> -\begin{figure}[tbp]
> +\begin{listing}[tbp]
>  { \scriptsize
>  \begin{verbbox}
>    1 static inline
> @@ -384,8 +384,8 @@ shown in Figure~\ref{fig:together:Linux Kernel dst-clone API}.
>  \centering
>  \theverbbox
>  \caption{Linux Kernel \tco{dst_clone} API}
> -\label{fig:together:Linux Kernel dst-clone API}
> -\end{figure}
> +\label{lst:together:Linux Kernel dst-clone API}
> +\end{listing}
> 
>  The \co{dst_clone()} primitive may be used if the caller
>  already has a reference to the specified \co{dst_entry},
> @@ -443,9 +443,9 @@ as shown below.
>  The Linux kernel's \co{fget()} and \co{fput()} primitives
>  use this style of reference counting.
>  Simplified versions of these functions are shown in
> -Figure~\ref{fig:together:Linux Kernel fget/fput API}.
> +Listing~\ref{lst:together:Linux Kernel fget/fput API}.
> 
> -\begin{figure}[tbp]
> +\begin{listing}[tbp]
>  { \fontsize{6.5pt}{7.5pt}\selectfont
>  \begin{verbbox}
>    1 struct file *fget(unsigned int fd)
> @@ -494,8 +494,8 @@ Figure~\ref{fig:together:Linux Kernel fget/fput API}.
>  \centering
>  \theverbbox
>  \caption{Linux Kernel \tco{fget}/\tco{fput} API}
> -\label{fig:together:Linux Kernel fget/fput API}
> -\end{figure}
> +\label{lst:together:Linux Kernel fget/fput API}
> +\end{listing}
> 
>  Line~4 of \co{fget()} fetches the pointer to the current
>  process's file-descriptor table, which might well be shared
> -- 
> 2.7.4
> 


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-10-13 23:37 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-13 22:57 [PATCH 1/2] Convert code snippets to 'listing' env (together, advsync, rt, future) Akira Yokosawa
2017-10-13 22:58 ` [PATCH 2/2] Convert code snippets to 'listing' env (appendix) Akira Yokosawa
2017-10-13 23:37 ` [PATCH 1/2] Convert code snippets to 'listing' env (together, advsync, rt, future) Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox