public inbox for linux-kernel@vger.kernel.org
* [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock
@ 2025-01-07 13:59 Kumar Kartikeya Dwivedi
  2025-01-07 13:59 ` [PATCH bpf-next v1 01/22] locking: Move MCS struct definition to public header Kumar Kartikeya Dwivedi
                   ` (22 more replies)
  0 siblings, 23 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

This patch set introduces Resilient Queued Spin Lock (or rqspinlock with
res_spin_lock() and res_spin_unlock() APIs).

This is a qspinlock variant which recovers the kernel from a stalled
state when the lock acquisition path cannot make forward progress. This
can occur when a lock acquisition attempt enters a deadlock situation
(e.g. AA, or ABBA), or more generally, when the owner of the lock (which
we’re trying to acquire) isn’t making forward progress.

The cover letter provides an overview of the motivation, design, and
alternative approaches. We then provide evaluation numbers showcasing
that while rqspinlock incurs overhead, the performance of rqspinlock
approaches that of the normal qspinlock used by the kernel.

The evaluations for rqspinlock were performed by replacing the default
qspinlock implementation with it and booting the kernel to run the
experiments. Support for locktorture is also included with numbers in
this series.

The cover letter's design section provides an overview of the
algorithmic approach. A technical document describing the implementation
in more detail is available here:
https://github.com/kkdwivedi/rqspinlock/blob/main/rqspinlock.pdf

We have a WIP TLA+ proof of liveness and mutual exclusion for rqspinlock
built on top of the qspinlock TLA+ proof from Catalin Marinas [3]. We
will share more details and the links in the near future.

Motivation
----------

In regular kernel code, lock usage is assumed to be correct, so that
deadlocks and stalls are avoided by construction; the same is not true
for BPF programs. Users write normal C code, and the in-kernel eBPF
runtime ensures the safety of the kernel by rejecting unsafe programs.
Users can load programs that use locks in an improper fashion, which
may cause deadlocks when these programs run inside the kernel. The
verifier is responsible for rejecting such programs from being loaded
into the kernel.

Until now, the eBPF verifier ensured deadlock safety by permitting only
one lock acquisition at a time, and by preventing any functions from
being called within the critical section. Additionally, only a few
restricted program types are allowed to use spin locks. As the usage of
eBPF grows (e.g. with sched_ext) beyond its conventional application in
networking, tracing, and security, the limitations on locking are
becoming a bottleneck for users.

The rqspinlock implementation allows us to permit more flexible locking
patterns in BPF programs, without limiting them to the subset that can
be proven safe statically (which is fairly small, and requires complex
static analysis), while ensuring that the kernel will recover in case we
encounter a locking violation at runtime. We make a tradeoff here by
accepting programs that may potentially have deadlocks, and recover the
kernel quickly at runtime to ensure availability.
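As an illustration of the calling convention this enables, here is a toy
userspace model. The error-returning res_spin_lock() signature is an
assumption made for this sketch, and only AA re-acquisition on the same
"CPU" is modelled; this is not the kernel implementation from this
series:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the real rqspinlock: a single flag models re-entrant
 * (AA) acquisition by the same CPU. */
struct rqspinlock {
	bool held_here;
};

/* Unlike spin_lock(), acquisition can fail: the caller gets an error
 * instead of a stalled CPU. */
static int res_spin_lock(struct rqspinlock *lock)
{
	if (lock->held_here)
		return -EDEADLK;   /* AA deadlock detected */
	lock->held_here = true;
	return 0;
}

static void res_spin_unlock(struct rqspinlock *lock)
{
	lock->held_here = false;
}

static int update_shared_state(struct rqspinlock *lock)
{
	int ret = res_spin_lock(lock);

	if (ret)
		return ret;        /* deadlock or timeout: bail out */
	/* ... critical section ... */
	res_spin_unlock(lock);
	return 0;
}
```

The point of the sketch is the shape of the API: callers must check the
return value, and a detected violation surfaces as an error rather than
a hung CPU.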

Additionally, eBPF programs attached to different parts of the kernel
can introduce new control flow into the kernel, which increases the
likelihood of deadlocks in code not written to handle reentrancy. There
have been multiple syzbot reports surfacing deadlocks in internal kernel
code due to the diverse ways in which eBPF programs can be attached to
different parts of the kernel.  By switching the BPF subsystem’s lock
usage to rqspinlock, all of these issues can be mitigated at runtime.

This spin lock implementation makes BPF maps safer and allows removing
mechanisms that have fallen short in assuring safety when programs nest
in arbitrary ways in the same context or across different contexts. The
red diffs due to patches 16-18 demonstrate this simplification.

>  kernel/bpf/hashtab.c         | 102 ++++++++++++++++++++++++++++++++--------------------------...
>  kernel/bpf/lpm_trie.c        |  25 ++++++++++++++-----------
>  kernel/bpf/percpu_freelist.c | 113 +++++++++++++++++++++++++---------------------------------...
>  kernel/bpf/percpu_freelist.h |   4 ++--
>  4 files changed, 73 insertions(+), 171 deletions(-)

Design
------

Deadlocks mostly manifest as stalls in the waiting loops of the
qspinlock slow path. Thus, using stalls as a signal for deadlocks avoids
introducing cost to the normal fast path, and ensures bounded
termination of the waiting loop. Our recovery algorithm is focused on
terminating the waiting loops of the qspinlock algorithm when it gets
stuck, and implementing bespoke recovery procedures for each class of
waiter to restore the lock to a usable state. Deadlock detection is the
main mechanism used to provide faster recovery, with the timeout
mechanism acting as a final line of defense.

Deadlock Detection
~~~~~~~~~~~~~~~~~~
We handle two cases of deadlocks: AA deadlocks (attempts to acquire the
same lock again), and ABBA deadlocks (attempts to acquire two locks in
the opposite order from two distinct threads). Variants of ABBA
deadlocks may be encountered with more than two locks being held in the
incorrect order. These are not diagnosed explicitly, as they reduce to
ABBA deadlocks.

Deadlock detection is triggered immediately when beginning the waiting
loop of a lock slow path.

While timeouts ensure that any waiting loops in the locking slow path
terminate and return to the caller, the wait until the timeout fires can
be excessively long in some situations. While the default timeout is
short (0.5s), a stall of this duration inside the kernel can set off
alerts for latency-critical services with strict SLOs. Ideally, the
kernel should recover from an undesired lock state as soon as possible.

A multi-step strategy is used to recover the kernel from waiting loops
in the locking algorithm which may fail to terminate in a bounded amount
of time.

 * Each CPU maintains a table of held locks. An entry is inserted upon
   entry into the lock slow path, and removed upon unlock.
 * Deadlock detection for AA locks is thus simple: we have an AA
   deadlock if we find a held lock entry for the lock we’re attempting
   to acquire on the same CPU.
 * During deadlock detection for ABBA, we search through the tables of
   all other CPUs to find situations where we are holding a lock the
   remote CPU is attempting to acquire, and they are holding a lock we
   are attempting to acquire. Upon encountering such a condition, we
   report an ABBA deadlock.
 * We divide the duration between entry into the waiting loop and the
   timeout point into intervals of 1 ms, and perform deadlock detection
   until the timeout happens. Upon entry into the slow path, and upon
   completion of each 1 ms interval, we perform detection of both AA and
   ABBA deadlocks. If detection yields a positive result, recovery
   happens sooner than the timeout; otherwise, it happens as a last
   resort when the timeout expires.
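
The held-locks table and the two checks above can be sketched in plain C
as follows. This is an illustrative userspace model; every name in it
(NR_CPUS, RES_NR_HELD, check_aa(), check_abba()) is hypothetical rather
than taken from the patches, and the real code must of course read
remote tables concurrently and tolerate races:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS      4   /* model size, not the kernel's CPU count */
#define RES_NR_HELD 16

/* Per-CPU table: entries 0..cnt-2 are held locks, entry cnt-1 is the
 * lock this CPU is currently attempting to acquire. */
struct held_locks {
	int   cnt;
	void *locks[RES_NR_HELD];
};

static struct held_locks held[NR_CPUS];

/* Insert an entry upon entry into the lock slow path. */
static void lock_entry(int cpu, void *lock)
{
	struct held_locks *t = &held[cpu];

	assert(t->cnt < RES_NR_HELD);
	t->locks[t->cnt++] = lock;
}

/* Remove the entry upon unlock (LIFO simplification for this model). */
static void lock_exit(int cpu)
{
	held[cpu].cnt--;
}

/* AA: the lock we are attempting already appears in our own table. */
static bool check_aa(int cpu, void *lock)
{
	struct held_locks *t = &held[cpu];

	for (int i = 0; i < t->cnt - 1; i++)
		if (t->locks[i] == lock)
			return true;
	return false;
}

/* ABBA: a remote CPU is attempting a lock we hold, while holding a lock
 * we are attempting. */
static bool check_abba(int cpu, void *lock)
{
	struct held_locks *me = &held[cpu];

	for (int r = 0; r < NR_CPUS; r++) {
		struct held_locks *rem = &held[r];

		if (r == cpu || !rem->cnt)
			continue;
		void *rem_wants = rem->locks[rem->cnt - 1];

		for (int i = 0; i < me->cnt - 1; i++) {
			if (me->locks[i] != rem_wants)
				continue;
			for (int j = 0; j < rem->cnt - 1; j++)
				if (rem->locks[j] == lock)
					return true;
		}
	}
	return false;
}
```

For example, if CPU 0 holds A and attempts B while CPU 1 holds B and
attempts A, check_abba() reports a deadlock on either CPU.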

Timeouts
~~~~~~~~
Timeouts act as the final line of defense against stalls in waiting
loops. The ktime_get_mono_fast_ns() function is used to poll the current
time, which is compared against the timestamp indicating the end time in
the waiter loop. Each waiting loop is instrumented to check an extra
condition using a macro. Internally, the macro implementation amortizes
the timeout check to avoid sampling the clock on every iteration.
Precisely, the timeout checks are invoked every 64k iterations.

Recovery
~~~~~~~~
There is extensive literature in academia on designing locks that
support timeouts [0][1], as timeouts can be used as a proxy for
detecting the presence of deadlocks and recovering from them, without
maintaining explicit metadata to construct a waits-for relationship
between two threads at runtime.

In case of rqspinlock, the key simplification in our algorithm comes
from the fact that upon a timeout, waiters always leave the queue in
FIFO order.  As such, the timeout is only enforced by the head of the
wait queue, while other waiters rely on the head to signal them when a
timeout has occurred and when they need to exit. We don’t have to
implement complex algorithms and do not need extra synchronization for
waiters in the middle of the queue timing out before their predecessor
or successor, unlike previous approaches [0][1].

There are three forms of waiters in the original queued spin lock
algorithm. The first is the waiter that acquires the pending bit and
spins on the lock word without joining a wait queue. The second is the
head waiter, the first waiter at the front of the wait queue. The third
comprises all the non-head waiters queued behind the head, each waiting
to be signalled through its MCS node to take over the responsibility of
the head.

In rqspinlock's recovery algorithm, we are concerned with the second and
third kind. First, we augment the waiting loop of the head of the wait
queue with a timeout. When this timeout happens, all waiters part of the
wait queue will abort their lock acquisition attempts. This happens in
three steps.

 * First, the head breaks out of its loop waiting for pending and locked
   bits to turn to 0, and non-head waiters break out of their MCS node
   spin (more on that later).
 * Next, every waiter (head or non-head) checks whether it is also the
   tail waiter; if so, it attempts to zero out the tail word, allowing a
   new queue to be built up for this lock. If it succeeds, there is no
   one next in the queue to signal to stop spinning.
 * Otherwise, they signal the MCS node of the next waiter to break out
   of its spin and try resetting the tail word back to 0. This goes on
   until the tail waiter is found. In case of races, the new tail will
   be responsible for performing the same task, as the old tail will
   then fail to reset the tail word and wait for its next pointer to be
   updated before it signals the new tail to do the same.
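
The cascade in the steps above can be modelled single-threaded in C11 as
follows; the names (waiter_abort(), struct node) are hypothetical, and
the real code manipulates the tail field packed into the qspinlock word
rather than a plain pointer:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	_Atomic(struct node *) next;
	atomic_bool abort;          /* set to break this waiter's MCS spin */
};

static _Atomic(struct node *) tail;   /* stand-in for the lock's tail word */

/* Called by each aborting waiter, head first. Either retire the whole
 * queue by resetting the tail (if we are the tail), or signal the next
 * waiter so it repeats this step. */
static void waiter_abort(struct node *me)
{
	struct node *expected = me;
	struct node *next;

	/* If we are the tail, zeroing the tail word lets a new queue form. */
	if (atomic_compare_exchange_strong(&tail, &expected, NULL))
		return;

	/* A successor exists (or is being linked): wait for ->next to be
	 * published, then signal it to break out of its spin. */
	while (!(next = atomic_load(&me->next)))
		;  /* cpu_relax() in the kernel */
	atomic_store(&next->abort, true);
}
```

Each signalled waiter in turn runs the same routine, so the teardown
propagates down the queue until the (possibly new) tail succeeds in
zeroing the tail word.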

Timeout Bound
~~~~~~~~~~~~~
The timeout is applied by two types of waiters: the pending bit waiter
and the wait queue head waiter. As such, for the pending waiter, only
the lock owner is ahead of it, and for the wait queue head waiter, only
the lock owner and the pending waiter take precedence in executing their
critical sections.

Therefore, the timeout value must span at most 2 critical section
lengths, and thus, it is unaffected by the amount of contention or the
number of CPUs on the host. Non-head waiters simply wait for the wait
queue head to signal them on a timeout.

In Meta's production, we have noticed uncore PMU reads and SMIs
consuming tens of msecs. While these events are rare, a 0.5 second
timeout should absorb such tail events and not raise false alarms for
timeouts. We will continue monitoring this in production and adjust the
timeout if necessary in the future.

More details of the recovery algorithm are described in patch 9, and a
detailed description is available at [2].

Alternatives
------------

Lockdep: We do not rely on the lockdep facility for reporting violations
for primarily two reasons:

* Overhead: The lockdep infrastructure can add significant overhead to
  the lock acquisition path, and is not recommended for use in
  production due to this reason. While the report is more useful and
  exhaustive, the overhead can be prohibitive, especially as BPF
  programs run in hot paths of the kernel.  Moreover, it also increases
  the size of the lock word to store extra metadata, which is not
  feasible for BPF spin locks that are 4-bytes in size today (similar to
  qspinlock).

* Debug Tool: Lockdep is intended to be used as a debugging facility,
  providing extra context to the user about locking violations occurring
  at runtime. It is turned off on production kernels, and therefore
  isn't available most of the time.

We require a mechanism for detecting common variants of deadlocks that
is always available in production kernels and never turned off. At the
same time, it must not introduce overhead in terms of time (for the slow
path) and memory (for the lock word size).

Evaluation
----------

We run benchmarks that stress locking scalability and perform comparison
against the baseline (qspinlock). For the rqspinlock case, we replace
the default qspinlock with it in the kernel, such that all spin locks in
the kernel use the rqspinlock slow path. As such, benchmarks that stress
kernel spin locks end up exercising rqspinlock.

Evaluation setup
~~~~~~~~~~~~~~~~

Dual-socket Intel Xeon Platinum 8468 (Sapphire Rapids) machine.
48 cores per socket, 2 threads per core.
Hyperthreading enabled, CPU governor set to performance. The NUMA
boundary is crossed after 48 cores. CPUs 96-191 are SMT siblings (the
first 48 of these are paired with the cores of NUMA node 0, and so on).

The locktorture experiment is run for 30 seconds.
Average of 25 runs is used for will-it-scale.

Legend:
 QL - qspinlock (avg. throughput)
 RQL - rqspinlock (avg. throughput)

Results
~~~~~~~

locktorture

Threads QL		RQL		Speedup
-----------------------------------------------
1	46910437	45057327	0.96
2	29871063	25085034	0.84
4	13876024	19242776	1.39
8	14638499	13346847	0.91
16	14380506	14104716	0.98
24	17278144	15293077	0.89
32	19494283	17826675	0.91
40	27760955	21002910	0.76
48	28638897	26432549	0.92
56	29336194	26512029	0.9
64	30040731	27421403	0.91
72	29523599	27010618	0.91
80	28846738	27885141	0.97
88	29277418	25963753	0.89
96	28472339	27423865	0.96
104	28093317	26634895	0.95
112	29914000	27872339	0.93
120	29199580	26682695	0.91
128	27755880	27314662	0.98
136	30349095	27092211	0.89
144	29193933	27805445	0.95
152	28956663	26071497	0.9
160	28950009	28183864	0.97
168	29383520	28135091	0.96
176	28475883	27549601	0.97
184	31958138	28602434	0.89
192	31342633	33394385	1.07

will-it-scale open1_threads

Threads QL      	QL stddev       stddev% RQL     	RQL stddev      stddev% Speedup
-----------------------------------------------------------------------------------------------
1	1396323.92	7373.12		0.53	1366616.8	4152.08		0.3	0.98
2	1844403.8	3165.26		0.17	1700301.96	2396.58		0.14	0.92
4	2370590.6	24545.54	1.04	1655872.32	47938.71	2.9	0.7
8	2185227.04	9537.9		0.44	1691205.16	9783.25		0.58	0.77
16	2110672.36	10972.99	0.52	1781696.24	15021.43	0.84	0.84
24	1655042.72	18037.23	1.09	2165125.4	5422.54		0.25	1.31
32	1738928.24	7166.64		0.41	1829468.24	9081.59		0.5	1.05
40	1854430.52	6148.24		0.33	1731062.28	3311.95		0.19	0.93
48	1766529.96	5063.86		0.29	1749375.28	2311.27		0.13	0.99
56	1303016.28	6168.4		0.47	1452656		7695.29		0.53	1.11
64	1169557.96	4353.67		0.37	1287370.56	8477.2		0.66	1.1
72	1036023.4	7116.53		0.69	1135513.92	9542.55		0.84	1.1
80	1097913.64	11356		1.03	1176864.8	6771.41		0.58	1.07
88	1123907.36	12843.13	1.14	1072416.48	7412.25		0.69	0.95
96	1166981.52	9402.71		0.81	1129678.76	9499.14		0.84	0.97
104	1108954.04	8171.46		0.74	1032044.44	7840.17		0.76	0.93
112	1000777.76	8445.7		0.84	1078498.8	6551.47		0.61	1.08
120	1029448.4	6992.29		0.68	1093743		8378.94		0.77	1.06
128	1106670.36	10102.15	0.91	1241438.68	23212.66	1.87	1.12
136	1183776.88	6394.79		0.54	1116799.64	18111.38	1.62	0.94
144	1201122		25917.69	2.16	1301779.96	15792.6		1.21	1.08
152	1099737.08	13567.82	1.23	1053647.2	12704.29	1.21	0.96
160	1031186.32	9048.07		0.88	1069961.4	8293.18		0.78	1.04
168	1068817		16486.06	1.54	1096495.36	14021.93	1.28	1.03
176	966633.96	9623.27		1	1081129.84	9474.81		0.88	1.12
184	1004419.04	12111.11	1.21	1037771.24	12001.66	1.16	1.03
192	1088858.08	16522.93	1.52	1027943.12	14238.57	1.39	0.94

will-it-scale open2_threads

Threads QL      	QL stddev       stddev% RQL     	RQL stddev      stddev% Speedup
-----------------------------------------------------------------------------------------------
1	1337797.76	4649.19		0.35	1332609.4	3813.14		0.29	1
2	1598300.2	1059.93		0.07	1771891.36	5667.12		0.32	1.11
4	1736573.76	13025.33	0.75	1396901.2	2682.46		0.19	0.8
8	1794367.84	4879.6		0.27	1917478.56	3751.98		0.2	1.07
16	1990998.44	8332.78		0.42	1864165.56	9648.59		0.52	0.94
24	1868148.56	4248.23		0.23	1710136.68	2760.58		0.16	0.92
32	1955180		6719		0.34	1936149.88	1980.87		0.1	0.99
40	1769646.4	4686.54		0.26	1729653.68	4551.22		0.26	0.98
48	1724861.16	4056.66		0.24	1764900		971.11		0.06	1.02
56	1318568		7758.86		0.59	1385660.84	7039.8		0.51	1.05
64	1143290.28	5351.43		0.47	1316686.6	5597.69		0.43	1.15
72	1196762.68	10655.67	0.89	1230173.24	9858.2		0.8	1.03
80	1126308.24	6901.55		0.61	1085391.16	7444.34		0.69	0.96
88	1035672.96	5452.95		0.53	1035541.52	8095.33		0.78	1
96	1030203.36	6735.71		0.65	1020113.48	8683.13		0.85	0.99
104	1039432.88	6583.59		0.63	1083902.48	5775.72		0.53	1.04
112	1113609.04	4380.62		0.39	1072010.36	8983.14		0.84	0.96
120	1109420.96	7183.5		0.65	1079424.12	10929.97	1.01	0.97
128	1095400.04	4274.6		0.39	1095475.2	12042.02	1.1	1
136	1071605.4	11103.73	1.04	1114757.2	10516.55	0.94	1.04
144	1104147.2	9714.75		0.88	1044954.16	7544.2		0.72	0.95
152	1164280.24	13386.15	1.15	1101213.92	11568.49	1.05	0.95
160	1084892.04	7941.25		0.73	1152273.76	9593.38		0.83	1.06
168	983654.76	11772.85	1.2	1111772.28	9806.83		0.88	1.13
176	1087544.24	11262.35	1.04	1077507.76	9442.02		0.88	0.99
184	1101682.4	24701.68	2.24	1095223.2	16707.29	1.53	0.99
192	983712.08	13453.59	1.37	1051244.2	15662.05	1.49	1.07

will-it-scale lock1_threads

Threads QL      	QL stddev       stddev% RQL     	RQL stddev      stddev% Speedup
-----------------------------------------------------------------------------------------------
1	4307484.96	3959.31		0.09	4252908.56	10375.78	0.24	0.99
2	7701844.32	4169.88		0.05	7219233.52	6437.11		0.09	0.94
4	14781878.72	22854.85	0.15	15260565.12	37305.71	0.24	1.03
8	12949698.64	99270.42	0.77	9954660.4	142805.68	1.43	0.77
16	12947690.64	72977.27	0.56	10865245.12	49520.31	0.46	0.84
24	11142990.64	33200.39	0.3	11444391.68	37884.46	0.33	1.03
32	9652335.84	22369.48	0.23	9344086.72	21639.22	0.23	0.97
40	9185931.12	5508.96		0.06	8881506.32	5072.33		0.06	0.97
48	9084385.36	10871.05	0.12	8863579.12	4583.37		0.05	0.98
56	6595540.96	33100.59	0.5	6640389.76	46619.96	0.7	1.01
64	5946726.24	47160.5		0.79	6572155.84	91973.73	1.4	1.11
72	6744894.72	43166.65	0.64	5991363.36	80637.56	1.35	0.89
80	6234502.16	118983.16	1.91	5157894.32	73592.72	1.43	0.83
88	5053879.6	199713.75	3.95	4479758.08	36202.27	0.81	0.89
96	5184302.64	99199.89	1.91	5249210.16	122348.69	2.33	1.01
104	4612391.92	40803.05	0.88	4850209.6	26813.28	0.55	1.05
112	4809209.68	24070.68	0.5	4869477.84	27489.04	0.56	1.01
120	5130746.4	34265.5		0.67	4620047.12	44229.54	0.96	0.9
128	5376465.28	95028.05	1.77	4781179.6	43700.93	0.91	0.89
136	5453742.4	86718.87	1.59	5412457.12	40339.68	0.75	0.99
144	5805040.72	84669.31	1.46	5595382.48	68701.65	1.23	0.96
152	5842897.36	31120.33	0.53	5787587.12	43521.68	0.75	0.99
160	5837665.12	14179.44	0.24	5118808.72	45193.23	0.88	0.88
168	5660332.72	27467.09	0.49	5104959.04	40891.75	0.8	0.9
176	5180312.24	28656.39	0.55	4718407.6	58734.13	1.24	0.91
184	4706824.16	50469.31	1.07	4692962.64	92266.85	1.97	1
192	5126054.56	51082.02	1	4680866.8	58743.51	1.25	0.91

will-it-scale lock2_threads

Threads QL      	QL stddev       stddev% RQL     	RQL stddev      stddev% Speedup
-----------------------------------------------------------------------------------------------
1	4316091.2	4933.28		0.11	4293104		30369.71	0.71	0.99
2	3500046.4	19852.62	0.57	4507627.76	23667.66	0.53	1.29
4	3639098.96	26370.65	0.72	3673166.32	30822.71	0.84	1.01
8	3714548.56	49953.44	1.34	4055818.56	71630.41	1.77	1.09
16	4188724.64	105414.49	2.52	4316077.12	68956.15	1.6	1.03
24	3737908.32	47391.46	1.27	3762254.56	55345.7		1.47	1.01
32	3820952.8	45207.66	1.18	3710368.96	52651.92	1.42	0.97
40	3791280.8	28630.55	0.76	3661933.52	37671.27	1.03	0.97
48	3765721.84	59553.83	1.58	3604738.64	50861.36	1.41	0.96
56	3175505.76	64336.17	2.03	2771022.48	66586.99	2.4	0.87
64	2620294.48	71651.34	2.73	2650171.68	44810.83	1.69	1.01
72	2861893.6	86542.61	3.02	2537437.2	84571.75	3.33	0.89
80	2976297.2	83566.43	2.81	2645132.8	85992.34	3.25	0.89
88	2547724.8	102014.36	4	2336852.16	80570.25	3.45	0.92
96	2945310.32	82673.25	2.81	2513316.96	45741.81	1.82	0.85
104	3028818.64	90643.36	2.99	2581787.52	52967.48	2.05	0.85
112	2546264.16	102605.82	4.03	2118812.64	62043.19	2.93	0.83
120	2917334.64	112220.01	3.85	2720418.64	64035.96	2.35	0.93
128	2906621.84	69428.1		2.39	2795310.32	56736.87	2.03	0.96
136	2841833.76	105541.11	3.71	3063404.48	62288.94	2.03	1.08
144	3032822.32	134796.56	4.44	3169985.6	149707.83	4.72	1.05
152	2557694.96	62218.15	2.43	2469887.6	68343.78	2.77	0.97
160	2810214.72	61468.79	2.19	2323768.48	54226.71	2.33	0.83
168	2651146.48	76573.27	2.89	2385936.64	52433.98	2.2	0.9
176	2720616.32	89026.19	3.27	2941400.08	59296.64	2.02	1.08
184	2696086		88541.24	3.28	2598225.2	76365.7		2.94	0.96
192	2908194.48	87023.91	2.99	2377677.68	53299.82	2.24	0.82

Written By
----------
Alexei Starovoitov <ast@kernel.org>
Kumar Kartikeya Dwivedi <memxor@gmail.com>

  [0]: https://www.cs.rochester.edu/research/synchronization/pseudocode/timeout.html
  [1]: https://dl.acm.org/doi/10.1145/571825.571830
  [2]: https://github.com/kkdwivedi/rqspinlock/blob/main/rqspinlock.pdf
  [3]: https://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/kernel-tla.git/plain/qspinlock.tla

Kumar Kartikeya Dwivedi (22):
  locking: Move MCS struct definition to public header
  locking: Move common qspinlock helpers to a private header
  locking: Allow obtaining result of arch_mcs_spin_lock_contended
  locking: Copy out qspinlock.c to rqspinlock.c
  rqspinlock: Add rqspinlock.h header
  rqspinlock: Drop PV and virtualization support
  rqspinlock: Add support for timeouts
  rqspinlock: Protect pending bit owners from stalls
  rqspinlock: Protect waiters in queue from stalls
  rqspinlock: Protect waiters in trylock fallback from stalls
  rqspinlock: Add deadlock detection and recovery
  rqspinlock: Add basic support for CONFIG_PARAVIRT
  rqspinlock: Add helper to print a splat on timeout or deadlock
  rqspinlock: Add macros for rqspinlock usage
  rqspinlock: Add locktorture support
  rqspinlock: Add entry to Makefile, MAINTAINERS
  bpf: Convert hashtab.c to rqspinlock
  bpf: Convert percpu_freelist.c to rqspinlock
  bpf: Convert lpm_trie.c to rqspinlock
  bpf: Introduce rqspinlock kfuncs
  bpf: Implement verifier support for rqspinlock
  selftests/bpf: Add tests for rqspinlock

 MAINTAINERS                                   |   3 +
 arch/x86/include/asm/rqspinlock.h             |  20 +
 include/asm-generic/Kbuild                    |   1 +
 include/asm-generic/mcs_spinlock.h            |   6 +
 include/asm-generic/rqspinlock.h              | 147 ++++
 include/linux/bpf.h                           |  10 +
 include/linux/bpf_verifier.h                  |  17 +-
 kernel/bpf/btf.c                              |  26 +-
 kernel/bpf/hashtab.c                          | 102 +--
 kernel/bpf/lpm_trie.c                         |  25 +-
 kernel/bpf/percpu_freelist.c                  | 113 +--
 kernel/bpf/percpu_freelist.h                  |   4 +-
 kernel/bpf/syscall.c                          |   6 +-
 kernel/bpf/verifier.c                         | 233 ++++--
 kernel/locking/Makefile                       |   3 +
 kernel/locking/lock_events_list.h             |   5 +
 kernel/locking/locktorture.c                  |  51 ++
 kernel/locking/mcs_spinlock.h                 |  10 +-
 kernel/locking/qspinlock.c                    | 193 +----
 kernel/locking/qspinlock.h                    | 200 +++++
 kernel/locking/rqspinlock.c                   | 724 ++++++++++++++++++
 kernel/locking/rqspinlock.h                   |  48 ++
 .../selftests/bpf/prog_tests/res_spin_lock.c  | 103 +++
 tools/testing/selftests/bpf/progs/irq.c       |  53 ++
 .../selftests/bpf/progs/res_spin_lock.c       | 189 +++++
 .../selftests/bpf/progs/res_spin_lock_fail.c  | 226 ++++++
 26 files changed, 2097 insertions(+), 421 deletions(-)
 create mode 100644 arch/x86/include/asm/rqspinlock.h
 create mode 100644 include/asm-generic/rqspinlock.h
 create mode 100644 kernel/locking/qspinlock.h
 create mode 100644 kernel/locking/rqspinlock.c
 create mode 100644 kernel/locking/rqspinlock.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/res_spin_lock.c
 create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock.c
 create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock_fail.c


base-commit: f44275e7155dc310d36516fc25be503da099781c
-- 
2.43.5



* [PATCH bpf-next v1 01/22] locking: Move MCS struct definition to public header
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-07 13:59 ` [PATCH bpf-next v1 02/22] locking: Move common qspinlock helpers to a private header Kumar Kartikeya Dwivedi
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

Move the definition of the struct mcs_spinlock from the private
mcs_spinlock.h header in kernel/locking to the mcs_spinlock.h
asm-generic header, since we will need to reference it from the
qspinlock.h header in subsequent commits.

Reviewed-by: Barret Rhoden <brho@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/asm-generic/mcs_spinlock.h | 6 ++++++
 kernel/locking/mcs_spinlock.h      | 6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/mcs_spinlock.h b/include/asm-generic/mcs_spinlock.h
index 10cd4ffc6ba2..39c94012b88a 100644
--- a/include/asm-generic/mcs_spinlock.h
+++ b/include/asm-generic/mcs_spinlock.h
@@ -1,6 +1,12 @@
 #ifndef __ASM_MCS_SPINLOCK_H
 #define __ASM_MCS_SPINLOCK_H
 
+struct mcs_spinlock {
+	struct mcs_spinlock *next;
+	int locked; /* 1 if lock acquired */
+	int count;  /* nesting count, see qspinlock.c */
+};
+
 /*
  * Architectures can define their own:
  *
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 85251d8771d9..16160ca8907f 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -15,12 +15,6 @@
 
 #include <asm/mcs_spinlock.h>
 
-struct mcs_spinlock {
-	struct mcs_spinlock *next;
-	int locked; /* 1 if lock acquired */
-	int count;  /* nesting count, see qspinlock.c */
-};
-
 #ifndef arch_mcs_spin_lock_contended
 /*
  * Using smp_cond_load_acquire() provides the acquire semantics
-- 
2.43.5



* [PATCH bpf-next v1 02/22] locking: Move common qspinlock helpers to a private header
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
  2025-01-07 13:59 ` [PATCH bpf-next v1 01/22] locking: Move MCS struct definition to public header Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-07 13:59 ` [PATCH bpf-next v1 03/22] locking: Allow obtaining result of arch_mcs_spin_lock_contended Kumar Kartikeya Dwivedi
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

Move the qspinlock helper functions that encode and decode the tail
word, set and clear the pending and locked bits, and other miscellaneous
definitions and macros to a private header. To this end, create a
qspinlock.h header file in kernel/locking. Subsequent commits will
introduce a modified qspinlock slow path function, so moving shared code
to a private header will help minimize unnecessary code duplication.

Reviewed-by: Barret Rhoden <brho@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/locking/qspinlock.c | 193 +----------------------------------
 kernel/locking/qspinlock.h | 200 +++++++++++++++++++++++++++++++++++++
 2 files changed, 205 insertions(+), 188 deletions(-)
 create mode 100644 kernel/locking/qspinlock.h

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 7d96bed718e4..af8d122bb649 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -25,8 +25,9 @@
 #include <trace/events/lock.h>
 
 /*
- * Include queued spinlock statistics code
+ * Include queued spinlock definitions and statistics code
  */
+#include "qspinlock.h"
 #include "qspinlock_stat.h"
 
 /*
@@ -67,36 +68,6 @@
  */
 
 #include "mcs_spinlock.h"
-#define MAX_NODES	4
-
-/*
- * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in
- * size and four of them will fit nicely in one 64-byte cacheline. For
- * pvqspinlock, however, we need more space for extra data. To accommodate
- * that, we insert two more long words to pad it up to 32 bytes. IOW, only
- * two of them can fit in a cacheline in this case. That is OK as it is rare
- * to have more than 2 levels of slowpath nesting in actual use. We don't
- * want to penalize pvqspinlocks to optimize for a rare case in native
- * qspinlocks.
- */
-struct qnode {
-	struct mcs_spinlock mcs;
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
-	long reserved[2];
-#endif
-};
-
-/*
- * The pending bit spinning loop count.
- * This heuristic is used to limit the number of lockword accesses
- * made by atomic_cond_read_relaxed when waiting for the lock to
- * transition out of the "== _Q_PENDING_VAL" state. We don't spin
- * indefinitely because there's no guarantee that we'll make forward
- * progress.
- */
-#ifndef _Q_PENDING_LOOPS
-#define _Q_PENDING_LOOPS	1
-#endif
 
 /*
  * Per-CPU queue node structures; we can never have more than 4 nested
@@ -106,161 +77,7 @@ struct qnode {
  *
  * PV doubles the storage and uses the second cacheline for PV state.
  */
-static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[MAX_NODES]);
-
-/*
- * We must be able to distinguish between no-tail and the tail at 0:0,
- * therefore increment the cpu number by one.
- */
-
-static inline __pure u32 encode_tail(int cpu, int idx)
-{
-	u32 tail;
-
-	tail  = (cpu + 1) << _Q_TAIL_CPU_OFFSET;
-	tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */
-
-	return tail;
-}
-
-static inline __pure struct mcs_spinlock *decode_tail(u32 tail)
-{
-	int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1;
-	int idx = (tail &  _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET;
-
-	return per_cpu_ptr(&qnodes[idx].mcs, cpu);
-}
-
-static inline __pure
-struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx)
-{
-	return &((struct qnode *)base + idx)->mcs;
-}
-
-#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
-
-#if _Q_PENDING_BITS == 8
-/**
- * clear_pending - clear the pending bit.
- * @lock: Pointer to queued spinlock structure
- *
- * *,1,* -> *,0,*
- */
-static __always_inline void clear_pending(struct qspinlock *lock)
-{
-	WRITE_ONCE(lock->pending, 0);
-}
-
-/**
- * clear_pending_set_locked - take ownership and clear the pending bit.
- * @lock: Pointer to queued spinlock structure
- *
- * *,1,0 -> *,0,1
- *
- * Lock stealing is not allowed if this function is used.
- */
-static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
-{
-	WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL);
-}
-
-/*
- * xchg_tail - Put in the new queue tail code word & retrieve previous one
- * @lock : Pointer to queued spinlock structure
- * @tail : The new queue tail code word
- * Return: The previous queue tail code word
- *
- * xchg(lock, tail), which heads an address dependency
- *
- * p,*,* -> n,*,* ; prev = xchg(lock, node)
- */
-static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
-{
-	/*
-	 * We can use relaxed semantics since the caller ensures that the
-	 * MCS node is properly initialized before updating the tail.
-	 */
-	return (u32)xchg_relaxed(&lock->tail,
-				 tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
-}
-
-#else /* _Q_PENDING_BITS == 8 */
-
-/**
- * clear_pending - clear the pending bit.
- * @lock: Pointer to queued spinlock structure
- *
- * *,1,* -> *,0,*
- */
-static __always_inline void clear_pending(struct qspinlock *lock)
-{
-	atomic_andnot(_Q_PENDING_VAL, &lock->val);
-}
-
-/**
- * clear_pending_set_locked - take ownership and clear the pending bit.
- * @lock: Pointer to queued spinlock structure
- *
- * *,1,0 -> *,0,1
- */
-static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
-{
-	atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
-}
-
-/**
- * xchg_tail - Put in the new queue tail code word & retrieve previous one
- * @lock : Pointer to queued spinlock structure
- * @tail : The new queue tail code word
- * Return: The previous queue tail code word
- *
- * xchg(lock, tail)
- *
- * p,*,* -> n,*,* ; prev = xchg(lock, node)
- */
-static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
-{
-	u32 old, new;
-
-	old = atomic_read(&lock->val);
-	do {
-		new = (old & _Q_LOCKED_PENDING_MASK) | tail;
-		/*
-		 * We can use relaxed semantics since the caller ensures that
-		 * the MCS node is properly initialized before updating the
-		 * tail.
-		 */
-	} while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new));
-
-	return old;
-}
-#endif /* _Q_PENDING_BITS == 8 */
-
-/**
- * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending
- * @lock : Pointer to queued spinlock structure
- * Return: The previous lock value
- *
- * *,*,* -> *,1,*
- */
-#ifndef queued_fetch_set_pending_acquire
-static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock)
-{
-	return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
-}
-#endif
-
-/**
- * set_locked - Set the lock bit and own the lock
- * @lock: Pointer to queued spinlock structure
- *
- * *,*,0 -> *,0,1
- */
-static __always_inline void set_locked(struct qspinlock *lock)
-{
-	WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
-}
-
+static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]);
 
 /*
  * Generate the native code for queued_spin_unlock_slowpath(); provide NOPs for
@@ -410,7 +227,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	 * any MCS node. This is not the most elegant solution, but is
 	 * simple enough.
 	 */
-	if (unlikely(idx >= MAX_NODES)) {
+	if (unlikely(idx >= _Q_MAX_NODES)) {
 		lockevent_inc(lock_no_node);
 		while (!queued_spin_trylock(lock))
 			cpu_relax();
@@ -465,7 +282,7 @@ void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	 * head of the waitqueue.
 	 */
 	if (old & _Q_TAIL_MASK) {
-		prev = decode_tail(old);
+		prev = decode_tail(old, qnodes);
 
 		/* Link @node into the waitqueue. */
 		WRITE_ONCE(prev->next, node);
diff --git a/kernel/locking/qspinlock.h b/kernel/locking/qspinlock.h
new file mode 100644
index 000000000000..d4ceb9490365
--- /dev/null
+++ b/kernel/locking/qspinlock.h
@@ -0,0 +1,200 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Queued spinlock defines
+ *
+ * This file contains macro definitions and functions shared between different
+ * qspinlock slow path implementations.
+ */
+#ifndef __LINUX_QSPINLOCK_H
+#define __LINUX_QSPINLOCK_H
+
+#include <asm-generic/percpu.h>
+#include <linux/percpu-defs.h>
+#include <asm-generic/qspinlock.h>
+#include <asm-generic/mcs_spinlock.h>
+
+#define _Q_MAX_NODES	4
+
+/*
+ * The pending bit spinning loop count.
+ * This heuristic is used to limit the number of lockword accesses
+ * made by atomic_cond_read_relaxed when waiting for the lock to
+ * transition out of the "== _Q_PENDING_VAL" state. We don't spin
+ * indefinitely because there's no guarantee that we'll make forward
+ * progress.
+ */
+#ifndef _Q_PENDING_LOOPS
+#define _Q_PENDING_LOOPS	1
+#endif
+
+/*
+ * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in
+ * size and four of them will fit nicely in one 64-byte cacheline. For
+ * pvqspinlock, however, we need more space for extra data. To accommodate
+ * that, we insert two more long words to pad it up to 32 bytes. IOW, only
+ * two of them can fit in a cacheline in this case. That is OK as it is rare
+ * to have more than 2 levels of slowpath nesting in actual use. We don't
+ * want to penalize pvqspinlocks to optimize for a rare case in native
+ * qspinlocks.
+ */
+struct qnode {
+	struct mcs_spinlock mcs;
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+	long reserved[2];
+#endif
+};
+
+/*
+ * We must be able to distinguish between no-tail and the tail at 0:0,
+ * therefore increment the cpu number by one.
+ */
+
+static inline __pure u32 encode_tail(int cpu, int idx)
+{
+	u32 tail;
+
+	tail  = (cpu + 1) << _Q_TAIL_CPU_OFFSET;
+	tail |= idx << _Q_TAIL_IDX_OFFSET; /* assume < 4 */
+
+	return tail;
+}
+
+static inline __pure struct mcs_spinlock *decode_tail(u32 tail, struct qnode *qnodes)
+{
+	int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1;
+	int idx = (tail &  _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET;
+
+	return per_cpu_ptr(&qnodes[idx].mcs, cpu);
+}
+
+static inline __pure
+struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx)
+{
+	return &((struct qnode *)base + idx)->mcs;
+}
+
+#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
+
+#if _Q_PENDING_BITS == 8
+/**
+ * clear_pending - clear the pending bit.
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,* -> *,0,*
+ */
+static __always_inline void clear_pending(struct qspinlock *lock)
+{
+	WRITE_ONCE(lock->pending, 0);
+}
+
+/**
+ * clear_pending_set_locked - take ownership and clear the pending bit.
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,0 -> *,0,1
+ *
+ * Lock stealing is not allowed if this function is used.
+ */
+static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
+{
+	WRITE_ONCE(lock->locked_pending, _Q_LOCKED_VAL);
+}
+
+/*
+ * xchg_tail - Put in the new queue tail code word & retrieve previous one
+ * @lock : Pointer to queued spinlock structure
+ * @tail : The new queue tail code word
+ * Return: The previous queue tail code word
+ *
+ * xchg(lock, tail), which heads an address dependency
+ *
+ * p,*,* -> n,*,* ; prev = xchg(lock, node)
+ */
+static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
+{
+	/*
+	 * We can use relaxed semantics since the caller ensures that the
+	 * MCS node is properly initialized before updating the tail.
+	 */
+	return (u32)xchg_relaxed(&lock->tail,
+				 tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
+}
+
+#else /* _Q_PENDING_BITS == 8 */
+
+/**
+ * clear_pending - clear the pending bit.
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,* -> *,0,*
+ */
+static __always_inline void clear_pending(struct qspinlock *lock)
+{
+	atomic_andnot(_Q_PENDING_VAL, &lock->val);
+}
+
+/**
+ * clear_pending_set_locked - take ownership and clear the pending bit.
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,1,0 -> *,0,1
+ */
+static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
+{
+	atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
+}
+
+/**
+ * xchg_tail - Put in the new queue tail code word & retrieve previous one
+ * @lock : Pointer to queued spinlock structure
+ * @tail : The new queue tail code word
+ * Return: The previous queue tail code word
+ *
+ * xchg(lock, tail)
+ *
+ * p,*,* -> n,*,* ; prev = xchg(lock, node)
+ */
+static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
+{
+	u32 old, new;
+
+	old = atomic_read(&lock->val);
+	do {
+		new = (old & _Q_LOCKED_PENDING_MASK) | tail;
+		/*
+		 * We can use relaxed semantics since the caller ensures that
+		 * the MCS node is properly initialized before updating the
+		 * tail.
+		 */
+	} while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new));
+
+	return old;
+}
+#endif /* _Q_PENDING_BITS == 8 */
+
+/**
+ * queued_fetch_set_pending_acquire - fetch the whole lock value and set pending
+ * @lock : Pointer to queued spinlock structure
+ * Return: The previous lock value
+ *
+ * *,*,* -> *,1,*
+ */
+#ifndef queued_fetch_set_pending_acquire
+static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lock)
+{
+	return atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);
+}
+#endif
+
+/**
+ * set_locked - Set the lock bit and own the lock
+ * @lock: Pointer to queued spinlock structure
+ *
+ * *,*,0 -> *,0,1
+ */
+static __always_inline void set_locked(struct qspinlock *lock)
+{
+	WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
+}
+
+#endif /* __LINUX_QSPINLOCK_H */
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 03/22] locking: Allow obtaining result of arch_mcs_spin_lock_contended
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
  2025-01-07 13:59 ` [PATCH bpf-next v1 01/22] locking: Move MCS struct definition to public header Kumar Kartikeya Dwivedi
  2025-01-07 13:59 ` [PATCH bpf-next v1 02/22] locking: Move common qspinlock helpers to a private header Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-07 13:59 ` [PATCH bpf-next v1 04/22] locking: Copy out qspinlock.c to rqspinlock.c Kumar Kartikeya Dwivedi
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

To support upcoming changes that require inspecting the return value
once the conditional waiting loop in arch_mcs_spin_lock_contended
terminates, modify the macro to preserve the result of
smp_cond_load_acquire. This enables checking the return value as needed,
which will help disambiguate the MCS node’s locked state in future
patches.

Reviewed-by: Barret Rhoden <brho@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/locking/mcs_spinlock.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 16160ca8907f..5c92ba199b90 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -24,9 +24,7 @@
  * spinning, and smp_cond_load_acquire() provides that behavior.
  */
 #define arch_mcs_spin_lock_contended(l)					\
-do {									\
-	smp_cond_load_acquire(l, VAL);					\
-} while (0)
+	smp_cond_load_acquire(l, VAL)
 #endif
 
 #ifndef arch_mcs_spin_unlock_contended
-- 
2.43.5



* [PATCH bpf-next v1 04/22] locking: Copy out qspinlock.c to rqspinlock.c
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2025-01-07 13:59 ` [PATCH bpf-next v1 03/22] locking: Allow obtaining result of arch_mcs_spin_lock_contended Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-07 13:59 ` [PATCH bpf-next v1 05/22] rqspinlock: Add rqspinlock.h header Kumar Kartikeya Dwivedi
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

In preparation for introducing a new lock implementation, Resilient
Queued Spin Lock (rqspinlock), we begin by using the existing
qspinlock.c code as the base: copy the code to a new file and rename
functions and variables from 'queued' to 'resilient_queued'.

This helps each subsequent commit clearly show how and where the code
is being changed. The only change after the literal copy in this commit
is renaming the functions where necessary.

Reviewed-by: Barret Rhoden <brho@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/locking/rqspinlock.c | 410 ++++++++++++++++++++++++++++++++++++
 1 file changed, 410 insertions(+)
 create mode 100644 kernel/locking/rqspinlock.c

diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
new file mode 100644
index 000000000000..caaa7c9bbc79
--- /dev/null
+++ b/kernel/locking/rqspinlock.c
@@ -0,0 +1,410 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Resilient Queued Spin Lock
+ *
+ * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P.
+ * (C) Copyright 2013-2014,2018 Red Hat, Inc.
+ * (C) Copyright 2015 Intel Corp.
+ * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP
+ *
+ * Authors: Waiman Long <longman@redhat.com>
+ *          Peter Zijlstra <peterz@infradead.org>
+ */
+
+#ifndef _GEN_PV_LOCK_SLOWPATH
+
+#include <linux/smp.h>
+#include <linux/bug.h>
+#include <linux/cpumask.h>
+#include <linux/percpu.h>
+#include <linux/hardirq.h>
+#include <linux/mutex.h>
+#include <linux/prefetch.h>
+#include <asm/byteorder.h>
+#include <asm/qspinlock.h>
+#include <trace/events/lock.h>
+
+/*
+ * Include queued spinlock definitions and statistics code
+ */
+#include "qspinlock.h"
+#include "qspinlock_stat.h"
+
+/*
+ * The basic principle of a queue-based spinlock can best be understood
+ * by studying a classic queue-based spinlock implementation called the
+ * MCS lock. A copy of the original MCS lock paper ("Algorithms for Scalable
+ * Synchronization on Shared-Memory Multiprocessors by Mellor-Crummey and
+ * Scott") is available at
+ *
+ * https://bugzilla.kernel.org/show_bug.cgi?id=206115
+ *
+ * This queued spinlock implementation is based on the MCS lock, however to
+ * make it fit the 4 bytes we assume spinlock_t to be, and preserve its
+ * existing API, we must modify it somehow.
+ *
+ * In particular; where the traditional MCS lock consists of a tail pointer
+ * (8 bytes) and needs the next pointer (another 8 bytes) of its own node to
+ * unlock the next pending (next->locked), we compress both these: {tail,
+ * next->locked} into a single u32 value.
+ *
+ * Since a spinlock disables recursion of its own context and there is a limit
+ * to the contexts that can nest; namely: task, softirq, hardirq, nmi. As there
+ * are at most 4 nesting levels, it can be encoded by a 2-bit number. Now
+ * we can encode the tail by combining the 2-bit nesting level with the cpu
+ * number. With one byte for the lock value and 3 bytes for the tail, only a
+ * 32-bit word is now needed. Even though we only need 1 bit for the lock,
+ * we extend it to a full byte to achieve better performance for architectures
+ * that support atomic byte write.
+ *
+ * We also change the first spinner to spin on the lock bit instead of its
+ * node; whereby avoiding the need to carry a node from lock to unlock, and
+ * preserving existing lock API. This also makes the unlock code simpler and
+ * faster.
+ *
+ * N.B. The current implementation only supports architectures that allow
+ *      atomic operations on smaller 8-bit and 16-bit data types.
+ *
+ */
+
+#include "mcs_spinlock.h"
+
+/*
+ * Per-CPU queue node structures; we can never have more than 4 nested
+ * contexts: task, softirq, hardirq, nmi.
+ *
+ * Exactly fits one 64-byte cacheline on a 64-bit architecture.
+ *
+ * PV doubles the storage and uses the second cacheline for PV state.
+ */
+static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]);
+
+/*
+ * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs
+ * for all the PV callbacks.
+ */
+
+static __always_inline void __pv_init_node(struct mcs_spinlock *node) { }
+static __always_inline void __pv_wait_node(struct mcs_spinlock *node,
+					   struct mcs_spinlock *prev) { }
+static __always_inline void __pv_kick_node(struct qspinlock *lock,
+					   struct mcs_spinlock *node) { }
+static __always_inline u32  __pv_wait_head_or_lock(struct qspinlock *lock,
+						   struct mcs_spinlock *node)
+						   { return 0; }
+
+#define pv_enabled()		false
+
+#define pv_init_node		__pv_init_node
+#define pv_wait_node		__pv_wait_node
+#define pv_kick_node		__pv_kick_node
+#define pv_wait_head_or_lock	__pv_wait_head_or_lock
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define resilient_queued_spin_lock_slowpath	native_resilient_queued_spin_lock_slowpath
+#endif
+
+#endif /* _GEN_PV_LOCK_SLOWPATH */
+
+/**
+ * resilient_queued_spin_lock_slowpath - acquire the queued spinlock
+ * @lock: Pointer to queued spinlock structure
+ * @val: Current value of the queued spinlock 32-bit word
+ *
+ * (queue tail, pending bit, lock value)
+ *
+ *              fast     :    slow                                  :    unlock
+ *                       :                                          :
+ * uncontended  (0,0,0) -:--> (0,0,1) ------------------------------:--> (*,*,0)
+ *                       :       | ^--------.------.             /  :
+ *                       :       v           \      \            |  :
+ * pending               :    (0,1,1) +--> (0,1,0)   \           |  :
+ *                       :       | ^--'              |           |  :
+ *                       :       v                   |           |  :
+ * uncontended           :    (n,x,y) +--> (n,0,0) --'           |  :
+ *   queue               :       | ^--'                          |  :
+ *                       :       v                               |  :
+ * contended             :    (*,x,y) +--> (*,0,0) ---> (*,0,1) -'  :
+ *   queue               :         ^--'                             :
+ */
+void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
+{
+	struct mcs_spinlock *prev, *next, *node;
+	u32 old, tail;
+	int idx;
+
+	BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
+
+	if (pv_enabled())
+		goto pv_queue;
+
+	if (virt_spin_lock(lock))
+		return;
+
+	/*
+	 * Wait for in-progress pending->locked hand-overs with a bounded
+	 * number of spins so that we guarantee forward progress.
+	 *
+	 * 0,1,0 -> 0,0,1
+	 */
+	if (val == _Q_PENDING_VAL) {
+		int cnt = _Q_PENDING_LOOPS;
+		val = atomic_cond_read_relaxed(&lock->val,
+					       (VAL != _Q_PENDING_VAL) || !cnt--);
+	}
+
+	/*
+	 * If we observe any contention; queue.
+	 */
+	if (val & ~_Q_LOCKED_MASK)
+		goto queue;
+
+	/*
+	 * trylock || pending
+	 *
+	 * 0,0,* -> 0,1,* -> 0,0,1 pending, trylock
+	 */
+	val = queued_fetch_set_pending_acquire(lock);
+
+	/*
+	 * If we observe contention, there is a concurrent locker.
+	 *
+	 * Undo and queue; our setting of PENDING might have made the
+	 * n,0,0 -> 0,0,0 transition fail and it will now be waiting
+	 * on @next to become !NULL.
+	 */
+	if (unlikely(val & ~_Q_LOCKED_MASK)) {
+
+		/* Undo PENDING if we set it. */
+		if (!(val & _Q_PENDING_MASK))
+			clear_pending(lock);
+
+		goto queue;
+	}
+
+	/*
+	 * We're pending, wait for the owner to go away.
+	 *
+	 * 0,1,1 -> *,1,0
+	 *
+	 * this wait loop must be a load-acquire such that we match the
+	 * store-release that clears the locked bit and create lock
+	 * sequentiality; this is because not all
+	 * clear_pending_set_locked() implementations imply full
+	 * barriers.
+	 */
+	if (val & _Q_LOCKED_MASK)
+		smp_cond_load_acquire(&lock->locked, !VAL);
+
+	/*
+	 * take ownership and clear the pending bit.
+	 *
+	 * 0,1,0 -> 0,0,1
+	 */
+	clear_pending_set_locked(lock);
+	lockevent_inc(lock_pending);
+	return;
+
+	/*
+	 * End of pending bit optimistic spinning and beginning of MCS
+	 * queuing.
+	 */
+queue:
+	lockevent_inc(lock_slowpath);
+pv_queue:
+	node = this_cpu_ptr(&qnodes[0].mcs);
+	idx = node->count++;
+	tail = encode_tail(smp_processor_id(), idx);
+
+	trace_contention_begin(lock, LCB_F_SPIN);
+
+	/*
+	 * 4 nodes are allocated based on the assumption that there will
+	 * not be nested NMIs taking spinlocks. That may not be true in
+	 * some architectures even though the chance of needing more than
+	 * 4 nodes will still be extremely unlikely. When that happens,
+	 * we fall back to spinning on the lock directly without using
+	 * any MCS node. This is not the most elegant solution, but is
+	 * simple enough.
+	 */
+	if (unlikely(idx >= _Q_MAX_NODES)) {
+		lockevent_inc(lock_no_node);
+		while (!queued_spin_trylock(lock))
+			cpu_relax();
+		goto release;
+	}
+
+	node = grab_mcs_node(node, idx);
+
+	/*
+	 * Keep counts of non-zero index values:
+	 */
+	lockevent_cond_inc(lock_use_node2 + idx - 1, idx);
+
+	/*
+	 * Ensure that we increment the head node->count before initialising
+	 * the actual node. If the compiler is kind enough to reorder these
+	 * stores, then an IRQ could overwrite our assignments.
+	 */
+	barrier();
+
+	node->locked = 0;
+	node->next = NULL;
+	pv_init_node(node);
+
+	/*
+	 * We touched a (possibly) cold cacheline in the per-cpu queue node;
+	 * attempt the trylock once more in the hope someone let go while we
+	 * weren't watching.
+	 */
+	if (queued_spin_trylock(lock))
+		goto release;
+
+	/*
+	 * Ensure that the initialisation of @node is complete before we
+	 * publish the updated tail via xchg_tail() and potentially link
+	 * @node into the waitqueue via WRITE_ONCE(prev->next, node) below.
+	 */
+	smp_wmb();
+
+	/*
+	 * Publish the updated tail.
+	 * We have already touched the queueing cacheline; don't bother with
+	 * pending stuff.
+	 *
+	 * p,*,* -> n,*,*
+	 */
+	old = xchg_tail(lock, tail);
+	next = NULL;
+
+	/*
+	 * if there was a previous node; link it and wait until reaching the
+	 * head of the waitqueue.
+	 */
+	if (old & _Q_TAIL_MASK) {
+		prev = decode_tail(old, qnodes);
+
+		/* Link @node into the waitqueue. */
+		WRITE_ONCE(prev->next, node);
+
+		pv_wait_node(node, prev);
+		arch_mcs_spin_lock_contended(&node->locked);
+
+		/*
+		 * While waiting for the MCS lock, the next pointer may have
+		 * been set by another lock waiter. We optimistically load
+		 * the next pointer & prefetch the cacheline for writing
+		 * to reduce latency in the upcoming MCS unlock operation.
+		 */
+		next = READ_ONCE(node->next);
+		if (next)
+			prefetchw(next);
+	}
+
+	/*
+	 * we're at the head of the waitqueue, wait for the owner & pending to
+	 * go away.
+	 *
+	 * *,x,y -> *,0,0
+	 *
+	 * this wait loop must use a load-acquire such that we match the
+	 * store-release that clears the locked bit and create lock
+	 * sequentiality; this is because the set_locked() function below
+	 * does not imply a full barrier.
+	 *
+	 * The PV pv_wait_head_or_lock function, if active, will acquire
+	 * the lock and return a non-zero value. So we have to skip the
+	 * atomic_cond_read_acquire() call. As the next PV queue head hasn't
+	 * been designated yet, there is no way for the locked value to become
+	 * _Q_SLOW_VAL. So both the set_locked() and the
+	 * atomic_cmpxchg_relaxed() calls will be safe.
+	 *
+	 * If PV isn't active, 0 will be returned instead.
+	 *
+	 */
+	if ((val = pv_wait_head_or_lock(lock, node)))
+		goto locked;
+
+	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));
+
+locked:
+	/*
+	 * claim the lock:
+	 *
+	 * n,0,0 -> 0,0,1 : lock, uncontended
+	 * *,*,0 -> *,*,1 : lock, contended
+	 *
+	 * If the queue head is the only one in the queue (lock value == tail)
+	 * and nobody is pending, clear the tail code and grab the lock.
+	 * Otherwise, we only need to grab the lock.
+	 */
+
+	/*
+	 * In the PV case we might already have _Q_LOCKED_VAL set, because
+	 * of lock stealing; therefore we must also allow:
+	 *
+	 * n,0,1 -> 0,0,1
+	 *
+	 * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the
+	 *       above wait condition, therefore any concurrent setting of
+	 *       PENDING will make the uncontended transition fail.
+	 */
+	if ((val & _Q_TAIL_MASK) == tail) {
+		if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
+			goto release; /* No contention */
+	}
+
+	/*
+	 * Either somebody is queued behind us or _Q_PENDING_VAL got set
+	 * which will then detect the remaining tail and queue behind us
+	 * ensuring we'll see a @next.
+	 */
+	set_locked(lock);
+
+	/*
+	 * contended path; wait for next if not observed yet, release.
+	 */
+	if (!next)
+		next = smp_cond_load_relaxed(&node->next, (VAL));
+
+	arch_mcs_spin_unlock_contended(&next->locked);
+	pv_kick_node(lock, next);
+
+release:
+	trace_contention_end(lock, 0);
+
+	/*
+	 * release the node
+	 */
+	__this_cpu_dec(qnodes[0].mcs.count);
+}
+EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
+
+/*
+ * Generate the paravirt code for resilient_queued_spin_unlock_slowpath().
+ */
+#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS)
+#define _GEN_PV_LOCK_SLOWPATH
+
+#undef  pv_enabled
+#define pv_enabled()	true
+
+#undef pv_init_node
+#undef pv_wait_node
+#undef pv_kick_node
+#undef pv_wait_head_or_lock
+
+#undef  resilient_queued_spin_lock_slowpath
+#define resilient_queued_spin_lock_slowpath	__pv_resilient_queued_spin_lock_slowpath
+
+#include "qspinlock_paravirt.h"
+#include "rqspinlock.c"
+
+bool nopvspin;
+static __init int parse_nopvspin(char *arg)
+{
+	nopvspin = true;
+	return 0;
+}
+early_param("nopvspin", parse_nopvspin);
+#endif
-- 
2.43.5



* [PATCH bpf-next v1 05/22] rqspinlock: Add rqspinlock.h header
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (3 preceding siblings ...)
  2025-01-07 13:59 ` [PATCH bpf-next v1 04/22] locking: Copy out qspinlock.c to rqspinlock.c Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-07 13:59 ` [PATCH bpf-next v1 06/22] rqspinlock: Drop PV and virtualization support Kumar Kartikeya Dwivedi
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

This header contains the public rqspinlock declarations usable from the
rest of the kernel.

Reviewed-by: Barret Rhoden <brho@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/asm-generic/rqspinlock.h | 18 ++++++++++++++++++
 kernel/locking/rqspinlock.c      |  1 +
 2 files changed, 19 insertions(+)
 create mode 100644 include/asm-generic/rqspinlock.h

diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h
new file mode 100644
index 000000000000..5c2cd3097fb2
--- /dev/null
+++ b/include/asm-generic/rqspinlock.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Resilient Queued Spin Lock
+ *
+ * (C) Copyright 2024 Meta Platforms, Inc. and affiliates.
+ *
+ * Authors: Kumar Kartikeya Dwivedi <memxor@gmail.com>
+ */
+#ifndef __ASM_GENERIC_RQSPINLOCK_H
+#define __ASM_GENERIC_RQSPINLOCK_H
+
+#include <linux/types.h>
+
+struct qspinlock;
+
+extern void resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+
+#endif /* __ASM_GENERIC_RQSPINLOCK_H */
diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
index caaa7c9bbc79..b7920ae79410 100644
--- a/kernel/locking/rqspinlock.c
+++ b/kernel/locking/rqspinlock.c
@@ -23,6 +23,7 @@
 #include <asm/byteorder.h>
 #include <asm/qspinlock.h>
 #include <trace/events/lock.h>
+#include <asm/rqspinlock.h>
 
 /*
  * Include queued spinlock definitions and statistics code
-- 
2.43.5



* [PATCH bpf-next v1 06/22] rqspinlock: Drop PV and virtualization support
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (4 preceding siblings ...)
  2025-01-07 13:59 ` [PATCH bpf-next v1 05/22] rqspinlock: Add rqspinlock.h header Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-07 13:59 ` [PATCH bpf-next v1 07/22] rqspinlock: Add support for timeouts Kumar Kartikeya Dwivedi
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

Changes to rqspinlock in subsequent commits will be algorithmic
modifications that no longer agree with the paravirt spinlock and
virt_spin_lock implementations. These future changes include measures
for terminating waiting loops in the slow path after a certain point.
While using a fair lock like qspinlock directly inside virtual machines
leads to suboptimal performance under certain conditions, we cannot
reuse the existing virtualization support until it is made resilient as
well. Therefore, drop it for now.

Reviewed-by: Barret Rhoden <brho@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/locking/rqspinlock.c | 89 -------------------------------------
 1 file changed, 89 deletions(-)

diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
index b7920ae79410..fada0dca6f3b 100644
--- a/kernel/locking/rqspinlock.c
+++ b/kernel/locking/rqspinlock.c
@@ -11,8 +11,6 @@
  *          Peter Zijlstra <peterz@infradead.org>
  */
 
-#ifndef _GEN_PV_LOCK_SLOWPATH
-
 #include <linux/smp.h>
 #include <linux/bug.h>
 #include <linux/cpumask.h>
@@ -75,38 +73,9 @@
  * contexts: task, softirq, hardirq, nmi.
  *
  * Exactly fits one 64-byte cacheline on a 64-bit architecture.
- *
- * PV doubles the storage and uses the second cacheline for PV state.
  */
 static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]);
 
-/*
- * Generate the native code for resilient_queued_spin_unlock_slowpath(); provide NOPs
- * for all the PV callbacks.
- */
-
-static __always_inline void __pv_init_node(struct mcs_spinlock *node) { }
-static __always_inline void __pv_wait_node(struct mcs_spinlock *node,
-					   struct mcs_spinlock *prev) { }
-static __always_inline void __pv_kick_node(struct qspinlock *lock,
-					   struct mcs_spinlock *node) { }
-static __always_inline u32  __pv_wait_head_or_lock(struct qspinlock *lock,
-						   struct mcs_spinlock *node)
-						   { return 0; }
-
-#define pv_enabled()		false
-
-#define pv_init_node		__pv_init_node
-#define pv_wait_node		__pv_wait_node
-#define pv_kick_node		__pv_kick_node
-#define pv_wait_head_or_lock	__pv_wait_head_or_lock
-
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
-#define resilient_queued_spin_lock_slowpath	native_resilient_queued_spin_lock_slowpath
-#endif
-
-#endif /* _GEN_PV_LOCK_SLOWPATH */
-
 /**
  * resilient_queued_spin_lock_slowpath - acquire the queued spinlock
  * @lock: Pointer to queued spinlock structure
@@ -136,12 +105,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32
 
 	BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
 
-	if (pv_enabled())
-		goto pv_queue;
-
-	if (virt_spin_lock(lock))
-		return;
-
 	/*
 	 * Wait for in-progress pending->locked hand-overs with a bounded
 	 * number of spins so that we guarantee forward progress.
@@ -212,7 +175,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32
 	 */
 queue:
 	lockevent_inc(lock_slowpath);
-pv_queue:
 	node = this_cpu_ptr(&qnodes[0].mcs);
 	idx = node->count++;
 	tail = encode_tail(smp_processor_id(), idx);
@@ -251,7 +213,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32
 
 	node->locked = 0;
 	node->next = NULL;
-	pv_init_node(node);
 
 	/*
 	 * We touched a (possibly) cold cacheline in the per-cpu queue node;
@@ -288,7 +249,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32
 		/* Link @node into the waitqueue. */
 		WRITE_ONCE(prev->next, node);
 
-		pv_wait_node(node, prev);
 		arch_mcs_spin_lock_contended(&node->locked);
 
 		/*
@@ -312,23 +272,9 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32
 	 * store-release that clears the locked bit and create lock
 	 * sequentiality; this is because the set_locked() function below
 	 * does not imply a full barrier.
-	 *
-	 * The PV pv_wait_head_or_lock function, if active, will acquire
-	 * the lock and return a non-zero value. So we have to skip the
-	 * atomic_cond_read_acquire() call. As the next PV queue head hasn't
-	 * been designated yet, there is no way for the locked value to become
-	 * _Q_SLOW_VAL. So both the set_locked() and the
-	 * atomic_cmpxchg_relaxed() calls will be safe.
-	 *
-	 * If PV isn't active, 0 will be returned instead.
-	 *
 	 */
-	if ((val = pv_wait_head_or_lock(lock, node)))
-		goto locked;
-
 	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));
 
-locked:
 	/*
 	 * claim the lock:
 	 *
@@ -341,11 +287,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32
 	 */
 
 	/*
-	 * In the PV case we might already have _Q_LOCKED_VAL set, because
-	 * of lock stealing; therefore we must also allow:
-	 *
-	 * n,0,1 -> 0,0,1
-	 *
 	 * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the
 	 *       above wait condition, therefore any concurrent setting of
 	 *       PENDING will make the uncontended transition fail.
@@ -369,7 +310,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32
 		next = smp_cond_load_relaxed(&node->next, (VAL));
 
 	arch_mcs_spin_unlock_contended(&next->locked);
-	pv_kick_node(lock, next);
 
 release:
 	trace_contention_end(lock, 0);
@@ -380,32 +320,3 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32
 	__this_cpu_dec(qnodes[0].mcs.count);
 }
 EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
-
-/*
- * Generate the paravirt code for resilient_queued_spin_unlock_slowpath().
- */
-#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS)
-#define _GEN_PV_LOCK_SLOWPATH
-
-#undef  pv_enabled
-#define pv_enabled()	true
-
-#undef pv_init_node
-#undef pv_wait_node
-#undef pv_kick_node
-#undef pv_wait_head_or_lock
-
-#undef  resilient_queued_spin_lock_slowpath
-#define resilient_queued_spin_lock_slowpath	__pv_resilient_queued_spin_lock_slowpath
-
-#include "qspinlock_paravirt.h"
-#include "rqspinlock.c"
-
-bool nopvspin;
-static __init int parse_nopvspin(char *arg)
-{
-	nopvspin = true;
-	return 0;
-}
-early_param("nopvspin", parse_nopvspin);
-#endif
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 07/22] rqspinlock: Add support for timeouts
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (5 preceding siblings ...)
  2025-01-07 13:59 ` [PATCH bpf-next v1 06/22] rqspinlock: Drop PV and virtualization support Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-07 14:50   ` Peter Zijlstra
  2025-01-07 13:59 ` [PATCH bpf-next v1 08/22] rqspinlock: Protect pending bit owners from stalls Kumar Kartikeya Dwivedi
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

Introduce a policy macro, RES_CHECK_TIMEOUT, which can be used to
detect when the timeout has expired, so that the slow path can return an
error. It depends on being passed two variables initialized to 0: 'ts'
and 'ret'. The 'ts' parameter is of type struct rqspinlock_timeout.

This macro resolves to the (ret) expression so that it can be used in
statements like smp_cond_load_acquire to break the waiting loop
condition.

The 'spin' member is used to amortize the cost of checking time by
dispatching to the implementation every 64k iterations. The
'timeout_end' member is used to keep track of the timestamp that denotes
the end of the waiting period. The 'ret' parameter denotes the status of
the timeout, and can be checked in the slow path to detect timeouts
after waiting loops.

The 'duration' member is used to store the timeout duration for each
waiting loop, which is passed down by the caller of the slow path
function. Use the RES_INIT_TIMEOUT macro to initialize it. The default
timeout value defined in the header (RES_DEF_TIMEOUT) is 0.5 seconds.

This macro will be used as a condition for waiting loops in the slow
path.  Since each waiting loop applies a fresh timeout using the same
rqspinlock_timeout, we add a new RES_RESET_TIMEOUT as well to ensure the
values can be easily reinitialized to the default state.
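The amortized check-and-arm pattern described above can be sketched in
userspace C. This is a minimal model, not the kernel code: the clock is
a test-controlled stand-in for ktime_get_mono_fast_ns(), and the
RQ_-prefixed names are hypothetical analogues of the RES_ macros.

```c
#include <assert.h>
#include <stdint.h>

/* Test-controlled stand-in for a monotonic nanosecond clock. */
static uint64_t fake_now_ns;
static uint64_t clock_ns(void) { return fake_now_ns; }

/* Timeout state, mirroring the members described in the commit message. */
struct rq_timeout {
	uint64_t timeout_end;	/* absolute deadline, 0 = not yet armed */
	uint64_t duration;	/* per-loop timeout duration */
	uint16_t spin;		/* amortization counter, wraps naturally */
};

/* First call arms the deadline; later calls report -1 once it passes. */
static int check_timeout(struct rq_timeout *ts)
{
	uint64_t t = clock_ns();

	if (!ts->timeout_end) {
		ts->timeout_end = t + ts->duration;
		return 0;
	}
	return t > ts->timeout_end ? -1 : 0;
}

/* Amortized form: consult the clock only once every 64k spins, and
 * evaluate to (ret) so it can serve as a loop-break condition. */
#define RQ_CHECK_TIMEOUT(ts, ret)			\
	({						\
		if (!((ts).spin++ & 0xffff))		\
			(ret) = check_timeout(&(ts));	\
		(ret);					\
	})

#define RQ_INIT_TIMEOUT(ts, t) ({ (ts).spin = 1; (ts).duration = (t); })
#define RQ_RESET_TIMEOUT(ts)   ({ (ts).timeout_end = 0; })
```

Because 'spin' is a u16 that wraps, the clock is touched once per 65536
iterations, keeping the common spin iteration branch-cheap.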

Reviewed-by: Barret Rhoden <brho@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/asm-generic/rqspinlock.h |  8 +++++-
 kernel/locking/rqspinlock.c      | 46 +++++++++++++++++++++++++++++++-
 2 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h
index 5c2cd3097fb2..8ed266f4e70b 100644
--- a/include/asm-generic/rqspinlock.h
+++ b/include/asm-generic/rqspinlock.h
@@ -10,9 +10,15 @@
 #define __ASM_GENERIC_RQSPINLOCK_H
 
 #include <linux/types.h>
+#include <vdso/time64.h>
 
 struct qspinlock;
 
-extern void resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+/*
+ * Default timeout for waiting loops is 0.5 seconds
+ */
+#define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2)
+
+extern void resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
 
 #endif /* __ASM_GENERIC_RQSPINLOCK_H */
diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
index fada0dca6f3b..815feb24d512 100644
--- a/kernel/locking/rqspinlock.c
+++ b/kernel/locking/rqspinlock.c
@@ -6,9 +6,11 @@
  * (C) Copyright 2013-2014,2018 Red Hat, Inc.
  * (C) Copyright 2015 Intel Corp.
  * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP
+ * (C) Copyright 2024 Meta Platforms, Inc. and affiliates.
  *
  * Authors: Waiman Long <longman@redhat.com>
  *          Peter Zijlstra <peterz@infradead.org>
+ *          Kumar Kartikeya Dwivedi <memxor@gmail.com>
  */
 
 #include <linux/smp.h>
@@ -22,6 +24,7 @@
 #include <asm/qspinlock.h>
 #include <trace/events/lock.h>
 #include <asm/rqspinlock.h>
+#include <linux/timekeeping.h>
 
 /*
  * Include queued spinlock definitions and statistics code
@@ -68,6 +71,44 @@
 
 #include "mcs_spinlock.h"
 
+struct rqspinlock_timeout {
+	u64 timeout_end;
+	u64 duration;
+	u16 spin;
+};
+
+static noinline int check_timeout(struct rqspinlock_timeout *ts)
+{
+	u64 time = ktime_get_mono_fast_ns();
+
+	if (!ts->timeout_end) {
+		ts->timeout_end = time + ts->duration;
+		return 0;
+	}
+
+	if (time > ts->timeout_end)
+		return -ETIMEDOUT;
+
+	return 0;
+}
+
+#define RES_CHECK_TIMEOUT(ts, ret)                    \
+	({                                            \
+		if (!((ts).spin++ & 0xffff))          \
+			(ret) = check_timeout(&(ts)); \
+		(ret);                                \
+	})
+
+/*
+ * Initialize the 'duration' member with the chosen timeout.
+ */
+#define RES_INIT_TIMEOUT(ts, _timeout) ({ (ts).spin = 1; (ts).duration = _timeout; })
+
+/*
+ * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary.
+ */
+#define RES_RESET_TIMEOUT(ts) ({ (ts).timeout_end = 0; })
+
 /*
  * Per-CPU queue node structures; we can never have more than 4 nested
  * contexts: task, softirq, hardirq, nmi.
@@ -97,14 +138,17 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]);
  * contended             :    (*,x,y) +--> (*,0,0) ---> (*,0,1) -'  :
  *   queue               :         ^--'                             :
  */
-void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
+void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout)
 {
 	struct mcs_spinlock *prev, *next, *node;
+	struct rqspinlock_timeout ts;
 	u32 old, tail;
 	int idx;
 
 	BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
 
+	RES_INIT_TIMEOUT(ts, timeout);
+
 	/*
 	 * Wait for in-progress pending->locked hand-overs with a bounded
 	 * number of spins so that we guarantee forward progress.
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 08/22] rqspinlock: Protect pending bit owners from stalls
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (6 preceding siblings ...)
  2025-01-07 13:59 ` [PATCH bpf-next v1 07/22] rqspinlock: Add support for timeouts Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-07 14:51   ` Peter Zijlstra
  2025-01-08  2:19   ` Waiman Long
  2025-01-07 13:59 ` [PATCH bpf-next v1 09/22] rqspinlock: Protect waiters in queue " Kumar Kartikeya Dwivedi
                   ` (14 subsequent siblings)
  22 siblings, 2 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

The pending bit is used to avoid queueing in case the lock is
uncontended, and has demonstrated benefits for the two-contender
scenario, especially on x86. In case the pending bit is acquired and we wait for the
locked bit to disappear, we may get stuck due to the lock owner not
making progress. Hence, this waiting loop must be protected with a
timeout check.

To perform a graceful recovery once we decide to abort our lock
acquisition attempt in this case, we must unset the pending bit since we
own it. All waiters undoing their changes and exiting gracefully allows
the lock word to be restored to the unlocked state once all participants
(owner, waiters) have been recovered, and the lock remains usable.
Hence, set the pending bit back to zero before returning to the caller.
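The state transitions described above can be modeled on a plain lock
word. This is a simplified, non-atomic sketch assuming the qspinlock
layout of an 8-bit locked byte with the pending bit at bit 8; the
function names are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed word layout: low byte = locked, bit 8 = pending. */
#define Q_LOCKED_MASK	0x000000ffu
#define Q_PENDING_VAL	0x00000100u

/* On timeout, the pending waiter undoes its own change so the word
 * can return to the unlocked state once every participant backs out:
 * *,1,* -> *,0,* */
static uint32_t clear_pending(uint32_t val)
{
	return val & ~Q_PENDING_VAL;
}

/* One step of the pending waiter's loop, driven by a caller-supplied
 * "timed out" flag instead of a real clock. */
static uint32_t pending_wait_step(uint32_t val, int timed_out)
{
	if (!(val & Q_LOCKED_MASK))	/* owner released: acquire */
		return clear_pending(val) | 1;
	if (timed_out)			/* stalled owner: back out */
		return clear_pending(val);
	return val;			/* keep spinning */
}
```

Note how the timed-out path leaves the locked byte untouched: only the
stalled owner's state remains, which later recovery can deal with.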

Introduce a lockevent (rqspinlock_lock_timeout) to capture timeout
event statistics.

Reviewed-by: Barret Rhoden <brho@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/asm-generic/rqspinlock.h  |  2 +-
 kernel/locking/lock_events_list.h |  5 +++++
 kernel/locking/rqspinlock.c       | 28 +++++++++++++++++++++++-----
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h
index 8ed266f4e70b..5c996a82e75f 100644
--- a/include/asm-generic/rqspinlock.h
+++ b/include/asm-generic/rqspinlock.h
@@ -19,6 +19,6 @@ struct qspinlock;
  */
 #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2)
 
-extern void resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
+extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
 
 #endif /* __ASM_GENERIC_RQSPINLOCK_H */
diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
index 97fb6f3f840a..c5286249994d 100644
--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -49,6 +49,11 @@ LOCK_EVENT(lock_use_node4)	/* # of locking ops that use 4th percpu node */
 LOCK_EVENT(lock_no_node)	/* # of locking ops w/o using percpu node    */
 #endif /* CONFIG_QUEUED_SPINLOCKS */
 
+/*
+ * Locking events for Resilient Queued Spin Lock
+ */
+LOCK_EVENT(rqspinlock_lock_timeout)	/* # of locking ops that timeout	*/
+
 /*
  * Locking events for rwsem
  */
diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
index 815feb24d512..dd305573db13 100644
--- a/kernel/locking/rqspinlock.c
+++ b/kernel/locking/rqspinlock.c
@@ -138,12 +138,12 @@ static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[_Q_MAX_NODES]);
  * contended             :    (*,x,y) +--> (*,0,0) ---> (*,0,1) -'  :
  *   queue               :         ^--'                             :
  */
-void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout)
+int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout)
 {
 	struct mcs_spinlock *prev, *next, *node;
 	struct rqspinlock_timeout ts;
+	int idx, ret = 0;
 	u32 old, tail;
-	int idx;
 
 	BUILD_BUG_ON(CONFIG_NR_CPUS >= (1U << _Q_TAIL_CPU_BITS));
 
@@ -201,8 +201,25 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32
 	 * clear_pending_set_locked() implementations imply full
 	 * barriers.
 	 */
-	if (val & _Q_LOCKED_MASK)
-		smp_cond_load_acquire(&lock->locked, !VAL);
+	if (val & _Q_LOCKED_MASK) {
+		RES_RESET_TIMEOUT(ts);
+		smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret));
+	}
+
+	if (ret) {
+		/*
+		 * We waited for the locked bit to go back to 0, as the pending
+		 * waiter, but timed out. We need to clear the pending bit since
+		 * we own it. Once a stuck owner has been recovered, the lock
+		 * must be restored to a valid state, hence removing the pending
+		 * bit is necessary.
+		 *
+		 * *,1,* -> *,0,*
+		 */
+		clear_pending(lock);
+		lockevent_inc(rqspinlock_lock_timeout);
+		return ret;
+	}
 
 	/*
 	 * take ownership and clear the pending bit.
@@ -211,7 +228,7 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32
 	 */
 	clear_pending_set_locked(lock);
 	lockevent_inc(lock_pending);
-	return;
+	return 0;
 
 	/*
 	 * End of pending bit optimistic spinning and beginning of MCS
@@ -362,5 +379,6 @@ void __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32
 	 * release the node
 	 */
 	__this_cpu_dec(qnodes[0].mcs.count);
+	return 0;
 }
 EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 09/22] rqspinlock: Protect waiters in queue from stalls
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (7 preceding siblings ...)
  2025-01-07 13:59 ` [PATCH bpf-next v1 08/22] rqspinlock: Protect pending bit owners from stalls Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-08  3:38   ` Waiman Long
  2025-01-07 13:59 ` [PATCH bpf-next v1 10/22] rqspinlock: Protect waiters in trylock fallback " Kumar Kartikeya Dwivedi
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

Implement the wait queue cleanup algorithm for rqspinlock. There are
three forms of waiters in the original queued spin lock algorithm. The
first is the waiter which acquires the pending bit and spins on the lock
word without forming a wait queue. The second is the head waiter that is
the first waiter heading the wait queue. The third form is of all the
non-head waiters queued behind the head, waiting to be signalled through
their MCS node to overtake the responsibility of the head.

In this commit, we are concerned with the second and third kind. First,
we augment the waiting loop of the head of the wait queue with a
timeout. When this timeout happens, all waiters part of the wait queue
will abort their lock acquisition attempts. This happens in three steps.
First, the head breaks out of its loop waiting for pending and locked
bits to turn to 0, and non-head waiters break out of their MCS node spin
(more on that later). Next, every waiter (head or non-head) checks
whether they are also the tail waiter; if so, they attempt to zero out
the tail word, allowing a new queue to be built up for this lock. If
they succeed, they have no one to signal next in the queue to stop
spinning. Otherwise, they signal the MCS node of the next waiter to
break out of its spin and try resetting the tail word back to 0. This
goes on until the tail waiter is found. In case of races, the new tail
will be responsible for performing the same task, as the old tail will
then fail to reset the tail word and wait for its next pointer to be
updated before it signals the new tail to do the same.
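The tail-reset step above can be sketched with C11 atomics. This is a
userspace model assuming the tail occupies the high 16 bits of the lock
word and locked/pending the low 16; it mirrors the logic of the
try_cmpxchg_tail helper but is not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdatomic.h>

/* Assumed split of the 32-bit lock word. */
#define Q_TAIL_MASK		0xffff0000u
#define Q_LOCKED_PENDING_MASK	0x0000ffffu

/* Reset the tail only if it still names us, preserving whatever
 * locked/pending state is current at the moment of the swap. A failure
 * means a newer waiter has queued behind us and must be signalled. */
static int try_cmpxchg_tail(_Atomic uint32_t *lockval, uint32_t tail,
			    uint32_t new_tail)
{
	uint32_t old = atomic_load_explicit(lockval, memory_order_relaxed);
	uint32_t new;

	do {
		if ((old & Q_TAIL_MASK) != tail)
			return 0;	/* tail moved on: not the last waiter */
		/* Carry over the latest locked/pending bits. */
		new = (old & Q_LOCKED_PENDING_MASK) | new_tail;
	} while (!atomic_compare_exchange_weak_explicit(lockval, &old, new,
							memory_order_relaxed,
							memory_order_relaxed));
	return 1;
}
```

The compare is against the full word, so concurrent locked/pending
flips retry the loop but cannot make a stale tail win.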

Lastly, all of these waiters release the rqnode and return to the
caller. This patch underscores the point that rqspinlock's timeout does
not apply to each waiter individually, and cannot be relied upon as an
upper bound. It is possible for the rqspinlock waiters to return early
from a failed lock acquisition attempt as soon as stalls are detected.

The head waiter cannot directly WRITE_ONCE the tail to zero, as it may
race with a concurrent xchg and a non-head waiter linking its MCS node
to the head's MCS node through 'prev->next' assignment.

Reviewed-by: Barret Rhoden <brho@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/locking/rqspinlock.c | 42 +++++++++++++++++++++++++++++---
 kernel/locking/rqspinlock.h | 48 +++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+), 3 deletions(-)
 create mode 100644 kernel/locking/rqspinlock.h

diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
index dd305573db13..f712fe4b1f38 100644
--- a/kernel/locking/rqspinlock.c
+++ b/kernel/locking/rqspinlock.c
@@ -77,6 +77,8 @@ struct rqspinlock_timeout {
 	u16 spin;
 };
 
+#define RES_TIMEOUT_VAL	2
+
 static noinline int check_timeout(struct rqspinlock_timeout *ts)
 {
 	u64 time = ktime_get_mono_fast_ns();
@@ -305,12 +307,18 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
 	 * head of the waitqueue.
 	 */
 	if (old & _Q_TAIL_MASK) {
+		int val;
+
 		prev = decode_tail(old, qnodes);
 
 		/* Link @node into the waitqueue. */
 		WRITE_ONCE(prev->next, node);
 
-		arch_mcs_spin_lock_contended(&node->locked);
+		val = arch_mcs_spin_lock_contended(&node->locked);
+		if (val == RES_TIMEOUT_VAL) {
+			ret = -EDEADLK;
+			goto waitq_timeout;
+		}
 
 		/*
 		 * While waiting for the MCS lock, the next pointer may have
@@ -334,7 +342,35 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
 	 * sequentiality; this is because the set_locked() function below
 	 * does not imply a full barrier.
 	 */
-	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));
+	RES_RESET_TIMEOUT(ts);
+	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) ||
+				       RES_CHECK_TIMEOUT(ts, ret));
+
+waitq_timeout:
+	if (ret) {
+		/*
+		 * If the tail is still pointing to us, then we are the final waiter,
+		 * and are responsible for resetting the tail back to 0. Otherwise, if
+		 * the cmpxchg operation fails, we signal the next waiter to take exit
+		 * and try the same. For a waiter with tail node 'n':
+		 *
+		 * n,*,* -> 0,*,*
+		 *
+		 * When performing cmpxchg for the whole word (NR_CPUS > 16k), it is
+		 * possible locked/pending bits keep changing and we see failures even
+		 * when we remain the head of wait queue. However, eventually, for the
+		 * case without corruption, pending bit owner will unset the pending
+		 * bit, and new waiters will queue behind us. This will leave the lock
+		 * owner in charge, and it will eventually either set locked bit to 0,
+		 * or leave it as 1, allowing us to make progress.
+		 */
+		if (!try_cmpxchg_tail(lock, tail, 0)) {
+			next = smp_cond_load_relaxed(&node->next, VAL);
+			WRITE_ONCE(next->locked, RES_TIMEOUT_VAL);
+		}
+		lockevent_inc(rqspinlock_lock_timeout);
+		goto release;
+	}
 
 	/*
 	 * claim the lock:
@@ -379,6 +415,6 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
 	 * release the node
 	 */
 	__this_cpu_dec(qnodes[0].mcs.count);
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
diff --git a/kernel/locking/rqspinlock.h b/kernel/locking/rqspinlock.h
new file mode 100644
index 000000000000..3cec3a0f2d7e
--- /dev/null
+++ b/kernel/locking/rqspinlock.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Resilient Queued Spin Lock defines
+ *
+ * (C) Copyright 2024 Meta Platforms, Inc. and affiliates.
+ *
+ * Authors: Kumar Kartikeya Dwivedi <memxor@gmail.com>
+ */
+#ifndef __LINUX_RQSPINLOCK_H
+#define __LINUX_RQSPINLOCK_H
+
+#include "qspinlock.h"
+
+/*
+ * try_cmpxchg_tail - Return result of cmpxchg of tail word with a new value
+ * @lock: Pointer to queued spinlock structure
+ * @tail: The tail to compare against
+ * @new_tail: The new queue tail code word
+ * Return: Bool to indicate whether the cmpxchg operation succeeded
+ *
+ * This is used by the head of the wait queue to clean up the queue.
+ * Provides relaxed ordering, since observers only rely on initialized
+ * state of the node which was made visible through the xchg_tail operation,
+ * i.e. through the smp_wmb preceding xchg_tail.
+ *
+ * We avoid using 16-bit cmpxchg, which is not available on all architectures.
+ */
+static __always_inline bool try_cmpxchg_tail(struct qspinlock *lock, u32 tail, u32 new_tail)
+{
+	u32 old, new;
+
+	old = atomic_read(&lock->val);
+	do {
+		/*
+		 * Is the tail part we compare to already stale? Fail.
+		 */
+		if ((old & _Q_TAIL_MASK) != tail)
+			return false;
+		/*
+		 * Encode latest locked/pending state for new tail.
+		 */
+		new = (old & _Q_LOCKED_PENDING_MASK) | new_tail;
+	} while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new));
+
+	return true;
+}
+
+#endif /* __LINUX_RQSPINLOCK_H */
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 10/22] rqspinlock: Protect waiters in trylock fallback from stalls
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (8 preceding siblings ...)
  2025-01-07 13:59 ` [PATCH bpf-next v1 09/22] rqspinlock: Protect waiters in queue " Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-07 13:59 ` [PATCH bpf-next v1 11/22] rqspinlock: Add deadlock detection and recovery Kumar Kartikeya Dwivedi
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

When we run out of maximum rqnodes, the original queued spin lock slow
path falls back to a try lock. In such a case, we are again susceptible
to stalls in case the lock owner fails to make progress. We use the
timeout as a fallback to break out of this loop and return to the
caller. This is a fallback for an extreme edge case, when on the same
CPU we run out of all 4 qnodes. When could this happen? We are in the
slow path in task context and get interrupted by an IRQ, which while in
the slow path gets interrupted by an NMI, which in the slow path gets
another nested NMI, which enters the slow path. All of these
interruptions happen after node->count++.
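The bounded trylock fallback can be modeled as below. This is a sketch
under simplifying assumptions: a plain atomic int stands in for the
qspinlock word, and an attempt budget stands in for the real
RES_CHECK_TIMEOUT-based deadline.

```c
#include <assert.h>
#include <stdatomic.h>

/* Minimal trylock on an int lock word: 0 = free, 1 = held. */
static int trylock(_Atomic int *lock)
{
	int expected = 0;
	return atomic_compare_exchange_strong(lock, &expected, 1);
}

/* Fallback path when all per-CPU queue nodes are in use: degrade to
 * repeated trylock attempts, bounded so a stalled owner cannot pin us
 * forever. Returns 0 on acquisition, -1 (standing in for -ETIMEDOUT)
 * once the budget is exhausted. */
static int lock_no_node_fallback(_Atomic int *lock, int budget)
{
	while (!trylock(lock)) {
		if (budget-- <= 0)
			return -1;
		/* cpu_relax() would go here in the kernel. */
	}
	return 0;
}
```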

Reviewed-by: Barret Rhoden <brho@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/locking/rqspinlock.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
index f712fe4b1f38..b63f92bd43b1 100644
--- a/kernel/locking/rqspinlock.c
+++ b/kernel/locking/rqspinlock.c
@@ -255,8 +255,14 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
 	 */
 	if (unlikely(idx >= _Q_MAX_NODES)) {
 		lockevent_inc(lock_no_node);
-		while (!queued_spin_trylock(lock))
+		RES_RESET_TIMEOUT(ts);
+		while (!queued_spin_trylock(lock)) {
+			if (RES_CHECK_TIMEOUT(ts, ret)) {
+				lockevent_inc(rqspinlock_lock_timeout);
+				break;
+			}
 			cpu_relax();
+		}
 		goto release;
 	}
 
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 11/22] rqspinlock: Add deadlock detection and recovery
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (9 preceding siblings ...)
  2025-01-07 13:59 ` [PATCH bpf-next v1 10/22] rqspinlock: Protect waiters in trylock fallback " Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-08 16:06   ` Waiman Long
  2025-01-07 13:59 ` [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for CONFIG_PARAVIRT Kumar Kartikeya Dwivedi
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

While the timeout logic provides guarantees for the waiter's forward
progress, the time until a stalling waiter unblocks can still be long.
The default timeout of 1/2 sec can be excessively long for some use
cases.  Additionally, custom timeouts may exacerbate recovery time.

Introduce logic to detect common cases of deadlocks and perform quicker
recovery. This is done by dividing the time from entry into the locking
slow path until the timeout into intervals of 1 ms. Then, after each
interval elapses, deadlock detection is performed, while also polling
the lock word to ensure we can quickly break out of the detection logic
and proceed with lock acquisition.

A 'held_locks' table is maintained per-CPU where the entry at the bottom
denotes a lock being waited for or already taken. Entries coming before
it denote locks that are already held. The current CPU's table can thus
be looked at to detect AA deadlocks. The tables from other CPUs can be
looked at to discover ABBA situations. Finally, when a matching entry
for the lock being taken on the current CPU is found on some other CPU,
a deadlock situation is detected. This function can take a long time;
therefore, the lock word is polled in each loop iteration, using the
is_lock_released check, to ensure we can break out of detection and
proceed with lock acquisition.

We set the 'spin' member of the rqspinlock_timeout struct to 0 to
trigger deadlock checks immediately and perform faster recovery.
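The held_locks table and the AA check it enables can be sketched as
follows. This is a single-CPU userspace model of the per-CPU table
described above, with illustrative names; the real code additionally
scans remote CPUs' tables for ABBA cycles.

```c
#include <assert.h>
#include <stddef.h>

#define NR_HELD 32

/* Table of held lock addresses: the top entry is the lock currently
 * being acquired, entries below it are locks already held. */
struct held_locks {
	int cnt;
	void *locks[NR_HELD];
};

static void grab_entry(struct held_locks *h, void *lock)
{
	h->cnt++;
	if (h->cnt <= NR_HELD)	/* on overflow, keep only the count */
		h->locks[h->cnt - 1] = lock;
}

static void release_entry(struct held_locks *h)
{
	if (h->cnt <= NR_HELD)
		h->locks[h->cnt - 1] = NULL;
	h->cnt--;
}

/* AA check: is the lock at the top of the table (being acquired)
 * already present among the locks below it? Returns -1, standing in
 * for -EDEADLK, if so. */
static int check_aa(const struct held_locks *h, void *lock)
{
	int cnt = h->cnt < NR_HELD ? h->cnt : NR_HELD;

	for (int i = 0; i < cnt - 1; i++)
		if (h->locks[i] == lock)
			return -1;
	return 0;
}
```

The scan excludes the topmost entry, since that slot holds the very
acquisition attempt being checked.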

Note: Extending the lock word by 4 bytes to record the owner CPU would
allow faster ABBA detection, as it is typically the owner that
participates in an ABBA situation. However, to keep compatibility with existing lock
words in the kernel (struct qspinlock), and given deadlocks are a rare
event triggered by bugs, we choose to favor compatibility over faster
detection.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/asm-generic/rqspinlock.h |  56 +++++++++-
 kernel/locking/rqspinlock.c      | 178 ++++++++++++++++++++++++++++---
 2 files changed, 220 insertions(+), 14 deletions(-)

diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h
index 5c996a82e75f..c7e33ccc57a6 100644
--- a/include/asm-generic/rqspinlock.h
+++ b/include/asm-generic/rqspinlock.h
@@ -11,14 +11,68 @@
 
 #include <linux/types.h>
 #include <vdso/time64.h>
+#include <linux/percpu.h>
 
 struct qspinlock;
 
+extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
+
 /*
  * Default timeout for waiting loops is 0.5 seconds
  */
 #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2)
 
-extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
+#define RES_NR_HELD 32
+
+struct rqspinlock_held {
+	int cnt;
+	void *locks[RES_NR_HELD];
+};
+
+DECLARE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks);
+
+static __always_inline void grab_held_lock_entry(void *lock)
+{
+	int cnt = this_cpu_inc_return(rqspinlock_held_locks.cnt);
+
+	if (unlikely(cnt > RES_NR_HELD)) {
+		/* Still keep the inc so we decrement later. */
+		return;
+	}
+
+	/*
+	 * Implied compiler barrier in per-CPU operations; otherwise we can have
+	 * the compiler reorder inc with write to table, allowing interrupts to
+	 * overwrite and erase our write to the table (as on interrupt exit it
+	 * will be reset to NULL).
+	 */
+	this_cpu_write(rqspinlock_held_locks.locks[cnt - 1], lock);
+}
+
+/*
+ * It is possible to run into misdetection scenarios of AA deadlocks on the same
+ * CPU, and missed ABBA deadlocks on remote CPUs when this function pops entries
+ * out of order (due to lock A, lock B, unlock A, unlock B) pattern. The correct
+ * logic to preserve right entries in the table would be to walk the array of
+ * held locks and swap and clear out-of-order entries, but that's too
+ * complicated and we don't have a compelling use case for out of order unlocking.
+ *
+ * Therefore, we simply don't support such cases and keep the logic simple here.
+ */
+static __always_inline void release_held_lock_entry(void)
+{
+	struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
+
+	if (unlikely(rqh->cnt > RES_NR_HELD))
+		goto dec;
+	smp_store_release(&rqh->locks[rqh->cnt - 1], NULL);
+	/*
+	 * Overwrite of NULL should appear before our decrement of the count to
+	 * other CPUs, otherwise we have the issue of a stale non-NULL entry being
+	 * visible in the array, leading to misdetection during deadlock detection.
+	 */
+dec:
+	this_cpu_dec(rqspinlock_held_locks.cnt);
+}
 
 #endif /* __ASM_GENERIC_RQSPINLOCK_H */
diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
index b63f92bd43b1..b7c86127d288 100644
--- a/kernel/locking/rqspinlock.c
+++ b/kernel/locking/rqspinlock.c
@@ -30,6 +30,7 @@
  * Include queued spinlock definitions and statistics code
  */
 #include "qspinlock.h"
+#include "rqspinlock.h"
 #include "qspinlock_stat.h"
 
 /*
@@ -74,16 +75,141 @@
 struct rqspinlock_timeout {
 	u64 timeout_end;
 	u64 duration;
+	u64 cur;
 	u16 spin;
 };
 
 #define RES_TIMEOUT_VAL	2
 
-static noinline int check_timeout(struct rqspinlock_timeout *ts)
+DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks);
+
+static bool is_lock_released(struct qspinlock *lock, u32 mask, struct rqspinlock_timeout *ts)
+{
+	if (!(atomic_read_acquire(&lock->val) & (mask)))
+		return true;
+	return false;
+}
+
+static noinline int check_deadlock_AA(struct qspinlock *lock, u32 mask,
+				      struct rqspinlock_timeout *ts)
+{
+	struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
+	int cnt = min(RES_NR_HELD, rqh->cnt);
+
+	/*
+	 * Return an error if we hold the lock we are attempting to acquire.
+	 * We'll iterate over max 32 locks; no need to do is_lock_released.
+	 */
+	for (int i = 0; i < cnt - 1; i++) {
+		if (rqh->locks[i] == lock)
+			return -EDEADLK;
+	}
+	return 0;
+}
+
+static noinline int check_deadlock_ABBA(struct qspinlock *lock, u32 mask,
+					struct rqspinlock_timeout *ts)
+{
+	struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
+	int rqh_cnt = min(RES_NR_HELD, rqh->cnt);
+	void *remote_lock;
+	int cpu;
+
+	/*
+	 * Find the CPU holding the lock that we want to acquire. If there is a
+	 * deadlock scenario, we will read a stable set on the remote CPU and
+	 * find the target. This would be a constant time operation instead of
+	 * O(NR_CPUS) if we could determine the owning CPU from a lock value, but
+	 * that requires increasing the size of the lock word.
+	 */
+	for_each_possible_cpu(cpu) {
+		struct rqspinlock_held *rqh_cpu = per_cpu_ptr(&rqspinlock_held_locks, cpu);
+		int real_cnt = READ_ONCE(rqh_cpu->cnt);
+		int cnt = min(RES_NR_HELD, real_cnt);
+
+		/*
+		 * Bail out of this loop if the lock has become available
+		 * for us to potentially acquire.
+		 */
+		if (is_lock_released(lock, mask, ts))
+			return 0;
+
+		/*
+		 * Skip ourselves, and CPUs whose count is less than 2, as they need at
+		 * least one held lock and one acquisition attempt (reflected as
+		 * the topmost entry) to participate in an ABBA deadlock.
+		 *
+		 * If cnt is more than RES_NR_HELD, it means the current lock being
+		 * acquired won't appear in the table, and other locks in the table are
+		 * already held, so we can't determine ABBA.
+		 */
+		if (cpu == smp_processor_id() || real_cnt < 2 || real_cnt > RES_NR_HELD)
+			continue;
+
+		/*
+		 * Obtain the entry at the top; this corresponds to the lock
+		 * the remote CPU is attempting to acquire in a deadlock
+		 * situation, and would be one of the locks we hold on the current CPU.
+		 */
+		remote_lock = READ_ONCE(rqh_cpu->locks[cnt - 1]);
+		/*
+		 * If it is NULL, we've raced and cannot determine a deadlock
+		 * conclusively, skip this CPU.
+		 */
+		if (!remote_lock)
+			continue;
+		/*
+		 * Find if the lock we're attempting to acquire is held by this CPU.
+		 * Don't consider the topmost entry, as that must be the latest lock
+		 * being held or acquired.  For a deadlock, the target CPU must also
+		 * attempt to acquire a lock we hold, so for this search only 'cnt - 1'
+		 * entries are important.
+		 */
+		for (int i = 0; i < cnt - 1; i++) {
+			if (READ_ONCE(rqh_cpu->locks[i]) != lock)
+				continue;
+			/*
+			 * We found our lock as held on the remote CPU.  Is the
+			 * acquisition attempt on the remote CPU for a lock held
+			 * by us?  If so, we have a deadlock situation, and need
+			 * to recover.
+			 */
+			for (int i = 0; i < rqh_cnt - 1; i++) {
+				if (rqh->locks[i] == remote_lock)
+					return -EDEADLK;
+			}
+			/*
+			 * Inconclusive; retry again later.
+			 */
+			return 0;
+		}
+	}
+	return 0;
+}
+
+static noinline int check_deadlock(struct qspinlock *lock, u32 mask,
+				   struct rqspinlock_timeout *ts)
+{
+	int ret;
+
+	ret = check_deadlock_AA(lock, mask, ts);
+	if (ret)
+		return ret;
+	ret = check_deadlock_ABBA(lock, mask, ts);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static noinline int check_timeout(struct qspinlock *lock, u32 mask,
+				  struct rqspinlock_timeout *ts)
 {
 	u64 time = ktime_get_mono_fast_ns();
+	u64 prev = ts->cur;
 
 	if (!ts->timeout_end) {
+		ts->cur = time;
 		ts->timeout_end = time + ts->duration;
 		return 0;
 	}
@@ -91,20 +217,30 @@ static noinline int check_timeout(struct rqspinlock_timeout *ts)
 	if (time > ts->timeout_end)
 		return -ETIMEDOUT;
 
+	/*
+	 * Has a millisecond elapsed since the last check? If so, trigger
+	 * the deadlock checks.
+	 */
+	if (prev + NSEC_PER_MSEC < time) {
+		ts->cur = time;
+		return check_deadlock(lock, mask, ts);
+	}
+
 	return 0;
 }
 
-#define RES_CHECK_TIMEOUT(ts, ret)                    \
-	({                                            \
-		if (!((ts).spin++ & 0xffff))          \
-			(ret) = check_timeout(&(ts)); \
-		(ret);                                \
+#define RES_CHECK_TIMEOUT(ts, ret, mask)                              \
+	({                                                            \
+		if (!((ts).spin++ & 0xffff))                          \
+			(ret) = check_timeout((lock), (mask), &(ts)); \
+		(ret);                                                \
 	})
 
 /*
  * Initialize the 'duration' member with the chosen timeout.
+ * Set spin member to 0 to trigger AA/ABBA checks immediately.
  */
-#define RES_INIT_TIMEOUT(ts, _timeout) ({ (ts).spin = 1; (ts).duration = _timeout; })
+#define RES_INIT_TIMEOUT(ts, _timeout) ({ (ts).spin = 0; (ts).duration = _timeout; })
 
 /*
  * We only need to reset 'timeout_end', 'spin' will just wrap around as necessary.
@@ -192,6 +328,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
 		goto queue;
 	}
 
+	/*
+	 * Grab an entry in the held locks array, to enable deadlock detection.
+	 */
+	grab_held_lock_entry(lock);
+
 	/*
 	 * We're pending, wait for the owner to go away.
 	 *
@@ -205,7 +346,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
 	 */
 	if (val & _Q_LOCKED_MASK) {
 		RES_RESET_TIMEOUT(ts);
-		smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret));
+		smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_MASK));
 	}
 
 	if (ret) {
@@ -220,7 +361,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
 		 */
 		clear_pending(lock);
 		lockevent_inc(rqspinlock_lock_timeout);
-		return ret;
+		goto err_release_entry;
 	}
 
 	/*
@@ -238,6 +379,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
 	 */
 queue:
 	lockevent_inc(lock_slowpath);
+	/*
+	 * Grab deadlock detection entry for the queue path.
+	 */
+	grab_held_lock_entry(lock);
+
 	node = this_cpu_ptr(&qnodes[0].mcs);
 	idx = node->count++;
 	tail = encode_tail(smp_processor_id(), idx);
@@ -257,9 +403,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
 		lockevent_inc(lock_no_node);
 		RES_RESET_TIMEOUT(ts);
 		while (!queued_spin_trylock(lock)) {
-			if (RES_CHECK_TIMEOUT(ts, ret)) {
+			if (RES_CHECK_TIMEOUT(ts, ret, ~0u)) {
 				lockevent_inc(rqspinlock_lock_timeout);
-				break;
+				goto err_release_node;
 			}
 			cpu_relax();
 		}
@@ -350,7 +496,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
 	 */
 	RES_RESET_TIMEOUT(ts);
 	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) ||
-				       RES_CHECK_TIMEOUT(ts, ret));
+				       RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_PENDING_MASK));
 
 waitq_timeout:
 	if (ret) {
@@ -375,7 +521,7 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
 			WRITE_ONCE(next->locked, RES_TIMEOUT_VAL);
 		}
 		lockevent_inc(rqspinlock_lock_timeout);
-		goto release;
+		goto err_release_node;
 	}
 
 	/*
@@ -422,5 +568,11 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
 	 */
 	__this_cpu_dec(qnodes[0].mcs.count);
 	return ret;
+err_release_node:
+	trace_contention_end(lock, ret);
+	__this_cpu_dec(qnodes[0].mcs.count);
+err_release_entry:
+	release_held_lock_entry();
+	return ret;
 }
 EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for CONFIG_PARAVIRT
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (10 preceding siblings ...)
  2025-01-07 13:59 ` [PATCH bpf-next v1 11/22] rqspinlock: Add deadlock detection and recovery Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-08 16:27   ` Waiman Long
  2025-01-07 13:59 ` [PATCH bpf-next v1 13/22] rqspinlock: Add helper to print a splat on timeout or deadlock Kumar Kartikeya Dwivedi
                   ` (10 subsequent siblings)
  22 siblings, 1 reply; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

We ripped out the PV and virtualization related bits from rqspinlock in
an earlier commit; however, a fair lock performs poorly within a virtual
machine when the lock holder is preempted. As such, retain the
virt_spin_lock fallback to a test-and-set lock, but augmented with
timeout and deadlock detection.

We don't integrate support for CONFIG_PARAVIRT_SPINLOCKS yet, as that
requires more involved algorithmic changes and introduces more
complexity. It can be done when the need arises in the future.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 arch/x86/include/asm/rqspinlock.h | 20 ++++++++++++++++
 include/asm-generic/rqspinlock.h  |  7 ++++++
 kernel/locking/rqspinlock.c       | 38 +++++++++++++++++++++++++++++++
 3 files changed, 65 insertions(+)
 create mode 100644 arch/x86/include/asm/rqspinlock.h

diff --git a/arch/x86/include/asm/rqspinlock.h b/arch/x86/include/asm/rqspinlock.h
new file mode 100644
index 000000000000..ecfb7dfe6370
--- /dev/null
+++ b/arch/x86/include/asm/rqspinlock.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_RQSPINLOCK_H
+#define _ASM_X86_RQSPINLOCK_H
+
+#include <asm/paravirt.h>
+
+#ifdef CONFIG_PARAVIRT
+DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
+
+#define resilient_virt_spin_lock_enabled resilient_virt_spin_lock_enabled
+static __always_inline bool resilient_virt_spin_lock_enabled(void)
+{
+       return static_branch_likely(&virt_spin_lock_key);
+}
+
+#endif /* CONFIG_PARAVIRT */
+
+#include <asm-generic/rqspinlock.h>
+
+#endif /* _ASM_X86_RQSPINLOCK_H */
diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h
index c7e33ccc57a6..dc436ab01471 100644
--- a/include/asm-generic/rqspinlock.h
+++ b/include/asm-generic/rqspinlock.h
@@ -17,6 +17,13 @@ struct qspinlock;
 
 extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
 
+#ifndef resilient_virt_spin_lock_enabled
+static __always_inline bool resilient_virt_spin_lock_enabled(void)
+{
+	return false;
+}
+#endif
+
 /*
  * Default timeout for waiting loops is 0.5 seconds
  */
diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
index b7c86127d288..e397f91ebcf6 100644
--- a/kernel/locking/rqspinlock.c
+++ b/kernel/locking/rqspinlock.c
@@ -247,6 +247,41 @@ static noinline int check_timeout(struct qspinlock *lock, u32 mask,
  */
 #define RES_RESET_TIMEOUT(ts) ({ (ts).timeout_end = 0; })
 
+#ifdef CONFIG_PARAVIRT
+
+static inline int resilient_virt_spin_lock(struct qspinlock *lock, struct rqspinlock_timeout *ts)
+{
+	int val, ret = 0;
+
+	RES_RESET_TIMEOUT(*ts);
+	grab_held_lock_entry(lock);
+retry:
+	val = atomic_read(&lock->val);
+
+	if (val || !atomic_try_cmpxchg(&lock->val, &val, _Q_LOCKED_VAL)) {
+		if (RES_CHECK_TIMEOUT(*ts, ret, ~0u)) {
+			lockevent_inc(rqspinlock_lock_timeout);
+			goto timeout;
+		}
+		cpu_relax();
+		goto retry;
+	}
+
+	return 0;
+timeout:
+	release_held_lock_entry();
+	return ret;
+}
+
+#else
+
+static __always_inline int resilient_virt_spin_lock(struct qspinlock *lock, struct rqspinlock_timeout *ts)
+{
+	return 0;
+}
+
+#endif /* CONFIG_PARAVIRT */
+
 /*
  * Per-CPU queue node structures; we can never have more than 4 nested
  * contexts: task, softirq, hardirq, nmi.
@@ -287,6 +322,9 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
 
 	RES_INIT_TIMEOUT(ts, timeout);
 
+	if (resilient_virt_spin_lock_enabled())
+		return resilient_virt_spin_lock(lock, &ts);
+
 	/*
 	 * Wait for in-progress pending->locked hand-overs with a bounded
 	 * number of spins so that we guarantee forward progress.
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 13/22] rqspinlock: Add helper to print a splat on timeout or deadlock
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (11 preceding siblings ...)
  2025-01-07 13:59 ` [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for CONFIG_PARAVIRT Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-07 13:59 ` [PATCH bpf-next v1 14/22] rqspinlock: Add macros for rqspinlock usage Kumar Kartikeya Dwivedi
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

Whenever a timeout or a deadlock occurs, we want to print a message to
the dmesg console, including the CPU where the event occurred, the list
of locks in the held locks table, and the stack trace of the caller,
which allows determining exactly where in the slow path the waiter timed
out or detected a deadlock.

Splats are limited to at most one per CPU during machine uptime, and a
lock is acquired to ensure that no interleaving occurs when multiple
CPUs conflict, enter a deadlock situation, and start printing data
concurrently.

Later patches will use this to inspect the return value of the
rqspinlock API and report a violation if necessary.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/locking/rqspinlock.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
index e397f91ebcf6..467336f6828e 100644
--- a/kernel/locking/rqspinlock.c
+++ b/kernel/locking/rqspinlock.c
@@ -187,6 +187,35 @@ static noinline int check_deadlock_ABBA(struct qspinlock *lock, u32 mask,
 	return 0;
 }
 
+static DEFINE_PER_CPU(int, report_nest_cnt);
+static DEFINE_PER_CPU(bool, report_flag);
+static arch_spinlock_t report_lock;
+
+static void rqspinlock_report_violation(const char *s, void *lock)
+{
+	struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
+
+	if (this_cpu_inc_return(report_nest_cnt) != 1) {
+		this_cpu_dec(report_nest_cnt);
+		return;
+	}
+	if (this_cpu_read(report_flag))
+		goto end;
+	this_cpu_write(report_flag, true);
+	arch_spin_lock(&report_lock);
+
+	pr_err("CPU %d: %s", smp_processor_id(), s);
+	pr_info("Held locks: %d\n", rqh->cnt + 1);
+	pr_info("Held lock[%2d] = 0x%px\n", 0, lock);
+	for (int i = 0; i < min(RES_NR_HELD, rqh->cnt); i++)
+		pr_info("Held lock[%2d] = 0x%px\n", i + 1, rqh->locks[i]);
+	dump_stack();
+
+	arch_spin_unlock(&report_lock);
+end:
+	this_cpu_dec(report_nest_cnt);
+}
+
 static noinline int check_deadlock(struct qspinlock *lock, u32 mask,
 				   struct rqspinlock_timeout *ts)
 {
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 14/22] rqspinlock: Add macros for rqspinlock usage
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (12 preceding siblings ...)
  2025-01-07 13:59 ` [PATCH bpf-next v1 13/22] rqspinlock: Add helper to print a splat on timeout or deadlock Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-08 16:55   ` Waiman Long
  2025-01-07 13:59 ` [PATCH bpf-next v1 15/22] rqspinlock: Add locktorture support Kumar Kartikeya Dwivedi
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

Introduce helper macros that wrap around the rqspinlock slow path and
provide an interface analogous to the raw_spin_lock API. Note that on
error, preemption and IRQ disabling are automatically undone before the
error is returned to the caller.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/asm-generic/rqspinlock.h | 58 ++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h
index dc436ab01471..53be8426373c 100644
--- a/include/asm-generic/rqspinlock.h
+++ b/include/asm-generic/rqspinlock.h
@@ -12,8 +12,10 @@
 #include <linux/types.h>
 #include <vdso/time64.h>
 #include <linux/percpu.h>
+#include <asm/qspinlock.h>
 
 struct qspinlock;
+typedef struct qspinlock rqspinlock_t;
 
 extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
 
@@ -82,4 +84,60 @@ static __always_inline void release_held_lock_entry(void)
 	this_cpu_dec(rqspinlock_held_locks.cnt);
 }
 
+/**
+ * res_spin_lock - acquire a queued spinlock
+ * @lock: Pointer to queued spinlock structure
+ */
+static __always_inline int res_spin_lock(rqspinlock_t *lock)
+{
+	int val = 0;
+
+	if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) {
+		grab_held_lock_entry(lock);
+		return 0;
+	}
+	return resilient_queued_spin_lock_slowpath(lock, val, RES_DEF_TIMEOUT);
+}
+
+static __always_inline void res_spin_unlock(rqspinlock_t *lock)
+{
+	struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
+
+	if (unlikely(rqh->cnt > RES_NR_HELD))
+		goto unlock;
+	WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL);
+	/*
+	 * Release barrier, ensuring ordering. See release_held_lock_entry.
+	 */
+unlock:
+	queued_spin_unlock(lock);
+	this_cpu_dec(rqspinlock_held_locks.cnt);
+}
+
+#define raw_res_spin_lock_init(lock) ({ *(lock) = (struct qspinlock)__ARCH_SPIN_LOCK_UNLOCKED; })
+
+#define raw_res_spin_lock(lock)                    \
+	({                                         \
+		int __ret;                         \
+		preempt_disable();                 \
+		__ret = res_spin_lock(lock);	   \
+		if (__ret)                         \
+			preempt_enable();          \
+		__ret;                             \
+	})
+
+#define raw_res_spin_unlock(lock) ({ res_spin_unlock(lock); preempt_enable(); })
+
+#define raw_res_spin_lock_irqsave(lock, flags)    \
+	({                                        \
+		int __ret;                        \
+		local_irq_save(flags);            \
+		__ret = raw_res_spin_lock(lock);  \
+		if (__ret)                        \
+			local_irq_restore(flags); \
+		__ret;                            \
+	})
+
+#define raw_res_spin_unlock_irqrestore(lock, flags) ({ raw_res_spin_unlock(lock); local_irq_restore(flags); })
+
 #endif /* __ASM_GENERIC_RQSPINLOCK_H */
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 15/22] rqspinlock: Add locktorture support
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (13 preceding siblings ...)
  2025-01-07 13:59 ` [PATCH bpf-next v1 14/22] rqspinlock: Add macros for rqspinlock usage Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-07 13:59 ` [PATCH bpf-next v1 16/22] rqspinlock: Add entry to Makefile, MAINTAINERS Kumar Kartikeya Dwivedi
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

Introduce locktorture support for rqspinlock using the newly added
macros, making locktorture the first in-kernel user and consumer.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/locking/locktorture.c | 51 ++++++++++++++++++++++++++++++++++++
 kernel/locking/rqspinlock.c  |  1 +
 2 files changed, 52 insertions(+)

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index de95ec07e477..897a7de0cd83 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -362,6 +362,56 @@ static struct lock_torture_ops raw_spin_lock_irq_ops = {
 	.name		= "raw_spin_lock_irq"
 };
 
+#include <asm/rqspinlock.h>
+static rqspinlock_t rqspinlock;
+
+static int torture_raw_res_spin_write_lock(int tid __maybe_unused)
+{
+	raw_res_spin_lock(&rqspinlock);
+	return 0;
+}
+
+static void torture_raw_res_spin_write_unlock(int tid __maybe_unused)
+{
+	raw_res_spin_unlock(&rqspinlock);
+}
+
+static struct lock_torture_ops raw_res_spin_lock_ops = {
+	.writelock	= torture_raw_res_spin_write_lock,
+	.write_delay	= torture_spin_lock_write_delay,
+	.task_boost     = torture_rt_boost,
+	.writeunlock	= torture_raw_res_spin_write_unlock,
+	.readlock       = NULL,
+	.read_delay     = NULL,
+	.readunlock     = NULL,
+	.name		= "raw_res_spin_lock"
+};
+
+static int torture_raw_res_spin_write_lock_irq(int tid __maybe_unused)
+{
+	unsigned long flags;
+
+	raw_res_spin_lock_irqsave(&rqspinlock, flags);
+	cxt.cur_ops->flags = flags;
+	return 0;
+}
+
+static void torture_raw_res_spin_write_unlock_irq(int tid __maybe_unused)
+{
+	raw_res_spin_unlock_irqrestore(&rqspinlock, cxt.cur_ops->flags);
+}
+
+static struct lock_torture_ops raw_res_spin_lock_irq_ops = {
+	.writelock	= torture_raw_res_spin_write_lock_irq,
+	.write_delay	= torture_spin_lock_write_delay,
+	.task_boost     = torture_rt_boost,
+	.writeunlock	= torture_raw_res_spin_write_unlock_irq,
+	.readlock       = NULL,
+	.read_delay     = NULL,
+	.readunlock     = NULL,
+	.name		= "raw_res_spin_lock_irq"
+};
+
 static DEFINE_RWLOCK(torture_rwlock);
 
 static int torture_rwlock_write_lock(int tid __maybe_unused)
@@ -1168,6 +1218,7 @@ static int __init lock_torture_init(void)
 		&lock_busted_ops,
 		&spin_lock_ops, &spin_lock_irq_ops,
 		&raw_spin_lock_ops, &raw_spin_lock_irq_ops,
+		&raw_res_spin_lock_ops, &raw_res_spin_lock_irq_ops,
 		&rw_lock_ops, &rw_lock_irq_ops,
 		&mutex_lock_ops,
 		&ww_mutex_lock_ops,
diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
index 467336f6828e..9d3036f5e613 100644
--- a/kernel/locking/rqspinlock.c
+++ b/kernel/locking/rqspinlock.c
@@ -82,6 +82,7 @@ struct rqspinlock_timeout {
 #define RES_TIMEOUT_VAL	2
 
 DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks);
+EXPORT_SYMBOL_GPL(rqspinlock_held_locks);
 
 static bool is_lock_released(struct qspinlock *lock, u32 mask, struct rqspinlock_timeout *ts)
 {
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 16/22] rqspinlock: Add entry to Makefile, MAINTAINERS
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (14 preceding siblings ...)
  2025-01-07 13:59 ` [PATCH bpf-next v1 15/22] rqspinlock: Add locktorture support Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-07 13:59 ` [PATCH bpf-next v1 17/22] bpf: Convert hashtab.c to rqspinlock Kumar Kartikeya Dwivedi
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

Ensure that rqspinlock is built when qspinlock support and the BPF
subsystem are enabled. Also, add the files under the BPF MAINTAINERS
entry so that all patches changing code in them end up Cc'ing bpf@vger
and the maintainers/reviewers.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 MAINTAINERS                | 3 +++
 include/asm-generic/Kbuild | 1 +
 kernel/locking/Makefile    | 3 +++
 3 files changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index baf0eeb9a355..fde7ca94cc1d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4257,6 +4257,9 @@ F:	include/uapi/linux/filter.h
 F:	kernel/bpf/
 F:	kernel/trace/bpf_trace.c
 F:	lib/buildid.c
+F:	arch/*/include/asm/rqspinlock.h
+F:	include/asm-generic/rqspinlock.h
+F:	kernel/locking/rqspinlock.c
 F:	lib/test_bpf.c
 F:	net/bpf/
 F:	net/core/filter.c
diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild
index 1b43c3a77012..8675b7b4ad23 100644
--- a/include/asm-generic/Kbuild
+++ b/include/asm-generic/Kbuild
@@ -45,6 +45,7 @@ mandatory-y += pci.h
 mandatory-y += percpu.h
 mandatory-y += pgalloc.h
 mandatory-y += preempt.h
+mandatory-y += rqspinlock.h
 mandatory-y += runtime-const.h
 mandatory-y += rwonce.h
 mandatory-y += sections.h
diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index 0db4093d17b8..9b241490ab90 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -24,6 +24,9 @@ obj-$(CONFIG_SMP) += spinlock.o
 obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
 obj-$(CONFIG_PROVE_LOCKING) += spinlock.o
 obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o
+ifeq ($(CONFIG_BPF_SYSCALL),y)
+obj-$(CONFIG_QUEUED_SPINLOCKS) += rqspinlock.o
+endif
 obj-$(CONFIG_RT_MUTEXES) += rtmutex_api.o
 obj-$(CONFIG_PREEMPT_RT) += spinlock_rt.o ww_rt_mutex.o
 obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 17/22] bpf: Convert hashtab.c to rqspinlock
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (15 preceding siblings ...)
  2025-01-07 13:59 ` [PATCH bpf-next v1 16/22] rqspinlock: Add entry to Makefile, MAINTAINERS Kumar Kartikeya Dwivedi
@ 2025-01-07 13:59 ` Kumar Kartikeya Dwivedi
  2025-01-07 14:00 ` [PATCH bpf-next v1 18/22] bpf: Convert percpu_freelist.c " Kumar Kartikeya Dwivedi
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 13:59 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

Convert hashtab.c from raw_spinlock to rqspinlock, and drop the hashed
per-CPU counter crud from the code base, which is no longer necessary.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/hashtab.c | 102 ++++++++++++++-----------------------------
 1 file changed, 32 insertions(+), 70 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 3ec941a0ea41..6812b114b811 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -16,6 +16,7 @@
 #include "bpf_lru_list.h"
 #include "map_in_map.h"
 #include <linux/bpf_mem_alloc.h>
+#include <asm/rqspinlock.h>
 
 #define HTAB_CREATE_FLAG_MASK						\
 	(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE |	\
@@ -78,7 +79,7 @@
  */
 struct bucket {
 	struct hlist_nulls_head head;
-	raw_spinlock_t raw_lock;
+	rqspinlock_t raw_lock;
 };
 
 #define HASHTAB_MAP_LOCK_COUNT 8
@@ -104,8 +105,6 @@ struct bpf_htab {
 	u32 n_buckets;	/* number of hash buckets */
 	u32 elem_size;	/* size of each element in bytes */
 	u32 hashrnd;
-	struct lock_class_key lockdep_key;
-	int __percpu *map_locked[HASHTAB_MAP_LOCK_COUNT];
 };
 
 /* each htab element is struct htab_elem + key + value */
@@ -140,45 +139,26 @@ static void htab_init_buckets(struct bpf_htab *htab)
 
 	for (i = 0; i < htab->n_buckets; i++) {
 		INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i);
-		raw_spin_lock_init(&htab->buckets[i].raw_lock);
-		lockdep_set_class(&htab->buckets[i].raw_lock,
-					  &htab->lockdep_key);
+		raw_res_spin_lock_init(&htab->buckets[i].raw_lock);
 		cond_resched();
 	}
 }
 
-static inline int htab_lock_bucket(const struct bpf_htab *htab,
-				   struct bucket *b, u32 hash,
-				   unsigned long *pflags)
+static inline int htab_lock_bucket(struct bucket *b, unsigned long *pflags)
 {
 	unsigned long flags;
+	int ret;
 
-	hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1);
-
-	preempt_disable();
-	local_irq_save(flags);
-	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
-		__this_cpu_dec(*(htab->map_locked[hash]));
-		local_irq_restore(flags);
-		preempt_enable();
-		return -EBUSY;
-	}
-
-	raw_spin_lock(&b->raw_lock);
+	ret = raw_res_spin_lock_irqsave(&b->raw_lock, flags);
+	if (ret)
+		return ret;
 	*pflags = flags;
-
 	return 0;
 }
 
-static inline void htab_unlock_bucket(const struct bpf_htab *htab,
-				      struct bucket *b, u32 hash,
-				      unsigned long flags)
+static inline void htab_unlock_bucket(struct bucket *b, unsigned long flags)
 {
-	hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1);
-	raw_spin_unlock(&b->raw_lock);
-	__this_cpu_dec(*(htab->map_locked[hash]));
-	local_irq_restore(flags);
-	preempt_enable();
+	raw_res_spin_unlock_irqrestore(&b->raw_lock, flags);
 }
 
 static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
@@ -483,14 +463,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU);
 	bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC);
 	struct bpf_htab *htab;
-	int err, i;
+	int err;
 
 	htab = bpf_map_area_alloc(sizeof(*htab), NUMA_NO_NODE);
 	if (!htab)
 		return ERR_PTR(-ENOMEM);
 
-	lockdep_register_key(&htab->lockdep_key);
-
 	bpf_map_init_from_attr(&htab->map, attr);
 
 	if (percpu_lru) {
@@ -536,15 +514,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	if (!htab->buckets)
 		goto free_elem_count;
 
-	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++) {
-		htab->map_locked[i] = bpf_map_alloc_percpu(&htab->map,
-							   sizeof(int),
-							   sizeof(int),
-							   GFP_USER);
-		if (!htab->map_locked[i])
-			goto free_map_locked;
-	}
-
 	if (htab->map.map_flags & BPF_F_ZERO_SEED)
 		htab->hashrnd = 0;
 	else
@@ -607,15 +576,12 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 free_map_locked:
 	if (htab->use_percpu_counter)
 		percpu_counter_destroy(&htab->pcount);
-	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
-		free_percpu(htab->map_locked[i]);
 	bpf_map_area_free(htab->buckets);
 	bpf_mem_alloc_destroy(&htab->pcpu_ma);
 	bpf_mem_alloc_destroy(&htab->ma);
 free_elem_count:
 	bpf_map_free_elem_count(&htab->map);
 free_htab:
-	lockdep_unregister_key(&htab->lockdep_key);
 	bpf_map_area_free(htab);
 	return ERR_PTR(err);
 }
@@ -817,7 +783,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
 	b = __select_bucket(htab, tgt_l->hash);
 	head = &b->head;
 
-	ret = htab_lock_bucket(htab, b, tgt_l->hash, &flags);
+	ret = htab_lock_bucket(b, &flags);
 	if (ret)
 		return false;
 
@@ -829,7 +795,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
 			break;
 		}
 
-	htab_unlock_bucket(htab, b, tgt_l->hash, flags);
+	htab_unlock_bucket(b, flags);
 
 	return l == tgt_l;
 }
@@ -1148,7 +1114,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 		 */
 	}
 
-	ret = htab_lock_bucket(htab, b, hash, &flags);
+	ret = htab_lock_bucket(b, &flags);
 	if (ret)
 		return ret;
 
@@ -1199,7 +1165,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 			check_and_free_fields(htab, l_old);
 		}
 	}
-	htab_unlock_bucket(htab, b, hash, flags);
+	htab_unlock_bucket(b, flags);
 	if (l_old) {
 		if (old_map_ptr)
 			map->ops->map_fd_put_ptr(map, old_map_ptr, true);
@@ -1208,7 +1174,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 	}
 	return 0;
 err:
-	htab_unlock_bucket(htab, b, hash, flags);
+	htab_unlock_bucket(b, flags);
 	return ret;
 }
 
@@ -1255,7 +1221,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
 	copy_map_value(&htab->map,
 		       l_new->key + round_up(map->key_size, 8), value);
 
-	ret = htab_lock_bucket(htab, b, hash, &flags);
+	ret = htab_lock_bucket(b, &flags);
 	if (ret)
 		goto err_lock_bucket;
 
@@ -1276,7 +1242,7 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
 	ret = 0;
 
 err:
-	htab_unlock_bucket(htab, b, hash, flags);
+	htab_unlock_bucket(b, flags);
 
 err_lock_bucket:
 	if (ret)
@@ -1313,7 +1279,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
 	b = __select_bucket(htab, hash);
 	head = &b->head;
 
-	ret = htab_lock_bucket(htab, b, hash, &flags);
+	ret = htab_lock_bucket(b, &flags);
 	if (ret)
 		return ret;
 
@@ -1338,7 +1304,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
 	}
 	ret = 0;
 err:
-	htab_unlock_bucket(htab, b, hash, flags);
+	htab_unlock_bucket(b, flags);
 	return ret;
 }
 
@@ -1379,7 +1345,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 			return -ENOMEM;
 	}
 
-	ret = htab_lock_bucket(htab, b, hash, &flags);
+	ret = htab_lock_bucket(b, &flags);
 	if (ret)
 		goto err_lock_bucket;
 
@@ -1403,7 +1369,7 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 	}
 	ret = 0;
 err:
-	htab_unlock_bucket(htab, b, hash, flags);
+	htab_unlock_bucket(b, flags);
 err_lock_bucket:
 	if (l_new) {
 		bpf_map_dec_elem_count(&htab->map);
@@ -1445,7 +1411,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key)
 	b = __select_bucket(htab, hash);
 	head = &b->head;
 
-	ret = htab_lock_bucket(htab, b, hash, &flags);
+	ret = htab_lock_bucket(b, &flags);
 	if (ret)
 		return ret;
 
@@ -1455,7 +1421,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key)
 	else
 		ret = -ENOENT;
 
-	htab_unlock_bucket(htab, b, hash, flags);
+	htab_unlock_bucket(b, flags);
 
 	if (l)
 		free_htab_elem(htab, l);
@@ -1481,7 +1447,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key)
 	b = __select_bucket(htab, hash);
 	head = &b->head;
 
-	ret = htab_lock_bucket(htab, b, hash, &flags);
+	ret = htab_lock_bucket(b, &flags);
 	if (ret)
 		return ret;
 
@@ -1492,7 +1458,7 @@ static long htab_lru_map_delete_elem(struct bpf_map *map, void *key)
 	else
 		ret = -ENOENT;
 
-	htab_unlock_bucket(htab, b, hash, flags);
+	htab_unlock_bucket(b, flags);
 	if (l)
 		htab_lru_push_free(htab, l);
 	return ret;
@@ -1561,7 +1527,6 @@ static void htab_map_free_timers_and_wq(struct bpf_map *map)
 static void htab_map_free(struct bpf_map *map)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
-	int i;
 
 	/* bpf_free_used_maps() or close(map_fd) will trigger this map_free callback.
 	 * bpf_free_used_maps() is called after bpf prog is no longer executing.
@@ -1586,9 +1551,6 @@ static void htab_map_free(struct bpf_map *map)
 	bpf_mem_alloc_destroy(&htab->ma);
 	if (htab->use_percpu_counter)
 		percpu_counter_destroy(&htab->pcount);
-	for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
-		free_percpu(htab->map_locked[i]);
-	lockdep_unregister_key(&htab->lockdep_key);
 	bpf_map_area_free(htab);
 }
 
@@ -1631,7 +1593,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
 	b = __select_bucket(htab, hash);
 	head = &b->head;
 
-	ret = htab_lock_bucket(htab, b, hash, &bflags);
+	ret = htab_lock_bucket(b, &bflags);
 	if (ret)
 		return ret;
 
@@ -1669,7 +1631,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
 			free_htab_elem(htab, l);
 	}
 
-	htab_unlock_bucket(htab, b, hash, bflags);
+	htab_unlock_bucket(b, bflags);
 
 	if (is_lru_map && l)
 		htab_lru_push_free(htab, l);
@@ -1787,7 +1749,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	head = &b->head;
 	/* do not grab the lock unless need it (bucket_cnt > 0). */
 	if (locked) {
-		ret = htab_lock_bucket(htab, b, batch, &flags);
+		ret = htab_lock_bucket(b, &flags);
 		if (ret) {
 			rcu_read_unlock();
 			bpf_enable_instrumentation();
@@ -1810,7 +1772,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 		/* Note that since bucket_cnt > 0 here, it is implicit
 		 * that the locked was grabbed, so release it.
 		 */
-		htab_unlock_bucket(htab, b, batch, flags);
+		htab_unlock_bucket(b, flags);
 		rcu_read_unlock();
 		bpf_enable_instrumentation();
 		goto after_loop;
@@ -1821,7 +1783,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 		/* Note that since bucket_cnt > 0 here, it is implicit
 		 * that the locked was grabbed, so release it.
 		 */
-		htab_unlock_bucket(htab, b, batch, flags);
+		htab_unlock_bucket(b, flags);
 		rcu_read_unlock();
 		bpf_enable_instrumentation();
 		kvfree(keys);
@@ -1884,7 +1846,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 		dst_val += value_size;
 	}
 
-	htab_unlock_bucket(htab, b, batch, flags);
+	htab_unlock_bucket(b, flags);
 	locked = false;
 
 	while (node_to_free) {
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 18/22] bpf: Convert percpu_freelist.c to rqspinlock
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (16 preceding siblings ...)
  2025-01-07 13:59 ` [PATCH bpf-next v1 17/22] bpf: Convert hashtab.c to rqspinlock Kumar Kartikeya Dwivedi
@ 2025-01-07 14:00 ` Kumar Kartikeya Dwivedi
  2025-01-07 14:00 ` [PATCH bpf-next v1 19/22] bpf: Convert lpm_trie.c " Kumar Kartikeya Dwivedi
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 14:00 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

Convert the percpu_freelist.c code to use rqspinlock, and remove the
extralist fallback and trylock-based acquisitions to avoid deadlocks.

The key thing to note is the retained while (true) loop, which searches
through other CPUs when pushing a node fails due to a locking error. This
preserves the behavior of the old code, which kept trying until it could
successfully push the node back into the freelist of some CPU.

Technically, we should start this loop's iteration from
raw_smp_processor_id() + 1, but to avoid handling the wrap-around at the
edge of nr_cpus, we instead skip the current CPU inside the loop body.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/percpu_freelist.c | 113 ++++++++---------------------------
 kernel/bpf/percpu_freelist.h |   4 +-
 2 files changed, 27 insertions(+), 90 deletions(-)

diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c
index 034cf87b54e9..632762b57299 100644
--- a/kernel/bpf/percpu_freelist.c
+++ b/kernel/bpf/percpu_freelist.c
@@ -14,11 +14,9 @@ int pcpu_freelist_init(struct pcpu_freelist *s)
 	for_each_possible_cpu(cpu) {
 		struct pcpu_freelist_head *head = per_cpu_ptr(s->freelist, cpu);
 
-		raw_spin_lock_init(&head->lock);
+		raw_res_spin_lock_init(&head->lock);
 		head->first = NULL;
 	}
-	raw_spin_lock_init(&s->extralist.lock);
-	s->extralist.first = NULL;
 	return 0;
 }
 
@@ -34,58 +32,39 @@ static inline void pcpu_freelist_push_node(struct pcpu_freelist_head *head,
 	WRITE_ONCE(head->first, node);
 }
 
-static inline void ___pcpu_freelist_push(struct pcpu_freelist_head *head,
+static inline bool ___pcpu_freelist_push(struct pcpu_freelist_head *head,
 					 struct pcpu_freelist_node *node)
 {
-	raw_spin_lock(&head->lock);
-	pcpu_freelist_push_node(head, node);
-	raw_spin_unlock(&head->lock);
-}
-
-static inline bool pcpu_freelist_try_push_extra(struct pcpu_freelist *s,
-						struct pcpu_freelist_node *node)
-{
-	if (!raw_spin_trylock(&s->extralist.lock))
+	if (raw_res_spin_lock(&head->lock))
 		return false;
-
-	pcpu_freelist_push_node(&s->extralist, node);
-	raw_spin_unlock(&s->extralist.lock);
+	pcpu_freelist_push_node(head, node);
+	raw_res_spin_unlock(&head->lock);
 	return true;
 }
 
-static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s,
-					     struct pcpu_freelist_node *node)
+void __pcpu_freelist_push(struct pcpu_freelist *s,
+			struct pcpu_freelist_node *node)
 {
-	int cpu, orig_cpu;
+	struct pcpu_freelist_head *head;
+	int cpu;
 
-	orig_cpu = raw_smp_processor_id();
-	while (1) {
-		for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) {
-			struct pcpu_freelist_head *head;
+	if (___pcpu_freelist_push(this_cpu_ptr(s->freelist), node))
+		return;
 
+	while (true) {
+		for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) {
+			if (cpu == raw_smp_processor_id())
+				continue;
 			head = per_cpu_ptr(s->freelist, cpu);
-			if (raw_spin_trylock(&head->lock)) {
-				pcpu_freelist_push_node(head, node);
-				raw_spin_unlock(&head->lock);
-				return;
-			}
-		}
-
-		/* cannot lock any per cpu lock, try extralist */
-		if (pcpu_freelist_try_push_extra(s, node))
+			if (raw_res_spin_lock(&head->lock))
+				continue;
+			pcpu_freelist_push_node(head, node);
+			raw_res_spin_unlock(&head->lock);
 			return;
+		}
 	}
 }
 
-void __pcpu_freelist_push(struct pcpu_freelist *s,
-			struct pcpu_freelist_node *node)
-{
-	if (in_nmi())
-		___pcpu_freelist_push_nmi(s, node);
-	else
-		___pcpu_freelist_push(this_cpu_ptr(s->freelist), node);
-}
-
 void pcpu_freelist_push(struct pcpu_freelist *s,
 			struct pcpu_freelist_node *node)
 {
@@ -120,71 +99,29 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
 
 static struct pcpu_freelist_node *___pcpu_freelist_pop(struct pcpu_freelist *s)
 {
+	struct pcpu_freelist_node *node = NULL;
 	struct pcpu_freelist_head *head;
-	struct pcpu_freelist_node *node;
 	int cpu;
 
 	for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) {
 		head = per_cpu_ptr(s->freelist, cpu);
 		if (!READ_ONCE(head->first))
 			continue;
-		raw_spin_lock(&head->lock);
+		if (raw_res_spin_lock(&head->lock))
+			continue;
 		node = head->first;
 		if (node) {
 			WRITE_ONCE(head->first, node->next);
-			raw_spin_unlock(&head->lock);
+			raw_res_spin_unlock(&head->lock);
 			return node;
 		}
-		raw_spin_unlock(&head->lock);
+		raw_res_spin_unlock(&head->lock);
 	}
-
-	/* per cpu lists are all empty, try extralist */
-	if (!READ_ONCE(s->extralist.first))
-		return NULL;
-	raw_spin_lock(&s->extralist.lock);
-	node = s->extralist.first;
-	if (node)
-		WRITE_ONCE(s->extralist.first, node->next);
-	raw_spin_unlock(&s->extralist.lock);
-	return node;
-}
-
-static struct pcpu_freelist_node *
-___pcpu_freelist_pop_nmi(struct pcpu_freelist *s)
-{
-	struct pcpu_freelist_head *head;
-	struct pcpu_freelist_node *node;
-	int cpu;
-
-	for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) {
-		head = per_cpu_ptr(s->freelist, cpu);
-		if (!READ_ONCE(head->first))
-			continue;
-		if (raw_spin_trylock(&head->lock)) {
-			node = head->first;
-			if (node) {
-				WRITE_ONCE(head->first, node->next);
-				raw_spin_unlock(&head->lock);
-				return node;
-			}
-			raw_spin_unlock(&head->lock);
-		}
-	}
-
-	/* cannot pop from per cpu lists, try extralist */
-	if (!READ_ONCE(s->extralist.first) || !raw_spin_trylock(&s->extralist.lock))
-		return NULL;
-	node = s->extralist.first;
-	if (node)
-		WRITE_ONCE(s->extralist.first, node->next);
-	raw_spin_unlock(&s->extralist.lock);
 	return node;
 }
 
 struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *s)
 {
-	if (in_nmi())
-		return ___pcpu_freelist_pop_nmi(s);
 	return ___pcpu_freelist_pop(s);
 }
 
diff --git a/kernel/bpf/percpu_freelist.h b/kernel/bpf/percpu_freelist.h
index 3c76553cfe57..914798b74967 100644
--- a/kernel/bpf/percpu_freelist.h
+++ b/kernel/bpf/percpu_freelist.h
@@ -5,15 +5,15 @@
 #define __PERCPU_FREELIST_H__
 #include <linux/spinlock.h>
 #include <linux/percpu.h>
+#include <asm/rqspinlock.h>
 
 struct pcpu_freelist_head {
 	struct pcpu_freelist_node *first;
-	raw_spinlock_t lock;
+	rqspinlock_t lock;
 };
 
 struct pcpu_freelist {
 	struct pcpu_freelist_head __percpu *freelist;
-	struct pcpu_freelist_head extralist;
 };
 
 struct pcpu_freelist_node {
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 19/22] bpf: Convert lpm_trie.c to rqspinlock
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (17 preceding siblings ...)
  2025-01-07 14:00 ` [PATCH bpf-next v1 18/22] bpf: Convert percpu_freelist.c " Kumar Kartikeya Dwivedi
@ 2025-01-07 14:00 ` Kumar Kartikeya Dwivedi
  2025-01-07 14:00 ` [PATCH bpf-next v1 20/22] bpf: Introduce rqspinlock kfuncs Kumar Kartikeya Dwivedi
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 14:00 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

Convert all LPM trie usage of raw_spinlock to rqspinlock.

Note that rcu_dereference_protected in trie_update_elem and
trie_delete_elem is switched over to plain rcu_dereference: the RCU read
lock should be held from the BPF program side or the eBPF syscall path,
and trie->lock is acquired just before the dereference. The commit
history does not make clear why the protected variant was used, but the
above reasoning makes sense, so switch over.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/lpm_trie.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index f8bc1e096182..a92d1eeafb33 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -15,6 +15,7 @@
 #include <net/ipv6.h>
 #include <uapi/linux/btf.h>
 #include <linux/btf_ids.h>
+#include <asm/rqspinlock.h>
 #include <linux/bpf_mem_alloc.h>
 
 /* Intermediate node */
@@ -36,7 +37,7 @@ struct lpm_trie {
 	size_t				n_entries;
 	size_t				max_prefixlen;
 	size_t				data_size;
-	raw_spinlock_t			lock;
+	rqspinlock_t			lock;
 };
 
 /* This trie implements a longest prefix match algorithm that can be used to
@@ -349,7 +350,9 @@ static long trie_update_elem(struct bpf_map *map,
 	if (!new_node)
 		return -ENOMEM;
 
-	raw_spin_lock_irqsave(&trie->lock, irq_flags);
+	ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags);
+	if (ret)
+		goto out_free;
 
 	new_node->prefixlen = key->prefixlen;
 	RCU_INIT_POINTER(new_node->child[0], NULL);
@@ -363,8 +366,7 @@ static long trie_update_elem(struct bpf_map *map,
 	 */
 	slot = &trie->root;
 
-	while ((node = rcu_dereference_protected(*slot,
-					lockdep_is_held(&trie->lock)))) {
+	while ((node = rcu_dereference(*slot))) {
 		matchlen = longest_prefix_match(trie, node, key);
 
 		if (node->prefixlen != matchlen ||
@@ -450,8 +452,8 @@ static long trie_update_elem(struct bpf_map *map,
 	rcu_assign_pointer(*slot, im_node);
 
 out:
-	raw_spin_unlock_irqrestore(&trie->lock, irq_flags);
-
+	raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags);
+out_free:
 	migrate_disable();
 	if (ret)
 		bpf_mem_cache_free(&trie->ma, new_node);
@@ -477,7 +479,9 @@ static long trie_delete_elem(struct bpf_map *map, void *_key)
 	if (key->prefixlen > trie->max_prefixlen)
 		return -EINVAL;
 
-	raw_spin_lock_irqsave(&trie->lock, irq_flags);
+	ret = raw_res_spin_lock_irqsave(&trie->lock, irq_flags);
+	if (ret)
+		return ret;
 
 	/* Walk the tree looking for an exact key/length match and keeping
 	 * track of the path we traverse.  We will need to know the node
@@ -488,8 +492,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key)
 	trim = &trie->root;
 	trim2 = trim;
 	parent = NULL;
-	while ((node = rcu_dereference_protected(
-		       *trim, lockdep_is_held(&trie->lock)))) {
+	while ((node = rcu_dereference(*trim))) {
 		matchlen = longest_prefix_match(trie, node, key);
 
 		if (node->prefixlen != matchlen ||
@@ -553,7 +556,7 @@ static long trie_delete_elem(struct bpf_map *map, void *_key)
 	free_node = node;
 
 out:
-	raw_spin_unlock_irqrestore(&trie->lock, irq_flags);
+	raw_res_spin_unlock_irqrestore(&trie->lock, irq_flags);
 
 	migrate_disable();
 	bpf_mem_cache_free_rcu(&trie->ma, free_parent);
@@ -604,7 +607,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
 			  offsetof(struct bpf_lpm_trie_key_u8, data);
 	trie->max_prefixlen = trie->data_size * 8;
 
-	raw_spin_lock_init(&trie->lock);
+	raw_res_spin_lock_init(&trie->lock);
 
 	/* Allocate intermediate and leaf nodes from the same allocator */
 	leaf_size = sizeof(struct lpm_trie_node) + trie->data_size +
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 20/22] bpf: Introduce rqspinlock kfuncs
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (18 preceding siblings ...)
  2025-01-07 14:00 ` [PATCH bpf-next v1 19/22] bpf: Convert lpm_trie.c " Kumar Kartikeya Dwivedi
@ 2025-01-07 14:00 ` Kumar Kartikeya Dwivedi
  2025-01-08 10:23   ` kernel test robot
                     ` (2 more replies)
  2025-01-07 14:00 ` [PATCH bpf-next v1 21/22] bpf: Implement verifier support for rqspinlock Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  22 siblings, 3 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 14:00 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

Introduce four new kfuncs: bpf_res_spin_lock and bpf_res_spin_unlock,
and their irqsave/irqrestore variants, which wrap the rqspinlock APIs.
bpf_res_spin_lock returns a conditional result depending on whether the
lock was acquired (NULL is returned when lock acquisition succeeds,
non-NULL upon failure). Upon failure, the memory pointed to by the
returned pointer can be dereferenced after the NULL check to obtain the
error code.

Instead of using the old bpf_spin_lock type, introduce a new type with
the same layout, and the same alignment, but a different name to avoid
type confusion.

Preemption is disabled upon successful lock acquisition; however, IRQs
are not. Special kfuncs can be introduced later to allow disabling IRQs
when taking a spin lock. Resilient locks are safe against AA deadlocks,
so leaving IRQs enabled currently does not allow violation of kernel
safety.

__irq_flag annotation is used to accept IRQ flags for the IRQ-variants,
with the same semantics as existing bpf_local_irq_{save, restore}.

These kfuncs will require additional verifier-side support in subsequent
commits, to allow programs to hold multiple locks at the same time.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/asm-generic/rqspinlock.h |  4 ++
 include/linux/bpf.h              |  1 +
 kernel/locking/rqspinlock.c      | 78 ++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+)

diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h
index 53be8426373c..22f8770f033b 100644
--- a/include/asm-generic/rqspinlock.h
+++ b/include/asm-generic/rqspinlock.h
@@ -14,6 +14,10 @@
 #include <linux/percpu.h>
 #include <asm/qspinlock.h>
 
+struct bpf_res_spin_lock {
+	u32 val;
+};
+
 struct qspinlock;
 typedef struct qspinlock rqspinlock_t;
 
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index feda0ce90f5a..f93a4f40aaaf 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -30,6 +30,7 @@
 #include <linux/static_call.h>
 #include <linux/memcontrol.h>
 #include <linux/cfi.h>
+#include <asm/rqspinlock.h>
 
 struct bpf_verifier_env;
 struct bpf_verifier_log;
diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
index 9d3036f5e613..2c6293d1298c 100644
--- a/kernel/locking/rqspinlock.c
+++ b/kernel/locking/rqspinlock.c
@@ -15,6 +15,8 @@
 
 #include <linux/smp.h>
 #include <linux/bug.h>
+#include <linux/bpf.h>
+#include <linux/err.h>
 #include <linux/cpumask.h>
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
@@ -644,3 +646,79 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
 	return ret;
 }
 EXPORT_SYMBOL(resilient_queued_spin_lock_slowpath);
+
+__bpf_kfunc_start_defs();
+
+#define REPORT_STR(ret) ({ ret == -ETIMEDOUT ? "Timeout detected" : "AA or ABBA deadlock detected"; })
+
+__bpf_kfunc int bpf_res_spin_lock(struct bpf_res_spin_lock *lock)
+{
+	int ret;
+
+	BUILD_BUG_ON(sizeof(struct qspinlock) != sizeof(struct bpf_res_spin_lock));
+	BUILD_BUG_ON(__alignof__(struct qspinlock) != __alignof__(struct bpf_res_spin_lock));
+
+	preempt_disable();
+	ret = res_spin_lock((struct qspinlock *)lock);
+	if (unlikely(ret)) {
+		preempt_enable();
+		rqspinlock_report_violation(REPORT_STR(ret), lock);
+		return ret;
+	}
+	return 0;
+}
+
+__bpf_kfunc void bpf_res_spin_unlock(struct bpf_res_spin_lock *lock)
+{
+	res_spin_unlock((struct qspinlock *)lock);
+	preempt_enable();
+}
+
+__bpf_kfunc int bpf_res_spin_lock_irqsave(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag)
+{
+	u64 *ptr = (u64 *)flags__irq_flag;
+	unsigned long flags;
+	int ret;
+
+	preempt_disable();
+	local_irq_save(flags);
+	ret = res_spin_lock((struct qspinlock *)lock);
+	if (unlikely(ret)) {
+		local_irq_restore(flags);
+		preempt_enable();
+		rqspinlock_report_violation(REPORT_STR(ret), lock);
+		return ret;
+	}
+	*ptr = flags;
+	return 0;
+}
+
+__bpf_kfunc void bpf_res_spin_unlock_irqrestore(struct bpf_res_spin_lock *lock, unsigned long *flags__irq_flag)
+{
+	u64 *ptr = (u64 *)flags__irq_flag;
+	unsigned long flags = *ptr;
+
+	res_spin_unlock((struct qspinlock *)lock);
+	local_irq_restore(flags);
+	preempt_enable();
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(rqspinlock_kfunc_ids)
+BTF_ID_FLAGS(func, bpf_res_spin_lock, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_res_spin_unlock)
+BTF_ID_FLAGS(func, bpf_res_spin_lock_irqsave, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_res_spin_unlock_irqrestore)
+BTF_KFUNCS_END(rqspinlock_kfunc_ids)
+
+static const struct btf_kfunc_id_set rqspinlock_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set = &rqspinlock_kfunc_ids,
+};
+
+static __init int rqspinlock_register_kfuncs(void)
+{
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &rqspinlock_kfunc_set);
+}
+late_initcall(rqspinlock_register_kfuncs);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 21/22] bpf: Implement verifier support for rqspinlock
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (19 preceding siblings ...)
  2025-01-07 14:00 ` [PATCH bpf-next v1 20/22] bpf: Introduce rqspinlock kfuncs Kumar Kartikeya Dwivedi
@ 2025-01-07 14:00 ` Kumar Kartikeya Dwivedi
  2025-01-07 14:00 ` [PATCH bpf-next v1 22/22] selftests/bpf: Add tests " Kumar Kartikeya Dwivedi
  2025-01-07 23:54 ` [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Linus Torvalds
  22 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 14:00 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

Introduce verifier-side support for rqspinlock kfuncs. The first step is
allowing bpf_res_spin_lock type to be defined in map values and
allocated objects, so BTF-side is updated with a new BPF_RES_SPIN_LOCK
field to recognize and validate.

An object cannot have both bpf_spin_lock and bpf_res_spin_lock; only one
of them (and at most one per object, as before) may be present. The
bpf_res_spin_lock can also be used to protect objects that require lock
protection for their kfuncs, like BPF rbtree and linked list.

The verifier plumbing to simulate the success and failure cases when
calling the kfuncs is done by pushing a new verifier state onto the
verifier state stack, which will verify the failure case upon calling
the kfunc. The path where success is indicated creates all lock
reference state and IRQ state (if necessary, for the irqsave variants).
In the failure case, all state creation is skipped while verifying the
kfunc. When marking the return value, the success case is marked as 0
and the failure case as [-MAX_ERRNO, -1]. Then, whenever the program
checks the return value with 'if (ret)' or 'if (ret < 0)', the verifier
never traverses such branches for the success case, and is aware that
the lock is not held in those branches.

We push the kfunc state in do_check and then call check_kfunc_call
separately for the pushed state and the current state, operating on the
current state in the success case, and skipping lock and IRQ state
creation in the failure case. The failure state is indicated using the
PROCESS_LOCK_FAIL flag.

We introduce a kfunc_class state to avoid mixing lock irqrestore kfuncs
with IRQ state created by bpf_local_irq_save.

With all this infrastructure, these kfuncs become usable in programs
while satisfying all safety properties required by the kernel.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h          |   9 ++
 include/linux/bpf_verifier.h |  17 ++-
 kernel/bpf/btf.c             |  26 +++-
 kernel/bpf/syscall.c         |   6 +-
 kernel/bpf/verifier.c        | 233 ++++++++++++++++++++++++++++-------
 5 files changed, 238 insertions(+), 53 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f93a4f40aaaf..fd05c13590e0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -205,6 +205,7 @@ enum btf_field_type {
 	BPF_REFCOUNT   = (1 << 9),
 	BPF_WORKQUEUE  = (1 << 10),
 	BPF_UPTR       = (1 << 11),
+	BPF_RES_SPIN_LOCK = (1 << 12),
 };
 
 typedef void (*btf_dtor_kfunc_t)(void *);
@@ -240,6 +241,7 @@ struct btf_record {
 	u32 cnt;
 	u32 field_mask;
 	int spin_lock_off;
+	int res_spin_lock_off;
 	int timer_off;
 	int wq_off;
 	int refcount_off;
@@ -315,6 +317,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
 	switch (type) {
 	case BPF_SPIN_LOCK:
 		return "bpf_spin_lock";
+	case BPF_RES_SPIN_LOCK:
+		return "bpf_res_spin_lock";
 	case BPF_TIMER:
 		return "bpf_timer";
 	case BPF_WORKQUEUE:
@@ -347,6 +351,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 	switch (type) {
 	case BPF_SPIN_LOCK:
 		return sizeof(struct bpf_spin_lock);
+	case BPF_RES_SPIN_LOCK:
+		return sizeof(struct bpf_res_spin_lock);
 	case BPF_TIMER:
 		return sizeof(struct bpf_timer);
 	case BPF_WORKQUEUE:
@@ -377,6 +383,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 	switch (type) {
 	case BPF_SPIN_LOCK:
 		return __alignof__(struct bpf_spin_lock);
+	case BPF_RES_SPIN_LOCK:
+		return __alignof__(struct bpf_res_spin_lock);
 	case BPF_TIMER:
 		return __alignof__(struct bpf_timer);
 	case BPF_WORKQUEUE:
@@ -420,6 +428,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
 	case BPF_RB_ROOT:
 		/* RB_ROOT_CACHED 0-inits, no need to do anything after memset */
 	case BPF_SPIN_LOCK:
+	case BPF_RES_SPIN_LOCK:
 	case BPF_TIMER:
 	case BPF_WORKQUEUE:
 	case BPF_KPTR_UNREF:
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 32c23f2a3086..ed444e44f524 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -115,6 +115,15 @@ struct bpf_reg_state {
 			int depth:30;
 		} iter;
 
+		/* For irq stack slots */
+		struct {
+			enum {
+				IRQ_KFUNC_IGNORE,
+				IRQ_NATIVE_KFUNC,
+				IRQ_LOCK_KFUNC,
+			} kfunc_class;
+		} irq;
+
 		/* Max size from any of the above. */
 		struct {
 			unsigned long raw1;
@@ -255,9 +264,11 @@ struct bpf_reference_state {
 	 * default to pointer reference on zero initialization of a state.
 	 */
 	enum ref_state_type {
-		REF_TYPE_PTR	= 1,
-		REF_TYPE_IRQ	= 2,
-		REF_TYPE_LOCK	= 3,
+		REF_TYPE_PTR		= (1 << 1),
+		REF_TYPE_IRQ		= (1 << 2),
+		REF_TYPE_LOCK		= (1 << 3),
+		REF_TYPE_RES_LOCK 	= (1 << 4),
+		REF_TYPE_RES_LOCK_IRQ	= (1 << 5),
 	} type;
 	/* Track each reference created with a unique id, even if the same
 	 * instruction creates the reference multiple times (eg, via CALL).
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 8396ce1d0fba..99c9fdbdd31c 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3477,6 +3477,15 @@ static int btf_get_field_type(const struct btf *btf, const struct btf_type *var_
 			goto end;
 		}
 	}
+	if (field_mask & BPF_RES_SPIN_LOCK) {
+		if (!strcmp(name, "bpf_res_spin_lock")) {
+			if (*seen_mask & BPF_RES_SPIN_LOCK)
+				return -E2BIG;
+			*seen_mask |= BPF_RES_SPIN_LOCK;
+			type = BPF_RES_SPIN_LOCK;
+			goto end;
+		}
+	}
 	if (field_mask & BPF_TIMER) {
 		if (!strcmp(name, "bpf_timer")) {
 			if (*seen_mask & BPF_TIMER)
@@ -3655,6 +3664,7 @@ static int btf_find_field_one(const struct btf *btf,
 
 	switch (field_type) {
 	case BPF_SPIN_LOCK:
+	case BPF_RES_SPIN_LOCK:
 	case BPF_TIMER:
 	case BPF_WORKQUEUE:
 	case BPF_LIST_NODE:
@@ -3948,6 +3958,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 		return ERR_PTR(-ENOMEM);
 
 	rec->spin_lock_off = -EINVAL;
+	rec->res_spin_lock_off = -EINVAL;
 	rec->timer_off = -EINVAL;
 	rec->wq_off = -EINVAL;
 	rec->refcount_off = -EINVAL;
@@ -3975,6 +3986,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 			/* Cache offset for faster lookup at runtime */
 			rec->spin_lock_off = rec->fields[i].offset;
 			break;
+		case BPF_RES_SPIN_LOCK:
+			WARN_ON_ONCE(rec->spin_lock_off >= 0);
+			/* Cache offset for faster lookup at runtime */
+			rec->res_spin_lock_off = rec->fields[i].offset;
+			break;
 		case BPF_TIMER:
 			WARN_ON_ONCE(rec->timer_off >= 0);
 			/* Cache offset for faster lookup at runtime */
@@ -4018,9 +4034,15 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 		rec->cnt++;
 	}
 
+	if (rec->spin_lock_off >= 0 && rec->res_spin_lock_off >= 0) {
+		ret = -EINVAL;
+		goto end;
+	}
+
 	/* bpf_{list_head, rb_node} require bpf_spin_lock */
 	if ((btf_record_has_field(rec, BPF_LIST_HEAD) ||
-	     btf_record_has_field(rec, BPF_RB_ROOT)) && rec->spin_lock_off < 0) {
+	     btf_record_has_field(rec, BPF_RB_ROOT)) &&
+		 (rec->spin_lock_off < 0 && rec->res_spin_lock_off < 0)) {
 		ret = -EINVAL;
 		goto end;
 	}
@@ -5638,7 +5660,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
 
 		type = &tab->types[tab->cnt];
 		type->btf_id = i;
-		record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE |
+		record = btf_parse_fields(btf, t, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_LIST_HEAD | BPF_LIST_NODE |
 						  BPF_RB_ROOT | BPF_RB_NODE | BPF_REFCOUNT |
 						  BPF_KPTR, t->size);
 		/* The record cannot be unset, treat it as an error if so */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4e88797fdbeb..9701212aa2ed 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -648,6 +648,7 @@ void btf_record_free(struct btf_record *rec)
 		case BPF_RB_ROOT:
 		case BPF_RB_NODE:
 		case BPF_SPIN_LOCK:
+		case BPF_RES_SPIN_LOCK:
 		case BPF_TIMER:
 		case BPF_REFCOUNT:
 		case BPF_WORKQUEUE:
@@ -700,6 +701,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
 		case BPF_RB_ROOT:
 		case BPF_RB_NODE:
 		case BPF_SPIN_LOCK:
+		case BPF_RES_SPIN_LOCK:
 		case BPF_TIMER:
 		case BPF_REFCOUNT:
 		case BPF_WORKQUEUE:
@@ -777,6 +779,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 
 		switch (fields[i].type) {
 		case BPF_SPIN_LOCK:
+		case BPF_RES_SPIN_LOCK:
 			break;
 		case BPF_TIMER:
 			bpf_timer_cancel_and_free(field_ptr);
@@ -1199,7 +1202,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
 		return -EINVAL;
 
 	map->record = btf_parse_fields(btf, value_type,
-				       BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
+				       BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
 				       BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE | BPF_UPTR,
 				       map->value_size);
 	if (!IS_ERR_OR_NULL(map->record)) {
@@ -1218,6 +1221,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
 			case 0:
 				continue;
 			case BPF_SPIN_LOCK:
+			case BPF_RES_SPIN_LOCK:
 				if (map->map_type != BPF_MAP_TYPE_HASH &&
 				    map->map_type != BPF_MAP_TYPE_ARRAY &&
 				    map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b8ca227c78af..bf230599d6f7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -455,7 +455,7 @@ static bool subprog_is_exc_cb(struct bpf_verifier_env *env, int subprog)
 
 static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg)
 {
-	return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK);
+	return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK);
 }
 
 static bool type_is_rdonly_mem(u32 type)
@@ -1147,7 +1147,8 @@ static int release_irq_state(struct bpf_verifier_state *state, int id);
 
 static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env,
 				     struct bpf_kfunc_call_arg_meta *meta,
-				     struct bpf_reg_state *reg, int insn_idx)
+				     struct bpf_reg_state *reg, int insn_idx,
+				     int kfunc_class)
 {
 	struct bpf_func_state *state = func(env, reg);
 	struct bpf_stack_state *slot;
@@ -1169,6 +1170,7 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env,
 	st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
 	st->live |= REG_LIVE_WRITTEN;
 	st->ref_obj_id = id;
+	st->irq.kfunc_class = kfunc_class;
 
 	for (i = 0; i < BPF_REG_SIZE; i++)
 		slot->slot_type[i] = STACK_IRQ_FLAG;
@@ -1177,7 +1179,8 @@ static int mark_stack_slot_irq_flag(struct bpf_verifier_env *env,
 	return 0;
 }
 
-static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
+static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
+				      int kfunc_class)
 {
 	struct bpf_func_state *state = func(env, reg);
 	struct bpf_stack_state *slot;
@@ -1191,6 +1194,15 @@ static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_r
 	slot = &state->stack[spi];
 	st = &slot->spilled_ptr;
 
+	if (kfunc_class != IRQ_KFUNC_IGNORE && st->irq.kfunc_class != kfunc_class) {
+		const char *flag_kfunc = st->irq.kfunc_class == IRQ_NATIVE_KFUNC ? "native" : "lock";
+		const char *used_kfunc = kfunc_class == IRQ_NATIVE_KFUNC ? "native" : "lock";
+
+		verbose(env, "irq flag acquired by %s kfuncs cannot be restored with %s kfuncs\n",
+			flag_kfunc, used_kfunc);
+		return -EINVAL;
+	}
+
 	err = release_irq_state(env->cur_state, st->ref_obj_id);
 	WARN_ON_ONCE(err && err != -EACCES);
 	if (err) {
@@ -1588,7 +1600,7 @@ static struct bpf_reference_state *find_lock_state(struct bpf_verifier_state *st
 	for (i = 0; i < state->acquired_refs; i++) {
 		struct bpf_reference_state *s = &state->refs[i];
 
-		if (s->type != type)
+		if (!(s->type & type))
 			continue;
 
 		if (s->id == id && s->ptr == ptr)
@@ -7995,6 +8007,13 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg
 	return err;
 }
 
+enum {
+	PROCESS_SPIN_LOCK = (1 << 0),
+	PROCESS_RES_LOCK  = (1 << 1),
+	PROCESS_LOCK_IRQ  = (1 << 2),
+	PROCESS_LOCK_FAIL = (1 << 3),
+};
+
 /* Implementation details:
  * bpf_map_lookup returns PTR_TO_MAP_VALUE_OR_NULL.
  * bpf_obj_new returns PTR_TO_BTF_ID | MEM_ALLOC | PTR_MAYBE_NULL.
@@ -8017,30 +8036,38 @@ static int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg
  * env->cur_state->active_locks remembers which map value element or allocated
  * object got locked and clears it after bpf_spin_unlock.
  */
-static int process_spin_lock(struct bpf_verifier_env *env, int regno,
-			     bool is_lock)
+static int process_spin_lock(struct bpf_verifier_env *env, struct bpf_verifier_state *cur, int regno, int flags)
 {
+	bool is_lock = flags & PROCESS_SPIN_LOCK, is_res_lock = flags & PROCESS_RES_LOCK;
+	const char *lock_str = is_res_lock ? "bpf_res_spin" : "bpf_spin";
 	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
-	struct bpf_verifier_state *cur = env->cur_state;
 	bool is_const = tnum_is_const(reg->var_off);
+	bool is_irq = flags & PROCESS_LOCK_IRQ;
 	u64 val = reg->var_off.value;
 	struct bpf_map *map = NULL;
 	struct btf *btf = NULL;
 	struct btf_record *rec;
+	u32 spin_lock_off;
 	int err;
 
+	/* If the spin lock acquisition failed, we don't process the argument. */
+	if (flags & PROCESS_LOCK_FAIL)
+		return 0;
+	/* Success case always operates on current state only. */
+	WARN_ON_ONCE(cur != env->cur_state);
+
 	if (!is_const) {
 		verbose(env,
-			"R%d doesn't have constant offset. bpf_spin_lock has to be at the constant offset\n",
-			regno);
+			"R%d doesn't have constant offset. %s_lock has to be at the constant offset\n",
+			regno, lock_str);
 		return -EINVAL;
 	}
 	if (reg->type == PTR_TO_MAP_VALUE) {
 		map = reg->map_ptr;
 		if (!map->btf) {
 			verbose(env,
-				"map '%s' has to have BTF in order to use bpf_spin_lock\n",
-				map->name);
+				"map '%s' has to have BTF in order to use %s_lock\n",
+				map->name, lock_str);
 			return -EINVAL;
 		}
 	} else {
@@ -8048,36 +8075,53 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 	}
 
 	rec = reg_btf_record(reg);
-	if (!btf_record_has_field(rec, BPF_SPIN_LOCK)) {
-		verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local",
-			map ? map->name : "kptr");
+	if (!btf_record_has_field(rec, is_res_lock ? BPF_RES_SPIN_LOCK : BPF_SPIN_LOCK)) {
+		verbose(env, "%s '%s' has no valid %s_lock\n", map ? "map" : "local",
+			map ? map->name : "kptr", lock_str);
 		return -EINVAL;
 	}
-	if (rec->spin_lock_off != val + reg->off) {
-		verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n",
-			val + reg->off, rec->spin_lock_off);
+	spin_lock_off = is_res_lock ? rec->res_spin_lock_off : rec->spin_lock_off;
+	if (spin_lock_off != val + reg->off) {
+		verbose(env, "off %lld doesn't point to 'struct %s_lock' that is at %d\n",
+			val + reg->off, lock_str, spin_lock_off);
 		return -EINVAL;
 	}
 	if (is_lock) {
 		void *ptr;
+		int type;
 
 		if (map)
 			ptr = map;
 		else
 			ptr = btf;
 
-		if (cur->active_locks) {
-			verbose(env,
-				"Locking two bpf_spin_locks are not allowed\n");
-			return -EINVAL;
+		if (!is_res_lock && cur->active_locks) {
+			if (find_lock_state(env->cur_state, REF_TYPE_LOCK, 0, NULL)) {
+				verbose(env,
+					"Locking two bpf_spin_locks are not allowed\n");
+				return -EINVAL;
+			}
+		} else if (is_res_lock) {
+			if (find_lock_state(env->cur_state, REF_TYPE_RES_LOCK, reg->id, ptr)) {
+				verbose(env, "Acquiring the same lock again, AA deadlock detected\n");
+				return -EINVAL;
+			}
 		}
-		err = acquire_lock_state(env, env->insn_idx, REF_TYPE_LOCK, reg->id, ptr);
+
+		if (is_res_lock && is_irq)
+			type = REF_TYPE_RES_LOCK_IRQ;
+		else if (is_res_lock)
+			type = REF_TYPE_RES_LOCK;
+		else
+			type = REF_TYPE_LOCK;
+		err = acquire_lock_state(env, env->insn_idx, type, reg->id, ptr);
 		if (err < 0) {
 			verbose(env, "Failed to acquire lock state\n");
 			return err;
 		}
 	} else {
 		void *ptr;
+		int type;
 
 		if (map)
 			ptr = map;
@@ -8085,12 +8129,18 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
 			ptr = btf;
 
 		if (!cur->active_locks) {
-			verbose(env, "bpf_spin_unlock without taking a lock\n");
+			verbose(env, "%s_unlock without taking a lock\n", lock_str);
 			return -EINVAL;
 		}
 
-		if (release_lock_state(env->cur_state, REF_TYPE_LOCK, reg->id, ptr)) {
-			verbose(env, "bpf_spin_unlock of different lock\n");
+		if (is_res_lock && is_irq)
+			type = REF_TYPE_RES_LOCK_IRQ;
+		else if (is_res_lock)
+			type = REF_TYPE_RES_LOCK;
+		else
+			type = REF_TYPE_LOCK;
+		if (release_lock_state(env->cur_state, type, reg->id, ptr)) {
+			verbose(env, "%s_unlock of different lock\n", lock_str);
 			return -EINVAL;
 		}
 
@@ -9338,11 +9388,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			return -EACCES;
 		}
 		if (meta->func_id == BPF_FUNC_spin_lock) {
-			err = process_spin_lock(env, regno, true);
+			err = process_spin_lock(env, env->cur_state, regno, PROCESS_SPIN_LOCK);
 			if (err)
 				return err;
 		} else if (meta->func_id == BPF_FUNC_spin_unlock) {
-			err = process_spin_lock(env, regno, false);
+			err = process_spin_lock(env, env->cur_state, regno, 0);
 			if (err)
 				return err;
 		} else {
@@ -11529,6 +11579,7 @@ enum {
 	KF_ARG_RB_ROOT_ID,
 	KF_ARG_RB_NODE_ID,
 	KF_ARG_WORKQUEUE_ID,
+	KF_ARG_RES_SPIN_LOCK_ID,
 };
 
 BTF_ID_LIST(kf_arg_btf_ids)
@@ -11538,6 +11589,7 @@ BTF_ID(struct, bpf_list_node)
 BTF_ID(struct, bpf_rb_root)
 BTF_ID(struct, bpf_rb_node)
 BTF_ID(struct, bpf_wq)
+BTF_ID(struct, bpf_res_spin_lock)
 
 static bool __is_kfunc_ptr_arg_type(const struct btf *btf,
 				    const struct btf_param *arg, int type)
@@ -11586,6 +11638,11 @@ static bool is_kfunc_arg_wq(const struct btf *btf, const struct btf_param *arg)
 	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_WORKQUEUE_ID);
 }
 
+static bool is_kfunc_arg_res_spin_lock(const struct btf *btf, const struct btf_param *arg)
+{
+	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RES_SPIN_LOCK_ID);
+}
+
 static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf,
 				  const struct btf_param *arg)
 {
@@ -11657,6 +11714,7 @@ enum kfunc_ptr_arg_type {
 	KF_ARG_PTR_TO_MAP,
 	KF_ARG_PTR_TO_WORKQUEUE,
 	KF_ARG_PTR_TO_IRQ_FLAG,
+	KF_ARG_PTR_TO_RES_SPIN_LOCK,
 };
 
 enum special_kfunc_type {
@@ -11693,6 +11751,10 @@ enum special_kfunc_type {
 	KF_bpf_iter_num_new,
 	KF_bpf_iter_num_next,
 	KF_bpf_iter_num_destroy,
+	KF_bpf_res_spin_lock,
+	KF_bpf_res_spin_unlock,
+	KF_bpf_res_spin_lock_irqsave,
+	KF_bpf_res_spin_unlock_irqrestore,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -11771,6 +11833,10 @@ BTF_ID(func, bpf_local_irq_restore)
 BTF_ID(func, bpf_iter_num_new)
 BTF_ID(func, bpf_iter_num_next)
 BTF_ID(func, bpf_iter_num_destroy)
+BTF_ID(func, bpf_res_spin_lock)
+BTF_ID(func, bpf_res_spin_unlock)
+BTF_ID(func, bpf_res_spin_lock_irqsave)
+BTF_ID(func, bpf_res_spin_unlock_irqrestore)
 
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -11864,6 +11930,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 	if (is_kfunc_arg_irq_flag(meta->btf, &args[argno]))
 		return KF_ARG_PTR_TO_IRQ_FLAG;
 
+	if (is_kfunc_arg_res_spin_lock(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_RES_SPIN_LOCK;
+
 	if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
 		if (!btf_type_is_struct(ref_t)) {
 			verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
@@ -11967,22 +12036,34 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
 	return 0;
 }
 
-static int process_irq_flag(struct bpf_verifier_env *env, int regno,
-			     struct bpf_kfunc_call_arg_meta *meta)
+static int process_irq_flag(struct bpf_verifier_env *env, struct bpf_verifier_state *vstate, int regno,
+			    struct bpf_kfunc_call_arg_meta *meta, int flags)
 {
 	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
+	int err, kfunc_class = IRQ_NATIVE_KFUNC;
 	bool irq_save;
-	int err;
 
-	if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save]) {
+	if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_save] ||
+	    meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave]) {
 		irq_save = true;
-	} else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore]) {
+		if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])
+			kfunc_class = IRQ_LOCK_KFUNC;
+	} else if (meta->func_id == special_kfunc_list[KF_bpf_local_irq_restore] ||
+		   meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore]) {
 		irq_save = false;
+		if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore])
+			kfunc_class = IRQ_LOCK_KFUNC;
 	} else {
 		verbose(env, "verifier internal error: unknown irq flags kfunc\n");
 		return -EFAULT;
 	}
 
+	/* If the spin lock acquisition failed, we don't process the argument. */
+	if (kfunc_class == IRQ_LOCK_KFUNC && (flags & PROCESS_LOCK_FAIL))
+		return 0;
+	/* Success case always operates on current state only. */
+	WARN_ON_ONCE(vstate != env->cur_state);
+
 	if (irq_save) {
 		if (!is_irq_flag_reg_valid_uninit(env, reg)) {
 			verbose(env, "expected uninitialized irq flag as arg#%d\n", regno - 1);
@@ -11993,7 +12074,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno,
 		if (err)
 			return err;
 
-		err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx);
+		err = mark_stack_slot_irq_flag(env, meta, reg, env->insn_idx, kfunc_class);
 		if (err)
 			return err;
 	} else {
@@ -12007,7 +12088,7 @@ static int process_irq_flag(struct bpf_verifier_env *env, int regno,
 		if (err)
 			return err;
 
-		err = unmark_stack_slot_irq_flag(env, reg);
+		err = unmark_stack_slot_irq_flag(env, reg, kfunc_class);
 		if (err)
 			return err;
 	}
@@ -12134,7 +12215,8 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_
 
 	if (!env->cur_state->active_locks)
 		return -EINVAL;
-	s = find_lock_state(env->cur_state, REF_TYPE_LOCK, id, ptr);
+	s = find_lock_state(env->cur_state, REF_TYPE_LOCK | REF_TYPE_RES_LOCK | REF_TYPE_RES_LOCK_IRQ,
+			    id, ptr);
 	if (!s) {
 		verbose(env, "held lock and object are not in the same allocation\n");
 		return -EINVAL;
@@ -12170,9 +12252,18 @@ static bool is_bpf_graph_api_kfunc(u32 btf_id)
 	       btf_id == special_kfunc_list[KF_bpf_refcount_acquire_impl];
 }
 
+static bool is_bpf_res_spin_lock_kfunc(u32 btf_id)
+{
+	return btf_id == special_kfunc_list[KF_bpf_res_spin_lock] ||
+	       btf_id == special_kfunc_list[KF_bpf_res_spin_unlock] ||
+	       btf_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] ||
+	       btf_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore];
+}
+
 static bool kfunc_spin_allowed(u32 btf_id)
 {
-	return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id);
+	return is_bpf_graph_api_kfunc(btf_id) || is_bpf_iter_num_api_kfunc(btf_id) ||
+	       is_bpf_res_spin_lock_kfunc(btf_id);
 }
 
 static bool is_sync_callback_calling_kfunc(u32 btf_id)
@@ -12431,8 +12522,9 @@ static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env)
 	}
 }
 
-static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta,
-			    int insn_idx)
+static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_verifier_state *vstate,
+			    struct bpf_kfunc_call_arg_meta *meta,
+			    int insn_idx, int arg_flags)
 {
 	const char *func_name = meta->func_name, *ref_tname;
 	const struct btf *btf = meta->btf;
@@ -12453,7 +12545,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 	 * verifier sees.
 	 */
 	for (i = 0; i < nargs; i++) {
-		struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[i + 1];
+		struct bpf_reg_state *regs = vstate->frame[vstate->curframe]->regs, *reg = &regs[i + 1];
 		const struct btf_type *t, *ref_t, *resolve_ret;
 		enum bpf_arg_type arg_type = ARG_DONTCARE;
 		u32 regno = i + 1, ref_id, type_size;
@@ -12604,6 +12696,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		case KF_ARG_PTR_TO_CONST_STR:
 		case KF_ARG_PTR_TO_WORKQUEUE:
 		case KF_ARG_PTR_TO_IRQ_FLAG:
+		case KF_ARG_PTR_TO_RES_SPIN_LOCK:
 			break;
 		default:
 			WARN_ON_ONCE(1);
@@ -12898,11 +12991,33 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				verbose(env, "arg#%d doesn't point to an irq flag on stack\n", i);
 				return -EINVAL;
 			}
-			ret = process_irq_flag(env, regno, meta);
+			ret = process_irq_flag(env, vstate, regno, meta, arg_flags);
+			if (ret < 0)
+				return ret;
+			break;
+		case KF_ARG_PTR_TO_RES_SPIN_LOCK:
+		{
+			int flags = PROCESS_RES_LOCK;
+
+			if (reg->type != PTR_TO_MAP_VALUE && reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
+				verbose(env, "arg#%d doesn't point to map value or allocated object\n", i);
+				return -EINVAL;
+			}
+
+			if (!is_bpf_res_spin_lock_kfunc(meta->func_id))
+				return -EFAULT;
+			if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock] ||
+			    meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])
+				flags |= PROCESS_SPIN_LOCK;
+			if (meta->func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave] ||
+			    meta->func_id == special_kfunc_list[KF_bpf_res_spin_unlock_irqrestore])
+				flags |= PROCESS_LOCK_IRQ;
+			ret = process_spin_lock(env, vstate, regno, flags | arg_flags);
 			if (ret < 0)
 				return ret;
 			break;
 		}
+		}
 	}
 
 	if (is_kfunc_release(meta) && !meta->release_regno) {
@@ -12958,12 +13073,11 @@ static int fetch_kfunc_meta(struct bpf_verifier_env *env,
 
 static int check_return_code(struct bpf_verifier_env *env, int regno, const char *reg_name);
 
-static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
-			    int *insn_idx_p)
+static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_verifier_state *vstate,
+			    struct bpf_insn *insn, int *insn_idx_p, int flags)
 {
 	bool sleepable, rcu_lock, rcu_unlock, preempt_disable, preempt_enable;
 	u32 i, nargs, ptr_type_id, release_ref_obj_id;
-	struct bpf_reg_state *regs = cur_regs(env);
 	const char *func_name, *ptr_type_name;
 	const struct btf_type *t, *ptr_type;
 	struct bpf_kfunc_call_arg_meta meta;
@@ -12971,8 +13085,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	int err, insn_idx = *insn_idx_p;
 	const struct btf_param *args;
 	const struct btf_type *ret_t;
+	struct bpf_reg_state *regs;
 	struct btf *desc_btf;
 
+	regs = vstate->frame[vstate->curframe]->regs;
+
 	/* skip for now, but return error when we find this in fixup_kfunc_call */
 	if (!insn->imm)
 		return 0;
@@ -12999,7 +13116,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	}
 
 	/* Check the arguments */
-	err = check_kfunc_args(env, &meta, insn_idx);
+	err = check_kfunc_args(env, vstate, &meta, insn_idx, flags);
 	if (err < 0)
 		return err;
 
@@ -13157,6 +13274,13 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 
 	if (btf_type_is_scalar(t)) {
 		mark_reg_unknown(env, regs, BPF_REG_0);
+		if (meta.btf == btf_vmlinux && (meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock] ||
+		    meta.func_id == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) {
+			if (flags & PROCESS_LOCK_FAIL)
+				__mark_reg_s32_range(env, regs, BPF_REG_0, -MAX_ERRNO, -1);
+			else
+				__mark_reg_const_zero(env, &regs[BPF_REG_0]);
+		}
 		mark_btf_func_reg_size(env, BPF_REG_0, t->size);
 	} else if (btf_type_is_ptr(t)) {
 		ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
@@ -18040,7 +18164,8 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old,
 		case STACK_IRQ_FLAG:
 			old_reg = &old->stack[spi].spilled_ptr;
 			cur_reg = &cur->stack[spi].spilled_ptr;
-			if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap))
+			if (!check_ids(old_reg->ref_obj_id, cur_reg->ref_obj_id, idmap) ||
+			    old_reg->irq.kfunc_class != cur_reg->irq.kfunc_class)
 				return false;
 			break;
 		case STACK_MISC:
@@ -18084,6 +18209,8 @@ static bool refsafe(struct bpf_verifier_state *old, struct bpf_verifier_state *c
 		case REF_TYPE_IRQ:
 			break;
 		case REF_TYPE_LOCK:
+		case REF_TYPE_RES_LOCK:
+		case REF_TYPE_RES_LOCK_IRQ:
 			if (old->refs[i].ptr != cur->refs[i].ptr)
 				return false;
 			break;
@@ -19074,7 +19201,19 @@ static int do_check(struct bpf_verifier_env *env)
 				if (insn->src_reg == BPF_PSEUDO_CALL) {
 					err = check_func_call(env, insn, &env->insn_idx);
 				} else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
-					err = check_kfunc_call(env, insn, &env->insn_idx);
+					if (!insn->off &&
+					    (insn->imm == special_kfunc_list[KF_bpf_res_spin_lock] ||
+					     insn->imm == special_kfunc_list[KF_bpf_res_spin_lock_irqsave])) {
+						struct bpf_verifier_state *branch;
+
+						branch = push_stack(env, env->insn_idx + 1, env->prev_insn_idx, false);
+						if (!branch) {
+							verbose(env, "failed to push state for failed lock acquisition\n");
+							return -ENOMEM;
+						}
+						err = check_kfunc_call(env, branch, insn, &env->insn_idx, PROCESS_LOCK_FAIL);
+					}
+					err = err ?: check_kfunc_call(env, env->cur_state, insn, &env->insn_idx, 0);
 					if (!err && is_bpf_throw_kfunc(insn)) {
 						exception_exit = true;
 						goto process_bpf_exit_full;
@@ -19417,7 +19556,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 		}
 	}
 
-	if (btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
+	if (btf_record_has_field(map->record, BPF_SPIN_LOCK | BPF_RES_SPIN_LOCK)) {
 		if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) {
 			verbose(env, "socket filter progs cannot use bpf_spin_lock yet\n");
 			return -EINVAL;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH bpf-next v1 22/22] selftests/bpf: Add tests for rqspinlock
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (20 preceding siblings ...)
  2025-01-07 14:00 ` [PATCH bpf-next v1 21/22] bpf: Implement verifier support for rqspinlock Kumar Kartikeya Dwivedi
@ 2025-01-07 14:00 ` Kumar Kartikeya Dwivedi
  2025-01-07 23:54 ` [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Linus Torvalds
  22 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 14:00 UTC (permalink / raw)
  To: bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

Introduce selftests that trigger AA and ABBA deadlocks, and test the edge
case where the held locks table runs out of entries, since we then fall
back to the timeout as the final line of defense. Also exercise the
verifier's AA detection where applicable.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/res_spin_lock.c  | 103 ++++++++
 tools/testing/selftests/bpf/progs/irq.c       |  53 ++++
 .../selftests/bpf/progs/res_spin_lock.c       | 189 +++++++++++++++
 .../selftests/bpf/progs/res_spin_lock_fail.c  | 226 ++++++++++++++++++
 4 files changed, 571 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/res_spin_lock.c
 create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock.c
 create mode 100644 tools/testing/selftests/bpf/progs/res_spin_lock_fail.c

diff --git a/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c
new file mode 100644
index 000000000000..547f76381d3a
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+#include <test_progs.h>
+#include <network_helpers.h>
+
+#include "res_spin_lock.skel.h"
+#include "res_spin_lock_fail.skel.h"
+
+static void test_res_spin_lock_failure(void)
+{
+	RUN_TESTS(res_spin_lock_fail);
+}
+
+static volatile int skip;
+
+static void *spin_lock_thread(void *arg)
+{
+	int err, prog_fd = *(u32 *) arg;
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 10000,
+	);
+
+	while (!skip) {
+		err = bpf_prog_test_run_opts(prog_fd, &topts);
+		ASSERT_OK(err, "test_run");
+		ASSERT_OK(topts.retval, "test_run retval");
+	}
+	pthread_exit(arg);
+}
+
+static void test_res_spin_lock_success(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
+	struct res_spin_lock *skel;
+	pthread_t thread_id[16];
+	int prog_fd, i, err;
+	void *ret;
+
+	skel = res_spin_lock__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "res_spin_lock__open_and_load"))
+		return;
+	/* AA deadlock */
+	prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "error");
+	ASSERT_OK(topts.retval, "retval");
+	/* AA deadlock missed detection due to OoO unlock */
+	prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_ooo_missed_AA);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "error");
+	ASSERT_OK(topts.retval, "retval");
+
+	prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_held_lock_max);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "error");
+	ASSERT_OK(topts.retval, "retval");
+
+	/* Multi-threaded ABBA deadlock. */
+
+	prog_fd = bpf_program__fd(skel->progs.res_spin_lock_test_AB);
+	for (i = 0; i < 16; i++) {
+		int err;
+
+		err = pthread_create(&thread_id[i], NULL, &spin_lock_thread, &prog_fd);
+		if (!ASSERT_OK(err, "pthread_create"))
+			goto end;
+	}
+
+	topts.repeat = 1000;
+	int fd = bpf_program__fd(skel->progs.res_spin_lock_test_BA);
+	while (!topts.retval && !err && !skel->bss->err) {
+		err = bpf_prog_test_run_opts(fd, &topts);
+	}
+	ASSERT_EQ(skel->bss->err, -EDEADLK, "timeout err");
+	ASSERT_OK(err, "err");
+	ASSERT_EQ(topts.retval, -EDEADLK, "timeout");
+
+	skip = true;
+
+	for (i = 0; i < 16; i++) {
+		if (!ASSERT_OK(pthread_join(thread_id[i], &ret), "pthread_join"))
+			goto end;
+		if (!ASSERT_EQ(ret, &prog_fd, "ret == prog_fd"))
+			goto end;
+	}
+end:
+	res_spin_lock__destroy(skel);
+	return;
+}
+
+void test_res_spin_lock(void)
+{
+	if (test__start_subtest("res_spin_lock_success"))
+		test_res_spin_lock_success();
+	if (test__start_subtest("res_spin_lock_failure"))
+		test_res_spin_lock_failure();
+}
diff --git a/tools/testing/selftests/bpf/progs/irq.c b/tools/testing/selftests/bpf/progs/irq.c
index b0b53d980964..3d4fee83a5be 100644
--- a/tools/testing/selftests/bpf/progs/irq.c
+++ b/tools/testing/selftests/bpf/progs/irq.c
@@ -11,6 +11,9 @@ extern void bpf_local_irq_save(unsigned long *) __weak __ksym;
 extern void bpf_local_irq_restore(unsigned long *) __weak __ksym;
 extern int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void *unsafe_ptr__ign, u64 flags) __weak __ksym;
 
+struct bpf_res_spin_lock lockA __hidden SEC(".data.A");
+struct bpf_res_spin_lock lockB __hidden SEC(".data.B");
+
 SEC("?tc")
 __failure __msg("arg#0 doesn't point to an irq flag on stack")
 int irq_save_bad_arg(struct __sk_buff *ctx)
@@ -441,4 +444,54 @@ int irq_ooo_refs_array(struct __sk_buff *ctx)
 	return 0;
 }
 
+SEC("?tc")
+__failure __msg("cannot restore irq state out of order")
+int irq_ooo_lock_cond_inv(struct __sk_buff *ctx)
+{
+	unsigned long flags1, flags2;
+
+	if (bpf_res_spin_lock_irqsave(&lockA, &flags1))
+		return 0;
+	if (bpf_res_spin_lock_irqsave(&lockB, &flags2)) {
+		bpf_res_spin_unlock_irqrestore(&lockA, &flags1);
+		return 0;
+	}
+
+	bpf_res_spin_unlock_irqrestore(&lockB, &flags1);
+	bpf_res_spin_unlock_irqrestore(&lockA, &flags2);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("function calls are not allowed")
+int irq_wrong_kfunc_class_1(struct __sk_buff *ctx)
+{
+	unsigned long flags1;
+
+	if (bpf_res_spin_lock_irqsave(&lockA, &flags1))
+		return 0;
+	/* For now, bpf_local_irq_restore is not allowed in a critical section,
+	 * but this test ensures the error will be caught with kfunc_class when
+	 * it's opened up. Tested by temporarily permitting this kfunc in a
+	 * critical section.
+	 */
+	bpf_local_irq_restore(&flags1);
+	bpf_res_spin_unlock_irqrestore(&lockA, &flags1);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("function calls are not allowed")
+int irq_wrong_kfunc_class_2(struct __sk_buff *ctx)
+{
+	unsigned long flags1, flags2;
+
+	bpf_local_irq_save(&flags1);
+	if (bpf_res_spin_lock_irqsave(&lockA, &flags2))
+		return 0;
+	bpf_local_irq_restore(&flags2);
+	bpf_res_spin_unlock_irqrestore(&lockA, &flags1);
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock.c b/tools/testing/selftests/bpf/progs/res_spin_lock.c
new file mode 100644
index 000000000000..6d98e8f99e04
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/res_spin_lock.c
@@ -0,0 +1,189 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+#define EDEADLK 35
+#define ETIMEDOUT 110
+
+struct arr_elem {
+	struct bpf_res_spin_lock lock;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 64);
+	__type(key, int);
+	__type(value, struct arr_elem);
+} arrmap SEC(".maps");
+
+struct bpf_res_spin_lock lockA __hidden SEC(".data.A");
+struct bpf_res_spin_lock lockB __hidden SEC(".data.B");
+
+SEC("tc")
+int res_spin_lock_test(struct __sk_buff *ctx)
+{
+	struct arr_elem *elem1, *elem2;
+	int r;
+
+	elem1 = bpf_map_lookup_elem(&arrmap, &(int){0});
+	if (!elem1)
+		return -1;
+	elem2 = bpf_map_lookup_elem(&arrmap, &(int){0});
+	if (!elem2)
+		return -1;
+
+	r = bpf_res_spin_lock(&elem1->lock);
+	if (r)
+		return r;
+	if (!bpf_res_spin_lock(&elem2->lock)) {
+		bpf_res_spin_unlock(&elem2->lock);
+		bpf_res_spin_unlock(&elem1->lock);
+		return -1;
+	}
+	bpf_res_spin_unlock(&elem1->lock);
+	return 0;
+}
+
+SEC("tc")
+int res_spin_lock_test_ooo_missed_AA(struct __sk_buff *ctx)
+{
+	struct arr_elem *elem1, *elem2, *elem3;
+	int r;
+
+	elem1 = bpf_map_lookup_elem(&arrmap, &(int){0});
+	if (!elem1)
+		return 1;
+	elem2 = bpf_map_lookup_elem(&arrmap, &(int){1});
+	if (!elem2)
+		return 2;
+	elem3 = bpf_map_lookup_elem(&arrmap, &(int){1});
+	if (!elem3)
+		return 3;
+	if (elem3 != elem2)
+		return 4;
+
+	r = bpf_res_spin_lock(&elem1->lock);
+	if (r)
+		return r;
+	if (bpf_res_spin_lock(&elem2->lock)) {
+		bpf_res_spin_unlock(&elem1->lock);
+		return 5;
+	}
+	/* The held locks table shows elem1, but it should show elem2. */
+	bpf_res_spin_unlock(&elem1->lock);
+	/* Distinct lookup gives a fresh id for elem3,
+	 * but it's the same address as elem2...
+	 */
+	r = bpf_res_spin_lock(&elem3->lock);
+	if (!r) {
+		/* Something is broken, how?? */
+		bpf_res_spin_unlock(&elem3->lock);
+		bpf_res_spin_unlock(&elem2->lock);
+		return 6;
+	}
+	/* We should get -ETIMEDOUT, as AA detection will fail to catch this. */
+	if (r != -ETIMEDOUT) {
+		bpf_res_spin_unlock(&elem2->lock);
+		return 7;
+	}
+	bpf_res_spin_unlock(&elem2->lock);
+	return 0;
+}
+
+SEC("tc")
+int res_spin_lock_test_AB(struct __sk_buff *ctx)
+{
+	int r;
+
+	r = bpf_res_spin_lock(&lockA);
+	if (r)
+		return !r;
+	/* Only unlock if we took the lock. */
+	if (!bpf_res_spin_lock(&lockB))
+		bpf_res_spin_unlock(&lockB);
+	bpf_res_spin_unlock(&lockA);
+	return 0;
+}
+
+int err;
+
+SEC("tc")
+int res_spin_lock_test_BA(struct __sk_buff *ctx)
+{
+	int r;
+
+	r = bpf_res_spin_lock(&lockB);
+	if (r)
+		return !r;
+	if (!bpf_res_spin_lock(&lockA))
+		bpf_res_spin_unlock(&lockA);
+	else
+		err = -EDEADLK;
+	bpf_res_spin_unlock(&lockB);
+	return -EDEADLK;
+}
+
+SEC("tc")
+int res_spin_lock_test_held_lock_max(struct __sk_buff *ctx)
+{
+	struct bpf_res_spin_lock *locks[48] = {};
+	struct arr_elem *e;
+	u64 time_beg, time;
+	int ret = 0, i;
+
+	_Static_assert(ARRAY_SIZE(((struct rqspinlock_held){}).locks) == 32,
+		       "RES_NR_HELD assumed to be 32");
+
+	for (i = 0; i < 34; i++) {
+		int key = i;
+
+		/* We cannot pass in i directly, as it will get spilled/filled by the
+		 * compiler and lose its bounds in verifier state.
+		 */
+		e = bpf_map_lookup_elem(&arrmap, &key);
+		if (!e)
+			return 1;
+		locks[i] = &e->lock;
+	}
+
+	for (; i < 48; i++) {
+		int key = i - 2;
+
+		/* We cannot pass in i directly, as it will get spilled/filled by the
+		 * compiler and lose its bounds in verifier state.
+		 */
+		e = bpf_map_lookup_elem(&arrmap, &key);
+		if (!e)
+			return 1;
+		locks[i] = &e->lock;
+	}
+
+	time_beg = bpf_ktime_get_ns();
+	for (i = 0; i < 34; i++) {
+		if (bpf_res_spin_lock(locks[i]))
+			goto end;
+	}
+
+	/* Trigger AA, after exhausting entries in the held lock table. This
+	 * time, only the timeout can save us, as AA detection won't succeed.
+	 */
+	if (!bpf_res_spin_lock(locks[34])) {
+		bpf_res_spin_unlock(locks[34]);
+		ret = 1;
+		goto end;
+	}
+
+end:
+	for (i = i - 1; i >= 0; i--)
+		bpf_res_spin_unlock(locks[i]);
+	time = bpf_ktime_get_ns() - time_beg;
+	/* Time spent should be comfortably above our limit (1/2 s), since AA
+	 * detection won't be expedited due to the lack of a held lock entry.
+	 */
+	return ret ?: (time > 1000000000 / 2 ? 0 : 1);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c
new file mode 100644
index 000000000000..dc402497a99e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/res_spin_lock_fail.c
@@ -0,0 +1,226 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+struct arr_elem {
+	struct bpf_res_spin_lock lock;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, int);
+	__type(value, struct arr_elem);
+} arrmap SEC(".maps");
+
+long value;
+
+struct bpf_spin_lock lock __hidden SEC(".data.A");
+struct bpf_res_spin_lock res_lock __hidden SEC(".data.B");
+
+SEC("?tc")
+__failure __msg("point to map value or allocated object")
+int res_spin_lock_arg(struct __sk_buff *ctx)
+{
+	struct arr_elem *elem;
+
+	elem = bpf_map_lookup_elem(&arrmap, &(int){0});
+	if (!elem)
+		return 0;
+	bpf_res_spin_lock((struct bpf_res_spin_lock *)bpf_core_cast(&elem->lock, struct __sk_buff));
+	bpf_res_spin_lock(&elem->lock);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("AA deadlock detected")
+int res_spin_lock_AA(struct __sk_buff *ctx)
+{
+	struct arr_elem *elem;
+
+	elem = bpf_map_lookup_elem(&arrmap, &(int){0});
+	if (!elem)
+		return 0;
+	bpf_res_spin_lock(&elem->lock);
+	bpf_res_spin_lock(&elem->lock);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("AA deadlock detected")
+int res_spin_lock_cond_AA(struct __sk_buff *ctx)
+{
+	struct arr_elem *elem;
+
+	elem = bpf_map_lookup_elem(&arrmap, &(int){0});
+	if (!elem)
+		return 0;
+	if (bpf_res_spin_lock(&elem->lock))
+		return 0;
+	bpf_res_spin_lock(&elem->lock);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("unlock of different lock")
+int res_spin_lock_mismatch_1(struct __sk_buff *ctx)
+{
+	struct arr_elem *elem;
+
+	elem = bpf_map_lookup_elem(&arrmap, &(int){0});
+	if (!elem)
+		return 0;
+	if (bpf_res_spin_lock(&elem->lock))
+		return 0;
+	bpf_res_spin_unlock(&res_lock);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("unlock of different lock")
+int res_spin_lock_mismatch_2(struct __sk_buff *ctx)
+{
+	struct arr_elem *elem;
+
+	elem = bpf_map_lookup_elem(&arrmap, &(int){0});
+	if (!elem)
+		return 0;
+	if (bpf_res_spin_lock(&res_lock))
+		return 0;
+	bpf_res_spin_unlock(&elem->lock);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("unlock of different lock")
+int res_spin_lock_irq_mismatch_1(struct __sk_buff *ctx)
+{
+	struct arr_elem *elem;
+	unsigned long f1;
+
+	elem = bpf_map_lookup_elem(&arrmap, &(int){0});
+	if (!elem)
+		return 0;
+	bpf_local_irq_save(&f1);
+	if (bpf_res_spin_lock(&res_lock))
+		return 0;
+	bpf_res_spin_unlock_irqrestore(&res_lock, &f1);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("unlock of different lock")
+int res_spin_lock_irq_mismatch_2(struct __sk_buff *ctx)
+{
+	struct arr_elem *elem;
+	unsigned long f1;
+
+	elem = bpf_map_lookup_elem(&arrmap, &(int){0});
+	if (!elem)
+		return 0;
+	if (bpf_res_spin_lock_irqsave(&res_lock, &f1))
+		return 0;
+	bpf_res_spin_unlock(&res_lock);
+	return 0;
+}
+
+SEC("?tc")
+__success
+int res_spin_lock_ooo(struct __sk_buff *ctx)
+{
+	struct arr_elem *elem;
+
+	elem = bpf_map_lookup_elem(&arrmap, &(int){0});
+	if (!elem)
+		return 0;
+	if (bpf_res_spin_lock(&res_lock))
+		return 0;
+	if (bpf_res_spin_lock(&elem->lock)) {
+		bpf_res_spin_unlock(&res_lock);
+		return 0;
+	}
+	bpf_res_spin_unlock(&elem->lock);
+	bpf_res_spin_unlock(&res_lock);
+	return 0;
+}
+
+SEC("?tc")
+__success
+int res_spin_lock_ooo_irq(struct __sk_buff *ctx)
+{
+	struct arr_elem *elem;
+	unsigned long f1, f2;
+
+	elem = bpf_map_lookup_elem(&arrmap, &(int){0});
+	if (!elem)
+		return 0;
+	if (bpf_res_spin_lock_irqsave(&res_lock, &f1))
+		return 0;
+	if (bpf_res_spin_lock_irqsave(&elem->lock, &f2)) {
+		bpf_res_spin_unlock_irqrestore(&res_lock, &f1);
+		/* We won't have an unreleased IRQ flag error here. */
+		return 0;
+	}
+	bpf_res_spin_unlock_irqrestore(&elem->lock, &f2);
+	bpf_res_spin_unlock_irqrestore(&res_lock, &f1);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("off 1 doesn't point to 'struct bpf_res_spin_lock' that is at 0")
+int res_spin_lock_bad_off(struct __sk_buff *ctx)
+{
+	struct arr_elem *elem;
+
+	elem = bpf_map_lookup_elem(&arrmap, &(int){0});
+	if (!elem)
+		return 0;
+	bpf_res_spin_lock((void *)&elem->lock + 1);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("R1 doesn't have constant offset. bpf_res_spin_lock has to be at the constant offset")
+int res_spin_lock_var_off(struct __sk_buff *ctx)
+{
+	struct arr_elem *elem;
+	u64 val = value;
+
+	elem = bpf_map_lookup_elem(&arrmap, &(int){0});
+	if (!elem) {
		// FIXME: Use only in the assert macro's inline assembly
		//	  doesn't emit a BTF definition.
+		bpf_throw(0);
+		return 0;
+	}
+	bpf_assert_range(val, 0, 40);
+	bpf_res_spin_lock((void *)&value + val);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("map 'res_spin.bss' has no valid bpf_res_spin_lock")
+int res_spin_lock_no_lock_map(struct __sk_buff *ctx)
+{
+	bpf_res_spin_lock((void *)&value + 1);
+	return 0;
+}
+
+SEC("?tc")
+__failure __msg("local 'kptr' has no valid bpf_res_spin_lock")
+int res_spin_lock_no_lock_kptr(struct __sk_buff *ctx)
+{
+	struct { int i; } *p = bpf_obj_new(typeof(*p));
+
+	if (!p)
+		return 0;
+	bpf_res_spin_lock((void *)p);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 07/22] rqspinlock: Add support for timeouts
  2025-01-07 13:59 ` [PATCH bpf-next v1 07/22] rqspinlock: Add support for timeouts Kumar Kartikeya Dwivedi
@ 2025-01-07 14:50   ` Peter Zijlstra
  2025-01-07 17:14     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2025-01-07 14:50 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, linux-kernel, Barret Rhoden, Linus Torvalds, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

On Tue, Jan 07, 2025 at 05:59:49AM -0800, Kumar Kartikeya Dwivedi wrote:
> +struct rqspinlock_timeout {
> +	u64 timeout_end;
> +	u64 duration;
> +	u16 spin;
> +};

> +#define RES_CHECK_TIMEOUT(ts, ret)                    \
> +	({                                            \
> +		if (!((ts).spin++ & 0xffff))          \

Per the above spin is a u16, this mask is pointless.

> +			(ret) = check_timeout(&(ts)); \
> +		(ret);                                \
> +	})

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 08/22] rqspinlock: Protect pending bit owners from stalls
  2025-01-07 13:59 ` [PATCH bpf-next v1 08/22] rqspinlock: Protect pending bit owners from stalls Kumar Kartikeya Dwivedi
@ 2025-01-07 14:51   ` Peter Zijlstra
  2025-01-07 17:14     ` Kumar Kartikeya Dwivedi
  2025-01-08  2:19   ` Waiman Long
  1 sibling, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2025-01-07 14:51 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, linux-kernel, Barret Rhoden, Linus Torvalds, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

On Tue, Jan 07, 2025 at 05:59:50AM -0800, Kumar Kartikeya Dwivedi wrote:
> +	if (val & _Q_LOCKED_MASK) {
> +		RES_RESET_TIMEOUT(ts);
> +		smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret));
> +	}

Please check how smp_cond_load_acquire() works on ARM64 and then add
some words on how RES_CHECK_TIMEOUT() is still okay.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 07/22] rqspinlock: Add support for timeouts
  2025-01-07 14:50   ` Peter Zijlstra
@ 2025-01-07 17:14     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 17:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: bpf, linux-kernel, Barret Rhoden, Linus Torvalds, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

On Tue, 7 Jan 2025 at 20:20, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jan 07, 2025 at 05:59:49AM -0800, Kumar Kartikeya Dwivedi wrote:
> > +struct rqspinlock_timeout {
> > +     u64 timeout_end;
> > +     u64 duration;
> > +     u16 spin;
> > +};
>
> > +#define RES_CHECK_TIMEOUT(ts, ret)                    \
> > +     ({                                            \
> > +             if (!((ts).spin++ & 0xffff))          \
>
> Per the above spin is a u16, this mask is pointless.

Ack, I will drop the redundant mask.


>
> > +                     (ret) = check_timeout(&(ts)); \
> > +             (ret);                                \
> > +     })

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 08/22] rqspinlock: Protect pending bit owners from stalls
  2025-01-07 14:51   ` Peter Zijlstra
@ 2025-01-07 17:14     ` Kumar Kartikeya Dwivedi
  2025-01-07 19:17       ` Peter Zijlstra
  0 siblings, 1 reply; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 17:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: bpf, linux-kernel, Barret Rhoden, Linus Torvalds, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

On Tue, 7 Jan 2025 at 20:22, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jan 07, 2025 at 05:59:50AM -0800, Kumar Kartikeya Dwivedi wrote:
> > +     if (val & _Q_LOCKED_MASK) {
> > +             RES_RESET_TIMEOUT(ts);
> > +             smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret));
> > +     }
>
> Please check how smp_cond_load_acquire() works on ARM64 and then add
> some words on how RES_CHECK_TIMEOUT() is still okay.

Thanks Peter,

The __cmpwait_relaxed bit does indeed look problematic, my
understanding is that the ldxr + wfe sequence can get stuck because we
may not have any updates on the &lock->locked address, and we’ll not
call into RES_CHECK_TIMEOUT since that cond_expr check precedes the
__cmpwait macro.

I realized the sevl is just to not get stuck on the first wfe on
entry; it won’t unblock other CPUs' WFE, so things are incorrect as-is.
In any case this is all too fragile to rely upon so it should be
fixed.

Do you have suggestions on resolving this? We want to invoke this
macro as part of the waiting loop. We can have a
rqspinlock_smp_cond_load_acquire that maps to no-WFE smp_load_acquire
loop on arm64 and uses the asm-generic version elsewhere.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 08/22] rqspinlock: Protect pending bit owners from stalls
  2025-01-07 17:14     ` Kumar Kartikeya Dwivedi
@ 2025-01-07 19:17       ` Peter Zijlstra
  2025-01-07 19:22         ` Peter Zijlstra
  0 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2025-01-07 19:17 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, linux-kernel, Barret Rhoden, Linus Torvalds, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

On Tue, Jan 07, 2025 at 10:44:16PM +0530, Kumar Kartikeya Dwivedi wrote:
> On Tue, 7 Jan 2025 at 20:22, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Tue, Jan 07, 2025 at 05:59:50AM -0800, Kumar Kartikeya Dwivedi wrote:
> > > +     if (val & _Q_LOCKED_MASK) {
> > > +             RES_RESET_TIMEOUT(ts);
> > > +             smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret));
> > > +     }
> >
> > Please check how smp_cond_load_acquire() works on ARM64 and then add
> > some words on how RES_CHECK_TIMEOUT() is still okay.
> 
> Thanks Peter,
> 
> The __cmpwait_relaxed bit does indeed look problematic, my
> understanding is that the ldxr + wfe sequence can get stuck because we
> may not have any updates on the &lock->locked address, and we’ll not
> call into RES_CHECK_TIMEOUT since that cond_expr check precedes the
> __cmpwait macro.

IIRC the WFE will wake at least on every interrupt but might have an
inherent timeout itself, so it will make some progress, but not at a
speed comparable to a pure spin.

> Do you have suggestions on resolving this? We want to invoke this
> macro as part of the waiting loop. We can have a
> rqspinlock_smp_cond_load_acquire that maps to no-WFE smp_load_acquire
> loop on arm64 and uses the asm-generic version elsewhere.

That will make arm64 sad -- that wfe thing is how they get away with not
having paravirt spinlocks iirc. Also power consumption.

I've not read well enough to remember what order of timeout you're
looking for, but you could have the tick sample the lock, watchdog-like,
and write a magic 'lock' value when it is deemed stuck.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 08/22] rqspinlock: Protect pending bit owners from stalls
  2025-01-07 19:17       ` Peter Zijlstra
@ 2025-01-07 19:22         ` Peter Zijlstra
  2025-01-07 19:54           ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2025-01-07 19:22 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, linux-kernel, Barret Rhoden, Linus Torvalds, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

On Tue, Jan 07, 2025 at 08:17:56PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 07, 2025 at 10:44:16PM +0530, Kumar Kartikeya Dwivedi wrote:
> > On Tue, 7 Jan 2025 at 20:22, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Tue, Jan 07, 2025 at 05:59:50AM -0800, Kumar Kartikeya Dwivedi wrote:
> > > > +     if (val & _Q_LOCKED_MASK) {
> > > > +             RES_RESET_TIMEOUT(ts);
> > > > +             smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret));
> > > > +     }
> > >
> > > Please check how smp_cond_load_acquire() works on ARM64 and then add
> > > some words on how RES_CHECK_TIMEOUT() is still okay.
> > 
> > Thanks Peter,
> > 
> > The __cmpwait_relaxed bit does indeed look problematic, my
> > understanding is that the ldxr + wfe sequence can get stuck because we
> > may not have any updates on the &lock->locked address, and we’ll not
> > call into RES_CHECK_TIMEOUT since that cond_expr check precedes the
> > __cmpwait macro.
> 
> IIRC the WFE will wake at least on every interrupt but might have an
> inherent timeout itself, so it will make some progress, but not at a
> speed comparable to a pure spin.
> 
> > Do you have suggestions on resolving this? We want to invoke this
> > macro as part of the waiting loop. We can have a
> > rqspinlock_smp_cond_load_acquire that maps to no-WFE smp_load_acquire
> > loop on arm64 and uses the asm-generic version elsewhere.
> 
> That will make arm64 sad -- that wfe thing is how they get away with not
> having paravirt spinlocks iirc. Also power consumption.
> 
> I've not read well enough to remember what order of timeout you're
> > looking for, but you could have the tick sample the lock, watchdog-like,
> > and write a magic 'lock' value when it is deemed stuck.

Oh, there is this thread:

  https://lkml.kernel.org/r/20241107190818.522639-1-ankur.a.arora@oracle.com

That seems to add exactly what you need -- with the caveat that the
arm64 people will of course have to accept it first :-)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 08/22] rqspinlock: Protect pending bit owners from stalls
  2025-01-07 19:22         ` Peter Zijlstra
@ 2025-01-07 19:54           ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-07 19:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: bpf, linux-kernel, Barret Rhoden, Linus Torvalds, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team, Will Deacon, Ankur Arora,
	linux-arm-kernel

On Wed, 8 Jan 2025 at 00:52, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jan 07, 2025 at 08:17:56PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 07, 2025 at 10:44:16PM +0530, Kumar Kartikeya Dwivedi wrote:
> > > On Tue, 7 Jan 2025 at 20:22, Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Tue, Jan 07, 2025 at 05:59:50AM -0800, Kumar Kartikeya Dwivedi wrote:
> > > > > +     if (val & _Q_LOCKED_MASK) {
> > > > > +             RES_RESET_TIMEOUT(ts);
> > > > > +             smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret));
> > > > > +     }
> > > >
> > > > Please check how smp_cond_load_acquire() works on ARM64 and then add
> > > > some words on how RES_CHECK_TIMEOUT() is still okay.
> > >
> > > Thanks Peter,
> > >
> > > The __cmpwait_relaxed bit does indeed look problematic, my
> > > understanding is that the ldxr + wfe sequence can get stuck because we
> > > may not have any updates on the &lock->locked address, and we’ll not
> > > call into RES_CHECK_TIMEOUT since that cond_expr check precedes the
> > > __cmpwait macro.
> >
> > IIRC the WFE will wake at least on every interrupt but might have an
> > inherent timeout itself, so it will make some progress, but not at a
> > speed comparable to a pure spin.

Yes, also, it is possible to have interrupts disabled (e.g. for
irqsave spin lock calls).

> >
> > > Do you have suggestions on resolving this? We want to invoke this
> > > macro as part of the waiting loop. We can have a
> > > rqspinlock_smp_cond_load_acquire that maps to no-WFE smp_load_acquire
> > > loop on arm64 and uses the asm-generic version elsewhere.
> >
> > That will make arm64 sad -- that wfe thing is how they get away with not
> > having paravirt spinlocks iirc. Also power consumption.
> >

Makes sense.

> > I've not read well enough to remember what order of timeout you're
> > looking for, but you could have the tick sample the lock like a watchdog
> > like, and write a magic 'lock' value when it is deemed stuck.
>
> Oh, there is this thread:
>
>   https://lkml.kernel.org/r/20241107190818.522639-1-ankur.a.arora@oracle.com
>
> That seems to add exactly what you need -- with the caveat that the
> arm64 people will of course have to accept it first :-)

This seems perfect, thanks. While it adds a relaxed variant, it can be
extended with an acquire variant as well.
I will make use of this once it lands, it looks like it is pretty close.
Until then I'm thinking that falling back to a non-WFE loop is the
best course for now.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock
  2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
                   ` (21 preceding siblings ...)
  2025-01-07 14:00 ` [PATCH bpf-next v1 22/22] selftests/bpf: Add tests " Kumar Kartikeya Dwivedi
@ 2025-01-07 23:54 ` Linus Torvalds
  2025-01-08  9:18   ` Peter Zijlstra
  22 siblings, 1 reply; 63+ messages in thread
From: Linus Torvalds @ 2025-01-07 23:54 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Will Deacon
  Cc: bpf, linux-kernel, Peter Zijlstra, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team

On Tue, 7 Jan 2025 at 06:00, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> This patch set introduces Resilient Queued Spin Lock (or rqspinlock with
> res_spin_lock() and res_spin_unlock() APIs).

So when I see people doing new locking mechanisms, I invariably go "Oh no!".

But this series seems reasonable to me. I see that PeterZ had a couple
of minor comments (well, the arm64 one is more fundamental), which
hopefully means that it seems reasonable to him too. Peter?

That said, it would be lovely if Waiman and Will would also take a
look. Perhaps Will in particular, considering Peter's point about
smp_cond_load_acquire() on arm64. And it looks like Will wasn't cc'd
on the series. Added.

Will? See

    https://lore.kernel.org/all/20250107140004.2732830-1-memxor@gmail.com/

for the series.

               Linus

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 08/22] rqspinlock: Protect pending bit owners from stalls
  2025-01-07 13:59 ` [PATCH bpf-next v1 08/22] rqspinlock: Protect pending bit owners from stalls Kumar Kartikeya Dwivedi
  2025-01-07 14:51   ` Peter Zijlstra
@ 2025-01-08  2:19   ` Waiman Long
  2025-01-08 20:13     ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 63+ messages in thread
From: Waiman Long @ 2025-01-08  2:19 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf, linux-kernel
  Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

On 1/7/25 8:59 AM, Kumar Kartikeya Dwivedi wrote:
> The pending bit is used to avoid queueing in case the lock is
> uncontended, and has demonstrated benefits for the 2 contender scenario,
> esp. on x86. In case the pending bit is acquired and we wait for the
> locked bit to disappear, we may get stuck due to the lock owner not
> making progress. Hence, this waiting loop must be protected with a
> timeout check.
>
> To perform a graceful recovery once we decide to abort our lock
> acquisition attempt in this case, we must unset the pending bit since we
> own it. All waiters undoing their changes and exiting gracefully allows
> the lock word to be restored to the unlocked state once all participants
> (owner, waiters) have been recovered, and the lock remains usable.
> Hence, set the pending bit back to zero before returning to the caller.
>
> Introduce a lockevent (rqspinlock_lock_timeout) to capture timeout
> event statistics.
>
> Reviewed-by: Barret Rhoden <brho@google.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>   include/asm-generic/rqspinlock.h  |  2 +-
>   kernel/locking/lock_events_list.h |  5 +++++
>   kernel/locking/rqspinlock.c       | 28 +++++++++++++++++++++++-----
>   3 files changed, 29 insertions(+), 6 deletions(-)
>
> diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h
> index 8ed266f4e70b..5c996a82e75f 100644
> --- a/include/asm-generic/rqspinlock.h
> +++ b/include/asm-generic/rqspinlock.h
> @@ -19,6 +19,6 @@ struct qspinlock;
>    */
>   #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2)
>   
> -extern void resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
> +extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
>   
>   #endif /* __ASM_GENERIC_RQSPINLOCK_H */
> diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
> index 97fb6f3f840a..c5286249994d 100644
> --- a/kernel/locking/lock_events_list.h
> +++ b/kernel/locking/lock_events_list.h
> @@ -49,6 +49,11 @@ LOCK_EVENT(lock_use_node4)	/* # of locking ops that use 4th percpu node */
>   LOCK_EVENT(lock_no_node)	/* # of locking ops w/o using percpu node    */
>   #endif /* CONFIG_QUEUED_SPINLOCKS */
>   
> +/*
> + * Locking events for Resilient Queued Spin Lock
> + */
> +LOCK_EVENT(rqspinlock_lock_timeout)	/* # of locking ops that timeout	*/
> +
>   /*
>    * Locking events for rwsem
>    */

Since the build of rqspinlock.c is conditional on 
CONFIG_QUEUED_SPINLOCKS, this lock event should be inside the 
CONFIG_QUEUED_SPINLOCKS block.
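
[Editor's note: one possible shape of that fix, sketched against the quoted hunk (not a tested patch), is simply moving the #endif below the new event:]

```diff
 LOCK_EVENT(lock_no_node)	/* # of locking ops w/o using percpu node    */
-#endif /* CONFIG_QUEUED_SPINLOCKS */
 
 /*
  * Locking events for Resilient Queued Spin Lock
  */
 LOCK_EVENT(rqspinlock_lock_timeout)	/* # of locking ops that timeout	*/
+#endif /* CONFIG_QUEUED_SPINLOCKS */
```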

Cheers,
Longman



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 09/22] rqspinlock: Protect waiters in queue from stalls
  2025-01-07 13:59 ` [PATCH bpf-next v1 09/22] rqspinlock: Protect waiters in queue " Kumar Kartikeya Dwivedi
@ 2025-01-08  3:38   ` Waiman Long
  2025-01-08 20:42     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 63+ messages in thread
From: Waiman Long @ 2025-01-08  3:38 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf, linux-kernel
  Cc: Barret Rhoden, Linus Torvalds, Peter Zijlstra, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

On 1/7/25 8:59 AM, Kumar Kartikeya Dwivedi wrote:
> Implement the wait queue cleanup algorithm for rqspinlock. There are
> three forms of waiters in the original queued spin lock algorithm. The
> first is the waiter which acquires the pending bit and spins on the lock
> word without forming a wait queue. The second is the head waiter that is
> the first waiter heading the wait queue. The third form is of all the
> non-head waiters queued behind the head, waiting to be signalled through
> their MCS node to overtake the responsibility of the head.
>
> In this commit, we are concerned with the second and third kind. First,
> we augment the waiting loop of the head of the wait queue with a
> timeout. When this timeout happens, all waiters part of the wait queue
> will abort their lock acquisition attempts. This happens in three steps.
> First, the head breaks out of its loop waiting for pending and locked
> bits to turn to 0, and non-head waiters break out of their MCS node spin
> (more on that later). Next, every waiter (head or non-head) attempts to
> check whether they are also the tail waiter, in such a case they attempt
> to zero out the tail word and allow a new queue to be built up for this
> lock. If they succeed, they have no one to signal next in the queue to
> stop spinning. Otherwise, they signal the MCS node of the next waiter to
> break out of its spin and try resetting the tail word back to 0. This
> goes on until the tail waiter is found. In case of races, the new tail
> will be responsible for performing the same task, as the old tail will
> then fail to reset the tail word and wait for its next pointer to be
> updated before it signals the new tail to do the same.
>
> Lastly, all of these waiters release the rqnode and return to the
> caller. This patch underscores the point that rqspinlock's timeout does
> not apply to each waiter individually, and cannot be relied upon as an
> upper bound. It is possible for the rqspinlock waiters to return early
> from a failed lock acquisition attempt as soon as stalls are detected.
>
> The head waiter cannot directly WRITE_ONCE the tail to zero, as it may
> race with a concurrent xchg and a non-head waiter linking its MCS node
> to the head's MCS node through 'prev->next' assignment.
>
> Reviewed-by: Barret Rhoden <brho@google.com>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>   kernel/locking/rqspinlock.c | 42 +++++++++++++++++++++++++++++---
>   kernel/locking/rqspinlock.h | 48 +++++++++++++++++++++++++++++++++++++
>   2 files changed, 87 insertions(+), 3 deletions(-)
>   create mode 100644 kernel/locking/rqspinlock.h
>
> diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
> index dd305573db13..f712fe4b1f38 100644
> --- a/kernel/locking/rqspinlock.c
> +++ b/kernel/locking/rqspinlock.c
> @@ -77,6 +77,8 @@ struct rqspinlock_timeout {
>   	u16 spin;
>   };
>   
> +#define RES_TIMEOUT_VAL	2
> +
>   static noinline int check_timeout(struct rqspinlock_timeout *ts)
>   {
>   	u64 time = ktime_get_mono_fast_ns();
> @@ -305,12 +307,18 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
>   	 * head of the waitqueue.
>   	 */
>   	if (old & _Q_TAIL_MASK) {
> +		int val;
> +
>   		prev = decode_tail(old, qnodes);
>   
>   		/* Link @node into the waitqueue. */
>   		WRITE_ONCE(prev->next, node);
>   
> -		arch_mcs_spin_lock_contended(&node->locked);
> +		val = arch_mcs_spin_lock_contended(&node->locked);
> +		if (val == RES_TIMEOUT_VAL) {
> +			ret = -EDEADLK;
> +			goto waitq_timeout;
> +		}
>   
>   		/*
>   		 * While waiting for the MCS lock, the next pointer may have
> @@ -334,7 +342,35 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
>   	 * sequentiality; this is because the set_locked() function below
>   	 * does not imply a full barrier.
>   	 */
> -	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));
> +	RES_RESET_TIMEOUT(ts);
> +	val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) ||
> +				       RES_CHECK_TIMEOUT(ts, ret));

This has the same wfe problem for arm64.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock
  2025-01-07 23:54 ` [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Linus Torvalds
@ 2025-01-08  9:18   ` Peter Zijlstra
  2025-01-08 20:12     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2025-01-08  9:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kumar Kartikeya Dwivedi, Will Deacon, bpf, linux-kernel,
	Waiman Long, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team

On Tue, Jan 07, 2025 at 03:54:36PM -0800, Linus Torvalds wrote:
> On Tue, 7 Jan 2025 at 06:00, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > This patch set introduces Resilient Queued Spin Lock (or rqspinlock with
> > res_spin_lock() and res_spin_unlock() APIs).
> 
> So when I see people doing new locking mechanisms, I invariably go "Oh no!".
> 
> But this series seems reasonable to me. I see that PeterZ had a couple
> of minor comments (well, the arm64 one is more fundamental), which
> hopefully means that it seems reasonable to him too. Peter?

I've not had time to fully read the whole thing yet, I only did a quick
once over. I'll try and get around to doing a proper reading eventually,
but I'm chasing a regression atm, and then I need to go review a ton of
code Andrew merged over the xmas/newyears holiday :/

One potential issue is that qspinlock isn't suitable for all
architectures -- and I've yet to figure out widely BPF is planning on
using this. Notably qspinlock is ineffective (as in way over engineered)
for architectures that do not provide hardware level progress guarantees
on competing atomics, and qspinlock uses mixed-size atomics, which are
typically under specified, architecturally.

Another issue is the code duplication.

Anyway, I'll get to it eventually...

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 20/22] bpf: Introduce rqspinlock kfuncs
  2025-01-07 14:00 ` [PATCH bpf-next v1 20/22] bpf: Introduce rqspinlock kfuncs Kumar Kartikeya Dwivedi
@ 2025-01-08 10:23   ` kernel test robot
  2025-01-08 10:23   ` kernel test robot
  2025-01-08 10:44   ` kernel test robot
  2 siblings, 0 replies; 63+ messages in thread
From: kernel test robot @ 2025-01-08 10:23 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf, linux-kernel
  Cc: oe-kbuild-all, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

Hi Kumar,

kernel test robot noticed the following build errors:

[auto build test ERROR on f44275e7155dc310d36516fc25be503da099781c]

url:    https://github.com/intel-lab-lkp/linux/commits/Kumar-Kartikeya-Dwivedi/locking-Move-MCS-struct-definition-to-public-header/20250107-220615
base:   f44275e7155dc310d36516fc25be503da099781c
patch link:    https://lore.kernel.org/r/20250107140004.2732830-21-memxor%40gmail.com
patch subject: [PATCH bpf-next v1 20/22] bpf: Introduce rqspinlock kfuncs
config: alpha-allnoconfig (https://download.01.org/0day-ci/archive/20250108/202501081832.WyLcpM5w-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250108/202501081832.WyLcpM5w-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501081832.WyLcpM5w-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from ./arch/alpha/include/generated/asm/rqspinlock.h:1,
                    from include/linux/bpf.h:33,
                    from include/linux/security.h:35,
                    from include/linux/perf_event.h:62,
                    from include/linux/trace_events.h:10,
                    from include/trace/syscall.h:7,
                    from include/linux/syscalls.h:94,
                    from init/main.c:21:
>> include/asm-generic/rqspinlock.h:15:10: fatal error: asm/qspinlock.h: No such file or directory
      15 | #include <asm/qspinlock.h>
         |          ^~~~~~~~~~~~~~~~~
   compilation terminated.


vim +15 include/asm-generic/rqspinlock.h

13d8f36ca2ecdf Kumar Kartikeya Dwivedi 2025-01-07  11  
13d8f36ca2ecdf Kumar Kartikeya Dwivedi 2025-01-07  12  #include <linux/types.h>
ebea887f32c13b Kumar Kartikeya Dwivedi 2025-01-07  13  #include <vdso/time64.h>
83c0f407f3dad2 Kumar Kartikeya Dwivedi 2025-01-07  14  #include <linux/percpu.h>
ea74a398e7e95d Kumar Kartikeya Dwivedi 2025-01-07 @15  #include <asm/qspinlock.h>
13d8f36ca2ecdf Kumar Kartikeya Dwivedi 2025-01-07  16  
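The alpha failure above comes from include/asm-generic/rqspinlock.h including <asm/qspinlock.h> unconditionally, while that header only exists on architectures that support queued spinlocks. One plausible direction for a fix (hypothetical, not taken from the series) is to guard the include on the queued-spinlock config:

```c
/* Hypothetical sketch for asm-generic/rqspinlock.h: only pull in
 * asm/qspinlock.h on architectures that actually provide it (alpha
 * allnoconfig, as in this report, does not). */
#ifdef CONFIG_QUEUED_SPINLOCKS
#include <asm/qspinlock.h>
#endif
```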

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 20/22] bpf: Introduce rqspinlock kfuncs
  2025-01-07 14:00 ` [PATCH bpf-next v1 20/22] bpf: Introduce rqspinlock kfuncs Kumar Kartikeya Dwivedi
  2025-01-08 10:23   ` kernel test robot
@ 2025-01-08 10:23   ` kernel test robot
  2025-01-08 10:44   ` kernel test robot
  2 siblings, 0 replies; 63+ messages in thread
From: kernel test robot @ 2025-01-08 10:23 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf, linux-kernel
  Cc: oe-kbuild-all, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

Hi Kumar,

kernel test robot noticed the following build errors:

[auto build test ERROR on f44275e7155dc310d36516fc25be503da099781c]

url:    https://github.com/intel-lab-lkp/linux/commits/Kumar-Kartikeya-Dwivedi/locking-Move-MCS-struct-definition-to-public-header/20250107-220615
base:   f44275e7155dc310d36516fc25be503da099781c
patch link:    https://lore.kernel.org/r/20250107140004.2732830-21-memxor%40gmail.com
patch subject: [PATCH bpf-next v1 20/22] bpf: Introduce rqspinlock kfuncs
config: loongarch-allnoconfig (https://download.01.org/0day-ci/archive/20250108/202501081853.1N3CiU6j-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250108/202501081853.1N3CiU6j-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501081853.1N3CiU6j-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/asm-generic/qspinlock.h:42,
                    from arch/loongarch/include/asm/qspinlock.h:39,
                    from include/asm-generic/rqspinlock.h:15,
                    from ./arch/loongarch/include/generated/asm/rqspinlock.h:1,
                    from include/linux/bpf.h:33,
                    from include/linux/security.h:35,
                    from kernel/printk/printk.c:34:
   include/asm-generic/qspinlock_types.h:44:3: error: conflicting types for 'arch_spinlock_t'; have 'struct qspinlock'
      44 | } arch_spinlock_t;
         |   ^~~~~~~~~~~~~~~
   In file included from include/linux/spinlock_types_raw.h:9,
                    from include/linux/ratelimit_types.h:7,
                    from include/linux/printk.h:9,
                    from include/linux/kernel.h:31,
                    from kernel/printk/printk.c:22:
   include/linux/spinlock_types_up.h:25:20: note: previous declaration of 'arch_spinlock_t' with type 'arch_spinlock_t'
      25 | typedef struct { } arch_spinlock_t;
         |                    ^~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:9: warning: "__ARCH_SPIN_LOCK_UNLOCKED" redefined
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_up.h:27:9: note: this is the location of the previous definition
      27 | #define __ARCH_SPIN_LOCK_UNLOCKED { }
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock.h:144:9: warning: "arch_spin_is_locked" redefined
     144 | #define arch_spin_is_locked(l)          queued_spin_is_locked(l)
         |         ^~~~~~~~~~~~~~~~~~~
   In file included from include/linux/spinlock.h:97,
                    from include/linux/mmzone.h:8,
                    from include/linux/gfp.h:7,
                    from include/linux/mm.h:7,
                    from kernel/printk/printk.c:23:
   include/linux/spinlock_up.h:62:9: note: this is the location of the previous definition
      62 | #define arch_spin_is_locked(lock)       ((void)(lock), 0)
         |         ^~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock.h:145:9: warning: "arch_spin_is_contended" redefined
     145 | #define arch_spin_is_contended(l)       queued_spin_is_contended(l)
         |         ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_up.h:69:9: note: this is the location of the previous definition
      69 | #define arch_spin_is_contended(lock)    (((void)(lock), 0))
         |         ^~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock.h:147:9: warning: "arch_spin_lock" redefined
     147 | #define arch_spin_lock(l)               queued_spin_lock(l)
         |         ^~~~~~~~~~~~~~
   include/linux/spinlock_up.h:64:10: note: this is the location of the previous definition
      64 | # define arch_spin_lock(lock)           do { barrier(); (void)(lock); } while (0)
         |          ^~~~~~~~~~~~~~
   include/asm-generic/qspinlock.h:148:9: warning: "arch_spin_trylock" redefined
     148 | #define arch_spin_trylock(l)            queued_spin_trylock(l)
         |         ^~~~~~~~~~~~~~~~~
   include/linux/spinlock_up.h:66:10: note: this is the location of the previous definition
      66 | # define arch_spin_trylock(lock)        ({ barrier(); (void)(lock); 1; })
         |          ^~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock.h:149:9: warning: "arch_spin_unlock" redefined
     149 | #define arch_spin_unlock(l)             queued_spin_unlock(l)
         |         ^~~~~~~~~~~~~~~~
   include/linux/spinlock_up.h:65:10: note: this is the location of the previous definition
      65 | # define arch_spin_unlock(lock) do { barrier(); (void)(lock); } while (0)
         |          ^~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: error: extra brace group at end of initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:81:32: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      81 |                 , .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:87:34: note: in expansion of macro '__MUTEX_INITIALIZER'
      87 |         struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
         |                                  ^~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:89:8: note: in expansion of macro 'DEFINE_MUTEX'
      89 | static DEFINE_MUTEX(console_mutex);
         |        ^~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:81:32: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      81 |                 , .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:87:34: note: in expansion of macro '__MUTEX_INITIALIZER'
      87 |         struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
         |                                  ^~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:89:8: note: in expansion of macro 'DEFINE_MUTEX'
      89 | static DEFINE_MUTEX(console_mutex);
         |        ^~~~~~~~~~~~
   In file included from include/linux/atomic.h:7,
                    from include/asm-generic/bitops/atomic.h:5,
                    from arch/loongarch/include/asm/bitops.h:27,
                    from include/linux/bitops.h:68,
                    from include/linux/kernel.h:23:
>> arch/loongarch/include/asm/atomic.h:32:27: error: extra brace group at end of initializer
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:81:32: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      81 |                 , .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:87:34: note: in expansion of macro '__MUTEX_INITIALIZER'
      87 |         struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
         |                                  ^~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:89:8: note: in expansion of macro 'DEFINE_MUTEX'
      89 | static DEFINE_MUTEX(console_mutex);
         |        ^~~~~~~~~~~~
   arch/loongarch/include/asm/atomic.h:32:27: note: (near initialization for '(anonymous).raw_lock')
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:81:32: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      81 |                 , .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:87:34: note: in expansion of macro '__MUTEX_INITIALIZER'
      87 |         struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
         |                                  ^~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:89:8: note: in expansion of macro 'DEFINE_MUTEX'
      89 | static DEFINE_MUTEX(console_mutex);
         |        ^~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: warning: excess elements in struct initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:81:32: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      81 |                 , .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:87:34: note: in expansion of macro '__MUTEX_INITIALIZER'
      87 |         struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
         |                                  ^~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:89:8: note: in expansion of macro 'DEFINE_MUTEX'
      89 | static DEFINE_MUTEX(console_mutex);
         |        ^~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:81:32: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      81 |                 , .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:87:34: note: in expansion of macro '__MUTEX_INITIALIZER'
      87 |         struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
         |                                  ^~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:89:8: note: in expansion of macro 'DEFINE_MUTEX'
      89 | static DEFINE_MUTEX(console_mutex);
         |        ^~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: error: extra brace group at end of initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/semaphore.h:23:27: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      23 |         .lock           = __RAW_SPIN_LOCK_UNLOCKED((name).lock),        \
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/semaphore.h:35:34: note: in expansion of macro '__SEMAPHORE_INITIALIZER'
      35 |         struct semaphore _name = __SEMAPHORE_INITIALIZER(_name, _n)
         |                                  ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:95:8: note: in expansion of macro 'DEFINE_SEMAPHORE'
      95 | static DEFINE_SEMAPHORE(console_sem, 1);
         |        ^~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/semaphore.h:23:27: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      23 |         .lock           = __RAW_SPIN_LOCK_UNLOCKED((name).lock),        \
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/semaphore.h:35:34: note: in expansion of macro '__SEMAPHORE_INITIALIZER'
      35 |         struct semaphore _name = __SEMAPHORE_INITIALIZER(_name, _n)
         |                                  ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:95:8: note: in expansion of macro 'DEFINE_SEMAPHORE'
      95 | static DEFINE_SEMAPHORE(console_sem, 1);
         |        ^~~~~~~~~~~~~~~~
>> arch/loongarch/include/asm/atomic.h:32:27: error: extra brace group at end of initializer
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/semaphore.h:23:27: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      23 |         .lock           = __RAW_SPIN_LOCK_UNLOCKED((name).lock),        \
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/semaphore.h:35:34: note: in expansion of macro '__SEMAPHORE_INITIALIZER'
      35 |         struct semaphore _name = __SEMAPHORE_INITIALIZER(_name, _n)
         |                                  ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:95:8: note: in expansion of macro 'DEFINE_SEMAPHORE'
      95 | static DEFINE_SEMAPHORE(console_sem, 1);
         |        ^~~~~~~~~~~~~~~~
   arch/loongarch/include/asm/atomic.h:32:27: note: (near initialization for '(anonymous).raw_lock')
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/semaphore.h:23:27: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      23 |         .lock           = __RAW_SPIN_LOCK_UNLOCKED((name).lock),        \
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/semaphore.h:35:34: note: in expansion of macro '__SEMAPHORE_INITIALIZER'
      35 |         struct semaphore _name = __SEMAPHORE_INITIALIZER(_name, _n)
         |                                  ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:95:8: note: in expansion of macro 'DEFINE_SEMAPHORE'
      95 | static DEFINE_SEMAPHORE(console_sem, 1);
         |        ^~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: warning: excess elements in struct initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/semaphore.h:23:27: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      23 |         .lock           = __RAW_SPIN_LOCK_UNLOCKED((name).lock),        \
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/semaphore.h:35:34: note: in expansion of macro '__SEMAPHORE_INITIALIZER'
      35 |         struct semaphore _name = __SEMAPHORE_INITIALIZER(_name, _n)
         |                                  ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:95:8: note: in expansion of macro 'DEFINE_SEMAPHORE'
      95 | static DEFINE_SEMAPHORE(console_sem, 1);
         |        ^~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/semaphore.h:23:27: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      23 |         .lock           = __RAW_SPIN_LOCK_UNLOCKED((name).lock),        \
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/semaphore.h:35:34: note: in expansion of macro '__SEMAPHORE_INITIALIZER'
      35 |         struct semaphore _name = __SEMAPHORE_INITIALIZER(_name, _n)
         |                                  ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:95:8: note: in expansion of macro 'DEFINE_SEMAPHORE'
      95 | static DEFINE_SEMAPHORE(console_sem, 1);
         |        ^~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: error: extra brace group at end of initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/swait.h:62:27: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      62 |         .lock           = __RAW_SPIN_LOCK_UNLOCKED(name.lock),          \
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/srcutiny.h:36:20: note: in expansion of macro '__SWAIT_QUEUE_HEAD_INITIALIZER'
      36 |         .srcu_wq = __SWAIT_QUEUE_HEAD_INITIALIZER(name.srcu_wq),        \
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/srcutiny.h:49:42: note: in expansion of macro '__SRCU_STRUCT_INIT'
      49 |         static struct srcu_struct name = __SRCU_STRUCT_INIT(name, name, name)
         |                                          ^~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:98:1: note: in expansion of macro 'DEFINE_STATIC_SRCU'
      98 | DEFINE_STATIC_SRCU(console_srcu);
         | ^~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/swait.h:62:27: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      62 |         .lock           = __RAW_SPIN_LOCK_UNLOCKED(name.lock),          \
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/srcutiny.h:36:20: note: in expansion of macro '__SWAIT_QUEUE_HEAD_INITIALIZER'
      36 |         .srcu_wq = __SWAIT_QUEUE_HEAD_INITIALIZER(name.srcu_wq),        \
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/srcutiny.h:49:42: note: in expansion of macro '__SRCU_STRUCT_INIT'
      49 |         static struct srcu_struct name = __SRCU_STRUCT_INIT(name, name, name)
         |                                          ^~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:98:1: note: in expansion of macro 'DEFINE_STATIC_SRCU'
      98 | DEFINE_STATIC_SRCU(console_srcu);
         | ^~~~~~~~~~~~~~~~~~
>> arch/loongarch/include/asm/atomic.h:32:27: error: extra brace group at end of initializer
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/swait.h:62:27: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      62 |         .lock           = __RAW_SPIN_LOCK_UNLOCKED(name.lock),          \
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/srcutiny.h:36:20: note: in expansion of macro '__SWAIT_QUEUE_HEAD_INITIALIZER'
      36 |         .srcu_wq = __SWAIT_QUEUE_HEAD_INITIALIZER(name.srcu_wq),        \
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/srcutiny.h:49:42: note: in expansion of macro '__SRCU_STRUCT_INIT'
      49 |         static struct srcu_struct name = __SRCU_STRUCT_INIT(name, name, name)
         |                                          ^~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:98:1: note: in expansion of macro 'DEFINE_STATIC_SRCU'
      98 | DEFINE_STATIC_SRCU(console_srcu);
         | ^~~~~~~~~~~~~~~~~~
   arch/loongarch/include/asm/atomic.h:32:27: note: (near initialization for '(anonymous).raw_lock')
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/swait.h:62:27: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      62 |         .lock           = __RAW_SPIN_LOCK_UNLOCKED(name.lock),          \
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/srcutiny.h:36:20: note: in expansion of macro '__SWAIT_QUEUE_HEAD_INITIALIZER'
      36 |         .srcu_wq = __SWAIT_QUEUE_HEAD_INITIALIZER(name.srcu_wq),        \
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/srcutiny.h:49:42: note: in expansion of macro '__SRCU_STRUCT_INIT'
      49 |         static struct srcu_struct name = __SRCU_STRUCT_INIT(name, name, name)
         |                                          ^~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:98:1: note: in expansion of macro 'DEFINE_STATIC_SRCU'
      98 | DEFINE_STATIC_SRCU(console_srcu);
         | ^~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: warning: excess elements in struct initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/swait.h:62:27: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      62 |         .lock           = __RAW_SPIN_LOCK_UNLOCKED(name.lock),          \
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/srcutiny.h:36:20: note: in expansion of macro '__SWAIT_QUEUE_HEAD_INITIALIZER'
      36 |         .srcu_wq = __SWAIT_QUEUE_HEAD_INITIALIZER(name.srcu_wq),        \
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/srcutiny.h:49:42: note: in expansion of macro '__SRCU_STRUCT_INIT'
      49 |         static struct srcu_struct name = __SRCU_STRUCT_INIT(name, name, name)
         |                                          ^~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:98:1: note: in expansion of macro 'DEFINE_STATIC_SRCU'
      98 | DEFINE_STATIC_SRCU(console_srcu);
         | ^~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/swait.h:62:27: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      62 |         .lock           = __RAW_SPIN_LOCK_UNLOCKED(name.lock),          \
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/srcutiny.h:36:20: note: in expansion of macro '__SWAIT_QUEUE_HEAD_INITIALIZER'
      36 |         .srcu_wq = __SWAIT_QUEUE_HEAD_INITIALIZER(name.srcu_wq),        \
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/srcutiny.h:49:42: note: in expansion of macro '__SRCU_STRUCT_INIT'
      49 |         static struct srcu_struct name = __SRCU_STRUCT_INIT(name, name, name)
         |                                          ^~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:98:1: note: in expansion of macro 'DEFINE_STATIC_SRCU'
      98 | DEFINE_STATIC_SRCU(console_srcu);
         | ^~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: error: extra brace group at end of initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:81:32: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      81 |                 , .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:87:34: note: in expansion of macro '__MUTEX_INITIALIZER'
      87 |         struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
         |                                  ^~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:465:8: note: in expansion of macro 'DEFINE_MUTEX'
     465 | static DEFINE_MUTEX(syslog_lock);
         |        ^~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:81:32: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      81 |                 , .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:87:34: note: in expansion of macro '__MUTEX_INITIALIZER'
      87 |         struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
         |                                  ^~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:465:8: note: in expansion of macro 'DEFINE_MUTEX'
     465 | static DEFINE_MUTEX(syslog_lock);
         |        ^~~~~~~~~~~~
>> arch/loongarch/include/asm/atomic.h:32:27: error: extra brace group at end of initializer
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:81:32: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      81 |                 , .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:87:34: note: in expansion of macro '__MUTEX_INITIALIZER'
      87 |         struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
         |                                  ^~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:465:8: note: in expansion of macro 'DEFINE_MUTEX'
     465 | static DEFINE_MUTEX(syslog_lock);
         |        ^~~~~~~~~~~~
   arch/loongarch/include/asm/atomic.h:32:27: note: (near initialization for '(anonymous).raw_lock')
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:81:32: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      81 |                 , .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:87:34: note: in expansion of macro '__MUTEX_INITIALIZER'
      87 |         struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
         |                                  ^~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:465:8: note: in expansion of macro 'DEFINE_MUTEX'
     465 | static DEFINE_MUTEX(syslog_lock);
         |        ^~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: warning: excess elements in struct initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:81:32: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      81 |                 , .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:87:34: note: in expansion of macro '__MUTEX_INITIALIZER'
      87 |         struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
         |                                  ^~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:465:8: note: in expansion of macro 'DEFINE_MUTEX'
     465 | static DEFINE_MUTEX(syslog_lock);
         |        ^~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:81:32: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      81 |                 , .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:87:34: note: in expansion of macro '__MUTEX_INITIALIZER'
      87 |         struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
         |                                  ^~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:465:8: note: in expansion of macro 'DEFINE_MUTEX'
     465 | static DEFINE_MUTEX(syslog_lock);
         |        ^~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: error: extra brace group at end of initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:56:27: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      56 |         .lock           = __SPIN_LOCK_UNLOCKED(name.lock),                      \
         |                           ^~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:60:39: note: in expansion of macro '__WAIT_QUEUE_HEAD_INITIALIZER'
      60 |         struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
         |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:493:1: note: in expansion of macro 'DECLARE_WAIT_QUEUE_HEAD'
     493 | DECLARE_WAIT_QUEUE_HEAD(log_wait);
         | ^~~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).<anonymous>.rlock.raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:56:27: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      56 |         .lock           = __SPIN_LOCK_UNLOCKED(name.lock),                      \
         |                           ^~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:60:39: note: in expansion of macro '__WAIT_QUEUE_HEAD_INITIALIZER'
      60 |         struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
         |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:493:1: note: in expansion of macro 'DECLARE_WAIT_QUEUE_HEAD'
     493 | DECLARE_WAIT_QUEUE_HEAD(log_wait);
         | ^~~~~~~~~~~~~~~~~~~~~~~
>> arch/loongarch/include/asm/atomic.h:32:27: error: extra brace group at end of initializer
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:56:27: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      56 |         .lock           = __SPIN_LOCK_UNLOCKED(name.lock),                      \
         |                           ^~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:60:39: note: in expansion of macro '__WAIT_QUEUE_HEAD_INITIALIZER'
      60 |         struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
         |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:493:1: note: in expansion of macro 'DECLARE_WAIT_QUEUE_HEAD'
     493 | DECLARE_WAIT_QUEUE_HEAD(log_wait);
         | ^~~~~~~~~~~~~~~~~~~~~~~
   arch/loongarch/include/asm/atomic.h:32:27: note: (near initialization for '(anonymous).<anonymous>.rlock.raw_lock')
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:56:27: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      56 |         .lock           = __SPIN_LOCK_UNLOCKED(name.lock),                      \
         |                           ^~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:60:39: note: in expansion of macro '__WAIT_QUEUE_HEAD_INITIALIZER'
      60 |         struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
         |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:493:1: note: in expansion of macro 'DECLARE_WAIT_QUEUE_HEAD'
     493 | DECLARE_WAIT_QUEUE_HEAD(log_wait);
         | ^~~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: warning: excess elements in struct initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:56:27: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      56 |         .lock           = __SPIN_LOCK_UNLOCKED(name.lock),                      \
         |                           ^~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:60:39: note: in expansion of macro '__WAIT_QUEUE_HEAD_INITIALIZER'
      60 |         struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
         |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:493:1: note: in expansion of macro 'DECLARE_WAIT_QUEUE_HEAD'
     493 | DECLARE_WAIT_QUEUE_HEAD(log_wait);
         | ^~~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).<anonymous>.rlock.raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:56:27: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      56 |         .lock           = __SPIN_LOCK_UNLOCKED(name.lock),                      \
         |                           ^~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:60:39: note: in expansion of macro '__WAIT_QUEUE_HEAD_INITIALIZER'
      60 |         struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
         |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:493:1: note: in expansion of macro 'DECLARE_WAIT_QUEUE_HEAD'
     493 | DECLARE_WAIT_QUEUE_HEAD(log_wait);
         | ^~~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: error: extra brace group at end of initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:56:27: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      56 |         .lock           = __SPIN_LOCK_UNLOCKED(name.lock),                      \
         |                           ^~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:60:39: note: in expansion of macro '__WAIT_QUEUE_HEAD_INITIALIZER'
      60 |         struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
         |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:494:8: note: in expansion of macro 'DECLARE_WAIT_QUEUE_HEAD'
     494 | static DECLARE_WAIT_QUEUE_HEAD(legacy_wait);
         |        ^~~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).<anonymous>.rlock.raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:56:27: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      56 |         .lock           = __SPIN_LOCK_UNLOCKED(name.lock),                      \
         |                           ^~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:60:39: note: in expansion of macro '__WAIT_QUEUE_HEAD_INITIALIZER'
      60 |         struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
         |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:494:8: note: in expansion of macro 'DECLARE_WAIT_QUEUE_HEAD'
     494 | static DECLARE_WAIT_QUEUE_HEAD(legacy_wait);
         |        ^~~~~~~~~~~~~~~~~~~~~~~
>> arch/loongarch/include/asm/atomic.h:32:27: error: extra brace group at end of initializer
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:56:27: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      56 |         .lock           = __SPIN_LOCK_UNLOCKED(name.lock),                      \
         |                           ^~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:60:39: note: in expansion of macro '__WAIT_QUEUE_HEAD_INITIALIZER'
      60 |         struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
         |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:494:8: note: in expansion of macro 'DECLARE_WAIT_QUEUE_HEAD'
     494 | static DECLARE_WAIT_QUEUE_HEAD(legacy_wait);
         |        ^~~~~~~~~~~~~~~~~~~~~~~
   arch/loongarch/include/asm/atomic.h:32:27: note: (near initialization for '(anonymous).<anonymous>.rlock.raw_lock')
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:56:27: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      56 |         .lock           = __SPIN_LOCK_UNLOCKED(name.lock),                      \
         |                           ^~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:60:39: note: in expansion of macro '__WAIT_QUEUE_HEAD_INITIALIZER'
      60 |         struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
         |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:494:8: note: in expansion of macro 'DECLARE_WAIT_QUEUE_HEAD'
     494 | static DECLARE_WAIT_QUEUE_HEAD(legacy_wait);
         |        ^~~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: warning: excess elements in struct initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:56:27: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      56 |         .lock           = __SPIN_LOCK_UNLOCKED(name.lock),                      \
         |                           ^~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:60:39: note: in expansion of macro '__WAIT_QUEUE_HEAD_INITIALIZER'
      60 |         struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
         |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:494:8: note: in expansion of macro 'DECLARE_WAIT_QUEUE_HEAD'
     494 | static DECLARE_WAIT_QUEUE_HEAD(legacy_wait);
         |        ^~~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).<anonymous>.rlock.raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:56:27: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      56 |         .lock           = __SPIN_LOCK_UNLOCKED(name.lock),                      \
         |                           ^~~~~~~~~~~~~~~~~~~~
   include/linux/wait.h:60:39: note: in expansion of macro '__WAIT_QUEUE_HEAD_INITIALIZER'
      60 |         struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
         |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:494:8: note: in expansion of macro 'DECLARE_WAIT_QUEUE_HEAD'
     494 | static DECLARE_WAIT_QUEUE_HEAD(legacy_wait);
         |        ^~~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: error: extra brace group at end of initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:71:52: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      71 | #define DEFINE_RAW_SPINLOCK(x)  raw_spinlock_t x = __RAW_SPIN_LOCK_UNLOCKED(x)
         |                                                    ^~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:1891:8: note: in expansion of macro 'DEFINE_RAW_SPINLOCK'
    1891 | static DEFINE_RAW_SPINLOCK(console_owner_lock);
         |        ^~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:71:52: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      71 | #define DEFINE_RAW_SPINLOCK(x)  raw_spinlock_t x = __RAW_SPIN_LOCK_UNLOCKED(x)
         |                                                    ^~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:1891:8: note: in expansion of macro 'DEFINE_RAW_SPINLOCK'
    1891 | static DEFINE_RAW_SPINLOCK(console_owner_lock);
         |        ^~~~~~~~~~~~~~~~~~~
>> arch/loongarch/include/asm/atomic.h:32:27: error: extra brace group at end of initializer
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:71:52: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      71 | #define DEFINE_RAW_SPINLOCK(x)  raw_spinlock_t x = __RAW_SPIN_LOCK_UNLOCKED(x)
         |                                                    ^~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:1891:8: note: in expansion of macro 'DEFINE_RAW_SPINLOCK'
    1891 | static DEFINE_RAW_SPINLOCK(console_owner_lock);
         |        ^~~~~~~~~~~~~~~~~~~
   arch/loongarch/include/asm/atomic.h:32:27: note: (near initialization for '(anonymous).raw_lock')
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:71:52: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      71 | #define DEFINE_RAW_SPINLOCK(x)  raw_spinlock_t x = __RAW_SPIN_LOCK_UNLOCKED(x)
         |                                                    ^~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:1891:8: note: in expansion of macro 'DEFINE_RAW_SPINLOCK'
    1891 | static DEFINE_RAW_SPINLOCK(console_owner_lock);
         |        ^~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: warning: excess elements in struct initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:71:52: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      71 | #define DEFINE_RAW_SPINLOCK(x)  raw_spinlock_t x = __RAW_SPIN_LOCK_UNLOCKED(x)
         |                                                    ^~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:1891:8: note: in expansion of macro 'DEFINE_RAW_SPINLOCK'
    1891 | static DEFINE_RAW_SPINLOCK(console_owner_lock);
         |        ^~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:71:52: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      71 | #define DEFINE_RAW_SPINLOCK(x)  raw_spinlock_t x = __RAW_SPIN_LOCK_UNLOCKED(x)
         |                                                    ^~~~~~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:1891:8: note: in expansion of macro 'DEFINE_RAW_SPINLOCK'
    1891 | static DEFINE_RAW_SPINLOCK(console_owner_lock);
         |        ^~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: error: extra brace group at end of initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:27:35: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      27 |                 .lock           = __RAW_SPIN_LOCK_UNLOCKED(name.lock),            \
         |                                   ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:34:9: note: in expansion of macro 'RATELIMIT_STATE_INIT_FLAGS'
      34 |         RATELIMIT_STATE_INIT_FLAGS(name, interval_init, burst_init, 0)
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:42:17: note: in expansion of macro 'RATELIMIT_STATE_INIT'
      42 |                 RATELIMIT_STATE_INIT(name, interval_init, burst_init)   \
         |                 ^~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:4592:1: note: in expansion of macro 'DEFINE_RATELIMIT_STATE'
    4592 | DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10);
         | ^~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:27:35: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      27 |                 .lock           = __RAW_SPIN_LOCK_UNLOCKED(name.lock),            \
         |                                   ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:34:9: note: in expansion of macro 'RATELIMIT_STATE_INIT_FLAGS'
      34 |         RATELIMIT_STATE_INIT_FLAGS(name, interval_init, burst_init, 0)
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:42:17: note: in expansion of macro 'RATELIMIT_STATE_INIT'
      42 |                 RATELIMIT_STATE_INIT(name, interval_init, burst_init)   \
         |                 ^~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:4592:1: note: in expansion of macro 'DEFINE_RATELIMIT_STATE'
    4592 | DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10);
         | ^~~~~~~~~~~~~~~~~~~~~~
>> arch/loongarch/include/asm/atomic.h:32:27: error: extra brace group at end of initializer
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:27:35: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      27 |                 .lock           = __RAW_SPIN_LOCK_UNLOCKED(name.lock),            \
         |                                   ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:34:9: note: in expansion of macro 'RATELIMIT_STATE_INIT_FLAGS'
      34 |         RATELIMIT_STATE_INIT_FLAGS(name, interval_init, burst_init, 0)
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:42:17: note: in expansion of macro 'RATELIMIT_STATE_INIT'
      42 |                 RATELIMIT_STATE_INIT(name, interval_init, burst_init)   \
         |                 ^~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:4592:1: note: in expansion of macro 'DEFINE_RATELIMIT_STATE'
    4592 | DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10);
         | ^~~~~~~~~~~~~~~~~~~~~~
   arch/loongarch/include/asm/atomic.h:32:27: note: (near initialization for '(anonymous).raw_lock')
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:27:35: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      27 |                 .lock           = __RAW_SPIN_LOCK_UNLOCKED(name.lock),            \
         |                                   ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:34:9: note: in expansion of macro 'RATELIMIT_STATE_INIT_FLAGS'
      34 |         RATELIMIT_STATE_INIT_FLAGS(name, interval_init, burst_init, 0)
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:42:17: note: in expansion of macro 'RATELIMIT_STATE_INIT'
      42 |                 RATELIMIT_STATE_INIT(name, interval_init, burst_init)   \
         |                 ^~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:4592:1: note: in expansion of macro 'DEFINE_RATELIMIT_STATE'
    4592 | DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10);
         | ^~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: warning: excess elements in struct initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:27:35: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      27 |                 .lock           = __RAW_SPIN_LOCK_UNLOCKED(name.lock),            \
         |                                   ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:34:9: note: in expansion of macro 'RATELIMIT_STATE_INIT_FLAGS'
      34 |         RATELIMIT_STATE_INIT_FLAGS(name, interval_init, burst_init, 0)
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:42:17: note: in expansion of macro 'RATELIMIT_STATE_INIT'
      42 |                 RATELIMIT_STATE_INIT(name, interval_init, burst_init)   \
         |                 ^~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:4592:1: note: in expansion of macro 'DEFINE_RATELIMIT_STATE'
    4592 | DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10);
         | ^~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types_raw.h:64:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:26: note: in expansion of macro '__RAW_SPIN_LOCK_INITIALIZER'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:27:35: note: in expansion of macro '__RAW_SPIN_LOCK_UNLOCKED'
      27 |                 .lock           = __RAW_SPIN_LOCK_UNLOCKED(name.lock),            \
         |                                   ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:34:9: note: in expansion of macro 'RATELIMIT_STATE_INIT_FLAGS'
      34 |         RATELIMIT_STATE_INIT_FLAGS(name, interval_init, burst_init, 0)
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/ratelimit_types.h:42:17: note: in expansion of macro 'RATELIMIT_STATE_INIT'
      42 |                 RATELIMIT_STATE_INIT(name, interval_init, burst_init)   \
         |                 ^~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:4592:1: note: in expansion of macro 'DEFINE_RATELIMIT_STATE'
    4592 | DEFINE_RATELIMIT_STATE(printk_ratelimit_state, 5 * HZ, 10);
         | ^~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: error: extra brace group at end of initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:43:48: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      43 | #define DEFINE_SPINLOCK(x)      spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
         |                                                ^~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:4622:8: note: in expansion of macro 'DEFINE_SPINLOCK'
    4622 | static DEFINE_SPINLOCK(dump_list_lock);
         |        ^~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).<anonymous>.rlock.raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:43:48: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      43 | #define DEFINE_SPINLOCK(x)      spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
         |                                                ^~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:4622:8: note: in expansion of macro 'DEFINE_SPINLOCK'
    4622 | static DEFINE_SPINLOCK(dump_list_lock);
         |        ^~~~~~~~~~~~~~~
>> arch/loongarch/include/asm/atomic.h:32:27: error: extra brace group at end of initializer
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:43:48: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      43 | #define DEFINE_SPINLOCK(x)      spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
         |                                                ^~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:4622:8: note: in expansion of macro 'DEFINE_SPINLOCK'
    4622 | static DEFINE_SPINLOCK(dump_list_lock);
         |        ^~~~~~~~~~~~~~~
   arch/loongarch/include/asm/atomic.h:32:27: note: (near initialization for '(anonymous).<anonymous>.rlock.raw_lock')
      32 | #define ATOMIC_INIT(i)    { (i) }
         |                           ^
   include/asm-generic/qspinlock_types.h:49:52: note: in expansion of macro 'ATOMIC_INIT'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                                    ^~~~~~~~~~~
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:43:48: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      43 | #define DEFINE_SPINLOCK(x)      spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
         |                                                ^~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:4622:8: note: in expansion of macro 'DEFINE_SPINLOCK'
    4622 | static DEFINE_SPINLOCK(dump_list_lock);
         |        ^~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: warning: excess elements in struct initializer
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:43:48: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      43 | #define DEFINE_SPINLOCK(x)      spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
         |                                                ^~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:4622:8: note: in expansion of macro 'DEFINE_SPINLOCK'
    4622 | static DEFINE_SPINLOCK(dump_list_lock);
         |        ^~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:43: note: (near initialization for '(anonymous).<anonymous>.rlock.raw_lock')
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^
   include/linux/spinlock_types.h:33:21: note: in expansion of macro '__ARCH_SPIN_LOCK_UNLOCKED'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:22: note: in expansion of macro '___SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:22: note: in expansion of macro '__SPIN_LOCK_INITIALIZER'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:43:48: note: in expansion of macro '__SPIN_LOCK_UNLOCKED'
      43 | #define DEFINE_SPINLOCK(x)      spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
         |                                                ^~~~~~~~~~~~~~~~~~~~
   kernel/printk/printk.c:4622:8: note: in expansion of macro 'DEFINE_SPINLOCK'
    4622 | static DEFINE_SPINLOCK(dump_list_lock);
         |        ^~~~~~~~~~~~~~~


vim +32 arch/loongarch/include/asm/atomic.h

5b0b14e550a006 Huacai Chen 2022-05-31  31  
5b0b14e550a006 Huacai Chen 2022-05-31 @32  #define ATOMIC_INIT(i)	  { (i) }
5b0b14e550a006 Huacai Chen 2022-05-31  33  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 20/22] bpf: Introduce rqspinlock kfuncs
  2025-01-07 14:00 ` [PATCH bpf-next v1 20/22] bpf: Introduce rqspinlock kfuncs Kumar Kartikeya Dwivedi
  2025-01-08 10:23   ` kernel test robot
  2025-01-08 10:23   ` kernel test robot
@ 2025-01-08 10:44   ` kernel test robot
  2 siblings, 0 replies; 63+ messages in thread
From: kernel test robot @ 2025-01-08 10:44 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf, linux-kernel
  Cc: llvm, oe-kbuild-all, Peter Zijlstra, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team

Hi Kumar,

kernel test robot noticed the following build errors:

[auto build test ERROR on f44275e7155dc310d36516fc25be503da099781c]

url:    https://github.com/intel-lab-lkp/linux/commits/Kumar-Kartikeya-Dwivedi/locking-Move-MCS-struct-definition-to-public-header/20250107-220615
base:   f44275e7155dc310d36516fc25be503da099781c
patch link:    https://lore.kernel.org/r/20250107140004.2732830-21-memxor%40gmail.com
patch subject: [PATCH bpf-next v1 20/22] bpf: Introduce rqspinlock kfuncs
config: um-allnoconfig (https://download.01.org/0day-ci/archive/20250108/202501081854.xzCcM6nm-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250108/202501081854.xzCcM6nm-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501081854.xzCcM6nm-lkp@intel.com/

All errors/warnings (new ones prefixed by >>):

   In file included from fs/kernfs/mount.c:22:
   In file included from fs/kernfs/kernfs-internal.h:20:
   In file included from include/linux/fs_context.h:14:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:7:
>> include/asm-generic/qspinlock_types.h:44:3: error: typedef redefinition with different types ('struct qspinlock' vs 'struct arch_spinlock_t')
      44 | } arch_spinlock_t;
         |   ^
   include/linux/spinlock_types_up.h:25:20: note: previous definition is here
      25 | typedef struct { } arch_spinlock_t;
         |                    ^
   In file included from fs/kernfs/mount.c:22:
   In file included from fs/kernfs/kernfs-internal.h:20:
   In file included from include/linux/fs_context.h:14:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:7:
>> include/asm-generic/qspinlock_types.h:49:9: warning: '__ARCH_SPIN_LOCK_UNLOCKED' macro redefined [-Wmacro-redefined]
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |         ^
   include/linux/spinlock_types_up.h:27:9: note: previous definition is here
      27 | #define __ARCH_SPIN_LOCK_UNLOCKED { }
         |         ^
   In file included from fs/kernfs/mount.c:22:
   In file included from fs/kernfs/kernfs-internal.h:20:
   In file included from include/linux/fs_context.h:14:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:114:
>> include/asm-generic/qspinlock.h:144:9: warning: 'arch_spin_is_locked' macro redefined [-Wmacro-redefined]
     144 | #define arch_spin_is_locked(l)          queued_spin_is_locked(l)
         |         ^
   include/linux/spinlock_up.h:62:9: note: previous definition is here
      62 | #define arch_spin_is_locked(lock)       ((void)(lock), 0)
         |         ^
   In file included from fs/kernfs/mount.c:22:
   In file included from fs/kernfs/kernfs-internal.h:20:
   In file included from include/linux/fs_context.h:14:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:114:
>> include/asm-generic/qspinlock.h:145:9: warning: 'arch_spin_is_contended' macro redefined [-Wmacro-redefined]
     145 | #define arch_spin_is_contended(l)       queued_spin_is_contended(l)
         |         ^
   include/linux/spinlock_up.h:69:9: note: previous definition is here
      69 | #define arch_spin_is_contended(lock)    (((void)(lock), 0))
         |         ^
   In file included from fs/kernfs/mount.c:22:
   In file included from fs/kernfs/kernfs-internal.h:20:
   In file included from include/linux/fs_context.h:14:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:114:
>> include/asm-generic/qspinlock.h:147:9: warning: 'arch_spin_lock' macro redefined [-Wmacro-redefined]
     147 | #define arch_spin_lock(l)               queued_spin_lock(l)
         |         ^
   include/linux/spinlock_up.h:64:10: note: previous definition is here
      64 | # define arch_spin_lock(lock)           do { barrier(); (void)(lock); } while (0)
         |          ^
   In file included from fs/kernfs/mount.c:22:
   In file included from fs/kernfs/kernfs-internal.h:20:
   In file included from include/linux/fs_context.h:14:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:114:
>> include/asm-generic/qspinlock.h:148:9: warning: 'arch_spin_trylock' macro redefined [-Wmacro-redefined]
     148 | #define arch_spin_trylock(l)            queued_spin_trylock(l)
         |         ^
   include/linux/spinlock_up.h:66:10: note: previous definition is here
      66 | # define arch_spin_trylock(lock)        ({ barrier(); (void)(lock); 1; })
         |          ^
   In file included from fs/kernfs/mount.c:22:
   In file included from fs/kernfs/kernfs-internal.h:20:
   In file included from include/linux/fs_context.h:14:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:114:
>> include/asm-generic/qspinlock.h:149:9: warning: 'arch_spin_unlock' macro redefined [-Wmacro-redefined]
     149 | #define arch_spin_unlock(l)             queued_spin_unlock(l)
         |         ^
   include/linux/spinlock_up.h:65:10: note: previous definition is here
      65 | # define arch_spin_unlock(lock) do { barrier(); (void)(lock); } while (0)
         |          ^
   6 warnings and 1 error generated.
--
   In file included from fs/kernfs/inode.c:16:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:7:
>> include/asm-generic/qspinlock_types.h:44:3: error: typedef redefinition with different types ('struct qspinlock' vs 'struct arch_spinlock_t')
      44 | } arch_spinlock_t;
         |   ^
   include/linux/spinlock_types_up.h:25:20: note: previous definition is here
      25 | typedef struct { } arch_spinlock_t;
         |                    ^
   In file included from fs/kernfs/inode.c:16:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:7:
>> include/asm-generic/qspinlock_types.h:49:9: warning: '__ARCH_SPIN_LOCK_UNLOCKED' macro redefined [-Wmacro-redefined]
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |         ^
   include/linux/spinlock_types_up.h:27:9: note: previous definition is here
      27 | #define __ARCH_SPIN_LOCK_UNLOCKED { }
         |         ^
   In file included from fs/kernfs/inode.c:16:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:114:
>> include/asm-generic/qspinlock.h:144:9: warning: 'arch_spin_is_locked' macro redefined [-Wmacro-redefined]
     144 | #define arch_spin_is_locked(l)          queued_spin_is_locked(l)
         |         ^
   include/linux/spinlock_up.h:62:9: note: previous definition is here
      62 | #define arch_spin_is_locked(lock)       ((void)(lock), 0)
         |         ^
   In file included from fs/kernfs/inode.c:16:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:114:
>> include/asm-generic/qspinlock.h:145:9: warning: 'arch_spin_is_contended' macro redefined [-Wmacro-redefined]
     145 | #define arch_spin_is_contended(l)       queued_spin_is_contended(l)
         |         ^
   include/linux/spinlock_up.h:69:9: note: previous definition is here
      69 | #define arch_spin_is_contended(lock)    (((void)(lock), 0))
         |         ^
   In file included from fs/kernfs/inode.c:16:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:114:
>> include/asm-generic/qspinlock.h:147:9: warning: 'arch_spin_lock' macro redefined [-Wmacro-redefined]
     147 | #define arch_spin_lock(l)               queued_spin_lock(l)
         |         ^
   include/linux/spinlock_up.h:64:10: note: previous definition is here
      64 | # define arch_spin_lock(lock)           do { barrier(); (void)(lock); } while (0)
         |          ^
   In file included from fs/kernfs/inode.c:16:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:114:
>> include/asm-generic/qspinlock.h:148:9: warning: 'arch_spin_trylock' macro redefined [-Wmacro-redefined]
     148 | #define arch_spin_trylock(l)            queued_spin_trylock(l)
         |         ^
   include/linux/spinlock_up.h:66:10: note: previous definition is here
      66 | # define arch_spin_trylock(lock)        ({ barrier(); (void)(lock); 1; })
         |          ^
   In file included from fs/kernfs/inode.c:16:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:114:
>> include/asm-generic/qspinlock.h:149:9: warning: 'arch_spin_unlock' macro redefined [-Wmacro-redefined]
     149 | #define arch_spin_unlock(l)             queued_spin_unlock(l)
         |         ^
   include/linux/spinlock_up.h:65:10: note: previous definition is here
      65 | # define arch_spin_unlock(lock) do { barrier(); (void)(lock); } while (0)
         |          ^
   fs/kernfs/inode.c:29:9: warning: excess elements in struct initializer [-Wexcess-initializers]
      29 |         static DEFINE_MUTEX(iattr_mutex);
         |                ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:87:27: note: expanded from macro 'DEFINE_MUTEX'
      87 |         struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
         |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/mutex.h:81:18: note: expanded from macro '__MUTEX_INITIALIZER'
      81 |                 , .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
         |                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:69:19: note: expanded from macro '__RAW_SPIN_LOCK_UNLOCKED'
      69 |         (raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
         |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types_raw.h:64:14: note: expanded from macro '__RAW_SPIN_LOCK_INITIALIZER'
      64 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:37: note: expanded from macro '__ARCH_SPIN_LOCK_UNLOCKED'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^~~~~~~~~~~~~~~~~~~~~~~~~
   7 warnings and 1 error generated.
--
   In file included from fs/kernfs/dir.c:15:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:7:
>> include/asm-generic/qspinlock_types.h:44:3: error: typedef redefinition with different types ('struct qspinlock' vs 'struct arch_spinlock_t')
      44 | } arch_spinlock_t;
         |   ^
   include/linux/spinlock_types_up.h:25:20: note: previous definition is here
      25 | typedef struct { } arch_spinlock_t;
         |                    ^
   In file included from fs/kernfs/dir.c:15:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:7:
>> include/asm-generic/qspinlock_types.h:49:9: warning: '__ARCH_SPIN_LOCK_UNLOCKED' macro redefined [-Wmacro-redefined]
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |         ^
   include/linux/spinlock_types_up.h:27:9: note: previous definition is here
      27 | #define __ARCH_SPIN_LOCK_UNLOCKED { }
         |         ^
   In file included from fs/kernfs/dir.c:15:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:114:
>> include/asm-generic/qspinlock.h:144:9: warning: 'arch_spin_is_locked' macro redefined [-Wmacro-redefined]
     144 | #define arch_spin_is_locked(l)          queued_spin_is_locked(l)
         |         ^
   include/linux/spinlock_up.h:62:9: note: previous definition is here
      62 | #define arch_spin_is_locked(lock)       ((void)(lock), 0)
         |         ^
   In file included from fs/kernfs/dir.c:15:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:114:
>> include/asm-generic/qspinlock.h:145:9: warning: 'arch_spin_is_contended' macro redefined [-Wmacro-redefined]
     145 | #define arch_spin_is_contended(l)       queued_spin_is_contended(l)
         |         ^
   include/linux/spinlock_up.h:69:9: note: previous definition is here
      69 | #define arch_spin_is_contended(lock)    (((void)(lock), 0))
         |         ^
   In file included from fs/kernfs/dir.c:15:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:114:
>> include/asm-generic/qspinlock.h:147:9: warning: 'arch_spin_lock' macro redefined [-Wmacro-redefined]
     147 | #define arch_spin_lock(l)               queued_spin_lock(l)
         |         ^
   include/linux/spinlock_up.h:64:10: note: previous definition is here
      64 | # define arch_spin_lock(lock)           do { barrier(); (void)(lock); } while (0)
         |          ^
   In file included from fs/kernfs/dir.c:15:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:114:
>> include/asm-generic/qspinlock.h:148:9: warning: 'arch_spin_trylock' macro redefined [-Wmacro-redefined]
     148 | #define arch_spin_trylock(l)            queued_spin_trylock(l)
         |         ^
   include/linux/spinlock_up.h:66:10: note: previous definition is here
      66 | # define arch_spin_trylock(lock)        ({ barrier(); (void)(lock); 1; })
         |          ^
   In file included from fs/kernfs/dir.c:15:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:33:
   In file included from arch/x86/include/asm/rqspinlock.h:18:
   In file included from include/asm-generic/rqspinlock.h:15:
   In file included from arch/x86/include/asm/qspinlock.h:114:
>> include/asm-generic/qspinlock.h:149:9: warning: 'arch_spin_unlock' macro redefined [-Wmacro-redefined]
     149 | #define arch_spin_unlock(l)             queued_spin_unlock(l)
         |         ^
   include/linux/spinlock_up.h:65:10: note: previous definition is here
      65 | # define arch_spin_unlock(lock) do { barrier(); (void)(lock); } while (0)
         |          ^
   fs/kernfs/dir.c:28:8: warning: excess elements in struct initializer [-Wexcess-initializers]
      28 | static DEFINE_SPINLOCK(kernfs_pr_cont_lock);
         |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:43:43: note: expanded from macro 'DEFINE_SPINLOCK'
      43 | #define DEFINE_SPINLOCK(x)      spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
         |                                                ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:15: note: expanded from macro '__SPIN_LOCK_UNLOCKED'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:15: note: expanded from macro '__SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:33:14: note: expanded from macro '___SPIN_LOCK_INITIALIZER'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:37: note: expanded from macro '__ARCH_SPIN_LOCK_UNLOCKED'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^~~~~~~~~~~~~~~~~~~~~~~~~
   fs/kernfs/dir.c:30:8: warning: excess elements in struct initializer [-Wexcess-initializers]
      30 | static DEFINE_SPINLOCK(kernfs_idr_lock);        /* root->ino_idr */
         |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:43:43: note: expanded from macro 'DEFINE_SPINLOCK'
      43 | #define DEFINE_SPINLOCK(x)      spinlock_t x = __SPIN_LOCK_UNLOCKED(x)
         |                                                ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:41:15: note: expanded from macro '__SPIN_LOCK_UNLOCKED'
      41 |         (spinlock_t) __SPIN_LOCK_INITIALIZER(lockname)
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:38:15: note: expanded from macro '__SPIN_LOCK_INITIALIZER'
      38 |         { { .rlock = ___SPIN_LOCK_INITIALIZER(lockname) } }
         |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/spinlock_types.h:33:14: note: expanded from macro '___SPIN_LOCK_INITIALIZER'
      33 |         .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
         |                     ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/qspinlock_types.h:49:37: note: expanded from macro '__ARCH_SPIN_LOCK_UNLOCKED'
      49 | #define __ARCH_SPIN_LOCK_UNLOCKED       { { .val = ATOMIC_INIT(0) } }
         |                                           ^~~~~~~~~~~~~~~~~~~~~~~~~
   8 warnings and 1 error generated.
..


vim +/arch_spin_is_locked +144 include/asm-generic/qspinlock.h

2aa79af6426319 Peter Zijlstra (Intel  2015-04-24  138) 
ab83647fadae2f Alexandre Ghiti        2024-11-03  139  #ifndef __no_arch_spinlock_redefine
a33fda35e3a765 Waiman Long            2015-04-24  140  /*
a33fda35e3a765 Waiman Long            2015-04-24  141   * Remapping spinlock architecture specific functions to the corresponding
a33fda35e3a765 Waiman Long            2015-04-24  142   * queued spinlock functions.
a33fda35e3a765 Waiman Long            2015-04-24  143   */
a33fda35e3a765 Waiman Long            2015-04-24 @144  #define arch_spin_is_locked(l)		queued_spin_is_locked(l)
a33fda35e3a765 Waiman Long            2015-04-24 @145  #define arch_spin_is_contended(l)	queued_spin_is_contended(l)
a33fda35e3a765 Waiman Long            2015-04-24  146  #define arch_spin_value_unlocked(l)	queued_spin_value_unlocked(l)
a33fda35e3a765 Waiman Long            2015-04-24 @147  #define arch_spin_lock(l)		queued_spin_lock(l)
a33fda35e3a765 Waiman Long            2015-04-24 @148  #define arch_spin_trylock(l)		queued_spin_trylock(l)
a33fda35e3a765 Waiman Long            2015-04-24 @149  #define arch_spin_unlock(l)		queued_spin_unlock(l)
ab83647fadae2f Alexandre Ghiti        2024-11-03  150  #endif
a33fda35e3a765 Waiman Long            2015-04-24  151  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 11/22] rqspinlock: Add deadlock detection and recovery
  2025-01-07 13:59 ` [PATCH bpf-next v1 11/22] rqspinlock: Add deadlock detection and recovery Kumar Kartikeya Dwivedi
@ 2025-01-08 16:06   ` Waiman Long
  2025-01-08 20:19     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 63+ messages in thread
From: Waiman Long @ 2025-01-08 16:06 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team


On 1/7/25 8:59 AM, Kumar Kartikeya Dwivedi wrote:
> While the timeout logic provides guarantees for the waiter's forward
> progress, the time until a stalling waiter unblocks can still be long.
> The default timeout of 1/2 sec can be excessively long for some use
> cases.  Additionally, custom timeouts may exacerbate recovery time.
>
> Introduce logic to detect common cases of deadlocks and perform quicker
> recovery. This is done by dividing the time from entry into the locking
> slow path until the timeout into intervals of 1 ms. Then, after each
> interval elapses, deadlock detection is performed, while also polling
> the lock word to ensure we can quickly break out of the detection logic
> and proceed with lock acquisition.
>
> A 'held_locks' table is maintained per-CPU where the entry at the bottom
> denotes a lock being waited for or already taken. Entries coming before
> it denote locks that are already held. The current CPU's table can thus
> be looked at to detect AA deadlocks. The tables from other CPUs can be
> looked at to discover ABBA situations. Finally, when a matching entry
> for the lock being taken on the current CPU is found on some other CPU,
> a deadlock situation is detected. This function can take a long time,
> therefore the lock word is constantly polled in each loop iteration to
> ensure we can preempt detection and proceed with lock acquisition, using
> the is_lock_released check.
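
The table scheme described above can be modeled in plain C. The sketch
below is an illustrative userspace rendering (the names, the non-per-CPU
framing, and the helper functions are mine; only RES_NR_HELD and the
bottom-entry convention come from the patch), showing why the AA scan
stops one short of the bottom entry:

```c
#include <errno.h>
#include <stddef.h>

#define RES_NR_HELD 32

/* Userspace model of the per-CPU held-locks table described above. */
struct held_table {
	int cnt;
	void *locks[RES_NR_HELD];	/* bottom entry = lock being acquired */
};

/* Record a lock being waited for (or taken) in the bottom slot. */
static void grab_entry(struct held_table *t, void *lock)
{
	if (++t->cnt > RES_NR_HELD)
		return;	/* table full; keep the count so release balances */
	t->locks[t->cnt - 1] = lock;
}

static void release_entry(struct held_table *t)
{
	if (t->cnt <= RES_NR_HELD)
		t->locks[t->cnt - 1] = NULL;
	t->cnt--;
}

/*
 * AA check: every entry before the bottom one is already held, so
 * finding the lock we are acquiring among them means we are waiting
 * on ourselves.
 */
static int check_aa(struct held_table *t)
{
	int cnt = t->cnt < RES_NR_HELD ? t->cnt : RES_NR_HELD;

	for (int i = 0; i < cnt - 1; i++) {
		if (t->locks[i] == t->locks[cnt - 1])
			return -EDEADLK;
	}
	return 0;
}
```

In the kernel patch the table is per-CPU and the scan additionally
polls the lock word via is_lock_released; this model only captures the
table walk itself.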
>
> We set the 'spin' member of the rqspinlock_timeout struct to 0 to trigger
> deadlock checks immediately, enabling faster recovery.
>
> Note: Extending lock word size by 4 bytes to record owner CPU can allow
> faster detection for ABBA. It is typically the owner which participates
> in an ABBA situation. However, to keep compatibility with existing lock
> words in the kernel (struct qspinlock), and given deadlocks are a rare
> event triggered by bugs, we choose to favor compatibility over faster
> detection.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>   include/asm-generic/rqspinlock.h |  56 +++++++++-
>   kernel/locking/rqspinlock.c      | 178 ++++++++++++++++++++++++++++---
>   2 files changed, 220 insertions(+), 14 deletions(-)
>
> diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h
> index 5c996a82e75f..c7e33ccc57a6 100644
> --- a/include/asm-generic/rqspinlock.h
> +++ b/include/asm-generic/rqspinlock.h
> @@ -11,14 +11,68 @@
>   
>   #include <linux/types.h>
>   #include <vdso/time64.h>
> +#include <linux/percpu.h>
>   
>   struct qspinlock;
>   
> +extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
> +
>   /*
>    * Default timeout for waiting loops is 0.5 seconds
>    */
>   #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2)
>   
> -extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
> +#define RES_NR_HELD 32
> +
> +struct rqspinlock_held {
> +	int cnt;
> +	void *locks[RES_NR_HELD];
> +};
> +
> +DECLARE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks);
> +
> +static __always_inline void grab_held_lock_entry(void *lock)
> +{
> +	int cnt = this_cpu_inc_return(rqspinlock_held_locks.cnt);
> +
> +	if (unlikely(cnt > RES_NR_HELD)) {
> +		/* Still keep the inc so we decrement later. */
> +		return;
> +	}
> +
> +	/*
> +	 * Implied compiler barrier in per-CPU operations; otherwise we can have
> +	 * the compiler reorder inc with write to table, allowing interrupts to
> +	 * overwrite and erase our write to the table (as on interrupt exit it
> +	 * will be reset to NULL).
> +	 */
> +	this_cpu_write(rqspinlock_held_locks.locks[cnt - 1], lock);
> +}
> +
> +/*
> + * It is possible to run into misdetection scenarios of AA deadlocks on the same
> + * CPU, and missed ABBA deadlocks on remote CPUs when this function pops entries
> + * out of order (due to lock A, lock B, unlock A, unlock B) pattern. The correct
> + * logic to preserve right entries in the table would be to walk the array of
> + * held locks and swap and clear out-of-order entries, but that's too
> + * complicated and we don't have a compelling use case for out of order unlocking.
Maybe we can pass in the lock and print a warning if out-of-order unlock 
is being done.
> + *
> + * Therefore, we simply don't support such cases and keep the logic simple here.
> + */
> +static __always_inline void release_held_lock_entry(void)
> +{
> +	struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
> +
> +	if (unlikely(rqh->cnt > RES_NR_HELD))
> +		goto dec;
> +	smp_store_release(&rqh->locks[rqh->cnt - 1], NULL);
> +	/*
> +	 * Overwrite of NULL should appear before our decrement of the count to
> +	 * other CPUs, otherwise we have the issue of a stale non-NULL entry being
> +	 * visible in the array, leading to misdetection during deadlock detection.
> +	 */
> +dec:
> +	this_cpu_dec(rqspinlock_held_locks.cnt);
AFAIU, smp_store_release() only guarantees memory ordering before it, 
not after. That shouldn't be a problem if the decrement is observed 
before clearing the entry as that non-NULL entry won't be checked anyway.
> +}
>   
>   #endif /* __ASM_GENERIC_RQSPINLOCK_H */
> diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
> index b63f92bd43b1..b7c86127d288 100644
> --- a/kernel/locking/rqspinlock.c
> +++ b/kernel/locking/rqspinlock.c
> @@ -30,6 +30,7 @@
>    * Include queued spinlock definitions and statistics code
>    */
>   #include "qspinlock.h"
> +#include "rqspinlock.h"
>   #include "qspinlock_stat.h"
>   
>   /*
> @@ -74,16 +75,141 @@
>   struct rqspinlock_timeout {
>   	u64 timeout_end;
>   	u64 duration;
> +	u64 cur;
>   	u16 spin;
>   };
>   
>   #define RES_TIMEOUT_VAL	2
>   
> -static noinline int check_timeout(struct rqspinlock_timeout *ts)
> +DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks);
> +
> +static bool is_lock_released(struct qspinlock *lock, u32 mask, struct rqspinlock_timeout *ts)
> +{
> +	if (!(atomic_read_acquire(&lock->val) & (mask)))
> +		return true;
> +	return false;
> +}
> +
> +static noinline int check_deadlock_AA(struct qspinlock *lock, u32 mask,
> +				      struct rqspinlock_timeout *ts)
> +{
> +	struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
> +	int cnt = min(RES_NR_HELD, rqh->cnt);
> +
> +	/*
> +	 * Return an error if we hold the lock we are attempting to acquire.
> +	 * We'll iterate over max 32 locks; no need to do is_lock_released.
> +	 */
> +	for (int i = 0; i < cnt - 1; i++) {
> +		if (rqh->locks[i] == lock)
> +			return -EDEADLK;
> +	}
> +	return 0;
> +}
> +
> +static noinline int check_deadlock_ABBA(struct qspinlock *lock, u32 mask,
> +					struct rqspinlock_timeout *ts)
> +{

I think you should note that the ABBA check here is not exhaustive. It 
is just the most common case and there are corner cases that will be missed.
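
To make that concrete, here is a hedged two-CPU userspace sketch of the
common-case ABBA scan (all names and the fixed-array framing are
illustrative, not the patch's code): it only catches the direct pattern
where a remote CPU holds the lock we want while waiting on a lock we
hold, and misses longer cycles spanning three or more CPUs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 2
#define RES_NR_HELD 32

struct held_table {
	int cnt;
	void *locks[RES_NR_HELD];	/* bottom entry = lock being waited on */
};

static struct held_table tables[NR_CPUS];

/*
 * Common-case ABBA check for 'cpu' waiting on 'lock': scan each remote
 * table; if a remote CPU already holds 'lock' and is itself waiting on
 * a lock we hold, report a deadlock.  A cycle A->B->C->A distributed
 * over three CPUs is not found by this direct two-party scan.
 */
static int check_abba(int cpu, void *lock)
{
	struct held_table *self = &tables[cpu];

	for (int r = 0; r < NR_CPUS; r++) {
		struct held_table *rem = &tables[r];
		bool holds_lock = false;

		if (r == cpu)
			continue;
		/* Does the remote CPU hold the lock we want? */
		for (int i = 0; i < rem->cnt - 1; i++)
			if (rem->locks[i] == lock)
				holds_lock = true;
		if (!holds_lock)
			continue;
		/* Is it waiting on something we hold? */
		for (int i = 0; i < self->cnt - 1; i++)
			if (self->locks[i] == rem->locks[rem->cnt - 1])
				return -EDEADLK;
	}
	return 0;
}
```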

Cheers,
Longman


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for CONFIG_PARAVIRT
  2025-01-07 13:59 ` [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for CONFIG_PARAVIRT Kumar Kartikeya Dwivedi
@ 2025-01-08 16:27   ` Waiman Long
  2025-01-08 20:32     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 63+ messages in thread
From: Waiman Long @ 2025-01-08 16:27 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

On 1/7/25 8:59 AM, Kumar Kartikeya Dwivedi wrote:
> We ripped out PV and virtualization related bits from rqspinlock in an
> earlier commit, however, a fair lock performs poorly within a virtual
> machine when the lock holder is preempted. As such, retain the
> virt_spin_lock fallback to test and set lock, but with timeout and
> deadlock detection.
>
> We don't integrate support for CONFIG_PARAVIRT_SPINLOCKS yet, as that
> requires more involved algorithmic changes and introduces more
> complexity. It can be done when the need arises in the future.

virt_spin_lock() doesn't scale well. It is for hypervisors that don't 
support PV qspinlock yet. Now rqspinlock() will be in this category.

I wonder if we should provide an option to disable rqspinlock and fall 
back to the regular qspinlock with strict BPF locking semantics.

Another question that I have is about PREEMPT_RT kernel which cannot 
tolerate any locking stall. That will probably require disabling 
rqspinlock if CONFIG_PREEMPT_RT is enabled.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 14/22] rqspinlock: Add macros for rqspinlock usage
  2025-01-07 13:59 ` [PATCH bpf-next v1 14/22] rqspinlock: Add macros for rqspinlock usage Kumar Kartikeya Dwivedi
@ 2025-01-08 16:55   ` Waiman Long
  2025-01-08 20:41     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 63+ messages in thread
From: Waiman Long @ 2025-01-08 16:55 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, bpf, linux-kernel
  Cc: Linus Torvalds, Peter Zijlstra, Waiman Long, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Paul E. McKenney, Tejun Heo, Barret Rhoden,
	Josh Don, Dohyun Kim, kernel-team

On 1/7/25 8:59 AM, Kumar Kartikeya Dwivedi wrote:
> Introduce helper macros that wrap around the rqspinlock slow path and
> provide an interface analogous to the raw_spin_lock API. Note that
> in case of error conditions, preemption and IRQ disabling is
> automatically unrolled before returning the error back to the caller.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>   include/asm-generic/rqspinlock.h | 58 ++++++++++++++++++++++++++++++++
>   1 file changed, 58 insertions(+)
>
> diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h
> index dc436ab01471..53be8426373c 100644
> --- a/include/asm-generic/rqspinlock.h
> +++ b/include/asm-generic/rqspinlock.h
> @@ -12,8 +12,10 @@
>   #include <linux/types.h>
>   #include <vdso/time64.h>
>   #include <linux/percpu.h>
> +#include <asm/qspinlock.h>
>   
>   struct qspinlock;
> +typedef struct qspinlock rqspinlock_t;
>   
>   extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
>   
> @@ -82,4 +84,60 @@ static __always_inline void release_held_lock_entry(void)
>   	this_cpu_dec(rqspinlock_held_locks.cnt);
>   }
>   
> +/**
> + * res_spin_lock - acquire a queued spinlock
> + * @lock: Pointer to queued spinlock structure
> + */
> +static __always_inline int res_spin_lock(rqspinlock_t *lock)
> +{
> +	int val = 0;
> +
> +	if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) {
> +		grab_held_lock_entry(lock);
> +		return 0;
> +	}
> +	return resilient_queued_spin_lock_slowpath(lock, val, RES_DEF_TIMEOUT);
> +}
> +
> +static __always_inline void res_spin_unlock(rqspinlock_t *lock)
> +{
> +	struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
> +
> +	if (unlikely(rqh->cnt > RES_NR_HELD))
> +		goto unlock;
> +	WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL);
> +	/*
> +	 * Release barrier, ensuring ordering. See release_held_lock_entry.
> +	 */
> +unlock:
> +	queued_spin_unlock(lock);
> +	this_cpu_dec(rqspinlock_held_locks.cnt);
> +}
> +
> +#define raw_res_spin_lock_init(lock) ({ *(lock) = (struct qspinlock)__ARCH_SPIN_LOCK_UNLOCKED; })
> +
> +#define raw_res_spin_lock(lock)                    \
> +	({                                         \
> +		int __ret;                         \
> +		preempt_disable();                 \
> +		__ret = res_spin_lock(lock);	   \
> +		if (__ret)                         \
> +			preempt_enable();          \
> +		__ret;                             \
> +	})
> +
> +#define raw_res_spin_unlock(lock) ({ res_spin_unlock(lock); preempt_enable(); })
> +
> +#define raw_res_spin_lock_irqsave(lock, flags)    \
> +	({                                        \
> +		int __ret;                        \
> +		local_irq_save(flags);            \
> +		__ret = raw_res_spin_lock(lock);  \
> +		if (__ret)                        \
> +			local_irq_restore(flags); \
> +		__ret;                            \
> +	})
> +
> +#define raw_res_spin_unlock_irqrestore(lock, flags) ({ raw_res_spin_unlock(lock); local_irq_restore(flags); })
> +
>   #endif /* __ASM_GENERIC_RQSPINLOCK_H */

Lockdep calls aren't included in the helper functions. That means all 
the *res_spin_lock*() calls will be outside the purview of lockdep. That 
also means a multi-CPU circular locking dependency involving a mixture 
of qspinlocks and rqspinlocks may not be detectable.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock
  2025-01-08  9:18   ` Peter Zijlstra
@ 2025-01-08 20:12     ` Kumar Kartikeya Dwivedi
  2025-01-08 20:30       ` Linus Torvalds
  2025-01-09 13:59       ` Waiman Long
  0 siblings, 2 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-08 20:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Will Deacon, bpf, linux-kernel, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team

On Wed, 8 Jan 2025 at 14:48, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jan 07, 2025 at 03:54:36PM -0800, Linus Torvalds wrote:
> > On Tue, 7 Jan 2025 at 06:00, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > >
> > > This patch set introduces Resilient Queued Spin Lock (or rqspinlock with
> > > res_spin_lock() and res_spin_unlock() APIs).
> >
> > So when I see people doing new locking mechanisms, I invariably go "Oh no!".
> >
> > But this series seems reasonable to me. I see that PeterZ had a couple
> > of minor comments (well, the arm64 one is more fundamental), which
> > hopefully means that it seems reasonable to him too. Peter?
>
> I've not had time to fully read the whole thing yet; I only did a quick
> once-over. I'll try and get around to doing a proper reading eventually,
> but I'm chasing a regression atm, and then I need to go review a ton of
> code Andrew merged over the xmas/newyears holiday :/
>
> One potential issue is that qspinlock isn't suitable for all
> architectures -- and I've yet to figure out how widely BPF is planning
> on using this.

For architectures where qspinlock is not available, I think we can
have a fallback to a test-and-set lock with timeout and deadlock
checks, like patch 12.
We plan on using this in BPF core and BPF maps, so the usage will be
pervasive, and we have at least one architecture in CI (s390) which
doesn't have ARCH_USE_QUEUED_SPINLOCKS selected, so we should have
coverage for both cases. For now the fallback is missing, but I will
add one in v2.

> Notably qspinlock is ineffective (as in way over engineered)
> for architectures that do not provide hardware level progress guarantees
> on competing atomics and qspinlock uses mixed sized atomics, which are
> typically under specified, architecturally.

Yes, we also noticed during development that try_cmpxchg_tail (in
patch 9) couldn't rely on a 16-bit cmpxchg being available everywhere
(I think the build broke on arm64), unlike the 16-bit xchg used in
xchg_tail; otherwise we stick to 32-bit atomics or rely on mixed-size
atomics in the same way qspinlock does.

>
> Another issue is the code duplication.

I agree that this isn't ideal, but IMO it would be too ugly to ifdef
parts of the qspinlock slow path to accommodate the rqspinlock logic,
and it would get harder to reason about. Plus, the two slow paths have
distinct return types, so combining them would leave the normal
qspinlock returning a value that isn't very meaningful. We can
probably discuss more code-sharing possibilities through common inline
functions to minimize duplication, though.

>
> Anyway, I'll get to it eventually...

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 08/22] rqspinlock: Protect pending bit owners from stalls
  2025-01-08  2:19   ` Waiman Long
@ 2025-01-08 20:13     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-08 20:13 UTC (permalink / raw)
  To: Waiman Long
  Cc: bpf, linux-kernel, Barret Rhoden, Linus Torvalds, Peter Zijlstra,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

On Wed, 8 Jan 2025 at 07:49, Waiman Long <llong@redhat.com> wrote:
>
> On 1/7/25 8:59 AM, Kumar Kartikeya Dwivedi wrote:
> > The pending bit is used to avoid queueing in case the lock is
> > uncontended, and has demonstrated benefits for the 2 contender scenario,
> > esp. on x86. In case the pending bit is acquired and we wait for the
> > locked bit to disappear, we may get stuck due to the lock owner not
> > making progress. Hence, this waiting loop must be protected with a
> > timeout check.
> >
> > To perform a graceful recovery once we decide to abort our lock
> > acquisition attempt in this case, we must unset the pending bit since we
> > own it. All waiters undoing their changes and exiting gracefully allows
> > the lock word to be restored to the unlocked state once all participants
> > (owner, waiters) have been recovered, and the lock remains usable.
> > Hence, set the pending bit back to zero before returning to the caller.
> >
> > Introduce a lockevent (rqspinlock_lock_timeout) to capture timeout
> > event statistics.
> >
> > Reviewed-by: Barret Rhoden <brho@google.com>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >   include/asm-generic/rqspinlock.h  |  2 +-
> >   kernel/locking/lock_events_list.h |  5 +++++
> >   kernel/locking/rqspinlock.c       | 28 +++++++++++++++++++++++-----
> >   3 files changed, 29 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h
> > index 8ed266f4e70b..5c996a82e75f 100644
> > --- a/include/asm-generic/rqspinlock.h
> > +++ b/include/asm-generic/rqspinlock.h
> > @@ -19,6 +19,6 @@ struct qspinlock;
> >    */
> >   #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2)
> >
> > -extern void resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
> > +extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
> >
> >   #endif /* __ASM_GENERIC_RQSPINLOCK_H */
> > diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
> > index 97fb6f3f840a..c5286249994d 100644
> > --- a/kernel/locking/lock_events_list.h
> > +++ b/kernel/locking/lock_events_list.h
> > @@ -49,6 +49,11 @@ LOCK_EVENT(lock_use_node4) /* # of locking ops that use 4th percpu node */
> >   LOCK_EVENT(lock_no_node)    /* # of locking ops w/o using percpu node    */
> >   #endif /* CONFIG_QUEUED_SPINLOCKS */
> >
> > +/*
> > + * Locking events for Resilient Queued Spin Lock
> > + */
> > +LOCK_EVENT(rqspinlock_lock_timeout)  /* # of locking ops that timeout        */
> > +
> >   /*
> >    * Locking events for rwsem
> >    */
>
> Since the build of rqspinlock.c is conditional on
> CONFIG_QUEUED_SPINLOCKS, this lock event should be inside the
> CONFIG_QUEUED_SPINLOCKS block.

Ack, I will fix this.

>
> Cheers,
> Longman
>
>


* Re: [PATCH bpf-next v1 11/22] rqspinlock: Add deadlock detection and recovery
  2025-01-08 16:06   ` Waiman Long
@ 2025-01-08 20:19     ` Kumar Kartikeya Dwivedi
  2025-01-09  0:32       ` Waiman Long
  0 siblings, 1 reply; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-08 20:19 UTC (permalink / raw)
  To: Waiman Long
  Cc: bpf, linux-kernel, Linus Torvalds, Peter Zijlstra,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team

On Wed, 8 Jan 2025 at 21:36, Waiman Long <llong@redhat.com> wrote:
>
>
> On 1/7/25 8:59 AM, Kumar Kartikeya Dwivedi wrote:
> > While the timeout logic provides guarantees for the waiter's forward
> > progress, the time until a stalling waiter unblocks can still be long.
> > The default timeout of 1/2 sec can be excessively long for some use
> > cases.  Additionally, custom timeouts may exacerbate recovery time.
> >
> > Introduce logic to detect common cases of deadlocks and perform quicker
> > recovery. This is done by dividing the time from entry into the locking
> > slow path until the timeout into intervals of 1 ms. Then, after each
> > interval elapses, deadlock detection is performed, while also polling
> > the lock word to ensure we can quickly break out of the detection logic
> > and proceed with lock acquisition.
> >
> > A 'held_locks' table is maintained per-CPU where the entry at the bottom
> > denotes a lock being waited for or already taken. Entries coming before
> > it denote locks that are already held. The current CPU's table can thus
> > be looked at to detect AA deadlocks. The tables from other CPUs can be
> > looked at to discover ABBA situations. Finally, when a matching entry
> > for the lock being taken on the current CPU is found on some other CPU,
> > a deadlock situation is detected. This function can take a long time,
> > therefore the lock word is constantly polled in each loop iteration to
> > ensure we can preempt detection and proceed with lock acquisition, using
> > the is_lock_released check.
> >
> > We set 'spin' member of rqspinlock_timeout struct to 0 to trigger
> > deadlock checks immediately to perform faster recovery.
> >
> > Note: Extending lock word size by 4 bytes to record owner CPU can allow
> > faster detection for ABBA. It is typically the owner which participates
> > in a ABBA situation. However, to keep compatibility with existing lock
> > words in the kernel (struct qspinlock), and given deadlocks are a rare
> > event triggered by bugs, we choose to favor compatibility over faster
> > detection.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >   include/asm-generic/rqspinlock.h |  56 +++++++++-
> >   kernel/locking/rqspinlock.c      | 178 ++++++++++++++++++++++++++++---
> >   2 files changed, 220 insertions(+), 14 deletions(-)
> >
> > diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h
> > index 5c996a82e75f..c7e33ccc57a6 100644
> > --- a/include/asm-generic/rqspinlock.h
> > +++ b/include/asm-generic/rqspinlock.h
> > @@ -11,14 +11,68 @@
> >
> >   #include <linux/types.h>
> >   #include <vdso/time64.h>
> > +#include <linux/percpu.h>
> >
> >   struct qspinlock;
> >
> > +extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
> > +
> >   /*
> >    * Default timeout for waiting loops is 0.5 seconds
> >    */
> >   #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2)
> >
> > -extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
> > +#define RES_NR_HELD 32
> > +
> > +struct rqspinlock_held {
> > +     int cnt;
> > +     void *locks[RES_NR_HELD];
> > +};
> > +
> > +DECLARE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks);
> > +
> > +static __always_inline void grab_held_lock_entry(void *lock)
> > +{
> > +     int cnt = this_cpu_inc_return(rqspinlock_held_locks.cnt);
> > +
> > +     if (unlikely(cnt > RES_NR_HELD)) {
> > +             /* Still keep the inc so we decrement later. */
> > +             return;
> > +     }
> > +
> > +     /*
> > +      * Implied compiler barrier in per-CPU operations; otherwise we can have
> > +      * the compiler reorder inc with write to table, allowing interrupts to
> > +      * overwrite and erase our write to the table (as on interrupt exit it
> > +      * will be reset to NULL).
> > +      */
> > +     this_cpu_write(rqspinlock_held_locks.locks[cnt - 1], lock);
> > +}
> > +
> > +/*
> > + * It is possible to run into misdetection scenarios of AA deadlocks on the same
> > + * CPU, and missed ABBA deadlocks on remote CPUs when this function pops entries
> > + * out of order (due to lock A, lock B, unlock A, unlock B) pattern. The correct
> > + * logic to preserve right entries in the table would be to walk the array of
> > + * held locks and swap and clear out-of-order entries, but that's too
> > + * complicated and we don't have a compelling use case for out of order unlocking.
> Maybe we can pass in the lock and print a warning if out-of-order unlock
> is being done.

I think, alternatively, I will constrain the verifier in v2 to require
lock releases to be in order, which would obviate the need to warn at
runtime: programs potentially doing out-of-order unlocks would simply
be rejected. This doesn't cover in-kernel users, but we're not doing
out-of-order unlocks with this lock there, and a runtime check would
be yet another branch in the unlock function with little benefit.
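A toy, single-threaded model of the per-CPU held-locks table makes the
LIFO constraint concrete (illustrative only; the helper names are made
up, and the real per-CPU machinery and RES_NR_HELD overflow handling
are omitted):

```c
#include <stddef.h>

#define NR_HELD 32

/* Simplified stand-in for the per-CPU rqspinlock_held table. */
static struct {
	int cnt;
	void *locks[NR_HELD];
} held;

static int toy_lock(void *lock)
{
	/* AA check: error out if this context already holds @lock. */
	for (int i = 0; i < held.cnt; i++)
		if (held.locks[i] == lock)
			return -1;	/* stands in for -EDEADLK */
	held.locks[held.cnt++] = lock;
	return 0;
}

/*
 * LIFO-only release: always pops the most recent entry. This is why an
 * out-of-order pattern (lock A, lock B, unlock A) would clear B's slot
 * while leaving A's stale entry behind, confusing later detection.
 */
static void toy_unlock(void)
{
	held.locks[--held.cnt] = NULL;
}
```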

> > + *
> > + * Therefore, we simply don't support such cases and keep the logic simple here.
> > + */
> > +static __always_inline void release_held_lock_entry(void)
> > +{
> > +     struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
> > +
> > +     if (unlikely(rqh->cnt > RES_NR_HELD))
> > +             goto dec;
> > +     smp_store_release(&rqh->locks[rqh->cnt - 1], NULL);
> > +     /*
> > +      * Overwrite of NULL should appear before our decrement of the count to
> > +      * other CPUs, otherwise we have the issue of a stale non-NULL entry being
> > +      * visible in the array, leading to misdetection during deadlock detection.
> > +      */
> > +dec:
> > +     this_cpu_dec(rqspinlock_held_locks.cnt);
> AFAIU, smp_store_release() only guarantees memory ordering before it,
> not after. That shouldn't be a problem if the decrement is observed
> before clearing the entry as that non-NULL entry won't be checked anyway.

Ack, I will improve the comment; it's a bit misleading right now.
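In C11-atomics terms, the intended shape is roughly the following (a
sketch of my reading of the exchange above, not the kernel code
verbatim):

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_HELD 32

static _Atomic(void *) locks[NR_HELD];
static _Atomic int cnt;

static void release_held_lock_entry_sketch(void)
{
	int c = atomic_load_explicit(&cnt, memory_order_relaxed);

	/*
	 * The release store orders all prior accesses before the NULL
	 * write, mirroring smp_store_release() in the patch. As Waiman
	 * notes, it does not order the decrement below after the store:
	 * a remote CPU may observe the decrement first, which is benign
	 * because entries at index >= cnt are never examined by the
	 * deadlock checks.
	 */
	atomic_store_explicit(&locks[c - 1], NULL, memory_order_release);
	atomic_fetch_sub_explicit(&cnt, 1, memory_order_relaxed);
}
```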

> > +}
> >
> >   #endif /* __ASM_GENERIC_RQSPINLOCK_H */
> > diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
> > index b63f92bd43b1..b7c86127d288 100644
> > --- a/kernel/locking/rqspinlock.c
> > +++ b/kernel/locking/rqspinlock.c
> > @@ -30,6 +30,7 @@
> >    * Include queued spinlock definitions and statistics code
> >    */
> >   #include "qspinlock.h"
> > +#include "rqspinlock.h"
> >   #include "qspinlock_stat.h"
> >
> >   /*
> > @@ -74,16 +75,141 @@
> >   struct rqspinlock_timeout {
> >       u64 timeout_end;
> >       u64 duration;
> > +     u64 cur;
> >       u16 spin;
> >   };
> >
> >   #define RES_TIMEOUT_VAL     2
> >
> > -static noinline int check_timeout(struct rqspinlock_timeout *ts)
> > +DEFINE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks);
> > +
> > +static bool is_lock_released(struct qspinlock *lock, u32 mask, struct rqspinlock_timeout *ts)
> > +{
> > +     if (!(atomic_read_acquire(&lock->val) & (mask)))
> > +             return true;
> > +     return false;
> > +}
> > +
> > +static noinline int check_deadlock_AA(struct qspinlock *lock, u32 mask,
> > +                                   struct rqspinlock_timeout *ts)
> > +{
> > +     struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
> > +     int cnt = min(RES_NR_HELD, rqh->cnt);
> > +
> > +     /*
> > +      * Return an error if we hold the lock we are attempting to acquire.
> > +      * We'll iterate over max 32 locks; no need to do is_lock_released.
> > +      */
> > +     for (int i = 0; i < cnt - 1; i++) {
> > +             if (rqh->locks[i] == lock)
> > +                     return -EDEADLK;
> > +     }
> > +     return 0;
> > +}
> > +
> > +static noinline int check_deadlock_ABBA(struct qspinlock *lock, u32 mask,
> > +                                     struct rqspinlock_timeout *ts)
> > +{
>
> I think you should note that the ABBA check here is not exhaustive. It
> is just the most common case and there are corner cases that will be missed.

Ack, will add a comment.

>
> Cheers,
> Longman
>


* Re: [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock
  2025-01-08 20:12     ` Kumar Kartikeya Dwivedi
@ 2025-01-08 20:30       ` Linus Torvalds
  2025-01-08 21:06         ` Kumar Kartikeya Dwivedi
  2025-01-08 21:30         ` Paul E. McKenney
  2025-01-09 13:59       ` Waiman Long
  1 sibling, 2 replies; 63+ messages in thread
From: Linus Torvalds @ 2025-01-08 20:30 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Peter Zijlstra, Will Deacon, bpf, linux-kernel, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team

On Wed, 8 Jan 2025 at 12:13, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> Yes, we also noticed during development that try_cmpxchg_tail (in
> patch 9) couldn't rely on 16-bit cmpxchg being available everywhere

I think that's purely a "we have had no use for it" issue.

A 16-bit cmpxchg can always be written using a larger size, and we did
that for 8-bit ones for RCU.

See commit d4e287d7caff ("rcu-tasks: Remove open-coded one-byte
cmpxchg() emulation") which switched RCU over to use a "native" 8-bit
cmpxchg, because Paul had added the capability to all architectures,
sometimes using a bigger size and "emulating" it: a88d970c8bb5 ("lib:
Add one-byte emulation function").

In fact, I think that series added a couple of 16-bit cases too, but I
actually went "if we have no users, don't bother".

              Linus
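The emulation approach Linus describes — a narrow cmpxchg built from a
CAS on the containing aligned word — can be sketched as follows. This
is a hypothetical little-endian userspace sketch using the GCC/Clang
__atomic builtins, not the kernel's implementation from a88d970c8bb5:

```c
#include <stdint.h>

/*
 * Emulate a 16-bit cmpxchg with a 32-bit one on the containing aligned
 * word. Little-endian assumed. Returns the old 16-bit value, matching
 * cmpxchg() semantics.
 */
static uint16_t cmpxchg16_emulated(uint16_t *p, uint16_t old, uint16_t new)
{
	uintptr_t addr = (uintptr_t)p;
	uint32_t *word = (uint32_t *)(addr & ~(uintptr_t)3);
	unsigned int shift = (addr & 2) * 8;	/* 0 or 16 on little-endian */
	uint32_t mask = 0xffffu << shift;
	uint32_t w = __atomic_load_n(word, __ATOMIC_RELAXED);

	for (;;) {
		uint16_t cur = (w & mask) >> shift;

		if (cur != old)
			return cur;	/* comparison failed; no store */

		uint32_t neww = (w & ~mask) | ((uint32_t)new << shift);

		/* On CAS failure, w is refreshed with the current word. */
		if (__atomic_compare_exchange_n(word, &w, neww, 0,
						__ATOMIC_ACQ_REL,
						__ATOMIC_RELAXED))
			return old;
	}
}
```

Note the sharp edge Paul raises below: the CAS loop only helps if the
other half of the word is itself accessed atomically; a compiler
tearing plain 16-bit stores would still break this.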


* Re: [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for CONFIG_PARAVIRT
  2025-01-08 16:27   ` Waiman Long
@ 2025-01-08 20:32     ` Kumar Kartikeya Dwivedi
  2025-01-09  0:48       ` Waiman Long
  0 siblings, 1 reply; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-08 20:32 UTC (permalink / raw)
  To: Waiman Long
  Cc: bpf, linux-kernel, Linus Torvalds, Peter Zijlstra,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team

On Wed, 8 Jan 2025 at 21:57, Waiman Long <llong@redhat.com> wrote:
>
> On 1/7/25 8:59 AM, Kumar Kartikeya Dwivedi wrote:
> > We ripped out PV and virtualization related bits from rqspinlock in an
> > earlier commit, however, a fair lock performs poorly within a virtual
> > machine when the lock holder is preempted. As such, retain the
> > virt_spin_lock fallback to test and set lock, but with timeout and
> > deadlock detection.
> >
> > We don't integrate support for CONFIG_PARAVIRT_SPINLOCKS yet, as that
> > requires more involved algorithmic changes and introduces more
> > complexity. It can be done when the need arises in the future.
>
> virt_spin_lock() doesn't scale well. It is for hypervisors that don't
> support PV qspinlock yet. Now rqspinlock() will be in this category.

We would need to make algorithmic changes to paravirt versions, which
would be too much for this series, so I didn't go there.

>
> I wonder if we should provide an option to disable rqspinlock and fall
> back to the regular qspinlock with strict BPF locking semantics.

That unfortunately won't work, because rqspinlock operates essentially
like a trylock: it is allowed to fail, and callers must handle errors
accordingly. Some of the users in BPF (e.g. in patch 17) remove their
per-CPU nesting counters and rely on rqspinlock's AA deadlock
detection instead, which would deadlock if we transparently replaced
rqspinlock with qspinlock as a fallback.
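The caller-side contract this implies — check the error and unwind any
locks taken so far — looks roughly like this (stub helpers standing in
for the real fallible raw_res_spin_lock API; names are invented for
the sketch):

```c
#include <errno.h>

/* Stub that fails on demand, standing in for a fallible lock acquire. */
static int stub_res_spin_lock(int *lock, int fail)
{
	if (fail)
		return -EDEADLK;
	*lock = 1;
	return 0;
}

static void stub_res_spin_unlock(int *lock)
{
	*lock = 0;
}

static int fail_lock_b;	/* test knob: make the second acquire fail */

/*
 * A two-lock operation: if the second acquisition fails, the first
 * lock must be released on the way out so the system stays consistent.
 */
static int move_between_buckets(int *lock_a, int *lock_b)
{
	int ret;

	ret = stub_res_spin_lock(lock_a, 0);
	if (ret)
		return ret;
	ret = stub_res_spin_lock(lock_b, fail_lock_b);
	if (ret) {
		stub_res_spin_unlock(lock_a);	/* unwind before returning */
		return ret;
	}
	/* ... move an element from one bucket to the other ... */
	stub_res_spin_unlock(lock_b);
	stub_res_spin_unlock(lock_a);
	return 0;
}
```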

>
> Another question that I have is about PREEMPT_RT kernel which cannot
> tolerate any locking stall. That will probably require disabling
> rqspinlock if CONFIG_PREEMPT_RT is enabled.

I think rqspinlock better maps to the raw spinlock variants, which
stay spinning locks on RT kernels, and as you can see in patches 17
and 18, BPF maps were already using the raw spinlock variants. To
avoid stalling, we perform deadlock checks immediately upon entering
the slow path, so in the cases where we rely on rqspinlock to diagnose
and report an error, we'll recover quickly. If we still hit the
timeout, it is probably a different problem/bug anyway (and would have
caused a kernel hang otherwise).

>
> Cheers,
> Longman
>


* Re: [PATCH bpf-next v1 14/22] rqspinlock: Add macros for rqspinlock usage
  2025-01-08 16:55   ` Waiman Long
@ 2025-01-08 20:41     ` Kumar Kartikeya Dwivedi
  2025-01-09  1:11       ` Waiman Long
  0 siblings, 1 reply; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-08 20:41 UTC (permalink / raw)
  To: Waiman Long
  Cc: bpf, linux-kernel, Linus Torvalds, Peter Zijlstra,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team

On Wed, 8 Jan 2025 at 22:26, Waiman Long <llong@redhat.com> wrote:
>
> On 1/7/25 8:59 AM, Kumar Kartikeya Dwivedi wrote:
> > Introduce helper macros that wrap around the rqspinlock slow path and
> > provide an interface analogous to the raw_spin_lock API. Note that
> > in case of error conditions, preemption and IRQ disabling is
> > automatically unrolled before returning the error back to the caller.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >   include/asm-generic/rqspinlock.h | 58 ++++++++++++++++++++++++++++++++
> >   1 file changed, 58 insertions(+)
> >
> > diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h
> > index dc436ab01471..53be8426373c 100644
> > --- a/include/asm-generic/rqspinlock.h
> > +++ b/include/asm-generic/rqspinlock.h
> > @@ -12,8 +12,10 @@
> >   #include <linux/types.h>
> >   #include <vdso/time64.h>
> >   #include <linux/percpu.h>
> > +#include <asm/qspinlock.h>
> >
> >   struct qspinlock;
> > +typedef struct qspinlock rqspinlock_t;
> >
> >   extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
> >
> > @@ -82,4 +84,60 @@ static __always_inline void release_held_lock_entry(void)
> >       this_cpu_dec(rqspinlock_held_locks.cnt);
> >   }
> >
> > +/**
> > + * res_spin_lock - acquire a queued spinlock
> > + * @lock: Pointer to queued spinlock structure
> > + */
> > +static __always_inline int res_spin_lock(rqspinlock_t *lock)
> > +{
> > +     int val = 0;
> > +
> > +     if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) {
> > +             grab_held_lock_entry(lock);
> > +             return 0;
> > +     }
> > +     return resilient_queued_spin_lock_slowpath(lock, val, RES_DEF_TIMEOUT);
> > +}
> > +
> > +static __always_inline void res_spin_unlock(rqspinlock_t *lock)
> > +{
> > +     struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
> > +
> > +     if (unlikely(rqh->cnt > RES_NR_HELD))
> > +             goto unlock;
> > +     WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL);
> > +     /*
> > +      * Release barrier, ensuring ordering. See release_held_lock_entry.
> > +      */
> > +unlock:
> > +     queued_spin_unlock(lock);
> > +     this_cpu_dec(rqspinlock_held_locks.cnt);
> > +}
> > +
> > +#define raw_res_spin_lock_init(lock) ({ *(lock) = (struct qspinlock)__ARCH_SPIN_LOCK_UNLOCKED; })
> > +
> > +#define raw_res_spin_lock(lock)                    \
> > +     ({                                         \
> > +             int __ret;                         \
> > +             preempt_disable();                 \
> > +             __ret = res_spin_lock(lock);       \
> > +             if (__ret)                         \
> > +                     preempt_enable();          \
> > +             __ret;                             \
> > +     })
> > +
> > +#define raw_res_spin_unlock(lock) ({ res_spin_unlock(lock); preempt_enable(); })
> > +
> > +#define raw_res_spin_lock_irqsave(lock, flags)    \
> > +     ({                                        \
> > +             int __ret;                        \
> > +             local_irq_save(flags);            \
> > +             __ret = raw_res_spin_lock(lock);  \
> > +             if (__ret)                        \
> > +                     local_irq_restore(flags); \
> > +             __ret;                            \
> > +     })
> > +
> > +#define raw_res_spin_unlock_irqrestore(lock, flags) ({ raw_res_spin_unlock(lock); local_irq_restore(flags); })
> > +
> >   #endif /* __ASM_GENERIC_RQSPINLOCK_H */
>
> Lockdep calls aren't included in the helper functions. That means all
> the *res_spin_lock*() calls will be outside the purview of lockdep. That
> also means a multi-CPU circular locking dependency involving a mixture
> of qspinlocks and rqspinlocks may not be detectable.

Yes, this is true, but I am not sure whether lockdep fits well in this
case, or how to map its semantics.
Some BPF users (e.g. in patch 17) expect and rely on rqspinlock to
return errors on AA deadlocks, as nesting is possible, so we'll get
false alarms with it. Lockdep also needs to treat rqspinlock as a
trylock, since it's essentially fallible, and IIUC it skips diagnosing
in those cases.
Most users choose rqspinlock precisely because a deadlock may be
constructed at runtime (either by BPF programs themselves or by
attaching programs to the kernel), so lockdep splats on debug kernels
would not be helpful.

Say a mix of both qspinlock and rqspinlock were involved in an ABBA
situation: as long as rqspinlock is being acquired on one of the
threads, it will still time out even if check_deadlock fails to
establish the presence of a deadlock. The qspinlock call on the other
side will then make progress, as long as the kernel unwinds locks
correctly on failures (by handling rqspinlock errors and releasing
held locks on the way out).

>
> Cheers,
> Longman
>


* Re: [PATCH bpf-next v1 09/22] rqspinlock: Protect waiters in queue from stalls
  2025-01-08  3:38   ` Waiman Long
@ 2025-01-08 20:42     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-08 20:42 UTC (permalink / raw)
  To: Waiman Long
  Cc: bpf, linux-kernel, Barret Rhoden, Linus Torvalds, Peter Zijlstra,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Josh Don, Dohyun Kim, kernel-team

On Wed, 8 Jan 2025 at 09:08, Waiman Long <llong@redhat.com> wrote:
>
> On 1/7/25 8:59 AM, Kumar Kartikeya Dwivedi wrote:
> > Implement the wait queue cleanup algorithm for rqspinlock. There are
> > three forms of waiters in the original queued spin lock algorithm. The
> > first is the waiter which acquires the pending bit and spins on the lock
> > word without forming a wait queue. The second is the head waiter that is
> > the first waiter heading the wait queue. The third form is of all the
> > non-head waiters queued behind the head, waiting to be signalled through
> > their MCS node to overtake the responsibility of the head.
> >
> > In this commit, we are concerned with the second and third kind. First,
> > we augment the waiting loop of the head of the wait queue with a
> > timeout. When this timeout happens, all waiters part of the wait queue
> > will abort their lock acquisition attempts. This happens in three steps.
> > First, the head breaks out of its loop waiting for pending and locked
> > bits to turn to 0, and non-head waiters break out of their MCS node spin
> > (more on that later). Next, every waiter (head or non-head) attempts to
> > check whether they are also the tail waiter, in such a case they attempt
> > to zero out the tail word and allow a new queue to be built up for this
> > lock. If they succeed, they have no one to signal next in the queue to
> > stop spinning. Otherwise, they signal the MCS node of the next waiter to
> > break out of its spin and try resetting the tail word back to 0. This
> > goes on until the tail waiter is found. In case of races, the new tail
> > will be responsible for performing the same task, as the old tail will
> > then fail to reset the tail word and wait for its next pointer to be
> > updated before it signals the new tail to do the same.
> >
> > Lastly, all of these waiters release the rqnode and return to the
> > caller. This patch underscores the point that rqspinlock's timeout does
> > not apply to each waiter individually, and cannot be relied upon as an
> > upper bound. It is possible for the rqspinlock waiters to return early
> > from a failed lock acquisition attempt as soon as stalls are detected.
> >
> > The head waiter cannot directly WRITE_ONCE the tail to zero, as it may
> > race with a concurrent xchg and a non-head waiter linking its MCS node
> > to the head's MCS node through 'prev->next' assignment.
> >
> > Reviewed-by: Barret Rhoden <brho@google.com>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >   kernel/locking/rqspinlock.c | 42 +++++++++++++++++++++++++++++---
> >   kernel/locking/rqspinlock.h | 48 +++++++++++++++++++++++++++++++++++++
> >   2 files changed, 87 insertions(+), 3 deletions(-)
> >   create mode 100644 kernel/locking/rqspinlock.h
> >
> > diff --git a/kernel/locking/rqspinlock.c b/kernel/locking/rqspinlock.c
> > index dd305573db13..f712fe4b1f38 100644
> > --- a/kernel/locking/rqspinlock.c
> > +++ b/kernel/locking/rqspinlock.c
> > @@ -77,6 +77,8 @@ struct rqspinlock_timeout {
> >       u16 spin;
> >   };
> >
> > +#define RES_TIMEOUT_VAL      2
> > +
> >   static noinline int check_timeout(struct rqspinlock_timeout *ts)
> >   {
> >       u64 time = ktime_get_mono_fast_ns();
> > @@ -305,12 +307,18 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
> >        * head of the waitqueue.
> >        */
> >       if (old & _Q_TAIL_MASK) {
> > +             int val;
> > +
> >               prev = decode_tail(old, qnodes);
> >
> >               /* Link @node into the waitqueue. */
> >               WRITE_ONCE(prev->next, node);
> >
> > -             arch_mcs_spin_lock_contended(&node->locked);
> > +             val = arch_mcs_spin_lock_contended(&node->locked);
> > +             if (val == RES_TIMEOUT_VAL) {
> > +                     ret = -EDEADLK;
> > +                     goto waitq_timeout;
> > +             }
> >
> >               /*
> >                * While waiting for the MCS lock, the next pointer may have
> > @@ -334,7 +342,35 @@ int __lockfunc resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 v
> >        * sequentiality; this is because the set_locked() function below
> >        * does not imply a full barrier.
> >        */
> > -     val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));
> > +     RES_RESET_TIMEOUT(ts);
> > +     val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) ||
> > +                                    RES_CHECK_TIMEOUT(ts, ret));
>
> This has the same wfe problem for arm64.

Ack, I will keep the no-WFE fallback as mentioned in the reply to
Peter for now, and switch over once Ankur's smp_cond_load_*_timeout
patches land.

>
> Cheers,
> Longman
>
>


* Re: [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock
  2025-01-08 20:30       ` Linus Torvalds
@ 2025-01-08 21:06         ` Kumar Kartikeya Dwivedi
  2025-01-08 21:30         ` Paul E. McKenney
  1 sibling, 0 replies; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-08 21:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Will Deacon, bpf, linux-kernel, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team

On Thu, 9 Jan 2025 at 02:00, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, 8 Jan 2025 at 12:13, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > Yes, we also noticed during development that try_cmpxchg_tail (in
> > patch 9) couldn't rely on 16-bit cmpxchg being available everywhere
>
> I think that's purely a "we have had no use for it" issue.
>
> A 16-bit cmpxchg can always be written using a larger size, and we did
> that for 8-bit ones for RCU.
>
> See commit d4e287d7caff ("rcu-tasks: Remove open-coded one-byte
> cmpxchg() emulation") which switched RCU over to use a "native" 8-bit
> cmpxchg, because Paul had added the capability to all architectures,
> sometimes using a bigger size and "emulating" it: a88d970c8bb5 ("lib:
> Add one-byte emulation function").
>
> In fact, I think that series added a couple of 16-bit cases too, but I
> actually went "if we have no users, don't bother".

I see, that makes sense. I don't think we have a pressing need for it,
so it should be fine as is.

I initially used it because comparing the other bits wasn't necessary
when we only needed to reset the tail back to 0, but we would fall
back to a 32-bit cmpxchg for NR_CPUS > 16k anyway, since the tail is
more than 16 bits in that config.

>
>               Linus


* Re: [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock
  2025-01-08 20:30       ` Linus Torvalds
  2025-01-08 21:06         ` Kumar Kartikeya Dwivedi
@ 2025-01-08 21:30         ` Paul E. McKenney
  1 sibling, 0 replies; 63+ messages in thread
From: Paul E. McKenney @ 2025-01-08 21:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kumar Kartikeya Dwivedi, Peter Zijlstra, Will Deacon, bpf,
	linux-kernel, Waiman Long, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team

On Wed, Jan 08, 2025 at 12:30:27PM -0800, Linus Torvalds wrote:
> On Wed, 8 Jan 2025 at 12:13, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > Yes, we also noticed during development that try_cmpxchg_tail (in
> > patch 9) couldn't rely on 16-bit cmpxchg being available everywhere
> 
> I think that's purely a "we have had no use for it" issue.
> 
> A 16-bit cmpxchg can always be written using a larger size, and we did
> that for 8-bit ones for RCU.
> 
> See commit d4e287d7caff ("rcu-tasks: Remove open-coded one-byte
> cmpxchg() emulation") which switched RCU over to use a "native" 8-bit
> cmpxchg, because Paul had added the capability to all architectures,
> sometimes using a bigger size and "emulating" it: a88d970c8bb5 ("lib:
> Add one-byte emulation function").

Glad you liked it.  ;-)

> In fact, I think that series added a couple of 16-bit cases too, but I
> actually went "if we have no users, don't bother".

Not only that, there were still architectures supported by the Linux
kernel that lacked 16-bit store instructions.  Although this does not
make 16-bit emulation useless, it does give it some nasty sharp edges
in the form of compilers turning those 16-bit stores into non-atomic
RMW instructions.  Or tearing them into 8-bit stores.

So yes, I dropped 16-bit emulated cmpxchg() from later versions of that
patch series.

When support for those architectures is dropped, I would be happy to do
the honors for 16-bit cmpxchg() emulation.  Or to review someone else's
doing the honors, for that matter.  ;-)

							Thanx, Paul
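The larger-size emulation discussed above can be sketched in userspace C. This is a hypothetical approximation using the GCC/Clang `__atomic` builtins, not the kernel's implementation from a88d970c8bb5: the 16-bit cmpxchg is built from a 32-bit CAS on the aligned word containing the halfword, preserving the neighboring bytes (little-endian layout assumed).

```c
#include <stdint.h>

/* Hypothetical sketch: 16-bit cmpxchg emulated with a 32-bit CAS on the
 * containing aligned word. Assumes little-endian byte order; uses GCC/Clang
 * __atomic builtins. Not the kernel's code. */
static uint16_t cmpxchg_emu_u16(volatile uint16_t *p, uint16_t old, uint16_t new)
{
	uintptr_t addr = (uintptr_t)p;
	volatile uint32_t *word = (volatile uint32_t *)(addr & ~(uintptr_t)3);
	unsigned int shift = (addr & 2u) * 8;	/* 0 or 16 */
	uint32_t mask = (uint32_t)0xffff << shift;
	uint32_t w = __atomic_load_n(word, __ATOMIC_RELAXED);

	for (;;) {
		uint16_t cur = (uint16_t)((w & mask) >> shift);

		if (cur != old)
			return cur;	/* mismatch: report current value */
		if (__atomic_compare_exchange_n(word, &w,
						(w & ~mask) | ((uint32_t)new << shift),
						0, __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
			return old;	/* success */
		/* w was reloaded by the failed CAS; re-check and retry */
	}
}
```

This also illustrates Paul's sharp-edge point: the untouched half of the word is rewritten by the CAS, which is only safe because the CAS itself is atomic, unlike a compiler-generated non-atomic read-modify-write of a 16-bit store.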


* Re: [PATCH bpf-next v1 11/22] rqspinlock: Add deadlock detection and recovery
  2025-01-08 20:19     ` Kumar Kartikeya Dwivedi
@ 2025-01-09  0:32       ` Waiman Long
  0 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2025-01-09  0:32 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Waiman Long
  Cc: bpf, linux-kernel, Linus Torvalds, Peter Zijlstra,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team

On 1/8/25 3:19 PM, Kumar Kartikeya Dwivedi wrote:
> On Wed, 8 Jan 2025 at 21:36, Waiman Long <llong@redhat.com> wrote:
>>
>> On 1/7/25 8:59 AM, Kumar Kartikeya Dwivedi wrote:
>>> While the timeout logic provides guarantees for the waiter's forward
>>> progress, the time until a stalling waiter unblocks can still be long.
>>> The default timeout of 1/2 sec can be excessively long for some use
>>> cases.  Additionally, custom timeouts may exacerbate recovery time.
>>>
>>> Introduce logic to detect common cases of deadlocks and perform quicker
>>> recovery. This is done by dividing the time from entry into the locking
>>> slow path until the timeout into intervals of 1 ms. Then, after each
>>> interval elapses, deadlock detection is performed, while also polling
>>> the lock word to ensure we can quickly break out of the detection logic
>>> and proceed with lock acquisition.
>>>
>>> A 'held_locks' table is maintained per-CPU where the entry at the bottom
>>> denotes a lock being waited for or already taken. Entries coming before
>>> it denote locks that are already held. The current CPU's table can thus
>>> be looked at to detect AA deadlocks. The tables from other CPUs can be
>>> looked at to discover ABBA situations. Finally, when a matching entry
>>> for the lock being taken on the current CPU is found on some other CPU,
>>> a deadlock situation is detected. This function can take a long time,
>>> therefore the lock word is constantly polled in each loop iteration to
>>> ensure we can preempt detection and proceed with lock acquisition, using
>>> the is_lock_released check.
>>>
>>> We set 'spin' member of rqspinlock_timeout struct to 0 to trigger
>>> deadlock checks immediately to perform faster recovery.
>>>
>>> Note: Extending lock word size by 4 bytes to record owner CPU can allow
>>> faster detection for ABBA. It is typically the owner which participates
>>> in a ABBA situation. However, to keep compatibility with existing lock
>>> words in the kernel (struct qspinlock), and given deadlocks are a rare
>>> event triggered by bugs, we choose to favor compatibility over faster
>>> detection.
>>>
>>> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>>> ---
>>>    include/asm-generic/rqspinlock.h |  56 +++++++++-
>>>    kernel/locking/rqspinlock.c      | 178 ++++++++++++++++++++++++++++---
>>>    2 files changed, 220 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h
>>> index 5c996a82e75f..c7e33ccc57a6 100644
>>> --- a/include/asm-generic/rqspinlock.h
>>> +++ b/include/asm-generic/rqspinlock.h
>>> @@ -11,14 +11,68 @@
>>>
>>>    #include <linux/types.h>
>>>    #include <vdso/time64.h>
>>> +#include <linux/percpu.h>
>>>
>>>    struct qspinlock;
>>>
>>> +extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
>>> +
>>>    /*
>>>     * Default timeout for waiting loops is 0.5 seconds
>>>     */
>>>    #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 2)
>>>
>>> -extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
>>> +#define RES_NR_HELD 32
>>> +
>>> +struct rqspinlock_held {
>>> +     int cnt;
>>> +     void *locks[RES_NR_HELD];
>>> +};
>>> +
>>> +DECLARE_PER_CPU_ALIGNED(struct rqspinlock_held, rqspinlock_held_locks);
>>> +
>>> +static __always_inline void grab_held_lock_entry(void *lock)
>>> +{
>>> +     int cnt = this_cpu_inc_return(rqspinlock_held_locks.cnt);
>>> +
>>> +     if (unlikely(cnt > RES_NR_HELD)) {
>>> +             /* Still keep the inc so we decrement later. */
>>> +             return;
>>> +     }
>>> +
>>> +     /*
>>> +      * Implied compiler barrier in per-CPU operations; otherwise we can have
>>> +      * the compiler reorder inc with write to table, allowing interrupts to
>>> +      * overwrite and erase our write to the table (as on interrupt exit it
>>> +      * will be reset to NULL).
>>> +      */
>>> +     this_cpu_write(rqspinlock_held_locks.locks[cnt - 1], lock);
>>> +}
>>> +
>>> +/*
>>> + * It is possible to run into misdetection scenarios of AA deadlocks on the same
>>> + * CPU, and missed ABBA deadlocks on remote CPUs when this function pops entries
>>> + * out of order (due to lock A, lock B, unlock A, unlock B) pattern. The correct
>>> + * logic to preserve right entries in the table would be to walk the array of
>>> + * held locks and swap and clear out-of-order entries, but that's too
>>> + * complicated and we don't have a compelling use case for out of order unlocking.
>> Maybe we can pass in the lock and print a warning if out-of-order unlock
>> is being done.
> I think alternatively, I will constrain the verifier in v2 to require
> lock release to be in-order, which would obviate the need to warn at
> runtime and reject programs potentially doing out-of-order unlocks.
> This doesn't cover in-kernel users though, but we're not doing
> out-of-order unlocks with this lock there, and it would be yet another
> branch in the unlock function with little benefit.

That will work too.

Cheers,
Longman
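The table-scanning scheme in the quoted patch description can be sketched in userspace C. Names and the flat per-CPU array are illustrative stand-ins, not the kernel implementation: the bottom entry of each table is the lock being waited for, and earlier entries are locks already held.

```c
#include <stddef.h>

#define RES_NR_HELD	32
#define NR_CPUS		4

struct rqspinlock_held {
	int cnt;
	void *locks[RES_NR_HELD];
};

/* Flat array stands in for the kernel's per-CPU rqspinlock_held_locks. */
static struct rqspinlock_held held[NR_CPUS];

/* AA check: the bottom entry (cnt - 1) is the lock being waited for;
 * entries before it are locks already held on this CPU. */
static int check_aa(int cpu, void *lock)
{
	struct rqspinlock_held *t = &held[cpu];
	int n = t->cnt < RES_NR_HELD ? t->cnt : RES_NR_HELD;

	for (int i = 0; i < n - 1; i++)
		if (t->locks[i] == lock)
			return 1;
	return 0;
}

/* ABBA check: some other CPU holds 'lock' while waiting on a lock that
 * this CPU already holds. */
static int check_abba(int cpu, void *lock)
{
	struct rqspinlock_held *t = &held[cpu];
	int tn = t->cnt < RES_NR_HELD ? t->cnt : RES_NR_HELD;

	for (int c = 0; c < NR_CPUS; c++) {
		struct rqspinlock_held *r = &held[c];
		int rn = r->cnt < RES_NR_HELD ? r->cnt : RES_NR_HELD;
		void *remote_waiting = rn ? r->locks[rn - 1] : NULL;
		int holds = 0;

		if (c == cpu)
			continue;
		for (int i = 0; i < rn - 1; i++)
			if (r->locks[i] == lock)
				holds = 1;
		if (!holds || !remote_waiting)
			continue;
		for (int i = 0; i < tn - 1; i++)
			if (t->locks[i] == remote_waiting)
				return 1;
	}
	return 0;
}
```

The sketch also shows why the out-of-order unlock pattern (lock A, lock B, unlock A) matters: popping entries out of order corrupts the "held before bottom" invariant both checks rely on.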



* Re: [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for CONFIG_PARAVIRT
  2025-01-08 20:32     ` Kumar Kartikeya Dwivedi
@ 2025-01-09  0:48       ` Waiman Long
  2025-01-09  2:42         ` Alexei Starovoitov
  0 siblings, 1 reply; 63+ messages in thread
From: Waiman Long @ 2025-01-09  0:48 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Waiman Long
  Cc: bpf, linux-kernel, Linus Torvalds, Peter Zijlstra,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team


On 1/8/25 3:32 PM, Kumar Kartikeya Dwivedi wrote:
> On Wed, 8 Jan 2025 at 21:57, Waiman Long <llong@redhat.com> wrote:
>> On 1/7/25 8:59 AM, Kumar Kartikeya Dwivedi wrote:
>>> We ripped out PV and virtualization related bits from rqspinlock in an
>>> earlier commit, however, a fair lock performs poorly within a virtual
>>> machine when the lock holder is preempted. As such, retain the
>>> virt_spin_lock fallback to test and set lock, but with timeout and
>>> deadlock detection.
>>>
>>> We don't integrate support for CONFIG_PARAVIRT_SPINLOCKS yet, as that
>>> requires more involved algorithmic changes and introduces more
>>> complexity. It can be done when the need arises in the future.
>> virt_spin_lock() doesn't scale well. It is for hypervisors that don't
>> support PV qspinlock yet. Now rqspinlock() will be in this category.
> We would need to make algorithmic changes to paravirt versions, which
> would be too much for this series, so I didn't go there.
I know. The paravirt part is the most difficult. It took me over a year 
of work on the paravirt part of qspinlock to get it right and merged 
upstream.
>
>> I wonder if we should provide an option to disable rqspinlock and fall
>> back to the regular qspinlock with strict BPF locking semantics.
> That unfortunately won't work, because rqspinlock operates essentially
> like a trylock, where it is allowed to fail and callers must handle
> errors accordingly. Some of the users in BPF (e.g. in patch 17) remove
> their per-cpu nesting counts to rely on AA deadlock detection of
> rqspinlock, which would cause a deadlock if we transparently replace
> it with qspinlock as a fallback.

I see. This information should be documented somewhere.


>> Another question that I have is about PREEMPT_RT kernel which cannot
>> tolerate any locking stall. That will probably require disabling
>> rqspinlock if CONFIG_PREEMPT_RT is enabled.
> I think rqspinlock better maps to the raw spin lock variants, which
> stays as a spin lock on RT kernels, and as you see in patch 17 and 18,
> BPF maps were already using the raw spin lock variants. To avoid
> stalling, we perform deadlock checks immediately when we enter the
> slow path, so for the cases where we rely upon rqspinlock to diagnose
> and report an error, we'll recover quickly. If we still hit the
> timeout it is probably a different problem / bug anyway (and would
> have caused a kernel hang otherwise).

Is the intention to only replace raw_spinlock_t by rqspinlock but never 
spinlock_t? Again, this information needs to be documented. Looking at 
the pdf file, it looks like the rqspinlock usage will be extended over time.

As for the locking semantics allowed by the BPF verifier, is it possible 
to enforce the strict locking rules for PREEMPT_RT kernels and use the 
relaxed semantics for non-PREEMPT_RT kernels? We don't want the loading 
of an arbitrary BPF program to break the latency guarantee of a 
PREEMPT_RT kernel.

Cheers,
Longman





* Re: [PATCH bpf-next v1 14/22] rqspinlock: Add macros for rqspinlock usage
  2025-01-08 20:41     ` Kumar Kartikeya Dwivedi
@ 2025-01-09  1:11       ` Waiman Long
  2025-01-09  3:30         ` Alexei Starovoitov
  0 siblings, 1 reply; 63+ messages in thread
From: Waiman Long @ 2025-01-09  1:11 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Waiman Long
  Cc: bpf, linux-kernel, Linus Torvalds, Peter Zijlstra,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team

On 1/8/25 3:41 PM, Kumar Kartikeya Dwivedi wrote:
> On Wed, 8 Jan 2025 at 22:26, Waiman Long <llong@redhat.com> wrote:
>> On 1/7/25 8:59 AM, Kumar Kartikeya Dwivedi wrote:
>>> Introduce helper macros that wrap around the rqspinlock slow path and
>>> provide an interface analogous to the raw_spin_lock API. Note that
>>> in case of error conditions, preemption and IRQ disabling is
>>> automatically unrolled before returning the error back to the caller.
>>>
>>> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
>>> ---
>>>    include/asm-generic/rqspinlock.h | 58 ++++++++++++++++++++++++++++++++
>>>    1 file changed, 58 insertions(+)
>>>
>>> diff --git a/include/asm-generic/rqspinlock.h b/include/asm-generic/rqspinlock.h
>>> index dc436ab01471..53be8426373c 100644
>>> --- a/include/asm-generic/rqspinlock.h
>>> +++ b/include/asm-generic/rqspinlock.h
>>> @@ -12,8 +12,10 @@
>>>    #include <linux/types.h>
>>>    #include <vdso/time64.h>
>>>    #include <linux/percpu.h>
>>> +#include <asm/qspinlock.h>
>>>
>>>    struct qspinlock;
>>> +typedef struct qspinlock rqspinlock_t;
>>>
>>>    extern int resilient_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val, u64 timeout);
>>>
>>> @@ -82,4 +84,60 @@ static __always_inline void release_held_lock_entry(void)
>>>        this_cpu_dec(rqspinlock_held_locks.cnt);
>>>    }
>>>
>>> +/**
>>> + * res_spin_lock - acquire a queued spinlock
>>> + * @lock: Pointer to queued spinlock structure
>>> + */
>>> +static __always_inline int res_spin_lock(rqspinlock_t *lock)
>>> +{
>>> +     int val = 0;
>>> +
>>> +     if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) {
>>> +             grab_held_lock_entry(lock);
>>> +             return 0;
>>> +     }
>>> +     return resilient_queued_spin_lock_slowpath(lock, val, RES_DEF_TIMEOUT);
>>> +}
>>> +
>>> +static __always_inline void res_spin_unlock(rqspinlock_t *lock)
>>> +{
>>> +     struct rqspinlock_held *rqh = this_cpu_ptr(&rqspinlock_held_locks);
>>> +
>>> +     if (unlikely(rqh->cnt > RES_NR_HELD))
>>> +             goto unlock;
>>> +     WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL);
>>> +     /*
>>> +      * Release barrier, ensuring ordering. See release_held_lock_entry.
>>> +      */
>>> +unlock:
>>> +     queued_spin_unlock(lock);
>>> +     this_cpu_dec(rqspinlock_held_locks.cnt);
>>> +}
>>> +
>>> +#define raw_res_spin_lock_init(lock) ({ *(lock) = (struct qspinlock)__ARCH_SPIN_LOCK_UNLOCKED; })
>>> +
>>> +#define raw_res_spin_lock(lock)                    \
>>> +     ({                                         \
>>> +             int __ret;                         \
>>> +             preempt_disable();                 \
>>> +             __ret = res_spin_lock(lock);       \
>>> +             if (__ret)                         \
>>> +                     preempt_enable();          \
>>> +             __ret;                             \
>>> +     })
>>> +
>>> +#define raw_res_spin_unlock(lock) ({ res_spin_unlock(lock); preempt_enable(); })
>>> +
>>> +#define raw_res_spin_lock_irqsave(lock, flags)    \
>>> +     ({                                        \
>>> +             int __ret;                        \
>>> +             local_irq_save(flags);            \
>>> +             __ret = raw_res_spin_lock(lock);  \
>>> +             if (__ret)                        \
>>> +                     local_irq_restore(flags); \
>>> +             __ret;                            \
>>> +     })
>>> +
>>> +#define raw_res_spin_unlock_irqrestore(lock, flags) ({ raw_res_spin_unlock(lock); local_irq_restore(flags); })
>>> +
>>>    #endif /* __ASM_GENERIC_RQSPINLOCK_H */
>> Lockdep calls aren't included in the helper functions. That means all
>> the *res_spin_lock*() calls will be outside the purview of lockdep. That
>> also means a multi-CPU circular locking dependency involving a mixture
>> of qspinlocks and rqspinlocks may not be detectable.
> Yes, this is true, but I am not sure whether lockdep fits well in this
> case, or how to map its semantics.
> Some BPF users (e.g. in patch 17) expect and rely on rqspinlock to
> return errors on AA deadlocks, as nesting is possible, so we'll get
> false alarms with it. Lockdep also needs to treat rqspinlock as a
> trylock, since it's essentially fallible, and IIUC it skips diagnosing
> in those cases.
Yes, we can certainly treat rqspinlock as a trylock.

> Most of the users use rqspinlock because it is expected a deadlock may
> be constructed at runtime (either due to BPF programs or by attaching
> programs to the kernel), so lockdep splats will not be helpful on
> debug kernels.

In most cases, lockdep will report a cyclic locking dependency 
(potential deadlock) before a real deadlock happens as it requires the 
right combination of events happening in a specific sequence. So lockdep 
can report a deadlock while the runtime check of rqspinlock may not see 
it and there is no locking stall. Also rqspinlock will not see the other 
locks held in the current context.


> Say if a mix of both qspinlock and rqspinlock were involved in an ABBA
> situation, as long as rqspinlock is being acquired on one of the
> threads, it will still timeout even if check_deadlock fails to
> establish presence of a deadlock. This will mean the qspinlock call on
> the other side will make progress as long as the kernel unwinds locks
> correctly on failures (by handling rqspinlock errors and releasing
> held locks on the way out).

That is true only if the latest lock to be acquired is a rqspinlock. If 
all the rqspinlocks in the circular path have already been acquired, no 
unwinding is possible.

That is probably not an issue with the limited rqspinlock conversion in 
this patch series. In the future when more and more locks are converted 
to use rqspinlock, this scenario may happen.

Cheers,
Longman
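Since every rqspinlock acquisition is fallible, callers must follow a try-style discipline: check each return value and unwind already-held locks in reverse order on failure. A minimal userspace sketch of that caller pattern, using single-owner stand-in stubs rather than the real raw_res_spin_lock macros (which additionally manage preemption and IRQ state):

```c
#include <errno.h>

/* Stand-in stubs so the caller pattern compiles; the real kernel
 * macros also unroll preemption/IRQ disabling on error. */
typedef struct { int val; } rqspinlock_t;

static int raw_res_spin_lock(rqspinlock_t *l)
{
	if (l->val)
		return -EDEADLK;	/* pretend deadlock detection fired */
	l->val = 1;
	return 0;
}

static void raw_res_spin_unlock(rqspinlock_t *l)
{
	l->val = 0;
}

/* The discipline the API imposes: check every acquisition, and unwind
 * already-held locks in reverse order on failure. */
static int update_both(rqspinlock_t *a, rqspinlock_t *b)
{
	int ret = raw_res_spin_lock(a);

	if (ret)
		return ret;
	ret = raw_res_spin_lock(b);
	if (ret) {
		raw_res_spin_unlock(a);
		return ret;
	}
	/* ... critical section ... */
	raw_res_spin_unlock(b);
	raw_res_spin_unlock(a);
	return 0;
}
```

This unwinding is what the ABBA discussion above depends on: a waiter that times out releases what it holds on the error path, letting the other side make progress.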



* Re: [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for CONFIG_PARAVIRT
  2025-01-09  0:48       ` Waiman Long
@ 2025-01-09  2:42         ` Alexei Starovoitov
  2025-01-09  2:58           ` Waiman Long
  0 siblings, 1 reply; 63+ messages in thread
From: Alexei Starovoitov @ 2025-01-09  2:42 UTC (permalink / raw)
  To: Waiman Long
  Cc: Kumar Kartikeya Dwivedi, bpf, LKML, Linus Torvalds,
	Peter Zijlstra, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
	Paul E. McKenney, Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim,
	Kernel Team

On Wed, Jan 8, 2025 at 4:48 PM Waiman Long <llong@redhat.com> wrote:
>
> Is the intention to only replace raw_spinlock_t by rqspinlock but never
> spinlock_t?

Correct. We brainstormed whether we can introduce resilient mutex
for sleepable context, but it's way out of scope and PI
considerations are too complex to think through.
rqspinlock is a spinning lock, so it's a replacement for raw_spin_lock
and really only for bpf use cases.

We considered placing rqspinlock.c in kernel/bpf/ directory
to discourage any other use beyond bpf,
but decided to keep it in kernel/locking/ only because
it's using mcs_spinlock.h and qspinlock_stat.h
and doing #include "../locking/mcs_spinlock.h"
is kinda ugly.

Patch 16 does:
+++ b/kernel/locking/Makefile
@@ -24,6 +24,9 @@  obj-$(CONFIG_SMP) += spinlock.o
 obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
 obj-$(CONFIG_PROVE_LOCKING) += spinlock.o
 obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o
+ifeq ($(CONFIG_BPF_SYSCALL),y)
+obj-$(CONFIG_QUEUED_SPINLOCKS) += rqspinlock.o
+endif

so that should give enough of a hint that it's for bpf usage.

> As for the locking semantics allowed by the BPF verifier, is it possible
> to enforce the strict locking rules for PREEMPT_RT kernel and use the
> relaxed semantics for non-PREEMPT_RT kernel. We don't want the loading
> of an arbitrary BPF program to break the latency guarantee of a
> PREEMPT_RT kernel.

Not really.
root can load silly bpf progs that take a significant
amount of time without abusing spinlocks.
Like 100k integer divides or a sequence of thousands of calls to map_update.
Long runtime of broken progs is a known issue.
We're working on a runtime termination check/watchdog that
will detect long running progs and will terminate them.
Safe termination is tricky, as you can imagine.


* Re: [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for CONFIG_PARAVIRT
  2025-01-09  2:42         ` Alexei Starovoitov
@ 2025-01-09  2:58           ` Waiman Long
  2025-01-09  3:37             ` Alexei Starovoitov
  0 siblings, 1 reply; 63+ messages in thread
From: Waiman Long @ 2025-01-09  2:58 UTC (permalink / raw)
  To: Alexei Starovoitov, Waiman Long
  Cc: Kumar Kartikeya Dwivedi, bpf, LKML, Linus Torvalds,
	Peter Zijlstra, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
	Paul E. McKenney, Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim,
	Kernel Team


On 1/8/25 9:42 PM, Alexei Starovoitov wrote:
> On Wed, Jan 8, 2025 at 4:48 PM Waiman Long <llong@redhat.com> wrote:
>> Is the intention to only replace raw_spinlock_t by rqspinlock but never
>> spinlock_t?
> Correct. We brainstormed whether we can introduce resilient mutex
> for sleepable context, but it's way out of scope and PI
> considerations are too complex to think through.
> rqspinlock is a spinning lock, so it's a replacement for raw_spin_lock
> and really only for bpf use cases.
Thanks for the confirmation. I think we should document the fact that 
rqspinlock is a replacement for raw_spin_lock only in the rqspinlock.c 
file to prevent possible abuse in the future.
>
> We considered placing rqspinlock.c in kernel/bpf/ directory
> to discourage any other use beyond bpf,
> but decided to keep in kernel/locking/ only because
> it's using mcs_spinlock.h and qspinlock_stat.h
> and doing #include "../locking/mcs_spinlock.h"
> is kinda ugly.
>
> Patch 16 does:
> +++ b/kernel/locking/Makefile
> @@ -24,6 +24,9 @@  obj-$(CONFIG_SMP) += spinlock.o
>   obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
>   obj-$(CONFIG_PROVE_LOCKING) += spinlock.o
>   obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o
> +ifeq ($(CONFIG_BPF_SYSCALL),y)
> +obj-$(CONFIG_QUEUED_SPINLOCKS) += rqspinlock.o
> +endif
>
> so that should give enough of a hint that it's for bpf usage.
>
>> As for the locking semantics allowed by the BPF verifier, is it possible
>> to enforce the strict locking rules for PREEMPT_RT kernel and use the
>> relaxed semantics for non-PREEMPT_RT kernel. We don't want the loading
>> of an arbitrary BPF program to break the latency guarantee of a
>> PREEMPT_RT kernel.
> Not really.
> root can load silly bpf progs that take significant
> amount time without abusing spinlocks.
> Like 100k integer divides or a sequence of thousands of calls to map_update.
> Long runtime of broken progs is a known issue.
> We're working on a runtime termination check/watchdog that
> will detect long running progs and will terminate them.
> Safe termination is tricky, as you can imagine.

Right.

In that case, we just have to warn users that they load BPF progs at 
their own risk and a PREEMPT_RT kernel may break its latency guarantee.

Thanks,
Longman



* Re: [PATCH bpf-next v1 14/22] rqspinlock: Add macros for rqspinlock usage
  2025-01-09  1:11       ` Waiman Long
@ 2025-01-09  3:30         ` Alexei Starovoitov
  2025-01-09  4:09           ` Waiman Long
  0 siblings, 1 reply; 63+ messages in thread
From: Alexei Starovoitov @ 2025-01-09  3:30 UTC (permalink / raw)
  To: Waiman Long
  Cc: Kumar Kartikeya Dwivedi, bpf, LKML, Linus Torvalds,
	Peter Zijlstra, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
	Paul E. McKenney, Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim,
	Kernel Team

On Wed, Jan 8, 2025 at 5:11 PM Waiman Long <llong@redhat.com> wrote:
>
>
> > Most of the users use rqspinlock because it is expected a deadlock may
> > be constructed at runtime (either due to BPF programs or by attaching
> > programs to the kernel), so lockdep splats will not be helpful on
> > debug kernels.
>
> In most cases, lockdep will report a cyclic locking dependency
> (potential deadlock) before a real deadlock happens as it requires the
> right combination of events happening in a specific sequence. So lockdep
> can report a deadlock while the runtime check of rqspinlock may not see
> it and there is no locking stall. Also rqspinlock will not see the other
> locks held in the current context.
>
>
> > Say if a mix of both qspinlock and rqspinlock were involved in an ABBA
> > situation, as long as rqspinlock is being acquired on one of the
> > threads, it will still timeout even if check_deadlock fails to
> > establish presence of a deadlock. This will mean the qspinlock call on
> > the other side will make progress as long as the kernel unwinds locks
> > correctly on failures (by handling rqspinlock errors and releasing
> > held locks on the way out).
>
> That is true only if the latest lock to be acquired is a rqspinlock. If
> all the rqspinlocks in the circular path have already been acquired, no
> unwinding is possible.

There is no 'last lock'. If it's not an AA deadlock, more than one
CPU is spinning. In a hypothetical mix of rqspinlocks
and regular raw_spinlocks, at least one CPU will be spinning on an
rqspinlock and, despite the missing entries in the lock table, it will
still exit by timeout. The execution will continue and eventually
all locks will be released.

We considered annotating rqspinlock as trylock with
raw_spin_lock_init lock class, but usefulness is quite limited.
It's trylock only. So it may appear in a circular dependency
only if it's a combination of raw_spin_locks and rqspinlocks
which is not supposed to ever happen once we convert all bpf inner
parts to rqspinlock.
Patches 17,18,19 convert the main offenders. Few remain
that need a bit more thinking.
At the end all locks at the leaves will be rqspinlocks and
no normal locks will be taken after
(unless NMIs are doing silly things).
And since rqspinlock is a trylock, lockdep will never complain
on rqspinlock.
Even if NMI handler is buggy it's unlikely that NMI's raw_spin_lock
is in a circular dependency with rqspinlock on bpf side.
So rqspinlock entries will add computational
overhead to the lockdep engine to filter out, and not much more.

This all assumes that rqspinlocks are limited to bpf, of course.

If rqspinlock has use cases beyond bpf then, sure, let's add
trylock lockdep annotations.

Note that if there is an actual bug on bpf side with rqspinlock usage
it will be reported even when lockdep is off.
This is patch 13.
Currently it's pr_info() of held rqspinlocks and dumpstack,
but in the future we plan to make it better consumable by bpf
side. Printing into something like a special trace_pipe.
This is tbd.

> That is probably not an issue with the limited rqspinlock conversion in
> this patch series. In the future when more and more locks are converted
> to use rqspinlock, this scenario may happen.

The rqspinlock usage should be limited to bpf and no other
normal lock should be taken after.
At least that was the intent.
If folks feel that it's useful beyond bpf then we need to think harder.
lockdep annotations is an easy part to add.


* Re: [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for CONFIG_PARAVIRT
  2025-01-09  2:58           ` Waiman Long
@ 2025-01-09  3:37             ` Alexei Starovoitov
  2025-01-09  3:46               ` Waiman Long
  0 siblings, 1 reply; 63+ messages in thread
From: Alexei Starovoitov @ 2025-01-09  3:37 UTC (permalink / raw)
  To: Waiman Long
  Cc: Kumar Kartikeya Dwivedi, bpf, LKML, Linus Torvalds,
	Peter Zijlstra, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
	Paul E. McKenney, Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim,
	Kernel Team

On Wed, Jan 8, 2025 at 6:58 PM Waiman Long <llong@redhat.com> wrote:
>
>
> On 1/8/25 9:42 PM, Alexei Starovoitov wrote:
> > On Wed, Jan 8, 2025 at 4:48 PM Waiman Long <llong@redhat.com> wrote:
> >> Is the intention to only replace raw_spinlock_t by rqspinlock but never
> >> spinlock_t?
> > Correct. We brainstormed whether we can introduce resilient mutex
> > for sleepable context, but it's way out of scope and PI
> > considerations are too complex to think through.
> > rqspinlock is a spinning lock, so it's a replacement for raw_spin_lock
> > and really only for bpf use cases.
> Thank for the confirmation. I think we should document the fact that
> rqspinlock is a replacement for raw_spin_lock only in the rqspinlock.c
> file to prevent possible abuse in the future.

Agreed.

> >
> > We considered placing rqspinlock.c in kernel/bpf/ directory
> > to discourage any other use beyond bpf,
> > but decided to keep in kernel/locking/ only because
> > it's using mcs_spinlock.h and qspinlock_stat.h
> > and doing #include "../locking/mcs_spinlock.h"
> > is kinda ugly.
> >
> > Patch 16 does:
> > +++ b/kernel/locking/Makefile
> > @@ -24,6 +24,9 @@  obj-$(CONFIG_SMP) += spinlock.o
> >   obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
> >   obj-$(CONFIG_PROVE_LOCKING) += spinlock.o
> >   obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o
> > +ifeq ($(CONFIG_BPF_SYSCALL),y)
> > +obj-$(CONFIG_QUEUED_SPINLOCKS) += rqspinlock.o
> > +endif
> >
> > so that should give enough of a hint that it's for bpf usage.
> >
> >> As for the locking semantics allowed by the BPF verifier, is it possible
> >> to enforce the strict locking rules for PREEMPT_RT kernel and use the
> >> relaxed semantics for non-PREEMPT_RT kernel. We don't want the loading
> >> of an arbitrary BPF program to break the latency guarantee of a
> >> PREEMPT_RT kernel.
> > Not really.
> > root can load silly bpf progs that take significant
> > amount time without abusing spinlocks.
> > Like 100k integer divides or a sequence of thousands of calls to map_update.
> > Long runtime of broken progs is a known issue.
> > We're working on a runtime termination check/watchdog that
> > will detect long running progs and will terminate them.
> > Safe termination is tricky, as you can imagine.
>
> Right.
>
> In that case, we just have to warn users that they can load BPF prog at
> their own risk and PREEMPT_RT kernel may break its latency guarantee.

Let's not open this can of worms.
There will be a proper watchdog eventually.
If we start to warn, when do we warn? On any bpf program loaded?
How about classic BPF? tcpdump and seccomp? They are limited
to 4k instructions, but folks can abuse that too.


* Re: [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for CONFIG_PARAVIRT
  2025-01-09  3:37             ` Alexei Starovoitov
@ 2025-01-09  3:46               ` Waiman Long
  2025-01-09  3:53                 ` Alexei Starovoitov
  0 siblings, 1 reply; 63+ messages in thread
From: Waiman Long @ 2025-01-09  3:46 UTC (permalink / raw)
  To: Alexei Starovoitov, Waiman Long
  Cc: Kumar Kartikeya Dwivedi, bpf, LKML, Linus Torvalds,
	Peter Zijlstra, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
	Paul E. McKenney, Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim,
	Kernel Team

On 1/8/25 10:37 PM, Alexei Starovoitov wrote:
> On Wed, Jan 8, 2025 at 6:58 PM Waiman Long <llong@redhat.com> wrote:
>>
>> On 1/8/25 9:42 PM, Alexei Starovoitov wrote:
>>> On Wed, Jan 8, 2025 at 4:48 PM Waiman Long <llong@redhat.com> wrote:
>>>> Is the intention to only replace raw_spinlock_t by rqspinlock but never
>>>> spinlock_t?
>>> Correct. We brainstormed whether we can introduce resilient mutex
>>> for sleepable context, but it's way out of scope and PI
>>> considerations are too complex to think through.
>>> rqspinlock is a spinning lock, so it's a replacement for raw_spin_lock
>>> and really only for bpf use cases.
>> Thank for the confirmation. I think we should document the fact that
>> rqspinlock is a replacement for raw_spin_lock only in the rqspinlock.c
>> file to prevent possible abuse in the future.
> Agreed.
>
>>> We considered placing rqspinlock.c in kernel/bpf/ directory
>>> to discourage any other use beyond bpf,
>>> but decided to keep in kernel/locking/ only because
>>> it's using mcs_spinlock.h and qspinlock_stat.h
>>> and doing #include "../locking/mcs_spinlock.h"
>>> is kinda ugly.
>>>
>>> Patch 16 does:
>>> +++ b/kernel/locking/Makefile
>>> @@ -24,6 +24,9 @@  obj-$(CONFIG_SMP) += spinlock.o
>>>    obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
>>>    obj-$(CONFIG_PROVE_LOCKING) += spinlock.o
>>>    obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o
>>> +ifeq ($(CONFIG_BPF_SYSCALL),y)
>>> +obj-$(CONFIG_QUEUED_SPINLOCKS) += rqspinlock.o
>>> +endif
>>>
>>> so that should give enough of a hint that it's for bpf usage.
>>>
>>>> As for the locking semantics allowed by the BPF verifier, is it possible
>>>> to enforce the strict locking rules for PREEMPT_RT kernel and use the
>>>> relaxed semantics for non-PREEMPT_RT kernel. We don't want the loading
>>>> of an arbitrary BPF program to break the latency guarantee of a
>>>> PREEMPT_RT kernel.
>>> Not really.
>>> root can load silly bpf progs that take significant
>>> amount time without abusing spinlocks.
>>> Like 100k integer divides or a sequence of thousands of calls to map_update.
>>> Long runtime of broken progs is a known issue.
>>> We're working on a runtime termination check/watchdog that
>>> will detect long running progs and will terminate them.
>>> Safe termination is tricky, as you can imagine.
>> Right.
>>
>> In that case, we just have to warn users that they can load BPF prog at
>> their own risk and PREEMPT_RT kernel may break its latency guarantee.
> Let's not open this can of worms.
> There will be a proper watchdog eventually.
> If we start to warn, when do we warn? On any bpf program loaded?
> How about classic BPF ? tcpdump and seccomp ? They are limited
> to 4k instructions, but folks can abuse that too.

My intention is to document this somewhere, not to print out a warning 
in the kernel dmesg log.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for CONFIG_PARAVIRT
  2025-01-09  3:46               ` Waiman Long
@ 2025-01-09  3:53                 ` Alexei Starovoitov
  2025-01-09  3:58                   ` Waiman Long
  0 siblings, 1 reply; 63+ messages in thread
From: Alexei Starovoitov @ 2025-01-09  3:53 UTC (permalink / raw)
  To: Waiman Long
  Cc: Kumar Kartikeya Dwivedi, bpf, LKML, Linus Torvalds,
	Peter Zijlstra, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
	Paul E. McKenney, Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim,
	Kernel Team

On Wed, Jan 8, 2025 at 7:46 PM Waiman Long <llong@redhat.com> wrote:
>
> >>>> As for the locking semantics allowed by the BPF verifier, is it possible
> >>>> to enforce the strict locking rules for PREEMPT_RT kernel and use the
> >>>> relaxed semantics for non-PREEMPT_RT kernel. We don't want the loading
> >>>> of an arbitrary BPF program to break the latency guarantee of a
> >>>> PREEMPT_RT kernel.
> >>> Not really.
> >>> root can load silly bpf progs that take significant
> >>> amount time without abusing spinlocks.
> >>> Like 100k integer divides or a sequence of thousands of calls to map_update.
> >>> Long runtime of broken progs is a known issue.
> >>> We're working on a runtime termination check/watchdog that
> >>> will detect long running progs and will terminate them.
> >>> Safe termination is tricky, as you can imagine.
> >> Right.
> >>
> >> In that case, we just have to warn users that they can load BPF prog at
> >> their own risk and PREEMPT_RT kernel may break its latency guarantee.
> > Let's not open this can of worms.
> > There will be a proper watchdog eventually.
> > If we start to warn, when do we warn? On any bpf program loaded?
> > How about classic BPF ? tcpdump and seccomp ? They are limited
> > to 4k instructions, but folks can abuse that too.
>
> My intention is to document this somewhere, not to print out a warning
> in the kernel dmesg log.

Document what exactly?
"Loading arbitrary BPF program may break the latency guarantee of PREEMPT_RT"
?
That's not helpful to anyone.
Especially it undermines the giant effort we did together
with RT folks to make bpf behave well on RT.
For a long time bpf was the only user of migrate_disable().
Some of XDP bits got friendly to RT only in the last release. Etc.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for CONFIG_PARAVIRT
  2025-01-09  3:53                 ` Alexei Starovoitov
@ 2025-01-09  3:58                   ` Waiman Long
  0 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2025-01-09  3:58 UTC (permalink / raw)
  To: Alexei Starovoitov, Waiman Long
  Cc: Kumar Kartikeya Dwivedi, bpf, LKML, Linus Torvalds,
	Peter Zijlstra, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
	Paul E. McKenney, Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim,
	Kernel Team

On 1/8/25 10:53 PM, Alexei Starovoitov wrote:
> On Wed, Jan 8, 2025 at 7:46 PM Waiman Long <llong@redhat.com> wrote:
>>>>>> As for the locking semantics allowed by the BPF verifier, is it possible
>>>>>> to enforce the strict locking rules for PREEMPT_RT kernel and use the
>>>>>> relaxed semantics for non-PREEMPT_RT kernel. We don't want the loading
>>>>>> of an arbitrary BPF program to break the latency guarantee of a
>>>>>> PREEMPT_RT kernel.
>>>>> Not really.
>>>>> root can load silly bpf progs that take significant
>>>>> amount time without abusing spinlocks.
>>>>> Like 100k integer divides or a sequence of thousands of calls to map_update.
>>>>> Long runtime of broken progs is a known issue.
>>>>> We're working on a runtime termination check/watchdog that
>>>>> will detect long running progs and will terminate them.
>>>>> Safe termination is tricky, as you can imagine.
>>>> Right.
>>>>
>>>> In that case, we just have to warn users that they can load BPF prog at
>>>> their own risk and PREEMPT_RT kernel may break its latency guarantee.
>>> Let's not open this can of worms.
>>> There will be a proper watchdog eventually.
>>> If we start to warn, when do we warn? On any bpf program loaded?
>>> How about classic BPF ? tcpdump and seccomp ? They are limited
>>> to 4k instructions, but folks can abuse that too.
>> My intention is to document this somewhere, not to print out a warning
>> in the kernel dmesg log.
> Document what exactly?
> "Loading arbitrary BPF program may break the latency guarantee of PREEMPT_RT"
> ?
> That's not helpful to anyone.
> Especially it undermines the giant effort we did together
> with RT folks to make bpf behave well on RT.
> For a long time bpf was the only user of migrate_disable().
> Some of XDP bits got friendly to RT only in the last release. Etc.

OK, it is just a suggestion. If you don't think that is necessary, I am 
not going to insist. Anyway, users should thoroughly test their BPF 
programs before deploying them on production systems.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 14/22] rqspinlock: Add macros for rqspinlock usage
  2025-01-09  3:30         ` Alexei Starovoitov
@ 2025-01-09  4:09           ` Waiman Long
  0 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2025-01-09  4:09 UTC (permalink / raw)
  To: Alexei Starovoitov, Waiman Long
  Cc: Kumar Kartikeya Dwivedi, bpf, LKML, Linus Torvalds,
	Peter Zijlstra, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
	Paul E. McKenney, Tejun Heo, Barret Rhoden, Josh Don, Dohyun Kim,
	Kernel Team

On 1/8/25 10:30 PM, Alexei Starovoitov wrote:
> On Wed, Jan 8, 2025 at 5:11 PM Waiman Long <llong@redhat.com> wrote:
>>
>>> Most of the users use rqspinlock because it is expected a deadlock may
>>> be constructed at runtime (either due to BPF programs or by attaching
>>> programs to the kernel), so lockdep splats will not be helpful on
>>> debug kernels.
>> In most cases, lockdep will report a cyclic locking dependency
>> (potential deadlock) before a real deadlock happens as it requires the
>> right combination of events happening in a specific sequence. So lockdep
>> can report a deadlock while the runtime check of rqspinlock may not see
>> it and there is no locking stall. Also rqspinlock will not see the other
>> locks held in the current context.
>>
>>
>>> Say if a mix of both qspinlock and rqspinlock were involved in an ABBA
>>> situation, as long as rqspinlock is being acquired on one of the
>>> threads, it will still timeout even if check_deadlock fails to
>>> establish presence of a deadlock. This will mean the qspinlock call on
>>> the other side will make progress as long as the kernel unwinds locks
>>> correctly on failures (by handling rqspinlock errors and releasing
>>> held locks on the way out).
>> That is true only if the latest lock to be acquired is an rqspinlock. If
>> all the rqspinlocks in the circular path have already been acquired, no
>> unwinding is possible.
> There is no 'last lock'. If it's not an AA deadlock there are more
> than 1 cpu that are spinning. In a hypothetical mix of rqspinlocks
> and regular raw_spinlocks at least one cpu will be spinning on
> rqspinlock and despite missing the entries in the lock table it will
> still exit by timeout. The execution will continue and eventually
> all locks will be released.
>
> We considered annotating rqspinlock as trylock with
> raw_spin_lock_init lock class, but usefulness is quite limited.
> It's trylock only. So it may appear in a circular dependency
> only if it's a combination of raw_spin_locks and rqspinlocks
> which is not supposed to ever happen once we convert all bpf inner
> parts to rqspinlock.
> Patches 17,18,19 convert the main offenders. Few remain
> that need a bit more thinking.
> At the end all locks at the leaves will be rqspinlocks and
> no normal locks will be taken after
> (unless NMIs are doing silly things).
> And since rqspinlock is a trylock, lockdep will never complain
> on rqspinlock.
> Even if NMI handler is buggy it's unlikely that NMI's raw_spin_lock
> is in a circular dependency with rqspinlock on bpf side.
> So rqspinlock entries will be adding computational
> overhead to lockdep engine to filter out and not much more.
>
> This all assumes that rqspinlocks are limited to bpf, of course.
>
> If rqspinlock has use cases beyond bpf then, sure, let's add
> trylock lockdep annotations.
>
> Note that if there is an actual bug on bpf side with rqspinlock usage
> it will be reported even when lockdep is off.
> This is patch 13.
> Currently it's pr_info() of held rqspinlocks and dumpstack,
> but in the future we plan to make it better consumable by bpf
> side. Printing into something like a special trace_pipe.
> This is tbd.

If rqspinlock is only limited to within the BPF core and BPF progs and 
won't call out to other subsystems that may acquire other 
raw_spinlock's, lockdep may not be needed. Once the scope is extended 
beyond that, we certainly need to have lockdep enabled. Again, this has 
to be clearly documented.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock
  2025-01-08 20:12     ` Kumar Kartikeya Dwivedi
  2025-01-08 20:30       ` Linus Torvalds
@ 2025-01-09 13:59       ` Waiman Long
  2025-01-09 21:13         ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 63+ messages in thread
From: Waiman Long @ 2025-01-09 13:59 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Peter Zijlstra
  Cc: Linus Torvalds, Will Deacon, bpf, linux-kernel, Waiman Long,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team

On 1/8/25 3:12 PM, Kumar Kartikeya Dwivedi wrote:
> On Wed, 8 Jan 2025 at 14:48, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Tue, Jan 07, 2025 at 03:54:36PM -0800, Linus Torvalds wrote:
>>> On Tue, 7 Jan 2025 at 06:00, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>>>> This patch set introduces Resilient Queued Spin Lock (or rqspinlock with
>>>> res_spin_lock() and res_spin_unlock() APIs).
>>> So when I see people doing new locking mechanisms, I invariably go "Oh no!".
>>>
>>> But this series seems reasonable to me. I see that PeterZ had a couple
>>> of minor comments (well, the arm64 one is more fundamental), which
>>> hopefully means that it seems reasonable to him too. Peter?
>> I've not had time to fully read the whole thing yet, I only did a quick
>> once over. I'll try and get around to doing a proper reading eventually,
>> but I'm chasing a regression atm, and then I need to go review a ton of
>> code Andrew merged over the xmas/newyears holiday :/
>>
>> One potential issue is that qspinlock isn't suitable for all
>> architectures -- and I've yet to figure out widely BPF is planning on
>> using this.
> For architectures where qspinlock is not available, I think we can
> have a fallback to a test and set lock with timeout and deadlock
> checks, like patch 12.
> We plan on using this in BPF core and BPF maps, so the usage will be
> pervasive, and we have at least one architecture in CI (s390) which
> doesn't have ARCH_USE_QUEUED_SPINLOCKS selected, so we should have
> coverage for both cases. For now the fallback is missing, but I will
> add one in v2.

Even though ARCH_USE_QUEUED_SPINLOCKS isn't set for s390, it is actually 
using its own variant of qspinlock which encodes in the lock word 
additional information needed by the architecture. Similarly for PPC.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock
  2025-01-09 13:59       ` Waiman Long
@ 2025-01-09 21:13         ` Kumar Kartikeya Dwivedi
  2025-01-09 21:18           ` Waiman Long
  0 siblings, 1 reply; 63+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-01-09 21:13 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Linus Torvalds, Will Deacon, bpf, linux-kernel,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team

On Thu, 9 Jan 2025 at 19:29, Waiman Long <llong@redhat.com> wrote:
>
> On 1/8/25 3:12 PM, Kumar Kartikeya Dwivedi wrote:
> > On Wed, 8 Jan 2025 at 14:48, Peter Zijlstra <peterz@infradead.org> wrote:
> >> On Tue, Jan 07, 2025 at 03:54:36PM -0800, Linus Torvalds wrote:
> >>> On Tue, 7 Jan 2025 at 06:00, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >>>> This patch set introduces Resilient Queued Spin Lock (or rqspinlock with
> >>>> res_spin_lock() and res_spin_unlock() APIs).
> >>> So when I see people doing new locking mechanisms, I invariably go "Oh no!".
> >>>
> >>> But this series seems reasonable to me. I see that PeterZ had a couple
> >>> of minor comments (well, the arm64 one is more fundamental), which
> >>> hopefully means that it seems reasonable to him too. Peter?
> >> I've not had time to fully read the whole thing yet, I only did a quick
> >> once over. I'll try and get around to doing a proper reading eventually,
> >> but I'm chasing a regression atm, and then I need to go review a ton of
> >> code Andrew merged over the xmas/newyears holiday :/
> >>
> >> One potential issue is that qspinlock isn't suitable for all
> >> architectures -- and I've yet to figure out how widely BPF is planning on
> >> using this.
> > For architectures where qspinlock is not available, I think we can
> > have a fallback to a test and set lock with timeout and deadlock
> > checks, like patch 12.
> > We plan on using this in BPF core and BPF maps, so the usage will be
> > pervasive, and we have at least one architecture in CI (s390) which
> > doesn't have ARCH_USE_QUEUED_SPINLOCKS selected, so we should have
> > coverage for both cases. For now the fallback is missing, but I will
> > add one in v2.
>
> Even though ARCH_USE_QUEUED_SPINLOCKS isn't set for s390, it is actually
> using its own variant of qspinlock which encodes in the lock word
> additional information needed by the architecture. Similarly for PPC.

Thanks, I see that now. It seems it is pretty similar to the paravirt
scenario, where the algorithm would require changes to accommodate
rqspinlock bits.
For this series, I am planning to stick to a default TAS fallback, but
we can tackle these cases together in a follow up.
This series is already quite big and it would be better to focus on
the base rqspinlock bits to keep things reviewable.
Given we're only using this in BPF right now (in specific places where
we're mindful we may fall back to TAS on some arches), we won't be
regressing any other users.

>
> Cheers,
> Longman
>

^ permalink raw reply	[flat|nested] 63+ messages in thread
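A minimal user-space model of the test-and-set fallback with a timeout and deadlock checks, as described above, might look as follows. Everything here is illustrative: the names (`tas_lock_timeout`, the `held[]` table) are invented for the sketch, the iteration bound stands in for the real timeout, and a single static table models one CPU's held-locks list rather than the actual per-CPU implementation.

```c
#include <stdatomic.h>
#include <errno.h>

#define MAX_HELD 4

typedef struct { atomic_int val; } rqspinlock_t;

/* Model of one CPU's table of currently held locks, consulted
 * for AA deadlock detection before spinning. */
static rqspinlock_t *held[MAX_HELD];
static int nr_held;

static int tas_lock_timeout(rqspinlock_t *lock)
{
	/* AA check: already holding this lock in this context? */
	for (int i = 0; i < nr_held; i++)
		if (held[i] == lock)
			return -EDEADLK;

	/* Bounded test-and-set loop standing in for the timeout. */
	for (int i = 0; i < 1000000; i++) {
		int expected = 0;

		if (atomic_compare_exchange_weak(&lock->val, &expected, 1)) {
			if (nr_held < MAX_HELD)
				held[nr_held++] = lock;
			return 0;
		}
	}
	return -ETIMEDOUT;
}

static void tas_unlock(rqspinlock_t *lock)
{
	/* Drop the entry from the held-locks table, then release. */
	for (int i = 0; i < nr_held; i++) {
		if (held[i] == lock) {
			held[i] = held[--nr_held];
			break;
		}
	}
	atomic_store(&lock->val, 0);
}
```

With this shape, an AA attempt fails fast with -EDEADLK before any spinning, while contention against another CPU degrades to a bounded spin that fails with -ETIMEDOUT instead of hanging, which is the resilience property the fallback needs to preserve.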

* Re: [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock
  2025-01-09 21:13         ` Kumar Kartikeya Dwivedi
@ 2025-01-09 21:18           ` Waiman Long
  0 siblings, 0 replies; 63+ messages in thread
From: Waiman Long @ 2025-01-09 21:18 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Waiman Long
  Cc: Peter Zijlstra, Linus Torvalds, Will Deacon, bpf, linux-kernel,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, Paul E. McKenney, Tejun Heo,
	Barret Rhoden, Josh Don, Dohyun Kim, kernel-team

On 1/9/25 4:13 PM, Kumar Kartikeya Dwivedi wrote:
> On Thu, 9 Jan 2025 at 19:29, Waiman Long <llong@redhat.com> wrote:
>> On 1/8/25 3:12 PM, Kumar Kartikeya Dwivedi wrote:
>>> On Wed, 8 Jan 2025 at 14:48, Peter Zijlstra <peterz@infradead.org> wrote:
>>>> On Tue, Jan 07, 2025 at 03:54:36PM -0800, Linus Torvalds wrote:
>>>>> On Tue, 7 Jan 2025 at 06:00, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>>>>>> This patch set introduces Resilient Queued Spin Lock (or rqspinlock with
>>>>>> res_spin_lock() and res_spin_unlock() APIs).
>>>>> So when I see people doing new locking mechanisms, I invariably go "Oh no!".
>>>>>
>>>>> But this series seems reasonable to me. I see that PeterZ had a couple
>>>>> of minor comments (well, the arm64 one is more fundamental), which
>>>>> hopefully means that it seems reasonable to him too. Peter?
>>>> I've not had time to fully read the whole thing yet, I only did a quick
>>>> once over. I'll try and get around to doing a proper reading eventually,
>>>> but I'm chasing a regression atm, and then I need to go review a ton of
>>>> code Andrew merged over the xmas/newyears holiday :/
>>>>
>>>> One potential issue is that qspinlock isn't suitable for all
>>>> architectures -- and I've yet to figure out how widely BPF is planning on
>>>> using this.
>>> For architectures where qspinlock is not available, I think we can
>>> have a fallback to a test and set lock with timeout and deadlock
>>> checks, like patch 12.
>>> We plan on using this in BPF core and BPF maps, so the usage will be
>>> pervasive, and we have at least one architecture in CI (s390) which
>>> doesn't have ARCH_USE_QUEUED_SPINLOCKS selected, so we should have
>>> coverage for both cases. For now the fallback is missing, but I will
>>> add one in v2.
>> Even though ARCH_USE_QUEUED_SPINLOCKS isn't set for s390, it is actually
>> using its own variant of qspinlock which encodes in the lock word
>> additional information needed by the architecture. Similarly for PPC.
> Thanks, I see that now. It seems it is pretty similar to the paravirt
> scenario, where the algorithm would require changes to accommodate
> rqspinlock bits.
> For this series, I am planning to stick to a default TAS fallback, but
> we can tackle these cases together in a follow up.
> This series is already quite big and it would be better to focus on
> the base rqspinlock bits to keep things reviewable.
> Given we're only using this in BPF right now (in specific places where
> we're mindful we may fall back to TAS on some arches), we won't be
> regressing any other users.

I am not saying that you have to deal with that for the current patch 
series. However, it is something we need to tackle in the long run.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2025-01-09 21:18 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-07 13:59 [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
2025-01-07 13:59 ` [PATCH bpf-next v1 01/22] locking: Move MCS struct definition to public header Kumar Kartikeya Dwivedi
2025-01-07 13:59 ` [PATCH bpf-next v1 02/22] locking: Move common qspinlock helpers to a private header Kumar Kartikeya Dwivedi
2025-01-07 13:59 ` [PATCH bpf-next v1 03/22] locking: Allow obtaining result of arch_mcs_spin_lock_contended Kumar Kartikeya Dwivedi
2025-01-07 13:59 ` [PATCH bpf-next v1 04/22] locking: Copy out qspinlock.c to rqspinlock.c Kumar Kartikeya Dwivedi
2025-01-07 13:59 ` [PATCH bpf-next v1 05/22] rqspinlock: Add rqspinlock.h header Kumar Kartikeya Dwivedi
2025-01-07 13:59 ` [PATCH bpf-next v1 06/22] rqspinlock: Drop PV and virtualization support Kumar Kartikeya Dwivedi
2025-01-07 13:59 ` [PATCH bpf-next v1 07/22] rqspinlock: Add support for timeouts Kumar Kartikeya Dwivedi
2025-01-07 14:50   ` Peter Zijlstra
2025-01-07 17:14     ` Kumar Kartikeya Dwivedi
2025-01-07 13:59 ` [PATCH bpf-next v1 08/22] rqspinlock: Protect pending bit owners from stalls Kumar Kartikeya Dwivedi
2025-01-07 14:51   ` Peter Zijlstra
2025-01-07 17:14     ` Kumar Kartikeya Dwivedi
2025-01-07 19:17       ` Peter Zijlstra
2025-01-07 19:22         ` Peter Zijlstra
2025-01-07 19:54           ` Kumar Kartikeya Dwivedi
2025-01-08  2:19   ` Waiman Long
2025-01-08 20:13     ` Kumar Kartikeya Dwivedi
2025-01-07 13:59 ` [PATCH bpf-next v1 09/22] rqspinlock: Protect waiters in queue " Kumar Kartikeya Dwivedi
2025-01-08  3:38   ` Waiman Long
2025-01-08 20:42     ` Kumar Kartikeya Dwivedi
2025-01-07 13:59 ` [PATCH bpf-next v1 10/22] rqspinlock: Protect waiters in trylock fallback " Kumar Kartikeya Dwivedi
2025-01-07 13:59 ` [PATCH bpf-next v1 11/22] rqspinlock: Add deadlock detection and recovery Kumar Kartikeya Dwivedi
2025-01-08 16:06   ` Waiman Long
2025-01-08 20:19     ` Kumar Kartikeya Dwivedi
2025-01-09  0:32       ` Waiman Long
2025-01-07 13:59 ` [PATCH bpf-next v1 12/22] rqspinlock: Add basic support for CONFIG_PARAVIRT Kumar Kartikeya Dwivedi
2025-01-08 16:27   ` Waiman Long
2025-01-08 20:32     ` Kumar Kartikeya Dwivedi
2025-01-09  0:48       ` Waiman Long
2025-01-09  2:42         ` Alexei Starovoitov
2025-01-09  2:58           ` Waiman Long
2025-01-09  3:37             ` Alexei Starovoitov
2025-01-09  3:46               ` Waiman Long
2025-01-09  3:53                 ` Alexei Starovoitov
2025-01-09  3:58                   ` Waiman Long
2025-01-07 13:59 ` [PATCH bpf-next v1 13/22] rqspinlock: Add helper to print a splat on timeout or deadlock Kumar Kartikeya Dwivedi
2025-01-07 13:59 ` [PATCH bpf-next v1 14/22] rqspinlock: Add macros for rqspinlock usage Kumar Kartikeya Dwivedi
2025-01-08 16:55   ` Waiman Long
2025-01-08 20:41     ` Kumar Kartikeya Dwivedi
2025-01-09  1:11       ` Waiman Long
2025-01-09  3:30         ` Alexei Starovoitov
2025-01-09  4:09           ` Waiman Long
2025-01-07 13:59 ` [PATCH bpf-next v1 15/22] rqspinlock: Add locktorture support Kumar Kartikeya Dwivedi
2025-01-07 13:59 ` [PATCH bpf-next v1 16/22] rqspinlock: Add entry to Makefile, MAINTAINERS Kumar Kartikeya Dwivedi
2025-01-07 13:59 ` [PATCH bpf-next v1 17/22] bpf: Convert hashtab.c to rqspinlock Kumar Kartikeya Dwivedi
2025-01-07 14:00 ` [PATCH bpf-next v1 18/22] bpf: Convert percpu_freelist.c " Kumar Kartikeya Dwivedi
2025-01-07 14:00 ` [PATCH bpf-next v1 19/22] bpf: Convert lpm_trie.c " Kumar Kartikeya Dwivedi
2025-01-07 14:00 ` [PATCH bpf-next v1 20/22] bpf: Introduce rqspinlock kfuncs Kumar Kartikeya Dwivedi
2025-01-08 10:23   ` kernel test robot
2025-01-08 10:23   ` kernel test robot
2025-01-08 10:44   ` kernel test robot
2025-01-07 14:00 ` [PATCH bpf-next v1 21/22] bpf: Implement verifier support for rqspinlock Kumar Kartikeya Dwivedi
2025-01-07 14:00 ` [PATCH bpf-next v1 22/22] selftests/bpf: Add tests " Kumar Kartikeya Dwivedi
2025-01-07 23:54 ` [PATCH bpf-next v1 00/22] Resilient Queued Spin Lock Linus Torvalds
2025-01-08  9:18   ` Peter Zijlstra
2025-01-08 20:12     ` Kumar Kartikeya Dwivedi
2025-01-08 20:30       ` Linus Torvalds
2025-01-08 21:06         ` Kumar Kartikeya Dwivedi
2025-01-08 21:30         ` Paul E. McKenney
2025-01-09 13:59       ` Waiman Long
2025-01-09 21:13         ` Kumar Kartikeya Dwivedi
2025-01-09 21:18           ` Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox