* [PATCH] smp: Use release stores for csd_lock_record() state @ 2026-06-17 21:20 Usama Arif 2026-06-18 1:44 ` Alan Stern 0 siblings, 1 reply; 7+ messages in thread From: Usama Arif @ 2026-06-17 21:20 UTC (permalink / raw) To: lkmm, joelagnelf, linux-kernel, marco.crivellari, paulmck, rafael.j.wysocki, rdunlap, riel, sshegde, tglx, ulfh, usama.arif, yury.norov, rcu Cc: shakeel.butt, hannes, kernel-team __csd_lock_record() publishes per-CPU CSD debug state that is read by csd_lock_wait_toolong() on another CPU. The remote side first reads cur_csd with smp_load_acquire() and, when non-NULL, may then read the matching cur_csd_func and cur_csd_info fields. Use smp_store_release() when publishing cur_csd so that the preceding cur_csd_func and cur_csd_info stores are ordered before the pointer that csd_lock_wait_toolong() acquires. This replaces the open-coded smp_wmb() plus plain cur_csd store with the release operation that matches the smp_load_acquire() in csd_lock_wait_toolong(). For the clear path, use smp_store_release(&cur_csd, NULL) so that clearing the diagnostic state remains ordered after the preceding callback/unlock work, without requiring a full barrier before the store. On x86 this removes the locked full barrier from the clear path; on weaker memory models it uses the release operation needed by the smp_load_acquire() in csd_lock_wait_toolong(). The old code also had smp_mb() calls around cur_csd updates. Those would only be needed if cur_csd were treated as an exact live-state marker whose publication had to be observed before callback execution or CSD unlock. CSD stall warnings do not currently have RCU-style stall-ended checks, so they already allow the stall to end while diagnostics are being assembled. The cur_csd record is therefore best-effort diagnostic context, not a precise completion/stall boundary. Signed-off-by: Usama Arif <usama.arif@linux.dev> --- kernel/smp.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/kernel/smp.c b/kernel/smp.c index a0bb56bd8dda..5ba4a20ba77d 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -182,16 +182,12 @@ static atomic_t csd_bug_count = ATOMIC_INIT(0); static void __csd_lock_record(call_single_data_t *csd) { if (!csd) { - smp_mb(); /* NULL cur_csd after unlock. */ - __this_cpu_write(cur_csd, NULL); + smp_store_release(this_cpu_ptr(&cur_csd), NULL); return; } __this_cpu_write(cur_csd_func, csd->func); __this_cpu_write(cur_csd_info, csd->info); - smp_wmb(); /* func and info before csd. */ - __this_cpu_write(cur_csd, csd); - smp_mb(); /* Update cur_csd before function call. */ - /* Or before unlock, as the case may be. */ + smp_store_release(this_cpu_ptr(&cur_csd), csd); } static __always_inline void csd_lock_record(call_single_data_t *csd) -- 2.53.0-Meta ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] smp: Use release stores for csd_lock_record() state 2026-06-17 21:20 [PATCH] smp: Use release stores for csd_lock_record() state Usama Arif @ 2026-06-18 1:44 ` Alan Stern 2026-06-18 3:30 ` Randy Dunlap 0 siblings, 1 reply; 7+ messages in thread From: Alan Stern @ 2026-06-18 1:44 UTC (permalink / raw) To: Usama Arif Cc: lkmm, joelagnelf, linux-kernel, marco.crivellari, paulmck, rafael.j.wysocki, rdunlap, riel, sshegde, tglx, ulfh, yury.norov, rcu, shakeel.butt, hannes, kernel-team On Wed, Jun 17, 2026 at 02:20:01PM -0700, Usama Arif wrote: > __csd_lock_record() publishes per-CPU CSD debug state that is read by > csd_lock_wait_toolong() on another CPU. The remote side first reads > cur_csd with smp_load_acquire() and, when non-NULL, may then read the > matching cur_csd_func and cur_csd_info fields. > > Use smp_store_release() when publishing cur_csd so that the preceding > cur_csd_func and cur_csd_info stores are ordered before the pointer > that csd_lock_wait_toolong() acquires. This replaces the open-coded > smp_wmb() plus plain cur_csd store with the release operation that > matches the smp_load_acquire() in csd_lock_wait_toolong(). > > For the clear path, use smp_store_release(&cur_csd, NULL) so that > clearing the diagnostic state remains ordered after the preceding > callback/unlock work, without requiring a full barrier before the > store. On x86 this removes the locked full barrier from the clear > path; on weaker memory models it uses the release operation needed by > the smp_load_acquire() in csd_lock_wait_toolong(). > > The old code also had smp_mb() calls around cur_csd updates. Those would > only be needed if cur_csd were treated as an exact live-state marker whose > publication had to be observed before callback execution or CSD unlock. > CSD stall warnings do not currently have RCU-style stall-ended checks, so > they already allow the stall to end while diagnostics are being assembled. > The cur_csd record is therefore best-effort diagnostic context, not a > precise completion/stall boundary. > > Signed-off-by: Usama Arif <usama.arif@linux.dev> > --- > kernel/smp.c | 8 ++------ > 1 file changed, 2 insertions(+), 6 deletions(-) > > diff --git a/kernel/smp.c b/kernel/smp.c > index a0bb56bd8dda..5ba4a20ba77d 100644 > --- a/kernel/smp.c > +++ b/kernel/smp.c > @@ -182,16 +182,12 @@ static atomic_t csd_bug_count = ATOMIC_INIT(0); > static void __csd_lock_record(call_single_data_t *csd) > { > if (!csd) { > - smp_mb(); /* NULL cur_csd after unlock. */ > - __this_cpu_write(cur_csd, NULL); > + smp_store_release(this_cpu_ptr(&cur_csd), NULL); > return; > } > __this_cpu_write(cur_csd_func, csd->func); > __this_cpu_write(cur_csd_info, csd->info); > - smp_wmb(); /* func and info before csd. */ > - __this_cpu_write(cur_csd, csd); > - smp_mb(); /* Update cur_csd before function call. */ > - /* Or before unlock, as the case may be. */ > + smp_store_release(this_cpu_ptr(&cur_csd), csd); Isn't there a general policy in the kernel that memory barriers should be accompanied by a comment explaining what other memory barriers they synchronize with? Including such comments is a good idea in any case. Alan Stern ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] smp: Use release stores for csd_lock_record() state 2026-06-18 1:44 ` Alan Stern @ 2026-06-18 3:30 ` Randy Dunlap 2026-06-18 4:01 ` Paul E. McKenney 2026-06-18 14:44 ` Usama Arif 0 siblings, 2 replies; 7+ messages in thread From: Randy Dunlap @ 2026-06-18 3:30 UTC (permalink / raw) To: Alan Stern, Usama Arif Cc: lkmm, joelagnelf, linux-kernel, marco.crivellari, paulmck, rafael.j.wysocki, riel, sshegde, tglx, ulfh, yury.norov, rcu, shakeel.butt, hannes, kernel-team On 6/17/26 6:44 PM, Alan Stern wrote: > On Wed, Jun 17, 2026 at 02:20:01PM -0700, Usama Arif wrote: >> __csd_lock_record() publishes per-CPU CSD debug state that is read by >> csd_lock_wait_toolong() on another CPU. The remote side first reads >> cur_csd with smp_load_acquire() and, when non-NULL, may then read the >> matching cur_csd_func and cur_csd_info fields. >> >> Use smp_store_release() when publishing cur_csd so that the preceding >> cur_csd_func and cur_csd_info stores are ordered before the pointer >> that csd_lock_wait_toolong() acquires. This replaces the open-coded >> smp_wmb() plus plain cur_csd store with the release operation that >> matches the smp_load_acquire() in csd_lock_wait_toolong(). >> >> For the clear path, use smp_store_release(&cur_csd, NULL) so that >> clearing the diagnostic state remains ordered after the preceding >> callback/unlock work, without requiring a full barrier before the >> store. On x86 this removes the locked full barrier from the clear >> path; on weaker memory models it uses the release operation needed by >> the smp_load_acquire() in csd_lock_wait_toolong(). >> >> The old code also had smp_mb() calls around cur_csd updates. Those would >> only be needed if cur_csd were treated as an exact live-state marker whose >> publication had to be observed before callback execution or CSD unlock. >> CSD stall warnings do not currently have RCU-style stall-ended checks, so >> they already allow the stall to end while diagnostics are being assembled. >> The cur_csd record is therefore best-effort diagnostic context, not a >> precise completion/stall boundary. >> >> Signed-off-by: Usama Arif <usama.arif@linux.dev> >> --- >> kernel/smp.c | 8 ++------ >> 1 file changed, 2 insertions(+), 6 deletions(-) >> >> diff --git a/kernel/smp.c b/kernel/smp.c >> index a0bb56bd8dda..5ba4a20ba77d 100644 >> --- a/kernel/smp.c >> +++ b/kernel/smp.c >> @@ -182,16 +182,12 @@ static atomic_t csd_bug_count = ATOMIC_INIT(0); >> static void __csd_lock_record(call_single_data_t *csd) >> { >> if (!csd) { >> - smp_mb(); /* NULL cur_csd after unlock. */ >> - __this_cpu_write(cur_csd, NULL); >> + smp_store_release(this_cpu_ptr(&cur_csd), NULL); >> return; >> } >> __this_cpu_write(cur_csd_func, csd->func); >> __this_cpu_write(cur_csd_info, csd->info); >> - smp_wmb(); /* func and info before csd. */ >> - __this_cpu_write(cur_csd, csd); >> - smp_mb(); /* Update cur_csd before function call. */ >> - /* Or before unlock, as the case may be. */ >> + smp_store_release(this_cpu_ptr(&cur_csd), csd); > > Isn't there a general policy in the kernel that memory barriers should > be accompanied by a comment explaining what other memory barriers they > synchronize with? Including such comments is a good idea in any case. in Documentation/process/submit-checklist.rst: 3) All memory barriers {e.g., ``barrier()``, ``rmb()``, ``wmb()``} need a comment in the source code that explains the logic of what they are doing and why. in Documentation/process/4.Coding.rst: Certain things should always be commented. Uses of memory barriers should be accompanied by a line explaining why the barrier is necessary. but looking in the 3000+ lines of Documentation/memory-barriers.txt won't tell anyone about that. -- ~Randy ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] smp: Use release stores for csd_lock_record() state 2026-06-18 3:30 ` Randy Dunlap @ 2026-06-18 4:01 ` Paul E. McKenney 2026-06-18 14:44 ` Usama Arif 1 sibling, 0 replies; 7+ messages in thread From: Paul E. McKenney @ 2026-06-18 4:01 UTC (permalink / raw) To: Randy Dunlap Cc: Alan Stern, Usama Arif, lkmm, joelagnelf, linux-kernel, marco.crivellari, rafael.j.wysocki, riel, sshegde, tglx, ulfh, yury.norov, rcu, shakeel.butt, hannes, kernel-team On Wed, Jun 17, 2026 at 08:30:32PM -0700, Randy Dunlap wrote: > > > On 6/17/26 6:44 PM, Alan Stern wrote: > > On Wed, Jun 17, 2026 at 02:20:01PM -0700, Usama Arif wrote: > >> __csd_lock_record() publishes per-CPU CSD debug state that is read by > >> csd_lock_wait_toolong() on another CPU. The remote side first reads > >> cur_csd with smp_load_acquire() and, when non-NULL, may then read the > >> matching cur_csd_func and cur_csd_info fields. > >> > >> Use smp_store_release() when publishing cur_csd so that the preceding > >> cur_csd_func and cur_csd_info stores are ordered before the pointer > >> that csd_lock_wait_toolong() acquires. This replaces the open-coded > >> smp_wmb() plus plain cur_csd store with the release operation that > >> matches the smp_load_acquire() in csd_lock_wait_toolong(). > >> > >> For the clear path, use smp_store_release(&cur_csd, NULL) so that > >> clearing the diagnostic state remains ordered after the preceding > >> callback/unlock work, without requiring a full barrier before the > >> store. On x86 this removes the locked full barrier from the clear > >> path; on weaker memory models it uses the release operation needed by > >> the smp_load_acquire() in csd_lock_wait_toolong(). > >> > >> The old code also had smp_mb() calls around cur_csd updates. Those would > >> only be needed if cur_csd were treated as an exact live-state marker whose > >> publication had to be observed before callback execution or CSD unlock. > >> CSD stall warnings do not currently have RCU-style stall-ended checks, so > >> they already allow the stall to end while diagnostics are being assembled. > >> The cur_csd record is therefore best-effort diagnostic context, not a > >> precise completion/stall boundary. > >> > >> Signed-off-by: Usama Arif <usama.arif@linux.dev> > >> --- > >> kernel/smp.c | 8 ++------ > >> 1 file changed, 2 insertions(+), 6 deletions(-) > >> > >> diff --git a/kernel/smp.c b/kernel/smp.c > >> index a0bb56bd8dda..5ba4a20ba77d 100644 > >> --- a/kernel/smp.c > >> +++ b/kernel/smp.c > >> @@ -182,16 +182,12 @@ static atomic_t csd_bug_count = ATOMIC_INIT(0); > >> static void __csd_lock_record(call_single_data_t *csd) > >> { > >> if (!csd) { > >> - smp_mb(); /* NULL cur_csd after unlock. */ > >> - __this_cpu_write(cur_csd, NULL); > >> + smp_store_release(this_cpu_ptr(&cur_csd), NULL); > >> return; > >> } > >> __this_cpu_write(cur_csd_func, csd->func); > >> __this_cpu_write(cur_csd_info, csd->info); > >> - smp_wmb(); /* func and info before csd. */ > >> - __this_cpu_write(cur_csd, csd); > >> - smp_mb(); /* Update cur_csd before function call. */ > >> - /* Or before unlock, as the case may be. */ > >> + smp_store_release(this_cpu_ptr(&cur_csd), csd); > > > > Isn't there a general policy in the kernel that memory barriers should > > be accompanied by a comment explaining what other memory barriers they > > synchronize with? Including such comments is a good idea in any case. > > in Documentation/process/submit-checklist.rst: > > 3) All memory barriers {e.g., ``barrier()``, ``rmb()``, ``wmb()``} need a > comment in the source code that explains the logic of what they are doing > and why. > > in Documentation/process/4.Coding.rst: > > Certain things should always be commented. Uses of memory barriers should > be accompanied by a line explaining why the barrier is necessary. > > but looking in the 3000+ lines of Documentation/memory-barriers.txt won't tell > anyone about that. Is smp_store_release() a memory barrier? Sorry, couldn't resist... ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] smp: Use release stores for csd_lock_record() state 2026-06-18 3:30 ` Randy Dunlap 2026-06-18 4:01 ` Paul E. McKenney @ 2026-06-18 14:44 ` Usama Arif 2026-06-18 14:57 ` Paul E. McKenney 1 sibling, 1 reply; 7+ messages in thread From: Usama Arif @ 2026-06-18 14:44 UTC (permalink / raw) To: Randy Dunlap, Alan Stern, paulmck Cc: lkmm, joelagnelf, linux-kernel, marco.crivellari, paulmck, rafael.j.wysocki, riel, sshegde, tglx, ulfh, yury.norov, rcu, shakeel.butt, hannes, kernel-team On 18/06/2026 04:30, Randy Dunlap wrote: > > > On 6/17/26 6:44 PM, Alan Stern wrote: >> On Wed, Jun 17, 2026 at 02:20:01PM -0700, Usama Arif wrote: >>> __csd_lock_record() publishes per-CPU CSD debug state that is read by >>> csd_lock_wait_toolong() on another CPU. The remote side first reads >>> cur_csd with smp_load_acquire() and, when non-NULL, may then read the >>> matching cur_csd_func and cur_csd_info fields. >>> >>> Use smp_store_release() when publishing cur_csd so that the preceding >>> cur_csd_func and cur_csd_info stores are ordered before the pointer >>> that csd_lock_wait_toolong() acquires. This replaces the open-coded >>> smp_wmb() plus plain cur_csd store with the release operation that >>> matches the smp_load_acquire() in csd_lock_wait_toolong(). >>> >>> For the clear path, use smp_store_release(&cur_csd, NULL) so that >>> clearing the diagnostic state remains ordered after the preceding >>> callback/unlock work, without requiring a full barrier before the >>> store. On x86 this removes the locked full barrier from the clear >>> path; on weaker memory models it uses the release operation needed by >>> the smp_load_acquire() in csd_lock_wait_toolong(). >>> >>> The old code also had smp_mb() calls around cur_csd updates. Those would >>> only be needed if cur_csd were treated as an exact live-state marker whose >>> publication had to be observed before callback execution or CSD unlock. >>> CSD stall warnings do not currently have RCU-style stall-ended checks, so >>> they already allow the stall to end while diagnostics are being assembled. >>> The cur_csd record is therefore best-effort diagnostic context, not a >>> precise completion/stall boundary. >>> >>> Signed-off-by: Usama Arif <usama.arif@linux.dev> >>> --- >>> kernel/smp.c | 8 ++------ >>> 1 file changed, 2 insertions(+), 6 deletions(-) >>> >>> diff --git a/kernel/smp.c b/kernel/smp.c >>> index a0bb56bd8dda..5ba4a20ba77d 100644 >>> --- a/kernel/smp.c >>> +++ b/kernel/smp.c >>> @@ -182,16 +182,12 @@ static atomic_t csd_bug_count = ATOMIC_INIT(0); >>> static void __csd_lock_record(call_single_data_t *csd) >>> { >>> if (!csd) { >>> - smp_mb(); /* NULL cur_csd after unlock. */ >>> - __this_cpu_write(cur_csd, NULL); >>> + smp_store_release(this_cpu_ptr(&cur_csd), NULL); >>> return; >>> } >>> __this_cpu_write(cur_csd_func, csd->func); >>> __this_cpu_write(cur_csd_info, csd->info); >>> - smp_wmb(); /* func and info before csd. */ >>> - __this_cpu_write(cur_csd, csd); >>> - smp_mb(); /* Update cur_csd before function call. */ >>> - /* Or before unlock, as the case may be. */ >>> + smp_store_release(this_cpu_ptr(&cur_csd), csd); >> >> Isn't there a general policy in the kernel that memory barriers should >> be accompanied by a comment explaining what other memory barriers they >> synchronize with? Including such comments is a good idea in any case. > > in Documentation/process/submit-checklist.rst: > > 3) All memory barriers {e.g., ``barrier()``, ``rmb()``, ``wmb()``} need a > comment in the source code that explains the logic of what they are doing > and why. > > in Documentation/process/4.Coding.rst: > > Certain things should always be commented. Uses of memory barriers should > be accompanied by a line explaining why the barrier is necessary. > > but looking in the 3000+ lines of Documentation/memory-barriers.txt won't tell > anyone about that. > > Thanks! I will send a v2 with the below diff if there are no objections? diff --git a/kernel/smp.c b/kernel/smp.c index 5ba4a20ba77d..685829875a3e 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -182,11 +182,21 @@ static atomic_t csd_bug_count = ATOMIC_INIT(0); static void __csd_lock_record(call_single_data_t *csd) { if (!csd) { + /* + * Pairs with smp_load_acquire() of cur_csd in + * csd_lock_wait_toolong(): orders any preceding CSD + * callback/unlock before a remote reader observes NULL. + */ smp_store_release(this_cpu_ptr(&cur_csd), NULL); return; } __this_cpu_write(cur_csd_func, csd->func); __this_cpu_write(cur_csd_info, csd->info); + /* + * Pairs with smp_load_acquire() of cur_csd in + * csd_lock_wait_toolong(): publishes cur_csd_func and + * cur_csd_info before the non-NULL pointer becomes visible. + */ smp_store_release(this_cpu_ptr(&cur_csd), csd); } ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] smp: Use release stores for csd_lock_record() state 2026-06-18 14:44 ` Usama Arif @ 2026-06-18 14:57 ` Paul E. McKenney 2026-06-18 15:30 ` Usama Arif 0 siblings, 1 reply; 7+ messages in thread From: Paul E. McKenney @ 2026-06-18 14:57 UTC (permalink / raw) To: Usama Arif Cc: Randy Dunlap, Alan Stern, lkmm, joelagnelf, linux-kernel, marco.crivellari, rafael.j.wysocki, riel, sshegde, tglx, ulfh, yury.norov, rcu, shakeel.butt, hannes, kernel-team On Thu, Jun 18, 2026 at 03:44:36PM +0100, Usama Arif wrote: > On 18/06/2026 04:30, Randy Dunlap wrote: > > On 6/17/26 6:44 PM, Alan Stern wrote: > >> On Wed, Jun 17, 2026 at 02:20:01PM -0700, Usama Arif wrote: > >>> __csd_lock_record() publishes per-CPU CSD debug state that is read by > >>> csd_lock_wait_toolong() on another CPU. The remote side first reads > >>> cur_csd with smp_load_acquire() and, when non-NULL, may then read the > >>> matching cur_csd_func and cur_csd_info fields. > >>> > >>> Use smp_store_release() when publishing cur_csd so that the preceding > >>> cur_csd_func and cur_csd_info stores are ordered before the pointer > >>> that csd_lock_wait_toolong() acquires. This replaces the open-coded > >>> smp_wmb() plus plain cur_csd store with the release operation that > >>> matches the smp_load_acquire() in csd_lock_wait_toolong(). > >>> > >>> For the clear path, use smp_store_release(&cur_csd, NULL) so that > >>> clearing the diagnostic state remains ordered after the preceding > >>> callback/unlock work, without requiring a full barrier before the > >>> store. On x86 this removes the locked full barrier from the clear > >>> path; on weaker memory models it uses the release operation needed by > >>> the smp_load_acquire() in csd_lock_wait_toolong(). > >>> > >>> The old code also had smp_mb() calls around cur_csd updates. Those would > >>> only be needed if cur_csd were treated as an exact live-state marker whose > >>> publication had to be observed before callback execution or CSD unlock. > >>> CSD stall warnings do not currently have RCU-style stall-ended checks, so > >>> they already allow the stall to end while diagnostics are being assembled. > >>> The cur_csd record is therefore best-effort diagnostic context, not a > >>> precise completion/stall boundary. > >>> > >>> Signed-off-by: Usama Arif <usama.arif@linux.dev> > >>> --- > >>> kernel/smp.c | 8 ++------ > >>> 1 file changed, 2 insertions(+), 6 deletions(-) > >>> > >>> diff --git a/kernel/smp.c b/kernel/smp.c > >>> index a0bb56bd8dda..5ba4a20ba77d 100644 > >>> --- a/kernel/smp.c > >>> +++ b/kernel/smp.c > >>> @@ -182,16 +182,12 @@ static atomic_t csd_bug_count = ATOMIC_INIT(0); > >>> static void __csd_lock_record(call_single_data_t *csd) > >>> { > >>> if (!csd) { > >>> - smp_mb(); /* NULL cur_csd after unlock. */ > >>> - __this_cpu_write(cur_csd, NULL); > >>> + smp_store_release(this_cpu_ptr(&cur_csd), NULL); > >>> return; > >>> } > >>> __this_cpu_write(cur_csd_func, csd->func); > >>> __this_cpu_write(cur_csd_info, csd->info); > >>> - smp_wmb(); /* func and info before csd. */ > >>> - __this_cpu_write(cur_csd, csd); > >>> - smp_mb(); /* Update cur_csd before function call. */ > >>> - /* Or before unlock, as the case may be. */ > >>> + smp_store_release(this_cpu_ptr(&cur_csd), csd); > >> > >> Isn't there a general policy in the kernel that memory barriers should > >> be accompanied by a comment explaining what other memory barriers they > >> synchronize with? Including such comments is a good idea in any case. > > > > in Documentation/process/submit-checklist.rst: > > > > 3) All memory barriers {e.g., ``barrier()``, ``rmb()``, ``wmb()``} need a > > comment in the source code that explains the logic of what they are doing > > and why. > > > > in Documentation/process/4.Coding.rst: > > > > Certain things should always be commented. Uses of memory barriers should > > be accompanied by a line explaining why the barrier is necessary. > > > > but looking in the 3000+ lines of Documentation/memory-barriers.txt won't tell > > anyone about that. > > > > > > Thanks! > I will send a v2 with the below diff if there are no objections? > > diff --git a/kernel/smp.c b/kernel/smp.c > index 5ba4a20ba77d..685829875a3e 100644 > --- a/kernel/smp.c > +++ b/kernel/smp.c > @@ -182,11 +182,21 @@ static atomic_t csd_bug_count = ATOMIC_INIT(0); > static void __csd_lock_record(call_single_data_t *csd) > { > if (!csd) { > + /* > + * Pairs with smp_load_acquire() of cur_csd in > + * csd_lock_wait_toolong(): orders any preceding CSD > + * callback/unlock before a remote reader observes NULL. > + */ Please replace the spaces with tabs. (Probably a copy-pasta issue.) > smp_store_release(this_cpu_ptr(&cur_csd), NULL); smp_store_release(this_cpu_ptr(&cur_csd), NULL); /* ^^^ */ Adding the comment as show above will satisfy tools such as checkpatch. > return; > } > __this_cpu_write(cur_csd_func, csd->func); > __this_cpu_write(cur_csd_info, csd->info); > + /* > + * Pairs with smp_load_acquire() of cur_csd in > + * csd_lock_wait_toolong(): publishes cur_csd_func and > + * cur_csd_info before the non-NULL pointer becomes visible. > + */ > smp_store_release(this_cpu_ptr(&cur_csd), csd); > } The comments look good to me, but I must defer to Alan and Randy. Thanx, Paul ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] smp: Use release stores for csd_lock_record() state 2026-06-18 14:57 ` Paul E. McKenney @ 2026-06-18 15:30 ` Usama Arif 0 siblings, 0 replies; 7+ messages in thread From: Usama Arif @ 2026-06-18 15:30 UTC (permalink / raw) To: paulmck Cc: Randy Dunlap, Alan Stern, lkmm, joelagnelf, linux-kernel, marco.crivellari, rafael.j.wysocki, riel, sshegde, tglx, ulfh, yury.norov, rcu, shakeel.butt, hannes, kernel-team On 18/06/2026 15:57, Paul E. McKenney wrote: > On Thu, Jun 18, 2026 at 03:44:36PM +0100, Usama Arif wrote: >> On 18/06/2026 04:30, Randy Dunlap wrote: >>> On 6/17/26 6:44 PM, Alan Stern wrote: >>>> On Wed, Jun 17, 2026 at 02:20:01PM -0700, Usama Arif wrote: >>>>> __csd_lock_record() publishes per-CPU CSD debug state that is read by >>>>> csd_lock_wait_toolong() on another CPU. The remote side first reads >>>>> cur_csd with smp_load_acquire() and, when non-NULL, may then read the >>>>> matching cur_csd_func and cur_csd_info fields. >>>>> >>>>> Use smp_store_release() when publishing cur_csd so that the preceding >>>>> cur_csd_func and cur_csd_info stores are ordered before the pointer >>>>> that csd_lock_wait_toolong() acquires. This replaces the open-coded >>>>> smp_wmb() plus plain cur_csd store with the release operation that >>>>> matches the smp_load_acquire() in csd_lock_wait_toolong(). >>>>> >>>>> For the clear path, use smp_store_release(&cur_csd, NULL) so that >>>>> clearing the diagnostic state remains ordered after the preceding >>>>> callback/unlock work, without requiring a full barrier before the >>>>> store. On x86 this removes the locked full barrier from the clear >>>>> path; on weaker memory models it uses the release operation needed by >>>>> the smp_load_acquire() in csd_lock_wait_toolong(). >>>>> >>>>> The old code also had smp_mb() calls around cur_csd updates. Those would >>>>> only be needed if cur_csd were treated as an exact live-state marker whose >>>>> publication had to be observed before callback execution or CSD unlock. >>>>> CSD stall warnings do not currently have RCU-style stall-ended checks, so >>>>> they already allow the stall to end while diagnostics are being assembled. >>>>> The cur_csd record is therefore best-effort diagnostic context, not a >>>>> precise completion/stall boundary. >>>>> >>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev> >>>>> --- >>>>> kernel/smp.c | 8 ++------ >>>>> 1 file changed, 2 insertions(+), 6 deletions(-) >>>>> >>>>> diff --git a/kernel/smp.c b/kernel/smp.c >>>>> index a0bb56bd8dda..5ba4a20ba77d 100644 >>>>> --- a/kernel/smp.c >>>>> +++ b/kernel/smp.c >>>>> @@ -182,16 +182,12 @@ static atomic_t csd_bug_count = ATOMIC_INIT(0); >>>>> static void __csd_lock_record(call_single_data_t *csd) >>>>> { >>>>> if (!csd) { >>>>> - smp_mb(); /* NULL cur_csd after unlock. */ >>>>> - __this_cpu_write(cur_csd, NULL); >>>>> + smp_store_release(this_cpu_ptr(&cur_csd), NULL); >>>>> return; >>>>> } >>>>> __this_cpu_write(cur_csd_func, csd->func); >>>>> __this_cpu_write(cur_csd_info, csd->info); >>>>> - smp_wmb(); /* func and info before csd. */ >>>>> - __this_cpu_write(cur_csd, csd); >>>>> - smp_mb(); /* Update cur_csd before function call. */ >>>>> - /* Or before unlock, as the case may be. */ >>>>> + smp_store_release(this_cpu_ptr(&cur_csd), csd); >>>> >>>> Isn't there a general policy in the kernel that memory barriers should >>>> be accompanied by a comment explaining what other memory barriers they >>>> synchronize with? Including such comments is a good idea in any case. >>> >>> in Documentation/process/submit-checklist.rst: >>> >>> 3) All memory barriers {e.g., ``barrier()``, ``rmb()``, ``wmb()``} need a >>> comment in the source code that explains the logic of what they are doing >>> and why. >>> >>> in Documentation/process/4.Coding.rst: >>> >>> Certain things should always be commented. Uses of memory barriers should >>> be accompanied by a line explaining why the barrier is necessary. >>> >>> but looking in the 3000+ lines of Documentation/memory-barriers.txt won't tell >>> anyone about that. >>> >>> >> >> Thanks! >> I will send a v2 with the below diff if there are no objections? >> >> diff --git a/kernel/smp.c b/kernel/smp.c >> index 5ba4a20ba77d..685829875a3e 100644 >> --- a/kernel/smp.c >> +++ b/kernel/smp.c >> @@ -182,11 +182,21 @@ static atomic_t csd_bug_count = ATOMIC_INIT(0); >> static void __csd_lock_record(call_single_data_t *csd) >> { >> if (!csd) { >> + /* >> + * Pairs with smp_load_acquire() of cur_csd in >> + * csd_lock_wait_toolong(): orders any preceding CSD >> + * callback/unlock before a remote reader observes NULL. >> + */ > > Please replace the spaces with tabs. (Probably a copy-pasta issue.) Yes, its tabs in my commit, looks like the email client messed it up. > >> smp_store_release(this_cpu_ptr(&cur_csd), NULL); > > smp_store_release(this_cpu_ptr(&cur_csd), NULL); /* ^^^ */ > > Adding the comment as show above will satisfy tools such as checkpatch. Will do, Thanks! > >> return; >> } >> __this_cpu_write(cur_csd_func, csd->func); >> __this_cpu_write(cur_csd_info, csd->info); >> + /* >> + * Pairs with smp_load_acquire() of cur_csd in >> + * csd_lock_wait_toolong(): publishes cur_csd_func and >> + * cur_csd_info before the non-NULL pointer becomes visible. >> + */ >> smp_store_release(this_cpu_ptr(&cur_csd), csd); >> } > > The comments look good to me, but I must defer to Alan and Randy Thanks! > > Thanx, Paul ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-06-18 15:30 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-17 21:20 [PATCH] smp: Use release stores for csd_lock_record() state Usama Arif 2026-06-18 1:44 ` Alan Stern 2026-06-18 3:30 ` Randy Dunlap 2026-06-18 4:01 ` Paul E. McKenney 2026-06-18 14:44 ` Usama Arif 2026-06-18 14:57 ` Paul E. McKenney 2026-06-18 15:30 ` Usama Arif
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.