* [PATCH v2] trace/hwlat: prevent false sharing in get_sample() @ 2026-02-02 2:58 Colin Lord 2026-02-05 14:47 ` Steven Rostedt 0 siblings, 1 reply; 4+ messages in thread From: Colin Lord @ 2026-02-02 2:58 UTC (permalink / raw) To: linux-kernel, linux-trace-kernel Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Colin Lord The get_sample() function in the hwlat tracer assumes the caller holds hwlat_data.lock, but this is not actually happening. The result is unprotected data access to hwlat_data, and in per-cpu mode can result in false sharing. The false sharing can cause false positive latency events, since the sample_width member is involved and gets read as part of the main latency detection loop. Convert hwlat_data.count to atomic64_t so it can be safely accessed without locking, and prevent false sharing by pulling sample_width into a local variable. One system this was tested on was a dual socket server with 32 CPUs on each numa node. With settings of 1us threshold, 1000us width, and 2000us window, this change reduced the number of latency events from 500 per second down to approximately 1 event per minute. Some machines tested did not exhibit measurable latency from the false sharing. Signed-off-by: Colin Lord <clord@mykolab.com> --- Changes in v2: - convert hwlat_data.count to atomic64_t - leave irqs_disabled block where it originally was, outside of get_sample() Thanks for the v1 review Steve, have updated and retested. cheers, Colin kernel/trace/trace_hwlat.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c index 2f7b94e98317..3fe274b84f1c 100644 --- a/kernel/trace/trace_hwlat.c +++ b/kernel/trace/trace_hwlat.c @@ -102,9 +102,9 @@ struct hwlat_sample { /* keep the global state somewhere. */ static struct hwlat_data { - struct mutex lock; /* protect changes */ + struct mutex lock; /* protect changes */ - u64 count; /* total since reset */ + atomic64_t count; /* total since reset */ u64 sample_window; /* total sampling window (on+off) */ u64 sample_width; /* active sampling portion of window */ @@ -193,8 +193,7 @@ void trace_hwlat_callback(bool enter) * get_sample - sample the CPU TSC and look for likely hardware latencies * * Used to repeatedly capture the CPU TSC (or similar), looking for potential - * hardware-induced latency. Called with interrupts disabled and with - * hwlat_data.lock held. + * hardware-induced latency. Called with interrupts disabled. */ static int get_sample(void) { @@ -204,6 +203,7 @@ static int get_sample(void) time_type start, t1, t2, last_t2; s64 diff, outer_diff, total, last_total = 0; u64 sample = 0; + u64 sample_width = READ_ONCE(hwlat_data.sample_width); u64 thresh = tracing_thresh; u64 outer_sample = 0; int ret = -1; @@ -267,7 +267,7 @@ static int get_sample(void) if (diff > sample) sample = diff; /* only want highest value */ - } while (total <= hwlat_data.sample_width); + } while (total <= sample_width); barrier(); /* finish the above in the view for NMIs */ trace_hwlat_callback_enabled = false; @@ -285,8 +285,7 @@ static int get_sample(void) if (kdata->nmi_total_ts) do_div(kdata->nmi_total_ts, NSEC_PER_USEC); - hwlat_data.count++; - s.seqnum = hwlat_data.count; + s.seqnum = atomic64_inc_return(&hwlat_data.count); s.duration = sample; s.outer_duration = outer_sample; s.nmi_total_ts = kdata->nmi_total_ts; @@ -832,7 +831,7 @@ static int hwlat_tracer_init(struct trace_array *tr) hwlat_trace = tr; - hwlat_data.count = 0; + atomic64_set(&hwlat_data.count, 0); tr->max_latency = 0; save_tracing_thresh = tracing_thresh; base-commit: 24d479d26b25bce5faea3ddd9fa8f3a6c3129ea7 -- 2.51.2 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2] trace/hwlat: prevent false sharing in get_sample() 2026-02-02 2:58 [PATCH v2] trace/hwlat: prevent false sharing in get_sample() Colin Lord @ 2026-02-05 14:47 ` Steven Rostedt 2026-02-06 8:09 ` Colin Lord 0 siblings, 1 reply; 4+ messages in thread From: Steven Rostedt @ 2026-02-05 14:47 UTC (permalink / raw) To: Colin Lord Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mathieu Desnoyers On Sun, 1 Feb 2026 18:58:38 -0800 Colin Lord <clord@mykolab.com> wrote: I was about to send this as part of my fixes, but taking a closer look at it, I have more questions. > The get_sample() function in the hwlat tracer assumes the caller holds > hwlat_data.lock, but this is not actually happening. The result is > unprotected data access to hwlat_data, and in per-cpu mode can result in > false sharing. The false sharing can cause false positive latency BTW, what exactly do you mean by "false sharing"? > events, since the sample_width member is involved and gets read as part > of the main latency detection loop. I'm trying to figure out why the change in sample_width would cause any issue. > > Convert hwlat_data.count to atomic64_t so it can be safely accessed > without locking, and prevent false sharing by pulling sample_width into > a local variable. The above still makes sense, but this only effects the seqnum, which may show either duplicate or skipped numbers. But shouldn't affect any of the other data. > > One system this was tested on was a dual socket server with 32 CPUs on > each numa node. With settings of 1us threshold, 1000us width, and > 2000us window, this change reduced the number of latency events from > 500 per second down to approximately 1 event per minute. Some machines > tested did not exhibit measurable latency from the false sharing. Is this because the read of hwlat_data.sample_width is a global variable and could possibly be causing a cache hit latency that shows up on large machines? Is that what you mean by "false sharing"? Oh, and subjects for the tracing subsystem should start with a capital, and should be something like: tracing: Fix get_sample() in hwlat from ... -- Steve > > Signed-off-by: Colin Lord <clord@mykolab.com> > --- > Changes in v2: > - convert hwlat_data.count to atomic64_t > - leave irqs_disabled block where it originally was, outside of > get_sample() > > Thanks for the v1 review Steve, have updated and retested. > > cheers, > Colin > > kernel/trace/trace_hwlat.c | 15 +++++++-------- > 1 file changed, 7 insertions(+), 8 deletions(-) > > diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c > index 2f7b94e98317..3fe274b84f1c 100644 > --- a/kernel/trace/trace_hwlat.c > +++ b/kernel/trace/trace_hwlat.c > @@ -102,9 +102,9 @@ struct hwlat_sample { > /* keep the global state somewhere. */ > static struct hwlat_data { > > - struct mutex lock; /* protect changes */ > + struct mutex lock; /* protect changes */ > > - u64 count; /* total since reset */ > + atomic64_t count; /* total since reset */ > > u64 sample_window; /* total sampling window (on+off) */ > u64 sample_width; /* active sampling portion of window */ > @@ -193,8 +193,7 @@ void trace_hwlat_callback(bool enter) > * get_sample - sample the CPU TSC and look for likely hardware latencies > * > * Used to repeatedly capture the CPU TSC (or similar), looking for potential > - * hardware-induced latency. Called with interrupts disabled and with > - * hwlat_data.lock held. > + * hardware-induced latency. Called with interrupts disabled. > */ > static int get_sample(void) > { > @@ -204,6 +203,7 @@ static int get_sample(void) > time_type start, t1, t2, last_t2; > s64 diff, outer_diff, total, last_total = 0; > u64 sample = 0; > + u64 sample_width = READ_ONCE(hwlat_data.sample_width); > u64 thresh = tracing_thresh; > u64 outer_sample = 0; > int ret = -1; > @@ -267,7 +267,7 @@ static int get_sample(void) > if (diff > sample) > sample = diff; /* only want highest value */ > > - } while (total <= hwlat_data.sample_width); > + } while (total <= sample_width); > > barrier(); /* finish the above in the view for NMIs */ > trace_hwlat_callback_enabled = false; > @@ -285,8 +285,7 @@ static int get_sample(void) > if (kdata->nmi_total_ts) > do_div(kdata->nmi_total_ts, NSEC_PER_USEC); > > - hwlat_data.count++; > - s.seqnum = hwlat_data.count; > + s.seqnum = atomic64_inc_return(&hwlat_data.count); > s.duration = sample; > s.outer_duration = outer_sample; > s.nmi_total_ts = kdata->nmi_total_ts; > @@ -832,7 +831,7 @@ static int hwlat_tracer_init(struct trace_array *tr) > > hwlat_trace = tr; > > - hwlat_data.count = 0; > + atomic64_set(&hwlat_data.count, 0); > tr->max_latency = 0; > save_tracing_thresh = tracing_thresh; > > > base-commit: 24d479d26b25bce5faea3ddd9fa8f3a6c3129ea7 ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v2] trace/hwlat: prevent false sharing in get_sample() 2026-02-05 14:47 ` Steven Rostedt @ 2026-02-06 8:09 ` Colin Lord 2026-02-06 14:53 ` Steven Rostedt 0 siblings, 1 reply; 4+ messages in thread From: Colin Lord @ 2026-02-06 8:09 UTC (permalink / raw) To: Steven Rostedt, Colin Lord Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mathieu Desnoyers On Thu Feb 5, 2026 at 6:47 AM PST, Steven Rostedt wrote: > On Sun, 1 Feb 2026 18:58:38 -0800 > Colin Lord <clord@mykolab.com> wrote: > > I was about to send this as part of my fixes, but taking a closer look at > it, I have more questions. > >> The get_sample() function in the hwlat tracer assumes the caller holds >> hwlat_data.lock, but this is not actually happening. The result is >> unprotected data access to hwlat_data, and in per-cpu mode can result in >> false sharing. The false sharing can cause false positive latency > > BTW, what exactly do you mean by "false sharing"? > >> events, since the sample_width member is involved and gets read as part >> of the main latency detection loop. > > I'm trying to figure out why the change in sample_width would cause any > issue. > You described it below, but yes, by false sharing I am referring to cache latencies that occur when reading sample_width. In this case, it's primarily due to hwlat_data.count getting modified. Since count and sample_width are just 8B apart from each other, they'll often share a cache line, so one thread incrementing count due to a latency event causes all other running threads to fetch the updated cache line when reading sample_width. On most systems I tested, this doesn't cause high enough latency to show up, but I guess the system mentioned in the commit message has a slow interconnect between the two sockets. Combined with large numbers of threads, a single latency event would cause a cascade of cache related latencies. Another test I did to verify this is what was happening on that system was setting tracing_cpumask to only contain CPUs on the same socket. If I ran 32 threads on the same socket, there were far fewer latency events compared to running 16 on one socket + 16 on the other (same number of total threads just different distribution). >> >> Convert hwlat_data.count to atomic64_t so it can be safely accessed >> without locking, and prevent false sharing by pulling sample_width into >> a local variable. > > The above still makes sense, but this only effects the seqnum, which may > show either duplicate or skipped numbers. But shouldn't affect any of the > other data. > Yes, the modifications made in this commit to hwlat_data.count do not affect the latency. The change is related to the rest of the commit in the sense that it is addressing the incorrect assumption that hwlat_data.lock is held during get_sample(), but it would also make sense to put the hwlat_data.count changes in a separate commit. I could split it up if you'd prefer that. For what it's worth, I did also see many cases of duplicate/skipped sequence numbers on the machine that was affected by this latency issue, but I wasn't sure how detailed I should make my commit message. >> >> One system this was tested on was a dual socket server with 32 CPUs on >> each numa node. With settings of 1us threshold, 1000us width, and >> 2000us window, this change reduced the number of latency events from >> 500 per second down to approximately 1 event per minute. Some machines >> tested did not exhibit measurable latency from the false sharing. > > Is this because the read of hwlat_data.sample_width is a global variable > and could possibly be causing a cache hit latency that shows up on large > machines? > > Is that what you mean by "false sharing"? > > Oh, and subjects for the tracing subsystem should start with a capital, and > should be something like: > > tracing: Fix get_sample() in hwlat from ... Got it, I can make that adjustment and resend. Would you also like me to modify the commit message with any of the above clarifications? thanks, Colin > -- Steve > > >> >> Signed-off-by: Colin Lord <clord@mykolab.com> >> --- >> Changes in v2: >> - convert hwlat_data.count to atomic64_t >> - leave irqs_disabled block where it originally was, outside of >> get_sample() >> >> Thanks for the v1 review Steve, have updated and retested. >> >> cheers, >> Colin >> >> kernel/trace/trace_hwlat.c | 15 +++++++-------- >> 1 file changed, 7 insertions(+), 8 deletions(-) >> >> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c >> index 2f7b94e98317..3fe274b84f1c 100644 >> --- a/kernel/trace/trace_hwlat.c >> +++ b/kernel/trace/trace_hwlat.c >> @@ -102,9 +102,9 @@ struct hwlat_sample { >> /* keep the global state somewhere. */ >> static struct hwlat_data { >> >> - struct mutex lock; /* protect changes */ >> + struct mutex lock; /* protect changes */ >> >> - u64 count; /* total since reset */ >> + atomic64_t count; /* total since reset */ >> >> u64 sample_window; /* total sampling window (on+off) */ >> u64 sample_width; /* active sampling portion of window */ >> @@ -193,8 +193,7 @@ void trace_hwlat_callback(bool enter) >> * get_sample - sample the CPU TSC and look for likely hardware latencies >> * >> * Used to repeatedly capture the CPU TSC (or similar), looking for potential >> - * hardware-induced latency. Called with interrupts disabled and with >> - * hwlat_data.lock held. >> + * hardware-induced latency. Called with interrupts disabled. >> */ >> static int get_sample(void) >> { >> @@ -204,6 +203,7 @@ static int get_sample(void) >> time_type start, t1, t2, last_t2; >> s64 diff, outer_diff, total, last_total = 0; >> u64 sample = 0; >> + u64 sample_width = READ_ONCE(hwlat_data.sample_width); >> u64 thresh = tracing_thresh; >> u64 outer_sample = 0; >> int ret = -1; >> @@ -267,7 +267,7 @@ static int get_sample(void) >> if (diff > sample) >> sample = diff; /* only want highest value */ >> >> - } while (total <= hwlat_data.sample_width); >> + } while (total <= sample_width); >> >> barrier(); /* finish the above in the view for NMIs */ >> trace_hwlat_callback_enabled = false; >> @@ -285,8 +285,7 @@ static int get_sample(void) >> if (kdata->nmi_total_ts) >> do_div(kdata->nmi_total_ts, NSEC_PER_USEC); >> >> - hwlat_data.count++; >> - s.seqnum = hwlat_data.count; >> + s.seqnum = atomic64_inc_return(&hwlat_data.count); >> s.duration = sample; >> s.outer_duration = outer_sample; >> s.nmi_total_ts = kdata->nmi_total_ts; >> @@ -832,7 +831,7 @@ static int hwlat_tracer_init(struct trace_array *tr) >> >> hwlat_trace = tr; >> >> - hwlat_data.count = 0; >> + atomic64_set(&hwlat_data.count, 0); >> tr->max_latency = 0; >> save_tracing_thresh = tracing_thresh; >> >> >> base-commit: 24d479d26b25bce5faea3ddd9fa8f3a6c3129ea7 ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v2] trace/hwlat: prevent false sharing in get_sample() 2026-02-06 8:09 ` Colin Lord @ 2026-02-06 14:53 ` Steven Rostedt 0 siblings, 0 replies; 4+ messages in thread From: Steven Rostedt @ 2026-02-06 14:53 UTC (permalink / raw) To: Colin Lord Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mathieu Desnoyers On Fri, 06 Feb 2026 00:09:49 -0800 "Colin Lord" <clord@mykolab.com> wrote: > > tracing: Fix get_sample() in hwlat from ... > > Got it, I can make that adjustment and resend. Would you also like me to > modify the commit message with any of the above clarifications? Yes please. Mention the cache sharing issues and that you saw missed/duplicate seqnums too. Not everyone knows what "false sharing" is (I forgot what it was ;). You can use the term, but also explain it in terms of cache like you did in this email. Thanks, -- Steve ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-02-06 14:52 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-02-02 2:58 [PATCH v2] trace/hwlat: prevent false sharing in get_sample() Colin Lord 2026-02-05 14:47 ` Steven Rostedt 2026-02-06 8:09 ` Colin Lord 2026-02-06 14:53 ` Steven Rostedt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox