From: Dmitry Ilvokhin <d@ilvokhin.com>
To: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	linux-mm@kvack.org, Steven Rostedt <rostedt@goodmis.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Zi Yan <ziy@nvidia.com>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>, Shuah Khan <shuah@kernel.org>,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	kernel-team@cloudflare.com
Subject: Re: [PATCH 1/2] mm/page_alloc: add tracepoints for zone->lock acquisitions
Date: Mon, 18 May 2026 17:01:03 +0000
Message-ID: <agtFzw0a4SEyfUkr@shell.ilvokhin.com>
In-Reply-To: <fab7d27a-6c2b-47aa-abe8-a327f05fb5cd@kernel.org>

On Wed, May 13, 2026 at 05:32:41PM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> On 08/05/2026 20.07, Dmitry Ilvokhin wrote:
> > On Fri, May 08, 2026 at 07:40:51PM +0200, Vlastimil Babka (SUSE) wrote:
> > > On 5/8/26 7:38 PM, Vlastimil Babka (SUSE) wrote:
> > > > On 5/8/26 7:29 PM, Andrew Morton wrote:
> > > > > > On Fri,  8 May 2026 18:22:06 +0200 hawk@kernel.org wrote:
> > > > > 
> > > > > > Add tracepoints to the page allocator fast paths that acquire
> > > > > > zone->lock, allowing diagnosis of lock contention in production.
> > > > > 
> > > > > Thanks, I'm surprised we haven't done this yet.
> > > > 
> > > > There was a recent attempt [1]. Not being a generic solution wasn't welcome.
> > > > 
> > > > [1] https://lore.kernel.org/all/cover.1772206930.git.d@ilvokhin.com/
> > > 
> > > And this is the generic solution I think?
> > > 
> > > https://lore.kernel.org/all/cover.1777999826.git.d@ilvokhin.com/
> > 
> > Thanks for cc'ing me, Vlastimil.
> > 
> > Yes, this is an attempt at a generic solution for tracing contended
> > locks, including spinlocks, so it should also cover the use case
> > proposed in this patchset.
> > 
> 
> I'm aware of the generic solution and often use `perf lock contention`
> and the libbpf-tools/klockstat tool. My experience is unfortunately that
> enabling these tracepoints is prohibitively expensive on production
> servers, and production suffers when I run these tools.

I think it depends on the workload, in particular on how lock-heavy it is.

At Meta we have a lock contention profiler (it uses the contention_begin
and contention_end tracepoints under the hood) running continuously in
the fleet. It is heavily sampled and each profiling session runs only for
a few seconds, but in practice that is usually enough to get a pretty
good understanding of what is going on.
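
As a minimal sketch of the general idea, and not the profiler described
above, the snippet below pairs begin/end events per (task, lock) and sums
the wait time per lock. The event tuple layout and the function name are
hypothetical and chosen only for illustration; a real consumer would read
the contention_begin and contention_end tracepoints instead of a list.

```python
from collections import defaultdict


def contention_wait_times(events):
    """Pair begin/end events per (tid, lock) and total the wait per lock.

    `events` is an iterable of (timestamp_ns, tid, kind, lock_addr) tuples,
    where kind is "begin" or "end". This layout is an assumption made for
    the example, not the tracepoint record format.
    """
    pending = {}                          # (tid, lock_addr) -> begin timestamp
    totals = defaultdict(lambda: [0, 0])  # lock_addr -> [count, total_wait_ns]

    for ts, tid, kind, lock in events:
        key = (tid, lock)
        if kind == "begin":
            pending[key] = ts
        elif kind == "end":
            start = pending.pop(key, None)
            if start is None:
                # End without a matching begin, e.g. the short sampling
                # session started while the task was already waiting.
                continue
            totals[lock][0] += 1
            totals[lock][1] += ts - start

    return {lock: tuple(stats) for lock, stats in totals.items()}


sample_events = [
    (100, 1, "begin", 0xA),
    (150, 2, "begin", 0xA),
    (400, 1, "end", 0xA),
    (900, 2, "end", 0xA),
    (950, 3, "end", 0xB),  # unmatched end, ignored
]
print(contention_wait_times(sample_events))
```

Dropping unmatched end events keeps short, sampled sessions from
attributing bogus wait times to locks whose begin event was never seen.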

That said, I understand the concern, and I can absolutely imagine
workloads where the overhead is still unacceptably high.

> 
> I'm very happy to see a patchset adding a contended case. But I worry
> that tracing all contended locks in the system is also too much to have
> enabled continuously for production.
> 
> This patch is carefully constructed to minimize overhead, such that I
> can enable this continuously on production to catch issues.  If I
> identify an issue I will use the generic tracepoints for further debugging.
> 
> 
> > In fact, zone->lock contention was one of the primary motivations for
> > this work.
> 
> In the generic solution I'm losing the "zone" and pages "count".  I
> need this information to get the answers I'm looking for.  Specifically
> I'm looking at reducing CONFIG_PCP_BATCH_SCALE_MAX, but I want this
> to be a data-driven decision (my first principle is: if you cannot
> measure it you cannot improve it).
> 
> I'm likely going to apply this patch to our production system, such that
> I can get my data-driven decision.  I need to deploy it widely enough to
> get enough servers experiencing direct-reclaim.  I'll report back if
> people are interested in these learnings.

I would definitely be interested in hearing about your findings.

> 
> --Jesper
