From: Shakeel Butt <shakeelb@google.com>
To: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Jeff Layton <jlayton@kernel.org>,
	 Chuck Lever <chuck.lever@oracle.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Michal Hocko <mhocko@kernel.org>,
	linux-kernel@vger.kernel.org,  Jens Axboe <axboe@kernel.dk>,
	Tejun Heo <tj@kernel.org>,
	Vasily Averin <vasily.averin@linux.dev>,
	 Michal Koutny <mkoutny@suse.com>,
	Waiman Long <longman@redhat.com>,
	 Muchun Song <muchun.song@linux.dev>,
	Jiri Kosina <jikos@kernel.org>,
	cgroups@vger.kernel.org,  linux-mm@kvack.org
Subject: Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
Date: Thu, 18 Jan 2024 23:47:51 -0800
Message-ID: <CALvZod7T=gops1B6gU3M7rOJ8D2mOrSwQ2hfpLaE-tNWZynAug@mail.gmail.com>
In-Reply-To: <ZahSlnqw9yRo3d1v@P9FQF9L96D.corp.robot.car>

On Wed, Jan 17, 2024 at 2:20 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Wed, Jan 17, 2024 at 01:02:19PM -0800, Shakeel Butt wrote:
> > On Wed, Jan 17, 2024 at 12:21 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > On Wed, 17 Jan 2024 at 11:39, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > > >
> > > > That's a good point.  If the microbenchmark isn't likely to be even
> > > > remotely realistic, maybe we should just revert the revert until if/when
> > > > somebody shows a real world impact.
> > > >
> > > > Linus, any objections to that?
> > >
> > > We use SLAB_ACCOUNT for much more common allocations like queued
> > > signals, so I would tend to agree with Jeff that it's probably just
> > > some not very interesting microbenchmark that shows any file locking
> > > effects from SLAB_ACCOUNT, not any real use.
> > >
> > > That said, those benchmarks do matter. It's very easy to say "not
> > > relevant in the big picture" and then the end result is that
> > > everything is a bit of a pig.
> > >
> > > And the regression was absolutely *ENORMOUS*. We're not talking "a few
> > > percent". We're talking a 33% regression that caused the revert:
> > >
> > >    https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
> > >
> > > I wish our SLAB_ACCOUNT wasn't such a pig. Rather than account every
> > > single allocation, it would be much nicer to account at a bigger
> > > granularity, possibly by having per-thread counters first before
> > > falling back to the obj_cgroup_charge. Whatever.
> > >
> > > It's kind of stupid to have a benchmark that just allocates and
> > > deallocates a file lock in quick succession spend lots of time
> > > incrementing and decrementing cgroup charges for that repeated
> > > alloc/free.
> > >
> > > However, that problem with SLAB_ACCOUNT is not the fault of file
> > > locking, but more of a slab issue.
> > >
> > > End result: I think we should bring in Vlastimil and whoever else is
> > > doing SLAB_ACCOUNT things, and have them look at that side.
> > >
> > > And then just enable SLAB_ACCOUNT for file locks. But very much look
> > > at silly costs in SLAB_ACCOUNT first, at least for trivial
> > > "alloc/free" patterns..
> > >
> > > Vlastimil? Who would be the best person to look at that SLAB_ACCOUNT
> > > thing? See commit 3754707bcc3e (Revert "memcg: enable accounting for
> > > file lock caches") for the history here.
> > >
> >
> > Roman last looked into optimizing this code path. I suspect
> > mod_objcg_state() to be more costly than obj_cgroup_charge(). I will
> > try to measure this path and see if I can improve it.
>
> It's roughly an equal split between mod_objcg_state() and obj_cgroup_charge().
> And each is comparable (by order of magnitude) to the slab allocation cost
> itself. On the free() path a significant cost comes simply from reading
> the objcg pointer (it's usually a cache miss).
>
> So I don't see how we can make it really cheap (say, less than 5% overhead)
> without caching pre-accounted objects.
>
> I thought about merging the charge and stats handling paths, which _maybe_ can
> shave off another 20-30%, but there will still be a double-digit percentage
> accounting overhead.
>
> I'm curious to hear other ideas and suggestions.
>
> Thanks!
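
To make the "caching pre-accounted objects" idea concrete, here is a
minimal sketch of a per-CPU cache of pre-charged objects. None of
these names (precharged_cache, PRECHARGE_BATCH, flock_alloc_cached())
exist in the kernel; this is only an illustration of the idea:

  #include <linux/percpu.h>
  #include <linux/slab.h>

  #define PRECHARGE_BATCH 16      /* hypothetical per-CPU batch size */

  struct precharged_cache {
          struct obj_cgroup *objcg;       /* all cached objects are charged here */
          void *objects[PRECHARGE_BATCH];
          unsigned int nr;
  };

  static DEFINE_PER_CPU(struct precharged_cache, flock_cache);

  static void *flock_alloc_cached(struct kmem_cache *s, gfp_t gfp)
  {
          struct precharged_cache *pc;
          void *obj = NULL;

          /* Fast path: reuse a pre-charged object, no objcg counters touched. */
          pc = get_cpu_ptr(&flock_cache);
          /* A real version must also check pc->objcg against current's objcg. */
          if (pc->nr)
                  obj = pc->objects[--pc->nr];
          put_cpu_ptr(&flock_cache);
          if (obj)
                  return obj;

          /* Slow path: a regular accounted allocation. */
          return kmem_cache_alloc(s, gfp);
  }

The free path would push objects back into this cache without
uncharging them, so a tight alloc/free loop like lock1_processes would
stay entirely in the fast path. The hard parts, which this sketch
ignores, are draining the cache on memcg offlining and bounding the
amount of memory pre-charged per CPU.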

I profiled (perf record -a) the same benchmark, i.e. lock1_processes, on
an Ice Lake machine with 72 cores and got the following results:

  12.72%  lock1_processes  [kernel.kallsyms]   [k] mod_objcg_state
  10.89%  lock1_processes  [kernel.kallsyms]   [k] kmem_cache_free
   8.40%  lock1_processes  [kernel.kallsyms]   [k] slab_post_alloc_hook
   8.36%  lock1_processes  [kernel.kallsyms]   [k] kmem_cache_alloc
   5.18%  lock1_processes  [kernel.kallsyms]   [k] refill_obj_stock
   5.18%  lock1_processes  [kernel.kallsyms]   [k] _copy_from_user

On annotating mod_objcg_state(), the following irq-disabling
instructions take 30% of its time.

  6.64 │       pushfq
 10.26 │       popq   -0x38(%rbp)
  6.05 │       mov    -0x38(%rbp),%rcx
  7.60 │       cli
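
That pushfq/popq/cli sequence is what local_irq_save() expands to on
x86 (save RFLAGS, then disable interrupts). If I'm reading the code
right, mod_objcg_state() takes the per-CPU stock lock with
local_lock_irqsave(), which on !PREEMPT_RT reduces to exactly this
pattern. Schematically (a sketch of the pattern, not the exact
function body):

  unsigned long flags;

  local_irq_save(flags);          /* pushfq; popq <flags>; cli */
  /* ... update the per-CPU objcg stats ... */
  local_irq_restore(flags);       /* re-enables irqs iff they were on */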

For kmem_cache_free() and kmem_cache_alloc(), the expensive instruction
was the following one, which corresponds to __update_cpu_freelist_fast():

 16.33 │      cmpxchg16b %gs:(%rsi)
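
That is the SLUB per-CPU freelist fast path: the freelist head and a
transaction id (tid) are packed into a 16-byte pair that is swapped
with a single cmpxchg16b on %gs-relative per-CPU data, so the fast
path needs no irq disabling at all. Roughly (a sketch with the layout
simplified from mm/slub.c):

  /*
   * 16 bytes updated atomically by one cmpxchg16b: the freelist
   * pointer plus a counter (tid) that detects preemption/migration
   * between reading the pair and updating it.
   */
  typedef union {
          struct {
                  void *freelist;         /* next free object */
                  unsigned long counter;  /* transaction id */
          };
          u128 full;                      /* the cmpxchg16b operand */
  } freelist_aba_t;

The instruction is costly in isolation, but as a local (un-LOCKed)
per-CPU cmpxchg it replaces an irq-disable/enable pair, so I don't see
much to shave off on this side.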

For slab_post_alloc_hook(), the cost is spread all over the place,
while refill_obj_stock()'s profile looks very similar to that of
mod_objcg_state().
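
For reference, refill_obj_stock() batches object-sized charges into a
per-CPU stock so that most charge/uncharge pairs never touch the page
counters. A simplified sketch of the shape of that code (names and
struct layout simplified, not the exact mm/memcontrol.c body):

  struct obj_stock_sketch {
          struct obj_cgroup *cached_objcg;        /* whose bytes are cached */
          unsigned int nr_bytes;                  /* pre-charged bytes */
  };

  static DEFINE_PER_CPU(struct obj_stock_sketch, objcg_stock);

  static void refill_obj_stock_sketch(struct obj_cgroup *objcg,
                                      unsigned int nr_bytes)
  {
          struct obj_stock_sketch *stock;
          unsigned long flags;

          local_irq_save(flags);  /* the same pushfq/cli cost as above */
          stock = this_cpu_ptr(&objcg_stock);
          if (stock->cached_objcg != objcg) {
                  /* flush the previous objcg's cached bytes (elided) */
                  stock->cached_objcg = objcg;
                  stock->nr_bytes = 0;
          }
          stock->nr_bytes += nr_bytes;
          if (stock->nr_bytes > PAGE_SIZE) {
                  /* drain whole pages back to the page counters (elided) */
          }
          local_irq_restore(flags);
  }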

I will dig more in the next couple of days.
