From: David Laight <david.laight.linux@gmail.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
kernel test robot <oliver.sang@intel.com>,
oe-lkp@lists.linux.dev, lkp@intel.com,
linux-kernel@vger.kernel.org, x86@kernel.org,
Benjamin Segall <bsegall@google.com>,
Frederic Weisbecker <frederic@kernel.org>
Subject: Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression
Date: Thu, 27 Mar 2025 20:45:12 +0000 [thread overview]
Message-ID: <20250327204512.548d2507@pumpkin> (raw)
In-Reply-To: <CAGudoHE_K4iBHSNjKEPuQxJJ-cNx_8f74dou7qdEb58Pc4eKBQ@mail.gmail.com>
On Thu, 27 Mar 2025 14:48:37 +0100
Mateusz Guzik <mjguzik@gmail.com> wrote:
> On Thu, Mar 27, 2025 at 2:44 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Thu, Mar 27, 2025 at 2:43 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> > >
> > > On Thu, Mar 27, 2025 at 2:17 PM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > On Thu, Mar 27, 2025 at 2:14 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > >
> > > > > On Thu, Mar 27 2025 at 12:37, Eric Dumazet wrote:
> > > > > > On Thu, Mar 27, 2025 at 11:50 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > > >> Cute. How much bloat does it cause?
> > > > > >
> > > > > > This would expand 'struct ucounts' by 192 bytes on x86, if the patch
> > > > > > was actually working :)
> > > > > >
> > > > > > Note sure if it is feasible without something more intrusive like
> > > > >
> > > > > I'm not sure about the actual benefit. The problem is that parallel
> > > > > invocations which access the same ucount still will run into contention
> > > > > of the cache line they are modifying.
> > > > >
> > > > > For the signal case, all invocations increment rlimit[SIGPENDING], so
> > > > > putting that into a different cache line does not buy a lot.
> > > > >
> > > > > False sharing is when you have a lot of hot path readers on some other
> > > > > member of the data structure, which happens to share the cache line with
> > > > > the modified member. But that's not really the case here.
> > > >
> > > > We applications stressing all the counters at the same time (from
> > > > different threads)
> > > >
> > > > You seem to focus on posix timers only :)
> > >
> > > Well in that case:
> > > (gdb) ptype /o struct ucounts
> > > /* offset | size */ type = struct ucounts {
> > > /* 0 | 16 */ struct hlist_node {
> > > /* 0 | 8 */ struct hlist_node *next;
> > > /* 8 | 8 */ struct hlist_node **pprev;
> > >
> > > /* total size (bytes): 16 */
> > > } node;
> > > /* 16 | 8 */ struct user_namespace *ns;
> > > /* 24 | 4 */ kuid_t uid;
> > > /* 28 | 4 */ atomic_t count;
> > > /* 32 | 96 */ atomic_long_t ucount[12];
> > > /* 128 | 256 */ struct {
> > > /* 0 | 8 */ atomic_long_t val;
> > > } rlimit[4];
> > >
> > > /* total size (bytes): 384 */
> > > }
> > >
> > > This comes from malloc. Given 384 bytes of size it is going to be
> > > backed by a 512-byte sized buffer -- that's a clear cut waste of 128
> > > bytes.
> > >
> > > It is plausible creating a 384-byte sized slab for kmalloc would help
> > > save memory overall (not just for this specific struct), but that
> > > would require extensive testing in real workloads. I think Google is
> > > in position to do it on their fleet and android? fwiw Solaris and
> > > FreeBSD do have slabs of this size and it does save memory over there.
> > > I understand it is a tradeoff, hence I'm not claiming this needs to be
> > > added. I do claim it does warrant evaluation, but I wont blame anyone
> > > for not wanting to do dig into it.
> > >
> > > The other option is to lean into it. In this case I point out the
> > > refcount shares the cacheline with some of the limits and that it
> > > could be moved to a dedicated line while still keeping the struct <
> > > 512 bytes, thus not spending more memory on allocation. the refcount
> > > changes less frequently than limits themselves so it's not a big deal,
> > > but it can be adjusted "for free" if you will.
> > >
> > > while here I would probably change the name of the field. A reference
> > > counter named "count" in a struct named "ucounts", followed by an
> > > "ucount" array is rather unpleasing. How about s/count/refcount?
> >
> >
> > How many 'struct ucounts' are in use in a typical host ?
> >
> > Compared to other costs, this seems pure noise to me.
>
> I did not claim this is going to increase memory usage in a significant manner.
>
> I claim regardless of this change a 384-byte slab for kmalloc may be
> saving memory and this bit may be enough of an excuse to evaluate it,
> should someone be interested.
>
> Apart from that I claim that if the 512-byte is going to be used to
> back the 384 bytes used by the struct, the patch can trivially move
> the refcount to a dedicated cacheline to avoid some of the bouncing
> and still fit in the 512-byte allocation. I see no reason to not do
> it.
>
What about systems with much larger cache lines?
David
prev parent reply other threads:[~2025-03-27 20:45 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-24 6:39 [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression kernel test robot
2025-03-26 8:07 ` Thomas Gleixner
2025-03-26 21:11 ` Mateusz Guzik
2025-03-26 21:43 ` Thomas Gleixner
2025-03-27 6:21 ` Eric Dumazet
2025-03-27 8:10 ` Thomas Gleixner
2025-03-27 8:26 ` Eric Dumazet
2025-03-27 9:11 ` Eric Dumazet
2025-03-27 10:50 ` Thomas Gleixner
2025-03-27 11:37 ` Eric Dumazet
2025-03-27 13:14 ` Thomas Gleixner
2025-03-27 13:17 ` Eric Dumazet
2025-03-27 13:43 ` Mateusz Guzik
2025-03-27 13:44 ` Eric Dumazet
2025-03-27 13:48 ` Mateusz Guzik
2025-03-27 20:45 ` David Laight [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250327204512.548d2507@pumpkin \
--to=david.laight.linux@gmail.com \
--cc=bsegall@google.com \
--cc=edumazet@google.com \
--cc=frederic@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lkp@intel.com \
--cc=mjguzik@gmail.com \
--cc=oe-lkp@lists.linux.dev \
--cc=oliver.sang@intel.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.