From: Jakub Wartak <jakub.wartak@mailbox.org>
To: "anthony.yznaga@oracle.com" <anthony.yznaga@oracle.com>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"andreyknvl@gmail.com" <andreyknvl@gmail.com>,
"arnd@arndb.de" <arnd@arndb.de>,
"brauner@kernel.org" <brauner@kernel.org>,
"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
"dave.hansen@intel.com" <dave.hansen@intel.com>,
"david@redhat.com" <david@redhat.com>,
"ebiederm@xmission.com" <ebiederm@xmission.com>,
"khalid@kernel.org" <khalid@kernel.org>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"luto@kernel.org" <luto@kernel.org>,
"markhemm@googlemail.com" <markhemm@googlemail.com>,
"maz@kernel.org" <maz@kernel.org>,
"mhiramat@kernel.org" <mhiramat@kernel.org>,
"neilb@suse.de" <neilb@suse.de>,
"pcc@google.com" <pcc@google.com>,
"rostedt@goodmis.org" <rostedt@goodmis.org>,
"vasily.averin@linux.dev" <vasily.averin@linux.dev>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"willy@infradead.org" <willy@infradead.org>,
"xhao@linux.alibaba.com" <xhao@linux.alibaba.com>
Subject: Re: [PATCH v2 00/20] Add support for shared PTEs across processes
Date: Wed, 18 Jun 2025 12:21:07 +0200 (CEST)
Message-ID: <1968222200.618999.1750242067613@office-sso.mailbox.org>
Hi all,
I wanted to share some results. I modified PostgreSQL (master) to use the msharefs patchset (v2) proposed here, on top of a linux-6.14.7 kernel, as I suspected that sharing PTEs might help in some cases, especially with high process counts. Traditionally, high process counts are an anti-pattern in PostgreSQL, and running that many backends (processes) is not recommended for various reasons. I was researching the exact reasons why (there are plenty of others too); in short, that is how I came to suspect dTLB misses, followed up on PTEs and finally arrived here: msharefs.
I've tried it on a couple of scenarios and it always helps (+5% .. 40%) in artificial pgbench read-only measurements on every machine I tried, but here I'm posting results:
a. from some properly isolated legacy SMP box in homelab (4s32c64/4xNUMA nodes, Xeon 46xx, 128GB RAM)
b. PostgreSQL's pgbench OLTP-like benchmark was used with -c $c -j 64 -S -T 60 -P 1
c. PostgreSQL's shared_buffers(shared_memory)=32GB
d. pgbench -i -s 2000 (~31GB, all used data was in shared memory, not in VFS cache, to avoid syscalls),
e. no hugepages were used as msharefs does not seem to support them yet (but Anthony already told me he's on it)
f. I also used cpupower to set the performance governor, D0 and no_turbo, and the data was prewarmed.
Again, having PostgreSQL with 8k or 16k processes is not the way to go, but it illustrates well that the fork() model (1 client = 1 process) can really benefit from msharefs:
shared_memory_type=mmap (default on Linux is mmap(MAP_SHARED)+fork())
c=8000 tps = 143-150k (~4s to init all conns)
c=16000 tps = 130-140k (~50s-70s! to init all conns! had to extend benchmark, lots of fork()!)
shared_memory_type=msharefs (literally same as above, open()/fallocate()/ioctl()/mmap()+fork()):
c=8000 tps = ~189k (3s to init all conns)
c=16000 tps = ~189k (6s to init all conns)
That's 1.35x - 1.45x at c=16000 (and roughly 1.26x - 1.32x at c=8000).
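The ratios can be re-derived from the tps figures above. A minimal Python sketch follows; the variable and function names are illustrative only, and "~189k" is taken as exactly 189 (thousand tps), which is an assumption:

```python
# Throughput in thousands of tps, copied from the results above.
# mmap runs were reported as a range; msharefs as "~189k" (assumed to be 189).
mmap_tps = {8000: (143, 150), 16000: (130, 140)}
mshare_tps = {8000: 189, 16000: 189}


def speedup_range(clients):
    """Return (low, high) speedup of msharefs over mmap for a client count."""
    worst_mmap, best_mmap = mmap_tps[clients]
    # The best mmap run yields the smallest speedup, the worst run the largest.
    return mshare_tps[clients] / best_mmap, mshare_tps[clients] / worst_mmap


for c in sorted(mmap_tps):
    low, high = speedup_range(c)
    print(f"c={c}: {low:.2f}x - {high:.2f}x")
```

With these inputs the range is about 1.26x - 1.32x at c=8000 and 1.35x - 1.45x at c=16000.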
Illustrative sample of 1 second of `perf stat -a -e ...` during those runs with 16k processes:

# mmap:
#           time          counts unit events
   190.223101118     15257144598      cycles
   190.223101118     10485389437      instructions       #  0.69 insn per cycle
   190.223101118           34413      context-switches
   190.223101118             703      cpu-migrations
   190.223101118               0      major-faults
   190.223101118          256302      minor-faults
   190.223101118      3922621887      dTLB-loads
   190.223101118        12520660      dTLB-load-misses   #  0.32% of all dTLB cache accesses

# msharefs:
#           time          counts unit events
   105.122916131     15256454170      cycles
   105.122916131     10732582790      instructions       #  0.70 insn per cycle
   105.122916131           38420      context-switches
   105.122916131            1125      cpu-migrations
   105.122916131               0      major-faults
   105.122916131           34304      minor-faults
   105.122916131      4143569524      dTLB-loads
   105.122916131        12179260      dTLB-load-misses   #  0.29% of all dTLB cache accesses
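For a quick side-by-side comparison, the derived metrics can be recomputed from the raw counters in the two samples. This is only a sketch: the dictionary keys and the helper name are made up for illustration, and each sample covers a single one-second interval, so the numbers are indicative rather than conclusive.

```python
# Raw counters copied from the two one-second perf stat samples above.
samples = {
    "mmap": {
        "cycles": 15257144598,
        "instructions": 10485389437,
        "minor_faults": 256302,
        "dtlb_loads": 3922621887,
        "dtlb_load_misses": 12520660,
    },
    "msharefs": {
        "cycles": 15256454170,
        "instructions": 10732582790,
        "minor_faults": 34304,
        "dtlb_loads": 4143569524,
        "dtlb_load_misses": 12179260,
    },
}


def derived(counters):
    """Compute instructions per cycle and the dTLB load miss rate in percent."""
    return {
        "ipc": counters["instructions"] / counters["cycles"],
        "dtlb_miss_pct": 100.0 * counters["dtlb_load_misses"] / counters["dtlb_loads"],
    }


for name, counters in samples.items():
    d = derived(counters)
    print(f"{name}: ipc={d['ipc']:.2f} dtlb_miss={d['dtlb_miss_pct']:.2f}%")

# How many times fewer minor faults the msharefs sample shows.
fault_ratio = samples["mmap"]["minor_faults"] / samples["msharefs"]["minor_faults"]
print(f"minor faults: {fault_ratio:.1f}x fewer with msharefs")
```

In these samples the largest difference is in minor faults (about 7.5x fewer with msharefs), while IPC (0.69 vs 0.70) and the dTLB load miss rate (0.32% vs 0.29%) move only slightly.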
On smaller hardware and a single socket there are also gains even at lower process counts, but the more processes that run concurrently and access shared memory, the bigger the performance boost. I hope this feedback is useful (so it's not only about lowering memory use for PTEs, but also quite a nice perf. boost). I would also like to thank Anthony and Khalid for answering some initial questions off-list.
BTW I have not yet posted this to the main PostgreSQL hackers mailing list, well... because there's no kernel to support it in the first place ;)
-J.
p.s. I'm not subscribed to linux-mm, so please CC me.
2025-04-04 2:18 [PATCH v2 00/20] Add support for shared PTEs across processes Anthony Yznaga