linux-arch.vger.kernel.org archive mirror
From: Jakub Wartak <jakub.wartak@mailbox.org>
To: "anthony.yznaga@oracle.com" <anthony.yznaga@oracle.com>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"andreyknvl@gmail.com" <andreyknvl@gmail.com>,
	"arnd@arndb.de" <arnd@arndb.de>,
	"brauner@kernel.org" <brauner@kernel.org>,
	"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"dave.hansen@intel.com" <dave.hansen@intel.com>,
	"david@redhat.com" <david@redhat.com>,
	"ebiederm@xmission.com" <ebiederm@xmission.com>,
	"khalid@kernel.org" <khalid@kernel.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"luto@kernel.org" <luto@kernel.org>,
	"markhemm@googlemail.com" <markhemm@googlemail.com>,
	"maz@kernel.org" <maz@kernel.org>,
	"mhiramat@kernel.org" <mhiramat@kernel.org>,
	"neilb@suse.de" <neilb@suse.de>,
	"pcc@google.com" <pcc@google.com>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>,
	"vasily.averin@linux.dev" <vasily.averin@linux.dev>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"willy@infradead.org" <willy@infradead.org>,
	"xhao@linux.alibaba.com" <xhao@linux.alibaba.com>
Subject: Re: [PATCH v2 00/20] Add support for shared PTEs across processes
Date: Wed, 18 Jun 2025 12:21:07 +0200 (CEST)
Message-ID: <1968222200.618999.1750242067613@office-sso.mailbox.org>

Hi all,

I wanted to share some results. I modified PostgreSQL (master) to use the msharefs patchset (v2) proposed here, on top of a linux-6.14.7 kernel, as I suspected that sharing PTEs might help in some cases, especially with high process counts. Traditionally, high process counts are an anti-pattern in PostgreSQL and it's not recommended (for various reasons) to run that many backends (processes). I was researching the exact reasons why (there are plenty of others too), and in short that's how I came to suspect dTLB misses, followed up on PTEs, and finally arrived here: msharefs.

I've tried it in a couple of scenarios and it always helps (+5% .. 40%) in artificial pgbench read-only measurements on every machine I tried, but here I'm posting results from the following setup:
a. a properly isolated legacy SMP box in my homelab (4s32c64/4xNUMA nodes, Xeon 46xx, 128GB RAM)
b. PostgreSQL's pgbench OLTP-like benchmark was used with -c $c -j 64 -S -T 60 -P 1
c. PostgreSQL's shared_buffers(shared_memory)=32GB
d. pgbench -i -s 2000 (~31GB, all used data was in shared memory, not in VFS cache, to avoid syscalls)
e. no hugepages were used, as msharefs does not seem to support them yet (but Anthony already told me he's on it)
f. I used cpupower with the performance governor, D0 and no_turbo as well, and the data was prewarmed.

Again, running PostgreSQL with 8k or 16k processes is not the way to go, but it illustrates well that the fork() model (1 client = 1 process) can really benefit from msharefs:

shared_memory_type=mmap (default on Linux is mmap(MAP_SHARED)+fork())
 c=8000 tps  = 143-150k (~4s to init all conns)
 c=16000 tps = 130-140k (~50s-70s! to init all conns! had to extend benchmark, lots of fork()!)

shared_memory_type=msharefs (literally same as above, open()/fallocate()/ioctl()/mmap()+fork()):
 c=8000 tps  = ~189k (3s to init all conns)
 c=16000 tps = ~189k (6s to init all conns)

That's 1.35x - 1.45x at c=16000.
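
For anyone who wants to recheck the ratio, here is a small sketch that recomputes it from the tps figures quoted above. The dictionary layout and the function name are only illustrative, and the msharefs figure is treated as a single point because it was reported as "~189k".

```python
# tps figures copied from the results above, in thousands of transactions/s.
# mmap runs were reported as a (low, high) range, msharefs as one value.
mmap_tps = {8000: (143, 150), 16000: (130, 140)}
msharefs_tps = {8000: 189, 16000: 189}


def speedup_range(clients):
    """Return (low, high) speedup of msharefs over mmap for a client count."""
    slowest, fastest = mmap_tps[clients]
    new = msharefs_tps[clients]
    # Comparing against the best mmap run gives the low end of the range,
    # comparing against the worst mmap run gives the high end.
    return new / fastest, new / slowest


for clients in sorted(mmap_tps):
    low, high = speedup_range(clients)
    print(f"c={clients}: {low:.2f}x - {high:.2f}x")
```

The c=16000 line reproduces the 1.35x - 1.45x range; the c=8000 line shows a smaller gain at the lower client count.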

Illustrative sample of 1 second of `perf stat -a -e ...` during those runs with 16k processes:

# mmap:
#           time             counts unit events
   190.223101118        15257144598      cycles
   190.223101118        10485389437      instructions                     #    0.69  insn per cycle
   190.223101118              34413      context-switches
   190.223101118                703      cpu-migrations
   190.223101118                  0      major-faults
   190.223101118             256302      minor-faults
   190.223101118         3922621887      dTLB-loads
   190.223101118           12520660      dTLB-load-misses                 #    0.32% of all dTLB cache accesses
   
# msharefs:
#           time             counts unit events
   105.122916131        15256454170      cycles
   105.122916131        10732582790      instructions                     #    0.70  insn per cycle
   105.122916131              38420      context-switches
   105.122916131               1125      cpu-migrations
   105.122916131                  0      major-faults
   105.122916131              34304      minor-faults
   105.122916131         4143569524      dTLB-loads
   105.122916131           12179260      dTLB-load-misses                 #    0.29% of all dTLB cache accesses 
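
The derived figures in the two samples above can be recomputed with a short sketch like the one below. The counter values are copied verbatim from the perf output; the key names and the helper function are hypothetical, and a single one-second sample is of course only indicative.

```python
# Counter values copied from the two one-second perf stat samples above.
samples = {
    "mmap": {
        "cycles": 15257144598,
        "instructions": 10485389437,
        "minor_faults": 256302,
        "dtlb_loads": 3922621887,
        "dtlb_load_misses": 12520660,
    },
    "msharefs": {
        "cycles": 15256454170,
        "instructions": 10732582790,
        "minor_faults": 34304,
        "dtlb_loads": 4143569524,
        "dtlb_load_misses": 12179260,
    },
}


def derived(sample):
    """Compute instructions per cycle and dTLB load miss percentage."""
    return {
        "ipc": sample["instructions"] / sample["cycles"],
        "dtlb_miss_pct": 100.0 * sample["dtlb_load_misses"] / sample["dtlb_loads"],
    }


for name, sample in samples.items():
    d = derived(sample)
    print(f"{name}: ipc={d['ipc']:.2f} dtlb_miss={d['dtlb_miss_pct']:.2f}%")

# Ratio of minor faults per second between the two runs.
fault_ratio = samples["mmap"]["minor_faults"] / samples["msharefs"]["minor_faults"]
print(f"minor faults, mmap vs msharefs: {fault_ratio:.1f}x")
```

In this sample the largest relative difference is in minor faults (roughly 7.5x fewer with msharefs), while IPC and the dTLB miss rate move only slightly.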

On smaller hardware and a single socket there are similar gains even at lower process counts, but the more processes are running concurrently and accessing shared memory, the bigger the performance boost. I hope this feedback is useful (so it's not only about lowering memory use for PTEs, but also quite a nice perf. boost). I would also like to thank Anthony and Khalid for answering some initial questions outside the mailing list.
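
To put the "memory use for PTEs" part in perspective, here is a back-of-the-envelope sketch. It assumes x86-64 with 4 KiB pages and 8-byte page table entries, counts only leaf page tables, and assumes every backend ends up touching all of the 32GB shared_buffers, so it is a theoretical upper bound and not something measured in the runs above.

```python
KIB, MIB, GIB = 1024, 1024**2, 1024**3


def leaf_pte_bytes(mapped_bytes, page_size=4 * KIB, pte_size=8):
    """Bytes of leaf page table entries needed to map `mapped_bytes`.

    Assumes 4 KiB pages and 8-byte PTEs (x86-64 without hugepages);
    upper-level page tables are ignored.
    """
    return (mapped_bytes // page_size) * pte_size


# shared_buffers=32GB as in the setup above.
per_process = leaf_pte_bytes(32 * GIB)
print(f"per process: {per_process // MIB} MiB of PTEs")

# Without sharing, each backend that touches the whole region has its own copy.
for backends in (8000, 16000):
    total_gib = per_process * backends / GIB
    print(f"{backends} backends: up to {total_gib:.0f} GiB of PTEs")
```

With page tables shared across processes, the per-process figure would in principle be paid once for the shared region instead of once per backend.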

BTW I have not yet posted it to the main PostgreSQL hacking mailing list, well... because there's no kernel to support it in the first place ;)

-J.

p.s. I'm not subscribed to linux-mm, so please CC me.

Thread overview:
2025-06-18 10:21 Jakub Wartak [this message]
2025-04-04  2:18 [PATCH v2 00/20] Add support for shared PTEs across processes Anthony Yznaga
