From: Ingo Molnar <mingo@kernel.org>
To: Bo Li <libo.gcs85@bytedance.com>
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, luto@kernel.org,
kees@kernel.org, akpm@linux-foundation.org, david@redhat.com,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
peterz@infradead.org, dietmar.eggemann@arm.com, hpa@zytor.com,
acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com, viro@zeniv.linux.org.uk,
brauner@kernel.org, jack@suse.cz, lorenzo.stoakes@oracle.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
jannh@google.com, pfalcato@suse.de, riel@surriel.com,
harry.yoo@oracle.com, linux-kernel@vger.kernel.org,
linux-perf-users@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, duanxiongchun@bytedance.com,
yinhongbo@bytedance.com, dengliang.1214@bytedance.com,
xieyongji@bytedance.com, chaiwen.cc@bytedance.com,
songmuchun@bytedance.com, yuanzhu@bytedance.com,
chengguozhu@bytedance.com, sunjiadong.lff@bytedance.com
Subject: Re: [RFC v2 00/35] optimize cost of inter-process communication
Date: Sat, 31 May 2025 09:16:35 +0200
Message-ID: <aDqs0_c_vq96EWW6@gmail.com>
In-Reply-To: <cover.1748594840.git.libo.gcs85@bytedance.com>
* Bo Li <libo.gcs85@bytedance.com> wrote:
> # Performance
>
> To quantify the performance improvements driven by RPAL, we measured
> latency both before and after its deployment. Experiments were
> conducted on a server equipped with two Intel(R) Xeon(R) Platinum
> 8336C CPUs (2.30 GHz) and 1 TB of memory. Latency was defined as the
> duration from when the client thread initiates a message to when the
> server thread is invoked and receives it.
>
> During testing, the client transmitted 1 million 32-byte messages, and we
> computed the per-message average latency. The results are as follows:
>
> *****************
> Without RPAL: Message length: 32 bytes, Total TSC cycles: 19616222534,
> Message count: 1000000, Average latency: 19616 cycles
> With RPAL: Message length: 32 bytes, Total TSC cycles: 1703459326,
> Message count: 1000000, Average latency: 1703 cycles
> *****************
>
> These results confirm that RPAL delivers substantial latency
> improvements over the current epoll implementation—achieving a
> 17,913-cycle reduction (an ~91.3% improvement) for 32-byte messages.
No, these results do not necessarily confirm that.
19,616 cycles per message on a vanilla kernel on a 2.3 GHz CPU works
out to 19,616 / 2.3e9 ~= 8.5 usecs/message, i.e. about 117k
messages/second, which is *way* above typical kernel inter-process
communication latencies on comparable CPUs:
root@localhost:~# taskset 1 perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 2.790 [sec]
2.790614 usecs/op
358344 ops/sec
And my 2.8 usecs result was from a kernel running inside a KVM sandbox
...
( I used 'taskset' to bind the benchmark to a single CPU, to remove any
inter-CPU migration noise from the measurement. )
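( Converting that result into the same units, assuming a core clock
  in the same ~2.3 GHz ballpark as the server above:

      2.790614 usecs/op * 2.3e9 cycles/sec ~= 6,400 TSC cycles/op

  i.e. roughly a third of the 19,616 cycles reported for the vanilla
  kernel. )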
The scheduler parts of your series simply try to remove much of the
scheduler and context-switching functionality to create a special
fast path, with no FPU context switching and no TLB flushing AFAICS,
essentially for the purposes of message-latency benchmarking, and you
then compare that fast path against the full scheduling and MM
context-switching costs of full-blown Linux processes.
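For reference, the stock-pipe baseline can be measured in the same
TSC-cycle units with a ping-pong loop like the sketch below (my
illustration, not code from the series), pinned to a single CPU just
like the perf run above:

/* Minimal sketch: time 32-byte pipe ping-pong messages in TSC
 * cycles, both processes pinned to CPU 0, similar in spirit to
 * 'perf bench sched pipe'. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <x86intrin.h>		/* __rdtsc() */

#define ITERS	1000000

static void pin_to_cpu0(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(0, &set);	/* same effect as 'taskset 1' */
	sched_setaffinity(0, sizeof(set), &set);
}

int main(void)
{
	int ping[2], pong[2];
	char buf[32] = { 0 };	/* 32-byte messages, as in the RFC */

	if (pipe(ping) || pipe(pong))
		return 1;

	if (!fork()) {		/* child: echo each message back */
		pin_to_cpu0();
		for (int i = 0; i < ITERS; i++) {
			read(ping[0], buf, sizeof(buf));
			write(pong[1], buf, sizeof(buf));
		}
		_exit(0);
	}

	pin_to_cpu0();
	uint64_t start = __rdtsc();
	for (int i = 0; i < ITERS; i++) {
		write(ping[1], buf, sizeof(buf));
		read(pong[0], buf, sizeof(buf));
	}
	uint64_t cycles = __rdtsc() - start;

	/* Each iteration is one full round trip. */
	printf("%llu cycles/round-trip\n",
	       (unsigned long long)(cycles / ITERS));
	return 0;
}

( Halving the round-trip figure gives a rough one-way number
  comparable to how latency is defined in the RFC. )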
I'm not convinced, at all, that this many changes are required to speed
up the use case you are trying to optimize:
> 61 files changed, 9710 insertions(+), 4 deletions(-)
Nor am I convinced that 9,700 lines of *new* code implementing a
parallel facility, crudely wrapped in 1970s technology (#ifdefs), are
needed instead of optimizing/improving the facilities we already
have...
So NAK for the scheduler bits, until proven otherwise (and presented in
a clean fashion, which the current series is very far from).
I'll be the first to acknowledge that our process and MM
context-switching overhead is too high and could be improved, and I
have no objections to the general goal of improving Linux
inter-process messaging performance either; I only NAK this
particular implementation/approach.
Thanks,
Ingo