* riscv syscall performance regression
From: Wu, Fei @ 2024-02-23 5:28 UTC (permalink / raw)
To: linux-riscv, linux-kernel, guoren, fei2.wu
Hi All,
I am doing some performance regression testing on a Sophgo machine, and
the UnixBench syscall benchmark score drops 14% from v6.1 to v6.6. The
drop appears to be due to commit f0bddf50 ("riscv: entry: Convert to
generic entry"). I know it's a tradeoff; I am just checking whether it
has been discussed already and whether any improvement can be made.
The unixbench benchmark I used is:
$ ./syscall 10 getpid
The dynamic instruction count per syscall increases from ~200 to ~250,
which should be the key factor. To avoid porting different kernel
versions to the Sophgo board, I switched to testing under QEMU system
emulation, using the libinsn.so plugin to count instructions. There is
some background noise during the test, but its impact should be limited.
These are the dynamic instruction counts per syscall that I got:
* commit d0db02c6 (right before the change): ~200
* commit f0bddf50 (the change): ~250
* commit ffd2cb6b (latest upstream): ~250
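As a rough sanity check on how the instruction counts relate to the 14% benchmark drop, the following sketch assumes (this is an assumption, not something measured in this thread) that time per syscall scales linearly with the dynamic instruction count. The function name is hypothetical.

```python
def expected_throughput_drop(insns_before: float, insns_after: float) -> float:
    """Fractional throughput drop, ASSUMING time per syscall is
    proportional to the dynamic instruction count."""
    if insns_before <= 0 or insns_after <= 0:
        raise ValueError("instruction counts must be positive")
    # Throughput ~ 1 / instructions, so new/old throughput = before / after.
    return 1.0 - insns_before / insns_after


# Approximate counts reported above: ~200 before, ~250 after.
insn_increase = 250 / 200 - 1.0
drop = expected_throughput_drop(200, 250)
print(f"instruction increase: {insn_increase:.0%}")  # 25%
print(f"expected throughput drop: {drop:.0%}")       # 20%
```

Under that assumption a 25% increase in instructions would cost about 20% of throughput. The measured 14% is smaller, which would be consistent with part of the per-syscall time not scaling with instruction count, although the thread does not break this down.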
Any comment?
Thanks,
Fei.
* Re: riscv syscall performance regression
From: Guo Ren @ 2024-02-27 1:15 UTC (permalink / raw)
To: Wu, Fei; +Cc: linux-riscv, linux-kernel
1. I think this is about generic entry performance. All architectures
should move to that framework and improve generic entry performance
together.
2. Another point is that the generic entry code adds sched-related
functions, so a simple empty syscall can't show the benefit of
generic entry.
3. Could we use the vDSO to improve getpid?
PS:
The syscall arguments now use pt_regs instead of syscall_wrapper,
which broke rv32 syscalls; see:
https://github.com/T-head-Semi/linux/pull/5
--
Best Regards
Guo Ren
* Re: riscv syscall performance regression
From: Alexandre Ghiti @ 2024-08-13 12:51 UTC (permalink / raw)
To: Wu, Fei, linux-riscv, linux-kernel, guoren, Björn Töpel
Hi Fei,
So I finally took some time to look into this. Indeed the conversion to
the generic entry introduced the overhead you observe.
The numbers I get are similar:
* commit d0db02c6 (right before the change): 185
* 6.11-rc3: 245
I dug a bit deeper and noticed that we could regain ~40 instructions
by inlining syscall_exit_to_user_mode() and do_trap_ecall_u():
- we used to intercept the syscall trap, but now it's dealt with in the
exception vector, so I'm not sure if we can inline do_trap_ecall_u()
- I quickly tried to inline syscall_exit_to_user_mode(), but it pulls in
quite a few functions and I failed to do so.
Note that a recent effort already inlined most of the common entry
functions:
https://lore.kernel.org/all/20231218074520.1998026-1-svens@linux.ibm.com/
The remaining instructions are caused by:
* the vector extension handling. Optimizing it won't improve the above
numbers because the test does not use the vector extension, but we could
still improve __riscv_v_vstate_discard() as mentioned in commit
9657e9b7d253 ("riscv: Discard vector state on syscalls")
* the random kernel stack offset
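The instruction budget implied by the figures in this message can be written out as a small sketch. It only restates the arithmetic: the ~40 figure is the approximate estimate given above, and the function name is hypothetical.

```python
def remaining_overhead(insns_before: int, insns_after: int, regained: int) -> int:
    """Instructions still unaccounted for after the inlining savings."""
    added = insns_after - insns_before
    return added - regained


added = 245 - 185                        # extra instructions per syscall on 6.11-rc3
left = remaining_overhead(185, 245, 40)  # after regaining ~40 by inlining
print(f"added: {added}, regained: ~40, remaining: ~{left}")
# added: 60, regained: ~40, remaining: ~20
```

So roughly 20 of the 60 extra instructions would remain after the inlining, which is the portion attributed here to vector state handling and the random kernel stack offset.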
I'll add some performance regression tests to my CI in the near future :)
Thanks,
Alex
* Re: riscv syscall performance regression
From: Charlie Jenkins @ 2024-08-13 19:15 UTC (permalink / raw)
To: Alexandre Ghiti
Cc: Wu, Fei, linux-riscv, linux-kernel, guoren, Björn Töpel
I have written patches to do this inlining but haven't sent them out yet.
I don't know a good way of showing the performance improvement, so I have
been hesitant to send them. The change is generic, so showing the
improvement on x86 is probably best. I have also written some patches
that clean up some of the other syscall handling, but again I haven't
been able to show performance numbers. I was going to use a T-Head board
but was unable to get it to boot an up-to-date kernel, as I posted about
here [1]. The patches here [2] should also show improvements.
I can try to get some numbers and send out the patches.
Link: https://lore.kernel.org/linux-arm-kernel/ZoydV7vad5JWIcZb@ghost/ [1]
Link: https://patchwork.kernel.org/project/linux-riscv/cover/20240720171232.1753-1-jszhang@kernel.org/ [2]
- Charlie
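One possible way to produce comparable numbers without hardware is the QEMU instruction-counting approach used earlier in the thread. The sketch below assumes (hypothetically) a benchmark loop that can be run with two fixed iteration counts, and takes the total instruction counts reported for each run as plain inputs; differencing the two runs cancels fixed costs such as boot and process startup. The function name and the example totals are made up for illustration and are not measurements.

```python
def insns_per_syscall(total_short: int, iters_short: int,
                      total_long: int, iters_long: int) -> float:
    """Per-iteration instruction count from two runs of different length.

    Fixed overhead common to both runs cancels out in the difference.
    """
    if iters_long <= iters_short:
        raise ValueError("the long run must have more iterations")
    return (total_long - total_short) / (iters_long - iters_short)


# Made-up totals, constructed as 3,000,000 fixed + 245 per iteration.
short_run = 3_000_000 + 245 * 100_000
long_run = 3_000_000 + 245 * 200_000
print(insns_per_syscall(short_run, 100_000, long_run, 200_000))  # 245.0
```

Comparing this value between a baseline kernel and a patched kernel would give a per-syscall instruction delta, which sidesteps the need for a board that boots a current kernel, though it says nothing about cycles or cache behaviour.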