public inbox for linux-scsi@vger.kernel.org
* [LSF/MM TOPIC] KPTI effect on IO performance
@ 2018-01-31  8:23 Ming Lei
  2018-01-31 18:43 ` Scotty Bauer
  2018-02-01 21:51 ` Bart Van Assche
  0 siblings, 2 replies; 5+ messages in thread
From: Ming Lei @ 2018-01-31  8:23 UTC (permalink / raw)
  To: lsf-pc, Linux-scsi, linux-block, linux-nvme

Hi All,

Since KPTI was merged, extra overhead has been added to every context switch
between user space and kernel space. On my laptop I observe that one syscall
takes an extra ~0.15us[1] compared with booting with 'nopti'.

IO performance is affected too: in my null_blk test[2], IOPS drops by 32%
compared with 'nopti':

randread IOPS on the latest Linus tree:
--------------------------------------------------
| randread IOPS (pti on) | randread IOPS (nopti) |
--------------------------------------------------
| 928K                   | 1372K                 |
--------------------------------------------------


Two paths are affected: one is IO submission (the read, write, ... syscalls);
the other is the IO completion path, where the completion interrupt may arrive
while user space is running, so a context switch into the kernel is needed.

So is there anything we can do to decrease the effect on IO performance?

This effect may make Hannes's issue[3] worse. Maybe 'irq poll' should be used
more widely for all high-performance IO devices, and some optimization for
KPTI's effect should be considered.


[1] http://people.redhat.com/minlei/tests/tools/syscall_speed.c
[2] http://people.redhat.com/minlei/tests/tools/null_perf
[3] [LSF/MM TOPIC] irq affinity handling for high CPU count machines
	https://marc.info/?t=151722156800002&r=1&w=2

Thanks,
Ming

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [LSF/MM TOPIC] KPTI effect on IO performance
  2018-01-31  8:23 [LSF/MM TOPIC] KPTI effect on IO performance Ming Lei
@ 2018-01-31 18:43 ` Scotty Bauer
  2018-02-01  2:35   ` Ming Lei
  2018-02-01  3:05   ` Ming Lei
  2018-02-01 21:51 ` Bart Van Assche
  1 sibling, 2 replies; 5+ messages in thread
From: Scotty Bauer @ 2018-01-31 18:43 UTC (permalink / raw)
  To: Ming Lei; +Cc: lsf-pc, Linux-scsi, linux-block, linux-nvme

On 2018-01-31 01:23, Ming Lei wrote:
> Hi All,
> 
> After KPTI is merged, there is extra load introduced to context switch
> between user space and kernel space. It is observed on my laptop that 
> one
> syscall takes extra ~0.15us[1] compared with 'nopti'.
> 
> IO performance is affected too, it is observed that IOPS drops by 32% 
> in
> my test[2] on null_blk compared with 'nopti':
> 
> randread IOPS on latest linus tree:
> -------------------------------------------------
> | randread IOPS     | randread IOPS with 'nopti'|
> ------------------------------------------------
> | 928K              | 1372K                     |
> ------------------------------------------------
> 
> 

Do you know if your CPU has PCID? It would be interesting to see these 
tests on older CPUs or older kernels without PCID support.

* Re: [LSF/MM TOPIC] KPTI effect on IO performance
  2018-01-31 18:43 ` Scotty Bauer
@ 2018-02-01  2:35   ` Ming Lei
  2018-02-01  3:05   ` Ming Lei
  1 sibling, 0 replies; 5+ messages in thread
From: Ming Lei @ 2018-02-01  2:35 UTC (permalink / raw)
  To: Scotty Bauer; +Cc: lsf-pc, Linux-scsi, linux-block, linux-nvme

Hi Scotty,

On Wed, Jan 31, 2018 at 11:43:33AM -0700, Scotty Bauer wrote:
> On 2018-01-31 01:23, Ming Lei wrote:
> > Hi All,
> > 
> > After KPTI is merged, there is extra load introduced to context switch
> > between user space and kernel space. It is observed on my laptop that
> > one
> > syscall takes extra ~0.15us[1] compared with 'nopti'.
> > 
> > IO performance is affected too, it is observed that IOPS drops by 32% in
> > my test[2] on null_blk compared with 'nopti':
> > 
> > randread IOPS on latest linus tree:
> > -------------------------------------------------
> > | randread IOPS     | randread IOPS with 'nopti'|
> > ------------------------------------------------
> > | 928K              | 1372K                     |
> > ------------------------------------------------
> > 
> > 
> 
> Do you know if your CPU has PCID? It would be interesting to see these tests
> on older CPUs or older kernels without PCID support.

My CPU has PCID; this can be checked via /proc/cpuinfo.

The above tests were run on the same kernel binary; the only difference is
whether 'nopti' is present on the kernel command line.

Thanks,
Ming

* Re: [LSF/MM TOPIC] KPTI effect on IO performance
  2018-01-31 18:43 ` Scotty Bauer
  2018-02-01  2:35   ` Ming Lei
@ 2018-02-01  3:05   ` Ming Lei
  1 sibling, 0 replies; 5+ messages in thread
From: Ming Lei @ 2018-02-01  3:05 UTC (permalink / raw)
  To: Scotty Bauer; +Cc: linux-block, lsf-pc, linux-nvme, Linux-scsi

On Wed, Jan 31, 2018 at 11:43:33AM -0700, Scotty Bauer wrote:
> On 2018-01-31 01:23, Ming Lei wrote:
> > Hi All,
> > 
> > After KPTI is merged, there is extra load introduced to context switch
> > between user space and kernel space. It is observed on my laptop that
> > one
> > syscall takes extra ~0.15us[1] compared with 'nopti'.
> > 
> > IO performance is affected too, it is observed that IOPS drops by 32% in
> > my test[2] on null_blk compared with 'nopti':
> > 
> > randread IOPS on latest linus tree:
> > -------------------------------------------------
> > | randread IOPS     | randread IOPS with 'nopti'|
> > ------------------------------------------------
> > | 928K              | 1372K                     |
> > ------------------------------------------------
> > 
> > 
> 
> Do you know if your CPU has PCID? It would be interesting to see these tests
> on older CPUs or older kernels without PCID support.

BTW, I have also seen test data for a vCPU without PCID; there the syscall
time is reported to be close to 30X the 'nopti' case. Such a setup should be
easy to reproduce by adjusting the CPU model in QEMU.

So without PCID, KPTI's effect on IO performance should be much bigger than
the numbers above.

Thanks,
Ming

* Re: [LSF/MM TOPIC] KPTI effect on IO performance
  2018-01-31  8:23 [LSF/MM TOPIC] KPTI effect on IO performance Ming Lei
  2018-01-31 18:43 ` Scotty Bauer
@ 2018-02-01 21:51 ` Bart Van Assche
  1 sibling, 0 replies; 5+ messages in thread
From: Bart Van Assche @ 2018-02-01 21:51 UTC (permalink / raw)
  To: Ming Lei, lsf-pc, Linux-scsi, linux-block, linux-nvme

On 01/31/18 00:23, Ming Lei wrote:
> After KPTI is merged, there is extra load introduced to context switch
> between user space and kernel space. It is observed on my laptop that one
> syscall takes extra ~0.15us[1] compared with 'nopti'.
> 
> IO performance is affected too, it is observed that IOPS drops by 32% in
> my test[2] on null_blk compared with 'nopti':
> 
> randread IOPS on latest linus tree:
> -------------------------------------------------
> | randread IOPS     | randread IOPS with 'nopti'|	
> ------------------------------------------------
> | 928K              | 1372K                     |	
> ------------------------------------------------
> 
> 
> Two paths are affected, one is IO submission(read, write,... syscall),
> another is the IO completion path in which interrupt may be triggered
> from user space, and context switch is needed.
> 
> So is there something we can do for decreasing the effect on IO performance?
> 
> This effect may make Hannes's issue[3] worse, and maybe 'irq poll' should be
> used more widely for all high performance IO device, even some optimization
> should be considered for KPTI's effect.

For what kind of workload would you like to improve I/O performance? 
Desktop-style workloads where the only third party code is the code that 
runs in the web browser and in the e-mail client, or datacenter workloads 
where code from multiple customers runs on the same server? I'm asking 
this because the per-task KPTI work seems very useful to me for 
improving I/O performance for desktop-style workloads. I'm not sure 
however whether that work will be as useful for datacenter workloads. 
See also Willy Tarreau, [PATCH RFC 0/4] Per-task PTI activation 
(https://lkml.org/lkml/2018/1/8/568).

Bart.
