* [Qemu-devel] QEMU GSoC 2018 Project Idea (Apply polling to QEMU NVMe)
From: Huaicheng Li @ 2018-02-25 22:52 UTC
  To: qemu-devel; +Cc: Stefan Hajnoczi, Fam Zheng, Paolo Bonzini

Hi all,

The project would be about utilizing the shadow doorbell buffer feature
introduced in NVMe 1.3 to enable QEMU-side polling for the emulated NVMe
device, thus achieving performance comparable to virtio dataplane.

**Why not virtio?**
The reason is that many industrial and academic researchers use QEMU's NVMe
emulation as a performance platform for research and product prototyping,
and the NVMe interface offers a richer feature set than virtio. If we can
make QEMU NVMe performance competitive with virtio, it will benefit a lot
of communities.

**Doable?**
NVMe spec 1.3 introduces a shadow doorbell buffer, which is aimed at
virtualized NVMe controller optimizations. QEMU can certainly utilize this
feature to reduce, or even eliminate, the VM-exits triggered by doorbell
writes.
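
To make this concrete, here is a rough sketch (illustrative names only, not
the actual hw/block/nvme.c structures) of how the emulated controller could
pick up new submission queue tails from the shadow doorbell buffer in a
polling loop, without any MMIO doorbell write and hence without a VM-exit:

#include <stdbool.h>
#include <stdint.h>

typedef struct NvmeSQ {
    uint32_t head;
    uint32_t tail;               /* last tail value the device has seen */
    uint32_t size;
    volatile uint32_t *db_addr;  /* guest's shadow doorbell slot for this SQ */
    volatile uint32_t *ei_addr;  /* guest's EventIdx slot for this SQ */
} NvmeSQ;

/* Called from a polling context: returns true if the guest has queued new
 * commands since we last looked, without requiring an MMIO write. */
static bool nvme_sq_has_work(NvmeSQ *sq)
{
    if (!sq->db_addr) {
        /* Guest never sent Doorbell Buffer Config: fall back to MMIO state. */
        return sq->head != sq->tail;
    }
    uint32_t new_tail = *sq->db_addr;   /* updated by the guest driver */
    if (new_tail != sq->tail) {
        sq->tail = new_tail;
        *sq->ei_addr = new_tail;        /* publish how far we have read */
        return true;
    }
    return false;
}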

I remember there were some discussions about this back in 2015, but I don't
see that it was ever completed. For this project, I think we can go in three
steps: (1) add shadow doorbell buffer support to the QEMU NVMe emulation,
which will reduce the number of VM-exits; (2) replace the timers currently
used by QEMU NVMe with a separate polling thread, so that we can eliminate
VM-exits completely; (3) going even further, adapt the architecture to use
one polling thread per NVMe queue pair, which should yield still more
performance. (Step 3 can be left for next year if the workload is too much
for 3 months.)
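
For steps (2) and (3), the skeleton would roughly be the following (again
only a sketch; NvmeSQ/nvme_sq_has_work() are from the snippet above and
nvme_process_sq() is an assumed helper, not an existing QEMU function):

#include <pthread.h>
#include <sched.h>

/* One dedicated poller per queue pair, replacing the timer-driven
 * processing that QEMU NVMe uses today. */
static void *nvme_sq_poll_thread(void *opaque)
{
    NvmeSQ *sq = opaque;

    for (;;) {
        if (nvme_sq_has_work(sq)) {
            nvme_process_sq(sq);   /* fetch, execute and complete commands */
        } else {
            sched_yield();         /* or an adaptive back-off */
        }
    }
    return NULL;
}

static void nvme_start_sq_poller(NvmeSQ *sq)
{
    pthread_t tid;

    pthread_create(&tid, NULL, nvme_sq_poll_thread, sq);
}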

Actually, I already have an initial implementation of steps (1) and (2) and
would like to work on it further to push it upstream. More information is in
this paper (Section 3.1 and Figure 2, left):
http://ucare.cs.uchicago.edu/pdf/fast18-femu.pdf

Comments are welcome.

Thanks.

Best,
Huaicheng

* Re: [Qemu-devel] QEMU GSoC 2018 Project Idea (Apply polling to QEMU NVMe)
From: Paolo Bonzini @ 2018-02-26  8:45 UTC
  To: Huaicheng Li, qemu-devel; +Cc: Stefan Hajnoczi, Fam Zheng

On 25/02/2018 23:52, Huaicheng Li wrote:
> I remember there were some discussions back in 2015 about this, but I
> don't see it finally done. For this project, I think we can go in three
> steps: (1). add the shadow doorbell buffer support into QEMU NVMe
> emulation, this will reduce # of VM-exits. (2). replace current timers
> used by QEMU NVMe with a separate polling thread, thus we can completely
> eliminate VM-exits. (3). Even further, we can adapt the architecture to
> use one polling thread for each NVMe queue pair, thus it's possible to
> provide more performance. (step 3 can be left for next year if the
> workload is too much for 3 months).

Slightly rephrased:

(1) add shadow doorbell buffer and ioeventfd support into QEMU NVMe
emulation, which will reduce the number of VM-exits and make them less
expensive (reducing vCPU latency).

(2) add iothread support to QEMU NVMe emulation.  This can also be used
to eliminate VM-exits because iothreads can do adaptive polling.

(1) and (2) seem okay for at most 1.5 months, especially if you already
have experience with QEMU.
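
For reference, the ioeventfd part of (1) would look roughly like this (a
sketch only, assuming QEMU-internal headers; the sq->notifier field and the
helper name are made up, this is not what hw/block/nvme.c has today):

/* Register an eventfd on a queue's MMIO doorbell register, so that KVM
 * completes the write in the kernel and merely signals an event, instead
 * of taking a heavyweight exit to userspace. */
static void nvme_init_sq_ioeventfd(NvmeCtrl *n, NvmeSQ *sq, unsigned qid)
{
    /* SQ tail doorbell offset, assuming a doorbell stride (CAP.DSTRD) of 0 */
    hwaddr db_offset = 0x1000 + 2 * qid * 4;

    event_notifier_init(&sq->notifier, 0);
    memory_region_add_eventfd(&n->iomem, db_offset, 4, false, 0,
                              &sq->notifier);
    /* The notifier is then serviced from the device's AioContext --
     * the main loop today, or a dedicated iothread once (2) is done. */
}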

For (3), there is work in progress to add multiqueue support to QEMU's
block device layer.  We're hoping to get the infrastructure part in
(removing the AioContext lock) during the first half of 2018.  As you
say, we can see what the workload will be.

Including a RAM disk backend in QEMU would be nice too, and it may
interest you as it would reduce the delta between upstream QEMU and
FEMU.  So this could be another idea.

However, the main issue that I'd love to see tackled is interrupt
mitigation.  With higher rates of I/O ops and high queue depth (e.g.
32), it's common for the guest to become slower when you introduce
optimizations in QEMU.  The reason is that lower latency causes higher
interrupt rates and that in turn slows down the guest.  If you have any
ideas on how to work around this, I would love to hear about it.

In any case, I would very much like to mentor this project.  Let me know
if you have any more ideas on how to extend it!

Paolo

* Re: [Qemu-devel] QEMU GSoC 2018 Project Idea (Apply polling to QEMU NVMe)
From: Huaicheng Li @ 2018-02-27  9:05 UTC
  To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi, Fam Zheng

Hi Paolo,

> Slightly rephrased:
> (1) add shadow doorbell buffer and ioeventfd support into QEMU NVMe
> emulation, which will reduce the number of VM-exits and make them less
> expensive (reducing vCPU latency).
> (2) add iothread support to QEMU NVMe emulation.  This can also be used
> to eliminate VM-exits because iothreads can do adaptive polling.
> (1) and (2) seem okay for at most 1.5 months, especially if you already
> have experience with QEMU.


Thanks a lot for rephrasing it to make it clearer.

Yes, I think (1) and (2) should be achievable in 1-1.5 months. What needs to
be added on top of FEMU includes ioeventfd support for QEMU NVMe and using
an iothread for polling (the current FEMU implementation uses a periodic
timer to poll the shadow buffer directly; moving to an iothread would
deliver better performance).
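
Just to illustrate what the iothread buys us over the timer (this mirrors
what QEMU's AioContext adaptive polling already does for other devices; the
NvmeSQ helpers are the made-up ones from my first mail, not real QEMU code):

/* Busy-poll the shadow doorbells for a short, self-tuning window before
 * falling back to blocking in the event loop.  A periodic timer either
 * fires too often (wasting CPU) or too rarely (adding latency); the
 * adaptive window avoids both. */
static void nvme_iothread_poll_once(NvmeSQ *sq, int64_t poll_max_ns)
{
    int64_t start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);

    while (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start < poll_max_ns) {
        if (nvme_sq_has_work(sq)) {
            nvme_process_sq(sq);
            return;   /* found work while polling: no wakeup latency paid */
        }
    }
    /* Nothing arrived within the window: go back to sleeping on the
     * ioeventfd, and let the polling window shrink adaptively. */
}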

> Including a RAM disk backend in QEMU would be nice too, and it may
> interest you as it would reduce the delta between upstream QEMU and
> FEMU.  So this could be another idea.


Glad you're also interested in this part. This can definitely be part of the
project.

> For (3), there is work in progress to add multiqueue support to QEMU's
> block device layer.  We're hoping to get the infrastructure part in
> (removing the AioContext lock) during the first half of 2018.  As you
> say, we can see what the workload will be.


Thanks for letting me know this. Could you provide a link to the ongoing
multiqueue implementation? I would like to learn how this is done. :)

> However, the main issue that I'd love to see tackled is interrupt
> mitigation.  With higher rates of I/O ops and high queue depth (e.g.
> 32), it's common for the guest to become slower when you introduce
> optimizations in QEMU.  The reason is that lower latency causes higher
> interrupt rates and that in turn slows down the guest.  If you have any
> ideas on how to work around this, I would love to hear about it.


Yeah, indeed interrupt overhead (host-to-guest notification) is a headache.
I have thought about this, and one intuitive optimization is to add
interrupt coalescing support to QEMU NVMe. We could use some heuristic to
batch I/O completions back to the guest, thus reducing the number of
interrupts. The heuristic can be time-window based (i.e., for I/Os completed
within the same time window, we deliver only one interrupt per CQ).
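
A very rough sketch of that heuristic (names are illustrative; QEMU NVMe
does not have this today, and a real version should also honour the spec's
Interrupt Coalescing feature, Set Features FID 08h):

#define COALESCE_WINDOW_NS  (50 * 1000)   /* 50us batching window, tunable */

typedef struct NvmeCQ {
    bool irq_pending;
    QEMUTimer *coalesce_timer;
    /* ... head, tail, phase, interrupt vector, ... */
} NvmeCQ;

/* Timer callback: one interrupt covers every completion posted to this CQ
 * since the window opened. */
static void nvme_cq_coalesce_fire(void *opaque)
{
    NvmeCQ *cq = opaque;

    if (cq->irq_pending) {
        cq->irq_pending = false;
        nvme_irq_assert(cq);   /* assumed helper: raise MSI-X/pin interrupt */
    }
}

/* Called once per completed command instead of raising the IRQ directly. */
static void nvme_cq_post_completion(NvmeCQ *cq)
{
    if (!cq->irq_pending) {
        cq->irq_pending = true;
        timer_mod(cq->coalesce_timer,
                  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + COALESCE_WINDOW_NS);
    }
    /* Later completions in the same window just ride the pending IRQ. */
}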

I believe there are several research papers that achieve direct interrupt
delivery without exits for paravirtual devices, but those need KVM-side
modifications, so they might not be a good fit here.


> In any case, I would very much like to mentor this project.  Let me know
> if you have any more ideas on how to extend it!


Great to know that you'd like to mentor the project! If so, can we make it
an official project idea and put it on the QEMU GSoC page?

Thank you so much for the feedback and for agreeing to be a potential mentor
for this project. I'm happy to see that you also think this is something
worth putting effort into.

Best,
Huaicheng

* Re: [Qemu-devel] QEMU GSoC 2018 Project Idea (Apply polling to QEMU NVMe)
From: Paolo Bonzini @ 2018-02-27 11:04 UTC
  To: Huaicheng Li; +Cc: qemu-devel, Stefan Hajnoczi, Fam Zheng

On 27/02/2018 10:05, Huaicheng Li wrote:
>     Including a RAM disk backend in QEMU would be nice too, and it may
>     interest you as it would reduce the delta between upstream QEMU and
>     FEMU.  So this could be another idea.
> 
> Glad you're also interested in this part. This can definitely be part of the
> project.
> 
>     For (3), there is work in progress to add multiqueue support to QEMU's
>     block device layer.  We're hoping to get the infrastructure part in
>     (removing the AioContext lock) during the first half of 2018.  As you
>     say, we can see what the workload will be.
> 
> Thanks for letting me know this. Could you provide a link to the on-going
> multiqueue implementation? I would like to learn how this is done. :)

Well, there is no multiqueue implementation yet, but for now you can see
a lot of work in block/ regarding making drivers and BlockDriverState
thread safe.  We can't just do it for null-co:// so we have a little
preparatory work to do. :)

>     However, the main issue that I'd love to see tackled is interrupt
>     mitigation.  With higher rates of I/O ops and high queue depth (e.g.
>     32), it's common for the guest to become slower when you introduce
>     optimizations in QEMU.  The reason is that lower latency causes higher
>     interrupt rates and that in turn slows down the guest.  If you have any
>     ideas on how to work around this, I would love to hear about it.
> 
> Yeah, indeed interrupt overhead (host-to-guest notification) is a headache.
> I thought about this, and one intuitive optimization in my mind is to add
> interrupt coalescing support into QEMU NVMe. We may use some heuristic to batch
> I/O completions back to guest, thus reducing # of interrupts. The heuristic
> can be time-window based (i.e., for I/Os completed in the same time window,
> we only do one interrupt for each CQ).
> 
> I believe there are several research papers that can achieve direct interrupt
> delivery without exits for para-virtual devices, but those need KVM side
> modifications. It might be not a good fit here.  

No, indeed.  But the RAM disk backend and interrupt coalescing (for
either NVMe or virtio-blk... or maybe a generic scheme that can be
reused by virtio-net and others too!) are a good idea for the third part
of the project.

>     In any case, I would very much like to mentor this project.  Let me know
>     if you have any more ideas on how to extend it!
> 
> 
> Great to know that you'd like to mentor the project! If so, can we make it
> an official project idea and put it on QEMU GSoC page?

Submissions need not come from the QEMU GSoC page.  You are free to
submit any idea that you think can be worthwhile.

Paolo

* Re: [Qemu-devel] QEMU GSoC 2018 Project Idea (Apply polling to QEMU NVMe)
From: Huaicheng Li @ 2018-02-27 14:36 UTC
  To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi, Fam Zheng

Sounds great. Thanks!

* Re: [Qemu-devel] QEMU GSoC 2018 Project Idea (Apply polling to QEMU NVMe)
From: Stefan Hajnoczi @ 2018-02-27 16:35 UTC
  To: Paolo Bonzini, Huaicheng Li; +Cc: qemu-devel, Fam Zheng

On Tue, Feb 27, 2018 at 12:04:48PM +0100, Paolo Bonzini wrote:
> On 27/02/2018 10:05, Huaicheng Li wrote:
> > Great to know that you'd like to mentor the project! If so, can we make it
> > an official project idea and put it on QEMU GSoC page?
> 
> Submissions need not come from the QEMU GSoC page.  You are free to
> submit any idea that you think can be worthwhile.

Please follow the process described here:

https://wiki.qemu.org/Google_Summer_of_Code_2018#How_to_propose_a_custom_project_idea

The project idea needs to be posted on the wiki page.

Huaicheng & Paolo, please fill out this template so we can add it to the
wiki page:

=== TITLE ===

 '''Summary:''' Short description of the project

 Detailed description of the project.

 '''Links:'''
 * Wiki links to relevant material
 * External links to mailing lists or web sites

 '''Details:'''
 * Skill level: beginner or intermediate or advanced
 * Language: C
 * Mentor: Email address and IRC nick
 * Suggested by: Person who suggested the idea

* Re: [Qemu-devel] QEMU GSoC 2018 Project Idea (Apply polling to QEMU NVMe)
From: Huaicheng Li @ 2018-03-01 18:12 UTC
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, qemu-devel

Hi Stefan,

Paolo and I have filled out the template, and Paolo has helped update the
wiki with this proposed project idea.

Thanks.

Best,
Huaicheng

