* (unknown),
@ 2009-07-23 5:38 Haneef Syed
2009-07-23 5:50 ` your mail Gleb Natapov
0 siblings, 1 reply; 14+ messages in thread
From: Haneef Syed @ 2009-07-23 5:38 UTC (permalink / raw)
To: kvm
Hi all,
I am using kvm-22 with the linux-2.6.24 kernel, but whenever I install a guest
through the qemu binaries, the system hangs.
In dmesg it prints "Unable to handle NULL pointer dereference".
Please suggest why it is behaving like this.
* Re: your mail
2009-07-23 5:38 (unknown), Haneef Syed
@ 2009-07-23 5:50 ` Gleb Natapov
2009-07-23 6:09 ` Haneef Syed
0 siblings, 1 reply; 14+ messages in thread
From: Gleb Natapov @ 2009-07-23 5:50 UTC (permalink / raw)
To: Haneef Syed; +Cc: kvm
On Thu, Jul 23, 2009 at 11:08:41AM +0530, Haneef Syed wrote:
> Hi all,
>
> I am using kvm-22 with the linux-2.6.24 kernel, but whenever I install a guest
> through the qemu binaries, the system hangs.
>
> In dmesg it prints "Unable to handle NULL pointer dereference".
>
> Please suggest why it is behaving like this.
>
I would suggest you use a more recent KVM. The latest one is kvm-88.
There were a couple of bug fixes and enhancements in those 66 versions.
--
Gleb.
* Re: your mail
2009-07-23 5:50 ` your mail Gleb Natapov
@ 2009-07-23 6:09 ` Haneef Syed
2009-07-23 6:14 ` Gleb Natapov
0 siblings, 1 reply; 14+ messages in thread
From: Haneef Syed @ 2009-07-23 6:09 UTC (permalink / raw)
To: Gleb Natapov; +Cc: kvm, kvm-owner
Is kvm-88 compatible with the linux-2.6.24 kernel?
* Re: your mail
2009-07-23 6:09 ` Haneef Syed
@ 2009-07-23 6:14 ` Gleb Natapov
0 siblings, 0 replies; 14+ messages in thread
From: Gleb Natapov @ 2009-07-23 6:14 UTC (permalink / raw)
To: Haneef Syed; +Cc: kvm
On Thu, Jul 23, 2009 at 11:39:55AM +0530, Haneef Syed wrote:
> Is kvm-88 compatible with the linux-2.6.24 kernel?
>
It may be. Some degree of compatibility with older kernels is
maintained. Try it.
http://www.linux-kvm.org/page/Code
--
Gleb.
* Re: your mail
2019-03-19 14:41 (unknown) Maxim Levitsky
@ 2019-03-19 15:22 ` Keith Busch
2019-03-19 23:49 ` Chaitanya Kulkarni
` (2 more replies)
2019-03-21 16:13 ` Stefan Hajnoczi
1 sibling, 3 replies; 14+ messages in thread
From: Keith Busch @ 2019-03-19 15:22 UTC (permalink / raw)
To: Maxim Levitsky
Cc: linux-nvme, linux-kernel, kvm, Jens Axboe, Alex Williamson,
Keith Busch, Christoph Hellwig, Sagi Grimberg, Kirti Wankhede,
David S . Miller, Mauro Carvalho Chehab, Greg Kroah-Hartman,
Wolfram Sang, Nicolas Ferre, Paul E . McKenney , Paolo Bonzini,
Liang Cunming, Liu Changpeng, Fam Zheng, Amnon Ilan,
John Ferlan <jfe
On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
> -> Share the NVMe device between host and guest.
> Even in fully virtualized configurations,
> some partitions of nvme device could be used by guests as block devices
> while others passed through with nvme-mdev to achieve balance between
> all features of full IO stack emulation and performance.
>
> -> NVME-MDEV is a bit faster due to the fact that in-kernel driver
> can send interrupts to the guest directly without a context
> switch that can be expensive due to meltdown mitigation.
>
> -> Is able to utilize interrupts to get reasonable performance.
> This is only implemented
> as a proof of concept and not included in the patches,
> but interrupt driven mode shows reasonable performance
>
> -> This is a framework that later can be used to support NVMe devices
> with more of the IO virtualization built-in
> (IOMMU with PASID support coupled with device that supports it)
Would be very interested to see the PASID support. You wouldn't even
need to mediate the IO doorbells or translations if assigning entire
namespaces, and should be much faster than the shadow doorbells.
I think you should send 6/9 "nvme/pci: init shadow doorbell after each
reset" separately for immediate inclusion.
I like the idea in principle, but it will take me a little time to get
through reviewing your implementation. I would have guessed we could
have leveraged something from the existing nvme/target for the mediating
controller register access and admin commands. Maybe even start with
implementing an nvme passthrough namespace target type (we currently
have block and file).
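For context on the shadow doorbells mentioned above: NVMe's Doorbell Buffer
Config feature lets a paravirtualized driver publish doorbell values in shared
memory and only perform a real (trapping) MMIO write when the device's EventIdx
hint asks for one. Below is a minimal, self-contained userspace model of that
check; the struct and function names are illustrative, not the kernel's.

/*
 * Minimal userspace model of the NVMe shadow-doorbell check
 * (Doorbell Buffer Config). Names are illustrative; the update rule
 * mirrors the EventIdx-style test used by the feature.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct shadow_db {
	volatile uint32_t db;        /* shadow doorbell written by the driver */
	volatile uint32_t event_idx; /* hint written back by the emulated device */
};

/* True if the device must still be notified via a real MMIO doorbell write. */
static bool need_event(uint32_t event_idx, uint32_t new_idx, uint32_t old_idx)
{
	/* Did the range (old_idx, new_idx] cross event_idx? */
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/* Returns true when the caller has to ring the real doorbell. */
static bool shadow_db_update(struct shadow_db *s, uint32_t new_idx)
{
	uint32_t old_idx = s->db;

	s->db = new_idx;      /* publish the new tail/head in shared memory */
	__sync_synchronize(); /* order the write against the event_idx read */
	return need_event(s->event_idx, new_idx, old_idx);
}

int main(void)
{
	struct shadow_db sq = { .db = 0, .event_idx = 0 };

	/* The emulated controller polls, so it sets event_idx ahead;
	   most doorbell updates then skip the expensive MMIO exit. */
	sq.event_idx = 4;
	for (uint32_t tail = 1; tail <= 6; tail++)
		printf("tail=%u ring=%d\n", (unsigned)tail,
		       (int)shadow_db_update(&sq, tail));
	return 0;
}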
* Re: your mail
2019-03-19 15:22 ` your mail Keith Busch
@ 2019-03-19 23:49 ` Chaitanya Kulkarni
2019-03-20 16:44 ` Maxim Levitsky
2019-03-20 16:30 ` Maxim Levitsky
2019-04-08 10:04 ` Maxim Levitsky
2 siblings, 1 reply; 14+ messages in thread
From: Chaitanya Kulkarni @ 2019-03-19 23:49 UTC (permalink / raw)
To: Keith Busch, Maxim Levitsky
Cc: Fam Zheng, Keith Busch, Sagi Grimberg, kvm@vger.kernel.org,
Wolfram Sang, Greg Kroah-Hartman, Liang Cunming, Nicolas Ferre,
linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
David S . Miller, Jens Axboe, Alex Williamson, Kirti Wankhede,
Mauro Carvalho Chehab, Paolo Bonzini, Liu Changpeng,
Paul E . McKenney
Hi Keith,
On 03/19/2019 08:21 AM, Keith Busch wrote:
> On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
>> -> Share the NVMe device between host and guest.
>> Even in fully virtualized configurations,
>> some partitions of nvme device could be used by guests as block devices
>> while others passed through with nvme-mdev to achieve balance between
>> all features of full IO stack emulation and performance.
>>
>> -> NVME-MDEV is a bit faster due to the fact that in-kernel driver
>> can send interrupts to the guest directly without a context
>> switch that can be expensive due to meltdown mitigation.
>>
>> -> Is able to utilize interrupts to get reasonable performance.
>> This is only implemented
>> as a proof of concept and not included in the patches,
>> but interrupt driven mode shows reasonable performance
>>
>> -> This is a framework that later can be used to support NVMe devices
>> with more of the IO virtualization built-in
>> (IOMMU with PASID support coupled with device that supports it)
>
> Would be very interested to see the PASID support. You wouldn't even
> need to mediate the IO doorbells or translations if assigning entire
> namespaces, and should be much faster than the shadow doorbells.
>
> I think you should send 6/9 "nvme/pci: init shadow doorbell after each
> reset" separately for immediate inclusion.
>
> I like the idea in principle, but it will take me a little time to get
> through reviewing your implementation. I would have guessed we could
> have leveraged something from the existing nvme/target for the mediating
> controller register access and admin commands. Maybe even start with
> implementing an nvme passthrough namespace target type (we currently
> have block and file).
I have the code for the NVMeOF target passthru-ctrl; I think we can use it
as is if you are looking for passthru for NVMeOF.
I'll post a patch series based on the latest code base soon.
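For illustration only, a passthru namespace backend would slot in alongside the
existing block and file backends roughly as sketched below. This is a
hypothetical, compilable model, not the actual nvme/target code; every type and
function name in it is made up.

/*
 * Hypothetical shape of an nvmet namespace backend dispatch with a
 * passthru case added. NOT the real nvme/target code; stand-in names.
 */
#include <stdio.h>

enum ns_backend { NS_BACKEND_BLOCK, NS_BACKEND_FILE, NS_BACKEND_PASSTHRU };

struct nvmet_ns_model  { enum ns_backend backend; };
struct nvmet_req_model { struct nvmet_ns_model *ns; };

static int parse_io_cmd_block(struct nvmet_req_model *req)
{
	(void)req; puts("bdev path"); return 0;
}
static int parse_io_cmd_file(struct nvmet_req_model *req)
{
	(void)req; puts("file path"); return 0;
}
/* New: forward the command to an underlying NVMe controller as-is. */
static int parse_io_cmd_passthru(struct nvmet_req_model *req)
{
	(void)req; puts("passthru path"); return 0;
}

static int parse_io_cmd(struct nvmet_req_model *req)
{
	switch (req->ns->backend) {
	case NS_BACKEND_BLOCK:    return parse_io_cmd_block(req);
	case NS_BACKEND_FILE:     return parse_io_cmd_file(req);
	case NS_BACKEND_PASSTHRU: return parse_io_cmd_passthru(req);
	}
	return -1;
}

int main(void)
{
	struct nvmet_ns_model ns = { .backend = NS_BACKEND_PASSTHRU };
	struct nvmet_req_model req = { .ns = &ns };
	return parse_io_cmd(&req);
}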
* Re: your mail
2019-03-19 15:22 ` your mail Keith Busch
2019-03-19 23:49 ` Chaitanya Kulkarni
@ 2019-03-20 16:30 ` Maxim Levitsky
2019-03-20 17:03 ` Keith Busch
2019-04-08 10:04 ` Maxim Levitsky
2 siblings, 1 reply; 14+ messages in thread
From: Maxim Levitsky @ 2019-03-20 16:30 UTC (permalink / raw)
To: Keith Busch
Cc: Fam Zheng, Keith Busch, Sagi Grimberg, kvm, Wolfram Sang,
Greg Kroah-Hartman, Liang Cunming, Nicolas Ferre, linux-kernel,
linux-nvme, David S . Miller, Jens Axboe, Alex Williamson,
Kirti Wankhede, Mauro Carvalho Chehab, Paolo Bonzini,
Liu Changpeng, Paul E . McKenney, Amnon Ilan, Christoph Hellwig,
John Ferlan
On Tue, 2019-03-19 at 09:22 -0600, Keith Busch wrote:
> On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
> > -> Share the NVMe device between host and guest.
> > Even in fully virtualized configurations,
> > some partitions of nvme device could be used by guests as block
> > devices
> > while others passed through with nvme-mdev to achieve balance between
> > all features of full IO stack emulation and performance.
> >
> > -> NVME-MDEV is a bit faster due to the fact that in-kernel driver
> > can send interrupts to the guest directly without a context
> > switch that can be expensive due to meltdown mitigation.
> >
> > -> Is able to utilize interrupts to get reasonable performance.
> > This is only implemented
> > as a proof of concept and not included in the patches,
> > but interrupt driven mode shows reasonable performance
> >
> > -> This is a framework that later can be used to support NVMe devices
> > with more of the IO virtualization built-in
> > (IOMMU with PASID support coupled with device that supports it)
>
> Would be very interested to see the PASID support. You wouldn't even
> need to mediate the IO doorbells or translations if assigning entire
> namespaces, and should be much faster than the shadow doorbells.
I fully agree with that.
Note that to enable PASID support, two things have to happen in this area:

1. Mature IOMMU support with PASID. On the Intel side, as far as I know, only
the spec has been released, and the kernel bits to support it are currently
being put in place.
I still don't know when a product actually supporting this spec is going to be
released. For other vendors (ARM/AMD/) I haven't yet researched the state of
PASID-based IOMMU support on their platforms.

2. The NVMe spec has to be extended to support PASID. At minimum, we need the
ability to assign a PASID to an sq/cq queue pair and the ability to relocate
the doorbells, so that each guest gets its own (hardware-backed) MMIO page with
its own doorbells. Plus, of course, the hardware vendors have to embrace the
spec. I guess these two things will happen in a collaborative manner.
>
> I think you should send 6/9 "nvme/pci: init shadow doorbell after each
> reset" separately for immediate inclusion.
I'll do this soon.
Also, '5/9 nvme/pci: add known admin effects to augment admin effects log page'
can be considered for immediate inclusion as well, as it works around a flaw in
NVMe controllers with a badly done admin side-effects log page, with no side
effects (pun intended) for spec-compliant controllers (I think).
This could be handled with a quirk instead if you prefer, though.
>
> I like the idea in principle, but it will take me a little time to get
> through reviewing your implementation. I would have guessed we could
> have leveraged something from the existing nvme/target for the mediating
> controller register access and admin commands. Maybe even start with
> implementing an nvme passthrough namespace target type (we currently
> have block and file).
I fully agree that I could have used some of the nvme/target code, and I am
planning to do so eventually.
For that I would need to make my driver one of the target drivers, and I would
need to add another target back end, as you said, to allow my target driver to
talk directly to the NVMe hardware, bypassing the block layer.
Or instead I could use the block back end (but note that the block back end
currently doesn't support polling, which is critical for performance).
Switching to the target code might have some (probably minor) performance
impact, though, as it would probably lengthen the critical code path a bit (I
might, for instance, need to translate the PRP lists I get from the virtual
controller to a scatter-gather list and back).
This is why I did it the way I did, but knowing now that I can probably afford
to lose a bit of performance, I can look into doing that.
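To make the PRP-to-scatter-gather translation mentioned above concrete, a
minimal sketch could look like the following. It uses plain C with made-up
types (not kernel code); it assumes the first PRP entry may carry a page
offset, later entries are page aligned, and physically contiguous entries are
coalesced into one segment.

/*
 * Minimal sketch: flatten an NVMe-style PRP list into coalesced
 * scatter-gather segments. Types and names are hypothetical; real
 * code would use the kernel's struct scatterlist and DMA API.
 */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096ULL

struct sg_seg {
	uint64_t addr;
	uint64_t len;
};

/*
 * prps[0] may point into the middle of a page; every other entry is
 * page-aligned. Returns the number of segments written to sg[].
 */
static size_t prp_to_sg(const uint64_t *prps, size_t nprps,
			uint64_t total_len, struct sg_seg *sg)
{
	size_t nsegs = 0;

	for (size_t i = 0; i < nprps && total_len; i++) {
		uint64_t offset = (i == 0) ? prps[i] & (PAGE_SIZE - 1) : 0;
		uint64_t chunk = PAGE_SIZE - offset;

		if (chunk > total_len)
			chunk = total_len;

		/* Coalesce with the previous segment if physically contiguous. */
		if (nsegs && sg[nsegs - 1].addr + sg[nsegs - 1].len == prps[i])
			sg[nsegs - 1].len += chunk;
		else
			sg[nsegs++] = (struct sg_seg){ .addr = prps[i], .len = chunk };

		total_len -= chunk;
	}
	return nsegs;
}

int main(void)
{
	/* Two physically contiguous pages followed by a discontiguous one. */
	uint64_t prps[] = { 0x10000200, 0x10001000, 0x30000000 };
	struct sg_seg sg[3];
	size_t n = prp_to_sg(prps, 3, 8192, sg);

	for (size_t i = 0; i < n; i++)
		printf("seg %zu: addr=0x%llx len=%llu\n", i,
		       (unsigned long long)sg[i].addr,
		       (unsigned long long)sg[i].len);
	return 0;
}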
Best regards,
Thanks in advance for the review,
Maxim Levitsky
PS:
For reference, the IO path currently looks more or less like this:

My IO thread notices a doorbell write, reads a command from a submission queue,
translates it (without even looking at the data pointer) and sends it to the
nvme pci driver together with a pointer to a data iterator.

The nvme pci driver calls the data iterator N times, which makes the iterator
translate and fetch the DMA addresses at which the data is already mapped on
its pci nvme device (the mdev driver maps all the guest memory to the nvme pci
device).

The nvme pci driver uses the addresses it receives to create a PRP list, which
it puts into the data pointer.
The nvme pci driver also allocates a free command ID from a list, puts it into
the command ID field, and sends the command to the real hardware.

Later the IO thread calls into the nvme pci driver to poll the queue. When
completions arrive, the nvme pci driver returns them to the IO thread.
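A compact, purely illustrative userspace model of that flow follows. Every
structure and name below is invented; PRP translation and the real hardware are
reduced to stubs.

/*
 * Illustrative model of the described mdev IO path: poll the guest's
 * doorbell, read the SQ entry, tag it with a free host command ID,
 * hand it to the "real" controller, then (conceptually) poll for
 * completions. No real NVMe structures are used.
 */
#include <stdint.h>
#include <stdio.h>

#define QUEUE_DEPTH 16

struct guest_cmd { uint16_t cid; uint64_t slba; uint32_t nlb; };

struct vsq {
	uint32_t tail_doorbell;              /* written by the guest */
	uint32_t head;                       /* consumed by the IO thread */
	struct guest_cmd entries[QUEUE_DEPTH];
};

static uint16_t free_cids[QUEUE_DEPTH];      /* pool of host command IDs */
static int free_top;

static void cid_pool_init(void)
{
	for (int i = 0; i < QUEUE_DEPTH; i++)
		free_cids[free_top++] = (uint16_t)i;
}

/* Stand-in for handing the translated command to the real controller. */
static void hw_submit(uint16_t host_cid, const struct guest_cmd *cmd)
{
	printf("hw: host cid=%u slba=%llu nlb=%u (guest cid %u)\n",
	       (unsigned)host_cid, (unsigned long long)cmd->slba,
	       (unsigned)cmd->nlb, (unsigned)cmd->cid);
}

/* One pass of the polling IO thread over a single virtual SQ. */
static void io_thread_poll(struct vsq *sq)
{
	while (sq->head != sq->tail_doorbell) {
		struct guest_cmd *cmd = &sq->entries[sq->head % QUEUE_DEPTH];
		uint16_t host_cid = free_cids[--free_top]; /* allocate host CID */

		/* Translation of the data pointer (PRPs) is elided here. */
		hw_submit(host_cid, cmd);
		sq->head++;
		/* On completion, host_cid would be released and a CQ entry
		   carrying the guest's original cid written to the virtual CQ. */
		free_cids[free_top++] = host_cid;
	}
}

int main(void)
{
	struct vsq sq = { 0 };

	cid_pool_init();
	sq.entries[0] = (struct guest_cmd){ .cid = 7, .slba = 0,  .nlb = 8 };
	sq.entries[1] = (struct guest_cmd){ .cid = 8, .slba = 64, .nlb = 8 };
	sq.tail_doorbell = 2;                /* guest "rings" the doorbell */

	io_thread_poll(&sq);
	return 0;
}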
* Re: your mail
2019-03-19 23:49 ` Chaitanya Kulkarni
@ 2019-03-20 16:44 ` Maxim Levitsky
0 siblings, 0 replies; 14+ messages in thread
From: Maxim Levitsky @ 2019-03-20 16:44 UTC (permalink / raw)
To: Chaitanya Kulkarni, Keith Busch
Cc: Fam Zheng, Jens Axboe, Sagi Grimberg, kvm@vger.kernel.org,
Wolfram Sang, Greg Kroah-Hartman, Liang Cunming, Nicolas Ferre,
linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
Keith Busch, Alex Williamson, Christoph Hellwig, Kirti Wankhede,
Mauro Carvalho Chehab, Paolo Bonzini, Liu Changpeng,
Paul E . McKenney
On Tue, 2019-03-19 at 23:49 +0000, Chaitanya Kulkarni wrote:
> Hi Keith,
> On 03/19/2019 08:21 AM, Keith Busch wrote:
> > On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
> > > -> Share the NVMe device between host and guest.
> > > Even in fully virtualized configurations,
> > > some partitions of nvme device could be used by guests as block
> > > devices
> > > while others passed through with nvme-mdev to achieve balance
> > > between
> > > all features of full IO stack emulation and performance.
> > >
> > > -> NVME-MDEV is a bit faster due to the fact that in-kernel driver
> > > can send interrupts to the guest directly without a context
> > > switch that can be expensive due to meltdown mitigation.
> > >
> > > -> Is able to utilize interrupts to get reasonable performance.
> > > This is only implemented
> > > as a proof of concept and not included in the patches,
> > > but interrupt driven mode shows reasonable performance
> > >
> > > -> This is a framework that later can be used to support NVMe devices
> > > with more of the IO virtualization built-in
> > > (IOMMU with PASID support coupled with device that supports it)
> >
> > Would be very interested to see the PASID support. You wouldn't even
> > need to mediate the IO doorbells or translations if assigning entire
> > namespaces, and should be much faster than the shadow doorbells.
> >
> > I think you should send 6/9 "nvme/pci: init shadow doorbell after each
> > reset" separately for immediate inclusion.
> >
> > I like the idea in principle, but it will take me a little time to get
> > through reviewing your implementation. I would have guessed we could
> > have leveraged something from the existing nvme/target for the mediating
> > controller register access and admin commands. Maybe even start with
> > implementing an nvme passthrough namespace target type (we currently
> > have block and file).
>
> I have the code for the NVMeOF target passthru-ctrl; I think we can use it
> as is if you are looking for passthru for NVMeOF.
>
> I'll post a patch series based on the latest code base soon.
I am very interested in this code.
Could you explain how your NVMeOF target passthrough works?
Which components of the NVMe stack does it involve?
Best regards,
Maxim Levitsky
* Re: your mail
2019-03-20 16:30 ` Maxim Levitsky
@ 2019-03-20 17:03 ` Keith Busch
2019-03-20 17:33 ` Maxim Levitsky
0 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2019-03-20 17:03 UTC (permalink / raw)
To: Maxim Levitsky
Cc: Fam Zheng, Keith Busch, Sagi Grimberg, kvm, Wolfram Sang,
Greg Kroah-Hartman, Liang Cunming, Nicolas Ferre, linux-kernel,
linux-nvme, David S . Miller, Jens Axboe, Alex Williamson,
Kirti Wankhede, Mauro Carvalho Chehab, Paolo Bonzini,
Liu Changpeng, Paul E . McKenney, Amnon Ilan, Christoph Hellwig,
John Ferlan
On Wed, Mar 20, 2019 at 06:30:29PM +0200, Maxim Levitsky wrote:
> Or instead I could use the block back end (but note that the block back end
> currently doesn't support polling, which is critical for performance).
Oh, I think you can do polling through there. For reference, fs/io_uring.c
has a pretty good implementation that aligns with how you could use it.
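For what it's worth, the polled-completion model referred to here can also be
exercised from userspace through io_uring's IOPOLL mode. The sketch below
assumes liburing is installed and uses a placeholder device path; it only
illustrates the polling model, not the in-kernel interface an nvmet back end
would use. Build with -luring.

#define _GNU_SOURCE
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	struct iovec iov;
	int fd, ret;

	/* Placeholder path; IOPOLL requires O_DIRECT and a polled queue. */
	fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
	if (fd < 0) { perror("open"); return 1; }

	if (posix_memalign(&iov.iov_base, 4096, 4096)) return 1;
	iov.iov_len = 4096;

	ret = io_uring_queue_init(8, &ring, IORING_SETUP_IOPOLL);
	if (ret < 0) { fprintf(stderr, "queue_init: %d\n", ret); return 1; }

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_readv(sqe, fd, &iov, 1, 0);   /* 4K read at offset 0 */
	io_uring_submit(&ring);

	/* With IOPOLL the kernel reaps this completion by polling the
	   device's queue instead of waiting for an interrupt. */
	ret = io_uring_wait_cqe(&ring, &cqe);
	if (ret == 0) {
		printf("read completed: res=%d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	close(fd);
	return 0;
}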
* Re: your mail
2019-03-20 17:03 ` Keith Busch
@ 2019-03-20 17:33 ` Maxim Levitsky
0 siblings, 0 replies; 14+ messages in thread
From: Maxim Levitsky @ 2019-03-20 17:33 UTC (permalink / raw)
To: Keith Busch
Cc: Fam Zheng, Keith Busch, Sagi Grimberg, kvm, Wolfram Sang,
Greg Kroah-Hartman, Liang Cunming, Nicolas Ferre, linux-kernel,
linux-nvme, David S . Miller, Jens Axboe, Alex Williamson,
Kirti Wankhede, Mauro Carvalho Chehab, Paolo Bonzini,
Liu Changpeng, Paul E . McKenney, Amnon Ilan, Christoph Hellwig,
John Ferlan
On Wed, 2019-03-20 at 11:03 -0600, Keith Busch wrote:
> On Wed, Mar 20, 2019 at 06:30:29PM +0200, Maxim Levitsky wrote:
> > Or instead I could use the block back end (but note that the block back end
> > currently doesn't support polling, which is critical for performance).
>
> Oh, I think you can do polling through there. For reference, fs/io_uring.c
> has a pretty good implementation that aligns with how you could use it.
That is exactly my thought. Polling recently got a lot of improvements in the
block layer, which might make this feasible.
I will give it a try.
Best regards,
Maxim Levitsky
* Re: your mail
2019-03-19 14:41 (unknown) Maxim Levitsky
2019-03-19 15:22 ` your mail Keith Busch
@ 2019-03-21 16:13 ` Stefan Hajnoczi
2019-03-21 17:07 ` Maxim Levitsky
1 sibling, 1 reply; 14+ messages in thread
From: Stefan Hajnoczi @ 2019-03-21 16:13 UTC (permalink / raw)
To: Maxim Levitsky
Cc: linux-nvme, linux-kernel, kvm, Jens Axboe, Alex Williamson,
Keith Busch, Christoph Hellwig, Sagi Grimberg, Kirti Wankhede,
David S . Miller, Mauro Carvalho Chehab, Greg Kroah-Hartman,
Wolfram Sang, Nicolas Ferre, Paul E . McKenney , Paolo Bonzini,
Liang Cunming, Liu Changpeng, Fam Zheng, Amnon Ilan,
John Ferlan <jfe
On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
> Date: Tue, 19 Mar 2019 14:45:45 +0200
> Subject: [PATCH 0/9] RFC: NVME VFIO mediated device
>
> Hi everyone!
>
> In this patch series, I would like to introduce my take on the problem of doing
> as fast as possible virtualization of storage with emphasis on low latency.
>
> In this patch series I implemented a kernel vfio based, mediated device that
> allows the user to pass through a partition and/or whole namespace to a guest.
>
> The idea behind this driver is based on paper you can find at
> https://www.usenix.org/conference/atc18/presentation/peng,
>
> Although note that I stared the development prior to reading this paper,
> independently.
>
> In addition to that implementation is not based on code used in the paper as
> I wasn't being able at that time to make the source available to me.
>
> ***Key points about the implementation:***
>
> * Polling kernel thread is used. The polling is stopped after a
> predefined timeout (1/2 sec by default).
> Support for all interrupt driven mode is planned, and it shows promising results.
>
> * Guest sees a standard NVME device - this allows to run guest with
> unmodified drivers, for example windows guests.
>
> * The NVMe device is shared between host and guest.
> That means that even a single namespace can be split between host
> and guest based on different partitions.
>
> * Simple configuration
>
> *** Performance ***
>
> Performance was tested on Intel DC P3700, With Xeon E5-2620 v2
> and both latency and throughput is very similar to SPDK.
>
> Soon I will test this on a better server and nvme device and provide
> more formal performance numbers.
>
> Latency numbers:
> ~80ms - spdk with fio plugin on the host.
> ~84ms - nvme driver on the host
> ~87ms - mdev-nvme + nvme driver in the guest
You mentioned the spdk numbers are with vhost-user-nvme. Have you
measured SPDK's vhost-user-blk?
Stefan
* Re: your mail
2019-03-21 16:13 ` Stefan Hajnoczi
@ 2019-03-21 17:07 ` Maxim Levitsky
2019-03-25 16:46 ` Stefan Hajnoczi
0 siblings, 1 reply; 14+ messages in thread
From: Maxim Levitsky @ 2019-03-21 17:07 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: linux-nvme, linux-kernel, kvm, Jens Axboe, Alex Williamson,
Keith Busch, Christoph Hellwig, Sagi Grimberg, Kirti Wankhede,
David S . Miller, Mauro Carvalho Chehab, Greg Kroah-Hartman,
Wolfram Sang, Nicolas Ferre, Paul E . McKenney, Paolo Bonzini,
Liang Cunming, Liu Changpeng, Fam Zheng, Amnon Ilan,
John Ferlan <jfer
On Thu, 2019-03-21 at 16:13 +0000, Stefan Hajnoczi wrote:
> On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
> > Date: Tue, 19 Mar 2019 14:45:45 +0200
> > Subject: [PATCH 0/9] RFC: NVME VFIO mediated device
> >
> > Hi everyone!
> >
> > In this patch series, I would like to introduce my take on the problem of
> > doing
> > as fast as possible virtualization of storage with emphasis on low latency.
> >
> > In this patch series I implemented a kernel vfio based, mediated device
> > that
> > allows the user to pass through a partition and/or whole namespace to a
> > guest.
> >
> > The idea behind this driver is based on paper you can find at
> > https://www.usenix.org/conference/atc18/presentation/peng,
> >
> > Although note that I stared the development prior to reading this paper,
> > independently.
> >
> > In addition to that implementation is not based on code used in the paper
> > as
> > I wasn't being able at that time to make the source available to me.
> >
> > ***Key points about the implementation:***
> >
> > * Polling kernel thread is used. The polling is stopped after a
> > predefined timeout (1/2 sec by default).
> > Support for all interrupt driven mode is planned, and it shows promising
> > results.
> >
> > * Guest sees a standard NVME device - this allows to run guest with
> > unmodified drivers, for example windows guests.
> >
> > * The NVMe device is shared between host and guest.
> > That means that even a single namespace can be split between host
> > and guest based on different partitions.
> >
> > * Simple configuration
> >
> > *** Performance ***
> >
> > Performance was tested on Intel DC P3700, With Xeon E5-2620 v2
> > and both latency and throughput is very similar to SPDK.
> >
> > Soon I will test this on a better server and nvme device and provide
> > more formal performance numbers.
> >
> > Latency numbers:
> > ~80ms - spdk with fio plugin on the host.
> > ~84ms - nvme driver on the host
> > ~87ms - mdev-nvme + nvme driver in the guest
>
> You mentioned the spdk numbers are with vhost-user-nvme. Have you
> measured SPDK's vhost-user-blk?
I have done a lot of measurements of vhost-user-blk vs vhost-user-nvme.
vhost-user-nvme was always a bit faster, but only a bit.
Thus I don't think it makes sense to benchmark against vhost-user-blk.
Best regards,
Maxim Levitsky
* Re: your mail
2019-03-21 17:07 ` Maxim Levitsky
@ 2019-03-25 16:46 ` Stefan Hajnoczi
0 siblings, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2019-03-25 16:46 UTC (permalink / raw)
To: Maxim Levitsky
Cc: linux-nvme, linux-kernel, kvm, Jens Axboe, Alex Williamson,
Keith Busch, Christoph Hellwig, Sagi Grimberg, Kirti Wankhede,
David S . Miller, Mauro Carvalho Chehab, Greg Kroah-Hartman,
Wolfram Sang, Nicolas Ferre, Paul E . McKenney, Paolo Bonzini,
Liang Cunming, Liu Changpeng, Fam Zheng, Amnon Ilan,
John Ferlan <jfer
On Thu, Mar 21, 2019 at 07:07:38PM +0200, Maxim Levitsky wrote:
> On Thu, 2019-03-21 at 16:13 +0000, Stefan Hajnoczi wrote:
> > On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
> > > Date: Tue, 19 Mar 2019 14:45:45 +0200
> > > Subject: [PATCH 0/9] RFC: NVME VFIO mediated device
> > >
> > > Hi everyone!
> > >
> > > In this patch series, I would like to introduce my take on the problem of
> > > doing
> > > as fast as possible virtualization of storage with emphasis on low latency.
> > >
> > > In this patch series I implemented a kernel vfio based, mediated device
> > > that
> > > allows the user to pass through a partition and/or whole namespace to a
> > > guest.
> > >
> > > The idea behind this driver is based on paper you can find at
> > > https://www.usenix.org/conference/atc18/presentation/peng,
> > >
> > > Although note that I stared the development prior to reading this paper,
> > > independently.
> > >
> > > In addition to that implementation is not based on code used in the paper
> > > as
> > > I wasn't being able at that time to make the source available to me.
> > >
> > > ***Key points about the implementation:***
> > >
> > > * Polling kernel thread is used. The polling is stopped after a
> > > predefined timeout (1/2 sec by default).
> > > Support for all interrupt driven mode is planned, and it shows promising
> > > results.
> > >
> > > * Guest sees a standard NVME device - this allows to run guest with
> > > unmodified drivers, for example windows guests.
> > >
> > > * The NVMe device is shared between host and guest.
> > > That means that even a single namespace can be split between host
> > > and guest based on different partitions.
> > >
> > > * Simple configuration
> > >
> > > *** Performance ***
> > >
> > > Performance was tested on Intel DC P3700, With Xeon E5-2620 v2
> > > and both latency and throughput is very similar to SPDK.
> > >
> > > Soon I will test this on a better server and nvme device and provide
> > > more formal performance numbers.
> > >
> > > Latency numbers:
> > > ~80ms - spdk with fio plugin on the host.
> > > ~84ms - nvme driver on the host
> > > ~87ms - mdev-nvme + nvme driver in the guest
> >
> > You mentioned the spdk numbers are with vhost-user-nvme. Have you
> > measured SPDK's vhost-user-blk?
>
> I have done a lot of measurements of vhost-user-blk vs vhost-user-nvme.
> vhost-user-nvme was always a bit faster, but only a bit.
> Thus I don't think it makes sense to benchmark against vhost-user-blk.
It's interesting because mdev-nvme is closest to the hardware while
vhost-user-blk is closest to software. Doing things at the NVMe level
isn't buying much performance because it's still going through a
software path comparable to vhost-user-blk.
From what you say it sounds like there isn't much to optimize away :(.
Stefan
* Re: your mail
2019-03-19 15:22 ` your mail Keith Busch
2019-03-19 23:49 ` Chaitanya Kulkarni
2019-03-20 16:30 ` Maxim Levitsky
@ 2019-04-08 10:04 ` Maxim Levitsky
2 siblings, 0 replies; 14+ messages in thread
From: Maxim Levitsky @ 2019-04-08 10:04 UTC (permalink / raw)
To: Keith Busch
Cc: Fam Zheng, Keith Busch, Sagi Grimberg, kvm, Wolfram Sang,
Greg Kroah-Hartman, Liang Cunming, Nicolas Ferre, linux-kernel,
linux-nvme, David S . Miller, Jens Axboe, Alex Williamson,
Kirti Wankhede, Mauro Carvalho Chehab, Paolo Bonzini,
Liu Changpeng, Paul E . McKenney, Amnon Ilan, Christoph Hellwig,
John Ferlan
On Tue, 2019-03-19 at 09:22 -0600, Keith Busch wrote:
> On Tue, Mar 19, 2019 at 04:41:07PM +0200, Maxim Levitsky wrote:
> > -> Share the NVMe device between host and guest.
> > Even in fully virtualized configurations,
> > some partitions of nvme device could be used by guests as block
> > devices
> > while others passed through with nvme-mdev to achieve balance between
> > all features of full IO stack emulation and performance.
> >
> > -> NVME-MDEV is a bit faster due to the fact that in-kernel driver
> > can send interrupts to the guest directly without a context
> > switch that can be expensive due to meltdown mitigation.
> >
> > -> Is able to utilize interrupts to get reasonable performance.
> > This is only implemented
> > as a proof of concept and not included in the patches,
> > but interrupt driven mode shows reasonable performance
> >
> > -> This is a framework that later can be used to support NVMe devices
> > with more of the IO virtualization built-in
> > (IOMMU with PASID support coupled with device that supports it)
>
> Would be very interested to see the PASID support. You wouldn't even
> need to mediate the IO doorbells or translations if assigning entire
> namespaces, and should be much faster than the shadow doorbells.
>
> I think you should send 6/9 "nvme/pci: init shadow doorbell after each
> reset" separately for immediate inclusion.
>
> I like the idea in principle, but it will take me a little time to get
> through reviewing your implementation. I would have guessed we could
> have leveraged something from the existing nvme/target for the mediating
> controller register access and admin commands. Maybe even start with
> implementing an nvme passthrough namespace target type (we currently
> have block and file).
Hi!
Sorry to bother you, but any update?
I was somewhat sick for the last week, but I am now finally back in shape to
continue working on this and other tasks I have.
I am now studying the nvme target code and io_uring to evaluate the difficulty
of using something similar to talk to the block device instead of / in addition
to the direct connection I implemented.
I would be glad to hear more feedback on this project.
I will also soon post the few fixes separately as you suggested.
Best regards,
Maxim Levitsky
end of thread, newest: 2019-04-08 10:04 UTC
Thread overview: 14+ messages
2009-07-23 5:38 (unknown), Haneef Syed
2009-07-23 5:50 ` your mail Gleb Natapov
2009-07-23 6:09 ` Haneef Syed
2009-07-23 6:14 ` Gleb Natapov
-- strict thread matches above, loose matches on Subject: below --
2019-03-19 14:41 (unknown) Maxim Levitsky
2019-03-19 15:22 ` your mail Keith Busch
2019-03-19 23:49 ` Chaitanya Kulkarni
2019-03-20 16:44 ` Maxim Levitsky
2019-03-20 16:30 ` Maxim Levitsky
2019-03-20 17:03 ` Keith Busch
2019-03-20 17:33 ` Maxim Levitsky
2019-04-08 10:04 ` Maxim Levitsky
2019-03-21 16:13 ` Stefan Hajnoczi
2019-03-21 17:07 ` Maxim Levitsky
2019-03-25 16:46 ` Stefan Hajnoczi