Linux virtualization list
 help / color / mirror / Atom feed
* Re: [RFC v2] virtio: support packed ring
From: Tiwei Bie @ 2018-04-24  1:49 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180424044250-mutt-send-email-mst@kernel.org>

On Tue, Apr 24, 2018 at 04:43:22AM +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 24, 2018 at 09:37:47AM +0800, Tiwei Bie wrote:
> > On Tue, Apr 24, 2018 at 04:29:51AM +0300, Michael S. Tsirkin wrote:
> > > On Tue, Apr 24, 2018 at 09:16:38AM +0800, Tiwei Bie wrote:
> > > > On Tue, Apr 24, 2018 at 04:05:07AM +0300, Michael S. Tsirkin wrote:
> > > > > On Tue, Apr 24, 2018 at 08:54:52AM +0800, Jason Wang wrote:
> > > > > > 
> > > > > > 
> > > > > > On 2018年04月23日 17:29, Tiwei Bie wrote:
> > > > > > > On Mon, Apr 23, 2018 at 01:42:14PM +0800, Jason Wang wrote:
> > > > > > > > On 2018年04月01日 22:12, Tiwei Bie wrote:
> > > > > > > > > Hello everyone,
> > > > > > > > > 
> > > > > > > > > This RFC implements packed ring support for virtio driver.
> > > > > > > > > 
> > > > > > > > > The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
> > > > > > > > > by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
> > > > > > > > > Minor changes are needed for the vhost code, e.g. to kick the guest.
> > > > > > > > > 
> > > > > > > > > TODO:
> > > > > > > > > - Refinements and bug fixes;
> > > > > > > > > - Split into small patches;
> > > > > > > > > - Test indirect descriptor support;
> > > > > > > > > - Test/fix event suppression support;
> > > > > > > > > - Test devices other than net;
> > > > > > > > > 
> > > > > > > > > RFC v1 -> RFC v2:
> > > > > > > > > - Add indirect descriptor support - compile test only;
> > > > > > > > > - Add event suppression supprt - compile test only;
> > > > > > > > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > > > > > > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > > > > > > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > > > > > > > - Avoid using '%' operator (Jason);
> > > > > > > > > - Rename free_head -> next_avail_idx (Jason);
> > > > > > > > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > > > > > > > - Some other refinements and bug fixes;
> > > > > > > > > 
> > > > > > > > > Thanks!
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > > > > ---
> > > > > > > > >    drivers/virtio/virtio_ring.c       | 1094 +++++++++++++++++++++++++++++-------
> > > > > > > > >    include/linux/virtio_ring.h        |    8 +-
> > > > > > > > >    include/uapi/linux/virtio_config.h |   12 +-
> > > > > > > > >    include/uapi/linux/virtio_ring.h   |   61 ++
> > > > > > > > >    4 files changed, 980 insertions(+), 195 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > > index 71458f493cf8..0515dca34d77 100644
> > > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > > @@ -58,14 +58,15 @@
> > > > > > > > [...]
> > > > > > > > 
> > > > > > > > > +
> > > > > > > > > +	if (vq->indirect) {
> > > > > > > > > +		u32 len;
> > > > > > > > > +
> > > > > > > > > +		desc = vq->desc_state[head].indir_desc;
> > > > > > > > > +		/* Free the indirect table, if any, now that it's unmapped. */
> > > > > > > > > +		if (!desc)
> > > > > > > > > +			goto out;
> > > > > > > > > +
> > > > > > > > > +		len = virtio32_to_cpu(vq->vq.vdev,
> > > > > > > > > +				      vq->vring_packed.desc[head].len);
> > > > > > > > > +
> > > > > > > > > +		BUG_ON(!(vq->vring_packed.desc[head].flags &
> > > > > > > > > +			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> > > > > > > > It looks to me spec does not force to keep VRING_DESC_F_INDIRECT here. So we
> > > > > > > > can safely remove this BUG_ON() here.
> > > > > > > > 
> > > > > > > > > +		BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
> > > > > > > > Len could be ignored for used descriptor according to the spec, so we need
> > > > > > > > remove this BUG_ON() too.
> > > > > > > Yeah, you're right! The BUG_ON() isn't right. I'll remove it.
> > > > > > > And I think something related to this in the spec isn't very
> > > > > > > clear currently.
> > > > > > > 
> > > > > > > In the spec, there are below words:
> > > > > > > 
> > > > > > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L272
> > > > > > > """
> > > > > > > In descriptors with VIRTQ_DESC_F_INDIRECT set VIRTQ_DESC_F_WRITE
> > > > > > > is reserved and is ignored by the device.
> > > > > > > """
> > > > > > > 
> > > > > > > So when device writes back an used descriptor in this case,
> > > > > > > device may not set the VIRTQ_DESC_F_WRITE flag as the flag
> > > > > > > is reserved and should be ignored.
> > > > > > > 
> > > > > > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L170
> > > > > > > """
> > > > > > > Element Length is reserved for used descriptors without the
> > > > > > > VIRTQ_DESC_F_WRITE flag, and is ignored by drivers.
> > > > > > > """
> > > > > > > 
> > > > > > > And this is the way how driver ignores the `len` in an used
> > > > > > > descriptor.
> > > > > > > 
> > > > > > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L241
> > > > > > > """
> > > > > > > To increase ring capacity the driver can store a (read-only
> > > > > > > by the device) table of indirect descriptors anywhere in memory,
> > > > > > > and insert a descriptor in the main virtqueue (with \field{Flags}
> > > > > > > bit VIRTQ_DESC_F_INDIRECT on) that refers to a buffer element
> > > > > > > containing this indirect descriptor table;
> > > > > > > """
> > > > > > > 
> > > > > > > So the indirect descriptors in the table are read-only by
> > > > > > > the device. And the only descriptor which is writeable by
> > > > > > > the device is the descriptor in the main virtqueue (with
> > > > > > > Flags bit VIRTQ_DESC_F_INDIRECT on). So if we ignore the
> > > > > > > `len` in this descriptor, we won't be able to get the
> > > > > > > length of the data written by the device.
> > > > > > > 
> > > > > > > So I think the `len` in this descriptor will carry the
> > > > > > > length of the data written by the device (if the buffers
> > > > > > > are writable to the device) even if the VIRTQ_DESC_F_WRITE
> > > > > > > isn't set by the device. How do you think?
> > > > > > 
> > > > > > Yes I think so. But we'd better need clarification from Michael.
> > > > > 
> > > > > I think if you use a descriptor, and you want to supply len
> > > > > to guest, you set VIRTQ_DESC_F_WRITE in the used descriptor.
> > > > > Spec also says you must not set VIRTQ_DESC_F_INDIRECT then.
> > > > > If that's a problem we could look at relaxing that last requirement -
> > > > > does driver want INDIRECT in used descriptor to match
> > > > > the value in the avail descriptor for some reason?
> > > > 
> > > > For indirect, driver needs some way to get the length
> > > > of the data written by the driver. And the descriptors
> > > > in the indirect table is read-only, so the only place
> > > > device could put this value is the descriptor with the
> > > > VIRTQ_DESC_F_INDIRECT flag set.
> > > 
> > > when writing out used descriptor, device should set VIRTQ_DESC_F_WRITE
> > > (and clear VIRTQ_DESC_F_INDIRECT).
> > 
> > So the spec allows device to set VIRTQ_DESC_F_WRITE bit
> > when writing out an used descriptor even if the corresponding
> > descriptors it just parsed don't have the VIRTQ_DESC_F_WRITE
> > bit set?
> > 
> > Best regards,
> > Tiwei Bie
> 
> I think so. In a used descriptor, VIRTQ_DESC_F_WRITE just means length
> is valid.

Got it. It's very neat. Thanks! :)

Best regards,
Tiwei Bie

> 
> -- 
> MST
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC v2] virtio: support packed ring
From: Michael S. Tsirkin @ 2018-04-24  1:43 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180424013747.hmr4ytqe7gdqfhdt@debian>

On Tue, Apr 24, 2018 at 09:37:47AM +0800, Tiwei Bie wrote:
> On Tue, Apr 24, 2018 at 04:29:51AM +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 24, 2018 at 09:16:38AM +0800, Tiwei Bie wrote:
> > > On Tue, Apr 24, 2018 at 04:05:07AM +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Apr 24, 2018 at 08:54:52AM +0800, Jason Wang wrote:
> > > > > 
> > > > > 
> > > > > On 2018年04月23日 17:29, Tiwei Bie wrote:
> > > > > > On Mon, Apr 23, 2018 at 01:42:14PM +0800, Jason Wang wrote:
> > > > > > > On 2018年04月01日 22:12, Tiwei Bie wrote:
> > > > > > > > Hello everyone,
> > > > > > > > 
> > > > > > > > This RFC implements packed ring support for virtio driver.
> > > > > > > > 
> > > > > > > > The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
> > > > > > > > by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
> > > > > > > > Minor changes are needed for the vhost code, e.g. to kick the guest.
> > > > > > > > 
> > > > > > > > TODO:
> > > > > > > > - Refinements and bug fixes;
> > > > > > > > - Split into small patches;
> > > > > > > > - Test indirect descriptor support;
> > > > > > > > - Test/fix event suppression support;
> > > > > > > > - Test devices other than net;
> > > > > > > > 
> > > > > > > > RFC v1 -> RFC v2:
> > > > > > > > - Add indirect descriptor support - compile test only;
> > > > > > > > - Add event suppression supprt - compile test only;
> > > > > > > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > > > > > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > > > > > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > > > > > > - Avoid using '%' operator (Jason);
> > > > > > > > - Rename free_head -> next_avail_idx (Jason);
> > > > > > > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > > > > > > - Some other refinements and bug fixes;
> > > > > > > > 
> > > > > > > > Thanks!
> > > > > > > > 
> > > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > > > ---
> > > > > > > >    drivers/virtio/virtio_ring.c       | 1094 +++++++++++++++++++++++++++++-------
> > > > > > > >    include/linux/virtio_ring.h        |    8 +-
> > > > > > > >    include/uapi/linux/virtio_config.h |   12 +-
> > > > > > > >    include/uapi/linux/virtio_ring.h   |   61 ++
> > > > > > > >    4 files changed, 980 insertions(+), 195 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > > index 71458f493cf8..0515dca34d77 100644
> > > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > > @@ -58,14 +58,15 @@
> > > > > > > [...]
> > > > > > > 
> > > > > > > > +
> > > > > > > > +	if (vq->indirect) {
> > > > > > > > +		u32 len;
> > > > > > > > +
> > > > > > > > +		desc = vq->desc_state[head].indir_desc;
> > > > > > > > +		/* Free the indirect table, if any, now that it's unmapped. */
> > > > > > > > +		if (!desc)
> > > > > > > > +			goto out;
> > > > > > > > +
> > > > > > > > +		len = virtio32_to_cpu(vq->vq.vdev,
> > > > > > > > +				      vq->vring_packed.desc[head].len);
> > > > > > > > +
> > > > > > > > +		BUG_ON(!(vq->vring_packed.desc[head].flags &
> > > > > > > > +			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> > > > > > > It looks to me spec does not force to keep VRING_DESC_F_INDIRECT here. So we
> > > > > > > can safely remove this BUG_ON() here.
> > > > > > > 
> > > > > > > > +		BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
> > > > > > > Len could be ignored for used descriptor according to the spec, so we need
> > > > > > > remove this BUG_ON() too.
> > > > > > Yeah, you're right! The BUG_ON() isn't right. I'll remove it.
> > > > > > And I think something related to this in the spec isn't very
> > > > > > clear currently.
> > > > > > 
> > > > > > In the spec, there are below words:
> > > > > > 
> > > > > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L272
> > > > > > """
> > > > > > In descriptors with VIRTQ_DESC_F_INDIRECT set VIRTQ_DESC_F_WRITE
> > > > > > is reserved and is ignored by the device.
> > > > > > """
> > > > > > 
> > > > > > So when device writes back an used descriptor in this case,
> > > > > > device may not set the VIRTQ_DESC_F_WRITE flag as the flag
> > > > > > is reserved and should be ignored.
> > > > > > 
> > > > > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L170
> > > > > > """
> > > > > > Element Length is reserved for used descriptors without the
> > > > > > VIRTQ_DESC_F_WRITE flag, and is ignored by drivers.
> > > > > > """
> > > > > > 
> > > > > > And this is the way how driver ignores the `len` in an used
> > > > > > descriptor.
> > > > > > 
> > > > > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L241
> > > > > > """
> > > > > > To increase ring capacity the driver can store a (read-only
> > > > > > by the device) table of indirect descriptors anywhere in memory,
> > > > > > and insert a descriptor in the main virtqueue (with \field{Flags}
> > > > > > bit VIRTQ_DESC_F_INDIRECT on) that refers to a buffer element
> > > > > > containing this indirect descriptor table;
> > > > > > """
> > > > > > 
> > > > > > So the indirect descriptors in the table are read-only by
> > > > > > the device. And the only descriptor which is writeable by
> > > > > > the device is the descriptor in the main virtqueue (with
> > > > > > Flags bit VIRTQ_DESC_F_INDIRECT on). So if we ignore the
> > > > > > `len` in this descriptor, we won't be able to get the
> > > > > > length of the data written by the device.
> > > > > > 
> > > > > > So I think the `len` in this descriptor will carry the
> > > > > > length of the data written by the device (if the buffers
> > > > > > are writable to the device) even if the VIRTQ_DESC_F_WRITE
> > > > > > isn't set by the device. How do you think?
> > > > > 
> > > > > Yes I think so. But we'd better need clarification from Michael.
> > > > 
> > > > I think if you use a descriptor, and you want to supply len
> > > > to guest, you set VIRTQ_DESC_F_WRITE in the used descriptor.
> > > > Spec also says you must not set VIRTQ_DESC_F_INDIRECT then.
> > > > If that's a problem we could look at relaxing that last requirement -
> > > > does driver want INDIRECT in used descriptor to match
> > > > the value in the avail descriptor for some reason?
> > > 
> > > For indirect, driver needs some way to get the length
> > > of the data written by the driver. And the descriptors
> > > in the indirect table is read-only, so the only place
> > > device could put this value is the descriptor with the
> > > VIRTQ_DESC_F_INDIRECT flag set.
> > 
> > when writing out used descriptor, device should set VIRTQ_DESC_F_WRITE
> > (and clear VIRTQ_DESC_F_INDIRECT).
> 
> So the spec allows device to set VIRTQ_DESC_F_WRITE bit
> when writing out an used descriptor even if the corresponding
> descriptors it just parsed don't have the VIRTQ_DESC_F_WRITE
> bit set?
> 
> Best regards,
> Tiwei Bie

I think so. In a used descriptor, VIRTQ_DESC_F_WRITE just means length
is valid.

-- 
MST
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: Michael S. Tsirkin @ 2018-04-24  1:42 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, Jakub Kicinski, Netdev,
	virtualization, Siwei Liu, Sridhar Samudrala, David Miller
In-Reply-To: <20180423182503.353180c9@xeon-e3>

On Mon, Apr 23, 2018 at 06:25:03PM -0700, Stephen Hemminger wrote:
> On Mon, 23 Apr 2018 12:44:39 -0700
> Siwei Liu <loseweigh@gmail.com> wrote:
> 
> > On Mon, Apr 23, 2018 at 10:56 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Mon, Apr 23, 2018 at 10:44:40AM -0700, Stephen Hemminger wrote:  
> > >> On Mon, 23 Apr 2018 20:24:56 +0300
> > >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >>  
> > >> > On Mon, Apr 23, 2018 at 10:04:06AM -0700, Stephen Hemminger wrote:  
> > >> > > > >
> > >> > > > >I will NAK patches to change to common code for netvsc especially the
> > >> > > > >three device model.  MS worked hard with distro vendors to support transparent
> > >> > > > >mode, ans we really can't have a new model; or do backport.
> > >> > > > >
> > >> > > > >Plus, DPDK is now dependent on existing model.  
> > >> > > >
> > >> > > > Sorry, but nobody here cares about dpdk or other similar oddities.  
> > >> > >
> > >> > > The network device model is a userspace API, and DPDK is a userspace application.  
> > >> >
> > >> > It is userspace but are you sure dpdk is actually poking at netdevs?
> > >> > AFAIK it's normally banging device registers directly.
> > >> >  
> > >> > > You can't go breaking userspace even if you don't like the application.  
> > >> >
> > >> > Could you please explain how is the proposed patchset breaking
> > >> > userspace? Ignoring DPDK for now, I don't think it changes the userspace
> > >> > API at all.
> > >> >  
> > >>
> > >> The DPDK has a device driver vdev_netvsc which scans the Linux network devices
> > >> to look for Linux netvsc device and the paired VF device and setup the
> > >> DPDK environment.  This setup creates a DPDK failsafe (bondingish) instance
> > >> and sets up TAP support over the Linux netvsc device as well as the Mellanox
> > >> VF device.
> > >>
> > >> So it depends on existing 2 device model. You can't go to a 3 device model
> > >> or start hiding devices from userspace.  
> > >
> > > Okay so how does the existing patch break that? IIUC does not go to
> > > a 3 device model since netvsc calls failover_register directly.
> > >  
> > >> Also, I am working on associating netvsc and VF device based on serial number
> > >> rather than MAC address. The serial number is how Windows works now, and it makes
> > >> sense for Linux and Windows to use the same mechanism if possible.  
> > >
> > > Maybe we should support same for virtio ...
> > > Which serial do you mean? From vpd?
> > >
> > > I guess you will want to keep supporting MAC for old hypervisors?
> 
> The serial number has always been in the hypervisor since original support of SR-IOV
> in WS2008.  So no backward compatibility special cases would be needed.

Is that a serial from real hardware or a hypervisor thing?


-- 
MST

^ permalink raw reply

* Re: [RFC v2] virtio: support packed ring
From: Tiwei Bie @ 2018-04-24  1:37 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180424042913-mutt-send-email-mst@kernel.org>

On Tue, Apr 24, 2018 at 04:29:51AM +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 24, 2018 at 09:16:38AM +0800, Tiwei Bie wrote:
> > On Tue, Apr 24, 2018 at 04:05:07AM +0300, Michael S. Tsirkin wrote:
> > > On Tue, Apr 24, 2018 at 08:54:52AM +0800, Jason Wang wrote:
> > > > 
> > > > 
> > > > On 2018年04月23日 17:29, Tiwei Bie wrote:
> > > > > On Mon, Apr 23, 2018 at 01:42:14PM +0800, Jason Wang wrote:
> > > > > > On 2018年04月01日 22:12, Tiwei Bie wrote:
> > > > > > > Hello everyone,
> > > > > > > 
> > > > > > > This RFC implements packed ring support for virtio driver.
> > > > > > > 
> > > > > > > The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
> > > > > > > by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
> > > > > > > Minor changes are needed for the vhost code, e.g. to kick the guest.
> > > > > > > 
> > > > > > > TODO:
> > > > > > > - Refinements and bug fixes;
> > > > > > > - Split into small patches;
> > > > > > > - Test indirect descriptor support;
> > > > > > > - Test/fix event suppression support;
> > > > > > > - Test devices other than net;
> > > > > > > 
> > > > > > > RFC v1 -> RFC v2:
> > > > > > > - Add indirect descriptor support - compile test only;
> > > > > > > - Add event suppression supprt - compile test only;
> > > > > > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > > > > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > > > > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > > > > > - Avoid using '%' operator (Jason);
> > > > > > > - Rename free_head -> next_avail_idx (Jason);
> > > > > > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > > > > > - Some other refinements and bug fixes;
> > > > > > > 
> > > > > > > Thanks!
> > > > > > > 
> > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > > ---
> > > > > > >    drivers/virtio/virtio_ring.c       | 1094 +++++++++++++++++++++++++++++-------
> > > > > > >    include/linux/virtio_ring.h        |    8 +-
> > > > > > >    include/uapi/linux/virtio_config.h |   12 +-
> > > > > > >    include/uapi/linux/virtio_ring.h   |   61 ++
> > > > > > >    4 files changed, 980 insertions(+), 195 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > > index 71458f493cf8..0515dca34d77 100644
> > > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > > @@ -58,14 +58,15 @@
> > > > > > [...]
> > > > > > 
> > > > > > > +
> > > > > > > +	if (vq->indirect) {
> > > > > > > +		u32 len;
> > > > > > > +
> > > > > > > +		desc = vq->desc_state[head].indir_desc;
> > > > > > > +		/* Free the indirect table, if any, now that it's unmapped. */
> > > > > > > +		if (!desc)
> > > > > > > +			goto out;
> > > > > > > +
> > > > > > > +		len = virtio32_to_cpu(vq->vq.vdev,
> > > > > > > +				      vq->vring_packed.desc[head].len);
> > > > > > > +
> > > > > > > +		BUG_ON(!(vq->vring_packed.desc[head].flags &
> > > > > > > +			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> > > > > > It looks to me spec does not force to keep VRING_DESC_F_INDIRECT here. So we
> > > > > > can safely remove this BUG_ON() here.
> > > > > > 
> > > > > > > +		BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
> > > > > > Len could be ignored for used descriptor according to the spec, so we need
> > > > > > remove this BUG_ON() too.
> > > > > Yeah, you're right! The BUG_ON() isn't right. I'll remove it.
> > > > > And I think something related to this in the spec isn't very
> > > > > clear currently.
> > > > > 
> > > > > In the spec, there are below words:
> > > > > 
> > > > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L272
> > > > > """
> > > > > In descriptors with VIRTQ_DESC_F_INDIRECT set VIRTQ_DESC_F_WRITE
> > > > > is reserved and is ignored by the device.
> > > > > """
> > > > > 
> > > > > So when device writes back an used descriptor in this case,
> > > > > device may not set the VIRTQ_DESC_F_WRITE flag as the flag
> > > > > is reserved and should be ignored.
> > > > > 
> > > > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L170
> > > > > """
> > > > > Element Length is reserved for used descriptors without the
> > > > > VIRTQ_DESC_F_WRITE flag, and is ignored by drivers.
> > > > > """
> > > > > 
> > > > > And this is the way how driver ignores the `len` in an used
> > > > > descriptor.
> > > > > 
> > > > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L241
> > > > > """
> > > > > To increase ring capacity the driver can store a (read-only
> > > > > by the device) table of indirect descriptors anywhere in memory,
> > > > > and insert a descriptor in the main virtqueue (with \field{Flags}
> > > > > bit VIRTQ_DESC_F_INDIRECT on) that refers to a buffer element
> > > > > containing this indirect descriptor table;
> > > > > """
> > > > > 
> > > > > So the indirect descriptors in the table are read-only by
> > > > > the device. And the only descriptor which is writeable by
> > > > > the device is the descriptor in the main virtqueue (with
> > > > > Flags bit VIRTQ_DESC_F_INDIRECT on). So if we ignore the
> > > > > `len` in this descriptor, we won't be able to get the
> > > > > length of the data written by the device.
> > > > > 
> > > > > So I think the `len` in this descriptor will carry the
> > > > > length of the data written by the device (if the buffers
> > > > > are writable to the device) even if the VIRTQ_DESC_F_WRITE
> > > > > isn't set by the device. How do you think?
> > > > 
> > > > Yes I think so. But we'd better need clarification from Michael.
> > > 
> > > I think if you use a descriptor, and you want to supply len
> > > to guest, you set VIRTQ_DESC_F_WRITE in the used descriptor.
> > > Spec also says you must not set VIRTQ_DESC_F_INDIRECT then.
> > > If that's a problem we could look at relaxing that last requirement -
> > > does driver want INDIRECT in used descriptor to match
> > > the value in the avail descriptor for some reason?
> > 
> > For indirect, driver needs some way to get the length
> > of the data written by the driver. And the descriptors
> > in the indirect table is read-only, so the only place
> > device could put this value is the descriptor with the
> > VIRTQ_DESC_F_INDIRECT flag set.
> 
> when writing out used descriptor, device should set VIRTQ_DESC_F_WRITE
> (and clear VIRTQ_DESC_F_INDIRECT).

So the spec allows device to set VIRTQ_DESC_F_WRITE bit
when writing out an used descriptor even if the corresponding
descriptors it just parsed don't have the VIRTQ_DESC_F_WRITE
bit set?

Best regards,
Tiwei Bie
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC v2] virtio: support packed ring
From: Michael S. Tsirkin @ 2018-04-24  1:29 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180424011638.vf6f22dnl7ufnnij@debian>

On Tue, Apr 24, 2018 at 09:16:38AM +0800, Tiwei Bie wrote:
> On Tue, Apr 24, 2018 at 04:05:07AM +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 24, 2018 at 08:54:52AM +0800, Jason Wang wrote:
> > > 
> > > 
> > > On 2018年04月23日 17:29, Tiwei Bie wrote:
> > > > On Mon, Apr 23, 2018 at 01:42:14PM +0800, Jason Wang wrote:
> > > > > On 2018年04月01日 22:12, Tiwei Bie wrote:
> > > > > > Hello everyone,
> > > > > > 
> > > > > > This RFC implements packed ring support for virtio driver.
> > > > > > 
> > > > > > The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
> > > > > > by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
> > > > > > Minor changes are needed for the vhost code, e.g. to kick the guest.
> > > > > > 
> > > > > > TODO:
> > > > > > - Refinements and bug fixes;
> > > > > > - Split into small patches;
> > > > > > - Test indirect descriptor support;
> > > > > > - Test/fix event suppression support;
> > > > > > - Test devices other than net;
> > > > > > 
> > > > > > RFC v1 -> RFC v2:
> > > > > > - Add indirect descriptor support - compile test only;
> > > > > > - Add event suppression supprt - compile test only;
> > > > > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > > > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > > > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > > > > - Avoid using '%' operator (Jason);
> > > > > > - Rename free_head -> next_avail_idx (Jason);
> > > > > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > > > > - Some other refinements and bug fixes;
> > > > > > 
> > > > > > Thanks!
> > > > > > 
> > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > ---
> > > > > >    drivers/virtio/virtio_ring.c       | 1094 +++++++++++++++++++++++++++++-------
> > > > > >    include/linux/virtio_ring.h        |    8 +-
> > > > > >    include/uapi/linux/virtio_config.h |   12 +-
> > > > > >    include/uapi/linux/virtio_ring.h   |   61 ++
> > > > > >    4 files changed, 980 insertions(+), 195 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > index 71458f493cf8..0515dca34d77 100644
> > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > @@ -58,14 +58,15 @@
> > > > > [...]
> > > > > 
> > > > > > +
> > > > > > +	if (vq->indirect) {
> > > > > > +		u32 len;
> > > > > > +
> > > > > > +		desc = vq->desc_state[head].indir_desc;
> > > > > > +		/* Free the indirect table, if any, now that it's unmapped. */
> > > > > > +		if (!desc)
> > > > > > +			goto out;
> > > > > > +
> > > > > > +		len = virtio32_to_cpu(vq->vq.vdev,
> > > > > > +				      vq->vring_packed.desc[head].len);
> > > > > > +
> > > > > > +		BUG_ON(!(vq->vring_packed.desc[head].flags &
> > > > > > +			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> > > > > It looks to me spec does not force to keep VRING_DESC_F_INDIRECT here. So we
> > > > > can safely remove this BUG_ON() here.
> > > > > 
> > > > > > +		BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
> > > > > Len could be ignored for used descriptor according to the spec, so we need
> > > > > remove this BUG_ON() too.
> > > > Yeah, you're right! The BUG_ON() isn't right. I'll remove it.
> > > > And I think something related to this in the spec isn't very
> > > > clear currently.
> > > > 
> > > > In the spec, there are below words:
> > > > 
> > > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L272
> > > > """
> > > > In descriptors with VIRTQ_DESC_F_INDIRECT set VIRTQ_DESC_F_WRITE
> > > > is reserved and is ignored by the device.
> > > > """
> > > > 
> > > > So when device writes back an used descriptor in this case,
> > > > device may not set the VIRTQ_DESC_F_WRITE flag as the flag
> > > > is reserved and should be ignored.
> > > > 
> > > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L170
> > > > """
> > > > Element Length is reserved for used descriptors without the
> > > > VIRTQ_DESC_F_WRITE flag, and is ignored by drivers.
> > > > """
> > > > 
> > > > And this is the way how driver ignores the `len` in an used
> > > > descriptor.
> > > > 
> > > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L241
> > > > """
> > > > To increase ring capacity the driver can store a (read-only
> > > > by the device) table of indirect descriptors anywhere in memory,
> > > > and insert a descriptor in the main virtqueue (with \field{Flags}
> > > > bit VIRTQ_DESC_F_INDIRECT on) that refers to a buffer element
> > > > containing this indirect descriptor table;
> > > > """
> > > > 
> > > > So the indirect descriptors in the table are read-only by
> > > > the device. And the only descriptor which is writeable by
> > > > the device is the descriptor in the main virtqueue (with
> > > > Flags bit VIRTQ_DESC_F_INDIRECT on). So if we ignore the
> > > > `len` in this descriptor, we won't be able to get the
> > > > length of the data written by the device.
> > > > 
> > > > So I think the `len` in this descriptor will carry the
> > > > length of the data written by the device (if the buffers
> > > > are writable to the device) even if the VIRTQ_DESC_F_WRITE
> > > > isn't set by the device. How do you think?
> > > 
> > > Yes I think so. But we'd better need clarification from Michael.
> > 
> > I think if you use a descriptor, and you want to supply len
> > to guest, you set VIRTQ_DESC_F_WRITE in the used descriptor.
> > Spec also says you must not set VIRTQ_DESC_F_INDIRECT then.
> > If that's a problem we could look at relaxing that last requirement -
> > does driver want INDIRECT in used descriptor to match
> > the value in the avail descriptor for some reason?
> 
> For indirect, driver needs some way to get the length
> of the data written by the driver. And the descriptors
> in the indirect table is read-only, so the only place
> device could put this value is the descriptor with the
> VIRTQ_DESC_F_INDIRECT flag set.

when writing out used descriptor, device should set VIRTQ_DESC_F_WRITE
(and clear VIRTQ_DESC_F_INDIRECT).

> > 
> > > > 
> > > > 
> > > > > The reason is we don't touch descriptor ring in the case of split, so
> > > > > BUG_ON()s may help there.
> > > > > 
> > > > > > +
> > > > > > +		for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
> > > > > > +			vring_unmap_one_packed(vq, &desc[j]);
> > > > > > +
> > > > > > +		kfree(desc);
> > > > > > +		vq->desc_state[head].indir_desc = NULL;
> > > > > > +	} else if (ctx) {
> > > > > > +		*ctx = vq->desc_state[head].indir_desc;
> > > > > > +	}
> > > > > > +
> > > > > > +out:
> > > > > > +	return vq->desc_state[head].num;
> > > > > > +}
> > > > > > +
> > > > > > +static inline bool more_used_split(const struct vring_virtqueue *vq)
> > > > > >    {
> > > > > >    	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
> > > > > >    }
> > > > > > +static inline bool more_used_packed(const struct vring_virtqueue *vq)
> > > > > > +{
> > > > > > +	u16 last_used, flags;
> > > > > > +	bool avail, used;
> > > > > > +
> > > > > > +	if (vq->vq.num_free == vq->vring_packed.num)
> > > > > > +		return false;
> > > > > > +
> > > > > > +	last_used = vq->last_used_idx;
> > > > > > +	flags = virtio16_to_cpu(vq->vq.vdev,
> > > > > > +				vq->vring_packed.desc[last_used].flags);
> > > > > > +	avail = flags & VRING_DESC_F_AVAIL(1);
> > > > > > +	used = flags & VRING_DESC_F_USED(1);
> > > > > > +
> > > > > > +	return avail == used;
> > > > > > +}
> > > > > This looks interesting, spec said:
> > > > > 
> > > > > "
> > > > > Thus VIRTQ_DESC_F_AVAIL and VIRTQ_DESC_F_USED bits are different for an
> > > > > available descriptor and
> > > > > equal for a used descriptor.
> > > > > Note that this observation is mostly useful for sanity-checking as these are
> > > > > necessary but not sufficient
> > > > > conditions - for example, all descriptors are zero-initialized. To detect
> > > > > used and available descriptors it is
> > > > > possible for drivers and devices to keep track of the last observed value of
> > > > > VIRTQ_DESC_F_USED/VIRTQ_-
> > > > > DESC_F_AVAIL. Other techniques to detect
> > > > > VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit changes
> > > > > might also be possible.
> > > > > "
> > > > > 
> > > > > So it looks to me it was not sufficient, looking at the example codes in
> > > > > spec, do we need to track last seen used_wrap_counter here?
> > > > I don't think we have to track used_wrap_counter in
> > > > driver. There was a discussion on this:
> > > > 
> > > > https://lists.oasis-open.org/archives/virtio-dev/201802/msg00177.html
> > > > 
> > > > And after that, below sentence was added (it's also
> > > > in the above words you quoted):
> > > > 
> > > > """
> > > > Other techniques to detect
> > > > VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit changes
> > > > might also be possible.
> > > > """
> > > > 
> > > > Best regards,
> > > > Tiwei Bie
> > > 
> > > I see, the extra condition "if (vq->vq.num_free == vq->vring_packed.num)"
> > > help in this case.
> > > 
> > > Thanks
> > 
> > I still think tracking a wrap counter is better.
> 
> >From my understanding, wrap counter is only needed when
> one side just want to update parts of the status bit(s),
> it's something like the "report status" or "write back"
> feature [1] in the hardware NIC. And in the driver, all
> the status must be updated, and that's why I don't want
> to track the usedwrap counter.
> 
> [1] https://github.com/btw616/dpdk-virtio1.1/commit/ca837865bd10
> 
> Best regards,
> Tiwei Bie
> 
> > 
> > > > 
> > > > > Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: Stephen Hemminger @ 2018-04-24  1:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, Jakub Kicinski, Netdev,
	virtualization, Siwei Liu, Sridhar Samudrala, David Miller
In-Reply-To: <20180423230037-mutt-send-email-mst@kernel.org>

On Mon, 23 Apr 2018 23:06:55 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Apr 23, 2018 at 12:44:39PM -0700, Siwei Liu wrote:
> > On Mon, Apr 23, 2018 at 10:56 AM, Michael S. Tsirkin <mst@redhat.com> wrote:  
> > > On Mon, Apr 23, 2018 at 10:44:40AM -0700, Stephen Hemminger wrote:  
> > >> On Mon, 23 Apr 2018 20:24:56 +0300
> > >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >>  
> > >> > On Mon, Apr 23, 2018 at 10:04:06AM -0700, Stephen Hemminger wrote:  
> > >> > > > >
> > >> > > > >I will NAK patches to change to common code for netvsc especially the
> > >> > > > >three device model.  MS worked hard with distro vendors to support transparent
> > >> > > > >mode, ans we really can't have a new model; or do backport.
> > >> > > > >
> > >> > > > >Plus, DPDK is now dependent on existing model.  
> > >> > > >
> > >> > > > Sorry, but nobody here cares about dpdk or other similar oddities.  
> > >> > >
> > >> > > The network device model is a userspace API, and DPDK is a userspace application.  
> > >> >
> > >> > It is userspace but are you sure dpdk is actually poking at netdevs?
> > >> > AFAIK it's normally banging device registers directly.
> > >> >  
> > >> > > You can't go breaking userspace even if you don't like the application.  
> > >> >
> > >> > Could you please explain how is the proposed patchset breaking
> > >> > userspace? Ignoring DPDK for now, I don't think it changes the userspace
> > >> > API at all.
> > >> >  
> > >>
> > >> The DPDK has a device driver vdev_netvsc which scans the Linux network devices
> > >> to look for Linux netvsc device and the paired VF device and setup the
> > >> DPDK environment.  This setup creates a DPDK failsafe (bondingish) instance
> > >> and sets up TAP support over the Linux netvsc device as well as the Mellanox
> > >> VF device.
> > >>
> > >> So it depends on existing 2 device model. You can't go to a 3 device model
> > >> or start hiding devices from userspace.  
> > >
> > > Okay so how does the existing patch break that? IIUC does not go to
> > > a 3 device model since netvsc calls failover_register directly.
> > >  
> > >> Also, I am working on associating netvsc and VF device based on serial number
> > >> rather than MAC address. The serial number is how Windows works now, and it makes
> > >> sense for Linux and Windows to use the same mechanism if possible.  
> > >
> > > Maybe we should support same for virtio ...
> > > Which serial do you mean? From vpd?
> > >
> > > I guess you will want to keep supporting MAC for old hypervisors?
> > >
> > > It all seems like a reasonable thing to support in the generic core.  
> > 
> > That's the reason why I chose explicit identifier rather than rely on
> > MAC address to bind/pair a device. MAC address can change. Even if it
> > can't, malicious guest user can fake MAC address to skip binding.
> > 
> > -Siwei  
> 
> Address should be sampled at device creation to prevent this
> kind of hack. Not that it buys the malicious user much:
> if you can poke at MAC addresses you probably already can
> break networking.

On Hyper-V guest can't really change MAC address if SR-IOV is enabled.
Also, Linux has permanent ether address in netdev which is what should
be used to avoid user messing with device.

^ permalink raw reply

* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: Stephen Hemminger @ 2018-04-24  1:25 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, Michael S. Tsirkin,
	Jakub Kicinski, Sridhar Samudrala, virtualization, Netdev,
	David Miller
In-Reply-To: <CADGSJ20ge75T+ddxtUBT4d9m1i3=HLOAHJEoS7Cg0bqnXrutwA@mail.gmail.com>

On Mon, 23 Apr 2018 12:44:39 -0700
Siwei Liu <loseweigh@gmail.com> wrote:

> On Mon, Apr 23, 2018 at 10:56 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Mon, Apr 23, 2018 at 10:44:40AM -0700, Stephen Hemminger wrote:  
> >> On Mon, 23 Apr 2018 20:24:56 +0300
> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>  
> >> > On Mon, Apr 23, 2018 at 10:04:06AM -0700, Stephen Hemminger wrote:  
> >> > > > >
> >> > > > >I will NAK patches to change to common code for netvsc especially the
> >> > > > >three device model.  MS worked hard with distro vendors to support transparent
> >> > > > >mode, ans we really can't have a new model; or do backport.
> >> > > > >
> >> > > > >Plus, DPDK is now dependent on existing model.  
> >> > > >
> >> > > > Sorry, but nobody here cares about dpdk or other similar oddities.  
> >> > >
> >> > > The network device model is a userspace API, and DPDK is a userspace application.  
> >> >
> >> > It is userspace but are you sure dpdk is actually poking at netdevs?
> >> > AFAIK it's normally banging device registers directly.
> >> >  
> >> > > You can't go breaking userspace even if you don't like the application.  
> >> >
> >> > Could you please explain how is the proposed patchset breaking
> >> > userspace? Ignoring DPDK for now, I don't think it changes the userspace
> >> > API at all.
> >> >  
> >>
> >> The DPDK has a device driver vdev_netvsc which scans the Linux network devices
> >> to look for Linux netvsc device and the paired VF device and setup the
> >> DPDK environment.  This setup creates a DPDK failsafe (bondingish) instance
> >> and sets up TAP support over the Linux netvsc device as well as the Mellanox
> >> VF device.
> >>
> >> So it depends on existing 2 device model. You can't go to a 3 device model
> >> or start hiding devices from userspace.  
> >
> > Okay so how does the existing patch break that? IIUC does not go to
> > a 3 device model since netvsc calls failover_register directly.
> >  
> >> Also, I am working on associating netvsc and VF device based on serial number
> >> rather than MAC address. The serial number is how Windows works now, and it makes
> >> sense for Linux and Windows to use the same mechanism if possible.  
> >
> > Maybe we should support same for virtio ...
> > Which serial do you mean? From vpd?
> >
> > I guess you will want to keep supporting MAC for old hypervisors?

The serial number has always been in the hypervisor since original support of SR-IOV
in WS2008.  So no backward compatibility special cases would be needed.

^ permalink raw reply

* Re: [RFC v2] virtio: support packed ring
From: Tiwei Bie @ 2018-04-24  1:16 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180424040258-mutt-send-email-mst@kernel.org>

On Tue, Apr 24, 2018 at 04:05:07AM +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 24, 2018 at 08:54:52AM +0800, Jason Wang wrote:
> > 
> > 
> > On 2018年04月23日 17:29, Tiwei Bie wrote:
> > > On Mon, Apr 23, 2018 at 01:42:14PM +0800, Jason Wang wrote:
> > > > On 2018年04月01日 22:12, Tiwei Bie wrote:
> > > > > Hello everyone,
> > > > > 
> > > > > This RFC implements packed ring support for virtio driver.
> > > > > 
> > > > > The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
> > > > > by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
> > > > > Minor changes are needed for the vhost code, e.g. to kick the guest.
> > > > > 
> > > > > TODO:
> > > > > - Refinements and bug fixes;
> > > > > - Split into small patches;
> > > > > - Test indirect descriptor support;
> > > > > - Test/fix event suppression support;
> > > > > - Test devices other than net;
> > > > > 
> > > > > RFC v1 -> RFC v2:
> > > > > - Add indirect descriptor support - compile test only;
> > > > > - Add event suppression supprt - compile test only;
> > > > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > > > - Avoid using '%' operator (Jason);
> > > > > - Rename free_head -> next_avail_idx (Jason);
> > > > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > > > - Some other refinements and bug fixes;
> > > > > 
> > > > > Thanks!
> > > > > 
> > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > ---
> > > > >    drivers/virtio/virtio_ring.c       | 1094 +++++++++++++++++++++++++++++-------
> > > > >    include/linux/virtio_ring.h        |    8 +-
> > > > >    include/uapi/linux/virtio_config.h |   12 +-
> > > > >    include/uapi/linux/virtio_ring.h   |   61 ++
> > > > >    4 files changed, 980 insertions(+), 195 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index 71458f493cf8..0515dca34d77 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -58,14 +58,15 @@
> > > > [...]
> > > > 
> > > > > +
> > > > > +	if (vq->indirect) {
> > > > > +		u32 len;
> > > > > +
> > > > > +		desc = vq->desc_state[head].indir_desc;
> > > > > +		/* Free the indirect table, if any, now that it's unmapped. */
> > > > > +		if (!desc)
> > > > > +			goto out;
> > > > > +
> > > > > +		len = virtio32_to_cpu(vq->vq.vdev,
> > > > > +				      vq->vring_packed.desc[head].len);
> > > > > +
> > > > > +		BUG_ON(!(vq->vring_packed.desc[head].flags &
> > > > > +			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> > > > It looks to me spec does not force to keep VRING_DESC_F_INDIRECT here. So we
> > > > can safely remove this BUG_ON() here.
> > > > 
> > > > > +		BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
> > > > Len could be ignored for used descriptor according to the spec, so we need
> > > > remove this BUG_ON() too.
> > > Yeah, you're right! The BUG_ON() isn't right. I'll remove it.
> > > And I think something related to this in the spec isn't very
> > > clear currently.
> > > 
> > > In the spec, there are below words:
> > > 
> > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L272
> > > """
> > > In descriptors with VIRTQ_DESC_F_INDIRECT set VIRTQ_DESC_F_WRITE
> > > is reserved and is ignored by the device.
> > > """
> > > 
> > > So when device writes back an used descriptor in this case,
> > > device may not set the VIRTQ_DESC_F_WRITE flag as the flag
> > > is reserved and should be ignored.
> > > 
> > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L170
> > > """
> > > Element Length is reserved for used descriptors without the
> > > VIRTQ_DESC_F_WRITE flag, and is ignored by drivers.
> > > """
> > > 
> > > And this is the way how driver ignores the `len` in an used
> > > descriptor.
> > > 
> > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L241
> > > """
> > > To increase ring capacity the driver can store a (read-only
> > > by the device) table of indirect descriptors anywhere in memory,
> > > and insert a descriptor in the main virtqueue (with \field{Flags}
> > > bit VIRTQ_DESC_F_INDIRECT on) that refers to a buffer element
> > > containing this indirect descriptor table;
> > > """
> > > 
> > > So the indirect descriptors in the table are read-only by
> > > the device. And the only descriptor which is writeable by
> > > the device is the descriptor in the main virtqueue (with
> > > Flags bit VIRTQ_DESC_F_INDIRECT on). So if we ignore the
> > > `len` in this descriptor, we won't be able to get the
> > > length of the data written by the device.
> > > 
> > > So I think the `len` in this descriptor will carry the
> > > length of the data written by the device (if the buffers
> > > are writable to the device) even if the VIRTQ_DESC_F_WRITE
> > > isn't set by the device. How do you think?
> > 
> > Yes I think so. But we'd better need clarification from Michael.
> 
> I think if you use a descriptor, and you want to supply len
> to guest, you set VIRTQ_DESC_F_WRITE in the used descriptor.
> Spec also says you must not set VIRTQ_DESC_F_INDIRECT then.
> If that's a problem we could look at relaxing that last requirement -
> does driver want INDIRECT in used descriptor to match
> the value in the avail descriptor for some reason?

For indirect, driver needs some way to get the length
of the data written by the driver. And the descriptors
in the indirect table is read-only, so the only place
device could put this value is the descriptor with the
VIRTQ_DESC_F_INDIRECT flag set.

> 
> > > 
> > > 
> > > > The reason is we don't touch descriptor ring in the case of split, so
> > > > BUG_ON()s may help there.
> > > > 
> > > > > +
> > > > > +		for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
> > > > > +			vring_unmap_one_packed(vq, &desc[j]);
> > > > > +
> > > > > +		kfree(desc);
> > > > > +		vq->desc_state[head].indir_desc = NULL;
> > > > > +	} else if (ctx) {
> > > > > +		*ctx = vq->desc_state[head].indir_desc;
> > > > > +	}
> > > > > +
> > > > > +out:
> > > > > +	return vq->desc_state[head].num;
> > > > > +}
> > > > > +
> > > > > +static inline bool more_used_split(const struct vring_virtqueue *vq)
> > > > >    {
> > > > >    	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
> > > > >    }
> > > > > +static inline bool more_used_packed(const struct vring_virtqueue *vq)
> > > > > +{
> > > > > +	u16 last_used, flags;
> > > > > +	bool avail, used;
> > > > > +
> > > > > +	if (vq->vq.num_free == vq->vring_packed.num)
> > > > > +		return false;
> > > > > +
> > > > > +	last_used = vq->last_used_idx;
> > > > > +	flags = virtio16_to_cpu(vq->vq.vdev,
> > > > > +				vq->vring_packed.desc[last_used].flags);
> > > > > +	avail = flags & VRING_DESC_F_AVAIL(1);
> > > > > +	used = flags & VRING_DESC_F_USED(1);
> > > > > +
> > > > > +	return avail == used;
> > > > > +}
> > > > This looks interesting, spec said:
> > > > 
> > > > "
> > > > Thus VIRTQ_DESC_F_AVAIL and VIRTQ_DESC_F_USED bits are different for an
> > > > available descriptor and
> > > > equal for a used descriptor.
> > > > Note that this observation is mostly useful for sanity-checking as these are
> > > > necessary but not sufficient
> > > > conditions - for example, all descriptors are zero-initialized. To detect
> > > > used and available descriptors it is
> > > > possible for drivers and devices to keep track of the last observed value of
> > > > VIRTQ_DESC_F_USED/VIRTQ_-
> > > > DESC_F_AVAIL. Other techniques to detect
> > > > VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit changes
> > > > might also be possible.
> > > > "
> > > > 
> > > > So it looks to me it was not sufficient, looking at the example codes in
> > > > spec, do we need to track last seen used_wrap_counter here?
> > > I don't think we have to track used_wrap_counter in
> > > driver. There was a discussion on this:
> > > 
> > > https://lists.oasis-open.org/archives/virtio-dev/201802/msg00177.html
> > > 
> > > And after that, below sentence was added (it's also
> > > in the above words you quoted):
> > > 
> > > """
> > > Other techniques to detect
> > > VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit changes
> > > might also be possible.
> > > """
> > > 
> > > Best regards,
> > > Tiwei Bie
> > 
> > I see, the extra condition "if (vq->vq.num_free == vq->vring_packed.num)"
> > help in this case.
> > 
> > Thanks
> 
> I still think tracking a wrap counter is better.

From my understanding, wrap counter is only needed when
one side just want to update parts of the status bit(s),
it's something like the "report status" or "write back"
feature [1] in the hardware NIC. And in the driver, all
the status must be updated, and that's why I don't want
to track the usedwrap counter.

[1] https://github.com/btw616/dpdk-virtio1.1/commit/ca837865bd10

Best regards,
Tiwei Bie

> 
> > > 
> > > > Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC v2] virtio: support packed ring
From: Jason Wang @ 2018-04-24  1:14 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180424040258-mutt-send-email-mst@kernel.org>



On 2018年04月24日 09:05, Michael S. Tsirkin wrote:
>>>>> +	if (vq->indirect) {
>>>>> +		u32 len;
>>>>> +
>>>>> +		desc = vq->desc_state[head].indir_desc;
>>>>> +		/* Free the indirect table, if any, now that it's unmapped. */
>>>>> +		if (!desc)
>>>>> +			goto out;
>>>>> +
>>>>> +		len = virtio32_to_cpu(vq->vq.vdev,
>>>>> +				      vq->vring_packed.desc[head].len);
>>>>> +
>>>>> +		BUG_ON(!(vq->vring_packed.desc[head].flags &
>>>>> +			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
>>>> It looks to me spec does not force to keep VRING_DESC_F_INDIRECT here. So we
>>>> can safely remove this BUG_ON() here.
>>>>
>>>>> +		BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
>>>> Len could be ignored for used descriptor according to the spec, so we need
>>>> remove this BUG_ON() too.
>>> Yeah, you're right! The BUG_ON() isn't right. I'll remove it.
>>> And I think something related to this in the spec isn't very
>>> clear currently.
>>>
>>> In the spec, there are below words:
>>>
>>> https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L272
>>> """
>>> In descriptors with VIRTQ_DESC_F_INDIRECT set VIRTQ_DESC_F_WRITE
>>> is reserved and is ignored by the device.
>>> """
>>>
>>> So when device writes back an used descriptor in this case,
>>> device may not set the VIRTQ_DESC_F_WRITE flag as the flag
>>> is reserved and should be ignored.
>>>
>>> https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L170
>>> """
>>> Element Length is reserved for used descriptors without the
>>> VIRTQ_DESC_F_WRITE flag, and is ignored by drivers.
>>> """
>>>
>>> And this is the way how driver ignores the `len` in an used
>>> descriptor.
>>>
>>> https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L241
>>> """
>>> To increase ring capacity the driver can store a (read-only
>>> by the device) table of indirect descriptors anywhere in memory,
>>> and insert a descriptor in the main virtqueue (with \field{Flags}
>>> bit VIRTQ_DESC_F_INDIRECT on) that refers to a buffer element
>>> containing this indirect descriptor table;
>>> """
>>>
>>> So the indirect descriptors in the table are read-only by
>>> the device. And the only descriptor which is writeable by
>>> the device is the descriptor in the main virtqueue (with
>>> Flags bit VIRTQ_DESC_F_INDIRECT on). So if we ignore the
>>> `len` in this descriptor, we won't be able to get the
>>> length of the data written by the device.
>>>
>>> So I think the `len` in this descriptor will carry the
>>> length of the data written by the device (if the buffers
>>> are writable to the device) even if the VIRTQ_DESC_F_WRITE
>>> isn't set by the device. How do you think?
>> Yes I think so. But we'd better need clarification from Michael.
> I think if you use a descriptor, and you want to supply len
> to guest, you set VIRTQ_DESC_F_WRITE in the used descriptor.
> Spec also says you must not set VIRTQ_DESC_F_INDIRECT then.
> If that's a problem we could look at relaxing that last requirement -
> does driver want INDIRECT in used descriptor to match
> the value in the avail descriptor for some reason?
>

Looks not, so what I get it:

- device and set VIRTQ_DESC_F_WRITE flag for used descriptor when needed
- there no need to keep INDIRECT flag in used descriptor

So for the above case, we can just have a used descriptor with _F_WRITE 
but without INDIRECT flag.

Thanks

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC v2] virtio: support packed ring
From: Michael S. Tsirkin @ 2018-04-24  1:05 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization, wexu
In-Reply-To: <e08c55df-6ea2-cf7f-d7c9-91e55913ab16@redhat.com>

On Tue, Apr 24, 2018 at 08:54:52AM +0800, Jason Wang wrote:
> 
> 
> On 2018年04月23日 17:29, Tiwei Bie wrote:
> > On Mon, Apr 23, 2018 at 01:42:14PM +0800, Jason Wang wrote:
> > > On 2018年04月01日 22:12, Tiwei Bie wrote:
> > > > Hello everyone,
> > > > 
> > > > This RFC implements packed ring support for virtio driver.
> > > > 
> > > > The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
> > > > by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
> > > > Minor changes are needed for the vhost code, e.g. to kick the guest.
> > > > 
> > > > TODO:
> > > > - Refinements and bug fixes;
> > > > - Split into small patches;
> > > > - Test indirect descriptor support;
> > > > - Test/fix event suppression support;
> > > > - Test devices other than net;
> > > > 
> > > > RFC v1 -> RFC v2:
> > > > - Add indirect descriptor support - compile test only;
> > > > - Add event suppression supprt - compile test only;
> > > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > > - Avoid using '%' operator (Jason);
> > > > - Rename free_head -> next_avail_idx (Jason);
> > > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > > - Some other refinements and bug fixes;
> > > > 
> > > > Thanks!
> > > > 
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > ---
> > > >    drivers/virtio/virtio_ring.c       | 1094 +++++++++++++++++++++++++++++-------
> > > >    include/linux/virtio_ring.h        |    8 +-
> > > >    include/uapi/linux/virtio_config.h |   12 +-
> > > >    include/uapi/linux/virtio_ring.h   |   61 ++
> > > >    4 files changed, 980 insertions(+), 195 deletions(-)
> > > > 
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 71458f493cf8..0515dca34d77 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -58,14 +58,15 @@
> > > [...]
> > > 
> > > > +
> > > > +	if (vq->indirect) {
> > > > +		u32 len;
> > > > +
> > > > +		desc = vq->desc_state[head].indir_desc;
> > > > +		/* Free the indirect table, if any, now that it's unmapped. */
> > > > +		if (!desc)
> > > > +			goto out;
> > > > +
> > > > +		len = virtio32_to_cpu(vq->vq.vdev,
> > > > +				      vq->vring_packed.desc[head].len);
> > > > +
> > > > +		BUG_ON(!(vq->vring_packed.desc[head].flags &
> > > > +			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> > > It looks to me spec does not force to keep VRING_DESC_F_INDIRECT here. So we
> > > can safely remove this BUG_ON() here.
> > > 
> > > > +		BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
> > > Len could be ignored for used descriptor according to the spec, so we need
> > > remove this BUG_ON() too.
> > Yeah, you're right! The BUG_ON() isn't right. I'll remove it.
> > And I think something related to this in the spec isn't very
> > clear currently.
> > 
> > In the spec, there are below words:
> > 
> > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L272
> > """
> > In descriptors with VIRTQ_DESC_F_INDIRECT set VIRTQ_DESC_F_WRITE
> > is reserved and is ignored by the device.
> > """
> > 
> > So when device writes back an used descriptor in this case,
> > device may not set the VIRTQ_DESC_F_WRITE flag as the flag
> > is reserved and should be ignored.
> > 
> > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L170
> > """
> > Element Length is reserved for used descriptors without the
> > VIRTQ_DESC_F_WRITE flag, and is ignored by drivers.
> > """
> > 
> > And this is the way how driver ignores the `len` in an used
> > descriptor.
> > 
> > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L241
> > """
> > To increase ring capacity the driver can store a (read-only
> > by the device) table of indirect descriptors anywhere in memory,
> > and insert a descriptor in the main virtqueue (with \field{Flags}
> > bit VIRTQ_DESC_F_INDIRECT on) that refers to a buffer element
> > containing this indirect descriptor table;
> > """
> > 
> > So the indirect descriptors in the table are read-only by
> > the device. And the only descriptor which is writeable by
> > the device is the descriptor in the main virtqueue (with
> > Flags bit VIRTQ_DESC_F_INDIRECT on). So if we ignore the
> > `len` in this descriptor, we won't be able to get the
> > length of the data written by the device.
> > 
> > So I think the `len` in this descriptor will carry the
> > length of the data written by the device (if the buffers
> > are writable to the device) even if the VIRTQ_DESC_F_WRITE
> > isn't set by the device. How do you think?
> 
> Yes I think so. But we'd better need clarification from Michael.

I think if you use a descriptor, and you want to supply len
to guest, you set VIRTQ_DESC_F_WRITE in the used descriptor.
Spec also says you must not set VIRTQ_DESC_F_INDIRECT then.
If that's a problem we could look at relaxing that last requirement -
does driver want INDIRECT in used descriptor to match
the value in the avail descriptor for some reason?

> > 
> > 
> > > The reason is we don't touch descriptor ring in the case of split, so
> > > BUG_ON()s may help there.
> > > 
> > > > +
> > > > +		for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
> > > > +			vring_unmap_one_packed(vq, &desc[j]);
> > > > +
> > > > +		kfree(desc);
> > > > +		vq->desc_state[head].indir_desc = NULL;
> > > > +	} else if (ctx) {
> > > > +		*ctx = vq->desc_state[head].indir_desc;
> > > > +	}
> > > > +
> > > > +out:
> > > > +	return vq->desc_state[head].num;
> > > > +}
> > > > +
> > > > +static inline bool more_used_split(const struct vring_virtqueue *vq)
> > > >    {
> > > >    	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
> > > >    }
> > > > +static inline bool more_used_packed(const struct vring_virtqueue *vq)
> > > > +{
> > > > +	u16 last_used, flags;
> > > > +	bool avail, used;
> > > > +
> > > > +	if (vq->vq.num_free == vq->vring_packed.num)
> > > > +		return false;
> > > > +
> > > > +	last_used = vq->last_used_idx;
> > > > +	flags = virtio16_to_cpu(vq->vq.vdev,
> > > > +				vq->vring_packed.desc[last_used].flags);
> > > > +	avail = flags & VRING_DESC_F_AVAIL(1);
> > > > +	used = flags & VRING_DESC_F_USED(1);
> > > > +
> > > > +	return avail == used;
> > > > +}
> > > This looks interesting, spec said:
> > > 
> > > "
> > > Thus VIRTQ_DESC_F_AVAIL and VIRTQ_DESC_F_USED bits are different for an
> > > available descriptor and
> > > equal for a used descriptor.
> > > Note that this observation is mostly useful for sanity-checking as these are
> > > necessary but not sufficient
> > > conditions - for example, all descriptors are zero-initialized. To detect
> > > used and available descriptors it is
> > > possible for drivers and devices to keep track of the last observed value of
> > > VIRTQ_DESC_F_USED/VIRTQ_-
> > > DESC_F_AVAIL. Other techniques to detect
> > > VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit changes
> > > might also be possible.
> > > "
> > > 
> > > So it looks to me it was not sufficient, looking at the example codes in
> > > spec, do we need to track last seen used_wrap_counter here?
> > I don't think we have to track used_wrap_counter in
> > driver. There was a discussion on this:
> > 
> > https://lists.oasis-open.org/archives/virtio-dev/201802/msg00177.html
> > 
> > And after that, below sentence was added (it's also
> > in the above words you quoted):
> > 
> > """
> > Other techniques to detect
> > VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit changes
> > might also be possible.
> > """
> > 
> > Best regards,
> > Tiwei Bie
> 
> I see, the extra condition "if (vq->vq.num_free == vq->vring_packed.num)"
> help in this case.
> 
> Thanks

I still think tracking a wrap counter is better.

> > 
> > > Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC V3 PATCH 0/8] Packed ring for vhost
From: Jason Wang @ 2018-04-24  1:04 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Michael S. Tsirkin
  Cc: kvm, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180423201151.GD5215@char.us.oracle.com>



On 2018年04月24日 04:11, Konrad Rzeszutek Wilk wrote:
> On Mon, Apr 23, 2018 at 10:59:43PM +0300, Michael S. Tsirkin wrote:
>> On Mon, Apr 23, 2018 at 03:31:20PM -0400, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Apr 23, 2018 at 01:34:52PM +0800, Jason Wang wrote:
>>>> Hi all:
>>>>
>>>> This RFC implement packed ring layout. The code were tested with
>>>> Tiwei's RFC V2 a thttps://lkml.org/lkml/2018/4/1/48. Some fixups and
>>>> tweaks were needed on top of Tiwei's code to make it run. TCP stream
>>>> and pktgen does not show obvious difference compared with split ring.
>>> I have to ask then - what is the benefit of this?
>> You can use this with dpdk within guest.
>> The DPDK version did see a performance improvement so hopefully with
> Is there a link to this performance improvement document?
>

You probably want to this:

https://www.mail-archive.com/dev@dpdk.org/msg97735.html

Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC v2] virtio: support packed ring
From: Jason Wang @ 2018-04-24  0:54 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180423092908.77rii3gi7dcaf7o6@debian>



On 2018年04月23日 17:29, Tiwei Bie wrote:
> On Mon, Apr 23, 2018 at 01:42:14PM +0800, Jason Wang wrote:
>> On 2018年04月01日 22:12, Tiwei Bie wrote:
>>> Hello everyone,
>>>
>>> This RFC implements packed ring support for virtio driver.
>>>
>>> The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
>>> by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
>>> Minor changes are needed for the vhost code, e.g. to kick the guest.
>>>
>>> TODO:
>>> - Refinements and bug fixes;
>>> - Split into small patches;
>>> - Test indirect descriptor support;
>>> - Test/fix event suppression support;
>>> - Test devices other than net;
>>>
>>> RFC v1 -> RFC v2:
>>> - Add indirect descriptor support - compile test only;
>>> - Add event suppression supprt - compile test only;
>>> - Move vring_packed_init() out of uapi (Jason, MST);
>>> - Merge two loops into one in virtqueue_add_packed() (Jason);
>>> - Split vring_unmap_one() for packed ring and split ring (Jason);
>>> - Avoid using '%' operator (Jason);
>>> - Rename free_head -> next_avail_idx (Jason);
>>> - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
>>> - Some other refinements and bug fixes;
>>>
>>> Thanks!
>>>
>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>> ---
>>>    drivers/virtio/virtio_ring.c       | 1094 +++++++++++++++++++++++++++++-------
>>>    include/linux/virtio_ring.h        |    8 +-
>>>    include/uapi/linux/virtio_config.h |   12 +-
>>>    include/uapi/linux/virtio_ring.h   |   61 ++
>>>    4 files changed, 980 insertions(+), 195 deletions(-)
>>>
>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>> index 71458f493cf8..0515dca34d77 100644
>>> --- a/drivers/virtio/virtio_ring.c
>>> +++ b/drivers/virtio/virtio_ring.c
>>> @@ -58,14 +58,15 @@
>> [...]
>>
>>> +
>>> +	if (vq->indirect) {
>>> +		u32 len;
>>> +
>>> +		desc = vq->desc_state[head].indir_desc;
>>> +		/* Free the indirect table, if any, now that it's unmapped. */
>>> +		if (!desc)
>>> +			goto out;
>>> +
>>> +		len = virtio32_to_cpu(vq->vq.vdev,
>>> +				      vq->vring_packed.desc[head].len);
>>> +
>>> +		BUG_ON(!(vq->vring_packed.desc[head].flags &
>>> +			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
>> It looks to me spec does not force to keep VRING_DESC_F_INDIRECT here. So we
>> can safely remove this BUG_ON() here.
>>
>>> +		BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
>> Len could be ignored for used descriptor according to the spec, so we need
>> remove this BUG_ON() too.
> Yeah, you're right! The BUG_ON() isn't right. I'll remove it.
> And I think something related to this in the spec isn't very
> clear currently.
>
> In the spec, there are below words:
>
> https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L272
> """
> In descriptors with VIRTQ_DESC_F_INDIRECT set VIRTQ_DESC_F_WRITE
> is reserved and is ignored by the device.
> """
>
> So when device writes back an used descriptor in this case,
> device may not set the VIRTQ_DESC_F_WRITE flag as the flag
> is reserved and should be ignored.
>
> https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L170
> """
> Element Length is reserved for used descriptors without the
> VIRTQ_DESC_F_WRITE flag, and is ignored by drivers.
> """
>
> And this is the way how driver ignores the `len` in an used
> descriptor.
>
> https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L241
> """
> To increase ring capacity the driver can store a (read-only
> by the device) table of indirect descriptors anywhere in memory,
> and insert a descriptor in the main virtqueue (with \field{Flags}
> bit VIRTQ_DESC_F_INDIRECT on) that refers to a buffer element
> containing this indirect descriptor table;
> """
>
> So the indirect descriptors in the table are read-only by
> the device. And the only descriptor which is writeable by
> the device is the descriptor in the main virtqueue (with
> Flags bit VIRTQ_DESC_F_INDIRECT on). So if we ignore the
> `len` in this descriptor, we won't be able to get the
> length of the data written by the device.
>
> So I think the `len` in this descriptor will carry the
> length of the data written by the device (if the buffers
> are writable to the device) even if the VIRTQ_DESC_F_WRITE
> isn't set by the device. How do you think?

Yes I think so. But we'd better need clarification from Michael.

>
>
>> The reason is we don't touch descriptor ring in the case of split, so
>> BUG_ON()s may help there.
>>
>>> +
>>> +		for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
>>> +			vring_unmap_one_packed(vq, &desc[j]);
>>> +
>>> +		kfree(desc);
>>> +		vq->desc_state[head].indir_desc = NULL;
>>> +	} else if (ctx) {
>>> +		*ctx = vq->desc_state[head].indir_desc;
>>> +	}
>>> +
>>> +out:
>>> +	return vq->desc_state[head].num;
>>> +}
>>> +
>>> +static inline bool more_used_split(const struct vring_virtqueue *vq)
>>>    {
>>>    	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
>>>    }
>>> +static inline bool more_used_packed(const struct vring_virtqueue *vq)
>>> +{
>>> +	u16 last_used, flags;
>>> +	bool avail, used;
>>> +
>>> +	if (vq->vq.num_free == vq->vring_packed.num)
>>> +		return false;
>>> +
>>> +	last_used = vq->last_used_idx;
>>> +	flags = virtio16_to_cpu(vq->vq.vdev,
>>> +				vq->vring_packed.desc[last_used].flags);
>>> +	avail = flags & VRING_DESC_F_AVAIL(1);
>>> +	used = flags & VRING_DESC_F_USED(1);
>>> +
>>> +	return avail == used;
>>> +}
>> This looks interesting, spec said:
>>
>> "
>> Thus VIRTQ_DESC_F_AVAIL and VIRTQ_DESC_F_USED bits are different for an
>> available descriptor and
>> equal for a used descriptor.
>> Note that this observation is mostly useful for sanity-checking as these are
>> necessary but not sufficient
>> conditions - for example, all descriptors are zero-initialized. To detect
>> used and available descriptors it is
>> possible for drivers and devices to keep track of the last observed value of
>> VIRTQ_DESC_F_USED/VIRTQ_-
>> DESC_F_AVAIL. Other techniques to detect
>> VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit changes
>> might also be possible.
>> "
>>
>> So it looks to me it was not sufficient, looking at the example codes in
>> spec, do we need to track last seen used_wrap_counter here?
> I don't think we have to track used_wrap_counter in
> driver. There was a discussion on this:
>
> https://lists.oasis-open.org/archives/virtio-dev/201802/msg00177.html
>
> And after that, below sentence was added (it's also
> in the above words you quoted):
>
> """
> Other techniques to detect
> VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit changes
> might also be possible.
> """
>
> Best regards,
> Tiwei Bie

I see, the extra condition "if (vq->vq.num_free == 
vq->vring_packed.num)" help in this case.

Thanks

>
>> Thanks

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Mikulas Patocka @ 2018-04-24  0:25 UTC (permalink / raw)
  To: Michal Hocko
  Cc: dm-devel, eric.dumazet, mst, netdev, linux-kernel, Matthew Wilcox,
	virtualization, linux-mm, edumazet, Andrew Morton, David Miller,
	Vlastimil Babka
In-Reply-To: <20180423151545.GU17484@dhcp22.suse.cz>



On Mon, 23 Apr 2018, Michal Hocko wrote:

> On Mon 23-04-18 10:06:08, Mikulas Patocka wrote:
> 
> > > > He didn't want to fix vmalloc(GFP_NOIO)
> > > 
> > > I don't remember that conversation, so I don't know whether I agree with
> > > his reasoning or not.  But we are supposed to be moving away from GFP_NOIO
> > > towards marking regions with memalloc_noio_save() / restore.  If you do
> > > that, you won't need vmalloc(GFP_NOIO).
> > 
> > He said the same thing a year ago. And there was small progress. 6 out of 
> > 27 __vmalloc calls were converted to memalloc_noio_save in a year - 5 in 
> > infiniband and 1 in btrfs. (the whole discussion is here 
> > http://lkml.iu.edu/hypermail/linux/kernel/1706.3/04681.html )
> 
> Well this is not that easy. It requires a cooperation from maintainers.
> I can only do as much. I've posted patches in the past and actively
> bringing up this topic at LSFMM last two years...

You're right - but you have chosen the uneasy path. Fixing __vmalloc code 
is easy and it doesn't require cooperation with maintainers.

> > He refuses 15-line patch to fix GFP_NOIO bug because he believes that in 4 
> > years, the kernel will be refactored and GFP_NOIO will be eliminated. Why 
> > does he have veto over this part of the code? I'd much rather argue with 
> > people who have constructive comments about fixing bugs than with him.
> 
> I didn't NACK the patch AFAIR. I've said it is not a good idea longterm.
> I would be much more willing to change my mind if you would back your
> patch by a real bug report. Hacks are acceptable when we have a real
> issue in hands. But if we want to fix potential issue then better make
> it properly.

Developers should fix bugs in advance, not to wait until a crash hapens, 
is analyzed and reported.

What's the problem with 15-line hack? Is the problem that kernel 
developers would feel depressed when looking the source code? Other than 
harming developers' feelings, I don't see what kind of damange could that 
piece of code do.

Mikulas

^ permalink raw reply

* [PATCH v3] kvmalloc: always use vmalloc if CONFIG_DEBUG_SG
From: Mikulas Patocka @ 2018-04-24  0:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: dm-devel, eric.dumazet, mst, netdev, linux-kernel, Matthew Wilcox,
	virtualization, linux-mm, edumazet, Andrew Morton, David Miller,
	Vlastimil Babka
In-Reply-To: <20180423151545.GU17484@dhcp22.suse.cz>

The kvmalloc function tries to use kmalloc and falls back to vmalloc if
kmalloc fails.

Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
uses DMA-API on the returned memory or frees it with kfree. Such bugs were
found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
code.

These bugs are hard to reproduce because kvmalloc falls back to vmalloc
only if memory is fragmented.

In order to detect these bugs reliably I submit this patch that changes
kvmalloc to fall back to vmalloc with 1/2 probability if CONFIG_DEBUG_SG
is turned on. CONFIG_DEBUG_SG is used, because it makes the DMA API layer
verify the addresses passed to it, and so the user will get a reliable
stacktrace.

Some bugs (such as buffer overflows) are better detected
with kmalloc code, so we must test the kmalloc path too.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 mm/util.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

Index: linux-2.6/mm/util.c
===================================================================
--- linux-2.6.orig/mm/util.c	2018-04-23 00:12:05.000000000 +0200
+++ linux-2.6/mm/util.c	2018-04-23 17:57:02.000000000 +0200
@@ -14,6 +14,7 @@
 #include <linux/hugetlb.h>
 #include <linux/vmalloc.h>
 #include <linux/userfaultfd_k.h>
+#include <linux/random.h>
 
 #include <asm/sections.h>
 #include <linux/uaccess.h>
@@ -404,6 +405,12 @@ void *kvmalloc_node(size_t size, gfp_t f
 	 */
 	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
 
+#ifdef CONFIG_DEBUG_SG
+	/* Catch bugs when the caller uses DMA API on the result of kvmalloc. */
+	if (!(prandom_u32_max(2) & 1))
+		goto do_vmalloc;
+#endif
+
 	/*
 	 * We want to attempt a large physically contiguous block first because
 	 * it is less likely to fragment multiple larger blocks and therefore
@@ -427,6 +434,9 @@ void *kvmalloc_node(size_t size, gfp_t f
 	if (ret || size <= PAGE_SIZE)
 		return ret;
 
+#ifdef CONFIG_DEBUG_SG
+do_vmalloc:
+#endif
 	return __vmalloc_node_flags_caller(size, node, flags,
 			__builtin_return_address(0));
 }

^ permalink raw reply

* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Mikulas Patocka @ 2018-04-23 23:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: dm-devel, eric.dumazet, mst, netdev, linux-kernel, Matthew Wilcox,
	virtualization, linux-mm, edumazet, bhutchings, Andrew Morton,
	David Miller, Vlastimil Babka
In-Reply-To: <20180423151015.GT17484@dhcp22.suse.cz>



On Mon, 23 Apr 2018, Michal Hocko wrote:

> On Mon 23-04-18 10:24:02, Mikulas Patocka wrote:
> 
> > > Really, we have a fault injection framework and this sounds like
> > > something to hook in there.
> > 
> > The testing people won't set it up. They install the "kernel-debug" 
> > package and run the tests in it.
> > 
> > If you introduce a hidden option that no one knows about, no one will use 
> > it.
> 
> then make sure people know about it. Fuzzers already do test fault
> injections.

I think that in the long term we can introduce a kernel parameter like 
"debug_level=1", "debug_level=2", "debug_level=3" that will turn on 
debugging features across all kernel subsystems and we can teach users to 
use it to diagnose problems.

But it won't work if every subsystem has different debug parameters. There 
are 192 distinct filenames in debugfs, if we add 193rd one, harly anyone 
notices it.

Mikulas

^ permalink raw reply

* Re: virtio remoteproc device
From: Michael S. Tsirkin @ 2018-04-23 21:02 UTC (permalink / raw)
  To: Loic PALLARDY
  Cc: Ohad Ben-Cohen, Anup Patel,
	virtualization@lists.linux-foundation.org,
	linux-remoteproc@vger.kernel.org, Bjorn Andersson
In-Reply-To: <6207f230555b46a99f85a6a7c18155bd@SFHDAG7NODE2.st.com>

On Mon, Apr 23, 2018 at 08:45:57PM +0000, Loic PALLARDY wrote:
> 
> 
> > -----Original Message-----
> > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > Sent: Monday, April 23, 2018 9:41 PM
> > To: Loic PALLARDY <loic.pallardy@st.com>
> > Cc: Anup Patel <anup@brainfault.org>; linux-remoteproc@vger.kernel.org;
> > Ohad Ben-Cohen <ohad@wizery.com>; Bjorn Andersson
> > <bjorn.andersson@linaro.org>; virtualization@lists.linux-foundation.org
> > Subject: Re: virtio remoteproc device
> > 
> > On Mon, Apr 23, 2018 at 08:55:57AM +0000, Loic PALLARDY wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: linux-remoteproc-owner@vger.kernel.org [mailto:linux-
> > remoteproc-
> > > > owner@vger.kernel.org] On Behalf Of Anup Patel
> > > > Sent: Sunday, April 22, 2018 6:08 AM
> > > > To: Michael S. Tsirkin <mst@redhat.com>
> > > > Cc: linux-remoteproc@vger.kernel.org; Ohad Ben-Cohen
> > > > <ohad@wizery.com>; Bjorn Andersson <bjorn.andersson@linaro.org>;
> > > > virtualization@lists.linux-foundation.org
> > > > Subject: Re: virtio remoteproc device
> > > >
> > > > On Fri, Apr 20, 2018 at 10:23 PM, Michael S. Tsirkin <mst@redhat.com>
> > > > wrote:
> > > > > Hello!
> > > > > I note the following in the serial console:
> > > > >
> > > > >       if (is_rproc_serial(vdev)) {
> > > > >                 /*
> > > > >                  * Allocate DMA memory from ancestor. When a virtio
> > > > >                  * device is created by remoteproc, the DMA memory is
> > > > >                  * associated with the grandparent device:
> > > > >                  * vdev => rproc => platform-dev.
> > > > >                  */
> > > > >                 if (!vdev->dev.parent || !vdev->dev.parent->parent)
> > > > >                         goto free_buf;
> > > > >                 buf->dev = vdev->dev.parent->parent;
> > > > >
> > > > >                 /* Increase device refcnt to avoid freeing it */
> > > > >                 get_device(buf->dev);
> > > > >                 buf->buf = dma_alloc_coherent(buf->dev, buf_size, &buf-
> > >dma,
> > > > >                                               GFP_KERNEL);
> > > > >         }
> > > > >
> > > > > Added here:
> > > > >         commit 1b6370463e88b0c1c317de16d7b962acc1dab4f2
> > > > >         Author: Sjur Brændeland <sjur.brandeland@stericsson.com>
> > > > >         Date:   Fri Dec 14 14:40:51 2012 +1030
> > > > >
> > > > >     virtio_console: Add support for remoteproc serial
> > > > >
> > > > >
> > > > > I am not familiar with rproc so I have a question:
> > > > > why is it required to use coherent memory here,
> > > > > and why through a grandparent device?
> > > >
> > > > I faced similar issue when I tried VirtIO RPMSG bus over
> > > > VirtIO MMIO transport.
> > > >
> > > > Here's my fix for VirtIO RPMSG bus driver:
> > > > https://patchwork.kernel.org/patch/10155145/
> > >
> > > Hi Anup and Michael,
> > >
> > > I pushed a series to modify virtio device allocation in remoteproc. Please
> > see [1].
> > > It allows to remove allocation based on "grand-parent" device in the case
> > of virtio device allocated by remoteproc.
> > > Virto_console patch missing, only virtio_rpmsg modification proposed. I can
> > add it in next version.
> > >
> > > Regards,
> > > Loic
> > >
> > > [1] https://lkml.org/lkml/2018/3/1/644
> > >
> > > >
> > > > Regards,
> > > > Anup
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-
> > remoteproc" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > So on top of your patches, can we just force use of DMA API
> > and drop special casting?
> > E.g. maybe something like the below - completely untested - patch?
> > Does it work for you?
> > 
> 
> Just have a look to VIRTIO_F_IOMMU_PLATFORM usage.
> If we activate it, this will change virtio_ring behavior, allowing to rely on dma api, but need to have deeper look to virtio_console impacts.

Could you test the patch though?

> For sure, if you remove/disable rproc specific code in virtio_console, alloc_buf will rely on kmalloc instead of dma_alloc_coherent. As some coprocessors doesn't have access to complete memory map and in some case no access to DDR memory, buffer copy will have to be done in dedicated vdev memory before adding it to vring.

I'm guessing that for console an extra copy is not a big deal. How much
data is pushed there after all, right?

> Specific rproc case was added to avoid copy and directly using shared memory for buffer allocation.
> 
> Regards,
> Loic

I was under the impression that it was more to just make it work
somehow. But I might be wrong.

> > 
> > diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> > index 2108551..6c6767c 100644
> > --- a/drivers/char/virtio_console.c
> > +++ b/drivers/char/virtio_console.c
> > @@ -40,7 +40,8 @@
> >  #include <linux/dma-mapping.h>
> >  #include "../tty/hvc/hvc_console.h"
> > 
> > -#define is_rproc_enabled IS_ENABLED(CONFIG_REMOTEPROC)
> > +//#define is_rproc_enabled IS_ENABLED(CONFIG_REMOTEPROC)
> > +#define is_rproc_enabled 0
> > 
> >  /*
> >   * This is a global struct for storing common data for all the devices
> > diff --git a/drivers/remoteproc/remoteproc_virtio.c
> > b/drivers/remoteproc/remoteproc_virtio.c
> > index b0633fd..4a13ca2 100644
> > --- a/drivers/remoteproc/remoteproc_virtio.c
> > +++ b/drivers/remoteproc/remoteproc_virtio.c
> > @@ -199,7 +199,7 @@ static u64 rproc_virtio_get_features(struct
> > virtio_device *vdev)
> > 
> >  	rsc = (void *)rvdev->rproc->table_ptr + rvdev->rsc_offset;
> > 
> > -	return rsc->dfeatures;
> > +	return rsc->dfeatures | (1ULL << VIRTIO_F_IOMMU_PLATFORM);
> >  }
> > 
> >  static int rproc_virtio_finalize_features(struct virtio_device *vdev)
> > @@ -213,7 +213,7 @@ static int rproc_virtio_finalize_features(struct
> > virtio_device *vdev)
> >  	vring_transport_features(vdev);
> > 
> >  	/* Make sure we don't have any features > 32 bits! */
> > -	BUG_ON((u32)vdev->features != vdev->features);
> > +	//BUG_ON((u32)vdev->features != vdev->features);
> > 
> >  	/*
> >  	 * Remember the finalized features of our vdev, and provide it

^ permalink raw reply

* RE: virtio remoteproc device
From: Loic PALLARDY @ 2018-04-23 20:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Ohad Ben-Cohen, Anup Patel,
	virtualization@lists.linux-foundation.org,
	linux-remoteproc@vger.kernel.org, Bjorn Andersson
In-Reply-To: <20180423223717-mutt-send-email-mst@kernel.org>



> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Monday, April 23, 2018 9:41 PM
> To: Loic PALLARDY <loic.pallardy@st.com>
> Cc: Anup Patel <anup@brainfault.org>; linux-remoteproc@vger.kernel.org;
> Ohad Ben-Cohen <ohad@wizery.com>; Bjorn Andersson
> <bjorn.andersson@linaro.org>; virtualization@lists.linux-foundation.org
> Subject: Re: virtio remoteproc device
> 
> On Mon, Apr 23, 2018 at 08:55:57AM +0000, Loic PALLARDY wrote:
> >
> >
> > > -----Original Message-----
> > > From: linux-remoteproc-owner@vger.kernel.org [mailto:linux-
> remoteproc-
> > > owner@vger.kernel.org] On Behalf Of Anup Patel
> > > Sent: Sunday, April 22, 2018 6:08 AM
> > > To: Michael S. Tsirkin <mst@redhat.com>
> > > Cc: linux-remoteproc@vger.kernel.org; Ohad Ben-Cohen
> > > <ohad@wizery.com>; Bjorn Andersson <bjorn.andersson@linaro.org>;
> > > virtualization@lists.linux-foundation.org
> > > Subject: Re: virtio remoteproc device
> > >
> > > On Fri, Apr 20, 2018 at 10:23 PM, Michael S. Tsirkin <mst@redhat.com>
> > > wrote:
> > > > Hello!
> > > > I note the following in the serial console:
> > > >
> > > >       if (is_rproc_serial(vdev)) {
> > > >                 /*
> > > >                  * Allocate DMA memory from ancestor. When a virtio
> > > >                  * device is created by remoteproc, the DMA memory is
> > > >                  * associated with the grandparent device:
> > > >                  * vdev => rproc => platform-dev.
> > > >                  */
> > > >                 if (!vdev->dev.parent || !vdev->dev.parent->parent)
> > > >                         goto free_buf;
> > > >                 buf->dev = vdev->dev.parent->parent;
> > > >
> > > >                 /* Increase device refcnt to avoid freeing it */
> > > >                 get_device(buf->dev);
> > > >                 buf->buf = dma_alloc_coherent(buf->dev, buf_size, &buf-
> >dma,
> > > >                                               GFP_KERNEL);
> > > >         }
> > > >
> > > > Added here:
> > > >         commit 1b6370463e88b0c1c317de16d7b962acc1dab4f2
> > > >         Author: Sjur Brændeland <sjur.brandeland@stericsson.com>
> > > >         Date:   Fri Dec 14 14:40:51 2012 +1030
> > > >
> > > >     virtio_console: Add support for remoteproc serial
> > > >
> > > >
> > > > I am not familiar with rproc so I have a question:
> > > > why is it required to use coherent memory here,
> > > > and why through a grandparent device?
> > >
> > > I faced similar issue when I tried VirtIO RPMSG bus over
> > > VirtIO MMIO transport.
> > >
> > > Here's my fix for VirtIO RPMSG bus driver:
> > > https://patchwork.kernel.org/patch/10155145/
> >
> > Hi Anup and Michael,
> >
> > I pushed a series to modify virtio device allocation in remoteproc. Please
> see [1].
> > It allows to remove allocation based on "grand-parent" device in the case
> of virtio device allocated by remoteproc.
> > Virto_console patch missing, only virtio_rpmsg modification proposed. I can
> add it in next version.
> >
> > Regards,
> > Loic
> >
> > [1] https://lkml.org/lkml/2018/3/1/644
> >
> > >
> > > Regards,
> > > Anup
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-
> remoteproc" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> So on top of your patches, can we just force use of DMA API
> and drop special casting?
> E.g. maybe something like the below - completely untested - patch?
> Does it work for you?
> 

Just have a look to VIRTIO_F_IOMMU_PLATFORM usage.
If we activate it, this will change virtio_ring behavior, allowing to rely on dma api, but need to have deeper look to virtio_console impacts.

For sure, if you remove/disable rproc specific code in virtio_console, alloc_buf will rely on kmalloc instead of dma_alloc_coherent. As some coprocessors doesn't have access to complete memory map and in some case no access to DDR memory, buffer copy will have to be done in dedicated vdev memory before adding it to vring.
Specific rproc case was added to avoid copy and directly using shared memory for buffer allocation.

Regards,
Loic
> 
> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> index 2108551..6c6767c 100644
> --- a/drivers/char/virtio_console.c
> +++ b/drivers/char/virtio_console.c
> @@ -40,7 +40,8 @@
>  #include <linux/dma-mapping.h>
>  #include "../tty/hvc/hvc_console.h"
> 
> -#define is_rproc_enabled IS_ENABLED(CONFIG_REMOTEPROC)
> +//#define is_rproc_enabled IS_ENABLED(CONFIG_REMOTEPROC)
> +#define is_rproc_enabled 0
> 
>  /*
>   * This is a global struct for storing common data for all the devices
> diff --git a/drivers/remoteproc/remoteproc_virtio.c
> b/drivers/remoteproc/remoteproc_virtio.c
> index b0633fd..4a13ca2 100644
> --- a/drivers/remoteproc/remoteproc_virtio.c
> +++ b/drivers/remoteproc/remoteproc_virtio.c
> @@ -199,7 +199,7 @@ static u64 rproc_virtio_get_features(struct
> virtio_device *vdev)
> 
>  	rsc = (void *)rvdev->rproc->table_ptr + rvdev->rsc_offset;
> 
> -	return rsc->dfeatures;
> +	return rsc->dfeatures | (1ULL << VIRTIO_F_IOMMU_PLATFORM);
>  }
> 
>  static int rproc_virtio_finalize_features(struct virtio_device *vdev)
> @@ -213,7 +213,7 @@ static int rproc_virtio_finalize_features(struct
> virtio_device *vdev)
>  	vring_transport_features(vdev);
> 
>  	/* Make sure we don't have any features > 32 bits! */
> -	BUG_ON((u32)vdev->features != vdev->features);
> +	//BUG_ON((u32)vdev->features != vdev->features);
> 
>  	/*
>  	 * Remember the finalized features of our vdev, and provide it

^ permalink raw reply

* Re: [RFC V3 PATCH 0/8] Packed ring for vhost
From: Konrad Rzeszutek Wilk @ 2018-04-23 20:11 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180423224715-mutt-send-email-mst@kernel.org>

On Mon, Apr 23, 2018 at 10:59:43PM +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 23, 2018 at 03:31:20PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Mon, Apr 23, 2018 at 01:34:52PM +0800, Jason Wang wrote:
> > > Hi all:
> > > 
> > > This RFC implement packed ring layout. The code were tested with
> > > Tiwei's RFC V2 a thttps://lkml.org/lkml/2018/4/1/48. Some fixups and
> > > tweaks were needed on top of Tiwei's code to make it run. TCP stream
> > > and pktgen does not show obvious difference compared with split ring.
> > 
> > I have to ask then - what is the benefit of this?
> 
> You can use this with dpdk within guest.
> The DPDK version did see a performance improvement so hopefully with

Is there a link to this performance improvement document?

> future versions this will too.
> 
> > > 
> > > Changes from V2:
> > > - do not use & in checking desc_event_flags
> > > - off should be most significant bit
> > > - remove the workaround of mergeable buffer for dpdk prototype
> > > - id should be in the last descriptor in the chain
> > > - keep _F_WRITE for write descriptor when adding used
> > > - device flags updating should use ADDR_USED type
> > > - return error on unexpected unavail descriptor in a chain
> > > - return false in vhost_ve_avail_empty is descriptor is available
> > > - track last seen avail_wrap_counter
> > > - correctly examine available descriptor in get_indirect_packed()
> > > - vhost_idx_diff should return u16 instead of bool
> > > 
> > > Changes from V1:
> > > 
> > > - Refactor vhost used elem code to avoid open coding on used elem
> > > - Event suppression support (compile test only).
> > > - Indirect descriptor support (compile test only).
> > > - Zerocopy support.
> > > - vIOMMU support.
> > > - SCSI/VSOCK support (compile test only).
> > > - Fix several bugs
> > > 
> > > For simplicity, I don't implement batching or other optimizations.
> > > 
> > > Please review.
> > > 
> > > Jason Wang (8):
> > >   vhost: move get_rx_bufs to vhost.c
> > >   vhost: hide used ring layout from device
> > >   vhost: do not use vring_used_elem
> > >   vhost_net: do not explicitly manipulate vhost_used_elem
> > >   vhost: vhost_put_user() can accept metadata type
> > >   virtio: introduce packed ring defines
> > >   vhost: packed ring support
> > >   vhost: event suppression for packed ring
> > > 
> > >  drivers/vhost/net.c                | 136 ++----
> > >  drivers/vhost/scsi.c               |  62 +--
> > >  drivers/vhost/vhost.c              | 824 ++++++++++++++++++++++++++++++++++---
> > >  drivers/vhost/vhost.h              |  47 ++-
> > >  drivers/vhost/vsock.c              |  42 +-
> > >  include/uapi/linux/virtio_config.h |   9 +
> > >  include/uapi/linux/virtio_ring.h   |  32 ++
> > >  7 files changed, 926 insertions(+), 226 deletions(-)
> > > 
> > > -- 
> > > 2.7.4
> > > 

^ permalink raw reply

* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: Michael S. Tsirkin @ 2018-04-23 20:06 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, Jakub Kicinski,
	Sridhar Samudrala, virtualization, Netdev, David Miller
In-Reply-To: <CADGSJ20ge75T+ddxtUBT4d9m1i3=HLOAHJEoS7Cg0bqnXrutwA@mail.gmail.com>

On Mon, Apr 23, 2018 at 12:44:39PM -0700, Siwei Liu wrote:
> On Mon, Apr 23, 2018 at 10:56 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Mon, Apr 23, 2018 at 10:44:40AM -0700, Stephen Hemminger wrote:
> >> On Mon, 23 Apr 2018 20:24:56 +0300
> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>
> >> > On Mon, Apr 23, 2018 at 10:04:06AM -0700, Stephen Hemminger wrote:
> >> > > > >
> >> > > > >I will NAK patches to change to common code for netvsc especially the
> >> > > > >three device model.  MS worked hard with distro vendors to support transparent
> >> > > > >mode, ans we really can't have a new model; or do backport.
> >> > > > >
> >> > > > >Plus, DPDK is now dependent on existing model.
> >> > > >
> >> > > > Sorry, but nobody here cares about dpdk or other similar oddities.
> >> > >
> >> > > The network device model is a userspace API, and DPDK is a userspace application.
> >> >
> >> > It is userspace but are you sure dpdk is actually poking at netdevs?
> >> > AFAIK it's normally banging device registers directly.
> >> >
> >> > > You can't go breaking userspace even if you don't like the application.
> >> >
> >> > Could you please explain how is the proposed patchset breaking
> >> > userspace? Ignoring DPDK for now, I don't think it changes the userspace
> >> > API at all.
> >> >
> >>
> >> The DPDK has a device driver vdev_netvsc which scans the Linux network devices
> >> to look for Linux netvsc device and the paired VF device and setup the
> >> DPDK environment.  This setup creates a DPDK failsafe (bondingish) instance
> >> and sets up TAP support over the Linux netvsc device as well as the Mellanox
> >> VF device.
> >>
> >> So it depends on existing 2 device model. You can't go to a 3 device model
> >> or start hiding devices from userspace.
> >
> > Okay so how does the existing patch break that? IIUC does not go to
> > a 3 device model since netvsc calls failover_register directly.
> >
> >> Also, I am working on associating netvsc and VF device based on serial number
> >> rather than MAC address. The serial number is how Windows works now, and it makes
> >> sense for Linux and Windows to use the same mechanism if possible.
> >
> > Maybe we should support same for virtio ...
> > Which serial do you mean? From vpd?
> >
> > I guess you will want to keep supporting MAC for old hypervisors?
> >
> > It all seems like a reasonable thing to support in the generic core.
> 
> That's the reason why I chose explicit identifier rather than rely on
> MAC address to bind/pair a device. MAC address can change. Even if it
> can't, malicious guest user can fake MAC address to skip binding.
> 
> -Siwei

Address should be sampled at device creation to prevent this
kind of hack. Not that it buys the malicious user much:
if you can poke at MAC addresses you probably already can
break networking.




> 
> >
> > --
> > MST

^ permalink raw reply

* Re: [RFC V3 PATCH 0/8] Packed ring for vhost
From: Michael S. Tsirkin @ 2018-04-23 19:59 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: kvm, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180423193120.GD30033@char.us.oracle.com>

On Mon, Apr 23, 2018 at 03:31:20PM -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Apr 23, 2018 at 01:34:52PM +0800, Jason Wang wrote:
> > Hi all:
> > 
> > This RFC implement packed ring layout. The code were tested with
> > Tiwei's RFC V2 a thttps://lkml.org/lkml/2018/4/1/48. Some fixups and
> > tweaks were needed on top of Tiwei's code to make it run. TCP stream
> > and pktgen does not show obvious difference compared with split ring.
> 
> I have to ask then - what is the benefit of this?

You can use this with dpdk within guest.
The DPDK version did see a performance improvement so hopefully with
future versions this will too.

> > 
> > Changes from V2:
> > - do not use & in checking desc_event_flags
> > - off should be most significant bit
> > - remove the workaround of mergeable buffer for dpdk prototype
> > - id should be in the last descriptor in the chain
> > - keep _F_WRITE for write descriptor when adding used
> > - device flags updating should use ADDR_USED type
> > - return error on unexpected unavail descriptor in a chain
> > - return false in vhost_ve_avail_empty is descriptor is available
> > - track last seen avail_wrap_counter
> > - correctly examine available descriptor in get_indirect_packed()
> > - vhost_idx_diff should return u16 instead of bool
> > 
> > Changes from V1:
> > 
> > - Refactor vhost used elem code to avoid open coding on used elem
> > - Event suppression support (compile test only).
> > - Indirect descriptor support (compile test only).
> > - Zerocopy support.
> > - vIOMMU support.
> > - SCSI/VSOCK support (compile test only).
> > - Fix several bugs
> > 
> > For simplicity, I don't implement batching or other optimizations.
> > 
> > Please review.
> > 
> > Jason Wang (8):
> >   vhost: move get_rx_bufs to vhost.c
> >   vhost: hide used ring layout from device
> >   vhost: do not use vring_used_elem
> >   vhost_net: do not explicitly manipulate vhost_used_elem
> >   vhost: vhost_put_user() can accept metadata type
> >   virtio: introduce packed ring defines
> >   vhost: packed ring support
> >   vhost: event suppression for packed ring
> > 
> >  drivers/vhost/net.c                | 136 ++----
> >  drivers/vhost/scsi.c               |  62 +--
> >  drivers/vhost/vhost.c              | 824 ++++++++++++++++++++++++++++++++++---
> >  drivers/vhost/vhost.h              |  47 ++-
> >  drivers/vhost/vsock.c              |  42 +-
> >  include/uapi/linux/virtio_config.h |   9 +
> >  include/uapi/linux/virtio_ring.h   |  32 ++
> >  7 files changed, 926 insertions(+), 226 deletions(-)
> > 
> > -- 
> > 2.7.4
> > 

^ permalink raw reply

* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: Siwei Liu @ 2018-04-23 19:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, Jakub Kicinski,
	Sridhar Samudrala, virtualization, Netdev, David Miller
In-Reply-To: <20180423205019-mutt-send-email-mst@kernel.org>

On Mon, Apr 23, 2018 at 10:56 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Apr 23, 2018 at 10:44:40AM -0700, Stephen Hemminger wrote:
>> On Mon, 23 Apr 2018 20:24:56 +0300
>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>
>> > On Mon, Apr 23, 2018 at 10:04:06AM -0700, Stephen Hemminger wrote:
>> > > > >
>> > > > >I will NAK patches to change to common code for netvsc especially the
>> > > > >three device model.  MS worked hard with distro vendors to support transparent
>> > > > >mode, ans we really can't have a new model; or do backport.
>> > > > >
>> > > > >Plus, DPDK is now dependent on existing model.
>> > > >
>> > > > Sorry, but nobody here cares about dpdk or other similar oddities.
>> > >
>> > > The network device model is a userspace API, and DPDK is a userspace application.
>> >
>> > It is userspace but are you sure dpdk is actually poking at netdevs?
>> > AFAIK it's normally banging device registers directly.
>> >
>> > > You can't go breaking userspace even if you don't like the application.
>> >
>> > Could you please explain how is the proposed patchset breaking
>> > userspace? Ignoring DPDK for now, I don't think it changes the userspace
>> > API at all.
>> >
>>
>> The DPDK has a device driver vdev_netvsc which scans the Linux network devices
>> to look for Linux netvsc device and the paired VF device and setup the
>> DPDK environment.  This setup creates a DPDK failsafe (bondingish) instance
>> and sets up TAP support over the Linux netvsc device as well as the Mellanox
>> VF device.
>>
>> So it depends on existing 2 device model. You can't go to a 3 device model
>> or start hiding devices from userspace.
>
> Okay so how does the existing patch break that? IIUC does not go to
> a 3 device model since netvsc calls failover_register directly.
>
>> Also, I am working on associating netvsc and VF device based on serial number
>> rather than MAC address. The serial number is how Windows works now, and it makes
>> sense for Linux and Windows to use the same mechanism if possible.
>
> Maybe we should support same for virtio ...
> Which serial do you mean? From vpd?
>
> I guess you will want to keep supporting MAC for old hypervisors?
>
> It all seems like a reasonable thing to support in the generic core.

That's the reason why I chose explicit identifier rather than rely on
MAC address to bind/pair a device. MAC address can change. Even if it
can't, malicious guest user can fake MAC address to skip binding.

-Siwei


>
> --
> MST

^ permalink raw reply

* Re: virtio remoteproc device
From: Michael S. Tsirkin @ 2018-04-23 19:40 UTC (permalink / raw)
  To: Loic PALLARDY
  Cc: Ohad Ben-Cohen, Anup Patel,
	virtualization@lists.linux-foundation.org,
	linux-remoteproc@vger.kernel.org, Bjorn Andersson
In-Reply-To: <1d7df45a692b42bd9d70462e0af63ca9@SFHDAG7NODE2.st.com>

On Mon, Apr 23, 2018 at 08:55:57AM +0000, Loic PALLARDY wrote:
> 
> 
> > -----Original Message-----
> > From: linux-remoteproc-owner@vger.kernel.org [mailto:linux-remoteproc-
> > owner@vger.kernel.org] On Behalf Of Anup Patel
> > Sent: Sunday, April 22, 2018 6:08 AM
> > To: Michael S. Tsirkin <mst@redhat.com>
> > Cc: linux-remoteproc@vger.kernel.org; Ohad Ben-Cohen
> > <ohad@wizery.com>; Bjorn Andersson <bjorn.andersson@linaro.org>;
> > virtualization@lists.linux-foundation.org
> > Subject: Re: virtio remoteproc device
> > 
> > On Fri, Apr 20, 2018 at 10:23 PM, Michael S. Tsirkin <mst@redhat.com>
> > wrote:
> > > Hello!
> > > I note the following in the serial console:
> > >
> > >       if (is_rproc_serial(vdev)) {
> > >                 /*
> > >                  * Allocate DMA memory from ancestor. When a virtio
> > >                  * device is created by remoteproc, the DMA memory is
> > >                  * associated with the grandparent device:
> > >                  * vdev => rproc => platform-dev.
> > >                  */
> > >                 if (!vdev->dev.parent || !vdev->dev.parent->parent)
> > >                         goto free_buf;
> > >                 buf->dev = vdev->dev.parent->parent;
> > >
> > >                 /* Increase device refcnt to avoid freeing it */
> > >                 get_device(buf->dev);
> > >                 buf->buf = dma_alloc_coherent(buf->dev, buf_size, &buf->dma,
> > >                                               GFP_KERNEL);
> > >         }
> > >
> > > Added here:
> > >         commit 1b6370463e88b0c1c317de16d7b962acc1dab4f2
> > >         Author: Sjur Brændeland <sjur.brandeland@stericsson.com>
> > >         Date:   Fri Dec 14 14:40:51 2012 +1030
> > >
> > >     virtio_console: Add support for remoteproc serial
> > >
> > >
> > > I am not familiar with rproc so I have a question:
> > > why is it required to use coherent memory here,
> > > and why through a grandparent device?
> > 
> > I faced similar issue when I tried VirtIO RPMSG bus over
> > VirtIO MMIO transport.
> > 
> > Here's my fix for VirtIO RPMSG bus driver:
> > https://patchwork.kernel.org/patch/10155145/
> 
> Hi Anup and Michael,
> 
> I pushed a series to modify virtio device allocation in remoteproc. Please see [1].
> It allows to remove allocation based on "grand-parent" device in the case of virtio device allocated by remoteproc.
> Virto_console patch missing, only virtio_rpmsg modification proposed. I can add it in next version.
> 
> Regards,
> Loic
> 
> [1] https://lkml.org/lkml/2018/3/1/644
> 
> > 
> > Regards,
> > Anup
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-remoteproc" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

So on top of your patches, can we just force use of DMA API
and drop special casting?
E.g. maybe something like the below - completely untested - patch?
Does it work for you?


diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 2108551..6c6767c 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -40,7 +40,8 @@
 #include <linux/dma-mapping.h>
 #include "../tty/hvc/hvc_console.h"
 
-#define is_rproc_enabled IS_ENABLED(CONFIG_REMOTEPROC)
+//#define is_rproc_enabled IS_ENABLED(CONFIG_REMOTEPROC)
+#define is_rproc_enabled 0
 
 /*
  * This is a global struct for storing common data for all the devices
diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index b0633fd..4a13ca2 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -199,7 +199,7 @@ static u64 rproc_virtio_get_features(struct virtio_device *vdev)
 
 	rsc = (void *)rvdev->rproc->table_ptr + rvdev->rsc_offset;
 
-	return rsc->dfeatures;
+	return rsc->dfeatures | (1ULL << VIRTIO_F_IOMMU_PLATFORM);
 }
 
 static int rproc_virtio_finalize_features(struct virtio_device *vdev)
@@ -213,7 +213,7 @@ static int rproc_virtio_finalize_features(struct virtio_device *vdev)
 	vring_transport_features(vdev);
 
 	/* Make sure we don't have any features > 32 bits! */
-	BUG_ON((u32)vdev->features != vdev->features);
+	//BUG_ON((u32)vdev->features != vdev->features);
 
 	/*
 	 * Remember the finalized features of our vdev, and provide it

^ permalink raw reply related

* Re: [RFC V3 PATCH 0/8] Packed ring for vhost
From: Konrad Rzeszutek Wilk @ 2018-04-23 19:31 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, mst, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <1524461700-5469-1-git-send-email-jasowang@redhat.com>

On Mon, Apr 23, 2018 at 01:34:52PM +0800, Jason Wang wrote:
> Hi all:
> 
> This RFC implement packed ring layout. The code were tested with
> Tiwei's RFC V2 a thttps://lkml.org/lkml/2018/4/1/48. Some fixups and
> tweaks were needed on top of Tiwei's code to make it run. TCP stream
> and pktgen does not show obvious difference compared with split ring.

I have to ask then - what is the benefit of this?

> 
> Changes from V2:
> - do not use & in checking desc_event_flags
> - off should be most significant bit
> - remove the workaround of mergeable buffer for dpdk prototype
> - id should be in the last descriptor in the chain
> - keep _F_WRITE for write descriptor when adding used
> - device flags updating should use ADDR_USED type
> - return error on unexpected unavail descriptor in a chain
> - return false in vhost_ve_avail_empty is descriptor is available
> - track last seen avail_wrap_counter
> - correctly examine available descriptor in get_indirect_packed()
> - vhost_idx_diff should return u16 instead of bool
> 
> Changes from V1:
> 
> - Refactor vhost used elem code to avoid open coding on used elem
> - Event suppression support (compile test only).
> - Indirect descriptor support (compile test only).
> - Zerocopy support.
> - vIOMMU support.
> - SCSI/VSOCK support (compile test only).
> - Fix several bugs
> 
> For simplicity, I don't implement batching or other optimizations.
> 
> Please review.
> 
> Jason Wang (8):
>   vhost: move get_rx_bufs to vhost.c
>   vhost: hide used ring layout from device
>   vhost: do not use vring_used_elem
>   vhost_net: do not explicitly manipulate vhost_used_elem
>   vhost: vhost_put_user() can accept metadata type
>   virtio: introduce packed ring defines
>   vhost: packed ring support
>   vhost: event suppression for packed ring
> 
>  drivers/vhost/net.c                | 136 ++----
>  drivers/vhost/scsi.c               |  62 +--
>  drivers/vhost/vhost.c              | 824 ++++++++++++++++++++++++++++++++++---
>  drivers/vhost/vhost.h              |  47 ++-
>  drivers/vhost/vsock.c              |  42 +-
>  include/uapi/linux/virtio_config.h |   9 +
>  include/uapi/linux/virtio_ring.h   |  32 ++
>  7 files changed, 926 insertions(+), 226 deletions(-)
> 
> -- 
> 2.7.4
> 

^ permalink raw reply

* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: Michael S. Tsirkin @ 2018-04-23 17:56 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: alexander.h.duyck, virtio-dev, Jiri Pirko, kubakici,
	Sridhar Samudrala, virtualization, loseweigh, netdev, davem
In-Reply-To: <20180423104440.2fe6cfd2@xeon-e3>

On Mon, Apr 23, 2018 at 10:44:40AM -0700, Stephen Hemminger wrote:
> On Mon, 23 Apr 2018 20:24:56 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Apr 23, 2018 at 10:04:06AM -0700, Stephen Hemminger wrote:
> > > > >
> > > > >I will NAK patches to change to common code for netvsc especially the
> > > > >three device model.  MS worked hard with distro vendors to support transparent
> > > > >mode, ans we really can't have a new model; or do backport.
> > > > >
> > > > >Plus, DPDK is now dependent on existing model.    
> > > > 
> > > > Sorry, but nobody here cares about dpdk or other similar oddities.  
> > > 
> > > The network device model is a userspace API, and DPDK is a userspace application.  
> > 
> > It is userspace but are you sure dpdk is actually poking at netdevs?
> > AFAIK it's normally banging device registers directly.
> > 
> > > You can't go breaking userspace even if you don't like the application.  
> > 
> > Could you please explain how is the proposed patchset breaking
> > userspace? Ignoring DPDK for now, I don't think it changes the userspace
> > API at all.
> > 
> 
> The DPDK has a device driver vdev_netvsc which scans the Linux network devices
> to look for Linux netvsc device and the paired VF device and setup the
> DPDK environment.  This setup creates a DPDK failsafe (bondingish) instance
> and sets up TAP support over the Linux netvsc device as well as the Mellanox
> VF device.
> 
> So it depends on existing 2 device model. You can't go to a 3 device model
> or start hiding devices from userspace.

Okay so how does the existing patch break that? IIUC does not go to
a 3 device model since netvsc calls failover_register directly.

> Also, I am working on associating netvsc and VF device based on serial number
> rather than MAC address. The serial number is how Windows works now, and it makes
> sense for Linux and Windows to use the same mechanism if possible.

Maybe we should support same for virtio ...
Which serial do you mean? From vpd?

I guess you will want to keep supporting MAC for old hypervisors?

It all seems like a reasonable thing to support in the generic core.

-- 
MST

^ permalink raw reply

* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: Stephen Hemminger @ 2018-04-23 17:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, virtio-dev, Jiri Pirko, kubakici,
	Sridhar Samudrala, virtualization, loseweigh, netdev, davem
In-Reply-To: <20180423202204-mutt-send-email-mst@kernel.org>

On Mon, 23 Apr 2018 20:24:56 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Apr 23, 2018 at 10:04:06AM -0700, Stephen Hemminger wrote:
> > > >
> > > >I will NAK patches to change to common code for netvsc especially the
> > > >three device model.  MS worked hard with distro vendors to support transparent
> > > >mode, ans we really can't have a new model; or do backport.
> > > >
> > > >Plus, DPDK is now dependent on existing model.    
> > > 
> > > Sorry, but nobody here cares about dpdk or other similar oddities.  
> > 
> > The network device model is a userspace API, and DPDK is a userspace application.  
> 
> It is userspace but are you sure dpdk is actually poking at netdevs?
> AFAIK it's normally banging device registers directly.
> 
> > You can't go breaking userspace even if you don't like the application.  
> 
> Could you please explain how is the proposed patchset breaking
> userspace? Ignoring DPDK for now, I don't think it changes the userspace
> API at all.
> 

The DPDK has a device driver vdev_netvsc which scans the Linux network devices
to look for Linux netvsc device and the paired VF device and setup the
DPDK environment.  This setup creates a DPDK failsafe (bondingish) instance
and sets up TAP support over the Linux netvsc device as well as the Mellanox
VF device.

So it depends on existing 2 device model. You can't go to a 3 device model
or start hiding devices from userspace.

Also, I am working on associating netvsc and VF device based on serial number
rather than MAC address. The serial number is how Windows works now, and it makes
sense for Linux and Windows to use the same mechanism if possible.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox