* Re: [PATCH net-next v2 2/5] virtio_ring: support creating packed ring
From: Tiwei Bie @ 2018-09-12 7:51 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180910022837.GA11822@debian>
On Mon, Sep 10, 2018 at 10:28:37AM +0800, Tiwei Bie wrote:
> On Fri, Sep 07, 2018 at 10:03:24AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 11, 2018 at 10:27:08AM +0800, Tiwei Bie wrote:
> > > This commit introduces the support for creating packed ring.
> > > All split ring specific functions are added _split suffix.
> > > Some necessary stubs for packed ring are also added.
> > >
> > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> >
[...]
> > > +
> > > +/*
> > > + * The layout for the packed ring is a continuous chunk of memory
> > > + * which looks like this.
> > > + *
> > > + * struct vring_packed {
> > > + * // The actual descriptors (16 bytes each)
> > > + * struct vring_packed_desc desc[num];
> > > + *
> > > + * // Padding to the next align boundary.
> > > + * char pad[];
> > > + *
> > > + * // Driver Event Suppression
> > > + * struct vring_packed_desc_event driver;
> > > + *
> > > + * // Device Event Suppression
> > > + * struct vring_packed_desc_event device;
> > > + * };
> > > + */
> >
> > Why not just allocate event structures separately?
> > Is it a win to have them share a cache line for some reason?
Users may call vring_new_virtqueue() with preallocated
memory to setup the ring, e.g.:
https://github.com/torvalds/linux/blob/11da3a7f84f1/drivers/s390/virtio/virtio_ccw.c#L513-L522
https://github.com/torvalds/linux/blob/11da3a7f84f1/drivers/misc/mic/vop/vop_main.c#L306-L307
Below is the corresponding definition in split ring:
https://github.com/oasis-tcs/virtio-spec/blob/89dd55f5e606/split-ring.tex#L64-L78
If my understanding is correct, this is just for legacy
interfaces, and we won't define layout in packed ring
and don't need to support vring_new_virtqueue() in packed
ring. Is it correct? Thanks!
>
> Will do that.
>
> >
> > > +static inline void vring_init_packed(struct vring_packed *vr, unsigned int num,
> > > + void *p, unsigned long align)
> > > +{
> > > + vr->num = num;
> > > + vr->desc = p;
> > > + vr->driver = (void *)ALIGN(((uintptr_t)p +
> > > + sizeof(struct vring_packed_desc) * num), align);
> > > + vr->device = vr->driver + 1;
> > > +}
> >
> > What's all this about alignment? Where does it come from?
>
> It comes from the `vring_align` parameter of vring_create_virtqueue()
> and vring_new_virtqueue():
>
> https://github.com/torvalds/linux/blob/a49a9dcce802/drivers/virtio/virtio_ring.c#L1061
> https://github.com/torvalds/linux/blob/a49a9dcce802/drivers/virtio/virtio_ring.c#L1123
>
> Should I just ignore it in packed ring?
>
> CCW defined this:
>
> #define KVM_VIRTIO_CCW_RING_ALIGN 4096
>
> I'm not familiar with CCW. Currently, in this patch set, packed ring
> isn't enabled on CCW dues to some legacy accessors are not implemented
> in packed ring yet.
>
> >
> > > +
> > > +static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
> > > +{
> > > + return ((sizeof(struct vring_packed_desc) * num + align - 1)
> > > + & ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
> > > +}
> [...]
> > > @@ -1129,10 +1388,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
> > > void (*callback)(struct virtqueue *vq),
> > > const char *name)
> > > {
> > > - struct vring vring;
> > > - vring_init(&vring, num, pages, vring_align);
> > > - return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> > > - notify, callback, name);
> > > + union vring_union vring;
> > > + bool packed;
> > > +
> > > + packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> > > + if (packed)
> > > + vring_init_packed(&vring.vring_packed, num, pages, vring_align);
> > > + else
> > > + vring_init(&vring.vring_split, num, pages, vring_align);
> >
> >
> > vring_init in the UAPI header is more or less a bug.
> > I'd just stop using it, keep it around for legacy userspace.
>
> Got it. I'd like to do that. Thanks.
>
> >
> > > +
> > > + return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> > > + context, notify, callback, name);
> > > }
> > > EXPORT_SYMBOL_GPL(vring_new_virtqueue);
> > >
> > > @@ -1142,7 +1408,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
> > >
> > > if (vq->we_own_ring) {
> > > vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
> > > - vq->vring.desc, vq->queue_dma_addr);
> > > + vq->packed ? (void *)vq->vring_packed.desc :
> > > + (void *)vq->vring.desc,
> > > + vq->queue_dma_addr);
> > > }
> > > list_del(&_vq->list);
> > > kfree(vq);
> > > @@ -1184,7 +1452,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
> > >
> > > struct vring_virtqueue *vq = to_vvq(_vq);
> > >
> > > - return vq->vring.num;
> > > + return vq->packed ? vq->vring_packed.num : vq->vring.num;
> > > }
> > > EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
> > >
> > > @@ -1227,6 +1495,10 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
> > >
> > > BUG_ON(!vq->we_own_ring);
> > >
> > > + if (vq->packed)
> > > + return vq->queue_dma_addr + ((char *)vq->vring_packed.driver -
> > > + (char *)vq->vring_packed.desc);
> > > +
> > > return vq->queue_dma_addr +
> > > ((char *)vq->vring.avail - (char *)vq->vring.desc);
> > > }
> > > @@ -1238,11 +1510,16 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
> > >
> > > BUG_ON(!vq->we_own_ring);
> > >
> > > + if (vq->packed)
> > > + return vq->queue_dma_addr + ((char *)vq->vring_packed.device -
> > > + (char *)vq->vring_packed.desc);
> > > +
> > > return vq->queue_dma_addr +
> > > ((char *)vq->vring.used - (char *)vq->vring.desc);
> > > }
> > > EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
> > >
> > > +/* Only available for split ring */
> > > const struct vring *virtqueue_get_vring(struct virtqueue *vq)
> > > {
> > > return &to_vvq(vq)->vring;
> > > diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> > > index fab02133a919..992142b35f55 100644
> > > --- a/include/linux/virtio_ring.h
> > > +++ b/include/linux/virtio_ring.h
> > > @@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
> > > struct virtio_device;
> > > struct virtqueue;
> > >
> > > +union vring_union {
> > > + struct vring vring_split;
> > > + struct vring_packed vring_packed;
> > > +};
> > > +
> > > /*
> > > * Creates a virtqueue and allocates the descriptor ring. If
> > > * may_reduce_num is set, then this may allocate a smaller ring than
> > > @@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
> > >
> > > /* Creates a virtqueue with a custom layout. */
> > > struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > - struct vring vring,
> > > + union vring_union vring,
> > > + bool packed,
> > > struct virtio_device *vdev,
> > > bool weak_barriers,
> > > bool ctx,
> > > --
> > > 2.18.0
^ permalink raw reply
* Re: [PATCH net-next v2 2/5] virtio_ring: support creating packed ring
From: Michael S. Tsirkin @ 2018-09-12 12:45 UTC (permalink / raw)
To: Tiwei Bie; +Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180910022837.GA11822@debian>
On Mon, Sep 10, 2018 at 10:28:37AM +0800, Tiwei Bie wrote:
> On Fri, Sep 07, 2018 at 10:03:24AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 11, 2018 at 10:27:08AM +0800, Tiwei Bie wrote:
> > > This commit introduces the support for creating packed ring.
> > > All split ring specific functions are added _split suffix.
> > > Some necessary stubs for packed ring are also added.
> > >
> > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> >
> > I'd rather have a patch just renaming split functions, then
> > add all packed stuff in as a separate patch on top.
>
> Sure, I will do that.
>
> >
> >
> > > ---
> > > drivers/virtio/virtio_ring.c | 801 +++++++++++++++++++++++------------
> > > include/linux/virtio_ring.h | 8 +-
> > > 2 files changed, 546 insertions(+), 263 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 814b395007b2..c4f8abc7445a 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -60,11 +60,15 @@ struct vring_desc_state {
> > > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > > };
> > >
> > > +struct vring_desc_state_packed {
> > > + int next; /* The next desc state. */
> >
> > So this can go away with IN_ORDER?
>
> Yes. If IN_ORDER is negotiated, next won't be needed anymore.
> Currently, IN_ORDER isn't included in this patch set, because
> some changes for split ring are needed to make sure that it
> will use the descs in the expected order. After that,
> optimizations can be done for both of split ring and packed
> ring respectively.
>
> >
> > > +};
> > > +
> > > struct vring_virtqueue {
> > > struct virtqueue vq;
> > >
> > > - /* Actual memory layout for this queue */
> > > - struct vring vring;
> > > + /* Is this a packed ring? */
> > > + bool packed;
> > >
> > > /* Can we use weak barriers? */
> > > bool weak_barriers;
> > > @@ -86,11 +90,39 @@ struct vring_virtqueue {
> > > /* Last used index we've seen. */
> > > u16 last_used_idx;
> > >
> > > - /* Last written value to avail->flags */
> > > - u16 avail_flags_shadow;
> > > + union {
> > > + /* Available for split ring */
> > > + struct {
> > > + /* Actual memory layout for this queue. */
> > > + struct vring vring;
> > >
> > > - /* Last written value to avail->idx in guest byte order */
> > > - u16 avail_idx_shadow;
> > > + /* Last written value to avail->flags */
> > > + u16 avail_flags_shadow;
> > > +
> > > + /* Last written value to avail->idx in
> > > + * guest byte order. */
> > > + u16 avail_idx_shadow;
> > > + };
> >
> > Name this field split so it's easier to detect misuse of e.g.
> > packed fields in split code?
>
> Good point, I'll do that.
>
> >
> > > +
> > > + /* Available for packed ring */
> > > + struct {
> > > + /* Actual memory layout for this queue. */
> > > + struct vring_packed vring_packed;
> > > +
> > > + /* Driver ring wrap counter. */
> > > + bool avail_wrap_counter;
> > > +
> > > + /* Device ring wrap counter. */
> > > + bool used_wrap_counter;
> > > +
> > > + /* Index of the next avail descriptor. */
> > > + u16 next_avail_idx;
> > > +
> > > + /* Last written value to driver->flags in
> > > + * guest byte order. */
> > > + u16 event_flags_shadow;
> > > + };
> > > + };
> [...]
> > > +
> > > +/*
> > > + * The layout for the packed ring is a continuous chunk of memory
> > > + * which looks like this.
> > > + *
> > > + * struct vring_packed {
> > > + * // The actual descriptors (16 bytes each)
> > > + * struct vring_packed_desc desc[num];
> > > + *
> > > + * // Padding to the next align boundary.
> > > + * char pad[];
> > > + *
> > > + * // Driver Event Suppression
> > > + * struct vring_packed_desc_event driver;
> > > + *
> > > + * // Device Event Suppression
> > > + * struct vring_packed_desc_event device;
> > > + * };
> > > + */
> >
> > Why not just allocate event structures separately?
> > Is it a win to have them share a cache line for some reason?
>
> Will do that.
>
> >
> > > +static inline void vring_init_packed(struct vring_packed *vr, unsigned int num,
> > > + void *p, unsigned long align)
> > > +{
> > > + vr->num = num;
> > > + vr->desc = p;
> > > + vr->driver = (void *)ALIGN(((uintptr_t)p +
> > > + sizeof(struct vring_packed_desc) * num), align);
> > > + vr->device = vr->driver + 1;
> > > +}
> >
> > What's all this about alignment? Where does it come from?
>
> It comes from the `vring_align` parameter of vring_create_virtqueue()
> and vring_new_virtqueue():
>
> https://github.com/torvalds/linux/blob/a49a9dcce802/drivers/virtio/virtio_ring.c#L1061
> https://github.com/torvalds/linux/blob/a49a9dcce802/drivers/virtio/virtio_ring.c#L1123
Note the TODO - we just never got to fixing it. It would be great to fix
this for virtio 1 rings, but if not - at least let's not add this
stuff for packed rings.
> Should I just ignore it in packed ring?
>
> CCW defined this:
>
> #define KVM_VIRTIO_CCW_RING_ALIGN 4096
>
> I'm not familiar with CCW. Currently, in this patch set, packed ring
> isn't enabled on CCW dues to some legacy accessors are not implemented
> in packed ring yet.
Then you need to take steps not to negotiate this feature bit for
ccw drivers.
> >
> > > +
> > > +static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
> > > +{
> > > + return ((sizeof(struct vring_packed_desc) * num + align - 1)
> > > + & ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
> > > +}
> [...]
> > > @@ -1129,10 +1388,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
> > > void (*callback)(struct virtqueue *vq),
> > > const char *name)
> > > {
> > > - struct vring vring;
> > > - vring_init(&vring, num, pages, vring_align);
> > > - return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> > > - notify, callback, name);
> > > + union vring_union vring;
> > > + bool packed;
> > > +
> > > + packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> > > + if (packed)
> > > + vring_init_packed(&vring.vring_packed, num, pages, vring_align);
> > > + else
> > > + vring_init(&vring.vring_split, num, pages, vring_align);
> >
> >
> > vring_init in the UAPI header is more or less a bug.
> > I'd just stop using it, keep it around for legacy userspace.
>
> Got it. I'd like to do that. Thanks.
>
> >
> > > +
> > > + return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> > > + context, notify, callback, name);
> > > }
> > > EXPORT_SYMBOL_GPL(vring_new_virtqueue);
> > >
> > > @@ -1142,7 +1408,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
> > >
> > > if (vq->we_own_ring) {
> > > vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
> > > - vq->vring.desc, vq->queue_dma_addr);
> > > + vq->packed ? (void *)vq->vring_packed.desc :
> > > + (void *)vq->vring.desc,
> > > + vq->queue_dma_addr);
> > > }
> > > list_del(&_vq->list);
> > > kfree(vq);
> > > @@ -1184,7 +1452,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
> > >
> > > struct vring_virtqueue *vq = to_vvq(_vq);
> > >
> > > - return vq->vring.num;
> > > + return vq->packed ? vq->vring_packed.num : vq->vring.num;
> > > }
> > > EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
> > >
> > > @@ -1227,6 +1495,10 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
> > >
> > > BUG_ON(!vq->we_own_ring);
> > >
> > > + if (vq->packed)
> > > + return vq->queue_dma_addr + ((char *)vq->vring_packed.driver -
> > > + (char *)vq->vring_packed.desc);
> > > +
> > > return vq->queue_dma_addr +
> > > ((char *)vq->vring.avail - (char *)vq->vring.desc);
> > > }
> > > @@ -1238,11 +1510,16 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
> > >
> > > BUG_ON(!vq->we_own_ring);
> > >
> > > + if (vq->packed)
> > > + return vq->queue_dma_addr + ((char *)vq->vring_packed.device -
> > > + (char *)vq->vring_packed.desc);
> > > +
> > > return vq->queue_dma_addr +
> > > ((char *)vq->vring.used - (char *)vq->vring.desc);
> > > }
> > > EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
> > >
> > > +/* Only available for split ring */
> > > const struct vring *virtqueue_get_vring(struct virtqueue *vq)
> > > {
> > > return &to_vvq(vq)->vring;
> > > diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> > > index fab02133a919..992142b35f55 100644
> > > --- a/include/linux/virtio_ring.h
> > > +++ b/include/linux/virtio_ring.h
> > > @@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
> > > struct virtio_device;
> > > struct virtqueue;
> > >
> > > +union vring_union {
> > > + struct vring vring_split;
> > > + struct vring_packed vring_packed;
> > > +};
> > > +
> > > /*
> > > * Creates a virtqueue and allocates the descriptor ring. If
> > > * may_reduce_num is set, then this may allocate a smaller ring than
> > > @@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
> > >
> > > /* Creates a virtqueue with a custom layout. */
> > > struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > - struct vring vring,
> > > + union vring_union vring,
> > > + bool packed,
> > > struct virtio_device *vdev,
> > > bool weak_barriers,
> > > bool ctx,
> > > --
> > > 2.18.0
^ permalink raw reply
* Re: [PATCH net-next v2 1/5] virtio: add packed ring definitions
From: Michael S. Tsirkin @ 2018-09-12 12:53 UTC (permalink / raw)
To: Tiwei Bie; +Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180910021322.GA10912@debian>
On Mon, Sep 10, 2018 at 10:13:22AM +0800, Tiwei Bie wrote:
> On Fri, Sep 07, 2018 at 09:51:23AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 11, 2018 at 10:27:07AM +0800, Tiwei Bie wrote:
> > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > ---
> > > include/uapi/linux/virtio_config.h | 3 +++
> > > include/uapi/linux/virtio_ring.h | 43 ++++++++++++++++++++++++++++++
> > > 2 files changed, 46 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> > > index 449132c76b1c..1196e1c1d4f6 100644
> > > --- a/include/uapi/linux/virtio_config.h
> > > +++ b/include/uapi/linux/virtio_config.h
> > > @@ -75,6 +75,9 @@
> > > */
> > > #define VIRTIO_F_IOMMU_PLATFORM 33
> > >
> > > +/* This feature indicates support for the packed virtqueue layout. */
> > > +#define VIRTIO_F_RING_PACKED 34
> > > +
> > > /*
> > > * Does the device support Single Root I/O Virtualization?
> > > */
> > > diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
> > > index 6d5d5faa989b..0254a2ba29cf 100644
> > > --- a/include/uapi/linux/virtio_ring.h
> > > +++ b/include/uapi/linux/virtio_ring.h
> > > @@ -44,6 +44,10 @@
> > > /* This means the buffer contains a list of buffer descriptors. */
> > > #define VRING_DESC_F_INDIRECT 4
> > >
> > > +/* Mark a descriptor as available or used. */
> > > +#define VRING_DESC_F_AVAIL (1ul << 7)
> > > +#define VRING_DESC_F_USED (1ul << 15)
> > > +
> > > /* The Host uses this in used->flags to advise the Guest: don't kick me when
> > > * you add a buffer. It's unreliable, so it's simply an optimization. Guest
> > > * will still kick if it's out of buffers. */
> > > @@ -53,6 +57,17 @@
> > > * optimization. */
> > > #define VRING_AVAIL_F_NO_INTERRUPT 1
> > >
> > > +/* Enable events. */
> > > +#define VRING_EVENT_F_ENABLE 0x0
> > > +/* Disable events. */
> > > +#define VRING_EVENT_F_DISABLE 0x1
> > > +/*
> > > + * Enable events for a specific descriptor
> > > + * (as specified by Descriptor Ring Change Event Offset/Wrap Counter).
> > > + * Only valid if VIRTIO_RING_F_EVENT_IDX has been negotiated.
> > > + */
> > > +#define VRING_EVENT_F_DESC 0x2
> > > +
> > > /* We support indirect buffer descriptors */
> > > #define VIRTIO_RING_F_INDIRECT_DESC 28
> > >
> >
> > These are for the packed ring, right? Pls prefix accordingly.
>
> How about something like this:
>
> #define VRING_PACKED_DESC_F_AVAIL (1u << 7)
> #define VRING_PACKED_DESC_F_USED (1u << 15)
_F_ should be a bit number, not a shifted value.
Yes I know current virtio_ring.h is inconsistent in
that respect. changes welcome though we need to
maintain the old names for the sake of old apps.
>
> #define VRING_PACKED_EVENT_F_ENABLE 0x0
> #define VRING_PACKED_EVENT_F_DISABLE 0x1
> #define VRING_PACKED_EVENT_F_DESC 0x2
These are values, I think they should not have _F_.
Yes I know current virtio_ring.h is inconsistent in
that respect.
>
> > Also, you likely need macros for the wrap counters.
>
> How about something like this:
>
> #define VRING_PACKED_EVENT_WRAP_COUNTER_SHIFT 15
Given it's a single bit, maybe even
VRING_PACKED_EVENT_F_WRAP_CTR
> #define VRING_PACKED_EVENT_WRAP_COUNTER_MASK \
> (1u << VRING_PACKED_WRAP_COUNTER_SHIFT)
> #define VRING_PACKED_EVENT_OFFSET_MASK \
> ((1u << VRING_PACKED_WRAP_COUNTER_SHIFT) - 1)
I'm not sure the mask is justified.
> >
> > > @@ -171,4 +186,32 @@ static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
> > > return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
> > > }
> > >
> > > +struct vring_packed_desc_event {
> > > + /* Descriptor Ring Change Event Offset/Wrap Counter. */
> > > + __virtio16 off_wrap;
> > > + /* Descriptor Ring Change Event Flags. */
> > > + __virtio16 flags;
> > > +};
> > > +
> > > +struct vring_packed_desc {
> > > + /* Buffer Address. */
> > > + __virtio64 addr;
> > > + /* Buffer Length. */
> > > + __virtio32 len;
> > > + /* Buffer ID. */
> > > + __virtio16 id;
> > > + /* The flags depending on descriptor type. */
> > > + __virtio16 flags;
> > > +};
> >
> > Don't use __virtioXX types, just __leXX ones.
>
> Got it, will do that.
>
> >
> > > +
> > > +struct vring_packed {
> > > + unsigned int num;
> > > +
> > > + struct vring_packed_desc *desc;
> > > +
> > > + struct vring_packed_desc_event *driver;
> > > +
> > > + struct vring_packed_desc_event *device;
> > > +};
> > > +
> > > #endif /* _UAPI_LINUX_VIRTIO_RING_H */
> > > --
> > > 2.18.0
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH net-next v2 0/5] virtio: support packed ring
From: Michael S. Tsirkin @ 2018-09-12 13:06 UTC (permalink / raw)
To: Tiwei Bie; +Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180910030053.GA15645@debian>
On Mon, Sep 10, 2018 at 11:00:53AM +0800, Tiwei Bie wrote:
> On Fri, Sep 07, 2018 at 09:00:49AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Sep 07, 2018 at 09:22:25AM +0800, Tiwei Bie wrote:
> > > On Mon, Aug 27, 2018 at 05:00:40PM +0300, Michael S. Tsirkin wrote:
> > > > Are there still plans to test the performance with vost pmd?
> > > > vhost doesn't seem to show a performance gain ...
> > > >
> > >
> > > I tried some performance tests with vhost PMD. In guest, the
> > > XDP program will return XDP_DROP directly. And in host, testpmd
> > > will do txonly fwd.
> > >
> > > When burst size is 1 and packet size is 64 in testpmd and
> > > testpmd needs to iterate 5 Tx queues (but only the first two
> > > queues are enabled) to prepare and inject packets, I got ~12%
> > > performance boost (5.7Mpps -> 6.4Mpps). And if the vhost PMD
> > > is faster (e.g. just need to iterate the first two queues to
> > > prepare and inject packets), then I got similar performance
> > > for both rings (~9.9Mpps) (packed ring's performance can be
> > > lower, because it's more complicated in driver.)
> > >
> > > I think packed ring makes vhost PMD faster, but it doesn't make
> > > the driver faster. In packed ring, the ring is simplified, and
> > > the handling of the ring in vhost (device) is also simplified,
> > > but things are not simplified in driver, e.g. although there is
> > > no desc table in the virtqueue anymore, driver still needs to
> > > maintain a private desc state table (which is still managed as
> > > a list in this patch set) to support the out-of-order desc
> > > processing in vhost (device).
> > >
> > > I think this patch set is mainly to make the driver have a full
> > > functional support for the packed ring, which makes it possible
> > > to leverage the packed ring feature in vhost (device). But I'm
> > > not sure whether there is any other better idea, I'd like to
> > > hear your thoughts. Thanks!
> >
> > Just this: Jens seems to report a nice gain with virtio and
> > vhost pmd across the board. Try to compare virtio and
> > virtio pmd to see what does pmd do better?
>
> The virtio PMD (drivers/net/virtio) in DPDK doesn't need to share
> the virtio ring operation code with other drivers and is highly
> optimized for network. E.g. in Rx, the Rx burst function won't
> chain descs.
> So the ID management for the Rx ring can be quite
> simple and straightforward, we just need to initialize these IDs
> when initializing the ring and don't need to change these IDs
> in data path anymore (the mergable Rx code in that patch set
> assumes the descs will be written back in order, which should be
> fixed. I.e., the ID in the desc should be used to index vq->descx[]).
> The Tx code in that patch set also assumes the descs will be
> written back by device in order, which should be fixed.
>
> But in kernel virtio driver, the virtio_ring.c is very generic.
> The enqueue (virtqueue_add()) and dequeue (virtqueue_get_buf_ctx())
> functions need to support all the virtio devices and should be
> able to handle all the possible cases that may happen. So although
> the packed ring can be very efficient in some cases, currently
> the room to optimize the performance in kernel's virtio_ring.c
> isn't that much. If we want to take the fully advantage of the
> packed ring's efficiency, we need some further e.g. API changes
> in virtio_ring.c, which shouldn't be part of this patch set. So
> I still think this patch set is mainly to make the kernel virtio
> driver to have a full functional support of the packed ring, and
> we can't expect impressive performance boost with it.
So what are the cases that make things complex?
Maybe we should drop support for them completely.
> >
> >
> > >
> > > >
> > > > On Wed, Jul 11, 2018 at 10:27:06AM +0800, Tiwei Bie wrote:
> > > > > Hello everyone,
> > > > >
> > > > > This patch set implements packed ring support in virtio driver.
> > > > >
> > > > > Some functional tests have been done with Jason's
> > > > > packed ring implementation in vhost:
> > > > >
> > > > > https://lkml.org/lkml/2018/7/3/33
> > > > >
> > > > > Both of ping and netperf worked as expected.
> > > > >
> > > > > v1 -> v2:
> > > > > - Use READ_ONCE() to read event off_wrap and flags together (Jason);
> > > > > - Add comments related to ccw (Jason);
> > > > >
> > > > > RFC (v6) -> v1:
> > > > > - Avoid extra virtio_wmb() in virtqueue_enable_cb_delayed_packed()
> > > > > when event idx is off (Jason);
> > > > > - Fix bufs calculation in virtqueue_enable_cb_delayed_packed() (Jason);
> > > > > - Test the state of the desc at used_idx instead of last_used_idx
> > > > > in virtqueue_enable_cb_delayed_packed() (Jason);
> > > > > - Save wrap counter (as part of queue state) in the return value
> > > > > of virtqueue_enable_cb_prepare_packed();
> > > > > - Refine the packed ring definitions in uapi;
> > > > > - Rebase on the net-next tree;
> > > > >
> > > > > RFC v5 -> RFC v6:
> > > > > - Avoid tracking addr/len/flags when DMA API isn't used (MST/Jason);
> > > > > - Define wrap counter as bool (Jason);
> > > > > - Use ALIGN() in vring_init_packed() (Jason);
> > > > > - Avoid using pointer to track `next` in detach_buf_packed() (Jason);
> > > > > - Add comments for barriers (Jason);
> > > > > - Don't enable RING_PACKED on ccw for now (noticed by Jason);
> > > > > - Refine the memory barrier in virtqueue_poll();
> > > > > - Add a missing memory barrier in virtqueue_enable_cb_delayed_packed();
> > > > > - Remove the hacks in virtqueue_enable_cb_prepare_packed();
> > > > >
> > > > > RFC v4 -> RFC v5:
> > > > > - Save DMA addr, etc in desc state (Jason);
> > > > > - Track used wrap counter;
> > > > >
> > > > > RFC v3 -> RFC v4:
> > > > > - Make ID allocation support out-of-order (Jason);
> > > > > - Various fixes for EVENT_IDX support;
> > > > >
> > > > > RFC v2 -> RFC v3:
> > > > > - Split into small patches (Jason);
> > > > > - Add helper virtqueue_use_indirect() (Jason);
> > > > > - Just set id for the last descriptor of a list (Jason);
> > > > > - Calculate the prev in virtqueue_add_packed() (Jason);
> > > > > - Fix/improve desc suppression code (Jason/MST);
> > > > > - Refine the code layout for XXX_split/packed and wrappers (MST);
> > > > > - Fix the comments and API in uapi (MST);
> > > > > - Remove the BUG_ON() for indirect (Jason);
> > > > > - Some other refinements and bug fixes;
> > > > >
> > > > > RFC v1 -> RFC v2:
> > > > > - Add indirect descriptor support - compile test only;
> > > > > - Add event suppression supprt - compile test only;
> > > > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > > > - Avoid using '%' operator (Jason);
> > > > > - Rename free_head -> next_avail_idx (Jason);
> > > > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > > > - Some other refinements and bug fixes;
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Tiwei Bie (5):
> > > > > virtio: add packed ring definitions
> > > > > virtio_ring: support creating packed ring
> > > > > virtio_ring: add packed ring support
> > > > > virtio_ring: add event idx support in packed ring
> > > > > virtio_ring: enable packed ring
> > > > >
> > > > > drivers/s390/virtio/virtio_ccw.c | 14 +
> > > > > drivers/virtio/virtio_ring.c | 1365 ++++++++++++++++++++++------
> > > > > include/linux/virtio_ring.h | 8 +-
> > > > > include/uapi/linux/virtio_config.h | 3 +
> > > > > include/uapi/linux/virtio_ring.h | 43 +
> > > > > 5 files changed, 1157 insertions(+), 276 deletions(-)
> > > > >
> > > > > --
> > > > > 2.18.0
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > >
^ permalink raw reply
* [PATCH v2 02/17] compat_ioctl: move drivers to generic_compat_ioctl_ptrarg
From: Arnd Bergmann @ 2018-09-12 15:01 UTC (permalink / raw)
To: viro
Cc: kvm, Alexander Shishkin, Jarkko Sakkinen, virtualization,
Benjamin Tissoires, linux-mtd, Peter Huewe, linux1394-devel,
devel, Jason Gunthorpe, Marek Vasut, linux-input, Tomas Winkler,
Arnd Bergmann, Jiri Kosina, Artem Bityutskiy, Greg Kroah-Hartman,
linux-usb, linux-kernel, Sudip Mukherjee, Stefan Richter, netdev,
linux-fsdevel, linux-integrity
In-Reply-To: <20180912150142.157913-1-arnd@arndb.de>
Each of these drivers has a copy of the same trivial helper function to
convert the pointer argument and then call the native ioctl handler.
We now have a generic implementation of that, so use it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/char/ppdev.c | 12 +---------
drivers/char/tpm/tpm_vtpm_proxy.c | 12 +---------
drivers/firewire/core-cdev.c | 12 +---------
drivers/hid/usbhid/hiddev.c | 11 +--------
drivers/hwtracing/stm/core.c | 12 +---------
drivers/misc/mei/main.c | 22 +----------------
drivers/mtd/ubi/cdev.c | 36 +++-------------------------
drivers/net/tap.c | 12 +---------
drivers/staging/pi433/pi433_if.c | 12 +---------
drivers/usb/core/devio.c | 16 +------------
drivers/vfio/vfio.c | 39 +++----------------------------
drivers/vhost/net.c | 12 +---------
drivers/vhost/scsi.c | 12 +---------
drivers/vhost/test.c | 12 +---------
drivers/vhost/vsock.c | 12 +---------
fs/fat/file.c | 13 +----------
16 files changed, 20 insertions(+), 237 deletions(-)
diff --git a/drivers/char/ppdev.c b/drivers/char/ppdev.c
index 1ae77b41050a..c38a62457cf0 100644
--- a/drivers/char/ppdev.c
+++ b/drivers/char/ppdev.c
@@ -674,14 +674,6 @@ static long pp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
return ret;
}
-#ifdef CONFIG_COMPAT
-static long pp_compat_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
-{
- return pp_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static int pp_open(struct inode *inode, struct file *file)
{
unsigned int minor = iminor(inode);
@@ -790,9 +782,7 @@ static const struct file_operations pp_fops = {
.write = pp_write,
.poll = pp_poll,
.unlocked_ioctl = pp_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = pp_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.open = pp_open,
.release = pp_release,
};
diff --git a/drivers/char/tpm/tpm_vtpm_proxy.c b/drivers/char/tpm/tpm_vtpm_proxy.c
index 87a0ce47f201..a170f5ca7416 100644
--- a/drivers/char/tpm/tpm_vtpm_proxy.c
+++ b/drivers/char/tpm/tpm_vtpm_proxy.c
@@ -678,20 +678,10 @@ static long vtpmx_fops_ioctl(struct file *f, unsigned int ioctl,
}
}
-#ifdef CONFIG_COMPAT
-static long vtpmx_fops_compat_ioctl(struct file *f, unsigned int ioctl,
- unsigned long arg)
-{
- return vtpmx_fops_ioctl(f, ioctl, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static const struct file_operations vtpmx_fops = {
.owner = THIS_MODULE,
.unlocked_ioctl = vtpmx_fops_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vtpmx_fops_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.llseek = noop_llseek,
};
diff --git a/drivers/firewire/core-cdev.c b/drivers/firewire/core-cdev.c
index d8e185582642..2acc0c9ddf94 100644
--- a/drivers/firewire/core-cdev.c
+++ b/drivers/firewire/core-cdev.c
@@ -1659,14 +1659,6 @@ static long fw_device_op_ioctl(struct file *file,
return dispatch_ioctl(file->private_data, cmd, (void __user *)arg);
}
-#ifdef CONFIG_COMPAT
-static long fw_device_op_compat_ioctl(struct file *file,
- unsigned int cmd, unsigned long arg)
-{
- return dispatch_ioctl(file->private_data, cmd, compat_ptr(arg));
-}
-#endif
-
static int fw_device_op_mmap(struct file *file, struct vm_area_struct *vma)
{
struct client *client = file->private_data;
@@ -1808,7 +1800,5 @@ const struct file_operations fw_device_ops = {
.mmap = fw_device_op_mmap,
.release = fw_device_op_release,
.poll = fw_device_op_poll,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = fw_device_op_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
};
diff --git a/drivers/hid/usbhid/hiddev.c b/drivers/hid/usbhid/hiddev.c
index 23872d08308c..73a168f97024 100644
--- a/drivers/hid/usbhid/hiddev.c
+++ b/drivers/hid/usbhid/hiddev.c
@@ -845,13 +845,6 @@ static long hiddev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
return r;
}
-#ifdef CONFIG_COMPAT
-static long hiddev_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
- return hiddev_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static const struct file_operations hiddev_fops = {
.owner = THIS_MODULE,
.read = hiddev_read,
@@ -861,9 +854,7 @@ static const struct file_operations hiddev_fops = {
.release = hiddev_release,
.unlocked_ioctl = hiddev_ioctl,
.fasync = hiddev_fasync,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = hiddev_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.llseek = noop_llseek,
};
diff --git a/drivers/hwtracing/stm/core.c b/drivers/hwtracing/stm/core.c
index 10bcb5d73f90..3f5cbb948781 100644
--- a/drivers/hwtracing/stm/core.c
+++ b/drivers/hwtracing/stm/core.c
@@ -651,23 +651,13 @@ stm_char_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
return err;
}
-#ifdef CONFIG_COMPAT
-static long
-stm_char_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
- return stm_char_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
-}
-#else
-#define stm_char_compat_ioctl NULL
-#endif
-
static const struct file_operations stm_fops = {
.open = stm_char_open,
.release = stm_char_release,
.write = stm_char_write,
.mmap = stm_char_mmap,
.unlocked_ioctl = stm_char_ioctl,
- .compat_ioctl = stm_char_compat_ioctl,
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.llseek = no_llseek,
};
diff --git a/drivers/misc/mei/main.c b/drivers/misc/mei/main.c
index 4d77a6ae183a..a26b645e432f 100644
--- a/drivers/misc/mei/main.c
+++ b/drivers/misc/mei/main.c
@@ -535,24 +535,6 @@ static long mei_ioctl(struct file *file, unsigned int cmd, unsigned long data)
return rets;
}
-/**
- * mei_compat_ioctl - the compat IOCTL function
- *
- * @file: pointer to file structure
- * @cmd: ioctl command
- * @data: pointer to mei message structure
- *
- * Return: 0 on success , <0 on error
- */
-#ifdef CONFIG_COMPAT
-static long mei_compat_ioctl(struct file *file,
- unsigned int cmd, unsigned long data)
-{
- return mei_ioctl(file, cmd, (unsigned long)compat_ptr(data));
-}
-#endif
-
-
/**
* mei_poll - the poll function
*
@@ -855,9 +837,7 @@ static const struct file_operations mei_fops = {
.owner = THIS_MODULE,
.read = mei_read,
.unlocked_ioctl = mei_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = mei_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.open = mei_open,
.release = mei_release,
.write = mei_write,
diff --git a/drivers/mtd/ubi/cdev.c b/drivers/mtd/ubi/cdev.c
index 22547d7a84ea..2aad1da86acc 100644
--- a/drivers/mtd/ubi/cdev.c
+++ b/drivers/mtd/ubi/cdev.c
@@ -1061,36 +1061,6 @@ static long ctrl_cdev_ioctl(struct file *file, unsigned int cmd,
return err;
}
-#ifdef CONFIG_COMPAT
-static long vol_cdev_compat_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
-{
- unsigned long translated_arg = (unsigned long)compat_ptr(arg);
-
- return vol_cdev_ioctl(file, cmd, translated_arg);
-}
-
-static long ubi_cdev_compat_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
-{
- unsigned long translated_arg = (unsigned long)compat_ptr(arg);
-
- return ubi_cdev_ioctl(file, cmd, translated_arg);
-}
-
-static long ctrl_cdev_compat_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
-{
- unsigned long translated_arg = (unsigned long)compat_ptr(arg);
-
- return ctrl_cdev_ioctl(file, cmd, translated_arg);
-}
-#else
-#define vol_cdev_compat_ioctl NULL
-#define ubi_cdev_compat_ioctl NULL
-#define ctrl_cdev_compat_ioctl NULL
-#endif
-
/* UBI volume character device operations */
const struct file_operations ubi_vol_cdev_operations = {
.owner = THIS_MODULE,
@@ -1101,7 +1071,7 @@ const struct file_operations ubi_vol_cdev_operations = {
.write = vol_cdev_write,
.fsync = vol_cdev_fsync,
.unlocked_ioctl = vol_cdev_ioctl,
- .compat_ioctl = vol_cdev_compat_ioctl,
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
};
/* UBI character device operations */
@@ -1109,13 +1079,13 @@ const struct file_operations ubi_cdev_operations = {
.owner = THIS_MODULE,
.llseek = no_llseek,
.unlocked_ioctl = ubi_cdev_ioctl,
- .compat_ioctl = ubi_cdev_compat_ioctl,
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
};
/* UBI control character device operations */
const struct file_operations ubi_ctrl_cdev_operations = {
.owner = THIS_MODULE,
.unlocked_ioctl = ctrl_cdev_ioctl,
- .compat_ioctl = ctrl_cdev_compat_ioctl,
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.llseek = no_llseek,
};
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index f0f7cd977667..720deb07f2b4 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1124,14 +1124,6 @@ static long tap_ioctl(struct file *file, unsigned int cmd,
}
}
-#ifdef CONFIG_COMPAT
-static long tap_compat_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
-{
- return tap_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static const struct file_operations tap_fops = {
.owner = THIS_MODULE,
.open = tap_open,
@@ -1141,9 +1133,7 @@ static const struct file_operations tap_fops = {
.poll = tap_poll,
.llseek = no_llseek,
.unlocked_ioctl = tap_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = tap_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
};
static int tap_sendmsg(struct socket *sock, struct msghdr *m,
diff --git a/drivers/staging/pi433/pi433_if.c b/drivers/staging/pi433/pi433_if.c
index c85a805a1243..9e4caf7ad384 100644
--- a/drivers/staging/pi433/pi433_if.c
+++ b/drivers/staging/pi433/pi433_if.c
@@ -945,16 +945,6 @@ pi433_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return retval;
}
-#ifdef CONFIG_COMPAT
-static long
-pi433_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
-{
- return pi433_ioctl(filp, cmd, (unsigned long)compat_ptr(arg));
-}
-#else
-#define pi433_compat_ioctl NULL
-#endif /* CONFIG_COMPAT */
-
/*-------------------------------------------------------------------------*/
static int pi433_open(struct inode *inode, struct file *filp)
@@ -1111,7 +1101,7 @@ static const struct file_operations pi433_fops = {
.write = pi433_write,
.read = pi433_read,
.unlocked_ioctl = pi433_ioctl,
- .compat_ioctl = pi433_compat_ioctl,
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.open = pi433_open,
.release = pi433_release,
.llseek = no_llseek,
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 6ce77b33da61..269e0befba2d 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -2553,18 +2553,6 @@ static long usbdev_ioctl(struct file *file, unsigned int cmd,
return ret;
}
-#ifdef CONFIG_COMPAT
-static long usbdev_compat_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
-{
- int ret;
-
- ret = usbdev_do_ioctl(file, cmd, compat_ptr(arg));
-
- return ret;
-}
-#endif
-
/* No kernel lock - fine */
static __poll_t usbdev_poll(struct file *file,
struct poll_table_struct *wait)
@@ -2588,9 +2576,7 @@ const struct file_operations usbdev_file_operations = {
.read = usbdev_read,
.poll = usbdev_poll,
.unlocked_ioctl = usbdev_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = usbdev_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.mmap = usbdev_mmap,
.open = usbdev_open,
.release = usbdev_release,
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 64833879f75d..79f08a99602d 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1200,15 +1200,6 @@ static long vfio_fops_unl_ioctl(struct file *filep,
return ret;
}
-#ifdef CONFIG_COMPAT
-static long vfio_fops_compat_ioctl(struct file *filep,
- unsigned int cmd, unsigned long arg)
-{
- arg = (unsigned long)compat_ptr(arg);
- return vfio_fops_unl_ioctl(filep, cmd, arg);
-}
-#endif /* CONFIG_COMPAT */
-
static int vfio_fops_open(struct inode *inode, struct file *filep)
{
struct vfio_container *container;
@@ -1291,9 +1282,7 @@ static const struct file_operations vfio_fops = {
.read = vfio_fops_read,
.write = vfio_fops_write,
.unlocked_ioctl = vfio_fops_unl_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vfio_fops_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.mmap = vfio_fops_mmap,
};
@@ -1572,15 +1561,6 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
return ret;
}
-#ifdef CONFIG_COMPAT
-static long vfio_group_fops_compat_ioctl(struct file *filep,
- unsigned int cmd, unsigned long arg)
-{
- arg = (unsigned long)compat_ptr(arg);
- return vfio_group_fops_unl_ioctl(filep, cmd, arg);
-}
-#endif /* CONFIG_COMPAT */
-
static int vfio_group_fops_open(struct inode *inode, struct file *filep)
{
struct vfio_group *group;
@@ -1636,9 +1616,7 @@ static int vfio_group_fops_release(struct inode *inode, struct file *filep)
static const struct file_operations vfio_group_fops = {
.owner = THIS_MODULE,
.unlocked_ioctl = vfio_group_fops_unl_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vfio_group_fops_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.open = vfio_group_fops_open,
.release = vfio_group_fops_release,
};
@@ -1703,24 +1681,13 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
return device->ops->mmap(device->device_data, vma);
}
-#ifdef CONFIG_COMPAT
-static long vfio_device_fops_compat_ioctl(struct file *filep,
- unsigned int cmd, unsigned long arg)
-{
- arg = (unsigned long)compat_ptr(arg);
- return vfio_device_fops_unl_ioctl(filep, cmd, arg);
-}
-#endif /* CONFIG_COMPAT */
-
static const struct file_operations vfio_device_fops = {
.owner = THIS_MODULE,
.release = vfio_device_fops_release,
.read = vfio_device_fops_read,
.write = vfio_device_fops_write,
.unlocked_ioctl = vfio_device_fops_unl_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vfio_device_fops_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.mmap = vfio_device_fops_mmap,
};
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 4e656f89cb22..e9624350f6a5 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1535,14 +1535,6 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
}
}
-#ifdef CONFIG_COMPAT
-static long vhost_net_compat_ioctl(struct file *f, unsigned int ioctl,
- unsigned long arg)
-{
- return vhost_net_ioctl(f, ioctl, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static ssize_t vhost_net_chr_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
struct file *file = iocb->ki_filp;
@@ -1578,9 +1570,7 @@ static const struct file_operations vhost_net_fops = {
.write_iter = vhost_net_chr_write_iter,
.poll = vhost_net_chr_poll,
.unlocked_ioctl = vhost_net_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vhost_net_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.open = vhost_net_open,
.llseek = noop_llseek,
};
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index c24bb690680b..9180d38de353 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1495,21 +1495,11 @@ vhost_scsi_ioctl(struct file *f,
}
}
-#ifdef CONFIG_COMPAT
-static long vhost_scsi_compat_ioctl(struct file *f, unsigned int ioctl,
- unsigned long arg)
-{
- return vhost_scsi_ioctl(f, ioctl, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static const struct file_operations vhost_scsi_fops = {
.owner = THIS_MODULE,
.release = vhost_scsi_release,
.unlocked_ioctl = vhost_scsi_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vhost_scsi_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.open = vhost_scsi_open,
.llseek = noop_llseek,
};
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 40589850eb33..0b185b4712fb 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -298,21 +298,11 @@ static long vhost_test_ioctl(struct file *f, unsigned int ioctl,
}
}
-#ifdef CONFIG_COMPAT
-static long vhost_test_compat_ioctl(struct file *f, unsigned int ioctl,
- unsigned long arg)
-{
- return vhost_test_ioctl(f, ioctl, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static const struct file_operations vhost_test_fops = {
.owner = THIS_MODULE,
.release = vhost_test_release,
.unlocked_ioctl = vhost_test_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vhost_test_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.open = vhost_test_open,
.llseek = noop_llseek,
};
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 34bc3ab40c6d..83c60f3a9c09 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -699,23 +699,13 @@ static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
}
}
-#ifdef CONFIG_COMPAT
-static long vhost_vsock_dev_compat_ioctl(struct file *f, unsigned int ioctl,
- unsigned long arg)
-{
- return vhost_vsock_dev_ioctl(f, ioctl, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static const struct file_operations vhost_vsock_fops = {
.owner = THIS_MODULE,
.open = vhost_vsock_dev_open,
.release = vhost_vsock_dev_release,
.llseek = noop_llseek,
.unlocked_ioctl = vhost_vsock_dev_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vhost_vsock_dev_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
};
static struct miscdevice vhost_vsock_misc = {
diff --git a/fs/fat/file.c b/fs/fat/file.c
index 4f3d72fb1e60..c52c9e9ca36b 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -171,15 +171,6 @@ long fat_generic_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
}
}
-#ifdef CONFIG_COMPAT
-static long fat_generic_compat_ioctl(struct file *filp, unsigned int cmd,
- unsigned long arg)
-
-{
- return fat_generic_ioctl(filp, cmd, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static int fat_file_release(struct inode *inode, struct file *filp)
{
if ((filp->f_mode & FMODE_WRITE) &&
@@ -209,9 +200,7 @@ const struct file_operations fat_file_operations = {
.mmap = generic_file_mmap,
.release = fat_file_release,
.unlocked_ioctl = fat_generic_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = fat_generic_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.fsync = fat_file_fsync,
.splice_read = generic_file_splice_read,
.fallocate = fat_fallocate,
--
2.18.0
^ permalink raw reply related
* Re: [PATCH net-next v2 2/5] virtio_ring: support creating packed ring
From: Michael S. Tsirkin @ 2018-09-12 16:12 UTC (permalink / raw)
To: Tiwei Bie; +Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180912075140.GA22545@debian>
On Wed, Sep 12, 2018 at 03:51:40PM +0800, Tiwei Bie wrote:
> On Mon, Sep 10, 2018 at 10:28:37AM +0800, Tiwei Bie wrote:
> > On Fri, Sep 07, 2018 at 10:03:24AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 11, 2018 at 10:27:08AM +0800, Tiwei Bie wrote:
> > > > This commit introduces the support for creating packed ring.
> > > > All split ring specific functions are added _split suffix.
> > > > Some necessary stubs for packed ring are also added.
> > > >
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > >
> [...]
> > > > +
> > > > +/*
> > > > + * The layout for the packed ring is a continuous chunk of memory
> > > > + * which looks like this.
> > > > + *
> > > > + * struct vring_packed {
> > > > + * // The actual descriptors (16 bytes each)
> > > > + * struct vring_packed_desc desc[num];
> > > > + *
> > > > + * // Padding to the next align boundary.
> > > > + * char pad[];
> > > > + *
> > > > + * // Driver Event Suppression
> > > > + * struct vring_packed_desc_event driver;
> > > > + *
> > > > + * // Device Event Suppression
> > > > + * struct vring_packed_desc_event device;
> > > > + * };
> > > > + */
> > >
> > > Why not just allocate event structures separately?
> > > Is it a win to have them share a cache line for some reason?
>
> Users may call vring_new_virtqueue() with preallocated
> memory to setup the ring, e.g.:
>
> https://github.com/torvalds/linux/blob/11da3a7f84f1/drivers/s390/virtio/virtio_ccw.c#L513-L522
> https://github.com/torvalds/linux/blob/11da3a7f84f1/drivers/misc/mic/vop/vop_main.c#L306-L307
>
> Below is the corresponding definition in split ring:
>
> https://github.com/oasis-tcs/virtio-spec/blob/89dd55f5e606/split-ring.tex#L64-L78
>
> If my understanding is correct, this is just for legacy
> interfaces, and we won't define layout in packed ring
> and don't need to support vring_new_virtqueue() in packed
> ring. Is it correct? Thanks!
mic doesn't support 1.0 yet but ccw does.
It's probably best to look into converting ccw to
virtio_create_virtqueue, then you can just fail
vring_new_virtqueue for packed.
Cornelia, are there any gotchas to look out for?
>
>
> >
> > Will do that.
> >
> > >
> > > > +static inline void vring_init_packed(struct vring_packed *vr, unsigned int num,
> > > > + void *p, unsigned long align)
> > > > +{
> > > > + vr->num = num;
> > > > + vr->desc = p;
> > > > + vr->driver = (void *)ALIGN(((uintptr_t)p +
> > > > + sizeof(struct vring_packed_desc) * num), align);
> > > > + vr->device = vr->driver + 1;
> > > > +}
> > >
> > > What's all this about alignment? Where does it come from?
> >
> > It comes from the `vring_align` parameter of vring_create_virtqueue()
> > and vring_new_virtqueue():
> >
> > https://github.com/torvalds/linux/blob/a49a9dcce802/drivers/virtio/virtio_ring.c#L1061
> > https://github.com/torvalds/linux/blob/a49a9dcce802/drivers/virtio/virtio_ring.c#L1123
> >
> > Should I just ignore it in packed ring?
> >
> > CCW defined this:
> >
> > #define KVM_VIRTIO_CCW_RING_ALIGN 4096
> >
> > I'm not familiar with CCW. Currently, in this patch set, packed ring
> > isn't enabled on CCW dues to some legacy accessors are not implemented
> > in packed ring yet.
> >
> > >
> > > > +
> > > > +static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
> > > > +{
> > > > + return ((sizeof(struct vring_packed_desc) * num + align - 1)
> > > > + & ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
> > > > +}
> > [...]
> > > > @@ -1129,10 +1388,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
> > > > void (*callback)(struct virtqueue *vq),
> > > > const char *name)
> > > > {
> > > > - struct vring vring;
> > > > - vring_init(&vring, num, pages, vring_align);
> > > > - return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> > > > - notify, callback, name);
> > > > + union vring_union vring;
> > > > + bool packed;
> > > > +
> > > > + packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> > > > + if (packed)
> > > > + vring_init_packed(&vring.vring_packed, num, pages, vring_align);
> > > > + else
> > > > + vring_init(&vring.vring_split, num, pages, vring_align);
> > >
> > >
> > > vring_init in the UAPI header is more or less a bug.
> > > I'd just stop using it, keep it around for legacy userspace.
> >
> > Got it. I'd like to do that. Thanks.
> >
> > >
> > > > +
> > > > + return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> > > > + context, notify, callback, name);
> > > > }
> > > > EXPORT_SYMBOL_GPL(vring_new_virtqueue);
> > > >
> > > > @@ -1142,7 +1408,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
> > > >
> > > > if (vq->we_own_ring) {
> > > > vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
> > > > - vq->vring.desc, vq->queue_dma_addr);
> > > > + vq->packed ? (void *)vq->vring_packed.desc :
> > > > + (void *)vq->vring.desc,
> > > > + vq->queue_dma_addr);
> > > > }
> > > > list_del(&_vq->list);
> > > > kfree(vq);
> > > > @@ -1184,7 +1452,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
> > > >
> > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > >
> > > > - return vq->vring.num;
> > > > + return vq->packed ? vq->vring_packed.num : vq->vring.num;
> > > > }
> > > > EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
> > > >
> > > > @@ -1227,6 +1495,10 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
> > > >
> > > > BUG_ON(!vq->we_own_ring);
> > > >
> > > > + if (vq->packed)
> > > > + return vq->queue_dma_addr + ((char *)vq->vring_packed.driver -
> > > > + (char *)vq->vring_packed.desc);
> > > > +
> > > > return vq->queue_dma_addr +
> > > > ((char *)vq->vring.avail - (char *)vq->vring.desc);
> > > > }
> > > > @@ -1238,11 +1510,16 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
> > > >
> > > > BUG_ON(!vq->we_own_ring);
> > > >
> > > > + if (vq->packed)
> > > > + return vq->queue_dma_addr + ((char *)vq->vring_packed.device -
> > > > + (char *)vq->vring_packed.desc);
> > > > +
> > > > return vq->queue_dma_addr +
> > > > ((char *)vq->vring.used - (char *)vq->vring.desc);
> > > > }
> > > > EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
> > > >
> > > > +/* Only available for split ring */
> > > > const struct vring *virtqueue_get_vring(struct virtqueue *vq)
> > > > {
> > > > return &to_vvq(vq)->vring;
> > > > diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> > > > index fab02133a919..992142b35f55 100644
> > > > --- a/include/linux/virtio_ring.h
> > > > +++ b/include/linux/virtio_ring.h
> > > > @@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
> > > > struct virtio_device;
> > > > struct virtqueue;
> > > >
> > > > +union vring_union {
> > > > + struct vring vring_split;
> > > > + struct vring_packed vring_packed;
> > > > +};
> > > > +
> > > > /*
> > > > * Creates a virtqueue and allocates the descriptor ring. If
> > > > * may_reduce_num is set, then this may allocate a smaller ring than
> > > > @@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
> > > >
> > > > /* Creates a virtqueue with a custom layout. */
> > > > struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > - struct vring vring,
> > > > + union vring_union vring,
> > > > + bool packed,
> > > > struct virtio_device *vdev,
> > > > bool weak_barriers,
> > > > bool ctx,
> > > > --
> > > > 2.18.0
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH net-next v2 0/5] virtio: support packed ring
From: Michael S. Tsirkin @ 2018-09-12 16:16 UTC (permalink / raw)
To: Tiwei Bie; +Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180911053726.GA7472@debian>
On Tue, Sep 11, 2018 at 01:37:26PM +0800, Tiwei Bie wrote:
> On Mon, Sep 10, 2018 at 11:33:17AM +0800, Jason Wang wrote:
> > On 2018年09月10日 11:00, Tiwei Bie wrote:
> > > On Fri, Sep 07, 2018 at 09:00:49AM -0400, Michael S. Tsirkin wrote:
> > > > On Fri, Sep 07, 2018 at 09:22:25AM +0800, Tiwei Bie wrote:
> > > > > On Mon, Aug 27, 2018 at 05:00:40PM +0300, Michael S. Tsirkin wrote:
> > > > > > Are there still plans to test the performance with vost pmd?
> > > > > > vhost doesn't seem to show a performance gain ...
> > > > > >
> > > > > I tried some performance tests with vhost PMD. In guest, the
> > > > > XDP program will return XDP_DROP directly. And in host, testpmd
> > > > > will do txonly fwd.
> > > > >
> > > > > When burst size is 1 and packet size is 64 in testpmd and
> > > > > testpmd needs to iterate 5 Tx queues (but only the first two
> > > > > queues are enabled) to prepare and inject packets, I got ~12%
> > > > > performance boost (5.7Mpps -> 6.4Mpps). And if the vhost PMD
> > > > > is faster (e.g. just need to iterate the first two queues to
> > > > > prepare and inject packets), then I got similar performance
> > > > > for both rings (~9.9Mpps) (packed ring's performance can be
> > > > > lower, because it's more complicated in driver.)
> > > > >
> > > > > I think packed ring makes vhost PMD faster, but it doesn't make
> > > > > the driver faster. In packed ring, the ring is simplified, and
> > > > > the handling of the ring in vhost (device) is also simplified,
> > > > > but things are not simplified in driver, e.g. although there is
> > > > > no desc table in the virtqueue anymore, driver still needs to
> > > > > maintain a private desc state table (which is still managed as
> > > > > a list in this patch set) to support the out-of-order desc
> > > > > processing in vhost (device).
> > > > >
> > > > > I think this patch set is mainly to make the driver have a full
> > > > > functional support for the packed ring, which makes it possible
> > > > > to leverage the packed ring feature in vhost (device). But I'm
> > > > > not sure whether there is any other better idea, I'd like to
> > > > > hear your thoughts. Thanks!
> > > > Just this: Jens seems to report a nice gain with virtio and
> > > > vhost pmd across the board. Try to compare virtio and
> > > > virtio pmd to see what does pmd do better?
> > > The virtio PMD (drivers/net/virtio) in DPDK doesn't need to share
> > > the virtio ring operation code with other drivers and is highly
> > > optimized for network. E.g. in Rx, the Rx burst function won't
> > > chain descs. So the ID management for the Rx ring can be quite
> > > simple and straightforward, we just need to initialize these IDs
> > > when initializing the ring and don't need to change these IDs
> > > in data path anymore (the mergable Rx code in that patch set
> > > assumes the descs will be written back in order, which should be
> > > fixed. I.e., the ID in the desc should be used to index vq->descx[]).
> > > The Tx code in that patch set also assumes the descs will be
> > > written back by device in order, which should be fixed.
> >
> > Yes it is. I think I've pointed it out in some early version of pmd patch.
> > So I suspect part (or all) of the boost may come from in order feature.
> >
> > >
> > > But in kernel virtio driver, the virtio_ring.c is very generic.
> > > The enqueue (virtqueue_add()) and dequeue (virtqueue_get_buf_ctx())
> > > functions need to support all the virtio devices and should be
> > > able to handle all the possible cases that may happen. So although
> > > the packed ring can be very efficient in some cases, currently
> > > the room to optimize the performance in kernel's virtio_ring.c
> > > isn't that much. If we want to take the fully advantage of the
> > > packed ring's efficiency, we need some further e.g. API changes
> > > in virtio_ring.c, which shouldn't be part of this patch set.
> >
> > Could you please share more thoughts on this e.g how to improve the API?
> > Notice since the API is shared by both split ring and packed ring, it may
> > improve the performance of split ring as well. One can easily imagine a
> > batching API, but it does not have many real users now, the only case is the
> > XDP transmission which can accept an array of XDP frames.
>
> I don't have detailed thoughts on this yet. But kernel's
> virtio_ring.c is quite generic compared with what we did
> in virtio PMD.
In what way? What are some things that aren't implemented there?
If what you say is true then we should take a careful look
and not supporting these generic things with packed layout.
Once we do support them it will be too late and we won't
be able to get performance back.
> >
> > > So
> > > I still think this patch set is mainly to make the kernel virtio
> > > driver to have a full functional support of the packed ring, and
> > > we can't expect impressive performance boost with it.
> >
> > We can only gain when virtio ring layout is the bottleneck. If there're
> > bottlenecks elsewhere, we probably won't see any increasing in the numbers.
> > Vhost-net is an example, and lots of optimizations have proved that virtio
> > ring is not the main bottleneck for the current codes. I suspect it also the
> > case of virtio driver. Did perf tell us any interesting things in virtio
> > driver?
> >
> > Thanks
> >
> > >
> > > >
> > > > > > On Wed, Jul 11, 2018 at 10:27:06AM +0800, Tiwei Bie wrote:
> > > > > > > Hello everyone,
> > > > > > >
> > > > > > > This patch set implements packed ring support in virtio driver.
> > > > > > >
> > > > > > > Some functional tests have been done with Jason's
> > > > > > > packed ring implementation in vhost:
> > > > > > >
> > > > > > > https://lkml.org/lkml/2018/7/3/33
> > > > > > >
> > > > > > > Both of ping and netperf worked as expected.
> > > > > > >
> > > > > > > v1 -> v2:
> > > > > > > - Use READ_ONCE() to read event off_wrap and flags together (Jason);
> > > > > > > - Add comments related to ccw (Jason);
> > > > > > >
> > > > > > > RFC (v6) -> v1:
> > > > > > > - Avoid extra virtio_wmb() in virtqueue_enable_cb_delayed_packed()
> > > > > > > when event idx is off (Jason);
> > > > > > > - Fix bufs calculation in virtqueue_enable_cb_delayed_packed() (Jason);
> > > > > > > - Test the state of the desc at used_idx instead of last_used_idx
> > > > > > > in virtqueue_enable_cb_delayed_packed() (Jason);
> > > > > > > - Save wrap counter (as part of queue state) in the return value
> > > > > > > of virtqueue_enable_cb_prepare_packed();
> > > > > > > - Refine the packed ring definitions in uapi;
> > > > > > > - Rebase on the net-next tree;
> > > > > > >
> > > > > > > RFC v5 -> RFC v6:
> > > > > > > - Avoid tracking addr/len/flags when DMA API isn't used (MST/Jason);
> > > > > > > - Define wrap counter as bool (Jason);
> > > > > > > - Use ALIGN() in vring_init_packed() (Jason);
> > > > > > > - Avoid using pointer to track `next` in detach_buf_packed() (Jason);
> > > > > > > - Add comments for barriers (Jason);
> > > > > > > - Don't enable RING_PACKED on ccw for now (noticed by Jason);
> > > > > > > - Refine the memory barrier in virtqueue_poll();
> > > > > > > - Add a missing memory barrier in virtqueue_enable_cb_delayed_packed();
> > > > > > > - Remove the hacks in virtqueue_enable_cb_prepare_packed();
> > > > > > >
> > > > > > > RFC v4 -> RFC v5:
> > > > > > > - Save DMA addr, etc in desc state (Jason);
> > > > > > > - Track used wrap counter;
> > > > > > >
> > > > > > > RFC v3 -> RFC v4:
> > > > > > > - Make ID allocation support out-of-order (Jason);
> > > > > > > - Various fixes for EVENT_IDX support;
> > > > > > >
> > > > > > > RFC v2 -> RFC v3:
> > > > > > > - Split into small patches (Jason);
> > > > > > > - Add helper virtqueue_use_indirect() (Jason);
> > > > > > > - Just set id for the last descriptor of a list (Jason);
> > > > > > > - Calculate the prev in virtqueue_add_packed() (Jason);
> > > > > > > - Fix/improve desc suppression code (Jason/MST);
> > > > > > > - Refine the code layout for XXX_split/packed and wrappers (MST);
> > > > > > > - Fix the comments and API in uapi (MST);
> > > > > > > - Remove the BUG_ON() for indirect (Jason);
> > > > > > > - Some other refinements and bug fixes;
> > > > > > >
> > > > > > > RFC v1 -> RFC v2:
> > > > > > > - Add indirect descriptor support - compile test only;
> > > > > > > - Add event suppression supprt - compile test only;
> > > > > > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > > > > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > > > > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > > > > > - Avoid using '%' operator (Jason);
> > > > > > > - Rename free_head -> next_avail_idx (Jason);
> > > > > > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > > > > > - Some other refinements and bug fixes;
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > Tiwei Bie (5):
> > > > > > > virtio: add packed ring definitions
> > > > > > > virtio_ring: support creating packed ring
> > > > > > > virtio_ring: add packed ring support
> > > > > > > virtio_ring: add event idx support in packed ring
> > > > > > > virtio_ring: enable packed ring
> > > > > > >
> > > > > > > drivers/s390/virtio/virtio_ccw.c | 14 +
> > > > > > > drivers/virtio/virtio_ring.c | 1365 ++++++++++++++++++++++------
> > > > > > > include/linux/virtio_ring.h | 8 +-
> > > > > > > include/uapi/linux/virtio_config.h | 3 +
> > > > > > > include/uapi/linux/virtio_ring.h | 43 +
> > > > > > > 5 files changed, 1157 insertions(+), 276 deletions(-)
> > > > > > >
> > > > > > > --
> > > > > > > 2.18.0
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > >
> >
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v2 02/17] compat_ioctl: move drivers to generic_compat_ioctl_ptrarg
From: Arnd Bergmann @ 2018-09-12 16:20 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: kvm, Alexander Shishkin, Jarkko Sakkinen, virtualization,
Benjamin Tissoires, linux-mtd, Peter Huewe, linux1394-devel,
driverdevel, Marek Vasut, open list:HID CORE LAYER,
Winkler, Tomas, Jiri Kosina, Al Viro, Artem Bityutskiy, gregkh,
USB list, Linux Kernel Mailing List, Sudip Mukherjee,
Stefan Richter, Networking
In-Reply-To: <20180912153300.GE5633@ziepe.ca>
On Wed, Sep 12, 2018 at 5:33 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Sep 12, 2018 at 05:01:03PM +0200, Arnd Bergmann wrote:
> > Each of these drivers has a copy of the same trivial helper function to
> > convert the pointer argument and then call the native ioctl handler.
> >
> > We now have a generic implementation of that, so use it.
> >
>
> For vtpm:
>
> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
>
> Arnd, would you consider including a patch as part of/after this
> series to make compat_ioctl in drivers/infiniband/core/uverbs_main.c
> use this as well? Looks like a bug too?
That should be included in patch 5 in this series. I may have skipped
some Cc there since it had too many recipients (sent only to the
mailing lists instead).
Arnd
^ permalink raw reply
* Re: [PATCH net-next v2 3/5] virtio_ring: add packed ring support
From: Michael S. Tsirkin @ 2018-09-12 16:22 UTC (permalink / raw)
To: Tiwei Bie; +Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180711022711.7090-4-tiwei.bie@intel.com>
On Wed, Jul 11, 2018 at 10:27:09AM +0800, Tiwei Bie wrote:
> This commit introduces the support (without EVENT_IDX) for
> packed ring.
>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
> drivers/virtio/virtio_ring.c | 495 ++++++++++++++++++++++++++++++++++-
> 1 file changed, 487 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c4f8abc7445a..f317b485ba54 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -55,12 +55,21 @@
> #define END_USE(vq)
> #endif
>
> +#define _VRING_DESC_F_AVAIL(b) ((__u16)(b) << 7)
> +#define _VRING_DESC_F_USED(b) ((__u16)(b) << 15)
> +
> struct vring_desc_state {
> void *data; /* Data for callback. */
> struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> };
>
> struct vring_desc_state_packed {
> + void *data; /* Data for callback. */
> + struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
> + int num; /* Descriptor list length. */
> + dma_addr_t addr; /* Buffer DMA addr. */
> + u32 len; /* Buffer length. */
> + u16 flags; /* Descriptor flags. */
> int next; /* The next desc state. */
> };
>
Idea: how about using data to chain these descriptors?
You can validate it's pointing within an array to distinguish
e.g. on detach.
--
MST
^ permalink raw reply
* Re: [PATCH v2 02/17] compat_ioctl: move drivers to generic_compat_ioctl_ptrarg
From: Greg Kroah-Hartman @ 2018-09-12 18:13 UTC (permalink / raw)
To: Arnd Bergmann
Cc: kvm, Alexander Shishkin, Jarkko Sakkinen, virtualization,
Benjamin Tissoires, linux-mtd, Peter Huewe, linux1394-devel,
devel, Jason Gunthorpe, Marek Vasut, linux-input, Tomas Winkler,
Jiri Kosina, viro, Artem Bityutskiy, netdev, linux-usb,
linux-kernel, Sudip Mukherjee, Stefan Richter, linux-fsdevel,
linux-integrity, David S. Miller
In-Reply-To: <20180912150142.157913-2-arnd@arndb.de>
On Wed, Sep 12, 2018 at 05:01:03PM +0200, Arnd Bergmann wrote:
> Each of these drivers has a copy of the same trivial helper function to
> convert the pointer argument and then call the native ioctl handler.
>
> We now have a generic implementation of that, so use it.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
^ permalink raw reply
* [PATCH net-next V2] virtio_net: ethtool tx napi configuration
From: Jason Wang @ 2018-09-13 5:35 UTC (permalink / raw)
To: mst, jasowang
Cc: netdev, Willem de Bruijn, davem, linux-kernel, virtualization
Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers.
Interrupt moderation is currently not supported, so these accept and
display the default settings of 0 usec and 1 frame.
Toggle tx napi through a bit in tx-frames. So as to not interfere
with possible future interrupt moderation, value 1 means tx napi while
value 0 means not.
To properly synchronize with the data path, tx napi is disabled and
tx lock is held when changing the value of napi weight. And two more
places that can access tx napi weight:
- speculative tx polling in rx napi, we can leave it as is since it
not a must for correctness.
- skb_xmit_done(), one more check of napi weight is added before
trying to enable tx to avoid tx to be disabled forever if napi is
disabled after skb_xmit_done() but before the napi
Link: https://patchwork.ozlabs.org/patch/948149/
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
Changes from V1:
- try to synchronize with datapath to allow changing mode when
interface is up.
- use tx-frames 0 as to disable tx napi while tx-frames 1 to enable tx napi
---
drivers/net/virtio_net.c | 64 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 63 insertions(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 765920905226..6e70864f5899 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -66,6 +66,8 @@ DECLARE_EWMA(pkt_len, 0, 64)
#define VIRTNET_DRIVER_VERSION "1.0.0"
+static const u32 ethtool_coalesce_napi_mask = (1UL << 10);
+
static const unsigned long guest_offloads[] = {
VIRTIO_NET_F_GUEST_TSO4,
VIRTIO_NET_F_GUEST_TSO6,
@@ -1444,7 +1446,10 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
virtqueue_napi_complete(napi, sq->vq, 0);
- if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+ /* Check napi.weight to avoid tx stall since it could be set
+ * to zero by ethtool after skb_xmit_done().
+ */
+ if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS || !sq->napi.weight)
netif_tx_wake_queue(txq);
return 0;
@@ -2181,6 +2186,61 @@ static int virtnet_get_link_ksettings(struct net_device *dev,
return 0;
}
+static int virtnet_set_coalesce(struct net_device *dev,
+ struct ethtool_coalesce *ec)
+{
+ struct ethtool_coalesce ec_default = {
+ .cmd = ETHTOOL_SCOALESCE,
+ .rx_max_coalesced_frames = 1,
+ };
+ struct virtnet_info *vi = netdev_priv(dev);
+ int i, napi_weight;
+
+ if (ec->tx_max_coalesced_frames > 1)
+ return -EINVAL;
+
+ ec_default.tx_max_coalesced_frames = ec->tx_max_coalesced_frames;
+ napi_weight = ec->tx_max_coalesced_frames ? NAPI_POLL_WEIGHT : 0;
+
+ /* disallow changes to fields not explicitly tested above */
+ if (memcmp(ec, &ec_default, sizeof(ec_default)))
+ return -EINVAL;
+
+ if (napi_weight ^ vi->sq[0].napi.weight) {
+ for (i = 0; i < vi->max_queue_pairs; i++) {
+ struct netdev_queue *txq =
+ netdev_get_tx_queue(vi->dev, i);
+
+ virtnet_napi_tx_disable(&vi->sq[i].napi);
+ __netif_tx_lock_bh(txq);
+ vi->sq[i].napi.weight = napi_weight;
+ __netif_tx_unlock_bh(txq);
+ virtnet_napi_tx_enable(vi, vi->sq[i].vq,
+ &vi->sq[i].napi);
+ }
+ }
+
+ return 0;
+}
+
+static int virtnet_get_coalesce(struct net_device *dev,
+ struct ethtool_coalesce *ec)
+{
+ struct ethtool_coalesce ec_default = {
+ .cmd = ETHTOOL_GCOALESCE,
+ .rx_max_coalesced_frames = 1,
+ .tx_max_coalesced_frames = 0,
+ };
+ struct virtnet_info *vi = netdev_priv(dev);
+
+ memcpy(ec, &ec_default, sizeof(ec_default));
+
+ if (vi->sq[0].napi.weight)
+ ec->tx_max_coalesced_frames = 1;
+
+ return 0;
+}
+
static void virtnet_init_settings(struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
@@ -2219,6 +2279,8 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
.get_ts_info = ethtool_op_get_ts_info,
.get_link_ksettings = virtnet_get_link_ksettings,
.set_link_ksettings = virtnet_set_link_ksettings,
+ .set_coalesce = virtnet_set_coalesce,
+ .get_coalesce = virtnet_get_coalesce,
};
static void virtnet_freeze_down(struct virtio_device *vdev)
--
2.17.1
^ permalink raw reply related
* [PATCH 2/3] virtio-gpu: add VIRTIO_GPU_F_EDID feature
From: Gerd Hoffmann @ 2018-09-13 8:12 UTC (permalink / raw)
To: virtio-dev, dri-devel
Cc: Michael S. Tsirkin, David Airlie, open list,
open list:VIRTIO CORE, NET AND BLOCK DRIVERS
In-Reply-To: <20180913081259.27457-1-kraxel@redhat.com>
The feature allows the guest request an EDID blob (describing monitor
capabilities) for a given scanout (aka virtual monitor connector).
It brings a new command message, which has just a scanout field (beside
the standard virtio-gpu header) and a response message which carries the
EDID data.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
include/uapi/linux/virtio_gpu.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/include/uapi/linux/virtio_gpu.h b/include/uapi/linux/virtio_gpu.h
index f43c3c6171..7267c3d2d6 100644
--- a/include/uapi/linux/virtio_gpu.h
+++ b/include/uapi/linux/virtio_gpu.h
@@ -41,6 +41,7 @@
#include <linux/types.h>
#define VIRTIO_GPU_F_VIRGL 0
+#define VIRTIO_GPU_F_EDID 1
enum virtio_gpu_ctrl_type {
VIRTIO_GPU_UNDEFINED = 0,
@@ -56,6 +57,7 @@ enum virtio_gpu_ctrl_type {
VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING,
VIRTIO_GPU_CMD_GET_CAPSET_INFO,
VIRTIO_GPU_CMD_GET_CAPSET,
+ VIRTIO_GPU_CMD_GET_EDID,
/* 3d commands */
VIRTIO_GPU_CMD_CTX_CREATE = 0x0200,
@@ -76,6 +78,7 @@ enum virtio_gpu_ctrl_type {
VIRTIO_GPU_RESP_OK_DISPLAY_INFO,
VIRTIO_GPU_RESP_OK_CAPSET_INFO,
VIRTIO_GPU_RESP_OK_CAPSET,
+ VIRTIO_GPU_RESP_OK_EDID,
/* error responses */
VIRTIO_GPU_RESP_ERR_UNSPEC = 0x1200,
@@ -291,6 +294,20 @@ struct virtio_gpu_resp_capset {
__u8 capset_data[];
};
+/* VIRTIO_GPU_CMD_GET_EDID */
+struct virtio_gpu_get_edid {
+ struct virtio_gpu_ctrl_hdr hdr;
+ __le32 scanout;
+};
+
+/* VIRTIO_GPU_RESP_OK_EDID */
+struct virtio_gpu_resp_edid {
+ struct virtio_gpu_ctrl_hdr hdr;
+ __le32 scanout;
+ __le32 size;
+ __u8 edid[512];
+};
+
#define VIRTIO_GPU_EVENT_DISPLAY (1 << 0)
struct virtio_gpu_config {
--
2.9.3
^ permalink raw reply related
* [PATCH 3/3] drm/virtio: add edid support
From: Gerd Hoffmann @ 2018-09-13 8:12 UTC (permalink / raw)
To: virtio-dev, dri-devel
Cc: David Airlie, open list, open list:VIRTIO GPU DRIVER
In-Reply-To: <20180913081259.27457-1-kraxel@redhat.com>
linux guest driver implementation of the VIRTIO_GPU_F_EDID feature.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
drivers/gpu/drm/virtio/virtgpu_drv.h | 3 ++
drivers/gpu/drm/virtio/virtgpu_display.c | 6 ++++
drivers/gpu/drm/virtio/virtgpu_drv.c | 1 +
drivers/gpu/drm/virtio/virtgpu_kms.c | 8 +++++
drivers/gpu/drm/virtio/virtgpu_vq.c | 59 ++++++++++++++++++++++++++++++++
5 files changed, 77 insertions(+)
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 65605e207b..58358a48df 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -112,6 +112,7 @@ struct virtio_gpu_output {
struct drm_encoder enc;
struct virtio_gpu_display_one info;
struct virtio_gpu_update_cursor cursor;
+ struct edid *edid;
int cur_x;
int cur_y;
};
@@ -194,6 +195,7 @@ struct virtio_gpu_device {
spinlock_t ctx_id_idr_lock;
bool has_virgl_3d;
+ bool has_edid;
struct work_struct config_changed_work;
@@ -287,6 +289,7 @@ int virtio_gpu_cmd_get_capset_info(struct virtio_gpu_device *vgdev, int idx);
int virtio_gpu_cmd_get_capset(struct virtio_gpu_device *vgdev,
int idx, int version,
struct virtio_gpu_drv_cap_cache **cache_p);
+int virtio_gpu_cmd_get_edids(struct virtio_gpu_device *vgdev);
void virtio_gpu_cmd_context_create(struct virtio_gpu_device *vgdev, uint32_t id,
uint32_t nlen, const char *name);
void virtio_gpu_cmd_context_destroy(struct virtio_gpu_device *vgdev,
diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c b/drivers/gpu/drm/virtio/virtgpu_display.c
index 25503b9335..e573eab1f1 100644
--- a/drivers/gpu/drm/virtio/virtgpu_display.c
+++ b/drivers/gpu/drm/virtio/virtgpu_display.c
@@ -168,6 +168,12 @@ static int virtio_gpu_conn_get_modes(struct drm_connector *connector)
struct drm_display_mode *mode = NULL;
int count, width, height;
+ if (output->edid) {
+ count = drm_add_edid_modes(connector, output->edid);
+ if (count)
+ return count;
+ }
+
width = le32_to_cpu(output->info.r.width);
height = le32_to_cpu(output->info.r.height);
count = drm_add_modes_noedid(connector, XRES_MAX, YRES_MAX);
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
index d9287c144f..f7f32a885a 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -80,6 +80,7 @@ static unsigned int features[] = {
*/
VIRTIO_GPU_F_VIRGL,
#endif
+ VIRTIO_GPU_F_EDID,
};
static struct virtio_driver virtio_gpu_driver = {
.feature_table = features,
diff --git a/drivers/gpu/drm/virtio/virtgpu_kms.c b/drivers/gpu/drm/virtio/virtgpu_kms.c
index 65060c0852..f6cbbb27e4 100644
--- a/drivers/gpu/drm/virtio/virtgpu_kms.c
+++ b/drivers/gpu/drm/virtio/virtgpu_kms.c
@@ -44,6 +44,8 @@ static void virtio_gpu_config_changed_work_func(struct work_struct *work)
virtio_cread(vgdev->vdev, struct virtio_gpu_config,
events_read, &events_read);
if (events_read & VIRTIO_GPU_EVENT_DISPLAY) {
+ if (vgdev->has_edid)
+ virtio_gpu_cmd_get_edids(vgdev);
virtio_gpu_cmd_get_display_info(vgdev);
drm_helper_hpd_irq_event(vgdev->ddev);
events_clear |= VIRTIO_GPU_EVENT_DISPLAY;
@@ -174,6 +176,10 @@ int virtio_gpu_driver_load(struct drm_device *dev, unsigned long flags)
#else
DRM_INFO("virgl 3d acceleration not supported by guest\n");
#endif
+ if (virtio_has_feature(vgdev->vdev, VIRTIO_GPU_F_EDID)) {
+ vgdev->has_edid = true;
+ DRM_INFO("EDID support available.\n");
+ }
ret = virtio_find_vqs(vgdev->vdev, 2, vqs, callbacks, names, NULL);
if (ret) {
@@ -219,6 +225,8 @@ int virtio_gpu_driver_load(struct drm_device *dev, unsigned long flags)
if (num_capsets)
virtio_gpu_get_capsets(vgdev, num_capsets);
+ if (vgdev->has_edid)
+ virtio_gpu_cmd_get_edids(vgdev);
virtio_gpu_cmd_get_display_info(vgdev);
wait_event_timeout(vgdev->resp_wq, !vgdev->display_info_pending,
5 * HZ);
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 020070d483..7d79211d1f 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -596,6 +596,37 @@ static void virtio_gpu_cmd_capset_cb(struct virtio_gpu_device *vgdev,
wake_up(&vgdev->resp_wq);
}
+static void virtio_gpu_cmd_get_edid_cb(struct virtio_gpu_device *vgdev,
+ struct virtio_gpu_vbuffer *vbuf)
+{
+ struct virtio_gpu_resp_edid *resp =
+ (struct virtio_gpu_resp_edid *)vbuf->resp_buf;
+ uint32_t scanout = le32_to_cpu(resp->scanout);
+ uint32_t size = le32_to_cpu(resp->size);
+ struct virtio_gpu_output *output;
+ struct edid *new_edid, *old_edid;
+
+ if (scanout >= vgdev->num_scanouts)
+ return;
+ output = vgdev->outputs + scanout;
+
+ if (drm_edid_is_valid((struct edid *)resp->edid)) {
+ new_edid = kmalloc(size, GFP_KERNEL);
+ memcpy(new_edid, resp->edid, size);
+ } else {
+ new_edid = NULL;
+ }
+
+ spin_lock(&vgdev->display_info_lock);
+ old_edid = output->edid;
+ output->edid = new_edid;
+ drm_connector_update_edid_property(&output->conn, output->edid);
+ spin_unlock(&vgdev->display_info_lock);
+
+ kfree(old_edid);
+ wake_up(&vgdev->resp_wq);
+}
+
int virtio_gpu_cmd_get_display_info(struct virtio_gpu_device *vgdev)
{
struct virtio_gpu_ctrl_hdr *cmd_p;
@@ -697,6 +728,34 @@ int virtio_gpu_cmd_get_capset(struct virtio_gpu_device *vgdev,
return 0;
}
+int virtio_gpu_cmd_get_edids(struct virtio_gpu_device *vgdev)
+{
+ struct virtio_gpu_get_edid *cmd_p;
+ struct virtio_gpu_vbuffer *vbuf;
+ void *resp_buf;
+ int scanout;
+
+ if (WARN_ON(!vgdev->has_edid))
+ return -EINVAL;
+
+ for (scanout = 0; scanout < vgdev->num_scanouts; scanout++) {
+ resp_buf = kzalloc(sizeof(struct virtio_gpu_resp_edid),
+ GFP_KERNEL);
+ if (!resp_buf)
+ return -ENOMEM;
+
+ cmd_p = virtio_gpu_alloc_cmd_resp
+ (vgdev, &virtio_gpu_cmd_get_edid_cb, &vbuf,
+ sizeof(*cmd_p), sizeof(struct virtio_gpu_resp_edid),
+ resp_buf);
+ cmd_p->hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_GET_EDID);
+ cmd_p->scanout = cpu_to_le32(scanout);
+ virtio_gpu_queue_ctrl_buffer(vgdev, vbuf);
+ }
+
+ return 0;
+}
+
void virtio_gpu_cmd_context_create(struct virtio_gpu_device *vgdev, uint32_t id,
uint32_t nlen, const char *name)
{
--
2.9.3
^ permalink raw reply related
* Re: [virtio-dev] Re: [PATCH net-next v2 0/5] virtio: support packed ring
From: Tiwei Bie @ 2018-09-13 8:59 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180912121457-mutt-send-email-mst@kernel.org>
On Wed, Sep 12, 2018 at 12:16:32PM -0400, Michael S. Tsirkin wrote:
> On Tue, Sep 11, 2018 at 01:37:26PM +0800, Tiwei Bie wrote:
> > On Mon, Sep 10, 2018 at 11:33:17AM +0800, Jason Wang wrote:
> > > On 2018年09月10日 11:00, Tiwei Bie wrote:
> > > > On Fri, Sep 07, 2018 at 09:00:49AM -0400, Michael S. Tsirkin wrote:
> > > > > On Fri, Sep 07, 2018 at 09:22:25AM +0800, Tiwei Bie wrote:
> > > > > > On Mon, Aug 27, 2018 at 05:00:40PM +0300, Michael S. Tsirkin wrote:
> > > > > > > Are there still plans to test the performance with vost pmd?
> > > > > > > vhost doesn't seem to show a performance gain ...
> > > > > > >
> > > > > > I tried some performance tests with vhost PMD. In guest, the
> > > > > > XDP program will return XDP_DROP directly. And in host, testpmd
> > > > > > will do txonly fwd.
> > > > > >
> > > > > > When burst size is 1 and packet size is 64 in testpmd and
> > > > > > testpmd needs to iterate 5 Tx queues (but only the first two
> > > > > > queues are enabled) to prepare and inject packets, I got ~12%
> > > > > > performance boost (5.7Mpps -> 6.4Mpps). And if the vhost PMD
> > > > > > is faster (e.g. just need to iterate the first two queues to
> > > > > > prepare and inject packets), then I got similar performance
> > > > > > for both rings (~9.9Mpps) (packed ring's performance can be
> > > > > > lower, because it's more complicated in driver.)
> > > > > >
> > > > > > I think packed ring makes vhost PMD faster, but it doesn't make
> > > > > > the driver faster. In packed ring, the ring is simplified, and
> > > > > > the handling of the ring in vhost (device) is also simplified,
> > > > > > but things are not simplified in driver, e.g. although there is
> > > > > > no desc table in the virtqueue anymore, driver still needs to
> > > > > > maintain a private desc state table (which is still managed as
> > > > > > a list in this patch set) to support the out-of-order desc
> > > > > > processing in vhost (device).
> > > > > >
> > > > > > I think this patch set is mainly to make the driver have a full
> > > > > > functional support for the packed ring, which makes it possible
> > > > > > to leverage the packed ring feature in vhost (device). But I'm
> > > > > > not sure whether there is any other better idea, I'd like to
> > > > > > hear your thoughts. Thanks!
> > > > > Just this: Jens seems to report a nice gain with virtio and
> > > > > vhost pmd across the board. Try to compare virtio and
> > > > > virtio pmd to see what does pmd do better?
> > > > The virtio PMD (drivers/net/virtio) in DPDK doesn't need to share
> > > > the virtio ring operation code with other drivers and is highly
> > > > optimized for network. E.g. in Rx, the Rx burst function won't
> > > > chain descs. So the ID management for the Rx ring can be quite
> > > > simple and straightforward, we just need to initialize these IDs
> > > > when initializing the ring and don't need to change these IDs
> > > > in data path anymore (the mergable Rx code in that patch set
> > > > assumes the descs will be written back in order, which should be
> > > > fixed. I.e., the ID in the desc should be used to index vq->descx[]).
> > > > The Tx code in that patch set also assumes the descs will be
> > > > written back by device in order, which should be fixed.
> > >
> > > Yes it is. I think I've pointed it out in some early version of pmd patch.
> > > So I suspect part (or all) of the boost may come from in order feature.
> > >
> > > >
> > > > But in kernel virtio driver, the virtio_ring.c is very generic.
> > > > The enqueue (virtqueue_add()) and dequeue (virtqueue_get_buf_ctx())
> > > > functions need to support all the virtio devices and should be
> > > > able to handle all the possible cases that may happen. So although
> > > > the packed ring can be very efficient in some cases, currently
> > > > the room to optimize the performance in kernel's virtio_ring.c
> > > > isn't that much. If we want to take the fully advantage of the
> > > > packed ring's efficiency, we need some further e.g. API changes
> > > > in virtio_ring.c, which shouldn't be part of this patch set.
> > >
> > > Could you please share more thoughts on this e.g how to improve the API?
> > > Notice since the API is shared by both split ring and packed ring, it may
> > > improve the performance of split ring as well. One can easily imagine a
> > > batching API, but it does not have many real users now, the only case is the
> > > XDP transmission which can accept an array of XDP frames.
> >
> > I don't have detailed thoughts on this yet. But kernel's
> > virtio_ring.c is quite generic compared with what we did
> > in virtio PMD.
>
> In what way? What are some things that aren't implemented there?
Below is the code corresponding to the virtqueue_add()
for Rx ring in virtio PMD:
https://github.com/DPDK/dpdk/blob/3605968c2fa7/drivers/net/virtio/virtio_rxtx.c#L278-L304
And below is the code of virtqueue_add() in Linux:
https://github.com/torvalds/linux/blob/54eda9df17f3/drivers/virtio/virtio_ring.c#L275-L417
In virtio PMD, for each packet (mbuf), the code is pretty
straightforward, it will just check whether there is one
available desc. If it's true, it will just fill this desc
directly.
But in virtqueue_add(), it's obvious that, the logic is
much more complicated or generic. It's supposed to be
able to handle sglist (which may consist of multiple IN
buffers and multiple OUT buffers at the same time), and
it will try to use indirect descriptors. Then it needs
several loops to parse the sglist. That's why I said
it's quite generic.
>
> If what you say is true then we should take a careful look
> and not supporting these generic things with packed layout.
> Once we do support them it will be too late and we won't
> be able to get performance back.
I think it's a good point that we don't need to support
everything in packed ring (especially these which would
hurt the performance), as the packed ring aims at high
performance. I'm also wondering about the features. Is
there any possibility that we won't support the out of
order processing (at least not by default) in packed ring?
If I didn't miss anything, the need to support out of order
processing in packed ring will make the data structure
inside the driver not cache friendly which is similar to
the case of the descriptor table in the split ring (the
difference is that, it only happens in driver now).
>
>
>
> > >
> > > > So
> > > > I still think this patch set is mainly to make the kernel virtio
> > > > driver to have a full functional support of the packed ring, and
> > > > we can't expect impressive performance boost with it.
> > >
> > > We can only gain when virtio ring layout is the bottleneck. If there're
> > > bottlenecks elsewhere, we probably won't see any increasing in the numbers.
> > > Vhost-net is an example, and lots of optimizations have proved that virtio
> > > ring is not the main bottleneck for the current codes. I suspect it also the
> > > case of virtio driver. Did perf tell us any interesting things in virtio
> > > driver?
> > >
> > > Thanks
> > >
> > > >
> > > > >
> > > > > > > On Wed, Jul 11, 2018 at 10:27:06AM +0800, Tiwei Bie wrote:
> > > > > > > > Hello everyone,
> > > > > > > >
> > > > > > > > This patch set implements packed ring support in virtio driver.
> > > > > > > >
> > > > > > > > Some functional tests have been done with Jason's
> > > > > > > > packed ring implementation in vhost:
> > > > > > > >
> > > > > > > > https://lkml.org/lkml/2018/7/3/33
> > > > > > > >
> > > > > > > > Both of ping and netperf worked as expected.
> > > > > > > >
> > > > > > > > v1 -> v2:
> > > > > > > > - Use READ_ONCE() to read event off_wrap and flags together (Jason);
> > > > > > > > - Add comments related to ccw (Jason);
> > > > > > > >
> > > > > > > > RFC (v6) -> v1:
> > > > > > > > - Avoid extra virtio_wmb() in virtqueue_enable_cb_delayed_packed()
> > > > > > > > when event idx is off (Jason);
> > > > > > > > - Fix bufs calculation in virtqueue_enable_cb_delayed_packed() (Jason);
> > > > > > > > - Test the state of the desc at used_idx instead of last_used_idx
> > > > > > > > in virtqueue_enable_cb_delayed_packed() (Jason);
> > > > > > > > - Save wrap counter (as part of queue state) in the return value
> > > > > > > > of virtqueue_enable_cb_prepare_packed();
> > > > > > > > - Refine the packed ring definitions in uapi;
> > > > > > > > - Rebase on the net-next tree;
> > > > > > > >
> > > > > > > > RFC v5 -> RFC v6:
> > > > > > > > - Avoid tracking addr/len/flags when DMA API isn't used (MST/Jason);
> > > > > > > > - Define wrap counter as bool (Jason);
> > > > > > > > - Use ALIGN() in vring_init_packed() (Jason);
> > > > > > > > - Avoid using pointer to track `next` in detach_buf_packed() (Jason);
> > > > > > > > - Add comments for barriers (Jason);
> > > > > > > > - Don't enable RING_PACKED on ccw for now (noticed by Jason);
> > > > > > > > - Refine the memory barrier in virtqueue_poll();
> > > > > > > > - Add a missing memory barrier in virtqueue_enable_cb_delayed_packed();
> > > > > > > > - Remove the hacks in virtqueue_enable_cb_prepare_packed();
> > > > > > > >
> > > > > > > > RFC v4 -> RFC v5:
> > > > > > > > - Save DMA addr, etc in desc state (Jason);
> > > > > > > > - Track used wrap counter;
> > > > > > > >
> > > > > > > > RFC v3 -> RFC v4:
> > > > > > > > - Make ID allocation support out-of-order (Jason);
> > > > > > > > - Various fixes for EVENT_IDX support;
> > > > > > > >
> > > > > > > > RFC v2 -> RFC v3:
> > > > > > > > - Split into small patches (Jason);
> > > > > > > > - Add helper virtqueue_use_indirect() (Jason);
> > > > > > > > - Just set id for the last descriptor of a list (Jason);
> > > > > > > > - Calculate the prev in virtqueue_add_packed() (Jason);
> > > > > > > > - Fix/improve desc suppression code (Jason/MST);
> > > > > > > > - Refine the code layout for XXX_split/packed and wrappers (MST);
> > > > > > > > - Fix the comments and API in uapi (MST);
> > > > > > > > - Remove the BUG_ON() for indirect (Jason);
> > > > > > > > - Some other refinements and bug fixes;
> > > > > > > >
> > > > > > > > RFC v1 -> RFC v2:
> > > > > > > > - Add indirect descriptor support - compile test only;
> > > > > > > > - Add event suppression supprt - compile test only;
> > > > > > > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > > > > > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > > > > > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > > > > > > - Avoid using '%' operator (Jason);
> > > > > > > > - Rename free_head -> next_avail_idx (Jason);
> > > > > > > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > > > > > > - Some other refinements and bug fixes;
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > Tiwei Bie (5):
> > > > > > > > virtio: add packed ring definitions
> > > > > > > > virtio_ring: support creating packed ring
> > > > > > > > virtio_ring: add packed ring support
> > > > > > > > virtio_ring: add event idx support in packed ring
> > > > > > > > virtio_ring: enable packed ring
> > > > > > > >
> > > > > > > > drivers/s390/virtio/virtio_ccw.c | 14 +
> > > > > > > > drivers/virtio/virtio_ring.c | 1365 ++++++++++++++++++++++------
> > > > > > > > include/linux/virtio_ring.h | 8 +-
> > > > > > > > include/uapi/linux/virtio_config.h | 3 +
> > > > > > > > include/uapi/linux/virtio_ring.h | 43 +
> > > > > > > > 5 files changed, 1157 insertions(+), 276 deletions(-)
> > > > > > > >
> > > > > > > > --
> > > > > > > > 2.18.0
> > > > > > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > > >
> > >
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next V2] virtio_net: ethtool tx napi configuration
From: Jason Wang @ 2018-09-13 9:07 UTC (permalink / raw)
To: mst; +Cc: netdev, Willem de Bruijn, davem, linux-kernel, virtualization
In-Reply-To: <20180913053545.18585-1-jasowang@redhat.com>
On 2018年09月13日 13:35, Jason Wang wrote:
> Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers.
> Interrupt moderation is currently not supported, so these accept and
> display the default settings of 0 usec and 1 frame.
>
> Toggle tx napi through a bit in tx-frames. So as to not interfere
> with possible future interrupt moderation, value 1 means tx napi while
> value 0 means not.
>
> To properly synchronize with the data path, tx napi is disabled and
> tx lock is held when changing the value of napi weight. And two more
> places that can access tx napi weight:
>
> - speculative tx polling in rx napi, we can leave it as is since it
> not a must for correctness.
> - skb_xmit_done(), one more check of napi weight is added before
> trying to enable tx to avoid tx to be disabled forever if napi is
> disabled after skb_xmit_done() but before the napi
>
> Link: https://patchwork.ozlabs.org/patch/948149/
> Suggested-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> Changes from V1:
> - try to synchronize with datapath to allow changing mode when
> interface is up.
> - use tx-frames 0 as to disable tx napi while tx-frames 1 to enable tx napi
> ---
> drivers/net/virtio_net.c | 64 +++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 63 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 765920905226..6e70864f5899 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -66,6 +66,8 @@ DECLARE_EWMA(pkt_len, 0, 64)
>
> #define VIRTNET_DRIVER_VERSION "1.0.0"
>
> +static const u32 ethtool_coalesce_napi_mask = (1UL << 10);
> +
> static const unsigned long guest_offloads[] = {
> VIRTIO_NET_F_GUEST_TSO4,
> VIRTIO_NET_F_GUEST_TSO6,
> @@ -1444,7 +1446,10 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>
> virtqueue_napi_complete(napi, sq->vq, 0);
>
> - if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> + /* Check napi.weight to avoid tx stall since it could be set
> + * to zero by ethtool after skb_xmit_done().
> + */
> + if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS || !sq->napi.weight)
> netif_tx_wake_queue(txq);
>
> return 0;
> @@ -2181,6 +2186,61 @@ static int virtnet_get_link_ksettings(struct net_device *dev,
> return 0;
> }
>
> +static int virtnet_set_coalesce(struct net_device *dev,
> + struct ethtool_coalesce *ec)
> +{
> + struct ethtool_coalesce ec_default = {
> + .cmd = ETHTOOL_SCOALESCE,
> + .rx_max_coalesced_frames = 1,
> + };
> + struct virtnet_info *vi = netdev_priv(dev);
> + int i, napi_weight;
> +
> + if (ec->tx_max_coalesced_frames > 1)
> + return -EINVAL;
> +
> + ec_default.tx_max_coalesced_frames = ec->tx_max_coalesced_frames;
> + napi_weight = ec->tx_max_coalesced_frames ? NAPI_POLL_WEIGHT : 0;
> +
> + /* disallow changes to fields not explicitly tested above */
> + if (memcmp(ec, &ec_default, sizeof(ec_default)))
> + return -EINVAL;
> +
> + if (napi_weight ^ vi->sq[0].napi.weight) {
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + struct netdev_queue *txq =
> + netdev_get_tx_queue(vi->dev, i);
> +
> + virtnet_napi_tx_disable(&vi->sq[i].napi);
> + __netif_tx_lock_bh(txq);
Need to check netif_running() before disabling napi. Otherwise we may
have a infinite loop here.
The discussion is still ongoing, if we decide to go set_coalesce, I will
post a V3.
Thanks
> + vi->sq[i].napi.weight = napi_weight;
> + __netif_tx_unlock_bh(txq);
> + virtnet_napi_tx_enable(vi, vi->sq[i].vq,
> + &vi->sq[i].napi);
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int virtnet_get_coalesce(struct net_device *dev,
> + struct ethtool_coalesce *ec)
> +{
> + struct ethtool_coalesce ec_default = {
> + .cmd = ETHTOOL_GCOALESCE,
> + .rx_max_coalesced_frames = 1,
> + .tx_max_coalesced_frames = 0,
> + };
> + struct virtnet_info *vi = netdev_priv(dev);
> +
> + memcpy(ec, &ec_default, sizeof(ec_default));
> +
> + if (vi->sq[0].napi.weight)
> + ec->tx_max_coalesced_frames = 1;
> +
> + return 0;
> +}
> +
> static void virtnet_init_settings(struct net_device *dev)
> {
> struct virtnet_info *vi = netdev_priv(dev);
> @@ -2219,6 +2279,8 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
> .get_ts_info = ethtool_op_get_ts_info,
> .get_link_ksettings = virtnet_get_link_ksettings,
> .set_link_ksettings = virtnet_set_link_ksettings,
> + .set_coalesce = virtnet_set_coalesce,
> + .get_coalesce = virtnet_get_coalesce,
> };
>
> static void virtnet_freeze_down(struct virtio_device *vdev)
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH net-next v2 0/5] virtio: support packed ring
From: Jason Wang @ 2018-09-13 9:47 UTC (permalink / raw)
To: Tiwei Bie, Michael S. Tsirkin
Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180913085919.GA42049@fbsd1.sh.intel.com>
On 2018年09月13日 16:59, Tiwei Bie wrote:
>> If what you say is true then we should take a careful look
>> and not supporting these generic things with packed layout.
>> Once we do support them it will be too late and we won't
>> be able to get performance back.
> I think it's a good point that we don't need to support
> everything in packed ring (especially these which would
> hurt the performance), as the packed ring aims at high
> performance. I'm also wondering about the features. Is
> there any possibility that we won't support the out of
> order processing (at least not by default) in packed ring?
> If I didn't miss anything, the need to support out of order
> processing in packed ring will make the data structure
> inside the driver not cache friendly which is similar to
> the case of the descriptor table in the split ring (the
> difference is that, it only happens in driver now).
Out of order is not the only user, DMA is another one. We don't have
used ring(len), so we need to maintain buffer length somewhere even for
in order device. But if it's not too late, I second for a OUT_OF_ORDER
feature. Starting from in order can have much simpler code in driver.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next V2] virtio_net: ethtool tx napi configuration
From: Willem de Bruijn @ 2018-09-13 15:20 UTC (permalink / raw)
To: Jason Wang
Cc: Willem de Bruijn, Michael S. Tsirkin, Network Development, LKML,
virtualization, David Miller
In-Reply-To: <20180913053545.18585-1-jasowang@redhat.com>
On Thu, Sep 13, 2018 at 1:40 AM Jason Wang <jasowang@redhat.com> wrote:
>
> Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers.
> Interrupt moderation is currently not supported, so these accept and
> display the default settings of 0 usec and 1 frame.
>
> Toggle tx napi through a bit in tx-frames. So as to not interfere
> with possible future interrupt moderation, value 1 means tx napi while
> value 0 means not.
>
> To properly synchronize with the data path, tx napi is disabled and
> tx lock is held when changing the value of napi weight. And two more
> places that can access tx napi weight:
>
> - speculative tx polling in rx napi, we can leave it as is since it
> not a must for correctness.
> - skb_xmit_done(), one more check of napi weight is added before
> trying to enable tx to avoid tx to be disabled forever if napi is
> disabled after skb_xmit_done() but before the napi
>
> Link: https://patchwork.ozlabs.org/patch/948149/
> Suggested-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> Changes from V1:
> - try to synchronize with datapath to allow changing mode when
> interface is up.
> - use tx-frames 0 as to disable tx napi while tx-frames 1 to enable tx napi
> ---
> drivers/net/virtio_net.c | 64 +++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 63 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 765920905226..6e70864f5899 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -66,6 +66,8 @@ DECLARE_EWMA(pkt_len, 0, 64)
>
> #define VIRTNET_DRIVER_VERSION "1.0.0"
>
> +static const u32 ethtool_coalesce_napi_mask = (1UL << 10);
> +
This is no longer needed
> static const unsigned long guest_offloads[] = {
> VIRTIO_NET_F_GUEST_TSO4,
> VIRTIO_NET_F_GUEST_TSO6,
> @@ -1444,7 +1446,10 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>
> virtqueue_napi_complete(napi, sq->vq, 0);
>
> - if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> + /* Check napi.weight to avoid tx stall since it could be set
> + * to zero by ethtool after skb_xmit_done().
> + */
> + if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS || !sq->napi.weight)
> netif_tx_wake_queue(txq);
I see. This assumes that the napi handler will always be called on
conversion from napi to no-napi mode.
That is safe to assume because if it isn't called (and will not call
netif_tx_wake_queue) that implies that napi was not scheduled, and
thus the tx interrupt was not suppressed and thus there was no tx
completion work to be scheduled?
>
> return 0;
> @@ -2181,6 +2186,61 @@ static int virtnet_get_link_ksettings(struct net_device *dev,
> return 0;
> }
>
> +static int virtnet_set_coalesce(struct net_device *dev,
> + struct ethtool_coalesce *ec)
> +{
> + struct ethtool_coalesce ec_default = {
> + .cmd = ETHTOOL_SCOALESCE,
> + .rx_max_coalesced_frames = 1,
> + };
> + struct virtnet_info *vi = netdev_priv(dev);
> + int i, napi_weight;
> +
> + if (ec->tx_max_coalesced_frames > 1)
> + return -EINVAL;
> +
> + ec_default.tx_max_coalesced_frames = ec->tx_max_coalesced_frames;
> + napi_weight = ec->tx_max_coalesced_frames ? NAPI_POLL_WEIGHT : 0;
> +
> + /* disallow changes to fields not explicitly tested above */
> + if (memcmp(ec, &ec_default, sizeof(ec_default)))
> + return -EINVAL;
> +
> + if (napi_weight ^ vi->sq[0].napi.weight) {
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + struct netdev_queue *txq =
> + netdev_get_tx_queue(vi->dev, i);
> +
> + virtnet_napi_tx_disable(&vi->sq[i].napi);
> + __netif_tx_lock_bh(txq);
> + vi->sq[i].napi.weight = napi_weight;
> + __netif_tx_unlock_bh(txq);
> + virtnet_napi_tx_enable(vi, vi->sq[i].vq,
> + &vi->sq[i].napi);
> + }
> + }
> +
> + return 0;
> +}
> +
> +static int virtnet_get_coalesce(struct net_device *dev,
> + struct ethtool_coalesce *ec)
> +{
> + struct ethtool_coalesce ec_default = {
> + .cmd = ETHTOOL_GCOALESCE,
> + .rx_max_coalesced_frames = 1,
> + .tx_max_coalesced_frames = 0,
no need to explicitly initialize to 0 (unless you did this for
documentation purposes, which is fine).
> + };
> + struct virtnet_info *vi = netdev_priv(dev);
> +
> + memcpy(ec, &ec_default, sizeof(ec_default));
> +
> + if (vi->sq[0].napi.weight)
> + ec->tx_max_coalesced_frames = 1;
> +
> + return 0;
> +}
> +
> static void virtnet_init_settings(struct net_device *dev)
> {
> struct virtnet_info *vi = netdev_priv(dev);
> @@ -2219,6 +2279,8 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
> .get_ts_info = ethtool_op_get_ts_info,
> .get_link_ksettings = virtnet_get_link_ksettings,
> .set_link_ksettings = virtnet_set_link_ksettings,
> + .set_coalesce = virtnet_set_coalesce,
> + .get_coalesce = virtnet_get_coalesce,
> };
>
> static void virtnet_freeze_down(struct virtio_device *vdev)
> --
> 2.17.1
>
^ permalink raw reply
* Re: [PATCH net-next V2 00/11] vhost_net TX batching
From: David Miller @ 2018-09-13 16:28 UTC (permalink / raw)
To: jasowang; +Cc: netdev, mst, linux-kernel, kvm, virtualization
In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com>
From: Jason Wang <jasowang@redhat.com>
Date: Wed, 12 Sep 2018 11:16:58 +0800
> This series tries to batch submitting packets to underlayer socket
> through msg_control during sendmsg(). This is done by:
...
Series applied, thanks Jason.
^ permalink raw reply
* Re: [PATCH net-next V2 00/11] vhost_net TX batching
From: Michael S. Tsirkin @ 2018-09-13 18:02 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20180913.092819.2099750169529223026.davem@davemloft.net>
On Thu, Sep 13, 2018 at 09:28:19AM -0700, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Wed, 12 Sep 2018 11:16:58 +0800
>
> > This series tries to batch submitting packets to underlayer socket
> > through msg_control during sendmsg(). This is done by:
> ...
>
> Series applied, thanks Jason.
Going over it now I don't see a lot to complain about, but I'd
appreciate a bit more time for review in the future. 3 days ok?
--
MST
^ permalink raw reply
* Re: [PATCH net-next V2 11/11] vhost_net: batch submitting XDP buffers to underlayer sockets
From: Michael S. Tsirkin @ 2018-09-13 18:04 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20180912031709.14112-12-jasowang@redhat.com>
On Wed, Sep 12, 2018 at 11:17:09AM +0800, Jason Wang wrote:
> +static void vhost_tx_batch(struct vhost_net *net,
> + struct vhost_net_virtqueue *nvq,
> + struct socket *sock,
> + struct msghdr *msghdr)
> +{
> + struct tun_msg_ctl ctl = {
> + .type = TUN_MSG_PTR,
> + .num = nvq->batched_xdp,
> + .ptr = nvq->xdp,
> + };
> + int err;
> +
> + if (nvq->batched_xdp == 0)
> + goto signal_used;
> +
> + msghdr->msg_control = &ctl;
> + err = sock->ops->sendmsg(sock, msghdr, 0);
> + if (unlikely(err < 0)) {
> + vq_err(&nvq->vq, "Fail to batch sending packets\n");
> + return;
> + }
> +
> +signal_used:
> + vhost_net_signal_used(nvq);
> + nvq->batched_xdp = 0;
> +}
> +
Given it's all tun-specific now, how about just exporting tap_sendmsg
and calling that? Will get rid of some indirection too.
--
MST
^ permalink raw reply
* Re: [PATCH net-next V2 00/11] vhost_net TX batching
From: David Miller @ 2018-09-13 18:05 UTC (permalink / raw)
To: mst; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20180913135833-mutt-send-email-mst@kernel.org>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Thu, 13 Sep 2018 14:02:13 -0400
> On Thu, Sep 13, 2018 at 09:28:19AM -0700, David Miller wrote:
>> From: Jason Wang <jasowang@redhat.com>
>> Date: Wed, 12 Sep 2018 11:16:58 +0800
>>
>> > This series tries to batch submitting packets to underlayer socket
>> > through msg_control during sendmsg(). This is done by:
>> ...
>>
>> Series applied, thanks Jason.
>
> Going over it now I don't see a lot to complain about, but I'd
> appreciate a bit more time for review in the future. 3 days ok?
I try to handle every patchwork entry within 48 hours, more
specifically 2 business days.
Given the rate at which patches flow through networking, I think
this is both reasonable and necessary.
One always have the option to send a quick reply to the list saying
"Please wait for my review, I'll get to it in the next day or two."
which I will certainly honor.
Thank you.
^ permalink raw reply
* Re: [PATCH net-next V2] virtio_net: ethtool tx napi configuration
From: David Miller @ 2018-09-13 22:52 UTC (permalink / raw)
To: jasowang; +Cc: netdev, willemb, virtualization, linux-kernel, mst
In-Reply-To: <20180913053545.18585-1-jasowang@redhat.com>
From: Jason Wang <jasowang@redhat.com>
Date: Thu, 13 Sep 2018 13:35:45 +0800
> Toggle tx napi through a bit in tx-frames.
This is not what the code implements as the interface any more.
Please fix the commit message to match the code.
Thanks.
^ permalink raw reply
* Re: [PATCH net-next V2 11/11] vhost_net: batch submitting XDP buffers to underlayer sockets
From: Jason Wang @ 2018-09-14 3:11 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20180913140332-mutt-send-email-mst@kernel.org>
On 2018年09月14日 02:04, Michael S. Tsirkin wrote:
> On Wed, Sep 12, 2018 at 11:17:09AM +0800, Jason Wang wrote:
>> +static void vhost_tx_batch(struct vhost_net *net,
>> + struct vhost_net_virtqueue *nvq,
>> + struct socket *sock,
>> + struct msghdr *msghdr)
>> +{
>> + struct tun_msg_ctl ctl = {
>> + .type = TUN_MSG_PTR,
>> + .num = nvq->batched_xdp,
>> + .ptr = nvq->xdp,
>> + };
>> + int err;
>> +
>> + if (nvq->batched_xdp == 0)
>> + goto signal_used;
>> +
>> + msghdr->msg_control = &ctl;
>> + err = sock->ops->sendmsg(sock, msghdr, 0);
>> + if (unlikely(err < 0)) {
>> + vq_err(&nvq->vq, "Fail to batch sending packets\n");
>> + return;
>> + }
>> +
>> +signal_used:
>> + vhost_net_signal_used(nvq);
>> + nvq->batched_xdp = 0;
>> +}
>> +
> Given it's all tun-specific now, how about just exporting tap_sendmsg
> and calling that? Will get rid of some indirection too.
>
Yes we can do it on top in the future.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next V2] virtio_net: ethtool tx napi configuration
From: Jason Wang @ 2018-09-14 3:29 UTC (permalink / raw)
To: David Miller; +Cc: netdev, willemb, virtualization, linux-kernel, mst
In-Reply-To: <20180913.155253.713713547323242272.davem@davemloft.net>
On 2018年09月14日 06:52, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Thu, 13 Sep 2018 13:35:45 +0800
>
>> Toggle tx napi through a bit in tx-frames.
> This is not what the code implements as the interface any more.
>
> Please fix the commit message to match the code.
>
> Thanks.
Will fix this.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: Sources of initialized memory in virtio?
From: Jason Wang @ 2018-09-14 3:50 UTC (permalink / raw)
To: Alexander Potapenko, stefanha, Michael S. Tsirkin, kraxel
Cc: kvm, virtualization
In-Reply-To: <CAG_fn=URm9Bv0sTy6Ex8ohrLVStaxoZtwH_BOEao6sqGCzQMiA@mail.gmail.com>
On 2018年09月13日 21:00, Alexander Potapenko wrote:
> Hi mighty virtio maintainers,
>
> I'm working on KMSAN, a new runtime detector of uninitialized memory
> based on compiler instrumentation (https://github.com/google/kmsan)
> KMSAN is mostly being tested on QEMU with KVM enabled, so my kernel
> interacts a lot with various virtio drivers, that's why I'm seeking
> your help.
>
> By default KMSAN treats kernel memory allocated by kmalloc() and
> alloc_page() as uninitialized. Writing a constant to memory or using
> it in copy_from_user() makes that memory initialized.
> Unfortunately a lot of writes to memory from KVM (mostly in the disk
> and network drivers) remain unnoticed by the tool, therefore we're
> seeing a lot of false positive reports (along with actual bugs, like
> CVE-2018-1118).
>
> KMSAN has an API function `kmsan_unpoison_shadow(void *buf, int len)`,
> which means "from now on, till this memory is freed or written to,
> mark it as initialized".
> I've tried playing Whack-a-Mole adding it to various places where the
> data comes from KVM, but failed to find them all. In fact, some of my
> annotations were wrong, so I ended up with the following two patches:
>
> https://github.com/google/kmsan/commit/76c671199a4de5bbe73cd13210a5e28848211bd1
> https://github.com/google/kmsan/commit/40ba1c8e2a3c6bbe8f34037413e253894251a405
>
> But I'm far from being sure this is the complete list of places where
> the memory is initialized by virtio drivers.
> May I ask you to help me find the places where we actually need to
> annotate the memory in virtio?
It looks to me another one is the used ring which device writes back the
completed descriptor id and length. It (vr->used) was a part of a page
which was allocated in vring_alloc_queue() through alloc_pages_exact()
with __GFP_ZERO. So I'm not we need care about it.
Thanks
>
> Thanks in advance,
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox