* Re: [RFC v4 5/5] virtio_ring: enable packed ring
From: Sergei Shtylyov @ 2018-05-16 11:42 UTC (permalink / raw)
To: Tiwei Bie; +Cc: mst, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180516102159.GA11467@debian>
On 05/16/2018 01:21 PM, Tiwei Bie wrote:
>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>> ---
>>> drivers/virtio/virtio_ring.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>> index de3839f3621a..b158692263b0 100644
>>> --- a/drivers/virtio/virtio_ring.c
>>> +++ b/drivers/virtio/virtio_ring.c
>>> @@ -1940,6 +1940,8 @@ void vring_transport_features(struct virtio_device *vdev)
>>> break;
>>> case VIRTIO_F_IOMMU_PLATFORM:
>>> break;
>>> + case VIRTIO_F_RING_PACKED:
>>> + break;
>>
>> Why not just add this *case* under the previous *case*?
>
> Do you mean fallthrough? Something like:
>
> case VIRTIO_F_IOMMU_PLATFORM:
> case VIRTIO_F_RING_PACKED:
> break;
Yes, exactly. :-)
> Best regards,
> Tiwei Bie
[...]
MBR, Sergei
^ permalink raw reply
* Re: [RFC v4 5/5] virtio_ring: enable packed ring
From: Tiwei Bie @ 2018-05-16 10:21 UTC (permalink / raw)
To: Sergei Shtylyov; +Cc: mst, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <27dfb4e8-6d63-bf7d-0f97-ac51559f8040@cogentembedded.com>
On Wed, May 16, 2018 at 01:15:48PM +0300, Sergei Shtylyov wrote:
> On 5/16/2018 11:37 AM, Tiwei Bie wrote:
>
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> > drivers/virtio/virtio_ring.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index de3839f3621a..b158692263b0 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1940,6 +1940,8 @@ void vring_transport_features(struct virtio_device *vdev)
> > break;
> > case VIRTIO_F_IOMMU_PLATFORM:
> > break;
> > + case VIRTIO_F_RING_PACKED:
> > + break;
>
> Why not just add this *case* under the previous *case*?
Do you mean fallthrough? Something like:
case VIRTIO_F_IOMMU_PLATFORM:
case VIRTIO_F_RING_PACKED:
break;
Best regards,
Tiwei Bie
>
> > default:
> > /* We don't understand this bit. */
> > __virtio_clear_bit(vdev, i);
>
> MBR, Sergei
^ permalink raw reply
* Re: [RFC v4 5/5] virtio_ring: enable packed ring
From: Sergei Shtylyov @ 2018-05-16 10:15 UTC (permalink / raw)
To: Tiwei Bie, mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180516083737.26504-6-tiwei.bie@intel.com>
On 5/16/2018 11:37 AM, Tiwei Bie wrote:
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
> drivers/virtio/virtio_ring.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index de3839f3621a..b158692263b0 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1940,6 +1940,8 @@ void vring_transport_features(struct virtio_device *vdev)
> break;
> case VIRTIO_F_IOMMU_PLATFORM:
> break;
> + case VIRTIO_F_RING_PACKED:
> + break;
Why not just add this *case* under the previous *case*?
> default:
> /* We don't understand this bit. */
> __virtio_clear_bit(vdev, i);
MBR, Sergei
^ permalink raw reply
* [RFC v4 5/5] virtio_ring: enable packed ring
From: Tiwei Bie @ 2018-05-16 8:37 UTC (permalink / raw)
To: mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180516083737.26504-1-tiwei.bie@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
drivers/virtio/virtio_ring.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index de3839f3621a..b158692263b0 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1940,6 +1940,8 @@ void vring_transport_features(struct virtio_device *vdev)
break;
case VIRTIO_F_IOMMU_PLATFORM:
break;
+ case VIRTIO_F_RING_PACKED:
+ break;
default:
/* We don't understand this bit. */
__virtio_clear_bit(vdev, i);
--
2.17.0
^ permalink raw reply related
* [RFC v4 4/5] virtio_ring: add event idx support in packed ring
From: Tiwei Bie @ 2018-05-16 8:37 UTC (permalink / raw)
To: mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180516083737.26504-1-tiwei.bie@intel.com>
This commit introduces the event idx support in
packed ring.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
drivers/virtio/virtio_ring.c | 75 +++++++++++++++++++++++++++++++++---
1 file changed, 70 insertions(+), 5 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c6c5deb0e3ae..de3839f3621a 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1006,7 +1006,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- u16 flags;
+ u16 new, old, off_wrap, flags, wrap_counter, event_idx;
bool needs_kick;
u32 snapshot;
@@ -1015,9 +1015,19 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
* suppressions. */
virtio_mb(vq->weak_barriers);
+ old = vq->next_avail_idx - vq->num_added;
+ new = vq->next_avail_idx;
+ vq->num_added = 0;
+
snapshot = *(u32 *)vq->vring_packed.device;
+ off_wrap = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot & 0xffff));
flags = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot >> 16)) & 0x3;
+ wrap_counter = off_wrap >> 15;
+ event_idx = off_wrap & ~(1<<15);
+ if (wrap_counter != vq->wrap_counter)
+ event_idx -= vq->vring_packed.num;
+
#ifdef DEBUG
if (vq->last_add_time_valid) {
WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
@@ -1026,7 +1036,10 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
vq->last_add_time_valid = false;
#endif
- needs_kick = (flags != VRING_EVENT_F_DISABLE);
+ if (flags == VRING_EVENT_F_DESC)
+ needs_kick = vring_need_event(event_idx, new, old);
+ else
+ needs_kick = (flags != VRING_EVENT_F_DISABLE);
END_USE(vq);
return needs_kick;
}
@@ -1098,7 +1111,7 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
void **ctx)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- u16 last_used, id;
+ u16 wrap_counter, last_used, id;
void *ret;
START_USE(vq);
@@ -1138,6 +1151,19 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
ret = vq->desc_state[id].data;
detach_buf_packed(vq, last_used, id, ctx);
+ wrap_counter = vq->wrap_counter;
+ if (vq->last_used_idx > vq->next_avail_idx)
+ wrap_counter ^= 1;
+
+ /* If we expect an interrupt for the next entry, tell host
+ * by writing event index and flush out the write before
+ * the read in the next get_buf call. */
+ if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
+ virtio_store_mb(vq->weak_barriers,
+ &vq->vring_packed.driver->off_wrap,
+ cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
+ (wrap_counter << 15)));
+
#ifdef DEBUG
vq->last_add_time_valid = false;
#endif
@@ -1160,15 +1186,27 @@ static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 wrap_counter;
START_USE(vq);
/* We optimistically turn back on interrupts, then check if there was
* more to do. */
+ /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
+ * either clear the flags bit or point the event index at the next
+ * entry. Always update the event index to keep code simple. */
+
+ wrap_counter = vq->wrap_counter;
+ if (vq->last_used_idx > vq->next_avail_idx)
+ wrap_counter ^= 1;
+
+ vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
+ vq->last_used_idx | (wrap_counter << 15));
if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
virtio_wmb(vq->weak_barriers);
- vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+ vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
+ VRING_EVENT_F_ENABLE;
vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
vq->event_flags_shadow);
}
@@ -1194,15 +1232,40 @@ static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 bufs, used_idx, wrap_counter;
START_USE(vq);
/* We optimistically turn back on interrupts, then check if there was
* more to do. */
+ /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
+ * either clear the flags bit or point the event index at the next
+ * entry. Always update the event index to keep code simple. */
+
+ /* TODO: tune this threshold */
+ if (vq->next_avail_idx < vq->last_used_idx)
+ bufs = (vq->vring_packed.num + vq->next_avail_idx -
+ vq->last_used_idx) * 3 / 4;
+ else
+ bufs = (vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
+
+ wrap_counter = vq->wrap_counter;
+ if (vq->last_used_idx > vq->next_avail_idx)
+ wrap_counter ^= 1;
+
+ used_idx = vq->last_used_idx + bufs;
+ if (used_idx >= vq->vring_packed.num) {
+ used_idx -= vq->vring_packed.num;
+ wrap_counter ^= 1;
+ }
+
+ vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
+ used_idx | (wrap_counter << 15));
if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
virtio_wmb(vq->weak_barriers);
- vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+ vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
+ VRING_EVENT_F_ENABLE;
vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
vq->event_flags_shadow);
}
@@ -1869,8 +1932,10 @@ void vring_transport_features(struct virtio_device *vdev)
switch (i) {
case VIRTIO_RING_F_INDIRECT_DESC:
break;
+#if 0
case VIRTIO_RING_F_EVENT_IDX:
break;
+#endif
case VIRTIO_F_VERSION_1:
break;
case VIRTIO_F_IOMMU_PLATFORM:
--
2.17.0
^ permalink raw reply related
* [RFC v4 3/5] virtio_ring: add packed ring support
From: Tiwei Bie @ 2018-05-16 8:37 UTC (permalink / raw)
To: mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180516083737.26504-1-tiwei.bie@intel.com>
This commit introduces the basic support (without EVENT_IDX)
for packed ring.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
drivers/virtio/virtio_ring.c | 491 ++++++++++++++++++++++++++++++++++-
1 file changed, 481 insertions(+), 10 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 62d7c407841a..c6c5deb0e3ae 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -58,7 +58,8 @@
struct vring_desc_state {
void *data; /* Data for callback. */
- struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
+ void *indir_desc; /* Indirect descriptor, if any. */
+ int num; /* Descriptor list length. */
};
struct vring_virtqueue {
@@ -116,6 +117,9 @@ struct vring_virtqueue {
/* Last written value to driver->flags in
* guest byte order. */
u16 event_flags_shadow;
+
+ /* ID allocation. */
+ struct idr buffer_id;
};
};
@@ -142,6 +146,16 @@ struct vring_virtqueue {
#define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
+static inline bool virtqueue_use_indirect(struct virtqueue *_vq,
+ unsigned int total_sg)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ /* If the host supports indirect descriptor tables, and we have multiple
+ * buffers, then go indirect. FIXME: tune this threshold */
+ return (vq->indirect && total_sg > 1 && vq->vq.num_free);
+}
+
/*
* Modern virtio devices have feature bits to specify whether they need a
* quirk and bypass the IOMMU. If not there, just use the DMA API.
@@ -327,9 +341,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
head = vq->free_head;
- /* If the host supports indirect descriptor tables, and we have multiple
- * buffers, then go indirect. FIXME: tune this threshold */
- if (vq->indirect && total_sg > 1 && vq->vq.num_free)
+ if (virtqueue_use_indirect(_vq, total_sg))
desc = alloc_indirect_split(_vq, total_sg, gfp);
else {
desc = NULL;
@@ -741,6 +753,63 @@ static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
& ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
}
+static void vring_unmap_one_packed(const struct vring_virtqueue *vq,
+ struct vring_packed_desc *desc)
+{
+ u16 flags;
+
+ if (!vring_use_dma_api(vq->vq.vdev))
+ return;
+
+ flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+
+ if (flags & VRING_DESC_F_INDIRECT) {
+ dma_unmap_single(vring_dma_dev(vq),
+ virtio64_to_cpu(vq->vq.vdev, desc->addr),
+ virtio32_to_cpu(vq->vq.vdev, desc->len),
+ (flags & VRING_DESC_F_WRITE) ?
+ DMA_FROM_DEVICE : DMA_TO_DEVICE);
+ } else {
+ dma_unmap_page(vring_dma_dev(vq),
+ virtio64_to_cpu(vq->vq.vdev, desc->addr),
+ virtio32_to_cpu(vq->vq.vdev, desc->len),
+ (flags & VRING_DESC_F_WRITE) ?
+ DMA_FROM_DEVICE : DMA_TO_DEVICE);
+ }
+}
+
+static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
+ unsigned int total_sg,
+ gfp_t gfp)
+{
+ struct vring_packed_desc *desc;
+
+ /*
+ * We require lowmem mappings for the descriptors because
+ * otherwise virt_to_phys will give us bogus addresses in the
+ * virtqueue.
+ */
+ gfp &= ~__GFP_HIGHMEM;
+
+ desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
+
+ return desc;
+}
+
+static u16 alloc_id_packed(struct vring_virtqueue *vq)
+{
+ u16 id;
+
+ id = idr_alloc(&vq->buffer_id, NULL, 0, vq->vring_packed.num,
+ GFP_KERNEL);
+ return id;
+}
+
+static void free_id_packed(struct vring_virtqueue *vq, u16 id)
+{
+ idr_remove(&vq->buffer_id, id);
+}
+
static inline int virtqueue_add_packed(struct virtqueue *_vq,
struct scatterlist *sgs[],
unsigned int total_sg,
@@ -750,47 +819,446 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
void *ctx,
gfp_t gfp)
{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ struct vring_packed_desc *desc;
+ struct scatterlist *sg;
+ unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
+ __virtio16 uninitialized_var(head_flags), flags;
+ u16 head, wrap_counter, id;
+ bool indirect;
+
+ START_USE(vq);
+
+ BUG_ON(data == NULL);
+ BUG_ON(ctx && vq->indirect);
+
+ if (unlikely(vq->broken)) {
+ END_USE(vq);
+ return -EIO;
+ }
+
+#ifdef DEBUG
+ {
+ ktime_t now = ktime_get();
+
+ /* No kick or get, with .1 second between? Warn. */
+ if (vq->last_add_time_valid)
+ WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
+ > 100);
+ vq->last_add_time = now;
+ vq->last_add_time_valid = true;
+ }
+#endif
+
+ BUG_ON(total_sg == 0);
+
+ head = vq->next_avail_idx;
+ wrap_counter = vq->wrap_counter;
+
+ if (virtqueue_use_indirect(_vq, total_sg))
+ desc = alloc_indirect_packed(_vq, total_sg, gfp);
+ else {
+ desc = NULL;
+ WARN_ON_ONCE(total_sg > vq->vring_packed.num && !vq->indirect);
+ }
+
+ if (desc) {
+ /* Use a single buffer which doesn't continue */
+ indirect = true;
+ /* Set up rest to use this indirect table. */
+ i = 0;
+ descs_used = 1;
+ } else {
+ indirect = false;
+ desc = vq->vring_packed.desc;
+ i = head;
+ descs_used = total_sg;
+ }
+
+ if (vq->vq.num_free < descs_used) {
+ pr_debug("Can't add buf len %i - avail = %i\n",
+ descs_used, vq->vq.num_free);
+ /* FIXME: for historical reasons, we force a notify here if
+ * there are outgoing parts to the buffer. Presumably the
+ * host should service the ring ASAP. */
+ if (out_sgs)
+ vq->notify(&vq->vq);
+ if (indirect)
+ kfree(desc);
+ END_USE(vq);
+ return -ENOSPC;
+ }
+
+ id = alloc_id_packed(vq);
+
+ for (n = 0; n < out_sgs + in_sgs; n++) {
+ for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+ dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
+ DMA_TO_DEVICE : DMA_FROM_DEVICE);
+ if (vring_mapping_error(vq, addr))
+ goto unmap_release;
+
+ flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
+ (n < out_sgs ? 0 : VRING_DESC_F_WRITE) |
+ VRING_DESC_F_AVAIL(vq->wrap_counter) |
+ VRING_DESC_F_USED(!vq->wrap_counter));
+ if (!indirect && i == head)
+ head_flags = flags;
+ else
+ desc[i].flags = flags;
+
+ desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
+ desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
+ i++;
+ if (!indirect && i >= vq->vring_packed.num) {
+ i = 0;
+ vq->wrap_counter ^= 1;
+ }
+ }
+ }
+
+ prev = (i > 0 ? i : vq->vring_packed.num) - 1;
+ desc[prev].id = cpu_to_virtio16(_vq->vdev, id);
+
+ /* Last one doesn't continue. */
+ if (total_sg == 1)
+ head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
+ else
+ desc[prev].flags &= cpu_to_virtio16(_vq->vdev,
+ ~VRING_DESC_F_NEXT);
+
+ if (indirect) {
+ /* Now that the indirect table is filled in, map it. */
+ dma_addr_t addr = vring_map_single(
+ vq, desc, total_sg * sizeof(struct vring_packed_desc),
+ DMA_TO_DEVICE);
+ if (vring_mapping_error(vq, addr))
+ goto unmap_release;
+
+ head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
+ VRING_DESC_F_AVAIL(wrap_counter) |
+ VRING_DESC_F_USED(!wrap_counter));
+ vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev,
+ addr);
+ vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
+ total_sg * sizeof(struct vring_packed_desc));
+ vq->vring_packed.desc[head].id = cpu_to_virtio16(_vq->vdev, id);
+ }
+
+ /* We're using some buffers from the free list. */
+ vq->vq.num_free -= descs_used;
+
+ /* Update free pointer */
+ if (indirect) {
+ n = head + 1;
+ if (n >= vq->vring_packed.num) {
+ n = 0;
+ vq->wrap_counter ^= 1;
+ }
+ vq->next_avail_idx = n;
+ } else
+ vq->next_avail_idx = i;
+
+ /* Store token and indirect buffer state. */
+ vq->desc_state[id].num = descs_used;
+ vq->desc_state[id].data = data;
+ if (indirect)
+ vq->desc_state[id].indir_desc = desc;
+ else
+ vq->desc_state[id].indir_desc = ctx;
+
+ /* A driver MUST NOT make the first descriptor in the list
+ * available before all subsequent descriptors comprising
+ * the list are made available. */
+ virtio_wmb(vq->weak_barriers);
+ vq->vring_packed.desc[head].flags = head_flags;
+ vq->num_added += descs_used;
+
+ pr_debug("Added buffer head %i to %p\n", head, vq);
+ END_USE(vq);
+
+ return 0;
+
+unmap_release:
+ err_idx = i;
+ i = head;
+
+ for (n = 0; n < total_sg; n++) {
+ if (i == err_idx)
+ break;
+ vring_unmap_one_packed(vq, &desc[i]);
+ i++;
+ if (!indirect && i >= vq->vring_packed.num)
+ i = 0;
+ }
+
+ vq->wrap_counter = wrap_counter;
+
+ if (indirect)
+ kfree(desc);
+
+ free_id_packed(vq, id);
+
+ END_USE(vq);
return -EIO;
}
static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
{
- return false;
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 flags;
+ bool needs_kick;
+ u32 snapshot;
+
+ START_USE(vq);
+ /* We need to expose the new flags value before checking notification
+ * suppressions. */
+ virtio_mb(vq->weak_barriers);
+
+ snapshot = *(u32 *)vq->vring_packed.device;
+ flags = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot >> 16)) & 0x3;
+
+#ifdef DEBUG
+ if (vq->last_add_time_valid) {
+ WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
+ vq->last_add_time)) > 100);
+ }
+ vq->last_add_time_valid = false;
+#endif
+
+ needs_kick = (flags != VRING_EVENT_F_DISABLE);
+ END_USE(vq);
+ return needs_kick;
+}
+
+static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
+ unsigned int id, void **ctx)
+{
+ struct vring_packed_desc *desc;
+ unsigned int i, j;
+
+ /* Clear data ptr. */
+ vq->desc_state[id].data = NULL;
+
+ i = head;
+
+ for (j = 0; j < vq->desc_state[id].num; j++) {
+ desc = &vq->vring_packed.desc[i];
+ vring_unmap_one_packed(vq, desc);
+ i++;
+ if (i >= vq->vring_packed.num)
+ i = 0;
+ }
+
+ vq->vq.num_free += vq->desc_state[id].num;
+
+ if (vq->indirect) {
+ u32 len;
+
+ /* Free the indirect table, if any, now that it's unmapped. */
+ desc = vq->desc_state[id].indir_desc;
+ if (!desc)
+ goto out;
+
+ len = virtio32_to_cpu(vq->vq.vdev,
+ vq->vring_packed.desc[head].len);
+
+ for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
+ vring_unmap_one_packed(vq, &desc[j]);
+
+ kfree(desc);
+ vq->desc_state[id].indir_desc = NULL;
+ } else if (ctx) {
+ *ctx = vq->desc_state[id].indir_desc;
+ }
+
+out:
+ free_id_packed(vq, id);
}
static inline bool more_used_packed(const struct vring_virtqueue *vq)
{
- return false;
+ u16 last_used, flags;
+ bool avail, used;
+
+ if (vq->vq.num_free == vq->vring_packed.num)
+ return false;
+
+ last_used = vq->last_used_idx;
+ flags = virtio16_to_cpu(vq->vq.vdev,
+ vq->vring_packed.desc[last_used].flags);
+ avail = flags & VRING_DESC_F_AVAIL(1);
+ used = flags & VRING_DESC_F_USED(1);
+
+ return avail == used;
}
static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
unsigned int *len,
void **ctx)
{
- return NULL;
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 last_used, id;
+ void *ret;
+
+ START_USE(vq);
+
+ if (unlikely(vq->broken)) {
+ END_USE(vq);
+ return NULL;
+ }
+
+ if (!more_used_packed(vq)) {
+ pr_debug("No more buffers in queue\n");
+ END_USE(vq);
+ return NULL;
+ }
+
+ /* Only get used elements after they have been exposed by host. */
+ virtio_rmb(vq->weak_barriers);
+
+ last_used = vq->last_used_idx;
+ id = virtio16_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].id);
+ *len = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].len);
+
+ if (unlikely(id >= vq->vring_packed.num)) {
+ BAD_RING(vq, "id %u out of range\n", id);
+ return NULL;
+ }
+ if (unlikely(!vq->desc_state[id].data)) {
+ BAD_RING(vq, "id %u is not a head!\n", id);
+ return NULL;
+ }
+
+ vq->last_used_idx += vq->desc_state[id].num;
+ if (vq->last_used_idx >= vq->vring_packed.num)
+ vq->last_used_idx -= vq->vring_packed.num;
+
+ /* detach_buf_packed clears data, so grab it now. */
+ ret = vq->desc_state[id].data;
+ detach_buf_packed(vq, last_used, id, ctx);
+
+#ifdef DEBUG
+ vq->last_add_time_valid = false;
+#endif
+
+ END_USE(vq);
+ return ret;
}
static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ if (vq->event_flags_shadow != VRING_EVENT_F_DISABLE) {
+ vq->event_flags_shadow = VRING_EVENT_F_DISABLE;
+ vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
+ vq->event_flags_shadow);
+ }
}
static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
{
- return 0;
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ START_USE(vq);
+
+ /* We optimistically turn back on interrupts, then check if there was
+ * more to do. */
+
+ if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
+ virtio_wmb(vq->weak_barriers);
+ vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+ vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
+ vq->event_flags_shadow);
+ }
+
+ END_USE(vq);
+ return vq->last_used_idx;
}
static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
{
- return false;
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ bool avail, used;
+ u16 flags;
+
+ virtio_mb(vq->weak_barriers);
+ flags = virtio16_to_cpu(vq->vq.vdev,
+ vq->vring_packed.desc[last_used_idx].flags);
+ avail = flags & VRING_DESC_F_AVAIL(1);
+ used = flags & VRING_DESC_F_USED(1);
+ return avail == used;
}
static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
{
- return false;
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ START_USE(vq);
+
+ /* We optimistically turn back on interrupts, then check if there was
+ * more to do. */
+
+ if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
+ virtio_wmb(vq->weak_barriers);
+ vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
+ vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
+ vq->event_flags_shadow);
+ }
+
+ if (more_used_packed(vq)) {
+ END_USE(vq);
+ return false;
+ }
+
+ END_USE(vq);
+ return true;
}
static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 flags, head, id, i;
+ unsigned int len;
+ void *buf;
+
+ START_USE(vq);
+
+ /* Detach the used descriptors. */
+ if (more_used_packed(vq)) {
+ buf = virtqueue_get_buf_ctx_packed(_vq, &len, NULL);
+ END_USE(vq);
+ return buf;
+ }
+
+ /* Detach the available descriptors. */
+ for (i = vq->last_used_idx; i != vq->next_avail_idx;
+ i = (i + 1) % vq->vring_packed.num) {
+ flags = virtio16_to_cpu(vq->vq.vdev,
+ vq->vring_packed.desc[i].flags);
+ while (flags & VRING_DESC_F_NEXT) {
+ i = (i + 1) % vq->vring_packed.num;
+ flags = virtio16_to_cpu(vq->vq.vdev,
+ vq->vring_packed.desc[i].flags);
+ }
+ id = virtio16_to_cpu(_vq->vdev, vq->vring_packed.desc[i].id);
+ if (!vq->desc_state[id].data)
+ continue;
+
+ len = vq->desc_state[id].num - 1;
+ head = (i < len ? i + vq->vring_packed.num : i) - len;
+
+ /* detach_buf clears data, so grab it now. */
+ buf = vq->desc_state[id].data;
+ detach_buf_packed(vq, head, id, NULL);
+ END_USE(vq);
+ return buf;
+ }
+ /* That should have freed everything. */
+ BUG_ON(vq->vq.num_free != vq->vring_packed.num);
+
+ END_USE(vq);
return NULL;
}
@@ -1198,6 +1666,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
vq->next_avail_idx = 0;
vq->wrap_counter = 1;
vq->event_flags_shadow = 0;
+ idr_init(&vq->buffer_id);
} else {
vq->vring = vring.vring_split;
vq->avail_flags_shadow = 0;
@@ -1384,6 +1853,8 @@ void vring_del_virtqueue(struct virtqueue *_vq)
(void *)vq->vring.desc,
vq->queue_dma_addr);
}
+ if (vq->packed)
+ idr_destroy(&vq->buffer_id);
list_del(&_vq->list);
kfree(vq);
}
--
2.17.0
^ permalink raw reply related
* [RFC v4 2/5] virtio_ring: support creating packed ring
From: Tiwei Bie @ 2018-05-16 8:37 UTC (permalink / raw)
To: mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180516083737.26504-1-tiwei.bie@intel.com>
This commit introduces the support for creating packed ring.
All split ring specific functions are added _split suffix.
Some necessary stubs for packed ring are also added.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
drivers/virtio/virtio_ring.c | 764 +++++++++++++++++++++++------------
include/linux/virtio_ring.h | 8 +-
2 files changed, 513 insertions(+), 259 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71458f493cf8..62d7c407841a 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -64,8 +64,8 @@ struct vring_desc_state {
struct vring_virtqueue {
struct virtqueue vq;
- /* Actual memory layout for this queue */
- struct vring vring;
+ /* Is this a packed ring? */
+ bool packed;
/* Can we use weak barriers? */
bool weak_barriers;
@@ -79,19 +79,45 @@ struct vring_virtqueue {
/* Host publishes avail event idx */
bool event;
- /* Head of free buffer list. */
- unsigned int free_head;
/* Number we've added since last sync. */
unsigned int num_added;
/* Last used index we've seen. */
u16 last_used_idx;
- /* Last written value to avail->flags */
- u16 avail_flags_shadow;
+ union {
+ /* Available for split ring */
+ struct {
+ /* Actual memory layout for this queue. */
+ struct vring vring;
- /* Last written value to avail->idx in guest byte order */
- u16 avail_idx_shadow;
+ /* Head of free buffer list. */
+ unsigned int free_head;
+
+ /* Last written value to avail->flags */
+ u16 avail_flags_shadow;
+
+ /* Last written value to avail->idx in
+ * guest byte order. */
+ u16 avail_idx_shadow;
+ };
+
+ /* Available for packed ring */
+ struct {
+ /* Actual memory layout for this queue. */
+ struct vring_packed vring_packed;
+
+ /* Driver ring wrap counter. */
+ u8 wrap_counter;
+
+ /* Index of the next avail descriptor. */
+ u16 next_avail_idx;
+
+ /* Last written value to driver->flags in
+ * guest byte order. */
+ u16 event_flags_shadow;
+ };
+ };
/* How to notify other side. FIXME: commonalize hcalls! */
bool (*notify)(struct virtqueue *vq);
@@ -201,8 +227,17 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
cpu_addr, size, direction);
}
-static void vring_unmap_one(const struct vring_virtqueue *vq,
- struct vring_desc *desc)
+static int vring_mapping_error(const struct vring_virtqueue *vq,
+ dma_addr_t addr)
+{
+ if (!vring_use_dma_api(vq->vq.vdev))
+ return 0;
+
+ return dma_mapping_error(vring_dma_dev(vq), addr);
+}
+
+static void vring_unmap_one_split(const struct vring_virtqueue *vq,
+ struct vring_desc *desc)
{
u16 flags;
@@ -226,17 +261,9 @@ static void vring_unmap_one(const struct vring_virtqueue *vq,
}
}
-static int vring_mapping_error(const struct vring_virtqueue *vq,
- dma_addr_t addr)
-{
- if (!vring_use_dma_api(vq->vq.vdev))
- return 0;
-
- return dma_mapping_error(vring_dma_dev(vq), addr);
-}
-
-static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
- unsigned int total_sg, gfp_t gfp)
+static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
+ unsigned int total_sg,
+ gfp_t gfp)
{
struct vring_desc *desc;
unsigned int i;
@@ -257,14 +284,14 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
return desc;
}
-static inline int virtqueue_add(struct virtqueue *_vq,
- struct scatterlist *sgs[],
- unsigned int total_sg,
- unsigned int out_sgs,
- unsigned int in_sgs,
- void *data,
- void *ctx,
- gfp_t gfp)
+static inline int virtqueue_add_split(struct virtqueue *_vq,
+ struct scatterlist *sgs[],
+ unsigned int total_sg,
+ unsigned int out_sgs,
+ unsigned int in_sgs,
+ void *data,
+ void *ctx,
+ gfp_t gfp)
{
struct vring_virtqueue *vq = to_vvq(_vq);
struct scatterlist *sg;
@@ -303,7 +330,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
/* If the host supports indirect descriptor tables, and we have multiple
* buffers, then go indirect. FIXME: tune this threshold */
if (vq->indirect && total_sg > 1 && vq->vq.num_free)
- desc = alloc_indirect(_vq, total_sg, gfp);
+ desc = alloc_indirect_split(_vq, total_sg, gfp);
else {
desc = NULL;
WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
@@ -424,7 +451,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
for (n = 0; n < total_sg; n++) {
if (i == err_idx)
break;
- vring_unmap_one(vq, &desc[i]);
+ vring_unmap_one_split(vq, &desc[i]);
i = virtio16_to_cpu(_vq->vdev, vq->vring.desc[i].next);
}
@@ -435,6 +462,355 @@ static inline int virtqueue_add(struct virtqueue *_vq,
return -EIO;
}
+static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 new, old;
+ bool needs_kick;
+
+ START_USE(vq);
+ /* We need to expose available array entries before checking avail
+ * event. */
+ virtio_mb(vq->weak_barriers);
+
+ old = vq->avail_idx_shadow - vq->num_added;
+ new = vq->avail_idx_shadow;
+ vq->num_added = 0;
+
+#ifdef DEBUG
+ if (vq->last_add_time_valid) {
+ WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
+ vq->last_add_time)) > 100);
+ }
+ vq->last_add_time_valid = false;
+#endif
+
+ if (vq->event) {
+ needs_kick = vring_need_event(virtio16_to_cpu(_vq->vdev, vring_avail_event(&vq->vring)),
+ new, old);
+ } else {
+ needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
+ }
+ END_USE(vq);
+ return needs_kick;
+}
+
+static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
+ void **ctx)
+{
+ unsigned int i, j;
+ __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
+
+ /* Clear data ptr. */
+ vq->desc_state[head].data = NULL;
+
+ /* Put back on free list: unmap first-level descriptors and find end */
+ i = head;
+
+ while (vq->vring.desc[i].flags & nextflag) {
+ vring_unmap_one_split(vq, &vq->vring.desc[i]);
+ i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next);
+ vq->vq.num_free++;
+ }
+
+ vring_unmap_one_split(vq, &vq->vring.desc[i]);
+ vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
+ vq->free_head = head;
+
+ /* Plus final descriptor */
+ vq->vq.num_free++;
+
+ if (vq->indirect) {
+ struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
+ u32 len;
+
+ /* Free the indirect table, if any, now that it's unmapped. */
+ if (!indir_desc)
+ return;
+
+ len = virtio32_to_cpu(vq->vq.vdev, vq->vring.desc[head].len);
+
+ BUG_ON(!(vq->vring.desc[head].flags &
+ cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
+ BUG_ON(len == 0 || len % sizeof(struct vring_desc));
+
+ for (j = 0; j < len / sizeof(struct vring_desc); j++)
+ vring_unmap_one_split(vq, &indir_desc[j]);
+
+ kfree(indir_desc);
+ vq->desc_state[head].indir_desc = NULL;
+ } else if (ctx) {
+ *ctx = vq->desc_state[head].indir_desc;
+ }
+}
+
+static inline bool more_used_split(const struct vring_virtqueue *vq)
+{
+ return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
+}
+
+static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
+ unsigned int *len,
+ void **ctx)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ void *ret;
+ unsigned int i;
+ u16 last_used;
+
+ START_USE(vq);
+
+ if (unlikely(vq->broken)) {
+ END_USE(vq);
+ return NULL;
+ }
+
+ if (!more_used_split(vq)) {
+ pr_debug("No more buffers in queue\n");
+ END_USE(vq);
+ return NULL;
+ }
+
+ /* Only get used array entries after they have been exposed by host. */
+ virtio_rmb(vq->weak_barriers);
+
+ last_used = (vq->last_used_idx & (vq->vring.num - 1));
+ i = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].id);
+ *len = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].len);
+
+ if (unlikely(i >= vq->vring.num)) {
+ BAD_RING(vq, "id %u out of range\n", i);
+ return NULL;
+ }
+ if (unlikely(!vq->desc_state[i].data)) {
+ BAD_RING(vq, "id %u is not a head!\n", i);
+ return NULL;
+ }
+
+ /* detach_buf_split clears data, so grab it now. */
+ ret = vq->desc_state[i].data;
+ detach_buf_split(vq, i, ctx);
+ vq->last_used_idx++;
+ /* If we expect an interrupt for the next entry, tell host
+ * by writing event index and flush out the write before
+ * the read in the next get_buf call. */
+ if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
+ virtio_store_mb(vq->weak_barriers,
+ &vring_used_event(&vq->vring),
+ cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
+
+#ifdef DEBUG
+ vq->last_add_time_valid = false;
+#endif
+
+ END_USE(vq);
+ return ret;
+}
+
+static void virtqueue_disable_cb_split(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
+ vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
+ if (!vq->event)
+ vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
+ }
+}
+
+static unsigned virtqueue_enable_cb_prepare_split(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 last_used_idx;
+
+ START_USE(vq);
+
+ /* We optimistically turn back on interrupts, then check if there was
+ * more to do. */
+ /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
+ * either clear the flags bit or point the event index at the next
+ * entry. Always do both to keep code simple. */
+ if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
+ vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
+ if (!vq->event)
+ vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
+ }
+ vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
+ END_USE(vq);
+ return last_used_idx;
+}
+
+static bool virtqueue_poll_split(struct virtqueue *_vq, unsigned last_used_idx)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ virtio_mb(vq->weak_barriers);
+ return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
+}
+
+static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 bufs;
+
+ START_USE(vq);
+
+ /* We optimistically turn back on interrupts, then check if there was
+ * more to do. */
+ /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
+ * either clear the flags bit or point the event index at the next
+ * entry. Always update the event index to keep code simple. */
+ if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
+ vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
+ if (!vq->event)
+ vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
+ }
+ /* TODO: tune this threshold */
+ bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
+
+ virtio_store_mb(vq->weak_barriers,
+ &vring_used_event(&vq->vring),
+ cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
+
+ if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
+ END_USE(vq);
+ return false;
+ }
+
+ END_USE(vq);
+ return true;
+}
+
+static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ unsigned int i;
+ void *buf;
+
+ START_USE(vq);
+
+ for (i = 0; i < vq->vring.num; i++) {
+ if (!vq->desc_state[i].data)
+ continue;
+ /* detach_buf clears data, so grab it now. */
+ buf = vq->desc_state[i].data;
+ detach_buf_split(vq, i, NULL);
+ vq->avail_idx_shadow--;
+ vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
+ END_USE(vq);
+ return buf;
+ }
+ /* That should have freed everything. */
+ BUG_ON(vq->vq.num_free != vq->vring.num);
+
+ END_USE(vq);
+ return NULL;
+}
+
+/*
+ * The layout for the packed ring is a continuous chunk of memory
+ * which looks like this.
+ *
+ * struct vring_packed {
+ * // The actual descriptors (16 bytes each)
+ * struct vring_packed_desc desc[num];
+ *
+ * // Padding to the next align boundary.
+ * char pad[];
+ *
+ * // Driver Event Suppression
+ * struct vring_packed_desc_event driver;
+ *
+ * // Device Event Suppression
+ * struct vring_packed_desc_event device;
+ * };
+ */
+static inline void vring_init_packed(struct vring_packed *vr, unsigned int num,
+ void *p, unsigned long align)
+{
+ vr->num = num;
+ vr->desc = p;
+ vr->driver = (void *)(((uintptr_t)p + sizeof(struct vring_packed_desc)
+ * num + align - 1) & ~(align - 1));
+ vr->device = vr->driver + 1;
+}
+
+static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
+{
+ return ((sizeof(struct vring_packed_desc) * num + align - 1)
+ & ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
+}
+
+static inline int virtqueue_add_packed(struct virtqueue *_vq,
+ struct scatterlist *sgs[],
+ unsigned int total_sg,
+ unsigned int out_sgs,
+ unsigned int in_sgs,
+ void *data,
+ void *ctx,
+ gfp_t gfp)
+{
+ return -EIO;
+}
+
+static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
+{
+ return false;
+}
+
+static inline bool more_used_packed(const struct vring_virtqueue *vq)
+{
+ return false;
+}
+
+static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
+ unsigned int *len,
+ void **ctx)
+{
+ return NULL;
+}
+
+static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
+{
+}
+
+static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
+{
+ return 0;
+}
+
+static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
+{
+ return false;
+}
+
+static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
+{
+ return false;
+}
+
+static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
+{
+ return NULL;
+}
+
+static inline int virtqueue_add(struct virtqueue *_vq,
+ struct scatterlist *sgs[],
+ unsigned int total_sg,
+ unsigned int out_sgs,
+ unsigned int in_sgs,
+ void *data,
+ void *ctx,
+ gfp_t gfp)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ return vq->packed ? virtqueue_add_packed(_vq, sgs, total_sg, out_sgs,
+ in_sgs, data, ctx, gfp) :
+ virtqueue_add_split(_vq, sgs, total_sg, out_sgs,
+ in_sgs, data, ctx, gfp);
+}
+
/**
* virtqueue_add_sgs - expose buffers to other end
* @vq: the struct virtqueue we're talking about.
@@ -551,34 +927,9 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
bool virtqueue_kick_prepare(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- u16 new, old;
- bool needs_kick;
- START_USE(vq);
- /* We need to expose available array entries before checking avail
- * event. */
- virtio_mb(vq->weak_barriers);
-
- old = vq->avail_idx_shadow - vq->num_added;
- new = vq->avail_idx_shadow;
- vq->num_added = 0;
-
-#ifdef DEBUG
- if (vq->last_add_time_valid) {
- WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
- vq->last_add_time)) > 100);
- }
- vq->last_add_time_valid = false;
-#endif
-
- if (vq->event) {
- needs_kick = vring_need_event(virtio16_to_cpu(_vq->vdev, vring_avail_event(&vq->vring)),
- new, old);
- } else {
- needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
- }
- END_USE(vq);
- return needs_kick;
+ return vq->packed ? virtqueue_kick_prepare_packed(_vq) :
+ virtqueue_kick_prepare_split(_vq);
}
EXPORT_SYMBOL_GPL(virtqueue_kick_prepare);
@@ -626,58 +977,9 @@ bool virtqueue_kick(struct virtqueue *vq)
}
EXPORT_SYMBOL_GPL(virtqueue_kick);
-static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
- void **ctx)
-{
- unsigned int i, j;
- __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
-
- /* Clear data ptr. */
- vq->desc_state[head].data = NULL;
-
- /* Put back on free list: unmap first-level descriptors and find end */
- i = head;
-
- while (vq->vring.desc[i].flags & nextflag) {
- vring_unmap_one(vq, &vq->vring.desc[i]);
- i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next);
- vq->vq.num_free++;
- }
-
- vring_unmap_one(vq, &vq->vring.desc[i]);
- vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
- vq->free_head = head;
-
- /* Plus final descriptor */
- vq->vq.num_free++;
-
- if (vq->indirect) {
- struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
- u32 len;
-
- /* Free the indirect table, if any, now that it's unmapped. */
- if (!indir_desc)
- return;
-
- len = virtio32_to_cpu(vq->vq.vdev, vq->vring.desc[head].len);
-
- BUG_ON(!(vq->vring.desc[head].flags &
- cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
- BUG_ON(len == 0 || len % sizeof(struct vring_desc));
-
- for (j = 0; j < len / sizeof(struct vring_desc); j++)
- vring_unmap_one(vq, &indir_desc[j]);
-
- kfree(indir_desc);
- vq->desc_state[head].indir_desc = NULL;
- } else if (ctx) {
- *ctx = vq->desc_state[head].indir_desc;
- }
-}
-
static inline bool more_used(const struct vring_virtqueue *vq)
{
- return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
+ return vq->packed ? more_used_packed(vq) : more_used_split(vq);
}
/**
@@ -700,57 +1002,9 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
void **ctx)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- void *ret;
- unsigned int i;
- u16 last_used;
- START_USE(vq);
-
- if (unlikely(vq->broken)) {
- END_USE(vq);
- return NULL;
- }
-
- if (!more_used(vq)) {
- pr_debug("No more buffers in queue\n");
- END_USE(vq);
- return NULL;
- }
-
- /* Only get used array entries after they have been exposed by host. */
- virtio_rmb(vq->weak_barriers);
-
- last_used = (vq->last_used_idx & (vq->vring.num - 1));
- i = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].id);
- *len = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].len);
-
- if (unlikely(i >= vq->vring.num)) {
- BAD_RING(vq, "id %u out of range\n", i);
- return NULL;
- }
- if (unlikely(!vq->desc_state[i].data)) {
- BAD_RING(vq, "id %u is not a head!\n", i);
- return NULL;
- }
-
- /* detach_buf clears data, so grab it now. */
- ret = vq->desc_state[i].data;
- detach_buf(vq, i, ctx);
- vq->last_used_idx++;
- /* If we expect an interrupt for the next entry, tell host
- * by writing event index and flush out the write before
- * the read in the next get_buf call. */
- if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
- virtio_store_mb(vq->weak_barriers,
- &vring_used_event(&vq->vring),
- cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
-
-#ifdef DEBUG
- vq->last_add_time_valid = false;
-#endif
-
- END_USE(vq);
- return ret;
+ return vq->packed ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
+ virtqueue_get_buf_ctx_split(_vq, len, ctx);
}
EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
@@ -772,12 +1026,10 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
- vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
- if (!vq->event)
- vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
- }
-
+ if (vq->packed)
+ virtqueue_disable_cb_packed(_vq);
+ else
+ virtqueue_disable_cb_split(_vq);
}
EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
@@ -796,23 +1048,9 @@ EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- u16 last_used_idx;
- START_USE(vq);
-
- /* We optimistically turn back on interrupts, then check if there was
- * more to do. */
- /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
- * either clear the flags bit or point the event index at the next
- * entry. Always do both to keep code simple. */
- if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
- vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
- if (!vq->event)
- vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
- }
- vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
- END_USE(vq);
- return last_used_idx;
+ return vq->packed ? virtqueue_enable_cb_prepare_packed(_vq) :
+ virtqueue_enable_cb_prepare_split(_vq);
}
EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
@@ -829,8 +1067,8 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- virtio_mb(vq->weak_barriers);
- return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
+ return vq->packed ? virtqueue_poll_packed(_vq, last_used_idx) :
+ virtqueue_poll_split(_vq, last_used_idx);
}
EXPORT_SYMBOL_GPL(virtqueue_poll);
@@ -868,34 +1106,9 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb);
bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- u16 bufs;
- START_USE(vq);
-
- /* We optimistically turn back on interrupts, then check if there was
- * more to do. */
- /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
- * either clear the flags bit or point the event index at the next
- * entry. Always update the event index to keep code simple. */
- if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
- vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
- if (!vq->event)
- vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
- }
- /* TODO: tune this threshold */
- bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
-
- virtio_store_mb(vq->weak_barriers,
- &vring_used_event(&vq->vring),
- cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
-
- if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
- END_USE(vq);
- return false;
- }
-
- END_USE(vq);
- return true;
+ return vq->packed ? virtqueue_enable_cb_delayed_packed(_vq) :
+ virtqueue_enable_cb_delayed_split(_vq);
}
EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
@@ -910,27 +1123,9 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- unsigned int i;
- void *buf;
- START_USE(vq);
-
- for (i = 0; i < vq->vring.num; i++) {
- if (!vq->desc_state[i].data)
- continue;
- /* detach_buf clears data, so grab it now. */
- buf = vq->desc_state[i].data;
- detach_buf(vq, i, NULL);
- vq->avail_idx_shadow--;
- vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
- END_USE(vq);
- return buf;
- }
- /* That should have freed everything. */
- BUG_ON(vq->vq.num_free != vq->vring.num);
-
- END_USE(vq);
- return NULL;
+ return vq->packed ? virtqueue_detach_unused_buf_packed(_vq) :
+ virtqueue_detach_unused_buf_split(_vq);
}
EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
@@ -955,7 +1150,8 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
EXPORT_SYMBOL_GPL(vring_interrupt);
struct virtqueue *__vring_new_virtqueue(unsigned int index,
- struct vring vring,
+ union vring_union vring,
+ bool packed,
struct virtio_device *vdev,
bool weak_barriers,
bool context,
@@ -963,19 +1159,20 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
void (*callback)(struct virtqueue *),
const char *name)
{
- unsigned int i;
+ unsigned int num, i;
struct vring_virtqueue *vq;
- vq = kmalloc(sizeof(*vq) + vring.num * sizeof(struct vring_desc_state),
+ num = packed ? vring.vring_packed.num : vring.vring_split.num;
+
+ vq = kmalloc(sizeof(*vq) + num * sizeof(struct vring_desc_state),
GFP_KERNEL);
if (!vq)
return NULL;
- vq->vring = vring;
vq->vq.callback = callback;
vq->vq.vdev = vdev;
vq->vq.name = name;
- vq->vq.num_free = vring.num;
+ vq->vq.num_free = num;
vq->vq.index = index;
vq->we_own_ring = false;
vq->queue_dma_addr = 0;
@@ -984,9 +1181,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
vq->weak_barriers = weak_barriers;
vq->broken = false;
vq->last_used_idx = 0;
- vq->avail_flags_shadow = 0;
- vq->avail_idx_shadow = 0;
vq->num_added = 0;
+ vq->packed = packed;
list_add_tail(&vq->vq.list, &vdev->vqs);
#ifdef DEBUG
vq->in_use = false;
@@ -997,18 +1193,37 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
!context;
vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
+ if (vq->packed) {
+ vq->vring_packed = vring.vring_packed;
+ vq->next_avail_idx = 0;
+ vq->wrap_counter = 1;
+ vq->event_flags_shadow = 0;
+ } else {
+ vq->vring = vring.vring_split;
+ vq->avail_flags_shadow = 0;
+ vq->avail_idx_shadow = 0;
+
+ /* Put everything in free lists. */
+ vq->free_head = 0;
+ for (i = 0; i < num-1; i++)
+ vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
+ }
+
/* No callback? Tell other side not to bother us. */
if (!callback) {
- vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
- if (!vq->event)
- vq->vring.avail->flags = cpu_to_virtio16(vdev, vq->avail_flags_shadow);
+ if (packed) {
+ vq->event_flags_shadow = VRING_EVENT_F_DISABLE;
+ vq->vring_packed.driver->flags = cpu_to_virtio16(vdev,
+ vq->event_flags_shadow);
+ } else {
+ vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
+ if (!vq->event)
+ vq->vring.avail->flags = cpu_to_virtio16(vdev,
+ vq->avail_flags_shadow);
+ }
}
- /* Put everything in free lists. */
- vq->free_head = 0;
- for (i = 0; i < vring.num-1; i++)
- vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
- memset(vq->desc_state, 0, vring.num * sizeof(struct vring_desc_state));
+ memset(vq->desc_state, 0, num * sizeof(struct vring_desc_state));
return &vq->vq;
}
@@ -1056,6 +1271,12 @@ static void vring_free_queue(struct virtio_device *vdev, size_t size,
}
}
+static inline int
+__vring_size(unsigned int num, unsigned long align, bool packed)
+{
+ return packed ? vring_size_packed(num, align) : vring_size(num, align);
+}
+
struct virtqueue *vring_create_virtqueue(
unsigned int index,
unsigned int num,
@@ -1072,7 +1293,8 @@ struct virtqueue *vring_create_virtqueue(
void *queue = NULL;
dma_addr_t dma_addr;
size_t queue_size_in_bytes;
- struct vring vring;
+ union vring_union vring;
+ bool packed;
/* We assume num is a power of 2. */
if (num & (num - 1)) {
@@ -1080,9 +1302,13 @@ struct virtqueue *vring_create_virtqueue(
return NULL;
}
+ packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
+
/* TODO: allocate each queue chunk individually */
- for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
- queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
+ for (; num && __vring_size(num, vring_align, packed) > PAGE_SIZE;
+ num /= 2) {
+ queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
+ packed),
&dma_addr,
GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
if (queue)
@@ -1094,17 +1320,21 @@ struct virtqueue *vring_create_virtqueue(
if (!queue) {
/* Try to get a single page. You are my only hope! */
- queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
+ queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
+ packed),
&dma_addr, GFP_KERNEL|__GFP_ZERO);
}
if (!queue)
return NULL;
- queue_size_in_bytes = vring_size(num, vring_align);
- vring_init(&vring, num, queue, vring_align);
+ queue_size_in_bytes = __vring_size(num, vring_align, packed);
+ if (packed)
+ vring_init_packed(&vring.vring_packed, num, queue, vring_align);
+ else
+ vring_init(&vring.vring_split, num, queue, vring_align);
- vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
- notify, callback, name);
+ vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
+ context, notify, callback, name);
if (!vq) {
vring_free_queue(vdev, queue_size_in_bytes, queue,
dma_addr);
@@ -1130,10 +1360,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
void (*callback)(struct virtqueue *vq),
const char *name)
{
- struct vring vring;
- vring_init(&vring, num, pages, vring_align);
- return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
- notify, callback, name);
+ union vring_union vring;
+ bool packed;
+
+ packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
+ if (packed)
+ vring_init_packed(&vring.vring_packed, num, pages, vring_align);
+ else
+ vring_init(&vring.vring_split, num, pages, vring_align);
+
+ return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
+ context, notify, callback, name);
}
EXPORT_SYMBOL_GPL(vring_new_virtqueue);
@@ -1143,7 +1380,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
if (vq->we_own_ring) {
vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
- vq->vring.desc, vq->queue_dma_addr);
+ vq->packed ? (void *)vq->vring_packed.desc :
+ (void *)vq->vring.desc,
+ vq->queue_dma_addr);
}
list_del(&_vq->list);
kfree(vq);
@@ -1185,7 +1424,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
struct vring_virtqueue *vq = to_vvq(_vq);
- return vq->vring.num;
+ return vq->packed ? vq->vring_packed.num : vq->vring.num;
}
EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
@@ -1228,6 +1467,10 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
BUG_ON(!vq->we_own_ring);
+ if (vq->packed)
+ return vq->queue_dma_addr + ((char *)vq->vring_packed.driver -
+ (char *)vq->vring_packed.desc);
+
return vq->queue_dma_addr +
((char *)vq->vring.avail - (char *)vq->vring.desc);
}
@@ -1239,11 +1482,16 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
BUG_ON(!vq->we_own_ring);
+ if (vq->packed)
+ return vq->queue_dma_addr + ((char *)vq->vring_packed.device -
+ (char *)vq->vring_packed.desc);
+
return vq->queue_dma_addr +
((char *)vq->vring.used - (char *)vq->vring.desc);
}
EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
+/* Only available for split ring */
const struct vring *virtqueue_get_vring(struct virtqueue *vq)
{
return &to_vvq(vq)->vring;
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index bbf32524ab27..a0075894ad16 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
struct virtio_device;
struct virtqueue;
+union vring_union {
+ struct vring vring_split;
+ struct vring_packed vring_packed;
+};
+
/*
* Creates a virtqueue and allocates the descriptor ring. If
* may_reduce_num is set, then this may allocate a smaller ring than
@@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
/* Creates a virtqueue with a custom layout. */
struct virtqueue *__vring_new_virtqueue(unsigned int index,
- struct vring vring,
+ union vring_union vring,
+ bool packed,
struct virtio_device *vdev,
bool weak_barriers,
bool ctx,
--
2.17.0
^ permalink raw reply related
* [RFC v4 1/5] virtio: add packed ring definitions
From: Tiwei Bie @ 2018-05-16 8:37 UTC (permalink / raw)
To: mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180516083737.26504-1-tiwei.bie@intel.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
include/uapi/linux/virtio_config.h | 12 +++++++++-
include/uapi/linux/virtio_ring.h | 36 ++++++++++++++++++++++++++++++
2 files changed, 47 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 308e2096291f..a6e392325e3a 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -49,7 +49,7 @@
* transport being used (eg. virtio_ring), the rest are per-device feature
* bits. */
#define VIRTIO_TRANSPORT_F_START 28
-#define VIRTIO_TRANSPORT_F_END 34
+#define VIRTIO_TRANSPORT_F_END 36
#ifndef VIRTIO_CONFIG_NO_LEGACY
/* Do we get callbacks when the ring is completely used, even if we've
@@ -71,4 +71,14 @@
* this is for compatibility with legacy systems.
*/
#define VIRTIO_F_IOMMU_PLATFORM 33
+
+/* This feature indicates support for the packed virtqueue layout. */
+#define VIRTIO_F_RING_PACKED 34
+
+/*
+ * This feature indicates that all buffers are used by the device
+ * in the same order in which they have been made available.
+ */
+#define VIRTIO_F_IN_ORDER 35
+
#endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 6d5d5faa989b..3932cb80c347 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -44,6 +44,9 @@
/* This means the buffer contains a list of buffer descriptors. */
#define VRING_DESC_F_INDIRECT 4
+#define VRING_DESC_F_AVAIL(b) ((b) << 7)
+#define VRING_DESC_F_USED(b) ((b) << 15)
+
/* The Host uses this in used->flags to advise the Guest: don't kick me when
* you add a buffer. It's unreliable, so it's simply an optimization. Guest
* will still kick if it's out of buffers. */
@@ -53,6 +56,10 @@
* optimization. */
#define VRING_AVAIL_F_NO_INTERRUPT 1
+#define VRING_EVENT_F_ENABLE 0x0
+#define VRING_EVENT_F_DISABLE 0x1
+#define VRING_EVENT_F_DESC 0x2
+
/* We support indirect buffer descriptors */
#define VIRTIO_RING_F_INDIRECT_DESC 28
@@ -171,4 +178,33 @@ static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
}
+struct vring_packed_desc_event {
+ /* __virtio16 off : 15; // Descriptor Event Offset
+ * __virtio16 wrap : 1; // Descriptor Event Wrap Counter */
+ __virtio16 off_wrap;
+ /* __virtio16 flags : 2; // Descriptor Event Flags */
+ __virtio16 flags;
+};
+
+struct vring_packed_desc {
+ /* Buffer Address. */
+ __virtio64 addr;
+ /* Buffer Length. */
+ __virtio32 len;
+ /* Buffer ID. */
+ __virtio16 id;
+ /* The flags depending on descriptor type. */
+ __virtio16 flags;
+};
+
+struct vring_packed {
+ unsigned int num;
+
+ struct vring_packed_desc *desc;
+
+ struct vring_packed_desc_event *driver;
+
+ struct vring_packed_desc_event *device;
+};
+
#endif /* _UAPI_LINUX_VIRTIO_RING_H */
--
2.17.0
^ permalink raw reply related
* [RFC v4 0/5] virtio: support packed ring
From: Tiwei Bie @ 2018-05-16 8:37 UTC (permalink / raw)
To: mst, jasowang, virtualization, linux-kernel, netdev; +Cc: wexu
Hello everyone,
This RFC implements packed ring support in virtio driver.
Some simple functional tests have been done with Jason's
packed ring implementation in vhost:
https://lkml.org/lkml/2018/4/23/12
Both of ping and netperf worked as expected (with EVENT_IDX
disabled).
TODO:
- Refinements (for code and commit log);
- More tests;
- Bug fixes;
RFC v3 -> RFC v4:
- Make ID allocation support out-of-order (Jason);
- Various fixes for EVENT_IDX support;
RFC v2 -> RFC v3:
- Split into small patches (Jason);
- Add helper virtqueue_use_indirect() (Jason);
- Just set id for the last descriptor of a list (Jason);
- Calculate the prev in virtqueue_add_packed() (Jason);
- Fix/improve desc suppression code (Jason/MST);
- Refine the code layout for XXX_split/packed and wrappers (MST);
- Fix the comments and API in uapi (MST);
- Remove the BUG_ON() for indirect (Jason);
- Some other refinements and bug fixes;
RFC v1 -> RFC v2:
- Add indirect descriptor support - compile test only;
- Add event suppression supprt - compile test only;
- Move vring_packed_init() out of uapi (Jason, MST);
- Merge two loops into one in virtqueue_add_packed() (Jason);
- Split vring_unmap_one() for packed ring and split ring (Jason);
- Avoid using '%' operator (Jason);
- Rename free_head -> next_avail_idx (Jason);
- Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
- Some other refinements and bug fixes;
Thanks!
Tiwei Bie (5):
virtio: add packed ring definitions
virtio_ring: support creating packed ring
virtio_ring: add packed ring support
virtio_ring: add event idx support in packed ring
virtio_ring: enable packed ring
drivers/virtio/virtio_ring.c | 1338 ++++++++++++++++++++++------
include/linux/virtio_ring.h | 8 +-
include/uapi/linux/virtio_config.h | 12 +-
include/uapi/linux/virtio_ring.h | 36 +
4 files changed, 1116 insertions(+), 278 deletions(-)
--
2.17.0
^ permalink raw reply
* Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring
From: Tiwei Bie @ 2018-05-16 5:55 UTC (permalink / raw)
To: Jason Wang; +Cc: mst, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <915efeaa-4e47-9004-a722-54b0c40ebcbb@redhat.com>
On Wed, May 16, 2018 at 01:01:04PM +0800, Jason Wang wrote:
> On 2018年04月25日 13:15, Tiwei Bie wrote:
[...]
> > @@ -1143,10 +1160,17 @@ static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> > /* We optimistically turn back on interrupts, then check if there was
> > * more to do. */
> > + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > + * either clear the flags bit or point the event index at the next
> > + * entry. Always update the event index to keep code simple. */
> > +
> > + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> > + vq->last_used_idx | (vq->wrap_counter << 15));
>
>
> Using vq->wrap_counter seems not correct, what we need is the warp counter
> for the last_used_idx not next_avail_idx.
Yes, you're right. I have fixed it in my local repo,
but haven't sent out a new version yet.
I'll try to send out a new RFC today.
>
> And I think there's even no need to bother with event idx here, how about
> just set VRING_EVENT_F_ENABLE?
We had a similar discussion before. Michael prefers
to use VRING_EVENT_F_DESC when possible to avoid
extra interrupts if host is fast:
https://lkml.org/lkml/2018/4/16/1085
"""
I suspect this will lead to extra interrupts if host is fast.
So I think for now we should always use VRING_EVENT_F_DESC
if EVENT_IDX is negotiated.
"""
>
> > if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> > virtio_wmb(vq->weak_barriers);
> > - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> > + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> > + VRING_EVENT_F_ENABLE;
> > vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> > vq->event_flags_shadow);
> > }
> > @@ -1172,15 +1196,34 @@ static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> > static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> > {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> > + u16 bufs, used_idx, wrap_counter;
> > START_USE(vq);
> > /* We optimistically turn back on interrupts, then check if there was
> > * more to do. */
> > + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > + * either clear the flags bit or point the event index at the next
> > + * entry. Always update the event index to keep code simple. */
> > +
> > + /* TODO: tune this threshold */
> > + bufs = (u16)(vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
>
> bufs could be more than vq->num here, is this intended?
Yes, you're right. Like the above one -- I have fixed
it in my local repo, but haven't sent out a new version
yet. Thanks for spotting this!
>
> > +
> > + used_idx = vq->last_used_idx + bufs;
> > + wrap_counter = vq->wrap_counter;
> > +
> > + if (used_idx >= vq->vring_packed.num) {
> > + used_idx -= vq->vring_packed.num;
> > + wrap_counter ^= 1;
>
> When used_idx is greater or equal vq->num, there's no need to flip
> warp_counter bit since it should match next_avail_idx.
>
> And we need also care about the case when next_avail wraps but used_idx not.
> so we probaly need:
>
> else if (vq->next_avail_idx < used_idx) {
> wrap_counter ^= 1;
> }
>
> I think maybe it's time to add some sample codes in the spec to avoid
> duplicating the efforts(bugs).
+1
Best regards,
Tiwei Bie
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring
From: Jason Wang @ 2018-05-16 5:01 UTC (permalink / raw)
To: Tiwei Bie, mst, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180425051550.24342-5-tiwei.bie@intel.com>
On 2018年04月25日 13:15, Tiwei Bie wrote:
> This commit introduces the event idx support in packed
> ring. This feature is temporarily disabled, because the
> implementation in this patch may not work as expected,
> and some further discussions on the implementation are
> needed, e.g. do we have to check the wrap counter when
> checking whether a kick is needed?
>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
> drivers/virtio/virtio_ring.c | 53 ++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 49 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 0181e93897be..b1039c2985b9 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -986,7 +986,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> - u16 flags;
> + u16 new, old, off_wrap, flags;
> bool needs_kick;
> u32 snapshot;
>
> @@ -995,7 +995,12 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> * suppressions. */
> virtio_mb(vq->weak_barriers);
>
> + old = vq->next_avail_idx - vq->num_added;
> + new = vq->next_avail_idx;
> + vq->num_added = 0;
> +
> snapshot = *(u32 *)vq->vring_packed.device;
> + off_wrap = virtio16_to_cpu(_vq->vdev, snapshot & 0xffff);
> flags = cpu_to_virtio16(_vq->vdev, snapshot >> 16) & 0x3;
>
> #ifdef DEBUG
> @@ -1006,7 +1011,10 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> vq->last_add_time_valid = false;
> #endif
>
> - needs_kick = (flags != VRING_EVENT_F_DISABLE);
> + if (flags == VRING_EVENT_F_DESC)
> + needs_kick = vring_need_event(off_wrap & ~(1<<15), new, old);
> + else
> + needs_kick = (flags != VRING_EVENT_F_DISABLE);
> END_USE(vq);
> return needs_kick;
> }
> @@ -1116,6 +1124,15 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> if (vq->last_used_idx >= vq->vring_packed.num)
> vq->last_used_idx -= vq->vring_packed.num;
>
> + /* If we expect an interrupt for the next entry, tell host
> + * by writing event index and flush out the write before
> + * the read in the next get_buf call. */
> + if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
> + virtio_store_mb(vq->weak_barriers,
> + &vq->vring_packed.driver->off_wrap,
> + cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
> + (vq->wrap_counter << 15)));
> +
> #ifdef DEBUG
> vq->last_add_time_valid = false;
> #endif
> @@ -1143,10 +1160,17 @@ static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
>
> /* We optimistically turn back on interrupts, then check if there was
> * more to do. */
> + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> + * either clear the flags bit or point the event index at the next
> + * entry. Always update the event index to keep code simple. */
> +
> + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> + vq->last_used_idx | (vq->wrap_counter << 15));
Using vq->wrap_counter seems not correct, what we need is the warp
counter for the last_used_idx not next_avail_idx.
And I think there's even no need to bother with event idx here, how
about just set VRING_EVENT_F_ENABLE?
>
> if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> virtio_wmb(vq->weak_barriers);
> - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> + VRING_EVENT_F_ENABLE;
> vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> vq->event_flags_shadow);
> }
> @@ -1172,15 +1196,34 @@ static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> + u16 bufs, used_idx, wrap_counter;
>
> START_USE(vq);
>
> /* We optimistically turn back on interrupts, then check if there was
> * more to do. */
> + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> + * either clear the flags bit or point the event index at the next
> + * entry. Always update the event index to keep code simple. */
> +
> + /* TODO: tune this threshold */
> + bufs = (u16)(vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
bufs could be more than vq->num here, is this intended?
> +
> + used_idx = vq->last_used_idx + bufs;
> + wrap_counter = vq->wrap_counter;
> +
> + if (used_idx >= vq->vring_packed.num) {
> + used_idx -= vq->vring_packed.num;
> + wrap_counter ^= 1;
When used_idx is greater or equal vq->num, there's no need to flip
warp_counter bit since it should match next_avail_idx.
And we need also care about the case when next_avail wraps but used_idx
not. so we probaly need:
else if (vq->next_avail_idx < used_idx) {
wrap_counter ^= 1;
}
I think maybe it's time to add some sample codes in the spec to avoid
duplicating the efforts(bugs).
Thanks
> + }
> +
> + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> + used_idx | (wrap_counter << 15));
>
> if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> virtio_wmb(vq->weak_barriers);
> - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> + VRING_EVENT_F_ENABLE;
> vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> vq->event_flags_shadow);
> }
> @@ -1822,8 +1865,10 @@ void vring_transport_features(struct virtio_device *vdev)
> switch (i) {
> case VIRTIO_RING_F_INDIRECT_DESC:
> break;
> +#if 0
> case VIRTIO_RING_F_EVENT_IDX:
> break;
> +#endif
> case VIRTIO_F_VERSION_1:
> break;
> case VIRTIO_F_IOMMU_PLATFORM:
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [RFC 0/8] Improving compiler inlining decisions
From: Josh Poimboeuf @ 2018-05-16 3:48 UTC (permalink / raw)
To: Nadav Amit
Cc: Juergen Gross, x86, Kees Cook, Jonathan Corbet, Peter Zijlstra,
Christopher Li, Randy Dunlap, linux-kernel, virtualization,
linux-sparse, Ingo Molnar, Jan Beulich, H. Peter Anvin,
Alok Kataria, nadav.amit, Thomas Gleixner
In-Reply-To: <20180515141124.84254-1-namit@vmware.com>
On Tue, May 15, 2018 at 07:11:07AM -0700, Nadav Amit wrote:
> This patch-set deals with an interesting yet stupid problem: code that
> does not get inlined despite its simplicity.
I got the 0/8 patch twice, and didn't get the 1/8 patch. Was there an
issue with the sending of the patches?
--
Josh
^ permalink raw reply
* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Jens Axboe @ 2018-05-15 16:20 UTC (permalink / raw)
To: Matthew Wilcox, linux-kernel, linux-scsi, target-devel,
linux1394-devel, linux-usb, kvm, virtualization, netdev,
Juergen Gross, qla2xxx-upstream, Kent Overstreet
Cc: Matthew Wilcox
In-Reply-To: <3a56027b-47bc-dcb8-a465-3670031572f1@kernel.dk>
On 5/15/18 10:11 AM, Jens Axboe wrote:
> On 5/15/18 10:00 AM, Matthew Wilcox wrote:
>> From: Matthew Wilcox <mawilcox@microsoft.com>
>>
>> The sbitmap and the percpu_ida perform essentially the same task,
>> allocating tags for commands. Since the sbitmap is more used than
>> the percpu_ida, convert the percpu_ida users to the sbitmap API.
>
> It should also be the same performance as percpu_ida in light use, and
> performs much better at > 50% utilization of the tag space. I think
> that's better justification than "more used than".
Had to search long and hard for the perf numbers I did for percpu_ida
on higher utilization, but here it is:
https://lkml.org/lkml/2014/4/22/553
--
Jens Axboe
^ permalink raw reply
* Re: [PATCH 1/2] Convert target drivers to use sbitmap
From: Jens Axboe @ 2018-05-15 16:11 UTC (permalink / raw)
To: Matthew Wilcox, linux-kernel, linux-scsi, target-devel,
linux1394-devel, linux-usb, kvm, virtualization, netdev,
Juergen Gross, qla2xxx-upstream, Kent Overstreet
Cc: Matthew Wilcox
In-Reply-To: <20180515160043.27044-2-willy@infradead.org>
On 5/15/18 10:00 AM, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> The sbitmap and the percpu_ida perform essentially the same task,
> allocating tags for commands. Since the sbitmap is more used than
> the percpu_ida, convert the percpu_ida users to the sbitmap API.
It should also be the same performance as percpu_ida in light use, and
performs much better at > 50% utilization of the tag space. I think
that's better justification than "more used than".
> diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
> index 4435bf374d2d..28bcffae609f 100644
> --- a/drivers/target/iscsi/iscsi_target_util.c
> +++ b/drivers/target/iscsi/iscsi_target_util.c
> @@ -17,7 +17,7 @@
> ******************************************************************************/
>
> #include <linux/list.h>
> -#include <linux/percpu_ida.h>
> +#include <linux/sched/signal.h>
> #include <net/ipv6.h> /* ipv6_addr_equal() */
> #include <scsi/scsi_tcq.h>
> #include <scsi/iscsi_proto.h>
> @@ -147,6 +147,28 @@ void iscsit_free_r2ts_from_list(struct iscsi_cmd *cmd)
> spin_unlock_bh(&cmd->r2t_lock);
> }
>
> +int iscsit_wait_for_tag(struct se_session *se_sess, int state, int *cpup)
> +{
> + int tag = -1;
> + DEFINE_WAIT(wait);
> + struct sbq_wait_state *ws;
> +
> + if (state == TASK_RUNNING)
> + return tag;
> +
> + ws = &se_sess->sess_tag_pool.ws[0];
> + for (;;) {
> + prepare_to_wait_exclusive(&ws->wait, &wait, state);
> + if (signal_pending_state(state, current))
> + break;
> + schedule();
> + tag = sbitmap_queue_get(&se_sess->sess_tag_pool, cpup);
> + }
> +
> + finish_wait(&ws->wait, &wait);
> + return tag;
> +}
Seems like that should be:
ws = &se_sess->sess_tag_pool.ws[0];
for (;;) {
prepare_to_wait_exclusive(&ws->wait, &wait, state);
if (signal_pending_state(state, current))
break;
tag = sbitmap_queue_get(&se_sess->sess_tag_pool, cpup);
if (tag != -1)
break;
schedule();
}
finish_wait(&ws->wait, &wait);
return tag;
> /*
> * May be called from software interrupt (timer) context for allocating
> * iSCSI NopINs.
> @@ -155,9 +177,11 @@ struct iscsi_cmd *iscsit_allocate_cmd(struct iscsi_conn *conn, int state)
> {
> struct iscsi_cmd *cmd;
> struct se_session *se_sess = conn->sess->se_sess;
> - int size, tag;
> + int size, tag, cpu;
>
> - tag = percpu_ida_alloc(&se_sess->sess_tag_pool, state);
> + tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
> + if (tag < 0)
> + tag = iscsit_wait_for_tag(se_sess, state, &cpu);
> if (tag < 0)
> return NULL;
Might make sense to just roll the whole thing into iscsi_get_tag(), that
would be cleaner.
sbitmap should provide a helper for that, but we can do that cleanup
later. That would encapsulate things like the per-cpu caching hint too,
for instance.
Rest looks fine to me.
--
Jens Axboe
^ permalink raw reply
* [PATCH 2/2] Remove percpu_ida
From: Matthew Wilcox @ 2018-05-15 16:00 UTC (permalink / raw)
To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
linux-usb, kvm, virtualization, netdev, Juergen Gross,
qla2xxx-upstream, Kent Overstreet, Jens Axboe
Cc: Matthew Wilcox
In-Reply-To: <20180515160043.27044-1-willy@infradead.org>
From: Matthew Wilcox <mawilcox@microsoft.com>
With its one user gone, remove the library code.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
include/linux/percpu_ida.h | 83 --------
lib/Makefile | 2 +-
lib/percpu_ida.c | 391 -------------------------------------
3 files changed, 1 insertion(+), 475 deletions(-)
delete mode 100644 include/linux/percpu_ida.h
delete mode 100644 lib/percpu_ida.c
diff --git a/include/linux/percpu_ida.h b/include/linux/percpu_ida.h
deleted file mode 100644
index 07d78e4653bc..000000000000
--- a/include/linux/percpu_ida.h
+++ /dev/null
@@ -1,83 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __PERCPU_IDA_H__
-#define __PERCPU_IDA_H__
-
-#include <linux/types.h>
-#include <linux/bitops.h>
-#include <linux/init.h>
-#include <linux/sched.h>
-#include <linux/spinlock_types.h>
-#include <linux/wait.h>
-#include <linux/cpumask.h>
-
-struct percpu_ida_cpu;
-
-struct percpu_ida {
- /*
- * number of tags available to be allocated, as passed to
- * percpu_ida_init()
- */
- unsigned nr_tags;
- unsigned percpu_max_size;
- unsigned percpu_batch_size;
-
- struct percpu_ida_cpu __percpu *tag_cpu;
-
- /*
- * Bitmap of cpus that (may) have tags on their percpu freelists:
- * steal_tags() uses this to decide when to steal tags, and which cpus
- * to try stealing from.
- *
- * It's ok for a freelist to be empty when its bit is set - steal_tags()
- * will just keep looking - but the bitmap _must_ be set whenever a
- * percpu freelist does have tags.
- */
- cpumask_t cpus_have_tags;
-
- struct {
- spinlock_t lock;
- /*
- * When we go to steal tags from another cpu (see steal_tags()),
- * we want to pick a cpu at random. Cycling through them every
- * time we steal is a bit easier and more or less equivalent:
- */
- unsigned cpu_last_stolen;
-
- /* For sleeping on allocation failure */
- wait_queue_head_t wait;
-
- /*
- * Global freelist - it's a stack where nr_free points to the
- * top
- */
- unsigned nr_free;
- unsigned *freelist;
- } ____cacheline_aligned_in_smp;
-};
-
-/*
- * Number of tags we move between the percpu freelist and the global freelist at
- * a time
- */
-#define IDA_DEFAULT_PCPU_BATCH_MOVE 32U
-/* Max size of percpu freelist, */
-#define IDA_DEFAULT_PCPU_SIZE ((IDA_DEFAULT_PCPU_BATCH_MOVE * 3) / 2)
-
-int percpu_ida_alloc(struct percpu_ida *pool, int state);
-void percpu_ida_free(struct percpu_ida *pool, unsigned tag);
-
-void percpu_ida_destroy(struct percpu_ida *pool);
-int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags,
- unsigned long max_size, unsigned long batch_size);
-static inline int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags)
-{
- return __percpu_ida_init(pool, nr_tags, IDA_DEFAULT_PCPU_SIZE,
- IDA_DEFAULT_PCPU_BATCH_MOVE);
-}
-
-typedef int (*percpu_ida_cb)(unsigned, void *);
-int percpu_ida_for_each_free(struct percpu_ida *pool, percpu_ida_cb fn,
- void *data);
-
-unsigned percpu_ida_free_tags(struct percpu_ida *pool, int cpu);
-#endif /* __PERCPU_IDA_H__ */
diff --git a/lib/Makefile b/lib/Makefile
index ce20696d5a92..7626dece1d27 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -39,7 +39,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
bsearch.o find_bit.o llist.o memweight.o kfifo.o \
- percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
+ percpu-refcount.o rhashtable.o reciprocal_div.o \
once.o refcount.o usercopy.o errseq.o bucket_locks.o
obj-$(CONFIG_STRING_SELFTEST) += test_string.o
obj-y += string_helpers.o
diff --git a/lib/percpu_ida.c b/lib/percpu_ida.c
deleted file mode 100644
index 6016f1deb1f5..000000000000
--- a/lib/percpu_ida.c
+++ /dev/null
@@ -1,391 +0,0 @@
-/*
- * Percpu IDA library
- *
- * Copyright (C) 2013 Datera, Inc. Kent Overstreet
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; either version 2, or (at
- * your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- */
-
-#include <linux/mm.h>
-#include <linux/bitmap.h>
-#include <linux/bitops.h>
-#include <linux/bug.h>
-#include <linux/err.h>
-#include <linux/export.h>
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/percpu.h>
-#include <linux/sched/signal.h>
-#include <linux/string.h>
-#include <linux/spinlock.h>
-#include <linux/percpu_ida.h>
-
-struct percpu_ida_cpu {
- /*
- * Even though this is percpu, we need a lock for tag stealing by remote
- * CPUs:
- */
- spinlock_t lock;
-
- /* nr_free/freelist form a stack of free IDs */
- unsigned nr_free;
- unsigned freelist[];
-};
-
-static inline void move_tags(unsigned *dst, unsigned *dst_nr,
- unsigned *src, unsigned *src_nr,
- unsigned nr)
-{
- *src_nr -= nr;
- memcpy(dst + *dst_nr, src + *src_nr, sizeof(unsigned) * nr);
- *dst_nr += nr;
-}
-
-/*
- * Try to steal tags from a remote cpu's percpu freelist.
- *
- * We first check how many percpu freelists have tags
- *
- * Then we iterate through the cpus until we find some tags - we don't attempt
- * to find the "best" cpu to steal from, to keep cacheline bouncing to a
- * minimum.
- */
-static inline void steal_tags(struct percpu_ida *pool,
- struct percpu_ida_cpu *tags)
-{
- unsigned cpus_have_tags, cpu = pool->cpu_last_stolen;
- struct percpu_ida_cpu *remote;
-
- for (cpus_have_tags = cpumask_weight(&pool->cpus_have_tags);
- cpus_have_tags; cpus_have_tags--) {
- cpu = cpumask_next(cpu, &pool->cpus_have_tags);
-
- if (cpu >= nr_cpu_ids) {
- cpu = cpumask_first(&pool->cpus_have_tags);
- if (cpu >= nr_cpu_ids)
- BUG();
- }
-
- pool->cpu_last_stolen = cpu;
- remote = per_cpu_ptr(pool->tag_cpu, cpu);
-
- cpumask_clear_cpu(cpu, &pool->cpus_have_tags);
-
- if (remote == tags)
- continue;
-
- spin_lock(&remote->lock);
-
- if (remote->nr_free) {
- memcpy(tags->freelist,
- remote->freelist,
- sizeof(unsigned) * remote->nr_free);
-
- tags->nr_free = remote->nr_free;
- remote->nr_free = 0;
- }
-
- spin_unlock(&remote->lock);
-
- if (tags->nr_free)
- break;
- }
-}
-
-/*
- * Pop up to IDA_PCPU_BATCH_MOVE IDs off the global freelist, and push them onto
- * our percpu freelist:
- */
-static inline void alloc_global_tags(struct percpu_ida *pool,
- struct percpu_ida_cpu *tags)
-{
- move_tags(tags->freelist, &tags->nr_free,
- pool->freelist, &pool->nr_free,
- min(pool->nr_free, pool->percpu_batch_size));
-}
-
-static inline unsigned alloc_local_tag(struct percpu_ida_cpu *tags)
-{
- int tag = -ENOSPC;
-
- spin_lock(&tags->lock);
- if (tags->nr_free)
- tag = tags->freelist[--tags->nr_free];
- spin_unlock(&tags->lock);
-
- return tag;
-}
-
-/**
- * percpu_ida_alloc - allocate a tag
- * @pool: pool to allocate from
- * @state: task state for prepare_to_wait
- *
- * Returns a tag - an integer in the range [0..nr_tags) (passed to
- * tag_pool_init()), or otherwise -ENOSPC on allocation failure.
- *
- * Safe to be called from interrupt context (assuming it isn't passed
- * TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, of course).
- *
- * @gfp indicates whether or not to wait until a free id is available (it's not
- * used for internal memory allocations); thus if passed __GFP_RECLAIM we may sleep
- * however long it takes until another thread frees an id (same semantics as a
- * mempool).
- *
- * Will not fail if passed TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE.
- */
-int percpu_ida_alloc(struct percpu_ida *pool, int state)
-{
- DEFINE_WAIT(wait);
- struct percpu_ida_cpu *tags;
- unsigned long flags;
- int tag;
-
- local_irq_save(flags);
- tags = this_cpu_ptr(pool->tag_cpu);
-
- /* Fastpath */
- tag = alloc_local_tag(tags);
- if (likely(tag >= 0)) {
- local_irq_restore(flags);
- return tag;
- }
-
- while (1) {
- spin_lock(&pool->lock);
-
- /*
- * prepare_to_wait() must come before steal_tags(), in case
- * percpu_ida_free() on another cpu flips a bit in
- * cpus_have_tags
- *
- * global lock held and irqs disabled, don't need percpu lock
- */
- if (state != TASK_RUNNING)
- prepare_to_wait(&pool->wait, &wait, state);
-
- if (!tags->nr_free)
- alloc_global_tags(pool, tags);
- if (!tags->nr_free)
- steal_tags(pool, tags);
-
- if (tags->nr_free) {
- tag = tags->freelist[--tags->nr_free];
- if (tags->nr_free)
- cpumask_set_cpu(smp_processor_id(),
- &pool->cpus_have_tags);
- }
-
- spin_unlock(&pool->lock);
- local_irq_restore(flags);
-
- if (tag >= 0 || state == TASK_RUNNING)
- break;
-
- if (signal_pending_state(state, current)) {
- tag = -ERESTARTSYS;
- break;
- }
-
- schedule();
-
- local_irq_save(flags);
- tags = this_cpu_ptr(pool->tag_cpu);
- }
- if (state != TASK_RUNNING)
- finish_wait(&pool->wait, &wait);
-
- return tag;
-}
-EXPORT_SYMBOL_GPL(percpu_ida_alloc);
-
-/**
- * percpu_ida_free - free a tag
- * @pool: pool @tag was allocated from
- * @tag: a tag previously allocated with percpu_ida_alloc()
- *
- * Safe to be called from interrupt context.
- */
-void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
-{
- struct percpu_ida_cpu *tags;
- unsigned long flags;
- unsigned nr_free;
-
- BUG_ON(tag >= pool->nr_tags);
-
- local_irq_save(flags);
- tags = this_cpu_ptr(pool->tag_cpu);
-
- spin_lock(&tags->lock);
- tags->freelist[tags->nr_free++] = tag;
-
- nr_free = tags->nr_free;
- spin_unlock(&tags->lock);
-
- if (nr_free == 1) {
- cpumask_set_cpu(smp_processor_id(),
- &pool->cpus_have_tags);
- wake_up(&pool->wait);
- }
-
- if (nr_free == pool->percpu_max_size) {
- spin_lock(&pool->lock);
-
- /*
- * Global lock held and irqs disabled, don't need percpu
- * lock
- */
- if (tags->nr_free == pool->percpu_max_size) {
- move_tags(pool->freelist, &pool->nr_free,
- tags->freelist, &tags->nr_free,
- pool->percpu_batch_size);
-
- wake_up(&pool->wait);
- }
- spin_unlock(&pool->lock);
- }
-
- local_irq_restore(flags);
-}
-EXPORT_SYMBOL_GPL(percpu_ida_free);
-
-/**
- * percpu_ida_destroy - release a tag pool's resources
- * @pool: pool to free
- *
- * Frees the resources allocated by percpu_ida_init().
- */
-void percpu_ida_destroy(struct percpu_ida *pool)
-{
- free_percpu(pool->tag_cpu);
- free_pages((unsigned long) pool->freelist,
- get_order(pool->nr_tags * sizeof(unsigned)));
-}
-EXPORT_SYMBOL_GPL(percpu_ida_destroy);
-
-/**
- * percpu_ida_init - initialize a percpu tag pool
- * @pool: pool to initialize
- * @nr_tags: number of tags that will be available for allocation
- *
- * Initializes @pool so that it can be used to allocate tags - integers in the
- * range [0, nr_tags). Typically, they'll be used by driver code to refer to a
- * preallocated array of tag structures.
- *
- * Allocation is percpu, but sharding is limited by nr_tags - for best
- * performance, the workload should not span more cpus than nr_tags / 128.
- */
-int __percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags,
- unsigned long max_size, unsigned long batch_size)
-{
- unsigned i, cpu, order;
-
- memset(pool, 0, sizeof(*pool));
-
- init_waitqueue_head(&pool->wait);
- spin_lock_init(&pool->lock);
- pool->nr_tags = nr_tags;
- pool->percpu_max_size = max_size;
- pool->percpu_batch_size = batch_size;
-
- /* Guard against overflow */
- if (nr_tags > (unsigned) INT_MAX + 1) {
- pr_err("percpu_ida_init(): nr_tags too large\n");
- return -EINVAL;
- }
-
- order = get_order(nr_tags * sizeof(unsigned));
- pool->freelist = (void *) __get_free_pages(GFP_KERNEL, order);
- if (!pool->freelist)
- return -ENOMEM;
-
- for (i = 0; i < nr_tags; i++)
- pool->freelist[i] = i;
-
- pool->nr_free = nr_tags;
-
- pool->tag_cpu = __alloc_percpu(sizeof(struct percpu_ida_cpu) +
- pool->percpu_max_size * sizeof(unsigned),
- sizeof(unsigned));
- if (!pool->tag_cpu)
- goto err;
-
- for_each_possible_cpu(cpu)
- spin_lock_init(&per_cpu_ptr(pool->tag_cpu, cpu)->lock);
-
- return 0;
-err:
- percpu_ida_destroy(pool);
- return -ENOMEM;
-}
-EXPORT_SYMBOL_GPL(__percpu_ida_init);
-
-/**
- * percpu_ida_for_each_free - iterate free ids of a pool
- * @pool: pool to iterate
- * @fn: interate callback function
- * @data: parameter for @fn
- *
- * Note, this doesn't guarantee to iterate all free ids restrictly. Some free
- * ids might be missed, some might be iterated duplicated, and some might
- * be iterated and not free soon.
- */
-int percpu_ida_for_each_free(struct percpu_ida *pool, percpu_ida_cb fn,
- void *data)
-{
- unsigned long flags;
- struct percpu_ida_cpu *remote;
- unsigned cpu, i, err = 0;
-
- local_irq_save(flags);
- for_each_possible_cpu(cpu) {
- remote = per_cpu_ptr(pool->tag_cpu, cpu);
- spin_lock(&remote->lock);
- for (i = 0; i < remote->nr_free; i++) {
- err = fn(remote->freelist[i], data);
- if (err)
- break;
- }
- spin_unlock(&remote->lock);
- if (err)
- goto out;
- }
-
- spin_lock(&pool->lock);
- for (i = 0; i < pool->nr_free; i++) {
- err = fn(pool->freelist[i], data);
- if (err)
- break;
- }
- spin_unlock(&pool->lock);
-out:
- local_irq_restore(flags);
- return err;
-}
-EXPORT_SYMBOL_GPL(percpu_ida_for_each_free);
-
-/**
- * percpu_ida_free_tags - return free tags number of a specific cpu or global pool
- * @pool: pool related
- * @cpu: specific cpu or global pool if @cpu == nr_cpu_ids
- *
- * Note: this just returns a snapshot of free tags number.
- */
-unsigned percpu_ida_free_tags(struct percpu_ida *pool, int cpu)
-{
- struct percpu_ida_cpu *remote;
- if (cpu == nr_cpu_ids)
- return pool->nr_free;
- remote = per_cpu_ptr(pool->tag_cpu, cpu);
- return remote->nr_free;
-}
-EXPORT_SYMBOL_GPL(percpu_ida_free_tags);
--
2.17.0
^ permalink raw reply related
* [PATCH 1/2] Convert target drivers to use sbitmap
From: Matthew Wilcox @ 2018-05-15 16:00 UTC (permalink / raw)
To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
linux-usb, kvm, virtualization, netdev, Juergen Gross,
qla2xxx-upstream, Kent Overstreet, Jens Axboe
Cc: Matthew Wilcox
In-Reply-To: <20180515160043.27044-1-willy@infradead.org>
From: Matthew Wilcox <mawilcox@microsoft.com>
The sbitmap and the percpu_ida perform essentially the same task,
allocating tags for commands. Since the sbitmap is more used than
the percpu_ida, convert the percpu_ida users to the sbitmap API.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
drivers/scsi/qla2xxx/qla_target.c | 16 ++++++-----
drivers/target/iscsi/iscsi_target_util.c | 34 +++++++++++++++++++++---
drivers/target/sbp/sbp_target.c | 8 +++---
drivers/target/target_core_transport.c | 5 ++--
drivers/target/tcm_fc/tfc_cmd.c | 11 ++++----
drivers/usb/gadget/function/f_tcm.c | 8 +++---
drivers/vhost/scsi.c | 9 ++++---
drivers/xen/xen-scsiback.c | 8 +++---
include/target/iscsi/iscsi_target_core.h | 1 +
include/target/target_core_base.h | 5 ++--
10 files changed, 73 insertions(+), 32 deletions(-)
diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 025dc2d3f3de..cdf671c2af61 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -3719,7 +3719,8 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
return;
}
cmd->jiffies_at_free = get_jiffies_64();
- percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+ sbitmap_queue_clear(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag,
+ cmd->se_cmd.map_cpu);
}
EXPORT_SYMBOL(qlt_free_cmd);
@@ -4084,7 +4085,8 @@ static void __qlt_do_work(struct qla_tgt_cmd *cmd)
qlt_send_term_exchange(qpair, NULL, &cmd->atio, 1, 0);
qlt_decr_num_pend_cmds(vha);
- percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+ sbitmap_queue_clear(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag,
+ cmd->se_cmd.map_cpu);
spin_unlock_irqrestore(qpair->qp_lock_ptr, flags);
spin_lock_irqsave(&ha->tgt.sess_lock, flags);
@@ -4215,9 +4217,9 @@ static struct qla_tgt_cmd *qlt_get_tag(scsi_qla_host_t *vha,
{
struct se_session *se_sess = sess->se_sess;
struct qla_tgt_cmd *cmd;
- int tag;
+ int tag, cpu;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0)
return NULL;
@@ -4230,6 +4232,7 @@ static struct qla_tgt_cmd *qlt_get_tag(scsi_qla_host_t *vha,
qlt_incr_num_pend_cmds(vha);
cmd->vha = vha;
cmd->se_cmd.map_tag = tag;
+ cmd->se_cmd.map_cpu = cpu;
cmd->sess = sess;
cmd->loop_id = sess->loop_id;
cmd->conf_compl_supported = sess->conf_compl_supported;
@@ -5212,7 +5215,7 @@ qlt_alloc_qfull_cmd(struct scsi_qla_host *vha,
struct fc_port *sess;
struct se_session *se_sess;
struct qla_tgt_cmd *cmd;
- int tag;
+ int tag, cpu;
unsigned long flags;
if (unlikely(tgt->tgt_stop)) {
@@ -5244,7 +5247,7 @@ qlt_alloc_qfull_cmd(struct scsi_qla_host *vha,
se_sess = sess->se_sess;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0)
return;
@@ -5275,6 +5278,7 @@ qlt_alloc_qfull_cmd(struct scsi_qla_host *vha,
cmd->reset_count = ha->base_qpair->chip_reset;
cmd->q_full = 1;
cmd->qpair = ha->base_qpair;
+ cmd->se_cmd.map_cpu = cpu;
if (qfull) {
cmd->q_full = 1;
diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 4435bf374d2d..28bcffae609f 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -17,7 +17,7 @@
******************************************************************************/
#include <linux/list.h>
-#include <linux/percpu_ida.h>
+#include <linux/sched/signal.h>
#include <net/ipv6.h> /* ipv6_addr_equal() */
#include <scsi/scsi_tcq.h>
#include <scsi/iscsi_proto.h>
@@ -147,6 +147,28 @@ void iscsit_free_r2ts_from_list(struct iscsi_cmd *cmd)
spin_unlock_bh(&cmd->r2t_lock);
}
+int iscsit_wait_for_tag(struct se_session *se_sess, int state, int *cpup)
+{
+ int tag = -1;
+ DEFINE_WAIT(wait);
+ struct sbq_wait_state *ws;
+
+ if (state == TASK_RUNNING)
+ return tag;
+
+ ws = &se_sess->sess_tag_pool.ws[0];
+ for (;;) {
+ prepare_to_wait_exclusive(&ws->wait, &wait, state);
+ if (signal_pending_state(state, current))
+ break;
+ schedule();
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, cpup);
+ }
+
+ finish_wait(&ws->wait, &wait);
+ return tag;
+}
+
/*
* May be called from software interrupt (timer) context for allocating
* iSCSI NopINs.
@@ -155,9 +177,11 @@ struct iscsi_cmd *iscsit_allocate_cmd(struct iscsi_conn *conn, int state)
{
struct iscsi_cmd *cmd;
struct se_session *se_sess = conn->sess->se_sess;
- int size, tag;
+ int size, tag, cpu;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, state);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
+ if (tag < 0)
+ tag = iscsit_wait_for_tag(se_sess, state, &cpu);
if (tag < 0)
return NULL;
@@ -166,6 +190,7 @@ struct iscsi_cmd *iscsit_allocate_cmd(struct iscsi_conn *conn, int state)
memset(cmd, 0, size);
cmd->se_cmd.map_tag = tag;
+ cmd->se_cmd.map_cpu = cpu;
cmd->conn = conn;
cmd->data_direction = DMA_NONE;
INIT_LIST_HEAD(&cmd->i_conn_node);
@@ -711,7 +736,8 @@ void iscsit_release_cmd(struct iscsi_cmd *cmd)
kfree(cmd->iov_data);
kfree(cmd->text_in_ptr);
- percpu_ida_free(&sess->se_sess->sess_tag_pool, se_cmd->map_tag);
+ sbitmap_queue_clear(&sess->se_sess->sess_tag_pool, se_cmd->map_tag,
+ se_cmd->map_cpu);
}
EXPORT_SYMBOL(iscsit_release_cmd);
diff --git a/drivers/target/sbp/sbp_target.c b/drivers/target/sbp/sbp_target.c
index fb1003921d85..c58f9f04c6be 100644
--- a/drivers/target/sbp/sbp_target.c
+++ b/drivers/target/sbp/sbp_target.c
@@ -926,15 +926,16 @@ static struct sbp_target_request *sbp_mgt_get_req(struct sbp_session *sess,
{
struct se_session *se_sess = sess->se_sess;
struct sbp_target_request *req;
- int tag;
+ int tag, cpu;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0)
return ERR_PTR(-ENOMEM);
req = &((struct sbp_target_request *)se_sess->sess_cmd_map)[tag];
memset(req, 0, sizeof(*req));
req->se_cmd.map_tag = tag;
+ req->se_cmd.map_cpu = cpu;
req->se_cmd.tag = next_orb;
return req;
@@ -1460,7 +1461,8 @@ static void sbp_free_request(struct sbp_target_request *req)
kfree(req->pg_tbl);
kfree(req->cmd_buf);
- percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+ sbitmap_queue_clear(&se_sess->sess_tag_pool, se_cmd->map_tag,
+ se_cmd->map_cpu);
}
static void sbp_mgt_agent_process(struct work_struct *work)
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 4558f2e1fe1b..3103890ed109 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -260,7 +260,8 @@ int transport_alloc_session_tags(struct se_session *se_sess,
}
}
- rc = percpu_ida_init(&se_sess->sess_tag_pool, tag_num);
+ rc = sbitmap_queue_init_node(&se_sess->sess_tag_pool, tag_num, -1,
+ false, GFP_KERNEL, NUMA_NO_NODE);
if (rc < 0) {
pr_err("Unable to init se_sess->sess_tag_pool,"
" tag_num: %u\n", tag_num);
@@ -547,7 +548,7 @@ void transport_free_session(struct se_session *se_sess)
target_put_nacl(se_nacl);
}
if (se_sess->sess_cmd_map) {
- percpu_ida_destroy(&se_sess->sess_tag_pool);
+ sbitmap_queue_free(&se_sess->sess_tag_pool);
kvfree(se_sess->sess_cmd_map);
}
kmem_cache_free(se_sess_cache, se_sess);
diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index ec372860106f..b3e3364b7147 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -28,7 +28,6 @@
#include <linux/configfs.h>
#include <linux/ctype.h>
#include <linux/hash.h>
-#include <linux/percpu_ida.h>
#include <asm/unaligned.h>
#include <scsi/scsi_tcq.h>
#include <scsi/libfc.h>
@@ -92,7 +91,8 @@ static void ft_free_cmd(struct ft_cmd *cmd)
if (fr_seq(fp))
fc_seq_release(fr_seq(fp));
fc_frame_free(fp);
- percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+ sbitmap_queue_clear(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag,
+ cmd->se_cmd.map_cpu);
ft_sess_put(sess); /* undo get from lookup at recv */
}
@@ -448,9 +448,9 @@ static void ft_recv_cmd(struct ft_sess *sess, struct fc_frame *fp)
struct ft_cmd *cmd;
struct fc_lport *lport = sess->tport->lport;
struct se_session *se_sess = sess->se_sess;
- int tag;
+ int tag, cpu;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0)
goto busy;
@@ -458,10 +458,11 @@ static void ft_recv_cmd(struct ft_sess *sess, struct fc_frame *fp)
memset(cmd, 0, sizeof(struct ft_cmd));
cmd->se_cmd.map_tag = tag;
+ cmd->se_cmd.map_cpu = cpu;
cmd->sess = sess;
cmd->seq = fc_seq_assign(lport, fp);
if (!cmd->seq) {
- percpu_ida_free(&se_sess->sess_tag_pool, tag);
+ sbitmap_queue_clear(&se_sess->sess_tag_pool, tag, cpu);
goto busy;
}
cmd->req_frame = fp; /* hold frame during cmd */
diff --git a/drivers/usb/gadget/function/f_tcm.c b/drivers/usb/gadget/function/f_tcm.c
index d78dbb73bde8..b335f4f33bc3 100644
--- a/drivers/usb/gadget/function/f_tcm.c
+++ b/drivers/usb/gadget/function/f_tcm.c
@@ -1071,15 +1071,16 @@ static struct usbg_cmd *usbg_get_cmd(struct f_uas *fu,
{
struct se_session *se_sess = tv_nexus->tvn_se_sess;
struct usbg_cmd *cmd;
- int tag;
+ int tag, cpu;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0)
return ERR_PTR(-ENOMEM);
cmd = &((struct usbg_cmd *)se_sess->sess_cmd_map)[tag];
memset(cmd, 0, sizeof(*cmd));
cmd->se_cmd.map_tag = tag;
+ cmd->se_cmd.map_cpu = cpu;
cmd->se_cmd.tag = cmd->tag = scsi_tag;
cmd->fu = fu;
@@ -1288,7 +1289,8 @@ static void usbg_release_cmd(struct se_cmd *se_cmd)
struct se_session *se_sess = se_cmd->se_sess;
kfree(cmd->data_buf);
- percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+ sbitmap_queue_clear(&se_sess->sess_tag_pool, se_cmd->map_tag,
+ se_cmd->map_cpu);
}
static u32 usbg_sess_get_index(struct se_session *se_sess)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 7ad57094d736..1fadaa39f322 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -46,7 +46,6 @@
#include <linux/virtio_scsi.h>
#include <linux/llist.h>
#include <linux/bitmap.h>
-#include <linux/percpu_ida.h>
#include "vhost.h"
@@ -324,7 +323,8 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd)
}
vhost_scsi_put_inflight(tv_cmd->inflight);
- percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+ sbitmap_queue_clear(&se_sess->sess_tag_pool, se_cmd->map_tag,
+ se_cmd->map_cpu);
}
static u32 vhost_scsi_sess_get_index(struct se_session *se_sess)
@@ -567,7 +567,7 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
struct se_session *se_sess;
struct scatterlist *sg, *prot_sg;
struct page **pages;
- int tag;
+ int tag, cpu;
tv_nexus = tpg->tpg_nexus;
if (!tv_nexus) {
@@ -576,7 +576,7 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
}
se_sess = tv_nexus->tvn_se_sess;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0) {
pr_err("Unable to obtain tag for vhost_scsi_cmd\n");
return ERR_PTR(-ENOMEM);
@@ -591,6 +591,7 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
cmd->tvc_prot_sgl = prot_sg;
cmd->tvc_upages = pages;
cmd->tvc_se_cmd.map_tag = tag;
+ cmd->tvc_se_cmd.map_cpu = cpu;
cmd->tvc_tag = scsi_tag;
cmd->tvc_lun = lun;
cmd->tvc_task_attr = task_attr;
diff --git a/drivers/xen/xen-scsiback.c b/drivers/xen/xen-scsiback.c
index 7bc88fd43cfc..d2c71b8608f0 100644
--- a/drivers/xen/xen-scsiback.c
+++ b/drivers/xen/xen-scsiback.c
@@ -654,9 +654,9 @@ static struct vscsibk_pend *scsiback_get_pend_req(struct vscsiif_back_ring *ring
struct scsiback_nexus *nexus = tpg->tpg_nexus;
struct se_session *se_sess = nexus->tvn_se_sess;
struct vscsibk_pend *req;
- int tag, i;
+ int tag, cpu, i;
- tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+ tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu);
if (tag < 0) {
pr_err("Unable to obtain tag for vscsiif_request\n");
return ERR_PTR(-ENOMEM);
@@ -665,6 +665,7 @@ static struct vscsibk_pend *scsiback_get_pend_req(struct vscsiif_back_ring *ring
req = &((struct vscsibk_pend *)se_sess->sess_cmd_map)[tag];
memset(req, 0, sizeof(*req));
req->se_cmd.map_tag = tag;
+ req->se_cmd.map_cpu = cpu;
for (i = 0; i < VSCSI_MAX_GRANTS; i++)
req->grant_handles[i] = SCSIBACK_INVALID_HANDLE;
@@ -1379,7 +1380,8 @@ static void scsiback_release_cmd(struct se_cmd *se_cmd)
{
struct se_session *se_sess = se_cmd->se_sess;
- percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
+ sbitmap_queue_clear(&se_sess->sess_tag_pool, se_cmd->map_tag,
+ se_cmd->map_cpu);
}
static u32 scsiback_sess_get_index(struct se_session *se_sess)
diff --git a/include/target/iscsi/iscsi_target_core.h b/include/target/iscsi/iscsi_target_core.h
index cf5f3fff1f1a..f2e6abea8490 100644
--- a/include/target/iscsi/iscsi_target_core.h
+++ b/include/target/iscsi/iscsi_target_core.h
@@ -4,6 +4,7 @@
#include <linux/dma-direction.h> /* enum dma_data_direction */
#include <linux/list.h> /* struct list_head */
+#include <linux/sched.h>
#include <linux/socket.h> /* struct sockaddr_storage */
#include <linux/types.h> /* u8 */
#include <scsi/iscsi_proto.h> /* itt_t */
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 9f9f5902af38..cd417b17fee6 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -4,7 +4,7 @@
#include <linux/configfs.h> /* struct config_group */
#include <linux/dma-direction.h> /* enum dma_data_direction */
-#include <linux/percpu_ida.h> /* struct percpu_ida */
+#include <linux/sbitmap.h>
#include <linux/percpu-refcount.h>
#include <linux/semaphore.h> /* struct semaphore */
#include <linux/completion.h>
@@ -454,6 +454,7 @@ struct se_cmd {
int sam_task_attr;
/* Used for se_sess->sess_tag_pool */
unsigned int map_tag;
+ int map_cpu;
/* Transport protocol dependent state, see transport_state_table */
enum transport_state_table t_state;
/* See se_cmd_flags_table */
@@ -607,7 +608,7 @@ struct se_session {
struct list_head sess_wait_list;
spinlock_t sess_cmd_lock;
void *sess_cmd_map;
- struct percpu_ida sess_tag_pool;
+ struct sbitmap_queue sess_tag_pool;
};
struct se_device;
--
2.17.0
^ permalink raw reply related
* [PATCH 0/2] Use sbitmap instead of percpu_ida
From: Matthew Wilcox @ 2018-05-15 16:00 UTC (permalink / raw)
To: linux-kernel, linux-scsi, target-devel, linux1394-devel,
linux-usb, kvm, virtualization, netdev, Juergen Gross,
qla2xxx-upstream, Kent Overstreet, Jens Axboe
Cc: Matthew Wilcox
From: Matthew Wilcox <mawilcox@microsoft.com>
This is a pretty rough-and-ready conversion of the target drivers
from using percpu_ida to sbitmap. It compiles; I don't have a target
setup, so it's completely untested. I haven't tried to do anything
particularly clever here, so it's possible that, for example, the wait
queue in iscsi_target_util could be more clever, like the block layer
uses multiple wait queues to avoid pingpongs. Or maybe we could figure
out a way to not store the CPU that the ID was allocated on, or perhaps
the options I specified to sbitmap_queue_init() are suboptimal.
Patch 2 isn't interesting; it just deletes the implementation. Patch 1
will be where all the action is.
Matthew Wilcox (2):
Convert target drivers to use sbitmap
Remove percpu_ida
drivers/scsi/qla2xxx/qla_target.c | 16 +-
drivers/target/iscsi/iscsi_target_util.c | 34 +-
drivers/target/sbp/sbp_target.c | 8 +-
drivers/target/target_core_transport.c | 5 +-
drivers/target/tcm_fc/tfc_cmd.c | 11 +-
drivers/usb/gadget/function/f_tcm.c | 8 +-
drivers/vhost/scsi.c | 9 +-
drivers/xen/xen-scsiback.c | 8 +-
include/linux/percpu_ida.h | 83 -----
include/target/iscsi/iscsi_target_core.h | 1 +
include/target/target_core_base.h | 5 +-
lib/Makefile | 2 +-
lib/percpu_ida.c | 391 -----------------------
13 files changed, 74 insertions(+), 507 deletions(-)
delete mode 100644 include/linux/percpu_ida.h
delete mode 100644 lib/percpu_ida.c
--
2.17.0
^ permalink raw reply
* Re: [PATCH v3] kvmalloc: always use vmalloc if CONFIG_DEBUG_SG
From: Joonsoo Kim @ 2018-05-15 1:13 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Christoph Lameter, dm-devel, eric.dumazet, mst, netdev,
linux-kernel, Matthew Wilcox, Michal Hocko, Pekka Enberg,
linux-mm, edumazet, David Rientjes, Andrew Morton, virtualization,
David Miller, Vlastimil Babka
In-Reply-To: <alpine.LRH.2.02.1804241428120.8296@file01.intranet.prod.int.rdu2.redhat.com>
Hello, Mikulas.
On Tue, Apr 24, 2018 at 02:41:47PM -0400, Mikulas Patocka wrote:
>
>
> On Tue, 24 Apr 2018, Matthew Wilcox wrote:
>
> > On Tue, Apr 24, 2018 at 08:29:14AM -0400, Mikulas Patocka wrote:
> > >
> > >
> > > On Mon, 23 Apr 2018, Matthew Wilcox wrote:
> > >
> > > > On Mon, Apr 23, 2018 at 08:06:16PM -0400, Mikulas Patocka wrote:
> > > > > Some bugs (such as buffer overflows) are better detected
> > > > > with kmalloc code, so we must test the kmalloc path too.
> > > >
> > > > Well now, this brings up another item for the collective TODO list --
> > > > implement redzone checks for vmalloc. Unless this is something already
> > > > taken care of by kasan or similar.
> > >
> > > The kmalloc overflow testing is also not ideal - it rounds the size up to
> > > the next slab size and detects buffer overflows only at this boundary.
> > >
> > > Some times ago, I made a "kmalloc guard" patch that places a magic number
> > > immediatelly after the requested size - so that it can detect overflows at
> > > byte boundary
> > > ( https://www.redhat.com/archives/dm-devel/2014-September/msg00018.html )
> > >
> > > That patch found a bug in crypto code:
> > > ( http://lkml.iu.edu/hypermail/linux/kernel/1409.1/02325.html )
> >
> > Is it still worth doing this, now we have kasan?
>
> The kmalloc guard has much lower overhead than kasan.
I skimm at your code and it requires rebuilding the kernel.
I think that if rebuilding is required as the same with the KASAN,
using the KASAN is better since it has far better coverage for
detection the bug.
However, I think that if the redzone can be setup tightly
without rebuild, it would be worth implementing. I have an idea to
implement it only for the SLUB. Could I try it? (I'm asking this
because I'm inspired from the above patch.) :)
Or do you wanna try it?
Thanks.
^ permalink raw reply
* Re: [PATCH] gpu: drm: qxl: Adding new typedef vm_fault_t
From: Gerd Hoffmann @ 2018-05-14 9:02 UTC (permalink / raw)
To: Souptick Joarder
Cc: airlied, Gustavo Padovan, Maarten Lankhorst, linux-kernel,
dri-devel, virtualization, Sean Paul, Matthew Wilcox, airlied
In-Reply-To: <CAFqt6zYwCgzBkbeRFgOHGRaYT5EqaAybXQ5_E73cBLEc7Hzv8A@mail.gmail.com>
Hi,
> > So my expectation that a backmerge happens anyway after -rc1/2 is in
> > line with reality, it is just to be delayed this time. I'll stay
> > tuned ;)
>
> Is this patch already merged in drm-misc-next tree ?
Pushed now.
cheers,
Gerd
^ permalink raw reply
* Re: [PATCH net-next v10 2/4] net: Introduce generic failover module
From: Michael S. Tsirkin @ 2018-05-11 18:09 UTC (permalink / raw)
To: Randy Dunlap
Cc: alexander.h.duyck, virtio-dev, jiri, kubakici, Sridhar Samudrala,
virtualization, loseweigh, netdev, aaron.f.brown, davem
In-Reply-To: <460f3d8f-b2ec-2118-e296-03f4f9655c5a@infradead.org>
On Mon, May 07, 2018 at 03:39:19PM -0700, Randy Dunlap wrote:
> Hi,
>
> On 05/07/2018 03:10 PM, Sridhar Samudrala wrote:
> >
> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > ---
> > MAINTAINERS | 7 +
> > include/linux/netdevice.h | 16 +
> > include/net/net_failover.h | 52 +++
> > net/Kconfig | 10 +
> > net/core/Makefile | 1 +
> > net/core/net_failover.c | 1044 ++++++++++++++++++++++++++++++++++++++++++++
> > 6 files changed, 1130 insertions(+)
> > create mode 100644 include/net/net_failover.h
> > create mode 100644 net/core/net_failover.c
>
>
> > diff --git a/net/Kconfig b/net/Kconfig
> > index b62089fb1332..0540856676de 100644
> > --- a/net/Kconfig
> > +++ b/net/Kconfig
> > @@ -429,6 +429,16 @@ config MAY_USE_DEVLINK
> > config PAGE_POOL
> > bool
> >
> > +config NET_FAILOVER
> > + tristate "Failover interface"
> > + default m
>
> Need some justification for default m (as opposed to n).
Or one can just leave the default line out.
^ permalink raw reply
* Re: [PATCH net-next v10 2/4] net: Introduce generic failover module
From: Michael S. Tsirkin @ 2018-05-11 17:15 UTC (permalink / raw)
To: Samudrala, Sridhar
Cc: alexander.h.duyck, virtio-dev, jiri, kubakici, netdev,
virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <e8454b29-d66b-9e20-a887-cb312a63847e@intel.com>
On Mon, May 07, 2018 at 05:24:27PM -0700, Samudrala, Sridhar wrote:
>
>
> On 5/7/2018 4:53 PM, Stephen Hemminger wrote:
> > On Mon, 7 May 2018 15:10:44 -0700
> > Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
> >
> > > +static struct net_device *net_failover_get_bymac(u8 *mac,
> > > + struct net_failover_ops **ops)
> > > +{
> > > + struct net_device *failover_dev;
> > > + struct net_failover *failover;
> > > +
> > > + spin_lock(&net_failover_lock);
> > > + list_for_each_entry(failover, &net_failover_list, list) {
> > > + failover_dev = rtnl_dereference(failover->failover_dev);
> > > + if (ether_addr_equal(failover_dev->perm_addr, mac)) {
> > > + *ops = rtnl_dereference(failover->ops);
> > > + spin_unlock(&net_failover_lock);
> > > + return failover_dev;
> > > + }
> > > + }
> > > + spin_unlock(&net_failover_lock);
> > > + return NULL;
> > > +}
> > This is broken if non-ethernet devices such as Infiniband are present.
>
> There is check to make sure that a slave and failover devices are of the same type in
> net_failover_slave_register()
>
> failover_dev = net_failover_get_bymac(slave_dev->perm_addr, &nfo_ops);
> if (!failover_dev)
> goto done;
>
> if (failover_dev->type != slave_dev->type)
> goto done;
>
> Do you think this is not good enough? I had an explicit check for ARPHRD_ETHER in
> earlier patchsets, but removed it based on Jiri's comment.
Right but how is ether_addr_equal supposed to work if types are
identical but not ethernet?
This can also benefit from a comment referring to the check in
net_failover_slave_register.
--
MST
^ permalink raw reply
* Re: [PATCH net-next v10 2/4] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-05-11 15:43 UTC (permalink / raw)
To: Randy Dunlap, mst, stephen, davem, netdev, virtualization,
virtio-dev, jesse.brandeburg, alexander.h.duyck, kubakici,
jasowang, loseweigh, jiri, aaron.f.brown
In-Reply-To: <460f3d8f-b2ec-2118-e296-03f4f9655c5a@infradead.org>
On 5/7/2018 3:39 PM, Randy Dunlap wrote:
> Hi,
>
> On 05/07/2018 03:10 PM, Sridhar Samudrala wrote:
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> ---
>> MAINTAINERS | 7 +
>> include/linux/netdevice.h | 16 +
>> include/net/net_failover.h | 52 +++
>> net/Kconfig | 10 +
>> net/core/Makefile | 1 +
>> net/core/net_failover.c | 1044 ++++++++++++++++++++++++++++++++++++++++++++
>> 6 files changed, 1130 insertions(+)
>> create mode 100644 include/net/net_failover.h
>> create mode 100644 net/core/net_failover.c
>
>> diff --git a/net/Kconfig b/net/Kconfig
>> index b62089fb1332..0540856676de 100644
>> --- a/net/Kconfig
>> +++ b/net/Kconfig
>> @@ -429,6 +429,16 @@ config MAY_USE_DEVLINK
>> config PAGE_POOL
>> bool
>>
>> +config NET_FAILOVER
>> + tristate "Failover interface"
>> + default m
> Need some justification for default m (as opposed to n).
default n should be fine. It will get selected automatically when virtio_net or
netvsc are enabled. will fix in the next revision.
>
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v10 2/4] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-05-11 15:40 UTC (permalink / raw)
To: Stephen Hemminger
Cc: alexander.h.duyck, virtio-dev, jiri, mst, kubakici, netdev,
virtualization, loseweigh, aaron.f.brown, davem
In-Reply-To: <20180507164632.4f6c2eef@xeon-e3>
On 5/7/2018 4:46 PM, Stephen Hemminger wrote:
> On Mon, 7 May 2018 15:10:44 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>> This provides a generic interface for paravirtual drivers to listen
>> for netdev register/unregister/link change events from pci ethernet
>> devices with the same MAC and takeover their datapath. The notifier and
>> event handling code is based on the existing netvsc implementation.
>>
>> It exposes 2 sets of interfaces to the paravirtual drivers.
>> 1. For paravirtual drivers like virtio_net that use 3 netdev model, the
>> the failover module provides interfaces to create/destroy additional
>> master netdev and all the slave events are managed internally.
>> net_failover_create()
>> net_failover_destroy()
>> A failover netdev is created that acts a master device and controls 2
>> slave devices. The original virtio_net netdev is registered as 'standby'
>> netdev and a passthru/vf device with the same MAC gets registered as
>> 'primary' netdev. Both 'standby' and 'failover' netdevs are associated
>> with the same 'pci' device. The user accesses the network interface via
>> 'failover' netdev. The 'failover' netdev chooses 'primary' netdev as
>> default for transmits when it is available with link up and running.
>> 2. For existing netvsc driver that uses 2 netdev model, no master netdev
>> is created. The paravirtual driver registers each instance of netvsc
>> as a 'failover' netdev along with a set of ops to manage the slave
>> events. There is no 'standby' netdev in this model. A passthru/vf device
>> with the same MAC gets registered as 'primary' netdev.
>> net_failover_register()
>> net_failover_unregister()
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> You are conflating the net_failover device (3 device model) with
> the generic network failover infrastructure into one file. There should be two
> seperate files net/core/failover.c and drivers/net/failover.c which splits
> the work into two parts (and acts a check for the api).
OK. I started splitting net_failover.c into 2 files.
net/core/failover.c (CONFIG_FAILOVER)
- implements the generic failover infrastructure that exports failover_register(),
failover_unregister() and failover_slave_unregister() as the API that will be
used by netvsc and the net_failover drivers(3 netdev model)
drivers/net/net_failover.c (CONFIG_NET_FAILOVER)
- implements the net_failover netdev as the upper dev for the 3-netdev model and
exports net_failover_create() and net_failover_destroy() as the API that is
used by virtio_net.
HYPERV_NET and NET_FAILOVER selects FAILOVER
VIRTIO_NET selects NET_FAILOVER
Does this look good? Any better suggestion for the prefix to be used for generic
network failover api rather than 'failover'?
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH v1] virtio: support VIRTIO_F_IO_BARRIER
From: Tiwei Bie @ 2018-05-10 15:36 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: virtio-dev, mst, linux-kernel, virtualization, zhihong.wang,
pbonzini
In-Reply-To: <20180510150258.GB9308@stefanha-x1.localdomain>
On Thu, May 10, 2018 at 04:02:58PM +0100, Stefan Hajnoczi wrote:
> On Thu, May 10, 2018 at 06:39:41PM +0800, Tiwei Bie wrote:
> > On Thu, May 10, 2018 at 10:53:17AM +0100, Stefan Hajnoczi wrote:
> > > On Fri, May 04, 2018 at 12:59:15PM +0800, Tiwei Bie wrote:
> > > > diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> > > > index 308e2096291f..9fb519a9df28 100644
> > > > --- a/include/uapi/linux/virtio_config.h
> > > > +++ b/include/uapi/linux/virtio_config.h
> > > > @@ -49,7 +49,7 @@
> > > > * transport being used (eg. virtio_ring), the rest are per-device feature
> > > > * bits. */
> > > > #define VIRTIO_TRANSPORT_F_START 28
> > > > -#define VIRTIO_TRANSPORT_F_END 34
> > > > +#define VIRTIO_TRANSPORT_F_END 37
> > >
> > > Have you updated "2.2 Feature Bits" in the VIRTIO spec?
> > >
> > > In 1.0 it says:
> > >
> > > 24 to 32
> > > Feature bits reserved for extensions to the queue and feature negotiation mechanisms
> > >
> > > This information is out-of-date.
> > >
> >
> > No. In the latest spec draft, it's 24 to 33. And it
> > becomes out-of-date since VIRTIO_F_RING_PACKED(34)
> > and VIRTIO_F_IN_ORDER(35) were introduced. Do you
> > want me to update it in the IO_BARRIER patch or do
> > you want me to update it in a new patch?
>
> Please update it to 37 in the IO_BARRIER patch.
Will do it! Thanks!
Best regards,
Tiwei Bie
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH v1] virtio: support VIRTIO_F_IO_BARRIER
From: Stefan Hajnoczi @ 2018-05-10 15:02 UTC (permalink / raw)
To: Tiwei Bie
Cc: virtio-dev, mst, linux-kernel, virtualization, zhihong.wang,
pbonzini
In-Reply-To: <20180510103941.znxka65iluybktuq@debian>
[-- Attachment #1.1: Type: text/plain, Size: 1269 bytes --]
On Thu, May 10, 2018 at 06:39:41PM +0800, Tiwei Bie wrote:
> On Thu, May 10, 2018 at 10:53:17AM +0100, Stefan Hajnoczi wrote:
> > On Fri, May 04, 2018 at 12:59:15PM +0800, Tiwei Bie wrote:
> > > diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> > > index 308e2096291f..9fb519a9df28 100644
> > > --- a/include/uapi/linux/virtio_config.h
> > > +++ b/include/uapi/linux/virtio_config.h
> > > @@ -49,7 +49,7 @@
> > > * transport being used (eg. virtio_ring), the rest are per-device feature
> > > * bits. */
> > > #define VIRTIO_TRANSPORT_F_START 28
> > > -#define VIRTIO_TRANSPORT_F_END 34
> > > +#define VIRTIO_TRANSPORT_F_END 37
> >
> > Have you updated "2.2 Feature Bits" in the VIRTIO spec?
> >
> > In 1.0 it says:
> >
> > 24 to 32
> > Feature bits reserved for extensions to the queue and feature negotiation mechanisms
> >
> > This information is out-of-date.
> >
>
> No. In the latest spec draft, it's 24 to 33. And it
> becomes out-of-date since VIRTIO_F_RING_PACKED(34)
> and VIRTIO_F_IN_ORDER(35) were introduced. Do you
> want me to update it in the IO_BARRIER patch or do
> you want me to update it in a new patch?
Please update it to 37 in the IO_BARRIER patch.
Stefan
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox