From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: linux-s390@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>,
Andy Lutomirski <luto@amacapital.net>,
linux390@de.ibm.com, virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 2/3] virtio_ring: Use DMA APIs
Date: Wed, 27 Aug 2014 13:35:52 -0400
Message-ID: <20140827173552.GC5379@laptop.dumpdata.com>
In-Reply-To: <53FD88E0.9020208@de.ibm.com>
On Wed, Aug 27, 2014 at 09:29:36AM +0200, Christian Borntraeger wrote:
> On 26/08/14 23:17, Andy Lutomirski wrote:
> > virtio_ring currently sends the device (usually a hypervisor)
> > physical addresses of its I/O buffers. This is okay when DMA
> > addresses and physical addresses are the same thing, but this isn't
> > always the case. For example, this never works on Xen guests, and
> > it is likely to fail if a physical "virtio" device ever ends up
> > behind an IOMMU or swiotlb.
> >
> > The immediate use case for me is to enable virtio on Xen guests.
> > For that to work, we need this fix as well as a corresponding
> > fix to virtio_pci or to another driver.
> >
> > With this patch, virtfs survives kmemleak and CONFIG_DMA_API_DEBUG.
> > virtio-net warns (correctly) about DMA from the stack in
> > virtnet_set_rx_mode.
> >
> > This breaks s390's defconfig. The default configuration for s390
> > does virtio through a KVM-specific interface, but that configuration
> > does not support DMA. I could modify this patch to stub out the DMA
> > API calls if !CONFIG_HAS_DMA, but it seems to me that it would be
> > much nicer to make s390 support DMA unconditionally.
>
> s390 has no DMA per se. Newer systems have a PCI-like I/O attach in
> addition to the classic channel I/O, and for that transport we enable
> the DMA code so that some of the existing PCI drivers can be reused.
> (Only some, because we differ in several aspects from how PCI looks.)
> But the architecture itself (and the virtio interface) does not provide
> the DMA interface as you know it:
Don't most of those DMA API calls then end up being nops? As in, the
DMA mapping header does have the #ifdef case for when a platform does
not do DMA, which ends up with all of the functions stubbed out.
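For illustration only — this is a hedged sketch of the behavior being described, with invented names, not the actual kernel stubs — the no-DMA case amounts to identity mappings and no-op teardown:

```c
/* Hypothetical sketch of no-op DMA stubs for a platform without DMA,
 * modeled on the !CONFIG_HAS_DMA idea discussed above. All names here
 * are illustrative; the real kernel definitions differ. */
#include <stdint.h>
#include <stddef.h>

typedef uintptr_t dma_addr_t;

/* "Mapping" is just the address itself; there is nothing to set up. */
static inline dma_addr_t stub_dma_map_page(void *page, size_t offset,
					   size_t len)
{
	(void)len;
	return (dma_addr_t)page + offset;
}

/* Unmap is a pure no-op: no IOMMU, no bounce buffer to tear down. */
static inline void stub_dma_unmap_page(dma_addr_t addr, size_t len)
{
	(void)addr;
	(void)len;
}

/* Mapping can never fail on such a platform. */
static inline int stub_dma_mapping_error(dma_addr_t addr)
{
	(void)addr;
	return 0;
}
```

Under that assumption, routing virtio through the DMA API would cost s390 nothing at runtime beyond the (inlined) stub calls.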
>
> In essence this patch is a no-go for s390.
Is that due to a possible compilation failure?
>
> I am also asking myself whether this makes virtio_ring more expensive.
>
> Christian
>
> >
> > It's actually unclear to me whether this can be fixed without either
> > s390 arch support or a special case for s390 in virtio_ring: on
> > brief inspection of s390's DMA code, it seems to assume that
> > everything goes through a PCI IOMMU, which is presumably not the
> > case for virtio.
>
>
> >
> > Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> > Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> > ---
> > drivers/virtio/Kconfig | 1 +
> > drivers/virtio/virtio_ring.c | 116 +++++++++++++++++++++++++++++++++++--------
> > 2 files changed, 95 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > index c6683f2e396c..b6f3acc61153 100644
> > --- a/drivers/virtio/Kconfig
> > +++ b/drivers/virtio/Kconfig
> > @@ -1,5 +1,6 @@
> > config VIRTIO
> > tristate
> > + depends on HAS_DMA
> > ---help---
> > This option is selected by any driver which implements the virtio
> > bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index d356a701c9c2..6a78e2fd8e63 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -24,6 +24,7 @@
> > #include <linux/module.h>
> > #include <linux/hrtimer.h>
> > #include <linux/kmemleak.h>
> > +#include <linux/dma-mapping.h>
> >
> > #ifdef DEBUG
> > /* For development, we want to crash whenever the ring is screwed. */
> > @@ -54,6 +55,12 @@
> > #define END_USE(vq)
> > #endif
> >
> > +struct vring_desc_state
> > +{
> > + void *data; /* Data for callback. */
> > + struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > +};
> > +
> > struct vring_virtqueue
> > {
> > struct virtqueue vq;
> > @@ -93,12 +100,45 @@ struct vring_virtqueue
> > ktime_t last_add_time;
> > #endif
> >
> > - /* Tokens for callbacks. */
> > - void *data[];
> > + /* Per-descriptor state. */
> > + struct vring_desc_state desc_state[];
> > };
> >
> > #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
> >
> > +/* Helper to map one sg entry, since we can't use dma_map_sg. */
> > +static dma_addr_t dma_map_one_sg(struct vring_virtqueue *vq,
> > + struct scatterlist *sg,
> > + enum dma_data_direction direction)
> > +{
> > + return dma_map_page(vq->vq.vdev->dev.parent,
> > + sg_page(sg), sg->offset, sg->length, direction);
> > +}
> > +
> > +static void unmap_one(struct vring_virtqueue *vq, struct vring_desc *desc)
> > +{
> > + if (desc->flags & VRING_DESC_F_INDIRECT) {
> > + dma_unmap_single(vq->vq.vdev->dev.parent,
> > + desc->addr, desc->len,
> > + (desc->flags & VRING_DESC_F_WRITE) ?
> > + DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > + } else {
> > + dma_unmap_page(vq->vq.vdev->dev.parent,
> > + desc->addr, desc->len,
> > + (desc->flags & VRING_DESC_F_WRITE) ?
> > + DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > + }
> > +}
> > +
> > +static void unmap_indirect(struct vring_virtqueue *vq, struct vring_desc *desc,
> > + int total)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < total; i++)
> > + unmap_one(vq, &desc[i]);
> > +}
> > +
> > /* Set up an indirect table of descriptors and add it to the queue. */
> > static inline int vring_add_indirect(struct vring_virtqueue *vq,
> > struct scatterlist *sgs[],
> > @@ -132,7 +172,11 @@ static inline int vring_add_indirect(struct vring_virtqueue *vq,
> > if (!sg)
> > break;
> > desc[i].flags = VRING_DESC_F_NEXT;
> > - desc[i].addr = sg_phys(sg);
> > + desc[i].addr =
> > + dma_map_one_sg(vq, sg, DMA_TO_DEVICE);
> > + if (dma_mapping_error(vq->vq.vdev->dev.parent,
> > + desc[i].addr))
> > + goto unmap_free;
> > desc[i].len = sg->length;
> > desc[i].next = i+1;
> > i++;
> > @@ -143,7 +187,11 @@ static inline int vring_add_indirect(struct vring_virtqueue *vq,
> > if (!sg)
> > break;
> > desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
> > - desc[i].addr = sg_phys(sg);
> > + desc[i].addr =
> > + dma_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> > + if (dma_mapping_error(vq->vq.vdev->dev.parent,
> > + desc[i].addr))
> > + goto unmap_free;
> > desc[i].len = sg->length;
> > desc[i].next = i+1;
> > i++;
> > @@ -161,15 +209,27 @@ static inline int vring_add_indirect(struct vring_virtqueue *vq,
> > /* Use a single buffer which doesn't continue */
> > head = vq->free_head;
> > vq->vring.desc[head].flags = VRING_DESC_F_INDIRECT;
> > - vq->vring.desc[head].addr = virt_to_phys(desc);
> > - /* kmemleak gives a false positive, as it's hidden by virt_to_phys */
> > - kmemleak_ignore(desc);
> > + vq->vring.desc[head].addr =
> > + dma_map_single(vq->vq.vdev->dev.parent,
> > + desc, i * sizeof(struct vring_desc),
> > + DMA_TO_DEVICE);
> > + if (dma_mapping_error(vq->vq.vdev->dev.parent,
> > + vq->vring.desc[head].addr))
> > + goto unmap_free;
> > vq->vring.desc[head].len = i * sizeof(struct vring_desc);
> >
> > /* Update free pointer */
> > vq->free_head = vq->vring.desc[head].next;
> >
> > + /* Save the indirect block */
> > + vq->desc_state[head].indir_desc = desc;
> > +
> > return head;
> > +
> > +unmap_free:
> > + unmap_indirect(vq, desc, i);
> > + kfree(desc);
> > + return -ENOMEM;
> > }
> >
> > static inline int virtqueue_add(struct virtqueue *_vq,
> > @@ -244,7 +304,8 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > if (!sg)
> > break;
> > vq->vring.desc[i].flags = VRING_DESC_F_NEXT;
> > - vq->vring.desc[i].addr = sg_phys(sg);
> > + vq->vring.desc[i].addr =
> > + dma_map_one_sg(vq, sg, DMA_TO_DEVICE);
> > vq->vring.desc[i].len = sg->length;
> > prev = i;
> > i = vq->vring.desc[i].next;
> > @@ -255,7 +316,8 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> > if (!sg)
> > break;
> > vq->vring.desc[i].flags = VRING_DESC_F_NEXT|VRING_DESC_F_WRITE;
> > - vq->vring.desc[i].addr = sg_phys(sg);
> > + vq->vring.desc[i].addr =
> > + dma_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> > vq->vring.desc[i].len = sg->length;
> > prev = i;
> > i = vq->vring.desc[i].next;
> > @@ -269,7 +331,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> >
> > add_head:
> > /* Set token. */
> > - vq->data[head] = data;
> > + vq->desc_state[head].data = data;
> >
> > /* Put entry in available array (but don't update avail->idx until they
> > * do sync). */
> > @@ -470,22 +532,33 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
> > unsigned int i;
> >
> > /* Clear data ptr. */
> > - vq->data[head] = NULL;
> > + vq->desc_state[head].data = NULL;
> >
> > /* Put back on free list: find end */
> > i = head;
> >
> > /* Free the indirect table */
> > - if (vq->vring.desc[i].flags & VRING_DESC_F_INDIRECT)
> > - kfree(phys_to_virt(vq->vring.desc[i].addr));
> > + if (vq->desc_state[head].indir_desc) {
> > + u32 len = vq->vring.desc[i].len;
> > +
> > + BUG_ON(!(vq->vring.desc[i].flags & VRING_DESC_F_INDIRECT));
> > + BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > + unmap_indirect(vq, vq->desc_state[head].indir_desc,
> > + len / sizeof(struct vring_desc));
> > + kfree(vq->desc_state[head].indir_desc);
> > + vq->desc_state[head].indir_desc = NULL;
> > + }
> >
> > while (vq->vring.desc[i].flags & VRING_DESC_F_NEXT) {
> > + unmap_one(vq, &vq->vring.desc[i]);
> > i = vq->vring.desc[i].next;
> > vq->vq.num_free++;
> > }
> >
> > + unmap_one(vq, &vq->vring.desc[i]);
> > vq->vring.desc[i].next = vq->free_head;
> > vq->free_head = head;
> > +
> > /* Plus final descriptor */
> > vq->vq.num_free++;
> > }
> > @@ -542,13 +615,13 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
> > BAD_RING(vq, "id %u out of range\n", i);
> > return NULL;
> > }
> > - if (unlikely(!vq->data[i])) {
> > + if (unlikely(!vq->desc_state[i].data)) {
> > BAD_RING(vq, "id %u is not a head!\n", i);
> > return NULL;
> > }
> >
> > /* detach_buf clears data, so grab it now. */
> > - ret = vq->data[i];
> > + ret = vq->desc_state[i].data;
> > detach_buf(vq, i);
> > vq->last_used_idx++;
> > /* If we expect an interrupt for the next entry, tell host
> > @@ -709,10 +782,10 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
> > START_USE(vq);
> >
> > for (i = 0; i < vq->vring.num; i++) {
> > - if (!vq->data[i])
> > + if (!vq->desc_state[i].data)
> > continue;
> > /* detach_buf clears data, so grab it now. */
> > - buf = vq->data[i];
> > + buf = vq->desc_state[i].data;
> > detach_buf(vq, i);
> > vq->vring.avail->idx--;
> > END_USE(vq);
> > @@ -765,7 +838,8 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
> > return NULL;
> > }
> >
> > - vq = kmalloc(sizeof(*vq) + sizeof(void *)*num, GFP_KERNEL);
> > + vq = kmalloc(sizeof(*vq) + num * sizeof(struct vring_desc_state),
> > + GFP_KERNEL);
> > if (!vq)
> > return NULL;
> >
> > @@ -795,11 +869,9 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
> >
> > /* Put everything in free lists. */
> > vq->free_head = 0;
> > - for (i = 0; i < num-1; i++) {
> > + for (i = 0; i < num-1; i++)
> > vq->vring.desc[i].next = i+1;
> > - vq->data[i] = NULL;
> > - }
> > - vq->data[i] = NULL;
> > + memset(vq->desc_state, 0, num * sizeof(struct vring_desc_state));
> >
> > return &vq->vq;
> > }
> >
>
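As an aside, the bookkeeping pattern the patch introduces — one per-descriptor state struct replacing the bare `void *data[]` token array — can be sketched outside the kernel roughly as follows. This is an illustrative userspace analogue, not kernel code; all names are invented:

```c
/* Illustrative sketch of the flexible-array allocation pattern used
 * for the new per-descriptor state (cf. vring_desc_state above). */
#include <stdlib.h>
#include <string.h>

struct desc_state {
	void *data;       /* token for the caller's completion callback */
	void *indir_desc; /* indirect descriptor table, if any */
};

struct ring {
	unsigned int num;
	struct desc_state desc_state[]; /* flexible array member */
};

static struct ring *ring_alloc(unsigned int num)
{
	/* One allocation covers the header plus num state entries,
	 * mirroring the kmalloc() in vring_new_virtqueue(). */
	struct ring *r = malloc(sizeof(*r) +
				num * sizeof(struct desc_state));
	if (!r)
		return NULL;
	r->num = num;
	/* Start with all per-descriptor state cleared, as the patch
	 * does with memset() at the end of vring_new_virtqueue(). */
	memset(r->desc_state, 0, num * sizeof(struct desc_state));
	return r;
}
```

Keeping the indirect-table pointer alongside the token is what lets detach_buf() free and unmap the table without round-tripping through phys_to_virt().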
Thread overview: 28+ messages
2014-08-26 21:16 [PATCH 0/3] virtio: Clean up scatterlists and use the DMA API Andy Lutomirski
2014-08-26 21:17 ` [PATCH 1/3] virtio_ring: Remove sg_next indirection Andy Lutomirski
2014-09-01 0:58 ` Rusty Russell
2014-09-01 1:42 ` Andy Lutomirski
2014-09-01 6:59 ` Michael S. Tsirkin
2014-09-01 17:15 ` Andy Lutomirski
2014-08-26 21:17 ` [PATCH 2/3] virtio_ring: Use DMA APIs Andy Lutomirski
2014-08-27 7:29 ` Christian Borntraeger
2014-08-27 17:35 ` Konrad Rzeszutek Wilk [this message]
2014-08-27 17:39 ` Andy Lutomirski
2014-08-26 21:17 ` [PATCH 3/3] virtio_pci: Use the DMA API for virtqueues Andy Lutomirski
2014-08-27 17:32 ` Konrad Rzeszutek Wilk
2014-08-27 17:35 ` Andy Lutomirski
2014-08-27 18:52 ` Konrad Rzeszutek Wilk
2014-08-27 6:46 ` [PATCH 0/3] virtio: Clean up scatterlists and use the DMA API Stefan Hajnoczi
2014-08-27 15:11 ` Andy Lutomirski
2014-08-27 15:45 ` Michael S. Tsirkin
2014-08-27 15:50 ` Andy Lutomirski
2014-08-27 16:13 ` Christopher Covington
2014-08-27 16:19 ` Andy Lutomirski
2014-08-27 17:34 ` Christopher Covington
2014-08-27 17:37 ` Andy Lutomirski
2014-08-27 17:50 ` Christopher Covington
2014-08-27 20:52 ` Christian Borntraeger
2014-08-27 17:27 ` Konrad Rzeszutek Wilk
2014-08-27 18:10 ` Stefan Hajnoczi
2014-08-27 18:58 ` Konrad Rzeszutek Wilk
2014-08-27 7:54 ` Cornelia Huck