* Re: BUG_ON in virtio-ring.c
From: Rusty Russell @ 2013-05-27  1:42 UTC (permalink / raw)
To: Dave Airlie, LKML; +Cc: virtualization

Dave Airlie <airlied@gmail.com> writes:
> Hi Rusty,
>
> The current virtio-ring.c has a BUG_ON in virtqueue_add that checks
> total_sg > vq->vring.num; however, I'm not sure it is really 100%
> correct.
>
> If I have an indirect ring and I'm adding sgs to it and the host is
> delayed (say I've got a thread consuming things from the vring and it's
> off doing something interesting), I'd really like to get -ENOSPC back
> from virtqueue_add. However, if the indirect addition fails because
> free_sg is 0, we hit the BUG_ON before we ever reach the -ENOSPC check.

It is correct for the moment: drivers can't assume indirect buffer
support in the transport.

BUT for a new device, we could say "this depends on indirect descriptor
support", put the appropriate check in the device init, and then remove
the BUG_ON().

> The BUG_ON is quite valid in the no-indirect case, but when we have
> indirect buffers it doesn't always make sense.
>
> I'm not sure of the best way to fix it; I'm just a virtio newbie :)

Mailing me and the list was the right thing, since this raises questions
about the spec as well as the Linux implementation.

Good luck!
Rusty.

* Re: BUG_ON in virtio-ring.c
From: Dave Airlie @ 2013-05-27  3:29 UTC (permalink / raw)
To: Rusty Russell; +Cc: LKML, virtualization

> It is correct for the moment: drivers can't assume indirect buffer
> support in the transport.
>
> BUT for a new device, we could say "this depends on indirect descriptor
> support", put the appropriate check in the device init, and then remove
> the BUG_ON().

But if the transport has indirect buffer support, can it change its
mind at runtime?

In this case we have vq->indirect set, but the device has run out of
free buffers. It isn't a case where in + out would overflow the ring if
it had free buffers, since it would use an indirect descriptor and
succeed.

Getting -ENOSPC is definitely what should happen from what I can see,
not a BUG_ON. I should hit the BUG_ON only if the device reports no
indirect support.

Dave.

* Re: BUG_ON in virtio-ring.c
From: Rusty Russell @ 2013-05-27  5:38 UTC (permalink / raw)
To: Dave Airlie; +Cc: LKML, virtualization

Dave Airlie <airlied@gmail.com> writes:
> But if the transport has indirect buffer support, can it change its
> mind at runtime?

It's a layering violation.  The current rule is simple:

        A driver should never submit a buffer which can't fit in the
        ring.

This has the nice property that OOM (i.e. an indirect buffer allocation
failure) just slows things down; it doesn't break things.

> In this case we have vq->indirect set, but the device has run out of
> free buffers. It isn't a case where in + out would overflow the ring
> if it had free buffers, since it would use an indirect descriptor and
> succeed.

OK, but when do you re-xmit?  What if the ring is empty and you
submitted a buffer that needs indirect?  You won't get interrupted
again, because the device has consumed all the buffers.  You need your
own timer or something equally hackish.

> Getting -ENOSPC is definitely what should happen from what I can see,
> not a BUG_ON. I should hit the BUG_ON only if the device reports no
> indirect support.

I think we should return -ENOMEM if we can't indirect because of a
failed allocation and the buffer doesn't fit in the ring.  This:

        BUG_ON(total_sg > vq->vring.num);
        BUG_ON(total_sg == 0);

        if (vq->vq.num_free < total_sg) {
                pr_debug("Can't add buf len %i - avail = %i\n",
                         total_sg, vq->vq.num_free);
                /* FIXME: for historical reasons, we force a notify here
                 * if there are outgoing parts to the buffer.  Presumably
                 * the host should service the ring ASAP. */
                if (out_sgs)
                        vq->notify(&vq->vq);
                END_USE(vq);
                return -ENOSPC;
        }

becomes (untested):

        BUG_ON(total_sg == 0);

        if (vq->vq.num_free < total_sg) {
                if (total_sg > vq->vring.num) {
                        BUG_ON(!vq->indirect);
                        /* Return -ENOMEM only if we have nothing
                         * else to do */
                        if (vq->vq.num_free == vq->vring.num)
                                return -ENOMEM;
                }
                pr_debug("Can't add buf len %i - avail = %i\n",
                         total_sg, vq->vq.num_free);
                /* FIXME: for historical reasons, we force a notify here
                 * if there are outgoing parts to the buffer.  Presumably
                 * the host should service the ring ASAP. */
                if (out_sgs)
                        vq->notify(&vq->vq);
                END_USE(vq);
                return -ENOSPC;
        }

Cheers,
Rusty.

* Re: BUG_ON in virtio-ring.c
From: Alexey Lapitsky @ 2014-10-07 15:46 UTC (permalink / raw)
To: Rusty Russell; +Cc: Dave Airlie, LKML, virtualization

Hi,

I'm hitting this bug with both ext4 and btrfs.

Here's an example of the backtrace:
https://gist.github.com/vzctl/e888a821333979120932

I tried raising this BUG only for the direct ring, and it solved the
problem:

-       BUG_ON(total_sg > vq->vring.num);
+       BUG_ON(total_sg > vq->vring.num && !vq->indirect);

Shall I submit the patch, or is a more elaborate fix required?

--
Alexey

* Re: BUG_ON in virtio-ring.c
From: Rusty Russell @ 2014-10-13  5:39 UTC (permalink / raw)
To: Alexey Lapitsky; +Cc: Dave Airlie, LKML, virtualization

Alexey Lapitsky <lex.public@gmail.com> writes:
> I tried raising this BUG only for the direct ring, and it solved the
> problem:
>
> -       BUG_ON(total_sg > vq->vring.num);
> +       BUG_ON(total_sg > vq->vring.num && !vq->indirect);
>
> Shall I submit the patch, or is a more elaborate fix required?

This is wrong.  It looks like the block layer is sending down more sg
elements than we have ring entries, yet we tell it not to:

        blk_queue_max_segments(q, vblk->sg_elems-2);

If you apply this patch, what happens?  Here it prints:

[    0.616564] virtqueue elements = 128, max_segments = 126 (1 queues)
[    0.621244]  vda: vda1 vda2 < vda5 >
[    0.632290] virtqueue elements = 128, max_segments = 126 (1 queues)
[    0.683526]  vdb: vdb1 vdb2 < vdb5 >

Cheers,
Rusty.

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 0a581400de0f..aa9d4d313587 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -683,6 +683,13 @@ static int virtblk_probe(struct virtio_device *vdev)
 	/* We can handle whatever the host told us to handle. */
 	blk_queue_max_segments(q, vblk->sg_elems-2);
 
+	printk("virtqueue elements = %u, max_segments = %u (%u queues)",
+	       virtqueue_get_vring_size(vblk->vqs[0].vq),
+	       vblk->sg_elems-2,
+	       vblk->num_vqs);
+
+	BUG_ON(vblk->sg_elems-2 > virtqueue_get_vring_size(vblk->vqs[0].vq));
+
 	/* No need to bounce any requests */
 	blk_queue_bounce_limit(q, BLK_BOUNCE_ANY);

* Re: BUG_ON in virtio-ring.c
From: Alexey Lapitsky @ 2014-11-04 14:44 UTC (permalink / raw)
To: Rusty Russell; +Cc: Dave Airlie, LKML, virtualization

Hi,

Sorry for the long delay.  It prints exactly the same:

[    3.792033] virtqueue elements = 128, max_segments = 126 (1 queues)
[    3.802191]  vda: vda1 vda2 < vda5 >

A little bit more about my setup (in case it helps): it's a
qemu-system-x86_64 KVM instance with 16 cores and 10G of RAM.  I can
reproduce the bug every time with mkfs.btrfs on a 10GB LVM volume
(right after a reboot).

I have almost no knowledge of vring / virtio.  Is it correct that we
need just one sg_elem entry in vq->vring if the vq->indirect flag is
set?  That's what I assumed when applying the
"BUG_ON(total_sg > vq->vring.num && !vq->indirect)" patch.

* Re: BUG_ON in virtio-ring.c
From: Rusty Russell @ 2014-12-18  3:13 UTC (permalink / raw)
To: Alexey Lapitsky; +Cc: Dave Airlie, LKML, virtualization

Alexey Lapitsky <lex.public@gmail.com> writes:
> Sorry for the long delay.  It prints exactly the same:
>
> [    3.792033] virtqueue elements = 128, max_segments = 126 (1 queues)
> [    3.802191]  vda: vda1 vda2 < vda5 >

OK, I think this is fixed by Ming Lei's recent count updates, which
interestingly did not seem to be CC:stable.

Cheers,
Rusty.