* Re: BUG_ON in virtio-ring.c
From: Rusty Russell @ 2013-05-27 1:42 UTC
To: Dave Airlie, LKML; +Cc: virtualization
Dave Airlie <airlied@gmail.com> writes:
> Hi Rusty,
>
> current virtio-ring.c has a BUG_ON in virtqueue_add that checks
> total_sg > vq->vring.num, however I'm not sure it really is 100%
> correct.
>
> If I have an indirect ring and I'm adding sgs to it and the host is
> delayed (say I've got a thread consuming things from the vring and it's
> off doing something interesting),
> I'd really like to get ENOSPC back from virtqueue_add. However, if the
> indirect addition fails due to free_sg being 0, we hit the BUG_ON
> before we ever get to the ENOSPC check.
It is correct for the moment: drivers can't assume indirect buffer
support in the transport.
BUT for a new device, we could say "this depends on indirect descriptor
support", put the appropriate check in the device init, and then remove
the BUG_ON().
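
A minimal sketch of the init-time check described above, using the
in-tree virtio_has_feature() helper; mydev_probe() is a hypothetical
probe routine, not code from this thread:

	#include <linux/errno.h>
	#include <linux/virtio.h>
	#include <linux/virtio_config.h>
	#include <linux/virtio_ring.h>

	/* Refuse to bind unless the transport offers indirect
	 * descriptors, so later virtqueue_add() callers may rely on
	 * them instead of the BUG_ON. */
	static int mydev_probe(struct virtio_device *vdev)
	{
		if (!virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC))
			return -ENODEV;
		/* ... rest of normal device init ... */
		return 0;
	}
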
> the BUG_ON is quite valid in the no indirect case, but when we have
> indirect buffers it doesn't seem like it always makes sense.
>
> Not sure best way to fix it, I'm just a virtio newbie :)
Mailing me and the list was the right thing, since this raises
questions about the spec as well as the Linux implementation.
Good luck!
Rusty.
* Re: BUG_ON in virtio-ring.c
From: Dave Airlie @ 2013-05-27 3:29 UTC
To: Rusty Russell; +Cc: LKML, virtualization
>> correct.
>>
>> If I have an indirect ring and I'm adding sgs to it and the host is
>> delayed (say I've got a thread consuming things from the vring and it's
>> off doing something interesting),
>> I'd really like to get ENOSPC back from virtqueue_add. However, if the
>> indirect addition fails due to free_sg being 0, we hit the BUG_ON
>> before we ever get to the ENOSPC check.
>
> It is correct for the moment: drivers can't assume indirect buffer
> support in the transport.
>
> BUT for a new device, we could say "this depends on indirect descriptor
> support", put the appropriate check in the device init, and then remove
> the BUG_ON().
But if the transport has indirect buffer support, can it change its
mind at runtime?
In this case we have vq->indirect set, but the device has run out of
free buffers; it isn't a case where in+out would overflow the ring if
it had free buffers, since it would use an indirect descriptor and
succeed.
From what I can see, getting -ENOSPC is definitely what should happen,
not a BUG_ON; I should hit the BUG_ON only if the device reports no
indirect support.
Dave.
* Re: BUG_ON in virtio-ring.c
From: Rusty Russell @ 2013-05-27 5:38 UTC
To: Dave Airlie; +Cc: LKML, virtualization
Dave Airlie <airlied@gmail.com> writes:
>>> correct.
>>>
>>> If I have an indirect ring and I'm adding sgs to it and the host is
>>> delayed (say I've got a thread consuming things from the vring and it's
>>> off doing something interesting),
>>> I'd really like to get ENOSPC back from virtqueue_add. However, if the
>>> indirect addition fails due to free_sg being 0, we hit the BUG_ON
>>> before we ever get to the ENOSPC check.
>>
>> It is correct for the moment: drivers can't assume indirect buffer
>> support in the transport.
>>
>> BUT for a new device, we could say "this depends on indirect descriptor
>> support", put the appropriate check in the device init, and then remove
>> the BUG_ON().
>
> But if the transport has indirect buffer support, can it change its
> mind at runtime?
It's a layering violation. The current rule is simple:

	A driver should never submit a buffer which can't fit in the
	ring.

This has the nice property that OOM (i.e. indirect buffer allocation
failure) just slows things down, doesn't break things.
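
A minimal sketch of what that rule means on the driver side, using the
in-tree virtqueue_get_vring_size() helper; fits_in_ring() is a
hypothetical name, not code from this thread:

	#include <linux/virtio.h>

	/* Worst case (OOM, so no indirect table) every sg entry needs
	 * its own ring descriptor, so the cap is the ring size itself. */
	static bool fits_in_ring(struct virtqueue *vq, unsigned int total_sg)
	{
		return total_sg <= virtqueue_get_vring_size(vq);
	}
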
> In this case we have vq->indirect set, but the device has run out of
> free buffers,
> but it isn't a case that in+out would overflow it if it had free
> buffers since it would use
> an indirect and succeed.
OK, but when do you re-xmit? What if the ring is empty, and you
submitted a buffer that needs indirect? You won't get interrupted
again, because the device has consumed all the buffers. You need to
have your own timer or something equally hackish.
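
For illustration, a sketch of the kind of fallback that would be
required, written against the 2013-era kernel timer API; struct mydev
and mydev_resubmit() are hypothetical:

	#include <linux/errno.h>
	#include <linux/jiffies.h>
	#include <linux/timer.h>

	/* Hypothetical driver state; armed elsewhere with
	 * setup_timer(&d->retry_timer, mydev_retry, (unsigned long)d). */
	struct mydev {
		struct timer_list retry_timer;
	};

	static int mydev_resubmit(struct mydev *d); /* retries the deferred buffer */

	static void mydev_retry(unsigned long data)
	{
		struct mydev *d = (struct mydev *)data;

		/* The device consumed every buffer, so no interrupt is
		 * coming: poll until the deferred buffer finally fits. */
		if (mydev_resubmit(d) == -ENOSPC)
			mod_timer(&d->retry_timer,
				  jiffies + msecs_to_jiffies(10));
	}
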
> Getting -ENOSPC is definitely what should happen from what I can see,
> not a BUG_ON,
> I should get a BUG_ON only if the device reports no indirect support.
I think we should return -ENOMEM if we can't use indirect because of a
failed allocation and the buffer doesn't fit in the ring, i.e.:
This:

	BUG_ON(total_sg > vq->vring.num);
	BUG_ON(total_sg == 0);

	if (vq->vq.num_free < total_sg) {
		pr_debug("Can't add buf len %i - avail = %i\n",
			 total_sg, vq->vq.num_free);
		/* FIXME: for historical reasons, we force a notify here if
		 * there are outgoing parts to the buffer.  Presumably the
		 * host should service the ring ASAP. */
		if (out_sgs)
			vq->notify(&vq->vq);
		END_USE(vq);
		return -ENOSPC;
	}
Becomes (untested):

	BUG_ON(total_sg == 0);

	if (vq->vq.num_free < total_sg) {
		if (total_sg > vq->vring.num) {
			BUG_ON(!vq->indirect);
			/* Return -ENOMEM only if we have nothing else to do */
			if (vq->vq.num_free == vq->vring.num)
				return -ENOMEM;
		}
		pr_debug("Can't add buf len %i - avail = %i\n",
			 total_sg, vq->vq.num_free);
		/* FIXME: for historical reasons, we force a notify here if
		 * there are outgoing parts to the buffer.  Presumably the
		 * host should service the ring ASAP. */
		if (out_sgs)
			vq->notify(&vq->vq);
		END_USE(vq);
		return -ENOSPC;
	}
Cheers,
Rusty.
* Re: BUG_ON in virtio-ring.c
From: Alexey Lapitsky @ 2014-10-07 15:46 UTC
To: Rusty Russell; +Cc: Dave Airlie, LKML, virtualization
Hi,
I'm hitting this bug with both ext4 and btrfs.
Here's an example of the backtrace:
https://gist.github.com/vzctl/e888a821333979120932
I tried raising this BUG only for the direct ring and it solved the problem:
- BUG_ON(total_sg > vq->vring.num);
+ BUG_ON(total_sg > vq->vring.num && !vq->indirect);
Shall I submit the patch or is a more elaborate fix required?
--
Alexey
On Mon, May 27, 2013 at 7:38 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> Dave Airlie <airlied@gmail.com> writes:
>>>> correct.
>>>>
>>>> If I have an indirect ring and I'm adding sgs to it and the host is
>>>> delayed (say I've got a thread consuming things from the vring and it's
>>>> off doing something interesting),
>>>> I'd really like to get ENOSPC back from virtqueue_add. However, if the
>>>> indirect addition fails due to free_sg being 0, we hit the BUG_ON
>>>> before we ever get to the ENOSPC check.
>>>
>>> It is correct for the moment: drivers can't assume indirect buffer
>>> support in the transport.
>>>
>>> BUT for a new device, we could say "this depends on indirect descriptor
>>> support", put the appropriate check in the device init, and then remove
>>> the BUG_ON().
>>
>> But if the transport has indirect buffer support, can it change its
>> mind at runtime?
>
> It's a layering violation. The current rule is simple:
>
>	A driver should never submit a buffer which can't fit in the
>	ring.
>
> This has the nice property that OOM (i.e. indirect buffer allocation
> failure) just slows things down, doesn't break things.
>
>> In this case we have vq->indirect set, but the device has run out of
>> free buffers; it isn't a case where in+out would overflow the ring if
>> it had free buffers, since it would use an indirect descriptor and
>> succeed.
>
> OK, but when do you re-xmit? What if the ring is empty, and you
> submitted a buffer that needs indirect? You won't get interrupted
> again, because the device has consumed all the buffers. You need to
> have your own timer or something equally hackish.
>
>> From what I can see, getting -ENOSPC is definitely what should happen,
>> not a BUG_ON; I should hit the BUG_ON only if the device reports no
>> indirect support.
>
> I think we should return -ENOMEM if we can't use indirect because of a
> failed allocation and the buffer doesn't fit in the ring, i.e.:
>
> This:
>
>	BUG_ON(total_sg > vq->vring.num);
>	BUG_ON(total_sg == 0);
>
>	if (vq->vq.num_free < total_sg) {
>		pr_debug("Can't add buf len %i - avail = %i\n",
>			 total_sg, vq->vq.num_free);
>		/* FIXME: for historical reasons, we force a notify here if
>		 * there are outgoing parts to the buffer.  Presumably the
>		 * host should service the ring ASAP. */
>		if (out_sgs)
>			vq->notify(&vq->vq);
>		END_USE(vq);
>		return -ENOSPC;
>	}
>
> Becomes (untested):
>
>	BUG_ON(total_sg == 0);
>
>	if (vq->vq.num_free < total_sg) {
>		if (total_sg > vq->vring.num) {
>			BUG_ON(!vq->indirect);
>			/* Return -ENOMEM only if we have nothing else to do */
>			if (vq->vq.num_free == vq->vring.num)
>				return -ENOMEM;
>		}
>		pr_debug("Can't add buf len %i - avail = %i\n",
>			 total_sg, vq->vq.num_free);
>		/* FIXME: for historical reasons, we force a notify here if
>		 * there are outgoing parts to the buffer.  Presumably the
>		 * host should service the ring ASAP. */
>		if (out_sgs)
>			vq->notify(&vq->vq);
>		END_USE(vq);
>		return -ENOSPC;
>	}
>
> Cheers,
> Rusty.
* Re: BUG_ON in virtio-ring.c
From: Rusty Russell @ 2014-10-13 5:39 UTC
To: Alexey Lapitsky; +Cc: Dave Airlie, LKML, virtualization
Alexey Lapitsky <lex.public@gmail.com> writes:
> Hi,
>
> I'm hitting this bug with both ext4 and btrfs.
>
> Here's an example of the backtrace:
> https://gist.github.com/vzctl/e888a821333979120932
>
> I tried raising this BUG only for the direct ring and it solved the problem:
>
> - BUG_ON(total_sg > vq->vring.num);
> + BUG_ON(total_sg > vq->vring.num && !vq->indirect);
>
> Shall I submit the patch or is a more elaborate fix required?
This is wrong. It looks like the block layer is sending down
more sg elements than we have ring entries, yet we tell it not to:

	blk_queue_max_segments(q, vblk->sg_elems-2);
If you apply this patch, what happens? Here it prints:
[ 0.616564] virtqueue elements = 128, max_segments = 126 (1 queues)
[ 0.621244] vda: vda1 vda2 < vda5 >
[ 0.632290] virtqueue elements = 128, max_segments = 126 (1 queues)
[ 0.683526] vdb: vdb1 vdb2 < vdb5 >
Cheers,
Rusty.
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 0a581400de0f..aa9d4d313587 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -683,6 +683,13 @@ static int virtblk_probe(struct virtio_device *vdev)
 	/* We can handle whatever the host told us to handle. */
 	blk_queue_max_segments(q, vblk->sg_elems-2);
 
+	printk("virtqueue elements = %u, max_segments = %u (%u queues)",
+	       virtqueue_get_vring_size(vblk->vqs[0].vq),
+	       vblk->sg_elems-2,
+	       vblk->num_vqs);
+
+	BUG_ON(vblk->sg_elems-2 > virtqueue_get_vring_size(vblk->vqs[0].vq));
+
 	/* No need to bounce any requests */
 	blk_queue_bounce_limit(q, BLK_BOUNCE_ANY);
* Re: BUG_ON in virtio-ring.c
From: Alexey Lapitsky @ 2014-11-04 14:44 UTC
To: Rusty Russell; +Cc: Dave Airlie, LKML, virtualization
Hi,
Sorry for the long delay. It prints exactly the same:
[ 3.792033] virtqueue elements = 128, max_segments = 126 (1 queues)
[ 3.802191] vda: vda1 vda2 < vda5 >
A little bit more about my setup (if it helps):
It's a qemu-system-x86_64 kvm instance with 16 cores and 10G of RAM.
I can reproduce the bug every time with mkfs.btrfs on a 10GB LVM
volume (right after a reboot).
I have almost no knowledge of vring / virtio.
Is it correct that we need just one sg_elem entry in the vq->vring if
the vq->indirect flag is set? That's what I thought when applying the
"BUG_ON(total_sg > vq->vring.num && !vq->indirect)" patch.
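
For background on the question above: with indirect descriptors the sg
entries live in a separately allocated table, and the ring itself
spends a single slot pointing at that table. A sketch of that one slot,
following the descriptor layout in include/uapi/linux/virtio_ring.h;
fill_indirect_slot() and its arguments are illustrative, not code from
this thread:

	#include <linux/types.h>
	#include <linux/virtio_ring.h>

	/* One ring slot describes the whole request: addr/len point at
	 * a table of vring_desc entries, and the INDIRECT flag tells
	 * the device to follow the pointer. */
	static void fill_indirect_slot(struct vring_desc *slot,
				       dma_addr_t table,
				       unsigned int total_sg)
	{
		slot->addr = table;
		slot->len = total_sg * sizeof(struct vring_desc);
		slot->flags = VRING_DESC_F_INDIRECT;
	}
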
* Re: BUG_ON in virtio-ring.c
From: Rusty Russell @ 2014-12-18 3:13 UTC
To: Alexey Lapitsky; +Cc: Dave Airlie, LKML, virtualization
Alexey Lapitsky <lex.public@gmail.com> writes:
> Hi,
>
> Sorry for the long delay. It prints exactly the same:
>
> [ 3.792033] virtqueue elements = 128, max_segments = 126 (1 queues)
> [ 3.802191] vda: vda1 vda2 < vda5 >
>
> A little bit more about my setup (if it helps):
OK, I think this is fixed by Ming Lei's recent count updates,
which interestingly did not seem to be CC:stable.
Cheers,
Rusty.