* Re: BUG_ON in virtio-ring.c
From: Rusty Russell @ 2013-05-27 1:42 UTC
To: Dave Airlie, LKML; +Cc: virtualization
Dave Airlie <airlied@gmail.com> writes:
> Hi Rusty,
>
> current virtio-ring.c has a BUG_ON in virtqueue_add that checks
> total_sg > vq->vring.num, however I'm not sure it really is 100%
> correct.
>
> If I have an indirect ring and I'm adding sgs to it and the host is
> delayed (say I've got a thread consuming things from the vring and it's
> off doing something interesting),
> I'd really like to get ENOSPC back from virtqueue_add. However, if the
> indirect addition fails due to free_sg being 0, we hit the BUG_ON
> before we ever get to the ENOSPC check.
It is correct for the moment: drivers can't assume indirect buffer
support in the transport.
BUT for a new device, we could say "this depends on indirect descriptor
support", put the appropriate check in the device init, and then remove
the BUG_ON().
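
A minimal sketch of the init-time check described above, using the
in-tree virtio_has_feature() helper; mydev_probe() is a hypothetical
probe routine, not code from this thread:

	#include <linux/errno.h>
	#include <linux/virtio.h>
	#include <linux/virtio_config.h>
	#include <linux/virtio_ring.h>

	/* Refuse to bind unless the transport offers indirect
	 * descriptors, so later virtqueue_add() callers may rely on
	 * them instead of the BUG_ON. */
	static int mydev_probe(struct virtio_device *vdev)
	{
		if (!virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC))
			return -ENODEV;
		/* ... rest of normal device init ... */
		return 0;
	}
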
> the BUG_ON is quite valid in the no indirect case, but when we have
> indirect buffers it doesn't seem like it always makes sense.
>
> Not sure best way to fix it, I'm just a virtio newbie :)
Mailing me and the list was the right thing, since this raises
questions about the spec as well as the Linux implementation.
Good luck!
Rusty.
* Re: BUG_ON in virtio-ring.c
From: Dave Airlie @ 2013-05-27 3:29 UTC
To: Rusty Russell; +Cc: LKML, virtualization
>> correct.
>>
>> If I have an indirect ring and I'm adding sgs to it and the host is
>> delayed (say I've got a thread consuming things from the vring and it's
>> off doing something interesting),
>> I'd really like to get ENOSPC back from virtqueue_add. However, if the
>> indirect addition fails due to free_sg being 0, we hit the BUG_ON
>> before we ever get to the ENOSPC check.
>
> It is correct for the moment: drivers can't assume indirect buffer
> support in the transport.
>
> BUT for a new device, we could say "this depends on indirect descriptor
> support", put the appropriate check in the device init, and then remove
> the BUG_ON().
But if the transport has indirect buffer support, can it change its
mind at runtime?
In this case we have vq->indirect set, but the device has run out of
free buffers; it isn't a case where in+out would overflow the ring if
it had free buffers, since it would use an indirect descriptor and
succeed.
From what I can see, getting -ENOSPC is definitely what should happen,
not a BUG_ON; I should hit the BUG_ON only if the device reports no
indirect support.
Dave.
* Re: BUG_ON in virtio-ring.c
From: Rusty Russell @ 2013-05-27 5:38 UTC
To: Dave Airlie; +Cc: LKML, virtualization
Dave Airlie <airlied@gmail.com> writes:
>>> correct.
>>>
>>> If I have an indirect ring and I'm adding sgs to it and the host is
>>> delayed (say I've got a thread consuming things from the vring and it's
>>> off doing something interesting),
>>> I'd really like to get ENOSPC back from virtqueue_add. However, if the
>>> indirect addition fails due to free_sg being 0, we hit the BUG_ON
>>> before we ever get to the ENOSPC check.
>>
>> It is correct for the moment: drivers can't assume indirect buffer
>> support in the transport.
>>
>> BUT for a new device, we could say "this depends on indirect descriptor
>> support", put the appropriate check in the device init, and then remove
>> the BUG_ON().
>
> But if the transport has indirect buffer support, can it change its
> mind at runtime?
It's a layering violation. The current rule is simple:

	A driver should never submit a buffer which can't fit in the
	ring.

This has the nice property that OOM (i.e. indirect buffer allocation
failure) just slows things down, doesn't break things.
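
A minimal sketch of what that rule means on the driver side, using the
in-tree virtqueue_get_vring_size() helper; fits_in_ring() is a
hypothetical name, not code from this thread:

	#include <linux/virtio.h>

	/* Worst case (OOM, so no indirect table) every sg entry needs
	 * its own ring descriptor, so the cap is the ring size itself. */
	static bool fits_in_ring(struct virtqueue *vq, unsigned int total_sg)
	{
		return total_sg <= virtqueue_get_vring_size(vq);
	}
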
> In this case we have vq->indirect set, but the device has run out of
> free buffers,
> but it isn't a case that in+out would overflow it if it had free
> buffers since it would use
> an indirect and succeed.
OK, but when do you re-xmit? What if the ring is empty, and you
submitted a buffer that needs indirect? You won't get interrupted
again, because the device has consumed all the buffers. You need to
have your own timer or something equally hackish.
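
For illustration, a sketch of the kind of fallback that would be
required, written against the 2013-era kernel timer API; struct mydev
and mydev_resubmit() are hypothetical:

	#include <linux/errno.h>
	#include <linux/jiffies.h>
	#include <linux/timer.h>

	/* Hypothetical driver state; armed elsewhere with
	 * setup_timer(&d->retry_timer, mydev_retry, (unsigned long)d). */
	struct mydev {
		struct timer_list retry_timer;
	};

	static int mydev_resubmit(struct mydev *d); /* retries the deferred buffer */

	static void mydev_retry(unsigned long data)
	{
		struct mydev *d = (struct mydev *)data;

		/* The device consumed every buffer, so no interrupt is
		 * coming: poll until the deferred buffer finally fits. */
		if (mydev_resubmit(d) == -ENOSPC)
			mod_timer(&d->retry_timer,
				  jiffies + msecs_to_jiffies(10));
	}
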
> Getting -ENOSPC is definitely what should happen from what I can see,
> not a BUG_ON,
> I should get a BUG_ON only if the device reports no indirect support.
I think we should return -ENOMEM if we can't use indirect because of a
failed allocation and the buffer doesn't fit in the ring, i.e.:
This:

	BUG_ON(total_sg > vq->vring.num);
	BUG_ON(total_sg == 0);

	if (vq->vq.num_free < total_sg) {
		pr_debug("Can't add buf len %i - avail = %i\n",
			 total_sg, vq->vq.num_free);
		/* FIXME: for historical reasons, we force a notify here if
		 * there are outgoing parts to the buffer.  Presumably the
		 * host should service the ring ASAP. */
		if (out_sgs)
			vq->notify(&vq->vq);
		END_USE(vq);
		return -ENOSPC;
	}
Becomes (untested):

	BUG_ON(total_sg == 0);

	if (vq->vq.num_free < total_sg) {
		if (total_sg > vq->vring.num) {
			BUG_ON(!vq->indirect);
			/* Return -ENOMEM only if we have nothing else to do */
			if (vq->vq.num_free == vq->vring.num)
				return -ENOMEM;
		}
		pr_debug("Can't add buf len %i - avail = %i\n",
			 total_sg, vq->vq.num_free);
		/* FIXME: for historical reasons, we force a notify here if
		 * there are outgoing parts to the buffer.  Presumably the
		 * host should service the ring ASAP. */
		if (out_sgs)
			vq->notify(&vq->vq);
		END_USE(vq);
		return -ENOSPC;
	}
Cheers,
Rusty.
* Re: BUG_ON in virtio-ring.c
From: Alexey Lapitsky @ 2014-10-07 15:46 UTC
To: Rusty Russell; +Cc: Dave Airlie, LKML, virtualization
Hi,
I'm hitting this bug with both ext4 and btrfs.
Here's an example of the backtrace:
https://gist.github.com/vzctl/e888a821333979120932
I tried raising this BUG only for the direct ring and it solved the problem:
- BUG_ON(total_sg > vq->vring.num);
+ BUG_ON(total_sg > vq->vring.num && !vq->indirect);
Shall I submit the patch or is a more elaborate fix required?
--
Alexey
On Mon, May 27, 2013 at 7:38 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> Dave Airlie <airlied@gmail.com> writes:
>>>> correct.
>>>>
>>>> If I have an indirect ring and I'm adding sgs to it and the host is
>>>> delayed (say I've got a thread consuming things from the vring and it's
>>>> off doing something interesting),
>>>> I'd really like to get ENOSPC back from virtqueue_add. However, if the
>>>> indirect addition fails due to free_sg being 0, we hit the BUG_ON
>>>> before we ever get to the ENOSPC check.
>>>
>>> It is correct for the moment: drivers can't assume indirect buffer
>>> support in the transport.
>>>
>>> BUT for a new device, we could say "this depends on indirect descriptor
>>> support", put the appropriate check in the device init, and then remove
>>> the BUG_ON().
>>
>> But if the transport has indirect buffer support, can it change its
>> mind at runtime?
>
> It's a layering violation. The current rule is simple:
>
>	A driver should never submit a buffer which can't fit in the
>	ring.
>
> This has the nice property that OOM (i.e. indirect buffer allocation
> failure) just slows things down, doesn't break things.
>
>> In this case we have vq->indirect set, but the device has run out of
>> free buffers; it isn't a case where in+out would overflow the ring if
>> it had free buffers, since it would use an indirect descriptor and
>> succeed.
>
> OK, but when do you re-xmit? What if the ring is empty, and you
> submitted a buffer that needs indirect? You won't get interrupted
> again, because the device has consumed all the buffers. You need to
> have your own timer or something equally hackish.
>
>> From what I can see, getting -ENOSPC is definitely what should happen,
>> not a BUG_ON; I should hit the BUG_ON only if the device reports no
>> indirect support.
>
> I think we should return -ENOMEM if we can't use indirect because of a
> failed allocation and the buffer doesn't fit in the ring, i.e.:
>
> This:
>
>	BUG_ON(total_sg > vq->vring.num);
>	BUG_ON(total_sg == 0);
>
>	if (vq->vq.num_free < total_sg) {
>		pr_debug("Can't add buf len %i - avail = %i\n",
>			 total_sg, vq->vq.num_free);
>		/* FIXME: for historical reasons, we force a notify here if
>		 * there are outgoing parts to the buffer.  Presumably the
>		 * host should service the ring ASAP. */
>		if (out_sgs)
>			vq->notify(&vq->vq);
>		END_USE(vq);
>		return -ENOSPC;
>	}
>
> Becomes (untested):
>
>	BUG_ON(total_sg == 0);
>
>	if (vq->vq.num_free < total_sg) {
>		if (total_sg > vq->vring.num) {
>			BUG_ON(!vq->indirect);
>			/* Return -ENOMEM only if we have nothing else to do */
>			if (vq->vq.num_free == vq->vring.num)
>				return -ENOMEM;
>		}
>		pr_debug("Can't add buf len %i - avail = %i\n",
>			 total_sg, vq->vq.num_free);
>		/* FIXME: for historical reasons, we force a notify here if
>		 * there are outgoing parts to the buffer.  Presumably the
>		 * host should service the ring ASAP. */
>		if (out_sgs)
>			vq->notify(&vq->vq);
>		END_USE(vq);
>		return -ENOSPC;
>	}
>
> Cheers,
> Rusty.
* Re: BUG_ON in virtio-ring.c
From: Rusty Russell @ 2014-10-13 5:39 UTC
To: Alexey Lapitsky; +Cc: Dave Airlie, LKML, virtualization
Alexey Lapitsky <lex.public@gmail.com> writes:
> Hi,
>
> I'm hitting this bug with both ext4 and btrfs.
>
> Here's an example of the backtrace:
> https://gist.github.com/vzctl/e888a821333979120932
>
> I tried raising this BUG only for the direct ring and it solved the problem:
>
> - BUG_ON(total_sg > vq->vring.num);
> + BUG_ON(total_sg > vq->vring.num && !vq->indirect);
>
> Shall I submit the patch or is a more elaborate fix required?
This is wrong. It looks like the block layer is sending down
more sg elements than we have ring entries, yet we tell it not to:

	blk_queue_max_segments(q, vblk->sg_elems-2);
If you apply this patch, what happens? Here it prints:
[ 0.616564] virtqueue elements = 128, max_segments = 126 (1 queues)
[ 0.621244] vda: vda1 vda2 < vda5 >
[ 0.632290] virtqueue elements = 128, max_segments = 126 (1 queues)
[ 0.683526] vdb: vdb1 vdb2 < vdb5 >
Cheers,
Rusty.
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 0a581400de0f..aa9d4d313587 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -683,6 +683,13 @@ static int virtblk_probe(struct virtio_device *vdev)
 	/* We can handle whatever the host told us to handle. */
 	blk_queue_max_segments(q, vblk->sg_elems-2);
 
+	printk("virtqueue elements = %u, max_segments = %u (%u queues)",
+	       virtqueue_get_vring_size(vblk->vqs[0].vq),
+	       vblk->sg_elems-2,
+	       vblk->num_vqs);
+
+	BUG_ON(vblk->sg_elems-2 > virtqueue_get_vring_size(vblk->vqs[0].vq));
+
 	/* No need to bounce any requests */
 	blk_queue_bounce_limit(q, BLK_BOUNCE_ANY);
* Re: BUG_ON in virtio-ring.c
From: Alexey Lapitsky @ 2014-11-04 14:44 UTC
To: Rusty Russell; +Cc: Dave Airlie, LKML, virtualization
Hi,
Sorry for the long delay. It prints exactly the same:
[ 3.792033] virtqueue elements = 128, max_segments = 126 (1 queues)
[ 3.802191] vda: vda1 vda2 < vda5 >
A little bit more about my setup (if it helps):
It's a qemu-system-x86_64 kvm instance with 16 cores and 10G of RAM.
I can reproduce the bug every time with mkfs.btrfs on a 10GB LVM
volume (right after a reboot).
I have almost no knowledge of vring / virtio.
Is it correct that we need just one sg_elem entry in the vq->vring if
the vq->indirect flag is set? That's what I thought when applying the
"BUG_ON(total_sg > vq->vring.num && !vq->indirect)" patch.
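
For background on the question above: with indirect descriptors the sg
entries live in a separately allocated table, and the ring itself
spends a single slot pointing at that table. A sketch of that one slot,
following the descriptor layout in include/uapi/linux/virtio_ring.h;
fill_indirect_slot() and its arguments are illustrative, not code from
this thread:

	#include <linux/types.h>
	#include <linux/virtio_ring.h>

	/* One ring slot describes the whole request: addr/len point at
	 * a table of vring_desc entries, and the INDIRECT flag tells
	 * the device to follow the pointer. */
	static void fill_indirect_slot(struct vring_desc *slot,
				       dma_addr_t table,
				       unsigned int total_sg)
	{
		slot->addr = table;
		slot->len = total_sg * sizeof(struct vring_desc);
		slot->flags = VRING_DESC_F_INDIRECT;
	}
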
* Re: BUG_ON in virtio-ring.c
From: Rusty Russell @ 2014-12-18 3:13 UTC
To: Alexey Lapitsky; +Cc: Dave Airlie, LKML, virtualization
Alexey Lapitsky <lex.public@gmail.com> writes:
> Hi,
>
> Sorry for the long delay. It prints exactly the same:
>
> [ 3.792033] virtqueue elements = 128, max_segments = 126 (1 queues)
> [ 3.802191] vda: vda1 vda2 < vda5 >
>
> A little bit more about my setup (if it helps):
OK, I think this is fixed by Ming Lei's recent count updates,
which interestingly did not seem to be CC:stable.
Cheers,
Rusty.