From: Mike Snitzer <snitzer@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: axboe@kernel.dk, linux-kernel@vger.kernel.org,
martin.petersen@oracle.com, hch@infradead.org,
rusty@rustcorp.com.au, dm-devel@redhat.com
Subject: Re: virtio_blk: fix defaults for max_hw_sectors and max_segment_size
Date: Thu, 20 Nov 2014 16:15:22 -0500 [thread overview]
Message-ID: <20141120211521.GA846@redhat.com> (raw)
In-Reply-To: <20141120203044.GA9078@redhat.com>
On Thu, Nov 20 2014 at 3:30pm -0500,
Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Nov 20, 2014 at 02:00:59PM -0500, Mike Snitzer wrote:
> > virtio_blk incorrectly established -1U as the default for these
> > queue_limits. Set these limits to sane default values to avoid crashing
> > the kernel.
...
> > Attempting to mkfs.xfs against a thin device from this thin-pool quickly
> > resulted in fs/direct-io.c:dio_send_cur_page()'s BUG_ON.
>
> Why exactly does it BUG_ON?
> Did some memory allocation fail?
No idea, kernel log doesn't say.. all it has is "kernel BUG" pointing to
fs/direct-io.c:dio_send_cur_page()'s BUG_ON.
I could dig deeper on _why_ but honestly, there really isn't much point.
virtio-blk doesn't get to live in fantasy-land just because it happens
to think it is limitless.
> Will it still BUG_ON if host gives us high values?
Maybe, if/when virtio-blk allows the host to inject a value for
max_hw_sectors. But my fix doesn't stack the host's limits up, it sets
a value that isn't prone to make the block/fs layers BUG.
> If linux makes assumptions about hardware limits, won't
> it be better to put them in blk core and not in
> individual drivers?
The individual block driver is meant to establish sane values for these
limits.
Block core _does_ have some sane wrappers for stacking these limits
(e.g. blk_stack_limits, etc). All of those wrappers are meant to allow
for virtual drivers to build up limits that respect the underlying
hardware's limits.
But virtio-blk doesn't use any of them due to the virtio-blk driver
relying on the virtio-blk protocol to encapsulate each and every one of
them.
> > Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> > Cc: stable@vger.kernel.org
> > ---
> > drivers/block/virtio_blk.c | 9 ++++++---
> > 1 files changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index c6a27d5..68efbdc 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -674,8 +674,11 @@ static int virtblk_probe(struct virtio_device *vdev)
> > /* No need to bounce any requests */
> > blk_queue_bounce_limit(q, BLK_BOUNCE_ANY);
> >
> > - /* No real sector limit. */
> > - blk_queue_max_hw_sectors(q, -1U);
> > + /*
> > + * Limited by disk's max_hw_sectors in host, but
> > + * without that info establish a sane default.
> > + */
> > + blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
>
> I see
> drivers/usb/storage/scsiglue.c: blk_queue_max_hw_sectors(sdev->request_queue, 0x7FFFFF);
>
> so maybe we should go higher, and use INT_MAX too?
No, higher doesn't help _at all_ if the driver itself doesn't actually
take care to stack the underlying driver's limits. Without limits
stacking (which virtio-blk doesn't really have) it is the lack of
reality-based default values that is _the_ problem that enduced this
BUG.
blk_stack_limits() does a lot of min_t(top, bottom), etc. So you want
the default "top" of a stacking driver to be high enough so as not to
artificially limit the resulting stacked limit. Which is why we have
things like blk_set_stacking_limits(). You'll note that
blk_set_stacking_limits() properly establishes UINT_MAX, etc. BUT it is
"proper" purely because drivers that call it (e.g. DM) also make use of
the block layer's limits stacking functions (again,
e.g. blk_stack_limits).
> >
> > /* Host can optionally specify maximum segment size and number of
> > * segments. */
> > @@ -684,7 +687,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> > if (!err)
> > blk_queue_max_segment_size(q, v);
> > else
> > - blk_queue_max_segment_size(q, -1U);
> > + blk_queue_max_segment_size(q, BLK_MAX_SEGMENT_SIZE);
> >
> > /* Host can optionally specify the block size of the device */
> > err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
>
> Here too, I see some drivers asking for more:
> drivers/block/mtip32xx/mtip32xx.c: blk_queue_max_segment_size(dd->queue, 0x400000);
Those drivers you listed could be equally broken..
For virtio-blk the issue is that the limits it establishes don't reflect
the underlying host's hardware capabilties. This was a virtio-blk time
bomb waiting to go off.
And to be clear, I just fixed the blk_queue_max_segment_size(q, -1U);
because it is blatantly wrong when we've established
BLK_MAX_SEGMENT_SIZE.
The bug that was reported is purely due to max_hw_sectors being 2TB and
the established max_sectors being 1TB.
next prev parent reply other threads:[~2014-11-20 21:15 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-20 19:00 [PATCH] virtio_blk: fix defaults for max_hw_sectors and max_segment_size Mike Snitzer
2014-11-20 20:30 ` Michael S. Tsirkin
2014-11-20 21:15 ` Mike Snitzer [this message]
2014-11-26 5:58 ` Rusty Russell
2014-11-26 14:10 ` Mike Snitzer
2014-11-21 1:59 ` Mike Snitzer
2014-11-21 2:11 ` [PATCH v2] " Mike Snitzer
2014-11-21 9:54 ` [PATCH] " Christoph Hellwig
2014-11-21 9:54 ` [Qemu-devel] " Christoph Hellwig
2014-11-21 15:49 ` Mike Snitzer
2014-11-21 15:49 ` [Qemu-devel] " Mike Snitzer
2014-11-26 19:48 ` Jens Axboe
2014-11-26 19:48 ` [Qemu-devel] " Jens Axboe
2014-11-26 20:51 ` Mike Snitzer
2014-11-26 20:51 ` [Qemu-devel] " Mike Snitzer
2014-11-26 20:54 ` Jens Axboe
2014-11-26 20:54 ` [Qemu-devel] " Jens Axboe
2014-11-26 21:51 ` Mike Snitzer
2014-11-26 21:51 ` [Qemu-devel] " Mike Snitzer
2014-11-26 21:51 ` Mike Snitzer
2014-11-26 21:53 ` Jens Axboe
2014-11-26 21:53 ` [Qemu-devel] " Jens Axboe
2014-11-26 23:00 ` Mike Snitzer
2014-11-26 23:00 ` [Qemu-devel] " Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141120211521.GA846@redhat.com \
--to=snitzer@redhat.com \
--cc=axboe@kernel.dk \
--cc=dm-devel@redhat.com \
--cc=hch@infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=mst@redhat.com \
--cc=rusty@rustcorp.com.au \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.