From: Julien Grall <julien.grall@citrix.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
David Vrabel <david.vrabel@citrix.com>,
axboe@fb.com
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
"Ian Campbell" <Ian.Campbell@citrix.com>,
"Stefano Stabellini" <stefano.stabellini@eu.citrix.com>,
"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: PAGE_SIZE (64KB), while block driver 'struct request' deals with < PAGE_SIZE (up to 44Kb). Was:Re: [RFC] Support of non-indirect grant backend on 64KB guest
Date: Thu, 27 Aug 2015 18:51:00 +0100 [thread overview]
Message-ID: <55DF4E04.8010005@citrix.com> (raw)
In-Reply-To: <20150821171022.GL26663@l.oracle.com>
Hi,
On 21/08/15 18:10, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 21, 2015 at 05:08:35PM +0100, David Vrabel wrote:
>> On 21/08/15 17:05, Konrad Rzeszutek Wilk wrote:
>>>
>>> I have to concur with that. We can't mandate that ARM 64k page MUST use
>>> indirect descriptors.
>>
>> Then it has to be fixed in the block layer to allow < PAGE_SIZE segments
>> and to get the block layer to split requests for blkfront.
>
> Hey Jens,
>
> I am hoping you can help us figure this problem out.
>
> The Linux ARM is capable of using 4KB pages and 64KB pages. Our block
> driver (xen-blkfront) was built with 4KB pages in mind and without using
> any fancy flags (which some backends lack) the maximum amount of I/O it can
> fit on a ring is 44KB.
>
> This has the unfortunate effect that when the xen-blkfront
> gets an 'struct request' it can have on page (64KB) and it can't actually
> fit it on the ring! And the lowest segment size it advertises is PAGE_SIZE
> (64KB). I believe Julien (who found this) tried initially advertising
> smaller segment size than PAGE_SIZE (32KB). However looking at
> __blk_segment_map_sg it looks to assume smallest size is PAGE_SIZE so
> that would explain why it did not work.
To be honest, I haven't tried to see how the block layer will act if I
dropped those checks in blk-settings.c until today.
I don't see any assumption about PAGE_SIZE in __blk_segment_map_sg.
Although dropping the checks in blk-settings (see quick patch [1]),
I got the following error in the frontend:
bio too big device xvda (128 > 88)
Buffer I/O error on dev xvda, logical block 0, async page read
bio too big device xvda (128 > 88)
Buffer I/O error on dev xvda, logical block 0, async page read
The "bio too big device ..." comes from generic_make_request_checks
(linux/block/blk-core.c) and the stack trace is:
[<fffffe0000096c7c>] dump_backtrace+0x0/0x124
[<fffffe0000096db0>] show_stack+0x10/0x1c
[<fffffe00005885e8>] dump_stack+0x78/0xbc
[<fffffe00000bf7f8>] warn_slowpath_common+0x98/0xd0
[<fffffe00000bf8f0>] warn_slowpath_null+0x14/0x20
[<fffffe00002df304>] generic_make_request_checks+0x114/0x230
[<fffffe00002e0580>] generic_make_request+0x10/0x108
[<fffffe00002e070c>] submit_bio+0x94/0x1e0
[<fffffe00001d573c>] submit_bh_wbc.isra.36+0x100/0x1a8
[<fffffe00001d5bf8>] block_read_full_page+0x320/0x3e8
[<fffffe00001d877c>] blkdev_readpage+0x14/0x20
[<fffffe000014582c>] do_read_cache_page+0x16c/0x1a0
[<fffffe0000145870>] read_cache_page+0x10/0x1c
[<fffffe00002f2908>] read_dev_sector+0x30/0x9c
[<fffffe00002f3d84>] msdos_partition+0x84/0x554
[<fffffe00002f38e4>] check_partition+0xf8/0x21c
[<fffffe00002f2f28>] rescan_partitions+0xb0/0x2a4
[<fffffe00001d98b0>] __blkdev_get+0x228/0x34c
[<fffffe00001d9a14>] blkdev_get+0x40/0x364
[<fffffe00002f0b6c>] add_disk+0x398/0x424
[<fffffe00003d8500>] blkback_changed+0x1200/0x152c
[<fffffe000036a954>] xenbus_otherend_changed+0x9c/0xa4
[<fffffe000036c984>] backend_changed+0xc/0x18
[<fffffe000036a088>] xenwatch_thread+0xa0/0x13c
[<fffffe00000d98d0>] kthread+0xd8/0xf0
The fs buffer code seems to assume that the block driver will always support
at least a bio of PAGE_SIZE.
> One wya to make this work is for the driver (xen-blkfront) to split
> the 'struct request' I/O in two internal requests.
>
> But this has to be a normal problem. Surely there are other drivers
> (MMC?) that can't handle PAGE_SIZE and have to break their I/Os.
> Would it make sense for the common block code to be able to deal
> with this?
It will take me a bit of time to fully understand the block layer
as the changes doesn't seem trivial from POV (I don't have any
knowledge in it). So I will wait a feedback from Jens before
going further on this approach.
Regards,
[1] patch
diff --git a/block/blk-settings.c b/block/blk-settings.c
index e0057d0..ac024e7 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -251,12 +251,15 @@ EXPORT_SYMBOL(blk_queue_bounce_limit);
**/
void blk_limits_max_hw_sectors(struct queue_limits *limits, unsigned int max_hw_sectors)
{
+#if 0
if ((max_hw_sectors << 9) < PAGE_CACHE_SIZE) {
max_hw_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
printk(KERN_INFO "%s: set to minimum %d\n",
__func__, max_hw_sectors);
}
+#endif
+
limits->max_sectors = limits->max_hw_sectors = max_hw_sectors;
}
EXPORT_SYMBOL(blk_limits_max_hw_sectors);
@@ -351,11 +354,14 @@ EXPORT_SYMBOL(blk_queue_max_segments);
**/
void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
{
+#if 0
if (max_size < PAGE_CACHE_SIZE) {
max_size = PAGE_CACHE_SIZE;
printk(KERN_INFO "%s: set to minimum %d\n",
__func__, max_size);
}
+#endif
+
q->limits.max_segment_size = max_size;
}
@@ -777,11 +783,14 @@ EXPORT_SYMBOL_GPL(blk_queue_dma_drain);
**/
void blk_queue_segment_boundary(struct request_queue *q, unsigned long mask)
{
+#if 0
if (mask < PAGE_CACHE_SIZE - 1) {
mask = PAGE_CACHE_SIZE - 1;
printk(KERN_INFO "%s: set to minimum %lx\n",
__func__, mask);
}
+#endif
+
q->limits.seg_boundary_mask = mask;
}
--
Julien Grall
next prev parent reply other threads:[~2015-08-27 17:51 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-18 6:29 [RFC] Support of non-indirect grant backend on 64KB guest Julien Grall
2015-08-18 7:09 ` Roger Pau Monné
2015-08-18 7:26 ` Jan Beulich
2015-08-18 18:45 ` Julien Grall
2015-08-19 8:50 ` Roger Pau Monné
2015-08-19 14:54 ` Julien Grall
2015-08-19 15:17 ` Roger Pau Monné
2015-08-19 15:52 ` Julien Grall
2015-08-19 23:44 ` Stefano Stabellini
2015-08-20 8:31 ` Roger Pau Monné
2015-08-20 9:43 ` David Vrabel
2015-08-20 16:16 ` Julien Grall
2015-08-20 17:23 ` Stefano Stabellini
2015-08-21 16:05 ` Konrad Rzeszutek Wilk
2015-08-21 16:08 ` David Vrabel
2015-08-21 16:49 ` Stefano Stabellini
2015-08-21 17:10 ` PAGE_SIZE (64KB), while block driver 'struct request' deals with < PAGE_SIZE (up to 44Kb). Was:Re: " Konrad Rzeszutek Wilk
2015-08-27 17:51 ` Julien Grall [this message]
2015-09-04 14:04 ` Stefano Stabellini
2015-09-04 15:41 ` Konrad Rzeszutek Wilk
2015-09-04 16:15 ` Julien Grall
2015-09-04 17:32 ` Konrad Rzeszutek Wilk
2015-09-04 22:05 ` Julien Grall
2015-08-20 9:37 ` Jan Beulich
2015-08-19 8:58 ` Jan Beulich
2015-08-19 15:25 ` Julien Grall
2015-08-20 17:42 ` David Vrabel
2015-08-21 1:30 ` Julien Grall
2015-08-21 16:07 ` Konrad Rzeszutek Wilk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55DF4E04.8010005@citrix.com \
--to=julien.grall@citrix.com \
--cc=Ian.Campbell@citrix.com \
--cc=axboe@fb.com \
--cc=david.vrabel@citrix.com \
--cc=konrad.wilk@oracle.com \
--cc=roger.pau@citrix.com \
--cc=stefano.stabellini@eu.citrix.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).