All of lore.kernel.org
 help / color / mirror / Atom feed
From: Julien Grall <julien.grall@citrix.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	David Vrabel <david.vrabel@citrix.com>,
	axboe@fb.com
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	"Ian Campbell" <Ian.Campbell@citrix.com>,
	"Stefano Stabellini" <stefano.stabellini@eu.citrix.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: PAGE_SIZE (64KB), while block driver 'struct request' deals with < PAGE_SIZE (up to 44Kb). Was:Re: [RFC] Support of non-indirect grant backend on 64KB guest
Date: Thu, 27 Aug 2015 18:51:00 +0100	[thread overview]
Message-ID: <55DF4E04.8010005@citrix.com> (raw)
In-Reply-To: <20150821171022.GL26663@l.oracle.com>

Hi,

On 21/08/15 18:10, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 21, 2015 at 05:08:35PM +0100, David Vrabel wrote:
>> On 21/08/15 17:05, Konrad Rzeszutek Wilk wrote:
>>>
>>> I have to concur with that. We can't mandate that ARM 64k page MUST use
>>> indirect descriptors.
>>
>> Then it has to be fixed in the block layer to allow < PAGE_SIZE segments
>> and to get the block layer to split requests for blkfront.
> 
> Hey Jens,
> 
> I am hoping you can help us figure this problem out.
> 
> The Linux ARM is capable of using 4KB pages and 64KB pages. Our block
> driver (xen-blkfront) was built with 4KB pages in mind and without using
> any fancy flags (which some backends lack) the maximum amount of I/O it can
> fit on a ring is 44KB.
> 
> This has the unfortunate effect that when the xen-blkfront
> gets an 'struct request' it can have on page (64KB) and it can't actually
> fit it on the ring! And the lowest segment size it advertises is PAGE_SIZE
> (64KB). I believe Julien (who found this) tried initially advertising
> smaller segment size than PAGE_SIZE (32KB). However looking at
> __blk_segment_map_sg it looks to assume smallest size is PAGE_SIZE so
> that would explain why it did not work.

To be honest, I haven't tried to see how the block layer will act if I
dropped those checks in blk-settings.c until today.

I don't see any assumption about PAGE_SIZE in __blk_segment_map_sg.
Although dropping the checks in blk-settings (see quick patch [1]),
I got the following error in the frontend:

bio too big device xvda (128 > 88)
Buffer I/O error on dev xvda, logical block 0, async page read
bio too big device xvda (128 > 88)
Buffer I/O error on dev xvda, logical block 0, async page read

The "bio too big device ..." comes from generic_make_request_checks
(linux/block/blk-core.c) and the stack trace is:

[<fffffe0000096c7c>] dump_backtrace+0x0/0x124
[<fffffe0000096db0>] show_stack+0x10/0x1c
[<fffffe00005885e8>] dump_stack+0x78/0xbc
[<fffffe00000bf7f8>] warn_slowpath_common+0x98/0xd0
[<fffffe00000bf8f0>] warn_slowpath_null+0x14/0x20
[<fffffe00002df304>] generic_make_request_checks+0x114/0x230
[<fffffe00002e0580>] generic_make_request+0x10/0x108
[<fffffe00002e070c>] submit_bio+0x94/0x1e0
[<fffffe00001d573c>] submit_bh_wbc.isra.36+0x100/0x1a8
[<fffffe00001d5bf8>] block_read_full_page+0x320/0x3e8
[<fffffe00001d877c>] blkdev_readpage+0x14/0x20
[<fffffe000014582c>] do_read_cache_page+0x16c/0x1a0
[<fffffe0000145870>] read_cache_page+0x10/0x1c
[<fffffe00002f2908>] read_dev_sector+0x30/0x9c
[<fffffe00002f3d84>] msdos_partition+0x84/0x554
[<fffffe00002f38e4>] check_partition+0xf8/0x21c
[<fffffe00002f2f28>] rescan_partitions+0xb0/0x2a4
[<fffffe00001d98b0>] __blkdev_get+0x228/0x34c
[<fffffe00001d9a14>] blkdev_get+0x40/0x364
[<fffffe00002f0b6c>] add_disk+0x398/0x424
[<fffffe00003d8500>] blkback_changed+0x1200/0x152c
[<fffffe000036a954>] xenbus_otherend_changed+0x9c/0xa4
[<fffffe000036c984>] backend_changed+0xc/0x18
[<fffffe000036a088>] xenwatch_thread+0xa0/0x13c
[<fffffe00000d98d0>] kthread+0xd8/0xf0

The fs buffer code seems to assume that the block driver will always support
at least a bio of PAGE_SIZE.

> One wya to make this work is for the driver (xen-blkfront) to split
> the 'struct request' I/O in two internal requests.
> 
> But this has to be a normal problem. Surely there are other drivers
> (MMC?) that can't handle PAGE_SIZE and have to break their I/Os.
> Would it make sense for the common block code to be able to deal
> with this?

It will take me a bit of time to fully understand the block layer
as the changes doesn't seem trivial from POV (I don't have any
knowledge in it). So I will wait a feedback from Jens before
going further on this approach.

Regards,

[1] patch

diff --git a/block/blk-settings.c b/block/blk-settings.c
index e0057d0..ac024e7 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -251,12 +251,15 @@ EXPORT_SYMBOL(blk_queue_bounce_limit);
  **/
 void blk_limits_max_hw_sectors(struct queue_limits *limits, unsigned int max_hw_sectors)
 {
+#if 0
        if ((max_hw_sectors << 9) < PAGE_CACHE_SIZE) {
                max_hw_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
                printk(KERN_INFO "%s: set to minimum %d\n",
                       __func__, max_hw_sectors);
        }
 
+#endif
+
        limits->max_sectors = limits->max_hw_sectors = max_hw_sectors;
 }
 EXPORT_SYMBOL(blk_limits_max_hw_sectors);
@@ -351,11 +354,14 @@ EXPORT_SYMBOL(blk_queue_max_segments);
  **/
 void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
 {
+#if 0
        if (max_size < PAGE_CACHE_SIZE) {
                max_size = PAGE_CACHE_SIZE;
                printk(KERN_INFO "%s: set to minimum %d\n",
                       __func__, max_size);
        }
+#endif
+
 
        q->limits.max_segment_size = max_size;
 }
@@ -777,11 +783,14 @@ EXPORT_SYMBOL_GPL(blk_queue_dma_drain);
  **/
 void blk_queue_segment_boundary(struct request_queue *q, unsigned long mask)
 {
+#if 0
        if (mask < PAGE_CACHE_SIZE - 1) {
                mask = PAGE_CACHE_SIZE - 1;
                printk(KERN_INFO "%s: set to minimum %lx\n",
                       __func__, mask);
        }
+#endif
+
 
        q->limits.seg_boundary_mask = mask;
 }



-- 
Julien Grall

  reply	other threads:[~2015-08-27 17:51 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-18  6:29 [RFC] Support of non-indirect grant backend on 64KB guest Julien Grall
2015-08-18  7:09 ` Roger Pau Monné
2015-08-18  7:26   ` Jan Beulich
2015-08-18 18:45   ` Julien Grall
2015-08-19  8:50     ` Roger Pau Monné
2015-08-19 14:54       ` Julien Grall
2015-08-19 15:17         ` Roger Pau Monné
2015-08-19 15:52           ` Julien Grall
2015-08-19 23:44           ` Stefano Stabellini
2015-08-20  8:31             ` Roger Pau Monné
2015-08-20  9:43               ` David Vrabel
2015-08-20 16:16                 ` Julien Grall
2015-08-20 17:23                 ` Stefano Stabellini
2015-08-21 16:05                   ` Konrad Rzeszutek Wilk
2015-08-21 16:08                     ` David Vrabel
2015-08-21 16:49                       ` Stefano Stabellini
2015-08-21 17:10                       ` PAGE_SIZE (64KB), while block driver 'struct request' deals with < PAGE_SIZE (up to 44Kb). Was:Re: " Konrad Rzeszutek Wilk
2015-08-27 17:51                         ` Julien Grall [this message]
2015-09-04 14:04                           ` Stefano Stabellini
2015-09-04 15:41                             ` Konrad Rzeszutek Wilk
2015-09-04 16:15                               ` Julien Grall
2015-09-04 17:32                                 ` Konrad Rzeszutek Wilk
2015-09-04 22:05                                   ` Julien Grall
2015-08-20  9:37             ` Jan Beulich
2015-08-19  8:58     ` Jan Beulich
2015-08-19 15:25       ` Julien Grall
2015-08-20 17:42 ` David Vrabel
2015-08-21  1:30   ` Julien Grall
2015-08-21 16:07     ` Konrad Rzeszutek Wilk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55DF4E04.8010005@citrix.com \
    --to=julien.grall@citrix.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=axboe@fb.com \
    --cc=david.vrabel@citrix.com \
    --cc=konrad.wilk@oracle.com \
    --cc=roger.pau@citrix.com \
    --cc=stefano.stabellini@eu.citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.