public inbox for linux-kernel@vger.kernel.org
From: Neil Brown <neilb@suse.de>
To: "Ing. Daniel Rozsnyó" <daniel@rozsnyo.com>
Cc: Milan Broz <mbroz@redhat.com>, Marti Raudsepp <marti@juffo.org>,
	linux-kernel@vger.kernel.org
Subject: Re: bio too big - in nested raid setup
Date: Thu, 28 Jan 2010 21:50:15 +1100
Message-ID: <20100128215015.0e0ed3a8@notabene>
In-Reply-To: <4B6157DB.6080502@rozsnyo.com>

On Thu, 28 Jan 2010 10:24:43 +0100
"Ing. Daniel Rozsnyó" <daniel@rozsnyo.com> wrote:

> Neil Brown wrote:
> > On Mon, 25 Jan 2010 19:27:53 +0100
> > Milan Broz <mbroz@redhat.com> wrote:
> > 
> >> On 01/25/2010 04:25 PM, Marti Raudsepp wrote:
> >>> 2010/1/24 "Ing. Daniel Rozsnyó" <daniel@rozsnyo.com>:
> >>>> Hello,
> >>>>  I am having troubles with nested RAID - when one array is added to the
> >>>> other, the "bio too big device md0" messages are appearing:
> >>>>
> >>>> bio too big device md0 (144 > 8)
> >>>> bio too big device md0 (248 > 8)
> >>>> bio too big device md0 (32 > 8)
> >>> I *think* this is the same bug that I hit years ago when mixing
> >>> different disks and 'pvmove'
> >>>
> >>> It's a design flaw in the DM/MD frameworks; see comment #3 from Milan Broz:
> >>> http://bugzilla.kernel.org/show_bug.cgi?id=9401#c3
> >> Hm. I don't think it is the same problem, you are only adding device to md array...
> >> (adding cc: Neil, this seems to me like MD bug).
> >>
> >> (original report for reference is here http://lkml.org/lkml/2010/1/24/60 )
> > 
> > No, I think it is the same problem.
> > 
> > When you have a stack of devices, the top level client needs to know the
> > maximum restrictions imposed by lower level devices to ensure it doesn't
> > violate them.
> > However there is no mechanism for a device to report that its restrictions
> > have changed.
> > So when md0 gains a linear leg and so needs to reduce the max size for
> > requests, there is no way to tell DM, so DM doesn't know.  And as the
> > filesystem only asks DM for restrictions, it never finds out about the
> > new restrictions.
> 
> Neil, why does it even reduce its block size? I've tried with both 
> "linear" and "raid0" (as they are the only way to get 2T from 4x500G) 
> and both behave the same (sda has 512, md0 127, linear 127 and raid0 has 
> 512 kb block size).
> 
> I do not see the mechanism how 512:127 or 512:512 leads to 4 kb limit

Both raid0 and linear register a 'bvec_mergeable' function (or whatever it is
called today).
This allows for the fact that these devices have restrictions that cannot be
expressed simply with request sizes.  In particular they only handle requests
that don't cross a chunk boundary.
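
To make "don't cross a chunk boundary" concrete, here is a minimal sketch of the check such a merge callback has to make. It is not the md code: the helper names fits_in_one_chunk() and room_left_in_chunk() are hypothetical, sizes are in 512-byte sectors, and the chunk size is assumed to be a power of two so the offset inside a chunk is a simple mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;  /* 512-byte sectors, as in the block layer */

/* Hypothetical: does [start, start + nr_sectors) stay inside one chunk?
 * Assumes chunk_sectors is a power of two. */
static bool fits_in_one_chunk(sector_t start, unsigned int nr_sectors,
                              unsigned int chunk_sectors)
{
    assert(chunk_sectors != 0 && (chunk_sectors & (chunk_sectors - 1)) == 0);
    sector_t offset = start & (chunk_sectors - 1);   /* offset within chunk */
    return offset + nr_sectors <= chunk_sectors;
}

/* Hypothetical: how many more sectors can be appended to a request that
 * already covers [start, start + cur_sectors) before it would cross into
 * the next chunk.  Returns 0 if it is already at (or past) the boundary. */
static unsigned int room_left_in_chunk(sector_t start, unsigned int cur_sectors,
                                       unsigned int chunk_sectors)
{
    assert(chunk_sectors != 0 && (chunk_sectors & (chunk_sectors - 1)) == 0);
    sector_t offset = start & (chunk_sectors - 1);
    if (offset + cur_sectors >= chunk_sectors)
        return 0;
    return (unsigned int)(chunk_sectors - offset - cur_sectors);
}
```

With a 512 KiB chunk (1024 sectors), a 144-sector request starting at sector 1000 would straddle two chunks, while the same request starting at sector 1024 fits. That is why such a device cannot describe its limit as one fixed max request size: whether a request is acceptable depends on where it starts.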

As raid1 never calls the bvec_mergeable function of its components (it would
be very hard to get that to work reliably, maybe impossible), it treats any
device with a bvec_mergeable function as though the max_sectors were one page.
This is because the interface guarantees that a one page request will always
be handled.
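
The fallback can be sketched as a small function that computes the limit raid1 would advertise from its members. This is a toy model of the rule described above, not the raid1 source: struct member_limits and raid1_stacked_max_sectors() are hypothetical, and a 4 KiB page (as on x86) is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SHIFT    9       /* 512-byte sectors */
#define PAGE_SIZE_BYTES 4096u   /* assumption: 4 KiB pages */

/* Hypothetical summary of what raid1 knows about one member device. */
struct member_limits {
    unsigned int max_sectors;   /* member's advertised max request size */
    bool has_merge_fn;          /* member registered a merge callback */
};

/* Take the smallest member limit, but treat any member that has a merge
 * callback as if it accepted only one page, since a one-page request is
 * guaranteed to be handled. */
static unsigned int raid1_stacked_max_sectors(const struct member_limits *m,
                                              size_t n)
{
    assert(n > 0);
    const unsigned int one_page = PAGE_SIZE_BYTES >> SECTOR_SHIFT; /* 8 */
    unsigned int limit = ~0u;
    for (size_t i = 0; i < n; i++) {
        unsigned int member = m[i].max_sectors;
        if (m[i].has_merge_fn && member > one_page)
            member = one_page;          /* the one-page fallback */
        if (member < limit)
            limit = member;
    }
    return limit;
}
```

With a plain disk plus a linear or raid0 array as members, the result is 8 sectors, and 8 x 512 bytes = 4096 bytes, which is exactly the "8" in the "bio too big device md0 (144 > 8)" messages from the original report. The member values used when trying this out are illustrative, not taken from the report.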

> 
> Is it because:
>   - of rebuilding the array?
>   - of non-multiplicative max block size
>   - of non-multiplicative total device size
>   - of nesting?
>   - of some other fallback to 1 page?

The last I guess.

> 
> I ask because I can not believe that a pre-assembled nested stack would 
> result in 4kb max limit. But I haven't tried yet (e.g. from a live cd).

When people say "I can not believe" I always chuckle to myself.  You just
aren't trying hard enough.  There is adequate evidence that people can
believe whatever they want to believe :-)

> 
> The block device should not do this kind of "magic", unless the higher 
> layers support it. Which one has proper support then?
>   - standard partition table?
>   - LVM?
>   - filesystem drivers?
> 

I don't understand this question, sorry.

Yes, there is definitely something broken here.  Unfortunately fixing it is
non-trivial.
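
The underlying stacking problem quoted earlier (a lower device has no way to report that its restrictions have changed) can be shown with a toy model. Everything here is hypothetical: struct blockdev, struct stacked_dev and the helper functions stand in for md0, DM and the filesystem. The upper layer snapshots the lower limit once at setup, so a later reduction in md0 never reaches the filesystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a block device's request-size limit (sectors). */
struct blockdev {
    const char *name;
    unsigned int max_sectors;   /* limit it enforces right now */
};

/* Hypothetical upper layer (the DM role): copies the lower limit once. */
struct stacked_dev {
    struct blockdev *lower;
    unsigned int cached_max_sectors;
};

static void stack_setup(struct stacked_dev *top, struct blockdev *lower)
{
    assert(top != NULL && lower != NULL);
    top->lower = lower;
    top->cached_max_sectors = lower->max_sectors; /* snapshot, no notification */
}

/* The filesystem only asks the top device, so it sizes bios from the cache. */
static unsigned int fs_pick_bio_sectors(const struct stacked_dev *top,
                                        unsigned int wanted)
{
    return wanted < top->cached_max_sectors ? wanted : top->cached_max_sectors;
}

/* The lower device checks each bio against its current limit. */
static bool lower_accepts(const struct blockdev *dev, unsigned int nr_sectors)
{
    return nr_sectors <= dev->max_sectors;
}
```

Setting up the stack while md0 allows 1024 sectors, then dropping md0 to 8 sectors (as when a linear leg is added), leaves the upper layer still handing out 144-sector bios that md0 rejects. Fixing it for real would need some notification path from md0 up through DM, which is the non-trivial part.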

NeilBrown

Thread overview: 9+ messages
2010-01-24 18:49 bio too big - in nested raid setup "Ing. Daniel Rozsnyó"
2010-01-25 15:25 ` Marti Raudsepp
2010-01-25 18:27   ` Milan Broz
2010-01-28  2:28     ` Neil Brown
2010-01-28  9:24       ` "Ing. Daniel Rozsnyó"
2010-01-28 10:50         ` Neil Brown [this message]
2010-01-28 12:07           ` Boaz Harrosh
2010-01-28 22:14             ` Neil Brown
2010-01-31 15:42               ` Boaz Harrosh
