linux-raid.vger.kernel.org archive mirror
* Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
       [not found] <20110427161901.27049.31001.reportbug@servo.factory.finestructure.net>
@ 2011-04-29  4:39 ` Ben Hutchings
  2011-05-01 22:06   ` Jameson Graef Rollins
  0 siblings, 1 reply; 17+ messages in thread
From: Ben Hutchings @ 2011-04-29  4:39 UTC (permalink / raw)
  To: Jameson Graef Rollins; +Cc: 624343, NeilBrown, linux-raid


On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
> Package: linux-2.6
> Version: 2.6.38-3
> Severity: normal
> 
> As you can see from the kern.log snippet below, I am seeing frequent
> messages reporting "bio too big device md0 (248 > 240)".
> 
> I run what I imagine is a fairly unusual disk setup on my laptop,
> consisting of:
> 
>   ssd -> raid1 -> dm-crypt -> lvm -> ext4
> 
> I use the raid1 as a backup.  The raid1 operates normally in degraded
> mode.  For backups I then hot-add a usb hdd, let the raid1 sync, and
> then fail/remove the external hdd. 

Well, this is not expected to work.  Possibly the hot-addition of a disk
with different bio restrictions should be rejected.  But I'm not sure,
because it is safe to do that if there is no mounted filesystem or
stacking device on top of the RAID.

I would recommend using filesystem-level backup (e.g. dirvish or
backuppc).  Aside from this bug, if the SSD fails during a RAID resync
you will be left with an inconsistent and therefore useless 'backup'.

> I started noticing these messages after my last sync.  I have not
> rebooted since.
> 
> I found a bug report on the launchpad that describes an almost
> identical situation:
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/320638
> 
> The reporter seemed to be concerned that there may be data loss
> happening.  I have not yet noticed any, but of course I'm terrified
> that it's happening and I just haven't found it yet.  Unfortunately
> the bug was closed with a "Won't Fix" without any resolution.
> 
> Is this a kernel bug, or is there something I can do to remedy the
> situation?  I haven't tried to reboot yet to see if the messages stop.
> I'm obviously most worried about data loss.  Please advise!

The block layer correctly returns an error after logging this message.
If it's due to a read operation, the error should be propagated up to
the application that tried to read.  If it's due to a write operation, I
would expect the error to result in the RAID becoming desynchronised.
In some cases it might be propagated to the application that tried to
write.

If the error is somehow discarded then there *is* a kernel bug with the
risk of data loss.
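
For reference, the message in the subject line comes from a sanity check near
the top of the block layer's generic_make_request().  Roughly (a from-memory
sketch of the 2.6.3x logic, not a verbatim copy of block/blk-core.c):

    if (unlikely(bio_sectors(bio) > queue_max_hw_sectors(q))) {
            printk(KERN_ERR "bio too big device %s (%u > %u)\n",
                   bdevname(bio->bi_bdev, b),
                   bio_sectors(bio),
                   queue_max_hw_sectors(q));
            goto end_io;    /* the bio is completed with -EIO */
    }

So "248 > 240" means a 248-sector bio was submitted to a queue that now only
advertises 240 sectors, and the bio is failed rather than split.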

> I am starting to suspect that these messages are in fact associated with
> data loss on my system.  I have witnessed these messages occur during
> write operations to the disk, and I have also started to see some
> strange behavior on my system.  dhclient started acting weird after
> these messages appeared (not holding on to leases) and I started to
> notice database exceptions in my mail client.
>
> Interestingly, the messages seem to have gone away after reboot.  I will
> watch closely to see if they return after my next raid1 sync.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.


* Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-04-29  4:39 ` Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log Ben Hutchings
@ 2011-05-01 22:06   ` Jameson Graef Rollins
  2011-05-02  0:00     ` Ben Hutchings
  2011-05-02  9:11     ` David Brown
  0 siblings, 2 replies; 17+ messages in thread
From: Jameson Graef Rollins @ 2011-05-01 22:06 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: 624343, NeilBrown, linux-raid


On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings <ben@decadent.org.uk> wrote:
> On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
> > I run what I imagine is a fairly unusual disk setup on my laptop,
> > consisting of:
> > 
> >   ssd -> raid1 -> dm-crypt -> lvm -> ext4
> > 
> > I use the raid1 as a backup.  The raid1 operates normally in degraded
> > mode.  For backups I then hot-add a usb hdd, let the raid1 sync, and
> > then fail/remove the external hdd. 
> 
> Well, this is not expected to work.  Possibly the hot-addition of a disk
> with different bio restrictions should be rejected.  But I'm not sure,
> because it is safe to do that if there is no mounted filesystem or
> stacking device on top of the RAID.

Hi, Ben.  Can you explain why this is not expected to work?  Which part
exactly is not expected to work and why?

> I would recommend using filesystem-level backup (e.g. dirvish or
> backuppc).  Aside from this bug, if the SSD fails during a RAID resync
> you will be left with an inconsistent and therefore useless 'backup'.

I appreciate your recommendation, but it doesn't really have anything to
do with this bug report.  Unless I am doing something that is
*expressly* not supposed to work, then it should work, and if it doesn't
then it's either a bug or a documentation failure (ie. if this setup is
not supposed to work then it should be clearly documented somewhere what
exactly the problem is).

> The block layer correctly returns an error after logging this message.
> If it's due to a read operation, the error should be propagated up to
> the application that tried to read.  If it's due to a write operation, I
> would expect the error to result in the RAID becoming desynchronised.
> In some cases it might be propagated to the application that tried to
> write.

Can you say what is "correct" about the returned error?  That's what I'm
still not understanding.  Why is there an error and what is it coming
from?

jamie.


* Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-05-01 22:06   ` Jameson Graef Rollins
@ 2011-05-02  0:00     ` Ben Hutchings
  2011-05-02  0:22       ` NeilBrown
  2011-05-02  0:42       ` Daniel Kahn Gillmor
  2011-05-02  9:11     ` David Brown
  1 sibling, 2 replies; 17+ messages in thread
From: Ben Hutchings @ 2011-05-02  0:00 UTC (permalink / raw)
  To: Jameson Graef Rollins, 624343; +Cc: NeilBrown, linux-raid


On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
> On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings <ben@decadent.org.uk> wrote:
> > On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
> > > I run what I imagine is a fairly unusual disk setup on my laptop,
> > > consisting of:
> > > 
> > >   ssd -> raid1 -> dm-crypt -> lvm -> ext4
> > > 
> > > I use the raid1 as a backup.  The raid1 operates normally in degraded
> > > mode.  For backups I then hot-add a usb hdd, let the raid1 sync, and
> > > then fail/remove the external hdd. 
> > 
> > Well, this is not expected to work.  Possibly the hot-addition of a disk
> > with different bio restrictions should be rejected.  But I'm not sure,
> > because it is safe to do that if there is no mounted filesystem or
> > stacking device on top of the RAID.
> 
> Hi, Ben.  Can you explain why this is not expected to work?  Which part
> exactly is not expected to work and why?

Adding another type of disk controller (USB storage versus whatever the
SSD interface is) to a RAID that is already in use.

> > I would recommend using filesystem-level backup (e.g. dirvish or
> > backuppc).  Aside from this bug, if the SSD fails during a RAID resync
> > you will be left with an inconsistent and therefore useless 'backup'.
> 
> I appreciate your recommendation, but it doesn't really have anything to
> do with this bug report.  Unless I am doing something that is
> *expressly* not supposed to work, then it should work, and if it doesn't
> then it's either a bug or a documentation failure (ie. if this setup is
> not supposed to work then it should be clearly documented somewhere what
> exactly the problem is).

The normal state of a RAID set is that all disks are online.  You have
deliberately turned this on its head; the normal state of your RAID set
is that one disk is missing.  This is such a basic principle that most
documentation won't mention it.

> > The block layer correctly returns an error after logging this message.
> > If it's due to a read operation, the error should be propagated up to
> > the application that tried to read.  If it's due to a write operation, I
> > would expect the error to result in the RAID becoming desynchronised.
> > In some cases it might be propagated to the application that tried to
> > write.
> 
> Can you say what is "correct" about the returned error?  That's what I'm
> still not understanding.  Why is there an error and what is it coming
> from?

The error is that you changed the I/O capabilities of the RAID while it
was already in use.  But what I was describing as 'correct' was that an
error code was returned, rather than the error condition only being
logged.  If the error condition is not properly propagated then it could
lead to data loss.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.


* Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-05-02  0:00     ` Ben Hutchings
@ 2011-05-02  0:22       ` NeilBrown
  2011-05-02  2:47         ` Guy Watkins
                           ` (2 more replies)
  2011-05-02  0:42       ` Daniel Kahn Gillmor
  1 sibling, 3 replies; 17+ messages in thread
From: NeilBrown @ 2011-05-02  0:22 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Jameson Graef Rollins, 624343, linux-raid

On Mon, 02 May 2011 01:00:57 +0100 Ben Hutchings <ben@decadent.org.uk> wrote:

> On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
> > On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings <ben@decadent.org.uk> wrote:
> > > On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
> > > > I run what I imagine is a fairly unusual disk setup on my laptop,
> > > > consisting of:
> > > > 
> > > >   ssd -> raid1 -> dm-crypt -> lvm -> ext4
> > > > 
> > > > I use the raid1 as a backup.  The raid1 operates normally in degraded
> > > > mode.  For backups I then hot-add a usb hdd, let the raid1 sync, and
> > > > then fail/remove the external hdd. 
> > > 
> > > Well, this is not expected to work.  Possibly the hot-addition of a disk
> > > with different bio restrictions should be rejected.  But I'm not sure,
> > > because it is safe to do that if there is no mounted filesystem or
> > > stacking device on top of the RAID.
> > 
> > Hi, Ben.  Can you explain why this is not expected to work?  Which part
> > exactly is not expected to work and why?
> 
> Adding another type of disk controller (USB storage versus whatever the
> SSD interface is) to a RAID that is already in use.

Normally this practice is perfectly OK.
If a filesystem is mounted directly from an md array, then adding devices
to the array at any time is fine, even if the new devices have quite
different characteristics than the old.

However if there is another layer in between md and the filesystem - such as
dm - then there can be a problem.
There is no mechanism in the kernel for md to tell dm that things have
changed, so dm never changes its configuration to match any change in the
config of the md device.

A filesystem always queries the config of the device as it prepares the
request.  As this is not an 'active' query (i.e. it just looks at
variables, it doesn't call a function) there is no opportunity for dm to then
query md.

There is a ->merge_bvec_fn which could be pushed into service.  i.e. if
md/raid1 defined some trivial merge_bvec_fn, then it would probably work.
However the actual effect of this would probably be to cause every bio created
by the filesystem to be just one PAGE in size, and this is guaranteed always
to work.  So it could be a significant performance hit for the common case.
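
As a minimal sketch (assuming the 2.6.3x merge_bvec_fn prototype; this is
illustrative, not actual md/raid1.c code), such a trivial function might
look like:

    /* Hypothetical trivial merge_bvec_fn for md/raid1: accept the first
     * page of any bio and refuse to merge anything further, so no bio
     * ever grows beyond one PAGE. */
    static int raid1_trivial_merge_bvec(struct request_queue *q,
                                        struct bvec_merge_data *bvm,
                                        struct bio_vec *biovec)
    {
            if (!bvm->bi_size)
                    return biovec->bv_len;  /* first segment: accept it */
            return 0;                       /* otherwise: no more room */
    }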

We really need either:
 - The fs sends down arbitrarily large requests, and the lower layers split
   them up if/when needed
or
 - A mechanism for a block device to tell the layer above that something has
   changed.

But these are both fairly intrusive, with unclear performance/complexity
implications, and no one has bothered.

NeilBrown



* Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-05-02  0:00     ` Ben Hutchings
  2011-05-02  0:22       ` NeilBrown
@ 2011-05-02  0:42       ` Daniel Kahn Gillmor
  2011-05-02  1:04         ` Ben Hutchings
  1 sibling, 1 reply; 17+ messages in thread
From: Daniel Kahn Gillmor @ 2011-05-02  0:42 UTC (permalink / raw)
  To: Ben Hutchings, 624343; +Cc: Jameson Graef Rollins, NeilBrown, linux-raid


On 05/01/2011 08:00 PM, Ben Hutchings wrote:
> On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
>> Hi, Ben.  Can you explain why this is not expected to work?  Which part
>> exactly is not expected to work and why?
> 
> Adding another type of disk controller (USB storage versus whatever the
> SSD interface is) to a RAID that is already in use.
> 
 [...]
> The normal state of a RAID set is that all disks are online.  You have
> deliberately turned this on its head; the normal state of your RAID set
> is that one disk is missing.  This is such a basic principle that most
> documentation won't mention it.

This is somewhat worrisome to me.  Consider a fileserver with
non-hotswap disks.  One disk fails in the morning, but the machine is in
production use, and the admin's goals are:

 * minimize downtime,
 * reboot only during off-hours, and
 * minimize the amount of time that the array spends de-synced.

A responsible admin might reasonably expect to attach a disk via a
well-tested USB or ieee1394 adapter, bring the array back into sync, and
announce to the rest of the organization that there will be a scheduled
reboot later in the evening.

Then, at the scheduled reboot, move the disk from the USB/ieee1394
adapter to the direct ATA interface on the machine.

If this sequence of operations is likely (or even possible) to cause
data loss, it should be spelled out in BIG RED LETTERS someplace.  I
don't think any of the above steps seem unreasonable, and the set of
goals the admin is attempting to meet are certainly commonplace goals.

> The error is that you changed the I/O capabilities of the RAID while it
> was already in use.  But what I was describing as 'correct' was that an
> error code was returned, rather than the error condition only being
> logged.  If the error condition is not properly propagated then it could
> lead to data loss.

How is an admin to know which I/O capabilities to check before adding a
device to a RAID array?  When is it acceptable to mix I/O capabilities?
 Can a RAID array which is not currently being used as a backing store
for a filesystem be assembled of unlike disks?  What if it is then
(later) used as a backing store for a filesystem?

One of the advantages people tout for in-kernel software raid (over many
H/W RAID implementations) is the ability to mix disks, so that you're
not reliant on a single vendor during a failure.  If this advantage
doesn't extend across certain classes of disk, it would be good to be
unambiguous about what can be mixed and what cannot.

Regards,

	--dkg



* Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-05-02  0:42       ` Daniel Kahn Gillmor
@ 2011-05-02  1:04         ` Ben Hutchings
  2011-05-02  1:17           ` Jameson Graef Rollins
  0 siblings, 1 reply; 17+ messages in thread
From: Ben Hutchings @ 2011-05-02  1:04 UTC (permalink / raw)
  To: Daniel Kahn Gillmor; +Cc: 624343, Jameson Graef Rollins, NeilBrown, linux-raid


On Sun, 2011-05-01 at 20:42 -0400, Daniel Kahn Gillmor wrote:
> On 05/01/2011 08:00 PM, Ben Hutchings wrote:
> > On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
> >> Hi, Ben.  Can you explain why this is not expected to work?  Which part
> >> exactly is not expected to work and why?
> > 
> > Adding another type of disk controller (USB storage versus whatever the
> > SSD interface is) to a RAID that is already in use.
> > 
>  [...]
> > The normal state of a RAID set is that all disks are online.  You have
> > deliberately turned this on its head; the normal state of your RAID set
> > is that one disk is missing.  This is such a basic principle that most
> > documentation won't mention it.
> 
> This is somewhat worrisome to me.  Consider a fileserver with
> non-hotswap disks.  One disk fails in the morning, but the machine is in
> production use, and the admin's goals are:
> 
>  * minimize downtime,
>  * reboot only during off-hours, and
>  * minimize the amount of time that the array is spent de-synced.
> 
> A responsible admin might reasonably expect to attach a disk via a
> well-tested USB or ieee1394 adapter, bring the array back into sync,
> announce to the rest of the organization that there will be a scheduled
> reboot later in the evening.
> 
> Then, at the scheduled reboot, move the disk from the USB/ieee1394
> adapter to the direct ATA interface on the machine.
> 
> If this sequence of operations is likely (or even possible) to cause
> data loss, it should be spelled out in BIG RED LETTERS someplace.

So far as I'm aware, the RAID may stop working, but without loss of data
that's already on disk.

> I don't think any of the above steps seem unreasonable, and the set of
> goals the admin is attempting to meet are certainly commonplace goals.
> 
> > The error is that you changed the I/O capabilities of the RAID while it
> > was already in use.  But what I was describing as 'correct' was that an
> > error code was returned, rather than the error condition only being
> > logged.  If the error condition is not properly propagated then it could
> > lead to data loss.
> 
> How is an admin to know which I/O capabilities to check before adding a
> device to a RAID array?  When is it acceptable to mix I/O capabilities?
>  Can a RAID array which is not currently being used as a backing store
> for a filesystem be assembled of unlike disks?  What if it is then
> (later) used as a backing store for a filesystem?
[...]

I think the answers are:
- Not easily
- When the RAID does not have another device on top
- Yes
- Yes
but Neil can correct me on this.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.


* Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-05-02  1:04         ` Ben Hutchings
@ 2011-05-02  1:17           ` Jameson Graef Rollins
  2011-05-02  9:05             ` David Brown
  0 siblings, 1 reply; 17+ messages in thread
From: Jameson Graef Rollins @ 2011-05-02  1:17 UTC (permalink / raw)
  To: Ben Hutchings, Daniel Kahn Gillmor; +Cc: 624343, NeilBrown, linux-raid


On Mon, 02 May 2011 02:04:18 +0100, Ben Hutchings <ben@decadent.org.uk> wrote:
> On Sun, 2011-05-01 at 20:42 -0400, Daniel Kahn Gillmor wrote:
> So far as I'm aware, the RAID may stop working, but without loss of data
> that's already on disk.

What exactly does "RAID may stop working" mean?  Do you mean that this
bug will be triggered?  The raid will refuse to do further syncs?  Or do
you mean something else?

> > How is an admin to know which I/O capabilities to check before adding a
> > device to a RAID array?  When is it acceptable to mix I/O capabilities?
> >  Can a RAID array which is not currently being used as a backing store
> > for a filesystem be assembled of unlike disks?  What if it is then
> > (later) used as a backing store for a filesystem?
> [...]
> 
> I think the answers are:
> - Not easily
> - When the RAID does not have another device on top

This is very upsetting to me, if it's true.  It completely undermines
all of my assumptions about how software raid works.

Are you really saying that md with mixed disks is not possible/supported
when the md device has *any* other device on top of it?  This is in
fact a *very* common setup.  *ALL* of my raid devices have other devices
on top of them (lvm at least).  In fact, the debian installer supports
putting dm and/or lvm on top of md on mixed disks.  If what you're
saying is true then the debian installer is in big trouble.

jamie.


* RE: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-05-02  0:22       ` NeilBrown
@ 2011-05-02  2:47         ` Guy Watkins
  2011-05-02  5:07         ` Daniel Kahn Gillmor
  2011-05-02  9:08         ` David Brown
  2 siblings, 0 replies; 17+ messages in thread
From: Guy Watkins @ 2011-05-02  2:47 UTC (permalink / raw)
  To: 'NeilBrown', 'Ben Hutchings'
  Cc: 'Jameson Graef Rollins', 624343, linux-raid

} -----Original Message-----
} From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
} owner@vger.kernel.org] On Behalf Of NeilBrown
} Sent: Sunday, May 01, 2011 8:22 PM
} To: Ben Hutchings
} Cc: Jameson Graef Rollins; 624343@bugs.debian.org; linux-
} raid@vger.kernel.org
} Subject: Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio
} too big device md0 (248 > 240)" in kern.log
} 
} On Mon, 02 May 2011 01:00:57 +0100 Ben Hutchings <ben@decadent.org.uk>
} wrote:
} 
} > On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
} > > On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings
} <ben@decadent.org.uk> wrote:
} > > > On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
} > > > > I run what I imagine is a fairly unusual disk setup on my laptop,
} > > > > consisting of:
} > > > >
} > > > >   ssd -> raid1 -> dm-crypt -> lvm -> ext4
} > > > >
} > > > > I use the raid1 as a backup.  The raid1 operates normally in
} degraded
} > > > > mode.  For backups I then hot-add a usb hdd, let the raid1 sync,
} and
} > > > > then fail/remove the external hdd.
} > > >
} > > > Well, this is not expected to work.  Possibly the hot-addition of a
} disk
} > > > with different bio restrictions should be rejected.  But I'm not
} sure,
} > > > because it is safe to do that if there is no mounted filesystem or
} > > > stacking device on top of the RAID.
} > >
} > > Hi, Ben.  Can you explain why this is not expected to work?  Which
} part
} > > exactly is not expected to work and why?
} >
} > Adding another type of disk controller (USB storage versus whatever the
} > SSD interface is) to a RAID that is already in use.
} 
} Normally this practice is perfectly OK.
} If a filesystem is mounted directly from an md array, then adding devices
} to the array at any time is fine, even if the new devices have quite
} different characteristics than the old.
} 
} However if there is another layer in between md and the filesystem - such
} as
} dm - then there can be a problem.
} There is no mechanism in the kernel for md to tell dm that things have
} changed, so dm never changes its configuration to match any change in the
} config of the md device.
} 
} A filesystem always queries the config of the device as it prepares the
} request.  As this is not an 'active' query (i.e. it just looks at
} variables, it doesn't call a function) there is no opportunity for dm to
} then
} query md.
} 
} There is a ->merge_bvec_fn which could be pushed into service.  i.e. if
} md/raid1 defined some trivial merge_bvec_fn, then it would probably work.
} However the actual effect of this would probably be to cause every bio
} created
} by the filesystem to be just one PAGE in size, and this is guaranteed
} always
} to work.  So it could be a significant performance hit for the common
} case.
} 
} We really need either:
}  - The fs sends down arbitrarily large requests, and the lower layers
} split
}    them up if/when needed
} or
}  - A mechanism for a block device to tell the layer above that something
} has
}    changed.
} 
} But these are both fairly intrusive, with unclear performance/complexity
} implications and no one has bothered.
} 
} NeilBrown

Maybe mdadm should not allow a disk to be added if its characteristics are
different enough to be an issue?  And require the --force option if the
admin really wants to do it anyhow.

Oh, and a good error message explaining the issues and risks.  :)
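
Purely as a hypothetical illustration of that proposal (mdadm has no such
check today; the helper names below are made up, though max_sectors_kb is a
real sysfs attribute), a userspace check could compare the candidate
member's queue limit against the array's:

    /* Hypothetical sketch of the proposed mdadm-side check. */
    #include <stdio.h>

    static int read_max_sectors_kb(const char *dev)  /* e.g. "md0", "sdb" */
    {
            char path[128];
            int val = -1;
            FILE *f;

            snprintf(path, sizeof(path),
                     "/sys/block/%s/queue/max_sectors_kb", dev);
            f = fopen(path, "r");
            if (!f)
                    return -1;
            if (fscanf(f, "%d", &val) != 1)
                    val = -1;
            fclose(f);
            return val;
    }

    /* 0 = add looks safe; 1 = it would shrink the array's request-size
     * limit and (under the proposal) should require --force. */
    static int check_member_limit(const char *array, const char *member)
    {
            int a = read_max_sectors_kb(array);
            int m = read_max_sectors_kb(member);

            if (a < 0 || m < 0)
                    return 0;       /* cannot tell; do not block the add */
            return (m < a) ? 1 : 0;
    }

    int main(int argc, char **argv)
    {
            if (argc != 3) {
                    fprintf(stderr, "usage: %s <array> <member>\n", argv[0]);
                    return 2;
            }
            if (check_member_limit(argv[1], argv[2])) {
                    fprintf(stderr, "%s has a smaller max_sectors_kb than "
                            "%s; adding it would shrink the array's limit\n",
                            argv[2], argv[1]);
                    return 1;
            }
            return 0;
    }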

Guy



* Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-05-02  0:22       ` NeilBrown
  2011-05-02  2:47         ` Guy Watkins
@ 2011-05-02  5:07         ` Daniel Kahn Gillmor
  2011-05-02  9:08         ` David Brown
  2 siblings, 0 replies; 17+ messages in thread
From: Daniel Kahn Gillmor @ 2011-05-02  5:07 UTC (permalink / raw)
  To: NeilBrown, 624343; +Cc: Ben Hutchings, Jameson Graef Rollins, linux-raid


On 05/01/2011 08:22 PM, NeilBrown wrote:
> However if there is another layer in between md and the filesystem - such as
> dm - then there can be problem.
> There is no mechanism in the kernl for md to tell dm that things have
> changed, so dm never changes its configuration to match any change in the
> config of the md device.
> 
> A filesystem always queries the config of the device as it prepares the
> request.  As this is not an 'active' query (i.e. it just looks at
> variables, it doesn't call a function) there is no opportunity for dm to then
> query md.

Thanks for this followup, Neil.

Just to clarify, it sounds like any one of the following situations on
its own is *not* problematic from the kernel's perspective:

 0) having a RAID array that is more often in a de-synced state than in
an online state.

 1) mixing various types of disk in a single RAID array (e.g. SSD and
spinning metal)

 2) mixing various disk access channels within a single RAID array (e.g.
USB and SATA)

 3) putting other block device layers (e.g. loopback, dm-crypt, dm (via
lvm or otherwise)) above md and below a filesystem

 4) hot-adding a device to an active RAID array from which filesystems
are mounted.


However, having any layers between md and the filesystem becomes
problematic if the array is re-synced while the filesystem is online,
because the intermediate layer can't communicate $SOMETHING (what
specifically?) from md to the kernel's filesystem code.

As a workaround, would the following sequence of actions (perhaps
impossible for any given machine's operational state) allow a RAID
re-sync without the errors jrollins reports or requiring a reboot?

 a) unmount all filesystems which ultimately derive from the RAID array
 b) hot-add the device with mdadm
 c) re-mount the filesystems

or would something else need to be done with lvm (or cryptsetup, or the
loopback device) between steps b and c?


Coming at it from another angle: is there a way that an admin can ensure
that the RAID array can be re-synced without unmounting the filesystems
other than limiting themselves to exactly the same models of hardware
for all components in the storage chain?

Alternately, is there a way to manually inform a given mounted
filesystem that it should change $SOMETHING (what?), so that an aware
admin could keep filesystems online by issuing this instruction before a
raid re-sync?


From a modular-kernel perspective: Is this specifically a problem with
md itself, or would it also be the case with other block-device layering
in the kernel?  For example, suppose an admin has (without md) lvm over
a bare disk, and a filesystem mounted from an LV.  The admin then adds a
second bare disk as a PV to the VG, and uses pvmove to transfer the
physical extents of the active filesystem to the new disk, while
mounted.  Assuming that the new disk doesn't have the same
characteristics (which characteristics?), does the fact that LVM sits
between the underlying disk and the filesystem cause the same problem?
What if dm-crypt sits between the disk and lvm?  Between lvm and the
filesystem?

What if the layering is disk-dm-md-fs instead of disk-md-dm-fs ?




Sorry for all the questions without having much concrete to contribute
at the moment.  If these limitations are actually well-documented
somewhere, I would be grateful for a pointer.  As a systems
administrator, I would be unhappy to be caught out by some
as-yet-unknown constraints during a hardware failure.  I'd like to at
least know my constraints beforehand.


Regards,

	--dkg



* Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-05-02  1:17           ` Jameson Graef Rollins
@ 2011-05-02  9:05             ` David Brown
  0 siblings, 0 replies; 17+ messages in thread
From: David Brown @ 2011-05-02  9:05 UTC (permalink / raw)
  To: linux-raid

On 02/05/2011 03:17, Jameson Graef Rollins wrote:
> On Mon, 02 May 2011 02:04:18 +0100, Ben Hutchings<ben@decadent.org.uk>  wrote:
>> On Sun, 2011-05-01 at 20:42 -0400, Daniel Kahn Gillmor wrote:
>> So far as I'm aware, the RAID may stop working, but without loss of data
>> that's already on disk.
>
> What exactly does "RAID may stop working mean"?  Do you mean that this
> bug will be triggered?  The raid will refuse to do further syncs?  Or do
> you mean something else?
>
>>> How is an admin to know which I/O capabilities to check before adding a
>>> device to a RAID array?  When is it acceptable to mix I/O capabilities?
>>>   Can a RAID array which is not currently being used as a backing store
>>> for a filesystem be assembled of unlike disks?  What if it is then
>>> (later) used as a backing store for a filesystem?
>> [...]
>>
>> I think the answers are:
>> - Not easily
>> - When the RAID does not have another device on top
>
> This is very upsetting to me, if it's true.  It completely undermines
> all of my assumptions about how software raid works.
>
> Are you really saying that md with mixed disks is not possible/supported
> when the md device has *any* other device on top of it?  This is a in
> fact a *very* common setup.  *ALL* of my raid devices have other devices
> on top of them (lvm at least).  In fact, the debian installer supports
> putting dm and/or lvm on top of md on mixed disks.  If what you're
> saying is true then the debian installer is in big trouble.
>
> jamie.

I can't imagine that this is the case - layered setups are perfectly 
standard.  While the dm-layer might be less used, it is normal practice 
to have lvm on top of md raids, and it is not uncommon to have more than 
one md layer (such as raid50 setups).  It is also perfectly reasonable 
to throw USB media into the mix (though depending on the
kernel/distro, there may be boot issues if the USB disk does not come up
fast enough during booting - I had such problems with a USB disk in an
LVM setup without md raid).

As far as I understand it, there are two sorts of communication between 
the layers of the block devices.  There is the block device access 
itself - the ability to read and write blocks of data.  And there is the 
metadata, covering things like sizes, stripe information, etc.  Only the 
block access is actually needed to get everything working - the other 
information is used for things like resizing, filesystem layout 
optimisation, etc.

The whole point of the layered block system is that the block access 
layers are independent and isolated.  So if you have a dm layer on top 
of an md layer, then the dm layer should not care how the md layer is 
implemented - it just sees a /dev/mdX device.  It doesn't matter if it's 
a degraded raid1 or anything else.  As long as the /dev/mdX device stays 
up, it should not matter that you add or remove devices, or what type of 
underlying device is used.

Similarly, the md raid1 layer is mainly interested in the block access - 
it will work with any block devices.  It will use the metadata to 
improve things like resizes, and perhaps to optimise accesses, but it 
should work /correctly/ (though perhaps slower than optimal) regardless 
of the mix of disks.


I have used layered setups and odd block devices (such as loopback 
devices on files on a tmpfs mount and multiple md layers) - getting 
resizing to work properly involved a little more effort, but it all 
worked perfectly.  I haven't tried such a mix as the OP has been describing.


If my understanding of the block layers is wrong, then I too would like 
to know - running lvm on top of md raid is essential capability, as is 
using USB disks as temporary additions to an array.




* Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-05-02  0:22       ` NeilBrown
  2011-05-02  2:47         ` Guy Watkins
  2011-05-02  5:07         ` Daniel Kahn Gillmor
@ 2011-05-02  9:08         ` David Brown
  2011-05-02 10:00           ` NeilBrown
  2 siblings, 1 reply; 17+ messages in thread
From: David Brown @ 2011-05-02  9:08 UTC (permalink / raw)
  To: linux-raid

On 02/05/2011 02:22, NeilBrown wrote:
> On Mon, 02 May 2011 01:00:57 +0100 Ben Hutchings<ben@decadent.org.uk>  wrote:
>
>> On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
>>> On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings<ben@decadent.org.uk>  wrote:
>>>> On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
>>>>> I run what I imagine is a fairly unusual disk setup on my laptop,
>>>>> consisting of:
>>>>>
>>>>>    ssd ->  raid1 ->  dm-crypt ->  lvm ->  ext4
>>>>>
>>>>> I use the raid1 as a backup.  The raid1 operates normally in degraded
>>>>> mode.  For backups I then hot-add a usb hdd, let the raid1 sync, and
>>>>> then fail/remove the external hdd.
>>>>
>>>> Well, this is not expected to work.  Possibly the hot-addition of a disk
>>>> with different bio restrictions should be rejected.  But I'm not sure,
>>>> because it is safe to do that if there is no mounted filesystem or
>>>> stacking device on top of the RAID.
>>>
>>> Hi, Ben.  Can you explain why this is not expected to work?  Which part
>>> exactly is not expected to work and why?
>>
>> Adding another type of disk controller (USB storage versus whatever the
>> SSD interface is) to a RAID that is already in use.
>
> Normally this practice is perfectly OK.
> If a filesystem is mounted directly from an md array, then adding devices
> to the array at any time is fine, even if the new devices have quite
> different characteristics than the old.
>
> However if there is another layer in between md and the filesystem - such as
> dm - then there can be a problem.
> There is no mechanism in the kernel for md to tell dm that things have
> changed, so dm never changes its configuration to match any change in the
> config of the md device.
>

While I can see that there might be limitations in informing the dm 
layer about changes to the md layer, I fail to see what changes we are 
talking about.  If the OP were changing the size of the raid1, for 
example, then that would be a metadata change that needed to propagate 
up so that lvm could grow its physical volume.  But the dm layer should 
not care if a disk is added or removed from the md raid1 set - as long 
as the /dev/mdX device stays online and valid, it should work correctly.




* Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-05-01 22:06   ` Jameson Graef Rollins
  2011-05-02  0:00     ` Ben Hutchings
@ 2011-05-02  9:11     ` David Brown
  2011-05-02 16:38       ` Jameson Graef Rollins
  1 sibling, 1 reply; 17+ messages in thread
From: David Brown @ 2011-05-02  9:11 UTC (permalink / raw)
  Cc: Ben Hutchings, 624343, NeilBrown, linux-raid

On 02/05/2011 00:06, Jameson Graef Rollins wrote:
> On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings<ben@decadent.org.uk>  wrote:
>> On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
>>> I run what I imagine is a fairly unusual disk setup on my laptop,
>>> consisting of:
>>>
>>>    ssd ->  raid1 ->  dm-crypt ->  lvm ->  ext4
>>>
>>> I use the raid1 as a backup.  The raid1 operates normally in degraded
>>> mode.  For backups I then hot-add a usb hdd, let the raid1 sync, and
>>> then fail/remove the external hdd.

This is not directly related to your issues here, but it is possible to 
make a 1-disk raid1 set so that you are not normally degraded.  When you 
want to do the backup, you can grow the raid1 set with the usb disk, 
wait for the resync, then fail it and remove it, then "grow" the raid1 
back to 1 disk.  That way you don't feel you are always living in a 
degraded state.




* Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-05-02  9:08         ` David Brown
@ 2011-05-02 10:00           ` NeilBrown
  2011-05-02 10:32             ` David Brown
  2011-05-02 14:56             ` David Brown
  0 siblings, 2 replies; 17+ messages in thread
From: NeilBrown @ 2011-05-02 10:00 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid

On Mon, 02 May 2011 11:08:11 +0200 David Brown <david@westcontrol.com> wrote:

> On 02/05/2011 02:22, NeilBrown wrote:
> > On Mon, 02 May 2011 01:00:57 +0100 Ben Hutchings<ben@decadent.org.uk>  wrote:
> >
> >> On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
> >>> On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings<ben@decadent.org.uk>  wrote:
> >>>> On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
> >>>>> I run what I imagine is a fairly unusual disk setup on my laptop,
> >>>>> consisting of:
> >>>>>
> >>>>>    ssd ->  raid1 ->  dm-crypt ->  lvm ->  ext4
> >>>>>
> >>>>> I use the raid1 as a backup.  The raid1 operates normally in degraded
> >>>>> mode.  For backups I then hot-add a usb hdd, let the raid1 sync, and
> >>>>> then fail/remove the external hdd.
> >>>>
> >>>> Well, this is not expected to work.  Possibly the hot-addition of a disk
> >>>> with different bio restrictions should be rejected.  But I'm not sure,
> >>>> because it is safe to do that if there is no mounted filesystem or
> >>>> stacking device on top of the RAID.
> >>>
> >>> Hi, Ben.  Can you explain why this is not expected to work?  Which part
> >>> exactly is not expected to work and why?
> >>
> >> Adding another type of disk controller (USB storage versus whatever the
> >> SSD interface is) to a RAID that is already in use.
> >
> > Normally this practice is perfectly OK.
> > If a filesystem is mounted directly from an md array, then adding devices
> > to the array at any time is fine, even if the new devices have quite
> > different characteristics than the old.
> >
> > However if there is another layer in between md and the filesystem - such as
> > dm - then there can be a problem.
> > There is no mechanism in the kernel for md to tell dm that things have
> > changed, so dm never changes its configuration to match any change in the
> > config of the md device.
> >
> 
> While I can see that there might be limitations in informing the dm 
> layer about changes to the md layer, I fail to see what changes we are 
> talking about.  If the OP were changing the size of the raid1, for 
> example, then that would be a metadata change that needed to propagate 
> up so that lvm could grow its physical volume.  But the dm layer should 
> not care if a disk is added or removed from the md raid1 set - as long 
> as the /dev/mdX device stays online and valid, it should work correctly.
> 

The changes we are talking about are "maximum supported request size" aka
max_sectors.

md sets max_sectors from the minimum of the max_sectors values of all
component devices.  Of course if a device changes its max_sectors value, md
won't notice.

dm sets max_sectors from the minimum of the max_sectors values of all
component devices.  Of course if a device changes its max_sectors value
after it has been included in the map, dm doesn't notice.

Every time a filesystem creates a request, it checks the max_sectors of
the device and limits the request size accordingly.

So if I add a device to an md/raid array which has a smaller max_sectors
value, then the max_sectors of the md array will change, but no-one will
notice.
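
To make that concrete, this is roughly what the stacking drivers do when a
member is added (the wrapper name below is made up; the real 2.6.3x helpers
are blk_stack_limits()/disk_stack_limits(), and nothing re-runs them for a
dm device that already sits above the md array):

    /* Sketch: take the most restrictive limit once, at member-add time.
     * No layer above is told that the result just changed. */
    static void stack_member_limit(struct request_queue *stacked,
                                   struct block_device *member)
    {
            struct request_queue *mq = bdev_get_queue(member);

            blk_queue_max_hw_sectors(stacked,
                    min(queue_max_hw_sectors(stacked),
                        queue_max_hw_sectors(mq)));
    }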

There might be a way to tell dm to re-evaluate max_sectors etc, I don't
know.  But even if there were, having to do that would be a clumsy solution.

NeilBrown


* Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-05-02 10:00           ` NeilBrown
@ 2011-05-02 10:32             ` David Brown
  2011-05-02 14:56             ` David Brown
  1 sibling, 0 replies; 17+ messages in thread
From: David Brown @ 2011-05-02 10:32 UTC (permalink / raw)
  To: linux-raid

On 02/05/2011 12:00, NeilBrown wrote:
> On Mon, 02 May 2011 11:08:11 +0200 David Brown<david@westcontrol.com>  wrote:
>
>> On 02/05/2011 02:22, NeilBrown wrote:
>>> On Mon, 02 May 2011 01:00:57 +0100 Ben Hutchings<ben@decadent.org.uk>   wrote:
>>>
>>>> On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
>>>>> On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings<ben@decadent.org.uk>   wrote:
>>>>>> On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
>>>>>>> I run what I imagine is a fairly unusual disk setup on my laptop,
>>>>>>> consisting of:
>>>>>>>
>>>>>>>     ssd ->   raid1 ->   dm-crypt ->   lvm ->   ext4
>>>>>>>
>>>>>>> I use the raid1 as a backup.  The raid1 operates normally in degraded
>>>>>>> mode.  For backups I then hot-add a usb hdd, let the raid1 sync, and
>>>>>>> then fail/remove the external hdd.
>>>>>>
>>>>>> Well, this is not expected to work.  Possibly the hot-addition of a disk
>>>>>> with different bio restrictions should be rejected.  But I'm not sure,
>>>>>> because it is safe to do that if there is no mounted filesystem or
>>>>>> stacking device on top of the RAID.
>>>>>
>>>>> Hi, Ben.  Can you explain why this is not expected to work?  Which part
>>>>> exactly is not expected to work and why?
>>>>
>>>> Adding another type of disk controller (USB storage versus whatever the
>>>> SSD interface is) to a RAID that is already in use.
>>>
>>> Normally this practice is perfectly OK.
>>> If a filesystem is mounted directly from an md array, then adding devices
>>> to the array at any time is fine, even if the new devices have quite
>>> different characteristics than the old.
>>>
>>> However if there is another layer in between md and the filesystem - such as
>>> dm - then there can be a problem.
>>> There is no mechanism in the kernel for md to tell dm that things have
>>> changed, so dm never changes its configuration to match any change in the
>>> config of the md device.
>>>
>>
>> While I can see that there might be limitations in informing the dm
>> layer about changes to the md layer, I fail to see what changes we are
>> talking about.  If the OP were changing the size of the raid1, for
>> example, then that would be a metadata change that needed to propagate
>> up so that lvm could grow its physical volume.  But the dm layer should
>> not care if a disk is added or removed from the md raid1 set - as long
>> as the /dev/mdX device stays online and valid, it should work correctly.
>>
>
> The changes we are talking about are "maximum supported request size" aka
> max_sectors.
>
> md sets max_sectors from the minimum of the max_sectors values of all
> component devices.  Of course if a device changes its max_sectors value, md
> won't notice.
>
> dm sets max_sectors from the minimum of the max_sectors values of all
> component devices.  Of course if a device changes its max_sectors value
> after it has been included in the map, dm doesn't notice.
>
> Every time a filesystem creates a request, it checks the max_sectors of
> the device and limits the request size accordingly.
>
> So if I add a device to an md/raid array which has a smaller max_sectors
> value, then the max_sectors of the md array will change, but no-one will
> notice.
>
> There might be a way to tell dm to re-evaluate max_sectors etc, I don't
> know.  But even if there were, having to do that would be a clumsy solution.
>

I would think that one possible solution would be that if one layer (or 
the filesystem) requested too large a block from the next layer, there 
would be an error code returned.  This would avoid the need to do any 
re-evaluation under normal circumstances - there would only be extra 
effort if the devices changed and caused a problem.

Another idea would be for a block layer to transparently split the 
request into multiple smaller requests if needed - but that may cause 
other complications such as when the order and atomicity of requests are 
important.

Is there such a big variation of the maximum supported request size?  Or 
could it simply be fixed at some value - and any devices that can't 
support that directly would emulate it with multiple requests?

I know next to nothing about the internals of the block layers (in case 
that's not obvious from my posts...).  But this strikes me as a strange 
limitation to have on the flexibility of block layers.

I can see that in the great majority of cases, the maximum request size 
would be static - as the layers of raid, dm, etc., are built up, with 
the filesystem mounted on top, then the request size limitations are 
passed on.  What is unusual about the circumstances we are discussing is 
that the maximum request size may change dynamically.





* Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-05-02 10:00           ` NeilBrown
  2011-05-02 10:32             ` David Brown
@ 2011-05-02 14:56             ` David Brown
  1 sibling, 0 replies; 17+ messages in thread
From: David Brown @ 2011-05-02 14:56 UTC (permalink / raw)
  To: linux-raid

On 02/05/2011 12:00, NeilBrown wrote:
> On Mon, 02 May 2011 11:08:11 +0200 David Brown<david@westcontrol.com>  wrote:
>
>> On 02/05/2011 02:22, NeilBrown wrote:
>>> On Mon, 02 May 2011 01:00:57 +0100 Ben Hutchings<ben@decadent.org.uk>   wrote:
>>>
>>>> On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
>>>>> On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings<ben@decadent.org.uk>   wrote:
>>>>>> On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
>>>>>>> I run what I imagine is a fairly unusual disk setup on my laptop,
>>>>>>> consisting of:
>>>>>>>
>>>>>>>     ssd ->   raid1 ->   dm-crypt ->   lvm ->   ext4
>>>>>>>
>>>>>>> I use the raid1 as a backup.  The raid1 operates normally in degraded
>>>>>>> mode.  For backups I then hot-add a usb hdd, let the raid1 sync, and
>>>>>>> then fail/remove the external hdd.
>>>>>>
>>>>>> Well, this is not expected to work.  Possibly the hot-addition of a disk
>>>>>> with different bio restrictions should be rejected.  But I'm not sure,
>>>>>> because it is safe to do that if there is no mounted filesystem or
>>>>>> stacking device on top of the RAID.
>>>>>
>>>>> Hi, Ben.  Can you explain why this is not expected to work?  Which part
>>>>> exactly is not expected to work and why?
>>>>
>>>> Adding another type of disk controller (USB storage versus whatever the
>>>> SSD interface is) to a RAID that is already in use.
>>>
>>> Normally this practice is perfectly OK.
>>> If a filesystem is mounted directly from an md array, then adding devices
>>> to the array at any time is fine, even if the new devices have quite
>>> different characteristics than the old.
>>>
>>> However if there is another layer in between md and the filesystem - such as
>>> dm - then there can be a problem.
>>> There is no mechanism in the kernel for md to tell dm that things have
>>> changed, so dm never changes its configuration to match any change in the
>>> config of the md device.
>>>
>>
>> While I can see that there might be limitations in informing the dm
>> layer about changes to the md layer, I fail to see what changes we are
>> talking about.  If the OP were changing the size of the raid1, for
>> example, then that would be a metadata change that needed to propagate
>> up so that lvm could grow its physical volume.  But the dm layer should
>> not care if a disk is added or removed from the md raid1 set - as long
>> as the /dev/mdX device stays online and valid, it should work correctly.
>>
>
> The changes we are talking about are "maximum supported request size" aka
> max_sectors.
>
> md sets max_sectors from the minimum of the max_sectors values of all
> component devices.  Of course if a device changes its max_sectors value, md
> won't notice.
>
> dm sets max_sectors from the minimum of the max_sectors values of all
> component devices.  Of course if a device changes its max_sectors value
> after it has been included in the map, dm doesn't notice.
>
> Every time a filesystem creates a request, it checks the max_sectors of
> the device and limits the request size accordingly.
>
> So if I add a device to an md/raid array which has a smaller max_sectors
> value, then the max_sectors of the md array will change, but no-one will
> notice.
>
> There might be a way to tell dm to re-evaluate max_sectors etc, I don't
> know.  But even if there were, having to do that would be a clumsy solution.
>

I've done a little more reading about max_sectors, and it seems to be 
specific to USB.  It is also dynamically configurable through the sysfs 
interface.  This means that the user can freely play around with the 
value while testing throughput on a USB device.  So if the dm layer has 
a problem with underlying devices changing their max_sectors value, then 
anyone fiddling with max_sectors on a USB device with a dm layer and a 
mounted filesystem is going to get in big trouble.  Since dm-crypt and 
USB are a common combination (encrypted flash stick), surely this is a 
serious bug in the dm code?





* Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-05-02  9:11     ` David Brown
@ 2011-05-02 16:38       ` Jameson Graef Rollins
  2011-05-02 18:54         ` David Brown
  0 siblings, 1 reply; 17+ messages in thread
From: Jameson Graef Rollins @ 2011-05-02 16:38 UTC (permalink / raw)
  To: David Brown; +Cc: Ben Hutchings, 624343, NeilBrown, linux-raid


On Mon, 02 May 2011 11:11:25 +0200, David Brown <david@westcontrol.com> wrote:
> This is not directly related to your issues here, but it is possible to 
> make a 1-disk raid1 set so that you are not normally degraded.  When you 
> want to do the backup, you can grow the raid1 set with the usb disk, 
> wait for the resync, then fail it and remove it, then "grow" the raid1 
> back to 1 disk.  That way you don't feel you are always living in a 
> degraded state.

Hi, David.  I appreciate the concern, but I am not at all concerned
about "living in a degraded state".  I'm far more concerned about data
loss and the fact that this bug has seemingly revealed that some
commonly held assumptions and uses of software raid are wrong, with
potentially far-reaching effects.

I also don't see how the setup you're describing will avoid this bug.
If this bug is triggered by having a layer between md and the filesystem
and then changing the raid configuration by adding or removing a disk,
then I don't see how there's a difference between hot-adding to a
degraded array and growing a single-disk raid1.  In fact, I would
suspect that your suggestion would be more problematic because it
involves *two* raid reconfigurations (grow and then shrink) rather than
one (hot-add) to achieve the same result.  I imagine that each raid
reconfiguration could potentially trigger the bug.  But I still don't
have a clear understanding of what is going on here to be sure.

jamie.


* Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log
  2011-05-02 16:38       ` Jameson Graef Rollins
@ 2011-05-02 18:54         ` David Brown
  0 siblings, 0 replies; 17+ messages in thread
From: David Brown @ 2011-05-02 18:54 UTC (permalink / raw)
  Cc: David Brown, Ben Hutchings, 624343, NeilBrown, linux-raid

On 02/05/11 18:38, Jameson Graef Rollins wrote:
> On Mon, 02 May 2011 11:11:25 +0200, David Brown<david@westcontrol.com>  wrote:
>> This is not directly related to your issues here, but it is possible to
>> make a 1-disk raid1 set so that you are not normally degraded.  When you
>> want to do the backup, you can grow the raid1 set with the usb disk,
>> wait for the resync, then fail it and remove it, then "grow" the raid1
>> back to 1 disk.  That way you don't feel you are always living in a
>> degraded state.
>
> Hi, David.  I appreciate the concern, but I am not at all concerned
> about "living in a degraded state".  I'm far more concerned about data
> loss and the fact that this bug has seemingly revealed that some
> commonly held assumptions and uses of software raid are wrong, with
> potentially far-reaching affects.
>
> I also don't see how the setup you're describing will avoid this bug.
> If this bug is triggered by having a layer between md and the filesystem
> and then changing the raid configuration by adding or removing a disk,
> then I don't see how there's a difference between hot-adding to a
> degraded array and growing a single-disk raid1.  In fact, I would
> suspect that your suggestion would be more problematic because it
> involves *two* raid reconfigurations (grow and then shrink) rather than
> one (hot-add) to achieve the same result.  I imagine that each raid
> reconfiguration could potentially triggering the bug.  But I still don't
> have a clear understanding of what is going on here to be sure.
>

I didn't mean to suggest this as a way around these issues - I was just 
making a side point.  Like you and others in this thread, I am concerned 
about failures that could be caused by having the sort of layered and 
non-homogeneous raid you describe.

I merely mentioned single-disk raid1 "mirrors" as an interesting feature 
you can get with md raid.  Many people don't like to have their system 
in a continuous error state - it can make it harder to notice when you 
have a /real/ problem.  And single-disk "mirrors" give you the same 
features, but no "degraded" state.

As you say, it is conceivable that adding or removing disks to the raid 
could make matters worse.

From what I have read so far, it looks like you can get around problems 
here if the usb disk is attached when the block layers are built up 
(i.e., when the dm-crypt is activated, and the lvm and filesystems on 
top of it).  It should then be safe to remove it, and re-attach it 
later.  Of course, it's hardly ideal to have to attach your backup 
device every time you boot the machine!




