linux-raid.vger.kernel.org archive mirror
* Weird Issue with raid 5+0
@ 2010-02-21  4:33 chris
  2010-02-21  5:48 ` Neil Brown
  0 siblings, 1 reply; 17+ messages in thread
From: chris @ 2010-02-21  4:33 UTC (permalink / raw)
  To: linux-raid

Hello,

I am trying to set up a RAID 5+0 across six 1TB SATA disks. I created
the arrays like so:

mdadm --create /dev/md2 --level=5 --raid-devices=3 /dev/sda /dev/sdb /dev/sdc
mdadm --create /dev/md3 --level=5 --raid-devices=3 /dev/sdd /dev/sde /dev/sdf
mdadm --create /dev/md4 --level=0 --raid-devices=2 /dev/md2 /dev/md3

The arrays create and sync fine; I then put LVM on top, created a
volume group, and everything seemed fine. I created two logical volumes
and formatted them with filesystems, and initially didn't realize
anything was wrong. After running two virtual machines on them for a
while, I noticed the VMs were reporting bad blocks on their volumes. I
looked in the dom0 dmesg and found tons of messages such as:

[444905.674655] raid0_make_request bug: can't convert block across
chunks or bigger than 64k 69314431 4

The chunk size for both raid5s and the raid0 is 64k, so it would appear
the issue is not that the chunk size is greater than 64k. I also find it
hard to believe it could be any kind of LVM issue, simply because the
message in dmesg clearly shows it's related to the raid0.

Any ideas on what I'm missing here would be greatly appreciated. I
would imagine it is some kind of alignment issue between block and
chunk sizes, but I can't seem to figure it out :)
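
For completeness, the LVM layering on top of /dev/md4 was roughly the
following (the volume group and LV names/sizes here are placeholders
rather than the exact ones I used):

pvcreate /dev/md4
vgcreate vg_guests /dev/md4
lvcreate -L 100G -n vm1-disk vg_guests
lvcreate -L 100G -n vm2-disk vg_guests
mkfs.ext3 /dev/vg_guests/vm1-disk
mkfs.ext3 /dev/vg_guests/vm2-disk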

More detailed information including raid information and errors is at
http://pastebin.com/f6a52db74

- chris


* Re: Weird Issue with raid 5+0
  2010-02-21  4:33 Weird Issue with raid 5+0 chris
@ 2010-02-21  5:48 ` Neil Brown
  2010-02-21  7:26   ` chris
  0 siblings, 1 reply; 17+ messages in thread
From: Neil Brown @ 2010-02-21  5:48 UTC (permalink / raw)
  To: chris; +Cc: linux-raid

On Sat, 20 Feb 2010 23:33:23 -0500
chris <tknchris@gmail.com> wrote:

> Hello,
> 
> I am trying to setup a raid 5+0 on 6 1TB sata disks. I created the
> arrays like so:
> 
> mdadm --create /dev/md2 --level=5 --raid-devices=3 /dev/sda /dev/sdb /dev/sdc
> mdadm --create /dev/md3 --level=5 --raid-devices=3 /dev/sdd /dev/sde /dev/sdf
> mdadm --create /dev/md4 --level=0 --raid-devices=2 /dev/md2 /dev/md3
> 
> The arrays create and sync fine, then I put lvm on top and create a
> volume group and everything seems fine. I created 2 logical volumes
> and formatted them with filesystems and initially didn't realize
> anything was wrong. After running 2 virtual machines on them for a
> while  I noticed the vm's were reporting bad blocks on the volume. I
> looked in the dom0 dmesg and found tons of messages such as:
> 
> [444905.674655] raid0_make_request bug: can't convert block across
> chunks or bigger than 64k 69314431 4

This looks like a bug in 'dm', or more likely in xen.
Assuming you are using a recent kernel (you didn't say), raid0 is
receiving a request that does not fit entirely in one chunk, and
which has more than one page in the bi_io_vec,
i.e. bi_vcnt != 1 or bi_idx != 0.

As raid0 has a merge_bvec_fn, dm should not be sending bios with more
than one page without first checking that the merge_bvec_fn accepts the
extra page.  But the raid0 merge_bvec_fn will reject any bio which does
not fit within a single chunk.
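
In outline, a merge_bvec_fn of this kind just answers the question "how
many bytes of this new segment still fit inside the chunk the bio
started in?".  A simplified sketch of the idea (not the exact raid0
code):

/* chunk_sectors must be a power of two; returns how many bytes of the
 * new segment may be added without the bio crossing a chunk boundary. */
static int bytes_left_in_chunk(unsigned long long bio_start_sector,
                               unsigned int bytes_already_in_bio,
                               unsigned int chunk_sectors,
                               unsigned int new_segment_bytes)
{
        unsigned int offset = bio_start_sector & (chunk_sectors - 1);
        int remaining = (chunk_sectors - offset) * 512 - bytes_already_in_bio;

        if (remaining < 0)
                remaining = 0;          /* already at or over the boundary */
        return remaining < (int)new_segment_bytes
                ? remaining : new_segment_bytes;
}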

dm-linear appears to honour the merge_bvec_fn of the underlying device
in the implementation of its own merge_bvec_fn.  So presumably the xen client
is not making the appropriate merge_bvec_fn call.
I am not very familiar with xen:  how exactly are you making the logical
volume available to xen?
Also, what kernel are you running?

NeilBrown


> 
> Chunksize for both raid5's and the raid0 is 64k so it would appear the
> issue is not that the chunk size is greater than 64k. I also find it
> hard to believe it could be any kind of lvm issue simply because the
> message in dmesg clearly shows its related to the raid0.
> 
> Any ideas on what I'm missing here would be greatly appreciated. I
> would imagine it is some kind of alignment between block and chunk
> sizes but I can't seem to figure it out :)
> 
> More detailed information including raid information and errors is at
> http://pastebin.com/f6a52db74
> 
> - chris



* Re: Weird Issue with raid 5+0
  2010-02-21  5:48 ` Neil Brown
@ 2010-02-21  7:26   ` chris
  2010-02-21  8:16     ` Neil Brown
  0 siblings, 1 reply; 17+ messages in thread
From: chris @ 2010-02-21  7:26 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

That is exactly what I didn't want to hear :( I am running
2.6.26-2-xen-amd64. Are you sure it's a kernel problem and nothing to
do with my chunk/block sizes? If this is a bug, which versions are
affected? I'll build a new domU kernel and see if I can get it working
there.

- chris

On Sun, Feb 21, 2010 at 12:48 AM, Neil Brown <neilb@suse.de> wrote:
> On Sat, 20 Feb 2010 23:33:23 -0500
> chris <tknchris@gmail.com> wrote:
>
>> Hello,
>>
>> I am trying to setup a raid 5+0 on 6 1TB sata disks. I created the
>> arrays like so:
>>
>> mdadm --create /dev/md2 --level=5 --raid-devices=3 /dev/sda /dev/sdb /dev/sdc
>> mdadm --create /dev/md3 --level=5 --raid-devices=3 /dev/sdd /dev/sde /dev/sdf
>> mdadm --create /dev/md4 --level=0 --raid-devices=2 /dev/md2 /dev/md3
>>
>> The arrays create and sync fine, then I put lvm on top and create a
>> volume group and everything seems fine. I created 2 logical volumes
>> and formatted them with filesystems and initially didn't realize
>> anything was wrong. After running 2 virtual machines on them for a
>> while  I noticed the vm's were reporting bad blocks on the volume. I
>> looked in the dom0 dmesg and found tons of messages such as:
>>
>> [444905.674655] raid0_make_request bug: can't convert block across
>> chunks or bigger than 64k 69314431 4
>
> This looks like a bug in 'dm' or more likely xen.
> Assuming you are using a recent kernel (you didn't say), raid0 is
> receiving a request that does not fit entirely in on chunk, and
> which has more than on page in the bi_iovec.
> i.e. bi_vcnt != 1 or bi_idx != 0.
>
> As raid0 has a merge_bvec_fn, dm should not be sending bios with more than 1
> page without first cheking that the merge_bvec_fn accepts the extra page.
> But the raid0 merge_bvec_fn will reject any bio which does not fit in
> a chunk.
>
> dm-linear appears to honour the merge_bvec_fn of the underlying device
> in the implementation of its own merge_bvec_fn.  So presumably the xen client
> is not making the appropriate merge_bvec_fn call.
> I am not very familiar with xen:  how exactly are you making the logical
> volume available to xen?
> Also, what kernel are you running?
>
> NeilBrown
>
>
>>
>> Chunksize for both raid5's and the raid0 is 64k so it would appear the
>> issue is not that the chunk size is greater than 64k. I also find it
>> hard to believe it could be any kind of lvm issue simply because the
>> message in dmesg clearly shows its related to the raid0.
>>
>> Any ideas on what I'm missing here would be greatly appreciated. I
>> would imagine it is some kind of alignment between block and chunk
>> sizes but I can't seem to figure it out :)
>>
>> More detailed information including raid information and errors is at
>> http://pastebin.com/f6a52db74
>>
>> - chris
>
>


* Re: Weird Issue with raid 5+0
  2010-02-21  7:26   ` chris
@ 2010-02-21  8:16     ` Neil Brown
  2010-02-21  8:21       ` Neil Brown
  2010-03-08  5:50       ` Neil Brown
  0 siblings, 2 replies; 17+ messages in thread
From: Neil Brown @ 2010-02-21  8:16 UTC (permalink / raw)
  To: chris; +Cc: linux-raid

On Sun, 21 Feb 2010 02:26:42 -0500
chris <tknchris@gmail.com> wrote:

> That is exactly what I didn't want to hear :( I am running
> 2.6.26-2-xen-amd64. Are you sure its a kernel problem and nothing to
> do with my chunk/block sizes? If this is a bug what versions are
> affected, I'll build a new domU kernel and see if I can get it working
> there.
> 
> - chris

I'm absolutely sure it is a kernel bug.
I have no idea which versions might be affected.  Probably all, but
as xen isn't fully in mainline it isn't easy for me to explore.

I suggest you contact the xen developers (or whoever you got xen from).
I'm happy to discuss the problem with someone who knows about xen block
device access, but I don't want to go hunting to find such a person.

NeilBrown


* Re: Weird Issue with raid 5+0
  2010-02-21  8:16     ` Neil Brown
@ 2010-02-21  8:21       ` Neil Brown
  2010-02-21  9:17         ` chris
  2010-03-08  5:50       ` Neil Brown
  1 sibling, 1 reply; 17+ messages in thread
From: Neil Brown @ 2010-02-21  8:21 UTC (permalink / raw)
  To: Neil Brown; +Cc: chris, linux-raid

On Sun, 21 Feb 2010 19:16:40 +1100
Neil Brown <neilb@suse.de> wrote:

> On Sun, 21 Feb 2010 02:26:42 -0500
> chris <tknchris@gmail.com> wrote:
> 
> > That is exactly what I didn't want to hear :( I am running
> > 2.6.26-2-xen-amd64. Are you sure its a kernel problem and nothing to
> > do with my chunk/block sizes? If this is a bug what versions are
> > affected, I'll build a new domU kernel and see if I can get it working
> > there.
> > 
> > - chris
> 
> I'm absolutely sure it is a kernel bug.

though it just occurs to me that you might be able to work around it.
If, in the guest, you set "max_sectors_kb" to "4"
in /sys/block/whatever/queue, it might avoid the problem.
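
For example, something along these lines (the device name is just a
placeholder for whatever the guest's disk is called):

  echo 4 > /sys/block/xvda/queue/max_sectors_kb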

NeilBrown


> I have no idea what version might be affected.  Probably all, but 
> as xen isn't fully in main line it isn't easy for me to explore.
> 
> I suggest you contact the xen developers (or whoever you got xen from).
> I'm happy to discuss the problem with someone who knows about xen block
> device access, but I don't want to go hunting to find such a person.
> 
> NeilBrown



* Re: Weird Issue with raid 5+0
  2010-02-21  8:21       ` Neil Brown
@ 2010-02-21  9:17         ` chris
  2010-02-21 10:35           ` chris
  0 siblings, 1 reply; 17+ messages in thread
From: chris @ 2010-02-21  9:17 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Neil,

The RAID is being done in dom0; any idea how to set that from the
kernel cmdline?

- chris

On Sun, Feb 21, 2010 at 3:21 AM, Neil Brown <neilb@suse.de> wrote:
> On Sun, 21 Feb 2010 19:16:40 +1100
> Neil Brown <neilb@suse.de> wrote:
>
>> On Sun, 21 Feb 2010 02:26:42 -0500
>> chris <tknchris@gmail.com> wrote:
>>
>> > That is exactly what I didn't want to hear :( I am running
>> > 2.6.26-2-xen-amd64. Are you sure its a kernel problem and nothing to
>> > do with my chunk/block sizes? If this is a bug what versions are
>> > affected, I'll build a new domU kernel and see if I can get it working
>> > there.
>> >
>> > - chris
>>
>> I'm absolutely sure it is a kernel bug.
>
> though it just occurs to me that you might be able to work around it.
> If, in the guest, you set "max_sectors_kb" to "4"
> in /sys/block/whatever/queue, it might avoid the problem.
>
> NeilBrown
>
>
>> I have no idea what version might be affected.  Probably all, but
>> as xen isn't fully in main line it isn't easy for me to explore.
>>
>> I suggest you contact the xen developers (or whoever you got xen from).
>> I'm happy to discuss the problem with someone who knows about xen block
>> device access, but I don't want to go hunting to find such a person.
>>
>> NeilBrown
>
>


* Re: Weird Issue with raid 5+0
  2010-02-21  9:17         ` chris
@ 2010-02-21 10:35           ` chris
  0 siblings, 0 replies; 17+ messages in thread
From: chris @ 2010-02-21 10:35 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Another thought: I have several boxes running raid1s and raid5s under
the same kernel. Does this bug affect only dom0, or only the way I'm
nesting the arrays? In the meantime I will try a more recent dom0
kernel and see if that makes any difference.

- chris

On Sun, Feb 21, 2010 at 4:17 AM, chris <tknchris@gmail.com> wrote:
> Neil,
>
> The raid is being done in dom0, any idea how to set that from the
> kernel cmdline?
>
> - chris
>
> On Sun, Feb 21, 2010 at 3:21 AM, Neil Brown <neilb@suse.de> wrote:
>> On Sun, 21 Feb 2010 19:16:40 +1100
>> Neil Brown <neilb@suse.de> wrote:
>>
>>> On Sun, 21 Feb 2010 02:26:42 -0500
>>> chris <tknchris@gmail.com> wrote:
>>>
>>> > That is exactly what I didn't want to hear :( I am running
>>> > 2.6.26-2-xen-amd64. Are you sure its a kernel problem and nothing to
>>> > do with my chunk/block sizes? If this is a bug what versions are
>>> > affected, I'll build a new domU kernel and see if I can get it working
>>> > there.
>>> >
>>> > - chris
>>>
>>> I'm absolutely sure it is a kernel bug.
>>
>> though it just occurs to me that you might be able to work around it.
>> If, in the guest, you set "max_sectors_kb" to "4"
>> in /sys/block/whatever/queue, it might avoid the problem.
>>
>> NeilBrown
>>
>>
>>> I have no idea what version might be affected.  Probably all, but
>>> as xen isn't fully in main line it isn't easy for me to explore.
>>>
>>> I suggest you contact the xen developers (or whoever you got xen from).
>>> I'm happy to discuss the problem with someone who knows about xen block
>>> device access, but I don't want to go hunting to find such a person.
>>>
>>> NeilBrown
>>
>>
>


* Re: Weird Issue with raid 5+0
  2010-02-21  8:16     ` Neil Brown
  2010-02-21  8:21       ` Neil Brown
@ 2010-03-08  5:50       ` Neil Brown
  2010-03-08  6:16         ` chris
                           ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Neil Brown @ 2010-03-08  5:50 UTC (permalink / raw)
  Cc: chris, linux-raid

On Sun, 21 Feb 2010 19:16:40 +1100
Neil Brown <neilb@suse.de> wrote:

> On Sun, 21 Feb 2010 02:26:42 -0500
> chris <tknchris@gmail.com> wrote:
> 
> > That is exactly what I didn't want to hear :( I am running
> > 2.6.26-2-xen-amd64. Are you sure its a kernel problem and nothing to
> > do with my chunk/block sizes? If this is a bug what versions are
> > affected, I'll build a new domU kernel and see if I can get it working
> > there.
> > 
> > - chris
> 
> I'm absolutely sure it is a kernel bug.

And I think I now know what the bug is.

A patch was recently posted to dm-devel which I think addresses exactly this
problem.

I reproduce it below.

NeilBrown

-------------------
If the lower device exposes a merge_bvec_fn,
dm_set_device_limits() restricts max_sectors
to PAGE_SIZE "just to be safe".

This is not sufficient, however.

If someone uses bio_add_page() to add 8 disjunct 512 byte partial
pages to a bio, it would succeed, but could still cross a border
of whatever restrictions are below us (e.g. raid10 stripe boundary).
An attempted bio_split() would not succeed, because bi_vcnt is 8.

One example that triggered this frequently is the xen io layer.

raid10_make_request bug: can't convert block across chunks or bigger than 64k 209265151 1

Signed-off-by: Lars <lars.ellenberg@linbit.com>


---
 drivers/md/dm-table.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 4b22feb..c686ff4 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -515,14 +515,22 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
 
 	/*
 	 * Check if merge fn is supported.
-	 * If not we'll force DM to use PAGE_SIZE or
+	 * If not we'll force DM to use single bio_vec of PAGE_SIZE or
 	 * smaller I/O, just to be safe.
 	 */
 
-	if (q->merge_bvec_fn && !ti->type->merge)
+	if (q->merge_bvec_fn && !ti->type->merge) {
 		limits->max_sectors =
 			min_not_zero(limits->max_sectors,
 				     (unsigned int) (PAGE_SIZE >> 9));
+		/* Restricting max_sectors is not enough.
+		 * If someone uses bio_add_page to add 8 disjunct 512 byte
+		 * partial pages to a bio, it would succeed,
+		 * but could still cross a border of whatever restrictions
+		 * are below us (e.g. raid0 stripe boundary).  An attempted
+		 * bio_split() would not succeed, because bi_vcnt is 8. */
+		limits->max_segments = 1;
+	}
 	return 0;
 }
 EXPORT_SYMBOL_GPL(dm_set_device_limits);
-- 
1.6.3.3
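
To illustrate the scenario the patch description talks about, here is a
purely hypothetical sketch (not code taken from any real driver): each
bio_add_page() call succeeds because the total stays well within
max_sectors, yet bi_vcnt ends up as 8, so the bio cannot later be split
at a chunk boundary.

#include <linux/bio.h>

/* Hypothetical illustration only: eight disjoint 512-byte fragments,
 * one per page, packed into a single bio (submission omitted). */
static struct bio *build_fragmented_bio(struct block_device *bdev,
                                        sector_t sector,
                                        struct page *pages[8])
{
        struct bio *bio = bio_alloc(GFP_NOIO, 8);
        int i;

        bio->bi_bdev = bdev;
        bio->bi_sector = sector;
        for (i = 0; i < 8; i++)
                /* 512 bytes from the middle of each page: only 4k in
                 * total, so max_sectors alone does not prevent this... */
                bio_add_page(bio, pages[i], 512, 512);
        /* ...but bi_vcnt is now 8, and if the bio straddles a raid0
         * chunk boundary, bio_split() cannot split it. */
        return bio;
}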


* Re: Weird Issue with raid 5+0
  2010-03-08  5:50       ` Neil Brown
@ 2010-03-08  6:16         ` chris
  2010-03-08  7:05           ` Neil Brown
  2010-03-08 15:35         ` chris
  2010-03-08 20:14         ` Bill Davidsen
  2 siblings, 1 reply; 17+ messages in thread
From: chris @ 2010-03-08  6:16 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Interesting. I moved to RAID6 on the machine I was working on because
it was similar enough and I had a deadline to meet, but I would still
be interested in testing this. With your approval I would like to copy
xen-devel on this thread so that we can hopefully put together a fix.
Thanks again for your earlier help and for what looks to be the
solution :)

- chris

On Mon, Mar 8, 2010 at 12:50 AM, Neil Brown <neilb@suse.de> wrote:
> On Sun, 21 Feb 2010 19:16:40 +1100
> Neil Brown <neilb@suse.de> wrote:
>
>> On Sun, 21 Feb 2010 02:26:42 -0500
>> chris <tknchris@gmail.com> wrote:
>>
>> > That is exactly what I didn't want to hear :( I am running
>> > 2.6.26-2-xen-amd64. Are you sure its a kernel problem and nothing to
>> > do with my chunk/block sizes? If this is a bug what versions are
>> > affected, I'll build a new domU kernel and see if I can get it working
>> > there.
>> >
>> > - chris
>>
>> I'm absolutely sure it is a kernel bug.
>
> And I think I now know what the bug is.
>
> A patch was recently posted to dm-devel which I think addresses exactly this
> problem.
>
> I reproduce it below.
>
> NeilBrown
>
> -------------------
> If the lower device exposes a merge_bvec_fn,
> dm_set_device_limits() restricts max_sectors
> to PAGE_SIZE "just to be safe".
>
> This is not sufficient, however.
>
> If someone uses bio_add_page() to add 8 disjunct 512 byte partial
> pages to a bio, it would succeed, but could still cross a border
> of whatever restrictions are below us (e.g. raid10 stripe boundary).
> An attempted bio_split() would not succeed, because bi_vcnt is 8.
>
> One example that triggered this frequently is the xen io layer.
>
> raid10_make_request bug: can't convert block across chunks or bigger than 64k 209265151 1
>
> Signed-off-by: Lars <lars.ellenberg@linbit.com>
>
>
> ---
>  drivers/md/dm-table.c |   12 ++++++++++--
>  1 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 4b22feb..c686ff4 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -515,14 +515,22 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
>
>        /*
>         * Check if merge fn is supported.
> -        * If not we'll force DM to use PAGE_SIZE or
> +        * If not we'll force DM to use single bio_vec of PAGE_SIZE or
>         * smaller I/O, just to be safe.
>         */
>
> -       if (q->merge_bvec_fn && !ti->type->merge)
> +       if (q->merge_bvec_fn && !ti->type->merge) {
>                limits->max_sectors =
>                        min_not_zero(limits->max_sectors,
>                                     (unsigned int) (PAGE_SIZE >> 9));
> +               /* Restricting max_sectors is not enough.
> +                * If someone uses bio_add_page to add 8 disjunct 512 byte
> +                * partial pages to a bio, it would succeed,
> +                * but could still cross a border of whatever restrictions
> +                * are below us (e.g. raid0 stripe boundary).  An attempted
> +                * bio_split() would not succeed, because bi_vcnt is 8. */
> +               limits->max_segments = 1;
> +       }
>        return 0;
>  }
>  EXPORT_SYMBOL_GPL(dm_set_device_limits);
> --
> 1.6.3.3
>


* Re: Weird Issue with raid 5+0
  2010-03-08  6:16         ` chris
@ 2010-03-08  7:05           ` Neil Brown
  0 siblings, 0 replies; 17+ messages in thread
From: Neil Brown @ 2010-03-08  7:05 UTC (permalink / raw)
  To: chris; +Cc: linux-raid

On Mon, 8 Mar 2010 01:16:49 -0500
chris <tknchris@gmail.com> wrote:

> Interesting, I moved to raid6 on the machine I was working on because
> it was similar enough and I had a deadline to meet. I would still be
> interested in testing this though. With your approval I would like to
> copy xen-devel on this thread so that we can hopefully put together a
> fix. Thanks again for your help previously and for what looks to be
> the solution :)
>

I don't think you need my approval, though you have it if you want.
Once something is on a public list you are free to forward it anywhere,
and as long as the discussion is relevant to RAID (which this is) it is
appropriate to keep it copied to linux-raid too.

NeilBrown


* Re: Weird Issue with raid 5+0
  2010-03-08  5:50       ` Neil Brown
  2010-03-08  6:16         ` chris
@ 2010-03-08 15:35         ` chris
  2010-03-08 17:29           ` [Xen-devel] " Konrad Rzeszutek Wilk
  2010-03-08 23:26           ` Jeremy Fitzhardinge
  2010-03-08 20:14         ` Bill Davidsen
  2 siblings, 2 replies; 17+ messages in thread
From: chris @ 2010-03-08 15:35 UTC (permalink / raw)
  To: Xen-Devel List; +Cc: linux-raid, Neil Brown

I am forwarding this to xen-devel because it appears to be a bug in the
dom0 kernel.

I recently experienced a strange issue with software raid 5+0 (two
raid5 arrays striped with raid0) under Xen on a new machine. I was
getting corruption in my guest volumes and tons of kernel messages such
as:

[305044.571962] raid0_make_request bug: can't convert block across
chunks or bigger than 64k 14147455 4

The full thread is located at http://marc.info/?t=126672694700001&r=1&w=2
Detailed output at http://pastebin.com/f6a52db74

After speaking with the linux-raid mailing list, it appears this is due
to a bug for which a fix has been posted, but that fix is not included
in the dom0 kernel. I'm not sure which sources the 2.6.26-2-xen-amd64
kernel is based on, but since xenlinux is still at 2.6.18 I was assuming
that this bug would still exist.

My questions for xen-devel are:

Can you tell me if there is any dom0 kernel where this issue is fixed?
Is there anything I can do to help get this resolved? Testing? Patching?

- chris

On Mon, Mar 8, 2010 at 12:50 AM, Neil Brown <neilb@suse.de> wrote:
> On Sun, 21 Feb 2010 19:16:40 +1100
> Neil Brown <neilb@suse.de> wrote:
>
>> On Sun, 21 Feb 2010 02:26:42 -0500
>> chris <tknchris@gmail.com> wrote:
>>
>> > That is exactly what I didn't want to hear :( I am running
>> > 2.6.26-2-xen-amd64. Are you sure its a kernel problem and nothing to
>> > do with my chunk/block sizes? If this is a bug what versions are
>> > affected, I'll build a new domU kernel and see if I can get it working
>> > there.
>> >
>> > - chris
>>
>> I'm absolutely sure it is a kernel bug.
>
> And I think I now know what the bug is.
>
> A patch was recently posted to dm-devel which I think addresses exactly this
> problem.
>
> I reproduce it below.
>
> NeilBrown
>
> -------------------
> If the lower device exposes a merge_bvec_fn,
> dm_set_device_limits() restricts max_sectors
> to PAGE_SIZE "just to be safe".
>
> This is not sufficient, however.
>
> If someone uses bio_add_page() to add 8 disjunct 512 byte partial
> pages to a bio, it would succeed, but could still cross a border
> of whatever restrictions are below us (e.g. raid10 stripe boundary).
> An attempted bio_split() would not succeed, because bi_vcnt is 8.
>
> One example that triggered this frequently is the xen io layer.
>
> raid10_make_request bug: can't convert block across chunks or bigger than 64k 209265151 1
>
> Signed-off-by: Lars <lars.ellenberg@linbit.com>
>
>
> ---
>  drivers/md/dm-table.c |   12 ++++++++++--
>  1 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 4b22feb..c686ff4 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -515,14 +515,22 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
>
>        /*
>         * Check if merge fn is supported.
> -        * If not we'll force DM to use PAGE_SIZE or
> +        * If not we'll force DM to use single bio_vec of PAGE_SIZE or
>         * smaller I/O, just to be safe.
>         */
>
> -       if (q->merge_bvec_fn && !ti->type->merge)
> +       if (q->merge_bvec_fn && !ti->type->merge) {
>                limits->max_sectors =
>                        min_not_zero(limits->max_sectors,
>                                     (unsigned int) (PAGE_SIZE >> 9));
> +               /* Restricting max_sectors is not enough.
> +                * If someone uses bio_add_page to add 8 disjunct 512 byte
> +                * partial pages to a bio, it would succeed,
> +                * but could still cross a border of whatever restrictions
> +                * are below us (e.g. raid0 stripe boundary).  An attempted
> +                * bio_split() would not succeed, because bi_vcnt is 8. */
> +               limits->max_segments = 1;
> +       }
>        return 0;
>  }
>  EXPORT_SYMBOL_GPL(dm_set_device_limits);
> --
> 1.6.3.3
>


* Re: [Xen-devel] Re: Weird Issue with raid 5+0
  2010-03-08 15:35         ` chris
@ 2010-03-08 17:29           ` Konrad Rzeszutek Wilk
  2010-03-09 19:42             ` Neil Brown
  2010-03-08 23:26           ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-03-08 17:29 UTC (permalink / raw)
  To: chris, neilb; +Cc: Xen-Devel List, linux-raid

On Mon, Mar 08, 2010 at 10:35:57AM -0500, chris wrote:
> I forwarding this to xen-devel because it appears to be a bug in dom0 kernel.
> 
> I recently experienced a strange issue with software raid1+0 under Xen
> on a new machine. I was getting corruption in my guest volumes and
> tons of kernel messages such as:
> 
> [305044.571962] raid0_make_request bug: can't convert block across
> chunks or bigger than 64k 14147455 4
> 
> The full thread is located at http://marc.info/?t=126672694700001&r=1&w=2
> Detailed output at http://pastebin.com/f6a52db74
> 
> It appears after speaking with the linux-raid mailing list that this
> is due a bug which has been fixed but the fix is not included in the
> dom0 kernel. I'm not sure what sources kernel 2.6.26-2-xen-amd64 is
> based on, but since xenlinux is still at 2.6.18 I was assuming that
> this bug would still exist.
> 
> My questions for xen-devel are:
> 
> Can you tell me if there is any dom0 kernel where this issue is fixed?

Not there yet.

> Is there anything I can do to help get this resolved? Testing? Patching?

It looks to me like the patch hasn't reached the latest Linux tree, nor
the stable branch. I believe once it gets there we would pull it in
automatically.

The patch at http://marc.info/?l=linux-raid&m=126802743419044&w=2 looks
quite safe, so it should be easy for you to pull it and apply it to
your sources.

Neil, any idea when this patch might land in Greg KH's tree (2.6.32) or
upstream?

> 
> - chrris
> 
> On Mon, Mar 8, 2010 at 12:50 AM, Neil Brown <neilb@suse.de> wrote:
> > On Sun, 21 Feb 2010 19:16:40 +1100
> > Neil Brown <neilb@suse.de> wrote:
> >
> >> On Sun, 21 Feb 2010 02:26:42 -0500
> >> chris <tknchris@gmail.com> wrote:
> >>
> >> > That is exactly what I didn't want to hear :( I am running
> >> > 2.6.26-2-xen-amd64. Are you sure its a kernel problem and nothing to
> >> > do with my chunk/block sizes? If this is a bug what versions are
> >> > affected, I'll build a new domU kernel and see if I can get it working
> >> > there.
> >> >
> >> > - chris
> >>
> >> I'm absolutely sure it is a kernel bug.
> >
> > And I think I now know what the bug is.
> >
> > A patch was recently posted to dm-devel which I think addresses exactly this
> > problem.
> >
> > I reproduce it below.
> >
> > NeilBrown
> >
> > -------------------
> > If the lower device exposes a merge_bvec_fn,
> > dm_set_device_limits() restricts max_sectors
> > to PAGE_SIZE "just to be safe".
> >
> > This is not sufficient, however.
> >
> > If someone uses bio_add_page() to add 8 disjunct 512 byte partial
> > pages to a bio, it would succeed, but could still cross a border
> > of whatever restrictions are below us (e.g. raid10 stripe boundary).
> > An attempted bio_split() would not succeed, because bi_vcnt is 8.
> >
> > One example that triggered this frequently is the xen io layer.
> >
> > raid10_make_request bug: can't convert block across chunks or bigger than 64k 209265151 1
> >
> > Signed-off-by: Lars <lars.ellenberg@linbit.com>
> >
> >
> > ---
> >  drivers/md/dm-table.c |   12 ++++++++++--
> >  1 files changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index 4b22feb..c686ff4 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> > @@ -515,14 +515,22 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
> >
> >        /*
> >         * Check if merge fn is supported.
> > -        * If not we'll force DM to use PAGE_SIZE or
> > +        * If not we'll force DM to use single bio_vec of PAGE_SIZE or
> >         * smaller I/O, just to be safe.
> >         */
> >
> > -       if (q->merge_bvec_fn && !ti->type->merge)
> > +       if (q->merge_bvec_fn && !ti->type->merge) {
> >                limits->max_sectors =
> >                        min_not_zero(limits->max_sectors,
> >                                     (unsigned int) (PAGE_SIZE >> 9));
> > +               /* Restricting max_sectors is not enough.
> > +                * If someone uses bio_add_page to add 8 disjunct 512 byte
> > +                * partial pages to a bio, it would succeed,
> > +                * but could still cross a border of whatever restrictions
> > +                * are below us (e.g. raid0 stripe boundary).  An attempted
> > +                * bio_split() would not succeed, because bi_vcnt is 8. */
> > +               limits->max_segments = 1;
> > +       }
> >        return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(dm_set_device_limits);
> > --
> > 1.6.3.3
> >
> 


* Re: Weird Issue with raid 5+0
  2010-03-08  5:50       ` Neil Brown
  2010-03-08  6:16         ` chris
  2010-03-08 15:35         ` chris
@ 2010-03-08 20:14         ` Bill Davidsen
  2 siblings, 0 replies; 17+ messages in thread
From: Bill Davidsen @ 2010-03-08 20:14 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Neil Brown wrote:
> On Sun, 21 Feb 2010 19:16:40 +1100
> Neil Brown <neilb@suse.de> wrote:
>
>   
>> On Sun, 21 Feb 2010 02:26:42 -0500
>> chris <tknchris@gmail.com> wrote:
>>
>>     
>>> That is exactly what I didn't want to hear :( I am running
>>> 2.6.26-2-xen-amd64. Are you sure its a kernel problem and nothing to
>>> do with my chunk/block sizes? If this is a bug what versions are
>>> affected, I'll build a new domU kernel and see if I can get it working
>>> there.
>>>
>>> - chris
>>>       
>> I'm absolutely sure it is a kernel bug.
>>     
>
> And I think I now know what the bug is.
>
> A patch was recently posted to dm-devel which I think addresses exactly this
> problem.
>
> I reproduce it below.
>
> NeilBrown
>
> -------------------
> If the lower device exposes a merge_bvec_fn,
> dm_set_device_limits() restricts max_sectors
> to PAGE_SIZE "just to be safe".
>
> This is not sufficient, however.
>
> If someone uses bio_add_page() to add 8 disjunct 512 byte partial
> pages to a bio, it would succeed, but could still cross a border
> of whatever restrictions are below us (e.g. raid10 stripe boundary).
> An attempted bio_split() would not succeed, because bi_vcnt is 8.
>
> One example that triggered this frequently is the xen io layer.
>   

And I bet this hasn't been fixed in RHEL yet... I believe we saw this a
while ago.

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein



* Re: Re: Weird Issue with raid 5+0
  2010-03-08 15:35         ` chris
  2010-03-08 17:29           ` [Xen-devel] " Konrad Rzeszutek Wilk
@ 2010-03-08 23:26           ` Jeremy Fitzhardinge
  2010-03-09  0:48             ` chris
  1 sibling, 1 reply; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2010-03-08 23:26 UTC (permalink / raw)
  To: chris; +Cc: linux-raid, Neil Brown, Xen-Devel List

On 03/08/2010 07:35 AM, chris wrote:
> I forwarding this to xen-devel because it appears to be a bug in dom0 kernel.
>
> I recently experienced a strange issue with software raid1+0 under Xen
> on a new machine. I was getting corruption in my guest volumes and
> tons of kernel messages such as:
>
> [305044.571962] raid0_make_request bug: can't convert block across
> chunks or bigger than 64k 14147455 4
>
> The full thread is located at http://marc.info/?t=126672694700001&r=1&w=2
> Detailed output at http://pastebin.com/f6a52db74
>
> It appears after speaking with the linux-raid mailing list that this
> is due a bug which has been fixed but the fix is not included in the
> dom0 kernel. I'm not sure what sources kernel 2.6.26-2-xen-amd64 is
> based on, but since xenlinux is still at 2.6.18 I was assuming that
> this bug would still exist.
>    

There are a number of possible dom0 kernels in use.  The xen.org
2.6.18-xen tree is still maintained on a bugfix basis.  Are you seeing
the problem with 2.6.18-xen?

I don't know what 2.6.26-2-xen-amd64 is, but I'm guessing from the name 
it is a Debian kernel.  I don't know what its provenance is, but it is 
not a xen.org supported kernel; if a fix is needed you should file a bug 
against your distro.

Modern Xen kernels are based on 2.6.31.x and 2.6.32.x.  See 
http://wiki.xensource.com/xenwiki/XenParavirtOps and 
http://wiki.xensource.com/xenwiki/XenDom0Kernels for more details.

     J


* Re: Re: Weird Issue with raid 5+0
  2010-03-08 23:26           ` Jeremy Fitzhardinge
@ 2010-03-09  0:48             ` chris
  2010-03-09  1:14               ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 17+ messages in thread
From: chris @ 2010-03-09  0:48 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: linux-raid, Neil Brown, Xen-Devel List

I will test with xenlinux 2.6.18. As for a pv_ops dom0, does that work
with any version of the hypervisor? The current system is running 3.2,
which is pretty old, so I just assumed pv_ops wouldn't be a
possibility.

 -chris

On Mon, Mar 8, 2010 at 6:26 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 03/08/2010 07:35 AM, chris wrote:
>>
>> I forwarding this to xen-devel because it appears to be a bug in dom0
>> kernel.
>>
>> I recently experienced a strange issue with software raid1+0 under Xen
>> on a new machine. I was getting corruption in my guest volumes and
>> tons of kernel messages such as:
>>
>> [305044.571962] raid0_make_request bug: can't convert block across
>> chunks or bigger than 64k 14147455 4
>>
>> The full thread is located at http://marc.info/?t=126672694700001&r=1&w=2
>> Detailed output at http://pastebin.com/f6a52db74
>>
>> It appears after speaking with the linux-raid mailing list that this
>> is due a bug which has been fixed but the fix is not included in the
>> dom0 kernel. I'm not sure what sources kernel 2.6.26-2-xen-amd64 is
>> based on, but since xenlinux is still at 2.6.18 I was assuming that
>> this bug would still exist.
>>
>
> There are a number of possible dom0 kernel in use.  The xen.org 2.6.18-xen
> tree is still maintained on a bugfix basis.  Are you seeing the problem with
> 2.6.18-xen?
>
> I don't know what 2.6.26-2-xen-amd64 is, but I'm guessing from the name it
> is a Debian kernel.  I don't know what its provenance is, but it is not a
> xen.org supported kernel; if a fix is needed you should file a bug against
> your distro.
>
> Modern Xen kernels are based on 2.6.31.x and 2.6.32.x.  See
> http://wiki.xensource.com/xenwiki/XenParavirtOps and
> http://wiki.xensource.com/xenwiki/XenDom0Kernels for more details.
>
>    J
>


* Re: Re: Weird Issue with raid 5+0
  2010-03-09  0:48             ` chris
@ 2010-03-09  1:14               ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2010-03-09  1:14 UTC (permalink / raw)
  To: chris; +Cc: linux-raid, Neil Brown, Xen-Devel List

On 03/08/2010 04:48 PM, chris wrote:
> I will test with xenlinux 2.6.18, as far as pv_ops dom0 does this work
> with any version of the hypervisor? The current system is running 3.2
> which is pretty old so I just assumed pv_ops wouldn't be a
> possibility.
>    

I'm not sure - it's worth trying ;).  xen/next / stable-2.6.32.x 
explicitly requires 4.0 or 3.4.3, but I don't know what the earliest 
working version is for xen/master / stable-2.6.31.x.

     J


* Re: [Xen-devel] Re: Weird Issue with raid 5+0
  2010-03-08 17:29           ` [Xen-devel] " Konrad Rzeszutek Wilk
@ 2010-03-09 19:42             ` Neil Brown
  0 siblings, 0 replies; 17+ messages in thread
From: Neil Brown @ 2010-03-09 19:42 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: chris, Xen-Devel List, linux-raid

On Mon, 8 Mar 2010 12:29:15 -0500
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

> The patch at http://marc.info/?l=linux-raid&m=126802743419044&w=2 looks
> to be quite safe so it should be easy for you to pull it and apply it to
> your sources?
> 
> Neil, any idea when this patch might land in Greg KH's tree (2.6.32) or
> upstream?
> 

Not really.
It was observed on the dm-devel list that the patch isn't quite
perfect: there are possible, though rare, cases where it is not
sufficient, so I suspect it will be revised and re-submitted.  It might
then go upstream in a week or so, and into -stable in due course, but
it is really up to the dm developers.

NeilBrown



Thread overview: 17+ messages
2010-02-21  4:33 Weird Issue with raid 5+0 chris
2010-02-21  5:48 ` Neil Brown
2010-02-21  7:26   ` chris
2010-02-21  8:16     ` Neil Brown
2010-02-21  8:21       ` Neil Brown
2010-02-21  9:17         ` chris
2010-02-21 10:35           ` chris
2010-03-08  5:50       ` Neil Brown
2010-03-08  6:16         ` chris
2010-03-08  7:05           ` Neil Brown
2010-03-08 15:35         ` chris
2010-03-08 17:29           ` [Xen-devel] " Konrad Rzeszutek Wilk
2010-03-09 19:42             ` Neil Brown
2010-03-08 23:26           ` Jeremy Fitzhardinge
2010-03-09  0:48             ` chris
2010-03-09  1:14               ` Jeremy Fitzhardinge
2010-03-08 20:14         ` Bill Davidsen
