From: "Christian P. Schmidt" <schmidt@digadd.de>
To: linux-raid@vger.kernel.org
Cc: mingo@redhat.com, neilb@cse.unsw.edu.au
Subject: Kernel OOPS with partitioned software raid (+ further questions) [PATCH]
Date: Mon, 30 Oct 2006 19:56:17 +0100
Message-ID: <45464AD1.9030407@digadd.de>
[-- Attachment #1: Type: text/plain, Size: 5179 bytes --]
Hi all,
I'm running the following software RAID setup:
- two RAID0 arrays of two 250GB disks each (sdd1-sdg1), named md_d2 and
  md_d3
- one RAID5 array of three 500GB disks (sda2-sdc2) plus the two RAID0
  arrays as members, named md_d5
- one RAID1 array using 100MB of each of the 500GB disks (sda1-sdc1),
  named md_d1
The only raid device that actually has a partition table is md_d5. The
other devices are used unpartitioned, which brings me to the first
question: Is it possible to run partitioned and unpartitioned software
raids at the same time?
Now back to the main topic. The actual problem is this: because of the
RAID5 layout, the partition table of md_d5 is written to exactly the
place where a partition table on md_d3 would be:
[~]>fdisk -l /dev/md_d3
Disk /dev/md_d3: 500.1 GB, 500113211392 bytes
2 heads, 4 sectors/track, 122097952 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Device Boot Start End Blocks Id System
/dev/md_d3p1 1 244142 976566 83 Linux
/dev/md_d3p2 244143 5126956 19531256 8e Linux LVM
/dev/md_d3p3 5126957 488279488 1932610128 8e Linux LVM
Note that the end of md_d3p3 is way beyond the end of the actual device.
During boot, udev uses the vol_id program to find out what each device
contains. vol_id probes the various locations where RAID and LVM
superblocks may be found. The following strace excerpts show what
happens:
execve("./vol_id.bin", ["./vol_id.bin", "-t", "/dev/md_d3p3"], [/* 26 vars */]) = 0
[... Dynamic library setup, etc]
open("/dev/md_d3p3", O_RDONLY) = 3
[... various brk()]
ioctl(3, BLKGETSIZE64, 0x7fff9ff36948) = 0
[... drop to nobody/nogroup after lots of nscd interaction]
lseek(3, 1978992689152, SEEK_SET) = 1978992689152
read(3,
Never returns.
The connection reset of course only happens after reboot. This is what I
can see on a serial console:
* Letting udev process events ...Unable to handle kernel NULL pointer
dereference
<ffffffff8041a9b3>{raid0_make_request+291}
PGD 3e751067 PUD 3e748067 PMD 0
Oops: 0000 [1]
CPU 0
Modules linked in:
Pid: 1994, comm: vol_id Not tainted 2.6.17-hardened-r1 #2
RIP: 0010:[<ffffffff8041a9b3>] <ffffffff8041a9b3>{raid0_make_request+291}
RSP: 0018:ffff81003e7479d8 EFLAGS: 00010212
RAX: ffff81003facace0 RBX: ffff81003fd17440 RCX: 0000000000000003
RDX: 000000001d156930 RSI: 0000000000000006 RDI: 0000000000000000
RBP: 0000000000000040 R08: 00000000746a36b0 R09: 0000000000000080
R10: ffff81003f503900 R11: 00000000e8d46d60 R12: ffff81003f0c5330
R13: ffff81003e747ad8 R14: 0000000000000001 R15: 0000000000000000
FS: 00002b5b6f634b90(0000) GS:ffffffff806cb000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000003e75d000 CR4: 00000000000006e0
Process vol_id (pid: 1994, threadinfo ffff81003e746000, task
ffff81003e5ef5b0)
Stack: 0000000000000008 ffff81003fd17440 0000000000000080 ffffffff80345305
0000000000000000 0000000000001000 0000000000000000 ffff81003fd17440
ffff81003fd17440 0000000000000000
Call Trace: <ffffffff80345305>{generic_make_request+357}
<ffffffff80347458>{submit_bio+200} <ffffffff80268fcb>{submit_bh+251}
<ffffffff8026bbb2>{block_read_full_page+610}
<ffffffff8026f930>{blkdev_g}
<ffffffff80353db3>{radix_tree_node_alloc+19}
<ffffffff8035455d>{radix_tr}
<ffffffff8024dd0d>{__do_page_cache_readahead+509}
<ffffffff80276fbd>{__l}
<ffffffff8024ddfd>{blockable_page_cache_readahead+109}
<ffffffff8024e06e>{page_cache_readahead+334}
<ffffffff80247a17>{do_gener}
<ffffffff80249b40>{file_read_actor+0}
<ffffffff80248682>{__generic_file_}
<ffffffff802498ec>{generic_file_read+172}
<ffffffff8023bfc0>{autoremove_}
<ffffffff8025698c>{unmap_region+220} <ffffffff80267dca>{vfs_read+186}
<ffffffff80268203>{sys_read+83} <ffffffff80209a0e>{system_call+126}
Code: 48 8b 17 48 89 d0 48 03 47 10 49 39 c0 72 06 48 83 c7 28 eb
RIP <ffffffff8041a9b3>{raid0_make_request+291} RSP <ffff81003e7479d8>
CR2: 0000000000000000
The kernel above contains a lot of patches (Gentoo's hardened sources),
but the same symptom can be seen with vanilla 2.6.18 or 2.6.19-rc3.
Even though there are probably a dozen workarounds (create a partition
table on the RAID0 arrays one by one and resync; do not rely on raid=part
for autodetection, as the RAID5 doesn't come up automatically anyway;
don't use vol_id), this should in my opinion not happen. The points I'd
like to criticize are:
- The partition table reading code, which agrees to create the partition
devices even though they are obviously wrong,
- The partitioned raid device creation code, which creates subdevices
that are larger than the containing device,
- The layer in the kernel that lets a read beyond the end of the device
pass down to the raid driver,
- Most importantly, the raid driver, for failing so ungracefully.
I have not looked into the other software raid drivers, which are
likely to produce the same result. The attached patch for raid0.c turns
accesses beyond the end of a device into Buffer I/O errors:
xxxxxx Buffer I/O error on device md_d3p3, logical block 483152512
Regards,
Christian
[-- Attachment #2: raid0.patch --]
[-- Type: text/plain, Size: 404 bytes --]
--- raid0.c.orig 2006-10-30 00:12:22.000000000 +0100
+++ raid0.c 2006-10-30 00:14:48.000000000 +0100
@@ -415,6 +415,10 @@
 	chunksize_bits = ffz(~chunk_size);
 	block = bio->bi_sector >> 1;
+	if (block >= mddev->array_size) {
+		bio_endio(bio, bio->bi_size, -EIO);
+		return 0;
+	}
 	if (unlikely(chunk_sects < (bio->bi_sector & (chunk_sects - 1)) + (bio->bi_size >> 9))) {
 		struct bio_pair *bp;