From: Lem <l3mming@iinet.net.au>
To: linux-raid@vger.kernel.org
Subject: Re: RAID5 producing fake partition table on single drive
Date: Mon, 21 Aug 2006 18:03:45 +1000
Message-ID: <1156147425.4525.20.camel@localhost.localdomain>
In-Reply-To: <17641.25141.373827.77279@cse.unsw.edu.au>
On Mon, 2006-08-21 at 17:35 +1000, Neil Brown wrote:
> On Saturday August 19, l3mming@iinet.net.au wrote:
> > Hi all,
> >
> > I'm having a problem with my RAID5 array, here's the deal:
> >
> > System is an AMD Athlon 64 X2 4200+ on a Gigabyte K8NS-939-Ultra
> > (nForce3 Ultra). Linux 2.6.17.7, x86_64. Debian GNU/Linux Sid, GCC 4.1.1
> > (kernel configured and compiled by hand).
> >
> > RAID5 array created using mdadm 2.5.2. All drives are 250Gb Seagate
> > SATAs, spread across three controllers: nForce3 Ultra (motherboard),
> > Silicon Image 3124 (motherboard) and Promise SATA TX300 (PCI).
> >
> > /dev/sda: ST3250624NS
> > /dev/sdb: ST3250624NS
> > /dev/sdc: ST3250823AS
> > /dev/sdd: ST3250624NS
> > /dev/sde: ST3250823AS
> >
> > The array assembles and runs perfectly at boot, and continues to operate
> > without errors, and has been for a few months. It is using a version
> > 0.90 superblock. None of the devices were partitioned with fdisk, they
> > were just passed to mdadm when the array was created.
> >
> > Recently (last week or two), I have noticed the following in dmesg:
> >
> > SCSI device sde: 488397168 512-byte hdwr sectors (250059 MB)
> > sde: Write Protect is off
> > sde: Mode Sense: 00 3a 00 00
> > SCSI device sde: drive cache: write back
> > SCSI device sde: 488397168 512-byte hdwr sectors (250059 MB)
> > sde: Write Protect is off
> > sde: Mode Sense: 00 3a 00 00
> > SCSI device sde: drive cache: write back
> > sde: sde1 sde3
>
> This itself shouldn't be a problem. The fact that the kernel imagines
> there are partitions shouldn't hurt as long as no-one tries to access
> them.
This is where I'm having the problem: lilo fails because of the bogus
partition table. Here's the output:
# lilo
part_nowrite: read:: Input/output error
and this is what appears in dmesg/syslog when lilo runs:
printk: 537 messages suppressed.
Buffer I/O error on device sde3, logical block 0
Buffer I/O error on device sde3, logical block 1
Buffer I/O error on device sde3, logical block 2
Buffer I/O error on device sde3, logical block 3
Buffer I/O error on device sde3, logical block 4
Buffer I/O error on device sde3, logical block 5
Buffer I/O error on device sde3, logical block 6
Buffer I/O error on device sde3, logical block 7
Buffer I/O error on device sde3, logical block 8
Buffer I/O error on device sde3, logical block 9
>
> > sd 6:0:0:0: Attached scsi disk sde
> >
> > Buffer I/O error on device sde3, logical block 1792
> > Buffer I/O error on device sde3, logical block 1793
> > Buffer I/O error on device sde3, logical block 1794
> > Buffer I/O error on device sde3, logical block 1795
> > Buffer I/O error on device sde3, logical block 1796
> > Buffer I/O error on device sde3, logical block 1797
> > Buffer I/O error on device sde3, logical block 1798
> > Buffer I/O error on device sde3, logical block 1799
> > Buffer I/O error on device sde3, logical block 1792
> > Buffer I/O error on device sde3, logical block 1793
>
> This, on the other hand, might be a problem - though possibly only a
> small one.
> Who is trying to access sde3 I wonder. I'm fairly sure the kernel
> wouldn't do that directly.
>
> Maybe some 'udev' related thing is trying to be clever?
The buffer I/O errors above (logical block 1792+) occur as filesystems
are being automounted. /dev/sde* doesn't appear in /etc/fstab, of course.
> Apart from these messages, are there any symptoms that cause a problem?
> It could just be that something is reading from somewhere that doesn't
> exist, and is complaining. So let them complain. Who cares :-)
There are no problems with any software apart from lilo so far. fdisk
works (since it doesn't scan all block devices on startup). Gparted
might fail, though I haven't tried it (it does scan all block devices on
startup). And yep, it sounds about right that something is reading from
somewhere that doesn't exist - the bogus partition table on /dev/sde
would suggest as much.
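For reference, here's one way to compare the kernel's idea of the
partitions with what's actually on the disk. These are just standard
tools run against my device names; nothing below is md-specific:

# cat /proc/partitions      (should still list sde1 and sde3, since the
                             kernel registered them at scan time)
# fdisk -l /dev/sde         (prints whatever bytes in sector 0 happen to
                             resemble an msdos partition table)
# dd if=/dev/sde bs=512 count=1 | hexdump -C | tail
                            (eyeball the 0xaa55 signature and the fake
                             partition entries near the end of sector 0)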
> >
> > I'm not a software/kernel/RAID developer by any stretch of the
> > imagination, but my thoughts are that I've just been unlucky with my
> > array and that the data on there has somehow managed to look like a
> > partition table, and the kernel is trying to read it, resulting in the
> > buffer IO errors.
>
> But these errors aren't necessarily a problem (I admit they look scary).
>
> >
> > I believe a solution to this problem would be for me to create proper
> > partitions on my RAID disks (with type fd I suspect?), and possibly use
> > a version 1.x superblock rather than 0.90.
>
> Creating partitions and then raiding them would remove these messages.
> Using a version 1.1 or 1.2 superblock would also do it (as they put the
> superblock at the start of the device instead of at the end).
>
> But is it worth the effort?
A few questions, in search of the best possible solution. I believe
this is worth the effort, since otherwise I can't run lilo without
stopping the array and removing /dev/sde from the system.
1. Is it possible to have mdadm or another tool automatically convert
the superblocks to v1.1/1.2 (and perhaps create proper partitions)?
2. If number 1 isn't possible, is it possible to convert one drive at a
time to have a proper partition table? Something like this: stop the
array; run fdisk /dev/sde and create a single partition of type fd
covering the entire disk, then save the partition table; start the
array. (I'd assume mdadm would then notice that /dev/sde has changed and
possibly start a resync? If not, and it just works, then great!) If that
works, I'd do the same to every other drive, one at a time. I've
sketched the commands I have in mind below.
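Concretely, something along these lines, with /dev/md0 standing in for
my array's real name (and I'm not at all sure mdadm would still find the
0.90 superblock through the new partition, which is really what I'm
asking):

# mdadm --stop /dev/md0
# fdisk /dev/sde            (one primary partition spanning the disk, type fd)
# mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde1

and then repeat for each of the other drives once any resync has finished.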
Thanks for your help, Neil.
>
> NeilBrown