physical size of the device inconsistent with superblock, after RAID problems

From: Gavin Flower @ 2011-02-15  3:14 UTC (permalink / raw)
  To: ext3-users; +Cc: neilb, linux-raid

Hi,

I would appreciate advice on recovering from the following situation, after an aborted mdadm resizing operation and subsequent recovery actions:

/dev/md1: The filing system size (according to the superblock) is 76799952 blocks
The physical size of the device is 76799616
Either the superblock or the partition table is likely to be corrupt!

/dev/md1: UNEXPECTED INCONSISTENCY: RUN fsck manually
(i.e. without -a or -p options)

fsck.ext4 -f -n /dev/md1 output:

e2fsck 1.41.12 (17-May-2010)
The filesystem size (according to the superblock) is 76799952 blocks
The physical size of the device is 76799616 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort? no

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -9626 -(9728--9752) +(405344--405369)
Fix? no

/dev/md1: ********** WARNING: Filesystem still has errors **********

/dev/md1: 1693644/19202048 files (0.3% non-contiguous), 54273929/76799952 blocks
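
To put the mismatch reported by e2fsck above into numbers, here is a minimal sketch. The two block counts are copied from the output; the 4 KiB block size is an assumption (the usual ext4 default), not something stated in the output itself.

```python
# Block counts copied from the e2fsck output above.
superblock_blocks = 76799952
device_blocks = 76799616

# Assumption: 4 KiB filesystem blocks (not stated in the e2fsck output).
block_size_kib = 4

# How far the filesystem extends past the end of the device.
shortfall_blocks = superblock_blocks - device_blocks
shortfall_kib = shortfall_blocks * block_size_kib
shortfall_fraction = shortfall_blocks / superblock_blocks

print(f"filesystem extends {shortfall_blocks} blocks "
      f"({shortfall_kib} KiB) past the end of the device")
print(f"that is {shortfall_fraction:.6%} of the filesystem")
```

Under that assumption the filesystem claims 336 blocks (1344 KiB) more than the device provides.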

Note that the original size, according to mdadm, was not a multiple of 512KB, so I reduced it to the largest multiple of 512KB below the original size using the --size option of mdadm.  So my second attempt to reshape, using the 512KB chunk size, started okay.  The previous chunk size was 64KB.
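
The rounding described above can be sketched as follows. The starting per-device size is not quoted anywhere in this thread; the value used here is a reconstruction that assumes 4 KiB filesystem blocks, a filesystem that originally filled the array exactly, and 3 data devices (5-device RAID-6). The helper name `round_down_to_chunk` is made up for this illustration.

```python
def round_down_to_chunk(size_kib: int, chunk_kib: int) -> int:
    """Largest multiple of chunk_kib that does not exceed size_kib."""
    return size_kib - (size_kib % chunk_kib)

# Assumption: the original filesystem (76799952 blocks of 4 KiB) filled the
# array exactly, and the array has 5 - 2 = 3 data devices (RAID-6).
original_array_kib = 76799952 * 4
original_per_device_kib = original_array_kib // 3

# The old per-device size is a whole number of 64 KiB chunks,
# but not a whole number of 512 KiB chunks.
old_chunk_remainder = original_per_device_kib % 64
new_chunk_remainder = original_per_device_kib % 512

new_per_device_kib = round_down_to_chunk(original_per_device_kib, 512)
print(original_per_device_kib, "->", new_per_device_kib)
```

Under these assumptions the result is 102399488 KiB per device, which agrees with the "Used Dev Size" that mdadm --detail reports later in this thread.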

Note I am using Fedora 14, up-to-date as of Friday February 11th, and that there are 5 x 500GB drives, with 3 RAID-6 arrays:
/dev/md0 swap
/dev/md1 mostly user data (the problematic one)
/dev/md2 distribution & O/S files
plus /boot on a non-RAID ext4 partition

Sequence of events:
Reshaped /dev/md1 using mdadm, without first reducing size of the ext4 filesystem.

The process of reshaping /dev/md1 was about 20% through when I killed it.

System appeared okay.

I rebooted a few minutes later, but shortly after I selected the kernel, it stopped, and I dropped into a shell.

With the help of Neil Brown, I made some progress and /dev/md1 reshaping appeared to have completed without error.

However, on the next reboot I got the INCONSISTENCY message.

Will it be safe to simply accept fsck's offer to fix, or are there other things I should do?

Thanks,
Gavin 

--
All Adults share the Responsibility
to help Raise Today's Children,
for they are Tomorrow's Society!


* Re: physical size of the device inconsistent with superblock, after RAID problems
From: Gavin Flower @ 2011-02-17 23:53 UTC (permalink / raw)
  To: neilb; +Cc: ext3-users, linux-raid

Hi Neil,

My attempted post to ext3-users@redhat.com had still not been published there as of a minute ago (even though I emailed it 4 days ago!).

I finally bit the bullet and went ahead.

I accepted the fixes put forward by fsck associated with bitmap differences, and rebooted.

Still problems.

I still had the discrepancy in the file system size.  So I ran the command:

resize2fs -p /dev/md1 76799616

I used the smaller of the 2 block counts, as:
(a) I needed to reduce the file system size, because I had already reduced the RAID size (I _SHOULD_ have done this first, before resizing the RAID), and
(b) it is reported as the 'physical' size of the device, so it is likely to be the correct value IMHO
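
As a cross-check on the value passed to resize2fs, here is a minimal sketch that derives the largest block count the device can hold. It assumes 4 KiB filesystem blocks and takes the array size in KiB from the mdadm --detail output quoted later in this thread; `max_fs_blocks` is a name invented for the example.

```python
def max_fs_blocks(device_kib: int, block_kib: int = 4) -> int:
    """Largest whole number of filesystem blocks that fits on the device."""
    return device_kib // block_kib

# "Array Size" in KiB, from the mdadm --detail output later in the thread.
array_size_kib = 307198464

target_blocks = max_fs_blocks(array_size_kib)

# The superblock's old count does not fit; the 'physical' count does.
old_count_fits = 76799952 <= target_blocks
new_count_fits = 76799616 <= target_blocks

print(target_blocks, old_count_fits, new_count_fits)
```

With those inputs the target comes out as 76799616 blocks, the same 'physical' size that e2fsck reported.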

The system then came up successfully after a reboot, and I was able to log in as normal.

There was no apparent loss of data, though I did not do an exhaustive, systematic check. However, several users have logged on successfully, it is playing its part as gateway to the Internet, and squid appears to be providing its normal functionality.

Neil, your help and encouragement was/is greatly appreciated!

Thanks,
Gavin

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: physical size of the device inconsistent with superblock, after RAID problems
From: NeilBrown @ 2011-02-18  1:51 UTC (permalink / raw)
  To: Gavin Flower; +Cc: ext3-users, linux-raid

Excellent!  I'm glad you found a way through.
As you didn't really trim very much from your device it is certainly possible
that no critical data was there.  Quite possibly resize2fs would have told
you if there was (I certainly hope it would have done).

NeilBrown

* Re: physical size of the device inconsistent with superblock, after RAID problems
From: Gavin Flower @ 2011-02-18  3:50 UTC (permalink / raw)
  To: neilb; +Cc: ext3-users, linux-raid


Hi Neil,

With about 26% spare capacity on md1 (the problematic RAID 6; see the df output below), it was probably (?) unlikely that anything would be lost by trimming a tiny fraction of a percent from the end.

However, since the md1 device actually resides on 5 real physical drives, reality is almost certainly more complicated, which possibly explains the bitmap discrepancies (now I'm firmly outside my area of expertise!).
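
The two figures above (spare capacity and the size of the trim) can be reproduced with a short sketch. The used and available values are the /dev/md1 row of the df output below; the block counts are the ones e2fsck reported earlier in the thread. The assumption that df rounds Use% up and leaves reserved blocks out of the denominator reflects my understanding of GNU df, so treat it as such.

```python
import math

# /dev/md1 row of the df output below, in 1K-blocks.
used_kib = 212244524
available_kib = 74773476

# Assumption: df reports used / (used + available), rounded up,
# so reserved blocks are not part of the denominator.
usable_kib = used_kib + available_kib
use_percent = math.ceil(100 * used_kib / usable_kib)
spare_percent = 100 - use_percent

# Fraction of the filesystem trimmed by shrinking it to the device size
# (block counts from the earlier e2fsck output).
trimmed_fraction = (76799952 - 76799616) / 76799952

print(f"Use% {use_percent}, spare {spare_percent}%")
print(f"trimmed {trimmed_fraction:.6%} of the filesystem")
```

This only shows that the trimmed region is small relative to the free space; it says nothing about whether those particular blocks were in use.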

# df
Filesystem  1K-blocks       Used    Available   Use%   Mounted on
/dev/md2   1097254408   27547660   1013969456     3%   /
tmpfs         4097108        772      4096336     1%   /dev/shm
/dev/sda1     1032088     129800       849860    14%   /boot
/dev/md1    302377920  212244524     74773476    74%   /data
# mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Thu Dec  3 13:05:02 2009
     Raid Level : raid6
     Array Size : 307198464 (292.97 GiB 314.57 GB)
  Used Dev Size : 102399488 (97.66 GiB 104.86 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Feb 18 15:09:50 2011
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           UUID : 6f1176ae:a0ad6cac:bfe78010:bc810f04
         Events : 0.3389728

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       8       66        2      active sync   /dev/sde2
       3       8       50        3      active sync   /dev/sdd2
       4       8       34        4      active sync   /dev/sdc2
#
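
The sizes in the mdadm output above are consistent with the usual RAID-6 capacity rule (two devices' worth of space goes to parity). A minimal sketch of that check, with `raid6_array_size_kib` being a name made up for this example:

```python
def raid6_array_size_kib(per_device_kib: int, raid_devices: int) -> int:
    """Usable capacity of a RAID-6 array: (n - 2) devices' worth of data."""
    if raid_devices < 4:
        raise ValueError("RAID-6 needs at least 4 devices")
    return per_device_kib * (raid_devices - 2)

# Values from the mdadm --detail output above, in KiB.
used_dev_size_kib = 102399488
raid_devices = 5

array_size_kib = raid6_array_size_kib(used_dev_size_kib, raid_devices)

# Per-device size expressed in 512 KiB chunks; the remainder should be zero.
chunks_per_device, leftover_kib = divmod(used_dev_size_kib, 512)

print(array_size_kib, chunks_per_device, leftover_kib)
```

The computed array size matches the reported "Array Size" of 307198464, and the per-device size is a whole number of 512K chunks.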


Cheers,
Gavin






      
