linux-raid.vger.kernel.org archive mirror
* reshape from raid5 to raid6 failed -- and backup file is gone?
From: Rich Rauenzahn @ 2010-04-22 18:32 UTC
  To: linux-raid

The system froze sometime in the middle of the night, and oddly... the
--backup-file I specified last night is missing.  Is there any reason
that mdadm would unlink it for a short span of time during the
reshape?

The raid5 array was 4x500GB drives; I was reshaping to raid6 by adding
another 500GB drive.

$ sudo mdadm --assemble /dev/md1
mdadm: Failed to restore critical section for reshape, sorry.
      Possibly you needed to specify the --backup-file

FC12, using

$ mdadm --version
mdadm - v3.1.1 - 19th November 2009

$ sudo mdadm --assemble /dev/md1 --backup-file=/.MEDIA/tmp/mdadm.backup
mdadm: backup file /.MEDIA/tmp/mdadm.backup inaccessible: No such file or directory
mdadm: Failed to restore critical section for reshape, sorry.

[rrauenza@tendo ~]$

Hmmm...

I wonder if I hit this, as I might have had a drive failure based on
some extra noise at bootup:

3/ There is another bug whereby if one of the devices in the array dies
  during the reshape, the backup process stops working correctly with the
  result that the reshape goes much faster but the backup is completely
  useless.  If you crash during the reshape after a failed device,
  you will probably lose data.  If you try to stop and restart the
  array after one device has failed, the restart will fail.  However
  this is still the safest thing to do.  I will try to put out some
  updates to mdadm so that you can reassemble the array safely in this
  case (and of course, fix the problem so that the backup is maintained
  throughout the entire run).

I'm currently running a smartctl -t long against all of the drives...

Any other ideas in the meantime?   I suspect I've lost my 1.5 TB array :(

Rich

* Re: reshape from raid5 to raid6 failed -- and backup file is gone?
From: Rich Rauenzahn @ 2010-04-22 21:46 UTC
  To: linux-raid; +Cc: neilb

On Thu, Apr 22, 2010 at 11:32 AM, Rich Rauenzahn <rrauenza@gmail.com> wrote:
> The system froze sometime in the middle of the night, and oddly... the
> --backup-file I specified last night is missing.  Is there any reason
> that mdadm would unlink it for a short span of time during the
> reshape?

I'm at the point now where I'm just looking for triage advice so I can
work on it tonight. Most of the contents were backed up offsite, but a
little bit out of date.  Nothing critical, but inconvenient, like
recovery images for relatives' systems.

Do I kill it and start over, or is there any hope of recovering from
the reshape?
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: reshape from raid5 to raid6 failed -- and backup file is gone?
From: Neil Brown @ 2010-04-22 22:09 UTC
  To: Rich Rauenzahn; +Cc: linux-raid

On Thu, 22 Apr 2010 14:46:32 -0700
Rich Rauenzahn <rrauenza@gmail.com> wrote:

> On Thu, Apr 22, 2010 at 11:32 AM, Rich Rauenzahn <rrauenza@gmail.com> wrote:
> > The system froze sometime in the middle of the night, and oddly... the
> > --backup-file I specified last night is missing.  Is there any reason
> > that mdadm would unlink it for a short span of time during the
> > reshape?
> 
> I'm at the point now where I'm just looking for triage advice so I can
> work on it tonight. Most of the contents were backed up offsite, but a
> little bit out of date.  Nothing critical, but inconvenient, like
> recovery images for relatives' systems.
> 
> Do I kill it and start over, or is there any hope of recovering from
> the reshape?


It is very odd that the backup file disappeared.  What is
   /.MEDIA/tmp
?? Not a tmpfs filesystem I hope...

You could hack mdadm to assemble the array even without a backup file.
Get 'Grow_restart' to always return 0.

Then most of your data should be accessible, but there could be a section in
the middle which is corrupt.

NeilBrown

* Re: reshape from raid5 to raid6 failed -- and backup file is gone?
From: Rich Rauenzahn @ 2010-04-22 22:26 UTC
  To: Neil Brown; +Cc: linux-raid

>
>
> It is very odd that the backup file disappeared.  What is
>   /.MEDIA/tmp
> ?? Not a tmpfs filesystem I hope...

No, a persistent filesystem on another disk -- with no autodelete
scripts or anything.  Very very weird that it disappeared.   I saw it
before I went to bed... was about 50MB if I recall.  Unless the fsck
at boot removed it -- but I would imagine you make sure you fsync that
file to disk before you proceed?   I think /.MEDIA is ext4.

> You could hack mdadm to assemble the array even without a backup file.
> Get 'Grow_restart' to always return 0.
>
> Then most of your data should be accessible, but there could be a section in
> the middle which is corrupt.

As long as fsck can bring it back to a consistent state, I'm ok with
that.   But I would guess some file contents could be silently
corrupt?

* Re: reshape from raid5 to raid6 failed -- and backup file is gone?
From: Neil Brown @ 2010-04-22 22:37 UTC
  To: Rich Rauenzahn; +Cc: linux-raid

On Thu, 22 Apr 2010 15:26:13 -0700
Rich Rauenzahn <rrauenza@gmail.com> wrote:

> >
> >
> > It is very odd that the backup file disappeared.  What is
> >   /.MEDIA/tmp
> > ?? Not a tmpfs filesystem I hope...
> 
> No, a persistent filesystem on another disk -- with no autodelete
> scripts or anything.  Very very weird that it disappeared.   I saw it
> before I went to bed... was about 50MB if I recall.  Unless the fsck
> at boot removed it -- but I would imagine you make sure you fsync that
> file to disk before you proceed?   I think /.MEDIA is ext4.
> 

I don't fsync the directory containing the file, but I do fsync the file
quite regularly.
But if the system ran for even just one minute after the --grow started the
directory would have been flushed to disk so the absence of an fsync on the
directory cannot have been a problem... not that I'm convinced it could be a
problem anyway.
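For reference, a minimal sketch of the general POSIX pattern being discussed:
fsync() the file so its contents are durable, then fsync() the directory that
contains it so the directory entry itself is durable. This is NOT mdadm's code;
the function name write_durably and its error handling are hypothetical, and it
assumes a Linux/POSIX system where a directory can be opened with O_DIRECTORY
and passed to fsync().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of mdadm): write buf to path, fsync the file,
 * then fsync its parent directory so the new directory entry also reaches
 * stable storage. Returns 0 on success, -1 on any error. */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is written. */
    const char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* Step 1: flush the file's data and metadata. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Step 2: flush the directory holding the entry for this file.
     * dirname() may modify its argument, so operate on a copy. */
    char *copy = strdup(path);
    if (copy == NULL)
        return -1;
    int dfd = open(dirname(copy), O_RDONLY | O_DIRECTORY);
    free(copy);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc == 0 ? 0 : -1;
}
```

As Neil notes, on a system that keeps running for a while the directory entry
is normally written back anyway; the explicit directory fsync() only narrows
the window right after the file is created.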


> > You could hack mdadm to assemble the array even without a backup file.
> > Get 'Grow_restart' to always return 0.
> >
> > Then most of your data should be accessible, but there could be a section in
> > the middle which is corrupt.
> 
> As long as fsck can bring it back to a consistent state, I'm ok with
> that.   But I would guess some file contents could be silently
> corrupt?

Yes, there could be silent corruption.  Or there might be no corruption.
If the backup file was about 50MB (which seems likely) then the total amount
of corruption should not exceed 50MB, though if an indexing block was lost
you could lose more than 50MB of a single file.

NeilBrown