s2disk and raid


* s2disk and raid
@ 2007-04-03 15:55 Tim Dijkstra
  2007-04-03 16:34 ` [Suspend-devel] " Stefan Seyfried
  2007-04-04  5:20 ` Neil Brown
  0 siblings, 2 replies; 9+ messages in thread
From: Tim Dijkstra @ 2007-04-03 15:55 UTC (permalink / raw)
  To: suspend-devel List, linux-raid; +Cc: 415441

[-- Attachment #1.1: Type: text/plain, Size: 1826 bytes --]

Hi,

I've got a bug report [0] from a user trying to use raid and uswsusp. He's
using initramfs-tools available in Debian. I'll describe the problem
and my analysis, maybe you can comment on what you think. A warning: I only
have a casual understanding of raid, never looked at any code related to it.

This is a setup where root may be on raid, but swap isn't. Swap on raid
will be very difficult to support, I think.

When s2disk is started, nothing special is done to the array. It may be
in an unclean state (just like filesystems). Image is written to disk.

After the power cycle the kernel boots, devices are discovered, among
which the ones holding raid. Then we try to find the device that holds
swap in case of resume and / in case of a normal boot.

Now comes a crucial point. The script that finds the raid array, finds
the array in an unclean state and starts syncing.

After this, resume finds an image in the swap partition and starts the
resume process. Part of this process is freezing everything but itself,
which fails on the process/thread that does the syncing.

IMO, the problem comes from the fact that we started syncing before we could
start resume.

Now the problem could theoretically be solved by not starting the
assembly of the array once it is discovered, but modifying the 
initramfs to do the assembly after we have had the chance to resume.
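
To make the proposed ordering concrete, here is a minimal sketch in POSIX
shell. The function names (try_resume, assemble_arrays, mount_root and
boot_sequence) are hypothetical placeholders, not the actual initramfs-tools
hooks, and the sketch assumes, as in this setup, that the resume image does
not live on the array.

```shell
#!/bin/sh
# Sketch only: all three hooks are hypothetical stand-ins, not real
# initramfs-tools scripts.

# Would hand control to the resume binary. On a successful resume it never
# returns; this placeholder just reports "no image found".
try_resume() { return 1; }

# Would run the mdadm assembly step.
assemble_arrays() { echo "assemble arrays"; }

# Would locate and mount the root filesystem.
mount_root() { echo "mount root"; }

# Resume gets its chance first; arrays are only touched on a normal boot.
boot_sequence() {
    if try_resume; then
        return 0
    fi
    assemble_arrays
    mount_root
}
```

The point of the sketch is only the order of the calls: nothing that can
write to the array runs until the resume attempt has been ruled out.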

The Debian maintainer of mdadm thinks that the suspend process should
have left the array in a clean state, but this is IMHO impossible. We
are freezing userspace. An mdadm process looking after the array will
probably get into trouble if we come back from suspend and we have
done something to the array in the meantime.

What do you think?

grts Tim

[0] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=415441

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


[-- Attachment #3: Type: text/plain, Size: 170 bytes --]

_______________________________________________
Suspend-devel mailing list
Suspend-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/suspend-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Suspend-devel] s2disk and raid
  2007-04-03 15:55 s2disk and raid Tim Dijkstra
@ 2007-04-03 16:34 ` Stefan Seyfried
  2007-04-03 19:00   ` Rafael J. Wysocki
  2007-04-04  5:20 ` Neil Brown
  1 sibling, 1 reply; 9+ messages in thread
From: Stefan Seyfried @ 2007-04-03 16:34 UTC (permalink / raw)
  To: Tim Dijkstra; +Cc: suspend-devel List, linux-raid, 415441

On Tue, Apr 03, 2007 at 05:55:21PM +0200, Tim Dijkstra wrote:
> Hi,
> 
> I've got a bugreport [0] from a user trying to use raid and uswsusp. He's
> using initramfs-tools available in debian. I'll describe the problem
> and my analysis, maybe you can comment on what you think. A warning: I only
> have a casual understanding of raid, never looked at any code related to it.
> 
> This is a setup where root may be on raid, but swap isn't. Swap on raid
> will be very difficult to support, I think.

"it depends" :-)

> Now comes a crucial point. The script that finds the raid array, finds
> the array in an unclean state and starts syncing.

bad. Don't do that. Data will be lost.

> After this, resume finds an image in the swap partition and starts the
> resume process. Part of this process is freezing everything but itself,
> which fails on the process/thread that does the syncing.
> 
> IMO, the problem comes from the fact we started syncing, before we could
> start resume. 

Yes.
 
> Now the problem could theoretically be solved by not starting the
> assembly of the array once it is discovered, but modifying the 
> initramfs to do the assembly after we have had the chance to resume.

Yes.

> The debian-maintainer of mdadm thinks that the suspend process should
> have left the array in a clean state, but this is IMHO impossible. We
> are freezing userspace. An mdadm process looking after the array will
> probably get into trouble if we come back from suspend and we have
> done something to the array in the mean time.

Yes.
 
> What do you think?

You are right, he is wrong.
Do not touch anything before resume.
-- 
Stefan Seyfried

"Any ideas, John?"
"Well, surrounding them's out." 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Suspend-devel] s2disk and raid
  2007-04-03 16:34 ` [Suspend-devel] " Stefan Seyfried
@ 2007-04-03 19:00   ` Rafael J. Wysocki
  0 siblings, 0 replies; 9+ messages in thread
From: Rafael J. Wysocki @ 2007-04-03 19:00 UTC (permalink / raw)
  To: Stefan Seyfried, Tim Dijkstra; +Cc: suspend-devel, linux-raid, 415441

On Tuesday, 3 April 2007 18:34, Stefan Seyfried wrote:
> On Tue, Apr 03, 2007 at 05:55:21PM +0200, Tim Dijkstra wrote:
> > Hi,
> > 
> > I've got a bugreport [0] from a user trying to use raid and uswsusp. He's
> > using initramfs-tools available in debian. I'll describe the problem
> > and my analysis, maybe you can comment on what you think. A warning: I only
> > have a casual understanding of raid, never looked at any code related to it.
> > 
> > This is a setup where root may be on raid, but swap isn't. Swap on raid
> > will be very difficult to support, I think.
> 
> "it depends" :-)

Theoretically (and I mean it), we can do that.

> > Now comes a crucial point. The script that finds the raid array, finds
> > the array in an unclean state and starts syncing.
> 
> bad. Don't do that. Data will be lost.
> 
> > After this, resume finds an image in the swap partition and starts the
> > resume process. Part of this process is freezing everything but itself,
> > which fails on the process/thread that does the syncing.
> > 
> > IMO, the problem comes from the fact we started syncing, before we could
> > start resume. 
> 
> Yes.
>  
> > Now the problem could theoretically be solved by not starting the
> > assembly of the array once it is discovered, but modifying the 
> > initramfs to do the assembly after we have had the chance to resume.
> 
> Yes.
> 
> > The debian-maintainer of mdadm thinks that the suspend process should
> > have left the array in a clean state, but this is IMHO impossible. We
> > are freezing userspace. An mdadm process looking after the array will
> > probably get into trouble if we come back from suspend and we have
> > done something to the array in the mean time.
> 
> Yes.
>  
> > What do you think?
> 
> You are right, he is wrong.
> Do not touch anything before resume.

Definitely.  The same applies to the built-in swsusp, BTW.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: s2disk and raid
  2007-04-03 15:55 s2disk and raid Tim Dijkstra
  2007-04-03 16:34 ` [Suspend-devel] " Stefan Seyfried
@ 2007-04-04  5:20 ` Neil Brown
  2007-04-04 18:53   ` Tim Dijkstra
                     ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: Neil Brown @ 2007-04-04  5:20 UTC (permalink / raw)
  To: Tim Dijkstra; +Cc: suspend-devel List, linux-raid, 415441

On Tuesday April 3, newsuser@famdijkstra.org wrote:
> Hi,
> 
> I've got a bugreport [0] from a user trying to use raid and uswsusp. He's
> using initramfs-tools available in debian. I'll describe the problem
> and my analysis, maybe you can comment on what you think. A warning: I only
> have a casual understanding of raid, never looked at any code related to it.
> 
> This is a setup where root may be on raid, but swap isn't. Swap on raid
> will be very difficult to support, I think.

Nah... shouldn't be a problem.... well, maybe raid5.

> 
> When s2disk is started, nothing special is done to the array. It may be
> in an unclean state (just like filesystems). Image is written to disk.
> 
> After the power cycle the kernel boots, devices are discovered, among
> which the ones holding raid. Then we try to find the device that holds
> swap in case of resume and / in case of a normal boot.
> 
> Now comes a crucial point. The script that finds the raid array, finds
> the array in an unclean state and starts syncing.

Uhm, so you are finding the device for the root filesystem before you
have decided which case it will be (resume or normal boot).  Can that
be delayed until after the decision?  It's probably not important but
it seems neater.
Or do you need the root device even when resuming (I guess if swap is
in a file on the root filesystem....)

The trick is to use the 'start_ro' module parameter.
  echo 1 > /sys/module/md_mod/parameters/start_ro

Then md will start arrays assuming read-only.  No resync will be
started, no superblock will be written.  They stay this way until the
first write at which point they become normal read-write and any
required resync starts.

So you can start arrays 'readonly', and resume off a raid1 without any
risk of the resync starting when it shouldn't.

It is probably best to 'echo 0 > ....' once you have committed to a
normal boot, but it isn't really critical.
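
A minimal sketch of how an initramfs script might wrap this. The sysfs path
is the one given above; the MD_START_RO variable and the set_start_ro
function are assumptions made up for illustration, not part of mdadm or
initramfs-tools.

```shell
#!/bin/sh
# Path from the message above; overridable so the helper can be exercised
# outside a real initramfs (MD_START_RO is a hypothetical variable name).
MD_START_RO="${MD_START_RO:-/sys/module/md_mod/parameters/start_ro}"

# set_start_ro VALUE
#   Write 0 or 1 to the start_ro module parameter.
#   Returns 2 for any other value, 1 if the parameter file is not writable
#   (for example, md_mod is not loaded).
set_start_ro() {
    case "$1" in
        0|1) ;;
        *) return 2 ;;
    esac
    [ -w "$MD_START_RO" ] || return 1
    echo "$1" > "$MD_START_RO"
}

# Intended use (not executed here):
#   set_start_ro 1    # before any array is assembled
#   ...assemble arrays, attempt resume...
#   set_start_ro 0    # once committed to a normal boot
```

Failing quietly when the file is missing lets the same script run on
systems without md, which is presumably what a generic initramfs would want.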

> 
> The debian-maintainer of mdadm thinks that the suspend process should
> have left the array in a clean state, but this is IMHO impossible.

It probably would be best if suspend left the process in a clean
state.  It shouldn't be too hard, but it needs to be done in the
kernel.
However it isn't critical to all of this working well.

I mentioned above that if swap is on raid5 it might be awkward.  This
is because raid5 caches some data that is on disk.  If you snapshot
the raid5 memory, then resume raid5 so it can write to disk, when you
come back from suspend you could have old data in the cache.  It
should be possible to fix this, but it is currently a potential
problem that might be worth warning people against.

NeilBrown

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: s2disk and raid
  2007-04-04  5:20 ` Neil Brown
@ 2007-04-04 18:53   ` Tim Dijkstra
  2007-04-04 20:47   ` Michael Tokarev
  2007-04-06  9:08   ` Luca Berra
  2 siblings, 0 replies; 9+ messages in thread
From: Tim Dijkstra @ 2007-04-04 18:53 UTC (permalink / raw)
  To: Neil Brown, suspend-devel List, linux-raid, 415441


[-- Attachment #1.1: Type: text/plain, Size: 3371 bytes --]

On Wed, 4 Apr 2007 15:20:56 +1000
Neil Brown <neilb@suse.de> wrote:

> On Tuesday April 3, newsuser@famdijkstra.org wrote:
> > Hi,
> > 
> > I've got a bugreport [0] from a user trying to use raid and uswsusp. He's
> > using initramfs-tools available in debian. I'll describe the problem
> > and my analysis, maybe you can comment on what you think. A warning: I only
> > have a casual understanding of raid, never looked at any code related to it.
> > 
> > This is a setup where root may be on raid, but swap isn't. Swap on raid
> > will be very difficult to support, I think.
> 
> Nah... shouldn't be a problem.... well, maybe raid5.

OK, that is nice to hear.

> > 
> > When s2disk is started, nothing special is done to the array. It may be
> > in an unclean state (just like filesystems). Image is written to disk.
> > 
> > After the power cycle the kernel boots, devices are discovered, among
> > which the ones holding raid. Then we try to find the device that holds
> > swap in case of resume and / in case of a normal boot.
> > 
> > Now comes a crucial point. The script that finds the raid array, finds
> > the array in an unclean state and starts syncing.
> 
> Uhm, so you are finding the device for the root filesystem before you
> have decided which case it will be (resume or normal boot).  Can that
> be delayed until after the decision.  It's probably not important but
> it seems neater.
> Or do you need the root device even when resuming (I guess if swap is
> in a file on the root filesystem....)

It is not that we need the root filesystem for resume. It is more how
the initramfs is currently set up. To be as general as possible, all
partitions are discovered, of which one will contain the image.

> The trick is to use the 'start_ro' module parameter.
>   echo 1 > /sys/module/md_mod/parameters/start_ro
> 
> Then md will start arrays assuming read-only.  No resync will be
> started, no superblock will be written.  They stay this way until the
> first write at which point they become normal read-write and any
> required resync starts.
> 
> So you can start arrays 'readonly', and resume off a raid1 without any
> risk of the resync starting when it shouldn't.
> 
> It is probably best to 'echo 0 > ....' once you have committed to a
> normal boot, but it isn't really critical.

This is very good to know. I think we can work out something with the
debian-maintainer based on this.

> > 
> > The debian-maintainer of mdadm thinks that the suspend process should
> > have left the array in a clean state, but this is IMHO impossible.
> 
> It probably would be best if suspend left the process in a clean
> state.  It shouldn't be too hard, but it needs to be done in the
> kernel.
> However it isn't critical to all of this working well.
> 
> I mentioned above that if swap is on raid5 it might be awkward.  This
> is because raid5 caches some data that is on disk.  If you snapshot
> the raid5 memory, then resume raid5 so it can write to disk, when you
> come back from suspend you could have old data in the cache.  It
> should be possible to fix this, but it is currently a potential
> problem that might be worth warning people against.

OK, if we can support suspend and raid and even with swap on raid0 or
raid1, I'm happy.

Thanks for the input.

grts Tim

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: s2disk and raid
  2007-04-04  5:20 ` Neil Brown
  2007-04-04 18:53   ` Tim Dijkstra
@ 2007-04-04 20:47   ` Michael Tokarev
  2007-04-12  5:37     ` Luis Rodrigo Gallardo Cruz
  2007-04-06  9:08   ` Luca Berra
  2 siblings, 1 reply; 9+ messages in thread
From: Michael Tokarev @ 2007-04-04 20:47 UTC (permalink / raw)
  To: Neil Brown; +Cc: Tim Dijkstra, suspend-devel List, linux-raid, 415441

Neil Brown wrote:
> On Tuesday April 3, newsuser@famdijkstra.org wrote:
[]
>> After the power cycle the kernel boots, devices are discovered, among
>> which the ones holding raid. Then we try to find the device that holds
>> swap in case of resume and / in case of a normal boot.
>>
>> Now comes a crucial point. The script that finds the raid array, finds
>> the array in an unclean state and starts syncing.
[]
> So you can start arrays 'readonly', and resume off a raid1 without any
> risk of the resync starting when it shouldn't.

But I wonder why this raid is necessary in the first place.
For raid1, assuming the superblock is at the end, -- the only
thing needed for resume is one component of the mirror.  I.e,
if your raid array is (was) composed of hda1 and hdb1, either
of the two will do as source of resume image.  The trick is to
find which, in case the array was degraded -- and mdadm does the
job here, but assembling it isn't really necessary.  Maybe mdadm
can be told to "examine" the component devices and write a short
line to stdout *instead* of real assembly (like mdadm -A --dummy),
to show the most recent component, and the offset if superblock
is at the beginning... having that, it will be possible to resume
from that component directly...

By the way, my home-grown initramfs stuff accepts several devices
for the resume= command line, and tries each in turn.  If the main disks
have more-or-less stable names, this may be an alternative way.
I mean, just give the component devices in the resume= line...
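
A rough sketch of the "try each in turn" idea in POSIX shell. The comma
separator and the first_resume_candidate function are assumptions for
illustration only; the message does not say how the devices are listed. A
real script would test for a block device and a valid suspend image rather
than mere existence.

```shell
#!/bin/sh
# first_resume_candidate "dev1,dev2,..."
#   Print the first listed path that exists and return 0; return 1 if none
#   does. The body runs in a subshell so the IFS and globbing changes do not
#   leak into the caller.
first_resume_candidate() (
    IFS=,
    set -f                      # do not glob-expand device names
    for dev in $1; do
        # Existence check only; a real script would use [ -b "$dev" ] and
        # then look for an image signature on the device.
        if [ -e "$dev" ]; then
            echo "$dev"
            exit 0
        fi
    done
    exit 1
)

# Intended use (not executed here), with the raid1 components from above:
#   dev=$(first_resume_candidate "/dev/hda1,/dev/hdb1") && echo "resume from $dev"
```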

Yes, this way it may do some weird things in case when the original
swap array was degraded (with first component, which contained a
valid resume image, removed from the array)...  But it's not really
a big issue, since - usually anyway - if one uses resume=, it means
the machine in question isn't some remote 100-miles-away, but it's
here, and it's ok to bypass the resume for recovery purposes.

Just some random thoughts.

/mjt

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: s2disk and raid
  2007-04-04 20:47   ` Michael Tokarev
@ 2007-04-12  5:37     ` Luis Rodrigo Gallardo Cruz
  2007-04-17 18:58       ` Bug#415441: " Tim Dijkstra
  0 siblings, 1 reply; 9+ messages in thread
From: Luis Rodrigo Gallardo Cruz @ 2007-04-12  5:37 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Neil Brown, suspend-devel List, linux-raid, 415441

[-- Attachment #1.1: Type: text/plain, Size: 1794 bytes --]

[I'm the original bug reporter. Sorry for getting so late into the
conversation]

On Thu, Apr 05, 2007 at 12:47:49AM +0400, Michael Tokarev wrote:
> Neil Brown wrote:
> > On Tuesday April 3, newsuser@famdijkstra.org wrote:
> []
> >> After the power cycle the kernel boots, devices are discovered, among
> >> which the ones holding raid. Then we try to find the device that holds
> >> swap in case of resume and / in case of a normal boot.
> >>
> >> Now comes a crucial point. The script that finds the raid array, finds
> >> the array in an unclean state and starts syncing.
> []
> > So you can start arrays 'readonly', and resume off a raid1 without any
> > risk of the resync starting when it shouldn't.
> 
> But I wonder why this raid is necessary in the first place.

In the case of my original report, the array is not actually necessary,
since the resume image is in another (normal) partition. The array
gets resumed since the mdadm scripts run before the resume ones in the
initrd and they by default start *every* array in the system.

But at least the mdadm maintainer seems to think that having the
resume image in a raid device, or in an lvm logical volume inside a
raid device, or other such esoteric arrangements, is a use case worth
supporting.

Something that I seem to not have said. It's not *all* arrays that are
unclean on reboot, just one (that is used as physical volume for
LVM. I don't know if that's relevant). Also worth mentioning is that
kernel space suspend on 2.6.17 did not have this problem (or didn't
show it in my system, anyways).

After reading through the responses, I have come to think this is a
kernel issue, and have posted a report (#418823) to debian's linux-2.6
package. I'll wait to see what they have to say.

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Bug#415441: s2disk and raid
  2007-04-12  5:37     ` Luis Rodrigo Gallardo Cruz
@ 2007-04-17 18:58       ` Tim Dijkstra
  0 siblings, 0 replies; 9+ messages in thread
From: Tim Dijkstra @ 2007-04-17 18:58 UTC (permalink / raw)
  To: Luis Rodrigo Gallardo Cruz, 415441
  Cc: Neil Brown, suspend-devel List, Michael Tokarev, linux-raid


[-- Attachment #1.1: Type: text/plain, Size: 1540 bytes --]

On Thu, 12 Apr 2007 00:37:53 -0500
Luis Rodrigo Gallardo Cruz <rodrigo@nul-unu.com> wrote:


> On Thu, Apr 05, 2007 at 12:47:49AM +0400, Michael Tokarev wrote:
> > Neil Brown wrote:
> > > On Tuesday April 3, newsuser@famdijkstra.org wrote:
> > []
> > >> After the power cycle the kernel boots, devices are discovered, among
> > >> which the ones holding raid. Then we try to find the device that holds
> > >> swap in case of resume and / in case of a normal boot.
> > >>
> > >> Now comes a crucial point. The script that finds the raid array, finds
> > >> the array in an unclean state and starts syncing.
> > []
> > > So you can start arrays 'readonly', and resume off a raid1 without any
> > > risk of the resync starting when it shouldn't.
> > 

> Something that I seem to not have said. It's not *all* arrays that are
> unclean on reboot, just one (that is used as physical volume for
> LVM. I don't know if that's relevant). Also worth mentioning is that
> kernel space suspend on 2.6.17 did not have this problem (or didn't
> show it in my system, anyways).
> 
> After reading through the responses, I have come to think this is a
> kernel issue, and have posted a report (#418823) to debian's linux-2.6
> package. I'll wait to see what they have to say.

Maybe there is a kernel issue, but we are still doing something wrong:
we shouldn't try to write to raid before we resume, that is just asking
for problems.

I'll look into the `readonly' option. That would fix our problem IMHO.

grts Tim

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: s2disk and raid
  2007-04-04  5:20 ` Neil Brown
  2007-04-04 18:53   ` Tim Dijkstra
  2007-04-04 20:47   ` Michael Tokarev
@ 2007-04-06  9:08   ` Luca Berra
  2 siblings, 0 replies; 9+ messages in thread
From: Luca Berra @ 2007-04-06  9:08 UTC (permalink / raw)
  To: linux-raid

On Wed, Apr 04, 2007 at 03:20:56PM +1000, Neil Brown wrote:
>The trick is to use the 'start_ro' module parameter.
>  echo 1 > /sys/module/md_mod/parameters/start_ro
>
>Then md will start arrays assuming read-only.  No resync will be
>started, no superblock will be written.  They stay this way until the
>first write at which point they become normal read-write and any
>required resync starts.
>
Uh, I thought a read-only array was supposed to remain read-only, and
that write attempts would fail.
My bad for not testing my assumptions.
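
One way to test that assumption, as a sketch: reasonably recent kernels mark
an array started this way as "(auto-read-only)" in /proc/mdstat, and the
marker should disappear after the first write. The is_auto_read_only helper
and its optional file argument are hypothetical names for illustration, and
the exact mdstat wording is assumed rather than taken from this thread.

```shell
#!/bin/sh
# is_auto_read_only ARRAY [MDSTAT_FILE]
#   Succeeds if ARRAY (e.g. md0) is listed as auto-read-only.
#   MDSTAT_FILE defaults to /proc/mdstat; the override exists only so the
#   check can be tried against a saved copy.
is_auto_read_only() {
    grep -q "^$1 : .*(auto-read-only)" "${2:-/proc/mdstat}"
}

# Intended use (not executed here):
#   is_auto_read_only md0 && echo "md0 has not been written to yet"
```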

L.

-- 
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-04-17 18:58 UTC | newest]

Thread overview: 9+ messages
2007-04-03 15:55 s2disk and raid Tim Dijkstra
2007-04-03 16:34 ` [Suspend-devel] " Stefan Seyfried
2007-04-03 19:00   ` Rafael J. Wysocki
2007-04-04  5:20 ` Neil Brown
2007-04-04 18:53   ` Tim Dijkstra
2007-04-04 20:47   ` Michael Tokarev
2007-04-12  5:37     ` Luis Rodrigo Gallardo Cruz
2007-04-17 18:58       ` Bug#415441: " Tim Dijkstra
2007-04-06  9:08   ` Luca Berra
