Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation

public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-11 15:30 [PATCH 0/2] Kexec jump: The first step to kexec base hibernation Huang, Ying
@ 2007-07-11 11:13 ` Pavel Machek
  2007-07-12  0:22 ` Andrew Morton
       [not found] ` <20070711111350.GI7091@elf.ucw.cz>
  2 siblings, 0 replies; 113+ messages in thread
From: Pavel Machek @ 2007-07-11 11:13 UTC (permalink / raw)
  To: Huang, Ying; +Cc: linux-kernel, Andrew Morton, linux-pm, Jeremy Maitin-Shepard

Hi!

Looks interesting... but I had a strange feeling of deja vu reading
this... and that's because you pasted the changelog twice :-).

> Kexec base hibernation has some potential advantages over uswsusp and
> suspend2. Some most obvious advantages are:
> 
> 1. The hibernation image size can exceed half of memory size easily.

Yes.

> 2. The hibernation image can be written to and read from almost
>    anywhere, such as USB disk, NFS.

We could do USB disk with uswsusp... NFS would be harder.

How fast can kexec boot secondary kernel?

> This patch implements the functionality of "jumping from kexeced
> kernel to original kernel". That is, the following sequence is
> possible:
> 
> 1. Boot a kernel A
> 2. Work under kernel A
> 3. Kexec another kernel B in kernel A
> 4. Work under kernel B
> 5. Jump from kernel B to kernel A
> 6. Continue work under kernel A

Nice!

> 2. Compile the kexec-tools with kdump and kjump patches added, the
>    kdump patch can be found at:
> 
> http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-dkump10.patch

I got 404 error :-(.

> 3. Boot compiled kernel, the reserved crash kernel memory region must
>    be added to kernel command line as following:
> 
>    crashkernel=<XX>M@<XX>M
> 
>    Where, <XX> should be replaced by the real memory size and position.

How much memory do you suggest reserving? 64M?

> 7. In the kexec booted kernel, trigger the jumping back with following
>    shell command.
> 
>    echo <a>:<b> > /sys/power/resume
> 
>    Where <a> and <b> is non-negative integer, at least one of them must
>    be non-zero.

What do a and b mean?

[Was it more than three copies? If they were non-identical, assume I
read some random one].

> +/* Adds the kexec_backup= command line parameter to command line. */
> +static int cmdline_add_backup(char *cmdline, unsigned long addr)
> +{
> +	int cmdlen, len, align = 1024;
> +	char str[30], *ptr;
> +
> +	/* Passing in kexec_backup=xxxK format. Saves space required in cmdline.
> +	 * Ensure 1K alignment*/
> +	if (addr%align)
> +		return -1;
> +	addr = addr/align;
> +	ptr = str;
> +	strcpy(str, " kexec_backup=");
> +	ptr += strlen(str);
> +	ultoa(addr, ptr);
> +	strcat(str, "K");
> +	len = strlen(str);
> +	cmdlen = strlen(cmdline) + len;
> +	if (cmdlen > (COMMAND_LINE_SIZE - 1))
> +		die("Command line overflow\n");
> +	strcat(cmdline, str);
> +#if 0
> +		printf("Command line after adding backup\n");
> +		printf("%s\n", cmdline);
> +#endif
> +	return 0;
> +}

printf()? ...and please remove the commented-out code.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 113+ messages in thread

* [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
@ 2007-07-11 15:30 Huang, Ying
  2007-07-11 11:13 ` Pavel Machek
                   ` (2 more replies)
  0 siblings, 3 replies; 113+ messages in thread
From: Huang, Ying @ 2007-07-11 15:30 UTC (permalink / raw)
  To: Pavel Machek, nigel, Rafael J. Wysocki, Jeremy Maitin-Shepard,
	Andrew Morton
  Cc: linux-pm, linux-kernel

Kexec-based hibernation has some potential advantages over uswsusp and
suspend2. The most obvious ones are:

1. The hibernation image size can easily exceed half of the memory size.
2. The hibernation image can be written to and read from almost
   anywhere, such as a USB disk or NFS.

This patch implements "jumping from the kexeced kernel back to the
original kernel". That is, the following sequence is possible:

1. Boot a kernel A
2. Work under kernel A
3. Kexec another kernel B in kernel A
4. Work under kernel B
5. Jump from kernel B to kernel A
6. Continue work under kernel A

This is the first step toward implementing kexec-based hibernation. If
the memory image of kernel A is written to or read from permanent media
in step 4, a preliminary version of kexec-based hibernation can be
implemented.

Kernel B runs as a crashdump kernel in a reserved memory
region. This is the biggest constraint of the patch. It is planned to
be eliminated in the next version: instead of reserving a memory
region in advance, the needed memory region will be backed up before
kexec and restored after jumping back.

Another constraint of the patch is that CONFIG_ACPI must be turned
off to make kexec jump work. Because ACPI puts devices into a low
power state, the kexeced kernel cannot boot properly with it
enabled. This constraint can be eliminated by separating the suspend
method and the hibernation method of the devices, as proposed earlier
on LKML.

The kexec jump is implemented in the framework of software suspend. In
fact, kexec-based hibernation can be seen as just implementing the
image writing and reading methods of software suspend with a kexeced
Linux kernel.

Currently, only the i386 architecture is supported. The patch is based
on Linux kernel 2.6.22 and has been tested on my IBM T42.

Usage:

1. Compile the kernel with the following options selected:

CONFIG_X86_32=y
CONFIG_RELOCATABLE=y # not strictly needed, but more convenient
CONFIG_KEXEC=y
CONFIG_SOFTWARE_SUSPEND=y
CONFIG_KEXEC_HIBERNATION=y

2. Compile kexec-tools with the kdump and kjump patches applied. The
   kdump patch can be found at:

http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-dkump10.patch

   The kexec-tools kjump patch is appended to this mail.

3. Boot the compiled kernel. The reserved crash kernel memory region
   must be added to the kernel command line as follows:

   crashkernel=<XX>M@<XX>M

   where the first <XX> is the size of the reserved region and the
   second <XX> is its start position.

4. Switch the hibernation image operations with the following shell
   command:

   echo kexec > /sys/power/hibernation_image_ops

5. Boot the kexeced kernel as a crashdump kernel. The same kernel can
   be used if CONFIG_RELOCATABLE=y is selected. The following option
   must be appended to the kernel command line:

   kexec_jump_buf_pfn=`cat /sys/kernel/kexec_jump_buf_pfn`

6. In the kexec-booted kernel, switch the hibernation image operations
   as in step 4.

7. In the kexec-booted kernel, trigger the jump back with the following
   shell command:

   echo <a>:<b> > /sys/power/resume

   where <a> and <b> are non-negative integers, at least one of which
   must be non-zero.
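
For illustration only, steps 3 to 7 can be sketched as a few shell
helpers. This is a minimal sketch under assumptions, not part of the
patch: the function names are hypothetical, and each sysfs path is an
optional argument so the helpers can be tried against ordinary files.
How the patched kexec itself is invoked is not covered here.

```shell
#!/bin/sh
# Hypothetical helpers mirroring usage steps 3-7; the names are illustrative only.

# Step 3: build the crashkernel= boot parameter from a size and a position in MiB.
crashkernel_arg() {
    printf 'crashkernel=%sM@%sM\n' "$1" "$2"
}

# Steps 4 and 6: select the kexec image operations.
# $1 defaults to the sysfs file named in step 4.
select_kexec_image_ops() {
    echo kexec > "${1:-/sys/power/hibernation_image_ops}"
}

# Step 5: build the kexec_jump_buf_pfn= parameter from the value exported by kernel A.
# $1 defaults to the sysfs file named in step 5.
jump_buf_arg() {
    printf 'kexec_jump_buf_pfn=%s\n' "$(cat "${1:-/sys/kernel/kexec_jump_buf_pfn}")"
}

# Step 7: accept <a>:<b> only if both parts are non-negative integers
# and at least one of them is non-zero.
valid_resume_token() {
    case "$1" in
        *:*) ;;
        *) return 1 ;;
    esac
    a=${1%%:*}
    b=${1#*:}
    case "$a$b" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ -n "$a" ] && [ -n "$b" ] || return 1
    [ "$a" -ne 0 ] || [ "$b" -ne 0 ]
}
```

The numbers here are examples, not a recommendation: `crashkernel_arg 64 16`
prints crashkernel=64M@16M, and `valid_resume_token 1:0` succeeds while
`valid_resume_token 0:0` fails.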

Hibernation image operations

This patch makes it possible to have multiple implementations of the
hibernation image operations (write, read, check, etc.) and to switch
between them at run time by writing to
"/sys/power/hibernation_image_ops". uswsusp is the default
implementation.

Signed-off-by: Huang Ying <ying.huang@intel.com>

Kexec jump

This patch provides the kexec-based implementation of the hibernation
image operations. For now, only jumping between the original kernel and
the kexeced kernel is supported; real image write/read/check will be
provided in later patches.

Signed-off-by: Huang Ying <ying.huang@intel.com>

Index: kexec-tools-1.101/kexec/arch/i386/crashdump-x86.c
===================================================================
--- kexec-tools-1.101.orig/kexec/arch/i386/crashdump-x86.c	2007-07-08 15:00:25.000000000 +0000
+++ kexec-tools-1.101/kexec/arch/i386/crashdump-x86.c	2007-07-09 22:58:46.000000000 +0000
@@ -428,6 +428,33 @@
 	return 0;
 }

+/* Adds the kexec_backup= command line parameter to command line. */
+static int cmdline_add_backup(char *cmdline, unsigned long addr)
+{
+	int cmdlen, len, align = 1024;
+	char str[30], *ptr;
+
+	/* Passing in kexec_backup=xxxK format. Saves space required in cmdline.
+	 * Ensure 1K alignment*/
+	if (addr%align)
+		return -1;
+	addr = addr/align;
+	ptr = str;
+	strcpy(str, " kexec_backup=");
+	ptr += strlen(str);
+	ultoa(addr, ptr);
+	strcat(str, "K");
+	len = strlen(str);
+	cmdlen = strlen(cmdline) + len;
+	if (cmdlen > (COMMAND_LINE_SIZE - 1))
+		die("Command line overflow\n");
+	strcat(cmdline, str);
+#if 0
+		printf("Command line after adding backup\n");
+		printf("%s\n", cmdline);
+#endif
+	return 0;
+}

 /*
  * This routine is specific to i386 architecture to maintain the
@@ -724,5 +751,6 @@
 		return -1;
 	cmdline_add_memmap(mod_cmdline, memmap_p);
 	cmdline_add_elfcorehdr(mod_cmdline, elfcorehdr);
+	cmdline_add_backup(mod_cmdline, info->backup_start);
 	return 0;
 }
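
To make the kexec_backup= format in the hunk above easier to follow,
here is a small shell mirror of the same arithmetic. It is only a
sketch under assumptions: the function name is hypothetical, and it
reproduces just the 1K alignment check and the "<addr/1024>K"
formatting, not the COMMAND_LINE_SIZE overflow check.

```shell
#!/bin/sh
# Illustrative shell mirror of cmdline_add_backup(); the function name is hypothetical.
# Prints " kexec_backup=<addr/1024>K" for a 1K-aligned byte address and
# fails (as the C code does by returning -1) when the address is not aligned.
format_kexec_backup() {
    addr=$1
    align=1024
    if [ $((addr % align)) -ne 0 ]; then
        return 1
    fi
    printf ' kexec_backup=%sK\n' "$((addr / align))"
}
```

For example, `format_kexec_backup 655360` prints " kexec_backup=640K",
while `format_kexec_backup 1000` fails because 1000 is not a multiple
of 1024.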

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-11 15:30 [PATCH 0/2] Kexec jump: The first step to kexec base hibernation Huang, Ying
  2007-07-11 11:13 ` Pavel Machek
@ 2007-07-12  0:22 ` Andrew Morton
  2007-07-12  5:48   ` Jeremy Fitzhardinge
                     ` (4 more replies)
       [not found] ` <20070711111350.GI7091@elf.ucw.cz>
  2 siblings, 5 replies; 113+ messages in thread
From: Andrew Morton @ 2007-07-12  0:22 UTC (permalink / raw)
  To: Huang, Ying; +Cc: linux-kernel, Pavel Machek, linux-pm, Jeremy Maitin-Shepard

On Wed, 11 Jul 2007 15:30:31 +0000
"Huang, Ying" <ying.huang@intel.com> wrote:

> Kexec base hibernation has some potential advantages over uswsusp and
> suspend2. Some most obvious advantages are:
> 
> 1. The hibernation image size can exceed half of memory size easily.
> 2. The hibernation image can be written to and read from almost
>    anywhere, such as USB disk, NFS.
> 
> This patch implements the functionality of "jumping from kexeced
> kernel to original kernel". That is, the following sequence is
> possible:
> 
> 1. Boot a kernel A
> 2. Work under kernel A
> 3. Kexec another kernel B in kernel A
> 4. Work under kernel B
> 5. Jump from kernel B to kernel A
> 6. Continue work under kernel A
> 
> This is the first step to implement kexec based hibernation. If the
> memory image of kernel A is written to or read from a permanent media
> in step 4, a preliminary version of kexec based hibernation can be
> implemented.
> 
> Kernel B runs as a crashdump kernel in a reserved memory
> region. This is the biggest constraint of the patch. It is planned to
> be eliminated in the next version: instead of reserving a memory
> region in advance, the needed memory region will be backed up before
> kexec and restored after jumping back.
> 
> Another constraint of the patch is that CONFIG_ACPI must be turned
> off to make kexec jump work. Because ACPI puts devices into a low
> power state, the kexeced kernel cannot boot properly with it
> enabled. This constraint can be eliminated by separating the suspend
> method and the hibernation method of the devices, as proposed earlier
> on LKML.
> 
> The kexec jump is implemented in the framework of software suspend. In
> fact, the kexec based hibernation can be seen as just implementing the
> image writing and reading method of software suspend with a kexeced
> Linux kernel.
> 
> Now, only the i386 architecture is supported. The patch is based on
> Linux kernel 2.6.22, and has been tested on my IBM T42.

This sounds awesome.  Am I correct in expecting that ultimately the
existing hibernation implementation just goes away and we reuse (and hence
strengthen) the existing kexec (and kdump?) infrastructure?

And that we get hibernation support almost for free on all kexec (and
relocatable-kernel?) capable architectures?

And that all the management of hibernation and resume happens in userspace?

I didn't understand the ACPI problem.  Does this mean that CONFIG_ACPI must
be disabled in the to-be-hibernated kernel, or in the little transient
kexec kernel?

How close do you think all this is to being a viable thing?

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-12  0:22 ` Andrew Morton
@ 2007-07-12  5:48   ` Jeremy Fitzhardinge
       [not found]   ` <4695C096.5080400@goop.org>
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 113+ messages in thread
From: Jeremy Fitzhardinge @ 2007-07-12  5:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Pavel Machek, Huang, Ying, linux-pm,
	Jeremy Maitin-Shepard

Andrew Morton wrote:
> On Wed, 11 Jul 2007 15:30:31 +0000
> "Huang, Ying" <ying.huang@intel.com> wrote:
>   
>> 1. Boot a kernel A
>> 2. Work under kernel A
>> 3. Kexec another kernel B in kernel A
>> 4. Work under kernel B
>> 5. Jump from kernel B to kernel A
>> 6. Continue work under kernel A
>>
>> This is the first step to implement kexec based hibernation. If the
>> memory image of kernel A is written to or read from a permanent media
>> in step 4, a preliminary version of kexec based hibernation can be
>> implemented.
>>
>> Kernel B runs as a crashdump kernel in a reserved memory
>> region. This is the biggest constraint of the patch. It is planned to
>> be eliminated in the next version: instead of reserving a memory
>> region in advance, the needed memory region will be backed up before
>> kexec and restored after jumping back.
>>
>> Another constraint of the patch is that CONFIG_ACPI must be turned
>> off to make kexec jump work. Because ACPI puts devices into a low
>> power state, the kexeced kernel cannot boot properly with it
>> enabled. This constraint can be eliminated by separating the suspend
>> method and the hibernation method of the devices, as proposed earlier
>> on LKML.
>>
>> The kexec jump is implemented in the framework of software suspend. In
>> fact, the kexec based hibernation can be seen as just implementing the
>> image writing and reading method of software suspend with a kexeced
>> Linux kernel.
>>     

I guess I'm (still) confused by the terminology here.  Do you mean that 
it fits into suspend-to-disk as a disk-writing mechanism, or in 
suspend-to-ram as a way of going to sleep?

>> Now, only the i386 architecture is supported. The patch is based on
>> Linux kernel 2.6.22, and has been tested on my IBM T42.
>>     
>
> This sounds awesome.  Am I correct in expecting that ultimately the
> existing hibernation implementation just goes away and we reuse (and hence
> strengthen) the existing kexec (and kdump?) infrastructure?
>
> And that we get hibernation support almost for free on all kexec (and
> relocatable-kernel?) capable architectures?
>
> And that all the management of hibernation and resume happens in userspace?
>
> I didn't understand the ACPI problem.  Does this mean that CONFIG_ACPI must
> be disabled in the to-be-hibernated kernel, or in the little transient
> kexec kernel?
>   

I think the point is that if kernel A says "I'm suspending" and calls 
the suspend method on all its devices, then kernel B finds that it has 
no powered on devices to work with.  But then couldn't it turn on the 
ones it wants anyway?  And don't you want to suspend them, to make sure 
they're not still DMAing memory while B is trying to shuffle everything 
off to disk?

It does sound pretty cool.

    J

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]   ` <4695C096.5080400@goop.org>
@ 2007-07-12  6:43     ` david
  2007-07-12 12:46       ` Rafael J. Wysocki
       [not found]       ` <200707121446.14170.rjw@sisk.pl>
       [not found]     ` <1184260174.9346.85.camel@caritas-dev.intel.com>
  2007-07-12 17:09     ` Huang, Ying
  2 siblings, 2 replies; 113+ messages in thread
From: david @ 2007-07-12  6:43 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Wed, 11 Jul 2007, Jeremy Fitzhardinge wrote:

> Andrew Morton wrote:
>>  On Wed, 11 Jul 2007 15:30:31 +0000
>>  "Huang, Ying" <ying.huang@intel.com> wrote:
>> 
>> >  1. Boot a kernel A
>> >  2. Work under kernel A
>> >  3. Kexec another kernel B in kernel A
>> >  4. Work under kernel B
>> >  5. Jump from kernel B to kernel A
>> >  6. Continue work under kernel A
>> > 
>> >  This is the first step to implement kexec based hibernation. If the
>> >  memory image of kernel A is written to or read from a permanent media
>> >  in step 4, a preliminary version of kexec based hibernation can be
>> >  implemented.
>> > 
>> >  Kernel B runs as a crashdump kernel in a reserved memory
>> >  region. This is the biggest constraint of the patch. It is planned to
>> >  be eliminated in the next version: instead of reserving a memory
>> >  region in advance, the needed memory region will be backed up before
>> >  kexec and restored after jumping back.
>> > 
>> >  Another constraint of the patch is that CONFIG_ACPI must be turned
>> >  off to make kexec jump work. Because ACPI puts devices into a low
>> >  power state, the kexeced kernel cannot boot properly with it
>> >  enabled. This constraint can be eliminated by separating the suspend
>> >  method and the hibernation method of the devices, as proposed earlier
>> >  on LKML.
>> > 
>> >  The kexec jump is implemented in the framework of software suspend. In
>> >  fact, the kexec based hibernation can be seen as just implementing the
>> >  image writing and reading method of software suspend with a kexeced
>> >  Linux kernel.
>> > 
>
> I guess I'm (still) confused by the terminology here.  Do you mean that it 
> fits into suspend-to-disk as a disk-writing mechanism, or in suspend-to-ram 
> as a way of going to sleep?

Suspend-to-ram involves stopping the system and shutting down devices to
go into low-power mode, then on wakeup restarting devices and resuming
operation.

so the steps would be:

1. stop userspace

2. walk the system device tree and put devices to sleep

3. go into the lowest power mode available and wait for a wakeup signal

later

4. walk the system device tree and wake up devices

5. resume userspace scheduling.

note that what devices get put to sleep could be configurable, potentially 
to the extreme of things like the OLPC (that have hardware designed for 
cheap sleeping) going into a light suspend-to-ram state between keystrokes 
if nothing else has a timer event scheduled before that.

Suspend-to-disk (hibernate) involves stopping the system, making a
snapshot of ram, writing the snapshot somewhere and powering off the
box. on wakeup (power-on) a helper kernel boots, loads the snapshot into
ram and jumps to the kernel in the snapshot to resume operation.

as I understand the proposal the thought is to do the following

1. system kernel does suspend-to-ram to put the devices into a known safe 
state.

2. system kernel uses kexec to start hibernate kernel

3. hibernate kernel wakes up devices it needs as if it was doing a 
resume-from-ram

4. hibernate kernel copies ram image somewhere

5. hibernate kernel shuts down the box

later

6. hibernate kernel boots

7. hibernate kernel copies ram image from somewhere

8. hibernate kernel does suspend-to-ram to put the devices into a known
safe state.

9. hibernate kernel uses kexec to start system kernel

10. system kernel wakes up devices it needs as if it was doing a 
resume-from-ram.

>> >  Now, only the i386 architecture is supported. The patch is based on
>> >  Linux kernel 2.6.22, and has been tested on my IBM T42.
>> > 
>>
>>  This sounds awesome.  Am I correct in expecting that ultimately the
>>  existing hibernation implementation just goes away and we reuse (and hence
>>  strengthen) the existing kexec (and kdump?) infrastructure?
>>
>>  And that we get hibernation support almost for free on all kexec (and
>>  relocatable-kernel?) capable architectures?
>>
>>  And that all the management of hibernation and resume happens in
>>  userspace?

this is the thought.

>>  I didn't understand the ACPI problem.  Does this mean that CONFIG_ACPI
>>  must
>>  be disabled in the to-be-hibernated kernel, or in the little transient
>>  kexec kernel?
>> 
>
> I think the point is that if kernel A says "I'm suspending" and calls the 
> suspend method on all its devices, then kernel B finds that it has no powered 
> on devices to work with.  But then couldn't it turn on the ones it wants 
> anyway?  And don't you want to suspend them, to make sure they're not still 
> DMAing memory while B is trying to shuffle everything off to disk?

I don't understand the ACPI problem so I can't try to clarify it.

> It does sound pretty cool.

reusing existing components in new ways, so that particular problems
only have to be solved once and that solution is used repeatedly.
there's a lot to like about this approach.

very cool.

David Lang

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]   ` <1184251423.9346.55.camel@caritas-dev.intel.com>
@ 2007-07-12  7:03     ` david
  2007-07-12 12:53     ` Rafael J. Wysocki
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 113+ messages in thread
From: david @ 2007-07-12  7:03 UTC (permalink / raw)
  To: Huang, Ying
  Cc: linux-kernel, Pavel Machek, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Thu, 12 Jul 2007, Huang, Ying wrote:

> On Wed, 2007-07-11 at 17:22 -0700, Andrew Morton wrote:
>> This sounds awesome.  Am I correct in expecting that ultimately the
>> existing hibernation implementation just goes away and we reuse (and hence
>> strengthen) the existing kexec (and kdump?) infrastructure?
>> And that we get hibernation support almost for free on all kexec (and
>> relocatable-kernel?) capable architectures?
>> And that all the management of hibernation and resume happens in userspace?
>
> Yes. Ultimately, most of the hibernation code, such as the process freezer,
> memory shrinking, memory snapshot (atomic copy) and image reading/writing,
> can go away, because kexec-based hibernation doesn't depend on them.
> Only the device/CPU state quiescent/save/restore code needs to remain.

and the device/CPU quiescent/save/restore/sleep/wakeup functions are needed
for suspend-to-ram and low-power mode.

> And, the management of hibernation and resume will happen in userspace.
>
>>
>> I didn't understand the ACPI problem.  Does this mean that CONFIG_ACPI must
>> be disabled in the to-be-hibernated kernel, or in the little transient
>> kexec kernel?
>
> Under the current implementation of device state quiescent/save/restore,
> CONFIG_ACPI must be turned off in both the to-be-hibernated kernel and
> the transient kexec kernel.
>
> But the hibernation people are going to separate device suspend from
> device hibernate. Device hibernate will put a device in a quiescent
> state but not in a low power state. When this is done, it will not be
> necessary to disable CONFIG_ACPI at all. Disabling CONFIG_ACPI is just
> a workaround for the current implementation.
>
>> How close do you think all this is to being a viable thing?
>
> The kexec jump is the first step, maybe the simplest step. There are
> many other issues to be resolved, at least the following ones.
>
> 1. Separate device suspend from device hibernate.

you shouldn't need a device hibernate, hibernate will be a system 
shutdown.

> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
> before kexec and restore them after kexec.
> 3. Support the in-place kexec? The relocatable kernel is not necessary
> if this can be implemented.
> 4. Image writing/reading. (Only user space application is needed).
> 5. A smooth resume process. Maybe it is not necessary to kexec a new kernel
> for resume. For example, in the first stage of kernel boot, only the first
> 16M (or a little more) of RAM is used; if the resume image is found, the
> saved kernel image is resumed; if the resume image is not found, the
> remaining RAM is turned on. This depends on 3.
> 6. Reduce the boot-up time of the kexec kernel. Maybe the kexec kernel can
> be hibernated/resumed by the normal kernel too. This way, a real
> kexec/boot-up is only needed the first time.

the hibernate kernel shouldn't need a lot of the features of the standard
kernel (does it really need sound, for example?), and if tailored even
tighter it could be configured to have only the drivers actually used for
the save and restore, making a _very_ minimal kernel (no USB, no network,
only simple video drivers, etc.) and greatly speeding up the boot

David Lang

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]   ` <1184257734.9346.76.camel@caritas-dev.intel.com>
@ 2007-07-12  8:54     ` Pavel Machek
       [not found]     ` <20070712085428.GA1866@elf.ucw.cz>
  1 sibling, 0 replies; 113+ messages in thread
From: Pavel Machek @ 2007-07-12  8:54 UTC (permalink / raw)
  To: Huang, Ying; +Cc: linux-kernel, Andrew Morton, linux-pm, Jeremy Maitin-Shepard

Hi!

> > Looks interesting... but I was feeling strange dejavu reading
> > this... and that's because you pasted the changelog twice :-).
> > 
> 
> Sorry, I should have re-checked the mail before sending out.

Were your patches enough to get hibernation working? I got kexec to
work here, so I guess I'm one step closer...

...video does not work in the kexec-ed kernel, unless I boot with
vga=1.

> > How fast can kexec boot secondary kernel?
> 
> I measured it on my IBM T42 with CONFIG_PRINTK_TIME=y. The boot-up time
> is about 4.35s, from when kexec is issued to when root is mounted in the
> kexec kernel.

Not bad...

> > > 2. Compile the kexec-tools with kdump and kjump patches added, the
> > >    kdump patch can be found at:
> > > 
> > > http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-dkump10.patch
> > 
> > I got 404 error :-(.
> 
> Sorry, typo problem. The URL should be:
> http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-kdump10.patch

That works, thanks.

> > > 7. In the kexec booted kernel, trigger the jumping back with following
> > >    shell command.
> > > 
> > >    echo <a>:<b> > /sys/power/resume
> > > 
> > >    Where <a> and <b> are non-negative integers, at least one of which
> > >    must be non-zero.
> > 
> > What does a and b mean?

> a and b have no meaning. They are only used to trigger the resume
> process. This will be fixed in a future version.

Ok.
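
For reference, the trigger described above could be wrapped roughly as follows. This is only a sketch under stated assumptions: the helper name `trigger_kexec_jump` and its `resume_path` parameter are made up for illustration, and the only behaviour taken from the thread is that "<a>:<b>" is written to /sys/power/resume, with both values non-negative and at least one of them non-zero.

```python
from pathlib import Path


def trigger_kexec_jump(a: int, b: int, resume_path: str = "/sys/power/resume") -> str:
    """Write "<a>:<b>" to the resume file, as described for this patch.

    The helper name and the resume_path parameter are hypothetical; with this
    patch the values themselves carry no meaning and only act as a trigger.
    """
    if a < 0 or b < 0:
        raise ValueError("a and b must be non-negative")
    if a == 0 and b == 0:
        raise ValueError("at least one of a and b must be non-zero")
    payload = f"{a}:{b}"
    # Equivalent to: echo <a>:<b> > /sys/power/resume
    Path(resume_path).write_text(payload + "\n")
    return payload
```

Passing a scratch file as `resume_path` lets the validation be tried without touching the real sysfs file.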
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]       ` <1184260683.9346.91.camel@caritas-dev.intel.com>
@ 2007-07-12 10:10         ` david
  2007-07-12 13:01           ` Rafael J. Wysocki
                             ` (3 more replies)
  0 siblings, 4 replies; 113+ messages in thread
From: david @ 2007-07-12 10:10 UTC (permalink / raw)
  To: Huang, Ying
  Cc: linux-kernel, Pavel Machek, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Thu, 12 Jul 2007, Huang, Ying wrote:

> On Thu, 2007-07-12 at 00:03 -0700, david@lang.hm wrote:
>>>
>>> The kexec jump is the first step, maybe the simplest step. There are
>>> many other issues to be resolved, at least the following ones.
>>>
>>> 1. Separate device suspend from device hibernate.
>>
>
> Maybe my usage of terminology has some problem. But, the "device
> hibernate" here means put device into quiescent state and save the
> device state, but do not put device into low power state.

is there really enough savings (in time or otherwise) to make it worth 
splitting this into two steps? for suspend-to-ram we definitely will need 
the option to go all the way to a low power state, there's significant 
extra complexity if you also have a state between normal operation and 
this low power state.

it may be worth doing if the low-power state is expensive (in time or 
effort) to get to or from and the lesser state allows the computer overall 
to save power (like the different cpu C states)

but I suspect that the number of drivers where this is worth doing is 
relatively small, and it may be a better approach to start off with just 
putting everything into the low-power state until some driver shows up that 
makes it worth adding the intermediate state to the system (and drivers 
wouldn't have to change, if they only support one suspend state it's the 
low-power one, if they support more then higher layers choose which ones 
to move to)

>>> 6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can
>>> be hibernate/resume by the normal kernel too. This way, a real
>>> kexec/boot-up is only needed for the first time.
>>
>> the hibernate kernel shouldn't need a lot of the features of the standard
>> kernel (does it really need sound for example), and if tailored even
>> tighter could be configured to only have the drivers actually used for the
>> save and restore, making a _very_ minimal kernel (no USB, no network,
>> only simple video drivers, etc) greatly speeding up the boot
>
> There is no need for two kernel. Most drivers and optional features are
> compiled as modules, as that of most desktop distributions. So just
> "insmod" needed modules only in hibernate kernel is sufficient.

actually, I think that while you may be able to get away with only one 
kernel, you are probably better off with two. on the hibernate kernel you 
can choose many 'embedded' options that don't make sense for the normal 
kernel (no high mem, no SMP support, no SELinux, no network routing, no 
netfilter, use SLOB not SLAB/SLUB, etc). also keep in mind that each 
module that you load wastes a partial page of memory.

remember people run complete linux systems in 8M of ram, a suspend system 
for a simple 'write the ram image to partition X on this IDE drive' should 
be aiming at 2-4M of memory.

more complex setups may want more space, but let the distros bloat things 
up, design and demo an optimized system :-)
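
to put a rough number on the partial-page point, here is a small sketch. it assumes that each module's memory is rounded up to whole pages and that a page is 4096 bytes (the usual i386 size); the module sizes are made-up figures for illustration only, not measurements.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (typical for i386)


def wasted_bytes(module_size: int, page_size: int = PAGE_SIZE) -> int:
    """Bytes left unused in the last page when a module is rounded up to whole pages."""
    remainder = module_size % page_size
    return 0 if remainder == 0 else page_size - remainder


def total_waste(module_sizes, page_size: int = PAGE_SIZE) -> int:
    """Sum of the per-module rounding waste."""
    return sum(wasted_bytes(size, page_size) for size in module_sizes)


# Hypothetical module sizes in bytes, for illustration only.
example_sizes = [10_000, 4_096, 25_300, 61_440]

if __name__ == "__main__":
    for size in example_sizes:
        print(f"{size:>7} bytes -> {wasted_bytes(size):>4} bytes wasted")
    print("total wasted:", total_waste(example_sizes))
```

on average this model wastes about half a page per module, which only matters when the target is a few megabytes in total.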

David Lang


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-12  0:22 ` Andrew Morton
  2007-07-12  5:48   ` Jeremy Fitzhardinge
       [not found]   ` <4695C096.5080400@goop.org>
@ 2007-07-12 12:38   ` Rafael J. Wysocki
  2007-07-12 14:43   ` Huang, Ying
       [not found]   ` <1184251423.9346.55.camel@caritas-dev.intel.com>
  4 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-12 12:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Pavel Machek, Huang, Ying, linux-pm,
	Jeremy Maitin-Shepard

On Thursday, 12 July 2007 02:22, Andrew Morton wrote:
> On Wed, 11 Jul 2007 15:30:31 +0000
> "Huang, Ying" <ying.huang@intel.com> wrote:
> 
> > Kexec base hibernation has some potential advantages over uswsusp and
> > suspend2. Some most obvious advantages are:
> > 
> > 1. The hibernation image size can exceed half of memory size easily.
> > 2. The hibernation image can be written to and read from almost
> >    anywhere, such as USB disk, NFS.
> > 
> > This patch implements the functionality of "jumping from kexeced
> > kernel to original kernel". That is, the following sequence is
> > possible:
> > 
> > 1. Boot a kernel A
> > 2. Work under kernel A
> > 3. Kexec another kernel B in kernel A
> > 4. Work under kernel B
> > 5. Jump from kernel B to kernel A
> > 6. Continue work under kernel A
> > 
> > This is the first step to implement kexec based hibernation. If the
> > memory image of kernel A is written to or read from a permanent media
> > in step 4, a preliminary version of kexec based hibernation can be
> > implemented.
> > 
> > The kernel B is run as a crashdump kernel in a reserved memory
> > region. This is the biggest constraint of the patch. It is planned to
> > be eliminated in the next version. That is, instead of reserving a memory
> > region in advance, the needed memory region is backed up before kexec
> > and restored after jumping back.
> > 
> > Another constraint of the patch is that CONFIG_ACPI must be turned
> > off to make kexec jump work. Because ACPI will put devices into low
> > power state, the kexeced kernel can not be booted properly under
> > it. This constraint can be eliminated by separating the suspend method
> > and hibernation method of the devices as proposed earlier in the LKML.
> > 
> > The kexec jump is implemented in the framework of software suspend. In
> > fact, the kexec based hibernation can be seen as just implementing the
> > image writing and reading method of software suspend with a kexeced
> > Linux kernel.
> > 
> > Now, only the i386 architecture is supported. The patch is based on
> > Linux kernel 2.6.22, and has been tested on my IBM T42.
> 
> This sounds awesome.  Am I correct in expecting that ultimately the
> existing hibernation implementation just goes away and we reuse (and hence
> strengthen) the existing kexec (and kdump?) infrastructure?

Well, I haven't had the time to look at it more closely, but I'd assume that if
we can reuse the kexec infrastructure for hibernation, then yes, the existing
implementation will go away.

> And that we get hibernation support almost for free on all kexec (and
> relocatable-kernel?) capable architectures?

We should be able to, in theory.

> And that all the management of hibernation and resume happens in userspace?

I'm not sure what you mean here.

The image saving/loading certainly can be done in the user space.

> I didn't understand the ACPI problem.  Does this mean that CONFIG_ACPI must
> be disabled in the to-be-hibernated kernel, or in the little transient
> kexec kernel?

I think that this mechanism requires that devices not be suspended (ie. not be
in low power states).

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-12  6:43     ` david
@ 2007-07-12 12:46       ` Rafael J. Wysocki
       [not found]       ` <200707121446.14170.rjw@sisk.pl>
  1 sibling, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-12 12:46 UTC (permalink / raw)
  To: david
  Cc: Jeremy Fitzhardinge, linux-kernel, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Thursday, 12 July 2007 08:43, david@lang.hm wrote:
> On Wed, 11 Jul 2007, Jeremy Fitzhardinge wrote:
> 
> > Andrew Morton wrote:
> >>  On Wed, 11 Jul 2007 15:30:31 +0000
> >>  "Huang, Ying" <ying.huang@intel.com> wrote:
> >> 
> >> >  1. Boot a kernel A
> >> >  2. Work under kernel A
> >> >  3. Kexec another kernel B in kernel A
> >> >  4. Work under kernel B
> >> >  5. Jump from kernel B to kernel A
> >> >  6. Continue work under kernel A
> >> > 
> >> >  This is the first step to implement kexec based hibernation. If the
> >> >  memory image of kernel A is written to or read from a permanent media
> >> >  in step 4, a preliminary version of kexec based hibernation can be
> >> >  implemented.
> >> > 
> >> >  The kernel B is run as a crashdump kernel in a reserved memory
> >> >  region. This is the biggest constraint of the patch. It is planned to
> >> >  be eliminated in the next version. That is, instead of reserving a memory
> >> >  region in advance, the needed memory region is backed up before kexec
> >> >  and restored after jumping back.
> >> > 
> >> >  Another constraint of the patch is that CONFIG_ACPI must be turned
> >> >  off to make kexec jump work. Because ACPI will put devices into low
> >> >  power state, the kexeced kernel can not be booted properly under
> >> >  it. This constraint can be eliminated by separating the suspend method
> >> >  and hibernation method of the devices as proposed earlier in the LKML.
> >> > 
> >> >  The kexec jump is implemented in the framework of software suspend. In
> >> >  fact, the kexec based hibernation can be seen as just implementing the
> >> >  image writing and reading method of software suspend with a kexeced
> >> >  Linux kernel.
> >> > 
> >
> > I guess I'm (still) confused by the terminology here.  Do you mean that it 
> > fits into suspend-to-disk as a disk-writing mechanism, or in suspend-to-ram 
> > as a way of going to sleep?
> 
> Suspend-to-ram involves stopping the system and shutting down devices to 
> go into low-power mode, then on wakeup restarting devices and resuming 
> operation
> 
> so the steps would be.
> 
> 1. stop userspace
> 
> 2. walk the system device tree and put devices to sleep
> 
> 3. go into the lowest power mode available and wait for a wakeup signal
> 
> later
> 
> 4. walk the system device tree and wake up devices
> 
> 5. resume userspace scheduling.

Note that we are going to phase out steps 1 and 5.

> note that what devices get put to sleep could be configurable, potentially 
> to the extreme of things like the OLPC (that have hardware designed for 
> cheap sleeping) going into a light suspend-to-ram state between keystrokes 
> if nothing else has a timer event scheduled before that.
> 
> Suspend-to-disk (Hibernate) involves stopping the system, making a 
> snapshot of ram, writing the snapshot to somewhere and powering off the 
> box. on wakeup (power-on) a helper kernel boots, loads the snapshot into 
> ram and jumps to the kernel in the snapshot to resume operation.
> 
> as I understand the proposal the thought is to do the following
> 
> 1. system kernel does suspend-to-ram to put the devices into a known safe 
> state.

Not necessarily suspend-to-RAM.  I'd much prefer it if devices were not put
into low power states but quiesced (ie. no DMA, no interrupts).

> 2. system kernel uses kexec to start hibernate kernel
> 
> 3. hibernate kernel wakes up devices it needs as if it was doing a 
> resume-from-ram

I think that the devices should be initialized from scratch in this step.

> 4. hibernate kernel copies ram image somewhere

In this step some userland may be involved (started from the "hibernate"
kernel).
 
> 5. hibernate kernel shuts down the box
> 
> later
> 
> 6. hibernate kernel boots
> 
> 7. hibernate kernel copies ram image from somewhere
> 
> 8. hibernate kernel does suspend-to-ram to put the devices into a known 
> safe state.

Again, the devices should be quiesced rather than suspended in this step.

> 9. hibernate kernel uses kexec to start system kernel
> 
> 10. system kernel wakes up devices it needs as if it was doing a 
> resume-from-ram.

I think it should reconfigure devices from scratch (ie. reprobe).
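
Putting the ten proposed steps together with the amendments above gives roughly the following sequence.  This is only an illustrative sketch: the Step type and its field names are made up for this summary and do not correspond to any kernel interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    phase: str   # "hibernate" or "resume"
    kernel: str  # "system" or "hibernate": the kernel performing the step
    action: str


# The ten steps from the proposal, with quiesce/reprobe substituted for
# suspend-to-RAM/resume-from-RAM as suggested in the replies.
STEPS = [
    Step(1, "hibernate", "system", "quiesce devices (no DMA, no interrupts), no low power state"),
    Step(2, "hibernate", "system", "kexec the hibernate kernel"),
    Step(3, "hibernate", "hibernate", "initialize the devices it needs from scratch"),
    Step(4, "hibernate", "hibernate", "copy the RAM image somewhere (userland may be involved)"),
    Step(5, "hibernate", "hibernate", "shut down the box"),
    Step(6, "resume", "hibernate", "boot"),
    Step(7, "resume", "hibernate", "copy the RAM image back from somewhere"),
    Step(8, "resume", "hibernate", "quiesce devices rather than suspend them"),
    Step(9, "resume", "hibernate", "kexec the system kernel"),
    Step(10, "resume", "system", "reprobe and reconfigure devices from scratch"),
]


def describe(phase: str) -> list:
    """Return one human-readable line per step of the given phase."""
    return [f"{s.number}. {s.kernel} kernel: {s.action}" for s in STEPS if s.phase == phase]


if __name__ == "__main__":
    for phase in ("hibernate", "resume"):
        print(phase + ":")
        for line in describe(phase):
            print("  " + line)
```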

> >> >  Now, only the i386 architecture is supported. The patch is based on
> >> >  Linux kernel 2.6.22, and has been tested on my IBM T42.
> >> > 
> >>
> >>  This sounds awesome.  Am I correct in expecting that ultimately the
> >>  existing hibernation implementation just goes away and we reuse (and hence
> >>  strengthen) the existing kexec (and kdump?) infrastructure?
> >>
> >>  And that we get hibernation support almost for free on all kexec (and
> >>  relocatable-kernel?) capable architectures?
> >>
> >>  And that all the management of hibernation and resume happens in
> >>  userspace?
> 
> this is the thought.
> 
> >>  I didn't understand the ACPI problem.  Does this mean that CONFIG_ACPI
> >>  must
> >>  be disabled in the to-be-hibernated kernel, or in the little transient
> >>  kexec kernel?
> >> 
> >
> > I think the point is that if kernel A says "I'm suspending" and calls the 
> > suspend method on all its devices, then kernel B finds that it has no powered 
> > on devices to work with.  But then couldn't it turn on the ones it wants 
> > anyway?  And don't you want to suspend them, to make sure they're not still 
> > DMAing memory while B is trying to shuffle everything off to disk?
> 
> I don't understand the ACPI problem so I can't try to clarify it.
> 
> > It does sound pretty cool.
> 
> re-using existing components in new ways, making it so that particular 
> problems only have to be solved once and that solution is used repeatedly. 
> there's a lot to like about this approach.
> 
> very cool.

Well, I'm not a big fan of it right now, but well, it looks doable in general.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]     ` <1184260174.9346.85.camel@caritas-dev.intel.com>
@ 2007-07-12 12:47       ` Rafael J. Wysocki
  0 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-12 12:47 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Jeremy Fitzhardinge, linux-kernel, Pavel Machek, Andrew Morton,
	linux-pm, Jeremy Maitin-Shepard

On Thursday, 12 July 2007 19:09, Huang, Ying wrote:
> On Wed, 2007-07-11 at 22:48 -0700, Jeremy Fitzhardinge wrote:
> > >> The kexec jump is implemented in the framework of software suspend. In
> > >> fact, the kexec based hibernation can be seen as just implementing the
> > >> image writing and reading method of software suspend with a kexeced
> > >> Linux kernel.
> > >>     
> > 
> > I guess I'm (still) confused by the terminology here.  Do you mean that 
> > it fits into suspend-to-disk as a disk-writing mechanism, or in 
> > suspend-to-ram as a way of going to sleep?
> 
> It fits into suspend-to-disk as a disk-writing mechanism. But most
> tricks of suspend-to-disk will be no longer necessary in kexec based
> hibernation.
> 
> > > I didn't understand the ACPI problem.  Does this mean that CONFIG_ACPI must
> > > be disabled in the to-be-hibernated kernel, or in the little transient
> > > kexec kernel?
> > >   
> > 
> > I think the point is that if kernel A says "I'm suspending" and calls 
> > the suspend method on all its devices, then kernel B finds that it has 
> > no powered on devices to work with.  But then couldn't it turn on the 
> > ones it wants anyway?  And don't you want to suspend them, to make sure 
> > they're not still DMAing memory while B is trying to shuffle everything 
> > off to disk?
> 
> The devices should be put into a quiescent state to stop DMA and the like.
> But they do not need to be put into a low power state.

Exactly.  Moreover, I don't think it would be correct to put them into low power
states.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]   ` <1184251423.9346.55.camel@caritas-dev.intel.com>
  2007-07-12  7:03     ` david
@ 2007-07-12 12:53     ` Rafael J. Wysocki
  2007-07-12 16:32     ` Eric W. Biederman
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-12 12:53 UTC (permalink / raw)
  To: Huang, Ying
  Cc: linux-kernel, Pavel Machek, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Thursday, 12 July 2007 16:43, Huang, Ying wrote:
> On Wed, 2007-07-11 at 17:22 -0700, Andrew Morton wrote:
> > This sounds awesome.  Am I correct in expecting that ultimately the
> > existing hibernation implementation just goes away and we reuse (and hence
> > strengthen) the existing kexec (and kdump?) infrastructure?
> > And that we get hibernation support almost for free on all kexec (and
> > relocatable-kernel?) capable architectures?
> > And that all the management of hibernation and resume happens in userspace?
> 
> Yes. Ultimately, most of the hibernation code such as process freezer,
> memory shrinking, memory snapshot (atomic copy), image reading/writing
> can go away, because kexec based hibernation doesn't depend on them.
> Just the device/CPU state quiescent/save/restore is necessary to remain.
> And, the management of hibernation and resume will happen in userspace.
> 
> > 
> > I didn't understand the ACPI problem.  Does this mean that CONFIG_ACPI must
> > be disabled in the to-be-hibernated kernel, or in the little transient
> > kexec kernel?
> 
> Under current implementation of device state quiescent/save/restore, the
> CONFIG_ACPI must be turned off both in to-be-hibernated kernel and
> transient kexec kernel.
> 
> But the hibernation people are going to separate the device suspend from
> device hibernate. The device hibernate will put device in quiescent
> state but not in low power state. When this is done, it is not necessary
> to disable CONFIG_ACPI at all. Disabling CONFIG_ACPI is just a workaround
> for the current implementation.
> 
> > How close do you think all this is to being a viable thing?
> 
> The kexec jump is the first step, maybe the simplest step. There are
> many other issues to be resolved, at least the following ones.
> 
> 1. Separate device suspend from device hibernate.

Step 0, I'd say. :-)

> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
> before kexec and restore them after kexec.

I don't think this is very important initially.

> 3. Support the in-place kexec? The relocatable kernel is not necessary
> if this can be implemented.
> 4. Image writing/reading. (Only user space application is needed).

And a kernel interface for that application.

> 5. A smooth resume process. Maybe it is not needed to kexec a new kernel
> for resume. For example, in the first stage of kernel boot, just first
> 16M (or a little more) RAM is used, if the resume image is found, the
> saved kernel image is resumed; if the resume image is not found, turn on
> the remaining RAM. This will depend on 3.

I think that this is the most difficult part of the whole thing.

> 6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can
> be hibernate/resume by the normal kernel too. This way, a real
> kexec/boot-up is only needed for the first time.

I'm not sure what you mean.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-12 10:10         ` david
@ 2007-07-12 13:01           ` Rafael J. Wysocki
       [not found]           ` <200707121501.03016.rjw@sisk.pl>
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-12 13:01 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Thursday, 12 July 2007 12:10, david@lang.hm wrote:
> On Thu, 12 Jul 2007, Huang, Ying wrote:
> 
> > On Thu, 2007-07-12 at 00:03 -0700, david@lang.hm wrote:
> >>>
> >>> The kexec jump is the first step, maybe the simplest step. There are
> >>> many other issues to be resolved, at least the following ones.
> >>>
> >>> 1. Separate device suspend from device hibernate.
> >>
> >
> > Maybe my usage of terminology has some problem. But, the "device
> > hibernate" here means put device into quiescent state and save the
> > device state, but do not put device into low power state.
> 
> is there really enough savings (in time or otherwise) to make it worth 
> splitting this into two steps? for suspend-to-ram we definitely will need 
> the option to go all the way to a low power state, there's significant 
> extra complexity if you also have a state between normal operation and 
> this low power state.
> 
> it may be worth doing if the low-power state is expensive (in time or 
> effort) to get to or from and the lesser state allows the computer overall 
> to save power (like the different cpu C states)
> 
> but I suspect that the number of drivers where this is worth doing is 
> relatively small, and it may be a better approach to start off with just 
> putting everything into the low-power state until some driver shows up that 
> makes it worth adding the intermediate state to the system (and drivers 
> wouldn't have to change, if they only support one suspend state it's the 
> low-power one, if they support more then higher layers choose which ones 
> to move to)

We've discussed that a lot on linux-pm and the conclusion is that devices
should not be put into low power states before creating the hibernation
image, because that leads to problems during the restore.

In turn, during the restore, when the image has been loaded and the "old"
kernel gets the control, it should reprobe devices and initialize them from
scratch rather than doing something like "resume devices after suspend
to RAM".

> >>> 6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can
> >>> be hibernate/resume by the normal kernel too. This way, a real
> >>> kexec/boot-up is only needed for the first time.
> >>
> >> the hibernate kernel shouldn't need a lot of the features of the standard
> >> kernel (does it really need sound for example), and if tailored even
> >> tighter could be configured to only have the drivers actually used for the
> >> save and restore, making a _very_ minimal kernel (no USB, no network,
> >> only simple video drivers, etc) greatly speeding up the boot
> >
> > There is no need for two kernel. Most drivers and optional features are
> > compiled as modules, as that of most desktop distributions. So just
> > "insmod" needed modules only in hibernate kernel is sufficient.
> 
> actually, I think that while you may be able to get away with only one 
> kernel, you are probably better off with two. on the hibernate kernel you 
> can choose many 'embedded' options that don't make sense for the normal 
> kernel (no high mem, no SMP support, no SELinux, no network routing, no 
> netfilter, use SLOB not SLAB/SLUB, etc). also keep in mind that each 
> module that you load wastes a partial page of memory.
> 
> remember people run complete linux systems in 8M of ram, a suspend system 
> for a simple 'write the ram image to partition X on this IDE drive' should 
> be aiming at 2-4M of memory.
> 
> more complex setups may want more space, but let the distros bloat things 
> up, design and demo an optimized system :-)

So if a user wants to install a kernel.org kernel on his system, (s)he'll have
to compile and install two kernels with different options.

That doesn't sound good to me. :-)

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]           ` <200707121501.03016.rjw@sisk.pl>
@ 2007-07-12 13:22             ` jimmy bahuleyan
  2007-07-12 19:03             ` david
  1 sibling, 0 replies; 113+ messages in thread
From: jimmy bahuleyan @ 2007-07-12 13:22 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: david, linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton,
	linux-pm, Jeremy Maitin-Shepard

Rafael J. Wysocki wrote:
[snip]
> 
> So if a user wants to install a kernel.org kernel on his system, (s)he'll have
> to compile and install two kernels with different options.
> 
> That doesn't sound good to me. :-)
> 

definitely. that sounds kind of strange, not to mention having to
remember which kernel to choose on booting.

would it be possible to have the same kernel (or image) act as both
kernel 'a' & 'b', kind of like operating in two different modes?

the kexec-style-hibernate does sound more appealing though..

> Greetings,
> Rafael

-jb
-- 
Tact is the art of making a point without making an enemy.


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]       ` <200707121446.14170.rjw@sisk.pl>
@ 2007-07-12 13:51         ` Mark Lord
       [not found]         ` <469631FA.2070405@rtr.ca>
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 113+ messages in thread
From: Mark Lord @ 2007-07-12 13:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: david, Jeremy Fitzhardinge, linux-kernel, Pavel Machek,
	Huang, Ying, Andrew Morton, linux-pm, Jeremy Maitin-Shepard

Rafael J. Wysocki wrote:
> On Thursday, 12 July 2007 08:43, david@lang.hm wrote:
>> On Wed, 11 Jul 2007, Jeremy Fitzhardinge wrote:
>>
>>> Andrew Morton wrote:
..
>> 8. hibernate kernel does suspend-to-ram to put the devices into a known 
>> safe state.
> Again, the devices should be quiesced rather than suspended in this step.

That's just not possible.  The Hibernate kernel will not have all
of the same device drivers as the mainline kernel.  Or at least that's
what people have previously stated here.

..
>>>>  This sounds awesome.  Am I correct in expecting that ultimately the
>>>>  existing hibernation implementation just goes away and we reuse (and hence
>>>>  strengthen) the existing kexec (and kdump?) infrastructure?

No, not so simple.  We still need much of the code to sanitize devices
upon wakeup from hibernation.   And adding this extra reboot-kernel step
in the midst of hibernate will double the time it takes (ugh).

Currently, TuxOnIce(suspend2) takes about 10 seconds to suspend my notebook.
Switching to this new scheme would double that to 10 seconds to boot/probe,
plus the original 10 seconds to hibernate.  Assuming the new implementation
even comes close to suspend2 speed.
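
As a back-of-the-envelope check on those numbers, here is a small sketch.  It assumes the image-writing time stays at the 10 seconds quoted above, and it also plugs in the 4.35 s kexec boot time reported earlier in this thread for an IBM T42.  That figure comes from a different machine, so the second total is only indicative; the function name is made up for illustration.

```python
def total_hibernate_seconds(boot_probe_s: float, image_write_s: float) -> float:
    """Time to hibernate when a second kernel must boot before the image is written."""
    return boot_probe_s + image_write_s


suspend2_write_s = 10.0        # "about 10 seconds" quoted for TuxOnIce/suspend2
estimated_boot_s = 10.0        # boot/probe estimate used in the paragraph above
reported_kexec_boot_s = 4.35   # kexec-to-root-mounted time reported for an IBM T42

estimate = total_hibernate_seconds(estimated_boot_s, suspend2_write_s)
with_reported_boot = total_hibernate_seconds(reported_kexec_boot_s, suspend2_write_s)

if __name__ == "__main__":
    print(f"with a 10 s boot/probe:   {estimate:.2f} s")
    print(f"with the 4.35 s figure:   {with_reported_boot:.2f} s")
```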

And the complexity and difficulty of setup really scares me.
Right now, we've got a pretty good/fast in-kernel (well, external patch)
implementation that allows my machines to hibernate very quickly, wake up
even faster, and not swap like mad afterwards.  Without any external programs,
initramfs, or extra kernels required.

And we want to replace this with an ultra-complex setup because.. ????

Cheers


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-12 10:10         ` david
  2007-07-12 13:01           ` Rafael J. Wysocki
       [not found]           ` <200707121501.03016.rjw@sisk.pl>
@ 2007-07-12 13:55           ` Mark Lord
  2007-07-12 19:05             ` david
  2007-07-12 14:06           ` Pavel Machek
  3 siblings, 1 reply; 113+ messages in thread
From: Mark Lord @ 2007-07-12 13:55 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

david@lang.hm wrote:
>
> actually, I think that while you may be able to get away with only one 
> kernel, you are probably better off with two. on the hibernate kernel 
> you can choose many 'embedded' options that don't make sense for the 
> normal kernel (no high mem, no SMP support, no SELinux, no network 
> routing, no netfilter, use SLOB not SLAB/SLUB, etc). also keep in mind 
> that each module that you load wastes a partial page of memory.

No highmem?  No thanks.

I really want hibernate to save stuff from above 1GB as well
as the stuff below 1GB.

Cheers


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-12 10:10         ` david
                             ` (2 preceding siblings ...)
  2007-07-12 13:55           ` Mark Lord
@ 2007-07-12 14:06           ` Pavel Machek
  3 siblings, 0 replies; 113+ messages in thread
From: Pavel Machek @ 2007-07-12 14:06 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

Hi!

> >Maybe my usage of terminology has some problem. But, 
> >the "device
> >hibernate" here means put device into quiescent state 
> >and save the
> >device state, but do not put device into low power 
> >state.
> 
> is there really enough savings (in time or otherwise) to 
> make it worth splitting this into two steps? for 

Yep.

> but I suspect that the number of drivers where this is 
> worth doing is relatively small, and it may be a better 
> approach to start off with just putting everything into 
> the low-power state until some driver shows up that makes 
> it worth adding the intermediate state to the system 

We have had this flamewar before, and Linus decided 'snapshot' and
'suspend' are different operations - and he's right. A disk takes 10
seconds to suspend/resume (spindown).
							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-12  0:22 ` Andrew Morton
                     ` (2 preceding siblings ...)
  2007-07-12 12:38   ` Rafael J. Wysocki
@ 2007-07-12 14:43   ` Huang, Ying
       [not found]   ` <1184251423.9346.55.camel@caritas-dev.intel.com>
  4 siblings, 0 replies; 113+ messages in thread
From: Huang, Ying @ 2007-07-12 14:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Pavel Machek, linux-pm, Jeremy Maitin-Shepard

On Wed, 2007-07-11 at 17:22 -0700, Andrew Morton wrote:
> This sounds awesome.  Am I correct in expecting that ultimately the
> existing hibernation implementation just goes away and we reuse (and hence
> strengthen) the existing kexec (and kdump?) infrastructure?
> And that we get hibernation support almost for free on all kexec (and
> relocatable-kernel?) capable architectures?
> And that all the management of hibernation and resume happens in userspace?

Yes. Ultimately, most of the hibernation code, such as the process freezer,
memory shrinking, memory snapshot (atomic copy) and image reading/writing,
can go away, because kexec based hibernation doesn't depend on it.
Only the device/CPU state quiesce/save/restore code needs to remain.
And the management of hibernation and resume will happen in userspace.

> 
> I didn't understand the ACPI problem.  Does this mean that CONFIG_ACPI must
> be disabled in the to-be-hibernated kernel, or in the little transient
> kexec kernel?

Under the current implementation of device state quiesce/save/restore,
CONFIG_ACPI must be turned off in both the to-be-hibernated kernel and
the transient kexec kernel.

But the hibernation people are going to separate device suspend from
device hibernate. Device hibernate will put a device into a quiescent
state but not into a low power state. When this is done, it will not be
necessary to disable CONFIG_ACPI at all. Disabling CONFIG_ACPI is just a
workaround for the current implementation.
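
To make the distinction concrete, here is a minimal user-space sketch.
It is not kernel code: struct sketch_dev and the sketch_* functions are
hypothetical names invented for illustration, and the model only tracks
the two properties discussed above (whether a device may still do DMA,
and whether it still has power).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device model; not a real kernel structure. */
struct sketch_dev {
	const char *name;
	bool dma_active;	/* device may still touch memory */
	bool powered;		/* device is in its full power state */
	bool state_saved;	/* driver has saved the device state */
};

/* "Device hibernate": stop activity and save state, keep power on. */
static void sketch_quiesce(struct sketch_dev *d)
{
	assert(d != NULL);
	d->dma_active = false;
	d->state_saved = true;
}

/* "Device suspend": quiesce first, then enter the low power state. */
static void sketch_suspend(struct sketch_dev *d)
{
	sketch_quiesce(d);
	d->powered = false;
}

/* In this model the kexeced kernel can use only quiet, powered devices. */
static bool sketch_usable_after_kexec(const struct sketch_dev *d)
{
	assert(d != NULL);
	return d->powered && !d->dma_active;
}
```

In this model a quiesced device passes sketch_usable_after_kexec() and a
suspended one does not, which is the situation the CONFIG_ACPI
workaround currently avoids.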

> How close do you think all this is to being a viable thing?

The kexec jump is the first step, and maybe the simplest one. There are
many other issues to be resolved, at least the following ones.

1. Separate device suspend from device hibernate.
2. Do not reserve memory for the kexec kernel. That is, back up the needed
memory before kexec and restore it after kexec.
3. Support in-place kexec? The relocatable kernel is not necessary
if this can be implemented.
4. Image writing/reading. (Only a user space application is needed.)
5. A smooth resume process. Maybe it is not necessary to kexec a new kernel
for resume. For example, in the first stage of kernel boot, only the first
16M (or a little more) of RAM is used; if the resume image is found, the
saved kernel image is resumed; if the resume image is not found, the
remaining RAM is turned on. This depends on 3.
6. Reduce the boot-up time of the kexec kernel. Maybe the kexec kernel can
be hibernated/resumed by the normal kernel too. This way, a real
kexec/boot-up is only needed the first time.

Best Regards,
Huang, Ying


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]         ` <469631FA.2070405@rtr.ca>
@ 2007-07-12 14:49           ` Pavel Machek
  2007-07-12 15:35           ` Rafael J. Wysocki
                             ` (3 subsequent siblings)
  4 siblings, 0 replies; 113+ messages in thread
From: Pavel Machek @ 2007-07-12 14:49 UTC (permalink / raw)
  To: Mark Lord
  Cc: david, Jeremy Fitzhardinge, linux-kernel, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

Hi!

> And the complexity and difficulty of setup really scares 
> me.
> Right now, we've got a pretty good/fast in-kernel (well, 
> external patch)
> that allows my machines to hibernate very quickly, wake 
> up even faster,
> and not swap like mad afterwards.  Without any external 
> programs,
> initramfs, or extra kernels required.
> 
> And we want to replace this with an ultra-complex setup 
> because.. ????

...freezer does not work with fuse :-). Or more exactly because
freezer is ugly, and we don't know how to get rid of it...

(Not that I advocate kexec-based hibernation. I think it is going to
suck. But it might allow kdump-but-keep-running, so work is not
wasted).
							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]         ` <469631FA.2070405@rtr.ca>
  2007-07-12 14:49           ` Pavel Machek
@ 2007-07-12 15:35           ` Rafael J. Wysocki
       [not found]           ` <200707121735.40077.rjw@sisk.pl>
                             ` (2 subsequent siblings)
  4 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-12 15:35 UTC (permalink / raw)
  To: Mark Lord
  Cc: david, Jeremy Fitzhardinge, linux-kernel, Pavel Machek,
	Huang, Ying, Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Thursday, 12 July 2007 15:51, Mark Lord wrote:
> Rafael J. Wysocki wrote:
> > On Thursday, 12 July 2007 08:43, david@lang.hm wrote:
> >> On Wed, 11 Jul 2007, Jeremy Fitzhardinge wrote:
> >>
> >>> Andrew Morton wrote:
> ..
> >> 8. hibernate kernel does suspend-to-ram to put the devices into a known 
> >> safe state.
> > Again, the devices should be quiesced rather than suspended in this step.
> 
> That's just not possible.  The Hibernate kernel will not have all
> of the same device drivers as the mainline kernel.  Or at least that's
> what people have previously stated here.

OK, one more problem to solve. :-)

> >>>>  This sounds awesome.  Am I correct in expecting that ultimately the
> >>>>  existing hibernation implementation just goes away and we reuse (and hence
> >>>>  strengthen) the existing kexec (and kdump?) infrastructure?
> 
> No, not so simple.  We still need much of the code to sanitize devices
> upon wakeup from hibernation.   And adding this extra reboot-kernel step
> in the midst of hibernate will double the time it takes (ugh).
> 
> Currently, TuxOnIce(suspend2) takes about 10 seconds to suspend my notebook.
> Switching to this new scheme would double that to 10 seconds to boot/probe,
> plus the original 10 seconds to hibernate.  Assuming the new implementation
> even comes close to suspend2 speed.

How much RAM is there in your machine?

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]           ` <200707121735.40077.rjw@sisk.pl>
@ 2007-07-12 16:03             ` Mark Lord
       [not found]             ` <469650DE.4000901@rtr.ca>
  1 sibling, 0 replies; 113+ messages in thread
From: Mark Lord @ 2007-07-12 16:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: david, Jeremy Fitzhardinge, linux-kernel, Pavel Machek,
	Huang, Ying, Andrew Morton, linux-pm, Jeremy Maitin-Shepard

Rafael J. Wysocki wrote:
> On Thursday, 12 July 2007 15:51, Mark Lord wrote:
>..
>> Currently, TuxOnIce(suspend2) takes about 10 seconds to suspend my notebook.
>> Switching to this new scheme would double that to 10 seconds to boot/probe,
>> plus the original 10 seconds to hibernate.  Assuming the new implementation
>> even comes close to suspend2 speed.
> 
> How much RAM is there in your machine?

2GB, but it doesn't need to dump that much for good performance.
Hibernate here consists of:

   echo "$(( 256 * 1024 * 1024 ))" > /sys/power/image_size
   echo -n disk > /sys/power/state

Plus a couple of fiddly commands to deal with the ATI binary X server.

We use this on other machines here too, with 2GB RAM (most of them)
and 1.3GB RAM (very slow HD, so it takes longer on that one).

Cheers


* Re: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]         ` <469631FA.2070405@rtr.ca>
                             ` (2 preceding siblings ...)
       [not found]           ` <200707121735.40077.rjw@sisk.pl>
@ 2007-07-12 16:09           ` Alan Stern
  2007-07-12 18:49           ` david
  4 siblings, 0 replies; 113+ messages in thread
From: Alan Stern @ 2007-07-12 16:09 UTC (permalink / raw)
  To: Mark Lord
  Cc: david, Jeremy Fitzhardinge, linux-kernel, Pavel Machek,
	Huang, Ying, Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Thu, 12 Jul 2007, Mark Lord wrote:

> Rafael J. Wysocki wrote:
> > On Thursday, 12 July 2007 08:43, david@lang.hm wrote:
> >> On Wed, 11 Jul 2007, Jeremy Fitzhardinge wrote:
> >>
> >>> Andrew Morton wrote:
> ..
> >> 8. hibernate kernel does suspend-to-ram to put the devices into a known 
> >> safe state.
> > Again, the devices should be quiesced rather than suspended in this step.
> 
> That's just not possible.  The Hibernate kernel will not have all
> of the same device drivers as the mainline kernel.  Or at least that's
> what people have previously stated here.

It doesn't matter.  The Hibernate kernel needs to quiesce only the 
devices it has been using, since it is the only kernel to run since the 
system was started.

Alan Stern


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found] ` <20070711111350.GI7091@elf.ucw.cz>
       [not found]   ` <1184257734.9346.76.camel@caritas-dev.intel.com>
@ 2007-07-12 16:28   ` Huang, Ying
  1 sibling, 0 replies; 113+ messages in thread
From: Huang, Ying @ 2007-07-12 16:28 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Wed, 2007-07-11 at 13:13 +0200, Pavel Machek wrote:
> Hi!
> 
> Looks interesting... but I was feeling strange dejavu reading
> this... and that's because you pasted the changelog twice :-).
> 

Sorry, I should have re-checked the mail before sending out.

> How fast can kexec boot secondary kernel?

I measured it on my IBM T42 with CONFIG_PRINTK_TIME=y. The boot-up time
is about 4.35s from when kexec is issued until root is mounted in the
kexec kernel.

I think it can be optimized. Maybe the kexec kernel can
be hibernated/resumed by the normal kernel too. This way, a real
kexec/boot-up is only needed the first time.

> > 2. Compile the kexec-tools with kdump and kjump patches added, the
> >    kdump patch can be found at:
> > 
> > http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-dkump10.patch
> 
> I got 404 error :-(.

Sorry, typo problem. The URL should be:
http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-kdump10.patch

> > 3. Boot compiled kernel, the reserved crash kernel memory region must
> >    be added to kernel command line as following:
> > 
> >    crashkernel=<XX>M@<XX>M
> > 
> >    Where, <XX> should be replaced by the real memory size and position.
> 
> How much memory do you suggest to reserve? 64M? 

I reserved 16M RAM. I think this is sufficient for a simple disk based
hibernation.

> > 7. In the kexec booted kernel, trigger the jumping back with following
> >    shell command.
> > 
> >    echo <a>:<b> > /sys/power/resume
> > 
> >    Where <a> and <b> is non-negative integer, at least one of them must
> >    be non-zero.
> 
> What does a and b mean?
> 

a and b have no meaning. They are only used to trigger the resume
process. This will be fixed in a future version.

> > +/* Adds the kexec_backup= command line parameter to command line. */
> > +static int cmdline_add_backup(char *cmdline, unsigned long addr)
> > +{
> > +	int cmdlen, len, align = 1024;
> > +	char str[30], *ptr;
> > +
> > +	/* Passing in kexec_backup=xxxK format. Saves space required in cmdline.
> > +	 * Ensure 1K alignment*/
> > +	if (addr%align)
> > +		return -1;
> > +	addr = addr/align;
> > +	ptr = str;
> > +	strcpy(str, " kexec_backup=");
> > +	ptr += strlen(str);
> > +	ultoa(addr, ptr);
> > +	strcat(str, "K");
> > +	len = strlen(str);
> > +	cmdlen = strlen(cmdline) + len;
> > +	if (cmdlen > (COMMAND_LINE_SIZE - 1))
> > +		die("Command line overflow\n");
> > +	strcat(cmdline, str);
> > +#if 0
> > +		printf("Command line after adding backup\n");
> > +		printf("%s\n", cmdline);
> > +#endif
> > +	return 0;
> > +}
> 
> printf()? ...and please remove out commented code.

This is a patch against kexec-tools, which is a userspace tool, so printf
is used. The commented-out code just follows the other cmdline_add_xxx
functions of kexec-tools. But it is useless, and should be removed.
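
With the "#if 0" block dropped, the function could look roughly like the
sketch below. This is only an illustration, not the actual kexec-tools
code: the names sketch_cmdline_add_backup and SKETCH_COMMAND_LINE_SIZE
are made up, the size 256 is an assumed value, snprintf stands in for
the strcpy/ultoa/strcat sequence, and the sketch returns -1 on overflow
where the quoted patch calls die().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed value for illustration only; not taken from kexec-tools. */
#define SKETCH_COMMAND_LINE_SIZE 256

/*
 * Append " kexec_backup=<addr in KiB>K" to cmdline.
 * Returns 0 on success, -1 if addr is not 1K aligned or if the result
 * would not fit in SKETCH_COMMAND_LINE_SIZE bytes (including the NUL).
 */
static int sketch_cmdline_add_backup(char *cmdline, unsigned long addr)
{
	const unsigned long align = 1024;
	char str[40];
	int len;

	assert(cmdline != NULL);
	if (addr % align)
		return -1;
	/* snprintf bounds the write and reports the length it needed. */
	len = snprintf(str, sizeof(str), " kexec_backup=%luK", addr / align);
	if (len < 0 || (size_t)len >= sizeof(str))
		return -1;
	if (strlen(cmdline) + (size_t)len > SKETCH_COMMAND_LINE_SIZE - 1)
		return -1;
	strcat(cmdline, str);
	return 0;
}
```

For example, a backup region at 16M appended to "root=/dev/sda1" gives
"root=/dev/sda1 kexec_backup=16384K", and an address that is not a
multiple of 1024 is rejected.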

Best Regards,
Huang Ying


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]   ` <1184251423.9346.55.camel@caritas-dev.intel.com>
  2007-07-12  7:03     ` david
  2007-07-12 12:53     ` Rafael J. Wysocki
@ 2007-07-12 16:32     ` Eric W. Biederman
       [not found]     ` <Pine.LNX.4.64.0707112345250.28090@asgard.lang.hm>
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 113+ messages in thread
From: Eric W. Biederman @ 2007-07-12 16:32 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Kexec Mailing List, linux-kernel, Pavel Machek, Andrew Morton,
	linux-pm, Jeremy Maitin-Shepard

I like the concept, but I completely disagree with your current
implementation.

I think it will be much easier if you start with a completely
independent code path and then just reuse the pieces of the
existing code path that you need.  

More details below.

"Huang, Ying" <ying.huang@intel.com> writes:

> On Wed, 2007-07-11 at 17:22 -0700, Andrew Morton wrote:
>> This sounds awesome.  Am I correct in expecting that ultimately the
>> existing hibernation implementation just goes away and we reuse (and hence
>> strengthen) the existing kexec (and kdump?) infrastructure?
>> And that we get hibernation support almost for free on all kexec (and
>> relocatable-kernel?) capable architectures?
>> And that all the management of hibernation and resume happens in userspace?
>
> Yes. Ultimately, most of the hibernation code such as process freezer,
> memory shrinking, memory snapshot (atomic copy), image reading/writing
> can go away, because kexec based hibernation doesn't depend on them.
> Just the device/CPU state quiescent/save/restore is necessary to remain.
> And, the management of hibernation and resume will happen in userspace.
>
>> I didn't understand the ACPI problem.  Does this mean that CONFIG_ACPI must
>> be disabled in the to-be-hibernated kernel, or in the little transient
>> kexec kernel?
>
> Under current implementation of device state quiescent/save/restore, the
> CONFIG_ACPI must be turned off both in to-be-hibernated kernel and
> transient kexec kernel.
>
> But the hibernation people are going to separate the device suspend from
> device hibernate. The device hibernate will put device in quiescent
> state but not in low power state. When this is done, it is not necessary
> to disable CONFIG_ACPI at all. It is just a workaround for current
> implementation that disabling CONFIG_ACPI.
>
>> How close do you think all this is to being a viable thing?
>
> The kexec jump is the first step, maybe the simplest step. There are
> many other issues to be resolved, at least the following ones.
>
> 1. Separate device suspend from device hibernate.

Actually in some very practical sense we already have two copies of
this in the kernel: device_shutdown and the hotunplug/module
remove code.  So it should be mostly a matter of using what we have.

Basically all this entails is modifying sys_reboot(),
adding a LINUX_REBOOT_CMD_KSPAWN, and having that command
enter the kexec path with the appropriate set of calls.
I would be really surprised if this winds up with much
more code than the current kernel_kexec function.

This might wind up exactly the same as the current
LINUX_REBOOT_CMD_KEXEC but at least until we have a working
prototype it makes sense to allow for differences.

This should allow the kexec based implementation to coexist
with the existing software suspend to disk code until it is proven out,
and then we can just remove all of the software suspend to
disk code.

> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
> before kexec and restore them after kexec.
> 3. Support the in-place kexec? The relocatable kernel is not necessary
> if this can be implemented.

It sounds like what you really want is the normal kexec path enhanced
so that you can return to the kernel you started with.

The normal kexec path already knows how to do the memory shuffle so
it can do on demand memory allocation.  That code just needs to be
enhanced slightly so that you allocate an extra page, set up an inverse
scatter gather list for restoring the pages, and teach relocate_kernel.S
to preserve its destination pages by using the inverse scatter gather
list.
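
As a rough user-space illustration of the idea (not the code in
relocate_kernel.S), the sketch below exchanges pages through one spare
page instead of overwriting the destination. All names are hypothetical,
the "pages" are tiny buffers, and it assumes no page appears in more
than one list entry. Using the extra page as swap space is one possible
reading of the scheme, so treat that as an assumption too.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 16	/* tiny "pages" keep the example readable */

/* One entry: the new kernel's page at src must end up at dst. */
struct sketch_sg_entry {
	unsigned char *dst;
	unsigned char *src;
};

/*
 * Exchange each dst/src pair through one spare page instead of
 * overwriting dst.  Afterwards dst holds the new contents and src holds
 * what used to be at dst, so, as long as no page appears in more than
 * one entry, running the same list again undoes the move.
 */
static void sketch_swap_pages(const struct sketch_sg_entry *list, size_t n,
			      unsigned char *spare)
{
	size_t i;

	assert(list != NULL && spare != NULL);
	for (i = 0; i < n; i++) {
		memcpy(spare, list[i].dst, SKETCH_PAGE_SIZE);
		memcpy(list[i].dst, list[i].src, SKETCH_PAGE_SIZE);
		memcpy(list[i].src, spare, SKETCH_PAGE_SIZE);
	}
}
```

Calling sketch_swap_pages() once moves the new contents into place and
parks the old contents where the new ones came from; calling it a second
time with the same list puts everything back, which is the property
needed to return to the original kernel.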

The normal kexec path already calls device_shutdown and the like to
stop devices from running.  Although again that code path is not
prepared to restore the devices.

...

For prototyping I would:
- reserve a chunk of memory (possibly with the crashkernel= option)
  and run a relocatable kernel out of it.  

  By using the normal kexec you can boot a relocatable restore kernel
  in that reserved region. It is an extra step but it makes things
  work today.

- I would use the normal sys_kexec_load.

- I would debug/tweak the user space and the code to reenter the
  old kernel.  I.e. the device driver stop/start code.

  Once it was basically working I would then update the normal kexec
  memory copy code in relocate.S to preserve the destination pages.

> 4. Image writing/reading. (Only user space application is needed).

And possibly a few fixes to /dev/mem.  This is pretty much the same
process as generating a core dump so there should be some synergy with that.

We probably want to use something like the ELF header the crashdump
path uses to communicate to the kernel saving memory which memory
regions need to be saved.  That probably means that we can use the
exact same method as the kexec on panic kernel uses to save memory.

> 5. A smooth resume process. Maybe it is not needed to kexec a new kernel
> for resume. For example, in the first stage of kernel boot, just first
> 16M (or a little more) RAM is used, if the resume image is found, the
> saved kernel image is resumed; if the resume image is not found, turn on
> the remaining RAM. This will depends on 3.

Well I expect the resume will load the resumed kernel into reserved
memory, and kexec a very small assembly stub that will jump back
to the code in relocate_kernel.S which will call ret.

Then either hot add the rest of our memory or kexec to a kernel without
restrictions.

> 6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can
> be hibernate/resume by the normal kernel too. This way, a real
> kexec/boot-up is only needed for the first time.

Well just not loading drivers you aren't going to use and generally avoiding
long disk probing times will help here.  We control all of the code so
it should be relatively straightforward.

Eric


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]             ` <469650DE.4000901@rtr.ca>
@ 2007-07-12 16:35               ` Mark Lord
       [not found]               ` <46965837.8030907@rtr.ca>
  1 sibling, 0 replies; 113+ messages in thread
From: Mark Lord @ 2007-07-12 16:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: david, Jeremy Fitzhardinge, linux-kernel, Pavel Machek,
	Huang, Ying, Andrew Morton, linux-pm, Jeremy Maitin-Shepard

Mark Lord wrote:
> Rafael J. Wysocki wrote:
>..
>> How much RAM is there in your machine?
> 
> 2GB, but it doesn't need to dump that much for good performance.
> Hibernate here consists of:
> 
>   echo "$(( 256 * 1024 * 1024 ))" > /sys/power/image_size
>   echo -n disk > /sys/power/state
> 
> Plus a couple of fiddly commands to deal with the ATI binary X server.

Whoops.. wrong half of the script.
For TuxOnIce in 10 seconds, it does this:

  echo 288 > /sys/power/suspend2/image_size_limit
  echo > /sys/power/suspend2/do_suspend


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]   ` <4695C096.5080400@goop.org>
  2007-07-12  6:43     ` david
       [not found]     ` <1184260174.9346.85.camel@caritas-dev.intel.com>
@ 2007-07-12 17:09     ` Huang, Ying
  2 siblings, 0 replies; 113+ messages in thread
From: Huang, Ying @ 2007-07-12 17:09 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: linux-kernel, Pavel Machek, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Wed, 2007-07-11 at 22:48 -0700, Jeremy Fitzhardinge wrote:
> >> The kexec jump is implemented in the framework of software suspend. In
> >> fact, the kexec based hibernation can be seen as just implementing the
> >> image writing and reading method of software suspend with a kexeced
> >> Linux kernel.
> >>     
> 
> I guess I'm (still) confused by the terminology here.  Do you mean that 
> it fits into suspend-to-disk as a disk-writing mechanism, or in 
> suspend-to-ram as a way of going to sleep?

It fits into suspend-to-disk as a disk-writing mechanism. But most
tricks of suspend-to-disk will be no longer necessary in kexec based
hibernation.

> > I didn't understand the ACPI problem.  Does this mean that CONFIG_ACPI must
> > be disabled in the to-be-hibernated kernel, or in the little transient
> > kexec kernel?
> >   
> 
> I think the point is that if kernel A says "I'm suspending" and calls 
> the suspend method on all its devices, then kernel B finds that it has 
> no powered on devices to work with.  But then couldn't it turn on the 
> ones it wants anyway?  And don't you want to suspend them, to make sure 
> they're not still DMAing memory while B is trying to shuffle everything 
> off to disk?

The devices should be put into a quiescent state to stop things like DMA.
But they do not need to be put into a low power state.

"Do not put devices into low power state" vs. "power on devices during
boot-up"

Which one is easier?

Best Regards,
Huang Ying


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]     ` <Pine.LNX.4.64.0707112345250.28090@asgard.lang.hm>
       [not found]       ` <1184260683.9346.91.camel@caritas-dev.intel.com>
@ 2007-07-12 17:18       ` Huang, Ying
  1 sibling, 0 replies; 113+ messages in thread
From: Huang, Ying @ 2007-07-12 17:18 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Thu, 2007-07-12 at 00:03 -0700, david@lang.hm wrote:
> >
> > The kexec jump is the first step, maybe the simplest step. There are
> > many other issues to be resolved, at least the following ones.
> >
> > 1. Separate device suspend from device hibernate.
> 

Maybe my usage of terminology has some problem. But "device
hibernate" here means putting the device into a quiescent state and saving
the device state, without putting the device into a low power state.

> > 6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can
> > be hibernate/resume by the normal kernel too. This way, a real
> > kexec/boot-up is only needed for the first time.
> 
> the hibernate kernel shouldn't need a lot of the features of the standard 
> kernel (does it really need sound for example), and if tailored even 
> tighter could be configured to only have the drivers actually used for the 
> save and restore, making a _very_ minimal kernel (no USB, no network, 
> only simple video drivers, etc) greatly speeding up the boot

There is no need for two kernels. Most drivers and optional features are
compiled as modules, as in most desktop distributions. So it is
sufficient to "insmod" only the needed modules in the hibernate kernel.

Best Regards,
Huang Ying


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]       ` <200707121446.14170.rjw@sisk.pl>
  2007-07-12 13:51         ` Mark Lord
       [not found]         ` <469631FA.2070405@rtr.ca>
@ 2007-07-12 18:42         ` david
       [not found]         ` <Pine.LNX.4.64.0707121138140.25614@asgard.lang.hm>
  3 siblings, 0 replies; 113+ messages in thread
From: david @ 2007-07-12 18:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jeremy Fitzhardinge, linux-kernel, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:

> On Thursday, 12 July 2007 08:43, david@lang.hm wrote:
>> On Wed, 11 Jul 2007, Jeremy Fitzhardinge wrote:
>>
>>> Andrew Morton wrote:
>>>>  On Wed, 11 Jul 2007 15:30:31 +0000
>>>>  "Huang, Ying" <ying.huang@intel.com> wrote:
>>>>
>>>>>  1. Boot a kernel A
>>>>>  2. Work under kernel A
>>>>>  3. Kexec another kernel B in kernel A
>>>>>  4. Work under kernel B
>>>>>  5. Jump from kernel B to kernel A
>>>>>  6. Continue work under kernel A
>>>>>
>>>>>  This is the first step to implement kexec based hibernation. If the
>>>>>  memory image of kernel A is written to or read from a permanent media
>>>>>  in step 4, a preliminary version of kexec based hibernation can be
>>>>>  implemented.
>>>>>
>>>>>  The kernel B is run as a crashdump kernel in reserved memory
>>>>>  region. This is the biggest constraint of the patch. It is planned to
>>>>>  be eliminated in the next version. That is, instead of reserving memory
>>>>>  region previously, the needed memory region is backed up before kexec
>>>>>  and restored after jumping back.
>>>>>
>>>>>  Another constraint of the patch is that the CONFIG_ACPI must be turned
>>>>>  off to make kexec jump work. Because ACPI will put devices into low
>>>>>  power state, the kexeced kernel can not be booted properly under
>>>>>  it. This constraint can be eliminated by separating the suspend method
>>>>>  and hibernation method of the devices as proposed earlier in the LKML.
>>>>>
>>>>>  The kexec jump is implemented in the framework of software suspend. In
>>>>>  fact, the kexec based hibernation can be seen as just implementing the
>>>>>  image writing and reading method of software suspend with a kexeced
>>>>>  Linux kernel.
>>>>>
>>>
>>> I guess I'm (still) confused by the terminology here.  Do you mean that it
>>> fits into suspend-to-disk as a disk-writing mechanism, or in suspend-to-ram
>>> as a way of going to sleep?
>>
>> Suspend-to-ram involves stopping the system and shutting down devices to
>> go into low-power mode, then on wakeup restarting devices and resuming
>> operation
>>
>> so the steps would be.
>>
>> 1. stop userspace
>>
>> 2. walk the system device tree and put devices to sleep
>>
>> 3. go into the lowest power mode available and wait for a wakeup signal
>>
>> later
>>
>> 4. walk the system device tree and wake up devices
>>
>> 5. resume userspace scheduling.
>
> Note that we are going to phase out steps 1 and 5.

what I'm referring to in #1 and #5 is not the current freezer, it's just 
the kernel not allocating any cpu time to userspace while it's shutting 
things down (this could be as simple as unplugging all non-boot cpus and 
then doing the rest of the work without letting the scheduler run).

>> note that what devices get put to sleep could be configurable, potentially
>> to the extreme of things like the OLPC (that have hardware designed for
>> cheap sleeping) going into a light suspend-to-ram state between keystrokes
>> if nothing else has a timer event scheduled before that.
>>
>> Suspend-to-disk (Hibernate) involves stopping the system, making a
>> snapshot of ram, writing the snapshot to somewhere and powering off the
>> box. on wakeup (power-on) a helper kernel boots, loads the snapshot into
>> ram and jumps to the kernel in the snapshot to resume operation.
>>
>> as I understand the proposal the thought is to do the following
>>
>> 1. system kernel does suspend-to-ram to put the devices into a known safe
>> state.
>
> Not necessarily suspend-to-RAM.  I'd much prefer it if devices were not put
> into low power states but quiesced (ie. no DMA, no interrupts).

as I asked in another message, is it really worth having two (or more) 
modes here?

>> 2. system kernel uses kexec to start hibernate kernel
>>
>> 3. hibernate kernel wakes up devices it needs as if it was doing a
>> resume-from-ram
>
> I think that the devices should be initialized from scratch in this step.
>
>> 4. hibernate kernel copies ram image somewhere
>
> In this step some userland may be involved (started from the "hibernate"
> kernel).

yes, I should have been much clearer that it's the userspace for the 
hibernate kernel that does this.

>> 5. hibernate kernel shuts down the box
>>
>> later
>>
>> 6. hibernate kernel boots
>>
>> 7. hibernate kernel copies ram image from somewhere
>>
>> 8. hibernate kernel does suspend-to-ram to put the devices into a known
>> safe state.
>
> Again, the devices should be quiesced rather than suspended in this step.
>
>> 9. hibernate kernel uses kexec to start system kernel
>>
>> 10. system kernel wakes up devices it needs as if it was doing a
>> resume-from-ram.
>
> I think it should reconfigure devices from scratch (ie. reprobe).

probably a good idea, especially since there will be devices that the 
hibernate kernel hasn't touched.

David Lang


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]         ` <469631FA.2070405@rtr.ca>
                             ` (3 preceding siblings ...)
  2007-07-12 16:09           ` Alan Stern
@ 2007-07-12 18:49           ` david
  4 siblings, 0 replies; 113+ messages in thread
From: david @ 2007-07-12 18:49 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jeremy Fitzhardinge, linux-kernel, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Thu, 12 Jul 2007, Mark Lord wrote:

> Rafael J. Wysocki wrote:
>>  On Thursday, 12 July 2007 08:43, david@lang.hm wrote:
>> >  On Wed, 11 Jul 2007, Jeremy Fitzhardinge wrote:
>> > 
>> > >  Andrew Morton wrote:
> ..
>> >  8. hibernate kernel does suspend-to-ram to put the devices into a known 
>> >  safe state.
>>  Again, the devices should be quiesced rather than suspended in this step.
>
> That's just not possible.  The Hibernate kernel will not have all
> of the same device drivers as the mainline kernel.  Or at least that's
> what people have previously stated here.

devices that have not been touched don't need to be quiesced or put into a 
low-power state, they are still waiting to be initialized. so as long as 
the device initialization can be done from the low-power or quiesced state 
things will work just fine.

> ..
>> > > >   This sounds awesome.  Am I correct in expecting that ultimately the
>> > > >   existing hibernation implementation just goes away and we reuse 
>> > > >   (and hence
>> > > >   strengthen) the existing kexec (and kdump?) infrastructure?
>
> No, not so simple.  We still need much of the code to sanitize devices
> upon wakeup from hibernation.   And adding this extra reboot-kernel step
> in the midst of hibernate will double the time it takes (ugh).
>
> Currently, TuxOnIce(suspend2) takes about 10 seconds to suspend my notebook.
> Switching to this new scheme would double that to 10 seconds to boot/probe,
> plus the original 10 seconds to hibernate.  Assuming the new implementation
> even comes close to suspend2 speed.

why do you assume that it will take 10 seconds to boot the new kernel? 
linuxbios does it in <2 seconds, and someone else posted on this thread <5 
seconds

> And the complexity and difficulty of setup really scares me.
> Right now, we've got a pretty good/fast in-kernel (well, external patch)
> that allows my machines to hibernate very quickly, wake up even faster,
> and not swap like mad afterwards.  Without any external programs,
> initramfs, or extra kernels required.
>
> And we want to replace this with an ultra-complex setup because.. ????

the freezer's approach of freezing some things but not others keeps going
wrong, and many people can't think of any algorithm that will always get it
right. This approach bypasses the entire problem, making it much simpler
conceptually, even though there are a few more parts involved.

David Lang

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]     ` <200707121453.49616.rjw@sisk.pl>
@ 2007-07-12 18:57       ` david
       [not found]       ` <Pine.LNX.4.64.0707121150460.25614@asgard.lang.hm>
  1 sibling, 0 replies; 113+ messages in thread
From: david @ 2007-07-12 18:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:

>>> I didn't understand the ACPI problem.  Does this mean that CONFIG_ACPI must
>>> be disabled in the to-be-hibernated kernel, or in the little transient
>>> kexec kernel?
>>
>> Under current implementation of device state quiescent/save/restore, the
>> CONFIG_ACPI must be turned off both in to-be-hibernated kernel and
>> transient kexec kernel.
>>
>> But the hibernation people are going to separate the device suspend from
>> device hibernate. The device hibernate will put device in quiescent
>> state but not in low power state. When this is done, it is not necessary
>> to disable CONFIG_ACPI at all. Disabling CONFIG_ACPI is just a workaround
>> for the current implementation.
>>
>>> How close do you think all this is to being a viable thing?
>>
>> The kexec jump is the first step, maybe the simplest step. There are
>> many other issues to be resolved, at least the following ones.
>>
>> 1. Separate device suspend from device hibernate.
>
> Step 0, I'd say. :-)

is there more involved than just ignoring device hibernate and using
the device suspend? (although if there is not a workable device
suspend for a driver, that would answer my question :-)

>> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
>> before kexec and restore them after kexec.
>
> I don't think this is very important initially.

I agree, a stripped down hibernate kernel can be very small, not allocating
this memory until it's needed is a step for the final polishing.

>> 3. Support the in-place kexec? The relocatable kernel is not necessary
>> if this can be implemented.
>> 4. Image writing/reading. (Only user space application is needed).
>
> And a kernel interface for that application.

I don't understand this statement, this application is just using the
standard kernel interfaces (block devices to read/write to disk, network
devices to read/write to a server, etc). no new interfaces needed.

>> 5. A smooth resume process. Maybe it is not needed to kexec a new kernel
>> for resume. For example, in the first stage of kernel boot, just first
>> 16M (or a little more) RAM is used, if the resume image is found, the
>> saved kernel image is resumed; if the resume image is not found, turn on
>> the remaining RAM. This will depend on 3.
>
> I think that this is the most difficult part of the whole thing.

don't try to get too fancy right now. stick with a simple 'boot hibernate
kernel, its userspace looks for an image to resume, and if it doesn't
find one reboot to the normal system'

I don't know how to do this with grub, but it would be a trivial shell 
script with lilo
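
The decision itself is small. A minimal sketch of it follows, in Python
rather than shell so it can be run anywhere; the magic bytes, the file
standing in for the hibernation partition, and the action names are all
assumptions made for the example, not an existing image format:

```python
import os
import tempfile

# Hypothetical marker: a real image format would define its own header.
IMAGE_MAGIC = b"KJUMPIMG"


def has_resume_image(path):
    """Return True if `path` exists and starts with the assumed magic bytes."""
    try:
        with open(path, "rb") as f:
            return f.read(len(IMAGE_MAGIC)) == IMAGE_MAGIC
    except OSError:
        return False


def boot_action(path):
    """Decide what the hibernate kernel's userspace does next."""
    return "resume" if has_resume_image(path) else "boot-normal"


# Demo on temporary files standing in for the hibernation partition.
with tempfile.TemporaryDirectory() as d:
    img = os.path.join(d, "image")
    with open(img, "wb") as f:
        f.write(IMAGE_MAGIC + b"\0" * 16)
    print(boot_action(img))                      # image present
    print(boot_action(os.path.join(d, "none")))  # no image
```

Loading the image and jumping into it is the hard part and is not covered
by this sketch.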

>> 6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can
>> be hibernate/resume by the normal kernel too. This way, a real
>> kexec/boot-up is only needed for the first time.
>
> I'm not sure what you mean.

he's trying to get fancy again, the best way to speed up the boot of the
kexec kernel is to make it smaller and avoid probing for devices (hotplug
should NOT be used for normal suspend situations)

David Lang

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]           ` <200707121501.03016.rjw@sisk.pl>
  2007-07-12 13:22             ` jimmy bahuleyan
@ 2007-07-12 19:03             ` david
  1 sibling, 0 replies; 113+ messages in thread
From: david @ 2007-07-12 19:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:

> On Thursday, 12 July 2007 12:10, david@lang.hm wrote:
>> On Thu, 12 Jul 2007, Huang, Ying wrote:
>>
>>> On Thu, 2007-07-12 at 00:03 -0700, david@lang.hm wrote:
>>>>>
>>>>> The kexec jump is the first step, maybe the simplest step. There are
>>>>> many other issues to be resolved, at least the following ones.
>>>>>
>>>>> 1. Separate device suspend from device hibernate.
>>>>
>>>
>>> Maybe my usage of terminology has some problem. But, the "device
>>> hibernate" here means put device into quiescent state and save the
>>> device state, but do not put device into low power state.
>>
>> is there really enough savings (in time or otherwise) to make it worth
>> splitting this into two steps? for suspend-to-ram we definitely will need
>> the option to go all the way to a low power state, there's significant
>> extra complexity if you also have a state between normal operation and
>> this low power state.
>>
>> it may be worth doing if the low-power state is expensive (in time or
>> effort) to get to or from and the lesser state allows the computer overall
>> to save power (like the different cpu C states)
>>
>> but I suspect that the number of drivers where this is worth doing is
>> relatively small, and it may be a better approach to start off with just
>> putting everything into the low-power state until some driver shows up that
>> makes it worth adding the intermediate state to the system (and drivers
>> wouldn't have to change, if they only support one suspend state it's the
>> low-power one, if they support more, then higher layers choose which ones
>> to move to)
>
> We've discussed that a lot on linux-pm and the conclusion is that devices
> should not be put into low power states before creating the hibernation
> image, because that leads to problems during the restore.

too bad, I was thinking that a driver in a low-power state could be 
initialized normally and we could avoid having different quiesce and 
low-power states

> In turn, during the restore, when the image has been loaded and the "old"
> kernel gets the control, it should reprobe devices and initialize them from
> scratch rather than doing something like "resume devices after suspend
> to RAM".

this makes sense.

>>>>> 6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can
>>>>> be hibernate/resume by the normal kernel too. This way, a real
>>>>> kexec/boot-up is only needed for the first time.
>>>>
>>>> the hibernate kernel shouldn't need a lot of the features of the standard
>>>> kernel (does it really need sound for example), and if tailored even
>>>> tighter could be configured to only have the drivers actually used for the
>>>> save and restore, making a _very_ minimal kernel (no USB, no network,
>>>> only simple video drivers, etc) greatly speeding up the boot
>>>
>>> There is no need for two kernel. Most drivers and optional features are
>>> compiled as modules, as that of most desktop distributions. So just
>>> "insmod" needed modules only in hibernate kernel is sufficient.
>>
>> actually, I think that while you may be able to get away with only one
>> kernel, you are probably better off with two. on the hibernate kernel you
>> can choose many 'embedded' options that don't make sense for the normal
>> kernel (no high mem, no SMP support, no SELinux, no network routing, no
>> netfilter, use SLOB not SLAB/SLUB, etc). also keep in mind that each
>> module that you load wastes a partial page of memory.
>>
>> remember people run complete linux systems in 8M of ram, a suspend system
>> for a simple 'write the ram image to partition X on this IDE drive' should
>> be aiming at 2-4M of memory.
>>
>> more complex setups may want more space, but let the distros bloat things
>> up, design and demo an optimized system :-)
>
> So if a user wants to install a kernel.org kernel on his system, (s)he'll have
> to compile and install two kernels with different options.

for now allow this option as it's the simplest to implement (it takes 
extra work to re-use the kernel)

also, the objections to the auto-config kernel requests have mostly been
that it's impossible for the auto-config to get enough things right. in the
case of a hibernate kernel I think it's such a minimal config that it
should be possible to have an auto-config script that you tell 'I plan to
suspend to partition X, give me a minimal kernel that can access ram, that
partition, and the framebuffer (for status output)' and have it produce a
tiny kernel with only that support
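
A toy version of such a script might look like the following. This is a
sketch under stated assumptions: the function name is made up, the symbols
are examples of Kconfig options from kernels of this era, and the selection
has not been tested as a bootable configuration.

```python
# Sketch only. The symbol names are examples of Kconfig options; the
# selection is an assumption for illustration, not a tested config.

def minimal_hibernate_config(disk_driver, framebuffer=True):
    """Return config-fragment lines for a tiny save/restore kernel."""
    enabled = ["CONFIG_KEXEC", disk_driver]
    if framebuffer:
        enabled.append("CONFIG_FB")  # status output only
    # Features the save/restore kernel is assumed not to need.
    disabled = ["CONFIG_SMP", "CONFIG_MODULES", "CONFIG_NET"]
    lines = [f"{sym}=y" for sym in enabled]
    lines += [f"# {sym} is not set" for sym in disabled]
    return lines


# "I plan to suspend to a partition on an IDE disk":
for line in minimal_hibernate_config("CONFIG_BLK_DEV_IDEDISK"):
    print(line)
```

The point is only that the input is tiny (target device, status output
yes/no), which is what makes auto-config plausible here when it isn't for
a general-purpose kernel.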

> That doesn't sound good to me. :-)

on the other hand, booting a standard distro kernel that does hotplug 
detection for everything it can find on the system is far from optimal as 
well.

David Lang

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-12 13:55           ` Mark Lord
@ 2007-07-12 19:05             ` david
  0 siblings, 0 replies; 113+ messages in thread
From: david @ 2007-07-12 19:05 UTC (permalink / raw)
  To: Mark Lord
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Thu, 12 Jul 2007, Mark Lord wrote:

> david@lang.hm wrote:
>>
>>  actually, I think that while you may be able to get away with only one
>>  kernel, you are probably better off with two. on the hibernate kernel you
>>  can choose many 'embedded' options that don't make sense for the normal
>>  kernel (no high mem, no SMP support, no SELinux, no network routing, no
>>  netfilter, use SLOB not SLAB/SLUB, etc). also keep in mind that each
>>  module that you load wastes a partial page of memory.
>
> No highmem?  No thanks.
>
> I really want hibernate to save stuff from above 1GB as well
> as the stuff below 1GB.

oops, good point. I was thinking that the hibernate kernel wouldn't need
that much ram for its own operation and forgetting that it needed to
access everything the main system had.

David Lang

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]     ` <m14pk9fuqa.fsf@ebiederm.dsl.xmission.com>
@ 2007-07-12 19:09       ` david
  2007-07-12 19:49         ` Eric W. Biederman
       [not found]       ` <1184368525.1069.68.camel@caritas-dev.intel.com>
  2007-07-13 23:15       ` Huang, Ying
  2 siblings, 1 reply; 113+ messages in thread
From: david @ 2007-07-12 19:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Kexec Mailing List, linux-kernel, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Thu, 12 Jul 2007, Eric W. Biederman wrote:

>> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
>> before kexec and restore them after kexec.
>> 3. Support the in-place kexec? The relocatable kernel is not necessary
>> if this can be implemented.
>
> It sounds like what you really want is the normal kexec path enhanced
> so that you can return to the kernel you started with.
>
> The normal kexec path already knows how to do the memory shuffle so
> it can do on demand memory allocation.  That code just needs to be
> enhanced slightly so that you allocate an extra page, setup an inverse
> scatter gather list for restoring the pages, and teach relocate_kernel.S
> to preserve its destination pages by using the inverse scatter gather
> list.
>
> The normal kexec path already calls device_shutdown and the like to
> stop devices from running.  Although again that code path is not
> prepared to restore the devices.

we shouldn't need a restore code path if the new kernel re-detects 
everything. if kexec already shuts down all the devices we may not need to 
implement anything new here (although there may be room for future 
performance optimization)
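
On the memory side, the page-preserving copy Eric describes can be pictured
with a toy model. This is only one reading of his description, not the
relocate_kernel.S code; memory is a dict of page numbers, and the function
names and page layout are invented for the example.

```python
# Toy model: "memory" maps page number -> contents. Not kernel code.

def kexec_copy_preserving(memory, copy_list, backup_pages):
    """Copy each (src, dst) page pair, saving dst's old contents first.

    Returns the inverse list of (backup, dst) pairs needed to undo the copy.
    """
    if len(backup_pages) < len(copy_list):
        raise ValueError("need one backup page per destination page")
    inverse = []
    for (src, dst), backup in zip(copy_list, backup_pages):
        memory[backup] = memory[dst]  # preserve what the new kernel overwrites
        memory[dst] = memory[src]     # the normal kexec copy
        inverse.append((backup, dst))
    return inverse


def restore(memory, inverse):
    """Walk the inverse list to put the original pages back."""
    for backup, dst in inverse:
        memory[dst] = memory[backup]


memory = {0: "A-code", 1: "A-data", 10: "B-code", 11: "B-data", 20: None, 21: None}
inverse = kexec_copy_preserving(memory, [(10, 0), (11, 1)], [20, 21])
print(memory[0], memory[1])  # kernel B now sits in pages 0 and 1
restore(memory, inverse)
print(memory[0], memory[1])  # kernel A's pages are back
```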

> ...
>
> For prototyping I would:
> - reserve a chunk of memory (possibly with the crashkernel= option)
>  and run a relocatable kernel out of it.
>
>  By using the normal kexec you can boot a relocatable restore kernel
>  in that reserved region. It is an extra step but it makes things
>  work today.
>
> - I would use the normal sys_kexec_load.
>
> - I would debug/tweak the user space and the code to reenter the
>  old kernel.  I.e. the device driver stop/start code.
>
>  Once it was basically working I would then update the normal kexec
>  memory copy code in relocate.S to preserve the destination pages.

for prototyping there's no need to use the same kernel.

>> 4. Image writing/reading. (Only user space application is needed).
>
> And possibly a few fixes to /dev/mem.  This is pretty much the same
> process as generating a core dump so there should be some synergy with that.

what fixes are you thinking of?

you are making this sound very simple ;-)

David Lang

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]           ` <200707122120.19662.rjw@sisk.pl>
@ 2007-07-12 19:14             ` david
       [not found]             ` <Pine.LNX.4.64.0707121210210.25614@asgard.lang.hm>
  1 sibling, 0 replies; 113+ messages in thread
From: david @ 2007-07-12 19:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jeremy Fitzhardinge, linux-kernel, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:

>>>> note that what devices get put to sleep could be configurable, potentially
>>>> to the extreme of things like the OLPC (that have hardware designed for
>>>> cheap sleeping) going into a light suspend-to-ram state between keystrokes
>>>> if nothing else has a timer event scheduled before that.
>>>>
>>>> Suspend-to-disk (Hibernate) involves stopping the system, making a
>>>> snapshot of ram, writing the snapshot to somewhere and powering off the
>>>> box. on wakeup (power-on) a helper kernel boots, loads the snapshot into
>>>> ram and jumps to the kernel in the snapshot to resume operation.
>>>>
>>>> as I understand the proposal the thought is to do the following
>>>>
>>>> 1. system kernel does suspend-to-ram to put the devices into a known safe
>>>> state.
>>>
>>> Not necessarily suspend-to-RAM.  I'd much prefer it if devices were not put
>>> into low power states but quiesced (ie. no DMA, no interrupts).
>>
>> as I asked in another message, is it really worth having two (or more)
>> modes here?
>
> I think so.  The suspend-to-RAM mode is quite specific and on some platforms
> (e.g. ACPI) it requires platform support.
>
> We've already reached the conclusion that it's better to separate suspend from
> hibernation, as far as device drivers are concerned, and let's not repeat the
> discussion.

Ok, I seem to have been miscommunicating here. the old combined 
suspend/hibernate took everything to the hibernate state even if you only 
needed to suspend.

I was still seeing two different states involved

for suspend go to low-power mode

for hibernate go to low-power mode, kexec the new kernel, do your stuff, power off

note that this doesn't really matter as you have pointed out in other 
messages that we don't really want to put things in low-power mode, and 
Eric pointed out that kexec already handles disabling devices, so it 
sounds like this may be a solved problem if he's right and an issue to be 
solved differently if he's not.

David Lang

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]         ` <Pine.LNX.4.64.0707121138140.25614@asgard.lang.hm>
       [not found]           ` <200707122120.19662.rjw@sisk.pl>
@ 2007-07-12 19:20           ` Rafael J. Wysocki
  1 sibling, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-12 19:20 UTC (permalink / raw)
  To: david
  Cc: Jeremy Fitzhardinge, linux-kernel, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Thursday, 12 July 2007 20:42, david@lang.hm wrote:
> On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:
> 
> > On Thursday, 12 July 2007 08:43, david@lang.hm wrote:
> >> On Wed, 11 Jul 2007, Jeremy Fitzhardinge wrote:
> >>
> >>> Andrew Morton wrote:
> >>>>  On Wed, 11 Jul 2007 15:30:31 +0000
> >>>>  "Huang, Ying" <ying.huang@intel.com> wrote:
> >>>>
> >>>>>  1. Boot a kernel A
> >>>>>  2. Work under kernel A
> >>>>>  3. Kexec another kernel B in kernel A
> >>>>>  4. Work under kernel B
> >>>>>  5. Jump from kernel B to kernel A
> >>>>>  6. Continue work under kernel A
> >>>>>
> >>>>>  This is the first step to implement kexec based hibernation. If the
> >>>>>  memory image of kernel A is written to or read from a permanent media
> >>>>>  in step 4, a preliminary version of kexec based hibernation can be
> >>>>>  implemented.
> >>>>>
> >>>>>  The kernel B is run as a crashdump kernel in reserved memory
> >>>>>  region. This is the biggest constraint of the patch. It is planned to
> >>>>>  be eliminated in the next version. That is, instead of reserving memory
> >>>>>  region in advance, the needed memory region is backed up before kexec
> >>>>>  and restored after jumping back.
> >>>>>
> >>>>>  Another constraint of the patch is that the CONFIG_ACPI must be turned
> >>>>>  off to make kexec jump work. Because ACPI will put devices into low
> >>>>>  power state, the kexeced kernel can not be booted properly under
> >>>>>  it. This constraint can be eliminated by separating the suspend method
> >>>>>  and hibernation method of the devices as proposed earlier in the LKML.
> >>>>>
> >>>>>  The kexec jump is implemented in the framework of software suspend. In
> >>>>>  fact, the kexec based hibernation can be seen as just implementing the
> >>>>>  image writing and reading method of software suspend with a kexeced
> >>>>>  Linux kernel.
> >>>>>
> >>>
> >>> I guess I'm (still) confused by the terminology here.  Do you mean that it
> >>> fits into suspend-to-disk as a disk-writing mechanism, or in suspend-to-ram
> >>> as a way of going to sleep?
> >>
> >> Suspend-to-ram involves stopping the system and shutting down devices to
> >> go into low-power mode, then on wakeup restarting devices and resuming
> >> operation
> >>
> >> so the steps would be.
> >>
> >> 1. stop userspace
> >>
> >> 2. walk the system device tree and put devices to sleep
> >>
> >> 3. go into the lowest power mode available and wait for a wakeup signal
> >>
> >> later
> >>
> >> 4. walk the system device tree and wake up devices
> >>
> >> 5. resume userspace scheduling.
> >
> > Note that we are going to phase out steps 1 and 5.
> 
> what I'm referring to in #1 and #5 is not the current freezer, it's just 
> the kernel not allocating any cpu time to userspace while it's shutting 
> things down (this could be as simple as unplugging all non-boot cpu's and 
> then doing the rest of the work without letting the scheduler run.

This is not what we're going to do.  There is a plan to block tasks on I/O if
they request it during a suspend.

> >> note that what devices get put to sleep could be configurable, potentially
> >> to the extreme of things like the OLPC (that have hardware designed for
> >> cheap sleeping) going into a light suspend-to-ram state between keystrokes
> >> if nothing else has a timer event scheduled before that.
> >>
> >> Suspend-to-disk (Hibernate) involves stopping the system, making a
> >> snapshot of ram, writing the snapshot to somewhere and powering off the
> >> box. on wakeup (power-on) a helper kernel boots, loads the snapshot into
> >> ram and jumps to the kernel in the snapshot to resume operation.
> >>
> >> as I understand the proposal the thought is to do the following
> >>
> >> 1. system kernel does suspend-to-ram to put the devices into a known safe
> >> state.
> >
> > Not necessarily suspend-to-RAM.  I'd much prefer it if devices were not put
> > into low power states but quiesced (ie. no DMA, no interrupts).
> 
> as I asked in another message, is it really worth having two (or more) 
> modes here?

I think so.  The suspend-to-RAM mode is quite specific and on some platforms
(e.g. ACPI) it requires platform support.

We've already reached the conclusion that it's better to separate suspend from
hibernation, as far as device drivers are concerned, and let's not repeat the
discussion.

> >> 2. system kernel uses kexec to start hibernate kernel
> >>
> >> 3. hibernate kernel wakes up devices it needs as if it was doing a
> >> resume-from-ram
> >
> > I think that the devices should be initialized from scratch in this step.
> >
> >> 4. hibernate kernel copies ram image somewhere
> >
> > In this step some userland may be involved (started from the "hibernate"
> > kernel).
> 
> yes, I should have been much clearer that it's the userspace for the 
> hibernate kernel that does this.
> 
> >> 5. hibernate kernel shuts down the box
> >>
> >> later
> >>
> >> 6. hibernate kernel boots
> >>
> >> 7. hibernate kernel copies ram image from somewhere
> >>
> >> 8. hibernate kernel does suspend-to-ram to put the devices into a known
> >> safe state.
> >
> > Again, the devices should be quiesced rather than suspended in this step.
> >
> >> 9. hibernate kernel uses kexec to start system kernel
> >>
> >> 10. system kernel wakes up devices it needs as if it was doing a
> >> resume-from-ram.
> >
> > I think it should reconfigure devices from scratch (ie. reprobe).
> 
> probably a good idea, especially since there will be devices that the 
> hibernate kernel hasn't touched.

Exactly.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]       ` <Pine.LNX.4.64.0707121150460.25614@asgard.lang.hm>
@ 2007-07-12 19:34         ` Rafael J. Wysocki
       [not found]         ` <200707122134.29991.rjw@sisk.pl>
  1 sibling, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-12 19:34 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Thursday, 12 July 2007 20:57, david@lang.hm wrote:
> On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:
> 
> >>> I didn't understand the ACPI problem.  Does this mean that CONFIG_ACPI must
> >>> be disabled in the to-be-hibernated kernel, or in the little transient
> >>> kexec kernel?
> >>
> >> Under current implementation of device state quiescent/save/restore, the
> >> CONFIG_ACPI must be turned off both in to-be-hibernated kernel and
> >> transient kexec kernel.
> >>
> >> But the hibernation people are going to separate the device suspend from
> >> device hibernate. The device hibernate will put device in quiescent
> >> state but not in low power state. When this is done, it is not necessary
> >> to disable CONFIG_ACPI at all. Disabling CONFIG_ACPI is just a workaround
> >> for the current implementation.
> >>
> >>> How close do you think all this is to being a viable thing?
> >>
> >> The kexec jump is the first step, maybe the simplest step. There are
> >> many other issues to be resolved, at least the following ones.
> >>
> >> 1. Separate device suspend from device hibernate.
> >
> > Step 0, I'd say. :-)
> 
> is there more involved than just ignoring device hibernate and using
> the device suspend? (although if there is not a workable device
> suspend for a driver, that would answer my question :-)

There's more to it, though.  If devices are suspended, the hibernation kernel
will have to resume them (using platform, like ACPI, callbacks in the process)
instead and that will get complicated.

It's better if devices are quiesced, or even shut down, before we call the
hibernation kernel.

> >> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
> >> before kexec and restore them after kexec.
> >
> > I don't think this is very important initially.
> 
> I agree, a stripped down hibernate kernel can be very small, not allocating 
> this memory until it's needed is a step for the final polishing.

I'm not sure if I agree with that.  In any case, having to use two different
kernels for hibernation would be a big drawback.

> >> 3. Support the in-place kexec? The relocatable kernel is not necessary
> >> if this can be implemented.
> >> 4. Image writing/reading. (Only user space application is needed).
> >
> > And a kernel interface for that application.
> 
> I don't understand this statement, this application is just using the 
> standard kernel interfaces (block devices to read/write to disk, network 
> devices to read/write to a server, etc). no new interfaces needed.

Yes, but it will have to know _what_ to save, no?

Plus we need to figure out how to avoid corrupting filesystems and swap in use
by the "old" kernel and its processes (hint: a separate "hibernation partition"
is a no-go).
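
To illustrate the "what to save" part only: with a reserved region for the
hibernation kernel, the image would cover RAM minus that region. A minimal
sketch, assuming a made-up memory map and a crashkernel=64M@16M style
reservation (the function name and the numbers are illustrative):

```python
def ranges_to_save(ram_ranges, reserved):
    """Subtract the half-open `reserved` range from each half-open RAM range."""
    r_start, r_end = reserved
    out = []
    for start, end in ram_ranges:
        if r_end <= start or end <= r_start:
            out.append((start, end))  # no overlap with the reserved region
            continue
        if start < r_start:
            out.append((start, r_start))  # part below the reserved region
        if r_end < end:
            out.append((r_end, end))      # part above the reserved region
    return out


MB = 1 << 20
ram = [(0, 640 * 1024), (1 * MB, 512 * MB)]  # hypothetical memory map
crash = (16 * MB, 80 * MB)                   # i.e. 64M reserved at 16M
print(ranges_to_save(ram, crash))
```

This says nothing about the filesystem and swap consistency problem, which
is a separate issue from knowing which pages belong in the image.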

> >> 5. A smooth resume process. Maybe it is not needed to kexec a new kernel
> >> for resume. For example, in the first stage of kernel boot, just first
> >> 16M (or a little more) RAM is used, if the resume image is found, the
> >> saved kernel image is resumed; if the resume image is not found, turn on
> >> the remaining RAM. This will depend on 3.
> >
> > I think that this is the most difficult part of the whole thing.
> 
> don't try to get too fancy right now. stick with a simple 'boot hibernate 
> kernel, its userspace looks for an image to resume, and if it doesn't 
> find one reboot to the normal system'
> 
> I don't know how to do this with grub, but it would be a trivial shell 
> script with lilo

I think it's most portable to use initrd for that, which already makes things
complicated.  Then, we'll have to load the image and jump to the hibernated
kernel in such a way that it would be able to continue from where it stopped
before.  I don't think that is trivial.

> >> 6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can
> >> be hibernate/resume by the normal kernel too. This way, a real
> >> kexec/boot-up is only needed for the first time.
> >
> > I'm not sure what you mean.
> 
> he's trying to get fancy again, the best way to speed up the boot of the 
> kexec kernel is to make it smaller and avoid probing for devices (hotplug 
> should NOT be used for normal suspend situations)

Still, I believe that we should do our best to use only one kernel (meaning one
kernel image) here.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]             ` <Pine.LNX.4.64.0707121210210.25614@asgard.lang.hm>
@ 2007-07-12 19:45               ` Rafael J. Wysocki
  0 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-12 19:45 UTC (permalink / raw)
  To: david
  Cc: Jeremy Fitzhardinge, linux-kernel, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Thursday, 12 July 2007 21:14, david@lang.hm wrote:
> On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:
> 
> >>>> note that what devices get put to sleep could be configurable, potentially
> >>>> to the extreme of things like the OLPC (that have hardware designed for
> >>>> cheap sleeping) going into a light suspend-to-ram state between keystrokes
> >>>> if nothing else has a timer event scheduled before that.
> >>>>
> >>>> Suspend-to-disk (Hibernate) involves stopping the system, making a
> >>>> snapshot of ram, writing the snapshot to somewhere and powering off the
> >>>> box. on wakeup (power-on) a helper kernel boots, loads the snapshot into
> >>>> ram and jumps to the kernel in the snapshot to resume operation.
> >>>>
> >>>> as I understand the proposal the thought is to do the following
> >>>>
> >>>> 1. system kernel does suspend-to-ram to put the devices into a known safe
> >>>> state.
> >>>
> >>> Not necessarily suspend-to-RAM.  I'd much prefer it if devices were not put
> >>> into low power states but quiesced (ie. no DMA, no interrupts).
> >>
> >> as I asked in another message, is it really worth having two (or more)
> >> modes here?
> >
> > I think so.  The suspend-to-RAM mode is quite specific and on some platforms
> > (e.g. ACPI) it requires platform support.
> >
> > We've already reached the conclusion that it's better to separate suspend from
> > hibernation, as far as device drivers are concerned, and let's not repeat the
> > discussion.
> 
> Ok, I seem to have been miscommunicating here. the old combined 
> suspend/hibernate took everything to the hibernate state even if you only 
> needed to suspend.

No, quite the other way around.

For creating a hibernation image you don't have to suspend devices.
Furthermore, you don't want to suspend at least some of them, because you'll
be using them to save the image in a while.  Also, there's no need to worry
about what power state to put the device into, so that it can wake up the
system from the sleep state etc.

We've made hibernation use suspend-specific callbacks and that causes quite
a lot of problems to appear.
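
The distinction can be shown with a toy driver model. This is a sketch
only: the class and method names are assumptions chosen for the example and
do not correspond to the actual driver callbacks.

```python
class Device:
    """Toy device with separate quiesce and suspend callbacks (names assumed)."""

    def __init__(self, name):
        self.name = name
        self.dma_enabled = True
        self.irq_enabled = True
        self.power_state = "D0"  # fully powered

    def quiesce(self):
        # Hibernation path: stop DMA and interrupts so memory is stable for
        # the snapshot, but leave the device powered.
        self.dma_enabled = False
        self.irq_enabled = False

    def suspend(self):
        # Suspend-to-RAM path: stop activity and also enter a low-power state.
        self.quiesce()
        self.power_state = "D3"


disk = Device("disk")
disk.quiesce()
print(disk.power_state, disk.dma_enabled)  # still powered, DMA stopped
```

With only suspend() available, the hibernation path would power down a
device that is about to be needed for writing the image.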

> I was still seeing two different states involved
> 
> for suspend go to low-power mode
> 
> for hibernate go to low-power mode,

No, that is not the way to go, IMO.  We can shut down devices before creating
the image, but not suspend them.

> kexec the new kernel, do your stuff, power off 

Here, instead of just powering off, we may want to make the system enter a
sleep state (S4 in ACPI systems), which is similar to suspend.

> note that this doesn't really matter as you have pointed out in other 
> messages that we don't really want to put things in low-power mode, and 
> Eric pointed out that kexec already handles disabling devices, so it 
> sounds like this may be a solved problem if he's right and an issue to be 
> solved differently if he's not.

Shutting down devices and reinitializing them is costly.  I wouldn't like to
do that.

Of course, in a proof-of-concept version this is viable, but IMO not in the
final one.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-12 19:09       ` david
@ 2007-07-12 19:49         ` Eric W. Biederman
  0 siblings, 0 replies; 113+ messages in thread
From: Eric W. Biederman @ 2007-07-12 19:49 UTC (permalink / raw)
  To: david
  Cc: Kexec Mailing List, linux-kernel, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

david@lang.hm writes:

> On Thu, 12 Jul 2007, Eric W. Biederman wrote:
>
>>> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
>>> before kexec and restore them after kexec.
>>> 3. Support the in-place kexec? The relocatable kernel is not necessary
>>> if this can be implemented.
>>
>> It sounds like what you really want is the normal kexec path enhanced
>> so that you can return to the kernel you started with.
>>
>> The normal kexec path already knows how to do the memory shuffle so
>> it can do on demand memory allocation.  That code just needs to be
>> enhanced slightly so that you allocate an extra page, setup an inverse
>> scatter gather list for restoring the pages, and teach relocate_kernel.S
>> to preserve its destination pages by using the inverse scatter gather
>> list.
>>
>> The normal kexec path already calls device_shutdown and the like to
>> stop devices from running.  Although again that code path is not
>> prepared to restore the devices.
>
> we shouldn't need a restore code path if the new kernel re-detects
> everything. if kexec already shuts down all the devices we may not need to
> implement anything new here (although there may be room for future performance
> optimization)

Yes, reusing device hotplug...  You still need the code path for little
things and to kick off the device redetection, but if you get lucky
it won't have to do much.  Of course speed is important.
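
To make the quoted memory shuffle concrete, here is a small Python model
of it.  This is an illustration only, not kernel code: swap_pages and
inverse are made-up names, and "memory" is just a dict mapping a page
number to its contents.  It assumes the copy loop exchanges each
destination/source pair through one scratch page, so the overwritten
destination pages survive in the source slots and the reversed list
brings them back.

```python
def swap_pages(memory, sg_list):
    """Exchange the contents of each (dest, src) page pair.

    `scratch` plays the role of the single extra page: it holds the
    destination contents while the source page is copied over them.
    """
    for dest, src in sg_list:
        scratch = memory[dest]
        memory[dest] = memory[src]
        memory[src] = scratch


def inverse(sg_list):
    """Build the list that undoes swap_pages(memory, sg_list).

    Each exchange is its own inverse, so undoing the whole shuffle only
    requires replaying the exchanges in the opposite order.
    """
    return [(src, dest) for dest, src in reversed(sg_list)]


# Pages 0-1 belong to kernel A; pages 8-9 hold the staged kernel B.
memory = {0: b"A-text", 1: b"A-data", 8: b"B-text", 9: b"B-initrd"}
before = dict(memory)

sg_list = [(0, 8), (1, 9)]
swap_pages(memory, sg_list)           # "kexec": B now sits at pages 0-1
swap_pages(memory, inverse(sg_list))  # "jump back": A is restored
```

In this model the return trip is nothing more than a second pass over
the reversed list; restarting the devices is a separate problem.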

>> ...
>>
>> For prototyping I would:
>> - reserve a chunk of memory (possibly with the crashkernel= option)
>>  and run a relocatable kernel out of it.
>>
>>  By using the normal kexec you can boot a relocatable restore kernel
>>  in that reserved region. It is an extra step but it makes things
>>  work today.
>>
>> - I would use the normal sys_kexec_load.
>>
>> - I would debug/tweak the user space and the code to reenter the
>>  old kernel.  I.e. the device driver stop/start code.
>>
>>  Once it was basically working I would then update the normal kexec
>>  memory copy code in relocate.S to preserve the destination pages.
>
> for prototyping there's no need to use the same kernel.
>
>>> 4. Image writing/reading. (Only user space application is needed).
>>
>> And possibly a few fixes to /dev/mem.  This is pretty much the same
>> process as generating a core dump so there should be some synergy with that.
>
> what fixes are you thinking of?

Don't really know.  I figured /dev/mem was sufficient, but the kexec
on panic folks tell me it doesn't work for areas we have told the kernel
aren't memory.  I haven't had time, so I haven't pushed it.

> you are making this sound very simple ;-)

Which is the primary point of using kexec.  If it isn't simple then we
are doing something wrong...

Eric


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]         ` <200707122134.29991.rjw@sisk.pl>
@ 2007-07-12 19:55           ` Jeremy Maitin-Shepard
  2007-07-13  3:06           ` david
       [not found]           ` <877ip54cti.fsf@jbms.ath.cx>
  2 siblings, 0 replies; 113+ messages in thread
From: Jeremy Maitin-Shepard @ 2007-07-12 19:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: david, linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton,
	linux-pm

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

[snip]

> There's more to it, though.  If devices are suspended, the hibernation kernel
> will have to resume them (using platform, like ACPI, callbacks in the process)
> instead and that will get complicated.

> It's better if devices are quiesced, or even shut down, before we call the
> hibernation kernel.

I agree that they definitely should not be put into a low power mode, as
that has nothing to do with hibernation.

Ideally, the following would be done:

All of the hardware that won't be needed by the "save image" kernel will
be shut down.  The normal driver shut down calls may not be suitable,
however, because although the same thing should be done to the hardware,
the device shouldn't be "unregistered", since unlike in the actual
shutdown case, the same device will need to be brought back up again on
resume, and it will need to have the same device id and such (and
userspace probably shouldn't see the device going away).

Any devices that will be needed by the "save image" kernel could also
safely be shut down as with the unneeded devices, but it would be more
efficient to simply quiesce them.  Since this would be an additional
complication, initially probably all of the hardware should be shut
down, rather than quiesced.

The reason that I think it is useful to actually shut down the devices,
rather than merely leaving some unneeded devices quiesced, is that it
would be useful to be able to build the "save image" kernel without
support for unneeded devices.  In order to support "suspend to ram"
instead of shutting down after saving the image to disk, the hibernate
kernel needs to be able to send devices into a low power state.  My
impression is that if there are devices it does not know about (i.e. the
unneeded devices), but which are left quiesced but powered on, this
would be a problem for suspend to ram, although not knowing much about
how suspend to ram actually works, I could be mistaken.  (Maybe it is
possible through ACPI or standard bus interfaces to shut down all of the
devices without really knowing anything about them.)

>> >> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
>> >> before kexec and restore them after kexec.
>> >
>> > I don't think this is very important initially.
>> 
>> I agree, a stripped down hibernate kernel can be very small, not allocating
>> this memory until it's needed is a step for the final polishing.

> I'm not sure if I agree with that.  In any case, having to use two different
> kernels for hibernation would be a big drawback.

I agree that it should not be necessary to use a separate kernel, but it
would be useful to not _require_ that the same kernel is used.
Practically, all this means is to save and restore the text sections as
well, and not rely on the code itself remaining untouched during restore.

>> >> 3. Support the in-place kexec? The relocatable kernel is not necessary
>> >> if this can be implemented.
>> >> 4. Image writing/reading. (Only user space application is needed).
>> >
>> > And a kernel interface for that application.
>> 
>> I don't understand this statement, this application is just using the
>> standard kernel interfaces (block devices to read/write to disk, network 
>> devices to read/write to a server, etc). no new interfaces needed.

> Yes, but it will have to know _what_ to save, no?

I agree that a kernel interface would be important; something like
/dev/snapshot that can be read by the "save image" kernel, and written
to by the "restore image" kernel.  Note that similarly, kdump provides a
kernel interface to an ELF image of the old kernel.

> Plus we need to figure out how to avoid corrupting filesystems and swap in use
> by the "old" kernel and its processes (hint: a separate "hibernation partition"
> is a no-go).

Presumably swapoff would take care of freeing up a swap partition for
saving the image.  (It seems that this is the most common hibernate
method, anyway.)  If the user wants to write to a file, like a swap
file, then the old kernel would need to somehow communicate the sequence
of blocks in the file to the "save image" kernel.  Perhaps support for
this method of saving the image need not be available initially.
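
One way the block sequence could be handed over is as a list of
extents.  The sketch below is in Python and assumes the old kernel can
produce the file's on-disk block numbers in file order.  The thread
does not define a format, and blocks_to_extents / extents_to_blocks are
hypothetical helper names.

```python
def blocks_to_extents(blocks):
    """Collapse an ordered block list into (start, length) extents.

    Consecutive block numbers are merged, so a mostly contiguous file
    needs only a handful of entries.
    """
    extents = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            extents.append((block, 1))
    return extents


def extents_to_blocks(extents):
    """Expand (start, length) extents back into the ordered block list."""
    return [start + i for start, length in extents for i in range(length)]


# A file laid out in three fragments on disk.
file_blocks = [100, 101, 102, 200, 201, 50]
extents = blocks_to_extents(file_blocks)
```

The "save image" kernel would then only need raw block device access
plus this list; it would never have to mount or interpret the old
kernel's filesystem.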

[snip]

> I think it's most portable to use initrd for that, which already makes things
> complicated.  Then, we'll have to load the image and jump to the hibernated
> kernel in such a way that it would be able to continue from where it stopped
> before.  I don't think that is trivial.

As we've discussed before, I think the resume from hibernate can be done
essentially exactly as it is done currently; it may likely be possible
to reuse the uswsusp kernel code for this purpose.

>> >> 6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can
>> >> be hibernate/resume by the normal kernel too. This way, a real
>> >> kexec/boot-up is only needed for the first time.
>> >
>> > I'm not sure what you mean.
>> 
>> he's trying to get fancy again, the best way to speed up the boot of the 
>> kexec kernel is make it smaller and avoid probing for devices (hotplug 
>> should NOT be used for normal suspend situations)

> Still, I believe that we should do our best to use only one kernel (meaning one
> kernel image) here.

It seems that it would not be very difficult to let the user choose
whether or not to use a different kernel.  The only extra thing
required to allow a different kernel to be used is to save and
restore the text sections.

-- 
Jeremy Maitin-Shepard


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]               ` <46965837.8030907@rtr.ca>
@ 2007-07-12 20:05                 ` Jeremy Maitin-Shepard
       [not found]                 ` <87y7hl2xro.fsf@jbms.ath.cx>
  1 sibling, 0 replies; 113+ messages in thread
From: Jeremy Maitin-Shepard @ 2007-07-12 20:05 UTC (permalink / raw)
  To: Mark Lord
  Cc: david, Jeremy Fitzhardinge, linux-kernel, Pavel Machek,
	Huang, Ying, Andrew Morton, linux-pm

Mark Lord <lkml@rtr.ca> writes:

[snip]

> Whoops.. wrong half of the script.
> For TuxOnIce in 10 seconds, it does this:

[snip]

I'd argue that for most usage patterns, it doesn't matter all that much
how long it takes to hibernate and power off the system.  What really
matters is that it is extremely reliable, and how long it takes to
resume.

The reason for this is as follows:

A typical usage pattern of hibernate on a laptop is to shut the lid,
causing the system to start to hibernate, and to place the machine in
the bag.  This is fine, as long as you aren't too rough moving it into
the bag, and the hibernation is extremely reliable (i.e. there is no
chance that it fails to hibernate, and remains powered on.)  Presumably
some additional userspace logic could help here, like start beeping
loudly if the hibernate fails, or perhaps just initiate a shut down, to
avoid the machine overheating in the bag.
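
The userspace fallback mentioned above could look roughly like this
Python sketch.  The three callables are hypothetical hooks supplied by
the caller, so nothing here touches real hardware.  The only assumption
is that a failed hibernate attempt surfaces as an OSError, as a failed
write of "disk" to /sys/power/state would.

```python
def hibernate_with_fallback(hibernate, alert, power_off):
    """Try to hibernate; if that fails, warn the user and power off.

    hibernate -- callable that raises OSError if hibernation fails
    alert     -- callable taking a message (beep, log, notify, ...)
    power_off -- callable that shuts the machine down

    Returns True if hibernation succeeded (we are running again after
    resume) and False if the fallback path was taken.
    """
    try:
        hibernate()
    except OSError as err:
        alert(f"hibernate failed: {err}")
        power_off()
        return False
    return True
```

Passing the actions in as callables keeps the policy (beep, shut down,
or both) separate from the mechanism, and lets the logic be exercised
without actually suspending anything.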

Note that in this usage pattern, it doesn't matter how long it takes to
hibernate, because you don't actually wait for it to finish.  The only
waiting occurs when you turn it on, and the resume path should be
essentially exactly the same under kexec hibernate as with the existing
hibernate.

Thus, if kexec hibernate improves reliability (as it might, given that
it eliminates the need for the freezer), it may be worth the slightly
increased hibernate time.  I think the actual amount of extra time it
will take may be very small; a stripped down kernel may only take a
second or two to initialize.

-- 
Jeremy Maitin-Shepard


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]           ` <877ip54cti.fsf@jbms.ath.cx>
@ 2007-07-12 20:45             ` Rafael J. Wysocki
  2007-07-13  3:12             ` david
       [not found]             ` <Pine.LNX.4.64.0707122008550.25614@asgard.lang.hm>
  2 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-12 20:45 UTC (permalink / raw)
  To: Jeremy Maitin-Shepard
  Cc: david, linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton,
	linux-pm

On Thursday, 12 July 2007 21:55, Jeremy Maitin-Shepard wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> [snip]
> 
> > There's more to it, though.  If devices are suspended, the hibernation kernel
> > will have to resume them (using platform, like ACPI, callbacks in the process)
> > instead and that will get complicated.
> 
> > It's better if devices are quiesced, or even shut down, before we call the
> > hibernation kernel.
> 
> I agree that they definitely should not be put into a low power mode, as
> that has nothing to do with hibernation.
> 
> Ideally, the following would be done:
> 
> All of the hardware that won't be needed by the "save image" kernel will
> be shut down.

For that to work, the kernel being hibernated would have to know in advance
which devices would be needed by the image-saving kernel.

> The normal driver shut down calls may not be suitable, 
> however, because although the same thing should be done to the hardware,
> the device shouldn't be "unregistered", since unlike in the actual
> shutdown case, the same device will need to be brought back up again on
> resume, and it will need to have the same device id and such (and
> userspace probably shouldn't see the device going away).

Yes, IMO that's an important observation.  We shouldn't unregister devices
at this point and thus we need additional callbacks for that.

> Any devices that will be needed by the "save image" kernel could also
> safely be shut down as with the unneeded devices, but it would be more
> efficient to simply quiesce them.  Since this would be an additional
> complication, initially probably all of the hardware should be shut
> down, rather than quiesced.

Agreed, but spinning disks down and up during hibernation is really annoying.

> The reason that I think it is useful to actually shut down the devices,
> rather than merely leaving some unneeded devices quiesced, is that it
> would be useful to be able to build the "save image" kernel without
> support for unneeded devices.  In order to support "suspend to ram"
> instead of shutting down after saving the image to disk, the hibernate
> kernel needs to be able to send devices into a low power state.  My
> impression is that if there are devices it does not know about (i.e. the
> unneeded devices), but which are left quiesced but powered on, this
> would be a problem for suspend to ram, although not knowing much about
> how suspend to ram actually works, I could be mistaken.  (Maybe it is
> possible through ACPI or standard bus interfaces to shut down all of the
> devices without really knowing anything about them.)

I think that if we are going to support suspend-to-both (as Pavel calls it),
the image-saving kernel will have to support exactly the same set of devices
as the hibernated kernel.  In that case, it will be able to put all devices
into low power states for the suspend.

There's one more reason why that may be necessary, actually.  Namely, on ACPI
systems we may want to put the system into the S4 sleep state after saving the
image instead of just powering it off.  In turn, putting the system into the S4
sleep state is very similar to suspending it.

> >> >> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
> >> >> before kexec and restore them after kexec.
> >> >
> >> > I don't think this is very important initially.
> >> 
> >> I agree, a stripped down hibernate kernel can be very small, not allocating
> >> this memory until it's needed is a step for the final polishing.
> 
> > I'm not sure if I agree with that.  In any case, having to use two different
> > kernels for hibernation would be a big drawback.
> 
> I agree that it should not be necessary to use a separate kernel, but it
> would be useful to not _require_ that the same kernel is used.
>
> Practically, all this means is to save and restore the text sections as
> well, and not rely on the code itself remaining untouched during restore.

Yes, that might be useful.
 
> >> >> 3. Support the in-place kexec? The relocatable kernel is not necessary
> >> >> if this can be implemented.
> >> >> 4. Image writing/reading. (Only user space application is needed).
> >> >
> >> > And a kernel interface for that application.
> >> 
> >> I don't understand this statement, this application is just using the
> >> standard kernel interfaces (block devices to read/write to disk, network 
> >> devices to read/write to a server, etc). no new interfaces needed.
> 
> > Yes, but it will have to know _what_ to save, no?
> 
> I agree that a kernel interface would be important; something like
> /dev/snapshot that can be read by the "save image" kernel, and written
> to by the "restore image" kernel.  Note that similarly, kdump provides a
> kernel interface to an ELF image of the old kernel.

Yes, that's what I'm referring to.
 
> > Plus we need to figure out how to avoid corrupting filesystems and swap in use
> > by the "old" kernel and its processes (hint: a separate "hibernation partition"
> > is a no-go).
> 
> Presumably swapoff would take care of freeing up a swap partition for
> saving the image.  (It seems that this is the most common hibernate
> method, anyway.)

No, the swaps are not turned off for hibernation.

> If the user wants to write to a file, like a swap file, then the old kernel
> would need to somehow communicate the sequence of blocks in the file to the
> "save image" kernel.  Perhaps support for this method of saving the image
> need not be available initially. 

I think that the image-saving kernel will need to access the hibernated
kernel's swap data structures to figure out which blocks are safe.  That won't
be very easy, though.
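
As a rough illustration of what "figuring out which blocks are safe"
means, here is a Python sketch.  It assumes the hibernated kernel's
swap state can be summarised as a per-slot use count, which is a
simplification of the real swap bookkeeping; safe_swap_slots and
pick_image_slots are made-up names.

```python
def safe_swap_slots(swap_map):
    """Return the slots the image writer may overwrite.

    swap_map[i] is the use count of swap slot i in the hibernated
    kernel.  Slot 0 is always skipped because it holds the swap header.
    """
    return [slot for slot, count in enumerate(swap_map)
            if slot != 0 and count == 0]


def pick_image_slots(swap_map, pages_needed):
    """Choose slots for an image of `pages_needed` pages, or fail."""
    free = safe_swap_slots(swap_map)
    if len(free) < pages_needed:
        raise RuntimeError("not enough free swap for the image")
    return free[:pages_needed]


# Slots 2 and 5 hold swapped-out pages; slot 0 is the header.
swap_map = [1, 0, 2, 0, 0, 1]
image_slots = pick_image_slots(swap_map, 2)
```

The hard part is not this selection but getting a consistent copy of
the use counts across the kernel boundary in the first place.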

> [snip]
> 
> > I think it's most portable to use initrd for that, which already makes things
> > complicated.  Then, we'll have to load the image and jump to the hibernated
> > kernel in such a way that it would be able to continue from where it stopped
> > before.  I don't think that is trivial.
> 
> As we've discussed before, I think the resume from hibernate can be done
> essentially exactly as it is done currently; it may likely be possible
> to reuse the uswsusp kernel code for this purpose.

Yes, I remember that discussion.

Still, I think that there also is an advantage of using kexec here, since in
that case we won't need additional support from the architectures that already
support kexec.

> >> >> 6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can
> >> >> be hibernate/resume by the normal kernel too. This way, a real
> >> >> kexec/boot-up is only needed for the first time.
> >> >
> >> > I'm not sure what you mean.
> >> 
> >> he's trying to get fancy again, the best way to speed up the boot of the 
> >> kexec kernel is make it smaller and avoid probing for devices (hotplug 
> >> should NOT be used for normal suspend situations)
> 
> > Still, I believe that we should do our best to use only one kernel (meaning one
> > kernel image) here.
> 
> It seems that it would not be very difficult to let the user choose
> whether or not to use a different kernel.  The only extra thing
> required to allow a different kernel to be used is to save and
> restore the text sections.

Well, I know too little about kexec to be able to comment on that.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                 ` <87y7hl2xro.fsf@jbms.ath.cx>
@ 2007-07-13  2:38                   ` Mark Lord
  0 siblings, 0 replies; 113+ messages in thread
From: Mark Lord @ 2007-07-13  2:38 UTC (permalink / raw)
  To: Jeremy Maitin-Shepard
  Cc: david, Jeremy Fitzhardinge, linux-kernel, Pavel Machek,
	Huang, Ying, Andrew Morton, linux-pm

Jeremy Maitin-Shepard wrote:
>
> A typical usage pattern of hibernate on a laptop is to shut the lid,
> causing the system to start to hibernate, and to place the machine in

All laptops we have here, and those of all people I have seen
with laptops, do suspend-to-RAM on lid-close, not hibernate.

And even then, I generally wait to verify that the machine actually
did shut down, simply because the one time I didn't wait, Linux failed
to shut down.  That was discovered hours later at the end of the journey,
when a very hot notebook with a dead battery was unpacked.  Ugh.

Cheers


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]         ` <200707122134.29991.rjw@sisk.pl>
  2007-07-12 19:55           ` Jeremy Maitin-Shepard
@ 2007-07-13  3:06           ` david
  2007-07-13  5:42             ` Hibernating To Swap Considered Harmful Joseph Fannin
                               ` (4 more replies)
       [not found]           ` <877ip54cti.fsf@jbms.ath.cx>
  2 siblings, 5 replies; 113+ messages in thread
From: david @ 2007-07-13  3:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:

> On Thursday, 12 July 2007 20:57, david@lang.hm wrote:
>> On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:
>>
>>>> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
>>>> before kexec and restore them after kexec.
>>>
>>> I don't think this is very important initially.
>>
>> I agree, a stripped down hibernate kernel can be very small, not allocating
>> this memory until it's needed is a step for the final polishing.
>
> I'm not sure if I agree with that.  In any case, having to use two different
> kernels for hibernation would be a big drawback.

I see it as a big advantage to not have to use the main kernel for the 
suspend. please keep it as an option at least.

>>>> 3. Support the in-place kexec? The relocatable kernel is not necessary
>>>> if this can be implemented.
>>>> 4. Image writing/reading. (Only user space application is needed).
>>>
>>> And a kernel interface for that application.
>>
>> I don't understand this statement, this application is just using the
>> standard kernel interfaces (block devices to read/write to disk, network
>> devices to read/write to a server, etc). no new interfaces needed.
>
> Yes, but it will have to know _what_ to save, no?
>
> Plus we need to figure out how to avoid corrupting filesystems and swap in use
> by the "old" kernel and its processes (hint: a separate "hibernation partition"
> is a no-go).

I thought the existing hibernation wrote to the swap partition as it's 
dedicated space?

I didn't know that anyone was suggesting writing the hibernation image to 
a filesystem that the kernel was actively accessing.

>>>> 5. A smooth resume process. Maybe it is not needed to kexec a new kernel
>>>> for resume. For example, in the first stage of kernel boot, just first
>>>> 16M (or a little more) RAM is used, if the resume image is found, the
>>>> saved kernel image is resumed; if the resume image is not found, turn on
>>>> the remaining RAM. This will depend on 3.
>>>
>>> I think that this is the most difficult part of the whole thing.
>>
>> don't try to get too fancy right now. stick with a simple 'boot hibernate
>> kernel, its userspace looks for an image to resume, and if it doesn't
>> find one reboot to the normal system'
>>
>> I don't know how to do this with grub, but it would be a trivial shell
>> script with lilo
>
> I think it's most portable to use initrd for that, which already makes things
> complicated.  Then, we'll have to load the image and jump to the hibernated
> kernel in such a way that it would be able to continue from where it stopped
> before.  I don't think that is trivial.

I was talking about the scripts that would be used inside the initrd (or 
boot partition of whatever type)

to start with don't worry about how the kexec kernel gets its /
filesystem (for testing just use a real partition on your disk. after you
get everything working let the people who really understand the initrd and
consider it trivial switch it to an initrd image)

for the current stage where we are trying to make things work don't worry
about packaging everything tight with initrd and re-using partitions or
kernel images. once everything is working reliably then it's time to look
at using the same kernel for multiple functions, writing to a partition
that's in use for other things, etc

unless you are saying that this is a trivial task, and if someone is
willing to use separate partitions and kernels this works well, and
therefore the only problem left is how to make it look the same to the user
as the old approach?

>>>> 6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can
>>>> be hibernate/resume by the normal kernel too. This way, a real
>>>> kexec/boot-up is only needed for the first time.
>>>
>>> I'm not sure what you mean.
>>
>> he's trying to get fancy again, the best way to speed up the boot of the
>> kexec kernel is make it smaller and avoid probing for devices (hotplug
>> should NOT be used for normal suspend situations)
>
> Still, I believe that we should do our best to use only one kernel (meaning one
> kernel image) here.

later on it may be the right thing, for now get it working with different 
images.

David Lang


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]           ` <877ip54cti.fsf@jbms.ath.cx>
  2007-07-12 20:45             ` Rafael J. Wysocki
@ 2007-07-13  3:12             ` david
       [not found]             ` <Pine.LNX.4.64.0707122008550.25614@asgard.lang.hm>
  2 siblings, 0 replies; 113+ messages in thread
From: david @ 2007-07-13  3:12 UTC (permalink / raw)
  To: Jeremy Maitin-Shepard
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm

On Thu, 12 Jul 2007, Jeremy Maitin-Shepard wrote:

> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
>
> [snip]
>
>> There's more to it, though.  If devices are suspended, the hibernation kernel
>> will have to resume them (using platform, like ACPI, callbacks in the process)
>> instead and that will get complicated.
>
>> It's better if devices are quiesced, or even shut down, before we call the
>> hibernation kernel.
>
> I agree that they definitely should not be put into a low power mode, as
> that has nothing to do with hibernation.
>
> Ideally, the following would be done:
>
> All of the hardware that won't be needed by the "save image" kernel will
> be shut down.  The normal driver shut down calls may not be suitable,
> however, because although the same thing should be done to the hardware,
> the device shouldn't be "unregistered", since unlike in the actual
> shutdown case, the same device will need to be brought back up again on
> resume, and it will need to have the same device id and such (and
> userspace probably shouldn't see the device going away).
>
> Any devices that will be needed by the "save image" kernel could also
> safely be shut down as with the unneeded devices, but it would be more
> efficient to simply quiesce them.  Since this would be an additional
> complication, initially probably all of the hardware should be shut
> down, rather than quiesced.
>
> The reason that I think it is useful to actually shut down the devices,
> rather than merely leaving some unneeded devices quiesced, is that it
> would be useful to be able to build the "save image" kernel without
> support for unneeded devices.  In order to support "suspend to ram"
> instead of shutting down after saving the image to disk, the hibernate
> kernel needs to be able to send devices into a low power state.  My
> impression is that if there are devices it does not know about (i.e. the
> unneeded devices), but which are left quiesced but powered on, this
> would be a problem for suspend to ram, although not knowing much about
> how suspend to ram actually works, I could be mistaken.  (Maybe it is
> possible through ACPI or standard bus interfaces to shut down all of the
> devices without really knowing anything about them.)

I don't think that anyone is talking about using kexec for
suspend-to-ram, only for suspend-to-disk (hibernate)

>>>>> 3. Support the in-place kexec? The relocatable kernel is not necessary
>>>>> if this can be implemented.
>>>>> 4. Image writing/reading. (Only user space application is needed).
>>>>
>>>> And a kernel interface for that application.
>>>
>>> I don't understand this statement, this application is just using the
>>> standard kernel interfaces (block devices to read/write to disk, network
>>> devices to read/write to a server, etc). no new interfaces needed.
>
>> Yes, but it will have to know _what_ to save, no?
>
> I agree that a kernel interface would be important; something like
> /dev/snapshot that can be read by the "save image" kernel, and written
> to by the "restore image" kernel.  Note that similarly, kdump provides a
> kernel interface to an ELF image of the old kernel.

I thought that the idea was to save the entire contents of ram so that 
caches, etc remain populated.

having the system kernel free up ram and then making a sg list of what 
memory needs to be backed up would be a nice enhancement, but let's let 
that remain a future enhancement until everyone agrees that the basic 
approach works.

David Lang


* Hibernating To Swap Considered Harmful
  2007-07-13  3:06           ` david
@ 2007-07-13  5:42             ` Joseph Fannin
  2007-07-13  5:57               ` david
                                 ` (2 more replies)
  2007-07-13  9:29             ` [PATCH 0/2] Kexec jump: The first step to kexec base hibernation Rafael J. Wysocki
                               ` (3 subsequent siblings)
  4 siblings, 3 replies; 113+ messages in thread
From: Joseph Fannin @ 2007-07-13  5:42 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Thu, Jul 12, 2007 at 08:06:43PM -0700, david@lang.hm wrote:
> On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:

> > Plus we need to figure out how to avoid corrupting filesystems and
> > swap in use by the "old" kernel and its processes (hint: a separate
> > "hibernation partition" is a no-go).
>
> I thought the existing hibernation wrote to the swap partition as it's
> dedicated space?
>
> I didn't know that anyone was suggesting writing the hibernation image to
> a filesystem that the kernel was actively accessing.

I'm suggesting a dedicated, preallocated hibernation *file*, right
now.  There's no way around it, if hibernation is to be reliable --
otherwise hibernation can fail if the system has used enough of its
swap space, so that there isn't enough room to write the hibernate
image.

Even if it's desirable to allow hibernation to fail if the system is
too deep into swap, it's a moot point.

Consider how the need to ensure that there is enough space to write
the hibernate image is dealt with now:  by making a big honking swap
space, so big that enough of it is all but guaranteed to be free,
except under the heaviest of memory usage.  So the space is already
reserved -- and now that it's commingled with actual swap, you have the
need to pass the swap data structures between the two kernels.

Consider instead,  you set up two swap spaces, one regular, and one
for hibernation. You don't touch the "hibernation swap" unless the
other is full -- I think just setting a lower priority on the swap
space is enough for this.  Before you jump to the hibernate kernel,
you swapoff that hibernate swap.

If you can't swapoff the hibernate swap, hibernate fails right there.

If you can, you have your space for writing the image, free and clear
of any of the original kernel's internal state.  There isn't any need
to treat that space as swap any more at all -- the only reason to do
so would be to reuse the existing code.
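
The scheme can be modelled in a few lines of Python.  This is a toy
model under stated assumptions, not a description of the kernel's swap
code: SwapArea, swap_out and can_swapoff are invented names, areas are
filled strictly in priority order, and swapoff is taken to succeed
whenever the displaced pages fit into free RAM plus the other areas.

```python
class SwapArea:
    """A swap space with a capacity and a priority (higher fills first)."""

    def __init__(self, name, size, priority):
        self.name = name
        self.size = size          # capacity in pages
        self.priority = priority  # higher value is used first
        self.used = 0             # pages currently swapped out here

    @property
    def free(self):
        return self.size - self.used


def swap_out(areas, pages):
    """Swap out `pages` pages, filling higher-priority areas first."""
    if pages > sum(area.free for area in areas):
        raise MemoryError("out of swap space")
    for area in sorted(areas, key=lambda a: a.priority, reverse=True):
        moved = min(pages, area.free)
        area.used += moved
        pages -= moved


def can_swapoff(target, areas, free_ram_pages):
    """True if everything stored in `target` fits into free RAM plus the
    free space of the other swap areas (a deliberately crude criterion)."""
    room_elsewhere = sum(a.free for a in areas if a is not target)
    return target.used <= free_ram_pages + room_elsewhere


regular = SwapArea("regular", size=1000, priority=10)
hibernate = SwapArea("hibernate", size=2000, priority=1)
areas = [regular, hibernate]

swap_out(areas, 600)   # light load: only the regular area is touched
ok_light = can_swapoff(hibernate, areas, free_ram_pages=0)

swap_out(areas, 900)   # heavy load: 500 pages spill into the hibernate area
ok_heavy = can_swapoff(hibernate, areas, free_ram_pages=100)
```

Under light load the hibernate area stays empty and can always be
released.  Once the system spills into it, the swapoff check fails and
hibernation is refused up front, before any image is written.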

Setting aside two partitions for swap is obviously undesirable, but
thankfully, Linux supports swap *files*.

There hasn't been a performance penalty to using a swap file (vs. a
partition) since sometime in the 2.5 series.  Well, swap files can be
fragmented, but that needs to be considered against the *guaranteed*
seeks you'll see with a swap partition on the same disk as a busy
filesystem, as is the usual case.

The only reasons I can see that Linux usually uses a single swap
partition are that that's how it's always been done, and because
swsusp doesn't support anything other than a single swap device.  So,
despite Linux supporting those things, you can't actually use a swap
file (or more than one swap device) if you want hibernation
support.

(Suspend2 has supported swap files for a long time, and I think I
heard that uswsusp supports them now too.)

Once you accept that swap files need to be supported, you're
already going to be supporting everything you need to support a
dedicated hibernation file -- if you don't consider the trouble to
share the swap and hibernate space to be worth the gain.

--
Joseph Fannin
jfannin@gmail.com


* Re: Hibernating To Swap Considered Harmful
  2007-07-13  5:42             ` Hibernating To Swap Considered Harmful Joseph Fannin
@ 2007-07-13  5:57               ` david
  2007-07-13  6:20                 ` Joseph Fannin
       [not found]                 ` <20070713062039.GA29055@nineveh.local>
  2007-07-13  9:30               ` Rafael J. Wysocki
       [not found]               ` <200707131130.51279.rjw@sisk.pl>
  2 siblings, 2 replies; 113+ messages in thread
From: david @ 2007-07-13  5:57 UTC (permalink / raw)
  To: Joseph Fannin
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Fri, 13 Jul 2007, Joseph Fannin wrote:

> Date: Fri, 13 Jul 2007 01:42:48 -0400
> On Thu, Jul 12, 2007 at 08:06:43PM -0700, david@lang.hm wrote:
>> On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:
>
>>> Plus we need to figure out how to avoid corrupting filesystems and
>>> swap in use by the "old" kernel and its processes (hint: a separate
>>> "hibernation partition" is a no-go).
>>
>> I thought the existing hibernation wrote to the swap partition as it's
>> dedicated space?
>>
>> I didn't know that anyone was suggesting writing the hibernation image to
>> a filesystem that the kernel was activly accessing.
>
> I'm suggesting a dedicated, preallocated hibernation *file*, right
> now.  There's no way around it, if hibernation is to be reliable --
> otherwise hibernation can fail if the system has used enough of its
> swap space, so that there isn't enough room to write the hibernate
> image.
>
> Even if it's desirable to allow hibernation to fail if the system is
> too deep into swap, it's a moot point.
>
> Consider how the need to ensure that there is enough space to write
> the hibernate image is dealt with now:  by making a big honking swap
> space, so big that enough of it is all but guaranteed to be free,
> except under the heaviest of memory usage.  So the space is already
> reserved -- and now that it's commingled with actual swap, you have the
> need to pass the swap data structures between the two kernels.
>
> Consider instead,  you set up two swap spaces, one regular, and one
> for hibernation. You don't touch the "hibernation swap" unless the
> other is full -- I think just setting a lower priority on the swap
> space is enough for this.  Before you jump to the hibernate kernel,
> you swapoff that hibernate swap.
>
> If you can't swapoff the hibernate swap, hibernate fails right there.
>
> If you can, you have your space for writing the image, free and clear
> of any of the original kernel's internal state.  There isn't any need
> to treat that space as swap any more at all -- the only reason to do
> so would be to reuse the existing code.
>
> Setting aside two partitions for swap is obviously undesireable, but
> thankfully, Linux supports swap *files*.

the only justification I have heard for why the hibernate image must be 
written to the swap partition is backwards compatibility (i.e., we've 
always done it that way)

if you are going to reserve disk space for hibernation, what is so bad 
about using a normal partition?

David Lang

* Re: Hibernating To Swap Considered Harmful
  2007-07-13  5:57               ` david
@ 2007-07-13  6:20                 ` Joseph Fannin
       [not found]                 ` <20070713062039.GA29055@nineveh.local>
  1 sibling, 0 replies; 113+ messages in thread
From: Joseph Fannin @ 2007-07-13  6:20 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Joseph Fannin, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Thu, Jul 12, 2007 at 10:57:04PM -0700, david@lang.hm wrote:
> On Fri, 13 Jul 2007, Joseph Fannin wrote:
>
> the only justification I have heard for why the hibernate image must be
> written to the swap partition is backwards compatibility (i.e., we've
> always done it that way)
>
> if you are going to reserve disk space for hibernation, what is so bad
> about useing a normal partition?
>
    You have to either repartition when you upgrade your memory, or
waste a bunch of disk space with a partition as large as you think
your RAM might ever expand to.

    Swap/hibernate files can be created, deleted, and resized without
partitioning.

    Also: not all platforms support a large number of partitions.
It's not academic -- Intel Macintoshes are limited to four, with two
taken by Mac OS.  Add Windows and a Linux /, and you're out --
there's no room for a swap partition.

--
Joseph Fannin
jfannin@gmail.com

* Re: Hibernating To Swap Considered Harmful
       [not found]                 ` <20070713062039.GA29055@nineveh.local>
@ 2007-07-13  6:27                   ` david
       [not found]                   ` <Pine.LNX.4.64.0707122319270.25614@asgard.lang.hm>
  1 sibling, 0 replies; 113+ messages in thread
From: david @ 2007-07-13  6:27 UTC (permalink / raw)
  To: Joseph Fannin
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Fri, 13 Jul 2007, Joseph Fannin wrote:

> On Thu, Jul 12, 2007 at 10:57:04PM -0700, david@lang.hm wrote:
>> On Fri, 13 Jul 2007, Joseph Fannin wrote:
>>
>> the only justification I have heard for why the hibernate image must be
>> written to the swap partition is backwards compatibility (i.e., we've
>> always done it that way)
>>
>> if you are going to reserve disk space for hibernation, what is so bad
>> about useing a normal partition?
>>
>    You have to either repartition when you upgrade your memory, or
> waste a bunch of disk space with a partition as large as you think
> your RAM might ever expand to.

memory upgrades are rare and tools are available nowadays to resize linux 
partitions.

>    Swap/hibernate files can be created, deleted, and resized without
> partitioning.

if you just use the hibernate file as a reserved set of blocks and never 
touch them from the main OS things will work, but if you do anything that 
could put those blocks into the OS write cache all bets are off.

>    Also: not all platforms support a large number of partitions.
> It's not academic -- Intel Macintoshes are limited to four, with two
> taken by Mac OS.  Add Windows and a Linux /, and you're out --
> there's no room for a swap file.

interesting, I didn't know that. I know that TiVos use Mac style 
partition tables and they have many partitions (10+). it seems odd that 
when switching from powerpc to x86 they would lock themselves down 
like that. are you sure that they can't have extended partitions like 
standard PCs? it seems odd that they would have such a special partition 
table type if windows can access it.

David Lang

* Re: Hibernating To Swap Considered Harmful
       [not found]                   ` <Pine.LNX.4.64.0707122319270.25614@asgard.lang.hm>
@ 2007-07-13  7:15                     ` Joseph Fannin
       [not found]                     ` <20070713071512.GB29055@nineveh.local>
  1 sibling, 0 replies; 113+ messages in thread
From: Joseph Fannin @ 2007-07-13  7:15 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Joseph Fannin, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Thu, Jul 12, 2007 at 11:27:41PM -0700, david@lang.hm wrote:
> On Fri, 13 Jul 2007, Joseph Fannin wrote:
>
> >On Thu, Jul 12, 2007 at 10:57:04PM -0700, david@lang.hm wrote:
> >>On Fri, 13 Jul 2007, Joseph Fannin wrote:
> >>
> >>the only justification I have heard for why the hibernate image must be
> >>written to the swap partition is backwards compatibility (i.e., we've
> >>always done it that way)
> >>
> >>if you are going to reserve disk space for hibernation, what is so bad
> >>about useing a normal partition?
> >>
> >   You have to either repartition when you upgrade your memory, or
> >waste a bunch of disk space with a partition as large as you think
> >your RAM might ever expand to.
>
> memory upgrades are rare and tools are available nowdays to resize linux
> partitions.

I was recently involved in a week-long headache that resulted from
a botched filesystem resizing.  I didn't have backups (most people
don't) and I lost a lot.

I did get the chance to recreate my ext3 filesystem with 256-byte
inodes, as seen in ext4.  Other than the kernel and e2fsprogs, every
tool I point at the new filesystem pukes and dies, in no particular
order.

So:  bring a system down for hours to perform a dangerous operation
with tools that may be out of date, or spend five minutes with "dd" as
root on a running system -- making use of features supported by the
kernel for years?
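
For what it's worth, the "dd" step amounts to something like the Python
sketch below.  It is only an illustration: the function names are mine, the
MemTotal line is assumed to look the way /proc/meminfo prints it (a value in
kB), and the file is written out block by block so that it has no holes.

```python
import os


def mem_total_bytes(meminfo_text: str) -> int:
    """Pull MemTotal out of /proc/meminfo-style text (value is in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def preallocate(path: str, size_bytes: int, block_size: int = 1 << 20) -> None:
    """Fill `path` with zeros, roughly like dd if=/dev/zero of=path."""
    zeros = bytes(block_size)
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = min(block_size, remaining)
            f.write(zeros[:chunk])          # really write, so no sparse holes
            remaining -= chunk
        f.flush()
        os.fsync(f.fileno())                # push the blocks out to disk
    os.chmod(path, 0o600)                   # nobody else gets to read it
```

Growing the file after a memory upgrade is then a matter of running
preallocate() again with the new mem_total_bytes() value.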

Some people think memory changes are important enough to write
code to allow memory hotplug.  My hardware doesn't support that, but
some people have been bandying about the idea of
hibernate->hardware change->resume.

> >   Swap/hibernate files can be created, deleted, and resized without
> >partitioning.
>
> if you just use the hibernate file as a reserved set of blocks and never
> touch them from the main OS things will work, but if you do anything that
> could put those blocks into the OS write cache all bets are off.

Well, don't do that then.  No one should have permission to open
those files anyway, they're full of privileged data, just like the
nodes in /dev.

If the kernel's caching swap, that's... just perverse.

> >   Also: not all platforms support a large number of partitions.
> >It's not academic -- Intel Macintoshes are limited to four, with two
> >taken by Mac OS.  Add Windows and a Linux /, and you're out --
> >there's no room for a swap file.
>
> interesting, I didn't know that. I know that the Tivo's use Mac style
> partition tables and they have many partitions (10+). it seems odd that
> when switching from powerpc to x86 that they would lock themselves down
> like that. are you sure that they can't have extended partitions like
> standard PC's? it seems odd that they would have such a special partition
> table type if windows can access it.

Intel Macs use GPT partition tables, which support a huge number
of primary partitions, and so don't support secondary partitions.

32-bit Windows does not support GPT, so PC-style MBR partition tables
must also be used.  GPT was designed to coexist with MBR tools, so
this mostly works, but you're limited to the intersection of supported
features -- 4 primary partitions, no secondaries.

--
Joseph Fannin
jfannin@gmail.com

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]             ` <Pine.LNX.4.64.0707122008550.25614@asgard.lang.hm>
@ 2007-07-13  9:17               ` Rafael J. Wysocki
  2007-07-13  9:25                 ` david
  0 siblings, 1 reply; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-13  9:17 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Friday, 13 July 2007 05:12, david@lang.hm wrote:
> On Thu, 12 Jul 2007, Jeremy Maitin-Shepard wrote:
> 
> > "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> >
> > [snip]
> >
> >> There's more to it, though.  If devices are suspended, the hibernation kernel
> >> will have to resume them (using platform, like ACPI, callbacks in the process)
> >> instead and that will get complicated.
> >
> >> It's better if devices are quiesced, or even shut down, before we call the
> >> hibernation kernel.
> >
> > I agree that they definitely should not be put into a low power mode, as
> > that has nothing to do with hibernation.
> >
> > Ideally, the following would be done:
> >
> > All of the hardware that won't be needed by the "save image" kernel will
> > be shut down.  The normal driver shut down calls may not be suitable,
> > however, because although the same thing should be done to the hardware,
> > the device shouldn't be "unregistered", since unlike in the actual
> > shutdown case, the same device will need to brought back up again on
> > resume, and it will need to have the same device id and such (and
> > userspace probably shouldn't see the device going away).
> >
> > Any devices that will be needed by the "save image" kernel could also
> > safely be shutdown as with the unneeded devices, but it would be more
> > efficient to simply quiesce it.  Since this would be an additional
> > complication, initially probably all of the hardware should be shut
> > down, rather than quiesced.
> >
> > The reason that I think it is useful to actually shut down the devices,
> > rather than merely leaving some unneeded devices quiesced, is that it
> > would be useful to be able to build the "save image" kernel without
> > support for unneeded devices.  In order to support "suspend to ram"
> > instead of shutting down after saving the image to disk, the hibernate
> > kernel needs to be able to send devices into a low power state.  My
> > impression is that if there are devices it does not know about (i.e. the
> > unneeded devices), but which are left quiesced but powered on, this
> > would be a problem for suspend to ram, although not knowing much about
> > how suspend to ram actually works, I could be mistaken.  (Maybe it is
> > possible through ACPI or standard bus interfaces to shut down all of the
> > devices without really knowing anything about them.)
> 
> I don't think that anyone is talking about useing kexec for 
> suspend-to-ram, only for suspend-to-disk (hibernate)
> 
> >>>>> 3. Support the in-place kexec? The relocatable kernel is not necessary
> >>>>> if this can be implemented.
> >>>>> 4. Image writing/reading. (Only user space application is needed).
> >>>>
> >>>> And a kernel interface for that application.
> >>>
> >>> I do't understand this statement, this application is just useing the
> >>> standard kernel interfaces (block devices to read/write to disk, network
> >>> devices to read/write to a server, etc). no new interfaces needed.
> >
> >> Yes, but it will have to know _what_ to save, no?
> >
> > I agree that a kernel interface would be important; something like
> > /dev/snapshot that can be read by the "save image" kernel, and written
> > to by the "restore image" kernel.  Note that similarly, kdump provides a
> > kernel interface to an ELF image of the old kernel.
> 
> I thought that the idea was to save the entire contents of ram so that 
> caches, etc remain populated.
> 
> having the system kernel free up ram and then making a sg list of what 
> memory needs to be backed up would be a nice enhancement, but let's let 
> that remain a future enhancement until everyone agrees that the basic 
> approach works.

It's not that easy. :-)

First, there are memory regions that we don't want to save, because the
restoration of them may cause problems (generally all of the reserved pages
fall into this category).

We also don't want to save free RAM and we don't want to save the memory
occupied by the hibernation kernel (i.e. the "new" one).

Also, please note that we can't restore 100% of RAM, even if we save it.
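
As a toy illustration of the exclusions listed above, here is a Python
sketch.  The Page record, its flags and the page frame range used for the
hibernation kernel are all hypothetical; the real kernel keeps this
information in its own structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Page:
    pfn: int                 # page frame number
    reserved: bool = False   # restoring these may cause problems
    free: bool = False       # nothing worth saving here


def pages_to_save(pages: List[Page], hib_start: int, hib_end: int) -> List[int]:
    """Return the pfns worth saving.

    Skips reserved pages, free pages, and the range [hib_start, hib_end)
    occupied by the hibernation ("new") kernel.
    """
    return [
        p.pfn
        for p in pages
        if not p.reserved
        and not p.free
        and not (hib_start <= p.pfn < hib_end)
    ]
```

Even this trivial filter needs information that only the "old" kernel has,
which is why some interface between the two kernels is required.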

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-13  9:17               ` Rafael J. Wysocki
@ 2007-07-13  9:25                 ` david
  2007-07-13 11:41                   ` Rafael J. Wysocki
       [not found]                   ` <200707131341.35801.rjw@sisk.pl>
  0 siblings, 2 replies; 113+ messages in thread
From: david @ 2007-07-13  9:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Fri, 13 Jul 2007, Rafael J. Wysocki wrote:

> Date: Fri, 13 Jul 2007 11:17:37 +0200
> From: Rafael J. Wysocki <rjw@sisk.pl>
> To: david@lang.hm
> Cc: Jeremy Maitin-Shepard <jbms@cmu.edu>,
>     "Huang, Ying" <ying.huang@intel.com>,
>     Andrew Morton <akpm@linux-foundation.org>, Pavel Machek <pavel@ucw.cz>,
>     nigel@nigel.suspend2.net, linux-kernel@vger.kernel.org,
>     linux-pm@lists.linux-foundation.org
> Subject: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
> 
> On Friday, 13 July 2007 05:12, david@lang.hm wrote:
>> On Thu, 12 Jul 2007, Jeremy Maitin-Shepard wrote:
>>
>>> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
>>>>>>> 3. Support the in-place kexec? The relocatable kernel is not necessary
>>>>>>> if this can be implemented.
>>>>>>> 4. Image writing/reading. (Only user space application is needed).
>>>>>>
>>>>>> And a kernel interface for that application.
>>>>>
>>>>> I do't understand this statement, this application is just useing the
>>>>> standard kernel interfaces (block devices to read/write to disk, network
>>>>> devices to read/write to a server, etc). no new interfaces needed.
>>>
>>>> Yes, but it will have to know _what_ to save, no?
>>>
>>> I agree that a kernel interface would be important; something like
>>> /dev/snapshot that can be read by the "save image" kernel, and written
>>> to by the "restore image" kernel.  Note that similarly, kdump provides a
>>> kernel interface to an ELF image of the old kernel.
>>
>> I thought that the idea was to save the entire contents of ram so that
>> caches, etc remain populated.
>>
>> having the system kernel free up ram and then making a sg list of what
>> memory needs to be backed up would be a nice enhancement, but let's let
>> that remain a future enhancement until everyone agrees that the basic
>> approach works.
>
> It's not that easy. :-)
>
> First, there are memory regions that we don't want to save, because the
> restoration of them may cause problems (generally all of the reserved pages
> fall into this category).
>
> We also don't want to save free RAM and we don't want to save the memory
> occupied by the hibernation kernel (ie. the "new" one).

free ram is usually a pretty small number of pages (unless you free up 
ram before suspend). avoiding the ram reserved for the new kernel should 
be pretty simple (actually, it doesn't hurt much to save that ram, it just 
hurts if you try to restore it)

> Also, please note that we can't restore 100% of RAM, even if we save it.

Ok, now we need a data channel from the old kernel to the hibernate 
kernel, and on to the restore kernel. and the messier the memory layout the 
larger this data channel needs to be (hmm, what's the status on the memory 
defrag patches being proposed?) if this list can be made small enough it 
would work to just have the old kernel put the data in a known location in 
ram, and let the other two parts find it (in ram for the hibernate kernel, 
in the hibernate image for the wakeup kernel). how do the existing 
hibernate processes store this? since people are complaining about the 
amount of ram that a kexec kernel would take up I'm assuming it's 
something more complex than just a bitmap of all possible pages.
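
a back-of-the-envelope sketch of the two obvious encodings, in Python. it 
assumes 4 KiB pages and one bit per page frame for the bitmap; the function 
names are made up and this is not how any existing hibernate code stores 
the list.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def bitmap_bytes(ram_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Size of a one-bit-per-page bitmap covering all of RAM."""
    pages = ram_bytes // page_size
    return (pages + 7) // 8


def to_extents(pfns):
    """Collapse distinct page frame numbers into (start, length) runs."""
    extents = []
    for pfn in sorted(pfns):
        if extents and extents[-1][0] + extents[-1][1] == pfn:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)   # extend the current run
        else:
            extents.append((pfn, 1))            # start a new run
    return extents


if __name__ == "__main__":
    print(bitmap_bytes(4 * 2**30))          # bitmap size for 4 GiB of RAM
    print(to_extents([0, 1, 2, 5, 6, 9]))   # fragmented layout, more runs
```

the bitmap has a fixed cost (128 KiB for 4 GiB of ram with these 
assumptions), while the extent list grows with fragmentation, which is the 
"messier layout, larger channel" effect.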

most of the conversation so far has been around the process of making the 
snapshot and storing it. what are the processes and tools available to 
restore images?

David Lang

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-13  3:06           ` david
  2007-07-13  5:42             ` Hibernating To Swap Considered Harmful Joseph Fannin
@ 2007-07-13  9:29             ` Rafael J. Wysocki
       [not found]             ` <200707131129.34974.rjw@sisk.pl>
                               ` (2 subsequent siblings)
  4 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-13  9:29 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Friday, 13 July 2007 05:06, david@lang.hm wrote:
> On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:
> 
> > On Thursday, 12 July 2007 20:57, david@lang.hm wrote:
> >> On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:
> >>
> >>>> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
> >>>> before kexec and restore them after kexec.
> >>>
> >>> I don't think this is very important initially.
> >>
> >> I agree, a stipped down hibernate kernel can be very small, not allocating
> >> this memory until it's needed is a step for the final polishing.
> >
> > I'm not sure if I agree with that.  In any case, having to use two different
> > kernels for hibernation would be a big drawback.
> 
> I see it as a big advantage to not have to use the main kernel for the 
> suspend. please keep it as an option at least.

That depends on what we would like to support.  For example, we may need to use
the same kernel for the suspend-to-disk-and-RAM feature.

> >>>> 3. Support the in-place kexec? The relocatable kernel is not necessary
> >>>> if this can be implemented.
> >>>> 4. Image writing/reading. (Only user space application is needed).
> >>>
> >>> And a kernel interface for that application.
> >>
> >> I do't understand this statement, this application is just useing the
> >> standard kernel interfaces (block devices to read/write to disk, network
> >> devices to read/write to a server, etc). no new interfaces needed.
> >
> > Yes, but it will have to know _what_ to save, no?
> >
> > Plus we need to figure out how to avoid corrupting filesystems and swap in use
> > by the "old" kernel and its processes (hint: a separate "hibernation partition"
> > is a no-go).
> 
> I thought the existing hibernation wrote to the swap partition as it's 
> dedicated space?
> 
> I didn't know that anyone was suggesting writing the hibernation image to 
> a filesystem that the kernel was activly accessing.

We can write to a swap file and Nigel can write to a nonswap file too.

Plus if we use swap for hibernation, we need to know which swap pages have been
allocated for normal use by the kernel.
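
To illustrate why that knowledge matters, here is a simplified Python model.
The swap_map list of per-slot use counts and the treatment of slot 0 as a
header are assumptions made for the sketch, not the kernel's actual data
structures.

```python
from typing import List, Optional


def free_slots_for_image(swap_map: List[int],
                         pages_needed: int) -> Optional[List[int]]:
    """Pick swap slots that the image may overwrite.

    swap_map[i] is the use count of slot i; a count of 0 means the slot is
    not holding a page for the "old" kernel.  Slot 0 is skipped because it
    is assumed to hold the swap header.  Returns None if there is not
    enough free room, in which case hibernation would have to fail.
    """
    free = [slot for slot, count in enumerate(swap_map)
            if slot != 0 and count == 0]
    if len(free) < pages_needed:
        return None
    return free[:pages_needed]
```

Without the map, the image writer cannot tell slots 1, 3 and 4 in a map such
as [1, 0, 2, 0, 0, 1] from the ones that still hold swapped-out pages.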

> >>>> 5. A smooth resume process. Maybe it is not needed to kexec a new kernel
> >>>> for resume. For example, in the first stage of kernel boot, just first
> >>>> 16M (or a little more) RAM is used, if the resume image is found, the
> >>>> saved kernel image is resumed; if the resume image is not found, turn on
> >>>> the remaining RAM. This will depends on 3.
> >>>
> >>> I think that this is the most difficult part of the whole thing.
> >>
> >> don't try to get too fancy right now. stick with a simple 'boot hibernate
> >> kernel, it's userspace looks for an image to resume, and if it doesn't
> >> find one reboot to the normal system'
> >>
> >> I don't know how to do this with grub, but it would be a trivial shell
> >> script with lilo
> >
> > I think it's most portable to use initrd for that, which already makes things
> > complicated.  Then, we'll have to load the image and jump to the hibernated
> > kernel in such a way that it would be able to continue from where it stopped
> > before.  I don't think that is trivial.
> 
> I was talking about the scripts that would be used inside the initrd (or 
> boot partition of whatever type)
> 
> to start with don't worry about how the kexec kernel gets it's / 
> filesystem (for testing just use a real partition on your disk. after you 
> get everything working let the people who really understand the initrd and 
> consider it trivial switch it to an initrd image)

Well, my experience shows that you need to do everything yourself up to a
certain point.

> fo rthe current stage where we are trying to make things work don't worry 
> about packaging everything tight with initrd and re-useing partitions or 
> kernel images. once everything is working reliably then it's time to look 
> at useing the same kernel for multiple functions, writing to a partition 
> that's i use for other things, etc

I don't agree.  You need to think of many limitations in advance, because
they need to be taken into consideration in the design.

Otherwise we'll end up with something that will need to be bandaided like the
freezer. :-)

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

* Re: Hibernating To Swap Considered Harmful
  2007-07-13  5:42             ` Hibernating To Swap Considered Harmful Joseph Fannin
  2007-07-13  5:57               ` david
@ 2007-07-13  9:30               ` Rafael J. Wysocki
       [not found]               ` <200707131130.51279.rjw@sisk.pl>
  2 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-13  9:30 UTC (permalink / raw)
  To: Joseph Fannin
  Cc: david, linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton,
	linux-pm, Jeremy Maitin-Shepard

On Friday, 13 July 2007 07:42, Joseph Fannin wrote:
> On Thu, Jul 12, 2007 at 08:06:43PM -0700, david@lang.hm wrote:
> > On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:
> 
> > > Plus we need to figure out how to avoid corrupting filesystems and
> > > swap in use by the "old" kernel and its processes (hint: a separate
> > > "hibernation partition" is a no-go).
> >
> > I thought the existing hibernation wrote to the swap partition as it's
> > dedicated space?
> >
> > I didn't know that anyone was suggesting writing the hibernation image to
> > a filesystem that the kernel was activly accessing.
> 
> I'm suggesting a dedicated, preallocated hibernation *file*, right
> now.  There's no way around it, if hibernation is to be reliable --
> otherwise hibernation can fail if the system has used enough of its
> swap space, so that there isn't enough room to write the hibernate
> image.
> 
> Even if it's desirable to allow hibernation to fail if the system is
> too deep into swap, it's a moot point.

If you're afraid of that, use a dedicated swap file.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]             ` <200707131129.34974.rjw@sisk.pl>
@ 2007-07-13  9:38               ` david
  2007-07-13 11:59                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 113+ messages in thread
From: david @ 2007-07-13  9:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Fri, 13 Jul 2007, Rafael J. Wysocki wrote:

> On Friday, 13 July 2007 05:06, david@lang.hm wrote:
>> On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:
>>
>>> On Thursday, 12 July 2007 20:57, david@lang.hm wrote:
>>>> On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:
>>>>
>>>>>> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
>>>>>> before kexec and restore them after kexec.
>>>>>
>>>>> I don't think this is very important initially.
>>>>
>>>> I agree, a stipped down hibernate kernel can be very small, not allocating
>>>> this memory until it's needed is a step for the final polishing.
>>>
>>> I'm not sure if I agree with that.  In any case, having to use two different
>>> kernels for hibernation would be a big drawback.
>>
>> I see it as a big advantage to not have to use the main kernel for the
>> suspend. please keep it as an option at least.
>
> That depends on what we would like to support.  For example, we may need to use
> the same kernel for the suspend-to-disk-and-RAM feature.

I missed this discussion. is this idea to suspend, write to disk, but 
leave things in ram so that if you wake up soon enough you have everything 
in ram, but if you don't and the battery dies you can restore from disk?

if so I think it's a mistake to mix the two. it would be better to just 
suspend to ram, and wake up once in a while to check the battery state and 
when the battery gets low enough do the suspend to disk.

otherwise you end up mixing the requirements of the two types of suspend, 
which is how things got so ugly in the first place.
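
a minimal sketch of that policy in Python, assuming periodic wake-ups that 
each yield a battery percentage; the 10% threshold and the function name are 
arbitrary choices for illustration.

```python
from typing import Iterable, Optional


def first_hibernate_wakeup(readings: Iterable[int],
                           threshold: int = 10) -> Optional[int]:
    """Return the index of the wake-up at which to suspend to disk.

    `readings` are battery percentages seen at successive wake-ups from
    suspend-to-ram.  As long as the level is above `threshold` the machine
    just goes back to sleep; None means it never needed to hibernate.
    """
    for index, percent in enumerate(readings):
        if percent <= threshold:
            return index
    return None
```

the point is that the two suspend types stay separate: suspend-to-ram runs 
as usual, and suspend-to-disk is only started from a normal, awake system.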

>>>>>> 5. A smooth resume process. Maybe it is not needed to kexec a new kernel
>>>>>> for resume. For example, in the first stage of kernel boot, just first
>>>>>> 16M (or a little more) RAM is used, if the resume image is found, the
>>>>>> saved kernel image is resumed; if the resume image is not found, turn on
>>>>>> the remaining RAM. This will depends on 3.
>>>>>
>>>>> I think that this is the most difficult part of the whole thing.
>>>>
>>>> don't try to get too fancy right now. stick with a simple 'boot hibernate
>>>> kernel, it's userspace looks for an image to resume, and if it doesn't
>>>> find one reboot to the normal system'
>>>>
>>>> I don't know how to do this with grub, but it would be a trivial shell
>>>> script with lilo
>>>
>>> I think it's most portable to use initrd for that, which already makes things
>>> complicated.  Then, we'll have to load the image and jump to the hibernated
>>> kernel in such a way that it would be able to continue from where it stopped
>>> before.  I don't think that is trivial.
>>
>> I was talking about the scripts that would be used inside the initrd (or
>> boot partition of whatever type)
>>
>> to start with don't worry about how the kexec kernel gets it's /
>> filesystem (for testing just use a real partition on your disk. after you
>> get everything working let the people who really understand the initrd and
>> consider it trivial switch it to an initrd image)
>
> Well, my experience shows that you need to do everything yourself up to a
> certain point.

true, but if you get something that can work reliably, even if ugly, you 
get a lot more people willing to polish it than if you are asking them to 
help you implement the core features.

remember release early, release often (with something that functions)

>> fo rthe current stage where we are trying to make things work don't worry
>> about packaging everything tight with initrd and re-useing partitions or
>> kernel images. once everything is working reliably then it's time to look
>> at useing the same kernel for multiple functions, writing to a partition
>> that's i use for other things, etc
>
> I don't agree.  You need to think of many limitations in advance, because
> they need to be taken into consideration in the design.
>
> Otherwise we'll end up with something that will need to be bandaided like the
> freezer. :-)

on the other hand, worrying about all the possible ways to do things can 
paralyze you.

the big advantage of the kexec approach is that the new userspace that's 
set up with the new kernel can do _anything_. if/when this works you will 
see people doing things that you probably never imagined (a simple one is 
to suspend a machine at work, send the image over the network and resume 
on a different machine at home). and all these strange things are 
encapsulated so that you don't have to worry about how they will be done 
now.

it's good to try and find the places where you have fundamental changes to 
make to support them, but there are a lot of things that boil down to 
implementation details: everyone agrees that it can be done reliably, and 
the decisions on how just need to be made.

David Lang

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-13  9:25                 ` david
@ 2007-07-13 11:41                   ` Rafael J. Wysocki
       [not found]                   ` <200707131341.35801.rjw@sisk.pl>
  1 sibling, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-13 11:41 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Friday, 13 July 2007 11:25, david@lang.hm wrote:
> On Fri, 13 Jul 2007, Rafael J. Wysocki wrote:
> 
> > Date: Fri, 13 Jul 2007 11:17:37 +0200
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > To: david@lang.hm
> > Cc: Jeremy Maitin-Shepard <jbms@cmu.edu>,
> >     "Huang, Ying" <ying.huang@intel.com>,
> >     Andrew Morton <akpm@linux-foundation.org>, Pavel Machek <pavel@ucw.cz>,
> >     nigel@nigel.suspend2.net, linux-kernel@vger.kernel.org,
> >     linux-pm@lists.linux-foundation.org
> > Subject: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
> > 
> > On Friday, 13 July 2007 05:12, david@lang.hm wrote:
> >> On Thu, 12 Jul 2007, Jeremy Maitin-Shepard wrote:
> >>
> >>> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> >>>>>>> 3. Support the in-place kexec? The relocatable kernel is not necessary
> >>>>>>> if this can be implemented.
> >>>>>>> 4. Image writing/reading. (Only user space application is needed).
> >>>>>>
> >>>>>> And a kernel interface for that application.
> >>>>>
> >>>>> I don't understand this statement, this application is just using the
> >>>>> standard kernel interfaces (block devices to read/write to disk, network
> >>>>> devices to read/write to a server, etc). no new interfaces needed.
> >>>
> >>>> Yes, but it will have to know _what_ to save, no?
> >>>
> >>> I agree that a kernel interface would be important; something like
> >>> /dev/snapshot that can be read by the "save image" kernel, and written
> >>> to by the "restore image" kernel.  Note that similarly, kdump provides a
> >>> kernel interface to an ELF image of the old kernel.
> >>
> >> I thought that the idea was to save the entire contents of ram so that
> >> caches, etc remain populated.
> >>
> >> having the system kernel free up ram and then making a sg list of what
> >> memory needs to be backed up would be a nice enhancement, but let's let
> >> that remain a future enhancement until everyone agrees that the basic
> >> approach works.
> >
> > It's not that easy. :-)
> >
> > First, there are memory regions that we don't want to save, because the
> > restoration of them may cause problems (generally all of the reserved pages
> > fall into this category).
> >
> > We also don't want to save free RAM and we don't want to save the memory
> > occupied by the hibernation kernel (ie. the "new" one).
> 
> free ram is usually a pretty small number of pages (unless you free up 
> ram before suspend). avoiding the ram reserved for the new kernel should 
> be pretty simple (actually, it doesn't hurt much to save that ram, it just 
> hurts if you try to restore it)
> 
> > Also, please note that we can't restore 100% of RAM, even if we save it.
> 
> Ok, now we need a data channel from the old kernel to the hibernate 
> kernel, to the restore kernel. and the messier the memory layout the 
> larger this data channel needs to be (hmm, what's the status on the memory 
> defrag patches being proposed?) if this list can be made small enough it 
> would work to just have the old kernel put the data in a known location in 
> ram, and let the other two parts find it (in ram for the hibernate kernel, 
> in the hibernate image for the wakeup kernel).

I think the hibernation kernel should mmap() the "old" kernel's (and its
processes') memory available for saving, so that the image-saving process
can read its contents from the original locations.

> how do the existing hibernate processes store this?

There are two approaches.  In the first of them (used in the mainline) we just
create copies of all pages eligible for saving (hence we can't create images
larger than 50% of RAM) atomically and then we save the contents of these
copies (either directly from the kernel or through a user space process).  This
way we don't need to worry that they may be modified before we can save
them.
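
To make the 50% bound concrete, here is a rough back-of-the-envelope sketch in
Python (a toy model, not kernel code; the function names and page counts are
made up for illustration).  Every page that goes into the image needs a free
page to hold its copy, so the image can cover at most half of the usable RAM:

```python
def max_image_pages(total_pages, unsaveable_pages=0):
    """Upper bound on image size (in pages) when every saved page
    needs a second, free page to hold its atomic copy.

    Illustrative model only: total_pages and unsaveable_pages are
    hypothetical inputs, not values read from a real kernel.
    """
    usable = total_pages - unsaveable_pages
    if usable < 0:
        raise ValueError("unsaveable_pages exceeds total_pages")
    # saved + copies <= usable, and copies == saved
    return usable // 2


def pages_to_free(in_use_pages, total_pages, unsaveable_pages=0):
    """How many in-use pages must be freed before the copy step fits."""
    limit = max_image_pages(total_pages, unsaveable_pages)
    return max(0, in_use_pages - limit)
```

In this model a machine with 1000 usable pages and 700 of them in use would
have to free 200 pages before the atomic copy could be made.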

The other approach is Nigel's, in which all LRU pages are first saved
and then used as additional storage for copying the rest of memory contents.
This has the drawback that we are not 100% sure that the LRU pages won't be
modified after we've used them to store the copies of the other pages.

> since people are complaining about the amount of ram that a kexec kernel
> would take up I'm assuming it's something more complex than just a bitmap
> of all possible pages. 

No, it's just bitmaps, AFAICS, and the complaints are a bit overstated, IMO. ;-)

> most of the conversation so far has been around the process of making the 
> snapshot and storing it. what are the processes and tools available to 
> restore images?

We have quite an efficient restoration code in the kernel right now.  It's
able to upload big images (something like total RAM minus the size of the
boot kernel, initrd and, optionally, the resume application), which is much
more than we're able to save. :-)

It can work with images uploaded via /dev/snapshot from the user space
(specific image format is required, but that can be changed easily).

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-13  9:38               ` david
@ 2007-07-13 11:59                 ` Rafael J. Wysocki
  2007-07-13 14:37                   ` Alan Stern
                                     ` (3 more replies)
  0 siblings, 4 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-13 11:59 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Friday, 13 July 2007 11:38, david@lang.hm wrote:
> On Fri, 13 Jul 2007, Rafael J. Wysocki wrote:
> 
> > On Friday, 13 July 2007 05:06, david@lang.hm wrote:
> >> On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:
> >>
> >>> On Thursday, 12 July 2007 20:57, david@lang.hm wrote:
> >>>> On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:
> >>>>
> >>>>>> 2. Do not reserve memory for kexec kernel. That is, backup needed memory
> >>>>>> before kexec and restore them after kexec.
> >>>>>
> >>>>> I don't think this is very important initially.
> >>>>
> >>>> I agree, a stripped down hibernate kernel can be very small, not allocating
> >>>> this memory until it's needed is a step for the final polishing.
> >>>
> >>> I'm not sure if I agree with that.  In any case, having to use two different
> >>> kernels for hibernation would be a big drawback.
> >>
> >> I see it as a big advantage to not have to use the main kernel for the
> >> suspend. please keep it as an option at least.
> >
> > That depends on what we would like to support.  For example, we may need to use
> > the same kernel for the suspend-to-disk-and-RAM feature.
> 
> I missed this discussion. is this idea to suspend, write to disk, but 
> leave things in ram so that if you wakeup soon enough you have everything 
> for ram, but if you don't and the battery dies you can restore from disk?
> 
> if so I think it's a mistake to mix the two. it would be better to just 
> suspend to ram, and wake up once in a while to check the battery state and 
> when the battery gets low enough do the suspend to disk.
> 
> otherwise you end up mixing the requirements of the two types of suspend, 
> which is how things got so ugly in the first place.

Not necessarily.  If we don't put devices into low power states before creating
the image, that should work just fine (quiesce devices, create the image or
kexec the new kernel, reprobe devices, save the image, suspend to RAM,
resume from RAM, continue - or restore from the image if power failed in the
meantime).  Still, for this purpose, both kernels need to be able to handle the
same set of devices.
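
The ordering in parentheses above can be written down as a tiny Python sketch.
This is only a model of the sequence as I described it; the step strings and
the power_failed flag are illustrative, nothing here talks to real hardware:

```python
def hybrid_suspend_steps(power_failed):
    """Return the ordered steps for suspend-to-disk-and-RAM.

    power_failed is a hypothetical flag saying whether power was lost
    while the machine was suspended to RAM.
    """
    steps = [
        "quiesce devices",
        "create the image or kexec the new kernel",
        "reprobe devices",
        "save the image",
        "suspend to RAM",
    ]
    if power_failed:
        # RAM contents are gone, so fall back to the saved image.
        steps.append("restore from the image")
    else:
        steps += ["resume from RAM", "continue"]
    return steps
```

The point of the ordering is that the image is already safely on disk before
the machine enters suspend to RAM, so either branch ends in a usable system.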

> >>>>>> 5. A smooth resume process. Maybe it is not needed to kexec a new kernel
> >>>>>> for resume. For example, in the first stage of kernel boot, just first
> >>>>>> 16M (or a little more) RAM is used, if the resume image is found, the
> >>>>>> saved kernel image is resumed; if the resume image is not found, turn on
> >>>>>> the remaining RAM. This will depends on 3.
> >>>>>
> >>>>> I think that this is the most difficult part of the whole thing.
> >>>>
> >>>> don't try to get too fancy right now. stick with a simple 'boot hibernate
> >>>> kernel, it's userspace looks for an image to resume, and if it doesn't
> >>>> find one reboot to the normal system'
> >>>>
> >>>> I don't know how to do this with grub, but it would be a trivial shell
> >>>> script with lilo
> >>>
> >>> I think it's most portable to use initrd for that, which already makes things
> >>> complicated.  Then, we'll have to load the image and jump to the hibernated
> >>> kernel in such a way that it would be able to continue from where it stopped
> >>> before.  I don't think that is trivial.
> >>
> >> I was talking about the scripts that would be used inside the initrd (or
> >> boot partition of whatever type)
> >>
> >> to start with don't worry about how the kexec kernel gets it's /
> >> filesystem (for testing just use a real partition on your disk. after you
> >> get everything working let the people who really understand the initrd and
> >> consider it trivial switch it to an initrd image)
> >
> > Well, my experience shows that you need to do everything yourself up to a
> > certain point.
> 
> true, but if you get something that can work reliably, even if ugly, you 
> get a lot more people willing to polish it than if you are asking them to 
> help you implement the core features.

In theory and maybe.  And if you forget something crucial at the
beginning, then you've just lost time (except for having some fun, perhaps).

> remember release early, release often (with something that functions)
> 
> >> for the current stage where we are trying to make things work don't worry
> >> about packaging everything tight with initrd and re-using partitions or
> >> kernel images. once everything is working reliably then it's time to look
> >> at using the same kernel for multiple functions, writing to a partition
> >> that's in use for other things, etc
> >
> > I don't agree.  You need to think of many limitations in advance, because
> > they need to be taken into consideration in the design.
> >
> > Otherwise we'll end up with something that will need to be bandaided like the
> > freezer. :-)
> 
> on the other hand, worrying about all the possible ways to do things can 
> paralyze you.
> 
> the big advantage of the kexec approach is that the new userspace that's 
> setup with the new kernel can do _anything_.

No, it can't.  For example, it can't access filesystems mounted by the
hibernated kernel, or they may get corrupted after the restore (if they are
journaling, it can't even read from them).

Which reminds me of one more issue, which is that the image-saving kernel
won't be able to use these filesystems either, so its modules and user space
will have to be available from somewhere else (like a RAM disk or dedicated
partition).  So things get ugly.

Apart from this, the new kernel's user space cannot blindly modify swap space
that might be in use by the hibernated kernel.

> if/when this works you will see people doing things that you probably never
> imagined (a simple one is to suspend a machine at work, send the image
> over the network and resume on a different machine at home). and all these
> strange things are encapsulated so that you don't have to worry about how
> they will be done now.
> 
> it's good to try and find the places where you have fundamental changes to 
> make to support them, but there are a lot of things that boil down to 
> implementation details, everyone agrees that it can be done reliably, the 
> decisions on how just need to be made.

The problem is, we don't know if that can be done reliably, yet.

To convince everyone, we'll need to have a proof-of-concept implementation
working reliably.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]             ` <1184347974.4523.30.camel@caritas-dev.intel.com>
@ 2007-07-13 12:01               ` Rafael J. Wysocki
  0 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-13 12:01 UTC (permalink / raw)
  To: Huang, Ying
  Cc: david, linux-kernel, Pavel Machek, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Friday, 13 July 2007 19:32, Huang, Ying wrote:
> On Thu, 2007-07-12 at 20:06 -0700, david@lang.hm wrote:
> > >> I agree, a stripped down hibernate kernel can be very small, not allocating
> > >> this memory until it's needed is a step for the final polishing.
> > >
> > > I'm not sure if I agree with that.  In any case, having to use two different
> > > kernels for hibernation would be a big drawback.
> > 
> > I see it as a big advantage to not have to use the main kernel for the 
> > suspend. please keep it as an option at least.
> 
> Yes. It has additional bonus to make it possible to write/read image
> from a program other than main kernel. For example, for a specific
> mobile device product (Such as Intel MID), a customized ultra-small
> program (or kernel) can be composed to write/read image. That way, the
> hibernate/resume time can be reduced to minimal.

You don't need kexec for that.  This is how the userland hibernation (aka
uswsusp) works.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: Hibernating To Swap Considered Harmful
       [not found]                     ` <20070713071512.GB29055@nineveh.local>
@ 2007-07-13 14:35                       ` Jeremy Maitin-Shepard
       [not found]                       ` <87odig1idx.fsf@jbms.ath.cx>
  1 sibling, 0 replies; 113+ messages in thread
From: Jeremy Maitin-Shepard @ 2007-07-13 14:35 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Joseph Fannin, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm

jfannin@gmail.com (Joseph Fannin) writes:

[snip]

> Intel Macs use GPT partition tables, which support a huge number
> of primary partitions, and so don't support secondary partitions.

> 32bit Windows does not support GPT, so PC-style MBR partition tables
> must also be used.  GPT was designed to coexist with MBR tools, so
> this mostly works, but you're limited to the union of supported
> features -- 4 primary partitions, no secondaries.

There is a very simple solution to this obscure problem: (if I
understand correctly, you want to dual boot Mac OS X and Linux (and
maybe also Windows?))

use LVM, thus allowing you to have as many volumes as you like in the
partition

-- 
Jeremy Maitin-Shepard

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-13 11:59                 ` Rafael J. Wysocki
@ 2007-07-13 14:37                   ` Alan Stern
  2007-07-13 15:12                   ` Jeremy Maitin-Shepard
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 113+ messages in thread
From: Alan Stern @ 2007-07-13 14:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: david, linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton,
	linux-pm, Jeremy Maitin-Shepard

On Fri, 13 Jul 2007, Rafael J. Wysocki wrote:

> > I missed this discussion. is this idea to suspend, write to disk, but 
> > leave things in ram so that if you wakeup soon enough you have everything 
> > for ram, but if you don't and the battery dies you can restore from disk?
> > 
> > if so I think it's a mistake to mix the two. it would be better to just 
> > suspend to ram, and wake up once in a while to check the battery state and 
> > when the battery gets low enough do the suspend to disk.
> > 
> > otherwise you end up mixing the requirements of the two types of suspend, 
> > which is how things got so ugly in the first place.
> 
> Not necessarily.  If we don't put devices into low power states before creating
> the image, that should work just fine (quiesce devices, create the image or
> kexec the new kernel, reprobe devices, save the image, suspend to RAM,
> resume from RAM, continue - or restore from the image if power failed in the
> meantime).  Still, for this purpose, both kernels need to be able to handle the
> same set of devices.

Why?

Suppose the kexec kernel can't handle some device.  The normal kernel 
has already quiesced the device, so it will remain quiescent while the 
kexec kernel runs and throughout the suspend.  When the regular kernel 
regains control the device will be ready for use.  I don't see any 
problem.

Alan Stern

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-13 11:59                 ` Rafael J. Wysocki
  2007-07-13 14:37                   ` Alan Stern
@ 2007-07-13 15:12                   ` Jeremy Maitin-Shepard
       [not found]                   ` <87abu01gnv.fsf@jbms.ath.cx>
  2007-07-14  7:12                   ` david
  3 siblings, 0 replies; 113+ messages in thread
From: Jeremy Maitin-Shepard @ 2007-07-13 15:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: david, linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton,
	linux-pm

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

[snip]

> Not necessarily.  If we don't put devices into low power states before creating
> the image, that should work just fine (quiesce devices, create the image or
> kexec the new kernel, reprobe devices, save the image, suspend to RAM,
> resume from RAM, continue - or restore from the image if power failed in the
> meantime).  Still, for this purpose, both kernels need to be able to handle the
> same set of devices.

I don't know much about the suspend to RAM, but it seems that it would
indeed be necessary to have a device driver for a device in order to
switch it from e.g. a quiesced state to a low power state.  If, however,
the original kernel already completely turned off the device, then it
seems that the "save image" kernel shouldn't have to do anything to it
in order to suspend to RAM.  The drawback, though, is that since the old
kernel would have no way (unless the user tells it) to know which
devices should be left quiesced and which should be turned off, it would
have to turn them all off, which would mean spinning up and down the
disks.

On the other hand, being able to build the "save image" kernel with only
minimal hardware support could save a significant amount of the time
required to boot it.

[snip]

> No, it can't.  For example, it can't access filesystems mounted by the
> hibernated kernel, or they may get corrupted after the restore (if they are
> journaling, it can't even read from them).

That is true, but this also holds for the current hibernate
implementations.

> Which reminds me of one more issue, which is that the image-saving kernel
> won't be able to use these filesystems either, so its modules and user space
> will have to be available from somewhere else (like a RAM disk or dedicated
> partition).  So things get ugly.

This is not the issue that it appears to be, though.  Under the current
hibernate implementations, this very same userspace and set of modules
must be available "somewhere else" (i.e. an initrd) because it is needed
by the restore path.  Note that under the kexec approach, save and
restore become rather symmetric operations.

> Apart from this, the new kernel's user space cannot blindly modify swap space
> that might be in use by the hibernated kernel.

But it seems easy enough to swapoff in order to completely free up the
swap space.  I suppose the disadvantage is that instead of failing
cleanly if there is insufficient memory, the OOM killer will be invoked
and cause all sorts of havoc.  This suggests that it may indeed be
important to support "cooperation" with the old kernel on saving the
image sooner, rather than later.

[snip]

-- 
Jeremy Maitin-Shepard

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                   ` <87abu01gnv.fsf@jbms.ath.cx>
@ 2007-07-13 15:45                     ` Rafael J. Wysocki
       [not found]                     ` <200707131745.43055.rjw@sisk.pl>
  1 sibling, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-13 15:45 UTC (permalink / raw)
  To: Jeremy Maitin-Shepard
  Cc: david, linux-kernel, Eric W. Biederman, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm

On Friday, 13 July 2007 17:12, Jeremy Maitin-Shepard wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> [snip]
> 
> > Not necessarily.  If we don't put devices into low power states before creating
> > the image, that should work just fine (quiesce devices, create the image or
> > kexec the new kernel, reprobe devices, save the image, suspend to RAM,
> > resume from RAM, continue - or restore from the image if power failed in the
> > meantime).  Still, for this purpose, both kernels need to be able to handle the
> > same set of devices.
> 
> I don't know much about the suspend to RAM, but it seems that it would
> indeed be necessary to have a device driver for a device in order to
> switch it from e.g. a quiesced state to a low power state.  If, however,
> the original kernel already completely turned off the device, then it
> seems that the "save image" kernel shouldn't have to do anything to it
> in order to suspend to RAM.  The drawback, though, is that since the old
> kernel would have no way (unless the user tells it) to know which
> devices should be left quiesced and which should be turned off, it would
> have to turn them all off, which would mean spinning up and down the
> disks.
> 
> On the other hand, being able to build the "save image" kernel with only
> minimal hardware support could save a significant amount of the time
> required to boot it.
> 
> [snip]
> 
> > No, it can't.  For example, it can't access filesystems mounted by the
> > hibernated kernel, or they may get corrupted after the restore (if they are
> > journaling, it can't even read from them).
> 
> That is true, but this also holds for the current hibernate
> implementations.
> 
> > Which reminds me of one more issue, which is that the image-saving kernel
> > won't be able to use these filesystems either, so its modules and user space
> > will have to be available from somewhere else (like a RAM disk or dedicated
> > partition).  So things get ugly.
> 
> This is not the issue that it appears to be, though.  Under the current
> hibernate implementations, this very same userspace and set of modules
> must be available "somewhere else" (i.e. an initrd) because it is needed
> by the restore path.  Note that under the kexec approach, save and
> restore become rather symmetric operations.
> 
> > Apart from this, the new kernel's user space cannot blindly modify swap space
> > that might be in use by the hibernated kernel.
> 
> But it seems easy enough to swapoff in order to completely free up the
> swap space.  I suppose the disadvantage is that instead of failing
> cleanly if there is insufficient memory, the OOM killer will be invoked
> and cause all sorts of havoc.  This suggests that it may indeed be
> important to support "cooperation" with the old kernel on saving the
> image sooner, rather than later.

Okay, I have thought it through and I think that, as an initial step, we can do
something like this:

- preload the image-saving kernel before hibernation
- in the hibernation code path replace device_suspend() with the shutting down of
  all devices without unregistering them (not very nice, but should be sufficient
  for a while)
- when we've called device_power_down() and save_processor_state(), jump to
  the image-saving kernel and let it run
- make the image-saving kernel set up everything, save the image without
  starting any user space (we may use the existing image-saving code for this
  purpose, with some modifications) and power off the system (or make it enter
  S4)
- use the existing restoration code to load the image and jump to the
  hibernated kernel
- in the restore code path replace device_resume() with the reprobing of all
  devices.

Comments?

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                     ` <200707131745.43055.rjw@sisk.pl>
@ 2007-07-13 15:50                       ` Alan Stern
  2007-07-13 16:48                       ` Jeremy Maitin-Shepard
  1 sibling, 0 replies; 113+ messages in thread
From: Alan Stern @ 2007-07-13 15:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: david, linux-kernel, Eric W. Biederman, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Fri, 13 Jul 2007, Rafael J. Wysocki wrote:

> Okay, I have thought it through and I think that, as an initial step, we can do
> something like this:
> 
> - preload the image-saving kernel before hibernation
> - in the hibernation code path replace device_suspend() with the shutting down of
>   all devices without unregistering them (not very nice, but should be sufficient
>   for a while)
> - when we've called device_power_down() and save_processor_state(), jump to
>   the image-saving kernel and let it run
> - make the image-saving kernel set up everything, save the image without
>   starting any user space (we may use the existing image-saving code for this
>   purpose, with some modifications) and power off the system (or make it enter
>   S4)
> - use the existing restoration code to load the image and jump to the
>   hibernated kernel
> - in the restore code path replace device_resume() with the reprobing of all
>   devices.
> 
> Comments?

I doubt that re-probing devices will work.  The probe routine won't 
expect there to be any registered children, so it will try to 
re-register them.

On the other hand, post_restore methods could be written to expect 
something like this.

Alan Stern

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]       ` <1184368525.1069.68.camel@caritas-dev.intel.com>
@ 2007-07-13 16:43         ` Eric W. Biederman
       [not found]         ` <m1k5t4dzl4.fsf@ebiederm.dsl.xmission.com>
  1 sibling, 0 replies; 113+ messages in thread
From: Eric W. Biederman @ 2007-07-13 16:43 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Kexec Mailing List, linux-kernel, Pavel Machek, Andrew Morton,
	linux-pm, Jeremy Maitin-Shepard

"Huang, Ying" <ying.huang@intel.com> writes:

> On Thu, 2007-07-12 at 10:32 -0600, Eric W. Biederman wrote:
>> >
>> > 1. Separate device suspend from device hibernate.
>> 
>> Actually in some very practical sense we already have two copies of
>> this in the kernel.  device_shutdown and the hotunplug/module
>> remove code.  So it should be mostly a matter of using what we have.
>
> Maybe I misuse the terminology. The "device hibernate" here means put
> device into quiescent state and save the device state into memory for
> later restore.
>
> But, how to restore device state after jumping back from kexeced kernel
> if device_shutdown or hotplug/module removing code is used?

With device_shutdown there really isn't a path (although putting
the device into a quiescent state is exactly what that method does).

However with the module remove path you disassociate the pci device
and the pci device driver and then after restore you go through
the pci devices and you redo the association (as if the pci
device or the module had just been loaded).

For devices that have a really slow probe/initialization this might be
an issue but it will work, and generally it will work fairly quickly.
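
As a toy illustration of the disassociate-then-reassociate idea, here is a
small Python model.  It is not how the kernel's driver core is written; the
ToyBus class, its methods, and the device/driver names are all hypothetical
and only show the bookkeeping involved:

```python
class ToyBus:
    """Toy model of a bus that tracks device-to-driver bindings."""

    def __init__(self):
        self.bound = {}  # device name -> driver name

    def probe(self, device, driver):
        # A real probe would initialize hardware; here we only record it.
        if device in self.bound:
            raise RuntimeError(f"{device} is already bound")
        self.bound[device] = driver

    def remove(self, device):
        return self.bound.pop(device)


def unbind_all(bus):
    """Detach every device and remember which driver it had."""
    saved = dict(bus.bound)
    for device in list(bus.bound):
        bus.remove(device)
    return saved


def rebind_all(bus, saved):
    """Redo the association, as if each driver had just been loaded."""
    for device, driver in saved.items():
        bus.probe(device, driver)
```

Note that probe() in this model refuses to run on a device that is still
bound, which is why the devices have to be fully detached before the jump
rather than merely quiesced.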

So in that sense restore may be the wrong concept for devices.
At least for prototypes.

>> Basically all this entails is to modify sys_reboot()
>> and adding a LINUX_REBOOT_CMD_KSPAWN and have that command
>> enter the kexec path with the appropriate set of calls.
>> I would be really surprised if this winds up with much
>> more code than the current kernel_kexec function.
>
> Yes, this is the right place to trigger kexec jump. Thank you for your
> valuable comments. It seems that just the device state
> quiesce/save/restore and CPU state save/restore code need to be
> added.

Yes.

For cpu state I'm fairly certain we don't need to do anything fancy.

At least for the first pass prototyping with a uniprocessor kernel
will remove that hurdle.


>> For prototyping I would:
>> - reserve a chunk of memory (possibly with the crashkernel= option)
>>   and run a relocatable kernel out of it.  
>> 
>>   By using the normal kexec you can boot a relocatable restore kernel
>>   in that reserved region. It is an extra step but it makes things
>>   work today.
>> 
>> - I would use the normal sys_kexec_load.
>> 
>> - I would debug/tweak the user space and the code to reenter the
>>   old kernel.  I.e. the device driver stop/start code.
>
> The above 3 steps are exactly what I have done in this patch. I reserve
> memory with crashkernel=, use sys_kexec_load (kexec -p ...) to load the
> kexec kernel and manage to jump back to normal kernel. But I should not
> mix the kexec jump trigger code with the software suspend code, that is,
> use "kexec -e", not "echo disk > /sys/power/state".

Right.  That caused all kinds of weirdness in your patch.


>> > 5. A smooth resume process. Maybe it is not needed to kexec a new kernel
>> > for resume. For example, in the first stage of kernel boot, just first
>> > 16M (or a little more) RAM is used, if the resume image is found, the
>> > saved kernel image is resumed; if the resume image is not found, turn on
>> > the remaining RAM. This will depends on 3.
>> 
>> Well I expect the resume will load the resumed kernel into reserved
>> memory.  And kexec a very small assembly stub that will jump back
>> to the code in relocate_kernel.S which will call ret.
>> 
>> Then either hot add the rest of our memory or kexec to a kernel without
>> restrictions.
>
> Why is an assembly stub necessary? Is it not sufficient to just
> continue to complete a normal boot (hot add the rest of memory) or load
> the hibernated kernel (hibernated image) and jump to it?

I was thinking the assembly stub would be the small piece that jumps
to loaded hibernated kernel.  Quite possibly we could just get away
with providing no memory and just an entry point to kexec but it
makes sense to me to plan on running a couple of instructions.

Actually, the way the kexec infrastructure works, it might be reasonable to
just use sys_kexec_load to load the entire hibernated image.  Except
for the fact that sys_kexec_load requires the source pages to be
in the processes memory image the code shouldn't have the 50% of
memory limitation already.

If we can get that going we don't even need to restrict the first
kernels memory.  So it might just require teaching sys_kexec_load
how to steal process pages.  Anyway something to think about.

>> Well just not loading drivers you aren't going to use and generally avoiding
>> long disk probing times will help here.  We control all of the code so
>> it should be relatively straight forward.
>
> Just not loading unnecessary drivers may be sufficient. But further
> optimization is also possible.
>
> The basic idea of optimization is:
>
> For first run:
>
> 1. boot the normal kernel A
> 2. kexec the hibernate kernel B
> 3. jump back to kernel A
> 4. Save the kernel B into a image
> 5. Work under kernel A.
>
> For normal use:
>
> 1. boot the normal kernel A
> 2. resume the image of kernel B
> 3. jump to kernel B
> 4. Save the kernel A into image

Yes.  That may help.  Cooperative multitasking between kernels.

Anyway all of that comes after we get something simple working.
Then we can see what really needs to be optimized, and what works
well enough.

Eric

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                     ` <200707131745.43055.rjw@sisk.pl>
  2007-07-13 15:50                       ` Alan Stern
@ 2007-07-13 16:48                       ` Jeremy Maitin-Shepard
  2007-07-13 21:23                         ` Rafael J. Wysocki
  1 sibling, 1 reply; 113+ messages in thread
From: Jeremy Maitin-Shepard @ 2007-07-13 16:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: david, linux-kernel, Eric W. Biederman, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

[snip]

> Okay, I have thought it through and I think that, as an initial step, we can do
> something like this:

> - preload the image-saving kernel before hibernation
> - in the hibernation code path replace device_suspend() with the shutting down
>   of all devices without unregistering them (not very nice, but should be
>   sufficient for a while)

It seems that the effect of what is done by the current hibernate
implementations is to shut down all of the devices, but according to
kernel data structures, have it look like the devices were merely
suspended (i.e. device_suspend).  Then in the resume path, the "restore
image" kernel also calls device_suspend just before jumping to the
hibernated kernel, so all of the devices the "restore image" kernel knew
about are in the state device_suspend expects them to be in, except that
they were actually suspended by a different kernel, so they might not be
in quite the right state.  There is also the issue that the "restore
image" kernel might not know about all of the devices; for instance, if
USB support is modular, and, as is likely to be the case, the user
didn't load the USB modules in the "restore image" kernel from an initrd
or something, then the USB devices will actually be powered off, rather
than "suspended".  Despite these apparent discrepancies, it seems that
for many devices (I'm not sure USB devices are included, though),
device_resume happens to do the right thing so that the device is
placed back in the state it needs to be so that the driver can begin
talking to it as it did before, and the device is recognized as the same
device as was there before (since otherwise mounted filesystems backed
by block devices that came back as a different device would cause great
havoc).

Since I recall there being issues with USB devices being recognized as
the same devices post-hibernate-resume, without looking at the code I'm
inclined to believe that the USB drivers still don't end up resuming
from hibernate correctly.

Note that I am describing what is done currently, not what is planned to
be done (i.e. change device_suspend to quiesce and device_resume to
unquiesce).  It seems that ironically, despite everyone believing that
device_suspend/device_resume is incorrect for hibernate, many of the
things that those functions do (like saving the PCI configuration,
perhaps, and then restoring it later, or re-initializing the device) are
actually necessary, especially for modular drivers that won't be loaded
by the "restore image" kernel.

What needs to be done is for the devices to be shut down (or possibly
just quiesced for a select few, but we won't worry about that
complication until later; in the case of the current implementations,
they should all be quiesced rather than shut down), while saving whatever
information will be needed later to reinitialize the device
(ideally the reinitialization should be able to handle the device either
being in a quiesced state or completely off) and recognize it as the
same device.  This probably means they cannot be
"unregistered", as otherwise there would be nothing with which to
associate the saved information.

The resume path needs to use the saved state to reinitialize the device
and recognize it as the same device.  It seems that the existing
reprobing code may not be sufficient for this.  Note that exactly the
same thing must be done on resume for both the current hibernate
implementations and the kexec approach.  It seems that properly
restoring the devices should be relatively easy for the devices that
already get this correct, like IDE devices and basic PCI devices (and
SATA and SCSI devices as well perhaps?), and possibly harder for
Firewire or USB devices.

> - when we've called device_power_down() and save_processor_state(), jump to
>   the image-saving kernel and let it run
> - make the image-saving kernel set up everything, save the image without
>   starting any user space (we may use the existing image-saving code for this
>   purpose, with some modifications) and power off the system (or make it enter
>   S4)

I suppose this has the advantage of not requiring that a
kernel-to-userspace interface be created for this purpose.

> - use the existing restoration code to load the image and jump to the
>   hibernated kernel

This would again avoid the need for a separate userspace-kernelspace
interface for the purpose, so I agree it could be a useful thing to do
initially.

> - in the restore code patch replace device_resume() with the reprobing of all
>   devices.

See my comments above.

-- 
Jeremy Maitin-Shepard


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-13  3:06           ` david
                               ` (3 preceding siblings ...)
       [not found]             ` <1184347974.4523.30.camel@caritas-dev.intel.com>
@ 2007-07-13 17:32             ` Huang, Ying
  4 siblings, 0 replies; 113+ messages in thread
From: Huang, Ying @ 2007-07-13 17:32 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Thu, 2007-07-12 at 20:06 -0700, david@lang.hm wrote:
> >> I agree, a stripped down hibernate kernel can be very small, not allocating
> >> this memory until it's needed is a step for the final polishing.
> >
> > I'm not sure if I agree with that.  In any case, having to use two different
> > kernels for hibernation would be a big drawback.
> 
> I see it as a big advantage to not have to use the main kernel for the 
> suspend. please keep it as an option at least.

Yes. It has the additional bonus of making it possible to write/read the image
from a program other than the main kernel. For example, for a specific
mobile device product (such as Intel MID), a customized ultra-small
program (or kernel) can be written to write/read the image. That way, the
hibernate/resume time can be reduced to a minimum.

Best Regards,
Huang Ying


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-13 16:48                       ` Jeremy Maitin-Shepard
@ 2007-07-13 21:23                         ` Rafael J. Wysocki
  0 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-13 21:23 UTC (permalink / raw)
  To: Jeremy Maitin-Shepard
  Cc: david, linux-kernel, Eric W. Biederman, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm

On Friday, 13 July 2007 18:48, Jeremy Maitin-Shepard wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> [snip]
> 
> > Okay, I have thought it through and I think that, as an initial step, we can do
> > something like this:
> 
> > - preload the image-saving kernel before hibernation
> > - in the hibernation code path replace device_suspend() with the shutting down
> >   of all devices without unregistering them (not very nice, but should be
> >   sufficient for a while)
> 
> It seems that the effect of what is done by the current hibernate
> implementations is to shutdown all of the devices, but according to
> kernel data structures, have it look like the devices were merely
> suspended (i.e. device_suspend).  Then in the resume path, the "restore
> image" kernel also calls device_suspend just before jumping to the
> hibernated kernel, so all of the devices the "restore image" kernel knew
> about are in the state device_suspend expects them to be in, except that
> they were actually suspended by a different kernel, so they might not be
> in quite the right state.  There is also the issue that the "restore
> image" kernel might not know about all of the devices; for instance, if
> USB support is modular, and, as is likely to be the case, the user
> didn't load the USB modules in the "restore image" kernel from an initrd
> or something, then the USB devices will actually be powered off, rather
> than "suspended".  Despite these apparent discrepancies, it seems that
> for many devices (I'm not sure USB devices are included, though),
> device_restore happens to do the right thing so that the device is
> placed back in the state it needs to be so that the driver can begin
> talking to it as it did before, and the device is recognized as the same
> device as was there before (since otherwise mounted filesystems backed
> by block devices that came back as a different device would cause great
> havoc).
> 
> Since I recall there being issues with USB devices being recognized as
> the same devices post-hibernate-resume, without looking at the code I'm
> inclined to believe that the USB drivers still don't end up resuming
> from hibernate correctly.
> 
> Note that I am describing what is done currently, not what is planned to
> be done (i.e. change device_suspend to quiesce and device_resume to
> unquiesce).  It seems that ironically, despite everyone believing that
> device_suspend/device_resume is incorrect for hibernate, many of the
> things that those functions do (like saving the PCI configuration,
> perhaps, and then restoring it later, or re-initializing the device) are
> actually necessary, especially for modular drivers that won't be loaded
> by the "restore image" kernel.
> 
> What needs to be done is for the devices to be shut down (or possibly
> just quiesced for a select few, but we won't worry about that
> complication until later; in the case of the current implementations,
> they should all be quiesced rather than shut down), but whatever
> information that will be needed later to reinitialize the device
> (ideally the reinitialization should be able to handle the device either
> being in a quiesced state or completely off) and recognize it as the
> same device must be saved.  This probably means they cannot be
> "unregistered", as otherwise there would be nothing with which to
> associate the saved information.
> 
> The resume path needs to use the saved state to reinitialize the device
> and recognize it as the same device.  It seems that the existing
> reprobing code may not be sufficient for this.  Note that exactly the
> same thing must be done on resume for both the current hibernate
> implementations and the kexec approach.  It seems that properly
> restoring the devices should be relatively easy for the devices that
> already get this correct, like IDE devices and basic PCI devices (and
> SATA and SCSI devices as well perhaps?), and possibly harder for
> Firewire or USB devices.

The problem of handling devices during hibernation has been discussed many
times on linux-pm and we have reached a certain agreement.  I wouldn't like to
repeat all of the arguments here.  For details, please refer, for example, to
this thread:

https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012386.html

> > - when we've called device_power_down() and save_processor_state(), jump to
> >   the image-saving kernel and let it run
> > - make the image-saving kernel set up everything, save the image without
> >   starting any user space (we may use the existing image-saving code for this
> >   purpose, with some modifications) and power off the system (or make it enter
> >   S4)
> 
> I suppose this has the advantage of not requiring that a
> kernel-to-userspace interface be created for this purpose.

Yes, and for now we're avoiding the problems with starting the new user space
from a special place.

> > - use the existing restoration code to load the image and jump to the
> >   hibernated kernel
> 
> This would again avoid the need for a separate userspace-kernelspace
> interface for the purpose, so I agree it could be a useful thing to do
> initially.
> 
> > - in the restore code patch replace device_resume() with the reprobing of all
> >   devices.
> 
> See my comments above.

And please see my reply. :-)

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]     ` <m14pk9fuqa.fsf@ebiederm.dsl.xmission.com>
  2007-07-12 19:09       ` david
       [not found]       ` <1184368525.1069.68.camel@caritas-dev.intel.com>
@ 2007-07-13 23:15       ` Huang, Ying
  2 siblings, 0 replies; 113+ messages in thread
From: Huang, Ying @ 2007-07-13 23:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Kexec Mailing List, linux-kernel, Pavel Machek, Andrew Morton,
	linux-pm, Jeremy Maitin-Shepard

On Thu, 2007-07-12 at 10:32 -0600, Eric W. Biederman wrote:
> >
> > 1. Separate device suspend from device hibernate.
> 
> Actually in some very practical sense we already have two copies of
> this in the kernel.  device_shutdown and the hotunplug/module
> remove code.  So it should be mostly a matter of using what we have.

Maybe I misused the terminology. "Device hibernate" here means putting the
device into a quiescent state and saving the device state into memory for
later restore.

But how do we restore the device state after jumping back from the kexeced
kernel if device_shutdown or the hotplug/module removal code is used?

> Basically all this entails is to modify sys_reboot()
> and adding a LINUX_REBOOT_CMD_KSPAWN and have that command
> enter the kexec path with the appropriate set of calls.
> I would be really surprised if this winds up with much
> more code than the current kernel_kexec function.

Yes, this is the right place to trigger the kexec jump. Thank you for your
valuable comments. It seems that just the device state
quiesce/save/restore and CPU state save/restore code needs to be added.

> This might wind up exactly the same as the current
> LINUX_REBOOT_CMD_KEXEC but at least until we have a working
> prototype it makes sense to allow for differences.
> 
> This should allow the kexec based implementation to coincide
> with the existing software suspend to disk code until it is proven out
> and then we can just remove all of the software suspend code to
> disk code.
> 
> > 2. Do not reserve memory for kexec kernel. That is, backup needed memory
> > before kexec and restore them after kexec.
> > 3. Support the in-place kexec? The relocatable kernel is not necessary
> > if this can be implemented.
> 
> It sounds like what you really want is the normal kexec path enhanced
> so that you can return to the kernel you started with.

Yes

> The normal kexec path already knows how to do the memory shuffle so
> it can do on demand memory allocation.  That code just needs to be
> enhanced slightly so that you allocate an extra page, set up an inverse
> scatter gather list for restoring the pages, and teach relocate_kernel.S
> to preserve its destination pages by using the inverse scatter gather
> list.

Yes, I have seen that code. The framework of the current kexec memory
copying can be used for memory backup before kexec.
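
To make the inverse scatter gather idea concrete, here is a toy model in
Python. It is only a sketch and not the relocate_kernel.S code: the dict
standing in for physical memory and the names swap_in_pages and
restore_pages are assumptions made up for this illustration.

```python
def swap_in_pages(memory, copy_list, free_pages):
    """Copy source pages to their destinations, keeping the old
    destination contents in spare pages.

    memory     -- dict: page frame number -> page contents (toy model)
    copy_list  -- list of (src_pfn, dst_pfn) pairs, like a kexec
                  scatter gather list
    free_pages -- spare page frame numbers usable as backup storage

    Returns the inverse list: (backup_pfn, dst_pfn) pairs that undo the copy.
    """
    inverse = []
    spare = iter(free_pages)
    for src, dst in copy_list:
        backup = next(spare)
        memory[backup] = memory[dst]   # preserve the page about to be overwritten
        memory[dst] = memory[src]      # the normal kexec copy
        inverse.append((backup, dst))
    return inverse


def restore_pages(memory, inverse):
    """Walk the inverse list to put the original destination pages back."""
    for backup, dst in inverse:
        memory[dst] = memory[backup]


# Frames 0-1 belong to kernel A, 10-11 hold kernel B, 20-21 are spare.
memory = {0: "A0", 1: "A1", 10: "B0", 11: "B1", 20: None, 21: None}
inverse = swap_in_pages(memory, [(10, 0), (11, 1)], [20, 21])
after_swap = dict(memory)          # kernel B now sits at frames 0 and 1
restore_pages(memory, inverse)     # kernel A's pages are back in place
```

The point of returning the inverse list is that the jump-back path only
needs that one extra structure to undo the copy.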

> The normal kexec path already calls device_shutdown and the like to
> stop devices from running.  Although again that code path is not
> prepared to restore the devices.

Yes, the device state quiesce/save/restore and CPU state save/restore
code should be added.

> For prototyping I would:
> - reserve a chunk of memory (possibly with the crashkernel= option)
>   and run a relocatable kernel out of it.  
> 
>   By using the normal kexec you can boot a relocatable restore kernel
>   in that reserved region. It is an extra step but it makes things
>   work today.
> 
> - I would use the normal sys_kexec_load.
> 
> - I would debug/tweak the user space and the code to reenter the
>   old kernel.  I.e. the device driver stop/start code.

The above 3 steps are exactly what I have done in this patch. I reserve
memory with crashkernel=, use sys_kexec_load (kexec -p ...) to load the
kexec kernel and manage to jump back to the normal kernel. But I should not
mix the kexec jump trigger code with the software suspend code; that is,
I should use "kexec -e", not "echo disk > /sys/power/state".

>   Once it was basically working I would then update the normal kexec
>   memory copy code in relocate.S to preserve the destination pages.

This is what I plan to do.

> > 4. Image writing/reading. (Only user space application is needed).
> 
> And possibly a few fixes to /dev/mem.  This is pretty much the same
> process as generating a core dump so there should be some synergy with that.
> 
> We probably want to use something like the ELF header the crashdump
> path uses to communicate to the kernel saving memory which memory
> regions need to be saved.  Which probably means that we can use the
> exact same method as the kexec on panic kernel uses to save memory.

This sounds good. The information in the ELF header can be used to locate
the necessary information for image writing.

> > 5. A smooth resume process. Maybe it is not needed to kexec a new kernel
> > for resume. For example, in the first stage of kernel boot, just first
> > 16M (or a little more) RAM is used, if the resume image is found, the
> > saved kernel image is resumed; if the resume image is not found, turn on
> > the remaining RAM. This will depends on 3.
> 
> Well I expect the resume will be load the resumed kernel into reserved
> memory.  And kexec a very small assembly stub that will jump back
> to the code in relocate_kernel.S which will call ret.
> 
> Then either hot add the rest of our memory or kexec to a kernel without
> restrictions.

Why is an assembly stub necessary? Is it not sufficient to just
continue to complete a normal boot (hot add the rest of memory) or load
the hibernated kernel (hibernated image) and jump to it?

> > 6. Reduce the boot-up time of kexec kernel. Maybe the kexec kernel can
> > be hibernate/resume by the normal kernel too. This way, a real
> > kexec/boot-up is only needed for the first time.
> 
> Well just not loading drivers you aren't going to use and generally avoiding
> long disk probing times will help here.  We control all of the code so
> it should be relatively straight forward.

Just not loading unnecessary drivers may be sufficient. But further
optimization is also possible.

The basic idea of optimization is:

For first run:

1. boot the normal kernel A
2. kexec the hibernate kernel B
3. jump back to kernel A
4. Save the kernel B into an image
5. Work under kernel A.

For normal use:

1. boot the normal kernel A
2. resume the image of kernel B
3. jump to kernel B
4. Save the kernel A into an image

Best Regards,
Huang Ying


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]     ` <20070712085428.GA1866@elf.ucw.cz>
@ 2007-07-13 23:18       ` Huang, Ying
  0 siblings, 0 replies; 113+ messages in thread
From: Huang, Ying @ 2007-07-13 23:18 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Thu, 2007-07-12 at 10:54 +0200, Pavel Machek wrote:
> > Sorry, I should have re-checked the mail before sending out.
> 
> Were your patches enough to get hibernation working? I got kexec to
> work here, so I guess I'm one step closer...

Yes, it is just the first step. There are still many steps that need to be
done to get hibernation working.

> ...video does not work in the kexec-ed kernel, unless I boot with
> vga=1.
> 

I will check it.

Best Regards,
Huang Ying


* Re: Hibernating To Swap Considered Harmful
       [not found]               ` <200707131130.51279.rjw@sisk.pl>
@ 2007-07-14  0:45                 ` Joseph Fannin
       [not found]                 ` <20070714004517.GA18336@nineveh.local>
  1 sibling, 0 replies; 113+ messages in thread
From: Joseph Fannin @ 2007-07-14  0:45 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: david, linux-kernel, Joseph Fannin, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Fri, Jul 13, 2007 at 11:30:50AM +0200, Rafael J. Wysocki wrote:
> On Friday, 13 July 2007 07:42, Joseph Fannin wrote:
> > On Thu, Jul 12, 2007 at 08:06:43PM -0700, david@lang.hm wrote:
> > > On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:
> >
> > > > Plus we need to figure out how to avoid corrupting filesystems and
> > > > swap in use by the "old" kernel and its processes (hint: a separate
> > > > "hibernation partition" is a no-go).
> > >
> > > I thought the existing hibernation wrote to the swap partition as it's
> > > dedicated space?
> > >
> > > I didn't know that anyone was suggesting writing the hibernation image to
> > > a filesystem that the kernel was actively accessing.
> >
> > I'm suggesting a dedicated, preallocated hibernation *file*, right
> > now.  There's no way around it, if hibernation is to be reliable --
> > otherwise hibernation can fail if the system has used enough of its
> > swap space, so that there isn't enough room to write the hibernate
> > image.
> >
> > Even if it's desirable to allow hibernation to fail if the system is
> > too deep into swap, it's a moot point.
>
> If you're afraid of that, use a dedicated swap file.

    I don't understand what you mean.  A dedicated swap file for what?

--
Joseph Fannin
jfannin@gmail.com


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]         ` <m1k5t4dzl4.fsf@ebiederm.dsl.xmission.com>
@ 2007-07-14  5:48           ` Huang, Ying
       [not found]           ` <1184392129.1898.69.camel@caritas-dev.intel.com>
  1 sibling, 0 replies; 113+ messages in thread
From: Huang, Ying @ 2007-07-14  5:48 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Kexec Mailing List, linux-kernel, Pavel Machek, Andrew Morton,
	linux-pm, Jeremy Maitin-Shepard

On Fri, 2007-07-13 at 10:43 -0600, Eric W. Biederman wrote:
> > Why a assembly stub is necessary? Is it not sufficient that just
> > continue to complete a normal boot (hot add the reset of memory) or load
> > the hibernated kernel (hibernated image) and jump to it?
> 
> I was thinking the assembly stub would be the small piece that jumps
> to loaded hibernated kernel.  Quite possibly we could just get away
> with providing no memory and just an entry point to kexec but it
> makes sense to me to plan on running a couple of instructions.

Oh, I got it. In my patch, there is such an assembly stub in
arch/i386/kernel/kexec_jump.S. I think it is needed to restore basic CPU
state and accommodate some position-independent restore code (such as
memory restore code).

> Actually the way the kexec infrastructure it might be reasonable to
> just use sys_kexec_load to load the entire hibernated image.  Except
> for the fact that sys_kexec_load requires the source pages to be
> in the processes memory image the code shouldn't have the 50% of
> memory limitation already.
> 
> If we can get that going we don't even need to restrict the first
> kernels memory.  So it might just require teaching sys_kexec_load
> how to steal process pages.  Anyway something to think about.

As for memory backup and restore during hibernating and resuming,
I think a possible picture is as follows:

Memory:

  Total memory: 512M
  Memory used by hibernating/resuming kernel: 0~16M


Hibernating process:

  1. The normal kernel is running.
  2. Hibernating is triggered, and sys_kexec_load is used to load the
     hibernating kernel and initramfs into memory. Then
     sys_reboot(LINUX_REBOOT_CMD_KSPAWN) is invoked.
  3. In sys_reboot, kexec_jump is called to save device/CPU state,
     then relocate_kernel is called. kexec_jump and relocate_kernel
     reside in individual pages in 16M~512M.
  4. In relocate_kernel, 0~16M is backed up first, then the
     hibernating kernel and initramfs are copied to 0~16M; after that,
     the hibernating kernel is booted.
  5. In the hibernating kernel, the memory of the normal kernel (it is in
     16M~512M) is saved into a hibernation image through /dev/mem
     and the ELF header.


Resume process:

  1. The resuming kernel is booted as a normal kernel, but its memory is
     restricted to 0~16M.
  2. It checks whether there is a valid hibernation image. If
     there isn't, the memory of 16M~512M is hot added, and the normal
     boot up process continues; if there is, a resuming process is
     triggered.
  3. sys_kexec_load is used to restore the memory state of the hibernated
     kernel. The sys_kexec_load works in the crashdump way, that is, the
     hibernation image is copied to its destination location in 16M~512M
     in sys_kexec_load instead of relocate_kernel. There is no
     half-of-memory size restriction.
  4. sys_reboot is called to trigger jumping back, which will jump back
     to kexec_jump of the hibernated kernel.
  5. In kexec_jump of the hibernated kernel, the memory of 0~16M is copied
     back from the backup area in 16M~512M. The memory state of the
     hibernated kernel is then restored completely. The CPU and device state
     can be restored after that.


If it is too difficult to hot add memory in step 2, a more
conservative method can be used for step 1 and step 2:

  1. A normal kernel is booted.
  2. It checks whether there is a valid hibernation image. If there
     isn't, the normal boot process continues; otherwise, a resuming
     kernel is kexeced in memory 0~16M. The resuming process will
     continue in the kexeced resuming kernel.
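
The memory movement in the hibernating and resuming processes above can be
modelled roughly as follows. This is only a toy sketch in Python with a list
standing in for RAM; LOW, the function names and the backup_at parameter are
assumptions for illustration, and device/CPU state is ignored entirely.

```python
LOW = 2      # stands in for the 0~16M region
TOTAL = 8    # stands in for 512M of RAM


def hibernate(ram, kernel_b, backup_at):
    """Back up the low region into free high memory, load the hibernating
    kernel into the low region, and return an image of the high region."""
    ram[backup_at:backup_at + LOW] = ram[:LOW]   # back up 0~16M
    ram[:LOW] = kernel_b                         # hibernating kernel takes 0~16M
    return list(ram[LOW:])                       # image of 16M~512M


def resume(ram, image, backup_at):
    """Load the image straight into the high region, then copy the backed-up
    low region back, as kexec_jump of the hibernated kernel would."""
    ram[LOW:] = image                            # restore 16M~512M
    ram[:LOW] = ram[backup_at:backup_at + LOW]   # restore 0~16M from the backup


# Kernel A owns frames 0-5; frames 6-7 are free and hold the backup.
original = ["a0", "a1", "a2", "a3", "a4", "a5", None, None]
ram = list(original)
image = hibernate(ram, ["b0", "b1"], backup_at=6)

# Fresh boot: the resuming kernel only occupies the low region.
ram_after_boot = ["r0", "r1"] + [None] * (TOTAL - LOW)
resume(ram_after_boot, image, backup_at=6)
```

Note that the image never has to be larger than the high region, which is
why there is no half-of-memory restriction in this picture.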

Best Regards,
Huang Ying


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-13 11:59                 ` Rafael J. Wysocki
                                     ` (2 preceding siblings ...)
       [not found]                   ` <87abu01gnv.fsf@jbms.ath.cx>
@ 2007-07-14  7:12                   ` david
  3 siblings, 0 replies; 113+ messages in thread
From: david @ 2007-07-14  7:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Fri, 13 Jul 2007, Rafael J. Wysocki wrote:

>> remember release early, release often (with something that functions)
>>
>>>> for the current stage where we are trying to make things work don't worry
>>>> about packaging everything tight with initrd and re-using partitions or
>>>> kernel images. once everything is working reliably then it's time to look
>>>> at using the same kernel for multiple functions, writing to a partition
>>>> that's in use for other things, etc
>>>
>>> I don't agree.  You need to think of many limitations in advance, because
>>> they need to be taken into consideration in the design.
>>>
>>> Otherwise we'll end up with something that will need to be bandaided like the
>>> freezer. :-)
>>
>> on the other hand, worrying about all the possible ways to do things can
>> paralyze you.
>>
>> the big advantage of the kexec approach is that the new userspace that's
>> setup with the new kernel can do _anything_.
>
> No, it can't.  For example, it can't access filesystems mounted by the
> hibernated kernel, or they may get corrupted after the restore (if they are
> journaling, it can't even read from them).

it's only ext3 that has this bug when mounting a filesystem read-only 
AFAIK.

> Which reminds me of one more issue, which is that the image-saving kernel
> won't be able to use these filesystems either, so its modules and user space
> will have to be available from somewhere else (like a RAM disk or dedicated
> partition).  So things get ugly.

another reason to go ahead and make a dedicated no-module kernel for the 
hibernate phase ;-)

> Apart from this, the new kernel's user space cannot blindly modify swap space
> that might be in use by the hibernated kernel.

with swap space you have two options

1. free it up (to a swap file if needed) and don't worry about it

2. try and ensure that nothing else on the system attempts to use the swap 
partition until you reboot.

#2 is failure prone, #1 would be more reliable.

David Lang


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                   ` <200707131341.35801.rjw@sisk.pl>
@ 2007-07-14  7:51                     ` david
       [not found]                     ` <Pine.LNX.4.64.0707140017560.25614@asgard.lang.hm>
  1 sibling, 0 replies; 113+ messages in thread
From: david @ 2007-07-14  7:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Fri, 13 Jul 2007, Rafael J. Wysocki wrote:

>> Ok, now we need a data channel from the old kernel to the hibernate
>> kernel, to the restore kernel. and the messier the memory layout the
>> larger this data channel needs to be (hmm, what's the status on the memory
>> defrag patches being proposed?) if this list can be made small enough it
>> would work to just have the old kernel put the data in a known location in
>> ram, and let the other two parts find it (in ram for the hibernate kernel,
>> in the hibernate image for the wakeup kernel).
>
> I think the hibernation kernel should mmap() the "old" kernel's (and its
> processes') memory available for saving, so that the image-saving process
> can read its contents from the original locations.

but I'll bet that not all kernels keep the info in the same place (and 
probably not even in the same format). I'm suggesting that a standard be 
defined for the format of the data and the location of a pointer to it 
that will be maintained across kernel versions.

>> how do the existing hibernate processes store this?
>
> There are two approaches.  In the first of them (used in the mainline) we just
> create copies of all pages eligible for saving (hence we can't create images
> larger than 50% of RAM) atomically and then we save the contents of these
> copies (either directly from the kernel or through a user space process).  This
> way we don't need to worry that they may be modified before we can save
> them.
>
> The other approach is the Nigel's one, in which all LRU pages are first saved
> and then used as additional storage for copying the rest of memory contents.
> This has a drawback that we are not 100% sure if the LRU won't be modified
> after we've used them to store the copies of the other pages.

and since both current approaches use the same kernel for everything the 
issues I'm thinking of simply don't apply.

>> since people are complaining about the amount of ram that a kexec kernel
>> would take up I'm assuming it's something more complex than just a bitmap
>> of all possible pages.
>
> No, it's just bitmaps, AFAICS, and the complaints are a bit overstated, IMO. ;-)

1 bit for each 4k means 1m bits for 4g of ram, or 128k of bitmaps, growing 
up to 1m of ram used for 32G of ram in the system. I guess this isn't bad 
as long as it doesn't need to be contiguous for the new kernel to access 
it.

ok, that makes it a pretty trivial thing to work with. I just need to 
learn how to find the bitmaps.
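
the arithmetic above can be checked with a few lines of Python. this is just
a sketch of the sizing, assuming one bit per 4k page and binary units; the
function name bitmap_bytes is made up for the example.

```python
PAGE_SIZE = 4 * 1024   # 4k pages
KiB = 1024
GiB = 1024 ** 3


def bitmap_bytes(ram_bytes, page_size=PAGE_SIZE):
    """Bytes needed for a bitmap with one bit per page of RAM."""
    pages = ram_bytes // page_size
    return (pages + 7) // 8        # round up to whole bytes


size_4g = bitmap_bytes(4 * GiB)    # 1M pages -> 1M bits
size_32g = bitmap_bytes(32 * GiB)  # 8M pages -> 8M bits

print(size_4g // KiB, "KiB for 4G of ram")
print(size_32g // KiB, "KiB for 32G of ram")
```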

>> most of the conversation so far has been around the process of making the
>> snapshot and storing it. what are the processes and tools available to
>> restore images?
>
> We have quite an efficient restoration code in the kernel right now.  It's
> able to upload big images (something like total RAM minus the size of the
> boot kernel, initrd and, optionally, the resume application), which is much
> more than we're able to save. :-)
>
> It can work with images uploaded via /dev/snapshot from the user space
> (specific image format is required, but that can be changed easily).

Ok, so it sounds as if the restore is basicly a solved problem, great!

so now I want to do a but-ugly hibernate configuration.

while I haven't done it yet I am very confident that I can enable kexec 
and create a kernel and filesystem to boot into. I can then use perl to 
get the bitmaps out of kmem (once I learn how to find them), and then
read from kmem to save the 4k chunks that I need to a file.

it'll be ugly and use lots of partitions, but it should end up being solid 
from what I am understanding.

where can I learn how to find the bitmaps and what format things need to 
be in to feed into /dev/snapshot?

David Lang

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                     ` <Pine.LNX.4.64.0707140017560.25614@asgard.lang.hm>
@ 2007-07-14  8:33                       ` david
       [not found]                       ` <Pine.LNX.4.64.0707140128210.25614@asgard.lang.hm>
  2007-07-14 20:00                       ` Rafael J. Wysocki
  2 siblings, 0 replies; 113+ messages in thread
From: david @ 2007-07-14  8:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

by the way, a data point on kernel sizes

-rw-r--r-- 1 root root  864648 Jul 14 00:53 vmlinuz.2.6.22.1.hibernate
-rw-r--r-- 1 root root  659496 Jul 14 01:17 vmlinuz.2.6.22.1.hibernate.stripped
-rw-r--r-- 1 root root 3948168 Jul 14 01:10 vmlinuz.2.6.22.1.running

the running one matches the config I'm running on my home server, the 
hibernate is a pretty stripped down version, and the stripped is close to 
a minimum (including turning off printk and BUG()). All three are with all 
drivers built-in, no module support.

this is on an amd64 (64-bit) system

configs are available if anyone cares, the point is how much smaller a 
kernel could be if it doesn't need all the stuff that you put in your main 
kernel. In my case this includes not enabling the 3-ware card that holds 
my 12-disk raid array, instead the hibernate image would be stored on one 
of the scsi drives attached to the adaptec 78xx card.

I expect that on a normal desktop/laptop with more features (like sound) 
the savings could be even more significant

David Lang

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                       ` <Pine.LNX.4.64.0707140128210.25614@asgard.lang.hm>
@ 2007-07-14  9:24                         ` Rafael J. Wysocki
  0 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-14  9:24 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Saturday, 14 July 2007 10:33, david@lang.hm wrote:
> by the way, a data point on kernel sizes
> 
> -rw-r--r-- 1 root root  864648 Jul 14 00:53 vmlinuz.2.6.22.1.hibernate
> -rw-r--r-- 1 root root  659496 Jul 14 01:17 vmlinuz.2.6.22.1.hibernate.stripped
> -rw-r--r-- 1 root root 3948168 Jul 14 01:10 vmlinuz.2.6.22.1.running
> 
> the running one matches the config I'm running on my home server, the 
> hibernate is a pretty stripped down version, and the stripped is close to 
> a minimum (including turning off printk and BUG()). All three are with all 
> drivers built-in, no module support.
> 
> this is on an amd64 (64-bit) system
> 
> configs are available if anyone cares, the point is how much smaller a 
> kernel could be if it doesn't need all the stuff that you put in your main 
> kernel. In my case this includes not enabling the 3-ware card that holds 
> my 12-disk raid array, instead the hibernate image would be stored on one 
> of the scsi drives attached to the adaptec 78xx card.
> 
> I expect that on a normal desktop/laptop with more features (like sound) 
> the savings could be even more significant

But the kernel needs some data to work too (a 'struct page' for each memory
page etc.).
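
A rough sketch of how large that per-page data gets, in Python. The 64-byte size of 'struct page' is an assumption made for illustration (the real size depends on architecture and config), and mem_map_bytes is a made-up helper name.

```python
MIB = 1 << 20
GIB = 1 << 30


def mem_map_bytes(ram_bytes, page_size=4096, struct_page_size=64):
    """Estimate the memory taken by one 'struct page' per page of RAM.

    struct_page_size=64 is an assumed value, not a measured one.
    """
    pages = ram_bytes // page_size
    return pages * struct_page_size


for label, ram in (("16M", 16 * MIB), ("512M", 512 * MIB), ("4G", 4 * GIB)):
    print(f"{label} of RAM -> {mem_map_bytes(ram) // 1024}k of struct page")
# 16M of RAM -> 256k of struct page
# 512M of RAM -> 8192k of struct page
# 4G of RAM -> 65536k of struct page
```

So the cost depends mostly on how much memory the small kernel is told to manage, not on the size of its image.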

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

* Re: Hibernating To Swap Considered Harmful
       [not found]                 ` <20070714004517.GA18336@nineveh.local>
@ 2007-07-14  9:48                   ` Rafael J. Wysocki
       [not found]                   ` <200707141148.18279.rjw@sisk.pl>
  1 sibling, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-14  9:48 UTC (permalink / raw)
  To: Joseph Fannin
  Cc: david, linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton,
	linux-pm, Jeremy Maitin-Shepard

On Saturday, 14 July 2007 02:45, Joseph Fannin wrote:
> On Fri, Jul 13, 2007 at 11:30:50AM +0200, Rafael J. Wysocki wrote:
> > On Friday, 13 July 2007 07:42, Joseph Fannin wrote:
> > > On Thu, Jul 12, 2007 at 08:06:43PM -0700, david@lang.hm wrote:
> > > > On Thu, 12 Jul 2007, Rafael J. Wysocki wrote:
> > >
> > > > > Plus we need to figure out how to avoid corrupting filesystems and
> > > > > swap in use by the "old" kernel and its processes (hint: a separate
> > > > > "hibernation partition" is a no-go).
> > > >
> > > > I thought the existing hibernation wrote to the swap partition as it's
> > > > dedicated space?
> > > >
> > > > I didn't know that anyone was suggesting writing the hibernation image to
> > > > a filesystem that the kernel was activly accessing.
> > >
> > > I'm suggesting a dedicated, preallocated hibernation *file*, right
> > > now.  There's no way around it, if hibernation is to be reliable --
> > > otherwise hibernation can fail if the system has used enough of its
> > > swap space, so that there isn't enough room to write the hibernate
> > > image.
> > >
> > > Even if it's desirable to allow hibernation to fail if the system is
> > > too deep into swap, it's a moot point.
> >
> > If you're afraid of that, use a dedicated swap file.
> 
>     I don't understand what you mean.  A dedicated swap file for what?

Sorry, I should have been more precise.

For hibernation (i.e. a swap file that you activate right before the
hibernation).

Also tuxonice (formerly known as suspend2) allows you to use regular files
for hibernation.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]           ` <1184392129.1898.69.camel@caritas-dev.intel.com>
@ 2007-07-14  9:59             ` Rafael J. Wysocki
  2007-07-14 10:55               ` Huang, Ying
       [not found]               ` <1184410554.1898.84.camel@caritas-dev.intel.com>
  0 siblings, 2 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-14  9:59 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Kexec Mailing List, linux-kernel, Eric W. Biederman, Pavel Machek,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Saturday, 14 July 2007 07:48, Huang, Ying wrote:
> On Fri, 2007-07-13 at 10:43 -0600, Eric W. Biederman wrote:
> > > Why is an assembly stub necessary? Is it not sufficient to just
> > > continue to complete a normal boot (hot add the rest of memory) or load
> > > the hibernated kernel (hibernated image) and jump to it?
> > 
> > I was thinking the assembly stub would be the small piece that jumps
> > to loaded hibernated kernel.  Quite possibly we could just get away
> > with providing no memory and just an entry point to kexec but it
> > makes sense to me to plan on running a couple of instructions.
> 
> Oh, I got it. In my patch, there is such an assembly stub in
> arch/i386/kernel/kexec_jump.S. I think it is needed to restore basic CPU
> state and accommodate some position independent restore code (such as
> memory restore code).
> 
> > Actually, the way the kexec infrastructure works, it might be reasonable to
> > just use sys_kexec_load to load the entire hibernated image.  Except
> > for the fact that sys_kexec_load requires the source pages to be
> > in the process's memory image, the code shouldn't have the 50% of
> > memory limitation already.
> > 
> > If we can get that going we don't even need to restrict the first
> > kernels memory.  So it might just require teaching sys_kexec_load
> > how to steal process pages.  Anyway something to think about.
> 
> As for backing up and restoring memory during hibernation and resume,
> I think a possible picture is as follows:
> 
> Memory:
> 
>   Total memory: 512M
>   Memory used by hibernating/resuming kernel: 0~16M
> 
> 
> Hibernating process:
> 
>   1. Normal kernel running
>   2. Hibernating is triggered, sys_kexec_load is used to load
>      hibernating kernel and initramfs into memory. Then
>      sys_reboot(LINUX_REBOOT_CMD_KSPAWN) is invoked.
>   3. In sys_reboot, kexec_jump is called to save device/CPU state,
>      then relocate_kernel is called. kexec_jump and relocate_kernel
>      reside in individual page in 16M~512M.

OK

What's going to happen to devices at this point?

>   4. In relocate_kernel, 0~16M is backed up first, then the
>      hibernating kernel and initramfs are copied to 0~16M; after that,
>      the hibernating kernel is booted.
>   5. In hibernating kernel, the memory of normal kernel (it is in
>      16M~512M) is saved into a hibernation image through /dev/mem
>      and ELF header.

I don't think it can be _that_ simple:
(a) what about processes' memory
(b) what about areas that shouldn't be saved?

> Resume process:
> 
>   1. Resuming kernel is booted as a normal kernel, but the memory is
>      restricted to 0~16M.
>   2. Check whether there is a valid hibernation image. If
>      there isn't, the memory of 16M~512M is hot added, and the normal
>      boot up process continues; If there is, a resuming process is
>      triggered.
>   3. sys_kexec_load is used to restore the memory state of hibernated
>      kernel. The sys_kexec_load works in crashdump way, that is, the
>      hibernation image is copied to destination location in 16M~512M
>      in sys_kexec_load instead of relocate_kernel. There is no half
>      of memory size restriction.
>   4. sys_reboot is called to trigger jumping back, which will jump back
>      to kexec_jump of hibernated kernel.
>   5. In kexec_jump of hibernated kernel, the memory of 0~16M is copied
>      back from the backup area in 16M~512M. The memory state of
>      hibernated kernel is restored totally. The CPU and device state
>      can be restored after that.
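
The shuffling of the 0~16M region in the quoted steps can be modelled with a toy simulation. This is only a sketch in Python under stated assumptions: one byte stands for 1M of RAM, the backup area is placed at the very top of memory (the quoted plan only says it lives somewhere in 16M~512M), and all names are hypothetical.

```python
TOTAL = 512             # one byte models 1M of RAM, 512M in total
LOW = 16                # 0~16M, used by the hibernating/resuming kernel
BACKUP = TOTAL - LOW    # assumed location of the backup area


def hibernate(mem):
    """Back up 0~16M, load the hibernating kernel there, return the image."""
    mem[BACKUP:BACKUP + LOW] = mem[0:LOW]   # relocate_kernel: back up low memory
    mem[0:LOW] = b"H" * LOW                 # hibernating kernel now owns 0~16M
    return bytes(mem[LOW:TOTAL])            # image covers 16M~512M


def resume(image):
    """Boot a resuming kernel in 0~16M, load the image, then jump back."""
    mem = bytearray(TOTAL)
    mem[0:LOW] = b"R" * LOW                 # resuming kernel restricted to 0~16M
    mem[LOW:TOTAL] = image                  # restore 16M~512M from the image
    mem[0:LOW] = mem[BACKUP:BACKUP + LOW]   # kexec_jump: copy low memory back
    return mem


# "Normal kernel" contents: an arbitrary pattern, with an empty backup area.
original = bytearray(i % 251 for i in range(TOTAL))
original[BACKUP:TOTAL] = bytes(LOW)
before = bytes(original)

restored = resume(hibernate(original))
print(restored[:BACKUP] == before[:BACKUP])  # True
```

The model says nothing about devices or CPU state, which is where the real difficulty lies.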

Well, I don't know why this needs to be that complicated.  We already have
code in the mainline that's able to load a large hibernation image into memory
and jump to the kernel being restored.  And it has _no_ 50% of RAM limitation,
this is the _saving_ part of the current code that this limitation comes from.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-14  9:59             ` Rafael J. Wysocki
@ 2007-07-14 10:55               ` Huang, Ying
       [not found]               ` <1184410554.1898.84.camel@caritas-dev.intel.com>
  1 sibling, 0 replies; 113+ messages in thread
From: Huang, Ying @ 2007-07-14 10:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Kexec Mailing List, linux-kernel, Eric W. Biederman, Pavel Machek,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Sat, 2007-07-14 at 11:59 +0200, Rafael J. Wysocki wrote:
> > Hibernating process:
> > 
> >   1. Normal kernel running
> >   2. Hibernating is triggered, sys_kexec_load is used to load
> >      hibernating kernel and initramfs into memory. Then
> >      sys_reboot(LINUX_REBOOT_CMD_KSPAWN) is invoked.
> >   3. In sys_reboot, kexec_jump is called to save device/CPU state,
> >      then relocate_kernel is called. kexec_jump and relocate_kernel
> >      reside in individual page in 16M~512M.
> 
> OK
> What's going to happen to devices at this point?
> 

The devices should be quiesced and the state of devices should be saved
in kexec_jump, before relocate_kernel is called. This needs the
implementation of device hibernating as you mentioned before.

> >   4. In relocate_kernel, 0~16M is backed up first, then the
> >      hibernating kernel and initramfs are copied to 0~16M; after that,
> >      the hibernating kernel is booted.
> >   5. In hibernating kernel, the memory of normal kernel (it is in
> >      16M~512M) is saved into a hibernation image through /dev/mem
> >      and ELF header.
> 
> I don't think it can be _that_ simple:
> (a) what about processes' memory
> (b) what about areas that shouldn't be saved?

The mem_map (struct page[]) of every zone of hibernated kernel is
checked.  Necessary pages are saved, like memory snapshot of software
suspend, but in user space.

> > Resume process:
> > 
> >   1. Resuming kernel is booted as a normal kernel, but the memory is
> >      restricted to 0~16M.
> >   2. Check whether there is a valid hibernation image. If
> >      there isn't, the memory of 16M~512M is hot added, and the normal
> >      boot up process continues; If there is, a resuming process is
> >      triggered.
> >   3. sys_kexec_load is used to restore the memory state of hibernated
> >      kernel. The sys_kexec_load works in crashdump way, that is, the
> >      hibernation image is copied to destination location in 16M~512M
> >      in sys_kexec_load instead of relocate_kernel. There is no half
> >      of memory size restriction.
> >   4. sys_reboot is called to trigger jumping back, which will jump back
> >      to kexec_jump of hibernated kernel.
> >   5. In kexec_jump of hibernated kernel, the memory of 0~16M is copied
> >      back from the backup area in 16M~512M. The memory state of
> >      hibernated kernel is restored totally. The CPU and device state
> >      can be restored after that.
> 
> Well, I don't know why this needs to be that complicated.  We already have
> code in the mainline that's able to load a large hibernation image into memory
> and jump to the kernel being restored.  And it has _no_ 50% of RAM limitation,
> this is the _saving_ part of the current code that this limitation comes from.

There is much similarity between sys_kexec_load and software resuming.
If resuming can be done by sys_kexec_load, then we do not need two similar
pieces of functionality in the kernel.

Best Regards,
Huang Ying

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]               ` <1184410554.1898.84.camel@caritas-dev.intel.com>
@ 2007-07-14 19:16                 ` Rafael J. Wysocki
       [not found]                 ` <200707142116.10237.rjw@sisk.pl>
  1 sibling, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-14 19:16 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Kexec Mailing List, linux-kernel, Eric W. Biederman, Pavel Machek,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Saturday, 14 July 2007 12:55, Huang, Ying wrote:
> On Sat, 2007-07-14 at 11:59 +0200, Rafael J. Wysocki wrote:
> > > Hibernating process:
> > > 
> > >   1. Normal kernel running
> > >   2. Hibernating is triggered, sys_kexec_load is used to load
> > >      hibernating kernel and initramfs into memory. Then
> > >      sys_reboot(LINUX_REBOOT_CMD_KSPAWN) is invoked.
> > >   3. In sys_reboot, kexec_jump is called to save device/CPU state,
> > >      then relocate_kernel is called. kexec_jump and relocate_kernel
> > >      reside in individual page in 16M~512M.
> > 
> > OK
> > What's going to happen to devices at this point?
> > 
> 
> The devices should be quiesced and the state of devices should be saved
> in kexec_jump, before relocate_kernel is called. This needs the
> implementation of device hibernating as you mentioned before.

Hmm, at which point are devices normally shut down when kexec is used?

> > >   4. In relocate_kernel, 0~16M is backed up first, then the
> > >      hibernating kernel and initramfs are copied to 0~16M; after that,
> > >      the hibernating kernel is booted.
> > >   5. In hibernating kernel, the memory of normal kernel (it is in
> > >      16M~512M) is saved into a hibernation image through /dev/mem
> > >      and ELF header.
> > 
> > I don't think it can be _that_ simple:
> > (a) what about processes' memory
> > (b) what about areas that shouldn't be saved?
> 
> The mem_map (struct page[]) of every zone of hibernated kernel is
> checked.  Necessary pages are saved, like memory snapshot of software
> suspend, but in user space.

Well, it's not enough to check that, sorry.  That's why we have
register_nosave_region().
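
To illustrate the point, here is a toy model in Python of selecting pages to save. It is a sketch only: "nosave" regions are modelled as half-open ranges of page frame numbers, free pages as a plain set, and every name in it is made up for the example rather than taken from the kernel.

```python
def in_nosave(pfn, nosave_regions):
    """True if pfn falls in any (start_pfn, end_pfn) half-open range."""
    return any(start <= pfn < end for start, end in nosave_regions)


def pages_to_save(total_pfns, free_pfns, nosave_regions):
    """Page frames that are in use and not excluded by a nosave region."""
    return [
        pfn
        for pfn in range(total_pfns)
        if pfn not in free_pfns and not in_nosave(pfn, nosave_regions)
    ]


# Hypothetical 16-page machine: pages 3, 4 and 10 are free,
# pages 0-1 and 12-13 must never be saved.
selected = pages_to_save(16, {3, 4, 10}, [(0, 2), (12, 14)])
print(selected)  # [2, 5, 6, 7, 8, 9, 11, 14, 15]
```

Looking only at per-page state corresponds to the free_pfns check here. The nosave ranges are extra information that has to come from somewhere else.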

> > > Resume process:
> > > 
> > >   1. Resuming kernel is booted as a normal kernel, but the memory is
> > >      restricted to 0~16M.
> > >   2. Check whether there is a valid hibernation image. If
> > >      there isn't, the memory of 16M~512M is hot added, and the normal
> > >      boot up process continues; If there is, a resuming process is
> > >      triggered.
> > >   3. sys_kexec_load is used to restore the memory state of hibernated
> > >      kernel. The sys_kexec_load works in crashdump way, that is, the
> > >      hibernation image is copied to destination location in 16M~512M
> > >      in sys_kexec_load instead of relocate_kernel. There is no half
> > >      of memory size restriction.
> > >   4. sys_reboot is called to trigger jumping back, which will jump back
> > >      to kexec_jump of hibernated kernel.
> > >   5. In kexec_jump of hibernated kernel, the memory of 0~16M is copied
> > >      back from the backup area in 16M~512M. The memory state of
> > >      hibernated kernel is restored totally. The CPU and device state
> > >      can be restored after that.
> > 
> > Well, I don't know why this needs to be that complicated.  We already have
> > code in the mainline that's able to load a large hibernation image into memory
> > and jump to the kernel being restored.  And it has _no_ 50% of RAM limitation,
> > this is the _saving_ part of the current code that this limitation comes from.
> 
> There is much similarity between sys_kexec_load and software resuming.
> If resuming can be done by sys_kexec_load, then we do not need two similar
> pieces of functionality in the kernel.

Oh, I see, but your proposed solution seems to be more complicated than that.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                     ` <Pine.LNX.4.64.0707140017560.25614@asgard.lang.hm>
  2007-07-14  8:33                       ` david
       [not found]                       ` <Pine.LNX.4.64.0707140128210.25614@asgard.lang.hm>
@ 2007-07-14 20:00                       ` Rafael J. Wysocki
  2007-07-14 20:34                         ` david
                                           ` (2 more replies)
  2 siblings, 3 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-14 20:00 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Saturday, 14 July 2007 09:51, david@lang.hm wrote:
> On Fri, 13 Jul 2007, Rafael J. Wysocki wrote:
> 
> >> Ok, now we need a data channel from the old kernel to the hibernate
> >> kernel, to the restore kernel. and the messier the memory layout the
> >> larger this data channel needs to be (hmm, what's the status on the memory
> >> defrag patches being proposed?) if this list can be made small enough it
> >> would work to just have the old kernel put the data in a known location in
> >> ram, and let the other two parts find it (in ram for the hibernate kernel,
> >> in the hibernate image for the wakeup kernel).
> >
> > I think the hibernation kernel should mmap() the "old" kernel's (and its
> > processes') memory available for saving, so that the image-saving process
> > can read its contents from the original locations.
> 
> but I'll bet that not all kernels keep the info in the same place (and 
> probably not even in the same format). I'm suggesting that a standard be 
> defined for the format of the data and the location of a pointer to it 
> that will be maintained across kernel versions.

Yes.

The image-saving kernel needs to have access to the hibernated kernel's
pages data, plus some additional information that should be passed in a
standard format.
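
As a sketch of what such a standard format might look like, here is a small Python example that packs and unpacks a version-tagged header followed by a page bitmap. Everything in it is hypothetical: the magic value, the field layout and the function names are invented for illustration and do not describe any existing kernel interface.

```python
import struct

MAGIC = b"HIBR"                      # hypothetical magic value
HEADER = struct.Struct("<4sHHQ")     # magic, version, page_shift, page count


def pack_handoff(num_pages, bitmap, page_shift=12, version=1):
    """Serialize the header plus a one-bit-per-page bitmap."""
    if len(bitmap) != (num_pages + 7) // 8:
        raise ValueError("bitmap length does not match page count")
    return HEADER.pack(MAGIC, version, page_shift, num_pages) + bytes(bitmap)


def unpack_handoff(blob):
    """Parse a blob produced by pack_handoff, checking the magic value."""
    magic, version, page_shift, num_pages = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    end = HEADER.size + (num_pages + 7) // 8
    return version, page_shift, num_pages, blob[HEADER.size:end]


blob = pack_handoff(10, b"\xff\x03")   # 10 pages, all marked for saving
print(unpack_handoff(blob))            # (1, 12, 10, b'\xff\x03')
```

The explicit version field is what would let a different kernel version on the other side reject a layout it does not understand.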

> >> how do the existing hibernate processes store this?
> >
> > There are two approaches.  In the first of them (used in the mainline) we just
> > create copies of all pages eligible for saving (hence we can't create images
> > larger than 50% of RAM) atomically and then we save the contents of these
> > copies (either directly from the kernel or through a user space process).  This
> > way we don't need to worry that they may be modified before we can save
> > them.
> >
> > The other approach is the Nigel's one, in which all LRU pages are first saved
> > and then used as additional storage for copying the rest of memory contents.
> > This has a drawback that we are not 100% sure if the LRU won't be modified
> > after we've used them to store the copies of the other pages.
> 
> and since both current approaches use the same kernel for everything the 
> issues I'm thinking of simply don't apply.
> 
> >> since people are complaining about the amount of ram that a kexec kernel
> >> would take up I'm assuming it's something more complex than just a bitmap
> >> of all possible pages.
> >
> > No, it's just bitmaps, AFAICS, and the complaints are a bit overstated, IMO. ;-)
> 
> 1 bit for each 4k means 1m bits for 4g of ram, or 128k of bitmaps, growing 
> up to 1m of ram used for 32G of ram in the system. I guess this isn't bad 
> as long as it doesn't need to be contiguous for the new kernel to access 
> it.
> 
> ok, that makes it a pretty trivial thing to work with. I just need to 
> learn how to find the bitmaps.

They are created on the fly before the hibernation.  The format is described in
kernel/power/snapshot.c .

> >> most of the conversation so far has been around the process of making the
> >> snapshot and storing it. what are the processes and tools available to
> >> restore images?
> >
> > We have quite an efficient restoration code in the kernel right now.  It's
> > able to upload big images (something like total RAM minus the size of the
> > boot kernel, initrd and, optionally, the resume application), which is much
> > more than we're able to save. :-)
> >
> > It can work with images uploaded via /dev/snapshot from the user space
> > (specific image format is required, but that can be changed easily).
> 
> Ok, so it sounds as if the restore is basically a solved problem, great!
> 
> so now I want to do a butt-ugly hibernate configuration.
> 
> while I haven't done it yet I am very confident that I can enable kexec 
> and create a kernel and filesystem to boot into. I can then use perl to 
> get the bitmaps out of kmem (once I learn how to find them), and then
> read from kmem to save the 4k chunks that I need to a file.
> 
> it'll be ugly and use lots of partitions, but it should end up being solid 
> from what I am understanding.
> 
> where can I learn how to find the bitmaps and what format things need to 
> be in to feed into /dev/snapshot?

Well, see above.

BTW, please read this message and tell me what you think:

http://lkml.org/lkml/2007/7/13/265

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-14 20:00                       ` Rafael J. Wysocki
@ 2007-07-14 20:34                         ` david
       [not found]                         ` <Pine.LNX.4.64.0707141257290.14672@asgard.lang.hm>
  2007-07-14 21:34                         ` david
  2 siblings, 0 replies; 113+ messages in thread
From: david @ 2007-07-14 20:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

in the past, Rafael J. Wysocki wrote:

> BTW, please read this message and tell me what you think:
>
> http://lkml.org/lkml/2007/7/13/265
>
> Greetings,
> Rafael
>
>
>

since I've deleted this message here's the relevant portion of it

>Okay, I have thought it through and I think that, as an initial step, we
>can do something like this:
>
>- preload the image-saving kernel before hibernation
>- in the hibernation code path replace device_suspend() with the shutting 
>down of all devices without unregistering them (not very nice, but should be 
>sufficient for a while)
>- when we've called device_power_down() and save_processor_state(), jump 
>to the image-saving kernel and let it run
>- make the image-saving kernel set up everything, save the image without
>  starting any user space (we may use the existing image-saving code for 
>this purpose, with some modifications) and power off the system (or make it 
>enter S4)
>- use the existing restoration code to load the image and jump to the
>  hibernated kernel
>- in the restore code path replace device_resume() with the reprobing of
>all devices.
>
>Comments?

I think this is far more complicated than it needs to be.

it sounds like it should be possible to do the following

1. figure out what pages should be backed up (creating a data structure to 
hold them)

2. kexec into the hibernate kernel (this step handles all device 
transitions today)

3. have the hibernate userspace find the data structures created in step 
#1

4. have the hibernate userspace write the pages somewhere in the suspend 
format.

5. have the hibernate kernel power down the box.

the only things here that sound like they're not available in stock
kernels are steps #1 and #3.

now this won't do the fancier suspend-to-ram-and-disk and it won't let you 
go back from the hibernate kernel to the main kernel, but it should be 
enough to let you do the suspend safely and reliably.

for the restore, as I understand it the process is

1. boot a kernel, any working kernel.

2. read the suspend formatted data from wherever it was saved and feed it
to /dev/snapshot
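
The "feed it to the device" part of step 2 could look roughly like the Python sketch below. It only shows the chunked copy; it assumes the image is already in the format the kernel expects, and it leaves out whatever control calls the real snapshot interface needs around the writes. The demo writes to an ordinary temporary file, never to a real device, and all names are hypothetical.

```python
import os
import tempfile


def feed_image(image_path, device_path, chunk_size=4096):
    """Copy the saved image to device_path one page-sized chunk at a time."""
    total = 0
    with open(image_path, "rb") as src, open(device_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            total += len(chunk)
    return total


def demo():
    """Exercise feed_image against a scratch file standing in for the device."""
    with tempfile.TemporaryDirectory() as workdir:
        image = os.path.join(workdir, "image")
        target = os.path.join(workdir, "fake-snapshot")
        with open(image, "wb") as f:
            f.write(b"\xaa" * 10000)
        written = feed_image(image, target)
        with open(target, "rb") as f:
            assert f.read() == b"\xaa" * 10000
        return written


print(demo())  # 10000
```

Because the copy runs in userspace, the source could just as well be a network socket or a decompressor instead of a plain file.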

David Lang

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                         ` <Pine.LNX.4.64.0707141257290.14672@asgard.lang.hm>
@ 2007-07-14 21:06                           ` Rafael J. Wysocki
       [not found]                           ` <200707142306.33783.rjw@sisk.pl>
  1 sibling, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-14 21:06 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Saturday, 14 July 2007 22:34, david@lang.hm wrote:
> in the past, Rafael J. Wysocki wrote:
> 
> > BTW, please read this message and tell me what you think:
> >
> > http://lkml.org/lkml/2007/7/13/265
> >
> > Greetings,
> > Rafael
> >
> >
> >
> 
> since I've deleted this message here's the relevant portion of it
> 
> >Okay, I have thought it through and I think that, as an initial step, we
> >can do something like this:
> >
> >- preload the image-saving kernel before hibernation
> >- in the hibernation code path replace device_suspend() with the shutting 
> >down of all devices without unregistering them (not very nice, but should be 
> >sufficient for a while)
> >- when we've called device_power_down() and save_processor_state(), jump 
> >to the image-saving kernel and let it run
> >- make the image-saving kernel set up everything, save the image without
> >  starting any user space (we may use the existing image-saving code for 
> >this purpose, with some modifications) and power off the system (or make it 
> >enter S4)
> >- use the existing restoration code to load the image and jump to the
> >  hibernated kernel
> >- in the restore code path replace device_resume() with the reprobing of
> >all devices.
> >
> >Comments?
> 
> I think this is far more complicated than it needs to be.
> 
> it sounds like it should be possible to do the following
> 
> 1. figure out what pages should be backed up (creating a data structure to 
> hold them)

That should be done after step 2, because the memory contents can change
in this step.

> 2. kexec into the hibernate kernel (this step handles all device 
> transitions today)
> 
> 3. have the hibernate userspace find the data structures created in step 
> #1
> 
> 4. have the hibernate userspace write the pages somewhere in the suspend 
> format.

You don't need to run any hibernate userspace to carry out steps 3 and 4.
 
> 5. have the hibernate kernel power down the box.
> 
> the only things here that sound like they're not available in stock
> kernels are steps #1 and #3.

Correct, up to the first remark above.

> now this won't do the fancier suspend-to-ram-and-disk and it won't let you 
> go back from the hibernate kernel to the main kernel, but it should be 
> enough to let you do the suspend safely and reliably.
> 
> for the restore, as I understand it the process is
> 
> 1. boot a kernel, any working kernel.
> 
> 2. read the suspend formatted data from wherever it was saved and feed it
> to /dev/snapshot

Yes, something like this, but you really should pay more attention to devices.

There are things that you shouldn't do to them (like unregistering), because
some processes may be using them while we're trying to hibernate (for now
we have the freezer, but I thought you'd like to do all that to eliminate it).

Generally, you need to ensure that the devices are handled in consistent ways
by both the hibernated and image-saving kernels and that's a big piece of the
jigsaw that's missing now.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                           ` <200707142306.33783.rjw@sisk.pl>
@ 2007-07-14 21:13                             ` david
  2007-07-15 10:31                               ` Rafael J. Wysocki
       [not found]                               ` <200707151231.27410.rjw@sisk.pl>
  0 siblings, 2 replies; 113+ messages in thread
From: david @ 2007-07-14 21:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Sat, 14 Jul 2007, Rafael J. Wysocki wrote:

> On Saturday, 14 July 2007 22:34, david@lang.hm wrote:
>> in the past, Rafael J. Wysocki wrote:
>>
>>> BTW, please read this message and tell me what you think:
>>>
>>> http://lkml.org/lkml/2007/7/13/265
>>>
>>> Greetings,
>>> Rafael
>>>
>>>
>>>
>>
>> since I've deleted this message here's the relevant portion of it
>>
>>> Okay, I have thought it through and I think that, as an initial step, we
>>> can do something like this:
>>>
>>> - preload the image-saving kernel before hibernation
>>> - in the hibernation code path replace device_suspend() with the shutting
>>> down of all devices without unregistering them (not very nice, but should be
>>> sufficient for a while)
>>> - when we've called device_power_down() and save_processor_state(), jump
>>> to the image-saving kernel and let it run
>>> - make the image-saving kernel set up everything, save the image without
>>>  starting any user space (we may use the existing image-saving code for
>>> this purpose, with some modifications) and power off the system (or make it
>>> enter S4)
>>> - use the existing restoration code to load the image and jump to the
>>>  hibernated kernel
>>> - in the restore code path replace device_resume() with the reprobing of
>>> all devices.
>>>
>>> Comments?
>>
>> I think this is far more complicated than it needs to be.
>>
>> it sounds like it should be possible to do the following
>>
>> 1. figure out what pages should be backed up (creating a data structure to
>> hold them)
>
> That should be done after step 2, because the memory contents can change
> in this step.

no, this needs to be done by the main kernel, because only it knows how to
find this info. the kernel that you kexec into could be very different
(including different versions) and the ways to identify this data are not
part of any existing API

>> 2. kexec into the hibernate kernel (this step handles all device
>> transitions today)
>>
>> 3. have the hibernate userspace find the data structures created in step
>> #1
>>
>> 4. have the hibernate userspace write the pages somewhere in the suspend
>> format.
>
> You don't need to run any hibernate userspace to carry out steps 3 and 4.

you should though.

by doing this write in userspace you can add in all the eye-candy (aka
progress bars), network I/O, etc that you want since it doesn't affect 
things

trying to do this in the kernel makes the kernel have to know/decide too 
much policy (and many things that people want to do are things that do not 
belong in the kernel in the first place)

>> 5. have the hibernate kernel power down the box.
>>
>> the only things here that sound like they're not available in stock
>> kernels are steps #1 and #3.
>
> Correct, up to the first remark above.
>
>> now this won't do the fancier suspend-to-ram-and-disk and it won't let you
>> go back from the hibernate kernel to the main kernel, but it should be
>> enough to let you do the suspend safely and reliably.
>>
>> for the restore, as I understand it the process is
>>
>> 1. boot a kernel, any working kernel.
>>
>> 2. read the suspend formatted data from wherever it was saved and feed it
>> to /dev/suspend
>
> Yes, something like this, but you really should pay more attention to devices.
>
> There are things that you shouldn't do to them (like unregistering), because
> some processes may be using them while we're trying to hibernate (for now
> we have the freezer, but I thought you'd like to do all that to eliminate it).
>
> Generally, you need to ensure that the devices are handled in consistent ways
> by both the hibernated and image-saving kernels and that's a big piece of the
> jigsaw that's missing now.

the kexec process should handle the device state for the transition from 
one kernel to another (it has to do this now, this isn't a new 
requirement), so this should solve the problem during the hibernate stage.

during the wakeup stage, I thought you said that all that was needed was to
feed the suspend image to /dev/suspend and the kernel in the suspend image 
would re-probe, or otherwise re-initialize all the devices it needs. am I 
misunderstanding this?

David Lang

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-14 20:00                       ` Rafael J. Wysocki
  2007-07-14 20:34                         ` david
       [not found]                         ` <Pine.LNX.4.64.0707141257290.14672@asgard.lang.hm>
@ 2007-07-14 21:34                         ` david
  2007-07-15 10:39                           ` Rafael J. Wysocki
       [not found]                           ` <200707151239.28400.rjw@sisk.pl>
  2 siblings, 2 replies; 113+ messages in thread
From: david @ 2007-07-14 21:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Sat, 14 Jul 2007, Rafael J. Wysocki wrote:

> On Saturday, 14 July 2007 09:51, david@lang.hm wrote:
>> On Fri, 13 Jul 2007, Rafael J. Wysocki wrote:
>>
>>>> Ok, now we need a data channel from the old kernel to the hibernate
>>>> kernel, to the restore kernel. and the messier the memory layout the
>>>> larger this data channel needs to be (hmm, what's the status on the memory
>>>> defrag patches being proposed?) if this list can be made small enough it
>>>> would work to just have the old kernel put the data in a known location in
>>>> ram, and let the other two parts find it (in ram for the hibernate kernel,
>>>> in the hibernate image for the wakeup kernel).
>>>
>>> I think the hibernation kernel should mmap() the "old" kernel's (and its
>>> processes') memory available for saving, so that the image-saving process
>>> can read its contents from the original locations.
>>
>> but I'll bet that not all kernels keep the info in the same place (and
>> probably not even in the same format). I'm suggesting that a standard be
>> defined for the format of the data and the location of a pointer to it
>> that will be maintained across kernel versions.
>
> Yes.
>
> The image-saving kernel needs to have access to the hibernated kernel's
> pages data, plus some additional information that should be passed in a
> standard format.

but per stable-abi-nonsense the internal structure of the kernel's pages 
data isn't an abi. so instead of figuring this out by poking around in
the memory of the old kernel and deciding what should be saved and what 
shouldn't, the old kernel (which understands the memory structure) should 
create a simple map of what should be backed up (either a bitmap or an 
extent-style map, depending on how many holes there are expected to be) 
and then provide that map to the new kernel. the new kernel (or more 
precisely its userspace) reads the pages it's told to read and writes
them somewhere.
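
to make this concrete, here is a rough sketch of both map styles. it is python only for illustration; the function names are made up and nothing here is an existing kernel interface. the old kernel would build one of these maps, and the new kernel's userspace would only need to walk it. the first helper also checks the sizing for one bit per 4k page.

```python
PAGE_SIZE = 4096  # 4 KiB pages, as assumed in this thread


def bitmap_bytes(ram_bytes, page_size=PAGE_SIZE):
    """Size in bytes of a one-bit-per-page bitmap covering ram_bytes of RAM."""
    pages = ram_bytes // page_size
    return (pages + 7) // 8


def build_bitmap(pfns_to_save, total_pages):
    """Old-kernel side: set bit N for every page frame number N to be saved."""
    bitmap = bytearray((total_pages + 7) // 8)
    for pfn in pfns_to_save:
        bitmap[pfn // 8] |= 1 << (pfn % 8)
    return bitmap


def build_extents(pfns_to_save):
    """Alternative map: collapse page frame numbers into (first_pfn, count) runs."""
    extents = []
    for pfn in sorted(set(pfns_to_save)):
        if extents and extents[-1][0] + extents[-1][1] == pfn:
            # this page directly follows the previous run, so extend it
            extents[-1] = (extents[-1][0], extents[-1][1] + 1)
        else:
            extents.append((pfn, 1))
    return extents


def pfns_from_bitmap(bitmap):
    """New-kernel side: list the pages it was told to read, in ascending order."""
    return [byte_index * 8 + bit
            for byte_index, byte in enumerate(bitmap)
            for bit in range(8)
            if byte & (1 << bit)]
```

with these definitions bitmap_bytes(4 * 2**30) comes out at 128k and bitmap_bytes(32 * 2**30) at 1m. the extent form only wins when the pages to save sit in a few long runs.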

>>>> since people are complaining about the amount of ram that a kexec kernel
>>>> would take up I'm assuming it's something more complex than just a bitmap
>>>> of all possible pages.
>>>
>>> No, it's just bitmaps, AFAICS, and the complaints are a bit overstated, IMO. ;-)
>>
>> 1 bit for each 4k means 1m bits for 4g of ram, or 128k of bitmaps, growing
>> up to 1m of ram used for 32G of ram in the system. I guess this isn't bad
>> as long as it doesn't need to be contiguous for the new kernel to access
>> it.
>>
>> ok, that makes it a pretty trivial thing to work with. I just need to
>> learn how to find the bitmaps.
>
> They are created on the fly before the hibernation.  The format is described in
> kernel/power/snapshot.c .

I'll look through this file, but the format of this is an abi/api to the 
userspace and should be documented outside of the code.

David Lang

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                 ` <200707142116.10237.rjw@sisk.pl>
@ 2007-07-15  9:30                   ` Huang, Ying
       [not found]                   ` <1184491804.1898.121.camel@caritas-dev.intel.com>
  1 sibling, 0 replies; 113+ messages in thread
From: Huang, Ying @ 2007-07-15  9:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Kexec Mailing List, linux-kernel, Eric W. Biederman, Pavel Machek,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Sat, 2007-07-14 at 21:16 +0200, Rafael J. Wysocki wrote:
> > The devices should be quiesced and the state of devices should be saved
> > in kexec_jump, before relocate_kernel is called. This needs the
> > implementation of device hibernating as you mentioned before.
> 
> Hmm, at which point devices are normally shut down when kexec is used?

I think putting devices in quiescent state (not in low power state) is
sufficient for booting a new kernel with kexec, isn't it? According to my
experiment, the new kernel can be booted with kexec if the .suspend
method of the drivers is called before kexec (given CONFIG_ACPI is not
selected).

Do we need a device quiesce/save + device shutdown for kexeced kernel to
work? I don't think so.

> > > >   4. In relocate_kernel, 0~16M is backed up first, then the
> > > >      hibernating kernel and initramfs is copied to 0~16M, after that,
> > > >      the hibernating kernel is booted.
> > > >   5. In hibernating kernel, the memory of normal kernel (it is in
> > > >      16M~512M) is saved into a hibernation image through /dev/mem
> > > >      and ELF header.
> > > 
> > > I don't think it can be _that_ simple:
> > > (a) what about processes' memory
> > > (b) what about areas that shouldn't be saved?
> > 
> > The mem_map (struct page[]) of every zone of hibernated kernel is
> > checked.  Necessary pages are saved, like memory snapshot of software
> > suspend, but in user space.
> 
> Well, it's not enough to check that, sorry.  That's why we have
> register_nosave_region().

After some investigation, I found the usage of "nosave" is as follows on
i386:

1. __nosavedata
   used only for global variable in_suspend and swsusp_pg_dir
2. PG_nosave page flags
   used for snapshot itself

Neither is necessary for kexec based hibernation. Because the image
is written from a different kernel, the memory of the hibernating kernel
will not be saved, so it can be used freely during image writing/reading.

On x86_64, there is another usage of nosave during processing of the E820
memory map. But I don't know why the memory regions other than E820_RAM
are marked as nosave. I think only the memory regions of type E820_RAM
will be treated as normal memory, and others will be treated as reserved. Is
it sufficient just to check whether the page is reserved?

Best Regards,
Huang Ying

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-14 21:13                             ` david
@ 2007-07-15 10:31                               ` Rafael J. Wysocki
       [not found]                               ` <200707151231.27410.rjw@sisk.pl>
  1 sibling, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-15 10:31 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Saturday, 14 July 2007 23:13, david@lang.hm wrote:
> On Sat, 14 Jul 2007, Rafael J. Wysocki wrote:
> 
> > On Saturday, 14 July 2007 22:34, david@lang.hm wrote:
> >> in the past, Rafael J. Wysocki wrote:
> >>
> >>> BTW, please read this message and tell me what you think:
> >>>
> >>> http://lkml.org/lkml/2007/7/13/265
> >>>
> >>> Greetings,
> >>> Rafael
> >>>
> >>>
> >>>
> >>
> >> since I've deleted this message here's the relevant portion of it
> >>
> >>> Okay, I have thought it through and I think that, as an initial step, we
> >>> can do something like this:
> >>>
> >>> - preload the image-saving kernel before hibernation
> >>> - in the hibernation code path replace device_suspend() with the shutting
> >>> down of all devices without unregistering them (not very nice, but should be
> >>> sufficient for a while)
> >>> - when we've called device_power_down() and save_processor_state(), jump
> >>> to the image-saving kernel and let it run
> >>> - make the image-saving kernel set up everything, save the image without
> >>>  starting any user space (we may use the existing image-saving code for
> >>> this purpose, with some modifications) and power off the system (or make it
> >>> enter S4)
> >>> - use the existing restoration code to load the image and jump to the
> >>>  hibernated kernel
> >>> - in the restore code path replace device_resume() with the reprobing of
> >>> all devices.
> >>>
> >>> Comments?
> >>
> >> I think this is far more complicated than it needs to be.
> >>
> >> it sounds like it should be possible to do the following
> >>
> >> 1. figure out what pages should be backed up (creating a data structure to
> >> hold them)
> >
> > That should be done after step 2, because the memory contents can change
> > in this step.
> 
> no, this needs to be done by the main kernel, because only it knows how to
> find this info. the kernel that you kexec into could be very different
> (including different versions) and the ways to identify this data are not
> part of any existing API

If the memory contents change in step 2, then the information collected by
the main kernel will be inaccurate.

> >> 2. kexec into the hibernate kernel (this step handles all device
> >> transitions today)
> >>
> >> 3. have the hibernate userspace find the data structures created in step
> >> #1
> >>
> >> 4. have the hibernate userspace write the pages somewhere in the suspend
> >> format.
> >
> > You don't need to run any hibernate userspace to carry out steps 3 and 4.
> 
> you should though.
> 
> by doing this write in userspace you can add in all the eye-candy (aka
> progress bars), network I/O, etc that you want since it doesn't affect 
> things
> 
> trying to do this in the kernel makes the kernel have to know/decide too 
> much policy (and many things that people want to do are things that do not 
> belong in the kernel in the first place)

Please don't tell me.  I've written uswsusp on the basis of these arguments,
but I don't consider it an overwhelming success ...

> >> 5. have the hibernate kernel power down the box.
> >>
> >> the only things here that sound like they're not available in stock
> >> kernels are steps #1 and #3.
> >
> > Correct, up to the first remark above.
> >
> >> now this won't do the fancier suspend-to-ram-and-disk and it won't let you
> >> go back from the hibernate kernel to the main kernel, but it should be
> >> enough to let you do the suspend safely and reliably.
> >>
> >> for the restore, as I understand it the process is
> >>
> >> 1. boot a kernel, any working kernel.
> >>
> >> 2. read the suspend formatted data from wherever it was saved and feed it
> >> to /dev/suspend
> >
> > Yes, something like this, but you really should pay more attention to devices.
> >
> > There are things that you shouldn't do to them (like unregistering), because
> > some processes may be using them while we're trying to hibernate (for now
> > we have the freezer, but I thought you'd like to do all that to eliminate it).
> >
> > Generally, you need to ensure that the devices are handled in consistent ways
> > by both the hibernated and image-saving kernels and that's a big piece of the
> > jigsaw that's missing now.
> 
> the kexec process should handle the device state for the transition from 
> one kernel to another (it has to do this now, this isn't a new 
> requirement), so this should solve the problem during the hibernate stage.

Well, I don't think so.

> during the wakeup stage, I thought you said that all that was needed was to
> feed the suspend image to /dev/suspend and the kernel in the suspend image 
> would re-probe, or otherwise re-initialize all the devices it needs. am I 
> misunderstanding this?

Perhaps.  Currently, the hibernated kernel will run device_resume() after
the restore, which is not exactly compatible with what kexec does.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-14 21:34                         ` david
@ 2007-07-15 10:39                           ` Rafael J. Wysocki
       [not found]                           ` <200707151239.28400.rjw@sisk.pl>
  1 sibling, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-15 10:39 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Saturday, 14 July 2007 23:34, david@lang.hm wrote:
> On Sat, 14 Jul 2007, Rafael J. Wysocki wrote:
> 
> > On Saturday, 14 July 2007 09:51, david@lang.hm wrote:
> >> On Fri, 13 Jul 2007, Rafael J. Wysocki wrote:
> >>
> >>>> Ok, now we need a data channel from the old kernel to the hibernate
> >>>> kernel, to the restore kernel. and the messier the memory layout the
> >>>> larger this data channel needs to be (hmm, what's the status on the memory
> >>>> defrag patches being proposed?) if this list can be made small enough it
> >>>> would work to just have the old kernel put the data in a known location in
> >>>> ram, and let the other two parts find it (in ram for the hibernate kernel,
> >>>> in the hibernate image for the wakeup kernel).
> >>>
> >>> I think the hibernation kernel should mmap() the "old" kernel's (and its
> >>> processes') memory available for saving, so that the image-saving process
> >>> can read its contents from the original locations.
> >>
> >> but I'll bet that not all kernels keep the info in the same place (and
> >> probably not even in the same format). I'm suggesting that a standard be
> >> defined for the format of the data and the location of a pointer to it
> >> that will be maintained across kernel versions.
> >
> > Yes.
> >
> > The image-saving kernel needs to have access to the hibernated kernel's
> > pages data, plus some additional information that should be passed in a
> > standard format.
> 
> but per stable-abi-nonsense the internal structure of the kernel's pages 
> data isn't an abi. so instead of figuring this out by poking around in
> the memory of the old kernel and deciding what should be saved and what 
> shouldn't, the old kernel (which understands the memory structure) should 
> create a simple map of what should be backed up (either a bitmap or an 
> extent-style map, depending on how many holes there are expected to be) 
> and then provide that map to the new kernel. the new kernel (or more 
> precisely its userspace) reads the pages it's told to read and writes
> them somewhere.

That's reasonable, but the "old" kernel can only do this after handling the
shut down/quiescing of devices, when there is a 100% guarantee that the memory
contents will not change.
 
> >>>> since people are complaining about the amount of ram that a kexec kernel
> >>>> would take up I'm assuming it's something more complex than just a bitmap
> >>>> of all possible pages.
> >>>
> >>> No, it's just bitmaps, AFAICS, and the complaints are a bit overstated, IMO. ;-)
> >>
> >> 1 bit for each 4k means 1m bits for 4g of ram, or 128k of bitmaps, growing
> >> up to 1m of ram used for 32G of ram in the system. I guess this isn't bad
> >> as long as it doesn't need to be contiguous for the new kernel to access
> >> it.
> >>
> >> ok, that makes it a pretty trivial thing to work with. I just need to
> >> learn how to find the bitmaps.
> >
> > They are created on the fly before the hibernation.  The format is described in
> > kernel/power/snapshot.c .
> 
> I'll look through this file, but the format of this is an abi/api to the 
> userspace and should be documented outside of the code.

Nope.  The user space need not know anything about the image contents.

The current implementation of the user space tools uses the knowledge of the
image header format, which is given by 'struct swsusp_info', defined in
kernel/power/power.h .

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                   ` <1184491804.1898.121.camel@caritas-dev.intel.com>
@ 2007-07-15 10:49                     ` Rafael J. Wysocki
  2007-07-17  8:13                     ` david
       [not found]                     ` <Pine.LNX.4.64.0707170101010.19248@asgard.lang.hm>
  2 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-15 10:49 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Kexec Mailing List, linux-kernel, Eric W. Biederman, Pavel Machek,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Sunday, 15 July 2007 11:30, Huang, Ying wrote:
> On Sat, 2007-07-14 at 21:16 +0200, Rafael J. Wysocki wrote:
> > > The devices should be quiesced and the state of devices should be saved
> > > in kexec_jump, before relocate_kernel is called. This needs the
> > > implementation of device hibernating as you mentioned before.
> > 
> > Hmm, at which point devices are normally shut down when kexec is used?
> 
> I think putting devices in quiescent state (not in low power state) is
> sufficient for booting a new kernel with kexec, isn't it? According to my
> experiment, the new kernel can be booted with kexec if the .suspend
> method of the drivers is called before kexec (given CONFIG_ACPI is not
> selected).

Well, this illustrates the problem.  With ACPI, the devices are suspended
and without it they're kind of quiesced.

Generally, we need to make them be quiesced with or without ACPI.  IOW,
the per-driver callbacks used before hibernation should be different from those
used before the suspend (to RAM and similar).
 
> Do we need a device quiesce/save + device shutdown for kexeced kernel to
> work? I don't think so.

No, we don't.

Still, my question was related to how kexec _normally_ handles devices.  Are
they shut down or are they just left in the state in which they were before?

I assume that kexec loads a new kernel into memory and then passes control
to it, but I think the new kernel needs to set up devices for itself.  I assume
that this is done in a usual way, ie. devices are detected, registered,
initialized, etc.  So, my question is if kexec prepares devices for that in any
way.

> > > > >   4. In relocate_kernel, 0~16M is backed up first, then the
> > > > >      hibernating kernel and initramfs is copied to 0~16M, after that,
> > > > >      the hibernating kernel is booted.
> > > > >   5. In hibernating kernel, the memory of normal kernel (it is in
> > > > >      16M~512M) is saved into a hibernation image through /dev/mem
> > > > >      and ELF header.
> > > > 
> > > > I don't think it can be _that_ simple:
> > > > (a) what about processes' memory
> > > > (b) what about areas that shouldn't be saved?
> > > 
> > > The mem_map (struct page[]) of every zone of hibernated kernel is
> > > checked.  Necessary pages are saved, like memory snapshot of software
> > > suspend, but in user space.
> > 
> > Well, it's not enough to check that, sorry.  That's why we have
> > register_nosave_region().
> 
> After some investigation, I found the usage of "nosave" is as follows on
> i386:
> 
> 1. __nosavedata
>    used only for global variable in_suspend and swsusp_pg_dir
> 2. PG_nosave page flags
>    used for snapshot itself

We don't use PG_nosave flags any more at all.

> Neither is necessary for kexec based hibernation. Because the image
> is written from a different kernel, the memory of the hibernating kernel
> will not be saved, so it can be used freely during image writing/reading.

This is not the point.  There are memory regions that you should not _restore_,
because that will cause harm.

> On x86_64, there is another usage of nosave during processing of the E820
> memory map. But I don't know why the memory regions other than E820_RAM
> are marked as nosave. I think only the memory regions of type E820_RAM
> will be treated as normal memory, and others will be treated as reserved. Is
> it sufficient just to check whether the page is reserved?

No, it's not.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                               ` <200707151231.27410.rjw@sisk.pl>
@ 2007-07-15 19:23                                 ` david
  2007-07-15 22:59                                   ` Rafael J. Wysocki
       [not found]                                   ` <200707160059.08277.rjw@sisk.pl>
  0 siblings, 2 replies; 113+ messages in thread
From: david @ 2007-07-15 19:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Sun, 15 Jul 2007, Rafael J. Wysocki wrote:

>>>> I think this is far more complicated than it needs to be.
>>>>
>>>> it sounds like it should be possible to do the following
>>>>
>>>> 1. figure out what pages should be backed up (creating a data structure to
>>>> hold them)
>>>
>>> That should be done after step 2, because the memory contents can change
>>> in this step.
>>
>> no, this needs to be done by the main kernel, because only it knows how to
>> find this info. the kernel that you kexec into could be very different
>> (including different versions) and the ways to identify this data are not
>> part of any existing API
>
> If the memory contents change in step 2, then the information collected by
> the main kernel will be inaccurate.

why would the memory use change when the new kernel is run? is it because
of whatever it does to the devices for the hand-off?

>>>> 2. kexec into the hibernate kernel (this step handles all device
>>>> transitions today)
>>>>
>>>> 3. have the hibernate userspace find the data structures created in step
>>>> #1
>>>>
>>>> 4. have the hibernate userspace write the pages somewhere in the suspend
>>>> format.
>>>
>>> You don't need to run any hibernate userspace to carry out steps 3 and 4.
>>
>> you should though.
>>
>> by doing this write in userspace you can add in all the eye-candy (aka
>> progress bars), network I/O, etc that you want since it doesn't affect
>> things
>>
>> trying to do this in the kernel makes the kernel have to know/decide too
>> much policy (and many things that people want to do are things that do not
>> belong in the kernel in the first place)
>
> Please don't tell me.  I've written uswsusp on the basis of these arguments,
> but I don't consider it an overwhelming success ...

uswsusp has the problem that you are trying to run some userspace, but not 
other userspace as you are doing the hibernate. with this approach you 
don't have that problem since you are running a completely separate
userspace.

>>>> 5. have the hibernate kernel power down the box.
>>>>
>>>> the only things here that sound like they're not available in stock
>>>> kernels are steps #1 and #3.
>>>
>>> Correct, up to the first remark above.
>>>
>>>> now this won't do the fancier suspend-to-ram-and-disk and it won't let you
>>>> go back from the hibernate kernel to the main kernel, but it should be
>>>> enough to let you do the suspend safely and reliably.
>>>>
>>>> for the restore, as I understand it the process is
>>>>
>>>> 1. boot a kernel, any working kernel.
>>>>
>>>> 2. read the suspend formatted data from wherever it was saved and feed it
>>>> to /dev/suspend
>>>
>>> Yes, something like this, but you really should pay more attention to devices.
>>>
>>> There are things that you shouldn't do to them (like unregistering), because
>>> some processes may be using them while we're trying to hibernate (for now
>>> we have the freezer, but I thought you'd like to do all that to eliminate it).
>>>
>>> Generally, you need to ensure that the devices are handled in consistent ways
>>> by both the hibernated and image-saving kernels and that's a big piece of the
>>> jigsaw that's missing now.
>>
>> the kexec process should handle the device state for the transition from
>> one kernel to another (it has to do this now, this isn't a new
>> requirement), so this should solve the problem during the hibernate stage.
>
> Well, I don't think so.

Ok, I could easily be misunderstanding something here (and the comments 
with the latest 'kexec back and forth' patch indicate there may still be 
work to do here after all)

>> during the wakeup stage, I thought you said that all that was needed was to
>> feed the suspend image to /dev/suspend and the kernel in the suspend image
>> would re-probe, or otherwise re-initialize all the devices it needs. am I
>> misunderstanding this?
>
> Perhaps.  Currently, the hibernated kernel will run device_resume() after
> the restore, which is not exactly compatible with what kexec does.

but kexec isn't needed during the restore process, is it?

David Lang

^ permalink raw reply	[flat|nested] 113+ messages in thread

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                           ` <200707151239.28400.rjw@sisk.pl>
@ 2007-07-15 19:33                             ` david
       [not found]                             ` <Pine.LNX.4.64.0707151224160.25614@asgard.lang.hm>
  1 sibling, 0 replies; 113+ messages in thread
From: david @ 2007-07-15 19:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Sun, 15 Jul 2007, Rafael J. Wysocki wrote:

> On Saturday, 14 July 2007 23:34, david@lang.hm wrote:
>> On Sat, 14 Jul 2007, Rafael J. Wysocki wrote:
>>
>>> On Saturday, 14 July 2007 09:51, david@lang.hm wrote:
>>>> On Fri, 13 Jul 2007, Rafael J. Wysocki wrote:
>>>>
>>>>>> Ok, now we need a data channel from the old kernel to the hibernate
>>>>>> kernel, to the restore kernel. and the messier the memory layout the
>>>>>> larger this data channel needs to be (hmm, what's the status on the memory
>>>>>> defrag patches being proposed?) if this list can be made small enough it
>>>>>> would work to just have the old kernel put the data in a known location in
>>>>>> ram, and let the other two parts find it (in ram for the hibernate kernel,
>>>>>> in the hibernate image for the wakeup kernel).
>>>>>
>>>>> I think the hibernation kernel should mmap() the "old" kernel's (and its
>>>>> processes') memory available for saving, so that the image-saving process
>>>>> can read its contents from the original locations.
>>>>
>>>> but I'll bet that not all kernels keep the info in the same place (and
>>>> probably not even in the same format). I'm suggesting that a standard be
>>>> defined for the format of the data and the location of a pointer to it
>>>> that will be maintained across kernel versions.
>>>
>>> Yes.
>>>
>>> The image-saving kernel needs to have access to the hibernated kernel's
>>> pages data, plus some additional information that should be passed in a
>>> standard format.
>>
>> but per stable-abi-nonsense the internal structure of the kernel's pages
>> data isn't an abi. so instead of figuring this out by poking around in
>> the memory of the old kernel and deciding what should be saved and what
>> shouldn't, the old kernel (which understands the memory structure) should
>> create a simple map of what should be backed up (either a bitmap or an
>> extent-style map, depending on how many holes there are expected to be)
>> and then provide that map to the new kernel. the new kernel (or more
>> precisely its userspace) reads the pages it's told to read and writes
>> them somewhere.
>
> That's reasonable, but the "old" kernel can only do this after handling the
> shut down/quiescing of devices, when there is a 100% guarantee that the memory
> contents will not change.

Ok, that makes sense. and since part of what's being passed along here is 
what ram is free as far as the outgoing kernel is concerned, this is 
useful info for the new kernel for other situations, not just for the 
hibernate operation, so this is probably a reasonable modification to the 
kexec call in any case (although a crash-dump kernel may decide not to 
trust this info and save everything, it's still useful to know what the 
outgoing kernel considers free)
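
a sketch of how the new kernel's userspace might use that free-page info (python for illustration only; pages_to_save and trust_free_map are made-up names, and how the free list actually gets handed over is exactly the part that doesn't exist yet):

```python
def pages_to_save(total_pages, free_pfns, trust_free_map=True):
    """Return the page frame numbers the new kernel should write out.

    free_pfns is the (hypothetical) set of pages that the outgoing kernel
    reported as free at hand-off time.
    """
    if not trust_free_map:
        # crash-dump style: the old kernel's view may be corrupt, save everything
        return list(range(total_pages))
    free = set(free_pfns)
    # hibernate style: skip whatever the old kernel said was unused
    return [pfn for pfn in range(total_pages) if pfn not in free]
```

the hibernate case calls it with the default and skips the free pages; a crash-dump kernel would pass trust_free_map=False and get every page back.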

>>>>>> since people are complaining about the amount of ram that a kexec kernel
>>>>>> would take up I'm assuming it's something more complex than just a bitmap
>>>>>> of all possible pages.
>>>>>
>>>>> No, it's just bitmaps, AFAICS, and the complaints are a bit overstated, IMO. ;-)
>>>>
>>>> 1 bit for each 4k means 1m bits for 4g of ram, or 128k of bitmaps, growing
>>>> up to 1m of ram used for 32G of ram in the system. I guess this isn't bad
>>>> as long as it doesn't need to be contiguous for the new kernel to access
>>>> it.
>>>>
>>>> ok, that makes it a pretty trivial thing to work with. I just need to
>>>> learn how to find the bitmaps.
>>>
>>> They are created on the fly before the hibernation.  The format is described in
>>> kernel/power/snapshot.c .
>>
>> I'll look through this file, but the format of this is an abi/api to the
>> userspace and should be documented outside of the code.
>
> Nope.  The user space need not know anything about the image contents.
>
> The current implementation of the user space tools uses the knowledge of the
> image header format, which is given by 'struct swsusp_info', defined in
> kernel/power/power.h .

There are a couple of factors here.

1. This needs to remain the same across different kernel versions.

2. This may or may not be created by userspace tools.

Both of these tend to imply that this is an interface to the world and 
needs to be documented and stable.

David Lang

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-15 19:23                                 ` david
@ 2007-07-15 22:59                                   ` Rafael J. Wysocki
       [not found]                                   ` <200707160059.08277.rjw@sisk.pl>
  1 sibling, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-15 22:59 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Sunday, 15 July 2007 21:23, david@lang.hm wrote:
> On Sun, 15 Jul 2007, Rafael J. Wysocki wrote:
> 
> >>>> I think this is far more complicated then it needs to be.
> >>>>
> >>>> it sounds like it should be possible to do the following
> >>>>
> >>>> 1. figure out what pages should be backed up (creating a data structure to
> >>>> hold them)
> >>>
> >>> That should be done after step 2, because the memory contents can change
> >>> in this step.
> >>
> >> no, this needs to be done by the main kernel, becouse only it knows how to
> >> find this info. the kernel that you kexec into could be very different
> >> (including different versions) and the ways to identify this data is not
> >> part of any existing API
> >
> > If the memory contents changes in step 2, then the information collected by
> > the main kernel will be inaccurate.
> 
> why would the memory use change when the new kernel is run? is it becouse 
> of whatever it does to the devices for the hand-off?

Yes, I think so, although I'm not sure, because I don't know what happens to
devices during a "normal" kexec.

> >>>> 2. kexec into the hibernate kernel (this step handles all device
> >>>> transitions today)
> >>>>
> >>>> 3. have the hibernate userspace find the data structures created in step
> >>>> #1
> >>>>
> >>>> 4. have the hibernate userspace write the pages somewhere in the suspend
> >>>> format.
> >>>
> >>> You don't need to run any hibernate userspace to carry out steps 3 and 4.
> >>
> >> you should though.
> >>
> >> by doing this write in usespace you can add in all the eye-candy (aka
> >> progress bars), network I/O, etc that you want since it doesn't affect
> >> things
> >>
> >> trying to do this in the kernel makes the kernel have to know/decide too
> >> much policy (and many things that people want to do are things that do not
> >> belong in the kernel in the first place)
> >
> > Please don't tell me.  I've written uswsusp on the basis of these arguments,
> > but I don't cosider it as an overwhelming success ...
> 
> uswsusp has the problem that you are trying to run some userspace, but not 
> other userspace as you are doing the hibernate. with this approach you 
> don't have that problem since you are running a completely seperate 
> userspace.

You're running it from a RAM disk, because you can't touch filesystems.  You
can do the same with uswsusp, just fine.

> >>>> 5. have the hibernate kernel power down the box.
> >>>>
> >>>> the only things here that sounds like they're not available in stock
> >>>> kernels are steps #1 and #3.
> >>>
> >>> Correct, up to the first remark above.
> >>>
> >>>> now this won't do the fancier suspend-to-ram-and-disk and it won't let you
> >>>> go back from the hibernate kernel to the main kernel, but it should be
> >>>> enough to let you do the suspend safely and reliably.
> >>>>
> >>>> for the restore, as I understand it the process is
> >>>>
> >>>> 1. boot a kernel, any working kernel.
> >>>>
> >>>> 2. read the suspend formatted data from wherever it was saved and feed it
> >>>> to /dev/suspend
> >>>
> >>> Yes, something like this, but you really should pay more attention to devices.
> >>>
> >>> There are things that you shouldn't do to them (like unregistering), because
> >>> some processes may be using them while we're trying to hibernate (for now
> >>> we have the freezer, but I though you'd like to do all that to eliminate it).
> >>>
> >>> Generally, you need to ensure that the devices are handled in consistent ways
> >>> by both the hibernated and image-saving kernels and that's a big piece of the
> >>> jigsaw that's missing now.
> >>
> >> the kexec process should handle the device state for the transition from
> >> one kernel to another (it has to do this now, this isn't a new
> >> requirement), so this should solve the problem during the hibernate stage.
> >
> > Well, I don't think so.
> 
> Ok, I could easily be misunderstanding something here (and the comments 
> with the latest 'kexec back and forth' patch indicate there may still be 
> work to do here after all)
> 
> >> during the wakeup stage, I thought you said that al that was needed was to
> >> feed the suspend image to /dev/suspend and the kernel in the suspend image
> >> would re-probe, or otherwise re-initialize all the devices it needs. am I
> >> misunderstanding this?
> >
> > Perhaps.  Currently, the hibernated kernel will run device_resume() after
> > the restore, which is not exactly compatible with what kexec does.
> 
> but kexec isn't needed during the restore process, is it?

Generally, it's not needed.  _However_, the current handling of devices is
such that:
(a) hibernated kernel uses device_suspend() to put them into low power states
    and creates the image
(b) hibernated kernel uses device_resume() to get devices back to work and
    saves the image
(c) during the restore the boot kernel loads the image and uses
    device_suspend() to prepare devices for the "old" kernel
(d) hibernated kernel gets control and uses device_resume() to get devices back
    to work.
Now, if you use kexec instead of (a) and (b), then whatever it does to devices
is generally incompatible with the device_resume() in (d) (because, for
instance, some device driver's .resume() routine may expect some data to be
saved by the corresponding .suspend() at specific locations).
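
The mismatch can be shown with a toy model. This is a hedged Python
sketch, not kernel code: ToyDriver and its methods are hypothetical, and
it simply assumes that the kexec path quiesces a device without ever
calling its .suspend().

```python
class ToyDriver:
    """Hypothetical driver model; not a real kernel interface."""

    def __init__(self):
        self.registers = {"mode": "active"}  # pretend hardware state
        self.saved = None                    # filled in by suspend()

    def suspend(self):
        # Save the state at the place where resume() expects to find it.
        self.saved = dict(self.registers)
        self.registers["mode"] = "low-power"

    def quiesce_without_save(self):
        # Assumed kexec-style path: stop the device, save nothing.
        self.registers["mode"] = "off"

    def resume(self):
        if self.saved is None:
            raise RuntimeError("resume() without a matching suspend()")
        self.registers = dict(self.saved)


def hibernate_path(drv):
    """suspend() followed by resume(): the state round-trips."""
    drv.suspend()
    drv.resume()
    return drv.registers["mode"]


def kexec_path(drv):
    """Quiesce followed by resume(): resume() has nothing to restore."""
    drv.quiesce_without_save()
    drv.resume()  # raises RuntimeError in this model
```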

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                             ` <Pine.LNX.4.64.0707151224160.25614@asgard.lang.hm>
@ 2007-07-15 23:11                               ` Rafael J. Wysocki
       [not found]                               ` <200707160111.16805.rjw@sisk.pl>
  1 sibling, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-15 23:11 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Sunday, 15 July 2007 21:33, david@lang.hm wrote:
> On Sun, 15 Jul 2007, Rafael J. Wysocki wrote:
> 
> > On Saturday, 14 July 2007 23:34, david@lang.hm wrote:
> >> On Sat, 14 Jul 2007, Rafael J. Wysocki wrote:
> >>
> >>> On Saturday, 14 July 2007 09:51, david@lang.hm wrote:
> >>>> On Fri, 13 Jul 2007, Rafael J. Wysocki wrote:
> >>>>
> >>>>>> Ok, now we need a data channel from the old kernel to the hibernate
> >>>>>> kernel, to the restore kernel. and the messier the memory layout the
> >>>>>> larger this data channel needs to be (hmm, what's the status on the memory
> >>>>>> defrag patches being proposed?) if this list can be made small enough it
> >>>>>> would work to just have the old kernel put the data in a known location in
> >>>>>> ram, and let the other two parts find it (in ram for the hibernate kernel,
> >>>>>> in the hibernate image for the wakeup kernel).
> >>>>>
> >>>>> I think the hibernation kernel should mmap() the "old" kernel's (and it's
> >>>>> processes') memory available for saving, so that the image-saving process
> >>>>> can read its contents from the original locations.
> >>>>
> >>>> but I'll bet that not all kernels keep the info in the same place (and
> >>>> probably not even in the same format). I'm suggesting that a standard be
> >>>> defined for the format of the data and the location of a pointer to it
> >>>> that will be maintained across kernel versions.
> >>>
> >>> Yes.
> >>>
> >>> The image-saving kernel needs to have access to the hibernated kernel's
> >>> pages data, plus some additional information that should be passed in a
> >>> standard format.
> >>
> >> but per stable-abi-nonsense the internal structure of the kernel's pages
> >> data isn't an abi. so instead of figuring this out by pokeing around in
> >> the memory of the old kernel and deciding what should be saved and what
> >> shouldn't, the old kernel (which understands the memory structure) should
> >> create a simple map of what should be backed up (either a bitmap or an
> >> extent-style map, depending on how many holes there are expected to be)
> >> and then provide that map to the new kernel. the new kernel (or more
> >> precisely it's userspace) reads the pages it's told to read and writes
> >> them somewhere.
> >
> > That's reasonable, but the "old" kernel can only do this after handling the
> > shut down/quiescing of devices, when there is 100% guarantee that the memory
> > contents will not change.
> 
> Ok, that makes sense. and since part of what's being passed along here is 
> what ram is free as far as the outgoing kernel is concerned, this is 
> useful info for the new kernel for other situations, not just for the 
> hibernate operation, so this is probably a reasonable modification to the 
> kexec call in any case (although a crash-dump kernel may decide not to 
> trust this info and save everything, it's still useful to know what the 
> outgoing kernel considers free)
> 
> >>>>>> since people are complaining about the amount of ram that a kexec kernel
> >>>>>> would take up I'm assuiming it's somethingmore complex then just a bitmap
> >>>>>> of all possible pages.
> >>>>>
> >>>>> No, it's just bitmaps, AFAICS, and the complaints are a bit overstated, IMO. ;-)
> >>>>
> >>>> 1 bit for each 4k means 1m bits for 4g of ram, or 128k of bitmaps, growing
> >>>> up to 1m of ram used for 32G of ram in the system. I guess this isn't bad
> >>>> as long as it doesn't need to be contiguous for the new kernel to access
> >>>> it.
> >>>>
> >>>> ok, that makes it a pretty trivial thing to work with. I just need to
> >>>> learn how to find the bitmaps.
> >>>
> >>> They are created on the fly before the hibernation.  The format is described in
> >>> kernel/power/snapshot.c .
> >>
> >> I'll look through this file, but the format of this is an abi/api to the
> >> userspace and should be documented outside of the code.
> >
> > Nope.  The user space need not know anything about the image contents.
> >
> > The current implementation of the user space tools use the knowledge of the
> > image header format, which is given by 'struct swsusp_info', defined in
> > kernel/power/power.h .
> 
> there are a couple factors here.
> 
> 1. this needs to remain the same across different kernel versions.

Not exactly.  Whatever is after the image header may change at any time and
the user space should not rely on that not changing.  The header itself is a
(slightly) different matter.

> 2. this may or may not be created by userspace tools

Well, the image header is not created by the userspace tools; they only read
the image size from it.

> both of these tend to imply that this is an interface to the world and 
> needs to be documented and stable.

Well, it should be documented, but currently there's only one implementation
of the userland tools and the authors of these know the format. ;-)

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                                   ` <200707160059.08277.rjw@sisk.pl>
@ 2007-07-15 23:22                                     ` david
       [not found]                                     ` <Pine.LNX.4.64.0707151549200.25614@asgard.lang.hm>
  1 sibling, 0 replies; 113+ messages in thread
From: david @ 2007-07-15 23:22 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Mon, 16 Jul 2007, Rafael J. Wysocki wrote:

> On Sunday, 15 July 2007 21:23, david@lang.hm wrote:
>> On Sun, 15 Jul 2007, Rafael J. Wysocki wrote:
>>
>>>>>> I think this is far more complicated then it needs to be.
>>>>>>
>>>>>> it sounds like it should be possible to do the following
>>>>>>
>>>>>> 1. figure out what pages should be backed up (creating a data structure to
>>>>>> hold them)
>>>>>
>>>>> That should be done after step 2, because the memory contents can change
>>>>> in this step.
>>>>
>>>> no, this needs to be done by the main kernel, becouse only it knows how to
>>>> find this info. the kernel that you kexec into could be very different
>>>> (including different versions) and the ways to identify this data is not
>>>> part of any existing API
>>>
>>> If the memory contents changes in step 2, then the information collected by
>>> the main kernel will be inaccurate.
>>
>> why would the memory use change when the new kernel is run? is it becouse
>> of whatever it does to the devices for the hand-off?
>
> Yes, I think so, although I'm not sure, because I don't know what happens to
> devices during a "normal" kexec.

Is this a matter of running some tests to find out, or is this a question 
for the kexec implementors?

>>>> during the wakeup stage, I thought you said that al that was needed was to
>>>> feed the suspend image to /dev/suspend and the kernel in the suspend image
>>>> would re-probe, or otherwise re-initialize all the devices it needs. am I
>>>> misunderstanding this?
>>>
>>> Perhaps.  Currently, the hibernated kernel will run device_resume() after
>>> the restore, which is not exactly compatible with what kexec does.
>>
>> but kexec isn't needed during the restore process, is it?
>
> Generally, it's not needed.  _However_, the current handling of devices is
> such that:
> (a) hibernated kernel uses device_suspend() to put them into low power states
>    and creates the image
> (b) hibernated kernel uses device_resume() to get devices back to work and
>    saves the image
> (c) during the restore the boot kernel loads the image and uses
>    device_suspend() to prepare devices for the "old" kernel
> (d) hibernated kernel gets control and uses device_resume() to get devices back
>    to work.
> Now, if you use kexec instead of (a) and (b), then whatever it does to devices
> is generally incompatible with the device_resume() in (d) (because, for
> instance, some device driver's .resume() routine may expect some data to be
> saved by the corresponding .suspend() at specific locations).

OK, this means that the resume operation is not a solved problem (at least 
for the kexec process).

Now, one possible approach to this (and this may be what Ying Huang is 
thinking of) would be to have the restore process be:

1. boot the normal kernel with a dummy userspace, initializing all 
devices

2. kexec to the hibernate kernel

3. the hibernate kernel's userspace restores all the memory of the 
original system that was saved

4. kexec from the hibernate kernel back to the original kernel

The ugly part here is the need for the dummy userspace in step #1 so that 
it doesn't try to access the wrong things.

Now, it may be that the kernel boot in step 1 doesn't need to be able to 
initialize all drivers for things to work in step 4, in which case things 
are much simpler (but you may still need the three-kernel hop so that the 
final kernel is running at the same addresses that it started at).

David Lang

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                               ` <200707160111.16805.rjw@sisk.pl>
@ 2007-07-15 23:33                                 ` david
  0 siblings, 0 replies; 113+ messages in thread
From: david @ 2007-07-15 23:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Mon, 16 Jul 2007, Rafael J. Wysocki wrote:

> On Sunday, 15 July 2007 21:33, david@lang.hm wrote:
>> On Sun, 15 Jul 2007, Rafael J. Wysocki wrote:
>>
>>> On Saturday, 14 July 2007 23:34, david@lang.hm wrote:
>>>> On Sat, 14 Jul 2007, Rafael J. Wysocki wrote:
>>>>
>>>>> On Saturday, 14 July 2007 09:51, david@lang.hm wrote:
>>>>>> On Fri, 13 Jul 2007, Rafael J. Wysocki wrote:
>>>>>>
>>>>>>>> since people are complaining about the amount of ram that a kexec kernel
>>>>>>>> would take up I'm assuiming it's somethingmore complex then just a bitmap
>>>>>>>> of all possible pages.
>>>>>>>
>>>>>>> No, it's just bitmaps, AFAICS, and the complaints are a bit overstated, IMO. ;-)
>>>>>>
>>>>>> 1 bit for each 4k means 1m bits for 4g of ram, or 128k of bitmaps, growing
>>>>>> up to 1m of ram used for 32G of ram in the system. I guess this isn't bad
>>>>>> as long as it doesn't need to be contiguous for the new kernel to access
>>>>>> it.
>>>>>>
>>>>>> ok, that makes it a pretty trivial thing to work with. I just need to
>>>>>> learn how to find the bitmaps.
>>>>>
>>>>> They are created on the fly before the hibernation.  The format is described in
>>>>> kernel/power/snapshot.c .
>>>>
>>>> I'll look through this file, but the format of this is an abi/api to the
>>>> userspace and should be documented outside of the code.
>>>
>>> Nope.  The user space need not know anything about the image contents.
>>>
>>> The current implementation of the user space tools use the knowledge of the
>>> image header format, which is given by 'struct swsusp_info', defined in
>>> kernel/power/power.h .
>>
>> there are a couple factors here.
>>
>> 1. this needs to remain the same across different kernel versions.
>
> Not exactly.  Whatever is after the image header may change at any time and
> the user space should not rely on that not changing.  The header itself is a
> (slightly) different matter.

OK, more precisely:

an image created by one kernel version should work when it is fed into 
/dev/snapshot of another kernel version.

That's what I meant about it needing to be the same across different 
kernel versions.

>> 2. this may or may not be created by userspace tools
>
> Well, the image header is not created by userspace tools and only read the
> image size from it.

It's not today, but maybe it should be.

In the kexec approach there's nothing happening here that a perl script 
couldn't do perfectly well. It reads a bitmap from somewhere, and 
based on that bitmap it creates a header and then reads chunks from 
/dev/oldmem or /dev/mem and writes the results somewhere (to a device, a 
network, a userspace compression program, or ...).
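
As a rough illustration of how small such a tool could be, here is a
Python sketch of the copy loop. Everything about the formats is an
assumption made up for this example: the bitmap layout (one bit per
page, least significant bit first) and the 8-byte page-number tag
written before each page are not the swsusp image format.

```python
import io

PAGE_SIZE = 4096  # assumed page size


def save_pages(bitmap, src, dst, page_size=PAGE_SIZE):
    """Copy every page whose bit is set in bitmap from src to dst.

    bitmap: bytes, one bit per page, least significant bit first
            (assumed layout).
    src:    seekable binary file object holding the old kernel's memory.
    dst:    writable binary file object receiving the image.
    Returns the number of pages written.
    """
    written = 0
    for pfn in range(len(bitmap) * 8):
        if bitmap[pfn // 8] & (1 << (pfn % 8)):
            src.seek(pfn * page_size)
            page = src.read(page_size)
            if len(page) != page_size:
                raise OSError("short read at page %d" % pfn)
            dst.write(pfn.to_bytes(8, "little"))  # toy per-page tag
            dst.write(page)
            written += 1
    return written


def demo():
    """Copy pages 0 and 2 of a tiny fake memory with 16-byte pages."""
    src = io.BytesIO(b"".join(bytes([i]) * 16 for i in range(4)))
    dst = io.BytesIO()
    count = save_pages(bytes([0b0101]), src, dst, page_size=16)
    return count, dst.getvalue()
```

In a real setup src would be a file object for something like
/dev/oldmem, and dst could just as well be a socket or a pipe to a
compressor, which is the point of doing this in userspace.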

>> both of these tend to imply that this is an interface to the world and
>> needs to be documented and stable.
>
> Well, it should be documented, but currently there's only one implementation
> of the userland tools and the authors of these know the format. ;-)

true today, but as the pieces get simplified and documented other 
implementations could exist.

David Lang

* Re: Hibernating To Swap Considered Harmful
       [not found]                   ` <200707141148.18279.rjw@sisk.pl>
@ 2007-07-16  5:37                     ` Joseph Fannin
  0 siblings, 0 replies; 113+ messages in thread
From: Joseph Fannin @ 2007-07-16  5:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: david, linux-kernel, Joseph Fannin, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Sat, Jul 14, 2007 at 11:48:17AM +0200, Rafael J. Wysocki wrote:
> On Saturday, 14 July 2007 02:45, Joseph Fannin wrote:
> > On Fri, Jul 13, 2007 at 11:30:50AM +0200, Rafael J. Wysocki wrote:
> > > On Friday, 13 July 2007 07:42, Joseph Fannin wrote:
> > > > On Thu, Jul 12, 2007 at 08:06:43PM -0700, david@lang.hm wrote:
> > >
> > > If you're afraid of that, use a dedicated swap file.
> >
> >     I don't understand what you mean.  A dedicated swap file for what?
>
> Sorry, I should have been more precise.
>
> For hibernation (ie. a swap file that you activate right befor the
> hibernation).
>
> Also tuxonice (formerly known as suspend2) allows you to use regular files
> hibernation.

     How is that different from what I proposed, other than the
requirement to pass the swap data structures between the two kernels?

     Even if you expect hibernation to work only _most of the
time_, suspending to swap means allocating a bunch of swap space that
you intend never to be used as swap.  The two functions don't really
belong together.

--
Joseph Fannin
jfannin@gmail.com

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                                     ` <Pine.LNX.4.64.0707151549200.25614@asgard.lang.hm>
@ 2007-07-16 12:17                                       ` Rafael J. Wysocki
       [not found]                                       ` <200707161417.50166.rjw@sisk.pl>
  1 sibling, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-16 12:17 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Monday, 16 July 2007 01:22, david@lang.hm wrote:
> On Mon, 16 Jul 2007, Rafael J. Wysocki wrote:
> 
> > On Sunday, 15 July 2007 21:23, david@lang.hm wrote:
> >> On Sun, 15 Jul 2007, Rafael J. Wysocki wrote:
> >>
> >>>>>> I think this is far more complicated then it needs to be.
> >>>>>>
> >>>>>> it sounds like it should be possible to do the following
> >>>>>>
> >>>>>> 1. figure out what pages should be backed up (creating a data structure to
> >>>>>> hold them)
> >>>>>
> >>>>> That should be done after step 2, because the memory contents can change
> >>>>> in this step.
> >>>>
> >>>> no, this needs to be done by the main kernel, becouse only it knows how to
> >>>> find this info. the kernel that you kexec into could be very different
> >>>> (including different versions) and the ways to identify this data is not
> >>>> part of any existing API
> >>>
> >>> If the memory contents changes in step 2, then the information collected by
> >>> the main kernel will be inaccurate.
> >>
> >> why would the memory use change when the new kernel is run? is it becouse
> >> of whatever it does to the devices for the hand-off?
> >
> > Yes, I think so, although I'm not sure, because I don't know what happens to
> > devices during a "normal" kexec.
> 
> is this  a matter of running some test to find out, or is this a question 
> for the kexec implemantors?

Actually, I'd like someone to tell me. ;-)

I've browsed the kexec code, but haven't found anything related to the devices
in it.  Perhaps I didn't know where to look ...

> >>>> during the wakeup stage, I thought you said that al that was needed was to
> >>>> feed the suspend image to /dev/suspend and the kernel in the suspend image
> >>>> would re-probe, or otherwise re-initialize all the devices it needs. am I
> >>>> misunderstanding this?
> >>>
> >>> Perhaps.  Currently, the hibernated kernel will run device_resume() after
> >>> the restore, which is not exactly compatible with what kexec does.
> >>
> >> but kexec isn't needed during the restore process, is it?
> >
> > Generally, it's not needed.  _However_, the current handling of devices is
> > such that:
> > (a) hibernated kernel uses device_suspend() to put them into low power states
> >    and creates the image
> > (b) hibernated kernel uses device_resume() to get devices back to work and
> >    saves the image
> > (c) during the restore the boot kernel loads the image and uses
> >    device_suspend() to prepare devices for the "old" kernel
> > (d) hibernated kernel gets control and uses device_resume() to get devices back
> >    to work.
> > Now, if you use kexec instead of (a) and (b), then whatever it does to devices
> > is generally incompatible with the device_resume() in (d) (because, for
> > instance, some device driver's .resume() routine may expect some data to be
> > saved by the corresponding .suspend() at specific locations).
> 
> ok, this means that the resume operation is not a solved problem (at least 
> for the kexec process)
> 
> now, one possible approach to this (and this may be what Ying Huang it 
> thinking of) would be to have the restore process be
> 
> 1. boot the normal kernel with a dummy userspace, initializeing all 
> devices
> 
> 2. kexec to the hibernate kernel
> 
> 3. the hibernate kernel's userspace overwrites all memory from the 
> origional system that was saved
> 
> 4. kexec from the hibernate kernel back to the origional kernel
> 
> the ugly part here is the need for the dummy userspace in step #1 so that 
> it doesn't try to access the wrong things.
> 
> now, it may be that the kernel boot in step 1 doesn't need to be able to 
> initialize all drivers for things to work in step 4, in which case things 
> are much simpler (but you still may need the three kernel hop so that the 
> final kernel is running in the same addresses that it started in.

I think that the right approach is to separate devices' suspend from the
devices' hibernation-related operations FIRST.  Then, many different approaches
to hibernation will be much easier to implement than they are now.

I've been saying this for weeks now, but no one seems to listen and, frankly,
I'm tired of repeating it:

If we want to improve things, let's do that IN AN ORDERED WAY.  If everyone
comes up with a new idea every two days, we won't be able to get anything
actually _done_.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                                       ` <200707161417.50166.rjw@sisk.pl>
@ 2007-07-16 14:42                                         ` Huang, Ying
       [not found]                                         ` <1184596950.24143.28.camel@caritas-dev.intel.com>
  1 sibling, 0 replies; 113+ messages in thread
From: Huang, Ying @ 2007-07-16 14:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: david, linux-kernel, Pavel Machek, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Mon, 2007-07-16 at 14:17 +0200, Rafael J. Wysocki wrote:
> > is this  a matter of running some test to find out, or is this a question 
> > for the kexec implemantors?
> 
> Actually, I'd like someone to tell me. ;-)
> 
> I've browsed the kexec code, but haven't found anything related to the devices
> in it.  Perhaps I didn't know where to look ...

There are two stages for kexec. For "normal" kexec, first
sys_kexec_load is called to load the kernel image, then
sys_reboot(LINUX_REBOOT_CMD_KEXEC) is called to boot the new kernel. The
call chain is as follows:

sys_reboot(LINUX_REBOOT_CMD_KEXEC)
    kernel_kexec
        kernel_restart_prepare
            device_shutdown
        machine_shutdown
        machine_kexec

In device_shutdown, the dev->bus->shutdown or dev->driver->shutdown of
every device is called to put the device in a quiescent state. In
machine_kexec, the new kernel is booted.

So, for normal kexec, there is no code path for saving and restoring
device state. Can the state of a device be restored after shutdown? I
don't think so.
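
The ordering above can be modelled in a few lines. This is only a Python
toy, not the kernel code: the dictionary-based "devices", the quiesce()
hook and the reverse iteration order are assumptions for illustration.

```python
def quiesce(dev):
    """Toy shutdown hook: stop the device; nothing is saved for resume."""
    dev["state"] = "quiescent"


def device_shutdown(devices, trace):
    """Call the bus hook if a device has one, else the driver hook."""
    for dev in reversed(devices):  # assumed: reverse registration order
        hook = dev.get("bus_shutdown") or dev.get("driver_shutdown")
        if hook:
            hook(dev)
            trace.append("shutdown:" + dev["name"])


def kernel_kexec(devices):
    """Return the order of steps taken on the way to the new kernel."""
    trace = ["kernel_restart_prepare"]
    device_shutdown(devices, trace)
    trace.append("machine_shutdown")
    trace.append("machine_kexec")  # the new kernel would start here
    return trace
```

Nothing in this path records device state, which is why a later
.resume() would have nothing to restore from.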

> I think that the right approach is to separate devices' suspend from the
> devices' hibernation-related operations FIRST.  Then, many different approaches
> to hibernation will be much easier to implement than they are now.
> 
> I've been saying this for weeks now, but no one seems to listen frankly I'm
> tired of repeating it:

I agree with you on this. :)

> If we want to improve things, let's do that IN AN ORDERED WAY.  If everyone
> will come up with a new idea every two days, we won't be able to get anything
> actually _done_.

Yes, and I am very glad to collaborate with everybody who is interested
in this subject. But I think we should try to verify our idea with code
as early as possible. Now, I am trying to implement a prototype of
kexec/kdump based image writing/reading mechanism to verify the
feasibility. (I suppose you are working on separating device suspend and
device hibernate).

What do you think about this pattern of collaboration?

Finally, thank you very much for your valuable reminders in the
"hibernation considerations" mail.

Best Regards,
Huang Ying

* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                                         ` <1184596950.24143.28.camel@caritas-dev.intel.com>
@ 2007-07-16 15:40                                           ` Rafael J. Wysocki
       [not found]                                           ` <200707161740.26703.rjw@sisk.pl>
  1 sibling, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-16 15:40 UTC (permalink / raw)
  To: Huang, Ying
  Cc: david, linux-kernel, Pavel Machek, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Monday, 16 July 2007 16:42, Huang, Ying wrote:
> On Mon, 2007-07-16 at 14:17 +0200, Rafael J. Wysocki wrote:
> > > is this  a matter of running some test to find out, or is this a question 
> > > for the kexec implemantors?
> > 
> > Actually, I'd like someone to tell me. ;-)
> > 
> > I've browsed the kexec code, but haven't found anything related to the devices
> > in it.  Perhaps I didn't know where to look ...
> 
> There are two stages for kexec. For "normal" kexec, first the
> sys_kexec_load is called to load the kernel image, then
> sys_reboot(LINUX_REBOOT_CMD_KEXEC) is called to boot the new kernel.

OK, thanks.  This is the information that I was missing.

> The call chain is as follow:
> 
> sys_reboot(LINUX_REBOOT_CMD_KEXEC)
>     kernel_kexec
>         kernel_restart_prepare
>             device_shutdown
>         machine_shutdown
>         machine_kexec
> 
> In device_shutdown, the dev->bus->shutdown or dev->driver->shutdown of
> every device is called to put device in quiescent state. In
> machine_kexec, the new kernel is booted.

Yes.

> So, for normal kexec, there is no code path for device state saving and
> restoring.

Exactly.

> State of device can be restore after shutdown? I don't think so.

No, it can't, but we need something like this for hibernation and
device_shutdown() is not appropriate for this purpose IMO.

> > I think that the right approach is to separate devices' suspend from the
> > devices' hibernation-related operations FIRST.  Then, many different approaches
> > to hibernation will be much easier to implement than they are now.
> > 
> > I've been saying this for weeks now, but no one seems to listen frankly I'm
> > tired of repeating it:
> 
> I agree with you on this. :)

OK :-)

> > If we want to improve things, let's do that IN AN ORDERED WAY.  If everyone
> > will come up with a new idea every two days, we won't be able to get anything
> > actually _done_.
> 
> Yes, and I am very glad to collaborate with everybody who is interested
> in this subject. But I think we should try to verify our idea with code
> as early as possible. Now, I am trying to implement a prototype of
> kexec/kdump based image writing/reading mechanism to verify the
> feasibility. (I suppose you are working on separating device suspend and
> device hibernate).

Yes, I am.

> What do you think about the pattern of collaboration?

Sounds good.

> Finally, thank you very much for your valuable reminder in the mail
> "hibernation considerations".

You're welcome. :-)

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: Hibernating To Swap Considered Harmful
       [not found]                       ` <87odig1idx.fsf@jbms.ath.cx>
@ 2007-07-17  0:12                         ` Joseph Fannin
       [not found]                         ` <20070717001239.GB20082@nineveh.local>
  1 sibling, 0 replies; 113+ messages in thread
From: Joseph Fannin @ 2007-07-17  0:12 UTC (permalink / raw)
  To: Jeremy Maitin-Shepard
  Cc: david, linux-kernel, Joseph Fannin, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm

On Fri, Jul 13, 2007 at 10:35:22AM -0400, Jeremy Maitin-Shepard wrote:
> jfannin@gmail.com (Joseph Fannin) writes:
>
> There is a very simple solution to this obscure problem: (if I
> understand correctly, you want to dual boot Mac OS X and Linux (and
> maybe also Windows?))
>
> use LVM, thus allowing you to have as many volumes as you like in the
> partition

Ok.

Why are all these workarounds preferred, instead of proper suspend
support for swap files?

IOW, what reasons are there to *not* support swap files, other than the
hit-and-miss Linux suspend support?

I brought up the swap file issue to illustrate that writing
hibernation images to files needs to be supported anyway.  Once you
have that, there is no good reason to write the hibernation image to
swap, and several reasons not to.

That my particular problem might be messily worked around isn't really
important in the context of that argument, which no one has responded
to.

(Aside from adding more administrivia to my Macintosh's setup, your
LVM suggestion would prevent the ext3 drivers for Windows and OS X
from working, as they don't do LVM.  This is arguably not Linux's
problem -- but Linux *already supports* a working solution).

--
Joseph Fannin
jfannin@gmail.com


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                                           ` <200707161740.26703.rjw@sisk.pl>
@ 2007-07-17  4:18                                             ` david
  2007-07-17 11:46                                               ` Rafael J. Wysocki
  0 siblings, 1 reply; 113+ messages in thread
From: david @ 2007-07-17  4:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Mon, 16 Jul 2007, Rafael J. Wysocki wrote:

> On Monday, 16 July 2007 16:42, Huang, Ying wrote:
>> On Mon, 2007-07-16 at 14:17 +0200, Rafael J. Wysocki wrote:
>>>> is this a matter of running some test to find out, or is this a question
>>>> for the kexec implementors?
>>>
>>> Actually, I'd like someone to tell me. ;-)
>>>
>>> I've browsed the kexec code, but haven't found anything related to the devices
>>> in it.  Perhaps I didn't know where to look ...
>>
>> There are two stages for kexec. For "normal" kexec, first the
>> sys_kexec_load is called to load the kernel image, then
>> sys_reboot(LINUX_REBOOT_CMD_KEXEC) is called to boot the new kernel.
>
> OK, thanks.  This is the information that I was missing.
>
>> The call chain is as follows:
>>
>> sys_reboot(LINUX_REBOOT_CMD_KEXEC)
>>     kernel_kexec
>>         kernel_restart_prepare
>>             device_shutdown
>>         machine_shutdown
>>         machine_kexec
>>
>> In device_shutdown, the dev->bus->shutdown or dev->driver->shutdown of
>> every device is called to put device in quiescent state. In
>> machine_kexec, the new kernel is booted.
>
> Yes.
>
>> So, for normal kexec, there is no code path for device state saving and
>> restoring.
>
> Exactly.
>
>> Can the state of a device be restored after shutdown? I don't think so.
>
> No, it can't, but we need something like this for hibernation and
> device_shutdown() is not appropriate for this purpose IMO.

is the only reason that device_shutdown() is not appropriate the amount of
time it takes to shut down some devices and then start them up again? (I'm
specifically thinking of drive spin down/up as an example)

if so, it is probably worth implementing a demo with the long times
involved to hash out any other problems, and then implement shortcuts to
avoid the device_shutdown only where the time involved is excessive.

so, exactly where in the process above does the memory map need to be 
created? is this in the machine_shutdown step or would it need to be in 
the machine_kexec step?

David Lang


* Re: Hibernating To Swap Considered Harmful
       [not found]                         ` <20070717001239.GB20082@nineveh.local>
@ 2007-07-17  5:44                           ` Oliver Neukum
       [not found]                           ` <200707170744.08191.oliver@neukum.org>
  1 sibling, 0 replies; 113+ messages in thread
From: Oliver Neukum @ 2007-07-17  5:44 UTC (permalink / raw)
  To: Joseph Fannin
  Cc: david, linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton,
	linux-pm, Jeremy Maitin-Shepard

On Tuesday, 17 July 2007, Joseph Fannin wrote:
> Why are all these workarounds preferred, instead of proper suspend
> support for swap files?
> 
> IOW, what reasons are there to *not* support swap files, other than the
> hit-and-miss Linux suspend support?

If you want to go the kexec route to hibernation, the dumping kernel
would need to mount the filesystem to write to a file. Therefore the
suspending kernel would need to sync to disk and lock that file.

	Regards
		Oliver


* Re: Hibernating To Swap Considered Harmful
       [not found]                           ` <200707170744.08191.oliver@neukum.org>
@ 2007-07-17  6:28                             ` Joseph Fannin
       [not found]                             ` <20070717062803.GA9069@nineveh.local>
  1 sibling, 0 replies; 113+ messages in thread
From: Joseph Fannin @ 2007-07-17  6:28 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: david, linux-kernel, Joseph Fannin, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Tue, Jul 17, 2007 at 07:44:07AM +0200, Oliver Neukum wrote:
>
> If you want to go the kexec route to hibernation, the dumping kernel
> would need to mount the filesystem to write to a file. Therefore the
> suspending kernel would need to sync to disk and lock that file.

If the file is preallocated, that's not a problem, as there's no need
to touch filesystem metadata.  There'd need to be some channel to pass
the disk blocks that are for writing the image, but that's not going
to be nearly as complicated as passing the current swap data
structures from the previous kernel.
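
A minimal sketch of the kind of channel described above, assuming the
location of the preallocated file is handed over as a list of
(first disk block, block count) extents.  The extent format, the function
name and the block numbers are illustrative assumptions, not an existing
kernel interface:

```python
from typing import List, Tuple

# Hypothetical hand-over format: (first_disk_block, block_count) per extent.
Extent = Tuple[int, int]


def image_block_to_disk_block(extents: List[Extent], index: int) -> int:
    """Map the index-th block of the image to a block number on disk."""
    if index < 0:
        raise IndexError("negative image block index")
    for first_block, count in extents:
        if index < count:
            return first_block + index
        index -= count  # skip past this extent and try the next one
    raise IndexError("image is larger than the preallocated file")


# A fragmented file made of two extents: blocks 5000-5001 and 5400-5402.
extents = [(5000, 2), (5400, 3)]
for i in range(5):
    print(i, "->", image_block_to_disk_block(extents, i))
```

With a table like this, the kernel writing the image only needs block
numbers; it never has to interpret the filesystem that owns the file.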

There's no reason to have that file open in the original kernel --
it should be root-owned (it's full of privileged data) and probably
mode 000.

root is free to "dd if=/dev/random of=/dev/mem".  Root owned
daemons which do bad things are bugs.

Again, supporting swap files (*which is not optional*) requires the
very same support.

--
Joseph Fannin
jfannin@gmail.com


* Re: Hibernating To Swap Considered Harmful
       [not found]                             ` <20070717062803.GA9069@nineveh.local>
@ 2007-07-17  6:42                               ` david
  2007-07-17  7:26                                 ` Joseph Fannin
  2007-07-17  7:10                               ` Oliver Neukum
  1 sibling, 1 reply; 113+ messages in thread
From: david @ 2007-07-17  6:42 UTC (permalink / raw)
  To: Joseph Fannin
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Tue, 17 Jul 2007, Joseph Fannin wrote:

> On Tue, Jul 17, 2007 at 07:44:07AM +0200, Oliver Neukum wrote:
>>
>> If you want to go the kexec route to hibernation, the dumping kernel
>> would need to mount the filesystem to write to a file. Therefore the
>> suspending kernel would need to sync to disk and lock that file.
>
> If the file is preallocated, that's not a problem, as there's no need
> to touch filesystem metadata.  There'd need to be some channel to pass
> the disk blocks that are for writing the image, but that's not going
> to be nearly as complicated as passing the current swap data
> structures from the previous kernel.
>
> There's no reason to have that file open in the original kernel --
> it should be root-owned (it's full of privileged data) and probably
> mode 000.
>
> root is free to "dd if=/dev/random of=/dev/mem".  Root owned
> daemons which do bad things are bugs.

in this case it would be more like

dd if=/block0 of=/dev/sda1 count=1 bs=4096 seek=5000
dd if=/block1 of=/dev/sda1 count=1 bs=4096 seek=5050
dd if=/block2 of=/dev/sda1 count=1 bs=4096 seek=5400
etc

to write the blocks to the raw partition in the right place
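
The same placement can be sketched in Python, assuming a preallocated
file stands in for the partition and that each target position is a block
number counted from the start of the output (what dd expresses with
seek=).  The file name, block numbers and helper names are made up for the
example:

```python
import os
import tempfile

BLOCK_SIZE = 4096


def write_blocks(path, placements):
    """Write each (block_number, data) pair in place, without truncating."""
    # "r+b" opens an existing file for update; it neither creates nor truncates.
    with open(path, "r+b") as target:
        for block_number, data in placements:
            if len(data) != BLOCK_SIZE:
                raise ValueError("each block must be exactly BLOCK_SIZE bytes")
            target.seek(block_number * BLOCK_SIZE)
            target.write(data)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "image")
        # Stand-in for the preallocated area: 8 zero-filled blocks.
        with open(path, "wb") as f:
            f.truncate(8 * BLOCK_SIZE)
        write_blocks(path, [(5, b"A" * BLOCK_SIZE), (2, b"B" * BLOCK_SIZE)])
        with open(path, "rb") as f:
            content = f.read()
    # Size, first byte of block 5, first byte of block 2, first byte of block 0.
    return len(content), content[5 * BLOCK_SIZE], content[2 * BLOCK_SIZE], content[0]


print(demo())
```

The file size does not change, which is the point: only data blocks are
overwritten and nothing has to be allocated while the image is written.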

> Again, supporting swap files (*which is not optional*) requires the
> very same support.

in the kexec model why would the second kernel care about swap files at
all? (unless it chooses to write to them, in which case it is exactly the
same support, but unless it writes to them it doesn't need to care)

David Lang


* Re: Hibernating To Swap Considered Harmful
       [not found]                             ` <20070717062803.GA9069@nineveh.local>
  2007-07-17  6:42                               ` david
@ 2007-07-17  7:10                               ` Oliver Neukum
  1 sibling, 0 replies; 113+ messages in thread
From: Oliver Neukum @ 2007-07-17  7:10 UTC (permalink / raw)
  To: Joseph Fannin
  Cc: david, linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton,
	linux-pm, Jeremy Maitin-Shepard

On Tuesday, 17 July 2007, Joseph Fannin wrote:
> On Tue, Jul 17, 2007 at 07:44:07AM +0200, Oliver Neukum wrote:
> >
> > If you want to go the kexec route to hibernation, the dumping kernel
> > would need to mount the filesystem to write to a file. Therefore the
> > suspending kernel would need to sync to disk and lock that file.
> 
> If the file is preallocated, that's not a problem, as there's no need
> to touch filesystem metadata.  There'd need to be some channel to pass

The suspending and the resuming kernel need to touch metadata.
The kernel writing the image does not need to touch it.
You need to make sure that the metadata on disk is in a state that
allows the resuming kernel to find the file.

> the disk blocks that are for writing the image, but that's not going
> to be nearly as complicated as passing the current swap data
> structures from the previous kernel.

You'd need to pass just a bitmap of the swap space. Or you go the
easiest way and use a dedicated partition.
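
A rough sketch of how such a bitmap could be consumed, assuming one bit
per swap slot with a set bit meaning "slot in use".  The layout and the
function names are assumptions for illustration only, not the kernel's
actual swap structures:

```python
from itertools import islice


def free_slots(bitmap: bytes, total_slots: int):
    """Yield the slot numbers whose bit is clear (assumed to mean unused)."""
    for slot in range(total_slots):
        byte_index, bit = divmod(slot, 8)
        if not (bitmap[byte_index] >> bit) & 1:
            yield slot


def plan_image(bitmap: bytes, total_slots: int, pages_needed: int):
    """Pick the first pages_needed free slots for the hibernation image."""
    slots = list(islice(free_slots(bitmap, total_slots), pages_needed))
    if len(slots) < pages_needed:
        raise ValueError("not enough free swap slots for the image")
    return slots


# Slots 0, 2 and 3 are in use; slots 1 and 4-7 are free.
bitmap = bytes([0b00001101])
print(plan_image(bitmap, 8, 3))
```

The kernel that writes the image would only need the bitmap and the
location of the swap area, not the in-memory swap bookkeeping.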
 
> There's no reason to have that file open in the original kernel --
> it should be root-owned (it's full of privileged data) and probably
> mode 000.
> 
> root is free to "dd if=/dev/random of=/dev/mem".  Root owned
> daemons which do bad things are bugs.
> 
> Again, supporting swap files (*which is not optional*) requires the
> very same support.

Why would you need to support swap files?

	Regards
		Oliver


* Re: Hibernating To Swap Considered Harmful
  2007-07-17  6:42                               ` david
@ 2007-07-17  7:26                                 ` Joseph Fannin
  2007-07-17  7:34                                   ` david
                                                     ` (2 more replies)
  0 siblings, 3 replies; 113+ messages in thread
From: Joseph Fannin @ 2007-07-17  7:26 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Joseph Fannin, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Mon, Jul 16, 2007 at 11:42:08PM -0700, david@lang.hm wrote:
> On Tue, 17 Jul 2007, Joseph Fannin wrote:

> >root is free to "dd if=/dev/random of=/dev/mem".  Root owned
> >daemons which do bad things are bugs.
>
> in this case it would be more like
>
> dd if=/block0 of=/dev/sda1 count=1 bs=4096 seek=5000
> dd if=/block1 of=/dev/sda1 count=1 bs=4096 seek=5050
> dd if=/block2 of=/dev/sda1 count=1 bs=4096 seek=5400
> etc
>
> to write the blocks to the raw partition in the right place

What I meant by that was that root is allowed to shoot himself in the
foot.  Nothing stops root from opening a swap/hibernate file, which
would put it in cache, and cause it to be inconsistent if a
hibernation image was written to it behind the kernel's back.

That would be a very stupid thing to do, however.  There's no reason
to open that file, unless you know *exactly* what you are doing, in
which case the onus is on you to get it right.

But you have a point.  The swap file could be very fragmented.  It
might often be so, even.

Still, is this better than exporting the kernel's swap internals
(which has never been a public interface before)?

Does it make the interface that writing hibernation images to swap
imposes any better?

Even if hibernation files are no less complicated to support than
hibernating to swap files (which isn't a foregone conclusion), there
are plenty of reasons writing hibernation images to swap doesn't make
sense.

> >Again, supporting swap files (*which is not optional*) requires the
> >very same support.
>
> in the kexec model why would the second kernel care about swap files at
> all? (unless it chooses to write to them, in which case it is exactly the
> same support, but unless it writes to them it doesn't need to care)

My point is that no extra work is required to write hibernation images
to dedicated files than to write hibernation images to swap files.

If swap files are to be supported, then, there's no technical reason
not to support dedicated hibernation files.  Dedicated hibernation
files are better, and there's no reason not to implement them.

--
Joseph Fannin
jfannin@gmail.com


* Re: Hibernating To Swap Considered Harmful
  2007-07-17  7:26                                 ` Joseph Fannin
@ 2007-07-17  7:34                                   ` david
  2007-07-17 11:52                                   ` Rafael J. Wysocki
       [not found]                                   ` <Pine.LNX.4.64.0707170030460.19248@asgard.lang.hm>
  2 siblings, 0 replies; 113+ messages in thread
From: david @ 2007-07-17  7:34 UTC (permalink / raw)
  To: Joseph Fannin
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Tue, 17 Jul 2007, Joseph Fannin wrote:

> On Mon, Jul 16, 2007 at 11:42:08PM -0700, david@lang.hm wrote:
>> On Tue, 17 Jul 2007, Joseph Fannin wrote:
>
>>> root is free to "dd if=/dev/random of=/dev/mem".  Root owned
>>> daemons which do bad things are bugs.
>>
>> in this case it would be more like
>>
>> dd if=/block0 of=/dev/sda1 count=1 bs=4096 seek=5000
>> dd if=/block1 of=/dev/sda1 count=1 bs=4096 seek=5050
>> dd if=/block2 of=/dev/sda1 count=1 bs=4096 seek=5400
>> etc
>>
>> to write the blocks to the raw partition in the right place
>
> What I meant by that was that root is allowed to shoot himself in the
> foot.  Nothing stops root from opening a swap/hibernate file, which
> would put it in cache, and cause it to be inconsistent if a
> hibernation image was written to it behind the kernel's back.
>
> That would be a very stupid thing to do, however.  There's no reason
> to open that file, unless you know *exactly* what you are doing, in
> which case the onus is on you to get it right.
>
> But you have a point.  The swap file could be very fragmented.  It
> might often be so, even.
>
> Still, is this better than exporting the kernel's swap internals
> (which has never been a public interface before)?
>
> Does it make the interface that writing hibernation images to swap
> imposes any better?
>
> Even if hibernation files are no less complicated to support than
> hibernating to swap files (which isn't a foregone conclusion), there
> are plenty of reasons writing hibernation images to swap doesn't make
> sense.
>
>
>>> Again, supporting swap files (*which is not optional*) requires the
>>> very same support.
>>
>> in the kexec model why would the second kernel care about swap files at
>> all? (unless it chooses to write to them, in which case it is exactly the
>> same support, but unless it writes to them it doesn't need to care)
>
> My point is that no extra work is required to write hibernation images
> to dedicated files than to write hibernation images to swap files.
>
> If swap files are to be supported, then, there's no technical reason
> not to support dedicated hibernation files.  Dedicated hibernation
> files are better, and there's no reason not to implement them.

I agree with your point, but the reverse is not true: the ability to write
to a dedicated hibernation file does not produce the capacity to write to
a swap file, and I do question the 'requirement' to write the hibernation
image to the swap file.

David Lang


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                   ` <1184491804.1898.121.camel@caritas-dev.intel.com>
  2007-07-15 10:49                     ` Rafael J. Wysocki
@ 2007-07-17  8:13                     ` david
       [not found]                     ` <Pine.LNX.4.64.0707170101010.19248@asgard.lang.hm>
  2 siblings, 0 replies; 113+ messages in thread
From: david @ 2007-07-17  8:13 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Kexec Mailing List, linux-kernel, Eric W. Biederman, Pavel Machek,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

Ying, as the kexec guru in this thread I have a question for you about how 
kexec works (and possibly where you are going with this)

for the power-off hibernate with ACPI disabled the hibernation seems 
fairly straightforward (although there are still some missing pieces)

however, since the resume designed for ACPI won't work, would the following
approach work?

1. boot one kernel
2. setup a kexec the same way you would for hibernate
3. kexec to the new kernel
4. overwrite the memory of the first kernel
5. kexec 'back' to the main kernel that has now been overwritten by what was saved?

as part of this question, when you do a kexec, how does the kernel that 
you are doing the kexec to know what to run next?

it needs to do some initialization first before it starts running normal
things, and at that point the move back doesn't look for init like a
normal kernel boot (or the system would effectively boot instead of picking
up where it left off)

is this 'restart point' flexible enough that either the pre-hibernate
kernel or the small hibernate kernel could tell the pre-hibernate kernel
to go into suspend-to-ram mode before doing anything else?

Rafael,
   if ACPI is disabled and not used, is there any memory in the original
kernel that _must not_ be saved in the hibernate image? I recognise that
for efficiency it would save time to not save free memory, but if I'm
willing to waste some resources would it hurt to save everything?

David Lang


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-17  4:18                                             ` david
@ 2007-07-17 11:46                                               ` Rafael J. Wysocki
  0 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-17 11:46 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton, linux-pm,
	Jeremy Maitin-Shepard

On Tuesday, 17 July 2007 06:18, david@lang.hm wrote:
> On Mon, 16 Jul 2007, Rafael J. Wysocki wrote:
> 
> > On Monday, 16 July 2007 16:42, Huang, Ying wrote:
> >> On Mon, 2007-07-16 at 14:17 +0200, Rafael J. Wysocki wrote:
> >>>> is this a matter of running some test to find out, or is this a question
> >>>> for the kexec implementors?
> >>>
> >>> Actually, I'd like someone to tell me. ;-)
> >>>
> >>> I've browsed the kexec code, but haven't found anything related to the devices
> >>> in it.  Perhaps I didn't know where to look ...
> >>
> >> There are two stages for kexec. For "normal" kexec, first the
> >> sys_kexec_load is called to load the kernel image, then
> >> sys_reboot(LINUX_REBOOT_CMD_KEXEC) is called to boot the new kernel.
> >
> > OK, thanks.  This is the information that I was missing.
> >
> >> The call chain is as follows:
> >>
> >> sys_reboot(LINUX_REBOOT_CMD_KEXEC)
> >>     kernel_kexec
> >>         kernel_restart_prepare
> >>             device_shutdown
> >>         machine_shutdown
> >>         machine_kexec
> >>
> >> In device_shutdown, the dev->bus->shutdown or dev->driver->shutdown of
> >> every device is called to put device in quiescent state. In
> >> machine_kexec, the new kernel is booted.
> >
> > Yes.
> >
> >> So, for normal kexec, there is no code path for device state saving and
> >> restoring.
> >
> > Exactly.
> >
> >> Can the state of a device be restored after shutdown? I don't think so.
> >
> > No, it can't, but we need something like this for hibernation and
> > device_shutdown() is not appropriate for this purpose IMO.
> 
> is the only reason that device_shutdown() is not appropriate the amount of
> time it takes to shut down some devices and then start them up again? (I'm
> specifically thinking of drive spin down/up as an example)

Not only that.  You also need to save some driver data so that it can restore
the device's state from before the hibernation.  [Say you have a task blocked
on the driver's mutex in .read().  In that case you'd want the .read() to be
carried out after the restore in the same way in which it would have been
carried out if the hibernation hadn't occurred.]

> if so, it is probably worth implementing a demo with the long times
> involved to hash out any other problems, and then implement shortcuts to
> avoid the device_shutdown only where the time involved is excessive.

I think that device_shutdown() is just inappropriate, because in principle
it doesn't allow you to save any information related to the device state
before hibernation that may be needed after the restore.
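
The difference can be illustrated with a toy model.  This is only a
sketch: ToyDevice and its freeze()/restore() methods are hypothetical
names chosen for the example and do not correspond to real driver
callbacks.

```python
class ToyDevice:
    """Hypothetical device carrying a little driver state (not a kernel API)."""

    def __init__(self):
        self.powered = True
        self.pending_reads = ["read issued before hibernation"]

    def shutdown(self):
        # Quiesce only: nothing is recorded, so pending work is simply lost.
        self.powered = False
        self.pending_reads = []
        return None

    def freeze(self):
        # Quiesce, but hand back a snapshot that can be stored in the image.
        snapshot = {"pending_reads": list(self.pending_reads)}
        self.powered = False
        self.pending_reads = []
        return snapshot

    def restore(self, snapshot):
        # Bring the device back and reinstate whatever state was saved.
        self.powered = True
        self.pending_reads = list(snapshot["pending_reads"]) if snapshot else []


frozen = ToyDevice()
frozen.restore(frozen.freeze())
print(frozen.pending_reads)   # the blocked read is still there

stopped = ToyDevice()
stopped.restore(stopped.shutdown())
print(stopped.pending_reads)  # nothing left to continue
```

In the model, only the freeze() path leaves enough information behind
for the pending read to be carried out after the restore.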

> so, exactly where in the process above does the memory map need to be 
> created? is this in the machine_shutdown step or would it need to be in 
> the machine_kexec step?

I would do it in the machine_kexec step.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: Hibernating To Swap Considered Harmful
  2007-07-17  7:26                                 ` Joseph Fannin
  2007-07-17  7:34                                   ` david
@ 2007-07-17 11:52                                   ` Rafael J. Wysocki
       [not found]                                   ` <Pine.LNX.4.64.0707170030460.19248@asgard.lang.hm>
  2 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-17 11:52 UTC (permalink / raw)
  To: Joseph Fannin
  Cc: david, linux-kernel, Pavel Machek, Huang, Ying, Andrew Morton,
	linux-pm, Jeremy Maitin-Shepard

On Tuesday, 17 July 2007 09:26, Joseph Fannin wrote:
> On Mon, Jul 16, 2007 at 11:42:08PM -0700, david@lang.hm wrote:
> > On Tue, 17 Jul 2007, Joseph Fannin wrote:
> 
> > >root is free to "dd if=/dev/random of=/dev/mem".  Root owned
> > >daemons which do bad things are bugs.
> >
> > in this case it would be more like
> >
> > dd if=/block0 of=/dev/sda1 count=1 bs=4096 seek=5000
> > dd if=/block1 of=/dev/sda1 count=1 bs=4096 seek=5050
> > dd if=/block2 of=/dev/sda1 count=1 bs=4096 seek=5400
> > etc
> >
> > to write the blocks to the raw partition in the right place
> 
> What I meant by that was that root is allowed to shoot himself in the
> foot.  Nothing stops root from opening a swap/hibernate file, which
> would put it in cache, and cause it to be inconsistent if a
> hibernation image was written to it behind the kernel's back.
> 
> That would be a very stupid thing to do, however.  There's no reason
> to open that file, unless you know *exactly* what you are doing, in
> which case the onus is on you to get it right.
> 
> But you have a point.  The swap file could be very fragmented.  It
> might often be so, even.
> 
> Still, is this better than exporting the kernel's swap internals
> (which has never been a public interface before)?
> 
> Does it make the interface that writing hibernation images to swap
> imposes any better?
> 
> Even if hibernation files are no less complicated to support than
> hibernating to swap files (which isn't a foregone conclusion), there
> are plenty of reasons writing hibernation images to swap doesn't make
> sense.
> 
> 
> > >Again, supporting swap files (*which is not optional*) requires the
> > >very same support.
> >
> > in the kexec model why would the second kernel care about swap files at
> > all? (unless it chooses to write to them, in which case it is exactly the
> > same support, but unless it writes to them it doesn't need to care)
> 
> My point is that no extra work is required to write hibernation images
> to dedicated files than to write hibernation images to swap files.

This is not true, because for writing into swap files we can use some existing
code and for writing to "dedicated" files some new code needs to be written,
tested etc. (I think there is code like that in tuxonice, though).

> If swap files are to be supported, then, there's no technical reason
> not to support dedicated hibernation files.

Yes, there is: we need some more code for that in the kernel.

> Dedicated hibernation files are better, and there's no reason not to
> implement them.

I won't argue that they are not better.  For many people hibernation to swap
files/partitions is sufficient, though.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: Hibernating To Swap Considered Harmful
       [not found]                                   ` <Pine.LNX.4.64.0707170030460.19248@asgard.lang.hm>
@ 2007-07-17 11:54                                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-17 11:54 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, Joseph Fannin, Pavel Machek, Huang, Ying,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Tuesday, 17 July 2007 09:34, david@lang.hm wrote:
> On Tue, 17 Jul 2007, Joseph Fannin wrote:
> 
> > On Mon, Jul 16, 2007 at 11:42:08PM -0700, david@lang.hm wrote:
> >> On Tue, 17 Jul 2007, Joseph Fannin wrote:
> >
> >>> root is free to "dd if=/dev/random of=/dev/mem".  Root owned
> >>> daemons which do bad things are bugs.
> >>
> >> in this case it would be more like
> >>
> >> dd if=/block0 of=/dev/sda1 count=1 bs=4096 seek=5000
> >> dd if=/block1 of=/dev/sda1 count=1 bs=4096 seek=5050
> >> dd if=/block2 of=/dev/sda1 count=1 bs=4096 seek=5400
> >> etc
> >>
> >> to write the blocks to the raw partition in the right place
> >
> > What I meant by that was that root is allowed to shoot himself in the
> > foot.  Nothing stops root from opening a swap/hibernate file, which
> > would put it in cache, and cause it to be inconsistent if a
> > hibernation image was written to it behind the kernel's back.
> >
> > That would be a very stupid thing to do, however.  There's no reason
> > to open that file, unless you know *exactly* what you are doing, in
> > which case the onus is on you to get it right.
> >
> > But you have a point.  The swap file could be very fragmented.  It
> > might often be so, even.
> >
> > Still, is this better than exporting the kernel's swap internals
> > (which has never been a public interface before)?
> >
> > Does it make the interface that writing hibernation images to swap
> > imposes any better?
> >
> > Even if hibernation files are no less complicated to support than
> > hibernating to swap files (which isn't a foregone conclusion), there
> > are plenty of reasons writing hibernation images to swap doesn't make
> > sense.
> >
> >
> >>> Again, supporting swap files (*which is not optional*) requires the
> >>> very same support.
> >>
> >> in the kexec model why would the second kernel care about swap files at
> >> all? (unless it chooses to write to them, in which case it is exactly the
> >> same support, but unless it writes to them it doesn't need to care)
> >
> > My point is that no extra work is required to write hibernation images
> > to dedicated files than to write hibernation images to swap files.
> >
> > If swap files are to be supported, then, there's no technical reason
> > not to support dedicated hibernation files.  Dedicated hibernation
> > files are better, and there's no reason not to implement them.
> 
> I agree with your point, but the reverse is not true, the ability to write 
> to a dedicated hibernation file does not produce the capacity to write to 
> a swap file, and I do question the 'requirement' to write the hibernation 
> image to the swap file.

There's no such requirement; it's just been easier to implement.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                     ` <Pine.LNX.4.64.0707170101010.19248@asgard.lang.hm>
@ 2007-07-17 11:59                       ` Rafael J. Wysocki
  2007-07-17 12:48                       ` Huang, Ying
       [not found]                       ` <1184676518.10998.34.camel@caritas-dev.intel.com>
  2 siblings, 0 replies; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-17 11:59 UTC (permalink / raw)
  To: david
  Cc: Kexec Mailing List, linux-kernel, Eric W. Biederman, Pavel Machek,
	Huang, Ying, Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Tuesday, 17 July 2007 10:13, david@lang.hm wrote:
> Ying, as the kexec guru in this thread I have a question for you about how 
> kexec works (and possibly where you are going with this)
> 
> for the power-off hibernate with ACPI disabled the hibernation seems 
> fairly straightforward (although there are still some missing pieces)
> 
> however, since the resume designed for ACPI won't work, would the following
> approach work?
> 
> 1. boot one kernel
> 2. setup a kexec the same way you would for hibernate
> 3. kexec to the new kernel
> 4. overwrite the memory of the first kernel
> 5. kexec 'back' to the main kernel that has now been overwritten by what was saved?
> 
> as part of this question, when you do a kexec, how does the kernel that 
> you are doing the kexec to know what to run next?
> 
> it needs to do some initialization first before it starts running normal
> things, and at that point the move back doesn't look for init like a
> normal kernel boot (or the system would effectively boot instead of picking
> up where it left off)
>
> is this 'restart point' flexible enough that either the pre-hibernate
> kernel or the small hibernate kernel could tell the pre-hibernate kernel
> to go into suspend-to-ram mode before doing anything else?
> 
> Rafael,
>    if ACPI is disabled and not used, is there any memory in the original
> kernel that _must not_ be saved in the hibernate image? I recognise that
> for efficiency it would save time to not save free memory, but if I'm
> willing to waste some resources would it hurt to save everything?

On some systems there are valid 'struct page' structures that correspond to
memory holes and the kernel will oops if you try to save the contents of
these pages.  Unfortunately, only the early initialization platform code can
tell you which they are.
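
As a sketch of the consequence: the code that builds the image would have
to consult a map of usable ranges and skip everything else.  The range
list below is an assumed input (half-open, sorted, non-overlapping page
frame ranges); how the platform code would actually export it is not
specified here.

```python
def saveable_pfns(usable_ranges, max_pfn):
    """Return the page frame numbers that lie inside a usable range.

    usable_ranges is a list of half-open (start_pfn, end_pfn) pairs; any
    page frame below max_pfn that is not covered is treated as a hole and
    must not be read when the image is built.
    """
    pfns = []
    for start, end in usable_ranges:
        pfns.extend(range(max(start, 0), min(end, max_pfn)))
    return pfns


# Page frames 4-7 and 10-11 are holes in this made-up layout.
print(saveable_pfns([(0, 4), (8, 10)], max_pfn=12))
```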

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                     ` <Pine.LNX.4.64.0707170101010.19248@asgard.lang.hm>
  2007-07-17 11:59                       ` Rafael J. Wysocki
@ 2007-07-17 12:48                       ` Huang, Ying
       [not found]                       ` <1184676518.10998.34.camel@caritas-dev.intel.com>
  2 siblings, 0 replies; 113+ messages in thread
From: Huang, Ying @ 2007-07-17 12:48 UTC (permalink / raw)
  To: david
  Cc: Kexec Mailing List, linux-kernel, Eric W. Biederman, Pavel Machek,
	Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Tue, 2007-07-17 at 01:13 -0700, david@lang.hm wrote:
> however, since the resume designed for ACPI won't work would the following 
> approach work
> 
> 1. boot one kernel
> 2. setup a kexec the same way you would for hibernate
> 3. kexec to the new kernel
> 4. overwrite the memory of the first kernel
> 5. kexec 'back' to the main kernel that has now been overwritten by what was saved?
> 
> as part of this question, when you do a kexec, how does the kernel that 
> you are doing the kexec to know what to run next?

For the kernel in step 3 that does the kexec, the device and CPU state is
saved into memory before the new kernel is executed. So when jumping back,
control continues from the kexec point. If the memory image of the main
kernel is restored from disk, the device and CPU state in memory is
restored too. Before jumping back in step 5, the devices are put into a
known state; after jumping back, the device and CPU state is restored. If
"kexec -j" is used to trigger the kexec in step 3, the system will continue
with "kexec -j" exiting with exit code 0.
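
The control flow is loosely similar to setjmp/longjmp in userspace. The
sketch below is only an analogy under that assumption: a jmp_buf stands in
for the saved device and CPU state, and the function names are made up for
the example. It is not how the patch is implemented.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf kernel_a_state;  /* stands in for the saved CPU/device state */

/* Stand-in for kernel B: does its work, then jumps back to kernel A. */
static void run_kernel_b(volatile int *image_written)
{
	*image_written = 1;          /* e.g. the hibernation image was written */
	longjmp(kernel_a_state, 1);  /* "jump back"; this call never returns */
}

/*
 * Stand-in for kernel A. Returns the number of times control passed
 * through the save point.
 */
static int run_kernel_a(int *image_written_out)
{
	volatile int passes = 0;
	volatile int image_written = 0;

	assert(image_written_out != NULL);
	if (setjmp(kernel_a_state) == 0) {
		/* First pass: state saved, now hand control to kernel B. */
		passes++;
		run_kernel_b(&image_written);
	} else {
		/* Second pass: we are back at the save point. */
		passes++;
	}
	*image_written_out = image_written;
	return passes;
}
```

Kernel A sees the save point return twice, much as "kexec -j" returns only
after the jump back.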

> it needs to do some initialization first before it starts running normal
> things, and at that point the move back doesn't look for init like a
> normal kernel boot (or the system would effectively boot instead of picking
> up where it left off)

I think the early initialization can be done in an initramfs. At that
point, the resume image can be checked; the next step depends on the
result of the check.

> is this 'restart point' flexible enough that either the pre-hibernate
> kernel or the small hibernate kernel could tell the pre-hibernate kernel
> to go into suspend-to-ram mode before doing anything else?

It is possible for the hibernate kernel to pass information back to the
pre-hibernate kernel. For example, the information can be passed in the
jump buffer page.

Best Regards,
Huang Ying


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
       [not found]                       ` <1184676518.10998.34.camel@caritas-dev.intel.com>
@ 2007-07-17 14:22                         ` Rafael J. Wysocki
  2007-07-18  0:25                           ` david
  0 siblings, 1 reply; 113+ messages in thread
From: Rafael J. Wysocki @ 2007-07-17 14:22 UTC (permalink / raw)
  To: Huang, Ying
  Cc: david, Kexec Mailing List, linux-kernel, Eric W. Biederman,
	Pavel Machek, Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Tuesday, 17 July 2007 14:48, Huang, Ying wrote:
> On Tue, 2007-07-17 at 01:13 -0700, david@lang.hm wrote:
> > however, since the resume designed for ACPI won't work would the following 
> > approach work
> > 
> > 1. boot one kernel
> > 2. setup a kexec the same way you would for hibernate
> > 3. kexec to the new kernel
> > 4. overwrite the memory of the first kernel
> > 5. kexec 'back' to the main kernel that has now been overwritten by what was saved?
> > 
> > as part of this question, when you do a kexec, how does the kernel that 
> > you are doing the kexec to know what to run next?
> 
> For kernel in 3 that do kexec, the devices and CPU state are saved into
> memory before executing the new kernel. So when jumping back, the
> control will continue from kexec point. If the memory image of main
> kernel is restored from disk, the devices and CPU state in memory is
> restored too. Before jumping back in 5, the devices are put in the known
> state, after jumping back, the devices and CPU state is restored. If the
> "kexec -j" is used to trigger the kexec in 3, the system will continue
> with "kexec -j" exiting with exit code 0.
> 
> > it needs to do some initialization first before it starts running normal
> > things, and at that point the move back doesn't look for init like a
> > normal kernel boot (or the system would effectively boot instead of picking
> > up where it left off)
> 
> I think the early initialization can be done in a initramfs. At that
> point, the resume image can be checked, the next step depends on the
> result of checking.
> 
> > is this 'restart point' flexible enough that either the pre-hibernate
> > kernel or the small hibernate kernel could tell the pre-hibernate kernel
> > to go into suspend-to-ram mode before doing anything else?
> 
> It is possible for hibernate kernel to pass information back to
> pre-hibernate kernel. For example, the information can be passed in jump
> buffer page. 

I think it would be reasonable to have a protocol defined for passing this
information, so that it's independent of the kernel version etc.
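
One possible shape for such a protocol, as a minimal sketch: a fixed-layout
header with a magic number and a version at the start of the jump buffer
page, which the receiving kernel validates before trusting the rest. The
struct, field names, magic value and version number are all hypothetical
and not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define JUMP_INFO_MAGIC   0x4b4a4d50u  /* hypothetical magic value, "KJMP" */
#define JUMP_INFO_VERSION 1u           /* hypothetical protocol version */

/* Hypothetical fixed-layout header at the start of the jump buffer page. */
struct jump_info {
	uint32_t magic;    /* identifies the page as carrying this protocol */
	uint32_t version;  /* bumped on incompatible layout changes */
	uint32_t length;   /* bytes of payload following the header */
	uint32_t command;  /* request from the hibernate kernel */
};

/* Fill in a header; returns 0 on success, -1 if the payload cannot fit. */
static int jump_info_init(struct jump_info *hdr, uint32_t command,
			  uint32_t payload_len, uint32_t page_size)
{
	assert(hdr != NULL);
	if (page_size < sizeof(*hdr) || payload_len > page_size - sizeof(*hdr))
		return -1;
	memset(hdr, 0, sizeof(*hdr));
	hdr->magic = JUMP_INFO_MAGIC;
	hdr->version = JUMP_INFO_VERSION;
	hdr->length = payload_len;
	hdr->command = command;
	return 0;
}

/* Validate a header read back by the other kernel; 1 if usable, else 0. */
static int jump_info_valid(const struct jump_info *hdr, uint32_t page_size)
{
	assert(hdr != NULL);
	if (hdr->magic != JUMP_INFO_MAGIC)
		return 0;
	if (hdr->version != JUMP_INFO_VERSION)
		return 0;
	if (page_size < sizeof(*hdr))
		return 0;
	return hdr->length <= page_size - sizeof(*hdr);
}
```

Fixed-width fields and an explicit version are what would keep the layout
independent of the kernel version on either side.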

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: [PATCH 0/2] Kexec jump: The first step to kexec base hibernation
  2007-07-17 14:22                         ` Rafael J. Wysocki
@ 2007-07-18  0:25                           ` david
  0 siblings, 0 replies; 113+ messages in thread
From: david @ 2007-07-18  0:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Kexec Mailing List, linux-kernel, Eric W. Biederman, Pavel Machek,
	Huang, Ying, Andrew Morton, linux-pm, Jeremy Maitin-Shepard

On Tue, 17 Jul 2007, Rafael J. Wysocki wrote:

> On Tuesday, 17 July 2007 14:48, Huang, Ying wrote:
>> On Tue, 2007-07-17 at 01:13 -0700, david@lang.hm wrote:
>>> however, since the resume designed for ACPI won't work would the following
>>> approach work
>>>
>>> 1. boot one kernel
>>> 2. setup a kexec the same way you would for hibernate
>>> 3. kexec to the new kernel
>>> 4. overwrite the memory of the first kernel
>>> 5. kexec 'back' to the main kernel that has now been overwritten by what was saved?
>>>
>>> as part of this question, when you do a kexec, how does the kernel that
>>> you are doing the kexec to know what to run next?
>>
>> For kernel in 3 that do kexec, the devices and CPU state are saved into
>> memory before executing the new kernel. So when jumping back, the
>> control will continue from kexec point. If the memory image of main
>> kernel is restored from disk, the devices and CPU state in memory is
>> restored too. Before jumping back in 5, the devices are put in the known
>> state, after jumping back, the devices and CPU state is restored. If the
>> "kexec -j" is used to trigger the kexec in 3, the system will continue
>> with "kexec -j" exiting with exit code 0.
>>
>>> it needs to do some initialization first before it starts running normal
>>> things, and at that point the move back doesn't look for init like a
>>> normal kernel boot (or the system would effectively boot instead of picking
>>> up where it left off)
>>
>> I think the early initialization can be done in a initramfs. At that
>> point, the resume image can be checked, the next step depends on the
>> result of checking.
>>
>>> is this 'restart point' flexible enough that either the pre-hibernate
>>> kernel or the small hibernate kernel could tell the pre-hibernate kernel
>>> to go into suspend-to-ram mode before doing anything else?
>>
>> It is possible for hibernate kernel to pass information back to
>> pre-hibernate kernel. For example, the information can be passed in jump
>> buffer page.
>
> I think it would be reasonable to have a protocol defined for passing this
> information, so that it's independent of the kernel version etc.

At this point it looks like we have the following communication necessary
between the two kernels

1. the original kernel needs to create a map of what memory to backup

   this could be either a bitmap, or a series of address:blockcount pairs.
   in either case the result could be sizeable, so it's probably best to 
define a standard location to find the type and address of the data.

2. the new kernel needs to tell the old kernel which 'restart point' to 
use.

   this could be a simple jump table.

   the list that has been suggested so far is
     A. restore (the default)
     B. suspend-to-ram
     C. ACPI suspend-to-disk (S4 mode)
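
As a rough sketch of the jump table in item 2 (userspace C, all names
hypothetical): the second kernel leaves a small integer code, and the
original kernel indexes a table of handlers with it, falling back to
restore for any code it does not recognise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical restart-point codes, following the list above. */
enum restart_point {
	RESTART_RESTORE = 0,     /* A. restore (the default) */
	RESTART_SUSPEND_TO_RAM,  /* B. suspend-to-ram */
	RESTART_ACPI_S4,         /* C. ACPI suspend-to-disk (S4 mode) */
	RESTART_COUNT
};

typedef enum restart_point (*restart_fn)(void);

/* Placeholder handlers; each just reports which path was taken. */
static enum restart_point do_restore(void)
{
	return RESTART_RESTORE;
}

static enum restart_point do_suspend_to_ram(void)
{
	return RESTART_SUSPEND_TO_RAM;
}

static enum restart_point do_acpi_s4(void)
{
	return RESTART_ACPI_S4;
}

static const restart_fn restart_table[RESTART_COUNT] = {
	[RESTART_RESTORE]        = do_restore,
	[RESTART_SUSPEND_TO_RAM] = do_suspend_to_ram,
	[RESTART_ACPI_S4]        = do_acpi_s4,
};

/* Dispatch on the code left by the second kernel; unknown codes restore. */
static enum restart_point dispatch_restart(unsigned int code)
{
	if (code >= (unsigned int)RESTART_COUNT)
		code = RESTART_RESTORE;
	assert(restart_table[code] != NULL);
	return restart_table[code]();
}
```

Treating unknown codes as a plain restore keeps an older original kernel
usable with a newer hibernate kernel.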

since both kernels have access to the other kernel's memory the data could 
be stored in either kernel's address space, but my initial thought is that 
it's cleaner to store this in the original kernel's space and have the 
second kernel find it there.
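
For the memory map in item 1, the two formats carry the same information.
A minimal sketch, assuming a bitmap with one bit per page (bit set means
"save this page") and a hypothetical 'page_extent' record for the
address:blockcount form; none of these names come from existing kernel
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record: 'count' consecutive pages starting at 'pfn'. */
struct page_extent {
	uint64_t pfn;
	uint64_t count;
};

/* Test bit 'pfn' in a bitmap where a set bit means "save this page". */
static int bitmap_test(const uint8_t *bitmap, uint64_t pfn)
{
	return (bitmap[pfn / 8] >> (pfn % 8)) & 1;
}

/*
 * Convert a bitmap covering npages pages into extents. Writes at most
 * max_out extents and returns the number the full map needs, so a caller
 * can detect truncation by comparing the result with max_out.
 */
static size_t bitmap_to_extents(const uint8_t *bitmap, uint64_t npages,
				struct page_extent *out, size_t max_out)
{
	size_t n = 0;
	uint64_t pfn = 0;

	assert(bitmap != NULL);
	while (pfn < npages) {
		uint64_t start;

		if (!bitmap_test(bitmap, pfn)) {
			pfn++;
			continue;
		}
		start = pfn;
		while (pfn < npages && bitmap_test(bitmap, pfn))
			pfn++;
		if (n < max_out) {
			out[n].pfn = start;
			out[n].count = pfn - start;
		}
		n++;
	}
	return n;
}
```

The bitmap has a fixed size for a given amount of memory, while the extent
list grows with fragmentation, which is one reason to record the type next
to the address.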

I don't have the knowledge to create either of these interfaces, let alone
the pull to get them implemented in the kernel.

could I ask that one of you consider making a patch that implements at
least #1 (the memory map)

it sounds as if these kexec-back patches plus making the memory map
available would allow for an ACPI-free hibernate mode with no other
modifications. I'd like to try experimenting with this, but without the
memory map it sounds as if things will croak when I try to do this.

David Lang


