All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Olaf Hering <olaf@aepfle.de>
Cc: kexec@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] add hook to read_from_oldmem() to check for non-ram pages
Date: Thu, 5 May 2011 14:25:51 -0700	[thread overview]
Message-ID: <20110505142551.b4d2d95a.akpm@linux-foundation.org> (raw)
In-Reply-To: <20110503190806.GA12485@aepfle.de>

On Tue, 3 May 2011 21:08:06 +0200
Olaf Hering <olaf@aepfle.de> wrote:

> 
> The balloon driver in a Xen guest frees guest pages and marks them as
> mmio. When the kernel crashes and the crash kernel attempts to read the
> oldmem via /proc/vmcore a read from ballooned pages will generate 100%
> load in dom0 because Xen asks qemu-dm for the page content. Since the
> reads come in as 8byte requests each ballooned page is tried 512 times.
> 
> With this change a hook can be registered which checks wether the given
> pfn is really ram. The hook has to return a value > 0 for ram pages, a
> value < 0 on error (because the hypercall is not known) and 0 for
> non-ram pages.
> 
> This will reduce the time to read /proc/vmcore.  Without this change a
> 512M guest with 128M crashkernel region needs 200 seconds to read it,
> with this change it takes just 2 seconds.

Seems reasonable, I suppose.

Is there some suitable ifdef we can put around this stuff to avoid
adding it to kernel builds which will never use it?

> ...
>
> --- linux-2.6.39-rc5.orig/fs/proc/vmcore.c
> +++ linux-2.6.39-rc5/fs/proc/vmcore.c
> @@ -18,6 +18,7 @@
>  #include <linux/init.h>
>  #include <linux/crash_dump.h>
>  #include <linux/list.h>
> +#include <linux/wait.h>
>  #include <asm/uaccess.h>
>  #include <asm/io.h>
>  
> @@ -35,6 +36,44 @@ static u64 vmcore_size;
>  
>  static struct proc_dir_entry *proc_vmcore = NULL;
>  
> +/* returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error */
> +static int (*oldmem_pfn_is_ram)(unsigned long pfn);
> +static DECLARE_WAIT_QUEUE_HEAD(oldmem_fn_waitq);
> +static atomic_t oldmem_fn_refcount = ATOMIC_INIT(0);
> +
> +void register_oldmem_pfn_is_ram(int (*fn)(unsigned long))
> +{
> +	if (oldmem_pfn_is_ram == NULL)
> +		oldmem_pfn_is_ram = fn;
> +}

This is racy, and it should return a success code.  And we may as well
mark it __must_check to prevent people from cheating.

> +void unregister_oldmem_pfn_is_ram(void)
> +{
> +	wait_event(oldmem_fn_waitq, atomic_read(&oldmem_fn_refcount) == 0);
> +	oldmem_pfn_is_ram = NULL;
> +	wmb();
> +}

I'd say we should do away with the (also racy) refcount thing. 
Instead, require that callers not be using the thing when they
unregister.  ie: that callers not be buggy.

> +static int pfn_is_ram(unsigned long pfn)
> +{
> +	int (*fn)(unsigned long);
> +	/* pfn is ram unless fn() checks pagetype */
> +	int ret = 1;
> +
> +	atomic_inc(&oldmem_fn_refcount);
> +	smp_mb__after_atomic_inc();
> +	fn = oldmem_pfn_is_ram;
> +	if (fn)
> +		ret = fn(pfn);
> +	if (atomic_dec_and_test(&oldmem_fn_refcount))
> +		wake_up(&oldmem_fn_waitq);
> +
> +	return ret;
> +}

This function would have been a suitable place at which to document the
entire feature.  As it stands, anyone who is reading this code won't
have any clue why it exists.

> +EXPORT_SYMBOL_GPL(register_oldmem_pfn_is_ram);
> +EXPORT_SYMBOL_GPL(unregister_oldmem_pfn_is_ram);

Each export should be placed immediately after the function which is
being exported, please.  Checkpatch reports this.  Please use checkpatch.


>  /* Reads a page from the oldmem device from given offset. */
>  static ssize_t read_from_oldmem(char *buf, size_t count,
>  				u64 *ppos, int userbuf)
> @@ -55,9 +94,14 @@ static ssize_t read_from_oldmem(char *bu
>  		else
>  			nr_bytes = count;
>  
> -		tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
> -		if (tmp < 0)
> -			return tmp;
> +		/* if pfn is not ram, return zeros for spares dump files */

typo.

Also, sentences start with capital letters!

> +		if (pfn_is_ram(pfn) == 0)
> +			memset(buf, 0, nr_bytes);
> +		else {
> +			tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
> +			if (tmp < 0)
> +				return tmp;
> +		}
>  		*ppos += nr_bytes;
>  		count -= nr_bytes;
>  		buf += nr_bytes;
> Index: linux-2.6.39-rc5/include/linux/crash_dump.h
> ===================================================================
> --- linux-2.6.39-rc5.orig/include/linux/crash_dump.h
> +++ linux-2.6.39-rc5/include/linux/crash_dump.h
> @@ -66,6 +66,11 @@ static inline void vmcore_unusable(void)
>  	if (is_kdump_kernel())
>  		elfcorehdr_addr = ELFCORE_ADDR_ERR;
>  }
> +
> +#define HAVE_OLDMEM_PFN_IS_RAM 1

What's this for?

> +extern void register_oldmem_pfn_is_ram(int (*fn)(unsigned long));

"unsigned long pfn" in the declaration, please.  This has good
documentation value.

> +extern void unregister_oldmem_pfn_is_ram(void);



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

WARNING: multiple messages have this Message-ID (diff)
From: Andrew Morton <akpm@linux-foundation.org>
To: Olaf Hering <olaf@aepfle.de>
Cc: linux-kernel@vger.kernel.org, kexec@lists.infradead.org
Subject: Re: [PATCH] add hook to read_from_oldmem() to check for non-ram pages
Date: Thu, 5 May 2011 14:25:51 -0700	[thread overview]
Message-ID: <20110505142551.b4d2d95a.akpm@linux-foundation.org> (raw)
In-Reply-To: <20110503190806.GA12485@aepfle.de>

On Tue, 3 May 2011 21:08:06 +0200
Olaf Hering <olaf@aepfle.de> wrote:

> 
> The balloon driver in a Xen guest frees guest pages and marks them as
> mmio. When the kernel crashes and the crash kernel attempts to read the
> oldmem via /proc/vmcore a read from ballooned pages will generate 100%
> load in dom0 because Xen asks qemu-dm for the page content. Since the
> reads come in as 8byte requests each ballooned page is tried 512 times.
> 
> With this change a hook can be registered which checks wether the given
> pfn is really ram. The hook has to return a value > 0 for ram pages, a
> value < 0 on error (because the hypercall is not known) and 0 for
> non-ram pages.
> 
> This will reduce the time to read /proc/vmcore.  Without this change a
> 512M guest with 128M crashkernel region needs 200 seconds to read it,
> with this change it takes just 2 seconds.

Seems reasonable, I suppose.

Is there some suitable ifdef we can put around this stuff to avoid
adding it to kernel builds which will never use it?

> ...
>
> --- linux-2.6.39-rc5.orig/fs/proc/vmcore.c
> +++ linux-2.6.39-rc5/fs/proc/vmcore.c
> @@ -18,6 +18,7 @@
>  #include <linux/init.h>
>  #include <linux/crash_dump.h>
>  #include <linux/list.h>
> +#include <linux/wait.h>
>  #include <asm/uaccess.h>
>  #include <asm/io.h>
>  
> @@ -35,6 +36,44 @@ static u64 vmcore_size;
>  
>  static struct proc_dir_entry *proc_vmcore = NULL;
>  
> +/* returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error */
> +static int (*oldmem_pfn_is_ram)(unsigned long pfn);
> +static DECLARE_WAIT_QUEUE_HEAD(oldmem_fn_waitq);
> +static atomic_t oldmem_fn_refcount = ATOMIC_INIT(0);
> +
> +void register_oldmem_pfn_is_ram(int (*fn)(unsigned long))
> +{
> +	if (oldmem_pfn_is_ram == NULL)
> +		oldmem_pfn_is_ram = fn;
> +}

This is racy, and it should return a success code.  And we may as well
mark it __must_check to prevent people from cheating.

> +void unregister_oldmem_pfn_is_ram(void)
> +{
> +	wait_event(oldmem_fn_waitq, atomic_read(&oldmem_fn_refcount) == 0);
> +	oldmem_pfn_is_ram = NULL;
> +	wmb();
> +}

I'd say we should do away with the (also racy) refcount thing. 
Instead, require that callers not be using the thing when they
unregister.  ie: that callers not be buggy.

> +static int pfn_is_ram(unsigned long pfn)
> +{
> +	int (*fn)(unsigned long);
> +	/* pfn is ram unless fn() checks pagetype */
> +	int ret = 1;
> +
> +	atomic_inc(&oldmem_fn_refcount);
> +	smp_mb__after_atomic_inc();
> +	fn = oldmem_pfn_is_ram;
> +	if (fn)
> +		ret = fn(pfn);
> +	if (atomic_dec_and_test(&oldmem_fn_refcount))
> +		wake_up(&oldmem_fn_waitq);
> +
> +	return ret;
> +}

This function would have been a suitable place at which to document the
entire feature.  As it stands, anyone who is reading this code won't
have any clue why it exists.

> +EXPORT_SYMBOL_GPL(register_oldmem_pfn_is_ram);
> +EXPORT_SYMBOL_GPL(unregister_oldmem_pfn_is_ram);

Each export should be placed immediately after the function which is
being exported, please.  Checkpatch reports this.  Please use checkpatch.


>  /* Reads a page from the oldmem device from given offset. */
>  static ssize_t read_from_oldmem(char *buf, size_t count,
>  				u64 *ppos, int userbuf)
> @@ -55,9 +94,14 @@ static ssize_t read_from_oldmem(char *bu
>  		else
>  			nr_bytes = count;
>  
> -		tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
> -		if (tmp < 0)
> -			return tmp;
> +		/* if pfn is not ram, return zeros for spares dump files */

typo.

Also, sentences start with capital letters!

> +		if (pfn_is_ram(pfn) == 0)
> +			memset(buf, 0, nr_bytes);
> +		else {
> +			tmp = copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
> +			if (tmp < 0)
> +				return tmp;
> +		}
>  		*ppos += nr_bytes;
>  		count -= nr_bytes;
>  		buf += nr_bytes;
> Index: linux-2.6.39-rc5/include/linux/crash_dump.h
> ===================================================================
> --- linux-2.6.39-rc5.orig/include/linux/crash_dump.h
> +++ linux-2.6.39-rc5/include/linux/crash_dump.h
> @@ -66,6 +66,11 @@ static inline void vmcore_unusable(void)
>  	if (is_kdump_kernel())
>  		elfcorehdr_addr = ELFCORE_ADDR_ERR;
>  }
> +
> +#define HAVE_OLDMEM_PFN_IS_RAM 1

What's this for?

> +extern void register_oldmem_pfn_is_ram(int (*fn)(unsigned long));

"unsigned long pfn" in the declaration, please.  This has good
documentation value.

> +extern void unregister_oldmem_pfn_is_ram(void);



  reply	other threads:[~2011-05-05 21:26 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-07  9:56 dynamic oldmem in kdump kernel Olaf Hering
2011-04-07  9:56 ` Olaf Hering
2011-04-07 10:23 ` Américo Wang
2011-04-07 10:23   ` Américo Wang
2011-04-07 13:12   ` Olaf Hering
2011-04-07 13:12     ` Olaf Hering
2011-04-08 10:49     ` Américo Wang
2011-04-08 10:49       ` Américo Wang
2011-05-02 10:22 ` Olaf Hering
2011-05-02 10:22   ` Olaf Hering
2011-05-03 19:08 ` [PATCH] add hook to read_from_oldmem() to check for non-ram pages Olaf Hering
2011-05-03 19:08   ` Olaf Hering
2011-05-05 21:25   ` Andrew Morton [this message]
2011-05-05 21:25     ` Andrew Morton
2011-05-06 10:55     ` Olaf Hering
2011-05-06 10:55       ` Olaf Hering
2011-05-06 19:30       ` Andrew Morton
2011-05-06 19:30         ` Andrew Morton
2011-05-06 19:39         ` Olaf Hering
2011-05-06 19:39           ` Olaf Hering
2011-05-06 19:55           ` Andrew Morton
2011-05-06 19:55             ` Andrew Morton
2011-05-06 13:20   ` [PATCH v2] " Olaf Hering
2011-05-06 13:20     ` Olaf Hering

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110505142551.b4d2d95a.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=olaf@aepfle.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.