From: "Petr Mládek" <pmladek@suse.cz>
To: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
Cc: linux-kernel@vger.kernel.org,
	"Luis R. Rodriguez" <mcgrof@suse.com>,
	Michal Hocko <mhocko@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joe Perches <joe@perches.com>, Arun KS <arunks.linux@gmail.com>,
	Kees Cook <keescook@chromium.org>
Subject: Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs
Date: Wed, 11 Jun 2014 11:34:47 +0200
Message-ID: <20140611093447.GL7772@pathway.suse.cz>
In-Reply-To: <1402448685-30634-1-git-send-email-mcgrof@do-not-panic.com>

On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> 
> The default size of the ring buffer is too small for machines
> with a large amount of CPUs under heavy load. What ends up
> happening when debugging is the ring buffer overlaps and chews
> up old messages making debugging impossible unless the size is
> passed as a kernel parameter. An idle system upon boot up will
> on average spew out only about one or two extra lines but where
> this really matters is on heavy load and that will vary widely
> depending on the system and environment.

Thanks for looking at this. It is a pity to lose a stack trace when a huge
machine Oopses just because the default ring buffer is too small.


> There are mechanisms to help increase the kernel ring buffer
> for tracing through debugfs, and those interfaces even allow growing
> the kernel ring buffer per CPU. We also have a static value which
> can be passed upon boot. Relying on debugfs however is not ideal
> for production, and relying on the value passed upon bootup
> can only be used *after* an issue has crept up. Instead of being
> reactive this adds a proactive measure which lets you scale the
> amount of contributions you'd expect to the kernel ring buffer
> under load by each CPU in the worst case scenario.
> 
> We use num_possible_cpus() to avoid complexities which could be
> introduced by dynamically changing the ring buffer size at run
> time, num_possible_cpus() lets us use the upper limit on possible
> number of CPUs therefore avoiding having to deal with hotplugging
> CPUs on and off. This option is disabled by default, and if used
> the kernel ring buffer size then can be computed as follows:
> 
> size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) *  __LOG_CPU_BUF_LEN
> 
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Petr Mladek <pmladek@suse.cz>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Joe Perches <joe@perches.com>
> Cc: Arun KS <arunks.linux@gmail.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
>  init/Kconfig           | 28 ++++++++++++++++++++++++++++
>  kernel/printk/printk.c |  6 ++++--
>  2 files changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index 9d3585b..1814436 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
>  		     13 =>  8 KB
>  		     12 =>  4 KB
>  
> +config LOG_CPU_BUF_SHIFT
> +	int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
> +	range 0 21
> +	default 0
> +	help
> +	  The kernel ring buffer will get additional data logged onto it
> +	  when multiple CPUs are supported. Typically the contribution is a
> +	  few lines when idle however under load this can vary and in the
> +	  worst case it can mean losing logging information. You can use this
> +	  to set the maximum expected amount of logging contribution
> +	  under load by each CPU in the worst case scenario. Select a size as
> +	  a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
> +	  LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
> +	  follows having 16 CPUs as possible.
> +
> +	     ((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB

It might be better to use the CPU_NUM-specific value as a minimum of
the needed space. Linux distributions might want to ship a kernel
with a non-zero value and still use the static "__log_buf" on reasonably
small systems.
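
One way to read the suggestion above, as a minimal user-space sketch and not
kernel code: treat the per-CPU total as a lower bound and only allocate when
the static buffer falls short of it. The SKETCH_* macros and the function name
are hypothetical stand-ins for the Kconfig values, and the exact comparison is
an assumption about the intent.

```c
#include <stddef.h>

/* Hypothetical stand-ins for CONFIG_LOG_BUF_SHIFT and CONFIG_LOG_CPU_BUF_SHIFT. */
#define SKETCH_LOG_BUF_SHIFT     18
#define SKETCH_LOG_CPU_BUF_SHIFT 12

#define SKETCH_LOG_BUF_LEN     ((size_t)1 << SKETCH_LOG_BUF_SHIFT)
#define SKETCH_LOG_CPU_BUF_LEN ((size_t)1 << SKETCH_LOG_CPU_BUF_SHIFT)

/*
 * Return the size of the buffer to allocate at boot, or 0 when the
 * static buffer already covers the per-CPU minimum and can stay in use.
 */
static size_t sketch_wanted_log_buf_len(unsigned int possible_cpus)
{
	size_t cpu_min = (size_t)possible_cpus * SKETCH_LOG_CPU_BUF_LEN;

	if (cpu_min <= SKETCH_LOG_BUF_LEN)
		return 0;	/* small system: keep the static __log_buf */

	return cpu_min;		/* large system: allocate a bigger buffer */
}
```

With these example values, a system with up to 64 possible CPUs would stay on
the static 256 KB buffer, and only larger systems would allocate.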

> +	  Whereas typically you'd only end up with 256 KB. This is disabled
> +	  by default with a value of 0.

I would add:

	This value is ignored when "log_buf_len" commandline parameter
	is used. It forces the exact size of the ring buffer.

> +	  Examples:
> +		     17 => 128 KB
> +		     16 => 64 KB
> +	             15 => 32 KB
> +	             14 => 16 KB
> +		     13 =>  8 KB
> +		     12 =>  4 KB

I think that we should make it more clear that it is per-CPU here,
for example:

		17 => 128 KB for each CPU
		16 =>  64 KB for each CPU
		15 =>  32 KB for each CPU
		14 =>  16 KB for each CPU
		13 =>   8 KB for each CPU
		12 =>   4 KB for each CPU


>  #
>  # Architectures with an unreliable sched_clock() should select this:
>  #
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 7228258..2023424 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -246,6 +246,7 @@ static u32 clear_idx;
>  #define LOG_ALIGN __alignof__(struct printk_log)
>  #endif
>  #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
> +#define __LOG_CPU_BUF_LEN (1 << CONFIG_LOG_CPU_BUF_SHIFT)
>  static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
>  static char *log_buf = __log_buf;
>  static u32 log_buf_len = __LOG_BUF_LEN;
> @@ -752,9 +753,10 @@ void __init setup_log_buf(int early)
>  	unsigned long flags;
>  	char *new_log_buf;
>  	int free;
> +	int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN;
>  
> -	if (!new_log_buf_len)
> -		return;
> +	if (!new_log_buf_len && cpu_extra > 1)
> +		new_log_buf_len = __LOG_BUF_LEN + cpu_extra;

We should still return when both new_log_buf_len and cpu_extra are
zero, so this check should stay here:

	if (!new_log_buf_len)
		return;
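
The intended flow could look roughly like the following. This is a user-space
sketch under stated assumptions, not a replacement for the hunk: the function
name and its parameters are hypothetical, "requested" models new_log_buf_len
as set by the "log_buf_len" parameter, and a cpu_len of 0 models the per-CPU
contribution being disabled.

```c
#include <stddef.h>

/*
 * Decide which length the setup code would go on to allocate.
 * A return value of 0 means "return early and keep the static buffer".
 */
static size_t sketch_pick_log_buf_len(size_t requested, size_t static_len,
				      size_t cpu_len, unsigned int possible_cpus)
{
	size_t cpu_extra = 0;

	/* The boot CPU is covered by the static part of the buffer. */
	if (possible_cpus > 1)
		cpu_extra = (size_t)(possible_cpus - 1) * cpu_len;

	/* An explicit request always wins; otherwise scale by CPU count. */
	if (!requested && cpu_extra)
		requested = static_len + cpu_extra;

	return requested;
}
```

With static_len = 1 << 18, cpu_len = 1 << 12 and 16 possible CPUs this gives
323584 bytes, which matches the 316 KB example in the help text. The caller
would still do the "if (!new_log_buf_len) return;" check on the result.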

Also I would feel more comfortable if we somehow limit the maximum
size of cpu_extra. I wonder if there might be a crazy setup with a lot
of possible CPUs and possible memory but with some minimal amount of
CPUs and memory at the boot time.

The question is how to do it. I am still not very familiar with the
memory subsystem. I wonder if 10% of memory defined by the
"totalram_pages" variable would be a reasonable limit.

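As a rough illustration of such a limit, here is a user-space sketch that caps
the per-CPU extra at one tenth of RAM. The function name and parameters are
hypothetical, the 10% figure is only the value floated above, and whether the
page count is usable this early in boot is not checked here.

```c
#include <stddef.h>

/*
 * Cap the per-CPU extra space at 10% of total memory.
 * total_pages stands in for the kernel's page count and page_size
 * for PAGE_SIZE; both are plain parameters in this sketch.
 */
static size_t sketch_limit_cpu_extra(size_t cpu_extra, size_t total_pages,
				     size_t page_size)
{
	/* Divide before multiplying to keep the intermediate value small. */
	size_t limit = (total_pages / 10) * page_size;

	return cpu_extra < limit ? cpu_extra : limit;
}
```

On a machine with plenty of memory the cap has no effect; it only matters in
the odd case of many possible CPUs with very little RAM at boot time.
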
>  	if (early) {
>  		new_log_buf =
> -- 
> 2.0.0.rc3.18.g00a5b79
> 

