Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: "Luis R. Rodriguez" <mcgrof@suse.com>
To: "Petr Mládek" <pmladek@suse.cz>
Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>,
	linux-kernel@vger.kernel.org, Michal Hocko <mhocko@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joe Perches <joe@perches.com>, Arun KS <arunks.linux@gmail.com>,
	Kees Cook <keescook@chromium.org>, Mel Gorman <mgorman@suse.de>,
	Davidlohr Bueso <davidlohr@hp.com>,
	Chris Metcalf <cmetcalf@tilera.com>
Subject: Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs
Date: Thu, 12 Jun 2014 22:22:47 +0200	[thread overview]
Message-ID: <20140612202246.GA4841@wotan.suse.de> (raw)
In-Reply-To: <20140612130532.GN7772@pathway.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 8758 bytes --]

On Thu, Jun 12, 2014 at 03:05:32PM +0200, Petr Mládek wrote:
> On Wed 2014-06-11 23:47:41, Luis R. Rodriguez wrote:
> > On Wed, Jun 11, 2014 at 11:34:47AM +0200, Petr Mládek wrote:
> > > On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
> > > > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> > > > diff --git a/init/Kconfig b/init/Kconfig
> > > > index 9d3585b..1814436 100644
> > > > --- a/init/Kconfig
> > > > +++ b/init/Kconfig
> > > > @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
> > > >  		     13 =>  8 KB
> > > >  		     12 =>  4 KB
> > > >  
> > > > +config LOG_CPU_BUF_SHIFT
> > > > +	int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
> > > > +	range 0 21
> > > > +	default 0
> > > > +	help
> > > > +	  The kernel ring buffer will get additional data logged onto it
> > > > +	  when multiple CPUs are supported. Typically the contributions is a
> > > > +	  few lines when idle however under under load this can vary and in the
> > > > +	  worst case it can mean loosing logging information. You can use this
> > > > +	  to set the maximum expected mount of amount of logging contribution
> > > > +	  under load by each CPU in the worst case scenerio. Select a size as
> > > > +	  a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
> > > > +	  LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
> > > > +	  follows having 16 CPUs as possible.
> > > > +
> > > > +	     ((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB
> > > 
> > > It might be better to use the CPU_NUM-specific value as a minimum of
> > > the needed space. Linux distributions might want to distribute kernel
> > > with non-zero value and still use the static "__log_buf" on reasonable
> > > small systems.
> > 
> > Not sure if I follow what you mean by CPU_NUM-specific, can you
> > elaborate?
> 
> I wanted to say that the space requested by LOG_CPU_BUF_SHIFT depends
> on the number of CPUs. If LOG_CPU_BUF_SHIFT is not zero, your
> patch always allocates new ringbuffer and leave the static "__log_buf"
> unused. I think that this is not necessary for machines with small
> amount of CPUs and probably also with small amount of memory.

True, which is why I disabled it by default if we want to only leave
this disabled for < certain amount of num CPU systems, what is that
number, I see below a recommendation and I do like it.

> I would rename the variable to LOG_CPU_BUF_MIN_SHIFT or so. It would
> represent minimal size that is needed to print CPU-specific
> messages. If they take only "small" part of the default ring buffer
> size, we could still use the default rind buffer.

True, and will rename this, that still leaves open the question of a
number of CPUs that is sensible to keep but you resolve that below.

> For example, if we left 50% of the default buffer for CPU-specific
> messages, the code might look like:
> 
> 	#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
> 
> 	int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> 
> 	if (!new_log_buf_len && (cpu_extra > __LOG_BUF_LEN / 2))
> 		new_log_buf_len = __LOG_BUF_LEN + cpu_extra;
> 
> 	if (!new_log_buf_len)
> 		return;
> 
> 	allocate the new ring buffer...

Yeah I like these heuristics a lot, will fold them in and send a v2 now in patch form.
To be clear with this CONFIG_LOG_CPU_MIN_BUF_SHIFT could actually be left now to
something other than non zeo by default and only if that contribution is seen to
go above 1/2 of __LOG_BUF_LEN will we allocate more for the ring buffer. With
default values of LOG_BUF_SHIFT at 18 and say a default value of 12 for
LOG_CPU_MIN_BUF_SHIFT this would mean we'd need if we remove the -1 we'd require
64 CPUs in order to trigger an allocation for more memory. If we keep the -1 we'd
require anything over 64 number of CPUs. Do we want to keep the -1 and the > 64
CPU requirement as default? Is the LOG_CPU_MIN_BUF_SHIFT default of 12 resonable
to start with (assumes 4KB in the worst case before the kernel ring buffer flips over).

> > The default in this patch is to ignore this, do you mean that upstream
> > should probably default to a non-zero value here and then let distributions
> > select 0 for some kernel builds ?
> 
> If the change has effect only for huge systems, the default value
> might be non-zero everywhere.

Sure.

> > If so then perhaps adding a sysctl override value might be good to
> > allow only small systems to override this to 0?
> 
> I think that it won't help to lover the value using sysctl because the
> huge buffer would be already allocated during boot. If I did not miss anything.
> 
> [...]

Yeah true, a sensible default would be best, with the systctl we'd also have
to handle dynamic re-allocations and while the tracing code already added
code to make this easier I'd prefer we don't make this a popular path.

> > 
> > > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > > > index 7228258..2023424 100644
> > > > --- a/kernel/printk/printk.c
> > > > +++ b/kernel/printk/printk.c
> > > > @@ -246,6 +246,7 @@ static u32 clear_idx;
> > > >  #define LOG_ALIGN __alignof__(struct printk_log)
> > > >  #endif
> > > >  #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
> > > > +#define __LOG_CPU_BUF_LEN (1 << CONFIG_LOG_CPU_BUF_SHIFT)
> > > >  static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> > > >  static char *log_buf = __log_buf;
> > > >  static u32 log_buf_len = __LOG_BUF_LEN;
> > > > @@ -752,9 +753,10 @@ void __init setup_log_buf(int early)
> > > >  	unsigned long flags;
> > > >  	char *new_log_buf;
> > > >  	int free;
> > > > +	int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN;
> > > >  
> > > > -	if (!new_log_buf_len)
> > > > -		return;
> > > > +	if (!new_log_buf_len && cpu_extra > 1)
> > > > +		new_log_buf_len = __LOG_BUF_LEN + cpu_extra;
> > > 
> > > We still should return when both new_log_buf_len and cpu_extra are
> > > zero and call here:
> > > 
> > > 	if (!new_log_buf_len)
> > > 		return;
> > 
> > The check for cpu_extra > 1 does that -- the default in the patch was 0
> > and 1 << 0 is 1, so if in the case that the default is used we'd bail
> > just like before. Or did I perhaps miss what you were saying here?
> 
> The problem is that we do not bail out because you removed the "return".
> If "new_log_buf_len=0" and "cpu_extra=1" then we keep
> "new_log_buf_len" as is. Then we continue, try to allocate zero memory
> and print error: "log_buf_len: 0 bytes not available". Do I get it right?

Yeah sorry, I meant to add the else.. and bail with a return if the default
was of 0 was not used or if the kernel parameter to increase the size was
not passed.

> > > Also I would feel more comfortable if we somehow limit the maximum
> > > size of cpu_extra.
> > 
> > Michal had similar concerns and I thought up to limit it to 1024 max
> > CPUs, but after my second implementation I did some math on the values
> > that would be used if say LOG_CPU_BUF_SHIFT was 12, it turns out to not
> > be *that* bad for even huge num_possible_cpus(). For example for 4096
> > num_possible_cpus() this comes out to with LOG_BUF_SHIFT of 18:
> > 
> > 
> > ((1 << 18) + ((4096 - 1) * (1 << 12))) / 1024 = 16636 KB
> > 
> > ~16 MB doesn't seem that bad for such a monster box which I'd presume
> > would have an insane amount of memory. If this logic however does
> > seems unreasonable and we should cap it -- then by all means lets
> > pick a sensible number, its just not clear to me what that number
> > should be. Another reason why I stayed away from capping this was
> > that we'd then likely end up capping this in the future, and I was
> > trying to find a solution that would not require mucking as
> > technology evolves. The reasoning above is also why I had opted to
> > make the default to 0, only distributions would have a good sense
> > of what might be reasonable, which I guess begs more for a sysctl
> > value here.
> 
> I am not sure but I think that the huge buffer would be allocated
> before any sysctl value could be modified. So, I think that sysctl
> would not really help here.

Sure.

> I think that the 10% or 20% of the total memory size is a good limit.
> Nobody would want to use more than 20% of memory for logging. So, it
> needs not be higher. The main purpose of the limit is that the system
> does not die immediately after allocating the ring buffer. The 80%
> reserve for the rest of the system sounds fine as well. Note that
> the limit won't be needed on 99,9% of systems but it would help
> with debugging the last 0.1% :-)

Oh, what we do for the the 0.1%.

  Luis

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

next prev parent reply	other threads:[~2014-06-12 20:22 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-11  1:04 [RFC] printk: allow increasing the ring buffer depending on the number of CPUs Luis R. Rodriguez
2014-06-11  9:34 ` Petr Mládek
2014-06-11 21:47   ` Luis R. Rodriguez
2014-06-12 13:05     ` Petr Mládek
2014-06-12 20:22       ` Luis R. Rodriguez [this message]
2014-06-12 18:01   ` Davidlohr Bueso
2014-06-12 18:45     ` Joe Perches
2014-06-12 21:28       ` Luis R. Rodriguez
2014-06-16 20:17       ` Chris Metcalf
2014-06-12 21:12     ` Luis R. Rodriguez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140612202246.GA4841@wotan.suse.de \
    --to=mcgrof@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=arunks.linux@gmail.com \
    --cc=cmetcalf@tilera.com \
    --cc=davidlohr@hp.com \
    --cc=joe@perches.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@do-not-panic.com \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=pmladek@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox