All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Luis R. Rodriguez" <mcgrof@suse.com>
To: "Petr Mládek" <pmladek@suse.cz>
Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>,
	linux-kernel@vger.kernel.org, Michal Hocko <mhocko@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joe Perches <joe@perches.com>, Arun KS <arunks.linux@gmail.com>,
	Kees Cook <keescook@chromium.org>, Mel Gorman <mgorman@suse.de>,
	Davidlohr Bueso <davidlohr@hp.com>,
	Chris Metcalf <cmetcalf@tilera.com>
Subject: Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs
Date: Thu, 12 Jun 2014 22:22:47 +0200	[thread overview]
Message-ID: <20140612202246.GA4841@wotan.suse.de> (raw)
In-Reply-To: <20140612130532.GN7772@pathway.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 8758 bytes --]

On Thu, Jun 12, 2014 at 03:05:32PM +0200, Petr Mládek wrote:
> On Wed 2014-06-11 23:47:41, Luis R. Rodriguez wrote:
> > On Wed, Jun 11, 2014 at 11:34:47AM +0200, Petr Mládek wrote:
> > > On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote:
> > > > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> > > > diff --git a/init/Kconfig b/init/Kconfig
> > > > index 9d3585b..1814436 100644
> > > > --- a/init/Kconfig
> > > > +++ b/init/Kconfig
> > > > @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT
> > > >  		     13 =>  8 KB
> > > >  		     12 =>  4 KB
> > > >  
> > > > +config LOG_CPU_BUF_SHIFT
> > > > +	int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
> > > > +	range 0 21
> > > > +	default 0
> > > > +	help
> > > > +	  The kernel ring buffer will get additional data logged onto it
> > > > +	  when multiple CPUs are supported. Typically the contributions is a
> > > > +	  few lines when idle however under under load this can vary and in the
> > > > +	  worst case it can mean loosing logging information. You can use this
> > > > +	  to set the maximum expected mount of amount of logging contribution
> > > > +	  under load by each CPU in the worst case scenerio. Select a size as
> > > > +	  a power of 2. For example if LOG_BUF_SHIFT is 18 and if your
> > > > +	  LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as
> > > > +	  follows having 16 CPUs as possible.
> > > > +
> > > > +	     ((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB
> > > 
> > > It might be better to use the CPU_NUM-specific value as a minimum of
> > > the needed space. Linux distributions might want to distribute kernel
> > > with non-zero value and still use the static "__log_buf" on reasonable
> > > small systems.
> > 
> > Not sure if I follow what you mean by CPU_NUM-specific, can you
> > elaborate?
> 
> I wanted to say that the space requested by LOG_CPU_BUF_SHIFT depends
> on the number of CPUs. If LOG_CPU_BUF_SHIFT is not zero, your
> patch always allocates new ringbuffer and leave the static "__log_buf"
> unused. I think that this is not necessary for machines with small
> amount of CPUs and probably also with small amount of memory.

True, which is why I disabled it by default if we want to only leave
this disabled for < certain amount of num CPU systems, what is that
number, I see below a recommendation and I do like it.

> I would rename the variable to LOG_CPU_BUF_MIN_SHIFT or so. It would
> represent minimal size that is needed to print CPU-specific
> messages. If they take only "small" part of the default ring buffer
> size, we could still use the default rind buffer.

True, and will rename this, that still leaves open the question of a
number of CPUs that is sensible to keep but you resolve that below.

> For example, if we left 50% of the default buffer for CPU-specific
> messages, the code might look like:
> 
> 	#define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT)
> 
> 	int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN;
> 
> 	if (!new_log_buf_len && (cpu_extra > __LOG_BUF_LEN / 2))
> 		new_log_buf_len = __LOG_BUF_LEN + cpu_extra;
> 
> 	if (!new_log_buf_len)
> 		return;
> 
> 	allocate the new ring buffer...

Yeah I like these heuristics a lot, will fold them in and send a v2 now in patch form.
To be clear with this CONFIG_LOG_CPU_MIN_BUF_SHIFT could actually be left now to
something other than non zeo by default and only if that contribution is seen to
go above 1/2 of __LOG_BUF_LEN will we allocate more for the ring buffer. With
default values of LOG_BUF_SHIFT at 18 and say a default value of 12 for
LOG_CPU_MIN_BUF_SHIFT this would mean we'd need if we remove the -1 we'd require
64 CPUs in order to trigger an allocation for more memory. If we keep the -1 we'd
require anything over 64 number of CPUs. Do we want to keep the -1 and the > 64
CPU requirement as default? Is the LOG_CPU_MIN_BUF_SHIFT default of 12 resonable
to start with (assumes 4KB in the worst case before the kernel ring buffer flips over).

> > The default in this patch is to ignore this, do you mean that upstream
> > should probably default to a non-zero value here and then let distributions
> > select 0 for some kernel builds ?
> 
> If the change has effect only for huge systems, the default value
> might be non-zero everywhere.

Sure.

> > If so then perhaps adding a sysctl override value might be good to
> > allow only small systems to override this to 0?
> 
> I think that it won't help to lover the value using sysctl because the
> huge buffer would be already allocated during boot. If I did not miss anything.
> 
> [...]

Yeah true, a sensible default would be best, with the systctl we'd also have
to handle dynamic re-allocations and while the tracing code already added
code to make this easier I'd prefer we don't make this a popular path.

> > 
> > > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > > > index 7228258..2023424 100644
> > > > --- a/kernel/printk/printk.c
> > > > +++ b/kernel/printk/printk.c
> > > > @@ -246,6 +246,7 @@ static u32 clear_idx;
> > > >  #define LOG_ALIGN __alignof__(struct printk_log)
> > > >  #endif
> > > >  #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
> > > > +#define __LOG_CPU_BUF_LEN (1 << CONFIG_LOG_CPU_BUF_SHIFT)
> > > >  static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> > > >  static char *log_buf = __log_buf;
> > > >  static u32 log_buf_len = __LOG_BUF_LEN;
> > > > @@ -752,9 +753,10 @@ void __init setup_log_buf(int early)
> > > >  	unsigned long flags;
> > > >  	char *new_log_buf;
> > > >  	int free;
> > > > +	int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN;
> > > >  
> > > > -	if (!new_log_buf_len)
> > > > -		return;
> > > > +	if (!new_log_buf_len && cpu_extra > 1)
> > > > +		new_log_buf_len = __LOG_BUF_LEN + cpu_extra;
> > > 
> > > We still should return when both new_log_buf_len and cpu_extra are
> > > zero and call here:
> > > 
> > > 	if (!new_log_buf_len)
> > > 		return;
> > 
> > The check for cpu_extra > 1 does that -- the default in the patch was 0
> > and 1 << 0 is 1, so if in the case that the default is used we'd bail
> > just like before. Or did I perhaps miss what you were saying here?
> 
> The problem is that we do not bail out because you removed the "return".
> If "new_log_buf_len=0" and "cpu_extra=1" then we keep
> "new_log_buf_len" as is. Then we continue, try to allocate zero memory
> and print error: "log_buf_len: 0 bytes not available". Do I get it right?

Yeah sorry, I meant to add the else.. and bail with a return if the default
was of 0 was not used or if the kernel parameter to increase the size was
not passed.

> > > Also I would feel more comfortable if we somehow limit the maximum
> > > size of cpu_extra.
> > 
> > Michal had similar concerns and I thought up to limit it to 1024 max
> > CPUs, but after my second implementation I did some math on the values
> > that would be used if say LOG_CPU_BUF_SHIFT was 12, it turns out to not
> > be *that* bad for even huge num_possible_cpus(). For example for 4096
> > num_possible_cpus() this comes out to with LOG_BUF_SHIFT of 18:
> > 
> > 
> > ((1 << 18) + ((4096 - 1) * (1 << 12))) / 1024 = 16636 KB
> > 
> > ~16 MB doesn't seem that bad for such a monster box which I'd presume
> > would have an insane amount of memory. If this logic however does
> > seems unreasonable and we should cap it -- then by all means lets
> > pick a sensible number, its just not clear to me what that number
> > should be. Another reason why I stayed away from capping this was
> > that we'd then likely end up capping this in the future, and I was
> > trying to find a solution that would not require mucking as
> > technology evolves. The reasoning above is also why I had opted to
> > make the default to 0, only distributions would have a good sense
> > of what might be reasonable, which I guess begs more for a sysctl
> > value here.
> 
> I am not sure but I think that the huge buffer would be allocated
> before any sysctl value could be modified. So, I think that sysctl
> would not really help here.

Sure.

> I think that the 10% or 20% of the total memory size is a good limit.
> Nobody would want to use more than 20% of memory for logging. So, it
> needs not be higher. The main purpose of the limit is that the system
> does not die immediately after allocating the ring buffer. The 80%
> reserve for the rest of the system sounds fine as well. Note that
> the limit won't be needed on 99,9% of systems but it would help
> with debugging the last 0.1% :-)

Oh, what we do for the the 0.1%.

  Luis

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

  reply	other threads:[~2014-06-12 20:22 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-11  1:04 [RFC] printk: allow increasing the ring buffer depending on the number of CPUs Luis R. Rodriguez
2014-06-11  9:34 ` Petr Mládek
2014-06-11 21:47   ` Luis R. Rodriguez
2014-06-12 13:05     ` Petr Mládek
2014-06-12 20:22       ` Luis R. Rodriguez [this message]
2014-06-12 18:01   ` Davidlohr Bueso
2014-06-12 18:45     ` Joe Perches
2014-06-12 21:28       ` Luis R. Rodriguez
2014-06-16 20:17       ` Chris Metcalf
2014-06-12 21:12     ` Luis R. Rodriguez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140612202246.GA4841@wotan.suse.de \
    --to=mcgrof@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=arunks.linux@gmail.com \
    --cc=cmetcalf@tilera.com \
    --cc=davidlohr@hp.com \
    --cc=joe@perches.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@do-not-panic.com \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=pmladek@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.