* [RFC] printk: allow increasing the ring buffer depending on the number of CPUs @ 2014-06-11 1:04 Luis R. Rodriguez 2014-06-11 9:34 ` Petr Mládek 0 siblings, 1 reply; 10+ messages in thread From: Luis R. Rodriguez @ 2014-06-11 1:04 UTC (permalink / raw) To: linux-kernel Cc: Luis R. Rodriguez, Michal Hocko, Petr Mladek, Andrew Morton, Joe Perches, Arun KS, Kees Cook From: "Luis R. Rodriguez" <mcgrof@suse.com> The default size of the ring buffer is too small for machines with a large amount of CPUs under heavy load. What ends up happening when debugging is the ring buffer overlaps and chews up old messages making debugging impossible unless the size is passed as a kernel parameter. An idle system upon boot up will on average spew out only about one or two extra lines but where this really matters is on heavy load and that will vary widely depending on the system and environment. There are mechanisms to help increase the kernel ring buffer for tracing through debugfs, and those interfaces even allow growing the kernel ring buffer per CPU. We also have a static value which can be passed upon boot. Relying on debugfs however is not ideal for production, and relying on the value passed upon bootup is can only used *after* an issue has creeped up. Instead of being reactive this adds a proactive measure which lets you scale the amount of contributions you'd expect to the kernel ring buffer under load by each CPU in the worst case scenerio. We use num_possible_cpus() to avoid complexities which could be introduced by dynamically changing the ring buffer size at run time, num_possible_cpus() lets us use the upper limit on possible number of CPUs therefore avoiding having to deal with hotplugging CPUs on and off. This option is diabled by default, and if used the kernel ring buffer size then can be computed as follows: size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) * __LOG_CPU_BUF_LEN Cc: Michal Hocko <mhocko@suse.cz> Cc: Petr Mladek <pmladek@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Joe Perches <joe@perches.com> Cc: Arun KS <arunks.linux@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> --- init/Kconfig | 28 ++++++++++++++++++++++++++++ kernel/printk/printk.c | 6 ++++-- 2 files changed, 32 insertions(+), 2 deletions(-) diff --git a/init/Kconfig b/init/Kconfig index 9d3585b..1814436 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT 13 => 8 KB 12 => 4 KB +config LOG_CPU_BUF_SHIFT + int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)" + range 0 21 + default 0 + help + The kernel ring buffer will get additional data logged onto it + when multiple CPUs are supported. Typically the contributions is a + few lines when idle however under under load this can vary and in the + worst case it can mean loosing logging information. You can use this + to set the maximum expected mount of amount of logging contribution + under load by each CPU in the worst case scenerio. Select a size as + a power of 2. For example if LOG_BUF_SHIFT is 18 and if your + LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as + follows having 16 CPUs as possible. + + ((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB + + Where as typically you'd only end up with 256 KB. This is disabled + by default with a value of 0. + + Examples: + 17 => 128 KB + 16 => 64 KB + 15 => 32 KB + 14 => 16 KB + 13 => 8 KB + 12 => 4 KB + # # Architectures with an unreliable sched_clock() should select this: # diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 7228258..2023424 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -246,6 +246,7 @@ static u32 clear_idx; #define LOG_ALIGN __alignof__(struct printk_log) #endif #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT) +#define __LOG_CPU_BUF_LEN (1 << CONFIG_LOG_CPU_BUF_SHIFT) static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); static char *log_buf = __log_buf; static u32 log_buf_len = __LOG_BUF_LEN; @@ -752,9 +753,10 @@ void __init setup_log_buf(int early) unsigned long flags; char *new_log_buf; int free; + int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN; - if (!new_log_buf_len) - return; + if (!new_log_buf_len && cpu_extra > 1) + new_log_buf_len = __LOG_BUF_LEN + cpu_extra; if (early) { new_log_buf = -- 2.0.0.rc3.18.g00a5b79 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs 2014-06-11 1:04 [RFC] printk: allow increasing the ring buffer depending on the number of CPUs Luis R. Rodriguez @ 2014-06-11 9:34 ` Petr Mládek 2014-06-11 21:47 ` Luis R. Rodriguez 2014-06-12 18:01 ` Davidlohr Bueso 0 siblings, 2 replies; 10+ messages in thread From: Petr Mládek @ 2014-06-11 9:34 UTC (permalink / raw) To: Luis R. Rodriguez Cc: linux-kernel, Luis R. Rodriguez, Michal Hocko, Andrew Morton, Joe Perches, Arun KS, Kees Cook On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote: > From: "Luis R. Rodriguez" <mcgrof@suse.com> > > The default size of the ring buffer is too small for machines > with a large amount of CPUs under heavy load. What ends up > happening when debugging is the ring buffer overlaps and chews > up old messages making debugging impossible unless the size is > passed as a kernel parameter. An idle system upon boot up will > on average spew out only about one or two extra lines but where > this really matters is on heavy load and that will vary widely > depending on the system and environment. Thanks for looking at this. It is a pity to lose stracktrace when a huge machine Oopses just because the default ring buffer is too small. > There are mechanisms to help increase the kernel ring buffer > for tracing through debugfs, and those interfaces even allow growing > the kernel ring buffer per CPU. We also have a static value which > can be passed upon boot. Relying on debugfs however is not ideal > for production, and relying on the value passed upon bootup is > can only used *after* an issue has creeped up. Instead of being > reactive this adds a proactive measure which lets you scale the > amount of contributions you'd expect to the kernel ring buffer > under load by each CPU in the worst case scenerio. > > We use num_possible_cpus() to avoid complexities which could be > introduced by dynamically changing the ring buffer size at run > time, num_possible_cpus() lets us use the upper limit on possible > number of CPUs therefore avoiding having to deal with hotplugging > CPUs on and off. This option is diabled by default, and if used > the kernel ring buffer size then can be computed as follows: > > size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) * __LOG_CPU_BUF_LEN > > Cc: Michal Hocko <mhocko@suse.cz> > Cc: Petr Mladek <pmladek@suse.cz> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Joe Perches <joe@perches.com> > Cc: Arun KS <arunks.linux@gmail.com> > Cc: Kees Cook <keescook@chromium.org> > Cc: linux-kernel@vger.kernel.org > Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> > --- > init/Kconfig | 28 ++++++++++++++++++++++++++++ > kernel/printk/printk.c | 6 ++++-- > 2 files changed, 32 insertions(+), 2 deletions(-) > > diff --git a/init/Kconfig b/init/Kconfig > index 9d3585b..1814436 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT > 13 => 8 KB > 12 => 4 KB > > +config LOG_CPU_BUF_SHIFT > + int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)" > + range 0 21 > + default 0 > + help > + The kernel ring buffer will get additional data logged onto it > + when multiple CPUs are supported. Typically the contributions is a > + few lines when idle however under under load this can vary and in the > + worst case it can mean loosing logging information. You can use this > + to set the maximum expected mount of amount of logging contribution > + under load by each CPU in the worst case scenerio. Select a size as > + a power of 2. For example if LOG_BUF_SHIFT is 18 and if your > + LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as > + follows having 16 CPUs as possible. > + > + ((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB It might be better to use the CPU_NUM-specific value as a minimum of the needed space. Linux distributions might want to distribute kernel with non-zero value and still use the static "__log_buf" on reasonable small systems. > + Where as typically you'd only end up with 256 KB. This is disabled > + by default with a value of 0. I would add: This value is ignored when "log_buf_len" commandline parameter is used. It forces the exact size of the ring buffer. > + Examples: > + 17 => 128 KB > + 16 => 64 KB > + 15 => 32 KB > + 14 => 16 KB > + 13 => 8 KB > + 12 => 4 KB I think that we should make it more cleat that it is per-CPU here, for example: 17 => 128 KB for each CPU 16 => 64 KB for each CPU 15 => 32 KB for each CPU 14 => 16 KB for each CPU 13 => 8 KB for each CPU 12 => 4 KB for each CPU > # > # Architectures with an unreliable sched_clock() should select this: > # > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 7228258..2023424 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -246,6 +246,7 @@ static u32 clear_idx; > #define LOG_ALIGN __alignof__(struct printk_log) > #endif > #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT) > +#define __LOG_CPU_BUF_LEN (1 << CONFIG_LOG_CPU_BUF_SHIFT) > static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); > static char *log_buf = __log_buf; > static u32 log_buf_len = __LOG_BUF_LEN; > @@ -752,9 +753,10 @@ void __init setup_log_buf(int early) > unsigned long flags; > char *new_log_buf; > int free; > + int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN; > > - if (!new_log_buf_len) > - return; > + if (!new_log_buf_len && cpu_extra > 1) > + new_log_buf_len = __LOG_BUF_LEN + cpu_extra; We still should return when both new_log_buf_len and cpu_extra are zero and call here: if (!new_log_buf_len) return; Also I would feel more comfortable if we somehow limit the maximum size of cpu_extra. I wonder if there might be a crazy setup with a lot of possible CPUs and possible memory but with some minimal amount of CPUs and memory at the boot time. The question is how to do it. I am still not much familiar with the memory subsystem. I wonder if 10% of memory defined by the "total_rampages" variable would be a reasonable limit. > if (early) { > new_log_buf = > -- > 2.0.0.rc3.18.g00a5b79 > > LocalWords: buf len cpu boottime ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs 2014-06-11 9:34 ` Petr Mládek @ 2014-06-11 21:47 ` Luis R. Rodriguez 2014-06-12 13:05 ` Petr Mládek 2014-06-12 18:01 ` Davidlohr Bueso 1 sibling, 1 reply; 10+ messages in thread From: Luis R. Rodriguez @ 2014-06-11 21:47 UTC (permalink / raw) To: Petr Mládek Cc: Luis R. Rodriguez, linux-kernel, Michal Hocko, Andrew Morton, Joe Perches, Arun KS, Kees Cook, Mel Gorman [-- Attachment #1: Type: text/plain, Size: 6158 bytes --] On Wed, Jun 11, 2014 at 11:34:47AM +0200, Petr Mládek wrote: > On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote: > > From: "Luis R. Rodriguez" <mcgrof@suse.com> > > diff --git a/init/Kconfig b/init/Kconfig > > index 9d3585b..1814436 100644 > > --- a/init/Kconfig > > +++ b/init/Kconfig > > @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT > > 13 => 8 KB > > 12 => 4 KB > > > > +config LOG_CPU_BUF_SHIFT > > + int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)" > > + range 0 21 > > + default 0 > > + help > > + The kernel ring buffer will get additional data logged onto it > > + when multiple CPUs are supported. Typically the contributions is a > > + few lines when idle however under under load this can vary and in the > > + worst case it can mean loosing logging information. You can use this > > + to set the maximum expected mount of amount of logging contribution > > + under load by each CPU in the worst case scenerio. Select a size as > > + a power of 2. For example if LOG_BUF_SHIFT is 18 and if your > > + LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as > > + follows having 16 CPUs as possible. > > + > > + ((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB > > It might be better to use the CPU_NUM-specific value as a minimum of > the needed space. Linux distributions might want to distribute kernel > with non-zero value and still use the static "__log_buf" on reasonable > small systems. Not sure if I follow what you mean by CPU_NUM-specific, can you elaborate? The default in this patch is to ignore this, do you mean that upstream should probably default to a non-zero value here and then let distributions select 0 for some kernel builds ? If so then perhaps adding a sysctl override value might be good to allow only small systems to override this to 0? > > + Where as typically you'd only end up with 256 KB. This is disabled > > + by default with a value of 0. > > I would add: > > This value is ignored when "log_buf_len" commandline parameter > is used. It forces the exact size of the ring buffer. Good point, I've amended this in. > > + Examples: > > + 17 => 128 KB > > + 16 => 64 KB > > + 15 => 32 KB > > + 14 => 16 KB > > + 13 => 8 KB > > + 12 => 4 KB > > I think that we should make it more cleat that it is per-CPU here, > for example: > > 17 => 128 KB for each CPU > 16 => 64 KB for each CPU > 15 => 32 KB for each CPU > 14 => 16 KB for each CPU > 13 => 8 KB for each CPU > 12 => 4 KB for each CPU Thanks, amended as well. > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > > index 7228258..2023424 100644 > > --- a/kernel/printk/printk.c > > +++ b/kernel/printk/printk.c > > @@ -246,6 +246,7 @@ static u32 clear_idx; > > #define LOG_ALIGN __alignof__(struct printk_log) > > #endif > > #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT) > > +#define __LOG_CPU_BUF_LEN (1 << CONFIG_LOG_CPU_BUF_SHIFT) > > static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); > > static char *log_buf = __log_buf; > > static u32 log_buf_len = __LOG_BUF_LEN; > > @@ -752,9 +753,10 @@ void __init setup_log_buf(int early) > > unsigned long flags; > > char *new_log_buf; > > int free; > > + int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN; > > > > - if (!new_log_buf_len) > > - return; > > + if (!new_log_buf_len && cpu_extra > 1) > > + new_log_buf_len = __LOG_BUF_LEN + cpu_extra; > > We still should return when both new_log_buf_len and cpu_extra are > zero and call here: > > if (!new_log_buf_len) > return; The check for cpu_extra > 1 does that -- the default in the patch was 0 and 1 << 0 is 1, so if in the case that the default is used we'd bail just like before. Or did I perhaps miss what you were saying here? > Also I would feel more comfortable if we somehow limit the maximum > size of cpu_extra. Michal had similar concerns and I thought up to limit it to 1024 max CPUs, but after my second implementation I did some math on the values that would be used if say LOG_CPU_BUF_SHIFT was 12, it turns out to not be *that* bad for even huge num_possible_cpus(). For example for 4096 num_possible_cpus() this comes out to with LOG_BUF_SHIFT of 18: ((1 << 18) + ((4096 - 1) * (1 << 12))) / 1024 = 16636 KB ~16 MB doesn't seem that bad for such a monster box which I'd presume would have an insane amount of memory. If this logic however does seems unreasonable and we should cap it -- then by all means lets pick a sensible number, its just not clear to me what that number should be. Another reason why I stayed away from capping this was that we'd then likely end up capping this in the future, and I was trying to find a solution that would not require mucking as technology evolves. The reasoning above is also why I had opted to make the default to 0, only distributions would have a good sense of what might be reasonable, which I guess begs more for a sysctl value here. > I wonder if there might be a crazy setup with a lot > of possible CPUs and possible memory but with some minimal amount of > CPUs and memory at the boot time. When I tested disabling smp I saw the log was still amended to include information about the disabled CPUs, I however hadn't tested on a machine with hot pluggable CPUs and with tons of CPUs disabled, so not sure if that adds more info as well. This also though points more to this being more a system specific thing, which is another reason to perhaps keep this disabled and leave this instead as a system config? > The question is how to do it. I am still not much familiar with the > memory subsystem. I wonder if 10% of memory defined by the > "total_rampages" variable would be a reasonable limit. Not sure either, curious if Mel might have a suggestion? > > > if (early) { > > new_log_buf = > > -- > > 2.0.0.rc3.18.g00a5b79 > > > > > LocalWords: buf len cpu boottime What's this? :) Luis [-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs 2014-06-11 21:47 ` Luis R. Rodriguez @ 2014-06-12 13:05 ` Petr Mládek 2014-06-12 20:22 ` Luis R. Rodriguez 0 siblings, 1 reply; 10+ messages in thread From: Petr Mládek @ 2014-06-12 13:05 UTC (permalink / raw) To: Luis R. Rodriguez Cc: Luis R. Rodriguez, linux-kernel, Michal Hocko, Andrew Morton, Joe Perches, Arun KS, Kees Cook, Mel Gorman On Wed 2014-06-11 23:47:41, Luis R. Rodriguez wrote: > On Wed, Jun 11, 2014 at 11:34:47AM +0200, Petr Mládek wrote: > > On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote: > > > From: "Luis R. Rodriguez" <mcgrof@suse.com> > > > diff --git a/init/Kconfig b/init/Kconfig > > > index 9d3585b..1814436 100644 > > > --- a/init/Kconfig > > > +++ b/init/Kconfig > > > @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT > > > 13 => 8 KB > > > 12 => 4 KB > > > > > > +config LOG_CPU_BUF_SHIFT > > > + int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)" > > > + range 0 21 > > > + default 0 > > > + help > > > + The kernel ring buffer will get additional data logged onto it > > > + when multiple CPUs are supported. Typically the contributions is a > > > + few lines when idle however under under load this can vary and in the > > > + worst case it can mean loosing logging information. You can use this > > > + to set the maximum expected mount of amount of logging contribution > > > + under load by each CPU in the worst case scenerio. Select a size as > > > + a power of 2. For example if LOG_BUF_SHIFT is 18 and if your > > > + LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as > > > + follows having 16 CPUs as possible. > > > + > > > + ((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB > > > > It might be better to use the CPU_NUM-specific value as a minimum of > > the needed space. Linux distributions might want to distribute kernel > > with non-zero value and still use the static "__log_buf" on reasonable > > small systems. > > Not sure if I follow what you mean by CPU_NUM-specific, can you > elaborate? I wanted to say that the space requested by LOG_CPU_BUF_SHIFT depends on the number of CPUs. If LOG_CPU_BUF_SHIFT is not zero, your patch always allocates new ringbuffer and leave the static "__log_buf" unused. I think that this is not necessary for machines with small amount of CPUs and probably also with small amount of memory. I would rename the variable to LOG_CPU_BUF_MIN_SHIFT or so. It would represent minimal size that is needed to print CPU-specific messages. If they take only "small" part of the default ring buffer size, we could still use the default rind buffer. For example, if we left 50% of the default buffer for CPU-specific messages, the code might look like: #define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT) int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN; if (!new_log_buf_len && (cpu_extra > __LOG_BUF_LEN / 2)) new_log_buf_len = __LOG_BUF_LEN + cpu_extra; if (!new_log_buf_len) return; allocate the new ring buffer... > The default in this patch is to ignore this, do you mean that upstream > should probably default to a non-zero value here and then let distributions > select 0 for some kernel builds ? If the change has effect only for huge systems, the default value might be non-zero everywhere. > If so then perhaps adding a sysctl override value might be good to > allow only small systems to override this to 0? I think that it won't help to lover the value using sysctl because the huge buffer would be already allocated during boot. If I did not miss anything. [...] > > > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > > > index 7228258..2023424 100644 > > > --- a/kernel/printk/printk.c > > > +++ b/kernel/printk/printk.c > > > @@ -246,6 +246,7 @@ static u32 clear_idx; > > > #define LOG_ALIGN __alignof__(struct printk_log) > > > #endif > > > #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT) > > > +#define __LOG_CPU_BUF_LEN (1 << CONFIG_LOG_CPU_BUF_SHIFT) > > > static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); > > > static char *log_buf = __log_buf; > > > static u32 log_buf_len = __LOG_BUF_LEN; > > > @@ -752,9 +753,10 @@ void __init setup_log_buf(int early) > > > unsigned long flags; > > > char *new_log_buf; > > > int free; > > > + int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN; > > > > > > - if (!new_log_buf_len) > > > - return; > > > + if (!new_log_buf_len && cpu_extra > 1) > > > + new_log_buf_len = __LOG_BUF_LEN + cpu_extra; > > > > We still should return when both new_log_buf_len and cpu_extra are > > zero and call here: > > > > if (!new_log_buf_len) > > return; > > The check for cpu_extra > 1 does that -- the default in the patch was 0 > and 1 << 0 is 1, so if in the case that the default is used we'd bail > just like before. Or did I perhaps miss what you were saying here? The problem is that we do not bail out because you removed the "return". If "new_log_buf_len=0" and "cpu_extra=1" then we keep "new_log_buf_len" as is. Then we continue, try to allocate zero memory and print error: "log_buf_len: 0 bytes not available". Do I get it right? > > Also I would feel more comfortable if we somehow limit the maximum > > size of cpu_extra. > > Michal had similar concerns and I thought up to limit it to 1024 max > CPUs, but after my second implementation I did some math on the values > that would be used if say LOG_CPU_BUF_SHIFT was 12, it turns out to not > be *that* bad for even huge num_possible_cpus(). For example for 4096 > num_possible_cpus() this comes out to with LOG_BUF_SHIFT of 18: > > > ((1 << 18) + ((4096 - 1) * (1 << 12))) / 1024 = 16636 KB > > ~16 MB doesn't seem that bad for such a monster box which I'd presume > would have an insane amount of memory. If this logic however does > seems unreasonable and we should cap it -- then by all means lets > pick a sensible number, its just not clear to me what that number > should be. Another reason why I stayed away from capping this was > that we'd then likely end up capping this in the future, and I was > trying to find a solution that would not require mucking as > technology evolves. The reasoning above is also why I had opted to > make the default to 0, only distributions would have a good sense > of what might be reasonable, which I guess begs more for a sysctl > value here. I am not sure but I think that the huge buffer would be allocated before any sysctl value could be modified. So, I think that sysctl would not really help here. I think that the 10% or 20% of the total memory size is a good limit. Nobody would want to use more than 20% of memory for logging. So, it needs not be higher. The main purpose of the limit is that the system does not die immediately after allocating the ring buffer. The 80% reserve for the rest of the system sounds fine as well. Note that the limit won't be needed on 99,9% of systems but it would help with debugging the last 0.1% :-) > > I wonder if there might be a crazy setup with a lot > > of possible CPUs and possible memory but with some minimal amount of > > CPUs and memory at the boot time. > > When I tested disabling smp I saw the log was still amended to include > information about the disabled CPUs, I however hadn't tested on a machine > with hot pluggable CPUs and with tons of CPUs disabled, so not sure if > that adds more info as well. This also though points more to this being > more a system specific thing, which is another reason to perhaps keep this > disabled and leave this instead as a system config? > > > The question is how to do it. I am still not much familiar with the > > memory subsystem. I wonder if 10% of memory defined by the > > "total_rampages" variable would be a reasonable limit. > > Not sure either, curious if Mel might have a suggestion? > > > > > > if (early) { > > > new_log_buf = > > > -- > > > 2.0.0.rc3.18.g00a5b79 > > > > > > > > LocalWords: buf len cpu boottime > > What's this? :) Heh, emacs added this when doing spell check. It is strage, it does it only from time to time :-) Best Regards, Petr ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs 2014-06-12 13:05 ` Petr Mládek @ 2014-06-12 20:22 ` Luis R. Rodriguez 0 siblings, 0 replies; 10+ messages in thread From: Luis R. Rodriguez @ 2014-06-12 20:22 UTC (permalink / raw) To: Petr Mládek Cc: Luis R. Rodriguez, linux-kernel, Michal Hocko, Andrew Morton, Joe Perches, Arun KS, Kees Cook, Mel Gorman, Davidlohr Bueso, Chris Metcalf [-- Attachment #1: Type: text/plain, Size: 8758 bytes --] On Thu, Jun 12, 2014 at 03:05:32PM +0200, Petr Mládek wrote: > On Wed 2014-06-11 23:47:41, Luis R. Rodriguez wrote: > > On Wed, Jun 11, 2014 at 11:34:47AM +0200, Petr Mládek wrote: > > > On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote: > > > > From: "Luis R. Rodriguez" <mcgrof@suse.com> > > > > diff --git a/init/Kconfig b/init/Kconfig > > > > index 9d3585b..1814436 100644 > > > > --- a/init/Kconfig > > > > +++ b/init/Kconfig > > > > @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT > > > > 13 => 8 KB > > > > 12 => 4 KB > > > > > > > > +config LOG_CPU_BUF_SHIFT > > > > + int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)" > > > > + range 0 21 > > > > + default 0 > > > > + help > > > > + The kernel ring buffer will get additional data logged onto it > > > > + when multiple CPUs are supported. Typically the contributions is a > > > > + few lines when idle however under under load this can vary and in the > > > > + worst case it can mean loosing logging information. You can use this > > > > + to set the maximum expected mount of amount of logging contribution > > > > + under load by each CPU in the worst case scenerio. Select a size as > > > > + a power of 2. For example if LOG_BUF_SHIFT is 18 and if your > > > > + LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as > > > > + follows having 16 CPUs as possible. > > > > + > > > > + ((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB > > > > > > It might be better to use the CPU_NUM-specific value as a minimum of > > > the needed space. Linux distributions might want to distribute kernel > > > with non-zero value and still use the static "__log_buf" on reasonable > > > small systems. > > > > Not sure if I follow what you mean by CPU_NUM-specific, can you > > elaborate? > > I wanted to say that the space requested by LOG_CPU_BUF_SHIFT depends > on the number of CPUs. If LOG_CPU_BUF_SHIFT is not zero, your > patch always allocates new ringbuffer and leave the static "__log_buf" > unused. I think that this is not necessary for machines with small > amount of CPUs and probably also with small amount of memory. True, which is why I disabled it by default if we want to only leave this disabled for < certain amount of num CPU systems, what is that number, I see below a recommendation and I do like it. > I would rename the variable to LOG_CPU_BUF_MIN_SHIFT or so. It would > represent minimal size that is needed to print CPU-specific > messages. If they take only "small" part of the default ring buffer > size, we could still use the default rind buffer. True, and will rename this, that still leaves open the question of a number of CPUs that is sensible to keep but you resolve that below. > For example, if we left 50% of the default buffer for CPU-specific > messages, the code might look like: > > #define __LOG_CPU_MIN_BUF_LEN (1 << CONFIG_LOG_CPU_MIN_BUF_SHIFT) > > int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_MIN_BUF_LEN; > > if (!new_log_buf_len && (cpu_extra > __LOG_BUF_LEN / 2)) > new_log_buf_len = __LOG_BUF_LEN + cpu_extra; > > if (!new_log_buf_len) > return; > > allocate the new ring buffer... Yeah I like these heuristics a lot, will fold them in and send a v2 now in patch form. To be clear with this CONFIG_LOG_CPU_MIN_BUF_SHIFT could actually be left now to something other than non zeo by default and only if that contribution is seen to go above 1/2 of __LOG_BUF_LEN will we allocate more for the ring buffer. With default values of LOG_BUF_SHIFT at 18 and say a default value of 12 for LOG_CPU_MIN_BUF_SHIFT this would mean we'd need if we remove the -1 we'd require 64 CPUs in order to trigger an allocation for more memory. If we keep the -1 we'd require anything over 64 number of CPUs. Do we want to keep the -1 and the > 64 CPU requirement as default? Is the LOG_CPU_MIN_BUF_SHIFT default of 12 resonable to start with (assumes 4KB in the worst case before the kernel ring buffer flips over). > > The default in this patch is to ignore this, do you mean that upstream > > should probably default to a non-zero value here and then let distributions > > select 0 for some kernel builds ? > > If the change has effect only for huge systems, the default value > might be non-zero everywhere. Sure. > > If so then perhaps adding a sysctl override value might be good to > > allow only small systems to override this to 0? > > I think that it won't help to lover the value using sysctl because the > huge buffer would be already allocated during boot. If I did not miss anything. > > [...] Yeah true, a sensible default would be best, with the systctl we'd also have to handle dynamic re-allocations and while the tracing code already added code to make this easier I'd prefer we don't make this a popular path. > > > > > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > > > > index 7228258..2023424 100644 > > > > --- a/kernel/printk/printk.c > > > > +++ b/kernel/printk/printk.c > > > > @@ -246,6 +246,7 @@ static u32 clear_idx; > > > > #define LOG_ALIGN __alignof__(struct printk_log) > > > > #endif > > > > #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT) > > > > +#define __LOG_CPU_BUF_LEN (1 << CONFIG_LOG_CPU_BUF_SHIFT) > > > > static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); > > > > static char *log_buf = __log_buf; > > > > static u32 log_buf_len = __LOG_BUF_LEN; > > > > @@ -752,9 +753,10 @@ void __init setup_log_buf(int early) > > > > unsigned long flags; > > > > char *new_log_buf; > > > > int free; > > > > + int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN; > > > > > > > > - if (!new_log_buf_len) > > > > - return; > > > > + if (!new_log_buf_len && cpu_extra > 1) > > > > + new_log_buf_len = __LOG_BUF_LEN + cpu_extra; > > > > > > We still should return when both new_log_buf_len and cpu_extra are > > > zero and call here: > > > > > > if (!new_log_buf_len) > > > return; > > > > The check for cpu_extra > 1 does that -- the default in the patch was 0 > > and 1 << 0 is 1, so if in the case that the default is used we'd bail > > just like before. Or did I perhaps miss what you were saying here? > > The problem is that we do not bail out because you removed the "return". > If "new_log_buf_len=0" and "cpu_extra=1" then we keep > "new_log_buf_len" as is. Then we continue, try to allocate zero memory > and print error: "log_buf_len: 0 bytes not available". Do I get it right? Yeah sorry, I meant to add the else.. and bail with a return if the default was of 0 was not used or if the kernel parameter to increase the size was not passed. > > > Also I would feel more comfortable if we somehow limit the maximum > > > size of cpu_extra. > > > > Michal had similar concerns and I thought up to limit it to 1024 max > > CPUs, but after my second implementation I did some math on the values > > that would be used if say LOG_CPU_BUF_SHIFT was 12, it turns out to not > > be *that* bad for even huge num_possible_cpus(). For example for 4096 > > num_possible_cpus() this comes out to with LOG_BUF_SHIFT of 18: > > > > > > ((1 << 18) + ((4096 - 1) * (1 << 12))) / 1024 = 16636 KB > > > > ~16 MB doesn't seem that bad for such a monster box which I'd presume > > would have an insane amount of memory. If this logic however does > > seems unreasonable and we should cap it -- then by all means lets > > pick a sensible number, its just not clear to me what that number > > should be. Another reason why I stayed away from capping this was > > that we'd then likely end up capping this in the future, and I was > > trying to find a solution that would not require mucking as > > technology evolves. The reasoning above is also why I had opted to > > make the default to 0, only distributions would have a good sense > > of what might be reasonable, which I guess begs more for a sysctl > > value here. > > I am not sure but I think that the huge buffer would be allocated > before any sysctl value could be modified. So, I think that sysctl > would not really help here. Sure. > I think that the 10% or 20% of the total memory size is a good limit. > Nobody would want to use more than 20% of memory for logging. So, it > needs not be higher. The main purpose of the limit is that the system > does not die immediately after allocating the ring buffer. The 80% > reserve for the rest of the system sounds fine as well. Note that > the limit won't be needed on 99,9% of systems but it would help > with debugging the last 0.1% :-) Oh, what we do for the the 0.1%. Luis [-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs 2014-06-11 9:34 ` Petr Mládek 2014-06-11 21:47 ` Luis R. Rodriguez @ 2014-06-12 18:01 ` Davidlohr Bueso 2014-06-12 18:45 ` Joe Perches 2014-06-12 21:12 ` Luis R. Rodriguez 1 sibling, 2 replies; 10+ messages in thread From: Davidlohr Bueso @ 2014-06-12 18:01 UTC (permalink / raw) To: Petr Mládek Cc: Luis R. Rodriguez, linux-kernel, Luis R. Rodriguez, Michal Hocko, Andrew Morton, Joe Perches, Arun KS, Kees Cook On Wed, 2014-06-11 at 11:34 +0200, Petr Mládek wrote: > On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote: > > From: "Luis R. Rodriguez" <mcgrof@suse.com> > > > > The default size of the ring buffer is too small for machines > > with a large amount of CPUs under heavy load. What ends up > > happening when debugging is the ring buffer overlaps and chews > > up old messages making debugging impossible unless the size is > > passed as a kernel parameter. An idle system upon boot up will > > on average spew out only about one or two extra lines but where > > this really matters is on heavy load and that will vary widely > > depending on the system and environment. > > Thanks for looking at this. It is a pity to lose stracktrace when a huge > machine Oopses just because the default ring buffer is too small. Agreed, I would very much welcome something like this. > > There are mechanisms to help increase the kernel ring buffer > > for tracing through debugfs, and those interfaces even allow growing > > the kernel ring buffer per CPU. We also have a static value which > > can be passed upon boot. Relying on debugfs however is not ideal > > for production, and relying on the value passed upon bootup is > > can only used *after* an issue has creeped up. Instead of being > > reactive this adds a proactive measure which lets you scale the > > amount of contributions you'd expect to the kernel ring buffer > > under load by each CPU in the worst case scenerio. > > > > We use num_possible_cpus() to avoid complexities which could be > > introduced by dynamically changing the ring buffer size at run > > time, num_possible_cpus() lets us use the upper limit on possible > > number of CPUs therefore avoiding having to deal with hotplugging > > CPUs on and off. This option is diabled by default, and if used > > the kernel ring buffer size then can be computed as follows: > > > > size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) * __LOG_CPU_BUF_LEN > > > > Cc: Michal Hocko <mhocko@suse.cz> > > Cc: Petr Mladek <pmladek@suse.cz> > > Cc: Andrew Morton <akpm@linux-foundation.org> > > Cc: Joe Perches <joe@perches.com> > > Cc: Arun KS <arunks.linux@gmail.com> > > Cc: Kees Cook <keescook@chromium.org> > > Cc: linux-kernel@vger.kernel.org > > Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> > > --- > > init/Kconfig | 28 ++++++++++++++++++++++++++++ > > kernel/printk/printk.c | 6 ++++-- > > 2 files changed, 32 insertions(+), 2 deletions(-) > > > > diff --git a/init/Kconfig b/init/Kconfig > > index 9d3585b..1814436 100644 > > --- a/init/Kconfig > > +++ b/init/Kconfig > > @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT > > 13 => 8 KB > > 12 => 4 KB > > > > +config LOG_CPU_BUF_SHIFT > > + int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)" > > + range 0 21 > > + default 0 > > + help > > + The kernel ring buffer will get additional data logged onto it > > + when multiple CPUs are supported. Typically the contributions is a > > + few lines when idle however under under load this can vary and in the > > + worst case it can mean loosing logging information. You can use this > > + to set the maximum expected mount of amount of logging contribution > > + under load by each CPU in the worst case scenerio. Select a size as > > + a power of 2. For example if LOG_BUF_SHIFT is 18 and if your > > + LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as > > + follows having 16 CPUs as possible. > > + > > + ((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB > > It might be better to use the CPU_NUM-specific value as a minimum of > the needed space. Linux distributions might want to distribute kernel > with non-zero value and still use the static "__log_buf" on reasonable > small systems. It should also depend on SMP and !BASE_SMALL. I was wondering about disabling this by default as it would defeat the purpose of being a proactive feature. Similarly, I worry about distros choosing a correct default value on their own. > > + Where as typically you'd only end up with 256 KB. This is disabled > > + by default with a value of 0. > > I would add: > > This value is ignored when "log_buf_len" commandline parameter > is used. It forces the exact size of the ring buffer. ... and update Documentation/kernel-parameters.txt to be more descriptive about this new functionality. > > + Examples: > > + 17 => 128 KB > > + 16 => 64 KB > > + 15 => 32 KB > > + 14 => 16 KB > > + 13 => 8 KB > > + 12 => 4 KB > > I think that we should make it more cleat that it is per-CPU here, > for example: > > 17 => 128 KB for each CPU > 16 => 64 KB for each CPU > 15 => 32 KB for each CPU > 14 => 16 KB for each CPU > 13 => 8 KB for each CPU > 12 => 4 KB for each CPU > Agreed. > > # > > # Architectures with an unreliable sched_clock() should select this: > > # > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > > index 7228258..2023424 100644 > > --- a/kernel/printk/printk.c > > +++ b/kernel/printk/printk.c > > @@ -246,6 +246,7 @@ static u32 clear_idx; > > #define LOG_ALIGN __alignof__(struct printk_log) > > #endif > > #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT) > > +#define __LOG_CPU_BUF_LEN (1 << CONFIG_LOG_CPU_BUF_SHIFT) > > static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); > > static char *log_buf = __log_buf; > > static u32 log_buf_len = __LOG_BUF_LEN; > > @@ -752,9 +753,10 @@ void __init setup_log_buf(int early) > > unsigned long flags; > > char *new_log_buf; > > int free; > > + int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN; If depending on SMP, you can remove the - 1 here. > > - if (!new_log_buf_len) > > - return; > > + if (!new_log_buf_len && cpu_extra > 1) > > + new_log_buf_len = __LOG_BUF_LEN + cpu_extra; > > We still should return when both new_log_buf_len and cpu_extra are > zero and call here: > > if (!new_log_buf_len) > return; Yep. > Also I would feel more comfortable if we somehow limit the maximum > size of cpu_extra. I wonder if there might be a crazy setup with a lot > of possible CPUs and possible memory but with some minimal amount of > CPUs and memory at the boot time. Maybe. But considering that systems with a lot of CPUs *do* have a lot of memory, I wouldn't worry much about this, just like we don't worry about it now. Considering a _large_ 1024 core system and using the max value 21 for CONFIG_LOG_BUF_SHIFT, we would only allocate just over 2Gb of extra space -- trivial for such a system. And if it does break something, then heck, go fix you box and/or just reduce the percpu value. I guess that's a good reason to keep the default to 0 and let users play with it as they wish without compromising uninterested parties. afaict only x86 would be exposed to systems not booting if we fail to allocate. Thanks, Davidlohr ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs 2014-06-12 18:01 ` Davidlohr Bueso @ 2014-06-12 18:45 ` Joe Perches 2014-06-12 21:28 ` Luis R. Rodriguez 2014-06-16 20:17 ` Chris Metcalf 2014-06-12 21:12 ` Luis R. Rodriguez 1 sibling, 2 replies; 10+ messages in thread From: Joe Perches @ 2014-06-12 18:45 UTC (permalink / raw) To: Davidlohr Bueso, Chris Metcalf Cc: Petr Mládek, Luis R. Rodriguez, linux-kernel, Luis R. Rodriguez, Michal Hocko, Andrew Morton, Arun KS, Kees Cook (adding Chris Metcalf for arch/tile, I think this change might impact that arch) On Thu, 2014-06-12 at 11:01 -0700, Davidlohr Bueso wrote: > On Wed, 2014-06-11 at 11:34 +0200, Petr Mládek wrote: > > On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote: > > > From: "Luis R. Rodriguez" <mcgrof@suse.com> > > > > > > The default size of the ring buffer is too small for machines > > > with a large amount of CPUs under heavy load. What ends up > > > happening when debugging is the ring buffer overlaps and chews > > > up old messages making debugging impossible unless the size is > > > passed as a kernel parameter. An idle system upon boot up will > > > on average spew out only about one or two extra lines but where > > > this really matters is on heavy load and that will vary widely > > > depending on the system and environment. > > > > Thanks for looking at this. It is a pity to lose stracktrace when a huge > > machine Oopses just because the default ring buffer is too small. > > Agreed, I would very much welcome something like this. > > > > There are mechanisms to help increase the kernel ring buffer > > > for tracing through debugfs, and those interfaces even allow growing > > > the kernel ring buffer per CPU. We also have a static value which > > > can be passed upon boot. Relying on debugfs however is not ideal > > > for production, and relying on the value passed upon bootup is > > > can only used *after* an issue has creeped up. Instead of being > > > reactive this adds a proactive measure which lets you scale the > > > amount of contributions you'd expect to the kernel ring buffer > > > under load by each CPU in the worst case scenerio. > > > > > > We use num_possible_cpus() to avoid complexities which could be > > > introduced by dynamically changing the ring buffer size at run > > > time, num_possible_cpus() lets us use the upper limit on possible > > > number of CPUs therefore avoiding having to deal with hotplugging > > > CPUs on and off. This option is diabled by default, and if used > > > the kernel ring buffer size then can be computed as follows: > > > > > > size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) * __LOG_CPU_BUF_LEN > > > > > > Cc: Michal Hocko <mhocko@suse.cz> > > > Cc: Petr Mladek <pmladek@suse.cz> > > > Cc: Andrew Morton <akpm@linux-foundation.org> > > > Cc: Joe Perches <joe@perches.com> > > > Cc: Arun KS <arunks.linux@gmail.com> > > > Cc: Kees Cook <keescook@chromium.org> > > > Cc: linux-kernel@vger.kernel.org > > > Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> > > > --- > > > init/Kconfig | 28 ++++++++++++++++++++++++++++ > > > kernel/printk/printk.c | 6 ++++-- > > > 2 files changed, 32 insertions(+), 2 deletions(-) > > > > > > diff --git a/init/Kconfig b/init/Kconfig > > > index 9d3585b..1814436 100644 > > > --- a/init/Kconfig > > > +++ b/init/Kconfig > > > @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT > > > 13 => 8 KB > > > 12 => 4 KB > > > > > > +config LOG_CPU_BUF_SHIFT > > > + int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)" > > > + range 0 21 > > > + default 0 > > > + help > > > + The kernel ring buffer will get additional data logged onto it > > > + when multiple CPUs are supported. Typically the contributions is a > > > + few lines when idle however under under load this can vary and in the > > > + worst case it can mean loosing logging information. You can use this trivia: s/loosing/losing/ > > > + to set the maximum expected mount of amount of logging contribution > > > + under load by each CPU in the worst case scenerio. Select a size as > > > + a power of 2. For example if LOG_BUF_SHIFT is 18 and if your > > > + LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as > > > + follows having 16 CPUs as possible. > > > + > > > + ((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB > > > > It might be better to use the CPU_NUM-specific value as a minimum of > > the needed space. Linux distributions might want to distribute kernel > > with non-zero value and still use the static "__log_buf" on reasonable > > small systems. > > It should also depend on SMP and !BASE_SMALL. > I was wondering about disabling this by default as it would defeat the > purpose of being a proactive feature. Similarly, I worry about distros > choosing a correct default value on their own. > > > > + Where as typically you'd only end up with 256 KB. This is disabled > > > + by default with a value of 0. > > > > I would add: > > > > This value is ignored when "log_buf_len" commandline parameter > > is used. It forces the exact size of the ring buffer. > > ... and update Documentation/kernel-parameters.txt to be more > descriptive about this new functionality. > > > > + Examples: > > > + 17 => 128 KB > > > + 16 => 64 KB > > > + 15 => 32 KB > > > + 14 => 16 KB > > > + 13 => 8 KB > > > + 12 => 4 KB > > > > I think that we should make it more cleat that it is per-CPU here, > > for example: > > > > 17 => 128 KB for each CPU > > 16 => 64 KB for each CPU > > 15 => 32 KB for each CPU > > 14 => 16 KB for each CPU > > 13 => 8 KB for each CPU > > 12 => 4 KB for each CPU > > > > Agreed. > > > > # > > > # Architectures with an unreliable sched_clock() should select this: > > > # > > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > > > index 7228258..2023424 100644 > > > --- a/kernel/printk/printk.c > > > +++ b/kernel/printk/printk.c > > > @@ -246,6 +246,7 @@ static u32 clear_idx; > > > #define LOG_ALIGN __alignof__(struct printk_log) > > > #endif > > > #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT) > > > +#define __LOG_CPU_BUF_LEN (1 << CONFIG_LOG_CPU_BUF_SHIFT) > > > static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); > > > static char *log_buf = __log_buf; > > > static u32 log_buf_len = __LOG_BUF_LEN; > > > @@ -752,9 +753,10 @@ void __init setup_log_buf(int early) > > > unsigned long flags; > > > char *new_log_buf; > > > int free; > > > + int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN; > > If depending on SMP, you can remove the - 1 here. > > > > - if (!new_log_buf_len) > > > - return; > > > + if (!new_log_buf_len && cpu_extra > 1) > > > + new_log_buf_len = __LOG_BUF_LEN + cpu_extra; > > > > We still should return when both new_log_buf_len and cpu_extra are > > zero and call here: > > > > if (!new_log_buf_len) > > return; > > Yep. > > > Also I would feel more comfortable if we somehow limit the maximum > > size of cpu_extra. I wonder if there might be a crazy setup with a lot > > of possible CPUs and possible memory but with some minimal amount of > > CPUs and memory at the boot time. > > Maybe. But considering that systems with a lot of CPUs *do* have a lot > of memory, I wouldn't worry much about this, just like we don't worry > about it now. Considering a _large_ 1024 core system and using the max > value 21 for CONFIG_LOG_BUF_SHIFT, we would only allocate just over 2Gb > of extra space -- trivial for such a system. And if it does break > something, then heck, go fix you box and/or just reduce the percpu > value. I guess that's a good reason to keep the default to 0 and let > users play with it as they wish without compromising uninterested > parties. afaict only x86 would be exposed to systems not booting if we > fail to allocate. > > Thanks, > Davidlohr ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs 2014-06-12 18:45 ` Joe Perches @ 2014-06-12 21:28 ` Luis R. Rodriguez 2014-06-16 20:17 ` Chris Metcalf 1 sibling, 0 replies; 10+ messages in thread From: Luis R. Rodriguez @ 2014-06-12 21:28 UTC (permalink / raw) To: Joe Perches Cc: Davidlohr Bueso, Chris Metcalf, Petr Mládek, linux-kernel@vger.kernel.org, Michal Hocko, Andrew Morton, Arun KS, Kees Cook On Thu, Jun 12, 2014 at 11:45 AM, Joe Perches <joe@perches.com> wrote: > > trivia: s/loosing/losing/ Amended, thanks. Luis ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs 2014-06-12 18:45 ` Joe Perches 2014-06-12 21:28 ` Luis R. Rodriguez @ 2014-06-16 20:17 ` Chris Metcalf 1 sibling, 0 replies; 10+ messages in thread From: Chris Metcalf @ 2014-06-16 20:17 UTC (permalink / raw) To: Joe Perches, Davidlohr Bueso Cc: Petr Mládek, Luis R. Rodriguez, linux-kernel, Luis R. Rodriguez, Michal Hocko, Andrew Morton, Arun KS, Kees Cook On 6/12/2014 2:45 PM, Joe Perches wrote: > (adding Chris Metcalf for arch/tile, > I think this change might impact that arch) Thanks for the Cc. I've been following the discussion and I think it's on the right track for arch/tile. -- Chris Metcalf, Tilera Corp. http://www.tilera.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC] printk: allow increasing the ring buffer depending on the number of CPUs 2014-06-12 18:01 ` Davidlohr Bueso 2014-06-12 18:45 ` Joe Perches @ 2014-06-12 21:12 ` Luis R. Rodriguez 1 sibling, 0 replies; 10+ messages in thread From: Luis R. Rodriguez @ 2014-06-12 21:12 UTC (permalink / raw) To: Davidlohr Bueso Cc: Petr Mládek, linux-kernel@vger.kernel.org, Michal Hocko, Andrew Morton, Joe Perches, Arun KS, Kees Cook, Chris Metcalf On Thu, Jun 12, 2014 at 11:01 AM, Davidlohr Bueso <davidlohr@hp.com> wrote: > On Wed, 2014-06-11 at 11:34 +0200, Petr Mládek wrote: >> On Tue 2014-06-10 18:04:45, Luis R. Rodriguez wrote: >> > From: "Luis R. Rodriguez" <mcgrof@suse.com> >> > >> > The default size of the ring buffer is too small for machines >> > with a large amount of CPUs under heavy load. What ends up >> > happening when debugging is the ring buffer overlaps and chews >> > up old messages making debugging impossible unless the size is >> > passed as a kernel parameter. An idle system upon boot up will >> > on average spew out only about one or two extra lines but where >> > this really matters is on heavy load and that will vary widely >> > depending on the system and environment. >> >> Thanks for looking at this. It is a pity to lose stracktrace when a huge >> machine Oopses just because the default ring buffer is too small. > > Agreed, I would very much welcome something like this. Great! >> > There are mechanisms to help increase the kernel ring buffer >> > for tracing through debugfs, and those interfaces even allow growing >> > the kernel ring buffer per CPU. We also have a static value which >> > can be passed upon boot. Relying on debugfs however is not ideal >> > for production, and relying on the value passed upon bootup is >> > can only used *after* an issue has creeped up. Instead of being >> > reactive this adds a proactive measure which lets you scale the >> > amount of contributions you'd expect to the kernel ring buffer >> > under load by each CPU in the worst case scenerio. >> > >> > We use num_possible_cpus() to avoid complexities which could be >> > introduced by dynamically changing the ring buffer size at run >> > time, num_possible_cpus() lets us use the upper limit on possible >> > number of CPUs therefore avoiding having to deal with hotplugging >> > CPUs on and off. This option is diabled by default, and if used >> > the kernel ring buffer size then can be computed as follows: >> > >> > size = __LOG_BUF_LEN + (num_possible_cpus() - 1 ) * __LOG_CPU_BUF_LEN >> > >> > Cc: Michal Hocko <mhocko@suse.cz> >> > Cc: Petr Mladek <pmladek@suse.cz> >> > Cc: Andrew Morton <akpm@linux-foundation.org> >> > Cc: Joe Perches <joe@perches.com> >> > Cc: Arun KS <arunks.linux@gmail.com> >> > Cc: Kees Cook <keescook@chromium.org> >> > Cc: linux-kernel@vger.kernel.org >> > Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> >> > --- >> > init/Kconfig | 28 ++++++++++++++++++++++++++++ >> > kernel/printk/printk.c | 6 ++++-- >> > 2 files changed, 32 insertions(+), 2 deletions(-) >> > >> > diff --git a/init/Kconfig b/init/Kconfig >> > index 9d3585b..1814436 100644 >> > --- a/init/Kconfig >> > +++ b/init/Kconfig >> > @@ -806,6 +806,34 @@ config LOG_BUF_SHIFT >> > 13 => 8 KB >> > 12 => 4 KB >> > >> > +config LOG_CPU_BUF_SHIFT >> > + int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)" >> > + range 0 21 >> > + default 0 >> > + help >> > + The kernel ring buffer will get additional data logged onto it >> > + when multiple CPUs are supported. Typically the contributions is a >> > + few lines when idle however under under load this can vary and in the >> > + worst case it can mean loosing logging information. You can use this >> > + to set the maximum expected mount of amount of logging contribution >> > + under load by each CPU in the worst case scenerio. Select a size as >> > + a power of 2. For example if LOG_BUF_SHIFT is 18 and if your >> > + LOG_CPU_BUF_SHIFT is 12 your kernel ring buffer size will be as >> > + follows having 16 CPUs as possible. >> > + >> > + ((1 << 18) + ((16 - 1) * (1 << 12))) / 1024 = 316 KB >> >> It might be better to use the CPU_NUM-specific value as a minimum of >> the needed space. Linux distributions might want to distribute kernel >> with non-zero value and still use the static "__log_buf" on reasonable >> small systems. > > It should also depend on SMP and !BASE_SMALL. > I was wondering about disabling this by default as it would defeat the > purpose of being a proactive feature. Similarly, I worry about distros > choosing a correct default value on their own. True, it seems Petr's recommendations would address these concerns for systems under a certain amount of limit of number of CPUs, as is right now we require the contribution by CPU in worst case scenario to be > 1/2 of the default kernel ring buffer size that's > 64 number of CPUs. >> > + Where as typically you'd only end up with 256 KB. This is disabled >> > + by default with a value of 0. >> >> I would add: >> >> This value is ignored when "log_buf_len" commandline parameter >> is used. It forces the exact size of the ring buffer. > > ... and update Documentation/kernel-parameters.txt to be more > descriptive about this new functionality. Will do! >> > + Examples: >> > + 17 => 128 KB >> > + 16 => 64 KB >> > + 15 => 32 KB >> > + 14 => 16 KB >> > + 13 => 8 KB >> > + 12 => 4 KB >> >> I think that we should make it more cleat that it is per-CPU here, >> for example: >> >> 17 => 128 KB for each CPU >> 16 => 64 KB for each CPU >> 15 => 32 KB for each CPU >> 14 => 16 KB for each CPU >> 13 => 8 KB for each CPU >> 12 => 4 KB for each CPU >> > > Agreed. Amended. >> > # >> > # Architectures with an unreliable sched_clock() should select this: >> > # >> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c >> > index 7228258..2023424 100644 >> > --- a/kernel/printk/printk.c >> > +++ b/kernel/printk/printk.c >> > @@ -246,6 +246,7 @@ static u32 clear_idx; >> > #define LOG_ALIGN __alignof__(struct printk_log) >> > #endif >> > #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT) >> > +#define __LOG_CPU_BUF_LEN (1 << CONFIG_LOG_CPU_BUF_SHIFT) >> > static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); >> > static char *log_buf = __log_buf; >> > static u32 log_buf_len = __LOG_BUF_LEN; >> > @@ -752,9 +753,10 @@ void __init setup_log_buf(int early) >> > unsigned long flags; >> > char *new_log_buf; >> > int free; >> > + int cpu_extra = (num_possible_cpus() - 1) * __LOG_CPU_BUF_LEN; > > If depending on SMP, you can remove the - 1 here. Great point, however it still does have an implication of the minimum amount of CPUs that will trigger an increase with the heuristics suggested by Petr, with the -1 we'd require > 64 CPUs, without it 64 CPUs would trigger an increase. We need to decide on this, I will add the Kconfig requirement suggestions though. >> > - if (!new_log_buf_len) >> > - return; >> > + if (!new_log_buf_len && cpu_extra > 1) >> > + new_log_buf_len = __LOG_BUF_LEN + cpu_extra; >> >> We still should return when both new_log_buf_len and cpu_extra are >> zero and call here: >> >> if (!new_log_buf_len) >> return; > > Yep. Fixed, thanks. >> Also I would feel more comfortable if we somehow limit the maximum >> size of cpu_extra. I wonder if there might be a crazy setup with a lot >> of possible CPUs and possible memory but with some minimal amount of >> CPUs and memory at the boot time. > > Maybe. But considering that systems with a lot of CPUs *do* have a lot > of memory, I wouldn't worry much about this, just like we don't worry > about it now. Considering a _large_ 1024 core system and using the max > value 21 for CONFIG_LOG_BUF_SHIFT, we would only allocate just over 2Gb > of extra space -- trivial for such a system. And if it does break > something, then heck, go fix you box and/or just reduce the percpu > value. I guess that's a good reason to keep the default to 0 and let > users play with it as they wish without compromising uninterested > parties. afaict only x86 would be exposed to systems not booting if we > fail to allocate. Picking hard limit values is certainly subjective but if we can pick some heuristic that can scale without revisiting this much it'd be great, I think Petr's new suggestion of having the contribution be more than the default kernel ring buffer could help mitigate most issues on smaller systems, a default of 12 (4KB contribution per CPU) is also reasonably small I think based on the computations I've made even for crazy large beasts. Luis ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-06-16 20:17 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-06-11 1:04 [RFC] printk: allow increasing the ring buffer depending on the number of CPUs Luis R. Rodriguez 2014-06-11 9:34 ` Petr Mládek 2014-06-11 21:47 ` Luis R. Rodriguez 2014-06-12 13:05 ` Petr Mládek 2014-06-12 20:22 ` Luis R. Rodriguez 2014-06-12 18:01 ` Davidlohr Bueso 2014-06-12 18:45 ` Joe Perches 2014-06-12 21:28 ` Luis R. Rodriguez 2014-06-16 20:17 ` Chris Metcalf 2014-06-12 21:12 ` Luis R. Rodriguez
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox