* [PATCH 0/4] init: Shrink early messages to prevent overflowing the kernel log buffer
@ 2011-02-25 18:06 Mike Travis
2011-02-25 18:06 ` [PATCH 1/4] printk: Allocate kernel log buffer earlier Mike Travis
` (3 more replies)
0 siblings, 4 replies; 32+ messages in thread
From: Mike Travis @ 2011-02-25 18:06 UTC (permalink / raw)
To: Ingo Molnar
Cc: David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel
On larger systems, information in the kernel log is lost because
there is so much early text printed, that it overflows the static
log buffer before the log_buf_len kernel parameter can be processed,
and a bigger log buffer allocated.
Distros are relunctant to increase memory usage by increasing the
size of the static log buffer, so minimize the problem by allocating
the new log buffer as early as possible. Unfortunately allocating
early does not recover all the early messages, so we also need to
reduce the amount of characters those early messages generate.
Here are the stats testing on a system with 248 nodes, 606 EFI
mem ranges, 1984 cores
after get_log_buff_early: (17% overflow)
[ 0.000000] early log_buf free: -45723/262183(-17%)
[ 0.000000] first line: : mem339: type=7, attr=0xf, range=[0x00000e6000000000-0x00000e6fff000000) (6552
Here I enabled some cores that were disabled so now the system
has 248 nodes, 606 EFI mem ranges, 2368 cores.
after minimize-time-zero-msgs: (5% overflow)
[0] early log_buf free: -15184/262172(-5%)
[0] first line: [0x000000007226e000-0x0000000072271000) (0MB) <6>[0] EFI: mem67: type=3, attr=0
after minimize-srat-msgs.v2: (28% free)
[0] early log_buf free: 73556/188616(28%)
[0] first line: <6>[0] Initializing cgroup subsys cpuset <6>[0] Initializing cgroup subsys cpu
Some previous stats from testing these changes on our current lab
UV systems. (Both of these systems lost all of the e820 and EFI
memmap ranges before the changes.)
System X:
8,793,945,145,344 bytes of system memory
256 nodes
599 EFI Mem ranges
4096 cpu_ids
43% of static log buffer unused
System Y:
11,779,115,188,224 bytes of system memory
492 Nodes
976 EFI Mem ranges
1968 cpu_ids
17% of static log buffer unused
The last stat is how close the static log buffer came
to overflowing. While these resources are fairly close
to today's max limits, there is not a lot of head room
for growth.
An alternative for the future might be to create a larger
static log buffer in the __initdata section, and then
always allocate a dynamically sized log buffer to replace
it. This would also allow shrinking the log buffer for
memory tight situations. But it would add complexity to
the code.
--
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-25 18:06 [PATCH 0/4] init: Shrink early messages to prevent overflowing the kernel log buffer Mike Travis
@ 2011-02-25 18:06 ` Mike Travis
2011-02-27 12:09 ` Ingo Molnar
2011-02-25 18:06 ` [PATCH 2/4] printk: Break out printk_time Mike Travis
` (2 subsequent siblings)
3 siblings, 1 reply; 32+ messages in thread
From: Mike Travis @ 2011-02-25 18:06 UTC (permalink / raw)
To: Ingo Molnar
Cc: David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel
[-- Attachment #1: get_log_buff_early --]
[-- Type: text/plain, Size: 4291 bytes --]
On larger systems, because of the numerous ACPI, Bootmem and EFI
messages, the static log buffer overflows before the larger one
specified by the log_buf_len param is allocated. Minimize the
potential for overflow by allocating the new log buffer as soon
as possible.
We do this by changing the log_buf_len from an early_param to a
_setup param. But _setup params are processed before the
alloc_bootmem is available, so this function will now just save
the requested log buf len. The real work routine (setup_log_buf)
is called after bootmem is available.
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Jack Steiner <steiner@sgi.com>
Reviewed-by: Robin Holt <holt@sgi.com>
---
arch/x86/kernel/setup.c | 5 +++
include/linux/printk.h | 4 ++
init/main.c | 1
kernel/printk.c | 75 ++++++++++++++++++++++++++++--------------------
4 files changed, 54 insertions(+), 31 deletions(-)
--- linux.orig/arch/x86/kernel/setup.c
+++ linux/arch/x86/kernel/setup.c
@@ -1007,6 +1007,11 @@ void __init setup_arch(char **cmdline_p)
memblock_find_dma_reserve();
dma32_reserve_bootmem();
+ /*
+ * Allocate bigger log buffer as early as possible
+ */
+ setup_log_buf();
+
#ifdef CONFIG_KVM_CLOCK
kvmclock_init();
#endif
--- linux.orig/include/linux/printk.h
+++ linux/include/linux/printk.h
@@ -1,6 +1,8 @@
#ifndef __KERNEL_PRINTK__
#define __KERNEL_PRINTK__
+#include <linux/init.h>
+
extern const char linux_banner[];
extern const char linux_proc_banner[];
@@ -89,6 +91,8 @@ int no_printk(const char *fmt, ...)
extern asmlinkage __attribute__ ((format (printf, 1, 2)))
void early_printk(const char *fmt, ...);
+void __init setup_log_buf(void);
+
extern int printk_needs_cpu(int cpu);
extern void printk_tick(void);
--- linux.orig/init/main.c
+++ linux/init/main.c
@@ -592,6 +592,7 @@ asmlinkage void __init start_kernel(void
* These use large bootmem allocations and must precede
* kmem_cache_init()
*/
+ setup_log_buf();
pidhash_init();
vfs_caches_init_early();
sort_main_extable();
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -162,46 +162,59 @@ void log_buf_kexec_setup(void)
}
#endif
+/* requested log_buf_len from kernel cmdline */
+static unsigned long __initdata new_log_buf_len;
+
+/* save requested log_buf_len since it's too early to process it */
static int __init log_buf_len_setup(char *str)
{
unsigned size = memparse(str, &str);
- unsigned long flags;
if (size)
size = roundup_pow_of_two(size);
- if (size > log_buf_len) {
- unsigned start, dest_idx, offset;
- char *new_log_buf;
-
- new_log_buf = alloc_bootmem(size);
- if (!new_log_buf) {
- printk(KERN_WARNING "log_buf_len: allocation failed\n");
- goto out;
- }
-
- spin_lock_irqsave(&logbuf_lock, flags);
- log_buf_len = size;
- log_buf = new_log_buf;
-
- offset = start = min(con_start, log_start);
- dest_idx = 0;
- while (start != log_end) {
- log_buf[dest_idx] = __log_buf[start & (__LOG_BUF_LEN - 1)];
- start++;
- dest_idx++;
- }
- log_start -= offset;
- con_start -= offset;
- log_end -= offset;
- spin_unlock_irqrestore(&logbuf_lock, flags);
+ if (size > log_buf_len)
+ new_log_buf_len = size;
- printk(KERN_NOTICE "log_buf_len: %d\n", log_buf_len);
- }
-out:
- return 1;
+ return 0;
}
+early_param("log_buf_len", log_buf_len_setup);
-__setup("log_buf_len=", log_buf_len_setup);
+void __init setup_log_buf(void)
+{
+ unsigned long flags;
+ unsigned start, dest_idx, offset;
+ char *new_log_buf;
+ int free;
+
+ if (!new_log_buf_len)
+ return;
+
+ new_log_buf = alloc_bootmem(new_log_buf_len);
+
+ spin_lock_irqsave(&logbuf_lock, flags);
+ log_buf_len = new_log_buf_len;
+ log_buf = new_log_buf;
+ new_log_buf_len = 0;
+ free = __LOG_BUF_LEN - log_end;
+
+ offset = start = min(con_start, log_start);
+ dest_idx = 0;
+ while (start != log_end) {
+ unsigned log_idx_mask = start & (__LOG_BUF_LEN - 1);
+
+ log_buf[dest_idx] = __log_buf[log_idx_mask];
+ start++;
+ dest_idx++;
+ }
+ log_start -= offset;
+ con_start -= offset;
+ log_end -= offset;
+ spin_unlock_irqrestore(&logbuf_lock, flags);
+
+ pr_info("log_buf_len: %d\n", log_buf_len);
+ pr_info("early log buf free: %d(%d%%)\n",
+ free, (free * 100) / __LOG_BUF_LEN);
+}
#ifdef CONFIG_BOOT_PRINTK_DELAY
--
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 2/4] printk: Break out printk_time
2011-02-25 18:06 [PATCH 0/4] init: Shrink early messages to prevent overflowing the kernel log buffer Mike Travis
2011-02-25 18:06 ` [PATCH 1/4] printk: Allocate kernel log buffer earlier Mike Travis
@ 2011-02-25 18:06 ` Mike Travis
2011-02-27 11:56 ` Ingo Molnar
2011-02-25 18:06 ` [PATCH 3/4] printk: Minimize time zero output Mike Travis
2011-02-25 18:06 ` [PATCH 4/4] x86: Minimize SRAT messages Mike Travis
3 siblings, 1 reply; 32+ messages in thread
From: Mike Travis @ 2011-02-25 18:06 UTC (permalink / raw)
To: Ingo Molnar
Cc: David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel
[-- Attachment #1: break-out-printk_time --]
[-- Type: text/plain, Size: 1568 bytes --]
Clean up printk_time by making it a separate function.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: David Rientjes <rientjes@google.com>
---
kernel/printk.c | 38 +++++++++++++++++++++-----------------
1 file changed, 21 insertions(+), 17 deletions(-)
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -739,6 +739,24 @@ static inline void printk_delay(void)
}
}
+/* Follow the token with the time */
+static inline int printk_emit_time(void)
+{
+ char tbuf[50], *tp;
+ unsigned tlen;
+ unsigned long long t;
+ unsigned long microsec_rem;
+
+ t = cpu_clock(printk_cpu);
+ microsec_rem = do_div(t, 1000000000) / 1000;
+ tlen = sprintf(tbuf, "[%5lu.%06lu] ", (unsigned long)t, microsec_rem);
+
+ for (tp = tbuf; tp < tbuf + tlen; tp++)
+ emit_log_char(*tp);
+
+ return tlen;
+}
+
asmlinkage int vprintk(const char *fmt, va_list args)
{
int printed_len = 0;
@@ -823,23 +841,9 @@ asmlinkage int vprintk(const char *fmt,
printed_len += 3;
new_text_line = 0;
- if (printk_time) {
- /* Follow the token with the time */
- char tbuf[50], *tp;
- unsigned tlen;
- unsigned long long t;
- unsigned long nanosec_rem;
-
- t = cpu_clock(printk_cpu);
- nanosec_rem = do_div(t, 1000000000);
- tlen = sprintf(tbuf, "[%5lu.%06lu] ",
- (unsigned long) t,
- nanosec_rem / 1000);
-
- for (tp = tbuf; tp < tbuf + tlen; tp++)
- emit_log_char(*tp);
- printed_len += tlen;
- }
+ /* add time if requested */
+ if (printk_time)
+ printed_len += printk_emit_time();
if (!*p)
break;
--
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 3/4] printk: Minimize time zero output
2011-02-25 18:06 [PATCH 0/4] init: Shrink early messages to prevent overflowing the kernel log buffer Mike Travis
2011-02-25 18:06 ` [PATCH 1/4] printk: Allocate kernel log buffer earlier Mike Travis
2011-02-25 18:06 ` [PATCH 2/4] printk: Break out printk_time Mike Travis
@ 2011-02-25 18:06 ` Mike Travis
2011-02-25 18:06 ` [PATCH 4/4] x86: Minimize SRAT messages Mike Travis
3 siblings, 0 replies; 32+ messages in thread
From: Mike Travis @ 2011-02-25 18:06 UTC (permalink / raw)
To: Ingo Molnar
Cc: David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel
[-- Attachment #1: minimize-time-zero-msgs --]
[-- Type: text/plain, Size: 1360 bytes --]
Reduce the length for time zero messages by only printing "[0] ".
Before:
[ 0.000000] <message>
After:
[0] <message>
There will be somewhere around 4000 to 5000 messages(*) on a fully
configured large NUMA system, before allocation of the log buf can
happen. By removing 11 bytes from each the net gain will be from
44,000 to 55,000 bytes of redundant information removed.
(* - this is after the minimize-srat-msgs patch is applied.)
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Jack Steiner <steiner@sgi.com>
Reviewed-by: Robin Holt <holt@sgi.com>
---
kernel/printk.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -745,12 +745,17 @@ static inline int printk_emit_time(void)
char tbuf[50], *tp;
unsigned tlen;
unsigned long long t;
- unsigned long microsec_rem;
t = cpu_clock(printk_cpu);
- microsec_rem = do_div(t, 1000000000) / 1000;
- tlen = sprintf(tbuf, "[%5lu.%06lu] ", (unsigned long)t, microsec_rem);
+ if (likely(t)) {
+ unsigned long microsec_rem = do_div(t, 1000000000) / 1000;
+ tlen = sprintf(tbuf, "[%5lu.%06lu] ",
+ (unsigned long)t, microsec_rem);
+ } else {
+ /* reduce byte count in log when time is zero */
+ tlen = sprintf(tbuf, "[0] ");
+ }
for (tp = tbuf; tp < tbuf + tlen; tp++)
emit_log_char(*tp);
--
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 4/4] x86: Minimize SRAT messages
2011-02-25 18:06 [PATCH 0/4] init: Shrink early messages to prevent overflowing the kernel log buffer Mike Travis
` (2 preceding siblings ...)
2011-02-25 18:06 ` [PATCH 3/4] printk: Minimize time zero output Mike Travis
@ 2011-02-25 18:06 ` Mike Travis
2011-02-27 12:03 ` Ingo Molnar
3 siblings, 1 reply; 32+ messages in thread
From: Mike Travis @ 2011-02-25 18:06 UTC (permalink / raw)
To: Ingo Molnar
Cc: David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel
[-- Attachment #1: minimize-srat-msgs.v2 --]
[-- Type: text/plain, Size: 2982 bytes --]
Condense the SRAT: messages to show all the APIC id's on one line for
each Node. This not only saves space in the log buf, it also makes
it easier to spot inconsistencies in core to node placement.
On a system with 2368 cores on 248 nodes the change will be...
Was 2368 lines (for 2368 cores):
779 [0] SRAT: PXM 0 -> APIC 0x0000 -> Node 0
780 [0] SRAT: PXM 0 -> APIC 0x0002 -> Node 0
781 [0] SRAT: PXM 0 -> APIC 0x0004 -> Node 0
...
3145 [0] SRAT: PXM 247 -> APIC 0x3df0 -> Node 247
3146 [0] SRAT: PXM 247 -> APIC 0x3df2 -> Node 247
Now it's 248 lines (for 248 Nodes):
821 [0] SRAT: Node 0: PXM:APIC 0:0x0 :0x2 :0x4 :0x10 :0x12 ...
822 [0] SRAT: Node 1: PXM:APIC 1:0x40 :0x42 :0x44 :0x50 :0x52 ...
823 [0] SRAT: Node 2: PXM:APIC 2:0x80 :0x82 :0x84 :0x90 :0x92 ...
...
1067 [0] SRAT: Node 246: PXM:APIC 246:0x3d80 :0x3d82 :0x3d84 :0x3d90 ...
1068 [0] SRAT: Node 247: PXM:APIC 247:0x3dc0 :0x3dc2 :0x3dc4 :0x3dd2 ...
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Jack Steiner <steiner@sgi.com>
Reviewed-by: Robin Holt <holt@sgi.com>
---
arch/x86/mm/srat_64.c | 19 +++++++++++++++++--
drivers/acpi/numa.c | 7 +++++++
2 files changed, 24 insertions(+), 2 deletions(-)
--- linux.orig/arch/x86/mm/srat_64.c
+++ linux/arch/x86/mm/srat_64.c
@@ -110,6 +110,12 @@ void __init acpi_numa_slit_init(struct a
memblock_x86_reserve_range(phys, phys + length, "ACPI SLIT");
}
+/*
+ * Keep track of previous node and PXM values so we can combine
+ * same ones onto a single line.
+ */
+static int __initdata last_node = NUMA_NO_NODE, last_pxm = PXM_INVAL;
+
/* Callback for Proximity Domain -> x2APIC mapping */
void __init
acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
@@ -141,8 +147,17 @@ acpi_numa_x2apic_affinity_init(struct ac
set_apicid_to_node(apic_id, node);
node_set(node, cpu_nodes_parsed);
acpi_numa = 1;
- printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n",
- pxm, apic_id, node);
+ if (node != last_node) {
+ pr_info("SRAT: Node %u: PXM:APIC %u:0x%x",
+ node, pxm, apic_id);
+ last_node = node;
+ last_pxm = pxm;
+ } else if (pxm != last_pxm) {
+ pr_cont(" %u:0x%x", pxm, apic_id);
+ last_pxm = pxm;
+ } else {
+ pr_cont(" :0x%x", apic_id);
+ }
}
/* Callback for Proximity Domain -> LAPIC mapping */
--- linux.orig/drivers/acpi/numa.c
+++ linux/drivers/acpi/numa.c
@@ -286,6 +286,13 @@ int __init acpi_numa_init(void)
if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
acpi_parse_x2apic_affinity, 0);
+ /*
+ * Parsing ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY entries place
+ * multiple CPU's on the same Node line. This can leave the
+ * last entry "dangling" without a newline. Insert it here.
+ */
+ pr_cont("\n");
+
acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
acpi_parse_processor_affinity, 0);
ret = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
--
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/4] printk: Break out printk_time
2011-02-25 18:06 ` [PATCH 2/4] printk: Break out printk_time Mike Travis
@ 2011-02-27 11:56 ` Ingo Molnar
0 siblings, 0 replies; 32+ messages in thread
From: Ingo Molnar @ 2011-02-27 11:56 UTC (permalink / raw)
To: Mike Travis
Cc: David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel
* Mike Travis <travis@sgi.com> wrote:
> + /* add time if requested */
> + if (printk_time)
> + printed_len += printk_emit_time();
Why isnt't the printk_time condition pushed into printk_emit_time() as well?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/4] x86: Minimize SRAT messages
2011-02-25 18:06 ` [PATCH 4/4] x86: Minimize SRAT messages Mike Travis
@ 2011-02-27 12:03 ` Ingo Molnar
2011-02-28 19:41 ` Mike Travis
0 siblings, 1 reply; 32+ messages in thread
From: Ingo Molnar @ 2011-02-27 12:03 UTC (permalink / raw)
To: Mike Travis
Cc: David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel
* Mike Travis <travis@sgi.com> wrote:
> Condense the SRAT: messages to show all the APIC id's on one line for
> each Node. This not only saves space in the log buf, it also makes
> it easier to spot inconsistencies in core to node placement.
>
> On a system with 2368 cores on 248 nodes the change will be...
>
> Was 2368 lines (for 2368 cores):
>
> 779 [0] SRAT: PXM 0 -> APIC 0x0000 -> Node 0
> 780 [0] SRAT: PXM 0 -> APIC 0x0002 -> Node 0
> 781 [0] SRAT: PXM 0 -> APIC 0x0004 -> Node 0
> ...
> 3145 [0] SRAT: PXM 247 -> APIC 0x3df0 -> Node 247
> 3146 [0] SRAT: PXM 247 -> APIC 0x3df2 -> Node 247
>
> Now it's 248 lines (for 248 Nodes):
>
> 821 [0] SRAT: Node 0: PXM:APIC 0:0x0 :0x2 :0x4 :0x10 :0x12 ...
> 822 [0] SRAT: Node 1: PXM:APIC 1:0x40 :0x42 :0x44 :0x50 :0x52 ...
> 823 [0] SRAT: Node 2: PXM:APIC 2:0x80 :0x82 :0x84 :0x90 :0x92 ...
> ...
> 1067 [0] SRAT: Node 246: PXM:APIC 246:0x3d80 :0x3d82 :0x3d84 :0x3d90 ...
> 1068 [0] SRAT: Node 247: PXM:APIC 247:0x3dc0 :0x3dc2 :0x3dc4 :0x3dd2 ...
>
> Signed-off-by: Mike Travis <travis@sgi.com>
> Reviewed-by: Jack Steiner <steiner@sgi.com>
> Reviewed-by: Robin Holt <holt@sgi.com>
> ---
> arch/x86/mm/srat_64.c | 19 +++++++++++++++++--
> drivers/acpi/numa.c | 7 +++++++
> 2 files changed, 24 insertions(+), 2 deletions(-)
>
> --- linux.orig/arch/x86/mm/srat_64.c
> +++ linux/arch/x86/mm/srat_64.c
> @@ -110,6 +110,12 @@ void __init acpi_numa_slit_init(struct a
> memblock_x86_reserve_range(phys, phys + length, "ACPI SLIT");
> }
>
> +/*
> + * Keep track of previous node and PXM values so we can combine
> + * same ones onto a single line.
> + */
> +static int __initdata last_node = NUMA_NO_NODE, last_pxm = PXM_INVAL;
> +
> /* Callback for Proximity Domain -> x2APIC mapping */
> void __init
> acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
> @@ -141,8 +147,17 @@ acpi_numa_x2apic_affinity_init(struct ac
> set_apicid_to_node(apic_id, node);
> node_set(node, cpu_nodes_parsed);
> acpi_numa = 1;
> - printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n",
> - pxm, apic_id, node);
> + if (node != last_node) {
> + pr_info("SRAT: Node %u: PXM:APIC %u:0x%x",
> + node, pxm, apic_id);
> + last_node = node;
> + last_pxm = pxm;
> + } else if (pxm != last_pxm) {
> + pr_cont(" %u:0x%x", pxm, apic_id);
> + last_pxm = pxm;
> + } else {
> + pr_cont(" :0x%x", apic_id);
> + }
> }
>
> /* Callback for Proximity Domain -> LAPIC mapping */
> --- linux.orig/drivers/acpi/numa.c
> +++ linux/drivers/acpi/numa.c
> @@ -286,6 +286,13 @@ int __init acpi_numa_init(void)
> if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
> acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
> acpi_parse_x2apic_affinity, 0);
> + /*
> + * Parsing ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY entries place
> + * multiple CPU's on the same Node line. This can leave the
> + * last entry "dangling" without a newline. Insert it here.
> + */
> + pr_cont("\n");
This is quite ugly as it breaks the genericity of the ACPI parsing here. Is there no
cleaner method that keeps this deinit \n printing somehow within the realm of x86?
Also, can there be cases where there's no 'dangling' line pending? In that case the
\n will be superfluous here.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-25 18:06 ` [PATCH 1/4] printk: Allocate kernel log buffer earlier Mike Travis
@ 2011-02-27 12:09 ` Ingo Molnar
2011-02-27 12:15 ` Ingo Molnar
2011-02-28 19:14 ` Mike Travis
0 siblings, 2 replies; 32+ messages in thread
From: Ingo Molnar @ 2011-02-27 12:09 UTC (permalink / raw)
To: Mike Travis
Cc: David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel, Tejun Heo, Linus Torvalds,
Yinghai Lu
* Mike Travis <travis@sgi.com> wrote:
> On larger systems, because of the numerous ACPI, Bootmem and EFI
> messages, the static log buffer overflows before the larger one
> specified by the log_buf_len param is allocated. Minimize the
> potential for overflow by allocating the new log buffer as soon
> as possible.
>
> We do this by changing the log_buf_len from an early_param to a
> _setup param. But _setup params are processed before the
> alloc_bootmem is available, so this function will now just save
> the requested log buf len. The real work routine (setup_log_buf)
> is called after bootmem is available.
>
> Signed-off-by: Mike Travis <travis@sgi.com>
> Reviewed-by: Jack Steiner <steiner@sgi.com>
> Reviewed-by: Robin Holt <holt@sgi.com>
> ---
> arch/x86/kernel/setup.c | 5 +++
> include/linux/printk.h | 4 ++
> init/main.c | 1
> kernel/printk.c | 75 ++++++++++++++++++++++++++++--------------------
> 4 files changed, 54 insertions(+), 31 deletions(-)
Well, the modern allocation method is memblock - available on all major
architectures.
You could avoid all this ugly workaround of bootmem limitations by moving the
allocation to memblock_alloc() and desupporting the log_buf_len= boot parameter
on non-memblock architectures.
kernel log buffer size can be configured via the .config so they will not be left
without larger buffers.
Doing this should also have the advantage of getting all the early x86 messages into
the larger buffer already, reducing the pressure to apply some of the other patches
in your series.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-27 12:09 ` Ingo Molnar
@ 2011-02-27 12:15 ` Ingo Molnar
2011-02-28 1:34 ` Yinghai Lu
2011-02-28 19:14 ` Mike Travis
1 sibling, 1 reply; 32+ messages in thread
From: Ingo Molnar @ 2011-02-27 12:15 UTC (permalink / raw)
To: Mike Travis
Cc: David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel, Tejun Heo, Linus Torvalds,
Yinghai Lu
* Ingo Molnar <mingo@elte.hu> wrote:
> You could avoid all this ugly workaround of bootmem limitations by moving the
> allocation to memblock_alloc() and desupporting the log_buf_len= boot parameter on
> non-memblock architectures.
memblock_alloc() could return -ENOSYS on architectures that do not implement it -
thus enabling such optional features without ugly #ifdef CONFIG_HAVE_MEMBLOCK
conditionals.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-27 12:15 ` Ingo Molnar
@ 2011-02-28 1:34 ` Yinghai Lu
2011-02-28 8:01 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 32+ messages in thread
From: Yinghai Lu @ 2011-02-28 1:34 UTC (permalink / raw)
To: Ingo Molnar, Mike Travis
Cc: David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel, Tejun Heo, Linus Torvalds
On 02/27/2011 04:15 AM, Ingo Molnar wrote:
>
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>> You could avoid all this ugly workaround of bootmem limitations by moving the
>> allocation to memblock_alloc() and desupporting the log_buf_len= boot parameter on
>> non-memblock architectures.
>
> memblock_alloc() could return -ENOSYS on architectures that do not implement it -
> thus enabling such optional features without ugly #ifdef CONFIG_HAVE_MEMBLOCK
> conditionals.
Mike,
please check updated patch...
with the memblock change, you don't need to change acpi SRAT handling etc any more.
new buffer will be near high mem under 4g.
Thanks
Yinghai
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index a47fe00..69b8995 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -974,6 +974,11 @@ void __init setup_arch(char **cmdline_p)
memblock.current_limit = get_max_mapped();
/*
+ * Allocate bigger log buffer as early as possible
+ */
+ setup_log_buf();
+
+ /*
* NOTE: On x86-32, only from this point on, fixmaps are ready for use.
*/
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 62a10c2..c3ade22 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -2,6 +2,8 @@
#define _LINUX_MEMBLOCK_H
#ifdef __KERNEL__
+#define MEMBLOCK_ERROR 0
+
#ifdef CONFIG_HAVE_MEMBLOCK
/*
* Logical memory blocks.
@@ -20,7 +22,6 @@
#include <asm/memblock.h>
#define INIT_MEMBLOCK_REGIONS 128
-#define MEMBLOCK_ERROR 0
struct memblock_region {
phys_addr_t base;
@@ -160,6 +161,12 @@ static inline unsigned long memblock_region_reserved_end_pfn(const struct memblo
#define __initdata_memblock
#endif
+#else
+static inline phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align)
+{
+ return MEMBLOCK_ERROR;
+}
+
#endif /* CONFIG_HAVE_MEMBLOCK */
#endif /* __KERNEL__ */
diff --git a/include/linux/printk.h b/include/linux/printk.h
index ee048e7..fd266a8 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -1,6 +1,8 @@
#ifndef __KERNEL_PRINTK__
#define __KERNEL_PRINTK__
+#include <linux/init.h>
+
extern const char linux_banner[];
extern const char linux_proc_banner[];
@@ -89,6 +91,8 @@ int no_printk(const char *fmt, ...)
extern asmlinkage __attribute__ ((format (printf, 1, 2)))
void early_printk(const char *fmt, ...);
+void __init setup_log_buf(void);
+
extern int printk_needs_cpu(int cpu);
extern void printk_tick(void);
diff --git a/init/main.c b/init/main.c
index 33c37c3..2085bb3 100644
--- a/init/main.c
+++ b/init/main.c
@@ -592,6 +592,7 @@ asmlinkage void __init start_kernel(void)
* These use large bootmem allocations and must precede
* kmem_cache_init()
*/
+ setup_log_buf();
pidhash_init();
vfs_caches_init_early();
sort_main_extable();
diff --git a/kernel/printk.c b/kernel/printk.c
index 3623152..14fa4d0 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -31,6 +31,7 @@
#include <linux/smp.h>
#include <linux/security.h>
#include <linux/bootmem.h>
+#include <linux/memblock.h>
#include <linux/syscalls.h>
#include <linux/kexec.h>
#include <linux/kdb.h>
@@ -162,46 +163,64 @@ void log_buf_kexec_setup(void)
}
#endif
+/* requested log_buf_len from kernel cmdline */
+static unsigned long __initdata new_log_buf_len;
+
+/* save requested log_buf_len since it's too early to process it */
static int __init log_buf_len_setup(char *str)
{
unsigned size = memparse(str, &str);
- unsigned long flags;
if (size)
size = roundup_pow_of_two(size);
- if (size > log_buf_len) {
- unsigned start, dest_idx, offset;
- char *new_log_buf;
+ if (size > log_buf_len)
+ new_log_buf_len = size;
- new_log_buf = alloc_bootmem(size);
- if (!new_log_buf) {
- printk(KERN_WARNING "log_buf_len: allocation failed\n");
- goto out;
- }
+ return 0;
+}
+early_param("log_buf_len", log_buf_len_setup);
- spin_lock_irqsave(&logbuf_lock, flags);
- log_buf_len = size;
- log_buf = new_log_buf;
-
- offset = start = min(con_start, log_start);
- dest_idx = 0;
- while (start != log_end) {
- log_buf[dest_idx] = __log_buf[start & (__LOG_BUF_LEN - 1)];
- start++;
- dest_idx++;
- }
- log_start -= offset;
- con_start -= offset;
- log_end -= offset;
- spin_unlock_irqrestore(&logbuf_lock, flags);
+void __init setup_log_buf(void)
+{
+ unsigned long flags;
+ unsigned start, dest_idx, offset;
+ char *new_log_buf;
+ phys_addr_t new_addr;
+ int free;
+
+ if (!new_log_buf_len)
+ return;
- printk(KERN_NOTICE "log_buf_len: %d\n", log_buf_len);
+ new_addr = memblock_alloc(new_log_buf_len, PAGE_SIZE);
+ if (new_addr != MEMBLOCK_ERROR)
+ new_log_buf = __va(new_addr);
+ else
+ new_log_buf = alloc_bootmem(new_log_buf_len);
+
+ spin_lock_irqsave(&logbuf_lock, flags);
+ log_buf_len = new_log_buf_len;
+ log_buf = new_log_buf;
+ new_log_buf_len = 0;
+ free = __LOG_BUF_LEN - log_end;
+
+ offset = start = min(con_start, log_start);
+ dest_idx = 0;
+ while (start != log_end) {
+ unsigned log_idx_mask = start & (__LOG_BUF_LEN - 1);
+
+ log_buf[dest_idx] = __log_buf[log_idx_mask];
+ start++;
+ dest_idx++;
}
-out:
- return 1;
-}
+ log_start -= offset;
+ con_start -= offset;
+ log_end -= offset;
+ spin_unlock_irqrestore(&logbuf_lock, flags);
-__setup("log_buf_len=", log_buf_len_setup);
+ pr_info("log_buf_len: %d\n", log_buf_len);
+ pr_info("early log buf free: %d(%d%%)\n",
+ free, (free * 100) / __LOG_BUF_LEN);
+}
#ifdef CONFIG_BOOT_PRINTK_DELAY
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-28 1:34 ` Yinghai Lu
@ 2011-02-28 8:01 ` Ingo Molnar
2011-02-28 19:26 ` Mike Travis
2011-02-28 8:06 ` Ingo Molnar
2011-02-28 19:23 ` Mike Travis
2 siblings, 1 reply; 32+ messages in thread
From: Ingo Molnar @ 2011-02-28 8:01 UTC (permalink / raw)
To: Yinghai Lu
Cc: Mike Travis, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel, Tejun Heo, Linus Torvalds
* Yinghai Lu <yinghai@kernel.org> wrote:
> + pr_info("log_buf_len: %d\n", log_buf_len);
> + pr_info("early log buf free: %d(%d%%)\n",
> + free, (free * 100) / __LOG_BUF_LEN);
Such debug printks should be removed from the final version of the patch ...
Thanks,
Ingo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-28 1:34 ` Yinghai Lu
2011-02-28 8:01 ` Ingo Molnar
@ 2011-02-28 8:06 ` Ingo Molnar
2011-02-28 19:18 ` Yinghai Lu
2011-02-28 19:29 ` Mike Travis
2011-02-28 19:23 ` Mike Travis
2 siblings, 2 replies; 32+ messages in thread
From: Ingo Molnar @ 2011-02-28 8:06 UTC (permalink / raw)
To: Yinghai Lu
Cc: Mike Travis, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel, Tejun Heo, Linus Torvalds
* Yinghai Lu <yinghai@kernel.org> wrote:
> + new_addr = memblock_alloc(new_log_buf_len, PAGE_SIZE);
> + if (new_addr != MEMBLOCK_ERROR)
> + new_log_buf = __va(new_addr);
> + else
> + new_log_buf = alloc_bootmem(new_log_buf_len);
alloc_bootmem() can fail, especially if someone uses a too large boot parameter
value - and your code does not check for failure.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-27 12:09 ` Ingo Molnar
2011-02-27 12:15 ` Ingo Molnar
@ 2011-02-28 19:14 ` Mike Travis
1 sibling, 0 replies; 32+ messages in thread
From: Mike Travis @ 2011-02-28 19:14 UTC (permalink / raw)
To: Ingo Molnar
Cc: David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel, Tejun Heo, Linus Torvalds,
Yinghai Lu
Ingo Molnar wrote:
> * Mike Travis <travis@sgi.com> wrote:
>
>> On larger systems, because of the numerous ACPI, Bootmem and EFI
>> messages, the static log buffer overflows before the larger one
>> specified by the log_buf_len param is allocated. Minimize the
>> potential for overflow by allocating the new log buffer as soon
>> as possible.
>>
>> We do this by changing the log_buf_len from an early_param to a
>> _setup param. But _setup params are processed before the
>> alloc_bootmem is available, so this function will now just save
>> the requested log buf len. The real work routine (setup_log_buf)
>> is called after bootmem is available.
>>
>> Signed-off-by: Mike Travis <travis@sgi.com>
>> Reviewed-by: Jack Steiner <steiner@sgi.com>
>> Reviewed-by: Robin Holt <holt@sgi.com>
>> ---
>> arch/x86/kernel/setup.c | 5 +++
>> include/linux/printk.h | 4 ++
>> init/main.c | 1
>> kernel/printk.c | 75 ++++++++++++++++++++++++++++--------------------
>> 4 files changed, 54 insertions(+), 31 deletions(-)
>
> Well, the modern allocation method is memblock - available on all major
> architectures.
>
> You could avoid all this ugly workaround of bootmem limitations by moving the
> allocation to memblock_alloc() and desupporting the log_buf_len= boot parameter
> on non-memblock architectures.
Is it really that ugly? I thought in some ways it cleaned it up.
I'm also hesitant to change code for other arch's when I can't test them. This
approach seemed to be the safest.
> kernel log buffer size can be configured via the .config so they will not be left
> without larger buffers.
We have asked about this, but distros are reluctant to increase memory usage
for their entire installed base. I think we're lucky they bumped it up to 256k
from the default 128k.
>
> Doing this should also have the advantage of getting all the early x86 messages into
> the larger buffer already, reducing the pressure to apply some of the other patches
> in your series.
There are only two and both remove only redundant information.
>
> Thanks,
>
> Ingo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-28 8:06 ` Ingo Molnar
@ 2011-02-28 19:18 ` Yinghai Lu
2011-02-28 19:29 ` Mike Travis
1 sibling, 0 replies; 32+ messages in thread
From: Yinghai Lu @ 2011-02-28 19:18 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mike Travis, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-acpi, x86,
linux-kernel, Tejun Heo, Linus Torvalds
On Mon, Feb 28, 2011 at 12:06 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Yinghai Lu <yinghai@kernel.org> wrote:
>
>> + new_addr = memblock_alloc(new_log_buf_len, PAGE_SIZE);
>> + if (new_addr != MEMBLOCK_ERROR)
>> + new_log_buf = __va(new_addr);
>> + else
>> + new_log_buf = alloc_bootmem(new_log_buf_len);
>
> alloc_bootmem() can fail, especially if someone uses a too large boot parameter
> value - and your code does not check for failure.
alloc_bootmem will panic if it fails.
static void * __init ___alloc_bootmem(unsigned long size, unsigned long align,
unsigned long goal, unsigned long limit)
{
void *mem = ___alloc_bootmem_nopanic(size, align, goal, limit);
if (mem)
return mem;
/*
* Whoops, we cannot satisfy the allocation request.
*/
printk(KERN_ALERT "bootmem alloc of %lu bytes failed!\n", size);
panic("Out of memory");
return NULL;
}
so it should be ok.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-28 1:34 ` Yinghai Lu
2011-02-28 8:01 ` Ingo Molnar
2011-02-28 8:06 ` Ingo Molnar
@ 2011-02-28 19:23 ` Mike Travis
2011-02-28 19:46 ` Yinghai Lu
2011-03-01 7:42 ` [PATCH 1/4] printk: Allocate kernel log buffer earlier Ingo Molnar
2 siblings, 2 replies; 32+ messages in thread
From: Mike Travis @ 2011-02-28 19:23 UTC (permalink / raw)
To: Yinghai Lu
Cc: Ingo Molnar, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel, Tejun Heo, Linus Torvalds
Yinghai Lu wrote:
> On 02/27/2011 04:15 AM, Ingo Molnar wrote:
>> * Ingo Molnar <mingo@elte.hu> wrote:
>>
>>> You could avoid all this ugly workaround of bootmem limitations by moving the
>>> allocation to memblock_alloc() and desupporting the log_buf_len= boot parameter on
>>> non-memblock architectures.
>> memblock_alloc() could return -ENOSYS on architectures that do not implement it -
>> thus enabling such optional features without ugly #ifdef CONFIG_HAVE_MEMBLOCK
>> conditionals.
>
> Mike,
>
> please check updated patch...
>
> with the memblock change, you don't need to change acpi SRAT handling etc any more.
I had to debug a weird ACPI -> Node mapping last week and the
"improved" SRAT messages helped that considerably. It was
far easier to spot which Node didn't have the correct assignments.
I'd submit that patch even without needing fewer (like 512 lines
max instead of 4096 lines max) bytes in the log buffer.
>
> new buffer will be near high mem under 4g.
Is this really that important? The patchset is ridiculously
simple as it is. Do I really need to spend more time on it?
I've already done 4 revisions of it and responded to all
objections.
I have a far more important patchset that I've been trying to
prepare that actually fixes broken code, instead of trying to
reduce a few thousand bytes in the log buffer.
And everyone that has experienced it here on our lab systems
say they like the improvements (even the ones that I had to
toss because of "that's not the way it was before" kind of
objections.)
Let's move on to far more important problems.
Thanks,
Mike
>
> Thanks
>
> Yinghai
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index a47fe00..69b8995 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -974,6 +974,11 @@ void __init setup_arch(char **cmdline_p)
> memblock.current_limit = get_max_mapped();
>
> /*
> + * Allocate bigger log buffer as early as possible
> + */
> + setup_log_buf();
> +
> + /*
> * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
> */
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 62a10c2..c3ade22 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -2,6 +2,8 @@
> #define _LINUX_MEMBLOCK_H
> #ifdef __KERNEL__
>
> +#define MEMBLOCK_ERROR 0
> +
> #ifdef CONFIG_HAVE_MEMBLOCK
> /*
> * Logical memory blocks.
> @@ -20,7 +22,6 @@
> #include <asm/memblock.h>
>
> #define INIT_MEMBLOCK_REGIONS 128
> -#define MEMBLOCK_ERROR 0
>
> struct memblock_region {
> phys_addr_t base;
> @@ -160,6 +161,12 @@ static inline unsigned long memblock_region_reserved_end_pfn(const struct memblo
> #define __initdata_memblock
> #endif
>
> +#else
> +static inline phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align)
> +{
> + return MEMBLOCK_ERROR;
> +}
> +
> #endif /* CONFIG_HAVE_MEMBLOCK */
>
> #endif /* __KERNEL__ */
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index ee048e7..fd266a8 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -1,6 +1,8 @@
> #ifndef __KERNEL_PRINTK__
> #define __KERNEL_PRINTK__
>
> +#include <linux/init.h>
> +
> extern const char linux_banner[];
> extern const char linux_proc_banner[];
>
> @@ -89,6 +91,8 @@ int no_printk(const char *fmt, ...)
> extern asmlinkage __attribute__ ((format (printf, 1, 2)))
> void early_printk(const char *fmt, ...);
>
> +void __init setup_log_buf(void);
> +
> extern int printk_needs_cpu(int cpu);
> extern void printk_tick(void);
>
> diff --git a/init/main.c b/init/main.c
> index 33c37c3..2085bb3 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -592,6 +592,7 @@ asmlinkage void __init start_kernel(void)
> * These use large bootmem allocations and must precede
> * kmem_cache_init()
> */
> + setup_log_buf();
> pidhash_init();
> vfs_caches_init_early();
> sort_main_extable();
> diff --git a/kernel/printk.c b/kernel/printk.c
> index 3623152..14fa4d0 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -31,6 +31,7 @@
> #include <linux/smp.h>
> #include <linux/security.h>
> #include <linux/bootmem.h>
> +#include <linux/memblock.h>
> #include <linux/syscalls.h>
> #include <linux/kexec.h>
> #include <linux/kdb.h>
> @@ -162,46 +163,64 @@ void log_buf_kexec_setup(void)
> }
> #endif
>
> +/* requested log_buf_len from kernel cmdline */
> +static unsigned long __initdata new_log_buf_len;
> +
> +/* save requested log_buf_len since it's too early to process it */
> static int __init log_buf_len_setup(char *str)
> {
> unsigned size = memparse(str, &str);
> - unsigned long flags;
>
> if (size)
> size = roundup_pow_of_two(size);
> - if (size > log_buf_len) {
> - unsigned start, dest_idx, offset;
> - char *new_log_buf;
> + if (size > log_buf_len)
> + new_log_buf_len = size;
>
> - new_log_buf = alloc_bootmem(size);
> - if (!new_log_buf) {
> - printk(KERN_WARNING "log_buf_len: allocation failed\n");
> - goto out;
> - }
> + return 0;
> +}
> +early_param("log_buf_len", log_buf_len_setup);
>
> - spin_lock_irqsave(&logbuf_lock, flags);
> - log_buf_len = size;
> - log_buf = new_log_buf;
> -
> - offset = start = min(con_start, log_start);
> - dest_idx = 0;
> - while (start != log_end) {
> - log_buf[dest_idx] = __log_buf[start & (__LOG_BUF_LEN - 1)];
> - start++;
> - dest_idx++;
> - }
> - log_start -= offset;
> - con_start -= offset;
> - log_end -= offset;
> - spin_unlock_irqrestore(&logbuf_lock, flags);
> +void __init setup_log_buf(void)
> +{
> + unsigned long flags;
> + unsigned start, dest_idx, offset;
> + char *new_log_buf;
> + phys_addr_t new_addr;
> + int free;
> +
> + if (!new_log_buf_len)
> + return;
>
> - printk(KERN_NOTICE "log_buf_len: %d\n", log_buf_len);
> + new_addr = memblock_alloc(new_log_buf_len, PAGE_SIZE);
> + if (new_addr != MEMBLOCK_ERROR)
> + new_log_buf = __va(new_addr);
> + else
> + new_log_buf = alloc_bootmem(new_log_buf_len);
> +
> + spin_lock_irqsave(&logbuf_lock, flags);
> + log_buf_len = new_log_buf_len;
> + log_buf = new_log_buf;
> + new_log_buf_len = 0;
> + free = __LOG_BUF_LEN - log_end;
> +
> + offset = start = min(con_start, log_start);
> + dest_idx = 0;
> + while (start != log_end) {
> + unsigned log_idx_mask = start & (__LOG_BUF_LEN - 1);
> +
> + log_buf[dest_idx] = __log_buf[log_idx_mask];
> + start++;
> + dest_idx++;
> }
> -out:
> - return 1;
> -}
> + log_start -= offset;
> + con_start -= offset;
> + log_end -= offset;
> + spin_unlock_irqrestore(&logbuf_lock, flags);
>
> -__setup("log_buf_len=", log_buf_len_setup);
> + pr_info("log_buf_len: %d\n", log_buf_len);
> + pr_info("early log buf free: %d(%d%%)\n",
> + free, (free * 100) / __LOG_BUF_LEN);
> +}
>
> #ifdef CONFIG_BOOT_PRINTK_DELAY
>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-28 8:01 ` Ingo Molnar
@ 2011-02-28 19:26 ` Mike Travis
2011-03-01 7:35 ` Ingo Molnar
0 siblings, 1 reply; 32+ messages in thread
From: Mike Travis @ 2011-02-28 19:26 UTC (permalink / raw)
To: Ingo Molnar
Cc: Yinghai Lu, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel, Tejun Heo, Linus Torvalds
Because I haven't been able to test on a fully configured
system, this might be crucial info to figure out how to
fix this when it happens on a customer system. Are you
saying this small line is any less important than the
other thousands and thousands of seemingly meaningless
lines?
Ingo Molnar wrote:
> * Yinghai Lu <yinghai@kernel.org> wrote:
>
>> + pr_info("log_buf_len: %d\n", log_buf_len);
>> + pr_info("early log buf free: %d(%d%%)\n",
>> + free, (free * 100) / __LOG_BUF_LEN);
>
> Such debug printks should be removed from the final version of the patch ...
>
> Thanks,
>
> Ingo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-28 8:06 ` Ingo Molnar
2011-02-28 19:18 ` Yinghai Lu
@ 2011-02-28 19:29 ` Mike Travis
1 sibling, 0 replies; 32+ messages in thread
From: Mike Travis @ 2011-02-28 19:29 UTC (permalink / raw)
To: Ingo Molnar
Cc: Yinghai Lu, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel, Tejun Heo, Linus Torvalds
Ingo Molnar wrote:
> * Yinghai Lu <yinghai@kernel.org> wrote:
>
>> + new_addr = memblock_alloc(new_log_buf_len, PAGE_SIZE);
>> + if (new_addr != MEMBLOCK_ERROR)
>> + new_log_buf = __va(new_addr);
>> + else
>> + new_log_buf = alloc_bootmem(new_log_buf_len);
>
> alloc_bootmem() can fail, especially if someone uses a too large boot parameter
> value - and your code does not check for failure.
alloc_bootmem does panic when it can't allocate memory.
Ingo, we have a "uvconfig" script that sets up the boot parameters (there
are many that are needed to be very specific). It sets up the log_buf_len
to be 8M. We will never overflow memory with that.
And if someone is stupid enough to try and allocate a log buffer that
consumes more memory than they have, then they have a different kind of
problem.
>
> Thanks,
>
> Ingo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/4] x86: Minimize SRAT messages
2011-02-27 12:03 ` Ingo Molnar
@ 2011-02-28 19:41 ` Mike Travis
2011-03-01 7:51 ` Ingo Molnar
0 siblings, 1 reply; 32+ messages in thread
From: Mike Travis @ 2011-02-28 19:41 UTC (permalink / raw)
To: Ingo Molnar
Cc: David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel
Ingo Molnar wrote:
> * Mike Travis <travis@sgi.com> wrote:
>
>> Condense the SRAT: messages to show all the APIC id's on one line for
>> each Node. This not only saves space in the log buf, it also makes
>> it easier to spot inconsistencies in core to node placement.
>>
>> On a system with 2368 cores on 248 nodes the change will be...
>>
>> Was 2368 lines (for 2368 cores):
>>
>> 779 [0] SRAT: PXM 0 -> APIC 0x0000 -> Node 0
>> 780 [0] SRAT: PXM 0 -> APIC 0x0002 -> Node 0
>> 781 [0] SRAT: PXM 0 -> APIC 0x0004 -> Node 0
>> ...
>> 3145 [0] SRAT: PXM 247 -> APIC 0x3df0 -> Node 247
>> 3146 [0] SRAT: PXM 247 -> APIC 0x3df2 -> Node 247
>>
>> Now it's 248 lines (for 248 Nodes):
>>
>> 821 [0] SRAT: Node 0: PXM:APIC 0:0x0 :0x2 :0x4 :0x10 :0x12 ...
>> 822 [0] SRAT: Node 1: PXM:APIC 1:0x40 :0x42 :0x44 :0x50 :0x52 ...
>> 823 [0] SRAT: Node 2: PXM:APIC 2:0x80 :0x82 :0x84 :0x90 :0x92 ...
>> ...
>> 1067 [0] SRAT: Node 246: PXM:APIC 246:0x3d80 :0x3d82 :0x3d84 :0x3d90 ...
>> 1068 [0] SRAT: Node 247: PXM:APIC 247:0x3dc0 :0x3dc2 :0x3dc4 :0x3dd2 ...
>>
>> Signed-off-by: Mike Travis <travis@sgi.com>
>> Reviewed-by: Jack Steiner <steiner@sgi.com>
>> Reviewed-by: Robin Holt <holt@sgi.com>
>> ---
>> arch/x86/mm/srat_64.c | 19 +++++++++++++++++--
>> drivers/acpi/numa.c | 7 +++++++
>> 2 files changed, 24 insertions(+), 2 deletions(-)
>>
>> --- linux.orig/arch/x86/mm/srat_64.c
>> +++ linux/arch/x86/mm/srat_64.c
>> @@ -110,6 +110,12 @@ void __init acpi_numa_slit_init(struct a
>> memblock_x86_reserve_range(phys, phys + length, "ACPI SLIT");
>> }
>>
>> +/*
>> + * Keep track of previous node and PXM values so we can combine
>> + * same ones onto a single line.
>> + */
>> +static int __initdata last_node = NUMA_NO_NODE, last_pxm = PXM_INVAL;
>> +
>> /* Callback for Proximity Domain -> x2APIC mapping */
>> void __init
>> acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
>> @@ -141,8 +147,17 @@ acpi_numa_x2apic_affinity_init(struct ac
>> set_apicid_to_node(apic_id, node);
>> node_set(node, cpu_nodes_parsed);
>> acpi_numa = 1;
>> - printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n",
>> - pxm, apic_id, node);
>> + if (node != last_node) {
>> + pr_info("SRAT: Node %u: PXM:APIC %u:0x%x",
>> + node, pxm, apic_id);
>> + last_node = node;
>> + last_pxm = pxm;
>> + } else if (pxm != last_pxm) {
>> + pr_cont(" %u:0x%x", pxm, apic_id);
>> + last_pxm = pxm;
>> + } else {
>> + pr_cont(" :0x%x", apic_id);
>> + }
>> }
>>
>> /* Callback for Proximity Domain -> LAPIC mapping */
>> --- linux.orig/drivers/acpi/numa.c
>> +++ linux/drivers/acpi/numa.c
>> @@ -286,6 +286,13 @@ int __init acpi_numa_init(void)
>> if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
>> acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
>> acpi_parse_x2apic_affinity, 0);
>> + /*
>> + * Parsing ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY entries place
>> + * multiple CPU's on the same Node line. This can leave the
>> + * last entry "dangling" without a newline. Insert it here.
>> + */
>> + pr_cont("\n");
>
> This is quite ugly as it breaks the genericity of the ACPI parsing here. Is there no
> cleaner method that keeps this deinit \n printing somehow within the realm of x86?
>
> Also, can there be cases where there's no 'dangling' line pending? In that case the
> \n will be superfluous here.
>
> Thanks,
>
> Ingo
Yes, David brought up the same point a couple of weeks ago. I've tried and
failed to find a solution, except that the printk function seems to add the
newline if there is not one. I asked if this was sufficient to rely on,
and no one spoke up. (Everyone is quick to object, but seemingly very slow
to agree.)
And yes, there will always be a dangling line. If the ACPI guys could tell
me how to predict when this is the last entry, I would gladly change it.
Thanks,
Mike
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-28 19:23 ` Mike Travis
@ 2011-02-28 19:46 ` Yinghai Lu
2011-02-28 20:02 ` Mike Travis
2011-03-01 7:42 ` [PATCH 1/4] printk: Allocate kernel log buffer earlier Ingo Molnar
1 sibling, 1 reply; 32+ messages in thread
From: Yinghai Lu @ 2011-02-28 19:46 UTC (permalink / raw)
To: Mike Travis
Cc: Ingo Molnar, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-acpi, x86,
linux-kernel, Tejun Heo, Linus Torvalds
On Mon, Feb 28, 2011 at 11:23 AM, Mike Travis <travis@sgi.com> wrote:
>
>
> Yinghai Lu wrote:
>>
>> On 02/27/2011 04:15 AM, Ingo Molnar wrote:
>>>
>>> * Ingo Molnar <mingo@elte.hu> wrote:
>>>
>>>> You could avoid all this ugly workaround of bootmem limitations by
>>>> moving the allocation to memblock_alloc() and desupporting the log_buf_len=
>>>> boot parameter on non-memblock architectures.
>>>
>>> memblock_alloc() could return -ENOSYS on architectures that do not
>>> implement it - thus enabling such optional features without ugly #ifdef
>>> CONFIG_HAVE_MEMBLOCK conditionals.
>>
>> Mike,
>>
>> please check updated patch...
>>
>> with the memblock change, you don't need to change acpi SRAT handling etc
>> any more.
>
> I had to debug a weird ACPI -> Node mapping last week and the
> "improved" SRAT messages helped that considerably. It was
> far easier to spot which Node didn't have the correct assignments.
> I'd submit that patch even without needing fewer (like 512 lines
> max instead of 4096 lines max) bytes in the log buffer.
Your current change to ACPI srat is not complete yet.
you only handle x2apic entries.
According to ACPI 4.0 spec, We should have mixed entries with apic
entries and x2apic entries.
apic entries are for apic id < 255.
x2apic entries are for apic id > 255.
Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-28 19:46 ` Yinghai Lu
@ 2011-02-28 20:02 ` Mike Travis
2011-02-28 22:59 ` Yinghai Lu
0 siblings, 1 reply; 32+ messages in thread
From: Mike Travis @ 2011-02-28 20:02 UTC (permalink / raw)
To: Yinghai Lu
Cc: Ingo Molnar, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-acpi, x86,
linux-kernel, Tejun Heo, Linus Torvalds
Yinghai Lu wrote:
> On Mon, Feb 28, 2011 at 11:23 AM, Mike Travis <travis@sgi.com> wrote:
>>
>> Yinghai Lu wrote:
>>> On 02/27/2011 04:15 AM, Ingo Molnar wrote:
>>>> * Ingo Molnar <mingo@elte.hu> wrote:
>>>>
>>>>> You could avoid all this ugly workaround of bootmem limitations by
>>>>> moving the allocation to memblock_alloc() and desupporting the log_buf_len=
>>>>> boot parameter on non-memblock architectures.
>>>> memblock_alloc() could return -ENOSYS on architectures that do not
>>>> implement it - thus enabling such optional features without ugly #ifdef
>>>> CONFIG_HAVE_MEMBLOCK conditionals.
>>> Mike,
>>>
>>> please check updated patch...
>>>
>>> with the memblock change, you don't need to change acpi SRAT handling etc
>>> any more.
>> I had to debug a weird ACPI -> Node mapping last week and the
>> "improved" SRAT messages helped that considerably. It was
>> far easier to spot which Node didn't have the correct assignments.
>> I'd submit that patch even without needing fewer (like 512 lines
>> max instead of 4096 lines max) bytes in the log buffer.
>
> Your current change to ACPI srat is not complete yet.
>
> you only handle x2apic entries.
>
> According to ACPI 4.0 spec, We should have mixed entries with apic
> entries and x2apic entries.
> apic entries are for apic id < 255.
> x2apic entries are for apic id > 255.
>
> Yinghai
Are you sure you can run both "legacy" and "x2" apic modes in
the same SSI under the Intel or AMD rules?
(And it's highly probable that you cannot overflow the log
buffer with less than 256 CPU's.)
Thanks,
Mike
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-28 20:02 ` Mike Travis
@ 2011-02-28 22:59 ` Yinghai Lu
2011-03-31 0:41 ` [PATCH 1/2] memblock: add error return when CONFIG_HAVE_MEMBLOCK is not set Mike Travis
0 siblings, 1 reply; 32+ messages in thread
From: Yinghai Lu @ 2011-02-28 22:59 UTC (permalink / raw)
To: Mike Travis
Cc: Ingo Molnar, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-acpi, x86,
linux-kernel, Tejun Heo, Linus Torvalds
On Mon, Feb 28, 2011 at 12:02 PM, Mike Travis <travis@sgi.com> wrote:
>
>
> Yinghai Lu wrote:
>>
>> On Mon, Feb 28, 2011 at 11:23 AM, Mike Travis <travis@sgi.com> wrote:
>>>
>>> Yinghai Lu wrote:
>>>>
>>>> On 02/27/2011 04:15 AM, Ingo Molnar wrote:
>>>>>
>>>>> * Ingo Molnar <mingo@elte.hu> wrote:
>>>>>
>>>>>> You could avoid all this ugly workaround of bootmem limitations by
>>>>>> moving the allocation to memblock_alloc() and desupporting the
>>>>>> log_buf_len=
>>>>>> boot parameter on non-memblock architectures.
>>>>>
>>>>> memblock_alloc() could return -ENOSYS on architectures that do not
>>>>> implement it - thus enabling such optional features without ugly #ifdef
>>>>> CONFIG_HAVE_MEMBLOCK conditionals.
>>>>
>>>> Mike,
>>>>
>>>> please check updated patch...
>>>>
>>>> with the memblock change, you don't need to change acpi SRAT handling
>>>> etc
>>>> any more.
>>>
>>> I had to debug a weird ACPI -> Node mapping last week and the
>>> "improved" SRAT messages helped that considerably. It was
>>> far easier to spot which Node didn't have the correct assignments.
>>> I'd submit that patch even without needing fewer (like 512 lines
>>> max instead of 4096 lines max) bytes in the log buffer.
>>
>> Your current change to ACPI srat is not complete yet.
>>
>> you only handle x2apic entries.
>>
>> According to ACPI 4.0 spec, We should have mixed entries with apic
>> entries and x2apic entries.
>> apic entries are for apic id < 255.
>> x2apic entries are for apic id > 255.
>>
>> Yinghai
>
> Are you sure you can run both "legacy" and "x2" apic modes in
> the same SSI under the Intel or AMD rules?
>
No.
According to ACPI 4.0 Spec:
even the CPUs are with x2apic mode (aka pre-enabled in BIOS),
if their apic id < 255, BIOS still need to use xapic entries for them.
Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-28 19:26 ` Mike Travis
@ 2011-03-01 7:35 ` Ingo Molnar
0 siblings, 0 replies; 32+ messages in thread
From: Ingo Molnar @ 2011-03-01 7:35 UTC (permalink / raw)
To: Mike Travis
Cc: Yinghai Lu, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel, Tejun Heo, Linus Torvalds
* Mike Travis <travis@sgi.com> wrote:
> Ingo Molnar wrote:
> >* Yinghai Lu <yinghai@kernel.org> wrote:
> >
> >>+ pr_info("log_buf_len: %d\n", log_buf_len);
> >>+ pr_info("early log buf free: %d(%d%%)\n",
> >>+ free, (free * 100) / __LOG_BUF_LEN);
> >
> > Such debug printks should be removed from the final version of the patch ...
>
> Because I haven't been able to test on a fully configured
> system, this might be crucial info to figure out how to
> fix this when it happens on a customer system. Are you
> saying this small line is any less important than the
> other thousands and thousands of seemingly meaningless
> lines?
I think you are thinking about this the wrong way.
The real fix is to never even have to think about 'how much early buffer space do we
have left'. I.e. via the memblock allocation the buffer can be enlargened before the
big printks happen.
btw., a sidenote, it would make sense to record cases when we 'lose' printk output,
in the printk buffer itself. (it can be done by reserving a bit and updating the
'lost this many bytes' portion of the buffer, or so.)
Thanks,
Ingo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] printk: Allocate kernel log buffer earlier
2011-02-28 19:23 ` Mike Travis
2011-02-28 19:46 ` Yinghai Lu
@ 2011-03-01 7:42 ` Ingo Molnar
1 sibling, 0 replies; 32+ messages in thread
From: Ingo Molnar @ 2011-03-01 7:42 UTC (permalink / raw)
To: Mike Travis
Cc: Yinghai Lu, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel, Tejun Heo, Linus Torvalds
* Mike Travis <travis@sgi.com> wrote:
> Yinghai Lu wrote:
> > please check updated patch...
> >
> > with the memblock change, you don't need to change acpi SRAT handling etc any
> > more.
>
> I had to debug a weird ACPI -> Node mapping last week and the
> "improved" SRAT messages helped that considerably. It was
> far easier to spot which Node didn't have the correct assignments.
> I'd submit that patch even without needing fewer (like 512 lines
> max instead of 4096 lines max) bytes in the log buffer.
I agree that better compressed output generally makes sense - you just need to solve
the ugliness aspect of it. (or get Len's Acked-by to add that code to drivers/acpi/)
Nevertheless doing this via memblock is obviously more important, as it solves the
early printk log overrun problem once and for all.
> Let's move on to far more important problems.
That's not the threshold for upstream inclusion though. (at least for patches that
we process via the -tip tree)
If you add crap to a single hardware driver then only that hardware is affected, but
if you change the way the printk buffer is allocated it is *very important*, because
like every single kernel message is affected by it.
So we scale up our review threshold with the importance of the piece of code
affected.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/4] x86: Minimize SRAT messages
2011-02-28 19:41 ` Mike Travis
@ 2011-03-01 7:51 ` Ingo Molnar
2011-03-31 2:38 ` Len Brown
0 siblings, 1 reply; 32+ messages in thread
From: Ingo Molnar @ 2011-03-01 7:51 UTC (permalink / raw)
To: Mike Travis
Cc: David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel, Linus Torvalds
* Mike Travis <travis@sgi.com> wrote:
> Ingo Molnar wrote:
> >* Mike Travis <travis@sgi.com> wrote:
> >
> >>Condense the SRAT: messages to show all the APIC id's on one line for
> >>each Node. This not only saves space in the log buf, it also makes
> >>it easier to spot inconsistencies in core to node placement.
> >>
> >>On a system with 2368 cores on 248 nodes the change will be...
> >>
> >>Was 2368 lines (for 2368 cores):
> >>
> >> 779 [0] SRAT: PXM 0 -> APIC 0x0000 -> Node 0
> >> 780 [0] SRAT: PXM 0 -> APIC 0x0002 -> Node 0
> >> 781 [0] SRAT: PXM 0 -> APIC 0x0004 -> Node 0
> >> ...
> >> 3145 [0] SRAT: PXM 247 -> APIC 0x3df0 -> Node 247
> >> 3146 [0] SRAT: PXM 247 -> APIC 0x3df2 -> Node 247
> >>
> >>Now it's 248 lines (for 248 Nodes):
> >>
> >> 821 [0] SRAT: Node 0: PXM:APIC 0:0x0 :0x2 :0x4 :0x10 :0x12 ...
> >> 822 [0] SRAT: Node 1: PXM:APIC 1:0x40 :0x42 :0x44 :0x50 :0x52 ...
> >> 823 [0] SRAT: Node 2: PXM:APIC 2:0x80 :0x82 :0x84 :0x90 :0x92 ...
> >> ...
> >> 1067 [0] SRAT: Node 246: PXM:APIC 246:0x3d80 :0x3d82 :0x3d84 :0x3d90 ...
> >> 1068 [0] SRAT: Node 247: PXM:APIC 247:0x3dc0 :0x3dc2 :0x3dc4 :0x3dd2 ...
> >>
> >>Signed-off-by: Mike Travis <travis@sgi.com>
> >>Reviewed-by: Jack Steiner <steiner@sgi.com>
> >>Reviewed-by: Robin Holt <holt@sgi.com>
> >>---
> >> arch/x86/mm/srat_64.c | 19 +++++++++++++++++--
> >> drivers/acpi/numa.c | 7 +++++++
> >> 2 files changed, 24 insertions(+), 2 deletions(-)
> >>
> >>--- linux.orig/arch/x86/mm/srat_64.c
> >>+++ linux/arch/x86/mm/srat_64.c
> >>@@ -110,6 +110,12 @@ void __init acpi_numa_slit_init(struct a
> >> memblock_x86_reserve_range(phys, phys + length, "ACPI SLIT");
> >> }
> >>+/*
> >>+ * Keep track of previous node and PXM values so we can combine
> >>+ * same ones onto a single line.
> >>+ */
> >>+static int __initdata last_node = NUMA_NO_NODE, last_pxm = PXM_INVAL;
> >>+
> >> /* Callback for Proximity Domain -> x2APIC mapping */
> >> void __init
> >> acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
> >>@@ -141,8 +147,17 @@ acpi_numa_x2apic_affinity_init(struct ac
> >> set_apicid_to_node(apic_id, node);
> >> node_set(node, cpu_nodes_parsed);
> >> acpi_numa = 1;
> >>- printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n",
> >>- pxm, apic_id, node);
> >>+ if (node != last_node) {
> >>+ pr_info("SRAT: Node %u: PXM:APIC %u:0x%x",
> >>+ node, pxm, apic_id);
> >>+ last_node = node;
> >>+ last_pxm = pxm;
> >>+ } else if (pxm != last_pxm) {
> >>+ pr_cont(" %u:0x%x", pxm, apic_id);
> >>+ last_pxm = pxm;
> >>+ } else {
> >>+ pr_cont(" :0x%x", apic_id);
> >>+ }
> >> }
> >> /* Callback for Proximity Domain -> LAPIC mapping */
> >>--- linux.orig/drivers/acpi/numa.c
> >>+++ linux/drivers/acpi/numa.c
> >>@@ -286,6 +286,13 @@ int __init acpi_numa_init(void)
> >> if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
> >> acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
> >> acpi_parse_x2apic_affinity, 0);
> >>+ /*
> >>+ * Parsing ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY entries place
> >>+ * multiple CPU's on the same Node line. This can leave the
> >>+ * last entry "dangling" without a newline. Insert it here.
> >>+ */
> >>+ pr_cont("\n");
> >
> >This is quite ugly as it breaks the genericity of the ACPI parsing
> >here. Is there no cleaner method that keeps this deinit \n
> >printing somehow within the realm of x86?
> >
> >Also, can there be cases where there's no 'dangling' line pending?
> >In that case the \n will be superfluous here.
> >
> >Thanks,
> >
> > Ingo
>
> Yes, David brought up the same point a couple of weeks ago. I've tried and
> failed to find a solution, except that the printk function seems to add the
> newline if there is not one. I asked if this was sufficient to rely on,
> and no one spoke up. (Everyone is quick to object, but seemingly very slow
> to agree.)
>
> And yes, there will always be a dangling line. If the ACPI guys could tell
> me how to predict when this is the last entry, I would gladly change it.
Your problem is that the current way of ACPI parsing does not lend itself well to
your stateful approach to printing compressed info.
Last i checked the C language was still Turing-complete so there *ought* to be some
solution.
My problem is that you are asking me to commit a change to a piece of code i do not
maintain. I do that reluctantly and i absolutely cannot do it when a patch has
easily visible negative side-effects on code quality. So either get Len's Acked-by
to add a small amount of crap to drivers/acpi/ (or better yet, get him to commit
it), or code up a clean solution ...
Thanks,
Ingo
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 1/2] memblock: add error return when CONFIG_HAVE_MEMBLOCK is not set
2011-02-28 22:59 ` Yinghai Lu
@ 2011-03-31 0:41 ` Mike Travis
2011-03-31 1:40 ` Yinghai Lu
2011-04-07 19:43 ` Mike Travis
0 siblings, 2 replies; 32+ messages in thread
From: Mike Travis @ 2011-03-31 0:41 UTC (permalink / raw)
To: Yinghai Lu, Ingo Molnar
Cc: David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-acpi, x86,
linux-kernel, Tejun Heo, Linus Torvalds
Subject: memblock: add error return when CONFIG_HAVE_MEMBLOCK is not set
Author: Yinghai Lu <yinghai@kernel.org>
Add an error return if CONFIG_HAVE_MEMBLOCK is not set instead
of having to add #ifdef CONFIG_HAVE_MEMBLOCK around blocks of
code calling that function.
Signed-off-by: Mike Travis <travis@sgi.com>
---
include/linux/memblock.h | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- linux.orig/include/linux/memblock.h
+++ linux/include/linux/memblock.h
@@ -2,6 +2,8 @@
#define _LINUX_MEMBLOCK_H
#ifdef __KERNEL__
+#define MEMBLOCK_ERROR 0
+
#ifdef CONFIG_HAVE_MEMBLOCK
/*
* Logical memory blocks.
@@ -20,7 +22,6 @@
#include <asm/memblock.h>
#define INIT_MEMBLOCK_REGIONS 128
-#define MEMBLOCK_ERROR 0
struct memblock_region {
phys_addr_t base;
@@ -160,6 +161,12 @@ static inline unsigned long memblock_reg
#define __initdata_memblock
#endif
+#else
+static inline phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align)
+{
+ return MEMBLOCK_ERROR;
+}
+
#endif /* CONFIG_HAVE_MEMBLOCK */
#endif /* __KERNEL__ */
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] memblock: add error return when CONFIG_HAVE_MEMBLOCK is not set
2011-03-31 0:41 ` [PATCH 1/2] memblock: add error return when CONFIG_HAVE_MEMBLOCK is not set Mike Travis
@ 2011-03-31 1:40 ` Yinghai Lu
2011-03-31 15:23 ` Mike Travis
2011-04-07 19:43 ` Mike Travis
1 sibling, 1 reply; 32+ messages in thread
From: Yinghai Lu @ 2011-03-31 1:40 UTC (permalink / raw)
To: Mike Travis
Cc: Ingo Molnar, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-acpi, x86,
linux-kernel, Tejun Heo, Linus Torvalds
On Wed, Mar 30, 2011 at 5:41 PM, Mike Travis <travis@sgi.com> wrote:
> Subject: memblock: add error return when CONFIG_HAVE_MEMBLOCK is not set
> Author: Yinghai Lu <yinghai@kernel.org>
>
> Add an error return if CONFIG_HAVE_MEMBLOCK is not set instead
> of having to add #ifdef CONFIG_HAVE_MEMBLOCK around blocks of
> code calling that function.
>
> Signed-off-by: Mike Travis <travis@sgi.com>
> ---
> include/linux/memblock.h | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> --- linux.orig/include/linux/memblock.h
> +++ linux/include/linux/memblock.h
> @@ -2,6 +2,8 @@
> #define _LINUX_MEMBLOCK_H
> #ifdef __KERNEL__
>
> +#define MEMBLOCK_ERROR 0
> +
> #ifdef CONFIG_HAVE_MEMBLOCK
> /*
> * Logical memory blocks.
> @@ -20,7 +22,6 @@
> #include <asm/memblock.h>
>
> #define INIT_MEMBLOCK_REGIONS 128
> -#define MEMBLOCK_ERROR 0
>
> struct memblock_region {
> phys_addr_t base;
> @@ -160,6 +161,12 @@ static inline unsigned long memblock_reg
> #define __initdata_memblock
> #endif
>
> +#else
> +static inline phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t
> align)
> +{
> + return MEMBLOCK_ERROR;
> +}
> +
> #endif /* CONFIG_HAVE_MEMBLOCK */
>
> #endif /* __KERNEL__ */
>
setup_log_buf will pass function pointer, So this one is not needed, right?
Thanks
Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/4] x86: Minimize SRAT messages
2011-03-01 7:51 ` Ingo Molnar
@ 2011-03-31 2:38 ` Len Brown
2011-03-31 4:40 ` Yinghai Lu
0 siblings, 1 reply; 32+ messages in thread
From: Len Brown @ 2011-03-31 2:38 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mike Travis, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, Yinghai Lu,
linux-acpi, x86, linux-kernel, Linus Torvalds
Hi Mike,
How about a patch to stuff the srat info into a debugfs file,
and another patch to delete the console printk entirely?
thanks,
Len Brown, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/4] x86: Minimize SRAT messages
2011-03-31 2:38 ` Len Brown
@ 2011-03-31 4:40 ` Yinghai Lu
0 siblings, 0 replies; 32+ messages in thread
From: Yinghai Lu @ 2011-03-31 4:40 UTC (permalink / raw)
To: Len Brown
Cc: Ingo Molnar, Mike Travis, David Rientjes, Jack Steiner,
Robin Holt, Len Brown, Thomas Gleixner, H. Peter Anvin,
Andrew Morton, linux-acpi, x86, linux-kernel, Linus Torvalds
On Wed, Mar 30, 2011 at 7:38 PM, Len Brown <lenb@kernel.org> wrote:
> Hi Mike,
>
> How about a patch to stuff the srat info into a debugfs file,
> and another patch to delete the console printk entirely?
with the patch using memblock for allocate log buf early. this patch
is not needed.
Thanks
Yinghai
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] memblock: add error return when CONFIG_HAVE_MEMBLOCK is not set
2011-03-31 1:40 ` Yinghai Lu
@ 2011-03-31 15:23 ` Mike Travis
2011-03-31 16:17 ` Yinghai Lu
0 siblings, 1 reply; 32+ messages in thread
From: Mike Travis @ 2011-03-31 15:23 UTC (permalink / raw)
To: Yinghai Lu
Cc: Ingo Molnar, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-acpi, x86,
linux-kernel, Tejun Heo, Linus Torvalds
Yinghai Lu wrote:
> On Wed, Mar 30, 2011 at 5:41 PM, Mike Travis <travis@sgi.com> wrote:
>> Subject: memblock: add error return when CONFIG_HAVE_MEMBLOCK is not set
>> Author: Yinghai Lu <yinghai@kernel.org>
>>
>> Add an error return if CONFIG_HAVE_MEMBLOCK is not set instead
>> of having to add #ifdef CONFIG_HAVE_MEMBLOCK around blocks of
>> code calling that function.
>>
>> Signed-off-by: Mike Travis <travis@sgi.com>
>> ---
>> include/linux/memblock.h | 9 ++++++++-
>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> --- linux.orig/include/linux/memblock.h
>> +++ linux/include/linux/memblock.h
>> @@ -2,6 +2,8 @@
>> #define _LINUX_MEMBLOCK_H
>> #ifdef __KERNEL__
>>
>> +#define MEMBLOCK_ERROR 0
>> +
>> #ifdef CONFIG_HAVE_MEMBLOCK
>> /*
>> * Logical memory blocks.
>> @@ -20,7 +22,6 @@
>> #include <asm/memblock.h>
>>
>> #define INIT_MEMBLOCK_REGIONS 128
>> -#define MEMBLOCK_ERROR 0
>>
>> struct memblock_region {
>> phys_addr_t base;
>> @@ -160,6 +161,12 @@ static inline unsigned long memblock_reg
>> #define __initdata_memblock
>> #endif
>>
>> +#else
>> +static inline phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t
>> align)
>> +{
>> + return MEMBLOCK_ERROR;
>> +}
>> +
>> #endif /* CONFIG_HAVE_MEMBLOCK */
>>
>> #endif /* __KERNEL__ */
>>
>
> setup_log_buf will pass function pointer, So this one is not needed, right?
The other function would need the #ifdef CONFIG_HAVE_MEMBLOCK before calling
memblock_alloc which I thought was the point of this patch? Note we still
have the last fallback of using alloc_boot_mem in kernel/init.c.
Thanks,
Mike
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] memblock: add error return when CONFIG_HAVE_MEMBLOCK is not set
2011-03-31 15:23 ` Mike Travis
@ 2011-03-31 16:17 ` Yinghai Lu
0 siblings, 0 replies; 32+ messages in thread
From: Yinghai Lu @ 2011-03-31 16:17 UTC (permalink / raw)
To: Mike Travis
Cc: Ingo Molnar, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-acpi, x86,
linux-kernel, Tejun Heo, Linus Torvalds
On Thu, Mar 31, 2011 at 8:23 AM, Mike Travis <travis@sgi.com> wrote:
>
>>
>> setup_log_buf will pass function pointer, So this one is not needed,
>> right?
>
> The other function would need the #ifdef CONFIG_HAVE_MEMBLOCK before calling
> memblock_alloc which I thought was the point of this patch? Note we still
> have the last fallback of using alloc_boot_mem in kernel/init.c.
before that patch, it already use bootmem allocation.
So should be ok to drop this one.
Thanks
Yinghai
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] memblock: add error return when CONFIG_HAVE_MEMBLOCK is not set
2011-03-31 0:41 ` [PATCH 1/2] memblock: add error return when CONFIG_HAVE_MEMBLOCK is not set Mike Travis
2011-03-31 1:40 ` Yinghai Lu
@ 2011-04-07 19:43 ` Mike Travis
2011-04-08 6:40 ` Ingo Molnar
1 sibling, 1 reply; 32+ messages in thread
From: Mike Travis @ 2011-04-07 19:43 UTC (permalink / raw)
To: Yinghai Lu, Ingo Molnar
Cc: David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-acpi, x86,
linux-kernel, Tejun Heo, Linus Torvalds
Was there any further objections to these patches?
Mike Travis wrote:
> Subject: memblock: add error return when CONFIG_HAVE_MEMBLOCK is not set
> Author: Yinghai Lu <yinghai@kernel.org>
>
> Add an error return if CONFIG_HAVE_MEMBLOCK is not set instead
> of having to add #ifdef CONFIG_HAVE_MEMBLOCK around blocks of
> code calling that function.
>
> Signed-off-by: Mike Travis <travis@sgi.com>
> ---
> include/linux/memblock.h | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> --- linux.orig/include/linux/memblock.h
> +++ linux/include/linux/memblock.h
> @@ -2,6 +2,8 @@
> #define _LINUX_MEMBLOCK_H
> #ifdef __KERNEL__
>
> +#define MEMBLOCK_ERROR 0
> +
> #ifdef CONFIG_HAVE_MEMBLOCK
> /*
> * Logical memory blocks.
> @@ -20,7 +22,6 @@
> #include <asm/memblock.h>
>
> #define INIT_MEMBLOCK_REGIONS 128
> -#define MEMBLOCK_ERROR 0
>
> struct memblock_region {
> phys_addr_t base;
> @@ -160,6 +161,12 @@ static inline unsigned long memblock_reg
> #define __initdata_memblock
> #endif
>
> +#else
> +static inline phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t
> align)
> +{
> + return MEMBLOCK_ERROR;
> +}
> +
> #endif /* CONFIG_HAVE_MEMBLOCK */
>
> #endif /* __KERNEL__ */
>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/2] memblock: add error return when CONFIG_HAVE_MEMBLOCK is not set
2011-04-07 19:43 ` Mike Travis
@ 2011-04-08 6:40 ` Ingo Molnar
0 siblings, 0 replies; 32+ messages in thread
From: Ingo Molnar @ 2011-04-08 6:40 UTC (permalink / raw)
To: Mike Travis
Cc: Yinghai Lu, David Rientjes, Jack Steiner, Robin Holt, Len Brown,
Thomas Gleixner, H. Peter Anvin, Andrew Morton, linux-acpi, x86,
linux-kernel, Tejun Heo, Linus Torvalds
* Mike Travis <travis@sgi.com> wrote:
> Was there any further objections to these patches?
No big objections that i remember - so assuming review feedback has been
addressed please send the latest and greatest in a new thread, this one is
getting a bit deep :-)
Thanks,
Ingo
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2011-04-08 6:40 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-25 18:06 [PATCH 0/4] init: Shrink early messages to prevent overflowing the kernel log buffer Mike Travis
2011-02-25 18:06 ` [PATCH 1/4] printk: Allocate kernel log buffer earlier Mike Travis
2011-02-27 12:09 ` Ingo Molnar
2011-02-27 12:15 ` Ingo Molnar
2011-02-28 1:34 ` Yinghai Lu
2011-02-28 8:01 ` Ingo Molnar
2011-02-28 19:26 ` Mike Travis
2011-03-01 7:35 ` Ingo Molnar
2011-02-28 8:06 ` Ingo Molnar
2011-02-28 19:18 ` Yinghai Lu
2011-02-28 19:29 ` Mike Travis
2011-02-28 19:23 ` Mike Travis
2011-02-28 19:46 ` Yinghai Lu
2011-02-28 20:02 ` Mike Travis
2011-02-28 22:59 ` Yinghai Lu
2011-03-31 0:41 ` [PATCH 1/2] memblock: add error return when CONFIG_HAVE_MEMBLOCK is not set Mike Travis
2011-03-31 1:40 ` Yinghai Lu
2011-03-31 15:23 ` Mike Travis
2011-03-31 16:17 ` Yinghai Lu
2011-04-07 19:43 ` Mike Travis
2011-04-08 6:40 ` Ingo Molnar
2011-03-01 7:42 ` [PATCH 1/4] printk: Allocate kernel log buffer earlier Ingo Molnar
2011-02-28 19:14 ` Mike Travis
2011-02-25 18:06 ` [PATCH 2/4] printk: Break out printk_time Mike Travis
2011-02-27 11:56 ` Ingo Molnar
2011-02-25 18:06 ` [PATCH 3/4] printk: Minimize time zero output Mike Travis
2011-02-25 18:06 ` [PATCH 4/4] x86: Minimize SRAT messages Mike Travis
2011-02-27 12:03 ` Ingo Molnar
2011-02-28 19:41 ` Mike Travis
2011-03-01 7:51 ` Ingo Molnar
2011-03-31 2:38 ` Len Brown
2011-03-31 4:40 ` Yinghai Lu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).