* [RFC PATCH 1/2] printk: Add context information to the header of /dev/kmsg
2014-05-19 10:28 [RFC PATCH 0/2] printk: Add context information to kernel messages from /dev/kmsg Yoshihiro YUNOMAE
@ 2014-05-19 10:28 ` Yoshihiro YUNOMAE
2014-05-19 11:35 ` Kay Sievers
2014-05-20 3:36 ` Valdis.Kletnieks
2014-05-19 10:28 ` [RFC PATCH 2/2] printk: Simplify fragmented line information of " Yoshihiro YUNOMAE
1 sibling, 2 replies; 7+ messages in thread
From: Yoshihiro YUNOMAE @ 2014-05-19 10:28 UTC (permalink / raw)
To: Andrew Morton, Kay Sievers, linux-kernel
Cc: Kees Cook, Randy Dunlap, Petr Mladek, Hidehiro Kawai,
yrl.pp-manager.tt, Masami Hiramatsu, Joe Perches, Arun KS
Add context information to the header of /dev/kmsg.
Two printk messages connected with KERN_CONT can be divided in multiple lines
by a different process context message. If the different context message seems
like the 1st divided message, it is difficult to understand which the 2nd
divided message belongs to. This problem can also occur for the situation where
multiple message lines without KERN_CONT are broken into by similar messages.
For example, SCSI disk error messages can be show as follows:
[110781.736171] sd 2:0:0:0: [sdb]
[110781.736170] sd 3:0:0:0: [sdc] Unhandled sense code
[110781.736172] sd 3:0:0:0: [sdc]
[110781.736175] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[110781.736177] sd 3:0:0:0: [sdc]
[110781.736178] Sense Key : Medium Error [current]
[110781.736187] Sense Key : Recovered Error
[110781.736189] [current]
This patch adds PID and interrupt context flag to the header of /dev/kmsg as
the context information in order to understand relation of output messages. If
PID values of two messages and the interrupt context flags are same, it means
that those messages are same context, so those message have some relation. If
not so, it means that those messages are different context, so users do not
need to take care about the relation of the messages.
The header information is changed as follows:
<prefix>,<seq#>,<timestamp>,<fragmented>,<PID>/<interrupt context>;<msg>
| |
+-----------------------+
context info.
Interrupt context indicates as follows:
n: NMI context
h: Hard IRQ context
s: Soft IRQ context
-: No interrupt context
<Examples>
- Output same 3 printk messages in two kernel threads.
Run two kernel threads in same time in order to make the messages compete.
pr_info("A1");
printk("A2");
printk("A3\n");
- Current implementation (without context information)
6,3321,31629088,-;A1 <== Who output this message?
6,3322,31629091,c;A1 |
4,3323,31629094,+;A2 <----+
4,3324,31629095,+;A3 | Which?
4,3325,31629097,c;A2 <----+
4,3326,31629099,+;A3
A1, A2, and A3 messages of two threads are clearly fragmented in this result,
but we cannot understand who output the 1st line message and which messages
were output by whom. So, the relation of divided messages is unclear from this
test.
- Applying this patch (with context information)
6,2308,36687114,-,622/-;A1 <== Output by PID=622
6,2309,36687118,c,621/-;A1 |
4,2310,36687121,+,621/-;A2 | Relate
4,2311,36687123,+,621/-;A3 |
4,2312,36687124,c,622/-;A2 <----+ (1)
4,2313,36687126,+,622/-;A3 <----+ (2)
We can understand the relation of divided messages. For example,
the 1st line indicates PID=622 and no interrupt context. The same context
is the lines of (1) and (2). So, users can understand these three messages
are output by the same thread.
Note:
According to Documentation/ABI/testing/dev-kmsg, we can add comma separated
values before ';' as the header information of /dev/kmsg. So, this change does
not affect ABI.
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Arun KS <arunks.linux@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
---
Documentation/ABI/testing/dev-kmsg | 21 ++++++++--
kernel/printk/printk.c | 77 ++++++++++++++++++++++++++++--------
2 files changed, 76 insertions(+), 22 deletions(-)
diff --git a/Documentation/ABI/testing/dev-kmsg b/Documentation/ABI/testing/dev-kmsg
index bb820be..d8128ac 100644
--- a/Documentation/ABI/testing/dev-kmsg
+++ b/Documentation/ABI/testing/dev-kmsg
@@ -77,11 +77,11 @@ Description: The /dev/kmsg character device node provides userspace access
userspace.
Example:
- 7,160,424069,-;pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
+ 7,160,424069,-,1/-;pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
SUBSYSTEM=acpi
DEVICE=+acpi:PNP0A03:00
- 6,339,5140900,-;NET: Registered protocol family 10
- 30,340,5690716,-;udevd[80]: starting version 181
+ 6,339,5140900,-,1/-;NET: Registered protocol family 10
+ 30,340,5690716,-,80/-;udevd[80]: starting version 181
The DEVICE= key uniquely identifies devices the following way:
b12:8 - block dev_t
@@ -89,7 +89,7 @@ Description: The /dev/kmsg character device node provides userspace access
n8 - netdev ifindex
+sound:card0 - subsystem:devname
- The flags field carries '-' by default. A 'c' indicates a
+ The 4th flags field carries '-' by default. A 'c' indicates a
fragment of a line. All following fragments are flagged with
'+'. Note, that these hints about continuation lines are not
necessarily correct, and the stream could be interleaved with
@@ -98,4 +98,17 @@ Description: The /dev/kmsg character device node provides userspace access
logic is used internally when messages are printed to the
console, /proc/kmsg or the syslog() syscall.
+ The 5th field is context information to separate mixed messages
+ from different contexts. The field indicates PID and interrupt
+ context information like '<PID>/<interrupt context info.>'. In
+ interrupt context information, 'n' indicates NMI context, 'h'
+ indicates hard IRQ context, 's' indicates soft IRQ context, and
+ '-' indicates no interrupt context. This 5th context field tells
+ users which messages are output in the same context. If PID
+ values of two messages and interrupt context are same, it means
+ that those messages are output by the same process, so those
+ message may have any relation. If not so, it means that those
+ messages are different context, so users do not need to care
+ about the relation of the messages.
+
Users: dmesg(1), userspace kernel log consumers
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7228258..8105431 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -197,6 +197,8 @@ enum log_flags {
struct printk_log {
u64 ts_nsec; /* timestamp in nanoseconds */
+ pid_t pid; /* identify PID */
+ u32 irq_count; /* identify irq_count */
u16 len; /* length of entire record */
u16 text_len; /* length of text buffer */
u16 dict_len; /* length of dictionary buffer */
@@ -298,10 +300,11 @@ static u32 log_next(u32 idx)
}
/* insert record into the buffer, discard old ones, update heads */
-static void log_store(int facility, int level,
- enum log_flags flags, u64 ts_nsec,
- const char *dict, u16 dict_len,
- const char *text, u16 text_len)
+static void __log_store(int facility, int level,
+ enum log_flags flags, u64 ts_nsec,
+ const char *dict, u16 dict_len,
+ const char *text, u16 text_len,
+ pid_t pid, u32 irq_count)
{
struct printk_log *msg;
u32 size, pad_len;
@@ -339,6 +342,8 @@ static void log_store(int facility, int level,
/* fill message */
msg = (struct printk_log *)(log_buf + log_next_idx);
+ msg->pid = pid;
+ msg->irq_count = irq_count;
memcpy(log_text(msg), text, text_len);
msg->text_len = text_len;
memcpy(log_dict(msg), dict, dict_len);
@@ -470,6 +475,18 @@ out:
return ret;
}
+static char show_irq_count_flag(u32 irq_count)
+{
+ if (irq_count & NMI_MASK)
+ return 'n';
+ else if (irq_count & HARDIRQ_MASK)
+ return 'h';
+ else if (irq_count & SOFTIRQ_MASK)
+ return 's';
+ else
+ return '-';
+}
+
static ssize_t devkmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
@@ -530,9 +547,10 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
((user->prev & LOG_CONT) && !(msg->flags & LOG_PREFIX)))
cont = '+';
- len = sprintf(user->buf, "%u,%llu,%llu,%c;",
+ len = sprintf(user->buf, "%u,%llu,%llu,%c,%d/%c;",
(msg->facility << 3) | msg->level,
- user->seq, ts_usec, cont);
+ user->seq, ts_usec, cont, msg->pid,
+ show_irq_count_flag(msg->irq_count));
user->prev = msg->flags;
/* escape non-printable characters */
@@ -1391,12 +1409,31 @@ static struct cont {
size_t cons; /* bytes written to console */
struct task_struct *owner; /* task of first print*/
u64 ts_nsec; /* time of first print */
+ u32 irq_count; /* irq_count of first print */
u8 level; /* log level of first message */
u8 facility; /* log level of first message */
enum log_flags flags; /* prefix, newline flags */
bool flushed:1; /* buffer sealed and committed */
} cont;
+static inline void log_store_cont(int facility, int level,
+ enum log_flags flags, u64 ts_nsec,
+ const char *dict, u16 dict_len,
+ const char *text, u16 text_len)
+{
+ __log_store(facility, level, flags, ts_nsec, dict, dict_len,
+ text, text_len, cont.owner->tgid, cont.irq_count);
+}
+
+static inline void log_store_current(int facility, int level,
+ enum log_flags flags, u64 ts_nsec,
+ const char *dict, u16 dict_len,
+ const char *text, u16 text_len)
+{
+ __log_store(facility, level, flags, ts_nsec, dict, dict_len,
+ text, text_len, current->tgid, irq_count());
+}
+
static void cont_flush(enum log_flags flags)
{
if (cont.flushed)
@@ -1410,8 +1447,8 @@ static void cont_flush(enum log_flags flags)
* console; wait for the console to pick up the rest of the
* line. LOG_NOCONS suppresses a duplicated output.
*/
- log_store(cont.facility, cont.level, flags | LOG_NOCONS,
- cont.ts_nsec, NULL, 0, cont.buf, cont.len);
+ log_store_cont(cont.facility, cont.level, flags | LOG_NOCONS,
+ cont.ts_nsec, NULL, 0, cont.buf, cont.len);
cont.flags = flags;
cont.flushed = true;
} else {
@@ -1419,8 +1456,8 @@ static void cont_flush(enum log_flags flags)
* If no fragment of this line ever reached the console,
* just submit it to the store and free the buffer.
*/
- log_store(cont.facility, cont.level, flags, 0,
- NULL, 0, cont.buf, cont.len);
+ log_store_cont(cont.facility, cont.level, flags, 0,
+ NULL, 0, cont.buf, cont.len);
cont.len = 0;
}
}
@@ -1441,6 +1478,7 @@ static bool cont_add(int facility, int level, const char *text, size_t len)
cont.level = level;
cont.owner = current;
cont.ts_nsec = local_clock();
+ cont.irq_count = irq_count();
cont.flags = 0;
cont.cons = 0;
cont.flushed = false;
@@ -1532,8 +1570,8 @@ asmlinkage int vprintk_emit(int facility, int level,
recursion_bug = 0;
printed_len += strlen(recursion_msg);
/* emit KERN_CRIT message */
- log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
- NULL, 0, recursion_msg, printed_len);
+ log_store_current(0, 2, LOG_PREFIX|LOG_NEWLINE, 0,
+ NULL, 0, recursion_msg, printed_len);
}
/*
@@ -1582,13 +1620,14 @@ asmlinkage int vprintk_emit(int facility, int level,
* Flush the conflicting buffer. An earlier newline was missing,
* or another task also prints continuation lines.
*/
- if (cont.len && (lflags & LOG_PREFIX || cont.owner != current))
+ if (cont.len && (lflags & LOG_PREFIX || cont.owner != current
+ || cont.irq_count != irq_count()))
cont_flush(LOG_NEWLINE);
/* buffer line if possible, otherwise store it right away */
if (!cont_add(facility, level, text, text_len))
- log_store(facility, level, lflags | LOG_CONT, 0,
- dict, dictlen, text, text_len);
+ log_store_current(facility, level, lflags | LOG_CONT, 0,
+ dict, dictlen, text, text_len);
} else {
bool stored = false;
@@ -1601,15 +1640,17 @@ asmlinkage int vprintk_emit(int facility, int level,
* a newline, flush and append the newline.
*/
if (cont.len) {
- if (cont.owner == current && !(lflags & LOG_PREFIX))
+ if ((cont.owner == current &&
+ cont.irq_count == irq_count())
+ && !(lflags & LOG_PREFIX))
stored = cont_add(facility, level, text,
text_len);
cont_flush(LOG_NEWLINE);
}
if (!stored)
- log_store(facility, level, lflags, 0,
- dict, dictlen, text, text_len);
+ log_store_current(facility, level, lflags, 0,
+ dict, dictlen, text, text_len);
}
printed_len += text_len;
^ permalink raw reply related [flat|nested] 7+ messages in thread* [RFC PATCH 2/2] printk: Simplify fragmented line information of the header of /dev/kmsg
2014-05-19 10:28 [RFC PATCH 0/2] printk: Add context information to kernel messages from /dev/kmsg Yoshihiro YUNOMAE
2014-05-19 10:28 ` [RFC PATCH 1/2] printk: Add context information to the header of /dev/kmsg Yoshihiro YUNOMAE
@ 2014-05-19 10:28 ` Yoshihiro YUNOMAE
1 sibling, 0 replies; 7+ messages in thread
From: Yoshihiro YUNOMAE @ 2014-05-19 10:28 UTC (permalink / raw)
To: Andrew Morton, Kay Sievers, linux-kernel
Cc: Kees Cook, Randy Dunlap, Petr Mladek, Hidehiro Kawai,
yrl.pp-manager.tt, Masami Hiramatsu, Joe Perches, Arun KS
Simplify fragmented line information of the 4th flag of the header of /dev/kmsg.
One of the hint of the printk() linebreak is the header information of
/dev/kmsg. 4th flag in the header adds 'c' or '+' when a message is divided.
Users can roughly understand where the printk message was divided, but they
cannot understand exactly which messages are divided or how those messages
should be connected.
According to a comment in printk.c, 'c' is marked for the first fragment of a
line. However, 'c' is not always marked as the first fragment of the line in
current kernel. The kernel outputs the 'c' flag when current message flags has
LOG_CONT and previous message flag does not have LOG_CONT. So, LOG_CONT flag
is needed to add 'c'. However, when the previous message A is being stored in
cont buffer and the current message B is not added to cont buffer, LOG_CONT
flag is added not to the message A but to the message B. In this situation,
'c' flag is added to the message B even though the message A is actually
the first fragmented line. As the result, it seems that the message A is not
divided because the message A is output with '-' (no fragment).
In addition, all following fragments are flagged with '+' according to
Documentation/ABI/testing/dev-kmsg. In kernel, when current message and the
previous message has LOG_CONT flag, '+' flag is added to the current message.
However, if those messages are output in different context and those messages
have LOG_CONT flag, '+' flag is added to the current message but it is
sometimes meaningless. For example, when previous message with LOG_CONT is
output by process C and current message (1st continuous message) with LOG_CONT
is output by process D, adding '+' flag to the current message of process D is
incorrect. This occurs when both return values of both messages without '\n'
are false in cont_add().
This patch simplifies the fragmented line flag policy.
1) Add 'f' flag if the kernel flushes a fragment message
2) Delete 'c' and '+'
When fragmentation occurs, the current message flushes the previous message
with LOG_CONT. When the kernel outputs the messages to /dev/kmsg, it
just checks whether messages have LOG_CONT flag or not. If the messages have
LOG_CONT flag, it adds 'f' flag in the header of /dev/kmsg. Then, users can
understand which message was divided by 'f' flag and how these messages
should be connected by checking the fragmented flag and the context information.
<Example>
- Environment
Execute 3 kernel threads A, B, and C. Each thread outputs following
messages:
/* for {A, B, C} */
pr_info("{A, B, C}1");
printk("{A, B, C}2");
printk("{A, B, C}3\n");
- Current result of /dev/kmsg
6,225186,93838029,-,2288/-;B1 <== No fragment?
6,225187,93838091,c,2289/-;C1
4,225188,93838092,+,2289/-;C2
4,225189,93838092,+,2289/-;C3
6,225190,93842059,c,2289/-;C1
4,225191,93842409,+,2289/-;C2C3
4,225192,93842713,-,2288/-;B2 <== No fragment?
6,225193,93842944,c,2287/-;A1
4,225194,93842944,+,2287/-;A2
4,225195,93842945,+,2287/-;A3
4,225196,93846199,-,2288/-;B3 <== No fragment?
In this example, fragmented flags of the B thread are incorrect because
B1, B2, and B3 are actually divided, but there are no flags. So, users cannot
understand which messages are divided or how those messages should be
connected in current implementation.
- Apply this patch
This patch introduces 'f' flag policy and deletes 'c' and '+' flags. If a
message is forcibly flushed in the middle, just add 'f'. Moreover, if a
message includes '\n' meaning the end of the messages, add no fragment flag
('-'). So, if users find 'f' flag for a message, they can connect the
message to next 'f' flag (forcibly flushed) or '-' (the end of the messages)
in same context.
6,8219,543073110,f,538/-;B1 <== This message is fragmented.
6,8220,543073113,f,539/-;C1
4,8221,543073115,f,539/-;C2
4,8222,543073116,-,539/-;C3
4,8223,543073117,f,538/-;B2 <== Stil fragmented
6,8224,543073120,f,537/-;A1
4,8225,543073120,-,538/-;B3 <== This is the end of the messages.
4,8226,543073123,-,537/-;A2A3
A message merging tool can convert these messages as follows:
6,8219,543073110,-,538/-;B1B2B3
6,8220,543073113,-,539/-;C1C2C3
6,8224,543073120,-,537/-;A1A2A3
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Arun KS <arunks.linux@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
---
Documentation/ABI/testing/dev-kmsg | 15 +++++++--------
kernel/printk/printk.c | 31 +++++++++++++++----------------
2 files changed, 22 insertions(+), 24 deletions(-)
diff --git a/Documentation/ABI/testing/dev-kmsg b/Documentation/ABI/testing/dev-kmsg
index d8128ac..ccb7cf3 100644
--- a/Documentation/ABI/testing/dev-kmsg
+++ b/Documentation/ABI/testing/dev-kmsg
@@ -89,14 +89,13 @@ Description: The /dev/kmsg character device node provides userspace access
n8 - netdev ifindex
+sound:card0 - subsystem:devname
- The 4th flags field carries '-' by default. A 'c' indicates a
- fragment of a line. All following fragments are flagged with
- '+'. Note, that these hints about continuation lines are not
- necessarily correct, and the stream could be interleaved with
- unrelated messages, but merging the lines in the output
- usually produces better human readable results. A similar
- logic is used internally when messages are printed to the
- console, /proc/kmsg or the syslog() syscall.
+ The 4th flags field carries '-' by default. A 'f' indicates a
+ fragment of a line. If a message is forcibly flushed in the
+ middle, just add 'f'. Moreover, if a message includes '\n'
+ meaning the end of the messages, no add flag ('-'). So, if
+ users find 'f' flag for a message, they can connect the message
+ to next 'f' flag (forcibly flushed) or '-' (the end of the
+ messages) in same context information output in the 5th field.
The 5th field is context information to separate mixed messages
from different contexts. The field indicates PID and interrupt
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8105431..d8c3716 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -414,7 +414,6 @@ static int check_syslog_permissions(int type, bool from_file)
struct devkmsg_user {
u64 seq;
u32 idx;
- enum log_flags prev;
struct mutex lock;
char buf[8192];
};
@@ -535,23 +534,16 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
/*
* If we couldn't merge continuation line fragments during the print,
- * export the stored flags to allow an optional external merge of the
- * records. Merging the records isn't always neccessarily correct, like
- * when we hit a race during printing. In most cases though, it produces
- * better readable output. 'c' in the record flags mark the first
- * fragment of a line, '+' the following.
+ * export the stored flags 'f' to allow an optional external merge of
+ * the records.
*/
- if (msg->flags & LOG_CONT && !(user->prev & LOG_CONT))
- cont = 'c';
- else if ((msg->flags & LOG_CONT) ||
- ((user->prev & LOG_CONT) && !(msg->flags & LOG_PREFIX)))
- cont = '+';
+ if (msg->flags & LOG_CONT)
+ cont = 'f';
len = sprintf(user->buf, "%u,%llu,%llu,%c,%d/%c;",
(msg->facility << 3) | msg->level,
user->seq, ts_usec, cont, msg->pid,
show_irq_count_flag(msg->irq_count));
- user->prev = msg->flags;
/* escape non-printable characters */
for (i = 0; i < msg->text_len; i++) {
@@ -1620,9 +1612,13 @@ asmlinkage int vprintk_emit(int facility, int level,
* Flush the conflicting buffer. An earlier newline was missing,
* or another task also prints continuation lines.
*/
- if (cont.len && (lflags & LOG_PREFIX || cont.owner != current
- || cont.irq_count != irq_count()))
- cont_flush(LOG_NEWLINE);
+ if (cont.len) {
+ if (cont.owner != current ||
+ cont.irq_count != irq_count())
+ cont_flush(LOG_CONT);
+ else if (lflags & LOG_PREFIX)
+ cont_flush(LOG_NEWLINE);
+ }
/* buffer line if possible, otherwise store it right away */
if (!cont_add(facility, level, text, text_len))
@@ -1645,7 +1641,10 @@ asmlinkage int vprintk_emit(int facility, int level,
&& !(lflags & LOG_PREFIX))
stored = cont_add(facility, level, text,
text_len);
- cont_flush(LOG_NEWLINE);
+ if (!stored)
+ cont_flush(LOG_CONT);
+ else
+ cont_flush(LOG_NEWLINE);
}
if (!stored)
^ permalink raw reply related [flat|nested] 7+ messages in thread