All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Ogness <john.ogness@linutronix.de>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andrea Parri <parri.andrea@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] printk: use the lockless ringbuffer
Date: Fri, 14 Feb 2020 10:48:51 +0100	[thread overview]
Message-ID: <87r1yxh530.fsf@linutronix.de> (raw)
In-Reply-To: <20200214014113.GE36551@google.com> (Sergey Senozhatsky's message of "Fri, 14 Feb 2020 10:41:13 +0900")

On 2020-02-14, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
>>> cat /dev/kmsg
>>> cat: /dev/kmsg: Broken pipe
>>
>> In mainline you can have this "problem" as well. Once the ringbuffer
>> has wrapped, any read to a newly opened /dev/kmsg when a new message
>> arrived will result in an EPIPE. This happens quite easily once the
>> ringbuffer has wrapped because each new message is overwriting the
>> oldest message.
>
> Hmm. Something doesn't add up.
>
> Looking at the numbers, both r->info->seq and prb_first_seq(prb)
> do increase, so there are new messages in the ring buffer
>
>                            u->seq    r->seq    prb_first_seq
> [..]
> cat: devkmsg_read() error 1981080   1982633   1981080
> cat: devkmsg_read() error 1981080   1982633   1981080
> cat: devkmsg_read() error 1981095   1982652   1981095
> cat: devkmsg_read() error 1981095   1982652   1981095
> cat: devkmsg_read() error 1981095   1982652   1981095
> [..]
>
> but 'cat' still wouldn't read anything from the logbuf - EPIPE.
>
> NOTE: I don't run 'cat /dev/kmsg' during the test. I run the test
> first, then I run 'cat /dev/kmsg', after the test, when
> printk-pressure is gone.

Sure. The problem is not the printk-pressure. The problem is you have
data-less records in your ringbuffer (from your previous
printk-pressure). If you used your own program that continued to read
after EPIPE, then you would see the sequence numbers jumping over the
data-less records.

> I can't reproduce it with current logbuf. 'cat' reads from /dev/kmsg
> after heavy printk-pressure test. So chances are some loggers can also
> experience problems. This might be a regression.

Mainline doesn't have data-less records. In mainline such failed
printk's are silently ignored (after attepting truncation). So for
mainline you can only reproduce the overflow case.

1. Boot 5.6.0-rc1 (without any console= slowing down printk)

2. Fill the ringbuffer and let it overflow with:

   $ while true; do echo filling buffer > /dev/kmsg; done &

3. Once you can see the ringbuffer has overflowed (and continues to
   overflow), try to read from /dev/kmsg

   $ strace head /dev/kmsg

In most cases you will see:

read(3, 0x7f7307ac1000, 4096)           = -1 EPIPE (Broken pipe)

Current readers need to be able to handle EPIPE. cat(1) does not and so
(unfortunately) is not a good candidate for reading the ringbuffer.

>>> ...
>>> systemd-journal: devkmsg_read() error 1979281 1982465 1980933
>>> systemd-journal: corrected seq 1982465 1982465
>>> cat: devkmsg_read() error 1980987 1982531 1980987
>>> cat: corrected seq 1982531 1982531
>>> cat: devkmsg_read() error 1981015 1982563 1981015
>>> cat: corrected seq 1982563 1982563
>>
>> The situation with a data-less record is the same as when the ringbuffer
>> wraps: cat is hitting that EPIPE. But re-opening the file descriptor is
>> not going to help because it will not be able to get past that data-less
>> record.
>
> So maybe this is the case with broken 'cat' on my system?

I think it is appropriate for an application to close the descriptor
after an EPIPE. /dev/kmsg is special because the reader should continue
reading anyway.

>> We could implement it such that devkmsg_read() will skip over data-less
>> records instead of issuing an EPIPE. (That is what dmesg does.) But then
>> do we need EPIPE at all? The reader can see that is has missed records
>> by tracking the sequence number, so could we just get rid of EPIPE? Then
>> cat(1) would be a great tool to view the raw ringbuffer. Please share
>> your thoughts on this.
>
> Looking at systemd/src/journal/journald-kmsg.c :
> server_read_dev_kmsg() -EPIPE is just one of the erronos they handle,
> nothing special.

Yes, but what does systemd-journald do when the EPIPE is _not_ returned
and instead there is a jump in the sequence number? Looking at
dev_kmsg_record(), systemd actually does it the way I would hope. It
tracks the sequence number correctly.

    /* Did we lose any? */
    if (serial > *s->kernel_seqnum)
         server_driver_message(s, 0,
                               "MESSAGE_ID="
                               SD_MESSAGE_JOURNAL_MISSED_STR,
                               LOG_MESSAGE("Missed %"PRIu64" kernel messages",
                               serial - *s->kernel_seqnum),
                               NULL);

> Could it be the case that some other loggers would have special
> handling for EPIPE?  I'm not sure, let's look around.
>
> I'd say that EPIPE removal looks OK to me. But before we do that, I'm
> not sure that we have clear understanding of 'cat /dev/kmsg' behaviour
> change.

In mainline, with regard to /dev/kmsg, sequence numbers will never
jump. If there would be a jump (due to lost messages) then EPIPE is
issued. The reader can either:

1. continue reading and see the jump

2. reopen the file descriptor, possibly having missed a ton more
   messages due to reopening, and then start from the oldest available
   message

With my series, #2 is no longer an option because the lost messages
could exist in a part of the ringbuffer not yet overwritten.

If we remove EPIPE, then readers will need to track the sequence number
to identify jumps. systemd-journald does this already. And tools like
cat(1) would "just work" because cat does not care if messages were
lost.

John Ogness

  parent reply	other threads:[~2020-02-14  9:48 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-28 16:19 [PATCH 0/2] printk: replace ringbuffer John Ogness
2020-01-28 16:19 ` [PATCH 1/2] printk: add lockless buffer John Ogness
2020-01-29  3:53   ` Steven Rostedt
2020-02-21 11:54   ` more barriers: " Petr Mladek
2020-02-27 12:04     ` John Ogness
2020-03-04 15:08       ` Petr Mladek
2020-03-13 10:13         ` John Ogness
2020-02-21 12:05   ` misc nits " Petr Mladek
2020-03-02 10:38     ` John Ogness
2020-03-02 12:17       ` Joe Perches
2020-03-02 12:32       ` Petr Mladek
2020-03-02 13:43         ` John Ogness
2020-03-03  9:47           ` Petr Mladek
2020-03-03 15:42             ` John Ogness
2020-03-04 10:09               ` Petr Mladek
2020-03-04  9:40           ` Petr Mladek
2020-01-28 16:19 ` [PATCH 2/2] printk: use the lockless ringbuffer John Ogness
2020-02-13  9:07   ` Sergey Senozhatsky
2020-02-13  9:42     ` John Ogness
2020-02-13 11:59       ` Sergey Senozhatsky
2020-02-13 22:36         ` John Ogness
2020-02-14  1:41           ` Sergey Senozhatsky
2020-02-14  2:09             ` Sergey Senozhatsky
2020-02-14  9:48             ` John Ogness [this message]
2020-02-14 13:29   ` lijiang
2020-02-14 13:50     ` John Ogness
2020-02-15  4:15       ` lijiang
2020-02-17 15:40       ` crashdump: " Petr Mladek
2020-02-17 16:14         ` John Ogness
2020-02-17 14:41   ` misc details: " Petr Mladek
2020-02-25 20:11     ` John Ogness
2020-02-26  9:54       ` Petr Mladek
2020-02-05  4:25 ` [PATCH 0/2] printk: replace ringbuffer lijiang
2020-02-05  4:42   ` Sergey Senozhatsky
2020-02-05  4:48   ` Sergey Senozhatsky
2020-02-05  5:02     ` Sergey Senozhatsky
2020-02-05  5:38       ` lijiang
2020-02-05  6:36         ` Sergey Senozhatsky
2020-02-05  9:00           ` John Ogness
2020-02-05  9:28             ` Sergey Senozhatsky
2020-02-05 10:19             ` lijiang
2020-02-05 16:12               ` John Ogness
2020-02-06  9:12                 ` lijiang
2020-02-13 13:07                 ` Petr Mladek
2020-02-14  1:07                   ` Sergey Senozhatsky
2020-02-05 11:07             ` Sergey Senozhatsky
2020-02-05 15:48               ` John Ogness
2020-02-05 19:29                 ` Joe Perches
2020-02-06  6:31                 ` Sergey Senozhatsky
2020-02-06  7:30                 ` lijiang
2020-02-07  1:40                 ` Steven Rostedt
2020-02-07  7:43                   ` John Ogness
2020-02-14 15:56                 ` Petr Mladek
2020-02-17 11:13                   ` John Ogness
2020-02-17 14:50                     ` Petr Mladek
2020-02-25 19:27                       ` John Ogness
2020-02-05  9:36           ` lijiang
2020-02-06  9:21 ` lijiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r1yxh530.fsf@linutronix.de \
    --to=john.ogness@linutronix.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=parri.andrea@gmail.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.