From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org,
	Mukesh Ojha <quic_mojha@quicinc.com>
Subject: Re: [PATCH printk v3 04/14] printk: ringbuffer: Do not skip non-finalized records with prb_next_seq()
Date: Mon, 05 Feb 2024 12:39:30 +0106	[thread overview]
Message-ID: <871q9rp2lx.fsf@jogness.linutronix.de> (raw)
In-Reply-To: <ZaVkqJ-KMRp9mbLR@alley>

On 2024-01-15, Petr Mladek <pmladek@suse.com> wrote:
>> The acquire is with @last_finalized_seq. So the release must also be
>> with @last_finalized_seq. The important thing is that the CPU that
>> updates @last_finalized_seq has actually read the corresponding
>> record beforehand. That is exactly what desc_update_last_finalized()
>> does.
>
> I probably did not describe it well. The CPU updating
> @last_finalized_seq does the right thing. I was not sure about the CPU
> which reads @last_finalized_seq via prb_next_seq().
>
> To make it more clear:
>
> u64 prb_next_seq(struct printk_ringbuffer *rb)
> {
> 	u64 seq;
>
> 	seq = desc_last_finalized_seq(rb);
> 	      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 	      |
> 	      `-> This includes atomic_long_read_acquire(last_finalized_seq)
>
>
> 	if (seq != 0)
> 		seq++;
>
> 	while (_prb_read_valid(rb, &seq, NULL, NULL))
> 		seq++;
>
> 	return seq;
> }
>
> But where is the atomic_long_read_release(last_finalized_seq) in
> this code path?

read_release? The counterpart of this load_acquire is a
store_release. For example:

CPU0                     CPU1
====                     ====
load(varA)
store_release(varB)      load_acquire(varB)
                         load(varA)

If CPU1 reads the value in varB that CPU0 stored, then it is guaranteed
that CPU1 will read the value (or a later value) in varA that CPU0 read.

Translating the above example to this particular patch, we have:

CPU0: desc_update_last_finalized()       CPU1: prb_next_seq()
====                                     ====
_prb_read_valid(seq)
cmpxchg_release(last_finalized_seq,seq)  seq=read_acquire(last_finalized_seq)
                                         _prb_read_valid(seq)

> IMHO, the barrier provided by the acquire() is _important_ to make
> sure that _prb_read_valid() would see the valid descriptor.

Correct.

> Now, I think that the related read_release(seq) is hidden in:
>
> static int prb_read(struct printk_ringbuffer *rb, u64 seq,
> 		    struct printk_record *r, unsigned int *line_count)
> {
> 	/* Get a local copy of the correct descriptor (if available). */
> 	err = desc_read_finalized_seq(desc_ring, id, seq, &desc);
>
> 	/* If requested, copy meta data. */
> 	if (r->info)
> 		memcpy(r->info, info, sizeof(*(r->info)));
>
> 	/* Copy text data. If it fails, this is a data-less record. */
> 	if (!copy_data(&rb->text_data_ring, &desc.text_blk_lpos, info->text_len,
> 		       r->text_buf, r->text_buf_size, line_count)) {
> 		return -ENOENT;
> 	}
>
> 	/* Ensure the record is still finalized and has the same @seq. */
> 	return desc_read_finalized_seq(desc_ring, id, seq, &desc);
> 	       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 	       |
> 	       `-> This includes a memory barrier /* LMM(desc_read:A) */
> 		   which makes sure that the data are read before
> 		   the desc/data could be reused.
> }
>
> I consider this /* LMM(desc_read:A) */ as a counter part for that
> acquire() in prb_next_seq().

desc_read:A is not a memory barrier. It only marks the load of the
descriptor state. This is a significant load because prb_next_seq() must
see at least the descriptor state that desc_update_last_finalized() saw.

The memory barrier comments in desc_update_last_finalized() state:

    * If desc_last_finalized_seq:A reads from
    * desc_update_last_finalized:A, then desc_read:A reads from
    * _prb_commit:B.

This is referring to a slightly different situation than the example I
used above because it is referencing where the descriptor state was
stored (_prb_commit:B). The same general picture is valid:

CPU0                              CPU1
====                              ====
_prb_commit:B
desc_update_last_finalized:A      desc_last_finalized_seq:A
                                  desc_read:A

desc_read:A is loading the descriptor state that _prb_commit:B stored.

The extra note in the comment clarifies that _prb_commit:B could also be
denoted as desc_read:A because desc_update_last_finalized() performed a
read of (i.e. must have seen) _prb_commit:B.

    * Note: _prb_commit:B and desc_update_last_finalized:A can be
    *       different CPUs. However, the desc_update_last_finalized:A
    *       CPU (which performs the release) must have previously seen
    *       _prb_commit:B.

Normally the CPU committing the record will also update
last_finalized_seq. But it is possible that another CPU updates
last_finalized_seq before the committing CPU because it already sees the
finalized record. In that case the complete (maximally complex) picture
looks like this:

CPU0            CPU1                           CPU2
====            ====                           ====
_prb_commit:B   desc_read:A
                desc_update_last_finalized:A   desc_last_finalized_seq:A
                                               desc_read:A

Any memory barriers in _prb_commit() or desc_read() are irrelevant for
guaranteeing that a CPU reading a sequence value from
desc_last_finalized_seq() will always be able to read that record.

> Summary:
>
> I saw atomic_long_read_acquire(last_finalized_seq) called from
> prb_next_seq() code path. The barrier looked important to me.
> But I saw neither the counter-part nor any comment. I wanted
> to understand it because it might be important for reviewing
> following patches which depend on prb_next_seq().

desc_update_last_finalized:A is the counterpart to
desc_last_finalized_seq:A. IMHO there are plenty of comments that are
formally documenting these memory barriers. Including the new entry in
the summary of all memory barriers:

 *   desc_update_last_finalized:A / desc_last_finalized_seq:A
 *     store finalized record, then set new highest finalized sequence number

John

