From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: Andrea Parri <parri.andrea@gmail.com>,
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
Paul McKenney <paulmck@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
Steven Rostedt <rostedt@goodmis.org>,
Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: Barrier before pushing desc_ring tail: was [PATCH v2 2/3] printk: add lockless buffer
Date: Tue, 09 Jun 2020 17:56:03 +0200
Message-ID: <87d068utbg.fsf@vostro.fn.ogness.net>
In-Reply-To: <20200609113751.GD23752@linux-b0ei> (Petr Mladek's message of "Tue, 9 Jun 2020 13:37:51 +0200")

On 2020-06-09, Petr Mladek <pmladek@suse.com> wrote:
>> --- /dev/null
>> +++ b/kernel/printk/printk_ringbuffer.c
>> +/*
>> + * Advance the desc ring tail. This function advances the tail by one
>> + * descriptor, thus invalidating the oldest descriptor. Before advancing
>> + * the tail, the tail descriptor is made reusable and all data blocks up to
>> + * and including the descriptor's data block are invalidated (i.e. the data
>> + * ring tail is pushed past the data block of the descriptor being made
>> + * reusable).
>> + */
>> +static bool desc_push_tail(struct printk_ringbuffer *rb,
>> +			   unsigned long tail_id)
>> +{
>> +	struct prb_desc_ring *desc_ring = &rb->desc_ring;
>> +	enum desc_state d_state;
>> +	struct prb_desc desc;
>> +
>> +	d_state = desc_read(desc_ring, tail_id, &desc);
>> +
>> +	switch (d_state) {
>> +	case desc_miss:
>> +		/*
>> +		 * If the ID is exactly 1 wrap behind the expected, it is
>> +		 * in the process of being reserved by another writer and
>> +		 * must be considered reserved.
>> +		 */
>> +		if (DESC_ID(atomic_long_read(&desc.state_var)) ==
>> +		    DESC_ID_PREV_WRAP(desc_ring, tail_id)) {
>> +			return false;
>> +		}
>> +
>> +		/*
>> +		 * The ID has changed. Another writer must have pushed the
>> +		 * tail and recycled the descriptor already. Success is
>> +		 * returned because the caller is only interested in the
>> +		 * specified tail being pushed, which it was.
>> +		 */
>> +		return true;
>> +	case desc_reserved:
>> +		return false;
>> +	case desc_committed:
>> +		desc_make_reusable(desc_ring, tail_id);
>> +		break;
>> +	case desc_reusable:
>> +		break;
>> +	}
>> +
>> +	/*
>> +	 * Data blocks must be invalidated before their associated
>> +	 * descriptor can be made available for recycling. Invalidating
>> +	 * them later is not possible because there is no way to trust
>> +	 * data blocks once their associated descriptor is gone.
>> +	 */
>> +
>> +	if (!data_push_tail(rb, &rb->text_data_ring, desc.text_blk_lpos.next))
>> +		return false;
>> +	if (!data_push_tail(rb, &rb->dict_data_ring, desc.dict_blk_lpos.next))
>> +		return false;
>> +
>> +	/*
>> +	 * Check the next descriptor after @tail_id before pushing the tail
>> +	 * to it because the tail must always be in a committed or reusable
>> +	 * state. The implementation of prb_first_seq() relies on this.
>> +	 *
>> +	 * A successful read implies that the next descriptor is less than or
>> +	 * equal to @head_id so there is no risk of pushing the tail past the
>> +	 * head.
>> +	 */
>> +	d_state = desc_read(desc_ring, DESC_ID(tail_id + 1),
>> +			    &desc); /* LMM(desc_push_tail:A) */
>> +	if (d_state == desc_committed || d_state == desc_reusable) {
>> +		/*
>> +		 * Any CPU that loads the new tail ID, must see that the
>> +		 * descriptor at @tail_id is in the reusable state. See the
>> +		 * read memory barrier part of desc_reserve:D for details.
>> +		 */
>> +		atomic_long_cmpxchg_relaxed(&desc_ring->tail_id, tail_id,
>> +			DESC_ID(tail_id + 1)); /* LMM(desc_push_tail:B) */
>
> I was quite confused by the above comment. Does it mean that we need
> a barrier here? Or does it explain why the cmpxchg has its own
> LMM marker?

This LMM marker is referenced quite often, but since it is a relaxed
cmpxchg(), its significance is not immediately clear. I was hoping to
add some hints as to why it is significant. The comment that it is
referring to is:

	/*
	 * Guarantee the tail ID is read before validating the
	 * recycled descriptor state. A read memory barrier is
	 * sufficient for this. This pairs with data_push_tail:C.
	 *
	 * Memory barrier involvement:
	 *
	 * If desc_reserve:C reads from desc_push_tail:B, then
	 * desc_reserve:F reads from desc_make_reusable:A.
	 *
	 * Relies on:
	 *
	 * MB from desc_make_reusable:A to desc_push_tail:B
	 *    matching
	 * RMB from desc_reserve:C to desc_reserve:F
	 *
	 * Note: desc_make_reusable:A, desc_push_tail:B, and
	 *       data_push_tail:C can all be different CPUs. However,
	 *       the desc_push_tail:B CPU must have previously seen
	 *       data_push_tail:D and the data_push_tail:D CPU (which
	 *       performs the full memory barrier) must have
	 *       previously seen desc_make_reusable:A.
	 */

English translation:

In order to push the data tail, a CPU must first see that the associated
descriptor is in the reusable state. Since a full memory barrier is
performed after that sighting and before doing the data tail push, _any_
CPU that sees the pushed data tail will be able to see that the
associated descriptor is in the reusable state.

In order to push the descriptor tail, a CPU must first see that the
associated data tail has been pushed. Therefore, that CPU would also see
that the associated descriptor is in the reusable state.
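
The chain of sightings described above can be sketched in userspace C11.
This is only an illustrative model, not the kernel code: all variable and
function names below are hypothetical, the C11 fences merely approximate
smp_mb() and smp_rmb(), and running the steps on a single thread shows
only the order of the checks, not the cross-CPU ordering guarantee itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the ringbuffer state. */
enum { STATE_COMMITTED, STATE_REUSABLE };

static atomic_long desc_state = STATE_COMMITTED; /* tail descriptor state */
static atomic_long data_tail;                    /* data ring tail position */
static atomic_long desc_tail;                    /* descriptor ring tail ID */

/* Step 1: mark the descriptor reusable (cf. desc_make_reusable:A). */
static void make_reusable(void)
{
	atomic_store_explicit(&desc_state, STATE_REUSABLE,
			      memory_order_relaxed);
}

/* Step 2: push the data tail only after seeing the reusable state. */
static bool push_data_tail(long next_lpos)
{
	if (atomic_load_explicit(&desc_state, memory_order_relaxed) !=
	    STATE_REUSABLE)
		return false;

	/* Full barrier between the sighting and the data tail store. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&data_tail, next_lpos, memory_order_relaxed);
	return true;
}

/* Step 3: push the desc tail only after seeing the pushed data tail. */
static bool push_desc_tail(long tail_id, long next_lpos)
{
	long expected = tail_id;

	if (atomic_load_explicit(&data_tail, memory_order_relaxed) < next_lpos)
		return false;

	/* Relaxed cmpxchg, as in the quoted patch (cf. desc_push_tail:B). */
	return atomic_compare_exchange_strong_explicit(&desc_tail, &expected,
						       tail_id + 1,
						       memory_order_relaxed,
						       memory_order_relaxed);
}

/* Step 4: load the tail ID, then validate the descriptor state. */
static bool tail_descriptor_is_reusable(long pushed_tail_id)
{
	if (atomic_load_explicit(&desc_tail, memory_order_relaxed) !=
	    pushed_tail_id)
		return false;

	/* Read barrier between the tail ID load and the state load. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&desc_state, memory_order_relaxed) ==
	       STATE_REUSABLE;
}
```

In this model each step refuses to proceed until it has observed the
result of the previous one, which is the dependency the argument above
relies on. Whether a relaxed cmpxchg in step 3 is enough across CPUs is
exactly the question under discussion and is not settled by the sketch.
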
> I think that we actually need a full barrier here to make sure that
> all CPUs see the changes made by data_push_tail() before we
> allow to rewrite the descriptor. The changes in data_push_tail() might
> be done on different CPUs.

There needs to be a reason why the ordering of data tail pushing and
descriptor tail pushing is important.

> It is similar like the full barrier in data_push_tail() before changing
> data_ring->tail_lpos.

How so? That memory barrier exists to make sure the reusable descriptor
state is stored before pushing the data tail. This is important for
readers (which start from the data tail) so they can notice if the
descriptor has since been invalidated (reusable state).

But where is it important that the data tail change is seen before the
descriptor tail change? How are the data tail and descriptor tail
significantly related to each other?

John Ogness