public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrea Parri <parri.andrea@gmail.com>,
	Paul McKenney <paulmck@kernel.org>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support
Date: Fri, 28 Aug 2020 12:07:09 +0206	[thread overview]
Message-ID: <873647144a.fsf@jogness.linutronix.de> (raw)
In-Reply-To: <20200828072148.GC4353@alley>

On 2020-08-28, Petr Mladek <pmladek@suse.com> wrote:
>> Below is a patch against this series that adds support for finalizing
>> all 4 queried states. It passes all my tests. Note that the code handles
>> 2 corner cases:
>> 
>> 1. When seq is 0, there is no previous descriptor to finalize. This
>>    exception is important because we don't want to finalize the -1
>>    placeholder. Otherwise, upon the first wrap, a descriptor will be
>>    prematurely finalized.
>> 
>> 2. When a previous descriptor is being reserved for the first time, it
>>    might have a state_var value of 0 because the writer is still in
>>    prb_reserve() and has not set the initial value yet. I added
>>    considerable comments on this special case.
>> 
>> I am comfortable with adding this new code, although it clearly adds
>> complexity.
>> 
>> John Ogness
>> 
>> diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
>> index 90d48973ac9e..1ed1e9eb930f 100644
>> --- a/kernel/printk/printk_ringbuffer.c
>> +++ b/kernel/printk/printk_ringbuffer.c
>> @@ -860,9 +860,11 @@ static bool desc_reserve(struct printk_ringbuffer *rb, unsigned long *id_out)
>>  	struct prb_desc_ring *desc_ring = &rb->desc_ring;
>>  	unsigned long prev_state_val;
>>  	unsigned long id_prev_wrap;
>> +	unsigned long state_val;
>>  	struct prb_desc *desc;
>>  	unsigned long head_id;
>>  	unsigned long id;
>> +	bool is_final;
>>  
>>  	head_id = atomic_long_read(&desc_ring->head_id); /* LMM(desc_reserve:A) */
>>  
>> @@ -953,10 +955,17 @@ static bool desc_reserve(struct printk_ringbuffer *rb, unsigned long *id_out)
>>  	 * See "ABA Issues" about why this verification is performed.
>>  	 */
>>  	prev_state_val = atomic_long_read(&desc->state_var); /* LMM(desc_reserve:E) */
>> -	if (prev_state_val &&
>> -	    get_desc_state(id_prev_wrap, prev_state_val, NULL) != desc_reusable) {
>> -		WARN_ON_ONCE(1);
>> -		return false;
>> +	if (get_desc_state(id_prev_wrap, prev_state_val, &is_final) != desc_reusable) {
>> +		/*
>> +		 * If this descriptor has never been used, @prev_state_val
>> +		 * will be 0. However, even though it may have never been
>> +		 * used, it may have been finalized. So that flag must be
>> +		 * ignored.
>> +		 */
>> +		if ((prev_state_val & ~DESC_FINAL_MASK)) {
>> +			WARN_ON_ONCE(1);
>> +			return false;
>> +		}
>>  	}
>>  
>>  	/*
>> @@ -967,10 +976,25 @@ static bool desc_reserve(struct printk_ringbuffer *rb, unsigned long *id_out)
>>  	 * any other changes. A write memory barrier is sufficient for this.
>>  	 * This pairs with desc_read:D.
>>  	 */
>> -	if (!atomic_long_try_cmpxchg(&desc->state_var, &prev_state_val,
>> -				     id | 0)) { /* LMM(desc_reserve:F) */
>> -		WARN_ON_ONCE(1);
>> -		return false;
>> +	if (is_final)
>> +		state_val = id | 0 | DESC_FINAL_MASK;
>
> The state from the previous wrap always have to have DESC_FINAL_MASK set.
> Do I miss something, please?

Important: FINAL is not a _state_. It is a _flag_ that marks a
descriptor as non-reopenable. This was a simple change because it does
not affect any state logic. The number of states and possible
transitions have not changed.

When a descriptor transitions to reusable, the FINAL flag is cleared. It
has reached the end of its lifecycle. See desc_make_reusable().

(In order to have transitioned to reusable, the FINAL and COMMIT flags
must have been set.)

In the case of desc_reserve(), a reusable descriptor is transitioning to
reserved. When this transition happens, there may already be a later
descriptor that has been reserved and finalized this descriptor. If the
FINAL flag is set here, it means that the FINAL flag is set for the
_new_ descriptor being reserved.

In summary, the FINAL flag can be set in _any_ state. Once set, it is
preserved for all further state transitions. And it is cleared when that
descriptor becomes reusable.

>> +	else
>> +		state_val = id | 0;
>> +	if (atomic_long_cmpxchg(&desc->state_var, prev_state_val,
>> +				state_val) != prev_state_val) { /* LMM(desc_reserve:F) */
>> +		/*
>> +		 * This reusable descriptor must have been finalized already.
>> +		 * Retry with a reusable+final expected value.
>> +		 */
>> +		prev_state_val |= DESC_FINAL_MASK;
>> +		state_val |= DESC_FINAL_MASK;
>> +
>> +		if (!atomic_long_try_cmpxchg(&desc->state_var, &prev_state_val,
>> +					     state_val)) { /* LMM(desc_reserve:FIXME) */
>> +
>> +			WARN_ON_ONCE(1);
>> +			return false;
>> +		}
>>  	}
>>  
>>  	/* Now data in @desc can be modified: LMM(desc_reserve:G) */
>> @@ -1364,9 +1388,37 @@ static void desc_finalize(struct prb_desc_ring *desc_ring, unsigned long id)
>>  	while (!atomic_long_try_cmpxchg_relaxed(&d->state_var, &prev_state_val,
>>  						prev_state_val | DESC_FINAL_MASK)) {
>>  
>> -		if (get_desc_state(id, prev_state_val, &is_final) != desc_reserved)
>> +		switch (get_desc_state(id, prev_state_val, &is_final)) {
>> +		case desc_miss:
>> +			/*
>> +			 * If the ID is exactly 1 wrap behind the expected, it is
>> +			 * in the process of being reserved by another writer and
>> +			 * must be considered reserved.
>> +			 */
>> +			if (get_desc_state(DESC_ID_PREV_WRAP(desc_ring, id),
>> +					   prev_state_val, &is_final) != desc_reusable) {
>> +				/*
>> +				 * If this descriptor has never been used, @prev_state_val
>> +				 * will be 0. However, even though it may have never been
>> +				 * used, it may have been finalized. So that flag must be
>> +				 * ignored.
>> +				 */
>> +				if ((prev_state_val & ~DESC_FINAL_MASK)) {
>> +					WARN_ON_ONCE(1);
>> +					return;
>> +				}
>> +			}
>> +			fallthrough;
>
> How is this supposed to work please?
>
> If the ID is exactly one wrap behind, it is being reserved but the new
> id was not written yet. In this case, DESC_FINAL_MASK is already set.

No. If it is exactly one wrap behind, it is still in the reusable
state. The FINAL flag will not be set because it is cleared when
transitioning to reusable.

> The above cmpxchg will not do any change. And prb_reserve() will not
> be able to distinguish that it has been finalized.

The cmpxchg() will try again using the newly updated @prev_state_val and
try to set it to "prev_state_val | FINAL".

Now, theoretically, a writer could commit and reopen the descriptor with
such timing that this cmpxchg() will always fail. A kind of cat and
mouse, always trying to set the FINAL flag for the last @state_var
value.

That game could be avoided if the descriptor noticed that it is no
longer the head ID and set its own FINAL flag. I like the idea of a
guaranteed finalizing of the previous descriptor and the ability for a
descriptor to finalize itself.

(Although really, if we start talking about timed cmpxchg() attacks,
almost any cmpxchg loop can become near-infinite.)

> I am really not sure what solution is better. Mine have more barriers.
> But this brings many new combinations that need to be checked and
> handled.

I am putting together a fully functional version of your solution with
the appropriate memory barriers and correct handling of the special
cases so that we can get a better look at the difference.

John Ogness

  reply	other threads:[~2020-08-28 10:01 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-24 10:35 [PATCH v2 0/7][next] printk: reimplement LOG_CONT handling John Ogness
2020-08-24 10:35 ` [PATCH v2 1/7][next] printk: ringbuffer: rename DESC_COMMITTED_MASK flag John Ogness
2020-08-25 17:09   ` Petr Mladek
2020-08-24 10:35 ` [PATCH v2 2/7][next] printk: ringbuffer: change representation of reusable John Ogness
2020-08-25 17:10   ` Petr Mladek
2020-08-24 10:35 ` [PATCH v2 3/7][next] printk: ringbuffer: relocate get_data() John Ogness
2020-08-25 17:14   ` Petr Mladek
2020-08-24 10:35 ` [PATCH v2 4/7][next] printk: ringbuffer: add BLK_DATALESS() macro John Ogness
2020-08-25 17:24   ` Petr Mladek
2020-08-24 10:35 ` [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support John Ogness
2020-08-26  8:39   ` John Ogness
2020-08-26 10:01     ` Sergey Senozhatsky
2020-08-26 12:37       ` John Ogness
2020-08-26 15:07         ` Petr Mladek
2020-08-27  9:58           ` John Ogness
2020-08-27 15:17             ` Petr Mladek
2020-08-28  7:21             ` Petr Mladek
2020-08-28 10:01               ` John Ogness [this message]
2020-08-27 12:50   ` Petr Mladek
2020-08-27 14:28     ` John Ogness
2020-08-24 10:35 ` [PATCH v2 6/7][next] printk: reimplement log_cont using record extension John Ogness
2020-08-24 10:35 ` [PATCH v2 7/7][next] scripts/gdb: support printk finalized records John Ogness

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=873647144a.fsf@jogness.linutronix.de \
    --to=john.ogness@linutronix.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=parri.andrea@gmail.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox