public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrea Parri <parri.andrea@gmail.com>,
	Paul McKenney <paulmck@kernel.org>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support
Date: Thu, 27 Aug 2020 12:04:58 +0206	[thread overview]
Message-ID: <87pn7c5s0t.fsf@jogness.linutronix.de> (raw)
In-Reply-To: <20200826150726.GA4928@alley>

On 2020-08-26, Petr Mladek <pmladek@suse.com> wrote:
>> This series makes a very naive assumption that the previous
>> descriptor is either in the reserved or committed queried states. The
>> fact is, it can be in any of the 4 queried states. Adding support for
>> finalization of all the states then gets quite complex, since any
>> state transition (cmpxchg) may have to deal with an unexpected FINAL
>> flag.
>
> It has to be done in two steps to avoid race:
>
> prb_commit()
>
>    + set PRB_COMMIT_MASK
>    + check if it is still the last descriptor in the array
>    + set PRB_FINAL_MASK when it is not the last descriptor
>
> It should work because prb_reserve() finalizes the previous
> descriptor after the new one is reserved. As a result:
>
>    + prb_reserve() should either see PRB_COMMIT_MASK in the previous
>      descriptor and be able to finalize it.
>
>    + or prb_commit() will see that the head moved and it is not
>      longer the last reserved one.

I do not like the idea of relying on descriptors to finalize
themselves. I worry that there might be some hole there. Failing to
finalize basically disables printk, so that is pretty serious.

Below is a patch against this series that adds support for finalizing
all 4 queried states. It passes all my tests. Note that the code handles
2 corner cases:

1. When seq is 0, there is no previous descriptor to finalize. This
   exception is important because we don't want to finalize the -1
   placeholder. Otherwise, upon the first wrap, a descriptor will be
   prematurely finalized.

2. When a previous descriptor is being reserved for the first time, it
   might have a state_var value of 0 because the writer is still in
   prb_reserve() and has not set the initial value yet. I added
   considerable comments on this special case.

I am comfortable with adding this new code, although it clearly adds
complexity.

John Ogness

diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 90d48973ac9e..1ed1e9eb930f 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -860,9 +860,11 @@ static bool desc_reserve(struct printk_ringbuffer *rb, unsigned long *id_out)
 	struct prb_desc_ring *desc_ring = &rb->desc_ring;
 	unsigned long prev_state_val;
 	unsigned long id_prev_wrap;
+	unsigned long state_val;
 	struct prb_desc *desc;
 	unsigned long head_id;
 	unsigned long id;
+	bool is_final;
 
 	head_id = atomic_long_read(&desc_ring->head_id); /* LMM(desc_reserve:A) */
 
@@ -953,10 +955,17 @@ static bool desc_reserve(struct printk_ringbuffer *rb, unsigned long *id_out)
 	 * See "ABA Issues" about why this verification is performed.
 	 */
 	prev_state_val = atomic_long_read(&desc->state_var); /* LMM(desc_reserve:E) */
-	if (prev_state_val &&
-	    get_desc_state(id_prev_wrap, prev_state_val, NULL) != desc_reusable) {
-		WARN_ON_ONCE(1);
-		return false;
+	if (get_desc_state(id_prev_wrap, prev_state_val, &is_final) != desc_reusable) {
+		/*
+		 * If this descriptor has never been used, @prev_state_val
+		 * will be 0. However, even though it may have never been
+		 * used, it may have been finalized. So that flag must be
+		 * ignored.
+		 */
+		if ((prev_state_val & ~DESC_FINAL_MASK)) {
+			WARN_ON_ONCE(1);
+			return false;
+		}
 	}
 
 	/*
@@ -967,10 +976,25 @@ static bool desc_reserve(struct printk_ringbuffer *rb, unsigned long *id_out)
 	 * any other changes. A write memory barrier is sufficient for this.
 	 * This pairs with desc_read:D.
 	 */
-	if (!atomic_long_try_cmpxchg(&desc->state_var, &prev_state_val,
-				     id | 0)) { /* LMM(desc_reserve:F) */
-		WARN_ON_ONCE(1);
-		return false;
+	if (is_final)
+		state_val = id | 0 | DESC_FINAL_MASK;
+	else
+		state_val = id | 0;
+	if (atomic_long_cmpxchg(&desc->state_var, prev_state_val,
+				state_val) != prev_state_val) { /* LMM(desc_reserve:F) */
+		/*
+		 * This reusable descriptor must have been finalized already.
+		 * Retry with a reusable+final expected value.
+		 */
+		prev_state_val |= DESC_FINAL_MASK;
+		state_val |= DESC_FINAL_MASK;
+
+		if (!atomic_long_try_cmpxchg(&desc->state_var, &prev_state_val,
+					     state_val)) { /* LMM(desc_reserve:FIXME) */
+
+			WARN_ON_ONCE(1);
+			return false;
+		}
 	}
 
 	/* Now data in @desc can be modified: LMM(desc_reserve:G) */
@@ -1364,9 +1388,37 @@ static void desc_finalize(struct prb_desc_ring *desc_ring, unsigned long id)
 	while (!atomic_long_try_cmpxchg_relaxed(&d->state_var, &prev_state_val,
 						prev_state_val | DESC_FINAL_MASK)) {
 
-		if (get_desc_state(id, prev_state_val, &is_final) != desc_reserved)
+		switch (get_desc_state(id, prev_state_val, &is_final)) {
+		case desc_miss:
+			/*
+			 * If the ID is exactly 1 wrap behind the expected, it is
+			 * in the process of being reserved by another writer and
+			 * must be considered reserved.
+			 */
+			if (get_desc_state(DESC_ID_PREV_WRAP(desc_ring, id),
+					   prev_state_val, &is_final) != desc_reusable) {
+				/*
+				 * If this descriptor has never been used, @prev_state_val
+				 * will be 0. However, even though it may have never been
+				 * used, it may have been finalized. So that flag must be
+				 * ignored.
+				 */
+				if ((prev_state_val & ~DESC_FINAL_MASK)) {
+					WARN_ON_ONCE(1);
+					return;
+				}
+			}
+			fallthrough;
+		case desc_reserved:
+		case desc_reusable:
+			/* finalizable, try again */
 			break;
+		case desc_committed:
+			/* already finalized */
+			return;
+		}
 
+		/* already finalized? */
 		if (is_final)
 			break;
 	}
@@ -1431,14 +1483,6 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 		goto fail;
 	}
 
-	/*
-	 * New data is about to be reserved. Once that happens, previous
-	 * descriptors are no longer able to be extended. Finalize the
-	 * previous descriptor now so that it can be made available to
-	 * readers (when committed).
-	 */
-	desc_finalize(desc_ring, DESC_ID(id - 1));
-
 	d = to_desc(desc_ring, id);
 
 	/*
@@ -1464,6 +1508,16 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
 	else
 		d->info.seq += DESCS_COUNT(desc_ring);
 
+	/*
+	 * New data is about to be reserved. Once that happens, previous
+	 * descriptors are no longer able to be extended. Finalize the
+	 * previous descriptor now so that it can be made available to
+	 * readers (when committed). (For the first descriptor, there is
+	 * no previous record to finalize.)
+	 */
+	if (d->info.seq > 0)
+		desc_finalize(desc_ring, DESC_ID(id - 1));
+
 	r->text_buf = data_alloc(rb, &rb->text_data_ring, r->text_buf_size,
 				 &d->text_blk_lpos, id);
 	/* If text data allocation fails, a data-less record is committed. */

  reply	other threads:[~2020-08-27  9:59 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-24 10:35 [PATCH v2 0/7][next] printk: reimplement LOG_CONT handling John Ogness
2020-08-24 10:35 ` [PATCH v2 1/7][next] printk: ringbuffer: rename DESC_COMMITTED_MASK flag John Ogness
2020-08-25 17:09   ` Petr Mladek
2020-08-24 10:35 ` [PATCH v2 2/7][next] printk: ringbuffer: change representation of reusable John Ogness
2020-08-25 17:10   ` Petr Mladek
2020-08-24 10:35 ` [PATCH v2 3/7][next] printk: ringbuffer: relocate get_data() John Ogness
2020-08-25 17:14   ` Petr Mladek
2020-08-24 10:35 ` [PATCH v2 4/7][next] printk: ringbuffer: add BLK_DATALESS() macro John Ogness
2020-08-25 17:24   ` Petr Mladek
2020-08-24 10:35 ` [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support John Ogness
2020-08-26  8:39   ` John Ogness
2020-08-26 10:01     ` Sergey Senozhatsky
2020-08-26 12:37       ` John Ogness
2020-08-26 15:07         ` Petr Mladek
2020-08-27  9:58           ` John Ogness [this message]
2020-08-27 15:17             ` Petr Mladek
2020-08-28  7:21             ` Petr Mladek
2020-08-28 10:01               ` John Ogness
2020-08-27 12:50   ` Petr Mladek
2020-08-27 14:28     ` John Ogness
2020-08-24 10:35 ` [PATCH v2 6/7][next] printk: reimplement log_cont using record extension John Ogness
2020-08-24 10:35 ` [PATCH v2 7/7][next] scripts/gdb: support printk finalized records John Ogness

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pn7c5s0t.fsf@jogness.linutronix.de \
    --to=john.ogness@linutronix.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=parri.andrea@gmail.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox