public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: "Sergey Senozhatsky" <senozhatsky@chromium.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Jiri Slaby" <jirislaby@kernel.org>,
	"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
	"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
	"Tony Lindgren" <tony@atomide.com>,
	"Geert Uytterhoeven" <geert+renesas@glider.be>,
	"Justin Chen" <justin.chen@broadcom.com>,
	"Jiaqing Zhao" <jiaqing.zhao@linux.intel.com>,
	linux-serial@vger.kernel.org
Subject: Re: [PATCH printk v2 08/26] printk: nbcon: Implement processing in port->lock wrapper
Date: Mon, 11 Mar 2024 18:14:19 +0106	[thread overview]
Message-ID: <87le6oy9vg.fsf@jogness.linutronix.de> (raw)
In-Reply-To: <Zdh4eEJJpasEWqa5@alley>

Hi Petr,

Thanks for the detailed feedback. Here is a lengthy response. I hope it
clarifies the uart port and console fields. And I think you identified a
bug relating to the setup() callback.

On 2024-02-23, Petr Mladek <pmladek@suse.com> wrote:
> My main (only) concern was the synchronization of the various accessed
> variables, especially, port->cons.

The only issue is if port->cons disappears between lock and unlock. I
see there is code setting port->cons to NULL, although I do not know
why. Once port->cons is set, there is never a reason to unset it.

Regardless, I will add port->lock synchronization when modifying
port->cons. There are only a few code sites and they are all during
driver setup.

> Note: I am not completely sure how the early and valid console drivers
>       share the same struct uart_port. So, maybe I miss some important
>       guarantee.

The struct uart_port is _not_ shared between the early and normal
consoles. However, the struct console is shared for normal consoles
amoung various ports of a particular driver.

> Anyway. synchronization of port->cons looks like a shade area.
> IMHO, the existing code expects that it is used _only when the console
> is registered_. But this patch wants to access it _even before
> the console is registered_!

Indeed. It is not enough for uart_is_nbcon() to check if it is an
NBCON. It must also check if it is registered, locklessly:

    hlist_unhashed_lockless(&con->node);

Most importantly to be sure that nbcon_init() has already been called.
register_console() clears the nbcon state after cons->index has been
set, but before the console has been added to the list.

> For example, it took me quite a lot of time to shake my head around:
>
> #define uart_console(port) \
> 	((port)->cons && (port)->cons->index == (port)->line)
>
>   + port->cons and port->line are updated in the uart code.
>     It seems that the update is not serialized by port->lock.
>     Something might be done under port->mutex.
>
>   + cons->index is updated in register_console() under
>     console_list_lock.
>
> Spoiler: I propose a solution which does not use uart_console().

This check is necessary because multiple ports of a driver will set and
share the same port->cons value, even if they are not really the
console. I looked into enforcing that port->cons is NULL if it is not a
registered console, but this is tricky. port->cons is driver-internal
and hidden from printk. The driver will set port->cons in case it
becomes the console and printk will set cons->index once it has chosen
which port will be the actual console. But there is no way to unset
port->cons if a port was not chosen by printk.

The various fields have the following meaning (AFAICT):

port->line: An identifier to represent a particular port supported by a
driver.

port->cons: The struct console to use if this port is chosen to be a
console.

port->console: Boolean, true if this port was chosen to be a
console. (Used only by the tty layer.)

cons->index: The port chosen by printk to be a console.

None of those fields specify if the port is currently registered as a
console. For that you would need to check if port->cons->node is hashed
and then verify that port->line matches port->cons->index. This is what
uart_nbcon_acquire() needs to do (as well as check if it is an nbcon
console).

>> --- a/drivers/tty/serial/8250/8250_port.c
>> +++ b/drivers/tty/serial/8250/8250_port.c
>> @@ -3284,6 +3284,7 @@ void serial8250_init_port(struct uart_8250_port *up)
>>  	struct uart_port *port = &up->port;
>>  
>>  	spin_lock_init(&port->lock);
>> +	port->nbcon_locked_port = false;
>
> I am not sure if the variable really helps anything:
>
>    + nbcon_context release() must handle the case when it
>      lost the ownership anyway.

uart_nbcon_release() must first check if the provided port is a
registered nbcon console. A simple boolean check is certainly faster
than the 4 checks mentioned above. After all, if it was never locked,
there is no reason to unlock it.

>    + nbcon_acquire() is called under port->lock. It means that
>      nbcon_release() should always be called when the acquire
>      succeeded earlier.

Same answer as above.

> We just need to make sure that port->cons can be cleared only
> under port->lock. But this would be required even with
> port->nbcon_locked_port.

Agreed. I will add this.

>> --- a/kernel/printk/nbcon.c
>> +++ b/kernel/printk/nbcon.c
>> @@ -995,3 +996,79 @@ void nbcon_free(struct console *con)
>>  
>>  	con->pbufs = NULL;
>>  }
>> +
>> +static inline bool uart_is_nbcon(struct uart_port *up)
>> +{
>> +	int cookie;
>> +	bool ret;
>> +
>> +	if (!uart_console(up))
>
> This function accesses up->cons. I am not completely sure how
> this value is synchronized, for example:
>
>   + serial_core_add_one_port() sets uport->cons under port->mutex.
>     The function is called uart_add_one_port() in various probe()
>     or init() calls.

I will add port->lock synchronization.

>   + univ8250_console_setup() sets and clears port->cons with
>     no explicit synchronization. The function is called from
>     try_enable_preferred_console() under console_list_lock.

I will add port->lock synchronization.

> IMHO, the assignment is done when the drivers are being initialized.
> It is done when the port might already be used by early consoles.
>
> Especially, according to my understanding, newcon->setup() callbacks
> are responsible for using the same port by early and real console drivers.
>
> I guess that uart_port_lock() API is used by early consoles as well.
> It means that they might access up->cons here while it is being
> manipulated by the proper driver.

Note that port->lock does not synchronize early and normal
consoles. Only the console lock can do that. But you bring up a very
good point with setup(). serial8250_console_setup() can call
probe_baud(), which will write to the hardware.

I think that con->setup() needs to be called under the console lock
(just as already with unblank() and device()).

>> +		return false;
>> +
>> +	cookie = console_srcu_read_lock();
>> +	ret = (console_srcu_read_flags(up->cons) & CON_NBCON);
>> +	console_srcu_read_unlock(cookie);
>
> Hmm, console_srcu_read_flags(con) is called under
> console_srcu_read_lock() to make sure that it does not
> disappear. It makes sense when it is used by registered consoles.
>
> But uart_port_lock() might be called even when the console
> is not registered.
>
> I suggest to remove the SRCU lock here. In this code path,
> it does not guarantee anything and is rather misleading.

Agreed.

> I would use a READ_ONCE(), for example by splitting:
>
> /*
>  * Locklessly reading console->flags provides a consistent
>  * read value because there is at most one CPU modifying
>  * console->flags and that CPU is using only read-modify-write
>  * operations to do so.
>  *
>  * The caller must make sure that @con does not disappear.
>  * It can be done by console_srcu_read_lock() when used
>  * only for registered consoles.
>  */
> static inline short console_read_flags(const struct console *con)
> {
> 	return data_race(READ_ONCE(con->flags));
> }
>
> /* Locklessly reading console->flags for registered consoles */
> static inline short console_srcu_read_flags(const struct console *con)
> {
> 	WARN_ON_ONCE(!console_srcu_read_lock_is_held());
>
> 	console_read_flags();
> }

OK.

>> +void uart_nbcon_acquire(struct uart_port *up)
>> +{
>> +	struct console *con = up->cons;
>> +	struct nbcon_context ctxt;
>
> I would add:
>
> 	lockdep_assert_held(&up->lock);

OK.

>> +
>> +	if (!uart_is_nbcon(up))
>> +		return;
>> +
>> +	WARN_ON_ONCE(up->nbcon_locked_port);
>> +
>> +	do {
>> +		do {
>> +			memset(&ctxt, 0, sizeof(ctxt));
>> +			ctxt.console	= con;
>> +			ctxt.prio	= NBCON_PRIO_NORMAL;
>> +		} while (!nbcon_context_try_acquire(&ctxt));
>> +
>> +	} while (!nbcon_context_enter_unsafe(&ctxt));
>> +
>> +	up->nbcon_locked_port = true;
>> +}
>> +EXPORT_SYMBOL_GPL(uart_nbcon_acquire);
>
> I would prefer to split the uart and nbcon specific code, for example:

Can you explain why? This code will not be used anywhere else.

> /**
>  * nbcon_normal_context_acquire - Acquire a generic context with
>  *	the normal priority for nbcon console
>  * @con:	nbcon console
>  *
>  * Context:	Any context which could not be migrated to another CPU.
>  *
>  * Acquire a generic context with the normal priority for the given console.
>  * Prevent the release by entering the unsafe state.
>  *
>  * Note: The console might still loose the ownership by a hostile takeover.
>  *	 But it can be done only by the final flush in panic() when other
>  *	 CPUs should be stopped and other contexts interrupted.
>  */
> static void nbcon_normal_context_acquire(struct console *con)
> {
> 	struct nbcon_context ctxt;
>
> 	do {
> 		do {
> 			memset(&ctxt, 0, sizeof(ctxt));
> 			ctxt.console	= con;
> 			ctxt.prio	= NBCON_PRIO_NORMAL;
> 		} while (!nbcon_context_try_acquire(&ctxt));
>
> 	} while (!nbcon_context_enter_unsafe(&ctxt));
> }
>
> /**
>  * uart_nbcon_acquire - Acquire nbcon console associated with the gived port.
>  * @up:		uart port
>  *
>  * Context:	Must be called under up->lock to prevent manipulating
>  *		up->cons and migrating to another CPU.
>  *
>  * Note: The console might still loose the ownership by a hostile takeover.
>  *	 But it can be done only by the final flush in panic() when other
>  *	 CPUs should be stopped and other contexts interrupted.
>  */
> void uart_nbcon_acquire(struct uart_port *up)
> {
> 	struct console *con; = up->cons;
>
> 	lockdep_assert_held(&up->lock);
>
> 	/*
> 	 * FIXME: This would require adding WRITE_ONCE()
> 	 * on the write part.
> 	 *
> 	 * Or even better, the value should be modified under
> 	 * port->lock so that simple read would be enough here.
> 	 */
> 	con = data_race(READ_ONCE(up->cons));
>
> 	if (!con)
> 		return;
>
> 	if (!console_read_flags(con) & CON_NBCON)
> 		return;
>
> 	nbcon_normal_context_acquire(con);
> }
>
> Note that I did not use up->nbcon_locked_port as explained above.

Note that it will not work because other ports will share the same
up->cons value even though they are not consoles. up->cons only
specifies which struct console to use _if_ printk chooses that port as a
console. It does _not_ mean that printk has chosen that port.

>> +void uart_nbcon_release(struct uart_port *up)
>> +{
>> +	struct console *con = up->cons;
>> +	struct nbcon_context ctxt = {
>> +		.console	= con,
>> +		.prio		= NBCON_PRIO_NORMAL,
>> +	};
>> +
>
> I would add here as well.
>
> 	lockdep_assert_held(&up->lock);

OK.

> This deserves a comment why we do not complain when this function
> is called for nbcon and it is not locked. Something like:
>
> 	/*
> 	 * Even port used by nbcon console might be seen unlocked
> 	 * when it was locked and the console has been registered
> 	 * at the same time.
> 	 */

I think a more appropriate comment would be:

	/*
	 * This function is called for ports that are not consoles
	 * and for ports that may be consoles but are not nbcon
	 * consoles. In those the cases the nbcon console was
	 * never locked and this context must not unlock.
	 */

>> +	if (!up->nbcon_locked_port)
>> +		return;
>> +
>> +	if (nbcon_context_exit_unsafe(&ctxt))
>> +		nbcon_context_release(&ctxt);
>> +
>> +	up->nbcon_locked_port = false;
>> +}
>
> Again I would better split the nbcon and uart part and create:

I can do the split, but I do not see the reason for it.

John

  reply	other threads:[~2024-03-11 17:09 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-18 18:57 [PATCH printk v2 00/26] wire up write_atomic() printing John Ogness
2024-02-18 18:57 ` [PATCH printk v2 01/26] serial: core: Provide port lock wrappers John Ogness
2024-02-18 18:57 ` [PATCH printk v2 02/26] serial: core: Use " John Ogness
2024-02-18 18:57 ` [PATCH printk v2 03/26] serial: core: fix kernel-doc for uart_port_unlock_irqrestore() John Ogness
2024-02-18 18:57 ` [PATCH printk v2 04/26] printk: Consider nbcon boot consoles on seq init John Ogness
2024-02-20 10:26   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 05/26] printk: Add notation to console_srcu locking John Ogness
2024-02-20 10:29   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 06/26] printk: nbcon: Ensure ownership release on failed emit John Ogness
2024-02-20 15:16   ` Petr Mladek
2024-02-20 16:29     ` John Ogness
2024-02-21 13:23       ` John Ogness
2024-02-18 18:57 ` [PATCH printk v2 07/26] printk: Check printk_deferred_enter()/_exit() usage John Ogness
2024-02-21  9:55   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 08/26] printk: nbcon: Implement processing in port->lock wrapper John Ogness
2024-02-19 12:16   ` Andy Shevchenko
2024-02-19 14:18     ` John Ogness
2024-02-19 14:35       ` Andy Shevchenko
2024-02-19 16:52         ` John Ogness
2024-02-19 17:14           ` Andy Shevchenko
2024-02-23 10:51   ` Petr Mladek
2024-03-11 17:08     ` John Ogness [this message]
2024-03-13  9:49       ` John Ogness
2024-03-22  6:23         ` Tony Lindgren
2024-03-27  9:32           ` John Ogness
2024-03-27 13:12             ` Andy Shevchenko
2024-03-14 11:37       ` John Ogness
2024-03-14 14:26       ` Petr Mladek
2024-03-15 15:04         ` John Ogness
2024-03-18 15:42           ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 09/26] printk: nbcon: Add detailed doc for write_atomic() John Ogness
2024-02-23 13:11   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 10/26] printk: nbcon: Fix kerneldoc for enums John Ogness
2024-02-18 19:10   ` Randy Dunlap
2024-02-23 13:13   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 11/26] printk: Make console_is_usable() available to nbcon John Ogness
2024-02-18 18:57 ` [PATCH printk v2 12/26] printk: Let console_is_usable() handle nbcon John Ogness
2024-02-18 18:57 ` [PATCH printk v2 13/26] printk: Add @flags argument for console_is_usable() John Ogness
2024-02-18 18:57 ` [PATCH printk v2 14/26] printk: nbcon: Provide function to flush using write_atomic() John Ogness
2024-02-23 15:47   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 15/26] printk: Track registered boot consoles John Ogness
2024-02-23 15:57   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 16/26] printk: nbcon: Use nbcon consoles in console_flush_all() John Ogness
2024-02-23 17:15   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 17/26] printk: nbcon: Assign priority based on CPU state John Ogness
2024-02-29 13:50   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 18/26] printk: nbcon: Add unsafe flushing on panic John Ogness
2024-02-29 13:53   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 19/26] printk: Avoid console_lock dance if no legacy or boot consoles John Ogness
2024-02-29 15:19   ` Petr Mladek
2024-02-29 16:19   ` READ_ONCE: was: " Petr Mladek
2024-02-29 22:51     ` Paul E. McKenney
2024-02-18 18:57 ` [PATCH printk v2 20/26] printk: Track nbcon consoles John Ogness
2024-03-01  9:39   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 21/26] printk: Coordinate direct printing in panic John Ogness
2024-03-01 13:05   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 22/26] printk: nbcon: Implement emergency sections John Ogness
2024-03-01 13:28   ` Petr Mladek
2024-03-01 15:49   ` flush was: " Petr Mladek
2024-03-01 16:12     ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 23/26] panic: Mark emergency section in warn John Ogness
2024-03-01 13:57   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 24/26] panic: Mark emergency section in oops John Ogness
2024-03-01 14:55   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 25/26] rcu: Mark emergency section in rcu stalls John Ogness
2024-03-01 15:13   ` Petr Mladek
2024-02-18 18:57 ` [PATCH printk v2 26/26] lockdep: Mark emergency section in lockdep splats John Ogness
2024-02-19  4:14   ` Waiman Long
2024-02-19 11:11     ` John Ogness
2024-02-19 15:07       ` Waiman Long
2024-03-01 15:18   ` Petr Mladek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87le6oy9vg.fsf@jogness.linutronix.de \
    --to=john.ogness@linutronix.de \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=geert+renesas@glider.be \
    --cc=gregkh@linuxfoundation.org \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=jiaqing.zhao@linux.intel.com \
    --cc=jirislaby@kernel.org \
    --cc=justin.chen@broadcom.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-serial@vger.kernel.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    --cc=tglx@linutronix.de \
    --cc=tony@atomide.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox