From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932237AbcASPCt (ORCPT ); Tue, 19 Jan 2016 10:02:49 -0500 Received: from mail-pa0-f43.google.com ([209.85.220.43]:34938 "EHLO mail-pa0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756178AbcASPC2 (ORCPT ); Tue, 19 Jan 2016 10:02:28 -0500 Date: Wed, 20 Jan 2016 00:00:40 +0900 From: Sergey Senozhatsky To: Petr Mladek Cc: Sergey Senozhatsky , Andrew Morton , Tejun Heo , Jan Kara , Kyle McMartin , Dave Jones , Calvin Owens , linux-kernel@vger.kernel.org, Sergey Senozhatsky Subject: Re: [RFC][PATCH -next 1/2] printk: move can_use_console out of console_trylock_for_printk Message-ID: <20160119150040.GA558@swordfish> References: <1452747443-9927-1-git-send-email-sergey.senozhatsky@gmail.com> <1452747443-9927-2-git-send-email-sergey.senozhatsky@gmail.com> <20160118154228.GY3178@pathway.suse.cz> <20160119004236.GA4963@swordfish> <20160119133136.GA2148@dhcp128.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160119133136.GA2148@dhcp128.suse.cz> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, On (01/19/16 14:31), Petr Mladek wrote: > > On (01/18/16 16:42), Petr Mladek wrote: > > > On Thu 2016-01-14 13:57:22, Sergey Senozhatsky wrote: > > > > vprintk_emit() disables preemption around console_trylock_for_printk() > > > > and console_unlock() calls for a strong reason -- can_use_console() > > > > check. The thing is that vprintl_emit() can be called on a CPU that > > > > is not fully brought up yet (!cpu_online()), which potentially can > > > > cause problems if console driver accesses per-cpu data. A console > > > > driver can explicitly state that it's safe to call it from !online > > > > cpu by setting CON_ANYTIME bit in console ->flags. That's why for > > > > !cpu_online() can_use_console() iterates all the console to find out > > > > if there is a CON_ANYTIME console, otherwise console_unlock() must be > > > > avoided. > > > > > > > > call_console_drivers(), called from console_cont_flush() and > > > > console_unlock(), does the same test during for_each_console() loop. > > > > However, we can have the following corner case. Assume that we have 2 > > > > cpus -- CPU0 is online, CPU1 is !online; and no CON_ANYTIME consoles > > > > available. > > > > > > > > CPU0 online CPU1 !online > > > > console_trylock() > > > > ... > > > > console_unlock() > > > > > > Please, where this console_unlock() comes from? > > > > from UP* or DOWN* (_PREPARE f.e.) notifiers on this CPU, for example, we don't > > know what's going on there. what prevents it from calling console_trylock(), > > grabbing the console_sem and eventually doing console_unlock()? there is > > a can_use_console() check, but it handles only one case -- printk(). > > So, is it a theoretical problem or do you know about any > particular path where this happens? a theoretical one at this point. > Well, it might make sense to get rid of console_trylock_for_printk() > and do the can_use_console() check at the beginning of > unlock_console(). I mean to do > > - if (console_trylock_for_printk()) > + if (console_trylock()) > unlock_console(); agree, I almost removed it, but decided to keep for a while, just in case if we would want to add any additional code there. > But do we really need to repeat the check in every cycle? well, on every iteration in the best case we check cpu_online() only. which is what we would have done anyway in vprintk_emit(), so no additional checks added. at the same time call_console_drivers does not do '!cpu_online && !CON_ANYTIME' for each console now, so in some sense there are less checks now (this is far even from a micro-optimization, just noted). > can_use_console() checks available consoles and if the CPU > is online. Consoles could not get added or removed when > we own console_sem. It seems that CPUs get disabled > in a process context. Therefore it seems that it might happen > only when unlock_console() gets rescheduled. I guess that > it could not get scheduled back on an offline CPU. So, it > seems that it is enough to check can_use_console() only once at > the beginning. > > Or did I miss anything? It does sound interesting, thanks. I was trying to keep the existing behaviour. It almost works, there is one case. console_unlock() /* w/o can_use_console() in logbuf_lock section */ .... again: for (;;) { raw_spin_lock_irqsave logbuf_lock msg_print_text raw_spin_unlock logbuf_lock call_console_drivers local_irq_restore } up_console_sem raw_spin_lock logbuf_lock retry = console_seq != log_next_seq raw_spin_unlock_irqrestore logbuf_lock if retry && console_trylock() goto again; when we up_console_sem(), consoles may appear and disappear, since we don't keep the semaphore. Suppose that we have OFFLINE cpu and we had a CON_ANYTIME console, but in between up_console_sem--console_trylock that single CON_ANYTIME console was removed. So now we have !cpu_online and !CON_ANYTIME. So this is why I re-do the can_use_console() check. I can add a bool flag and do the can_use_console() only once -- when the flag is false, set it to true on successful can_use_console() and avoid can_use_console() during this loop; and set the flag to false right after the 'again:' label, which will imply that console semaphore has been re-acquired and we need to can_use_console() again. something like this, perhaps (to be folded.. or should it be a separate commit -- optimization). will give a test tomorrow. I reused `bool retry' flag (which is probably a bit hacky, it'll be better to have a separate byte for this thing). And I moved can_use_console() from console_cont_flush() to the top -- so if we are executing the `for (;;)' loop, then can_use_console() was successful in console_cont_flush() and we have to re-do can_use_console() only if we released console_sem and jumped to `again' label. --- kernel/printk/printk.c | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 28b2dec..c7a5ce8 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2170,6 +2170,11 @@ static bool console_cont_flush(char *text, size_t size) raw_spin_lock_irqsave(&logbuf_lock, flags); + if (!can_use_console()) { + ret = false; + goto out; + } + if (!cont.len) goto out; @@ -2181,11 +2186,6 @@ static bool console_cont_flush(char *text, size_t size) if (console_seq < log_next_seq && !cont.cons) goto out; - if (!can_use_console()) { - ret = false; - goto out; - } - len = cont_print_text(text, size); raw_spin_unlock(&logbuf_lock); stop_critical_timings(); @@ -2219,7 +2219,7 @@ void console_unlock(void) static u64 seen_seq; unsigned long flags; bool wake_klogd = false; - bool do_cond_resched, retry, aborted = false; + bool do_cond_resched, retry = false, aborted = false; if (console_suspended) { up_console_sem(); @@ -2255,9 +2255,20 @@ again: raw_spin_lock_irqsave(&logbuf_lock, flags); - if (!can_use_console()) { - aborted = true; - break; + /* + * @retry == true implies that we have released console_sem + * and, thus, someone could have added/removed console(-s). + * we need to can_use_console() again. @retry intially set + * to false, because console_cont_flush() does the + * can_use_console() check and we can't execute this loop + * in can_use_console() returned `false` there. + */ + if (retry) { + retry = false; + if (!can_use_console()) { + aborted = true; + break; + } } if (seen_seq != log_next_seq) { -- 2.7.0