From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752985AbdLOGwO (ORCPT ); Fri, 15 Dec 2017 01:52:14 -0500
Received: from mail-pg0-f68.google.com ([74.125.83.68]:33398 "EHLO
        mail-pg0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751704AbdLOGwN (ORCPT );
        Fri, 15 Dec 2017 01:52:13 -0500
X-Google-Smtp-Source: ACJfBotpiFjqI1uRIZN2eIZEoiniZ8f+Hjku2zkgE4jn2BuqRkY6GeiuHWwd8uyIl1yhQqBu9TECLw==
Date: Fri, 15 Dec 2017 15:52:05 +0900
From: Sergey Senozhatsky
To: Steven Rostedt
Cc: Tejun Heo, Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek,
        Jan Kara, Andrew Morton, Peter Zijlstra, Rafael Wysocki,
        Pavel Machek, Tetsuo Handa, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
Message-ID: <20171215065205.GB468@jagdpanzerIV>
References: <20171204134825.7822-1-sergey.senozhatsky@gmail.com>
        <20171214142709.trgl76hbcdwaczzd@pathway.suse.cz>
        <20171214152551.GY3919388@devbig577.frc2.facebook.com>
        <20171214125506.52a7e5fa@gandalf.local.home>
        <20171214181153.GZ3919388@devbig577.frc2.facebook.com>
        <20171215021024.GA11199@jagdpanzerIV>
        <20171214221831.3ead0298@vmware.local.home>
        <20171215050607.GC11199@jagdpanzerIV>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20171215050607.GC11199@jagdpanzerIV>
User-Agent: Mutt/1.9.1 (2017-09-22)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On (12/15/17 14:06), Sergey Senozhatsky wrote:
[..]
> > Where do we do the above? And has this been proven to be an issue?
>
> um... hundreds of cases.
>
> deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
> happening at the same moment + NMI backtraces from all the CPUs (more
> than 3 cpus) that follows the lockups, over not-so-fast serial console.
> exactly the bug report I received two days ago.
> so which one of the CPUs
> here is a good candidate to successfully emit all of the pending logbuf
> entries? none. all of them either have local IRQs disabled, or dump_stack()
> from either backtrace IPI or backtrace NMI (depending on the configuration).

and, Steven, one more thing. wondering what's your opinion.

suppose we have console_owner hand off enabled, 1 non-atomic CPU doing
printk-s and several atomic CPUs doing printk-s. is the proposed hand off
scheme really useful in this case? CPUs will now

a) print their lines (a potentially slow call_console_drivers())

and

b) spin in vprintk_emit() on console_owner with local IRQs disabled,
   waiting for either the non-atomic printk CPU or another atomic CPU
   to finish printing its line (call_console_drivers()) and to hand
   off printing. so the current CPU, after busy-waiting for a foreign
   CPU's call_console_drivers(), will go and do its own
   call_console_drivers(). which, time-wise, simply doubles (roughly)
   the amount of time that CPU spends in printk()->console_unlock().
   agreed?

previously we could have a case where the non-atomic printk CPU would
grab the console_sem and print all of the atomic printk CPUs' messages
first, and then its own messages, so the atomic printk CPUs would do
just log_store(). now we will have CPUs that call_console_drivers() and
spin on the console_sem owner waiting for call_console_drivers() on a
foreign CPU [not all of them at once: it's one CPU doing the print out
and one CPU spinning on console_owner. but overall I think all CPUs will
experience that spin on console_sem, waiting for a foreign
call_console_drivers(), and then do their own call_console_drivers()].

even the two CPUs case is not so simple anymore. see below.

- first, assume one CPU is atomic and one is non-atomic.
- second, assume that both CPUs are atomic CPUs, and go through it again.
CPU0                            CPU1

printk()                        printk()
 log_store()
                                 log_store()
 console_unlock()
  set console_owner
                                 sees console_owner
                                 sets console_waiter
                                 spin
  call_console_drivers()
  sees console_waiter
   break

printk()
 log_store()
                                 console_unlock()
                                  set console_owner
 sees console_owner
 sets console_waiter
 spin
                                  call_console_drivers()
                                  sees console_waiter
                                   break

                                printk()
                                 log_store()
 console_unlock()
  set console_owner
                                 sees console_owner
                                 sets console_waiter
                                 spin
  call_console_drivers()
  sees console_waiter
   break

printk()
 log_store()
                                 console_unlock()
                                  set console_owner
 sees console_owner
 sets console_waiter
 spin

....

that "wait for call_console_drivers() on another CPU and then do its own
call_console_drivers()" pattern does look dangerous. the benefit of
hand-off is really fragile sometimes, isn't it?

        -ss
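
for a rough feel of the numbers, the two-CPU ping-pong can be replayed in
a small user-space C model. this is only a sketch under stated
assumptions, not kernel code: struct printk_cost, struct cpu_stats,
simulate_pingpong() and the two atomic_cost_*() helpers are hypothetical
names, the costs are arbitrary time units, and the waiter is assumed to
start spinning right when the owner enters call_console_drivers() (the
worst case, as drawn in the timeline).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cost model, arbitrary time units (not measured values). */
struct printk_cost {
	unsigned long log_store;   /* one log_store() */
	unsigned long console_out; /* one call_console_drivers() for one line */
};

/* Time each CPU has spent in the three phases. */
struct cpu_stats {
	unsigned long store; /* in log_store() */
	unsigned long print; /* in its own call_console_drivers() */
	unsigned long spin;  /* busy-waiting for a foreign call_console_drivers() */
};

/* Per-printk() cost for an atomic CPU when some other CPU holds
 * console_sem and prints everything: just log_store(). */
unsigned long atomic_cost_no_handoff(const struct printk_cost *c)
{
	return c->log_store;
}

/* Per-printk() cost in the ping-pong pattern: log_store(), spin for the
 * foreign call_console_drivers(), then do its own. */
unsigned long atomic_cost_handoff(const struct printk_cost *c)
{
	return c->log_store + c->console_out + c->console_out;
}

/* Replay `rounds` hand-offs between two CPUs; CPU0 is the first owner. */
void simulate_pingpong(const struct printk_cost *c, unsigned int rounds,
		       struct cpu_stats stats[2])
{
	unsigned int owner = 0;
	unsigned int r;

	assert(c != NULL && stats != NULL);
	memset(stats, 0, 2 * sizeof(stats[0]));

	/* Both CPUs start with printk() -> log_store(). */
	stats[0].store += c->log_store;
	stats[1].store += c->log_store;

	for (r = 0; r < rounds; r++) {
		unsigned int waiter = 1 - owner;

		/* Owner prints one line while the waiter spins. */
		stats[owner].print += c->console_out;
		stats[waiter].spin += c->console_out;

		/* Owner sees console_waiter, breaks out, and
		 * immediately does another printk() -> log_store(). */
		stats[owner].store += c->log_store;

		/* The waiter becomes the new console owner. */
		owner = waiter;
	}
}
```

with log_store = 1 and console_out = 100, four rounds leave each CPU
with 200 units of printing plus 200 units of spinning, so the spin time
matches the print time. in this model an atomic CPU pays 201 units per
printk() with hand-off, versus 1 unit when another CPU prints everything.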