Date: Thu, 28 Dec 2017 15:48:58 +0900
From: Sergey Senozhatsky
To: Steven Rostedt
Cc: Tejun Heo, Petr Mladek, Sergey Senozhatsky, Jan Kara, Andrew Morton,
    Peter Zijlstra, Rafael Wysocki, Pavel Machek, Tetsuo Handa,
    linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
Message-ID: <20171228064858.GA892@jagdpanzerIV>
In-Reply-To: <20171221231932.27727fab@vmware.local.home>
References: <20171204134825.7822-1-sergey.senozhatsky@gmail.com>
 <20171214142709.trgl76hbcdwaczzd@pathway.suse.cz>
 <20171214152551.GY3919388@devbig577.frc2.facebook.com>
 <20171214125506.52a7e5fa@gandalf.local.home>
 <20171214181153.GZ3919388@devbig577.frc2.facebook.com>
 <20171214132109.32ae6a74@gandalf.local.home>
 <20171222000932.GG1084507@devbig577.frc2.facebook.com>
 <20171221231932.27727fab@vmware.local.home>

Hello,

On (12/21/17 23:19), Steven Rostedt wrote:
[..]
> > I wrote this before but this isn't a theoretical problem. We see
> > these stalls a lot. Preemption isn't enabled to begin with.
> > Memory pressure is high and OOM triggers and printk starts printing
> > out OOM warnings; then, a network packet comes in which triggers
> > allocations in the network layer, which fail due to memory pressure,
> > which then generates memory allocation failure messages, which then
> > generate netconsole packets, which then try to allocate more memory,
> > and so on.
>
> It doesn't matter if preemption is enabled or not. The hand off should
> happen either way.

preemption does matter. it matters so much that it makes the first thing
your patch depends on questionable - if CPUA is stuck in console_unlock()
printing other CPUs' messages, it's because there are printk()-s currently
happening on CPUB-CPUZ. which is not always true: with preemption enabled,
CPUA can be stuck in console_unlock() because of printk()-s from
CPUB-CPUZ that happened seconds or even minutes ago. I have demonstrated
this with traces; it didn't convince you. OK. so I sat down and went
through Tetsuo's reports, starting from 2016:

  https://marc.info/?l=linux-kernel&m=151375384500555

and Tetsuo has reported several times that the printing CPU can sleep for
minutes with console_sem locked, while the other CPUs are happy to
printk()->log_store() as much as they want.

and that's why I believe that "the hand off should happen either way" is
dangerous, especially when we hand off from a preemptible context to an
atomic context. you don't think this is a problem, because your patch
requires the logbuf to be small enough that any atomic context (irq,
under spin_lock, etc.) can print out all the pending messages to all
registered consoles [we print to consoles sequentially] within
watchdog_threshold seconds (10 seconds by default). but who adjusts the
size of the logbuf so that console_unlock() can print the entire logbuf
in under 10 seconds?
> > It's just that there's no one else to give that flushing duty to, so
> > the ping-ponging that your patch implements can't really help
> > anything.
> >
> > You argue that it isn't made worse by your patch, which may be true,
> > but your patch doesn't solve actual problems and is most likely
> > unnecessary complication which gets in the way of the actual
> > solution. It's a weird argument to push an approach which is
> > fundamentally broken. Let's please stop that.
>
> BULLSHIT!
>
> It's not a complex solution, and coming from the cgroup and workqueue
> maintainer, that's a pretty ironic comment.
>
> It has already been proven that it can solve problems:
>
>  http://lkml.kernel.org/r/20171219143147.GB15210@dhcp22.suse.cz

and Tetsuo also said that his test does not cover all the scenarios, and
he sent us all a pretty clear warning:

: Therefore, while it is true that any approach would survive my
: environment, it is dangerous to assume that any approach is safe
: for my customer's enterprise servers.

  https://marc.info/?l=linux-kernel&m=151377161705573

when it comes to lockups, printk() design is so flexible that one can
justify nearly any patch, no matter how many regressions it introduces,
simply by saying: "but the same lockup scenario could have happened even
before my patch. printk() was just designed this way. you were lucky
enough not to hit the problem before; now you are less lucky". been
there, done that. it's a trap.

> You don't think handing off printks to an offloaded thread isn't more
> complex nor can it cause even more issues (like more likely to lose
> relevant information on kernel crashes)?

printk_kthread *used* to be way too independent.
basically, what we had before was

	bad_function()
	 printk()
	  vprintk_emit()
	  {
		if (oops_in_progress)
			can_offload_to_printk = false;

		if (can_offload_to_printk)
			wake_up(printk_kthread);
		else if (console_trylock())
			console_unlock();
	  }

the bad things:

- first, we did not take into account the fact that printk_kthread can
  already be active.

- second, we did not take into account the fact that printk_kthread can
  be way, way behind - a huge gap between `console_seq' and
  `log_next_seq'.

- third, we did not take into account the fact that printk_kthread can
  be preempted under console_sem.

so, IOW, the CPU which was in trouble declared the "oh, we should not
offload to printk_kthread" emergency only for itself, basically. the CPU
which printk_kthread was running on or sleeping on [preemption] did not
care, and neither did printk_kthread.

what we have now:

- printk_kthread cannot be preempted under console_sem. if it acquires
  console_sem, it stays RUNNING, printing the messages.

- printk_kthread does check for emergency on other CPUs - right before
  it starts printing the next pending logbuf line (every line).

IOW, instead of

	console_unlock()
	{
	again:
		for (;;) {
			if (!more_pending_messages)
				break;
			call_console_drivers();
		}

		up();

		if (more_pending_messages && !down_trylock())
			goto again;
	}

it now does

	console_unlock()
	{
	again:
		preempt_disable();

		for (;;) {
			if (!more_pending_messages)
				break;
			if (something_is_not_right)
				break;
			if (printing_for_too_long)
				break;
			call_console_drivers();
		}

		up();
		preempt_enable();

		if (!something_is_not_right && !printing_for_too_long &&
		    more_pending_messages && !down_trylock())
			goto again;
	}

so if CPUA declares an emergency (oops_in_progress, etc.), then
printk_kthread, if it's running, will get out of CPUA's way as soon as
possible. so I don't think we are still *more likely* to lose relevant
information on kernel crashes because of printk_kthread.
and I'm actually thinking about bringing back the old vprintk_emit()
behavior:

	vprintk_emit()
	{
	+	preempt_disable();
		if (console_trylock())
			console_unlock();
	+	preempt_enable();
	}

and letting console_unlock() offload when needed. preemption in
vprintk_emit() is no good.

	-ss