From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
From: Thomas Gleixner
Subject: Re: [patch 00/13] preempt: Make preempt count unconditional
In-Reply-To:
References: <20200914204209.256266093@linutronix.de>
Date: Mon, 14 Sep 2020 23:55:24 +0200
Message-ID: <871rj4owfn.fsf@nanos.tec.linutronix.de>
MIME-Version: 1.0
Content-Type: text/plain
To: Linus Torvalds
Cc: LKML, linux-arch, Sebastian Andrzej Siewior, Valentin Schneider,
 Richard Henderson, Ivan Kokshaysky, Matt Turner, alpha, Jeff Dike,
 Richard Weinberger, Anton Ivanov, linux-um, Brian Cain,
 linux-hexagon@vger.kernel.org, Geert Uytterhoeven, linux-m68k,
 Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
 Ingo Molnar, Russell King, Linux ARM, Chris Zankel, Max Filippov,
 linux-xtensa@linux-xtensa.org, Jani Nikula, Joonas Lahtinen,
 Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
 "Paul E. McKenney", Josh Triplett, Mathieu Desnoyers, Lai Jiangshan,
 Shuah Khan, rcu@vger.kernel.org,
 "open list:KERNEL SELFTEST FRAMEWORK"
List-ID:

On Mon, Sep 14 2020 at 13:59, Linus Torvalds wrote:
> On Mon, Sep 14, 2020 at 1:45 PM Thomas Gleixner wrote:
>>
>> Recently merged code does:
>>
>>         gfp = preemptible() ? GFP_KERNEL : GFP_ATOMIC;
>>
>> Looks obviously correct, except for the fact that preemptible() is
>> unconditionally false for CONFIG_PREEMPT_COUNT=n, i.e. all allocations in
>> that code use GFP_ATOMIC on such kernels.
>
> I don't think this is a good reason to entirely get rid of the
> no-preempt thing.

I did not say that this is a good reason. It just illustrates the
problem.

> The above is just garbage. It's bogus. You can't do it.
>
> Blaming the no-preempt code for this bug is extremely unfair, imho.

I'm not blaming the no-preempt code. I'm blaming inconsistency and
there is no real good argument for inconsistent behaviour, TBH.

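The gfp selection quoted above can be modelled in userspace to show why it degenerates. This is only a sketch, not kernel code: every model_* name is a hypothetical stand-in, and the two helper functions merely imitate the two shapes preemptible() is assumed to take, a real check when the preempt count is maintained and a constant 0 when it is not.

```c
/* Userspace model only; every model_* name is a hypothetical stand-in. */
#include <assert.h>
#include <stdbool.h>

enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

struct model_ctx {
    int  preempt_count;  /* stands in for preempt_count() */
    bool irqs_disabled;  /* stands in for irqs_disabled() */
};

/* Shape of preemptible() when the preempt count is maintained. */
static bool model_preemptible_counted(const struct model_ctx *c)
{
    return c->preempt_count == 0 && !c->irqs_disabled;
}

/* Shape of preemptible() with PREEMPT_COUNT=n: a constant. */
static bool model_preemptible_uncounted(const struct model_ctx *c)
{
    (void)c;  /* the context is ignored: there is no count to consult */
    return false;
}

/* The quoted pattern, parameterised on which preemptible() is in effect. */
static enum model_gfp model_pick_gfp(const struct model_ctx *c, bool counted)
{
    bool p = counted ? model_preemptible_counted(c)
                     : model_preemptible_uncounted(c);

    return p ? MODEL_GFP_KERNEL : MODEL_GFP_ATOMIC;
}

/* Usage: call from any main() to check the behaviour described above. */
static void model_selfcheck(void)
{
    struct model_ctx task = { 0, false };   /* plain task context */
    struct model_ctx locked = { 1, false }; /* preemption disabled */

    assert(model_pick_gfp(&task, true) == MODEL_GFP_KERNEL);
    assert(model_pick_gfp(&locked, true) == MODEL_GFP_ATOMIC);
    /* Without the count, even plain task context gets the atomic flag. */
    assert(model_pick_gfp(&task, false) == MODEL_GFP_ATOMIC);
}
```

In the model, the uncounted variant returns the atomic flag for every context, which is the "all allocations use GFP_ATOMIC" behaviour described in the quoted text.
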
> And the no-preempt code does help make for much better code generation
> for simple spinlocks.

Yes it does generate better code, but I tried hard to spot a difference
in various metrics exposed by perf. It's all in the noise and I only
can spot a difference when the actual preemption check after the
decrement which still depends on CONFIG_PREEMPT is in place, but that's
not the case for PREEMPT_NONE or PREEMPT_VOLUNTARY kernels where the
decrement is just a decrement w/o any conditional behind it.

> Where is that horribly buggy recent code? It's not in that exact
> format, certainly, since 'grep' doesn't find it.

Bah, that was stuff in next which got dropped again.

But just look at any check which uses preemptible(), especially those
which check !preemptible():

In the X86 #GP handler we have:

        /*
         * To be potentially processing a kprobe fault and to trust the result
         * from kprobe_running(), we have to be non-preemptible.
         */
        if (!preemptible() &&
            kprobe_running() &&
            kprobe_fault_handler(regs, X86_TRAP_GP))
                goto exit;

and a similar check in the S390 code in kprobe_exceptions_notify(). That
all magically 'works' because that code might have been actually tested
with lockdep enabled which enforces PREEMPT_COUNT...

The SG code has some interesting usage as well:

                if (miter->__flags & SG_MITER_ATOMIC) {
                        WARN_ON_ONCE(preemptible());
                        kunmap_atomic(miter->addr);

How is that WARN_ON_ONCE() supposed to catch anything? Especially as
calling code does:

        flags = SG_MITER_TO_SG;
        if (!preemptible())
                flags |= SG_MITER_ATOMIC;

which is equally useless on kernels which have PREEMPT_COUNT=n.

There are bugs which are related to in_atomic() or other in_***() usage
all over the place as well.

Inconsistency at the core level is a clear recipe for disaster and at
some point we have to bite the bullet and accept that consistency is
more important than the non measurable 3 cycles?

Thanks,

        tglx
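The SG pattern above can be modelled the same way. The following is a userspace sketch under stated assumptions, not the scatterlist code: the MODEL_* flag values and model_* functions are hypothetical, interrupt state is left out for brevity, and `counted == false` stands for a PREEMPT_COUNT=n kernel where preemptible() is assumed to be a constant 0.

```c
/* Userspace model only; MODEL_* and model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_SG_MITER_ATOMIC (1u << 0)
#define MODEL_SG_MITER_TO_SG  (1u << 1)

/* counted == false models PREEMPT_COUNT=n: the answer is always "no". */
static bool model_preemptible(bool counted, int preempt_count)
{
    return counted ? preempt_count == 0 : false;
}

/* Models the caller: pick the atomic flag when "not preemptible". */
static unsigned int model_caller_flags(bool counted, int preempt_count)
{
    unsigned int flags = MODEL_SG_MITER_TO_SG;

    if (!model_preemptible(counted, preempt_count))
        flags |= MODEL_SG_MITER_ATOMIC;
    return flags;
}

/* True when a WARN_ON_ONCE(preemptible()) style check would trigger. */
static bool model_warn_fires(bool counted, int preempt_count)
{
    return model_preemptible(counted, preempt_count);
}

/* Usage: call from any main(); preempt_count 0 is plain task context. */
static void model_sg_selfcheck(void)
{
    /* With the count, task context does not take the atomic path ... */
    assert(!(model_caller_flags(true, 0) & MODEL_SG_MITER_ATOMIC));
    /* ... and the warning can catch a preemptible caller. */
    assert(model_warn_fires(true, 0));

    /* Without the count, the atomic flag is always set ... */
    assert(model_caller_flags(false, 0) & MODEL_SG_MITER_ATOMIC);
    /* ... and the warning can never trigger, whatever the context. */
    assert(!model_warn_fires(false, 0));
}
```

In this model both the flag selection and the sanity check lose their meaning once the count is absent: the caller always takes the atomic path and the warning is dead code.
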