Date: Fri, 20 Nov 2020 09:20:17 +0000
From: Marc Zyngier
To: Thomas Gleixner
Subject: Re: [PATCH 0/2] arm64: Allow the rescheduling IPI to bypass irq_enter/exit
In-Reply-To: <87ft5q18qs.fsf@nanos.tec.linutronix.de>
References: <20201101131430.257038-1-maz@kernel.org> <87ft5q18qs.fsf@nanos.tec.linutronix.de>
Message-ID: <91cde5eeb22eb2926515dd27113c664a@kernel.org>
Cc: mark.rutland@arm.com, Android Kernel Team, Peter Zijlstra, Catalin Marinas, linux-kernel, Will Deacon, Valentin Schneider, LAK

[+ Mark who has been hacking in the same area lately]

Hi Thomas,

On 2020-11-03 20:32, Thomas Gleixner wrote:
> On Sun, Nov 01 2020 at 13:14, Marc Zyngier wrote:
>> Vincent recently reported [1] that 5.10-rc1 showed a significant
>> regression when running "perf bench sched pipe" on arm64, and
>> pinpointed it to the recent move to handling IPIs as normal
>> interrupts.
>>
>> The culprit is the use of irq_enter/irq_exit around the handling of
>> the rescheduling IPI, meaning that we enter the scheduler right after
>> the handling of the IPI instead of deferring it to the next preemption
>> event. This accounts for most of the overhead introduced.
>
> irq_enter()/exit() does not end up in the scheduler. If it does then
> please show the call chain.
>
> Scheduling happens when the IPI returns just before returning into the
> low level code (or on ARM in the low level code) when NEED_RESCHED is
> set (which is usually the case when the IPI is sent) and:
>
>   the IPI hit user space
>
> or
>
>   the IPI hit in preemptible kernel context and CONFIG_PREEMPT[_RT] is
>   enabled.
>
> Not doing so would be a bug. So I really don't understand your
> reasoning here.

You are of course right. I somehow associated the overhead of the
resched IPI with the scheduler itself. I stand corrected.

>
>> On architectures that have architected IPIs at the CPU level (x86
>> being the obvious one), the absence of irq_enter/exit is natural.
>
> It's not really architected IPIs. We reserve the top 20ish vectors on
> each CPU for IPIs and other per processor interrupts, e.g. the per cpu
> timer.
>
> Now lets look at what x86 does:
>
> Interrupts and regular IPIs (function call ....) do
>
>   irqentry_enter()    <- handles rcu_irq_enter() or context tracking
>   ...
>   irq_enter_rcu()
>   ...
>   irq_exit_rcu()
>   irqentry_exit()     <- handles need_resched()
>
> The scheduler IPI does:
>
>   irqentry_enter()    <- handles rcu_irq_enter() or context tracking
>   ...
>   __irq_enter_raw()
>   ...
>   __irq_exit_raw()
>   irqentry_exit()     <- handles need_resched()
>
> So we don't invoke irq_enter_rcu() on enter and on exit we skip
> irq_exit_rcu(). That's fine because
>
>  - Calling the tick management is pointless because this is going to
>    schedule anyway or something consumed the need_resched already.
>
>  - The irqtime accounting is silly because it covers only the call and
>    returns. The time spent in the accounting is more than the time
>    we are accounting (empty call).
>
> So what your hack fails to invoke is rcu_irq_enter()/exit() in case
> that the IPI hits the idle task in an RCU off region. You also fail
> to tell lockdep. No cookies!

Who needs cookies when you can have cheese?
;-)

More seriously, it seems to me that we have a bit of a cross-architecture
disconnect here. I have been trying to join the dots between what you
describe above, and the behaviour of arm64 (and probably a large number
of the non-x86 architectures), and I feel massively confused.

Up to 5.9, our handling of the rescheduling IPI was "do as little as
possible": decode the interrupt from the lowest possible level (the
root irqchip), call into arch code, end up in scheduler_ipi(), the end.

No lockdep, no RCU, no nothing.

What changed? Have we missed some radical change in the way the core
kernel expects the arch code to do things? I'm aware of the
kernel/entry/common.c stuff, which implements most of the goodies you
mention, but this effectively is x86-only at the moment.

If arm64 has forever been broken, I'd really like to know and fix it.

>> The bad news is that these patches are ugly as sin, and I really don't
>> like them.
>
> Yes, they are ugly and the extra conditional in the irq handling path
> is not pretty either.
>
>> I specially hate that they can give driver authors the idea that they
>> can make random interrupts "faster".
>
> Just split the guts of irq_modify_status() into __irq_modify_status()
> and call that from irq_modify_status().
>
> Reject IRQ_HIDDEN (which should have been IRQ_IPI really) and IRQ_NAKED
> (IRQ_RAW perhaps) in irq_modify_status().
>
> Do not export __irq_modify_status() so it can be only used from
> built-in code which takes it away from driver writers.

Yup, done.

Thanks,

        M.
-- 
Jazz is not dead. It just smells funny...