public inbox for linux-kernel@vger.kernel.org
From: Sven Schnelle <svens@linux.ibm.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>,
	Pingfan Liu <kernelfans@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-arm-kernel@lists.infradead.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Marc Zyngier <maz@kernel.org>,
	Joey Gouly <joey.gouly@arm.com>,
	Sami Tolvanen <samitolvanen@google.com>,
	Julien Thierry <julien.thierry@arm.com>,
	Yuichi Ito <ito-yuichi@fujitsu.com>,
	linux-kernel@vger.kernel.org, Vasily Gorbik <gor@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>
Subject: Re: [PATCHv2 0/5] arm64/irqentry: remove duplicate housekeeping of
Date: Tue, 28 Sep 2021 11:52:51 +0200
Message-ID: <yt9dtui53u30.fsf@linux.ibm.com>
In-Reply-To: <20210928083222.GA1924@C02TD0UTHF1T.local> (Mark Rutland's message of "Tue, 28 Sep 2021 09:32:22 +0100")

Mark Rutland <mark.rutland@arm.com> writes:

> On Mon, Sep 27, 2021 at 05:09:22PM -0700, Paul E. McKenney wrote:
>> On Mon, Sep 27, 2021 at 10:23:18AM +0100, Mark Rutland wrote:
>> > On Fri, Sep 24, 2021 at 03:59:54PM -0700, Paul E. McKenney wrote:
>> > > On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote:
>> > > > [Adding Paul for RCU, s390 folk for entry code RCU semantics]
>> > > > 
>> > > > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote:
>> > > > > After introducing arm64/kernel/entry_common.c which is akin to
>> > > > > kernel/entry/common.c , the housekeeping of rcu/trace are done twice as
>> > > > > the following:
>> > > > >     enter_from_kernel_mode()->rcu_irq_enter().
>> > > > > And
>> > > > >     gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter()
>> > > > >
>> > > > > Besides being redundant, code analysis shows that the duplication
>> > > > > also causes errors, e.g. rcu_data->dynticks_nmi_nesting is increased
>> > > > > by 2, which makes rcu_is_cpu_rrupt_from_idle() return an unexpected result.
>> > > > 
>> > > > Hmmm...
>> > > > 
>> > > > The fundamental questions are:
>> > > > 
>> > > > 1) Who is supposed to be responsible for doing the rcu entry/exit?
>> > > > 
>> > > > 2) Is it supposed to matter if this happens multiple times?
>> > > > 
>> > > > For (1), I'd generally expect that this is supposed to happen in the
>> > > > arch/common entry code, since that itself (or the irqchip driver) could
>> > > > depend on RCU, and if that's the case then handle_domain_irq()
>> > > > shouldn't need to call rcu_irq_enter(). That would be consistent with
>> > > > the way we handle all other exceptions.
>> > > > 
>> > > > For (2) I don't know whether the level of nesting is supposed to
>> > > > matter. I was under the impression it wasn't meant to matter in general,
>> > > > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a
>> > > > specific level of nesting.
>> > > > 
>> > > > From a glance it looks like this would cause rcu_sched_clock_irq() to
>> > > > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which
>> > > > doesn't sound right, at least...
>> > > > 
>> > > > Thomas, Paul, thoughts?
>> > > 
>> > > It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls
>> > > be balanced.  Normally, this is taken care of by the fact that irq_enter()
>> > > invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit().  Similarly,
>> > > nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit().
>> > 
>> > Sure; I didn't mean to suggest those weren't balanced! The problem here
>> > is *nesting*. Due to the structure of our entry code and the core IRQ
>> > code, when handling an IRQ we have a sequence:
>> > 
>> > 	irq_enter() // arch code
>> > 	irq_enter() // irq code
>> > 
>> > 	< irq handler here >
>> > 
>> > 	irq_exit() // irq code
>> > 	irq_exit() // arch code
>> > 
>> > ... and if we use something like rcu_is_cpu_rrupt_from_idle() in the
>> > middle (e.g. as part of rcu_sched_clock_irq()), this will not give the
>> > expected result because of the additional nesting, since
>> > rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting
>> > is only incremented once per exception entry, when it does:
>> > 
>> > 	/* Are we at first interrupt nesting level? */
>> > 	nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
>> > 	if (nesting > 1)
>> > 		return false;
>> > 
>> > What I'm trying to figure out is whether that expectation is legitimate,
>> > and assuming so, where the entry/exit should happen.
>> 
>> Oooh...
>> 
>> The penalty for fooling rcu_is_cpu_rrupt_from_idle() is that RCU will
>> be unable to detect a userspace quiescent state for a non-nohz_full
>> CPU.  That could result in RCU CPU stall warnings if a user task runs
>> continuously on a given CPU for more than 21 seconds (60 seconds in
>> some distros).  And this can easily happen if the user has a CPU-bound
>> thread that is the only runnable task on that CPU.
>> 
>> So, yes, this does need some sort of resolution.
>> 
>> The traditional approach is (as you surmise) to have only a single call
>> to irq_enter() on exception entry and only a single call to irq_exit()
>> on exception exit.  If this is feasible, it is highly recommended.
>
> Cool; that's roughly what I was expecting / hoping to hear!
>
>> In theory, we could have that "1" in "nesting > 1" be a constant supplied
>> by the architecture (you would want "3" if I remember correctly) but
>> in practice could we please avoid this?  For one thing, if there is
>> some other path into the kernel for your architecture that does only a
>> single irq_enter(), then rcu_is_cpu_rrupt_from_idle() just doesn't stand
>> a chance.  It would need to compare against a different value depending
>> on what exception showed up.  Even if that cannot happen, it would be
>> better if your architecture could remain in blissful ignorance of the
>> colorful details of ->dynticks_nmi_nesting manipulations.
>
> I completely agree. I think it's much harder to keep that in check than
> to enforce a "once per architectural exception" policy in the arch code.
>
>> Another approach would be for the arch code to supply RCU a function that
>> it calls.  If there is such a function (or perhaps better, if some new
>> Kconfig option is enabled), RCU invokes it.  Otherwise, it compares to
>> "1" as it does now.  But you break it, you buy it!  ;-)
>
> I guess we could look at the exception regs and inspect the original
> context, but it sounds overkill...
>
> I think the cleanest thing is to leave this to arch code, and have the
> common IRQ code stay well clear. Unfortunately most architectures
> (including arch/arm) still need the common IRQ code to handle this, so
> we'll have to make that conditional on Kconfig, something like the below
> (build+boot tested only).
>
> If there are no objections, I'll go check who else needs the same
> treatment (IIUC at least s390 will), and spin that as a real
> patch/series.

Hmm, s390 doesn't use handle_domain_irq() and doesn't have
HANDLE_DOMAIN_IRQ set. So I don't think the patch below applies to s390.
However, I'll follow the code to make sure we're not calling
irq_enter()/irq_exit() twice.

> Thanks,
> Mark.
>
> ---->8----
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 8df1c7102643..c59475e50e4c 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -225,6 +225,12 @@ config GENERIC_SMP_IDLE_THREAD
>  config GENERIC_IDLE_POLL_SETUP
>         bool
>  
> +config ARCH_ENTERS_IRQ
> +       bool
> +       help
> +         An architecture should select this when it performs irq entry
> +         management itself (e.g. calling irq_enter() and irq_exit()).
> +
>  config ARCH_HAS_FORTIFY_SOURCE
>         bool
>         help
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 5c7ae4c3954b..fa6476bf2b4d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -16,6 +16,7 @@ config ARM64
>         select ARCH_ENABLE_MEMORY_HOTREMOVE
>         select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
>         select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
> +       select ARCH_ENTERS_IRQ
>         select ARCH_HAS_CACHE_LINE_SIZE
>         select ARCH_HAS_DEBUG_VIRTUAL
>         select ARCH_HAS_DEBUG_VM_PGTABLE
> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> index 4e3c29bb603c..6affa12222e0 100644
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -677,6 +677,15 @@ int generic_handle_domain_irq(struct irq_domain *domain, unsigned int hwirq)
>  EXPORT_SYMBOL_GPL(generic_handle_domain_irq);
>  
>  #ifdef CONFIG_HANDLE_DOMAIN_IRQ
> +
> +#ifdef CONFIG_ARCH_ENTERS_IRQ
> +#define handle_irq_enter()
> +#define handle_irq_exit()
> +#else
> +#define handle_irq_enter()     irq_enter()
> +#define handle_irq_exit()      irq_exit()
> +#endif
> +
>  /**
>   * handle_domain_irq - Invoke the handler for a HW irq belonging to a domain,
>   *                     usually for a root interrupt controller
> @@ -693,7 +702,7 @@ int handle_domain_irq(struct irq_domain *domain,
>         struct irq_desc *desc;
>         int ret = 0;
>  
> -       irq_enter();
> +       handle_irq_enter();
>  
>         /* The irqdomain code provides boundary checks */
>         desc = irq_resolve_mapping(domain, hwirq);
> @@ -702,7 +711,7 @@ int handle_domain_irq(struct irq_domain *domain,
>         else
>                 ret = -EINVAL;
>  
> -       irq_exit();
> +       handle_irq_exit();
>         set_irq_regs(old_regs);
>         return ret;
>  }
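
To make the nesting problem discussed above concrete, here is a minimal
userspace sketch. It is only a model under stated assumptions: model_nesting,
model_irq_enter(), model_irq_exit(), model_first_level() and
model_handle_irq() are hypothetical names, not kernel symbols, and the counter
is bumped by exactly one per entry, which is simpler than the real
->dynticks_nmi_nesting arithmetic. It only shows why a "nesting > 1" test
gives a different answer once entry happens twice per exception.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one CPU's rcu_data.dynticks_nmi_nesting. */
static long model_nesting;

/* Simplified: each entry adds one level, each exit removes one. */
static void model_irq_enter(void) { model_nesting++; }
static void model_irq_exit(void)  { model_nesting--; }

/* Models only the "first interrupt nesting level" test quoted above. */
static bool model_first_level(void)
{
	return model_nesting <= 1;
}

/*
 * Model one interrupt that performs 'entries' balanced enter/exit pairs
 * (1 = arch code only, 2 = arch code plus common IRQ code) and report
 * what the check observes while the handler would be running.
 */
static bool model_handle_irq(int entries)
{
	bool seen;
	int i;

	for (i = 0; i < entries; i++)
		model_irq_enter();

	seen = model_first_level();	/* the handler would run here */

	for (i = 0; i < entries; i++)
		model_irq_exit();

	return seen;
}
```

In this model, model_handle_irq(1) returns true and model_handle_irq(2)
returns false, even though both cases are perfectly balanced and leave
model_nesting at zero afterwards. That is the distinction between balance and
nesting made earlier in the thread.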
