Re: Kernel-managed IRQ affinity (cont)

From: Ming Lei <ming.lei@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Xu <peterx@redhat.com>, Juri Lelli <juri.lelli@redhat.com>,
	Ming Lei <minlei@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-block@vger.kernel.org
Subject: Re: Kernel-managed IRQ affinity (cont)
Date: Fri, 10 Jan 2020 09:28:02 +0800
Message-ID: <20200110012802.GA4501@ming.t460p>
In-Reply-To: <87eew8l7oz.fsf@nanos.tec.linutronix.de>

Hello Thomas,

On Thu, Jan 09, 2020 at 09:02:20PM +0100, Thomas Gleixner wrote:
> Ming,
> 
> Ming Lei <ming.lei@redhat.com> writes:
> 
> > On Thu, Dec 19, 2019 at 09:32:14AM -0500, Peter Xu wrote:
> >> ... this one seems to be more appealing at least to me.
> >
> > OK, please try the following patch:
> >
> >
> > diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> > index 6c8512d3be88..0fbcbacd1b29 100644
> > --- a/include/linux/sched/isolation.h
> > +++ b/include/linux/sched/isolation.h
> > @@ -13,6 +13,7 @@ enum hk_flags {
> >  	HK_FLAG_TICK		= (1 << 4),
> >  	HK_FLAG_DOMAIN		= (1 << 5),
> >  	HK_FLAG_WQ		= (1 << 6),
> > +	HK_FLAG_MANAGED_IRQ	= (1 << 7),
> >  };
> >  
> >  #ifdef CONFIG_CPU_ISOLATION
> > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> > index 1753486b440c..0a75a09cc4e8 100644
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -20,6 +20,7 @@
> >  #include <linux/sched/task.h>
> >  #include <uapi/linux/sched/types.h>
> >  #include <linux/task_work.h>
> > +#include <linux/sched/isolation.h>
> >  
> >  #include "internals.h"
> >  
> > @@ -212,12 +213,33 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
> >  {
> >  	struct irq_desc *desc = irq_data_to_desc(data);
> >  	struct irq_chip *chip = irq_data_get_irq_chip(data);
> > +	const struct cpumask *housekeeping_mask =
> > +		housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
> >  	int ret;
> > +	cpumask_var_t tmp_mask;
> >  
> >  	if (!chip || !chip->irq_set_affinity)
> >  		return -EINVAL;
> >  
> > -	ret = chip->irq_set_affinity(data, mask, force);
> > +	if (!zalloc_cpumask_var(&tmp_mask, GFP_KERNEL))
> > +		return -EINVAL;
> 
> That's wrong. This code is called with interrupts disabled, so
> GFP_KERNEL is wrong. And NO, we won't do a GFP_ATOMIC allocation here.

OK, it looks like desc->lock is held here.

> 
> > +	/*
> > +	 * Userspace can't change managed irq's affinity, make sure
> > +	 * that isolated CPU won't be selected as the effective CPU
> > +	 * if this irq's affinity includes both isolated CPU and
> > +	 * housekeeping CPU.
> > +	 *
> > +	 * This way guarantees that isolated CPU won't be interrupted
> > +	 * by IO submitted from housekeeping CPU.
> > +	 */
> > +	if (irqd_affinity_is_managed(data) &&
> > +			cpumask_intersects(mask, housekeeping_mask))
> > +		cpumask_and(tmp_mask, mask, housekeeping_mask);
> 
> This is duct tape engineering with absolutely no semantics. I can't even
> figure out the intent of this 'managed_irq' parameter.

The intent is to keep the specified CPUs from handling managed interrupts.

For non-managed interrupts, the isolation is done from userspace, because
userspace is allowed to change a non-managed interrupt's affinity.
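
As an illustration of the userspace side, here is a minimal sketch that
writes a hex CPU bitmask to /proc/irq/<irq>/smp_affinity. The helper names
irq_affinity_path() and write_affinity_mask() are made up for this example,
and the mask is assumed to fit in an unsigned long. The write is expected to
be rejected for a managed interrupt, which is why this route does not help
there.

```c
#include <stdio.h>

/* Hypothetical helper: build the procfs path for one IRQ's affinity file. */
int irq_affinity_path(char *buf, size_t len, unsigned int irq)
{
	return snprintf(buf, len, "/proc/irq/%u/smp_affinity", irq);
}

/*
 * Hypothetical helper: write a hex CPU bitmask (bit N == CPU N) to the
 * given file. Returns 0 on success, -1 if the file cannot be opened or
 * the write is rejected.
 */
int write_affinity_mask(const char *path, unsigned long mask)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%lx\n", mask) > 0;
	/* The kernel may only report the rejection when the buffer is flushed. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

For example, building the path for IRQ 42 and writing mask 0x3 would ask for
that interrupt to be delivered on CPUs 0-1 only; it needs root to succeed.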

> 
> If the intent is to keep managed device interrupts away from isolated
> cores then you really want to do that when the interrupts are spread and
> not in the middle of the affinity setter code.
> 
> But first you need to define how that mask should work:
> 
>  1) Exclude CPUs from managed interrupt spreading completely
> 
>  2) Exclude CPUs only when the resulting spreading contains
>     housekeeping CPUs
> 
>  3) Whatever ...

We can do that. The big problem is that in the RT case we can't guarantee
that IO will never be submitted from an isolated CPU. blk-mq's queue mapping
relies on the affinity that was set up, so undefined behavior (kernel crash,
IO hang, or something else) may result if we exclude isolated CPUs from the
interrupt affinity.

That is why I tried to exclude isolated CPUs from the interrupt's effective
affinity instead; it turns out that approach is simple and doable.
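
To make the idea concrete, here is a minimal userspace sketch of the mask
selection only. toy_cpumask, toy_cpu() and toy_effective_mask() are names
made up for illustration and are not kernel APIs. A real struct cpumask can
be wider than 64 bits, which is where the temporary-mask allocation problem
raised above comes from; this sketch does not address that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for struct cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t toy_cpumask;

/* Build a single-CPU mask; this sketch is limited to 64 CPUs. */
toy_cpumask toy_cpu(unsigned int cpu)
{
	assert(cpu < 64);
	return (toy_cpumask)1 << cpu;
}

/*
 * Choose the mask that would be handed to the irqchip.
 *
 * For a managed interrupt whose affinity contains at least one
 * housekeeping CPU, only the housekeeping CPUs are kept, so an isolated
 * CPU is not picked as the effective target. If the affinity contains
 * no housekeeping CPU, or the interrupt is not managed, the affinity is
 * passed through unchanged.
 */
toy_cpumask toy_effective_mask(toy_cpumask affinity,
			       toy_cpumask housekeeping,
			       bool managed)
{
	if (managed && (affinity & housekeeping))
		return affinity & housekeeping;

	return affinity;
}
```

With CPUs 0-1 as housekeeping, an affinity of CPUs 0-3 (0xF) is narrowed to
0x3, while an affinity of CPUs 2-3 only (0xC) is left as it is, so the
interrupt can still be delivered when IO is submitted from an isolated CPU.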


Thanks,
Ming
