From: Ming Lei <ming.lei@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Xu <peterx@redhat.com>, Juri Lelli <juri.lelli@redhat.com>,
Ming Lei <minlei@redhat.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-block@vger.kernel.org
Subject: Re: Kernel-managed IRQ affinity (cont)
Date: Fri, 10 Jan 2020 09:28:02 +0800
Message-ID: <20200110012802.GA4501@ming.t460p>
In-Reply-To: <87eew8l7oz.fsf@nanos.tec.linutronix.de>
Hello Thomas,
On Thu, Jan 09, 2020 at 09:02:20PM +0100, Thomas Gleixner wrote:
> Ming,
>
> Ming Lei <ming.lei@redhat.com> writes:
>
> > On Thu, Dec 19, 2019 at 09:32:14AM -0500, Peter Xu wrote:
> >> ... this one seems to be more appealing at least to me.
> >
> > OK, please try the following patch:
> >
> >
> > diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> > index 6c8512d3be88..0fbcbacd1b29 100644
> > --- a/include/linux/sched/isolation.h
> > +++ b/include/linux/sched/isolation.h
> > @@ -13,6 +13,7 @@ enum hk_flags {
> > HK_FLAG_TICK = (1 << 4),
> > HK_FLAG_DOMAIN = (1 << 5),
> > HK_FLAG_WQ = (1 << 6),
> > + HK_FLAG_MANAGED_IRQ = (1 << 7),
> > };
> >
> > #ifdef CONFIG_CPU_ISOLATION
> > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> > index 1753486b440c..0a75a09cc4e8 100644
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -20,6 +20,7 @@
> > #include <linux/sched/task.h>
> > #include <uapi/linux/sched/types.h>
> > #include <linux/task_work.h>
> > +#include <linux/sched/isolation.h>
> >
> > #include "internals.h"
> >
> > @@ -212,12 +213,33 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
> > {
> > struct irq_desc *desc = irq_data_to_desc(data);
> > struct irq_chip *chip = irq_data_get_irq_chip(data);
> > + const struct cpumask *housekeeping_mask =
> > + housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
> > int ret;
> > + cpumask_var_t tmp_mask;
> >
> > if (!chip || !chip->irq_set_affinity)
> > return -EINVAL;
> >
> > - ret = chip->irq_set_affinity(data, mask, force);
> > + if (!zalloc_cpumask_var(&tmp_mask, GFP_KERNEL))
> > + return -EINVAL;
>
> That's wrong. This code is called with interrupts disabled, so
> GFP_KERNEL is wrong. And NO, we won't do a GFP_ATOMIC allocation here.
OK, it looks like desc->lock is held here.
>
> > + /*
> > + * Userspace can't change managed irq's affinity, make sure
> > + * that isolated CPU won't be selected as the effective CPU
> > + * if this irq's affinity includes both isolated CPU and
> > + * housekeeping CPU.
> > + *
> > + * This way guarantees that isolated CPU won't be interrupted
> > + * by IO submitted from housekeeping CPU.
> > + */
> > + if (irqd_affinity_is_managed(data) &&
> > + cpumask_intersects(mask, housekeeping_mask))
> > + cpumask_and(tmp_mask, mask, housekeeping_mask);
>
> This is duct tape engineering with absolutely no semantics. I can't even
> figure out the intent of this 'managed_irq' parameter.
The intent is to keep the specified isolated CPUs from handling managed
interrupts. For non-managed interrupts, the isolation is done from userspace,
because userspace is allowed to change a non-managed interrupt's affinity.
>
> If the intent is to keep managed device interrupts away from isolated
> cores then you really want to do that when the interrupts are spread and
> not in the middle of the affinity setter code.
>
> But first you need to define how that mask should work:
>
> 1) Exclude CPUs from managed interrupt spreading completely
>
> 2) Exclude CPUs only when the resulting spreading contains
> housekeeping CPUs
>
> 3) Whatever ...
We can do that. The big problem is that in the RT case there is no guarantee
that IO will never be submitted from an isolated CPU. blk-mq's queue mapping
relies on the affinity that was set up, so excluding isolated CPUs from the
interrupt affinity may cause undefined behavior (a kernel crash, an IO hang,
or something else).
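To make the concern concrete, here is a toy user-space model. It is not
blk-mq code: toy_map_queues(), TOY_NR_CPUS and TOY_NO_QUEUE are made-up names,
and a 64-bit word stands in for a cpumask. It only shows that when the
per-queue affinity masks skip some CPUs, those CPUs are left without a queue
to submit to.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS  8
#define TOY_NO_QUEUE (-1)

/*
 * Toy model (hypothetical, not the blk-mq implementation): fill
 * cpu_to_queue[] from per-queue affinity masks, where bit N of
 * queue_mask[q] means CPU N is in queue q's interrupt affinity.
 * Returns the number of CPUs that end up without any queue.
 */
static int toy_map_queues(const uint64_t *queue_mask, int nr_queues,
                          int cpu_to_queue[TOY_NR_CPUS])
{
    int unmapped = 0;

    assert(nr_queues >= 0);

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        cpu_to_queue[cpu] = TOY_NO_QUEUE;

        /* The first queue whose affinity contains this CPU wins. */
        for (int q = 0; q < nr_queues; q++) {
            if (queue_mask[q] & (UINT64_C(1) << cpu)) {
                cpu_to_queue[cpu] = q;
                break;
            }
        }

        if (cpu_to_queue[cpu] == TOY_NO_QUEUE)
            unmapped++;
    }

    return unmapped;
}
```

With masks covering all eight CPUs every CPU gets a queue. If CPUs 2, 3, 6
and 7 are dropped from the masks, those four CPUs have no mapping, which is
the situation an IO submitted from an isolated CPU would run into.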
That is why I tried to exclude isolated CPUs from the interrupt's effective
affinity instead; this approach turns out to be simple and doable.
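The selection rule from the patch can be sketched in user space roughly as
below. This is only an illustration: cpu_bits_t and pick_effective_mask() are
hypothetical names, a 64-bit word stands in for struct cpumask, and the
fallback to the unmodified mask when no housekeeping CPU is present is my
reading of the comment in the patch, not something shown in the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit N set means CPU N. */
typedef uint64_t cpu_bits_t;

/*
 * Pick the mask that would be handed to the chip's affinity callback.
 * A managed interrupt is restricted to housekeeping CPUs when the
 * requested mask contains at least one of them. Otherwise (assumption)
 * the requested mask is used unchanged, so an interrupt whose affinity
 * covers only isolated CPUs still has a target.
 */
static cpu_bits_t pick_effective_mask(cpu_bits_t requested,
                                      cpu_bits_t housekeeping,
                                      bool managed)
{
    assert(requested != 0); /* an affinity mask must name at least one CPU */

    if (managed && (requested & housekeeping) != 0)
        return requested & housekeeping;

    return requested;
}
```

The affinity mask itself is left alone, so the queue mapping built from it
still covers the isolated CPUs; only the CPU that actually takes the
interrupt changes.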
Thanks,
Ming