* Re: Kernel-managed IRQ affinity (cont)
[not found] ` <87eew8l7oz.fsf@nanos.tec.linutronix.de>
@ 2020-01-10 1:28 ` Ming Lei
2020-01-10 19:43 ` Thomas Gleixner
0 siblings, 1 reply; 5+ messages in thread
From: Ming Lei @ 2020-01-10 1:28 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Peter Xu, Juri Lelli, Ming Lei, Linux Kernel Mailing List,
linux-block
Hello Thomas,
On Thu, Jan 09, 2020 at 09:02:20PM +0100, Thomas Gleixner wrote:
> Ming,
>
> Ming Lei <ming.lei@redhat.com> writes:
>
> > On Thu, Dec 19, 2019 at 09:32:14AM -0500, Peter Xu wrote:
> >> ... this one seems to be more appealing at least to me.
> >
> > OK, please try the following patch:
> >
> >
> > diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> > index 6c8512d3be88..0fbcbacd1b29 100644
> > --- a/include/linux/sched/isolation.h
> > +++ b/include/linux/sched/isolation.h
> > @@ -13,6 +13,7 @@ enum hk_flags {
> >  	HK_FLAG_TICK		= (1 << 4),
> >  	HK_FLAG_DOMAIN		= (1 << 5),
> >  	HK_FLAG_WQ		= (1 << 6),
> > +	HK_FLAG_MANAGED_IRQ	= (1 << 7),
> >  };
> > 
> >  #ifdef CONFIG_CPU_ISOLATION
> > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> > index 1753486b440c..0a75a09cc4e8 100644
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -20,6 +20,7 @@
> >  #include <linux/sched/task.h>
> >  #include <uapi/linux/sched/types.h>
> >  #include <linux/task_work.h>
> > +#include <linux/sched/isolation.h>
> > 
> >  #include "internals.h"
> > 
> > @@ -212,12 +213,33 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
> >  {
> >  	struct irq_desc *desc = irq_data_to_desc(data);
> >  	struct irq_chip *chip = irq_data_get_irq_chip(data);
> > +	const struct cpumask *housekeeping_mask =
> > +		housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
> >  	int ret;
> > +	cpumask_var_t tmp_mask;
> > 
> >  	if (!chip || !chip->irq_set_affinity)
> >  		return -EINVAL;
> > 
> > -	ret = chip->irq_set_affinity(data, mask, force);
> > +	if (!zalloc_cpumask_var(&tmp_mask, GFP_KERNEL))
> > +		return -EINVAL;
>
> That's wrong. This code is called with interrupts disabled, so
> GFP_KERNEL is wrong. And NO, we won't do a GFP_ATOMIC allocation here.
OK, it looks like desc->lock is held.
>
> > +	/*
> > +	 * Userspace can't change managed irq's affinity, make sure
> > +	 * that isolated CPU won't be selected as the effective CPU
> > +	 * if this irq's affinity includes both isolated CPU and
> > +	 * housekeeping CPU.
> > +	 *
> > +	 * This way guarantees that isolated CPU won't be interrupted
> > +	 * by IO submitted from housekeeping CPU.
> > +	 */
> > +	if (irqd_affinity_is_managed(data) &&
> > +	    cpumask_intersects(mask, housekeeping_mask))
> > +		cpumask_and(tmp_mask, mask, housekeeping_mask);
>
> This is duct tape engineering with absolutely no semantics. I can't even
> figure out the intent of this 'managed_irq' parameter.
The intent is to keep the specified isolated CPUs from handling managed interrupts.
For non-managed interrupts, the isolation is done from userspace, because
userspace is allowed to change a non-managed interrupt's affinity.
>
> If the intent is to keep managed device interrupts away from isolated
> cores then you really want to do that when the interrupts are spread and
> not in the middle of the affinity setter code.
>
> But first you need to define how that mask should work:
>
> 1) Exclude CPUs from managed interrupt spreading completely
>
> 2) Exclude CPUs only when the resulting spreading contains
> housekeeping CPUs
>
> 3) Whatever ...
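[The difference between the first two options can be sketched with a toy bitmask model. This is an illustration only; the `cpumask_t` typedef and both helper names are invented for the sketch and are not kernel APIs.]

```c
typedef unsigned int cpumask_t;	/* toy mask: bit i set == CPU i present */

/* Option 1: always drop isolated CPUs from the spreading mask, even if
 * that leaves nothing to spread the queues over. */
static cpumask_t spread_mask_opt1(cpumask_t possible, cpumask_t housekeeping)
{
	return possible & housekeeping;
}

/* Option 2: drop isolated CPUs only when at least one housekeeping CPU
 * remains; otherwise fall back to the unrestricted mask. */
static cpumask_t spread_mask_opt2(cpumask_t possible, cpumask_t housekeeping)
{
	cpumask_t hk = possible & housekeeping;

	return hk ? hk : possible;
}
```

[With housekeeping CPUs 0-1 (mask 0x03), a queue covering only CPUs 4-7 (mask 0xF0) ends up with an empty mask under option 1 but keeps its full mask under option 2 — which is why the choice of semantics matters.]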
We can do that. The big problem is that the RT case can't guarantee that
IO is never submitted from an isolated CPU. blk-mq's queue mapping
relies on the setup affinity, so unknown behavior (kernel crash, IO
hang, or other issues) may result if we exclude isolated CPUs from the
interrupt affinity.
That is why I tried to exclude isolated CPUs from the interrupt's effective
affinity instead; it turns out that approach is simple and doable.
Thanks,
Ming
* Re: Kernel-managed IRQ affinity (cont)
From: Thomas Gleixner @ 2020-01-10 19:43 UTC (permalink / raw)
To: Ming Lei
Cc: Peter Xu, Juri Lelli, Ming Lei, Linux Kernel Mailing List,
linux-block
Ming,
Ming Lei <ming.lei@redhat.com> writes:
> On Thu, Jan 09, 2020 at 09:02:20PM +0100, Thomas Gleixner wrote:
>> Ming Lei <ming.lei@redhat.com> writes:
>>
>> This is duct tape engineering with absolutely no semantics. I can't even
>> figure out the intent of this 'managed_irq' parameter.
>
> The intent is to keep the specified isolated CPUs from handling managed
> interrupts.
That's what I figured, but it still does not provide semantics and works
just for specific cases.
> We can do that. The big problem is that the RT case can't guarantee that
> IO is never submitted from an isolated CPU. blk-mq's queue mapping
> relies on the setup affinity, so unknown behavior (kernel crash, IO
> hang, or other issues) may result if we exclude isolated CPUs from the
> interrupt affinity.
>
> That is why I tried to exclude isolated CPUs from the interrupt's effective
> affinity instead; it turns out that approach is simple and doable.
Yes, it's doable. But it still is inconsistent behaviour. Assume the
following configuration:
8 CPUs CPU0,1 assigned for housekeeping
With 8 queues the proposed change does nothing because each queue is
mapped to exactly one CPU.
With 4 queues you get the following:
CPU0,1 queue 0
CPU2,3 queue 1
CPU4,5 queue 2
CPU6,7 queue 3
No effect on the isolated CPUs either.
With 2 queues you get the following:
CPU0,1,2,3 queue 0
CPU4,5,6,7 queue 1
So here the isolated CPUs 2 and 3 get the isolation, but 4-7
not. That's perhaps intended, but definitely not documented.
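[The three layouts above can be modelled with a small sketch. This is an illustration of the mapping being discussed, not the kernel's spreading code; the constants and helper names are invented.]

```c
/*
 * Toy model of the contiguous queue spreading in the example above:
 * 8 CPUs split evenly across nr_queues queues, with CPUs 0 and 1 as
 * housekeeping CPUs.
 */
#define NR_CPUS		8
#define NR_HOUSEKEEPING	2	/* CPUs 0 and 1 */

/* Queue that CPU `cpu` is mapped to when there are nr_queues queues. */
static int queue_of_cpu(int nr_queues, int cpu)
{
	return cpu / (NR_CPUS / nr_queues);
}

/* 1 if queue q contains at least one housekeeping CPU, else 0. */
static int queue_has_housekeeping(int nr_queues, int q)
{
	return q * (NR_CPUS / nr_queues) < NR_HOUSEKEEPING;
}
```

[With 8 queues only queues 0 and 1 contain a housekeeping CPU; with 2 queues only queue 0 does — which is the inconsistency being pointed out: whether the proposed steering helps depends entirely on the queue count.]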
So you really need to make your mind up and describe what the intended
effect of this is and why you think that the result is correct.
Thanks,
tglx
* Re: Kernel-managed IRQ affinity (cont)
From: Ming Lei @ 2020-01-11 2:48 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Peter Xu, Juri Lelli, Ming Lei, Linux Kernel Mailing List,
linux-block
Hi Thomas,
On Fri, Jan 10, 2020 at 08:43:14PM +0100, Thomas Gleixner wrote:
> Ming,
>
> Ming Lei <ming.lei@redhat.com> writes:
> > On Thu, Jan 09, 2020 at 09:02:20PM +0100, Thomas Gleixner wrote:
> >> Ming Lei <ming.lei@redhat.com> writes:
> >>
> >> This is duct tape engineering with absolutely no semantics. I can't even
> >> figure out the intent of this 'managed_irq' parameter.
> >
> > The intent is to keep the specified isolated CPUs from handling managed
> > interrupts.
>
> That's what I figured, but it still does not provide semantics and works
> just for specific cases.
>
> > We can do that. The big problem is that the RT case can't guarantee that
> > IO is never submitted from an isolated CPU. blk-mq's queue mapping
> > relies on the setup affinity, so unknown behavior (kernel crash, IO
> > hang, or other issues) may result if we exclude isolated CPUs from the
> > interrupt affinity.
> >
> > That is why I tried to exclude isolated CPUs from the interrupt's effective
> > affinity instead; it turns out that approach is simple and doable.
>
> Yes, it's doable. But it still is inconsistent behaviour. Assume the
> following configuration:
>
> 8 CPUs CPU0,1 assigned for housekeeping
>
> With 8 queues the proposed change does nothing because each queue is
> mapped to exactly one CPU.
That is the expected behavior for this RT case, given that userspace won't
submit IO from isolated CPUs.
>
> With 4 queues you get the following:
>
> CPU0,1 queue 0
> CPU2,3 queue 1
> CPU4,5 queue 2
> CPU6,7 queue 3
>
> No effect on the isolated CPUs either.
>
> With 2 queues you get the following:
>
> CPU0,1,2,3 queue 0
> CPU4,5,6,7 queue 1
>
> So here the isolated CPUs 2 and 3 get the isolation, but 4-7
> not. That's perhaps intended, but definitely not documented.
That is an intentional change: given that no IO will be submitted from
CPUs 4-7 most of the time in the RT case, it is fine to select the
effective CPU from the isolated CPUs there. As Peter mentioned, IO may
be submitted from isolated CPUs only during boot. Once the system is
set up, no IO comes from isolated CPUs, so no interrupts are delivered
to them, which meets RT's requirement.
We can document this change somewhere.
>
> So you really need to make your mind up and describe what the intended
> effect of this is and why you think that the result is correct.
In short, if there is at least one housekeeping CPU available in the
interrupt's affinity mask, we choose the effective CPU from the
housekeeping CPUs. Otherwise, we keep the current behavior for
selecting the effective CPU.
With this approach, no interrupts are delivered to isolated CPUs
as long as no IO is submitted from those CPUs.
Please let us know if it addresses your concerns.
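[The rule summarised above can be sketched with a toy bitmask model. This illustrates the proposed semantics only, not the kernel implementation; the `cpumask_t` typedef and the function name are invented.]

```c
typedef unsigned int cpumask_t;	/* toy mask: bit i set == CPU i present */

/*
 * Sketch of the proposed effective-affinity rule: steer only managed
 * interrupts, and only when the requested affinity overlaps the
 * housekeeping mask; otherwise keep the requested mask unchanged.
 */
static cpumask_t pick_effective(cpumask_t mask, cpumask_t housekeeping,
				int is_managed)
{
	cpumask_t hk = mask & housekeeping;

	return (is_managed && hk) ? hk : mask;
}
```

[Non-managed interrupts pass through untouched because userspace can already restrict their affinity itself.]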
Thanks,
Ming
* Re: Kernel-managed IRQ affinity (cont)
From: Thomas Gleixner @ 2020-01-14 13:45 UTC (permalink / raw)
To: Ming Lei
Cc: Peter Xu, Juri Lelli, Ming Lei, Linux Kernel Mailing List,
linux-block
Ming,
Ming Lei <ming.lei@redhat.com> writes:
> On Fri, Jan 10, 2020 at 08:43:14PM +0100, Thomas Gleixner wrote:
>> Ming Lei <ming.lei@redhat.com> writes:
>> > That is why I tried to exclude isolated CPUs from the interrupt's effective
>> > affinity instead; it turns out that approach is simple and doable.
>>
>> Yes, it's doable. But it still is inconsistent behaviour. Assume the
>> following configuration:
>>
>> 8 CPUs CPU0,1 assigned for housekeeping
>>
>> With 8 queues the proposed change does nothing because each queue is
>> mapped to exactly one CPU.
>
> That is the expected behavior for this RT case, given that userspace won't
> submit IO from isolated CPUs.
What is _this_ RT case? We really don't implement policy for a specific
use case. If the kernel implements a policy then it has to be generally
useful and practical.
>> With 4 queues you get the following:
>>
>> CPU0,1 queue 0
>> CPU2,3 queue 1
>> CPU4,5 queue 2
>> CPU6,7 queue 3
>>
>> No effect on the isolated CPUs either.
>>
>> With 2 queues you get the following:
>>
>> CPU0,1,2,3 queue 0
>> CPU4,5,6,7 queue 1
>>
>> So here the isolated CPUs 2 and 3 get the isolation, but 4-7
>> not. That's perhaps intended, but definitely not documented.
>
> That is an intentional change: given that no IO will be submitted from
> CPUs 4-7 most of the time in the RT case, it is fine to select the
> effective CPU from the isolated CPUs there. As Peter mentioned, IO may
> be submitted from isolated CPUs only during boot. Once the system is
> set up, no IO comes from isolated CPUs, so no interrupts are delivered
> to them, which meets RT's requirement.
Again. This is a specific usecase. Is this generally applicable?
> We can document this change somewhere.
Yes, this needs to be documented very clearly with that command line
parameter.
>> So you really need to make your mind up and describe what the intended
>> effect of this is and why you think that the result is correct.
>
> In short, if there is at least one housekeeping CPU available in the
> interrupt's affinity mask, we choose the effective CPU from the
> housekeeping CPUs. Otherwise, we keep the current behavior for
> selecting the effective CPU.
>
> With this approach, no interrupts are delivered to isolated CPUs
> as long as no IO is submitted from those CPUs.
>
> Please let us know if it addresses your concerns.
Mostly. See above.
Thanks,
tglx
* Re: Kernel-managed IRQ affinity (cont)
From: Ming Lei @ 2020-01-14 23:38 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Peter Xu, Juri Lelli, Ming Lei, Linux Kernel Mailing List,
linux-block
Hi Thomas,
On Tue, Jan 14, 2020 at 02:45:00PM +0100, Thomas Gleixner wrote:
> Ming,
>
> Ming Lei <ming.lei@redhat.com> writes:
> > On Fri, Jan 10, 2020 at 08:43:14PM +0100, Thomas Gleixner wrote:
> >> Ming Lei <ming.lei@redhat.com> writes:
> >> > That is why I tried to exclude isolated CPUs from the interrupt's effective
> >> > affinity instead; it turns out that approach is simple and doable.
> >>
> >> Yes, it's doable. But it still is inconsistent behaviour. Assume the
> >> following configuration:
> >>
> >> 8 CPUs CPU0,1 assigned for housekeeping
> >>
> >> With 8 queues the proposed change does nothing because each queue is
> >> mapped to exactly one CPU.
> >
> > That is the expected behavior for this RT case, given that userspace
> > won't submit IO from isolated CPUs.
>
> What is _this_ RT case? We really don't implement policy for a specific
> use case. If the kernel implements a policy then it has to be generally
> useful and practical.
Maybe the term 'RT case' isn't accurate. I thought isolated CPUs were
used only for realtime cases; at least that is Peter's usage, but maybe
I was wrong.
In any case, the approach is generic for all isolated-CPU cases in which
users don't want managed interrupts to disturb the isolated CPU cores.
>
> >> With 4 queues you get the following:
> >>
> >> CPU0,1 queue 0
> >> CPU2,3 queue 1
> >> CPU4,5 queue 2
> >> CPU6,7 queue 3
> >>
> >> No effect on the isolated CPUs either.
> >>
> >> With 2 queues you get the following:
> >>
> >> CPU0,1,2,3 queue 0
> >> CPU4,5,6,7 queue 1
> >>
> >> So here the isolated CPUs 2 and 3 get the isolation, but 4-7
> >> not. That's perhaps intended, but definitely not documented.
> >
> > That is an intentional change: given that no IO will be submitted from
> > CPUs 4-7 most of the time in the RT case, it is fine to select the
> > effective CPU from the isolated CPUs there. As Peter mentioned, IO may
> > be submitted from isolated CPUs only during boot. Once the system is
> > set up, no IO comes from isolated CPUs, so no interrupts are delivered
> > to them, which meets RT's requirement.
>
> Again. This is a specific usecase. Is this generally applicable?
As mentioned above, it can be applied to all isolated CPUs whenever
users don't want managed interrupts to disturb those CPU cores.
>
> > We can document this change somewhere.
>
> Yes, this needs to be documented very clearly with that command line
> parameter.
OK, I will do that in the formal post.
Thanks,
Ming