All of lore.kernel.org
 help / color / mirror / Atom feed
From: yangyingliang@huawei.com (Yang Yingliang)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2] irqchip/gic-v3-its: fix ITS queue timeout
Date: Mon, 11 Jun 2018 19:23:42 +0800	[thread overview]
Message-ID: <5B1E5BBE.9010907@huawei.com> (raw)
In-Reply-To: <0ebd8eef-1a86-3c7e-cd3b-f9580c497b5c@arm.com>

Hi, Marc

On 2018/6/11 17:31, Marc Zyngier wrote:
> On 06/06/18 03:40, Yang Yingliang wrote:
>> When the kernel booted with maxcpus=x, 'x' is smaller
>> than actual cpu numbers, the TAs of offline cpus won't
>> be set to its->collection.
>>
>> If LPI is bind to offline cpu, sync cmd will use zero TA,
>> it leads to ITS queue timeout.  Fix this by choosing a
>> online cpu, if there is no online cpu in cpu_mask.
>>
>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
>> ---
>>   drivers/irqchip/irq-gic-v3-its.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index 5416f2b..d8b9539 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -2309,7 +2309,9 @@ static int its_irq_domain_activate(struct irq_domain *domain,
>>   		cpu_mask = cpumask_of_node(its_dev->its->numa_node);
>>   
>>   	/* Bind the LPI to the first possible CPU */
>> -	cpu = cpumask_first(cpu_mask);
>> +	cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
>> +	if (cpu >= nr_cpu_ids)
>> +		cpu = cpumask_first(cpu_online_mask);
> I've thought about this one a bit more, and apart from breaking TX1
> in a very bad way, I think it is actually correct. It is just that
> the commit message doesn't make much sense.
>
> The way I understand it is:
> - this is a NUMA system, with at least one node not online
> - the SRAT table indicates that this ITS is local to an offline node
Yes, your comment is more proper and correct. Mine describes how the BUG 
happens.
I will send a v3 later with proper comment.

Thanks,
Yang
>
> In that case, we need to pick an online CPU, and any will do (again,
> ignoring the silly Cavium erratum). Explained like this, the above
> hunk is sensible, and just needs to handle the TX1 quirk. Something like:
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 5416f2b2ac21..21b7b5151177 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -2309,7 +2309,13 @@ static int its_irq_domain_activate(struct irq_domain *domain,
>   		cpu_mask = cpumask_of_node(its_dev->its->numa_node);
>   
>   	/* Bind the LPI to the first possible CPU */
> -	cpu = cpumask_first(cpu_mask);
> +	cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
> +	if (cpu >= nr_cpu_idx) {
> +		if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144)
> +			return -EINVAL;
> +
> +		cpu = cpumask_first(cpu_online_mask);
> +	}
>   	its_dev->event_map.col_map[event] = cpu;
>   	irq_data_update_effective_affinity(d, cpumask_of(cpu));
>   
>
>>   	its_dev->event_map.col_map[event] = cpu;
>>   	irq_data_update_effective_affinity(d, cpumask_of(cpu));
>>   
>> @@ -2466,7 +2468,10 @@ static int its_vpe_set_affinity(struct irq_data *d,
>>   				bool force)
>>   {
>>   	struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
>> -	int cpu = cpumask_first(mask_val);
>> +	int cpu = cpumask_first_and(mask_val, cpu_online_mask);
>> +
>> +	if (cpu >= nr_cpu_ids)
>> +		cpu = cpumask_first(cpu_online_mask);
>>   
>>   	/*
>>   	 * Changing affinity is mega expensive, so let's be as lazy as
>>
> This hunk, on the other hand, is completely useless. Look how this is
> called from vgic_v4_flush_hwstate():
>
> 	err = irq_set_affinity(irq, cpumask_of(smp_processor_id()));
>
> The mask is always that of the CPU we run on, and we're in a non-premptible
> section. So no way we can be targeting an offline CPU.
>
> If you quickly respin this patch with a decent commit log, I'll take it.
>
> Thanks,
>
> 	M.

WARNING: multiple messages have this Message-ID (diff)
From: Yang Yingliang <yangyingliang@huawei.com>
To: Marc Zyngier <marc.zyngier@arm.com>, <julien.thierry@arm.com>
Cc: <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	Hanjun Guo <guohanjun@huawei.com>
Subject: Re: [PATCH v2] irqchip/gic-v3-its: fix ITS queue timeout
Date: Mon, 11 Jun 2018 19:23:42 +0800	[thread overview]
Message-ID: <5B1E5BBE.9010907@huawei.com> (raw)
In-Reply-To: <0ebd8eef-1a86-3c7e-cd3b-f9580c497b5c@arm.com>

Hi, Marc

On 2018/6/11 17:31, Marc Zyngier wrote:
> On 06/06/18 03:40, Yang Yingliang wrote:
>> When the kernel booted with maxcpus=x, 'x' is smaller
>> than actual cpu numbers, the TAs of offline cpus won't
>> be set to its->collection.
>>
>> If LPI is bind to offline cpu, sync cmd will use zero TA,
>> it leads to ITS queue timeout.  Fix this by choosing a
>> online cpu, if there is no online cpu in cpu_mask.
>>
>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
>> ---
>>   drivers/irqchip/irq-gic-v3-its.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index 5416f2b..d8b9539 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -2309,7 +2309,9 @@ static int its_irq_domain_activate(struct irq_domain *domain,
>>   		cpu_mask = cpumask_of_node(its_dev->its->numa_node);
>>   
>>   	/* Bind the LPI to the first possible CPU */
>> -	cpu = cpumask_first(cpu_mask);
>> +	cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
>> +	if (cpu >= nr_cpu_ids)
>> +		cpu = cpumask_first(cpu_online_mask);
> I've thought about this one a bit more, and apart from breaking TX1
> in a very bad way, I think it is actually correct. It is just that
> the commit message doesn't make much sense.
>
> The way I understand it is:
> - this is a NUMA system, with at least one node not online
> - the SRAT table indicates that this ITS is local to an offline node
Yes, your comment is more proper and correct. Mine describes how the BUG 
happens.
I will send a v3 later with proper comment.

Thanks,
Yang
>
> In that case, we need to pick an online CPU, and any will do (again,
> ignoring the silly Cavium erratum). Explained like this, the above
> hunk is sensible, and just needs to handle the TX1 quirk. Something like:
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 5416f2b2ac21..21b7b5151177 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -2309,7 +2309,13 @@ static int its_irq_domain_activate(struct irq_domain *domain,
>   		cpu_mask = cpumask_of_node(its_dev->its->numa_node);
>   
>   	/* Bind the LPI to the first possible CPU */
> -	cpu = cpumask_first(cpu_mask);
> +	cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
> +	if (cpu >= nr_cpu_idx) {
> +		if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144)
> +			return -EINVAL;
> +
> +		cpu = cpumask_first(cpu_online_mask);
> +	}
>   	its_dev->event_map.col_map[event] = cpu;
>   	irq_data_update_effective_affinity(d, cpumask_of(cpu));
>   
>
>>   	its_dev->event_map.col_map[event] = cpu;
>>   	irq_data_update_effective_affinity(d, cpumask_of(cpu));
>>   
>> @@ -2466,7 +2468,10 @@ static int its_vpe_set_affinity(struct irq_data *d,
>>   				bool force)
>>   {
>>   	struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
>> -	int cpu = cpumask_first(mask_val);
>> +	int cpu = cpumask_first_and(mask_val, cpu_online_mask);
>> +
>> +	if (cpu >= nr_cpu_ids)
>> +		cpu = cpumask_first(cpu_online_mask);
>>   
>>   	/*
>>   	 * Changing affinity is mega expensive, so let's be as lazy as
>>
> This hunk, on the other hand, is completely useless. Look how this is
> called from vgic_v4_flush_hwstate():
>
> 	err = irq_set_affinity(irq, cpumask_of(smp_processor_id()));
>
> The mask is always that of the CPU we run on, and we're in a non-premptible
> section. So no way we can be targeting an offline CPU.
>
> If you quickly respin this patch with a decent commit log, I'll take it.
>
> Thanks,
>
> 	M.

  reply	other threads:[~2018-06-11 11:23 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-06  2:40 [PATCH v2] irqchip/gic-v3-its: fix ITS queue timeout Yang Yingliang
2018-06-06  2:40 ` Yang Yingliang
2018-06-06  9:13 ` Marc Zyngier
2018-06-06  9:13   ` Marc Zyngier
2018-06-07 12:25   ` Hanjun Guo
2018-06-07 12:25     ` Hanjun Guo
2018-06-10 10:40     ` Marc Zyngier
2018-06-10 10:40       ` Marc Zyngier
2018-06-11  9:31 ` Marc Zyngier
2018-06-11  9:31   ` Marc Zyngier
2018-06-11 11:23   ` Yang Yingliang [this message]
2018-06-11 11:23     ` Yang Yingliang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5B1E5BBE.9010907@huawei.com \
    --to=yangyingliang@huawei.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.