From: Marc Zyngier <maz@kernel.org>
To: John Garry <john.garry@huawei.com>
Cc: Jason Cooper <jason@lakedaemon.net>,
chenxiang <chenxiang66@hisilicon.com>,
Will Deacon <will@kernel.org>,
luojiaxing@huawei.com, linux-kernel@vger.kernel.org,
Ming Lei <ming.lei@redhat.com>,
Zhou Wang <wangzhou1@hisilicon.com>,
Thomas Gleixner <tglx@linutronix.de>,
Robin Murphy <robin.murphy@arm.com>,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v3 0/2] irqchip/gic-v3-its: Balance LPI affinity across CPUs
Date: Fri, 15 May 2020 11:14:39 +0100 [thread overview]
Message-ID: <668f819c8747104814245cd6faebdd9a@kernel.org> (raw)
In-Reply-To: <7c05b08b-2edc-7f97-0175-898e9772673e@huawei.com>
Hi John,
On 2020-05-14 13:05, John Garry wrote:
>>
>> + its_inc_lpi_count(d, cpu);
>> +
>> return IRQ_SET_MASK_OK_DONE;
>> }
>>
>> Results look ok:
>> nvme.use_threaded_interrupts=1 =0*
>> Before 950K IOPs 1000K IOPs
>> After 1100K IOPs 1150K IOPs
>>
>> * as mentioned before, this is quite unstable and causes lockups.
>> JFYI, there was an attempt to fix this:
>>
>> https://lore.kernel.org/linux-nvme/20191209175622.1964-1-kbusch@kernel.org/
>>
>
> Hi Marc,
>
> Just wondering if we can try to get this series over the line?
Absolutely. Life has got in the way, so let me page it back in...
> So I tested the patches on v5.7-rc5, and get similar performance
> improvement to above.
>
> I did apply a couple of patches, below, to remedy the issues I
> experienced for my D06CS.
Comments on that below.
>
> Thanks,
> John
>
>
> ---->8
>
>
> [PATCH 1/2] irqchip/gic-v3-its: Don't double account for target CPU
> assignment
>
> In its_set_affinity(), when a managed irq is already assigned to a CPU,
> we may needlessly reassign the irq to another CPU.
>
> This is because when selecting the target CPU, being the least loaded
> CPU in the mask, we account of that irq still being assigned to a CPU;
> thereby we may unfairly select another CPU.
>
> Modify this behaviour to pre-decrement the current target CPU LPI count
> when finding the least loaded CPU.
>
> Alternatively we may be able to just bail out early when the current
> target CPU already falls within the requested mask.
>
> ---
> drivers/irqchip/irq-gic-v3-its.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c
> b/drivers/irqchip/irq-gic-v3-its.c
> index 73f5c12..2b18feb 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1636,6 +1636,8 @@ static int its_set_affinity(struct irq_data *d,
> const struct cpumask *mask_val,
> if (irqd_is_forwarded_to_vcpu(d))
> return -EINVAL;
>
> + its_dec_lpi_count(d, its_dev->event_map.col_map[id]);
> +
> if (!force)
> cpu = its_select_cpu(d, mask_val);
> else
> @@ -1646,14 +1648,14 @@ static int its_set_affinity(struct irq_data
> *d, const struct cpumask *mask_val,
>
> /* don't set the affinity when the target cpu is same as current one
> */
> if (cpu != its_dev->event_map.col_map[id]) {
> - its_inc_lpi_count(d, cpu);
> - its_dec_lpi_count(d, its_dev->event_map.col_map[id]);
> target_col = &its_dev->its->collections[cpu];
> its_send_movi(its_dev, target_col, id);
> its_dev->event_map.col_map[id] = cpu;
> irq_data_update_effective_affinity(d, cpumask_of(cpu));
> }
>
> + its_inc_lpi_count(d, cpu);
> +
I'm OK with that change, as it removes unnecessary churn.
> return IRQ_SET_MASK_OK_DONE;
> }
>
> ---
>
>
> [PATCH 2/2] irqchip/gic-v3-its: Handle no overlap of non-managed irq
> affinity mask
>
> In selecting the target CPU for a non-managed interrupt, we may select
> a
> target CPU outside the requested affinity mask.
>
> This is because there may be no overlap of the ITS node mask and the
> requested CPU affinity mask. The requested affinity mask may be coming
> from userspace or some drivers which try to set irq affinity, see [0].
>
> In this case, just ignore the ITS node cpumask. This is a deviation
> from
> what Thomas described. Having said that, I am not sure if the
> interrupt is ever bound to a node for us.
>
> [0]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/hisilicon/hisi_uncore_pmu.c#n417
>
> ---
> drivers/irqchip/irq-gic-v3-its.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c
> b/drivers/irqchip/irq-gic-v3-its.c
> index 2b18feb..12d5d4b4 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1584,10 +1584,6 @@ static int its_select_cpu(struct irq_data *d,
> cpumask_and(tmpmask, cpumask_of_node(node), aff_mask);
> cpumask_and(tmpmask, tmpmask, cpu_online_mask);
>
> - /* If that doesn't work, try the nodemask itself */
> - if (cpumask_empty(tmpmask))
> - cpumask_and(tmpmask, cpumask_of_node(node), cpu_online_mask);
> -
> cpu = cpumask_pick_least_loaded(d, tmpmask);
> if (cpu < nr_cpu_ids)
> goto out;
I'm really not sure. Shouldn't we then drop the wider search on
cpu_inline_mask, because userspace could have given us something
that we cannot deal with?
What you are advocating for is a strict adherence to the provided
mask, and it doesn't seem to be what other architectures are providing.
I consider the userspace-provided affinity as a hint more that anything
else, as in this case the kernel does know better (routing the interrupt
to a foreign node might be costly, or even impossible, see the TX1
erratum).
From what I remember of the earlier discussion, you saw an issue on
a system with two sockets and a single ITS, with the node mask limited
to the first socket. Is that correct?
I'll respin the series today and report it with you first patch
squased in.
Thanks,
M.
--
Jazz is not dead. It just smells funny...
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: John Garry <john.garry@huawei.com>
Cc: linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
Jason Cooper <jason@lakedaemon.net>,
chenxiang <chenxiang66@hisilicon.com>,
Robin Murphy <robin.murphy@arm.com>,
luojiaxing@huawei.com, Ming Lei <ming.lei@redhat.com>,
Zhou Wang <wangzhou1@hisilicon.com>,
Thomas Gleixner <tglx@linutronix.de>,
Will Deacon <will@kernel.org>
Subject: Re: [PATCH v3 0/2] irqchip/gic-v3-its: Balance LPI affinity across CPUs
Date: Fri, 15 May 2020 11:14:39 +0100 [thread overview]
Message-ID: <668f819c8747104814245cd6faebdd9a@kernel.org> (raw)
In-Reply-To: <7c05b08b-2edc-7f97-0175-898e9772673e@huawei.com>
Hi John,
On 2020-05-14 13:05, John Garry wrote:
>>
>> + its_inc_lpi_count(d, cpu);
>> +
>> return IRQ_SET_MASK_OK_DONE;
>> }
>>
>> Results look ok:
>> nvme.use_threaded_interrupts=1 =0*
>> Before 950K IOPs 1000K IOPs
>> After 1100K IOPs 1150K IOPs
>>
>> * as mentioned before, this is quite unstable and causes lockups.
>> JFYI, there was an attempt to fix this:
>>
>> https://lore.kernel.org/linux-nvme/20191209175622.1964-1-kbusch@kernel.org/
>>
>
> Hi Marc,
>
> Just wondering if we can try to get this series over the line?
Absolutely. Life has got in the way, so let me page it back in...
> So I tested the patches on v5.7-rc5, and get similar performance
> improvement to above.
>
> I did apply a couple of patches, below, to remedy the issues I
> experienced for my D06CS.
Comments on that below.
>
> Thanks,
> John
>
>
> ---->8
>
>
> [PATCH 1/2] irqchip/gic-v3-its: Don't double account for target CPU
> assignment
>
> In its_set_affinity(), when a managed irq is already assigned to a CPU,
> we may needlessly reassign the irq to another CPU.
>
> This is because when selecting the target CPU, being the least loaded
> CPU in the mask, we account of that irq still being assigned to a CPU;
> thereby we may unfairly select another CPU.
>
> Modify this behaviour to pre-decrement the current target CPU LPI count
> when finding the least loaded CPU.
>
> Alternatively we may be able to just bail out early when the current
> target CPU already falls within the requested mask.
>
> ---
> drivers/irqchip/irq-gic-v3-its.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c
> b/drivers/irqchip/irq-gic-v3-its.c
> index 73f5c12..2b18feb 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1636,6 +1636,8 @@ static int its_set_affinity(struct irq_data *d,
> const struct cpumask *mask_val,
> if (irqd_is_forwarded_to_vcpu(d))
> return -EINVAL;
>
> + its_dec_lpi_count(d, its_dev->event_map.col_map[id]);
> +
> if (!force)
> cpu = its_select_cpu(d, mask_val);
> else
> @@ -1646,14 +1648,14 @@ static int its_set_affinity(struct irq_data
> *d, const struct cpumask *mask_val,
>
> /* don't set the affinity when the target cpu is same as current one
> */
> if (cpu != its_dev->event_map.col_map[id]) {
> - its_inc_lpi_count(d, cpu);
> - its_dec_lpi_count(d, its_dev->event_map.col_map[id]);
> target_col = &its_dev->its->collections[cpu];
> its_send_movi(its_dev, target_col, id);
> its_dev->event_map.col_map[id] = cpu;
> irq_data_update_effective_affinity(d, cpumask_of(cpu));
> }
>
> + its_inc_lpi_count(d, cpu);
> +
I'm OK with that change, as it removes unnecessary churn.
> return IRQ_SET_MASK_OK_DONE;
> }
>
> ---
>
>
> [PATCH 2/2] irqchip/gic-v3-its: Handle no overlap of non-managed irq
> affinity mask
>
> In selecting the target CPU for a non-managed interrupt, we may select
> a
> target CPU outside the requested affinity mask.
>
> This is because there may be no overlap of the ITS node mask and the
> requested CPU affinity mask. The requested affinity mask may be coming
> from userspace or some drivers which try to set irq affinity, see [0].
>
> In this case, just ignore the ITS node cpumask. This is a deviation
> from
> what Thomas described. Having said that, I am not sure if the
> interrupt is ever bound to a node for us.
>
> [0]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/hisilicon/hisi_uncore_pmu.c#n417
>
> ---
> drivers/irqchip/irq-gic-v3-its.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c
> b/drivers/irqchip/irq-gic-v3-its.c
> index 2b18feb..12d5d4b4 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1584,10 +1584,6 @@ static int its_select_cpu(struct irq_data *d,
> cpumask_and(tmpmask, cpumask_of_node(node), aff_mask);
> cpumask_and(tmpmask, tmpmask, cpu_online_mask);
>
> - /* If that doesn't work, try the nodemask itself */
> - if (cpumask_empty(tmpmask))
> - cpumask_and(tmpmask, cpumask_of_node(node), cpu_online_mask);
> -
> cpu = cpumask_pick_least_loaded(d, tmpmask);
> if (cpu < nr_cpu_ids)
> goto out;
I'm really not sure. Shouldn't we then drop the wider search on
cpu_inline_mask, because userspace could have given us something
that we cannot deal with?
What you are advocating for is a strict adherence to the provided
mask, and it doesn't seem to be what other architectures are providing.
I consider the userspace-provided affinity as a hint more that anything
else, as in this case the kernel does know better (routing the interrupt
to a foreign node might be costly, or even impossible, see the TX1
erratum).
From what I remember of the earlier discussion, you saw an issue on
a system with two sockets and a single ITS, with the node mask limited
to the first socket. Is that correct?
I'll respin the series today and report it with you first patch
squased in.
Thanks,
M.
--
Jazz is not dead. It just smells funny...
next prev parent reply other threads:[~2020-05-15 10:14 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-16 11:54 [PATCH v3 0/2] irqchip/gic-v3-its: Balance LPI affinity across CPUs Marc Zyngier
2020-03-16 11:54 ` Marc Zyngier
2020-03-16 11:54 ` [PATCH v3 1/2] irqchip/gic-v3-its: Track LPI distribution on a per CPU basis Marc Zyngier
2020-03-16 11:54 ` Marc Zyngier
2020-03-16 11:54 ` [PATCH v3 2/2] irqchip/gic-v3-its: Balance initial LPI affinity across CPUs Marc Zyngier
2020-03-16 11:54 ` Marc Zyngier
2020-03-16 13:02 ` John Garry
2020-03-16 13:02 ` John Garry
2020-03-16 13:14 ` Marc Zyngier
2020-03-16 13:14 ` Marc Zyngier
2020-03-17 18:43 ` John Garry
2020-03-17 18:43 ` John Garry
2020-03-18 14:16 ` Marc Zyngier
2020-03-18 14:16 ` Marc Zyngier
2020-03-18 14:25 ` John Garry
2020-03-18 14:25 ` John Garry
2020-03-18 12:22 ` John Garry
2020-03-18 12:22 ` John Garry
2020-03-18 14:04 ` Marc Zyngier
2020-03-18 14:04 ` Marc Zyngier
2020-03-18 15:34 ` John Garry
2020-03-18 15:34 ` John Garry
2020-03-18 17:30 ` Marc Zyngier
2020-03-18 17:30 ` Marc Zyngier
2020-03-18 19:00 ` John Garry
2020-03-18 19:00 ` John Garry
2020-03-27 17:52 ` John Garry
2020-03-27 17:52 ` John Garry
2020-03-19 12:31 ` [PATCH v3 0/2] irqchip/gic-v3-its: Balance " John Garry
2020-03-19 12:31 ` John Garry
2020-03-27 17:47 ` John Garry
2020-03-27 17:47 ` John Garry
2020-04-01 11:33 ` John Garry
2020-04-01 11:33 ` John Garry
2020-05-14 12:05 ` John Garry
2020-05-14 12:05 ` John Garry
2020-05-15 10:14 ` Marc Zyngier [this message]
2020-05-15 10:14 ` Marc Zyngier
2020-05-15 11:50 ` John Garry
2020-05-15 11:50 ` John Garry
2020-05-15 15:37 ` Marc Zyngier
2020-05-15 15:37 ` Marc Zyngier
2020-05-15 16:15 ` John Garry
2020-05-15 16:15 ` John Garry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=668f819c8747104814245cd6faebdd9a@kernel.org \
--to=maz@kernel.org \
--cc=chenxiang66@hisilicon.com \
--cc=jason@lakedaemon.net \
--cc=john.garry@huawei.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luojiaxing@huawei.com \
--cc=ming.lei@redhat.com \
--cc=robin.murphy@arm.com \
--cc=tglx@linutronix.de \
--cc=wangzhou1@hisilicon.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.