[PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances

* [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances
@ 2022-11-28 17:08 Luiz Capitulino
  2022-11-28 17:08 ` [PATH stable 5.15,5.10 1/4] genirq/msi: Shutdown managed interrupts with unsatifiable affinities Luiz Capitulino
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Luiz Capitulino @ 2022-11-28 17:08 UTC (permalink / raw)
  To: stable, maz; +Cc: tglx, lcapitulino, Luiz Capitulino

Hi,

[ Marc, can you help reviewing? Esp. the first patch? ]

This series of backports from upstream to stable 5.15 and 5.10 fixes an issue
we're seeing on AWS ARM instances where attaching an EBS volume (which is an
NVMe device) to the instance after offlining CPUs causes the device to take
several minutes to show up, and eventually nvme kworkers and other threads
start getting stuck.

This series fixes the issue for 5.15.79 and 5.10.155. I can't reproduce it
on 5.4. Also, I couldn't reproduce this on x86 even w/ affected kernels.

An easy reproducer is:

1. Start an ARM instance with 32 CPUs
2. Once the instance is booted, offline all CPUs but CPU 0. E.g.:
   # for i in $(seq 1 31); do chcpu -d $i; done
3. Once the CPUs are offline, attach an EBS volume
4. Watch lsblk and dmesg in the instance

Eventually, you get this stack trace:

[   71.842974] pci 0000:00:1f.0: [1d0f:8061] type 00 class 0x010802
[   71.843966] pci 0000:00:1f.0: reg 0x10: [mem 0x00000000-0x00003fff]
[   71.845149] pci 0000:00:1f.0: PME# supported from D0 D1 D2 D3hot D3cold
[   71.846694] pci 0000:00:1f.0: BAR 0: assigned [mem 0x8011c000-0x8011ffff]
[   71.848458] ACPI: \_SB_.PCI0.GSI3: Enabled at IRQ 38
[   71.850852] nvme nvme1: pci function 0000:00:1f.0
[   71.851611] nvme 0000:00:1f.0: enabling device (0000 -> 0002)
[  135.887787] nvme nvme1: I/O 22 QID 0 timeout, completion polled
[  197.328276] nvme nvme1: I/O 23 QID 0 timeout, completion polled
[  197.329221] nvme nvme1: 1/0/0 default/read/poll queues
[  243.408619] INFO: task kworker/u64:2:275 blocked for more than 122 seconds.
[  243.409674]       Not tainted 5.15.79 #1
[  243.410270] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.411389] task:kworker/u64:2   state:D stack:    0 pid:  275 ppid:     2 flags:0x00000008
[  243.412602] Workqueue: events_unbound async_run_entry_fn
[  243.413417] Call trace:
[  243.413797]  __switch_to+0x15c/0x1a4
[  243.414335]  __schedule+0x2bc/0x990
[  243.414849]  schedule+0x68/0xf8
[  243.415334]  schedule_timeout+0x184/0x340
[  243.415946]  wait_for_completion+0xc8/0x220
[  243.416543]  __flush_work.isra.43+0x240/0x2f0
[  243.417179]  flush_work+0x20/0x2c
[  243.417666]  nvme_async_probe+0x20/0x3c
[  243.418228]  async_run_entry_fn+0x3c/0x1e0
[  243.418858]  process_one_work+0x1bc/0x460
[  243.419437]  worker_thread+0x164/0x528
[  243.420030]  kthread+0x118/0x124
[  243.420517]  ret_from_fork+0x10/0x20
[  258.768771] nvme nvme1: I/O 20 QID 0 timeout, completion polled
[  320.209266] nvme nvme1: I/O 21 QID 0 timeout, completion polled

For completeness, I ran the same test case on x86 with this series applied
on 5.15.79 and 5.10.155 as well. It works as expected.

Thanks,

Marc Zyngier (4):
  genirq/msi: Shutdown managed interrupts with unsatifiable affinities
  genirq: Always limit the affinity to online CPUs
  irqchip/gic-v3: Always trust the managed affinity provided by the core
    code
  genirq: Take the proposed affinity at face value if force==true

 drivers/irqchip/irq-gic-v3-its.c |  2 +-
 kernel/irq/manage.c              | 31 +++++++++++++++++++++++--------
 kernel/irq/msi.c                 |  7 +++++++
 3 files changed, 31 insertions(+), 9 deletions(-)

-- 
2.37.1



* [PATH stable 5.15,5.10 1/4] genirq/msi: Shutdown managed interrupts with unsatifiable affinities
  2022-11-28 17:08 [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances Luiz Capitulino
@ 2022-11-28 17:08 ` Luiz Capitulino
  2022-11-28 17:08 ` [PATH stable 5.15,5.10 2/4] genirq: Always limit the affinity to online CPUs Luiz Capitulino
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Luiz Capitulino @ 2022-11-28 17:08 UTC (permalink / raw)
  To: stable, maz
  Cc: tglx, lcapitulino, John Garry, David Decotigny, Luiz Capitulino

From: Marc Zyngier <maz@kernel.org>

commit d802057c7c553ad426520a053da9f9fe08e2c35a upstream.

[ This commit is almost a rewrite because it conflicts with Thomas
  Gleixner's refactoring of this code in v5.17-rc1. I wasn't sure if
  I should drop all the s-o-bs (including Marc's), but decided
  to keep them as in the original commit ]

When booting with maxcpus=<small number>, interrupt controllers
such as the GICv3 ITS may not be able to satisfy the affinity of
some managed interrupts, as some of the HW resources are simply
not available.

The same thing happens when loading a driver using managed interrupts
while CPUs are offline.

In order to deal with this, do not try to activate such interrupt
if there is no online CPU capable of handling it. Instead, place
it in shutdown state. Once a capable CPU shows up, it will be
activated.

Reported-by: John Garry <john.garry@huawei.com>
Reported-by: David Decotigny <ddecotig@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20220405185040.206297-2-maz@kernel.org

Signed-off-by: Luiz Capitulino <luizcap@amazon.com>
---
 kernel/irq/msi.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 7f350ae59c5f..d75586dc584f 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -596,6 +596,13 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
 			irqd_clr_can_reserve(irq_data);
 			if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK)
 				irqd_set_msi_nomask_quirk(irq_data);
+			if ((info->flags & MSI_FLAG_ACTIVATE_EARLY) &&
+				irqd_affinity_is_managed(irq_data) &&
+				!cpumask_intersects(irq_data_get_affinity_mask(irq_data),
+						    cpu_online_mask)) {
+				irqd_set_managed_shutdown(irq_data);
+				continue;
+			}
 		}
 		ret = irq_domain_activate_irq(irq_data, can_reserve);
 		if (ret)
-- 
2.37.1
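
The check added by patch 1/4 can be sketched in userspace as follows. This is
only an illustrative model, assuming a CPU mask fits in a 64-bit bitmap;
`cpumask_model_t` and the `model_*` names are hypothetical and are not kernel
APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t cpumask_model_t;

/* Models cpumask_intersects(): true if the masks share at least one CPU. */
static bool model_intersects(cpumask_model_t a, cpumask_model_t b)
{
	return (a & b) != 0;
}

/*
 * Models the condition added to __msi_domain_alloc_irqs(): returns true when
 * the interrupt should be put in managed-shutdown state rather than activated.
 */
static bool model_should_shutdown(bool activate_early, bool managed,
				  cpumask_model_t affinity,
				  cpumask_model_t online)
{
	return activate_early && managed &&
	       !model_intersects(affinity, online);
}

/* Self-check: only CPU 0 is online, as in the reproducer. */
void model_msi_selftest(void)
{
	const cpumask_model_t online = 0x1;

	/* Managed IRQ targeting CPUs 4-7 only: no online target, shut down. */
	assert(model_should_shutdown(true, true, 0xF0, online));
	/* Managed IRQ targeting CPUs 0-1: CPU 0 is online, activate. */
	assert(!model_should_shutdown(true, true, 0x03, online));
	/* Non-managed IRQs are not affected by the new check. */
	assert(!model_should_shutdown(true, false, 0xF0, online));
}
```

In the real code the shutdown state is recorded with
irqd_set_managed_shutdown() and the loop skips irq_domain_activate_irq() for
that interrupt.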



* [PATH stable 5.15,5.10 2/4] genirq: Always limit the affinity to online CPUs
  2022-11-28 17:08 [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances Luiz Capitulino
  2022-11-28 17:08 ` [PATH stable 5.15,5.10 1/4] genirq/msi: Shutdown managed interrupts with unsatifiable affinities Luiz Capitulino
@ 2022-11-28 17:08 ` Luiz Capitulino
  2022-11-28 17:08 ` [PATH stable 5.15,5.10 3/4] irqchip/gic-v3: Always trust the managed affinity provided by the core code Luiz Capitulino
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Luiz Capitulino @ 2022-11-28 17:08 UTC (permalink / raw)
  To: stable, maz; +Cc: tglx, lcapitulino, Luiz Capitulino

From: Marc Zyngier <maz@kernel.org>

commit 33de0aa4bae982ed6f7c777f86b5af3e627ac937 upstream.

[ Fixed small conflicts due to the HK_FLAG_MANAGED_IRQ flag being
  renamed upstream ]

When booting with maxcpus=<small number> (or even loading a driver
while most CPUs are offline), it is pretty easy to observe managed
affinities containing a mix of online and offline CPUs being passed
to the irqchip driver.

This means that the irqchip cannot trust the affinity passed down
from the core code, which is a bit annoying and requires (at least
in theory) all drivers to implement some sort of affinity narrowing.

In order to address this, always limit the cpumask to the set of
online CPUs.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220405185040.206297-3-maz@kernel.org

Signed-off-by: Luiz Capitulino <luizcap@amazon.com>
---
 kernel/irq/manage.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 0c3c26fb054f..a1727cdaebed 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -222,11 +222,16 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 {
 	struct irq_desc *desc = irq_data_to_desc(data);
 	struct irq_chip *chip = irq_data_get_irq_chip(data);
+	const struct cpumask  *prog_mask;
 	int ret;
 
+	static DEFINE_RAW_SPINLOCK(tmp_mask_lock);
+	static struct cpumask tmp_mask;
+
 	if (!chip || !chip->irq_set_affinity)
 		return -EINVAL;
 
+	raw_spin_lock(&tmp_mask_lock);
 	/*
 	 * If this is a managed interrupt and housekeeping is enabled on
 	 * it check whether the requested affinity mask intersects with
@@ -248,24 +253,28 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	 */
 	if (irqd_affinity_is_managed(data) &&
 	    housekeeping_enabled(HK_FLAG_MANAGED_IRQ)) {
-		const struct cpumask *hk_mask, *prog_mask;
-
-		static DEFINE_RAW_SPINLOCK(tmp_mask_lock);
-		static struct cpumask tmp_mask;
+		const struct cpumask *hk_mask;
 
 		hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
 
-		raw_spin_lock(&tmp_mask_lock);
 		cpumask_and(&tmp_mask, mask, hk_mask);
 		if (!cpumask_intersects(&tmp_mask, cpu_online_mask))
 			prog_mask = mask;
 		else
 			prog_mask = &tmp_mask;
-		ret = chip->irq_set_affinity(data, prog_mask, force);
-		raw_spin_unlock(&tmp_mask_lock);
 	} else {
-		ret = chip->irq_set_affinity(data, mask, force);
+		prog_mask = mask;
 	}
+
+	/* Make sure we only provide online CPUs to the irqchip */
+	cpumask_and(&tmp_mask, prog_mask, cpu_online_mask);
+	if (!cpumask_empty(&tmp_mask))
+		ret = chip->irq_set_affinity(data, &tmp_mask, force);
+	else
+		ret = -EINVAL;
+
+	raw_spin_unlock(&tmp_mask_lock);
+
 	switch (ret) {
 	case IRQ_SET_MASK_OK:
 	case IRQ_SET_MASK_OK_DONE:
-- 
2.37.1
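
The mask selection in irq_do_set_affinity() after patch 2/4 can be modeled in
userspace roughly as below. This is a sketch under the assumption that a CPU
mask fits in a 64-bit bitmap; `cpumask_model_t` and `model_pick_mask()` are
hypothetical names, and the locking around the static tmp_mask is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t cpumask_model_t;

/*
 * Models which mask is handed to chip->irq_set_affinity() after patch 2/4.
 * Returns 0 and stores the mask in *out, or -EINVAL if no online CPU is left.
 */
static int model_pick_mask(bool managed, bool hk_enabled,
			   cpumask_model_t mask, cpumask_model_t hk_mask,
			   cpumask_model_t online, cpumask_model_t *out)
{
	cpumask_model_t prog_mask = mask;

	if (managed && hk_enabled) {
		cpumask_model_t tmp = mask & hk_mask;

		/* Use the housekeeping subset only if one of its CPUs is online. */
		if (tmp & online)
			prog_mask = tmp;
	}

	/* Make sure only online CPUs are provided to the irqchip. */
	prog_mask &= online;
	if (!prog_mask)
		return -EINVAL;

	*out = prog_mask;
	return 0;
}

/* Self-check covering the plain, housekeeping, and all-offline cases. */
void model_affinity_selftest(void)
{
	cpumask_model_t out = 0;

	/* Requested CPUs 0-3, only CPU 0 online: narrowed to CPU 0. */
	assert(model_pick_mask(false, false, 0xF, 0, 0x1, &out) == 0);
	assert(out == 0x1);

	/* Requested CPUs 2-3, only CPU 0 online: nothing left. */
	assert(model_pick_mask(false, false, 0xC, 0, 0x1, &out) == -EINVAL);

	/* Managed, housekeeping CPUs 1-2, CPUs 0-1 online: CPU 1 is picked. */
	assert(model_pick_mask(true, true, 0xF, 0x6, 0x3, &out) == 0);
	assert(out == 0x2);
}
```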



* [PATH stable 5.15,5.10 3/4] irqchip/gic-v3: Always trust the managed affinity provided by the core code
  2022-11-28 17:08 [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances Luiz Capitulino
  2022-11-28 17:08 ` [PATH stable 5.15,5.10 1/4] genirq/msi: Shutdown managed interrupts with unsatifiable affinities Luiz Capitulino
  2022-11-28 17:08 ` [PATH stable 5.15,5.10 2/4] genirq: Always limit the affinity to online CPUs Luiz Capitulino
@ 2022-11-28 17:08 ` Luiz Capitulino
  2022-11-28 17:08 ` [PATH stable 5.15,5.10 4/4] genirq: Take the proposed affinity at face value if force==true Luiz Capitulino
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Luiz Capitulino @ 2022-11-28 17:08 UTC (permalink / raw)
  To: stable, maz; +Cc: tglx, lcapitulino, Luiz Capitulino

From: Marc Zyngier <maz@kernel.org>

commit 3f893a5962d31c0164efdbf6174ed0784f1d7603 upstream.

Now that the core code has been fixed to always give us an affinity
that only includes online CPUs, directly use this affinity when
computing a target CPU.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220405185040.206297-4-maz@kernel.org

Signed-off-by: Luiz Capitulino <luizcap@amazon.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index fc1bfffc468f..59a5d06b2d3e 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1620,7 +1620,7 @@ static int its_select_cpu(struct irq_data *d,
 
 		cpu = cpumask_pick_least_loaded(d, tmpmask);
 	} else {
-		cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
+		cpumask_copy(tmpmask, aff_mask);
 
 		/* If we cannot cross sockets, limit the search to that node */
 		if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&
-- 
2.37.1



* [PATH stable 5.15,5.10 4/4] genirq: Take the proposed affinity at face value if force==true
  2022-11-28 17:08 [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances Luiz Capitulino
                   ` (2 preceding siblings ...)
  2022-11-28 17:08 ` [PATH stable 5.15,5.10 3/4] irqchip/gic-v3: Always trust the managed affinity provided by the core code Luiz Capitulino
@ 2022-11-28 17:08 ` Luiz Capitulino
  2022-11-28 17:53 ` [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances Marc Zyngier
  2022-11-30 17:11 ` [PATH stable 5.15,5.10 " Greg KH
  5 siblings, 0 replies; 9+ messages in thread
From: Luiz Capitulino @ 2022-11-28 17:08 UTC (permalink / raw)
  To: stable, maz; +Cc: tglx, lcapitulino, Marek Szyprowski, Luiz Capitulino

From: Marc Zyngier <maz@kernel.org>

commit c48c8b829d2b966a6649827426bcdba082ccf922 upstream.

Although setting the affinity of an interrupt to a set of CPUs that doesn't
have any online CPU is generally frowned upon, there are a few limited
cases where such affinity is set from a CPUHP notifier, setting the
affinity to a CPU that isn't online yet.

The saving grace is that this is always done using the 'force' attribute,
which gives a hint that the affinity setting can be outside of the online
CPU mask, and the callsite sets this flag with the knowledge that the
underlying interrupt controller knows how to handle it.

This restores the expected behaviour on Marek's system.

Fixes: 33de0aa4bae9 ("genirq: Always limit the affinity to online CPUs")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/4b7fc13c-887b-a664-26e8-45aed13f048a@samsung.com
Link: https://lore.kernel.org/r/20220414140011.541725-1-maz@kernel.org

Signed-off-by: Luiz Capitulino <luizcap@amazon.com>
---
 kernel/irq/manage.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index a1727cdaebed..9862372e0f01 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -266,10 +266,16 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 		prog_mask = mask;
 	}
 
-	/* Make sure we only provide online CPUs to the irqchip */
+	/*
+	 * Make sure we only provide online CPUs to the irqchip,
+	 * unless we are being asked to force the affinity (in which
+	 * case we do as we are told).
+	 */
 	cpumask_and(&tmp_mask, prog_mask, cpu_online_mask);
-	if (!cpumask_empty(&tmp_mask))
+	if (!force && !cpumask_empty(&tmp_mask))
 		ret = chip->irq_set_affinity(data, &tmp_mask, force);
+	else if (force)
+		ret = chip->irq_set_affinity(data, mask, force);
 	else
 		ret = -EINVAL;
 
-- 
2.37.1
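
The effect of the force flag after patch 4/4 can be sketched in userspace as
below. This is an illustrative model only, assuming a CPU mask fits in a
64-bit bitmap; `cpumask_model_t` and `model_pick_mask_force()` are
hypothetical names. Note that in the forced case the diff passes the caller's
original mask, not the narrowed one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t cpumask_model_t;

/*
 * Models the final branch of irq_do_set_affinity() after patch 4/4.
 * mask is the caller's request, prog_mask the mask chosen earlier in the
 * function. Returns 0 and stores the mask given to the irqchip in *out,
 * or -EINVAL when not forced and no online CPU remains.
 */
static int model_pick_mask_force(cpumask_model_t mask,
				 cpumask_model_t prog_mask,
				 cpumask_model_t online, bool force,
				 cpumask_model_t *out)
{
	cpumask_model_t tmp = prog_mask & online;

	if (!force && tmp) {
		/* Normal case: only online CPUs reach the irqchip. */
		*out = tmp;
		return 0;
	}
	if (force) {
		/* Forced: take the proposed affinity at face value. */
		*out = mask;
		return 0;
	}
	return -EINVAL;
}

/* Self-check: CPU 0 online, request targets CPUs 2-3 (not yet online). */
void model_force_selftest(void)
{
	cpumask_model_t out = 0;

	/* Without force the request is rejected. */
	assert(model_pick_mask_force(0xC, 0xC, 0x1, false, &out) == -EINVAL);

	/* With force the offline CPUs are passed through unchanged. */
	assert(model_pick_mask_force(0xC, 0xC, 0x1, true, &out) == 0);
	assert(out == 0xC);
}
```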



* Re: [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances
  2022-11-28 17:08 [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances Luiz Capitulino
                   ` (3 preceding siblings ...)
  2022-11-28 17:08 ` [PATH stable 5.15,5.10 4/4] genirq: Take the proposed affinity at face value if force==true Luiz Capitulino
@ 2022-11-28 17:53 ` Marc Zyngier
  2022-11-28 18:27   ` [PATH stable 5.15, 5.10 " Luiz Capitulino
  2022-11-30 17:11 ` [PATH stable 5.15,5.10 " Greg KH
  5 siblings, 1 reply; 9+ messages in thread
From: Marc Zyngier @ 2022-11-28 17:53 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: stable, tglx, lcapitulino

On Mon, 28 Nov 2022 17:08:31 +0000,
Luiz Capitulino <luizcap@amazon.com> wrote:
> 
> Hi,
> 
> [ Marc, can you help reviewing? Esp. the first patch? ]
> 
> This series of backports from upstream to stable 5.15 and 5.10 fixes an issue
> we're seeing on AWS ARM instances where attaching an EBS volume (which is a
> nvme device) to the instance after offlining CPUs causes the device to take
> several minutes to show up and eventually nvme kworkers and other threads start
> getting stuck.
> 
> This series fixes the issue for 5.15.79 and 5.10.155. I can't reproduce it
> on 5.4. Also, I couldn't reproduce this on x86 even w/ affected kernels.

That's because x86 has a very different allocation policy compared to
what the ITS does. The x86 vector space is tiny, so vectors are only
allocated when required. In your case, that's when the CPUs are
onlined.

With the ITS, all the vectors are allocated upfront, as this is
essentially free. But in the case of managed interrupts, these vectors
are now pointing to offline CPUs. The ITS tries to fix that, but
doesn't nearly have enough information. And the correct course of
action is to keep these interrupts in the shutdown state, which is
what the series is doing.

>
> An easy reproducer is:
> 
> 1. Start an ARM instance with 32 CPUs

To satisfy my own curiosity, is that in a guest or bare metal? It
shouldn't make any difference, but hey...

Anyway, patch #1 looks OK to me, but I haven't tried to dig further
into something that is "oh so last year" ;-). Especially as we're
rewriting the whole of the MSI stack! FWIW:

Acked-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATH stable 5.15, 5.10 0/4] Fix EBS volume attach on AWS ARM instances
  2022-11-28 17:53 ` [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances Marc Zyngier
@ 2022-11-28 18:27   ` Luiz Capitulino
  2022-11-30  3:12     ` Luiz Capitulino
  0 siblings, 1 reply; 9+ messages in thread
From: Luiz Capitulino @ 2022-11-28 18:27 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: stable, tglx, lcapitulino


On 2022-11-28 12:53, Marc Zyngier wrote:
> On Mon, 28 Nov 2022 17:08:31 +0000,
> Luiz Capitulino <luizcap@amazon.com> wrote:
>> Hi,
>>
>> [ Marc, can you help reviewing? Esp. the first patch? ]
>>
>> This series of backports from upstream to stable 5.15 and 5.10 fixes an issue
>> we're seeing on AWS ARM instances where attaching an EBS volume (which is a
>> nvme device) to the instance after offlining CPUs causes the device to take
>> several minutes to show up and eventually nvme kworkers and other threads start
>> getting stuck.
>>
>> This series fixes the issue for 5.15.79 and 5.10.155. I can't reproduce it
>> on 5.4. Also, I couldn't reproduce this on x86 even w/ affected kernels.
> That's because x86 has a very different allocation policy compared to
> what the ITS does. The x86 vector space is tiny, so vectors are only
> allocated when required. In your case, that's when the CPUs are
> onlined.
>
> With the ITS, all the vectors are allocated upfront, as this is
> essentially free. But in the case of managed interrupts, these vectors
> are now pointing to offline CPUs. The ITS tries to fix that, but
> doesn't nearly have enough information. And the correct course of
> action is to keep these interrupts in the shutdown state, which is
> what the series is doing.

Thank you for the explanation, Marc. I also immensely
appreciate the super fast response! (more below).


>
>> An easy reproducer is:
>>
>> 1. Start an ARM instance with 32 CPUs
> To satisfy my own curiosity, is that in a guest or bare metal? It
> shouldn't make any difference, but hey...

This is a guest. I'll test on a bare-metal instance; it may
take a few hours. I'll reply here.


> Anyway, patch #1 looks OK to me, but I haven't tried to dig further
> into something that is "oh so last year" ;-). Specially as we're
> rewriting the whole of the MSI stack! FWIW:
>
> Acked-by: Marc Zyngier <maz@kernel.org>

Thank you again, Marc!


>
>          M.
>
> --
> Without deviation from the norm, progress is not possible.


* Re: [PATH stable 5.15, 5.10 0/4] Fix EBS volume attach on AWS ARM instances
  2022-11-28 18:27   ` [PATH stable 5.15, 5.10 " Luiz Capitulino
@ 2022-11-30  3:12     ` Luiz Capitulino
  0 siblings, 0 replies; 9+ messages in thread
From: Luiz Capitulino @ 2022-11-30  3:12 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: stable, tglx, lcapitulino



On 2022-11-28 13:27, Luiz Capitulino wrote:
> 
> On 2022-11-28 12:53, Marc Zyngier wrote:
>> On Mon, 28 Nov 2022 17:08:31 +0000,
>> Luiz Capitulino <luizcap@amazon.com> wrote:
>>> Hi,
>>>
>>> [ Marc, can you help reviewing? Esp. the first patch? ]
>>>
>>> This series of backports from upstream to stable 5.15 and 5.10 fixes an issue
>>> we're seeing on AWS ARM instances where attaching an EBS volume (which is a
>>> nvme device) to the instance after offlining CPUs causes the device to take
>>> several minutes to show up and eventually nvme kworkers and other threads start
>>> getting stuck.
>>>
>>> This series fixes the issue for 5.15.79 and 5.10.155. I can't reproduce it
>>> on 5.4. Also, I couldn't reproduce this on x86 even w/ affected kernels.
>> That's because x86 has a very different allocation policy compared to
>> what the ITS does. The x86 vector space is tiny, so vectors are only
>> allocated when required. In your case, that's when the CPUs are
>> onlined.
>>
>> With the ITS, all the vectors are allocated upfront, as this is
>> essentially free. But in the case of managed interrupts, these vectors
>> are now pointing to offline CPUs. The ITS tries to fix that, but
>> doesn't nearly have enough information. And the correct course of
>> action is to keep these interrupts in the shutdown state, which is
>> what the series is doing.
> 
> Thank you for the explanation, Marc. I also immensely
> appreciate the super fast response! (more below).
> 
> 
>>
>>> An easy reproducer is:
>>>
>>> 1. Start an ARM instance with 32 CPUs
>> To satisfy my own curiosity, is that in a guest or bare metal? It
>> shouldn't make any difference, but hey...
> 
> This is a guest. I'll test on a bare-metal instance, it may
> take a few hours. I'll reply here.

I was able to test this on a bare-metal instance on both arm64 and x86 
with and without this series. It all works as expected.

The only difference is that on the arm64 bare-metal instance, I get
a PCI error on an unfixed kernel (below) and the system never hangs
(whereas on a guest, I get no PCI error and eventually threads start
hanging).

This series fixes this case too and the device is added as expected
on a fixed kernel.

So, all seems good!

[  162.618277] pcieport 0000:14:06.0: bridge window [io  0x1000-0x0fff] to [bus 1b] add_size 1000
[  162.618905] pcieport 0000:14:06.0: BAR 13: no space for [io  size 0x1000]
[  162.619398] pcieport 0000:14:06.0: BAR 13: failed to assign [io  size 0x1000]
[  162.619916] pcieport 0000:14:06.0: BAR 13: no space for [io  size 0x1000]
[  162.620410] pcieport 0000:14:06.0: BAR 13: failed to assign [io  size 0x1000]
[  162.620929] pci 0000:1b:00.0: BAR 0: assigned [mem 0x83200000-0x833fffff 64bit]
[  162.621472] pcieport 0000:14:06.0: PCI bridge to [bus 1b]
[  162.621872] pcieport 0000:14:06.0:   bridge window [mem 0x83200000-0x833fffff]
[  162.622398] pcieport 0000:14:06.0:   bridge window [mem 0x18019000000-0x18019ffffff 64bit pref]
[  162.623411] nvme 0000:1b:00.0: Adding to iommu group 56
[  162.624081] nvme nvme2: pci function 0000:1b:00.0
[  162.624455] nvme 0000:1b:00.0: enabling device (0000 -> 0002)
[  162.627776] nvme nvme2: Removing after probe failure status: -5
[  187.396805] nvme nvme1: I/O 3 QID 0 timeout, reset controller
[  187.399390] nvme nvme1: Identify namespace failed (-4)
[  187.429068] nvme nvme1: Removing after probe failure status: -5

> 
> 
>> Anyway, patch #1 looks OK to me, but I haven't tried to dig further
>> into something that is "oh so last year" ;-). Specially as we're
>> rewriting the whole of the MSI stack! FWIW:
>>
>> Acked-by: Marc Zyngier <maz@kernel.org>
> 
> Thank you again, Marc!
> 
> 
>>
>>          M.
>>
>> -- 
>> Without deviation from the norm, progress is not possible.


* Re: [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances
  2022-11-28 17:08 [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances Luiz Capitulino
                   ` (4 preceding siblings ...)
  2022-11-28 17:53 ` [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances Marc Zyngier
@ 2022-11-30 17:11 ` Greg KH
  5 siblings, 0 replies; 9+ messages in thread
From: Greg KH @ 2022-11-30 17:11 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: stable, maz, tglx, lcapitulino

On Mon, Nov 28, 2022 at 05:08:31PM +0000, Luiz Capitulino wrote:
> Hi,
> 
> [ Marc, can you help reviewing? Esp. the first patch? ]
> 
> This series of backports from upstream to stable 5.15 and 5.10 fixes an issue
> we're seeing on AWS ARM instances where attaching an EBS volume (which is a
> nvme device) to the instance after offlining CPUs causes the device to take
> several minutes to show up and eventually nvme kworkers and other threads start
> getting stuck.
> 
> This series fixes the issue for 5.15.79 and 5.10.155. I can't reproduce it
> on 5.4. Also, I couldn't reproduce this on x86 even w/ affected kernels.
> 
> An easy reproducer is:
> 
> 1. Start an ARM instance with 32 CPUs
> 2. Once the instance is booted, offline all CPUs but CPU 0. Eg:
>    # for i in $(seq 1 32); do chcpu -d $i; done
> 3. Once the CPUs are offline, attach an EBS volume
> 4. Watch lsblk and dmesg in the instance
> 
> Eventually, you get this stack trace:
> 
> [   71.842974] pci 0000:00:1f.0: [1d0f:8061] type 00 class 0x010802
> [   71.843966] pci 0000:00:1f.0: reg 0x10: [mem 0x00000000-0x00003fff]
> [   71.845149] pci 0000:00:1f.0: PME# supported from D0 D1 D2 D3hot D3cold
> [   71.846694] pci 0000:00:1f.0: BAR 0: assigned [mem 0x8011c000-0x8011ffff]
> [   71.848458] ACPI: \_SB_.PCI0.GSI3: Enabled at IRQ 38
> [   71.850852] nvme nvme1: pci function 0000:00:1f.0
> [   71.851611] nvme 0000:00:1f.0: enabling device (0000 -> 0002)
> [  135.887787] nvme nvme1: I/O 22 QID 0 timeout, completion polled
> [  197.328276] nvme nvme1: I/O 23 QID 0 timeout, completion polled
> [  197.329221] nvme nvme1: 1/0/0 default/read/poll queues
> [  243.408619] INFO: task kworker/u64:2:275 blocked for more than 122 seconds.
> [  243.409674]       Not tainted 5.15.79 #1
> [  243.410270] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  243.411389] task:kworker/u64:2   state:D stack:    0 pid:  275 ppid:     2 flags:0x00000008
> [  243.412602] Workqueue: events_unbound async_run_entry_fn
> [  243.413417] Call trace:
> [  243.413797]  __switch_to+0x15c/0x1a4
> [  243.414335]  __schedule+0x2bc/0x990
> [  243.414849]  schedule+0x68/0xf8
> [  243.415334]  schedule_timeout+0x184/0x340
> [  243.415946]  wait_for_completion+0xc8/0x220
> [  243.416543]  __flush_work.isra.43+0x240/0x2f0
> [  243.417179]  flush_work+0x20/0x2c
> [  243.417666]  nvme_async_probe+0x20/0x3c
> [  243.418228]  async_run_entry_fn+0x3c/0x1e0
> [  243.418858]  process_one_work+0x1bc/0x460
> [  243.419437]  worker_thread+0x164/0x528
> [  243.420030]  kthread+0x118/0x124
> [  243.420517]  ret_from_fork+0x10/0x20
> [  258.768771] nvme nvme1: I/O 20 QID 0 timeout, completion polled
> [  320.209266] nvme nvme1: I/O 21 QID 0 timeout, completion polled
> 
> For completion, I tested the same test-case on x86 with this series applied
> on 5.15.79 and 5.10.155 as well. It works as expected.

All now queued up, thanks.

greg k-h
