All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg KH <greg@kroah.com>
To: Luiz Capitulino <luizcap@amazon.com>
Cc: stable@vger.kernel.org, maz@kernel.org, tglx@linutronix.de,
	lcapitulino@gmail.com
Subject: Re: [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances
Date: Wed, 30 Nov 2022 18:11:47 +0100	[thread overview]
Message-ID: <Y4eO07FChCui7Y6J@kroah.com> (raw)
In-Reply-To: <cover.1669655291.git.luizcap@amazon.com>

On Mon, Nov 28, 2022 at 05:08:31PM +0000, Luiz Capitulino wrote:
> Hi,
> 
> [ Marc, can you help reviewing? Esp. the first patch? ]
> 
> This series of backports from upstream to stable 5.15 and 5.10 fixes an issue
> we're seeing on AWS ARM instances where attaching an EBS volume (which is a
> nvme device) to the instance after offlining CPUs causes the device to take
> several minutes to show up and eventually nvme kworkers and other threads start
> getting stuck.
> 
> This series fixes the issue for 5.15.79 and 5.10.155. I can't reproduce it
> on 5.4. Also, I couldn't reproduce this on x86 even w/ affected kernels.
> 
> An easy reproducer is:
> 
> 1. Start an ARM instance with 32 CPUs
> 2. Once the instance is booted, offline all CPUs but CPU 0. Eg:
>    # for i in $(seq 1 32); do chcpu -d $i; done
> 3. Once the CPUs are offline, attach an EBS volume
> 4. Watch lsblk and dmesg in the instance
> 
> Eventually, you get this stack trace:
> 
> [   71.842974] pci 0000:00:1f.0: [1d0f:8061] type 00 class 0x010802
> [   71.843966] pci 0000:00:1f.0: reg 0x10: [mem 0x00000000-0x00003fff]
> [   71.845149] pci 0000:00:1f.0: PME# supported from D0 D1 D2 D3hot D3cold
> [   71.846694] pci 0000:00:1f.0: BAR 0: assigned [mem 0x8011c000-0x8011ffff]
> [   71.848458] ACPI: \_SB_.PCI0.GSI3: Enabled at IRQ 38
> [   71.850852] nvme nvme1: pci function 0000:00:1f.0
> [   71.851611] nvme 0000:00:1f.0: enabling device (0000 -> 0002)
> [  135.887787] nvme nvme1: I/O 22 QID 0 timeout, completion polled
> [  197.328276] nvme nvme1: I/O 23 QID 0 timeout, completion polled
> [  197.329221] nvme nvme1: 1/0/0 default/read/poll queues
> [  243.408619] INFO: task kworker/u64:2:275 blocked for more than 122 seconds.
> [  243.409674]       Not tainted 5.15.79 #1
> [  243.410270] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  243.411389] task:kworker/u64:2   state:D stack:    0 pid:  275 ppid:     2 flags:0x00000008
> [  243.412602] Workqueue: events_unbound async_run_entry_fn
> [  243.413417] Call trace:
> [  243.413797]  __switch_to+0x15c/0x1a4
> [  243.414335]  __schedule+0x2bc/0x990
> [  243.414849]  schedule+0x68/0xf8
> [  243.415334]  schedule_timeout+0x184/0x340
> [  243.415946]  wait_for_completion+0xc8/0x220
> [  243.416543]  __flush_work.isra.43+0x240/0x2f0
> [  243.417179]  flush_work+0x20/0x2c
> [  243.417666]  nvme_async_probe+0x20/0x3c
> [  243.418228]  async_run_entry_fn+0x3c/0x1e0
> [  243.418858]  process_one_work+0x1bc/0x460
> [  243.419437]  worker_thread+0x164/0x528
> [  243.420030]  kthread+0x118/0x124
> [  243.420517]  ret_from_fork+0x10/0x20
> [  258.768771] nvme nvme1: I/O 20 QID 0 timeout, completion polled
> [  320.209266] nvme nvme1: I/O 21 QID 0 timeout, completion polled
> 
> For completion, I tested the same test-case on x86 with this series applied
> on 5.15.79 and 5.10.155 as well. It works as expected.

All now queued up, thanks.

greg k-h

      parent reply	other threads:[~2022-11-30 17:14 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-28 17:08 [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances Luiz Capitulino
2022-11-28 17:08 ` [PATH stable 5.15,5.10 1/4] genirq/msi: Shutdown managed interrupts with unsatifiable affinities Luiz Capitulino
2022-11-28 17:08 ` [PATH stable 5.15,5.10 2/4] genirq: Always limit the affinity to online CPUs Luiz Capitulino
2022-11-28 17:08 ` [PATH stable 5.15,5.10 3/4] irqchip/gic-v3: Always trust the managed affinity provided by the core code Luiz Capitulino
2022-11-28 17:08 ` [PATH stable 5.15,5.10 4/4] genirq: Take the proposed affinity at face value if force==true Luiz Capitulino
2022-11-28 17:53 ` [PATH stable 5.15,5.10 0/4] Fix EBS volume attach on AWS ARM instances Marc Zyngier
2022-11-28 18:27   ` [PATH stable 5.15, 5.10 " Luiz Capitulino
2022-11-30  3:12     ` Luiz Capitulino
2022-11-30 17:11 ` Greg KH [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y4eO07FChCui7Y6J@kroah.com \
    --to=greg@kroah.com \
    --cc=lcapitulino@gmail.com \
    --cc=luizcap@amazon.com \
    --cc=maz@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.