From: Damien Le Moal <dlemoal@kernel.org>
To: Bagas Sanjaya <bagasdotme@gmail.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux SCSI <linux-scsi@vger.kernel.org>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	John Garry <john.g.garry@oracle.com>,
	Jason Yan <yanaijie@huawei.com>,
	Nikolaos Barkas <nickosbarkas@gmail.com>
Subject: Re: Fwd: Marvell 88SE6320 SAS controller (mvsas) cannot survive ACPI S3 or ACPI S4
Date: Thu, 26 Oct 2023 17:56:03 +0900
Message-ID: <4bec6c99-68aa-ddfb-9c22-ba62f66e5901@kernel.org>
In-Reply-To: <7e8fab39-8cdd-4527-8c4f-b18dd79cee14@gmail.com>

On 2023/10/26 17:25, Bagas Sanjaya wrote:
> Hi,
> 
> I notice a bug report on Bugzilla [1]. Quoting from it:

[...]

>> [  437.249448] PM: suspend entry (deep)
>> [  437.255308] Filesystems sync: 0.005 seconds
>> [  437.255570] Freezing user space processes
>> [  437.257093] Freezing user space processes completed (elapsed 0.001 seconds)
>> [  437.257097] OOM killer disabled.
>> [  437.257098] Freezing remaining freezable tasks
>> [  437.258226] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
>> [  437.258281] printk: Suspending console(s) (use no_console_suspend to debug)
>> [  437.291778] sd 0:0:0:0: [sdb] Synchronizing SCSI cache
>> [  437.291825] sd 0:0:1:0: [sdc] Synchronizing SCSI cache
>> [  437.292083] sd 0:0:0:0: [sdb] Stopping disk
>> [  437.292083] sd 0:0:1:0: [sdc] Stopping disk
>> [  438.363660] sd 1:0:0:0: [sda] Synchronizing SCSI cache
>> [  438.363760] sd 1:0:0:0: [sda] Stopping disk

Given this message, this does not look like the latest kernel.

>> [  589.081341] drivers/scsi/mvsas/mv_sas.c 1304:mvs_I_T_nexus_reset for device[1]:rc= 0
>> [  610.481270] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> [  610.481280] rcu: 	11-...0: (0 ticks this GP) idle=4f84/1/0x4000000000000000 softirq=19873/19873 fqs=1159
>> [  610.481292] 	(detected by 5, t=5252 jiffies, g=53581, q=31630 ncpus=12)
>> [  610.481299] Sending NMI from CPU 5 to CPUs 11:
>> [  610.481309] NMI backtrace for cpu 11
>> [  610.481312] CPU: 11 PID: 3152 Comm: kworker/u32:59 Tainted: G          I        6.1.57-vanilla #14
>> [  610.481318] Hardware name: System manufacturer System Product Name/P6T WS PRO, BIOS 1205    09/24/2010
>> [  610.481321] Workqueue: events_unbound async_run_entry_fn
>> [  610.481329] RIP: 0010:mvs_int_rx+0x81/0x150 [mvsas]
>> [  610.481346] Code: 00 00 44 39 75 70 74 47 48 8b 45 60 45 89 e6 41 81 e6 ff 03 00 00 41 8d 56 01 8b 1c 90 49 89 d4 41 89 df 41 81 e7 00 00 08 00 <f7> c3 00 00 01 00 74 58 31 d2 89 de 48 89 ef e8 0b f9 ff ff 45 85
>> [  610.481350] RSP: 0018:ffffb61f06acbb60 EFLAGS: 00000046
>> [  610.481354] RAX: ffff9a7cc2658000 RBX: 0000000000010000 RCX: 0000000000000000
>> [  610.481358] RDX: 000000000000026e RSI: 0000000000010000 RDI: ffff9a7ce2660000
>> [  610.481361] RBP: ffff9a7ce2660000 R08: ffff9a7ce2660f00 R09: ffff9a7ce2660000
>> [  610.481364] R10: ffff9a7ce26600c8 R11: ffffffff884d4300 R12: 000000000000026e
>> [  610.481367] R13: 0000000000000000 R14: 000000000000026d R15: 0000000000000000
>> [  610.481371] FS:  0000000000000000(0000) GS:ffff9a7df7cc0000(0000) knlGS:0000000000000000
>> [  610.481375] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  610.481378] CR2: 0000563633425300 CR3: 0000000077210006 CR4: 00000000000206e0
>> [  610.481382] Call Trace:
>> [  610.481385]  <NMI>
>> [  610.481389]  ? nmi_cpu_backtrace.cold+0x1b/0x76
>> [  610.481398]  ? nmi_cpu_backtrace_handler+0xd/0x20
>> [  610.481403]  ? nmi_handle+0x5d/0x120
>> [  610.481410]  ? mvs_int_rx+0x81/0x150 [mvsas]
>> [  610.481423]  ? default_do_nmi+0x69/0x170
>> [  610.481428]  ? exc_nmi+0x13c/0x170
>> [  610.481432]  ? end_repeat_nmi+0x16/0x67
>> [  610.481443]  ? mvs_int_rx+0x81/0x150 [mvsas]
>> [  610.481457]  ? mvs_int_rx+0x81/0x150 [mvsas]
>> [  610.481470]  ? mvs_int_rx+0x81/0x150 [mvsas]
>> [  610.481483]  </NMI>
>> [  610.481484]  <TASK>
>> [  610.481487]  mvs_do_release_task+0x3f/0x90 [mvsas]
>> [  610.481501]  mvs_release_task+0x13e/0x1a0 [mvsas]
>> [  610.481516]  mvs_I_T_nexus_reset+0xb2/0xd0 [mvsas]
>> [  610.481530]  ? sas_ata_wait_after_reset+0x80/0x80 [libsas]
>> [  610.481552]  sas_ata_hard_reset+0x48/0x80 [libsas]
>> [  610.481575]  ata_eh_reset+0x2e5/0x1090 [libata]
>> [  610.481631]  ? sas_ata_wait_after_reset+0x80/0x80 [libsas]
>> [  610.481652]  ? sas_ata_wait_after_reset+0x80/0x80 [libsas]
>> [  610.481676]  ata_eh_recover+0x2e6/0xe00 [libata]
>> [  610.481728]  ? __wake_up_klogd.part.0+0x56/0x80
>> [  610.481735]  ? vprintk_emit+0x207/0x290
>> [  610.481739]  ? smp_ata_check_ready_type+0xb0/0xb0 [libsas]
>> [  610.481760]  ? sas_ata_wait_after_reset+0x80/0x80 [libsas]
>> [  610.481783]  ? smp_ata_check_ready_type+0xb0/0xb0 [libsas]
>> [  610.481804]  ? sas_ata_wait_after_reset+0x80/0x80 [libsas]
>> [  610.481824]  ata_do_eh+0x75/0xf0 [libata]
>> [  610.481876]  ? del_timer_sync+0x6f/0xb0
>> [  610.481884]  ata_scsi_port_error_handler+0x3a8/0x800 [libata]
>> [  610.481938]  async_sas_ata_eh+0x44/0x7f [libsas]
>> [  610.481960]  async_run_entry_fn+0x30/0x130
>> [  610.481966]  process_one_work+0x1c7/0x380
>> [  610.481974]  worker_thread+0x4d/0x380
>> [  610.481981]  ? rescuer_thread+0x3a0/0x3a0
>> [  610.481987]  kthread+0xe9/0x110
>> [  610.481992]  ? kthread_complete_and_exit+0x20/0x20
>> [  610.481999]  ret_from_fork+0x22/0x30
>> [  610.482009]  </TASK>
>> [  665.286198] NMI watchdog: Watchdog detected hard LOCKUP on cpu 11

This could be due to the libata deadlock, which this kernel would still have
without the recent suspend/resume fixes. Or this is yet another adapter that was
never tested for suspend/resume. mpt3sas crashes the machine 100% of the time as
well, and I have not had time to dig into that issue.

>> Hibernate/resume with a sas disk and then running gnome disks for a couple of reads:
>>
>> [    0.000000] Linux version 6.1.57-vanilla (turbo@dbn-003) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #14 SMP PREEMPT_DYNAMIC Sat Oct 14 11:48:05 EEST 2023
>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.1.57-vanilla root=/dev/mapper/test-root ro quiet
>>
>> [    4.191667] scsi host0: mvsas
>> [    4.202932] sas: phy-0:1 added to port-0:0, phy_mask:0x2 (2bad2bad2bad2bad)
>> [    4.202935] drivers/scsi/mvsas/mv_sas.c 1099:set wide port phy map 2
>> [    4.232802] sas: DOING DISCOVERY on port 0, pid:78
>> [    4.234233] scsi 0:0:0:0: Direct-Access     WD       WD3001HKHG-02VUC SR07 PQ: 0 ANSI: 6
>> [    4.234723] scsi 0:0:0:0: Power-on or device reset occurred
>> [    4.243180] sas: DONE DISCOVERY on port 0, pid:78, result:0
>> [    4.243958] sd 0:0:0:0: [sdb] 586072368 512-byte logical blocks: (300 GB/279 GiB)
>> [    4.251366] sd 0:0:0:0: [sdb] Write Protect is off
>> [    4.251372] sd 0:0:0:0: [sdb] Mode Sense: df 00 10 08
>> [    4.256088] sd 0:0:0:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
>>
>> [  331.841349] PM: hibernation: hibernation entry
>> [  331.846068] Filesystems sync: 0.004 seconds
>> [  331.846251] Freezing user space processes
>> [  331.847662] Freezing user space processes completed (elapsed 0.001 seconds)
>> [  331.847665] OOM killer disabled.
>> [  331.847752] PM: hibernation: Marking nosave pages: [mem 0x00000000-0x00000fff]
>> [  331.847754] PM: hibernation: Marking nosave pages: [mem 0x0009c000-0x000fffff]
>> [  331.847757] PM: hibernation: Marking nosave pages: [mem 0xbf780000-0xffffffff]
>> [  331.848818] PM: hibernation: Basic memory bitmaps created
>> [  331.848868] PM: hibernation: Preallocating image memory
>> [  332.320952] PM: hibernation: Allocated 468298 pages for snapshot
>> [  332.320958] PM: hibernation: Allocated 1873192 kbytes in 0.47 seconds (3985.51 MB/s)
>> [  332.320961] Freezing remaining freezable tasks
>> [  332.322205] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
>> [  332.322378] printk: Suspending console(s) (use no_console_suspend to debug)
>> [  334.826522] ACPI: PM: Preparing to enter system sleep state S4
>> [  334.826773] ACPI: PM: Saving platform NVS memory
>> [  334.826979] Disabling non-boot CPUs ...
>> [  334.829452] smpboot: CPU 1 is now offline
>> [  334.831537] smpboot: CPU 2 is now offline
>> [  334.833750] smpboot: CPU 3 is now offline
>> [  334.835897] smpboot: CPU 4 is now offline
>> [  334.838296] smpboot: CPU 5 is now offline
>> [  334.840868] smpboot: CPU 6 is now offline
>> [  334.842966] smpboot: CPU 7 is now offline
>> [  334.845060] smpboot: CPU 8 is now offline
>> [  334.847030] smpboot: CPU 9 is now offline
>> [  334.848955] smpboot: CPU 10 is now offline
>> [  334.850837] smpboot: CPU 11 is now offline
>> [  334.851340] PM: hibernation: Creating image:
>> [  335.133820] PM: hibernation: Need to copy 480821 pages
>> [  335.133826] PM: hibernation: Normal pages needed: 480821 + 1024, available pages: 1613904
>> [  334.852388] ACPI: PM: Restoring platform NVS memory
>> [  334.852777] Enabling non-boot CPUs ...
>> [  334.852817] x86: Booting SMP configuration:
>> [  334.852818] smpboot: Booting Node 0 Processor 1 APIC 0x2
>> [  334.856338] CPU1 is up
>> [  334.856369] smpboot: Booting Node 0 Processor 2 APIC 0x4
>> [  334.859345] CPU2 is up
>> [  334.859375] smpboot: Booting Node 0 Processor 3 APIC 0x10
>> [  334.862383] CPU3 is up
>> [  334.862412] smpboot: Booting Node 0 Processor 4 APIC 0x12
>> [  334.865376] CPU4 is up
>> [  334.865407] smpboot: Booting Node 0 Processor 5 APIC 0x14
>> [  334.868395] CPU5 is up
>> [  334.868423] smpboot: Booting Node 0 Processor 6 APIC 0x1
>> [  334.871618] CPU6 is up
>> [  334.871649] smpboot: Booting Node 0 Processor 7 APIC 0x3
>> [  334.874844] CPU7 is up
>> [  334.874878] smpboot: Booting Node 0 Processor 8 APIC 0x5
>> [  334.878160] CPU8 is up
>> [  334.878189] smpboot: Booting Node 0 Processor 9 APIC 0x11
>> [  334.881557] CPU9 is up
>> [  334.881586] smpboot: Booting Node 0 Processor 10 APIC 0x13
>> [  334.884946] CPU10 is up
>> [  334.884974] smpboot: Booting Node 0 Processor 11 APIC 0x15
>> [  334.888415] CPU11 is up
>> [  334.901008] ACPI: PM: Waking up from system sleep state S4
>> [  335.001313] usb usb4: root hub lost power or was reset
>> [  335.001314] usb usb2: root hub lost power or was reset
>> [  335.001314] usb usb5: root hub lost power or was reset
>> [  335.001365] usb usb1: root hub lost power or was reset
>> [  335.001372] usb usb6: root hub lost power or was reset
>> [  335.001375] usb usb7: root hub lost power or was reset
>> [  335.001428] usb usb8: root hub lost power or was reset
>> [  335.001449] usb usb3: root hub lost power or was reset
>> [  335.297280] usb 3-4: reset high-speed USB device number 2 using ehci-pci
>> [  335.537215] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [  335.549624] ata1.00: configured for UDMA/133
>> [  335.549810] ata1.00: Enabling discard_zeroes_data
>> [  335.769243] usb 3-4.3: reset full-speed USB device number 3 using ehci-pci
>> [  335.957245] usb 3-4.4: reset full-speed USB device number 4 using ehci-pci
>> [  336.385247] usb 3-4.4.1: reset full-speed USB device number 5 using ehci-pci
>> [  336.539864] PM: hibernation: Basic memory bitmaps freed
>> [  336.540367] OOM killer enabled.
>> [  336.540368] Restarting tasks ... 
>> [  336.541593] done.
>> [  336.541600] PM: hibernation: hibernation exit
>> [  336.636953] RTL8211B Gigabit Ethernet r8169-0-800:00: attached PHY driver (mii_bus:phy_addr=r8169-0-800:00, irq=MAC)
>> [  336.703336] r8169 0000:08:00.0 enp8s0: Link is Down
>> [  336.704991] RTL8211B Gigabit Ethernet r8169-0-900:00: attached PHY driver (mii_bus:phy_addr=r8169-0-900:00, irq=MAC)
>> [  336.772497] r8169 0000:09:00.0 enp9s0: Link is Down
>>
>> [  463.977280] sas: Enter sas_scsi_recover_host busy: 3 failed: 3
>> [  463.977288] sas: trying to find task 0x00000000f13c22cf
>> [  463.977292] sas: sas_scsi_find_task: aborting task 0x00000000f13c22cf
>> [  463.977416] sas: task done but aborted
>> [  463.977422] sas: sas_scsi_find_task: task 0x00000000f13c22cf is done
>> [  463.977425] sas: sas_eh_handle_sas_errors: task 0x00000000f13c22cf is done
>> [  463.977429] sas: trying to find task 0x0000000089d39e25
>> [  463.977431] sas: sas_scsi_find_task: aborting task 0x0000000089d39e25
>> [  472.026033] sas: task done but aborted
>> [  472.026040] sas: sas_scsi_find_task: task 0x0000000089d39e25 is done
>> [  472.026044] sas: sas_eh_handle_sas_errors: task 0x0000000089d39e25 is done
>> [  472.026048] sas: trying to find task 0x000000002d226601
>> [  472.026050] sas: sas_scsi_find_task: aborting task 0x000000002d226601
>> [  492.601193] sas: TMF task timeout for 2bad2bad2bad2bad and not done
>> [  492.601204] drivers/scsi/mvsas/mv_sas.c 1410:mvs_abort_task:rc= 5
>> [  492.601208] sas: sas_scsi_find_task: querying task 0x000000002d226601
>> [  513.081228] sas: TMF task timeout for 2bad2bad2bad2bad and not done
>> [  513.081241] drivers/scsi/mvsas/mv_sas.c 1340:mvs_query_task:rc= 5
>> [  513.081247] sas: sas_scsi_find_task: task 0x000000002d226601 failed to abort
>> [  513.081252] sas: task 0x000000002d226601 is not at LU: I_T recover
>> [  513.081255] sas: I_T nexus reset for dev 2bad2bad2bad2bad
>> [  513.577383] drivers/scsi/mvsas/mv_sas.c 1690:Release slot [5] tag[5], task [000000002d226601]:
>> [  513.577396] sas: task done but aborted
>> [  513.577398] drivers/scsi/mvsas/mv_sas.c 1690:Release slot [1] tag[1], task [00000000d0d28957]:
>> [  515.321219] drivers/scsi/mvsas/mv_sas.c 1304:mvs_I_T_nexus_reset for device[0]:rc= 0
>> [  515.321226] sas: I_T 2bad2bad2bad2bad recovered
>> [  515.321243] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 3 tries: 1
>> [  515.337273] sd 0:0:0:0: [sdb] tag#8 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=81s
>> [  515.337281] sd 0:0:0:0: [sdb] tag#8 CDB: Read(10) 28 00 11 77 78 98 00 04 00 00
>> [  515.337283] I/O error, dev sdb, sector 293042328 op 0x0:(READ) flags 0x4000 phys_seg 8 prio class 2
>> [  515.337299] sd 0:0:0:0: [sdb] tag#9 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=81s
>> [  515.337303] sd 0:0:0:0: [sdb] tag#9 CDB: Read(10) 28 00 11 77 7c 98 00 04 00 00
>> [  515.337305] I/O error, dev sdb, sector 293043352 op 0x0:(READ) flags 0x4000 phys_seg 8 prio class 2
>> [  515.337315] sd 0:0:0:0: [sdb] tag#10 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=81s
>> [  515.337318] sd 0:0:0:0: [sdb] tag#10 CDB: Read(10) 28 00 11 77 80 98 00 04 00 00
>> [  515.337320] I/O error, dev sdb, sector 293044376 op 0x0:(READ) flags 0x4000 phys_seg 9 prio class 2
>> [  515.710666] drivers/scsi/mvsas/mv_sas.c 1099:set wide port phy map 2
>> [  515.710714] sas: sas_form_port: phy1 belongs to port0 already(1)!
>> [  519.721300] drivers/scsi/mvsas/mv_sas.c 1690:Release slot [1] tag[1], task [00000000d0d28957]:
>> [  521.822694] drivers/scsi/mvsas/mv_sas.c 1099:set wide port phy map 2
>> [  521.822702] sas: sas_form_port: phy1 belongs to port0 already(1)!

This one is not libata :) So the adapter driver itself may have an issue.

-- 
Damien Le Moal
Western Digital Research

