linux-block.vger.kernel.org archive mirror
From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Ming Lei <ming.lei@redhat.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
	YangYang <yang.yang@vivo.com>,
	linux-block@vger.kernel.org
Subject: Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
Date: Mon, 28 Oct 2024 14:44:02 +0900
Message-ID: <20241028054402.GT1279924@google.com>
In-Reply-To: <ZxPKP8SEb7Y4ceOq@fedora>

On (24/10/19 23:03), Ming Lei wrote:
> On Sat, Oct 19, 2024 at 10:50:10PM +0900, Sergey Senozhatsky wrote:
> > On (24/10/19 21:09), Ming Lei wrote:
> > > On Sat, Oct 19, 2024 at 09:58:04PM +0900, Sergey Senozhatsky wrote:
> > > > On (24/10/19 20:50), Ming Lei wrote:
> > > > > On Sat, Oct 19, 2024 at 09:37:27PM +0900, Sergey Senozhatsky wrote:
> > [..]
> 
> Probably bio_queue_enter() waits for runtime PM, and the queue is in
> ->pm_only state, and BLK_MQ_REQ_PM isn't passed actually from
> ioctl_internal_command() <- scsi_set_medium_removal().

Sorry for the delay.

Another report. I see lots of buffer I/O errors:

<6>[ 364.268167] usb-storage 3-3:1.0: USB Mass Storage device detected
<6>[ 364.268551] scsi host3: usb-storage 3-3:1.0
<3>[ 364.274806] Buffer I/O error on dev sdc1, logical block 0, lost async page write
<5>[ 365.318424] scsi 3:0:0:0: Direct-Access VendorCo ProductCode 2.00 PQ: 0 ANSI: 4
<5>[ 365.319898] sd 3:0:0:0: [sdc] 122880000 512-byte logical blocks: (62.9 GB/58.6 GiB)
<5>[ 365.320077] sd 3:0:0:0: [sdc] Write Protect is off
<7>[ 365.320085] sd 3:0:0:0: [sdc] Mode Sense: 03 00 00 00
<4>[ 365.320255] sd 3:0:0:0: [sdc] No Caching mode page found
<4>[ 365.320262] sd 3:0:0:0: [sdc] Assuming drive cache: write through
<6>[ 365.322483] sdc: sdc1
<5>[ 365.323130] sd 3:0:0:0: [sdc] Attached SCSI removable disk
<6>[ 369.083225] usb 3-3: USB disconnect, device number 49

Then a PM suspend/resume cycle.

After resume:

<7>[ 1338.847937] PM: resume of devices complete after 291.422 msecs
<6>[ 1338.854215] OOM killer enabled.
<6>[ 1338.854235] Restarting tasks ...
<6>[ 1338.854797] mei_hdcp 0000:00:16.0-(UUID: 7): bound 0000:00:02.0 (ops 0xffffffffb8f03e50)
<6>[ 1338.857745] mei_pxp 0000:00:16.0-(UUID: 2): bound 0000:00:02.0 (ops 0xffffffffb8f16a80)
<4>[ 1338.859663] done.
<5>[ 1338.859683] random: crng reseeded on system resumption
<12>[ 1338.868200] init: cupsd main process ended, respawning
<6>[ 1338.868541] Resume caused by IRQ 9, acpi
<6>[ 1338.868549] Resume caused by IRQ 98, chromeos-ec
<6>[ 1338.868555] PM: suspend exit

Lots of buffer I/O errors again, and eventually a deadlock. The deadlock
happens much later than 120 seconds after resume, so I cannot directly
connect those events.

[..]
<6>[ 1859.660882] usb-storage 3-3:1.0: USB Mass Storage device detected
<6>[ 1859.661457] scsi host4: usb-storage 3-3:1.0
<3>[ 1859.668180] Buffer I/O error on dev sdd1, logical block 0, lost async page write
<5>[ 1860.697826] scsi 4:0:0:0: Direct-Access VendorCo ProductCode 2.00 PQ: 0 ANSI: 4
<5>[ 1860.699222] sd 4:0:0:0: [sdd] 122880000 512-byte logical blocks: (62.9 GB/58.6 GiB)
<5>[ 1860.699373] sd 4:0:0:0: [sdd] Write Protect is off
<7>[ 1860.699380] sd 4:0:0:0: [sdd] Mode Sense: 03 00 00 00
<4>[ 1860.699522] sd 4:0:0:0: [sdd] No Caching mode page found
<4>[ 1860.699526] sd 4:0:0:0: [sdd] Assuming drive cache: write through
<6>[ 1860.701393] sdd: sdd1
<5>[ 1860.701886] sd 4:0:0:0: [sdd] Attached SCSI removable disk
<6>[ 1862.077109] usb 3-3: USB disconnect, device number 110
<6>[ 1862.338159] usb 3-3: new high-speed USB device number 111 using xhci_hcd
<6>[ 1862.468090] usb 3-3: New USB device found, idVendor=346d, idProduct=5678, bcdDevice= 2.00
<6>[ 1862.468105] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=(Serial: 8)
<6>[ 1862.468111] usb 3-3: Product: Disk 2.0
<6>[ 1862.468115] usb 3-3: Manufacturer: USB
<6>[ 1862.468119] usb 3-3: SerialNumber: (Serial: 9)
<6>[ 1862.469962] usb-storage 3-3:1.0: USB Mass Storage device detected
<6>[ 1862.470642] scsi host3: usb-storage 3-3:1.0
<3>[ 1862.476447] Buffer I/O error on dev sdd1, logical block 0, lost async page write
<5>[ 1863.514018] scsi 3:0:0:0: Direct-Access VendorCo ProductCode 2.00 PQ: 0 ANSI: 4
<5>[ 1863.515489] sd 3:0:0:0: [sdd] 122880000 512-byte logical blocks: (62.9 GB/58.6 GiB)
<5>[ 1863.515640] sd 3:0:0:0: [sdd] Write Protect is off
<7>[ 1863.515646] sd 3:0:0:0: [sdd] Mode Sense: 03 00 00 00
<4>[ 1863.515797] sd 3:0:0:0: [sdd] No Caching mode page found
<4>[ 1863.515802] sd 3:0:0:0: [sdd] Assuming drive cache: write through
<6>[ 1863.518227] sdd: sdd1
<5>[ 1863.518551] sd 3:0:0:0: [sdd] Attached SCSI removable disk
<6>[ 1865.018356] usb 3-3: USB disconnect, device number 111
<6>[ 1865.285091] usb 3-3: new high-speed USB device number 112 using xhci_hcd
<3>[ 1865.605088] usb 3-3: device descriptor read/64, error -71
<6>[ 1865.844873] usb 3-3: New USB device found, idVendor=346d, idProduct=5678, bcdDevice= 2.00
<6>[ 1865.844892] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=(Serial: 8)
<6>[ 1865.844898] usb 3-3: Product: Disk 2.0
<6>[ 1865.844903] usb 3-3: Manufacturer: USB
<6>[ 1865.844906] usb 3-3: SerialNumber: (Serial: 9)
<6>[ 1865.847205] usb-storage 3-3:1.0: USB Mass Storage device detected
<6>[ 1865.847806] scsi host4: usb-storage 3-3:1.0
<3>[ 1865.853941] Buffer I/O error on dev sdd1, logical block 0, lost async page write
<6>[ 1866.436729] usb 3-3: USB disconnect, device number 112
<6>[ 1866.700998] usb 3-3: new high-speed USB device number 113 using xhci_hcd
<6>[ 1866.829449] usb 3-3: New USB device found, idVendor=346d, idProduct=5678, bcdDevice= 2.00
<6>[ 1866.829466] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=(Serial: 8)
<6>[ 1866.829473] usb 3-3: Product: Disk 2.0
<6>[ 1866.829478] usb 3-3: Manufacturer: USB
<6>[ 1866.829482] usb 3-3: SerialNumber: (Serial: 9)
<6>[ 1866.831605] usb-storage 3-3:1.0: USB Mass Storage device detected
<6>[ 1866.832173] scsi host3: usb-storage 3-3:1.0
<5>[ 1867.866118] scsi 3:0:0:0: Direct-Access VendorCo ProductCode 2.00 PQ: 0 ANSI: 4
<5>[ 1867.868213] sd 3:0:0:0: [sdd] 122880000 512-byte logical blocks: (62.9 GB/58.6 GiB)
<5>[ 1867.868604] sd 3:0:0:0: [sdd] Write Protect is off
<7>[ 1867.868616] sd 3:0:0:0: [sdd] Mode Sense: 03 00 00 00
<4>[ 1867.869071] sd 3:0:0:0: [sdd] No Caching mode page found
<4>[ 1867.869081] sd 3:0:0:0: [sdd] Assuming drive cache: write through
<6>[ 1867.871429] sdd: sdd1
<5>[ 1867.871857] sd 3:0:0:0: [sdd] Attached SCSI removable disk
<6>[ 1868.423593] usb 3-3: USB disconnect, device number 113
<6>[ 1868.431172] sdd: detected capacity change from 122880000 to 0
<28>[ 1928.670962] udevd[203]: sdd: Worker [9839] processing SEQNUM=6508 is taking a long time
<3>[ 2004.633104] INFO: task kworker/0:3:187 blocked for more than 122 seconds.
<3>[ 2004.633125] Tainted: G U 6.6.41-03520-gd3d77f15f842 #1
<3>[ 2004.633131] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[ 2004.633149] task:kworker/0:3 state:D stack:0 pid:187 ppid:2 flags:0x00004000
<6>[ 2004.633149] Workqueue: usb_hub_wq hub_event
<6>[ 2004.633166] Call Trace:
<6>[ 2004.633172] <TASK>
<6>[ 2004.633179] schedule+0x4f4/0x1540
<6>[ 2004.633190] ? default_wake_function+0x388/0xcd0
<6>[ 2004.633200] schedule_preempt_disabled+0x15/0x30
<6>[ 2004.633206] __mutex_lock_slowpath+0x2b5/0x4d0
<6>[ 2004.633212] del_gendisk+0x136/0x370
<6>[ 2004.633222] sd_remove+0x30/0x60
<6>[ 2004.633230] device_release_driver_internal+0x1a2/0x2a0
<6>[ 2004.633239] bus_remove_device+0x154/0x180
<6>[ 2004.633248] device_del+0x207/0x370
<6>[ 2004.633256] ? __pfx_transport_remove_classdev+0x10/0x10
<6>[ 2004.633264] ? attribute_container_device_trigger+0xe3/0x110
<6>[ 2004.633272] __scsi_remove_device+0xc0/0x170
<6>[ 2004.633279] scsi_forget_host+0x45/0x60
<6>[ 2004.633287] scsi_remove_host+0x87/0x170
<6>[ 2004.633295] usb_stor_disconnect+0x63/0xb0
<6>[ 2004.633302] usb_unbind_interface+0xbe/0x250
<6>[ 2004.633309] device_release_driver_internal+0x1a2/0x2a0
<6>[ 2004.633315] bus_remove_device+0x154/0x180
<6>[ 2004.633322] device_del+0x207/0x370
<6>[ 2004.633328] ? kobject_release+0x56/0xb0
<6>[ 2004.633336] usb_disable_device+0x72/0x170
<6>[ 2004.633342] usb_disconnect+0xeb/0x280
<6>[ 2004.633350] hub_event+0xac7/0x1760
<6>[ 2004.633359] worker_thread+0x355/0x900
<6>[ 2004.633367] kthread+0xed/0x110
<6>[ 2004.633374] ? __pfx_worker_thread+0x10/0x10
<6>[ 2004.633381] ? __pfx_kthread+0x10/0x10
<6>[ 2004.633387] ret_from_fork+0x38/0x50
<6>[ 2004.633393] ? __pfx_kthread+0x10/0x10
<6>[ 2004.633399] ret_from_fork_asm+0x1b/0x30
<6>[ 2004.633407] </TASK>
<3>[ 2004.633496] INFO: task cros-disks:1614 blocked for more than 122 seconds.
<3>[ 2004.633502] Tainted: G U 6.6.41-03520-gd3d77f15f842 #1
<3>[ 2004.633506] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[ 2004.633519] task:cros-disks state:D stack:0 pid:1614 ppid:1 flags:0x00004002
<6>[ 2004.633519] Call Trace:
<6>[ 2004.633523] <TASK>
<6>[ 2004.633527] schedule+0x4f4/0x1540
<6>[ 2004.633533] ? xas_store+0xc57/0xcc0
<6>[ 2004.633539] ? lru_add_drain+0x4d8/0x6e0
<6>[ 2004.633548] blk_queue_enter+0x172/0x250
<6>[ 2004.633557] ? __pfx_autoremove_wake_function+0x10/0x10
<6>[ 2004.633565] blk_mq_alloc_request+0x167/0x210
<6>[ 2004.633573] scsi_execute_cmd+0x65/0x240
<6>[ 2004.633580] ioctl_internal_command+0x6c/0x150
<6>[ 2004.633590] scsi_set_medium_removal+0x63/0xc0
<6>[ 2004.633598] sd_release+0x42/0x50
<6>[ 2004.633606] blkdev_put+0x13b/0x1f0
<6>[ 2004.633615] blkdev_release+0x2b/0x40
<6>[ 2004.633623] __fput_sync+0x9b/0x2c0
<6>[ 2004.633632] __se_sys_close+0x69/0xc0
<6>[ 2004.633639] do_syscall_64+0x60/0x90
<6>[ 2004.633649] ? exit_to_user_mode_prepare+0x49/0x130
<6>[ 2004.633657] ? do_syscall_64+0x6f/0x90
<6>[ 2004.633665] ? do_syscall_64+0x6f/0x90
<6>[ 2004.633672] ? do_syscall_64+0x6f/0x90
<6>[ 2004.633680] ? irq_exit_rcu+0x38/0x90
<6>[ 2004.633687] ? exit_to_user_mode_prepare+0x49/0x130
<6>[ 2004.633694] entry_SYSCALL_64_after_hwframe+0x73/0xdd
<6>[ 2004.633703] RIP: 0033:0x786d55239960
<6>[ 2004.633711] RSP: 002b:00007ffd1c6d8c28 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
<6>[ 2004.633719] RAX: ffffffffffffffda RBX: 00005a5ffe743fd0 RCX: 0000786d55239960
<6>[ 2004.633725] RDX: 0000786d55307b00 RSI: 0000000000000000 RDI: 000000000000000c
<6>[ 2004.633730] RBP: 00007ffd1c6d8d30 R08: 0000000000000007 R09: 00005a5ffe78a9f0
<6>[ 2004.633735] R10: 8a1ecef621fff8a0 R11: 0000000000000202 R12: 0000000000000831
<6>[ 2004.633741] R13: 00005a5ffe743f60 R14: 00005a5ffe743f80 R15: 000000000000000c
<6>[ 2004.633746] </TASK>
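
To make the hypothesis quoted at the top easier to follow against the two
traces: cros-disks sleeps in blk_queue_enter() on the release path, while
the hub worker sleeps on a mutex in del_gendisk(). Below is a minimal
userspace sketch of the enter decision as I understand it. It is not
kernel code. struct queue_model, model_queue_enter() and model_demo() are
made-up names, req_pm stands in for BLK_MQ_REQ_PM, and the runtime PM
status check and the actual wait loop are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the request_queue state. */
struct queue_model {
	bool dying;    /* queue has been marked dying */
	bool frozen;   /* freeze depth is non-zero */
	bool pm_only;  /* only PM requests may enter */
};

enum enter_result { ENTER_OK, ENTER_WAIT, ENTER_NODEV };

/*
 * Model of the decision a caller faces when it tries to enter the queue.
 * req_pm stands for BLK_MQ_REQ_PM being set by the caller.
 * Assumption: a dying queue is treated as never enterable.
 */
static enum enter_result model_queue_enter(const struct queue_model *q,
					   bool req_pm)
{
	if (q->dying)
		return ENTER_NODEV;	/* fail fast instead of sleeping */
	if (!q->frozen && (!q->pm_only || req_pm))
		return ENTER_OK;
	return ENTER_WAIT;		/* caller sleeps until the state changes */
}

/* Walk through the cases discussed in this thread. */
static void model_demo(void)
{
	struct queue_model q = { .dying = false, .frozen = false, .pm_only = true };

	/* pm_only queue, caller did not pass the PM flag: it waits. */
	assert(model_queue_enter(&q, false) == ENTER_WAIT);
	/* Same queue, PM flag passed: it may enter. */
	assert(model_queue_enter(&q, true) == ENTER_OK);
	/* Once the queue is marked dying, the waiter gets an error instead. */
	q.dying = true;
	assert(model_queue_enter(&q, false) == ENTER_NODEV);
}
```

In this model the only way out for a non-PM caller on a pm_only queue is
for pm_only to be cleared or for the queue to be marked dying, which is
why marking the queue dying on disconnect looks relevant to the hang.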
