All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sergey Senozhatsky <senozhatsky@chromium.org>
To: Ming Lei <ming.lei@redhat.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
	YangYang <yang.yang@vivo.com>,
	linux-block@vger.kernel.org
Subject: Re: [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead
Date: Mon, 28 Oct 2024 14:44:02 +0900	[thread overview]
Message-ID: <20241028054402.GT1279924@google.com> (raw)
In-Reply-To: <ZxPKP8SEb7Y4ceOq@fedora>

On (24/10/19 23:03), Ming Lei wrote:
> On Sat, Oct 19, 2024 at 10:50:10PM +0900, Sergey Senozhatsky wrote:
> > On (24/10/19 21:09), Ming Lei wrote:
> > > On Sat, Oct 19, 2024 at 09:58:04PM +0900, Sergey Senozhatsky wrote:
> > > > On (24/10/19 20:50), Ming Lei wrote:
> > > > > On Sat, Oct 19, 2024 at 09:37:27PM +0900, Sergey Senozhatsky wrote:
> > [..]
> 
> Probably bio_queue_enter() waits for runtime PM, and the queue is in
> ->pm_only state, and BLK_MQ_REQ_PM isn't passed actually from
> ioctl_internal_command() <- scsi_set_medium_removal().

Sorry for the delay.

Another report.
I see lots of buffer I/O errors

<6>[ 364.268167] usb-storage 3-3:1.0: USB Mass Storage device detected
<6>[ 364.268551] scsi host3: usb-storage 3-3:1.0
<3>[ 364.274806] Buffer I/O error on dev sdc1, logical block 0, lost async page write
<5>[ 365.318424] scsi 3:0:0:0: Direct-Access VendorCo ProductCode 2.00 PQ: 0 ANSI: 4
<5>[ 365.319898] sd 3:0:0:0: [sdc] 122880000 512-byte logical blocks: (62.9 GB/58.6 GiB)
<5>[ 365.320077] sd 3:0:0:0: [sdc] Write Protect is off
<7>[ 365.320085] sd 3:0:0:0: [sdc] Mode Sense: 03 00 00 00
<4>[ 365.320255] sd 3:0:0:0: [sdc] No Caching mode page found
<4>[ 365.320262] sd 3:0:0:0: [sdc] Assuming drive cache: write through
<6>[ 365.322483] sdc: sdc1
<5>[ 365.323130] sd 3:0:0:0: [sdc] Attached SCSI removable disk
<6>[ 369.083225] usb 3-3: USB disconnect, device number 49

Then PM suspend/resume.

After resume

<7>[ 1338.847937] PM: resume of devices complete after 291.422 msecs
<6>[ 1338.854215] OOM killer enabled.
<6>[ 1338.854235] Restarting tasks ...
<6>[ 1338.854797] mei_hdcp 0000:00:16.0-(UUID: 7): bound 0000:00:02.0 (ops 0xffffffffb8f03e50)
<6>[ 1338.857745] mei_pxp 0000:00:16.0-(UUID: 2): bound 0000:00:02.0 (ops 0xffffffffb8f16a80)
<4>[ 1338.859663] done.
<5>[ 1338.859683] random: crng reseeded on system resumption
<12>[ 1338.868200] init: cupsd main process ended, respawning
<6>[ 1338.868541] Resume caused by IRQ 9, acpi
<6>[ 1338.868549] Resume caused by IRQ 98, chromeos-ec
<6>[ 1338.868555] PM: suspend exit

lots of buffer I/O errors again and eventually a deadlock.  The deadlock
happens much later than 120 seconds after resume, so I cannot directly
connect those events.

[..]
<6>[ 1859.660882] usb-storage 3-3:1.0: USB Mass Storage device detected
<6>[ 1859.661457] scsi host4: usb-storage 3-3:1.0
<3>[ 1859.668180] Buffer I/O error on dev sdd1, logical block 0, lost async page write
<5>[ 1860.697826] scsi 4:0:0:0: Direct-Access VendorCo ProductCode 2.00 PQ: 0 ANSI: 4
<5>[ 1860.699222] sd 4:0:0:0: [sdd] 122880000 512-byte logical blocks: (62.9 GB/58.6 GiB)
<5>[ 1860.699373] sd 4:0:0:0: [sdd] Write Protect is off
<7>[ 1860.699380] sd 4:0:0:0: [sdd] Mode Sense: 03 00 00 00
<4>[ 1860.699522] sd 4:0:0:0: [sdd] No Caching mode page found
<4>[ 1860.699526] sd 4:0:0:0: [sdd] Assuming drive cache: write through
<6>[ 1860.701393] sdd: sdd1
<5>[ 1860.701886] sd 4:0:0:0: [sdd] Attached SCSI removable disk
<6>[ 1862.077109] usb 3-3: USB disconnect, device number 110
<6>[ 1862.338159] usb 3-3: new high-speed USB device number 111 using xhci_hcd
<6>[ 1862.468090] usb 3-3: New USB device found, idVendor=346d, idProduct=5678, bcdDevice= 2.00
<6>[ 1862.468105] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=(Serial: 8)
<6>[ 1862.468111] usb 3-3: Product: Disk 2.0
<6>[ 1862.468115] usb 3-3: Manufacturer: USB
<6>[ 1862.468119] usb 3-3: SerialNumber: (Serial: 9)
<6>[ 1862.469962] usb-storage 3-3:1.0: USB Mass Storage device detected
<6>[ 1862.470642] scsi host3: usb-storage 3-3:1.0
<3>[ 1862.476447] Buffer I/O error on dev sdd1, logical block 0, lost async page write
<5>[ 1863.514018] scsi 3:0:0:0: Direct-Access VendorCo ProductCode 2.00 PQ: 0 ANSI: 4
<5>[ 1863.515489] sd 3:0:0:0: [sdd] 122880000 512-byte logical blocks: (62.9 GB/58.6 GiB)
<5>[ 1863.515640] sd 3:0:0:0: [sdd] Write Protect is off
<7>[ 1863.515646] sd 3:0:0:0: [sdd] Mode Sense: 03 00 00 00
<4>[ 1863.515797] sd 3:0:0:0: [sdd] No Caching mode page found
<4>[ 1863.515802] sd 3:0:0:0: [sdd] Assuming drive cache: write through
<6>[ 1863.518227] sdd: sdd1
<5>[ 1863.518551] sd 3:0:0:0: [sdd] Attached SCSI removable disk
<6>[ 1865.018356] usb 3-3: USB disconnect, device number 111
<6>[ 1865.285091] usb 3-3: new high-speed USB device number 112 using xhci_hcd
<3>[ 1865.605088] usb 3-3: device descriptor read/64, error -71
<6>[ 1865.844873] usb 3-3: New USB device found, idVendor=346d, idProduct=5678, bcdDevice= 2.00
<6>[ 1865.844892] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=(Serial: 8)
<6>[ 1865.844898] usb 3-3: Product: Disk 2.0
<6>[ 1865.844903] usb 3-3: Manufacturer: USB
<6>[ 1865.844906] usb 3-3: SerialNumber: (Serial: 9)
<6>[ 1865.847205] usb-storage 3-3:1.0: USB Mass Storage device detected
<6>[ 1865.847806] scsi host4: usb-storage 3-3:1.0
<3>[ 1865.853941] Buffer I/O error on dev sdd1, logical block 0, lost async page write
<6>[ 1866.436729] usb 3-3: USB disconnect, device number 112
<6>[ 1866.700998] usb 3-3: new high-speed USB device number 113 using xhci_hcd
<6>[ 1866.829449] usb 3-3: New USB device found, idVendor=346d, idProduct=5678, bcdDevice= 2.00
<6>[ 1866.829466] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=(Serial: 8)
<6>[ 1866.829473] usb 3-3: Product: Disk 2.0
<6>[ 1866.829478] usb 3-3: Manufacturer: USB
<6>[ 1866.829482] usb 3-3: SerialNumber: (Serial: 9)
<6>[ 1866.831605] usb-storage 3-3:1.0: USB Mass Storage device detected
<6>[ 1866.832173] scsi host3: usb-storage 3-3:1.0
<5>[ 1867.866118] scsi 3:0:0:0: Direct-Access VendorCo ProductCode 2.00 PQ: 0 ANSI: 4
<5>[ 1867.868213] sd 3:0:0:0: [sdd] 122880000 512-byte logical blocks: (62.9 GB/58.6 GiB)
<5>[ 1867.868604] sd 3:0:0:0: [sdd] Write Protect is off
<7>[ 1867.868616] sd 3:0:0:0: [sdd] Mode Sense: 03 00 00 00
<4>[ 1867.869071] sd 3:0:0:0: [sdd] No Caching mode page found
<4>[ 1867.869081] sd 3:0:0:0: [sdd] Assuming drive cache: write through
<6>[ 1867.871429] sdd: sdd1
<5>[ 1867.871857] sd 3:0:0:0: [sdd] Attached SCSI removable disk
<6>[ 1868.423593] usb 3-3: USB disconnect, device number 113
<6>[ 1868.431172] sdd: detected capacity change from 122880000 to 0
<28>[ 1928.670962] udevd[203]: sdd: Worker [9839] processing SEQNUM=6508 is taking a long time
<3>[ 2004.633104] INFO: task kworker/0:3:187 blocked for more than 122 seconds.
<3>[ 2004.633125] Tainted: G U 6.6.41-03520-gd3d77f15f842 #1
<3>[ 2004.633131] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[ 2004.633149] task:kworker/0:3 state:D stack:0 pid:187 ppid:2 flags:0x00004000
<6>[ 2004.633149] Workqueue: usb_hub_wq hub_event
<6>[ 2004.633166] Call Trace:
<6>[ 2004.633172] <TASK>
<6>[ 2004.633179] schedule+0x4f4/0x1540
<6>[ 2004.633190] ? default_wake_function+0x388/0xcd0
<6>[ 2004.633200] schedule_preempt_disabled+0x15/0x30
<6>[ 2004.633206] __mutex_lock_slowpath+0x2b5/0x4d0
<6>[ 2004.633212] del_gendisk+0x136/0x370
<6>[ 2004.633222] sd_remove+0x30/0x60
<6>[ 2004.633230] device_release_driver_internal+0x1a2/0x2a0
<6>[ 2004.633239] bus_remove_device+0x154/0x180
<6>[ 2004.633248] device_del+0x207/0x370
<6>[ 2004.633256] ? __pfx_transport_remove_classdev+0x10/0x10
<6>[ 2004.633264] ? attribute_container_device_trigger+0xe3/0x110
<6>[ 2004.633272] __scsi_remove_device+0xc0/0x170
<6>[ 2004.633279] scsi_forget_host+0x45/0x60
<6>[ 2004.633287] scsi_remove_host+0x87/0x170
<6>[ 2004.633295] usb_stor_disconnect+0x63/0xb0
<6>[ 2004.633302] usb_unbind_interface+0xbe/0x250
<6>[ 2004.633309] device_release_driver_internal+0x1a2/0x2a0
<6>[ 2004.633315] bus_remove_device+0x154/0x180
<6>[ 2004.633322] device_del+0x207/0x370
<6>[ 2004.633328] ? kobject_release+0x56/0xb0
<6>[ 2004.633336] usb_disable_device+0x72/0x170
<6>[ 2004.633342] usb_disconnect+0xeb/0x280
<6>[ 2004.633350] hub_event+0xac7/0x1760
<6>[ 2004.633359] worker_thread+0x355/0x900
<6>[ 2004.633367] kthread+0xed/0x110
<6>[ 2004.633374] ? __pfx_worker_thread+0x10/0x10
<6>[ 2004.633381] ? __pfx_kthread+0x10/0x10
<6>[ 2004.633387] ret_from_fork+0x38/0x50
<6>[ 2004.633393] ? __pfx_kthread+0x10/0x10
<6>[ 2004.633399] ret_from_fork_asm+0x1b/0x30
<6>[ 2004.633407] </TASK>
<3>[ 2004.633496] INFO: task cros-disks:1614 blocked for more than 122 seconds.
<3>[ 2004.633502] Tainted: G U 6.6.41-03520-gd3d77f15f842 #1
<3>[ 2004.633506] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[ 2004.633519] task:cros-disks state:D stack:0 pid:1614 ppid:1 flags:0x00004002
<6>[ 2004.633519] Call Trace:
<6>[ 2004.633523] <TASK>
<6>[ 2004.633527] schedule+0x4f4/0x1540
<6>[ 2004.633533] ? xas_store+0xc57/0xcc0
<6>[ 2004.633539] ? lru_add_drain+0x4d8/0x6e0
<6>[ 2004.633548] blk_queue_enter+0x172/0x250
<6>[ 2004.633557] ? __pfx_autoremove_wake_function+0x10/0x10
<6>[ 2004.633565] blk_mq_alloc_request+0x167/0x210
<6>[ 2004.633573] scsi_execute_cmd+0x65/0x240
<6>[ 2004.633580] ioctl_internal_command+0x6c/0x150
<6>[ 2004.633590] scsi_set_medium_removal+0x63/0xc0
<6>[ 2004.633598] sd_release+0x42/0x50
<6>[ 2004.633606] blkdev_put+0x13b/0x1f0
<6>[ 2004.633615] blkdev_release+0x2b/0x40
<6>[ 2004.633623] __fput_sync+0x9b/0x2c0
<6>[ 2004.633632] __se_sys_close+0x69/0xc0
<6>[ 2004.633639] do_syscall_64+0x60/0x90
<6>[ 2004.633649] ? exit_to_user_mode_prepare+0x49/0x130
<6>[ 2004.633657] ? do_syscall_64+0x6f/0x90
<6>[ 2004.633665] ? do_syscall_64+0x6f/0x90
<6>[ 2004.633672] ? do_syscall_64+0x6f/0x90
<6>[ 2004.633680] ? irq_exit_rcu+0x38/0x90
<6>[ 2004.633687] ? exit_to_user_mode_prepare+0x49/0x130
<6>[ 2004.633694] entry_SYSCALL_64_after_hwframe+0x73/0xdd
<6>[ 2004.633703] RIP: 0033:0x786d55239960
<6>[ 2004.633711] RSP: 002b:00007ffd1c6d8c28 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
<6>[ 2004.633719] RAX: ffffffffffffffda RBX: 00005a5ffe743fd0 RCX: 0000786d55239960
<6>[ 2004.633725] RDX: 0000786d55307b00 RSI: 0000000000000000 RDI: 000000000000000c
<6>[ 2004.633730] RBP: 00007ffd1c6d8d30 R08: 0000000000000007 R09: 00005a5ffe78a9f0
<6>[ 2004.633735] R10: 8a1ecef621fff8a0 R11: 0000000000000202 R12: 0000000000000831
<6>[ 2004.633741] R13: 00005a5ffe743f60 R14: 00005a5ffe743f80 R15: 000000000000000c
<6>[ 2004.633746] </TASK>

  parent reply	other threads:[~2024-10-28  5:44 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-09 11:38 try to avoid del_gendisk vs passthrough from ->release deadlocks v2 Christoph Hellwig
2024-10-09 11:38 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
2024-10-09 12:31   ` Sergey Senozhatsky
2024-10-09 12:41     ` Christoph Hellwig
2024-10-09 12:43       ` Sergey Senozhatsky
2024-10-09 13:49       ` Jens Axboe
2024-10-16  4:14   ` YangYang
2024-10-16 11:09   ` Ming Lei
2024-10-16 12:32     ` Christoph Hellwig
2024-10-16 12:49       ` Ming Lei
2024-10-16 13:35   ` Ming Lei
2024-10-19  1:25     ` Sergey Senozhatsky
2024-10-19 12:32       ` Ming Lei
2024-10-19 12:37         ` Sergey Senozhatsky
2024-10-19 12:50           ` Ming Lei
2024-10-19 12:58             ` Sergey Senozhatsky
2024-10-19 13:09               ` Ming Lei
2024-10-19 13:50                 ` Sergey Senozhatsky
2024-10-19 15:03                   ` Ming Lei
2024-10-19 15:11                     ` Sergey Senozhatsky
2024-10-19 15:40                     ` Sergey Senozhatsky
2024-10-28  5:44                     ` Sergey Senozhatsky [this message]
2024-10-09 11:38 ` [PATCH 2/2] block: mark the disk dead before taking open_mutx in del_gendisk Christoph Hellwig
2024-10-16  4:15   ` YangYang
2024-10-16  2:09 ` try to avoid del_gendisk vs passthrough from ->release deadlocks v2 Sergey Senozhatsky
  -- strict thread matches above, loose matches on Subject: below --
2024-10-08 11:57 RFC: try to avoid del_gendisk vs passthrough from ->release deadlocks Christoph Hellwig
2024-10-08 11:57 ` [PATCH 1/2] block: also mark disk-owned queues as dying in __blk_mark_disk_dead Christoph Hellwig
2024-10-09  5:06   ` Sergey Senozhatsky
2024-10-09  7:34     ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241028054402.GT1279924@google.com \
    --to=senozhatsky@chromium.org \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=yang.yang@vivo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.