* hungtask in dm code raised by concurrent run refresh and remove command
@ 2024-11-05 12:27 wangzhiqiang (Q)
2024-11-05 21:15 ` Zdenek Kabelac
0 siblings, 1 reply; 2+ messages in thread
From: wangzhiqiang (Q) @ 2024-11-05 12:27 UTC (permalink / raw)
To: linux-lvm
Hi Team,
Here is a hung-task issue that occurs in a dm-snapshot scenario. It can be
reproduced by running vgchange --refresh and dmsetup -f remove vg-snap concurrently.
vgchange dmsetup dmsetup
table_load (load snapshot)
table_load snapshot to error
remove snapshot
suspend origin/cow/real
table_load(snapshot already remove)
take type_lock and issue io to cow in snapshot_ctr
table_load (wait type_lock)
[root@localhost ~]# ps aux | grep D
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1818066 0.0 0.0 0 0 ? D Nov04 0:03 [kworker/3:2+ksnaphd]
root 2972729 0.5 2.1 87256 73032 pts/1 D<L 20:17 0:00 vgchange --refresh vg
root 2972761 0.0 0.3 23464 10636 pts/1 D 20:17 0:00 dmsetup -f remove vg-snap
During vgchange --refresh, the snapshot is removed after origin/cow/real have been
suspended. The subsequent snapshot load then takes type_lock and issues I/O to cow in
snapshot_ctr. That I/O is processed by a kworker, but cow is suspended, so the tasks
hang in the kernel.
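A minimal sketch of the race described above, assuming a volume group named vg
and a snapshot device named vg-snap as in the report. The run and race_once
helpers and the DRY_RUN switch are illustrative names, not part of lvm2 or
dmsetup. By default the script only prints the two commands; with DRY_RUN=0 it
really runs them as root and may leave tasks stuck in D state, so it belongs
on a disposable test machine only.

```shell
#!/bin/sh
# Assumed names taken from the report: VG "vg", snapshot device "vg-snap".
VG=${VG:-vg}
SNAP=${SNAP:-vg-snap}

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start both commands in the background so they overlap, then wait for both.
race_once() {
    run vgchange --refresh "$VG" &
    run dmsetup -f remove "$SNAP" &
    wait
}
```

Calling race_once in a loop makes it more likely that the two commands
interleave; a single pass may well complete without a hang.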
Is there some way to fix it?
Thanks
* Re: hungtask in dm code raised by concurrent run refresh and remove command
2024-11-05 12:27 hungtask in dm code raised by concurrent run refresh and remove command wangzhiqiang (Q)
@ 2024-11-05 21:15 ` Zdenek Kabelac
0 siblings, 0 replies; 2+ messages in thread
From: Zdenek Kabelac @ 2024-11-05 21:15 UTC (permalink / raw)
To: wangzhiqiang (Q), linux-lvm
On 05. 11. 24 at 13:27, wangzhiqiang (Q) wrote:
> Hi Team,
> Here is a hung-task issue that occurs in a dm-snapshot scenario. It can be
> reproduced by running vgchange --refresh and dmsetup -f remove vg-snap concurrently.
>
> vgchange dmsetup dmsetup
> table_load (load snapshot)
> table_load snapshot to error
> remove snapshot
> suspend origin/cow/real
> table_load(snapshot already remove)
> take type_lock and issue io to cow in snapshot_ctr
> table_load (wait type_lock)
>
> [root@localhost ~]# ps aux | grep D
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 1818066 0.0 0.0 0 0 ? D Nov04 0:03 [kworker/3:2+ksnaphd]
> root 2972729 0.5 2.1 87256 73032 pts/1 D<L 20:17 0:00 vgchange --refresh vg
> root 2972761 0.0 0.3 23464 10636 pts/1 D 20:17 0:00 dmsetup -f remove vg-snap
>
> During vgchange --refresh, the snapshot is removed after origin/cow/real have been
> suspended. The subsequent snapshot load then takes type_lock and issues I/O to cow in
> snapshot_ctr. That I/O is processed by a kworker, but cow is suspended, so the tasks
> hang in the kernel.
>
> Is there some way to fix it?
Without more detail, working out what you were doing and what state the system
is in is like reading a crystal ball.
Usually 'dmsetup info -c' gives you the most information.
If any device listed there is suspended, it is likely blocking the progress of
other commands that are waiting for the device to be resumed.
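As a rough sketch of that check: the helper below filters 'dmsetup info -c'
output down to suspended devices. It assumes the columnar attr field carries an
's' in its third character for a suspended device (for example L-sw versus
L--w). The list_suspended name and the --live argument are my own conventions
for this example, not dmsetup features.

```shell
#!/bin/sh
# Print the names of devices whose attr field marks them as suspended.
# Expects "name:attr" lines on stdin, as produced by:
#   dmsetup info -c --noheadings --separator : -o name,attr
list_suspended() {
    awk -F: 'substr($2, 3, 1) == "s" { print $1 }'
}

# Query the live system only when explicitly asked to (needs root).
if [ "${1:-}" = "--live" ]; then
    dmsetup info -c --noheadings --separator : -o name,attr | list_suspended
fi
```

Any device it prints is a candidate for what the stuck commands are waiting on.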
In practice you are doing something that is not supportable in any way: you
can't interfere with the DM tables of devices that are being manipulated by an
lvm2 command (there is a good reason we use locked sections to ensure
exclusive access to those devices).
To recover from this case you would need to know at which point the lvm2
command was interfered with, then reload and resume the devices that are
expected to be present and functional. This can be a non-trivial operation if
you have not captured the 'dmsetup table' state before your interfering
command, which in practice replaces any existing target with the 'error'
target. That can even create a combination of devices that has never been
tested, causing unexpected code flow.
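One way to capture that state beforehand, as a sketch only: save each device's
live table to a file, and later load it back and resume the device. The
save_tables and restore_table helpers and the DMSETUP override variable are
hypothetical names for this example. The sketch ignores the order in which
stacked devices must be resumed, which is the part that makes real recovery
non-trivial.

```shell
#!/bin/sh
# Allow the dmsetup binary to be overridden, e.g. for a dry run.
DMSETUP=${DMSETUP:-dmsetup}

# Save the live table of each named device to DIR/<name>.table.
save_tables() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    for dev in "$@"; do
        "$DMSETUP" table "$dev" > "$dir/$dev.table" || return 1
    done
}

# Load a previously saved table into the device, then resume it.
restore_table() {
    dir=$1
    dev=$2
    "$DMSETUP" load "$dev" "$dir/$dev.table" && "$DMSETUP" resume "$dev"
}
```

For example, save_tables /root/dm-state vg-snap before any manual change, and
restore_table /root/dm-state vg-snap afterwards; both the directory and the
device name here are placeholders.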
It's also good to know which kernel version you are working with. Many DM
kernel bugs have been fixed over time, so please make sure you are testing on
a 6.11 kernel.
Regards
Zdenek