From: Zdenek Kabelac <zdenek.kabelac@gmail.com>
To: "wangzhiqiang (Q)" <wangzhiqiang95@huawei.com>,
linux-lvm@lists.linux.dev
Subject: Re: hungtask in dm code raised by concurrent run refresh and remove command
Date: Tue, 5 Nov 2024 22:15:25 +0100
Message-ID: <cef4164c-1aa4-48ba-b220-f2b1b2ad03fe@gmail.com>
In-Reply-To: <5d723c3e-9437-5006-4891-1ba659de30a1@huawei.com>
On 05. 11. 24 at 13:27, wangzhiqiang (Q) wrote:
> Hi Team,
> Here is a hung-task issue that occurs in the dm-snapshot scenario,
> reproduced by concurrently running vgchange --refresh and dmsetup -f remove vg-snap.
>
> vgchange dmsetup dmsetup
> table_load (load snapshot)
> table_load snapshot to error
> remove snapshot
> suspend origin/cow/real
> table_load(snapshot already remove)
> take type_lock and issue io to cow in snapshot_ctr
> table_load (wait type_lock)
>
> [root@localhost ~]# ps aux | grep D
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 1818066 0.0 0.0 0 0 ? D Nov04 0:03 [kworker/3:2+ksnaphd]
> root 2972729 0.5 2.1 87256 73032 pts/1 D<L 20:17 0:00 vgchange --refresh vg
> root 2972761 0.0 0.3 23464 10636 pts/1 D 20:17 0:00 dmsetup -f remove vg-snap
>
> During vgchange --refresh, the snapshot is removed after origin/cow/real have been
> suspended. The subsequent snapshot load then takes type_lock and issues I/O to cow in
> snapshot_ctr. That I/O is processed by a kworker, but cow is suspended, which leads to
> a hung task in the kernel.
>
> Do we have some way to fix it?
Working out what you were doing and what state the system is in is like
reading a crystal ball.
Usually 'dmsetup info -c' gives you the most information.
If any device listed there is suspended, it is likely blocking the progress of
other commands, which may be waiting for that device to be resumed.
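As a rough sketch of that check, the helper below filters 'dmsetup info -c' output for suspended devices. The function name list_suspended is made up for illustration. It assumes the attr column has the form "L-sw" with 's' in the third position for a suspended device, and that device names contain no ':' character.

```shell
# Hypothetical helper: read "name:attr" lines on stdin and print the names
# of devices whose attr field marks them as suspended.
# Assumption: attr looks like "L-sw", with 's' as the third character
# when the device is suspended and '-' otherwise.
list_suspended() {
    awk -F: 'substr($2, 3, 1) == "s" { print $1 }'
}

# Intended use (needs root, so it is left as a comment here):
#   dmsetup info -c --noheadings --separator : -o name,attr | list_suspended
```

An empty result means no device is currently suspended. Any name printed is a candidate for what other commands are waiting on.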
In practice you are doing something that is not supportable in any way: you
can't interfere with the DM tables of devices that are being manipulated by
an lvm2 command (there is a good reason we use locked sections to ensure
exclusive access to those devices).
To recover from this case you would need to know at which point the lvm2
command was interfered with, and then reload & resume the devices that are
expected to be there and functional. This may be a non-trivial operation if
you did not capture the 'dmsetup table' state before running your interfering
command, which in practice replaces any existing target with the 'error'
target. This can even create a combination of devices that was never tested
before, causing some unexpected code flow.
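A minimal sketch of capturing the table state up front, and of the reload & resume step, might look like this. The helper names save_table and restore_table and the one-file-per-device layout are assumptions made for illustration. The sketch handles a single device only and ignores the ordering between stacked devices (origin, cow, real, snapshot), which is exactly the non-trivial part.

```shell
# Hypothetical helpers, not part of lvm2 or dmsetup.

# save_table NAME DIR: store the current table of device NAME in DIR/NAME.table
save_table() {
    dmsetup table "$1" > "$2/$1.table"
}

# restore_table NAME DIR: load the saved table into the inactive slot,
# then resume the device so that the loaded table becomes live.
restore_table() {
    # Refuse to load a missing or empty table file.
    [ -s "$2/$1.table" ] || return 1
    dmsetup load "$1" "$2/$1.table" && dmsetup resume "$1"
}

# Intended use (needs root, so it is left as a comment here):
#   save_table vg-snap /root/dm-backup      # before any manual change
#   restore_table vg-snap /root/dm-backup   # when trying to recover
```

Even with a saved table, restoring it by hand bypasses lvm2 locking just like the original interference did, so this is only a last-resort recovery aid.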
It's also good to know which kernel version you are working with. Over
time many DM kernel bugs were fixed, so please make sure you are testing on
a 6.11 kernel.
Regards
Zdenek