* [Cluster-devel] [PATCH] dlm/recoverd: recheck kthread_should_stop() before schedule()
@ 2017-09-25 7:47 Guoqing Jiang
2017-09-25 19:07 ` David Teigland
From: Guoqing Jiang @ 2017-09-25 7:47 UTC
To: cluster-devel.redhat.com
Calling schedule() here could make the thread miss the
wakeup from kthread_stop(), so it is better to recheck
kthread_should_stop() before calling schedule(). The symptom
appeared when I ran an indefinite test of clustered raid
(which mostly creates a clustered raid1, assembles it on
other nodes, then stops them).
linux175:~ # ps aux|grep md|grep D
root 4211 0.0 0.0 19760 2220 ? Ds 02:58 0:00 mdadm -Ssq
linux175:~ # cat /proc/4211/stack
[<ffffffff810920cd>] kthread_stop+0x4d/0x150
[<ffffffffa05f0f95>] dlm_recoverd_stop+0x15/0x20 [dlm]
[<ffffffffa05e885b>] dlm_release_lockspace+0x2ab/0x460 [dlm]
[<ffffffffa06291af>] leave+0xbf/0x150 [md_cluster]
[<ffffffffa06157c8>] md_cluster_stop+0x18/0x30 [md_mod]
[<ffffffffa061659e>] bitmap_free+0x12e/0x140 [md_mod]
[<ffffffffa061903f>] bitmap_destroy+0x7f/0x90 [md_mod]
[<ffffffffa0609381>] __md_stop+0x21/0xa0 [md_mod]
[<ffffffffa061246f>] do_md_stop+0x15f/0x5c0 [md_mod]
[<ffffffffa0614515>] md_ioctl+0xa65/0x18a0 [md_mod]
[<ffffffff8138aa8e>] blkdev_ioctl+0x49e/0x8d0
[<ffffffff812535c1>] block_ioctl+0x41/0x50
[<ffffffff8122bf86>] do_vfs_ioctl+0x96/0x5b0
[<ffffffff8122c519>] SyS_ioctl+0x79/0x90
[<ffffffff81702dfb>] entry_SYSCALL_64_fastpath+0x1e/0xad
This may not resolve the issue completely, since the
KTHREAD_SHOULD_STOP flag could still be set between the new
check and schedule(), but it at least greatly reduces the
chance of hitting the symptom: the indefinite test has run
for more than 20 hours without problems, whereas the hang
happens easily without the change.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
fs/dlm/recoverd.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/dlm/recoverd.c b/fs/dlm/recoverd.c
index 6859b4b..9fab490 100644
--- a/fs/dlm/recoverd.c
+++ b/fs/dlm/recoverd.c
@@ -290,8 +290,11 @@ static int dlm_recoverd(void *arg)
 	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (!test_bit(LSFL_RECOVER_WORK, &ls->ls_flags) &&
-		    !test_bit(LSFL_RECOVER_DOWN, &ls->ls_flags))
+		    !test_bit(LSFL_RECOVER_DOWN, &ls->ls_flags)) {
+			if (kthread_should_stop())
+				break;
 			schedule();
+		}
 		set_current_state(TASK_RUNNING);
 
 		if (test_and_clear_bit(LSFL_RECOVER_DOWN, &ls->ls_flags)) {
--
2.6.6
* [Cluster-devel] [PATCH] dlm/recoverd: recheck kthread_should_stop() before schedule()
2017-09-25 7:47 [Cluster-devel] [PATCH] dlm/recoverd: recheck kthread_should_stop() before schedule() Guoqing Jiang
@ 2017-09-25 19:07 ` David Teigland
From: David Teigland @ 2017-09-25 19:07 UTC
To: cluster-devel.redhat.com
On Mon, Sep 25, 2017 at 03:47:50PM +0800, Guoqing Jiang wrote:
> Calling schedule() here could make the thread miss the
> wakeup from kthread_stop(), so it is better to recheck
> kthread_should_stop() before calling schedule(). The symptom
> appeared when I ran an indefinite test of clustered raid
> (which mostly creates a clustered raid1, assembles it on
> other nodes, then stops them).
Thanks, I put this into the next branch in linux-dlm (which I also
moved onto -rc2).