From mboxrd@z Thu Jan 1 00:00:00 1970 From: Shaohua Li Subject: Re: [PATCH 1/2] md-cluster: fix deadlock issue when add disk to an recoverying array Date: Fri, 3 Jun 2016 16:21:32 -0700 Message-ID: <20160603232132.GA2171@kernel.org> References: <1464924725-18096-1-git-send-email-gqjiang@suse.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Content-Disposition: inline In-Reply-To: <1464924725-18096-1-git-send-email-gqjiang@suse.com> Sender: linux-raid-owner@vger.kernel.org To: Guoqing Jiang Cc: linux-raid@vger.kernel.org List-Id: linux-raid.ids On Thu, Jun 02, 2016 at 11:32:04PM -0400, Guoqing Jiang wrote: > Add a disk to an array which is performing recovery > is a little complicated, we need to do both reap the > sync thread and perform add disk for the case, then > it caused deadlock as follows. > > linux44:~ # ps aux|grep md|grep D > root 1822 0.0 0.0 0 0 ? D 16:50 0:00 [md127_resync] > root 1848 0.0 0.0 19860 952 pts/0 D+ 16:50 0:00 mdadm --manage /dev/md127 --re-add /dev/vdb > linux44:~ # cat /proc/1848/stack > [] kthread_stop+0x6e/0x120 > [] md_unregister_thread+0x40/0x80 [md_mod] > [] md_reap_sync_thread+0x15/0x150 [md_mod] > [] action_store+0x260/0x270 [md_mod] > [] md_attr_store+0xb4/0x100 [md_mod] > [] sysfs_write_file+0xbe/0x140 > [] vfs_write+0xb8/0x1e0 > [] SyS_write+0x48/0xa0 > [] system_call_fastpath+0x16/0x1b > [<00007f068ea1ed30>] 0x7f068ea1ed30 > linux44:~ # cat /proc/1822/stack > [] md_do_sync+0x846/0xf40 [md_mod] > [] md_thread+0x16d/0x180 [md_mod] > [] kthread+0xb4/0xc0 > [] ret_from_fork+0x58/0x90 > > Task1848 Task1822 > md_attr_store (held reconfig_mutex by call mddev_lock()) > action_store > md_reap_sync_thread > md_unregister_thread > kthread_stop md_wakeup_thread(mddev->thread); > wait_event(mddev->sb_wait, !test_bit(MD_CHANGE_PENDING)) > > md_check_recovery is triggered by wakeup mddev->thread, > but it can't clear MD_CHANGE_PENDING flag since it can't > get lock which was held by md_attr_store already. > > To solve the deadlock problem, we move "->resync_finish()" > from md_do_sync to md_reap_sync_thread (after md_update_sb), > also MD_HELD_RESYNC_LOCK is introduced since it is possible > that node can't get resync lock in md_do_sync. > > Then we do not need to wait for MD_CHANGE_PENDING is cleared > or not since metadata should be updated after md_update_sb, > so just call resync_finish if MD_HELD_RESYNC_LOCK is set. > > We also unified the code after skip label, since set PENDING > for non-clustered case should be harmless. applied the two, thanks!