From mboxrd@z Thu Jan 1 00:00:00 1970 From: NeilBrown Subject: Re: Stuck in md_write_start because MD_SB_CHANGE_PENDING can't be cleared Date: Mon, 04 Sep 2017 12:16:50 +1000 Message-ID: <877exfdx7x.fsf@notabene.neil.brown.name> References: <546311999.4473128.1504339295016.JavaMail.zimbra@redhat.com> Mime-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" Return-path: In-Reply-To: <546311999.4473128.1504339295016.JavaMail.zimbra@redhat.com> Sender: linux-raid-owner@vger.kernel.org To: Xiao Ni , linux-raid Cc: shli@kernel.org List-Id: linux-raid.ids --=-=-= Content-Type: text/plain Content-Transfer-Encoding: quoted-printable On Sat, Sep 02 2017, Xiao Ni wrote: > Hi Neil and Shaohua > > I'm trying to reproduce the problem the problem "raid5_finish_reshape is = stuck at revalidate_disk". > But there is a new stuck. I tried with 4.13.0-rc5 and latest mdadm. The s= teps are: > > #!/bin/bash > num=3D0 > while [ 1 ] > do > echo "*************************$num" > mdadm -Ss > mdadm --create --run /dev/md0 --level 5 --metadata 1.2 --raid-devices 5 = /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4 --spare-devices 1 /d= ev/loop5 --chunk 512 > mdadm --wait /dev/md0 > mkfs -t ext4 /dev/md0 > mount -t ext4 /dev/md0 /mnt/md_test > dd if=3D/dev/urandom of=3D/mnt/md_test/testfile bs=3D1M count=3D100 > mdadm --add /dev/md0 /dev/loop6 > mdadm --grow --raid-devices 6 /dev/md0 > dd if=3D/dev/urandom of=3D/mnt/md_test/testfile bs=3D1M count=3D1000 > mdadm --wait /dev/md0 > ((num++)) > umount /dev/md0 > done > > The calltrace messages are: > Sep 1 13:57:25 localhost kernel: INFO: task kworker/u8:4:21401 blocked f= or more than 120 seconds. > Sep 1 13:57:25 localhost kernel: Tainted: G OE 4.13.0-r= c5 #2 > Sep 1 13:57:25 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_ti= meout_secs" disables this message. > Sep 1 13:57:25 localhost kernel: kworker/u8:4 D 0 21401 2 0x0= 0000080 > Sep 1 13:57:25 localhost kernel: Workqueue: writeback wb_workfn (flush-9= :0) > Sep 1 13:57:25 localhost kernel: Call Trace: > Sep 1 13:57:25 localhost kernel: __schedule+0x28d/0x890 > Sep 1 13:57:25 localhost kernel: schedule+0x36/0x80 > Sep 1 13:57:25 localhost kernel: md_write_start+0xf0/0x220 [md_mod] > Sep 1 13:57:25 localhost kernel: ? remove_wait_queue+0x60/0x60 > Sep 1 13:57:25 localhost kernel: raid5_make_request+0x89/0x8b0 [raid456] > Sep 1 13:57:25 localhost kernel: ? bio_split+0x5d/0x90 > Sep 1 13:57:25 localhost kernel: ? blk_queue_split+0xd2/0x630 > Sep 1 13:57:25 localhost kernel: ? remove_wait_queue+0x60/0x60 > Sep 1 13:57:25 localhost kernel: md_make_request+0xf5/0x260 [md_mod] > .... > > All the processes send bio to md are stuck at md_write_start. > wait_event(mddev->sb_wait, > !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) && != mddev->suspended); > > > Sep 1 13:57:26 localhost kernel: INFO: task md0_reshape:23605 blocked fo= r more than 120 seconds. > Sep 1 13:57:26 localhost kernel: Tainted: G OE 4.13.0-r= c5 #2 > Sep 1 13:57:26 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_ti= meout_secs" disables this message. > Sep 1 13:57:26 localhost kernel: md0_reshape D 0 23605 2 0x0= 0000080 > Sep 1 13:57:26 localhost kernel: Call Trace: > Sep 1 13:57:26 localhost kernel: __schedule+0x28d/0x890 > Sep 1 13:57:26 localhost kernel: schedule+0x36/0x80 > Sep 1 13:57:26 localhost kernel: raid5_sync_request+0x2cf/0x370 [raid456] > Sep 1 13:57:26 localhost kernel: ? remove_wait_queue+0x60/0x60 > Sep 1 13:57:26 localhost kernel: md_do_sync+0xafe/0xee0 [md_mod] > Sep 1 13:57:26 localhost kernel: ? remove_wait_queue+0x60/0x60 > Sep 1 13:57:26 localhost kernel: md_thread+0x132/0x180 [md_mod] > Sep 1 13:57:26 localhost kernel: kthread+0x109/0x140 > Sep 1 13:57:26 localhost kernel: ? find_pers+0x70/0x70 [md_mod] > Sep 1 13:57:26 localhost kernel: ? kthread_park+0x60/0x60 > Sep 1 13:57:26 localhost kernel: ? do_syscall_64+0x67/0x150 > Sep 1 13:57:26 localhost kernel: ret_from_fork+0x25/0x30 > > It stucks at: > /* Allow raid5_quiesce to complete */ > wait_event(conf->wait_for_overlap, conf->quiesce !=3D 2); > > Sep 1 13:57:26 localhost kernel: INFO: task mdadm:23613 blocked for more= than 120 seconds. > Sep 1 13:57:26 localhost kernel: Tainted: G OE 4.13.0-r= c5 #2 > Sep 1 13:57:26 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_ti= meout_secs" disables this message. > Sep 1 13:57:26 localhost kernel: mdadm D 0 23613 1 0x0= 0000080 > Sep 1 13:57:26 localhost kernel: Call Trace: > Sep 1 13:57:26 localhost kernel: __schedule+0x28d/0x890 > Sep 1 13:57:26 localhost kernel: schedule+0x36/0x80 > Sep 1 13:57:26 localhost kernel: raid5_quiesce+0x274/0x2b0 [raid456] > Sep 1 13:57:26 localhost kernel: ? remove_wait_queue+0x60/0x60 > Sep 1 13:57:26 localhost kernel: suspend_lo_store+0x82/0xe0 [md_mod] > Sep 1 13:57:26 localhost kernel: md_attr_store+0x80/0xc0 [md_mod] > Sep 1 13:57:26 localhost kernel: sysfs_kf_write+0x3a/0x50 > Sep 1 13:57:26 localhost kernel: kernfs_fop_write+0xff/0x180 > Sep 1 13:57:26 localhost kernel: __vfs_write+0x37/0x170 > Sep 1 13:57:26 localhost kernel: ? selinux_file_permission+0xe5/0x120 > Sep 1 13:57:26 localhost kernel: ? security_file_permission+0x3b/0xc0 > Sep 1 13:57:26 localhost kernel: vfs_write+0xb2/0x1b0 > Sep 1 13:57:26 localhost kernel: ? syscall_trace_enter+0x1d0/0x2b0 > Sep 1 13:57:26 localhost kernel: SyS_write+0x55/0xc0 > Sep 1 13:57:26 localhost kernel: do_syscall_64+0x67/0x150 > > It's stuck at: > conf->quiesce =3D 2; > wait_event_cmd(conf->wait_for_quiescent, > atomic_read(&conf->active_stripes) =3D=3D 0 && > atomic_read(&conf->active_aligned_reads) =3D=3D 0, > unlock_all_device_hash_locks_irq(conf), > lock_all_device_hash_locks_irq(conf)); > conf->quiesce =3D 1; > > [root@dell-pr1700-02 ~]# cat /proc/mdstat=20 > Personalities : [raid6] [raid5] [raid4]=20 > md0 : active raid5 loop6[7] loop4[6] loop5[5](S) loop3[3] loop2[2] loop1[= 1] loop0[0] > 2039808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UU= UUUU] > [>....................] reshape =3D 0.5% (2560/509952) finish=3D1= 62985.0min speed=3D0K/sec >=20=20=20=20=20=20=20 > unused devices: > > > I can reproduce this. So I add some logs by printk to check MD_SB_CHANGE_= PENDING of mddev->flags > and mddev->suspended. > mddev->suspend : 0 > mddev->flags : SP CHANGE PENDING is set > conf->quiesce : 2 > > The I echo active > /sys/block/md0/md/array_state, reshape can start and = finish successfully. > > I notice there are some fixes for raid5 stuck problem. Not sure whether i= t's introduced by those > patches.=20 > > Best Regards > Xiao Thanks for the detailed report. All the processes that you have mentioned are waiting for raid5_quiesce() to complete. raid5_quiesce() must be waiting for either active_stripes or active_aligned_reads to reach zero. Most likely active_stripes. That suggests that either there is a stripe that is active and isn't being handled for some reason, or the accounting of active stripes has problems. The fact that setting the array to 'active' removes the hang is interesting. That would cause the threads blocked in md_write_start to continue, but I cannot see how that would affect active_stripes as threads waiting for md_write_start() never hold a stripe. Doing that could cause a stripe to be used, then released, which might trigger a wakeup that had been lost. It would be useful to see if ->active_stripes is 0 when the deadlock happens. One way to do that is to add a printk into raid5_status(). Then whenever you "cat /proc/mdstat" the printk will trigger and tell you the value of ->active_stripes. I haven't managed to reproduce this yet, or I would track it down my self. Thanks, NeilBrown --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEG8Yp69OQ2HB7X0l6Oeye3VZigbkFAlmst6QACgkQOeye3VZi gbnCrQ/+IKPSNdUOKhQu9DXlCrARL9v01AGXg6UT0T67ZWVytNmP5/tr9kupAWLV ycit2EXHwkItFSXI1mhMHSkOYX6QbIdGUpM8HSLI36hgjpXQPtjB1Wlltv0gLHpk pZ9Rzrx3K3jBbQj4Ai7AD8jQMAI6rK0bMiDT5mtRE/WW8lXBNPS0ekOs8JDEGaVH MmZQEGH12ngXKIxcNo1XJ8x8cXX/2TshEd9rM7oG49oAlvr7QVGnXKW7jHe73UAY t5zuF5PMT2x9pMOdGoqjNpNhOuOeJeWQUe7Qqz3VnExvcStHoXWhNqjnEqjJ0Vl6 lPpEu3CeednmgdwcMgoRBwNrn0ttbiQ4BHYmiDCmZtlTRx2YXHqO27foHCtFqFqV TKqaIDW8UV3K0wQ0deJUhoJPqjhdctrOWL7f5Sus/QTTY0BGboZrXr2KWuSeOSg9 Su3+0JkaNvkdbGRJaMBGS0wKYf56Q6AfBmL0hoBSnWbcqL2HlpDXvKuGlVAYDFT5 bL74mO61IibJPLMqkFSSm2b+0jTAFamwKCizNa8dkhXdJp76l+YVBPSUFH6Pn7Fn It5TFWlbpwwSpybqdLezFmkOZ2fqAvkHsUsAlJHyMi9qQSEzeJhSlyoTNdDTWzoz Nd3FQAWdzh6t1YtKfMihWLYv+bcdywLAAmp83lUa9U1c+mck2H8= =llrd -----END PGP SIGNATURE----- --=-=-=--