From: Guoqing Jiang <guoqing.jiang@linux.dev>
To: Donald Buczek <buczek@molgen.mpg.de>,
Logan Gunthorpe <logang@deltatee.com>, Song Liu <song@kernel.org>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: [Update PATCH V3] md: don't unregister sync_thread with reconfig_mutex held
Date: Mon, 23 May 2022 09:08:22 +0800
Message-ID: <5b0584a3-c128-cb53-7c8a-63744c60c667@linux.dev>
In-Reply-To: <836b2a93-65be-8d6c-8610-18373b88f86d@molgen.mpg.de>
On 5/22/22 2:23 AM, Donald Buczek wrote:
> On 20.05.22 20:27, Logan Gunthorpe wrote:
>>
>> Hi,
>>
>> On 2022-05-10 00:44, Song Liu wrote:
>>> On Mon, May 9, 2022 at 1:09 AM Guoqing Jiang
>>> <guoqing.jiang@linux.dev> wrote:
>>>> On 5/9/22 2:37 PM, Song Liu wrote:
>>>>> On Fri, May 6, 2022 at 4:37 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>>>>>> From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>>>>>>
>>>>>> Unregistering the sync_thread doesn't need to hold reconfig_mutex,
>>>>>> since it doesn't reconfigure the array.
>>>>>>
>>>>>> And holding it could cause a deadlock for raid5, as follows:
>>>>>>
>>>>>> 1. Process A tried to reap the sync thread with reconfig_mutex held,
>>>>>>    after echoing idle to sync_action.
>>>>>> 2. The raid5 sync thread was blocked because there were too many
>>>>>>    active stripes.
>>>>>> 3. SB_CHANGE_PENDING was set (because of write IO coming from the
>>>>>>    upper layer), which meant the number of active stripes couldn't
>>>>>>    be decreased.
>>>>>> 4. SB_CHANGE_PENDING couldn't be cleared, since md_check_recovery
>>>>>>    was not able to take reconfig_mutex.
>>>>>>
>>>>>> More details in the link:
>>>>>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
>>>>>>
>>>>>> Let's unregister the thread between mddev_unlock and mddev_lock_nointr
>>>>>> (thanks to the kernel test robot <lkp@intel.com> for the report) if
>>>>>> reconfig_mutex is held; mddev_is_locked is introduced accordingly.
>>>>> mddev_is_locked() feels really hacky to me. It cannot tell whether
>>>>> mddev is locked by the current thread. So technically, we could
>>>>> unlock reconfig_mutex for another thread by accident, no?
>>>>
>>>> I can switch back to V2 if you think that is the correct way to do it,
>>>> though no one commented on the change in dm-raid.
>>>
>>> I guess v2 is the best option at the moment. I pushed a slightly
>>> modified v2 to md-next.
>>>
>>
>> I noticed a clear regression in the mdadm tests with this patch in
>> md-next (7e6ba434cc6080).
>>
>> Before the patch, tests 07reshape5intr and 07revert-grow would fail
>> fairly infrequently (about 1 in 4 runs for the former and 1 in 30 runs
>> for the latter).
>>
>> After this patch, both tests always fail.
>>
>> I don't have time to dig into why this is, but it would be nice if
>> someone could at least fix the regression. It is hard to make any
>> progress on these tests if we keep breaking them further.
>
> Hmmm. I wanted to try to help a bit by reproducing and digging into this.
>
> But it seems that more or less ALL tests hang my system one way or
> another.
>
> I've used a qemu/kvm machine with md-next and mdraid master.
>
> Is this supposed to work?
>
> I can investigate the bugs I see, but that is probably a waste of time
> if I'm fundamentally doing something wrong?
>
> This is an example from 00raid0:
>
> [ 57.434064] md: md0 stopped.
> [ 57.586951] md0: detected capacity change from 0 to 107520
> [ 57.618454] BUG: kernel NULL pointer dereference, address: 0000000000000094
> [ 57.620830] #PF: supervisor read access in kernel mode
> [ 57.622554] #PF: error_code(0x0000) - not-present page
> [ 57.624273] PGD 800000010d5ee067 P4D 800000010d5ee067 PUD 10df28067 PMD 0
> [ 57.626548] Oops: 0000 [#1] PREEMPT SMP PTI
> [ 57.627942] CPU: 3 PID: 1064 Comm: mkfs.ext3 Not tainted 5.18.0-rc3.mx64.425-00108-g6ad84d559b8c #77
> [ 57.630952] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> [ 57.635927] RIP: 0010:bfq_bio_bfqg+0x26/0x80
> [ 57.638027] Code: 00 0f 1f 00 0f 1f 44 00 00 55 53 48 89 fd 48 8b 56 48 48 89 f7 48 85 d2 74 32 48 63 05 53 54 1c 01 48 83 c0 16 48 8b 5c c2 08 <80> bb 94 00 00 00 00 70
> [ 57.645295] RSP: 0018:ffffc90001c27b38 EFLAGS: 00010006
> [ 57.647414] RAX: 0000000000000018 RBX: 0000000000000000 RCX: 0000000000000001
> [ 57.650039] RDX: ffff888109297800 RSI: ffff8881032ba180 RDI: ffff8881032ba180
> [ 57.652541] RBP: ffff888102177800 R08: ffff88810c9004c8 R09: ffff88810318cb00
> [ 57.654852] R10: 0000000000000000 R11: ffff8881032ba180 R12: ffff88810318cae0
> [ 57.657128] R13: ffff888102177800 R14: ffffc90001c27ca8 R15: ffffc90001c27c00
> [ 57.659316] FS: 00007fdfce47d440(0000) GS:ffff8882b5ac0000(0000) knlGS:0000000000000000
> [ 57.661700] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 57.663461] CR2: 0000000000000094 CR3: 000000010d438002 CR4: 0000000000170ee0
> [ 57.665453] Call Trace:
> [ 57.666479] <TASK>
> [ 57.667382] bfq_bic_update_cgroup+0x28/0x1b0
> [ 57.668724] bfq_insert_requests+0x233/0x2340
> [ 57.670049] ? ioc_find_get_icq+0x21c/0x2a0
> [ 57.671315] ? bfq_prepare_request+0x11/0x30
> [ 57.672565] blk_mq_sched_insert_requests+0x5c/0x150
> [ 57.673891] blk_mq_flush_plug_list+0xe1/0x2a0
> [ 57.675140] __blk_flush_plug+0xdf/0x120
> [ 57.676259] io_schedule_prepare+0x3d/0x50
> [ 57.677373] io_schedule_timeout+0xf/0x40
> [ 57.678465] wait_for_completion_io+0x78/0x140
> [ 57.679578] submit_bio_wait+0x5b/0x80
> [ 57.680575] blkdev_issue_discard+0x65/0xb0
> [ 57.681640] blkdev_common_ioctl+0x391/0x8f0
> [ 57.682712] blkdev_ioctl+0x216/0x2a0
> [ 57.683648] __x64_sys_ioctl+0x76/0xb0
> [ 57.684607] do_syscall_64+0x42/0x90
> [ 57.685527] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 57.686645] RIP: 0033:0x7fdfce56dc17
> [ 57.687535] Code: 48 c7 c3 ff ff ff ff 48 89 d8 5b 5d 41 5c c3 0f 1f 40 00 48 89 e8 48 f7 d8 48 39 c3 0f 92 c0 eb 93 66 90 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 08
> [ 57.691055] RSP: 002b:00007ffe24319828 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 57.692537] RAX: ffffffffffffffda RBX: 00000000004645a0 RCX: 00007fdfce56dc17
> [ 57.693905] RDX: 00007ffe24319830 RSI: 0000000000001277 RDI: 0000000000000003
> [ 57.695288] RBP: 0000000000460960 R08: 0000000000000400 R09: 0000000000000000
> [ 57.696645] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [ 57.697954] R13: 000000000000d200 R14: 0000000000000000 R15: 0000000000000000
> [ 57.699281] </TASK>
> [ 57.699901] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs 8021q garp stp mrp llc bochs drm_vram_helper drm_ttm_helper kvm_intel ttm drm_kms_helper kvm drm fb_sys_fops vi4
> [ 57.705955] CR2: 0000000000000094
> [ 57.706710] ---[ end trace 0000000000000000 ]---
> [ 57.707599] RIP: 0010:bfq_bio_bfqg+0x26/0x80
> [ 57.708434] Code: 00 0f 1f 00 0f 1f 44 00 00 55 53 48 89 fd 48 8b 56 48 48 89 f7 48 85 d2 74 32 48 63 05 53 54 1c 01 48 83 c0 16 48 8b 5c c2 08 <80> bb 94 00 00 00 00 70
> [ 57.711426] RSP: 0018:ffffc90001c27b38 EFLAGS: 00010006
> [ 57.712391] RAX: 0000000000000018 RBX: 0000000000000000 RCX: 0000000000000001
> [ 57.713605] RDX: ffff888109297800 RSI: ffff8881032ba180 RDI: ffff8881032ba180
> [ 57.714811] RBP: ffff888102177800 R08: ffff88810c9004c8 R09: ffff88810318cb00
> [ 57.716018] R10: 0000000000000000 R11: ffff8881032ba180 R12: ffff88810318cae0
> [ 57.717236] R13: ffff888102177800 R14: ffffc90001c27ca8 R15: ffffc90001c27c00
> [ 57.718438] FS: 00007fdfce47d440(0000) GS:ffff8882b5ac0000(0000) knlGS:0000000000000000
> [ 57.719778] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 57.720808] CR2: 0000000000000094 CR3: 000000010d438002 CR4: 0000000000170ee0
> [ 57.722019] note: mkfs.ext3[1064] exited with preempt_count 1
> [ 57.723067] ------------[ cut here ]------------
> [ 57.723960] WARNING: CPU: 3 PID: 1064 at kernel/exit.c:741 do_exit+0x8cb/0xbc0
> [ 57.725196] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs 8021q garp stp mrp llc bochs drm_vram_helper drm_ttm_helper kvm_intel ttm drm_kms_helper kvm drm fb_sys_fops vi4
> [ 57.731011] CPU: 3 PID: 1064 Comm: mkfs.ext3 Tainted: G D 5.18.0-rc3.mx64.425-00108-g6ad84d559b8c #77
> [ 57.732704] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> [ 57.734853] RIP: 0010:do_exit+0x8cb/0xbc0
> [ 57.735711] Code: e9 13 ff ff ff 48 8b bb e0 04 00 00 31 f6 e8 4c db ff ff e9 98 fd ff ff 4c 89 e6 bf 05 06 00 00 e8 8a c8 00 00 e9 41 f8 ff ff <0f> 0b e9 6b f7 ff ff 4b
> [ 57.738851] RSP: 0018:ffffc90001c27ee8 EFLAGS: 00010082
> [ 57.739899] RAX: 0000000000000000 RBX: ffff888101e48000 RCX: 0000000000000000
> [ 57.741196] RDX: 0000000000000001 RSI: ffffffff8220a969 RDI: 0000000000000009
> [ 57.742485] RBP: 0000000000000009 R08: 0000000000000000 R09: c0000000ffffbfff
> [ 57.743777] R10: 00007fdfce47d440 R11: ffffc90001c27d60 R12: 0000000000000009
> [ 57.745081] R13: 0000000000000046 R14: 0000000000000000 R15: 0000000000000000
> [ 57.746388] FS: 00007fdfce47d440(0000) GS:ffff8882b5ac0000(0000) knlGS:0000000000000000
> [ 57.747806] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 57.748931] CR2: 0000000000000094 CR3: 000000010d438002 CR4: 0000000000170ee0
> [ 57.750225] Call Trace:
> [ 57.750894] <TASK>
> [ 57.751535] make_task_dead+0x41/0xf0
> [ 57.752369] rewind_stack_and_make_dead+0x17/0x17
> [ 57.753336] RIP: 0033:0x7fdfce56dc17
> [ 57.754155] Code: 48 c7 c3 ff ff ff ff 48 89 d8 5b 5d 41 5c c3 0f 1f 40 00 48 89 e8 48 f7 d8 48 39 c3 0f 92 c0 eb 93 66 90 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 08
> [ 57.757318] RSP: 002b:00007ffe24319828 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 57.758669] RAX: ffffffffffffffda RBX: 00000000004645a0 RCX: 00007fdfce56dc17
> [ 57.759956] RDX: 00007ffe24319830 RSI: 0000000000001277 RDI: 0000000000000003
> [ 57.761256] RBP: 0000000000460960 R08: 0000000000000400 R09: 0000000000000000
> [ 57.762531] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [ 57.763806] R13: 000000000000d200 R14: 0000000000000000 R15: 0000000000000000
> [ 57.765177] </TASK>
> [ 57.765813] ---[ end trace 0000000000000000 ]---
> [ 57.790046] md0: detected capacity change from 107520 to 0
> [ 57.792834] md: md0 stopped.
> [ 78.843853] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 78.845334] rcu: 10-...0: (0 ticks this GP) idle=07b/1/0x4000000000000000 softirq=1140/1140 fqs=4805
> [ 78.847246] (detected by 13, t=21005 jiffies, g=9013, q=1419)
> [ 78.848619] Sending NMI from CPU 13 to CPUs 10:
> [ 78.849810] NMI backtrace for cpu 10
> [ 78.849813] CPU: 10 PID: 1081 Comm: mdadm Tainted: G D W 5.18.0-rc3.mx64.425-00108-g6ad84d559b8c #77
> [ 78.849816] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> [ 78.849817] RIP: 0010:queued_spin_lock_slowpath+0x4c/0x1d0
> [ 78.849832] Code: 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 1b 85 c0 75 0f b8 01 00 00 00 66 89 07 5b 5d 41 5c c3 f3 90 <8b> 07 84 c0 75 f8 eb e7
> [ 78.849834] RSP: 0018:ffffc90001c9f9e0 EFLAGS: 00000002
> [ 78.849837] RAX: 0000000000040101 RBX: ffff88810c914fc8 RCX: 0000000000000000
> [ 78.849838] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888102177c30
> [ 78.849840] RBP: 0000000000000000 R08: ffff88810c914fc8 R09: ffff888106a4ed10
> [ 78.849841] R10: ffffc90001c9fae8 R11: ffff888101b048d8 R12: ffff888103833000
> [ 78.849842] R13: ffff888102177800 R14: ffffc90001c9fb20 R15: ffffc90001c9fa78
> [ 78.849844] FS: 00007fd3d66c4340(0000) GS:ffff8882b5c80000(0000) knlGS:0000000000000000
> [ 78.849847] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 78.849848] CR2: 00000000004a5b58 CR3: 000000010d438001 CR4: 0000000000170ee0
> [ 78.849850] Call Trace:
> [ 78.849853] <TASK>
> [ 78.849855] bfq_insert_requests+0xae/0x2340
> [ 78.849862] ? submit_bio_noacct_nocheck+0x225/0x2b0
> [ 78.849868] blk_mq_sched_insert_requests+0x5c/0x150
> [ 78.849872] blk_mq_flush_plug_list+0xe1/0x2a0
> [ 78.849876] __blk_flush_plug+0xdf/0x120
> [ 78.849879] blk_finish_plug+0x27/0x40
> [ 78.849882] read_pages+0x15b/0x360
> [ 78.849891] page_cache_ra_unbounded+0x120/0x170
> [ 78.849894] filemap_get_pages+0xdd/0x5f0
> [ 78.849899] filemap_read+0xbf/0x350
> [ 78.849902] ? __mod_memcg_lruvec_state+0x72/0xc0
> [ 78.849907] ? __mod_lruvec_page_state+0xb4/0x160
> [ 78.849909] ? folio_add_lru+0x51/0x80
> [ 78.849912] ? _raw_spin_unlock+0x12/0x30
> [ 78.849916] ? __handle_mm_fault+0xdee/0x14d0
> [ 78.849921] blkdev_read_iter+0xa9/0x180
> [ 78.849924] new_sync_read+0x109/0x180
> [ 78.849929] vfs_read+0x187/0x1b0
> [ 78.849932] ksys_read+0xa1/0xe0
> [ 78.849935] do_syscall_64+0x42/0x90
> [ 78.849938] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 78.849941] RIP: 0033:0x7fd3d6322f8e
> [ 78.849944] Code: c0 e9 c6 fe ff ff 48 8d 3d a7 07 0a 00 48 83 ec 08 e8 b6 e1 01 00 66 0f 1f 44 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 59
> [ 78.849945] RSP: 002b:00007ffe92d46ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [ 78.849948] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3d6322f8e
> [ 78.849949] RDX: 0000000000001000 RSI: 00000000004a3000 RDI: 0000000000000003
> [ 78.849950] RBP: 0000000000000003 R08: 00000000004a3000 R09: 0000000000000003
> [ 78.849951] R10: 00007fd3d623d0a8 R11: 0000000000000246 R12: 00000000004a2a60
> [ 78.849952] R13: 0000000000000000 R14: 00000000004a3000 R15: 000000000048a4a0
> [ 78.849954] </TASK>
Looks like a bfq or block layer issue; I will try it from my side.

Thanks,
Guoqing
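
For reference, the approach described in the patch above, dropping
reconfig_mutex before reaping the sync thread and re-taking it
afterwards, looks roughly like the sketch below. This is an
illustrative simplification, not the actual drivers/md/md.c change:
mddev_unlock(), mddev_lock_nointr() and md_unregister_thread() are
real md helpers, but the wrapper function is made up for the example.

/*
 * Sketch only: drop reconfig_mutex before reaping the sync thread,
 * so that md_check_recovery() can take the mutex, write out the
 * superblock and clear SB_CHANGE_PENDING; then re-take the mutex
 * before the caller touches the array configuration again.
 */
static void reap_sync_thread_sketch(struct mddev *mddev)
{
	/* Caller holds reconfig_mutex, taken via mddev_lock(). */
	mddev_unlock(mddev);
	md_unregister_thread(&mddev->sync_thread);
	mddev_lock_nointr(mddev);
	/* reconfig_mutex is held again from here on. */
}

Song's objection to mddev_is_locked() fits the same picture: a check
like mutex_is_locked() only reports that some task holds the mutex,
not that the current task does, so unlocking on the strength of such
a check could release a mutex that another thread took.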