* [PATCH] md/bitmap: wait for external bitmap writes to complete during tear down
From: Sudhakar Panneerselvam @ 2021-04-13 1:07 UTC
To: linux-raid, song, heming.zhao; +Cc: lidong.zhong, xni, colyli, martin.petersen
A NULL pointer dereference was observed in super_written() when it tries
to access the mddev structure.
[The stack trace below is from an older kernel, but the problem described
in this patch also applies to the mainline kernel.]
[ 1194.474861] task: ffff8fdd20858000 task.stack: ffffb99d40790000
[ 1194.488000] RIP: 0010:super_written+0x29/0xe1
[ 1194.499688] RSP: 0018:ffff8ffb7fcc3c78 EFLAGS: 00010046
[ 1194.512477] RAX: 0000000000000000 RBX: ffff8ffb7bf4a000 RCX: ffff8ffb78991048
[ 1194.527325] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8ffb56b8a200
[ 1194.542576] RBP: ffff8ffb7fcc3c90 R08: 000000000000000b R09: 0000000000000000
[ 1194.558001] R10: ffff8ffb56b8a298 R11: 0000000000000000 R12: ffff8ffb56b8a200
[ 1194.573070] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1194.588117] FS: 0000000000000000(0000) GS:ffff8ffb7fcc0000(0000) knlGS:0000000000000000
[ 1194.604264] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1194.617375] CR2: 00000000000002b8 CR3: 00000021e040a002 CR4: 00000000007606e0
[ 1194.632327] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1194.647865] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1194.663316] PKRU: 55555554
[ 1194.674090] Call Trace:
[ 1194.683735] <IRQ>
[ 1194.692948] bio_endio+0xae/0x135
[ 1194.703580] blk_update_request+0xad/0x2fa
[ 1194.714990] blk_update_bidi_request+0x20/0x72
[ 1194.726578] __blk_end_bidi_request+0x2c/0x4d
[ 1194.738373] __blk_end_request_all+0x31/0x49
[ 1194.749344] blk_flush_complete_seq+0x377/0x383
[ 1194.761550] flush_end_io+0x1dd/0x2a7
[ 1194.772910] blk_finish_request+0x9f/0x13c
[ 1194.784544] scsi_end_request+0x180/0x25c
[ 1194.796149] scsi_io_completion+0xc8/0x610
[ 1194.807503] scsi_finish_command+0xdc/0x125
[ 1194.818897] scsi_softirq_done+0x81/0xde
[ 1194.830062] blk_done_softirq+0xa4/0xcc
[ 1194.841008] __do_softirq+0xd9/0x29f
[ 1194.851257] irq_exit+0xe6/0xeb
[ 1194.861290] do_IRQ+0x59/0xe3
[ 1194.871060] common_interrupt+0x1c6/0x382
[ 1194.881988] </IRQ>
[ 1194.890646] RIP: 0010:cpuidle_enter_state+0xdd/0x2a5
[ 1194.902532] RSP: 0018:ffffb99d40793e68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff43
[ 1194.917317] RAX: ffff8ffb7fce27c0 RBX: ffff8ffb7fced800 RCX: 000000000000001f
[ 1194.932056] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
[ 1194.946428] RBP: ffffb99d40793ea0 R08: 0000000000000004 R09: 0000000000002ed2
[ 1194.960508] R10: 0000000000002664 R11: 0000000000000018 R12: 0000000000000003
[ 1194.974454] R13: 000000000000000b R14: ffffffff925715a0 R15: 0000011610120d5a
[ 1194.988607] ? cpuidle_enter_state+0xcc/0x2a5
[ 1194.999077] cpuidle_enter+0x17/0x19
[ 1195.008395] call_cpuidle+0x23/0x3a
[ 1195.017718] do_idle+0x172/0x1d5
[ 1195.026358] cpu_startup_entry+0x73/0x75
[ 1195.035769] start_secondary+0x1b9/0x20b
[ 1195.044894] secondary_startup_64+0xa5/0xa5
[ 1195.084921] RIP: super_written+0x29/0xe1 RSP: ffff8ffb7fcc3c78
[ 1195.096354] CR2: 00000000000002b8
The bio in the above stack trace is a bitmap write whose completion is
invoked after the tear down sequence has set the mddev pointer in rdev to
NULL.
During tear down, there is an attempt to flush the bitmap writes, but for
external bitmaps there is no explicit wait for all of those writes to
complete. For instance, md_bitmap_flush() is called to flush the bitmap
writes, but its last call to md_bitmap_daemon_work() can generate new
bitmap writes that nothing waits for. The subsequent call to
md_bitmap_update_sb() returns immediately for external bitmaps, and the
follow-up call to md_update_sb() is conditional and may not happen for
external bitmaps. The kernel then panics when the completion routine,
super_written(), runs and tries to dereference the mddev pointer in the
rdev, which the tear down sequence has already set to NULL in
unbind_rdev_from_array().
The solution is to call md_super_wait() for external bitmaps after the
last call to md_bitmap_daemon_work() in md_bitmap_flush() to ensure there
are no pending bitmap writes before proceeding with the tear down.
Signed-off-by: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
---
drivers/md/md-bitmap.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 200c5d0f08bf..ea3130e11680 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1722,6 +1722,8 @@ void md_bitmap_flush(struct mddev *mddev)
 	md_bitmap_daemon_work(mddev);
 	bitmap->daemon_lastrun -= sleep;
 	md_bitmap_daemon_work(mddev);
+	if (mddev->bitmap_info.external)
+		md_super_wait(mddev);
 	md_bitmap_update_sb(bitmap);
 }
--
1.8.3.1
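For readers following the thread, the ordering problem described in the patch can be modeled in plain userspace C. This is only a sketch under stated assumptions, not kernel code: struct model_mddev, struct model_rdev, model_write_done(), model_super_wait() and model_teardown() are hypothetical names invented for illustration, and the model is single-threaded, so "waiting" simply means letting the outstanding completion run before tear down continues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mddev and struct md_rdev. */
struct model_mddev {
	int pending_writes;		/* writes submitted but not yet completed */
};

struct model_rdev {
	struct model_mddev *mddev;	/* cleared when the rdev is unbound */
};

/*
 * Completion handler. Like super_written(), it reaches mddev through
 * rdev. Returns false at the point where the kernel would dereference
 * a NULL pointer.
 */
static bool model_write_done(struct model_rdev *rdev)
{
	if (rdev->mddev == NULL)
		return false;
	rdev->mddev->pending_writes--;
	return true;
}

/*
 * Stand-in for md_super_wait(): in this single-threaded model, waiting
 * means running every outstanding completion before returning.
 */
static bool model_super_wait(struct model_rdev *rdev)
{
	bool ok = true;

	assert(rdev->mddev != NULL);	/* only valid before the unbind */
	while (rdev->mddev->pending_writes > 0)
		ok = model_write_done(rdev) && ok;
	return ok;
}

/* Returns true if every completion saw a valid mddev pointer. */
bool model_teardown(bool wait_for_writes)
{
	struct model_mddev mddev = { .pending_writes = 0 };
	struct model_rdev rdev = { .mddev = &mddev };
	bool ok = true;

	/* The final daemon-work pass submits one more bitmap write. */
	mddev.pending_writes++;

	/* The fix: drain outstanding writes before tear down continues. */
	if (wait_for_writes)
		ok = model_super_wait(&rdev);

	/* Tear down unbinds the rdev, clearing its mddev pointer. */
	rdev.mddev = NULL;

	/* Any write still in flight completes only now, too late. */
	while (mddev.pending_writes > 0) {
		if (!model_write_done(&rdev)) {
			ok = false;	/* a real kernel would have oopsed */
			break;
		}
	}
	return ok;
}
```

In this model, model_teardown(true) returns true because the write completes while the rdev is still bound, whereas model_teardown(false) returns false, which corresponds to the point where super_written() dereferenced a NULL mddev in the trace above.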
* Re: [PATCH] md/bitmap: wait for external bitmap writes to complete during tear down
From: heming.zhao @ 2021-04-13 1:39 UTC
To: Sudhakar Panneerselvam, linux-raid, song
Cc: lidong.zhong, xni, colyli, martin.petersen
On 4/13/21 9:07 AM, Sudhakar Panneerselvam wrote:
> NULL pointer dereference was observed in super_written() when it tries
> to access the mddev structure.
>
> [The below stack trace is from an older kernel, but the problem described
> in this patch applies to the mainline kernel.]
> ... ...
>
> The solution is to call md_super_wait() for external bitmaps after the
> last call to md_bitmap_daemon_work() in md_bitmap_flush() to ensure there
> are no pending bitmap writes before proceeding with the tear down.
>
> Signed-off-by: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com>
> Reviewed-by: Heming Zhao <heming.zhao@suse.com>
> ---
> drivers/md/md-bitmap.c | 2 ++
> 1 file changed, 2 insertions(+)
Hello Sudhakar,
A few notes for you, if I understand the kernel patch submission rules correctly.
1.
You should not add the line "Reviewed-by: Heming Zhao <heming.zhao@suse.com>" before
I have given it to you in an email.
But don't worry, you can add my name now.
2.
This is a v2 patch, so you should change the title from [PATCH] to [PATCH v2] and
also include a changelog in the patch.
Thanks,
Heming
* RE: [PATCH] md/bitmap: wait for external bitmap writes to complete during tear down
From: Sudhakar Panneerselvam @ 2021-04-13 1:43 UTC
To: heming.zhao@suse.com, linux-raid@vger.kernel.org, song@kernel.org
Cc: lidong.zhong@suse.com, xni@redhat.com, colyli@suse.com,
Martin Petersen
> -----Original Message-----
> From: heming.zhao@suse.com [mailto:heming.zhao@suse.com]
> Sent: Monday, April 12, 2021 7:40 PM
> To: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com>; linux-raid@vger.kernel.org; song@kernel.org
> Cc: lidong.zhong@suse.com; xni@redhat.com; colyli@suse.com; Martin Petersen <martin.petersen@oracle.com>
> Subject: Re: [PATCH] md/bitmap: wait for external bitmap writes to complete during tear down
>
> On 4/13/21 9:07 AM, Sudhakar Panneerselvam wrote:
> > NULL pointer dereference was observed in super_written() when it tries
> > to access the mddev structure.
> >
> > [The below stack trace is from an older kernel, but the problem described
> > in this patch applies to the mainline kernel.]
> > ... ...
> >
> > The solution is to call md_super_wait() for external bitmaps after the
> > last call to md_bitmap_daemon_work() in md_bitmap_flush() to ensure there
> > are no pending bitmap writes before proceeding with the tear down.
> >
> > Signed-off-by: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com>
> > Reviewed-by: Heming Zhao <heming.zhao@suse.com>
> > ---
> > drivers/md/md-bitmap.c | 2 ++
> > 1 file changed, 2 insertions(+)
>
>
> Hello Sudhakar,
>
> A few notes for you, if I understand the kernel patch submission rules correctly.
> 1.
> You should not add the line "Reviewed-by: Heming Zhao <heming.zhao@suse.com>" before
> I have given it to you in an email.
> But don't worry, you can add my name now.
>
> 2.
> This is a v2 patch, so you should change the title from [PATCH] to [PATCH v2] and
> also include a changelog in the patch.
My apologies. I will resend the patch with these modifications.
Thanks
Sudhakar
>
> Thanks,
> Heming