[PATCH 1/2] raid5: Before freeing old multi-thread worker, it should flush them.

linux-raid.vger.kernel.org archive mirror

From: majianpeng @ 2013-11-12  2:43 UTC
  To: NeilBrown, shli; +Cc: linux-raid

When group_thread_cnt was changed through its sysfs entry, the kernel
hit an oops. The kernel messages are:
[  740.961389] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[  740.961444] IP: [<ffffffff81062570>] process_one_work+0x30/0x500
[  740.961476] PGD b9013067 PUD b651e067 PMD 0
[  740.961503] Oops: 0000 [#1] SMP
[  740.961525] Modules linked in: netconsole e1000e ptp pps_core
[  740.961577] CPU: 0 PID: 3683 Comm: kworker/u8:5 Not tainted 3.12.0+ #23
[  740.961602] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
[  740.961646] task: ffff88013abe0000 ti: ffff88013a246000 task.ti: ffff88013a246000
[  740.961673] RIP: 0010:[<ffffffff81062570>]  [<ffffffff81062570>] process_one_work+0x30/0x500
[  740.961708] RSP: 0018:ffff88013a247e08  EFLAGS: 00010086
[  740.961730] RAX: ffff8800b912b400 RBX: ffff88013a61e680 RCX: ffff8800b912b400
[  740.961757] RDX: ffff8800b912b600 RSI: ffff8800b912b600 RDI: ffff88013a61e680
[  740.961782] RBP: ffff88013a247e48 R08: ffff88013a246000 R09: 000000000002c09d
[  740.961808] R10: 000000000000010f R11: 0000000000000000 R12: ffff88013b00cc00
[  740.961833] R13: 0000000000000000 R14: ffff88013b00cf80 R15: ffff88013a61e6b0
[  740.961861] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[  740.961893] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  740.962001] CR2: 00000000000000b8 CR3: 00000000b24fe000 CR4: 00000000000407f0
[  740.962001] Stack:
[  740.962001]  0000000000000008 ffff8800b912b600 ffff88013b00cc00 ffff88013a61e680
[  740.962001]  ffff88013b00cc00 ffff88013b00cc18 ffff88013b00cf80 ffff88013a61e6b0
[  740.962001]  ffff88013a247eb8 ffffffff810639c6 0000000000012a80 ffff88013a247fd8
[  740.962001] Call Trace:
[  740.962001]  [<ffffffff810639c6>] worker_thread+0x206/0x3f0
[  740.962001]  [<ffffffff810637c0>] ? manage_workers+0x2c0/0x2c0
[  740.962001]  [<ffffffff81069656>] kthread+0xc6/0xd0
[  740.962001]  [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[  740.962001]  [<ffffffff81722ffc>] ret_from_fork+0x7c/0xb0
[  740.962001]  [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
[  740.962001] Code: 89 e5 41 57 41 56 41 55 45 31 ed 41 54 53 48 89 fb 48 83 ec 18 48 8b 06 4c 8b 67 48 48 89 c1 30 c9 a8 04 4c 0f 45 e9 80 7f 58 00 <49> 8b 45 08 44 8b b0 00 01 00 00 78 0c 41 f6 44 24 10 04 0f 84
[  740.962001] RIP  [<ffffffff81062570>] process_one_work+0x30/0x500
[  740.962001]  RSP <ffff88013a247e08>
[  740.962001] CR2: 0000000000000008
[  740.962001] ---[ end trace 39181460000748de ]---
[  740.962001] Kernel panic - not syncing: Fatal exception

Consider the following case: a few stripes are left over, fewer than
MAX_STRIPE_BATCH, and a worker is queued to handle them. Before
raid5_do_work() runs, raid5d handles those stripes itself, which drops
conf->active_stripes to 0, so mddev_suspend() can return.
raid5_store_group_thread_cnt() then frees the old worker resources
before raid5_do_work() has run, so when process_one_work() finally
calls raid5_do_work(), the raid5 worker has already been freed.

                            raid5d()            raid5_store_group_thread_cnt()
queue_work()                                    mddev_suspend()
                            handle stripes
                            active_stripes = 0
                                                free(old worker resources)
            process_one_work()
            raid5_do_work()
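The interleaving above can be replayed deterministically in a small
userspace model. This is a sketch under stated assumptions, not kernel
code: struct old_worker, struct pending_work, queue_pending(),
run_pending() and replay() are hypothetical stand-ins for the old worker
resources, the queued raid5 work item, queue_work(),
process_one_work()/raid5_do_work() and flush_workqueue(). "Freeing" is
modelled with a flag so that the stale access is detected rather than
actually performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the old worker resources; "freed" models kfree(). */
struct old_worker {
    bool freed;
    int handled;
};

/* Hypothetical stand-in for a queued work item that still points at a worker. */
struct pending_work {
    struct old_worker *worker;
    bool pending;
};

/* Models queue_work(): the item keeps a pointer to the worker. */
static void queue_pending(struct pending_work *w, struct old_worker *wk)
{
    assert(!w->pending);        /* one outstanding item per pending_work here */
    w->worker = wk;
    w->pending = true;
}

/*
 * Models process_one_work() -> raid5_do_work(). Returns false when the
 * item would touch a worker that has already been freed.
 */
static bool run_pending(struct pending_work *w)
{
    if (!w->pending)
        return true;            /* nothing queued, nothing to touch */
    w->pending = false;
    if (w->worker->freed)
        return false;           /* use-after-free in the real kernel */
    w->worker->handled++;
    return true;
}

/*
 * Replays the timeline from the diagram. flush_first selects the patched
 * ordering (flush before free); returns false if a stale access happens.
 */
bool replay(bool flush_first)
{
    struct old_worker old = { .freed = false, .handled = 0 };
    struct pending_work work = { .worker = NULL, .pending = false };
    int active_stripes = 1;

    queue_pending(&work, &old); /* queue_work() */
    active_stripes--;           /* raid5d handles the leftover stripes */

    /* mddev_suspend() waits only for active_stripes == 0, not for queued work */
    if (active_stripes != 0)
        return true;

    if (flush_first && !run_pending(&work)) /* flush_workqueue(raid5_wq) */
        return false;
    old.freed = true;           /* free(old worker resources) */
    return run_pending(&work);  /* process_one_work() -> raid5_do_work() */
}
```

replay(false) follows the unpatched order and reports the stale access;
replay(true) drains the queued item first and completes cleanly.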

To avoid this, flush raid5_wq before freeing the old workers, so that
any work already queued for them has completed.
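flush_workqueue() waits until every work item queued before the call has
finished, which is what closes the window. The minimal sketch below,
assuming POSIX threads (build with -pthread on older toolchains), shows
the same flush-then-free ordering in userspace. struct toy_wq,
toy_queue_work(), toy_flush_workqueue(), toy_change_groups() and the
other toy_* names are hypothetical and only approximate the kernel
workqueue: one worker thread that runs items in the order they were
queued.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_QLEN 64   /* lifetime capacity of the toy queue */

/* Hypothetical workqueue with one worker thread; items run in queue order. */
struct toy_wq {
    pthread_mutex_t lock;
    pthread_cond_t cond;        /* signalled on queue, completion and stop */
    void (*fn[TOY_QLEN])(void *);
    void *arg[TOY_QLEN];
    int queued, head, done;     /* items queued / started / completed */
    bool stop;
    pthread_t thread;
};

static void *toy_worker(void *p)
{
    struct toy_wq *wq = p;

    pthread_mutex_lock(&wq->lock);
    for (;;) {
        while (wq->head == wq->queued && !wq->stop)
            pthread_cond_wait(&wq->cond, &wq->lock);
        if (wq->head == wq->queued)     /* stop requested, nothing left */
            break;
        int i = wq->head++;
        pthread_mutex_unlock(&wq->lock);
        wq->fn[i](wq->arg[i]);          /* run the item without the lock */
        pthread_mutex_lock(&wq->lock);
        wq->done++;
        pthread_cond_broadcast(&wq->cond);
    }
    pthread_mutex_unlock(&wq->lock);
    return NULL;
}

static bool toy_queue_work(struct toy_wq *wq, void (*fn)(void *), void *arg)
{
    pthread_mutex_lock(&wq->lock);
    bool ok = wq->queued < TOY_QLEN;    /* refuse once capacity is used up */
    if (ok) {
        wq->fn[wq->queued] = fn;
        wq->arg[wq->queued++] = arg;
        pthread_cond_broadcast(&wq->cond);
    }
    pthread_mutex_unlock(&wq->lock);
    return ok;
}

/* Like flush_workqueue(): wait for every item queued before this call. */
static void toy_flush_workqueue(struct toy_wq *wq)
{
    pthread_mutex_lock(&wq->lock);
    int target = wq->queued;
    while (wq->done < target)
        pthread_cond_wait(&wq->cond, &wq->lock);
    pthread_mutex_unlock(&wq->lock);
}

/* Hypothetical "old worker group" that queued work still points at. */
struct toy_group { int handled; };

static void toy_do_work(void *arg)
{
    ((struct toy_group *)arg)->handled++; /* UAF if arg were already freed */
}

/* Flush-then-free ordering from the patch; returns items handled or -1. */
int toy_change_groups(int nwork)
{
    struct toy_wq wq = { .queued = 0, .head = 0, .done = 0, .stop = false };
    struct toy_group *old = malloc(sizeof(*old));
    int queued = 0, handled;

    if (!old)
        return -1;
    old->handled = 0;
    pthread_mutex_init(&wq.lock, NULL);
    pthread_cond_init(&wq.cond, NULL);
    if (pthread_create(&wq.thread, NULL, toy_worker, &wq) != 0) {
        free(old);
        return -1;
    }
    for (int i = 0; i < nwork; i++)
        if (toy_queue_work(&wq, toy_do_work, old))
            queued++;

    toy_flush_workqueue(&wq);   /* if (old_groups) flush_workqueue(raid5_wq); */
    handled = old->handled;     /* safe: no queued item still references old */
    assert(handled == queued);
    free(old);                  /* free(old worker resources) */

    pthread_mutex_lock(&wq.lock);   /* stop and join the worker thread */
    wq.stop = true;
    pthread_cond_broadcast(&wq.cond);
    pthread_mutex_unlock(&wq.lock);
    pthread_join(wq.thread, NULL);
    return handled;
}
```

Removing the toy_flush_workqueue() call would let free(old) race with
toy_do_work(), which is the userspace analogue of the oops above.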

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
---
 drivers/md/raid5.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f8b9068..cc866f8 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5239,6 +5239,9 @@ raid5_store_group_thread_cnt(struct mddev *mddev, const char *page, size_t len)
 	old_groups = conf->worker_groups;
 	old_group_cnt = conf->worker_cnt_per_group;
 
+	if (old_groups)
+		flush_workqueue(raid5_wq);
+
 	conf->worker_groups = NULL;
 	err = alloc_thread_groups(conf, new);
 	if (err) {
-- 
1.7.1

* Re: [PATCH 1/2] raid5: Before freeing old multi-thread worker, it should flush them.
From: Shaohua Li @ 2013-11-12  3:39 UTC
  To: majianpeng; +Cc: NeilBrown, linux-raid

On Tue, Nov 12, 2013 at 10:43:46AM +0800, majianpeng wrote:
> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>

Thanks for fixing it.
Reviewed-by: Shaohua Li <shli@kernel.org>
