* [PATCH] crash in md-raid1 and md-raid10 due to incorrect list manipulation
@ 2015-10-01 19:17 Mikulas Patocka
  2015-10-08 21:35 ` Neil Brown
  0 siblings, 1 reply; 2+ messages in thread
From: Mikulas Patocka @ 2015-10-01 19:17 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, dm-devel, linux-kernel, Mike Snitzer

Commit 55ce74d4bfe1b9444436264c637f39a152d1e5ac (md/raid1: ensure
device failure recorded before write request returns) causes a crash in
the LVM2 testsuite test shell/lvchange-raid.sh. For me the crash is 100%
reproducible.

The reason for the crash is that the newly added code in raid1d moves the
list from conf->bio_end_io_list to tmp, then tests whether tmp is non-empty,
and then incorrectly pops the bio from conf->bio_end_io_list (which is empty
because the list was already moved).

Raid-10 has a similar bug.
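
The failure mode can be reproduced outside the kernel with a minimal
sketch. Everything below is a simplified userspace stand-in written for
illustration: struct fake_bio, struct fake_conf, list_move_all(),
drain_fixed() and drained_head_points_to_itself() are hypothetical names,
not kernel code, and the list helpers only imitate the shape of the ones in
<linux/list.h>. The point it shows is that once the entries have been moved
to tmp, the "first entry" of the drained list is the list head itself, so
treating it as a queued bio yields a bogus pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace imitation of the kernel's circular list. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = NULL;
}

/* Hypothetical helper: move all entries of 'from' onto 'to', leaving 'from' empty. */
static void list_move_all(struct list_head *from, struct list_head *to)
{
	init_list_head(to);
	if (list_empty(from))
		return;
	to->next = from->next;
	to->prev = from->prev;
	to->next->prev = to;
	to->prev->next = to;
	init_list_head(from);
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry(head, type, member) \
	container_of((head)->next, type, member)

/* Hypothetical stand-ins for struct r1bio and struct r1conf. */
struct fake_bio { int id; struct list_head retry_list; };
struct fake_conf { struct list_head bio_end_io_list; };

/* Fixed pattern: pop from tmp, the list that actually holds the entries. */
static int drain_fixed(struct fake_conf *conf)
{
	struct list_head tmp;
	int sum = 0;

	list_move_all(&conf->bio_end_io_list, &tmp);
	while (!list_empty(&tmp)) {
		struct fake_bio *b = list_first_entry(&tmp, struct fake_bio,
						      retry_list);
		list_del(&b->retry_list);
		sum += b->id;
	}
	assert(list_empty(&tmp));
	return sum;
}

/*
 * Reports the condition the buggy loop ran into: tmp is non-empty, yet the
 * drained list's ->next points back at its own head, so there is no real
 * entry to pop from it. The entries are moved back before returning.
 */
static int drained_head_points_to_itself(struct fake_conf *conf)
{
	struct list_head tmp;
	int bogus;

	list_move_all(&conf->bio_end_io_list, &tmp);
	bogus = !list_empty(&tmp) &&
		conf->bio_end_io_list.next == &conf->bio_end_io_list;
	list_move_all(&tmp, &conf->bio_end_io_list);
	return bogus;
}
```

The sketch deliberately does not dereference the bogus entry; it only checks
the pointer relationship that makes the buggy pop invalid. In the kernel the
same mistake means list_first_entry() computes a struct pointer from the
list head embedded in conf rather than from a queued request.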

Kernel Fault: Code=15 regs=000000006ccb8640 (Addr=0000000100000000)
CPU: 3 PID: 1930 Comm: mdX_raid1 Not tainted 4.2.0-rc5-bisect+ #35
task: 000000006cc1f258 ti: 000000006ccb8000 task.ti: 000000006ccb8000

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001001111111000001111 Not tainted
r00-03  000000ff0804fe0f 000000001059d000 000000001059f818 000000007f16be38
r04-07  000000001059d000 000000007f16be08 0000000000200200 0000000000000001
r08-11  000000006ccb8260 000000007b7934d0 0000000000000001 0000000000000000
r12-15  000000004056f320 0000000000000000 0000000000013dd0 0000000000000000
r16-19  00000000f0d00ae0 0000000000000000 0000000000000000 0000000000000001
r20-23  000000000800000f 0000000042200390 0000000000000000 0000000000000000
r24-27  0000000000000001 000000000800000f 000000007f16be08 000000001059d000
r28-31  0000000100000000 000000006ccb8560 000000006ccb8640 0000000000000000
sr00-03  0000000000249800 0000000000000000 0000000000000000 0000000000249800
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 000000001059f61c 000000001059f620
 IIR: 0f8010c6    ISR: 0000000000000000  IOR: 0000000100000000
 CPU:        3   CR30: 000000006ccb8000 CR31: 0000000000000000
 ORIG_R28: 000000001059d000
 IAOQ[0]: call_bio_endio+0x34/0x1a8 [raid1]
 IAOQ[1]: call_bio_endio+0x38/0x1a8 [raid1]
 RP(r2): raid_end_bio_io+0x88/0x168 [raid1]
Backtrace:
 [<000000001059f818>] raid_end_bio_io+0x88/0x168 [raid1]
 [<00000000105a4f64>] raid1d+0x144/0x1640 [raid1]
 [<000000004017fd5c>] kthread+0x144/0x160

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.")
Fixes: 95af587e95aa ("md/raid10: ensure device failure recorded before write request returns.")

---
 drivers/md/raid1.c  |    4 ++--
 drivers/md/raid10.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6/drivers/md/raid1.c
===================================================================
--- linux-2.6.orig/drivers/md/raid1.c	2015-10-01 21:10:05.000000000 +0200
+++ linux-2.6/drivers/md/raid1.c	2015-10-01 21:10:58.000000000 +0200
@@ -2437,8 +2437,8 @@ static void raid1d(struct md_thread *thr
 		}
 		spin_unlock_irqrestore(&conf->device_lock, flags);
 		while (!list_empty(&tmp)) {
-			r1_bio = list_first_entry(&conf->bio_end_io_list,
-						  struct r1bio, retry_list);
+			r1_bio = list_first_entry(&tmp, struct r1bio,
+						  retry_list);
 			list_del(&r1_bio->retry_list);
 			raid_end_bio_io(r1_bio);
 		}
Index: linux-2.6/drivers/md/raid10.c
===================================================================
--- linux-2.6.orig/drivers/md/raid10.c	2015-10-01 21:11:02.000000000 +0200
+++ linux-2.6/drivers/md/raid10.c	2015-10-01 21:11:19.000000000 +0200
@@ -2804,8 +2804,8 @@ static void raid10d(struct md_thread *th
 		}
 		spin_unlock_irqrestore(&conf->device_lock, flags);
 		while (!list_empty(&tmp)) {
-			r10_bio = list_first_entry(&conf->bio_end_io_list,
-						  struct r10bio, retry_list);
+			r10_bio = list_first_entry(&tmp, struct r10bio,
+						   retry_list);
 			list_del(&r10_bio->retry_list);
 			raid_end_bio_io(r10_bio);
 		}

* Re: [PATCH] crash in md-raid1 and md-raid10 due to incorrect list manipulation
  2015-10-01 19:17 [PATCH] crash in md-raid1 and md-raid10 due to incorrect list manipulation Mikulas Patocka
@ 2015-10-08 21:35 ` Neil Brown
  0 siblings, 0 replies; 2+ messages in thread
From: Neil Brown @ 2015-10-08 21:35 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: linux-raid, dm-devel, linux-kernel, Mike Snitzer

Mikulas Patocka <mpatocka@redhat.com> writes:

> Commit 55ce74d4bfe1b9444436264c637f39a152d1e5ac (md/raid1: ensure
> device failure recorded before write request returns) causes a crash in
> the LVM2 testsuite test shell/lvchange-raid.sh. For me the crash is 100%
> reproducible.
>
> The reason for the crash is that the newly added code in raid1d moves the
> list from conf->bio_end_io_list to tmp, then tests whether tmp is non-empty,
> and then incorrectly pops the bio from conf->bio_end_io_list (which is empty
> because the list was already moved).
>
> Raid-10 has a similar bug.

Ouch.  I can't have been thinking when I wrote that code!

Thanks for finding and fixing this.  Patch will be sent to Linus in time
for next -rc.

Thanks,
NeilBrown

