* [PATCH 0/5] Fixes for RAID1 resync
From: NeilBrown @ 2014-09-10  6:20 UTC
To: Eivind Sarto, Brassow Jonathan; +Cc: linux-raid, majianpeng

This set of patches fixes all the issues that I could find with
the new raid1 resync code.

I didn't find anything that would obviously fix the hang that
Jon Brassow reported.

Jon: could you test with these patches on top of what you have
just in case something happens to fix the problem without me
realising it?

Eivind: if you could test too, I'd really appreciate it.

Thanks,
NeilBrown

---

NeilBrown (5):
      md/raid1: be more cautious where we read-balance during resync.
      md/raid1: clean up request counts properly in close_sync()
      md/raid1: make sure resync waits for conflicting writes to complete.
      md/raid1: Don't use next_resync to determine how far resync has progressed
      md/raid1: update next_resync under resync_lock.

 drivers/md/raid1.c | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

--
Signature
* [PATCH 2/5] md/raid1: clean up request counts properly in close_sync()
From: NeilBrown @ 2014-09-10  6:20 UTC
To: Eivind Sarto, Brassow Jonathan; +Cc: linux-raid, majianpeng

If there are outstanding writes when close_sync is called, the change to
->start_next_window might cause them to decrement the wrong counter when
they complete.  Fix this by merging the two counters into the one that
will be decremented.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/raid1.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index f66f67e5a423..65afd8aa6608 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1544,8 +1544,13 @@ static void close_sync(struct r1conf *conf)
 	mempool_destroy(conf->r1buf_pool);
 	conf->r1buf_pool = NULL;
 
+	spin_lock_irq(&conf->resync_lock);
 	conf->next_resync = 0;
 	conf->start_next_window = MaxSector;
+	conf->current_window_requests +=
+		conf->next_window_requests;
+	conf->next_window_requests = 0;
+	spin_unlock_irq(&conf->resync_lock);
 }
 
 static int raid1_spare_active(struct mddev *mddev)
* [PATCH 4/5] md/raid1: Don't use next_resync to determine how far resync has progressed
From: NeilBrown @ 2014-09-10  6:20 UTC
To: Eivind Sarto, Brassow Jonathan; +Cc: linux-raid, majianpeng

next_resync is (approximately) the location of the next resync request.
However it does *not* reliably determine the earliest location at which
resync might be happening.  This is because resync requests can complete
out of order, and we only limit the number of current requests, not the
distance from the earliest pending request to the latest.

mddev->curr_resync_completed is a reliable indicator of the earliest
position at which resync could be happening.  It is updated less
frequently, but is actually reliable, which is more important.

So use it to determine if a write request is before the region being
resynced and so safe from conflict.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/raid1.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 86753358ea46..08a6989fcff6 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -875,12 +875,10 @@ static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
 	if (conf->array_frozen || !bio)
 		wait = true;
 	else if (conf->barrier && bio_data_dir(bio) == WRITE) {
-		if (conf->next_resync < RESYNC_WINDOW_SECTORS)
-			wait = true;
-		else if ((conf->next_resync - RESYNC_WINDOW_SECTORS
-			  >= bio_end_sector(bio)) ||
-			 (conf->next_resync + NEXT_NORMALIO_DISTANCE
-			  <= bio->bi_iter.bi_sector))
+		if ((conf->mddev->curr_resync_completed
+		     >= bio_end_sector(bio)) ||
+		    (conf->next_resync + NEXT_NORMALIO_DISTANCE
+		     <= bio->bi_iter.bi_sector))
 			wait = false;
 		else
 			wait = true;
@@ -918,7 +916,7 @@ static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
 
 	if (bio && bio_data_dir(bio) == WRITE) {
 		if (bio->bi_iter.bi_sector >=
-		    conf->next_resync) {
+		    conf->mddev->curr_resync_completed) {
 			if (conf->start_next_window == MaxSector)
 				conf->start_next_window =
 					conf->next_resync +
* [PATCH 3/5] md/raid1: make sure resync waits for conflicting writes to complete.
From: NeilBrown @ 2014-09-10  6:20 UTC
To: Eivind Sarto, Brassow Jonathan; +Cc: linux-raid, majianpeng

The resync/recovery process for raid1 was recently changed so that writes
could happen in parallel with resync providing they were in different
regions of the device.

There is a problem though:  While a write request will always wait for
conflicting resync to complete, a resync request will *not* always wait
for conflicting writes to complete.

Two changes are needed to fix this:

1/ raise_barrier (which waits until it is safe to do resync) must wait
   until current_window_requests is zero
2/ wait_barrier (which waits at the start of a new write request) must
   update current_window_requests if the request could possibly conflict
   with a concurrent resync.

As concurrent writes and resync can lead to data loss, this patch is
suitable for -stable.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
Cc: stable@vger.kernel.org (v3.13+)
Cc: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/raid1.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 65afd8aa6608..86753358ea46 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -845,10 +845,12 @@ static void raise_barrier(struct r1conf *conf)
 	 * C: next_resync + RESYNC_SECTORS > start_next_window, meaning
 	 *    next resync will reach to the window which normal bios are
 	 *    handling.
+	 * D: while there are any active requests in the current window.
 	 */
 	wait_event_lock_irq(conf->wait_barrier,
 			    !conf->array_frozen &&
 			    conf->barrier < RESYNC_DEPTH &&
+			    conf->current_window_requests == 0 &&
 			    (conf->start_next_window >=
 			     conf->next_resync + RESYNC_SECTORS),
 			    conf->resync_lock);
@@ -915,8 +917,8 @@ static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
 	}
 
 	if (bio && bio_data_dir(bio) == WRITE) {
-		if (conf->next_resync + NEXT_NORMALIO_DISTANCE
-		    <= bio->bi_iter.bi_sector) {
+		if (bio->bi_iter.bi_sector >=
+		    conf->next_resync) {
 			if (conf->start_next_window == MaxSector)
 				conf->start_next_window =
 					conf->next_resync +
* [PATCH 1/5] md/raid1: be more cautious where we read-balance during resync.
From: NeilBrown @ 2014-09-10  6:20 UTC
To: Eivind Sarto, Brassow Jonathan; +Cc: linux-raid, majianpeng

commit 79ef3a8aa1cb1523cc231c9a90a278333c21f761 made it possible for reads
to happen concurrently with resync.  This means that we need to be more
careful where read_balancing is allowed during resync - we can no longer
be sure that any resync that has already started will definitely finish.

So keep read_balancing to before recovery_cp, which is conservative but
safe.

This bug makes it possible to read from a device that doesn't have
up-to-date data, so it can cause data corruption.  So it is suitable for
any kernel since 3.11.

Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
cc: stable@vger.kernel.org (v3.11+)
Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/raid1.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d7690f86fdb9..f66f67e5a423 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -540,11 +540,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 	has_nonrot_disk = 0;
 	choose_next_idle = 0;
 
-	if (conf->mddev->recovery_cp < MaxSector &&
-	    (this_sector + sectors >= conf->next_resync))
-		choose_first = 1;
-	else
-		choose_first = 0;
+	choose_first = (conf->mddev->recovery_cp < this_sector + sectors);
 
 	for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) {
 		sector_t dist;
* [PATCH 5/5] md/raid1: update next_resync under resync_lock.
From: NeilBrown @ 2014-09-10  6:20 UTC
To: Eivind Sarto, Brassow Jonathan; +Cc: linux-raid, majianpeng

raise_barrier() uses next_resync as part of its calculations, so it
really should be updated first, instead of afterwards.

next_resync is always used under resync_lock, so update it under
resync_lock too, just before it is used.  That is safest.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/raid1.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 08a6989fcff6..53e163d31750 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -827,7 +827,7 @@ static void flush_pending_writes(struct r1conf *conf)
  * there is no normal IO happeing.  It must arrange to call
  * lower_barrier when the particular background IO completes.
  */
-static void raise_barrier(struct r1conf *conf)
+static void raise_barrier(struct r1conf *conf, sector_t sector_nr)
 {
 	spin_lock_irq(&conf->resync_lock);
 
@@ -837,6 +837,7 @@ static void raise_barrier(struct r1conf *conf)
 
 	/* block any new IO from starting */
 	conf->barrier++;
+	conf->next_resync = sector_nr;
 
 	/* For these conditions we must wait:
 	 * A: while the array is in frozen state
@@ -2542,9 +2543,8 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
 	bitmap_cond_end_sync(mddev->bitmap, sector_nr);
 
 	r1_bio = mempool_alloc(conf->r1buf_pool, GFP_NOIO);
-	raise_barrier(conf);
 
-	conf->next_resync = sector_nr;
+	raise_barrier(conf, sector_nr);
 
 	rcu_read_lock();
 	/*
* Re: [PATCH 0/5] Fixes for RAID1 resync
From: Brassow Jonathan @ 2014-09-11  3:45 UTC
To: NeilBrown; +Cc: Eivind Sarto, linux-raid, majianpeng

On Sep 10, 2014, at 1:20 AM, NeilBrown wrote:

> Jon: could you test with these patches on top of what you
> have just in case something happens to fix the problem without
> me realising it?

I'm on it.  The test is running.  I'll know later tomorrow.

 brassow
* Re: [PATCH 0/5] Fixes for RAID1 resync 2014-09-11 3:45 ` [PATCH 0/5] Fixes for RAID1 resync Brassow Jonathan @ 2014-09-11 17:12 ` Brassow Jonathan 2014-09-15 3:30 ` NeilBrown 0 siblings, 1 reply; 13+ messages in thread From: Brassow Jonathan @ 2014-09-11 17:12 UTC (permalink / raw) To: Brassow Jonathan; +Cc: NeilBrown, Eivind Sarto, linux-raid, majianpeng On Sep 10, 2014, at 10:45 PM, Brassow Jonathan wrote: > > On Sep 10, 2014, at 1:20 AM, NeilBrown wrote: > >> >> Jon: could you test with these patches on top of what you >> have just in case something happens to fix the problem without >> me realising it? > > I'm on it. The test is running. I'll know later tomorrow. > > brassow The test is still failing from here. I grabbed 3.17.0-rc4, added the 5 patches, and got the attached backtraces when testing. As I said, the hangs are not exactly the same. This set shows the mdX_raid1 thread in the middle of handling a read failure. brassow Sep 11 07:48:02 bp-01 kernel: INFO: task dmeventd:27071 blocked for more than 12 0 seconds. Sep 11 07:48:02 bp-01 kernel: Tainted: G E 3.17.0-rc4 #1 Sep 11 07:48:02 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 11 07:48:02 bp-01 kernel: dmeventd D 0000000000000003 0 27071 1 0x00000080 Sep 11 07:48:02 bp-01 kernel: ffff8804038efae8 0000000000000082 ffff8800dbccf460 ffff88021721c0d0 Sep 11 07:48:02 bp-01 kernel: ffff8804038ec010 0000000000012bc0 0000000000012bc0 ffff88041432b180 Sep 11 07:48:02 bp-01 kernel: ffff8804038efb28 ffff88021fa72bc0 ffff88041432b180 ffff88041432b180 Sep 11 07:48:02 bp-01 kernel: Call Trace: Sep 11 07:48:02 bp-01 kernel: [<ffffffff81580999>] schedule+0x29/0x70 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81580a6c>] io_schedule+0x8c/0xd0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811d0044>] dio_await_completion+0x54/0xd0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811d264a>] do_blockdev_direct_IO+0x7fa/0xbd0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81143655>] ? pagevec_lookup_tag+0x25/0x40 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81141327>] ? write_cache_pages+0x147/0x510 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811cd850>] ? I_BDEV+0x10/0x10 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811d2a6c>] __blockdev_direct_IO+0x4c/0x50 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811cd850>] ? I_BDEV+0x10/0x10 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811ce7ee>] blkdev_direct_IO+0x4e/0x50 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811cd850>] ? I_BDEV+0x10/0x10 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811376e3>] generic_file_read_iter+0x143/0x150 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811cdab7>] blkdev_read_iter+0x37/0x40 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811977af>] new_sync_read+0x8f/0xc0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81197cb3>] vfs_read+0xa3/0x110 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811b47a3>] ? __fdget+0x13/0x20 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81198266>] SyS_read+0x56/0xd0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff810ec4e6>] ? __audit_syscall_exit+0x216/0x2c0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81584612>] system_call_fastpath+0x16/0x1b Sep 11 07:48:02 bp-01 kernel: INFO: task kworker/u129:4:24399 blocked for more than 120 seconds. Sep 11 07:48:02 bp-01 kernel: Tainted: G E 3.17.0-rc4 #1 Sep 11 07:48:02 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Sep 11 07:48:02 bp-01 kernel: kworker/u129:4 D 0000000000000000 0 24399 2 0x00000080 Sep 11 07:48:02 bp-01 kernel: Workqueue: writeback bdi_writeback_workfn (flush-253:16) Sep 11 07:48:02 bp-01 kernel: ffff8801fc8bb468 0000000000000046 0000000100000000 ffffffff81a19480 Sep 11 07:48:02 bp-01 kernel: ffff8801fc8b8010 0000000000012bc0 0000000000012bc0 ffff88021721ce40 Sep 11 07:48:02 bp-01 kernel: ffff8801fc8bb558 ffff880414f9e940 ffff880414f9e9b8 ffff8801fc8bb498 Sep 11 07:48:02 bp-01 kernel: Call Trace: Sep 11 07:48:02 bp-01 kernel: [<ffffffff81580999>] schedule+0x29/0x70 Sep 11 07:48:02 bp-01 kernel: [<ffffffffa0408e9d>] wait_barrier+0xbd/0x230 [raid1] Sep 11 07:48:02 bp-01 kernel: [<ffffffff8108eb00>] ? bit_waitqueue+0xe0/0xe0 Sep 11 07:48:02 bp-01 kernel: [<ffffffffa040baaa>] make_request+0x9a/0xc00 [raid1] Sep 11 07:48:02 bp-01 kernel: [<ffffffff81137ff0>] ? mempool_alloc+0x60/0x170 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81137e95>] ? mempool_alloc_slab+0x15/0x20 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81137ff0>] ? mempool_alloc+0x60/0x170 Sep 11 07:48:02 bp-01 kernel: [<ffffffffa0423018>] raid_map+0x18/0x20 [dm_raid] Sep 11 07:48:02 bp-01 kernel: [<ffffffffa000336a>] __map_bio+0x4a/0x120 [dm_mod] Sep 11 07:48:02 bp-01 kernel: [<ffffffffa0003723>] __clone_and_map_data_bio+0x113/0x130 [dm_mod] Sep 11 07:48:02 bp-01 kernel: [<ffffffffa00037ac>] __split_and_process_non_flush+0x6c/0xb0 [dm_mod] Sep 11 07:48:02 bp-01 kernel: [<ffffffffa0003991>] __split_and_process_bio+0x1a1/0x200 [dm_mod] Sep 11 07:48:02 bp-01 kernel: [<ffffffffa0003b12>] _dm_request+0x122/0x190 [dm_mod] Sep 11 07:48:02 bp-01 kernel: [<ffffffffa0003ba8>] dm_request+0x28/0x40 [dm_mod] Sep 11 07:48:02 bp-01 kernel: [<ffffffff81269620>] generic_make_request+0xc0/0x100 Sep 11 07:48:02 bp-01 kernel: [<ffffffff812696d1>] submit_bio+0x71/0x140 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811c9686>] _submit_bh+0x146/0x220 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811c9770>] submit_bh+0x10/0x20 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811ccf93>] __block_write_full_page.clone.0+0x1a3/0x340 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811cd850>] ? I_BDEV+0x10/0x10 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811cd850>] ? I_BDEV+0x10/0x10 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811cd306>] block_write_full_page+0xc6/0x100 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811ce8f8>] blkdev_writepage+0x18/0x20 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81140067>] __writepage+0x17/0x50 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81141424>] write_cache_pages+0x244/0x510 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81140050>] ? set_page_dirty+0x60/0x60 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81141741>] generic_writepages+0x51/0x80 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81141790>] do_writepages+0x20/0x40 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811bfee9>] __writeback_single_inode+0x49/0x230 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811c3329>] writeback_sb_inodes+0x249/0x360 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811c34de>] __writeback_inodes_wb+0x9e/0xd0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811c370b>] wb_writeback+0x1fb/0x2c0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811c3976>] wb_do_writeback+0x1a6/0x1f0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff811c3a30>] bdi_writeback_workfn+0x70/0x210 Sep 11 07:48:02 bp-01 kernel: [<ffffffff8106b762>] process_one_work+0x182/0x450 Sep 11 07:48:02 bp-01 kernel: [<ffffffff8106bb4f>] worker_thread+0x11f/0x3c0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff8106ba30>] ? 
process_one_work+0x450/0x450 Sep 11 07:48:02 bp-01 kernel: [<ffffffff8107083e>] kthread+0xce/0xf0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81070770>] ? kthread_freezable_should_stop+0x70/0x70 Sep 11 07:48:02 bp-01 kernel: [<ffffffff8158456c>] ret_from_fork+0x7c/0xb0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81070770>] ? kthread_freezable_should_stop+0x70/0x70 Sep 11 07:48:02 bp-01 kernel: INFO: task mdX_raid1:27151 blocked for more than 120 seconds. Sep 11 07:48:02 bp-01 kernel: Tainted: G E 3.17.0-rc4 #1 Sep 11 07:48:02 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 11 07:48:02 bp-01 kernel: mdX_raid1 D 0000000000000002 0 27151 2 0x00000080 Sep 11 07:48:02 bp-01 kernel: ffff880415bebc88 0000000000000046 0000000000000296 ffff880217260f00 Sep 11 07:48:02 bp-01 kernel: ffff880415be8010 0000000000012bc0 0000000000012bc0 ffff8803fc6982d0 Sep 11 07:48:02 bp-01 kernel: ffff8800de22bd20 ffff880414f9e940 ffff880414f9e9b8 ffff880415bebca8 Sep 11 07:48:02 bp-01 kernel: Call Trace: Sep 11 07:48:02 bp-01 kernel: [<ffffffff81580999>] schedule+0x29/0x70 Sep 11 07:48:02 bp-01 kernel: [<ffffffffa0408d34>] freeze_array+0x74/0xc0 [raid1] Sep 11 07:48:02 bp-01 kernel: [<ffffffff8108eb00>] ? bit_waitqueue+0xe0/0xe0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff8126a7a3>] ? blk_queue_bio+0x143/0x320 Sep 11 07:48:02 bp-01 kernel: [<ffffffffa040a14d>] handle_read_error+0x3d/0x300 [raid1] Sep 11 07:48:02 bp-01 kernel: [<ffffffff81269620>] ? generic_make_request+0xc0/0x100 Sep 11 07:48:02 bp-01 kernel: [<ffffffffa040841b>] ? sync_request_write+0xab/0x1a0 [raid1] Sep 11 07:48:02 bp-01 kernel: [<ffffffffa040a515>] raid1d+0x105/0x170 [raid1] Sep 11 07:48:02 bp-01 kernel: [<ffffffff81460e76>] md_thread+0x116/0x150 Sep 11 07:48:02 bp-01 kernel: [<ffffffff8108eb00>] ? bit_waitqueue+0xe0/0xe0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81460d60>] ? md_rdev_init+0x110/0x110 Sep 11 07:48:02 bp-01 kernel: [<ffffffff8107083e>] kthread+0xce/0xf0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81070770>] ? kthread_freezable_should_stop+0x70/0x70 Sep 11 07:48:02 bp-01 kernel: [<ffffffff8158456c>] ret_from_fork+0x7c/0xb0 Sep 11 07:48:02 bp-01 kernel: [<ffffffff81070770>] ? kthread_freezable_should_stop+0x70/0x70 Sep 11 07:48:03 bp-01 kernel: INFO: task mdX_resync:27154 blocked for more than 120 seconds. Sep 11 07:48:03 bp-01 kernel: Tainted: G E 3.17.0-rc4 #1 Sep 11 07:48:03 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 11 07:48:03 bp-01 kernel: mdX_resync D 0000000000000004 0 27154 2 0x00000080 Sep 11 07:48:03 bp-01 kernel: ffff880405777c58 0000000000000046 ffff880405777bf8 ffff88021726cf40 Sep 11 07:48:03 bp-01 kernel: ffff880405774010 0000000000012bc0 0000000000012bc0 ffff88040284f180 Sep 11 07:48:03 bp-01 kernel: ffff88041400323c ffff880405777db8 ffff88041400323c ffff880414003010 Sep 11 07:48:03 bp-01 kernel: Call Trace: Sep 11 07:48:03 bp-01 kernel: [<ffffffff81580999>] schedule+0x29/0x70 Sep 11 07:48:03 bp-01 kernel: [<ffffffff814608e7>] md_do_sync+0xac7/0xd40 Sep 11 07:48:03 bp-01 kernel: [<ffffffff8108eb00>] ? bit_waitqueue+0xe0/0xe0 Sep 11 07:48:03 bp-01 kernel: [<ffffffff81460e76>] md_thread+0x116/0x150 Sep 11 07:48:03 bp-01 kernel: [<ffffffff815804be>] ? __schedule+0x34e/0x6e0 Sep 11 07:48:03 bp-01 kernel: [<ffffffff81460d60>] ? md_rdev_init+0x110/0x110 Sep 11 07:48:03 bp-01 kernel: [<ffffffff8107083e>] kthread+0xce/0xf0 Sep 11 07:48:03 bp-01 kernel: [<ffffffff81580999>] ? schedule+0x29/0x70 Sep 11 07:48:03 bp-01 kernel: [<ffffffff81070770>] ? 
kthread_freezable_should_stop+0x70/0x70 Sep 11 07:48:03 bp-01 kernel: [<ffffffff8158456c>] ret_from_fork+0x7c/0xb0 Sep 11 07:48:03 bp-01 kernel: [<ffffffff81070770>] ? kthread_freezable_should_stop+0x70/0x70 Sep 11 07:48:03 bp-01 kernel: INFO: task kjournald:27205 blocked for more than 120 seconds. Sep 11 07:48:03 bp-01 kernel: Tainted: G E 3.17.0-rc4 #1 Sep 11 07:48:03 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 11 07:48:03 bp-01 kernel: kjournald D 0000000000000000 0 27205 2 0x00000080 Sep 11 07:48:03 bp-01 kernel: ffff8803f5c07868 0000000000000046 ffff8803f5c07988 ffffffff81a19480 Sep 11 07:48:03 bp-01 kernel: ffff8803f5c04010 0000000000012bc0 0000000000012bc0 ffff8803f5774ec0 Sep 11 07:48:03 bp-01 kernel: ffff880216e06000 ffff880414f9e940 ffff880414f9e9b8 ffff8803f5c07898 Sep 11 07:48:03 bp-01 kernel: Call Trace: Sep 11 07:48:03 bp-01 kernel: [<ffffffff81580999>] schedule+0x29/0x70 Sep 11 07:48:03 bp-01 kernel: [<ffffffffa0408e9d>] wait_barrier+0xbd/0x230 [raid1] Sep 11 07:48:03 bp-01 kernel: [<ffffffff8108eb00>] ? bit_waitqueue+0xe0/0xe0 Sep 11 07:48:03 bp-01 kernel: [<ffffffffa040baaa>] make_request+0x9a/0xc00 [raid1] Sep 11 07:48:03 bp-01 kernel: [<ffffffff81137ff0>] ? mempool_alloc+0x60/0x170 Sep 11 07:48:03 bp-01 kernel: [<ffffffffa0423018>] raid_map+0x18/0x20 [dm_raid] Sep 11 07:48:03 bp-01 kernel: [<ffffffffa000336a>] __map_bio+0x4a/0x120 [dm_mod] Sep 11 07:48:03 bp-01 kernel: [<ffffffffa0003723>] __clone_and_map_data_bio+0x113/0x130 [dm_mod] Sep 11 07:48:03 bp-01 kernel: [<ffffffffa00037ac>] __split_and_process_non_flush+0x6c/0xb0 [dm_mod] Sep 11 07:48:03 bp-01 kernel: [<ffffffffa0003991>] __split_and_process_bio+0x1a1/0x200 [dm_mod] Sep 11 07:48:03 bp-01 kernel: [<ffffffffa0003b12>] _dm_request+0x122/0x190 [dm_mod] Sep 11 07:48:03 bp-01 kernel: [<ffffffffa0003ba8>] dm_request+0x28/0x40 [dm_mod] Sep 11 07:48:03 bp-01 kernel: [<ffffffff81269620>] generic_make_request+0xc0/0x100 Sep 11 07:48:03 bp-01 kernel: [<ffffffff812696d1>] submit_bio+0x71/0x140 Sep 11 07:48:03 bp-01 kernel: [<ffffffff811c9686>] _submit_bh+0x146/0x220 Sep 11 07:48:03 bp-01 kernel: [<ffffffffa042e553>] journal_do_submit_data+0x43/0x60 [jbd] Sep 11 07:48:03 bp-01 kernel: [<ffffffffa042ea12>] journal_submit_data_buffers+0x202/0x2f0 [jbd] Sep 11 07:48:03 bp-01 kernel: [<ffffffffa042eda6>] journal_commit_transaction+0x2a6/0xf80 [jbd] Sep 11 07:48:03 bp-01 kernel: [<ffffffff8108847f>] ? put_prev_entity+0x2f/0x400 Sep 11 07:48:03 bp-01 kernel: [<ffffffff810b21bb>] ? try_to_del_timer_sync+0x5b/0x70 Sep 11 07:48:03 bp-01 kernel: [<ffffffffa0432ae1>] kjournald+0xf1/0x270 [jbd] Sep 11 07:48:03 bp-01 kernel: [<ffffffff8108eb00>] ? bit_waitqueue+0xe0/0xe0 Sep 11 07:48:03 bp-01 kernel: [<ffffffffa04329f0>] ? commit_timeout+0x10/0x10 [jbd] Sep 11 07:48:03 bp-01 kernel: [<ffffffff8107083e>] kthread+0xce/0xf0 Sep 11 07:48:03 bp-01 kernel: [<ffffffff81070770>] ? kthread_freezable_should_stop+0x70/0x70 Sep 11 07:48:03 bp-01 kernel: [<ffffffff8158456c>] ret_from_fork+0x7c/0xb0 Sep 11 07:48:03 bp-01 kernel: [<ffffffff81070770>] ? kthread_freezable_should_stop+0x70/0x70 Sep 11 08:40:01 bp-01 auditd[1981]: Audit daemon rotating log files ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/5] Fixes for RAID1 resync
From: NeilBrown @ 2014-09-15  3:30 UTC
To: Brassow Jonathan; +Cc: Eivind Sarto, linux-raid, majianpeng

On Thu, 11 Sep 2014 12:12:01 -0500 Brassow Jonathan <jbrassow@redhat.com>
wrote:

> On Sep 10, 2014, at 10:45 PM, Brassow Jonathan wrote:
>
> > On Sep 10, 2014, at 1:20 AM, NeilBrown wrote:
> >
> >> Jon: could you test with these patches on top of what you
> >> have just in case something happens to fix the problem without
> >> me realising it?
> >
> > I'm on it.  The test is running.  I'll know later tomorrow.
> >
> > brassow
>
> The test is still failing from here.  I grabbed 3.17.0-rc4, added the 5
> patches, and got the attached backtraces when testing.  As I said, the
> hangs are not exactly the same.  This set shows the mdX_raid1 thread in
> the middle of handling a read failure.

Thanks.
mdX_raid1 is blocked in freeze_array.
That could be caused by conf->nr_pending not aligning properly with
conf->nr_queued.

Both normal IO and resync IO can be retried with reschedule_retry()
and so be counted into ->nr_queued, but only normal IO gets counted in
->nr_pending.

Previously we could only possibly have one or the other, and when handling
a read failure it could only be normal IO.  But now that the two types can
interleave, we can have both normal and resync IO requests queued, so we
need to count them both in nr_pending.

So the following patch might help.

How complicated are your test scripts?  Could you send them to me so I can
try too?

Thanks,
NeilBrown

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 888dbdfb6986..6a9c73435eb8 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -856,6 +856,7 @@ static void raise_barrier(struct r1conf *conf, sector_t sector_nr)
 			     conf->next_resync + RESYNC_SECTORS),
 			    conf->resync_lock);
 
+	conf->nr_pending++;
 	spin_unlock_irq(&conf->resync_lock);
 }
 
@@ -865,6 +866,7 @@ static void lower_barrier(struct r1conf *conf)
 	BUG_ON(conf->barrier <= 0);
 	spin_lock_irqsave(&conf->resync_lock, flags);
 	conf->barrier--;
+	conf->nr_pending--;
 	spin_unlock_irqrestore(&conf->resync_lock, flags);
 	wake_up(&conf->wait_barrier);
 }
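For readers following the hang analysis above: freeze_array() blocks until
the number of requests it knows to be in flight (nr_pending) matches the
number parked for retry (nr_queued, plus some slack).  A simplified sketch
of that style of wait is below; the function body, the 'extra' slack
argument and the exact condition are reconstructions for illustration, not
the verbatim kernel source.

static void freeze_array_sketch(struct r1conf *conf, int extra)
{
	spin_lock_irq(&conf->resync_lock);
	conf->array_frozen = 1;
	/* Wait until every request counted in nr_pending has either
	 * completed or been parked on the retry list (nr_queued).  If
	 * resync IO is added to nr_queued without ever being counted in
	 * nr_pending, this condition can never become true and the
	 * mdX_raid1 thread hangs here -- the imbalance the patch above
	 * tries to address. */
	wait_event_lock_irq_cmd(conf->wait_barrier,
				conf->nr_pending == conf->nr_queued + extra,
				conf->resync_lock,
				flush_pending_writes(conf));
	spin_unlock_irq(&conf->resync_lock);
}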
* Re: [PATCH 0/5] Fixes for RAID1 resync 2014-09-15 3:30 ` NeilBrown @ 2014-09-16 16:31 ` Brassow Jonathan 2014-09-18 7:48 ` NeilBrown 0 siblings, 1 reply; 13+ messages in thread From: Brassow Jonathan @ 2014-09-16 16:31 UTC (permalink / raw) To: NeilBrown; +Cc: Eivind Sarto, linux-raid, majianpeng On Sep 14, 2014, at 10:30 PM, NeilBrown wrote: > On Thu, 11 Sep 2014 12:12:01 -0500 Brassow Jonathan <jbrassow@redhat.com> > wrote: > >> >> On Sep 10, 2014, at 10:45 PM, Brassow Jonathan wrote: >> >>> >>> On Sep 10, 2014, at 1:20 AM, NeilBrown wrote: >>> >>>> >>>> Jon: could you test with these patches on top of what you >>>> have just in case something happens to fix the problem without >>>> me realising it? >>> >>> I'm on it. The test is running. I'll know later tomorrow. >>> >>> brassow >> >> The test is still failing from here. I grabbed 3.17.0-rc4, added the 5 patches, and got the attached backtraces when testing. As I said, the hangs are not exactly the same. This set shows the mdX_raid1 thread in the middle of handling a read failure. > > Thanks. > mdX_raid1 is blocked in freeze_array. > That could be caused by conf->nr_pending nor aligning properly with > conf->nr_queued. > > Both normal IO and resync IO can be retried with reschedule_retry() > and so be counted into ->nr_queued, but only normal IO gets counted in > ->nr_pending. > > Previously could could only possibly have on or the other and when handling > a read failure it could only be normal IO. But now that they two types can > interleave, we can have both normal and resync IO requests queued, so we need > to count them both in nr_pending. > > So the following patch might help. > > How complicated are your test scripts? Could you send them to me so I can > try too? > > Thanks, > NeilBrown > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 888dbdfb6986..6a9c73435eb8 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -856,6 +856,7 @@ static void raise_barrier(struct r1conf *conf, sector_t sector_nr) > conf->next_resync + RESYNC_SECTORS), > conf->resync_lock); > > + conf->nr_pending++; > spin_unlock_irq(&conf->resync_lock); > } > > @@ -865,6 +866,7 @@ static void lower_barrier(struct r1conf *conf) > BUG_ON(conf->barrier <= 0); > spin_lock_irqsave(&conf->resync_lock, flags); > conf->barrier--; > + conf->nr_pending--; > spin_unlock_irqrestore(&conf->resync_lock, flags); > wake_up(&conf->wait_barrier); > } No luck, it is failing faster than before. I haven't looked into this myself, but the dm-raid1.c code makes use of dm-region-hash.c which coordinates recovery and nominal I/O in a way that allows them to both occur in a simple, non-overlapping way. I'm not sure it would make sense to use that instead of this new approach. I have no idea how much effort that would be, but I could have someone look into it at some point if you think it might be interesting. brassow [-rc5 kernel with previous 5 patches, plus the one above] Sep 15 16:52:35 bp-01 kernel: INFO: task kworker/u129:2:21621 blocked for more than 120 seconds. Sep 15 16:52:35 bp-01 kernel: Tainted: G E 3.17.0-rc5 #1 Sep 15 16:52:35 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Sep 15 16:52:35 bp-01 kernel: kworker/u129:2 D 0000000000000001 0 21621 2 0x00000080 Sep 15 16:52:35 bp-01 kernel: Workqueue: writeback bdi_writeback_workfn (flush-253:11) Sep 15 16:52:35 bp-01 kernel: ffff8802040538c8 0000000000000046 0000000000000000 ffff880217254150 Sep 15 16:52:35 bp-01 kernel: ffff880204050010 0000000000012bc0 0000000000012bc0 ffff8802173c2490 Sep 15 16:52:35 bp-01 kernel: ffff880204053898 ffff88021fa32bc0 ffff8802173c2490 ffffffff81580a60 Sep 15 16:52:35 bp-01 kernel: Call Trace: Sep 15 16:52:35 bp-01 kernel: [<ffffffff81580a60>] ? yield_to+0x180/0x180 Sep 15 16:52:36 bp-01 kernel: [<ffffffff81580779>] schedule+0x29/0x70 Sep 15 16:52:36 bp-01 kernel: [<ffffffff8158084c>] io_schedule+0x8c/0xd0 Sep 15 16:52:36 bp-01 kernel: [<ffffffff81580a8c>] bit_wait_io+0x2c/0x50 Sep 15 16:52:36 bp-01 kernel: [<ffffffff81580d75>] __wait_on_bit+0x65/0x90 Sep 15 16:52:36 bp-01 kernel: [<ffffffff811350a4>] wait_on_page_bit+0xc4/0xd0 Sep 15 16:52:36 bp-01 kernel: [<ffffffff8108eb00>] ? wake_atomic_t_function+0x40/0x40 Sep 15 16:52:36 bp-01 kernel: [<ffffffff8114126a>] write_cache_pages+0x33a/0x510 Sep 15 16:52:36 bp-01 kernel: [<ffffffff8113fda0>] ? set_page_dirty+0x60/0x60 Sep 15 16:52:36 bp-01 kernel: [<ffffffff81141491>] generic_writepages+0x51/0x80 Sep 15 16:52:36 bp-01 kernel: [<ffffffff811414f5>] do_writepages+0x35/0x40 Sep 15 16:52:36 bp-01 kernel: [<ffffffff811bfbe9>] __writeback_single_inode+0x49/0x230 Sep 15 16:52:36 bp-01 kernel: [<ffffffff811c3029>] writeback_sb_inodes+0x249/0x360 Sep 15 16:52:36 bp-01 kernel: [<ffffffff811c3309>] wb_writeback+0xf9/0x2c0 Sep 15 16:52:36 bp-01 kernel: [<ffffffff811c3552>] wb_do_writeback+0x82/0x1f0 Sep 15 16:52:36 bp-01 kernel: [<ffffffff810790c6>] ? ttwu_queue+0x136/0x150 Sep 15 16:52:36 bp-01 kernel: [<ffffffff811c3730>] bdi_writeback_workfn+0x70/0x210 Sep 15 16:52:36 bp-01 kernel: [<ffffffff8106b5fe>] process_one_work+0x14e/0x430 Sep 15 16:52:36 bp-01 kernel: [<ffffffff8106b9ff>] worker_thread+0x11f/0x3c0 Sep 15 16:52:36 bp-01 kernel: [<ffffffff8106b8e0>] ? process_one_work+0x430/0x430 Sep 15 16:52:36 bp-01 kernel: [<ffffffff810707de>] kthread+0xce/0xf0 Sep 15 16:52:36 bp-01 kernel: [<ffffffff81070710>] ? kthread_freezable_should_stop+0x70/0x70 Sep 15 16:52:36 bp-01 kernel: [<ffffffff8158432c>] ret_from_fork+0x7c/0xb0 Sep 15 16:52:36 bp-01 kernel: [<ffffffff81070710>] ? kthread_freezable_should_stop+0x70/0x70 Sep 15 16:52:36 bp-01 kernel: INFO: task kjournald:26375 blocked for more than 120 seconds. Sep 15 16:52:36 bp-01 kernel: Tainted: G E 3.17.0-rc5 #1 Sep 15 16:52:36 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 15 16:52:36 bp-01 kernel: kjournald D 0000000000000004 0 26375 2 0x00000080 Sep 15 16:52:36 bp-01 kernel: ffff8804019dfb98 0000000000000046 000000000000003c ffff88021726cf40 Sep 15 16:52:36 bp-01 kernel: ffff8804019dc010 0000000000012bc0 0000000000012bc0 ffff880415632f00 Sep 15 16:52:36 bp-01 kernel: ffff8804019dfb68 ffff88021fa92bc0 ffff880415632f00 ffff8804019dfc50 Sep 15 16:52:36 bp-01 kernel: Call Trace: Sep 15 16:52:36 bp-01 kernel: [<ffffffff81580a60>] ? yield_to+0x180/0x180 Sep 15 16:52:36 bp-01 kernel: [<ffffffff81580779>] schedule+0x29/0x70 Sep 15 16:52:36 bp-01 kernel: [<ffffffff8158084c>] io_schedule+0x8c/0xd0 Sep 15 16:52:36 bp-01 kernel: [<ffffffff81580a8c>] bit_wait_io+0x2c/0x50 Sep 15 16:52:36 bp-01 kernel: [<ffffffff81580b76>] __wait_on_bit_lock+0x76/0xb0 Sep 15 16:52:36 bp-01 kernel: [<ffffffff81580a60>] ? 
yield_to+0x180/0x180 Sep 15 16:52:36 bp-01 kernel: [<ffffffff81580c28>] out_of_line_wait_on_bit_lock+0x78/0x90 Sep 15 16:52:36 bp-01 kernel: [<ffffffff8108eb00>] ? wake_atomic_t_function+0x40/0x40 Sep 15 16:52:36 bp-01 kernel: [<ffffffff811ca27e>] __lock_buffer+0x2e/0x30 Sep 15 16:52:36 bp-01 kernel: [<ffffffffa042eac0>] journal_submit_data_buffers+0x2b0/0x2f0 [jbd] Sep 15 16:52:36 bp-01 kernel: [<ffffffffa042eda6>] journal_commit_transaction+0x2a6/0xf80 [jbd] Sep 15 16:52:36 bp-01 kernel: [<ffffffff8108841f>] ? put_prev_entity+0x2f/0x400 Sep 15 16:52:36 bp-01 kernel: [<ffffffff810b211b>] ? try_to_del_timer_sync+0x5b/0x70 Sep 15 16:52:36 bp-01 kernel: [<ffffffffa0432ae1>] kjournald+0xf1/0x270 [jbd] Sep 15 16:52:36 bp-01 kernel: [<ffffffff8108ea70>] ? bit_waitqueue+0xb0/0xb0 Sep 15 16:52:36 bp-01 kernel: [<ffffffffa04329f0>] ? commit_timeout+0x10/0x10 [jbd] Sep 15 16:52:36 bp-01 kernel: [<ffffffff810707de>] kthread+0xce/0xf0 Sep 15 16:52:36 bp-01 kernel: [<ffffffff81070710>] ? kthread_freezable_should_stop+0x70/0x70 Sep 15 16:52:36 bp-01 kernel: [<ffffffff8158432c>] ret_from_fork+0x7c/0xb0 Sep 15 16:52:36 bp-01 kernel: [<ffffffff81070710>] ? kthread_freezable_should_stop+0x70/0x70 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/5] Fixes for RAID1 resync 2014-09-16 16:31 ` Brassow Jonathan @ 2014-09-18 7:48 ` NeilBrown 2014-09-24 4:25 ` Brassow Jonathan 0 siblings, 1 reply; 13+ messages in thread From: NeilBrown @ 2014-09-18 7:48 UTC (permalink / raw) To: Brassow Jonathan; +Cc: Eivind Sarto, linux-raid, majianpeng [-- Attachment #1: Type: text/plain, Size: 4707 bytes --] On Tue, 16 Sep 2014 11:31:26 -0500 Brassow Jonathan <jbrassow@redhat.com> wrote: > > On Sep 14, 2014, at 10:30 PM, NeilBrown wrote: > > > On Thu, 11 Sep 2014 12:12:01 -0500 Brassow Jonathan <jbrassow@redhat.com> > > wrote: > > > >> > >> On Sep 10, 2014, at 10:45 PM, Brassow Jonathan wrote: > >> > >>> > >>> On Sep 10, 2014, at 1:20 AM, NeilBrown wrote: > >>> > >>>> > >>>> Jon: could you test with these patches on top of what you > >>>> have just in case something happens to fix the problem without > >>>> me realising it? > >>> > >>> I'm on it. The test is running. I'll know later tomorrow. > >>> > >>> brassow > >> > >> The test is still failing from here. I grabbed 3.17.0-rc4, added the 5 patches, and got the attached backtraces when testing. As I said, the hangs are not exactly the same. This set shows the mdX_raid1 thread in the middle of handling a read failure. > > > > Thanks. > > mdX_raid1 is blocked in freeze_array. > > That could be caused by conf->nr_pending nor aligning properly with > > conf->nr_queued. > > > > Both normal IO and resync IO can be retried with reschedule_retry() > > and so be counted into ->nr_queued, but only normal IO gets counted in > > ->nr_pending. > > > > Previously could could only possibly have on or the other and when handling > > a read failure it could only be normal IO. But now that they two types can > > interleave, we can have both normal and resync IO requests queued, so we need > > to count them both in nr_pending. > > > > So the following patch might help. > > > > How complicated are your test scripts? Could you send them to me so I can > > try too? > > > > Thanks, > > NeilBrown > > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > > index 888dbdfb6986..6a9c73435eb8 100644 > > --- a/drivers/md/raid1.c > > +++ b/drivers/md/raid1.c > > @@ -856,6 +856,7 @@ static void raise_barrier(struct r1conf *conf, sector_t sector_nr) > > conf->next_resync + RESYNC_SECTORS), > > conf->resync_lock); > > > > + conf->nr_pending++; > > spin_unlock_irq(&conf->resync_lock); > > } > > > > @@ -865,6 +866,7 @@ static void lower_barrier(struct r1conf *conf) > > BUG_ON(conf->barrier <= 0); > > spin_lock_irqsave(&conf->resync_lock, flags); > > conf->barrier--; > > + conf->nr_pending--; > > spin_unlock_irqrestore(&conf->resync_lock, flags); > > wake_up(&conf->wait_barrier); > > } > > No luck, it is failing faster than before. > > I haven't looked into this myself, but the dm-raid1.c code makes use of dm-region-hash.c which coordinates recovery and nominal I/O in a way that allows them to both occur in a simple, non-overlapping way. I'm not sure it would make sense to use that instead of this new approach. I have no idea how much effort that would be, but I could have someone look into it at some point if you think it might be interesting. > Hi Jon, I can see the appeal of using known-working code, but there is every chance that we would break it when plugging it into md ;-) I've found another bug.... it is a very subtle one and it has been around since before the patch you bisected to so it probably isn't your bug. It also only affects array with bad-blocks listed. 
The patch is below but I very much doubt testing will show any change... I'll keep looking..... oh, found one. This one looks more convincing. If memory is short, make_request() will allocate an r1bio from the mempool rather than from the slab. That r1bio won't have just been zeroed. This is mostly OK as we initialise all the fields that aren't left in a clean state ... except ->start_next_window. We initialise that for write requests, but not for read. So when we use a mempool-allocated r1bio that was previously used for write and had ->start_next_window set, and is now used for read, then things will go wrong. So this patch definitely is worth testing. Thanks for your continued patience in testing!!! Thanks, NeilBrown diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index a95f9e179e6f..7187d9b8431f 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1185,6 +1185,7 @@ read_again: atomic_read(&bitmap->behind_writes) == 0); } r1_bio->read_disk = rdisk; + r1_bio->start_next_window = 0; read_bio = bio_clone_mddev(bio, GFP_NOIO, mddev); bio_trim(read_bio, r1_bio->sector - bio->bi_iter.bi_sector, @@ -1444,6 +1445,7 @@ read_again: r1_bio->state = 0; r1_bio->mddev = mddev; r1_bio->sector = bio->bi_iter.bi_sector + sectors_handled; + start_next_window = wait_barrier(conf, bio); goto retry_write; } [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply related [flat|nested] 13+ messages in thread
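The second bug described above hinges on mempool_alloc() returning
recycled objects that are not zeroed.  A condensed sketch of the read path
being patched follows; the field names are taken from the diffs quoted in
this thread, while the pool name and the abbreviated surroundings are
assumptions rather than the actual make_request() code.

	/* An r1bio recycled from the mempool may last have served a write,
	 * so every field the read path relies on must be set explicitly. */
	r1_bio = mempool_alloc(conf->r1bio_pool, GFP_NOIO);
	r1_bio->state = 0;
	r1_bio->mddev = mddev;
	r1_bio->sector = bio->bi_iter.bi_sector;
	...
	r1_bio->read_disk = rdisk;
	/* The fix: without this, a stale ->start_next_window left over from
	 * a previous write request leaks into the read path's window
	 * accounting. */
	r1_bio->start_next_window = 0;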
* Re: [PATCH 0/5] Fixes for RAID1 resync 2014-09-18 7:48 ` NeilBrown @ 2014-09-24 4:25 ` Brassow Jonathan 2014-09-24 4:49 ` NeilBrown 0 siblings, 1 reply; 13+ messages in thread From: Brassow Jonathan @ 2014-09-24 4:25 UTC (permalink / raw) To: NeilBrown; +Cc: Eivind Sarto, linux-raid, majianpeng On Sep 18, 2014, at 2:48 AM, NeilBrown wrote: > On Tue, 16 Sep 2014 11:31:26 -0500 Brassow Jonathan <jbrassow@redhat.com> > wrote: > >> >> On Sep 14, 2014, at 10:30 PM, NeilBrown wrote: >> >>> On Thu, 11 Sep 2014 12:12:01 -0500 Brassow Jonathan <jbrassow@redhat.com> >>> wrote: >>> >>>> >>>> On Sep 10, 2014, at 10:45 PM, Brassow Jonathan wrote: >>>> >>>>> >>>>> On Sep 10, 2014, at 1:20 AM, NeilBrown wrote: >>>>> >>>>>> >>>>>> Jon: could you test with these patches on top of what you >>>>>> have just in case something happens to fix the problem without >>>>>> me realising it? >>>>> >>>>> I'm on it. The test is running. I'll know later tomorrow. >>>>> >>>>> brassow >>>> >>>> The test is still failing from here. I grabbed 3.17.0-rc4, added the 5 patches, and got the attached backtraces when testing. As I said, the hangs are not exactly the same. This set shows the mdX_raid1 thread in the middle of handling a read failure. >>> >>> Thanks. >>> mdX_raid1 is blocked in freeze_array. >>> That could be caused by conf->nr_pending nor aligning properly with >>> conf->nr_queued. >>> >>> Both normal IO and resync IO can be retried with reschedule_retry() >>> and so be counted into ->nr_queued, but only normal IO gets counted in >>> ->nr_pending. >>> >>> Previously could could only possibly have on or the other and when handling >>> a read failure it could only be normal IO. But now that they two types can >>> interleave, we can have both normal and resync IO requests queued, so we need >>> to count them both in nr_pending. >>> >>> So the following patch might help. >>> >>> How complicated are your test scripts? Could you send them to me so I can >>> try too? >>> >>> Thanks, >>> NeilBrown >>> >>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c >>> index 888dbdfb6986..6a9c73435eb8 100644 >>> --- a/drivers/md/raid1.c >>> +++ b/drivers/md/raid1.c >>> @@ -856,6 +856,7 @@ static void raise_barrier(struct r1conf *conf, sector_t sector_nr) >>> conf->next_resync + RESYNC_SECTORS), >>> conf->resync_lock); >>> >>> + conf->nr_pending++; >>> spin_unlock_irq(&conf->resync_lock); >>> } >>> >>> @@ -865,6 +866,7 @@ static void lower_barrier(struct r1conf *conf) >>> BUG_ON(conf->barrier <= 0); >>> spin_lock_irqsave(&conf->resync_lock, flags); >>> conf->barrier--; >>> + conf->nr_pending--; >>> spin_unlock_irqrestore(&conf->resync_lock, flags); >>> wake_up(&conf->wait_barrier); >>> } >> >> No luck, it is failing faster than before. >> >> I haven't looked into this myself, but the dm-raid1.c code makes use of dm-region-hash.c which coordinates recovery and nominal I/O in a way that allows them to both occur in a simple, non-overlapping way. I'm not sure it would make sense to use that instead of this new approach. I have no idea how much effort that would be, but I could have someone look into it at some point if you think it might be interesting. >> > > Hi Jon, > I can see the appeal of using known-working code, but there is every chance > that we would break it when plugging it into md ;-) > > I've found another bug.... it is a very subtle one and it has been around > since before the patch you bisected to so it probably isn't your bug. > It also only affects array with bad-blocks listed. 
The patch is below > but I very much doubt testing will show any change... > > I'll keep looking..... oh, found one. This one looks more convincing. > If memory is short, make_request() will allocate an r1bio from the mempool > rather than from the slab. That r1bio won't have just been zeroed. > This is mostly OK as we initialise all the fields that aren't left in > a clean state ... except ->start_next_window. > We initialise that for write requests, but not for read. > So when we use a mempool-allocated r1bio that was previously used for > write and had ->start_next_window set, and is now used for read, > then things will go wrong. > So this patch definitely is worth testing. > Thanks for your continued patience in testing!!! > > > Thanks, > NeilBrown > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index a95f9e179e6f..7187d9b8431f 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -1185,6 +1185,7 @@ read_again: > atomic_read(&bitmap->behind_writes) == 0); > } > r1_bio->read_disk = rdisk; > + r1_bio->start_next_window = 0; > > read_bio = bio_clone_mddev(bio, GFP_NOIO, mddev); > bio_trim(read_bio, r1_bio->sector - bio->bi_iter.bi_sector, > @@ -1444,6 +1445,7 @@ read_again: > r1_bio->state = 0; > r1_bio->mddev = mddev; > r1_bio->sector = bio->bi_iter.bi_sector + sectors_handled; > + start_next_window = wait_barrier(conf, bio); > goto retry_write; > } Sorry, still not there yet. I'm sorry I haven't had more time to spend on this. I'll try to get some help (perhaps from Heinz) and see if we can pitch-in instead of making you do all the work. brassow Sep 23 05:01:39 bp-01 kernel: INFO: task kworker/u129:2:19096 blocked for more than 120 seconds. Sep 23 05:01:39 bp-01 kernel: Tainted: G E 3.17.0-rc6 #1 Sep 23 05:01:39 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 23 05:01:39 bp-01 kernel: kworker/u129:2 D 0000000000000002 0 19096 2 0x00000080 Sep 23 05:01:39 bp-01 kernel: Workqueue: writeback bdi_writeback_workfn (flush-253:29) Sep 23 05:01:39 bp-01 kernel: ffff8802079138c8 0000000000000046 ffffea0006f7b738 ffff880217260f00 Sep 23 05:01:39 bp-01 kernel: ffff880207910010 0000000000012bc0 0000000000012bc0 ffff880216aca0d0 Sep 23 05:01:39 bp-01 kernel: ffff880207913898 ffff88021fa52bc0 ffff880216aca0d0 ffff880207913980 Sep 23 05:01:39 bp-01 kernel: Call Trace: Sep 23 05:01:39 bp-01 kernel: [<ffffffff81580ac0>] ? yield_to+0x180/0x180 Sep 23 05:01:39 bp-01 kernel: [<ffffffff815807d9>] schedule+0x29/0x70 Sep 23 05:01:39 bp-01 kernel: [<ffffffff815808ac>] io_schedule+0x8c/0xd0 Sep 23 05:01:39 bp-01 kernel: [<ffffffff81580aec>] bit_wait_io+0x2c/0x50 Sep 23 05:01:39 bp-01 kernel: [<ffffffff81580bd6>] __wait_on_bit_lock+0x76/0xb0 Sep 23 05:01:39 bp-01 kernel: [<ffffffff811cd006>] ? block_write_full_page+0xc6/0x100 Sep 23 05:01:39 bp-01 kernel: [<ffffffff81134cd8>] __lock_page+0xa8/0xb0 Sep 23 05:01:39 bp-01 kernel: [<ffffffff8108eb30>] ? wake_atomic_t_function+0x40/0x40 Sep 23 05:01:39 bp-01 kernel: [<ffffffff81141258>] write_cache_pages+0x318/0x510 Sep 23 05:01:39 bp-01 kernel: [<ffffffff8113fdb0>] ? 
set_page_dirty+0x60/0x60 Sep 23 05:01:39 bp-01 kernel: [<ffffffff811414a1>] generic_writepages+0x51/0x80 Sep 23 05:01:39 bp-01 kernel: [<ffffffff81141505>] do_writepages+0x35/0x40 Sep 23 05:01:39 bp-01 kernel: [<ffffffff811bfbe9>] __writeback_single_inode+0x49/0x230 Sep 23 05:01:39 bp-01 kernel: [<ffffffff811c3029>] writeback_sb_inodes+0x249/0x360 Sep 23 05:01:39 bp-01 kernel: [<ffffffff811c3309>] wb_writeback+0xf9/0x2c0 Sep 23 05:01:39 bp-01 kernel: [<ffffffff811c3552>] wb_do_writeback+0x82/0x1f0 Sep 23 05:01:39 bp-01 kernel: [<ffffffff811c3730>] bdi_writeback_workfn+0x70/0x210 Sep 23 05:01:39 bp-01 kernel: [<ffffffff8106b62e>] process_one_work+0x14e/0x430 Sep 23 05:01:39 bp-01 kernel: [<ffffffff8106ba2f>] worker_thread+0x11f/0x3c0 Sep 23 05:01:39 bp-01 kernel: [<ffffffff8106b910>] ? process_one_work+0x430/0x430 Sep 23 05:01:39 bp-01 kernel: [<ffffffff8107080e>] kthread+0xce/0xf0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81070740>] ? kthread_freezable_should_stop+0x70/0x70 Sep 23 05:01:40 bp-01 kernel: [<ffffffff815843ac>] ret_from_fork+0x7c/0xb0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81070740>] ? kthread_freezable_should_stop+0x70/0x70 Sep 23 05:01:40 bp-01 kernel: INFO: task kjournald:21626 blocked for more than 120 seconds. Sep 23 05:01:40 bp-01 kernel: Tainted: G E 3.17.0-rc6 #1 Sep 23 05:01:40 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 23 05:01:40 bp-01 kernel: kjournald D 0000000000000002 0 21626 2 0x00000080 Sep 23 05:01:40 bp-01 kernel: ffff8802024bbc08 0000000000000046 ffff8802024bbbd8 ffff880217260f00 Sep 23 05:01:40 bp-01 kernel: ffff8802024b8010 0000000000012bc0 0000000000012bc0 ffff880214de9240 Sep 23 05:01:40 bp-01 kernel: ffff8802024bbc08 ffff88021fa52bc0 ffff880214de9240 ffffffff81580ac0 Sep 23 05:01:40 bp-01 kernel: Call Trace: Sep 23 05:01:40 bp-01 kernel: [<ffffffff81580ac0>] ? yield_to+0x180/0x180 Sep 23 05:01:40 bp-01 kernel: [<ffffffff815807d9>] schedule+0x29/0x70 Sep 23 05:01:40 bp-01 kernel: [<ffffffff815808ac>] io_schedule+0x8c/0xd0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81580aec>] bit_wait_io+0x2c/0x50 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81580dd5>] __wait_on_bit+0x65/0x90 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81580ac0>] ? yield_to+0x180/0x180 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81580e78>] out_of_line_wait_on_bit+0x78/0x90 Sep 23 05:01:40 bp-01 kernel: [<ffffffff8108eb30>] ? wake_atomic_t_function+0x40/0x40 Sep 23 05:01:40 bp-01 kernel: [<ffffffff811ca05e>] __wait_on_buffer+0x2e/0x30 Sep 23 05:01:40 bp-01 kernel: [<ffffffffa03f9372>] journal_commit_transaction+0x872/0xf80 [jbd] Sep 23 05:01:40 bp-01 kernel: [<ffffffff8108844f>] ? put_prev_entity+0x2f/0x400 Sep 23 05:01:40 bp-01 kernel: [<ffffffff8108eaa0>] ? bit_waitqueue+0xb0/0xb0 Sep 23 05:01:40 bp-01 kernel: [<ffffffffa03fcae1>] kjournald+0xf1/0x270 [jbd] Sep 23 05:01:40 bp-01 kernel: [<ffffffff8108eaa0>] ? bit_waitqueue+0xb0/0xb0 Sep 23 05:01:40 bp-01 kernel: [<ffffffffa03fc9f0>] ? commit_timeout+0x10/0x10 [jbd] Sep 23 05:01:40 bp-01 kernel: [<ffffffff8107080e>] kthread+0xce/0xf0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81070740>] ? kthread_freezable_should_stop+0x70/0x70 Sep 23 05:01:40 bp-01 kernel: [<ffffffff815843ac>] ret_from_fork+0x7c/0xb0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81070740>] ? kthread_freezable_should_stop+0x70/0x70 Sep 23 05:01:40 bp-01 kernel: INFO: task xdoio:21891 blocked for more than 120 seconds. 
Sep 23 05:01:40 bp-01 kernel: Tainted: G E 3.17.0-rc6 #1 Sep 23 05:01:40 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 23 05:01:40 bp-01 kernel: xdoio D 0000000000000001 0 21891 21890 0x00000080 Sep 23 05:01:40 bp-01 kernel: ffff880405d8f978 0000000000000082 ffff880414ec0c00 ffff880217254150 Sep 23 05:01:40 bp-01 kernel: ffff880405d8c010 0000000000012bc0 0000000000012bc0 ffff8803e7be9080 Sep 23 05:01:40 bp-01 kernel: ffff880405d8f958 ffff88021fa32bc0 ffff8803e7be9080 ffffffff81580ac0 Sep 23 05:01:40 bp-01 kernel: Call Trace: Sep 23 05:01:40 bp-01 kernel: [<ffffffff81580ac0>] ? yield_to+0x180/0x180 Sep 23 05:01:40 bp-01 kernel: [<ffffffff815807d9>] schedule+0x29/0x70 Sep 23 05:01:40 bp-01 kernel: [<ffffffff815808ac>] io_schedule+0x8c/0xd0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81580aec>] bit_wait_io+0x2c/0x50 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81580dd5>] __wait_on_bit+0x65/0x90 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81580ac0>] ? yield_to+0x180/0x180 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81580e78>] out_of_line_wait_on_bit+0x78/0x90 Sep 23 05:01:40 bp-01 kernel: [<ffffffff8108eb30>] ? wake_atomic_t_function+0x40/0x40 Sep 23 05:01:40 bp-01 kernel: [<ffffffff811ca05e>] __wait_on_buffer+0x2e/0x30 Sep 23 05:01:40 bp-01 kernel: [<ffffffff811cb440>] __sync_dirty_buffer+0xb0/0xd0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff811cb473>] sync_dirty_buffer+0x13/0x20 Sep 23 05:01:40 bp-01 kernel: [<ffffffffa03f7e12>] journal_dirty_data+0x1f2/0x260 [jbd] Sep 23 05:01:40 bp-01 kernel: [<ffffffffa0415200>] ext3_journal_dirty_data+0x20/0x50 [ext3] Sep 23 05:01:40 bp-01 kernel: [<ffffffffa0415255>] journal_dirty_data_fn+0x25/0x30 [ext3] Sep 23 05:01:40 bp-01 kernel: [<ffffffffa04148d4>] walk_page_buffers+0x84/0xc0 [ext3] Sep 23 05:01:40 bp-01 kernel: [<ffffffffa0415230>] ? ext3_journal_dirty_data+0x50/0x50 [ext3] Sep 23 05:01:40 bp-01 kernel: [<ffffffffa041897f>] ext3_ordered_write_end+0xaf/0x1e0 [ext3] Sep 23 05:01:40 bp-01 kernel: [<ffffffff81134742>] generic_perform_write+0x112/0x1c0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff8113789f>] __generic_file_write_iter+0x18f/0x390 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81137aee>] generic_file_write_iter+0x4e/0xd0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81197187>] do_iter_readv_writev+0x77/0xb0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff8122e343>] ? security_file_permission+0x23/0x90 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81137aa0>] ? __generic_file_write_iter+0x390/0x390 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81137aa0>] ? __generic_file_write_iter+0x390/0x390 Sep 23 05:01:40 bp-01 kernel: [<ffffffff811988a0>] do_readv_writev+0xd0/0x320 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81137aa0>] ? __generic_file_write_iter+0x390/0x390 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81197240>] ? do_sync_readv_writev+0x80/0x80 Sep 23 05:01:40 bp-01 kernel: [<ffffffff810c722f>] ? do_futex+0xaf/0x1b0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81198b31>] vfs_writev+0x41/0x50 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81198c66>] SyS_writev+0x56/0xf0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff810ec466>] ? __audit_syscall_exit+0x216/0x2c0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81584452>] system_call_fastpath+0x16/0x1b Sep 23 05:01:40 bp-01 kernel: INFO: task sync:21916 blocked for more than 120 seconds. Sep 23 05:01:40 bp-01 kernel: Tainted: G E 3.17.0-rc6 #1 Sep 23 05:01:40 bp-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Sep 23 05:01:40 bp-01 kernel: sync D 0000000000000001 0 21916 21915 0x00000080 Sep 23 05:01:40 bp-01 kernel: ffff88020902bd68 0000000000000086 ffff88020902bd28 ffff880217254150 Sep 23 05:01:40 bp-01 kernel: ffff880209028010 0000000000012bc0 0000000000012bc0 ffff880216c36dc0 Sep 23 05:01:40 bp-01 kernel: ffff88020902bda8 ffff88020902bec8 ffff88020902bed0 7fffffffffffffff Sep 23 05:01:40 bp-01 kernel: Call Trace: Sep 23 05:01:40 bp-01 kernel: [<ffffffff815807d9>] schedule+0x29/0x70 Sep 23 05:01:40 bp-01 kernel: [<ffffffff815832ed>] schedule_timeout+0x13d/0x1d0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff8106af82>] ? __queue_delayed_work+0xb2/0x1a0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81581756>] wait_for_completion+0xc6/0x100 Sep 23 05:01:40 bp-01 kernel: [<ffffffff8107e3a0>] ? try_to_wake_up+0x220/0x220 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81583c3b>] ? _raw_spin_unlock_bh+0x1b/0x20 Sep 23 05:01:40 bp-01 kernel: [<ffffffff811c74b0>] ? fdatawrite_one_bdev+0x20/0x20 Sep 23 05:01:40 bp-01 kernel: [<ffffffff811c058d>] sync_inodes_sb+0x9d/0xd0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff811c74cd>] sync_inodes_one_sb+0x1d/0x30 Sep 23 05:01:40 bp-01 kernel: [<ffffffff8119aab3>] iterate_supers+0xb3/0xe0 Sep 23 05:01:40 bp-01 kernel: [<ffffffff811c7515>] sys_sync+0x35/0x90 Sep 23 05:01:40 bp-01 kernel: [<ffffffff81584452>] system_call_fastpath+0x16/0x1b ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/5] Fixes for RAID1 resync
From: NeilBrown @ 2014-09-24  4:49 UTC
To: Brassow Jonathan; +Cc: Eivind Sarto, linux-raid, majianpeng

On Tue, 23 Sep 2014 23:25:12 -0500 Brassow Jonathan <jbrassow@redhat.com>
wrote:

> Sorry, still not there yet.

No??  I was *sure* I had it that time!

> I'm sorry I haven't had more time to spend on this.  I'll try to get some
> help (perhaps from Heinz) and see if we can pitch-in instead of making
> you do all the work.

Could you include /proc/PID/stack for the two md kernel threads?
They are unlikely to differ from what we have seen before, but as they are
the strongest pointer to where the problem is, I'd like to be able to see
them :-)

Also I'd like to see the disassembly of raise_barrier(), just to be
certain which of the two 'wait's it is waiting in.

One thing that might be useful is to change the various
wait_event_lock_irq() calls to wait_event_lock_irq_cmd() with an extra
argument 'dotrace(conf)', where dotrace() is a new function that first
calls schedule_timeout(60*HZ) and then, if that returns 0, prints out all
the fields that might be of interest.  Also get it to print which 'wait'
it was called from (e.g. pass a string to dotrace).

Thanks,
NeilBrown
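The dotrace() helper sketched in prose above could look roughly like the
following.  This is an untested illustration: the function name comes from
the mail, but the printed fields, the message format and the call site
shown in the comment are choices made here, not code from the thread.

/*
 * Passed as the 'cmd' argument of wait_event_lock_irq_cmd(), e.g.:
 *	wait_event_lock_irq_cmd(conf->wait_barrier, <condition>,
 *				conf->resync_lock,
 *				dotrace(conf, "raise_barrier"));
 */
static void dotrace(struct r1conf *conf, const char *where)
{
	/* The waiter's task state is already set when the cmd runs, so this
	 * sleeps for up to a minute; a return value of 0 means the full
	 * timeout expired without the waiter being woken. */
	if (schedule_timeout(60*HZ) == 0)
		pr_info("md/raid1: stuck in %s: frozen=%d barrier=%d nr_pending=%d nr_queued=%d cur_win=%d next_win=%d start_next_window=%llu next_resync=%llu\n",
			where, conf->array_frozen, conf->barrier,
			conf->nr_pending, conf->nr_queued,
			conf->current_window_requests,
			conf->next_window_requests,
			(unsigned long long)conf->start_next_window,
			(unsigned long long)conf->next_resync);
}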