* [md PATCH 0/4] Improve blktrace tracing of md.
@ 2016-11-14 5:30 NeilBrown
2016-11-14 5:30 ` [md PATCH 1/4] md: add block tracing for bio_remapping NeilBrown
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: NeilBrown @ 2016-11-14 5:30 UTC (permalink / raw)
To: Shaohua Li; +Cc: linux-raid
blktrace on md devices reports when a request is queued and when it is
split, but request completion and the mapping to subordinate devices
are not reported.
So add that, as well as some events for when IO is delayed for one
reason or another (e.g. bitmap updates).
---
NeilBrown (4):
md: add block tracing for bio_remapping
md: add bio completion tracing for raid1/raid10
md/bitmap: add blktrace event for writes to the bitmap.
md/raid1,raid10: add blktrace records when IO is delayed.
drivers/md/bitmap.c | 11 ++++++++++-
drivers/md/linear.c | 8 +++++++-
drivers/md/raid0.c | 8 +++++++-
drivers/md/raid1.c | 42 +++++++++++++++++++++++++++++++++++++++---
drivers/md/raid10.c | 38 ++++++++++++++++++++++++++++++++++++--
5 files changed, 99 insertions(+), 8 deletions(-)
--
Signature
* [md PATCH 2/4] md: add bio completion tracing for raid1/raid10
2016-11-14 5:30 [md PATCH 0/4] Improve blktrace tracing of md NeilBrown
` (2 preceding siblings ...)
2016-11-14 5:30 ` [md PATCH 3/4] md/bitmap: add blktrace event for writes to the bitmap NeilBrown
@ 2016-11-14 5:30 ` NeilBrown
2016-11-16 14:32 ` Christoph Hellwig
3 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2016-11-14 5:30 UTC (permalink / raw)
To: Shaohua Li; +Cc: linux-raid
raid5 already has this, as does dm.
linear and raid0 do not see completions; only bio_chain_endio() or bio_endio()
see those.
So just add it for raid1 and raid10.
Between
Commit: 3a366e614d08 ("block: add missing block_bio_complete() tracepoint")
and
Commit: 0a82a8d132b2 ("Revert "block: add missing block_bio_complete() tracepoint"")
in the 3.9-rc series, this was done centrally in bio_endio().
Until/unless that is resurrected, do the tracing in the md/raid code.
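For reference, the central tracing between those two commits amounted to
roughly this in bio_endio() (a sketch of the shape from memory, not the
exact diff):

	if (bio->bi_bdev)
		trace_block_bio_complete(bdev_get_queue(bio->bi_bdev),
					 bio, error);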
Signed-off-by: NeilBrown <neilb@suse.com>
---
drivers/md/raid1.c | 1 +
drivers/md/raid10.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3710a792a149..0674e5a0142e 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -257,6 +257,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
bio->bi_error = -EIO;
if (done) {
+ trace_block_bio_complete(bdev_get_queue(bio->bi_bdev), bio, bio->bi_error);
bio_endio(bio);
/*
* Wake up any possible resync thread that waits for the device
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index d144c3425824..c3036099ff9a 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -311,6 +311,7 @@ static void raid_end_bio_io(struct r10bio *r10_bio)
if (!test_bit(R10BIO_Uptodate, &r10_bio->state))
bio->bi_error = -EIO;
if (done) {
+ trace_block_bio_complete(bdev_get_queue(bio->bi_bdev), bio, bio->bi_error);
bio_endio(bio);
/*
* Wake up any possible resync thread that waits for the device
* [md PATCH 1/4] md: add block tracing for bio_remapping
2016-11-14 5:30 [md PATCH 0/4] Improve blktrace tracing of md NeilBrown
@ 2016-11-14 5:30 ` NeilBrown
2016-11-16 19:29 ` Shaohua Li
2016-11-14 5:30 ` [md PATCH 4/4] md/raid1, raid10: add blktrace records when IO is delayed NeilBrown
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2016-11-14 5:30 UTC (permalink / raw)
To: Shaohua Li; +Cc: linux-raid
The block tracing infrastructure (accessed with blktrace/blkparse)
supports the tracing of mapping bios from one device to another.
This is currently used when a bio in a partition is mapped to the
whole device, when bios are mapped by dm, and for mapping in md/raid5.
Other md personalities do not include this tracing yet, so add it.
When a read-error is detected we redirect the request to a different device.
This could justifiably be seen as a new mapping for the original bio,
or a secondary mapping for the bio that errors. This patch uses
the second option.
When md is used under dm-raid, the mappings are not traced as we do
not have access to the block device number of the parent.
Signed-off-by: NeilBrown <neilb@suse.com>
---
drivers/md/linear.c | 8 +++++++-
drivers/md/raid0.c | 8 +++++++-
drivers/md/raid1.c | 33 ++++++++++++++++++++++++++++++---
drivers/md/raid10.c | 29 +++++++++++++++++++++++++++--
4 files changed, 71 insertions(+), 7 deletions(-)
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 9c7d4f5483ea..8c0bccfa53a2 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -21,6 +21,7 @@
#include <linux/seq_file.h>
#include <linux/module.h>
#include <linux/slab.h>
+#include <trace/events/block.h>
#include "md.h"
#include "linear.h"
@@ -256,8 +257,13 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)
!blk_queue_discard(bdev_get_queue(split->bi_bdev)))) {
/* Just ignore it */
bio_endio(split);
- } else
+ } else {
+ if (mddev->gendisk)
+ trace_block_bio_remap(bdev_get_queue(split->bi_bdev),
+ split, disk_devt(mddev->gendisk),
+ bio->bi_iter.bi_sector);
generic_make_request(split);
+ }
} while (split != bio);
return;
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index b3ba77a3c3bc..841b3ad0f5ff 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -21,6 +21,7 @@
#include <linux/seq_file.h>
#include <linux/module.h>
#include <linux/slab.h>
+#include <trace/events/block.h>
#include "md.h"
#include "raid0.h"
#include "raid5.h"
@@ -491,8 +492,13 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
!blk_queue_discard(bdev_get_queue(split->bi_bdev)))) {
/* Just ignore it */
bio_endio(split);
- } else
+ } else {
+ if (mddev->gendisk)
+ trace_block_bio_remap(bdev_get_queue(split->bi_bdev),
+ split, disk_devt(mddev->gendisk),
+ bio->bi_iter.bi_sector);
generic_make_request(split);
+ }
} while (split != bio);
}
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 9ac61cd85e5c..3710a792a149 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -37,6 +37,7 @@
#include <linux/module.h>
#include <linux/seq_file.h>
#include <linux/ratelimit.h>
+#include <trace/events/block.h>
#include "md.h"
#include "raid1.h"
#include "bitmap.h"
@@ -743,6 +744,7 @@ static void flush_pending_writes(struct r1conf *conf)
while (bio) { /* submit pending writes */
struct bio *next = bio->bi_next;
struct md_rdev *rdev = (void*)bio->bi_bdev;
+ struct r1bio *r1_bio = bio->bi_private;
bio->bi_next = NULL;
bio->bi_bdev = rdev->bdev;
if (test_bit(Faulty, &rdev->flags)) {
@@ -752,8 +754,13 @@ static void flush_pending_writes(struct r1conf *conf)
!blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
/* Just ignore it */
bio_endio(bio);
- else
+ else {
+ if (conf->mddev->gendisk)
+ trace_block_bio_remap(bdev_get_queue(bio->bi_bdev),
+ bio, disk_devt(conf->mddev->gendisk),
+ r1_bio->sector);
generic_make_request(bio);
+ }
bio = next;
}
} else
@@ -1022,6 +1029,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
while (bio) { /* submit pending writes */
struct bio *next = bio->bi_next;
struct md_rdev *rdev = (void*)bio->bi_bdev;
+ struct r1bio *r1_bio = bio->bi_private;
bio->bi_next = NULL;
bio->bi_bdev = rdev->bdev;
if (test_bit(Faulty, &rdev->flags)) {
@@ -1031,8 +1039,13 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
!blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
/* Just ignore it */
bio_endio(bio);
- else
+ else {
+ if (mddev->gendisk)
+ trace_block_bio_remap(bdev_get_queue(bio->bi_bdev),
+ bio, disk_devt(mddev->gendisk),
+ r1_bio->sector);
generic_make_request(bio);
+ }
bio = next;
}
kfree(plug);
@@ -1162,6 +1175,11 @@ static void raid1_make_request(struct mddev *mddev, struct bio * bio)
bio_set_op_attrs(read_bio, op, do_sync);
read_bio->bi_private = r1_bio;
+ if (mddev->gendisk)
+ trace_block_bio_remap(bdev_get_queue(read_bio->bi_bdev),
+ read_bio, disk_devt(mddev->gendisk),
+ r1_bio->sector);
+
if (max_sectors < r1_bio->sectors) {
/* could not read all from this device, so we will
* need another r1_bio.
@@ -2290,6 +2308,8 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
struct bio *bio;
char b[BDEVNAME_SIZE];
struct md_rdev *rdev;
+ dev_t bio_dev;
+ sector_t bio_sector;
clear_bit(R1BIO_ReadError, &r1_bio->state);
/* we got a read error. Maybe the drive is bad. Maybe just
@@ -2303,6 +2323,8 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
bio = r1_bio->bios[r1_bio->read_disk];
bdevname(bio->bi_bdev, b);
+ bio_dev = bio->bi_bdev->bd_dev;
+ bio_sector = conf->mirrors[r1_bio->read_disk].rdev->data_offset + r1_bio->sector;
bio_put(bio);
r1_bio->bios[r1_bio->read_disk] = NULL;
@@ -2353,6 +2375,8 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
else
mbio->bi_phys_segments++;
spin_unlock_irq(&conf->device_lock);
+ trace_block_bio_remap(bdev_get_queue(bio->bi_bdev),
+ bio, bio_dev, bio_sector);
generic_make_request(bio);
bio = NULL;
@@ -2367,8 +2391,11 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio)
sectors_handled;
goto read_more;
- } else
+ } else {
+ trace_block_bio_remap(bdev_get_queue(bio->bi_bdev),
+ bio, bio_dev, bio_sector);
generic_make_request(bio);
+ }
}
}
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5290be3d5c26..d144c3425824 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -25,6 +25,7 @@
#include <linux/seq_file.h>
#include <linux/ratelimit.h>
#include <linux/kthread.h>
+#include <trace/events/block.h>
#include "md.h"
#include "raid10.h"
#include "raid0.h"
@@ -859,6 +860,7 @@ static void flush_pending_writes(struct r10conf *conf)
while (bio) { /* submit pending writes */
struct bio *next = bio->bi_next;
struct md_rdev *rdev = (void*)bio->bi_bdev;
+ struct r10bio *r10_bio = bio->bi_private;
bio->bi_next = NULL;
bio->bi_bdev = rdev->bdev;
if (test_bit(Faulty, &rdev->flags)) {
@@ -868,8 +870,13 @@ static void flush_pending_writes(struct r10conf *conf)
!blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
/* Just ignore it */
bio_endio(bio);
- else
+ else {
+ if (conf->mddev->gendisk)
+ trace_block_bio_remap(bdev_get_queue(bio->bi_bdev),
+ bio, disk_devt(conf->mddev->gendisk),
+ r10_bio->sector);
generic_make_request(bio);
+ }
bio = next;
}
} else
@@ -1042,6 +1049,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
while (bio) { /* submit pending writes */
struct bio *next = bio->bi_next;
struct md_rdev *rdev = (void*)bio->bi_bdev;
+ struct r10bio *r10_bio = bio->bi_private;
bio->bi_next = NULL;
bio->bi_bdev = rdev->bdev;
if (test_bit(Faulty, &rdev->flags)) {
@@ -1051,8 +1059,13 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
!blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
/* Just ignore it */
bio_endio(bio);
- else
+ else {
+ if (conf->mddev->gendisk)
+ trace_block_bio_remap(bdev_get_queue(bio->bi_bdev),
+ bio, disk_devt(conf->mddev->gendisk),
+ r10_bio->sector);
generic_make_request(bio);
+ }
bio = next;
}
kfree(plug);
@@ -1165,6 +1178,10 @@ static void __make_request(struct mddev *mddev, struct bio *bio)
bio_set_op_attrs(read_bio, op, do_sync);
read_bio->bi_private = r10_bio;
+ if (mddev->gendisk)
+ trace_block_bio_remap(bdev_get_queue(read_bio->bi_bdev),
+ read_bio, disk_devt(mddev->gendisk),
+ r10_bio->sector);
if (max_sectors < r10_bio->sectors) {
/* Could not read all from this device, so we will
* need another r10_bio.
@@ -2496,6 +2513,8 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
char b[BDEVNAME_SIZE];
unsigned long do_sync;
int max_sectors;
+ dev_t bio_dev;
+ sector_t bio_last_sector;
/* we got a read error. Maybe the drive is bad. Maybe just
* the block and we can fix it.
@@ -2507,6 +2526,8 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
*/
bio = r10_bio->devs[slot].bio;
bdevname(bio->bi_bdev, b);
+ bio_dev = bio->bi_bdev->bd_dev;
+ bio_last_sector = r10_bio->devs[slot].addr + rdev->data_offset + r10_bio->sectors;
bio_put(bio);
r10_bio->devs[slot].bio = NULL;
@@ -2546,6 +2567,10 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
bio_set_op_attrs(bio, REQ_OP_READ, do_sync);
bio->bi_private = r10_bio;
bio->bi_end_io = raid10_end_read_request;
+ trace_block_bio_remap(bdev_get_queue(bio->bi_bdev),
+ bio, bio_dev,
+ bio_last_sector - r10_bio->sectors);
+
if (max_sectors < r10_bio->sectors) {
/* Drat - have to split this up more */
struct bio *mbio = r10_bio->master_bio;
* [md PATCH 4/4] md/raid1, raid10: add blktrace records when IO is delayed.
2016-11-14 5:30 [md PATCH 0/4] Improve blktrace tracing of md NeilBrown
2016-11-14 5:30 ` [md PATCH 1/4] md: add block tracing for bio_remapping NeilBrown
@ 2016-11-14 5:30 ` NeilBrown
2016-11-14 5:30 ` [md PATCH 3/4] md/bitmap: add blktrace event for writes to the bitmap NeilBrown
2016-11-14 5:30 ` [md PATCH 2/4] md: add bio completion tracing for raid1/raid10 NeilBrown
3 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2016-11-14 5:30 UTC (permalink / raw)
To: Shaohua Li; +Cc: linux-raid
Both raid1 and raid10 will sometimes delay handling an IO request,
such as when resync is happening or there are too many requests queued.
Add some blktrace messages so we can see when that is happening while
looking for performance artefacts.
Signed-off-by: NeilBrown <neilb@suse.com>
---
drivers/md/raid1.c | 8 ++++++++
drivers/md/raid10.c | 8 ++++++++
2 files changed, 16 insertions(+)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0674e5a0142e..e94db92a4dbf 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -71,6 +71,9 @@ static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
sector_t bi_sector);
static void lower_barrier(struct r1conf *conf);
+#define raid1_log(md, fmt, args...) \
+ do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid1 " fmt, ##args); } while (0)
+
static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
{
struct pool_info *pi = data;
@@ -868,6 +871,7 @@ static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
* that queue to allow conf->start_next_window
* to increase.
*/
+ raid1_log(conf->mddev, "wait barrier");
wait_event_lock_irq(conf->wait_barrier,
!conf->array_frozen &&
(!conf->barrier ||
@@ -947,6 +951,7 @@ static void freeze_array(struct r1conf *conf, int extra)
*/
spin_lock_irq(&conf->resync_lock);
conf->array_frozen = 1;
+ raid1_log(conf->mddev, "wait freeze");
wait_event_lock_irq_cmd(conf->wait_barrier,
conf->nr_pending == conf->nr_queued+extra,
conf->resync_lock,
@@ -1157,6 +1162,7 @@ static void raid1_make_request(struct mddev *mddev, struct bio * bio)
* take care not to over-take any writes
* that are 'behind'
*/
+ raid1_log(mddev, "wait behind writes");
wait_event(bitmap->behind_wait,
atomic_read(&bitmap->behind_writes) == 0);
}
@@ -1221,6 +1227,7 @@ static void raid1_make_request(struct mddev *mddev, struct bio * bio)
*/
if (conf->pending_count >= max_queued_requests) {
md_wakeup_thread(mddev->thread);
+ raid1_log(mddev, "wait queued");
wait_event(conf->wait_barrier,
conf->pending_count < max_queued_requests);
}
@@ -1312,6 +1319,7 @@ static void raid1_make_request(struct mddev *mddev, struct bio * bio)
rdev_dec_pending(conf->mirrors[j].rdev, mddev);
r1_bio->state = 0;
allow_barrier(conf, start_next_window, bio->bi_iter.bi_sector);
+ raid1_log(mddev, "wait rdev %d blocked", blocked_rdev->raid_disk);
md_wait_for_blocked_rdev(blocked_rdev, mddev);
start_next_window = wait_barrier(conf, bio);
/*
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index c3036099ff9a..15e55488a9d2 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -106,6 +106,9 @@ static void reshape_request_write(struct mddev *mddev, struct r10bio *r10_bio);
static void end_reshape_write(struct bio *bio);
static void end_reshape(struct r10conf *conf);
+#define raid10_log(md, fmt, args...) \
+ do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid10 " fmt, ##args); } while (0)
+
static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
{
struct r10conf *conf = data;
@@ -949,6 +952,7 @@ static void wait_barrier(struct r10conf *conf)
* that queue to get the nr_pending
* count down.
*/
+ raid10_log(conf->mddev, "wait barrier");
wait_event_lock_irq(conf->wait_barrier,
!conf->barrier ||
(atomic_read(&conf->nr_pending) &&
@@ -1106,6 +1110,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio)
/* IO spans the reshape position. Need to wait for
* reshape to pass
*/
+ raid10_log(conf->mddev, "wait reshape");
allow_barrier(conf);
wait_event(conf->wait_barrier,
conf->reshape_progress <= bio->bi_iter.bi_sector ||
@@ -1125,6 +1130,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio)
set_mask_bits(&mddev->flags, 0,
BIT(MD_CHANGE_DEVS) | BIT(MD_CHANGE_PENDING));
md_wakeup_thread(mddev->thread);
+ raid10_log(conf->mddev, "wait reshape metadata");
wait_event(mddev->sb_wait,
!test_bit(MD_CHANGE_PENDING, &mddev->flags));
@@ -1222,6 +1228,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio)
*/
if (conf->pending_count >= max_queued_requests) {
md_wakeup_thread(mddev->thread);
+ raid10_log(mddev, "wait queued");
wait_event(conf->wait_barrier,
conf->pending_count < max_queued_requests);
}
@@ -1349,6 +1356,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio)
}
}
allow_barrier(conf);
+ raid10_log(conf->mddev, "wait rdev %d blocked", blocked_rdev->raid_disk);
md_wait_for_blocked_rdev(blocked_rdev, mddev);
wait_barrier(conf);
goto retry_write;
* [md PATCH 3/4] md/bitmap: add blktrace event for writes to the bitmap.
2016-11-14 5:30 [md PATCH 0/4] Improve blktrace tracing of md NeilBrown
2016-11-14 5:30 ` [md PATCH 1/4] md: add block tracing for bio_remapping NeilBrown
2016-11-14 5:30 ` [md PATCH 4/4] md/raid1, raid10: add blktrace records when IO is delayed NeilBrown
@ 2016-11-14 5:30 ` NeilBrown
2016-11-16 19:31 ` Shaohua Li
2016-11-14 5:30 ` [md PATCH 2/4] md: add bio completion tracing for raid1/raid10 NeilBrown
3 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2016-11-14 5:30 UTC (permalink / raw)
To: Shaohua Li; +Cc: linux-raid
We trace whenever bitmap_unplug() finds that it needs to write
to the bitmap, or when bitmap_daemon_work() finds there is work
to do.
This makes it easier to correlate bitmap updates with data writes.
Signed-off-by: NeilBrown <neilb@suse.com>
---
drivers/md/bitmap.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 1a7f402b79ba..cf77cbf9ed22 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -27,6 +27,7 @@
#include <linux/mount.h>
#include <linux/buffer_head.h>
#include <linux/seq_file.h>
+#include <trace/events/block.h>
#include "md.h"
#include "bitmap.h"
@@ -1008,8 +1009,12 @@ void bitmap_unplug(struct bitmap *bitmap)
need_write = test_and_clear_page_attr(bitmap, i,
BITMAP_PAGE_NEEDWRITE);
if (dirty || need_write) {
- if (!writing)
+ if (!writing) {
bitmap_wait_writes(bitmap);
+ if (bitmap->mddev->queue)
+ blk_add_trace_msg(bitmap->mddev->queue,
+ "md bitmap_unplug");
+ }
clear_page_attr(bitmap, i, BITMAP_PAGE_PENDING);
write_page(bitmap, bitmap->storage.filemap[i], 0);
writing = 1;
@@ -1234,6 +1239,10 @@ void bitmap_daemon_work(struct mddev *mddev)
}
bitmap->allclean = 1;
+ if (bitmap->mddev->queue)
+ blk_add_trace_msg(bitmap->mddev->queue,
+ "md bitmap_daemon_work");
+
/* Any file-page which is PENDING now needs to be written.
* So set NEEDWRITE now, then after we make any last-minute changes
* we will write it.
* Re: [md PATCH 2/4] md: add bio completion tracing for raid1/raid10
2016-11-14 5:30 ` [md PATCH 2/4] md: add bio completion tracing for raid1/raid10 NeilBrown
@ 2016-11-16 14:32 ` Christoph Hellwig
2016-11-17 5:35 ` NeilBrown
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2016-11-16 14:32 UTC (permalink / raw)
To: NeilBrown; +Cc: Shaohua Li, linux-raid
On Mon, Nov 14, 2016 at 04:30:21PM +1100, NeilBrown wrote:
> raid5 already has this, as does dm.
> linear and raid0 do not see completions; only bio_chain_endio() or bio_endio()
> see those.
> So just add it for raid1 and raid10.
>
> Between
> Commit: 3a366e614d08 ("block: add missing block_bio_complete() tracepoint")
> and
> Commit: 0a82a8d132b2 ("Revert "block: add missing block_bio_complete() tracepoint"")
> in the 3.9-rc series, this was done centrally in bio_endio().
> Until/unless that is resurrected, do the tracing in the md/raid code.
We're working on getting it back for 4.10, so please don't add these
tracepoints to the MD driver for now. Next time please also ask
linux-block first and/or Cc the list on a patch like this.
* Re: [md PATCH 1/4] md: add block tracing for bio_remapping
2016-11-14 5:30 ` [md PATCH 1/4] md: add block tracing for bio_remapping NeilBrown
@ 2016-11-16 19:29 ` Shaohua Li
2016-11-17 5:33 ` NeilBrown
0 siblings, 1 reply; 13+ messages in thread
From: Shaohua Li @ 2016-11-16 19:29 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Mon, Nov 14, 2016 at 04:30:21PM +1100, Neil Brown wrote:
> The block tracing infrastructure (accessed with blktrace/blkparse)
> supports the tracing of mapping bios from one device to another.
> This is currently used when a bio in a partition is mapped to the
> whole device, when bios are mapped by dm, and for mapping in md/raid5.
> Other md personalities do not include this tracing yet, so add it.
>
> When a read-error is detected we redirect the request to a different device.
> This could justifiably be seen as a new mapping for the original bio,
> or a secondary mapping for the bio that errors. This patch uses
> the second option.
>
> When md is used under dm-raid, the mappings are not traced as we do
> not have access to the block device number of the parent.
Looks like the original sector (the last parameter of trace_block_bio_remap)
isn't correct.
- in linear/raid0, bio_split already updated bio->bi_iter.bi_sector
- in raid1/raid10, r1_bio->sector is updated before the bio is sent.
Thanks,
Shaohua
* Re: [md PATCH 3/4] md/bitmap: add blktrace event for writes to the bitmap.
2016-11-14 5:30 ` [md PATCH 3/4] md/bitmap: add blktrace event for writes to the bitmap NeilBrown
@ 2016-11-16 19:31 ` Shaohua Li
0 siblings, 0 replies; 13+ messages in thread
From: Shaohua Li @ 2016-11-16 19:31 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Mon, Nov 14, 2016 at 04:30:21PM +1100, Neil Brown wrote:
> We trace whenever bitmap_unplug() finds that it needs to write
> to the bitmap, or when bitmap_daemon_work() finds there is work
> to do.
>
> This makes it easier to correlate bitmap updates with data writes.
Looks good. This reminds me that perhaps we should do a similar thing for
md_write_start() when we write the superblock. Especially for raid5-cache, as
we write the superblock regularly.
Thanks,
Shaohua
* Re: [md PATCH 1/4] md: add block tracing for bio_remapping
2016-11-16 19:29 ` Shaohua Li
@ 2016-11-17 5:33 ` NeilBrown
2016-11-17 18:04 ` Shaohua Li
0 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2016-11-17 5:33 UTC (permalink / raw)
To: Shaohua Li; +Cc: linux-raid
On Thu, Nov 17 2016, Shaohua Li wrote:
> On Mon, Nov 14, 2016 at 04:30:21PM +1100, Neil Brown wrote:
>> The block tracing infrastructure (accessed with blktrace/blkparse)
>> supports the tracing of mapping bios from one device to another.
>> This is currently used when a bio in a partition is mapped to the
>> whole device, when bios are mapped by dm, and for mapping in md/raid5.
>> Other md personalities do not include this tracing yet, so add it.
>>
>> When a read-error is detected we redirect the request to a different device.
>> This could justifiably be seen as a new mapping for the originial bio,
>> or a secondary mapping for the bio that errors. This patch uses
>> the second option.
>>
>> When md is used under dm-raid, the mappings are not traced as we do
>> not have access to the block device number of the parent.
>
> Looks like the original sector (the last parameter of trace_block_bio_remap)
> isn't correct.
> - in linear/raid0, bio_split already updated bio->bi_iter.bi_sector
Oh yes, of course. In the common case 'split == bio', so when
split->bi_iter.bi_sector is adjusted, bio->bi_iter.bi_sector is as well.
I'll fix that, and also add calls to trace_block_split() as appropriate.
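Something like this in linear/raid0 (rough sketch; 'bio_sector' is a new
local capturing the sector before bio_split()/the remap adjust it):

	sector_t bio_sector = bio->bi_iter.bi_sector;
	...
	if (mddev->gendisk)
		trace_block_bio_remap(bdev_get_queue(split->bi_bdev),
				      split, disk_devt(mddev->gendisk),
				      bio_sector);
	generic_make_request(split);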
> - in raid1/raid10, r1_bio->sector is updated before the bio is sent.
Here I really think my code is correct. r1_bio->sector is always the
address in the array of the request. It is only set once for each
r1_bio, and that is before the call to trace_block_bio_remap().
Thanks,
NeilBrown
>
> Thanks,
> Shaohua
* Re: [md PATCH 2/4] md: add bio completion tracing for raid1/raid10
2016-11-16 14:32 ` Christoph Hellwig
@ 2016-11-17 5:35 ` NeilBrown
2016-11-17 12:51 ` Christoph Hellwig
0 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2016-11-17 5:35 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Shaohua Li, linux-raid
[-- Attachment #1: Type: text/plain, Size: 992 bytes --]
On Thu, Nov 17 2016, Christoph Hellwig wrote:
> On Mon, Nov 14, 2016 at 04:30:21PM +1100, NeilBrown wrote:
>> raid5 already has this, as does dm.
>> linear and raid0 do not see completions; only bio_chain_endio() or bio_endio()
>> see those.
>> So just add it for raid1 and raid10.
>>
>> Between
>> Commit: 3a366e614d08 ("block: add missing block_bio_complete() tracepoint")
>> and
>> Commit: 0a82a8d132b2 ("Revert "block: add missing block_bio_complete() tracepoint"")
>> in the 3.9-rc series, this was done centrally in bio_endio().
>> Until/unless that is resurrected, do the tracing in the md/raid code.
>
> We're working on getting it back for 4.10, so please don't add these
> tracepoints to the MD driver for now. Next time please also ask
> linux-block first and/or Cc the list on a patch like this.
Oh good, thanks for letting me know. I'll drop that one.
Do you know if there is any plan to include trace_block_split() in
bio_split()??
Thanks,
NeilBrown
* Re: [md PATCH 2/4] md: add bio completion tracing for raid1/raid10
2016-11-17 5:35 ` NeilBrown
@ 2016-11-17 12:51 ` Christoph Hellwig
0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2016-11-17 12:51 UTC (permalink / raw)
To: NeilBrown; +Cc: Christoph Hellwig, Shaohua Li, linux-raid
On Thu, Nov 17, 2016 at 04:35:47PM +1100, NeilBrown wrote:
> Oh good, thanks for letting me know. I'll drop that one.
> Do you know if there is any plan to include trace_block_split() in
> bio_split()??
Not that I know of. But doing so seems sensible, and I'm pretty
sure patches would be welcome.
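Roughly this at the end of bio_split(), mirroring what blk_queue_split()
already does (an untested sketch):

	if (bio->bi_bdev)
		trace_block_split(bdev_get_queue(bio->bi_bdev), split,
				  bio->bi_iter.bi_sector);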
* Re: [md PATCH 1/4] md: add block tracing for bio_remapping
2016-11-17 5:33 ` NeilBrown
@ 2016-11-17 18:04 ` Shaohua Li
2016-11-18 0:45 ` NeilBrown
0 siblings, 1 reply; 13+ messages in thread
From: Shaohua Li @ 2016-11-17 18:04 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Thu, Nov 17, 2016 at 04:33:33PM +1100, Neil Brown wrote:
> On Thu, Nov 17 2016, Shaohua Li wrote:
>
> > On Mon, Nov 14, 2016 at 04:30:21PM +1100, Neil Brown wrote:
> >> The block tracing infrastructure (accessed with blktrace/blkparse)
> >> supports the tracing of mapping bios from one device to another.
> >> This is currently used when a bio in a partition is mapped to the
> >> whole device, when bios are mapped by dm, and for mapping in md/raid5.
> >> Other md personalities do not include this tracing yet, so add it.
> >>
> >> When a read-error is detected we redirect the request to a different device.
> >> This could justifiably be seen as a new mapping for the original bio,
> >> or a secondary mapping for the bio that errors. This patch uses
> >> the second option.
> >>
> >> When md is used under dm-raid, the mappings are not traced as we do
> >> not have access to the block device number of the parent.
> >
> > Looks like the original sector (the last parameter of trace_block_bio_remap)
> > isn't correct.
> > - in linear/raid0, bio_split already updated bio->bi_iter.bi_sector
>
> Oh yes, of course. In the common case 'split == bio', so when
> split->bi_iter.bi_sector is adjusted, bio->bi_iter.bi_sector is as well.
> I'll fix that, and also add calls to trace_block_split() as appropriate.
>
>
> > - in raid1/raid10, r1_bio->sector is updated before the bio is sent.
>
> Here I really think my code is correct. r1_bio->sector is always the
> address in the array of the request. It is only set once for each
> r1_bio, and that is before the call to trace_block_bio_remap().
Oh, you are right, sorry. I think moving the trace_remap right before we add
the bio to the plug or pending list is better. It will make the code simpler. The
timing of the trace doesn't matter.
Thanks,
Shaohua
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [md PATCH 1/4] md: add block tracing for bio_remapping
2016-11-17 18:04 ` Shaohua Li
@ 2016-11-18 0:45 ` NeilBrown
0 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2016-11-18 0:45 UTC (permalink / raw)
To: Shaohua Li; +Cc: linux-raid
On Fri, Nov 18 2016, Shaohua Li wrote:
> On Thu, Nov 17, 2016 at 04:33:33PM +1100, Neil Brown wrote:
>> On Thu, Nov 17 2016, Shaohua Li wrote:
>>
>> > On Mon, Nov 14, 2016 at 04:30:21PM +1100, Neil Brown wrote:
>> >> The block tracing infrastructure (accessed with blktrace/blkparse)
>> >> supports the tracing of mapping bios from one device to another.
>> >> This is currently used when a bio in a partition is mapped to the
>> >> whole device, when bios are mapped by dm, and for mapping in md/raid5.
>> >> Other md personalities do not include this tracing yet, so add it.
>> >>
>> >> When a read-error is detected we redirect the request to a different device.
>> >> This could justifiably be seen as a new mapping for the original bio,
>> >> or a secondary mapping for the bio that errors. This patch uses
>> >> the second option.
>> >>
>> >> When md is used under dm-raid, the mappings are not traced as we do
>> >> not have access to the block device number of the parent.
>> >
>> > Looks like the original sector (the last parameter of trace_block_bio_remap)
>> > isn't correct.
>> > - in linear/raid0, bio_split already updated bio->bi_iter.bi_sector
>>
>> Oh yes, of course. In the common case 'split == bio', so when
>> split->bi_iter.bi_sector is adjusted, bio->bi_iter.bi_sector is as well.
>> I'll fix that, and also add calls to trace_block_split() as appropriate.
>>
>>
>> > - in raid1/raid10, r1_bio->sector is updated before the bio is sent.
>>
>> Here I really think my code is correct. r1_bio->sector is always the
>> address in the array of the request. It is only set once for each
>> r1_bio, and that is before the call to trace_block_bio_remap().
>
> Oh, you are right, sorry. I think moving the trace_remap right before we add
> the bio to the plug or pending list is better. It will make the code simpler. The
> timing of the trace doesn't matter.
I agree the timing doesn't matter.
The reason I didn't do that before is that bio->bi_bdev points to the
rdev when the write bio is queued to the pending list. I'm not sure if
trace_block_bio_remap() accesses ->bi_bdev, but it could.
So I'm not completely sure that it makes the code simpler, but I'll post
a version like that to see what you think.
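Roughly, doing the trace while ->bi_bdev still points at the real device,
and only overloading it afterwards (a sketch of what I'll post):

	mbio->bi_bdev = conf->mirrors[i].rdev->bdev;
	if (conf->mddev->gendisk)
		trace_block_bio_remap(bdev_get_queue(mbio->bi_bdev),
				      mbio, disk_devt(conf->mddev->gendisk),
				      r1_bio->sector);
	/* flush_pending_writes() expects ->bi_bdev to carry the rdev */
	mbio->bi_bdev = (void *)conf->mirrors[i].rdev;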
Thanks,
NeilBrown