From: NeilBrown <neilb@suse.com>
To: Christoph Hellwig <hch@infradead.org>, Jens Axboe <axboe@kernel.dk>
Cc: linux-raid@vger.kernel.org,
"Martin K . Petersen" <martin.petersen@oracle.com>,
Mike Snitzer <snitzer@redhat.com>,
linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
dm-devel@redhat.com, Shaohua Li <shli@kernel.org>,
Alasdair Kergon <agk@redhat.com>
Subject: [PATCH v2] block: trace completion of all bios.
Date: Thu, 23 Mar 2017 17:29:02 +1100
Message-ID: <87shm4a4lt.fsf@notabene.neil.brown.name>
In-Reply-To: <20170322125149.GA29606@infradead.org>
Currently only dm and md/raid5 bios trigger trace_block_bio_complete().
Now that we have bio_chain(), it is not possible, in general, for a
driver to know when the bio is really complete. Only bio_endio()
knows that.
So move the trace_block_bio_complete() call to bio_endio().
Now trace_block_bio_complete() pairs with trace_block_bio_queue().
Any bio for which a 'queue' event is traced will subsequently
generate a 'complete' event.
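To illustrate why only bio_endio() can know when a chained bio is really
complete, here is a small userspace sketch. It is a toy model, not kernel
code: struct chain_bio, chain_bio_chain(), chain_bio_endio() and
chain_demo() are hypothetical names, and the 'remaining' counter only
loosely mirrors the counter that bio_remaining_done() checks. The event is
counted at the point where the callback finally runs, which is where this
patch places the trace call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct bio; all names here are hypothetical. */
struct chain_bio {
	int remaining;                      /* completions still outstanding */
	struct chain_bio *parent;           /* non-NULL once chained to a parent */
	void (*end_io)(struct chain_bio *); /* owner's completion callback */
	int end_io_calls;                   /* how often end_io actually ran */
};

static int complete_events;             /* 'complete' trace events emitted */

static void count_end_io(struct chain_bio *bio)
{
	bio->end_io_calls++;
}

static void chain_bio_init(struct chain_bio *bio)
{
	bio->remaining = 1;
	bio->parent = NULL;
	bio->end_io = count_end_io;
	bio->end_io_calls = 0;
}

/* Like bio_chain(): the parent cannot complete before the child does. */
static void chain_bio_chain(struct chain_bio *child, struct chain_bio *parent)
{
	child->parent = parent;
	child->end_io = NULL;
	parent->remaining++;
}

/* Like bio_endio(): only the final call emits the event and runs end_io. */
static void chain_bio_endio(struct chain_bio *bio)
{
	for (;;) {
		assert(bio->remaining > 0);
		if (--bio->remaining > 0)
			return;             /* someone else still holds the bio */
		if (!bio->parent)
			break;
		bio = bio->parent;          /* loop rather than recurse */
	}
	complete_events++;
	if (bio->end_io)
		bio->end_io(bio);
}

/*
 * One parent with two chained children.  Records the event count after
 * each completion call and returns how often the parent's end_io ran.
 */
static int chain_demo(int events_after[3])
{
	struct chain_bio parent, c1, c2;

	complete_events = 0;
	chain_bio_init(&parent);
	chain_bio_init(&c1);
	chain_bio_init(&c2);
	chain_bio_chain(&c1, &parent);
	chain_bio_chain(&c2, &parent);

	chain_bio_endio(&parent);           /* driver finished its own part */
	events_after[0] = complete_events;
	chain_bio_endio(&c1);
	events_after[1] = complete_events;
	chain_bio_endio(&c2);               /* last one: parent really done */
	events_after[2] = complete_events;
	return parent.end_io_calls;
}
```

In this model a driver that traced completion just before its own call to
the endio function would report the parent as complete while two children
are still outstanding. Counting inside the endio function gives exactly one
event, at the moment the callback runs.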
There are a few cases where completion tracing is not wanted.
1/ If blk_update_request() has already generated a completion
trace event at the 'request' level, there is no point generating
one at the bio level too. In this case the bi_sector and bi_size
will have changed, so the bio level event would be wrong.
2/ If the bio hasn't actually been queued yet, but is being aborted
early, then a trace event could be confusing. Some filesystems
call bio_endio() in this case and will need to use a different
interface to avoid tracing.
3/ The bio_integrity code interposes itself by replacing bi_end_io,
then restores it and calls bio_endio() again. This would produce
two identical trace events if left like that.
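As a worked example for case 1, the sketch below shows how the position
fields move while a request is completed piecewise. It is a userspace toy
under stated assumptions: struct toy_iter and toy_advance() are
hypothetical stand-ins for bi_iter and bio_advance(), and the starting
sector (2048) and length (8 sectors) are made-up numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature of the bio iterator fields. */
struct toy_iter {
	uint64_t sector;   /* like bi_iter.bi_sector, in 512-byte sectors */
	uint32_t size;     /* like bi_iter.bi_size, bytes still to complete */
};

/* Like bio_advance(): consume nbytes from the front of the bio. */
static void toy_advance(struct toy_iter *it, uint32_t nbytes)
{
	assert(nbytes <= it->size);
	assert(nbytes % 512 == 0);      /* toy assumption: whole sectors only */
	it->sector += nbytes >> 9;
	it->size -= nbytes;
}

/* Complete an 8-sector bio starting at sector 2048 in two partial steps. */
static struct toy_iter toy_partial_completion(void)
{
	struct toy_iter it = { .sector = 2048, .size = 8 * 512 };

	toy_advance(&it, 3 * 512);      /* first partial completion */
	toy_advance(&it, 5 * 512);      /* second one finishes the bio */
	return it;
}
```

By the time the size reaches zero the iterator reads sector 2056 with 0
bytes left, so an event recorded at that point would describe "2056 + 0"
rather than the original "2048 + 8".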
To handle these, we provide bio_endio_notrace(). This patch only adds
uses of this in core code. Separate patches will be needed to update
the filesystems to avoid tracing.
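Case 3 can be modelled in userspace as follows. This is only a sketch:
struct itg_bio, itg_complete(), the two itg_endio variants and
itg_events_with_interposer() are hypothetical names, and the shared helper
with a 'trace' flag is one possible way to structure a traced and an
untraced entry point, not necessarily how the kernel functions are laid
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy bio with an interposed completion handler; names are hypothetical. */
struct itg_bio {
	void (*end_io)(struct itg_bio *);        /* currently installed handler */
	void (*saved_end_io)(struct itg_bio *);  /* owner's handler, saved aside */
	bool use_notrace;                        /* which variant the interposer calls */
	int owner_calls;                         /* how often the owner was notified */
};

static int complete_events;                  /* 'complete' trace events emitted */

/* Shared completion path; 'trace' selects whether an event is counted. */
static void itg_complete(struct itg_bio *bio, bool trace)
{
	if (trace)
		complete_events++;
	if (bio->end_io)
		bio->end_io(bio);
}

static void itg_endio(struct itg_bio *bio)
{
	itg_complete(bio, true);
}

static void itg_endio_notrace(struct itg_bio *bio)
{
	itg_complete(bio, false);
}

static void owner_end_io(struct itg_bio *bio)
{
	bio->owner_calls++;
}

/* Interposed handler: restore the owner's callback, then complete again. */
static void interposer_end_io(struct itg_bio *bio)
{
	bio->end_io = bio->saved_end_io;
	if (bio->use_notrace)
		itg_endio_notrace(bio);
	else
		itg_endio(bio);
}

/* Returns the number of events produced by one real completion. */
static int itg_events_with_interposer(bool use_notrace)
{
	struct itg_bio bio = {
		.end_io = interposer_end_io,
		.saved_end_io = owner_end_io,
		.use_notrace = use_notrace,
		.owner_calls = 0,
	};

	complete_events = 0;
	itg_endio(&bio);                     /* lower layer reports completion */
	assert(bio.owner_calls == 1);        /* owner is notified exactly once */
	return complete_events;
}
```

With the traced variant in the interposer the single completion is counted
twice; with the untraced variant it is counted once, while the owner's
callback runs exactly once either way.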
Signed-off-by: NeilBrown <neilb@suse.com>
---
block/bio-integrity.c | 4 ++--
block/bio.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
block/blk-core.c | 2 +-
drivers/md/dm.c | 1 -
drivers/md/raid5.c | 8 --------
include/linux/bio.h | 1 +
6 files changed, 50 insertions(+), 12 deletions(-)
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 5384713d48bc..28581e2f68fb 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -370,7 +370,7 @@ static void bio_integrity_verify_fn(struct work_struct *work)
/* Restore original bio completion handler */
bio->bi_end_io = bip->bip_end_io;
- bio_endio(bio);
+ bio_endio_notrace(bio);
}
/**
@@ -397,7 +397,7 @@ void bio_integrity_endio(struct bio *bio)
*/
if (bio->bi_error) {
bio->bi_end_io = bip->bip_end_io;
- bio_endio(bio);
+ bio_endio_notrace(bio);
return;
}
diff --git a/block/bio.c b/block/bio.c
index 5eec5e08417f..c8e5d24abd52 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1811,6 +1811,45 @@ static inline bool bio_remaining_done(struct bio *bio)
}
/**
+ * bio_endio_notrace - end I/O on a bio without tracing
+ * @bio: bio
+ *
+ * Description:
+ * bio_endio_notrace() will end I/O on the whole bio.
+ * bio_endio_notrace() should only be called if a completion trace
+ * event is not needed. This can be the case if a request-level
+ * completion event has already been generated, or if the bio is
+ * being completed early, before it was even queued.
+ *
+ **/
+void bio_endio_notrace(struct bio *bio)
+{
+again:
+ if (!bio_remaining_done(bio))
+ return;
+
+ /*
+ * Need to have a real endio function for chained bios, otherwise
+ * various corner cases will break (like stacking block devices that
+ * save/restore bi_end_io) - however, we want to avoid unbounded
+ * recursion and blowing the stack. Tail call optimization would
+ * handle this, but compiling with frame pointers also disables
+ * gcc's sibling call optimization.
+ */
+ if (bio->bi_end_io == bio_chain_endio) {
+ bio = __bio_chain_endio(bio);
+ goto again;
+ }
+
+ if (bio->bi_end_io)
+ bio->bi_end_io(bio);
+}
+EXPORT_SYMBOL(bio_endio_notrace);
+
+/**
* bio_endio - end I/O on a bio
* @bio: bio
*
@@ -1818,6 +1857,10 @@ static inline bool bio_remaining_done(struct bio *bio)
* bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
* way to end I/O on a bio. No one should call bi_end_io() directly on a
* bio unless they own it and thus know that it has an end_io function.
+ *
+ * bio_endio() can be called several times on a bio that has been chained
+ * using bio_chain(). The ->bi_end_io() function will only be called the
+ * last time, at which point the BLK_TA_COMPLETE trace event is generated.
**/
void bio_endio(struct bio *bio)
{
@@ -1838,6 +1881,9 @@ void bio_endio(struct bio *bio)
goto again;
}
+ if (bio->bi_bdev)
+ trace_block_bio_complete(bdev_get_queue(bio->bi_bdev),
+ bio, bio->bi_error);
if (bio->bi_end_io)
bio->bi_end_io(bio);
}
diff --git a/block/blk-core.c b/block/blk-core.c
index 0eeb99ef654f..b6c76580a796 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -142,7 +142,7 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
/* don't actually finish bio if it's part of flush sequence */
if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
- bio_endio(bio);
+ bio_endio_notrace(bio);
}
void blk_dump_rq_flags(struct request *rq, char *msg)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f4ffd1eb8f44..f5f09ace690a 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -810,7 +810,6 @@ static void dec_pending(struct dm_io *io, int error)
queue_io(md, bio);
} else {
/* done with normal IO or empty flush */
- trace_block_bio_complete(md->queue, bio, io_error);
bio->bi_error = io_error;
bio_endio(bio);
}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 9a3b7da34137..f684cb566721 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5141,8 +5141,6 @@ static void raid5_align_endio(struct bio *bi)
rdev_dec_pending(rdev, conf->mddev);
if (!error) {
- trace_block_bio_complete(bdev_get_queue(raid_bi->bi_bdev),
- raid_bi, 0);
bio_endio(raid_bi);
if (atomic_dec_and_test(&conf->active_aligned_reads))
wake_up(&conf->wait_for_quiescent);
@@ -5727,10 +5725,6 @@ static void raid5_make_request(struct mddev *mddev, struct bio * bi)
md_write_end(mddev);
remaining = raid5_dec_bi_active_stripes(bi);
if (remaining == 0) {
-
-
- trace_block_bio_complete(bdev_get_queue(bi->bi_bdev),
- bi, 0);
bio_endio(bi);
}
}
@@ -6138,8 +6132,6 @@ static int retry_aligned_read(struct r5conf *conf, struct bio *raid_bio)
}
remaining = raid5_dec_bi_active_stripes(raid_bio);
if (remaining == 0) {
- trace_block_bio_complete(bdev_get_queue(raid_bio->bi_bdev),
- raid_bio, 0);
bio_endio(raid_bio);
}
if (atomic_dec_and_test(&conf->active_aligned_reads))
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8e521194f6fc..e0552bee227b 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -418,6 +418,7 @@ static inline struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
extern blk_qc_t submit_bio(struct bio *);
extern void bio_endio(struct bio *);
+extern void bio_endio_notrace(struct bio *);
static inline void bio_io_error(struct bio *bio)
{
--
2.12.0