* [patch 5/5] block: use a driver-specific handler for the "inflight" value
@ 2018-11-06 21:35 Mikulas Patocka
2018-11-08 14:52 ` Christoph Hellwig
0 siblings, 1 reply; 11+ messages in thread
From: Mikulas Patocka @ 2018-11-06 21:35 UTC (permalink / raw)
To: Mike Snitzer, Alasdair G. Kergon; +Cc: dm-devel, Mikulas Patocka
[-- Attachment #1: dm-inflight.patch --]
[-- Type: text/plain, Size: 4240 bytes --]
Device mapper was converted to percpu inflight counters. In order to
display the correct values in the "inflight" sysfs file, we need a custom
callback that sums the percpu counters.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
block/blk-settings.c | 6 ++++++
block/genhd.c | 5 +++++
drivers/md/dm.c | 18 ++++++++++++++++++
include/linux/blkdev.h | 4 ++++
4 files changed, 33 insertions(+)
Index: linux-2.6/block/genhd.c
===================================================================
--- linux-2.6.orig/block/genhd.c 2018-11-06 22:31:46.350000000 +0100
+++ linux-2.6/block/genhd.c 2018-11-06 22:31:46.350000000 +0100
@@ -85,6 +85,11 @@ void part_in_flight(struct request_queue
void part_in_flight_rw(struct request_queue *q, struct hd_struct *part,
unsigned int inflight[2])
{
+ if (q->get_inflight_fn) {
+ q->get_inflight_fn(q, inflight);
+ return;
+ }
+
if (q->mq_ops) {
blk_mq_in_flight_rw(q, part, inflight);
return;
Index: linux-2.6/include/linux/blkdev.h
===================================================================
--- linux-2.6.orig/include/linux/blkdev.h 2018-11-06 22:31:46.350000000 +0100
+++ linux-2.6/include/linux/blkdev.h 2018-11-06 22:31:46.350000000 +0100
@@ -324,6 +324,7 @@ typedef int (lld_busy_fn) (struct reques
typedef int (bsg_job_fn) (struct bsg_job *);
typedef int (init_rq_fn)(struct request_queue *, struct request *, gfp_t);
typedef void (exit_rq_fn)(struct request_queue *, struct request *);
+typedef void (get_inflight_fn)(struct request_queue *, unsigned int [2]);
enum blk_eh_timer_return {
BLK_EH_DONE, /* drivers has completed the command */
@@ -466,6 +467,8 @@ struct request_queue {
exit_rq_fn *exit_rq_fn;
/* Called from inside blk_get_request() */
void (*initialize_rq_fn)(struct request *rq);
+ /* Called to get the "inflight" values */
+ get_inflight_fn *get_inflight_fn;
const struct blk_mq_ops *mq_ops;
@@ -1232,6 +1235,7 @@ extern int blk_queue_dma_drain(struct re
dma_drain_needed_fn *dma_drain_needed,
void *buf, unsigned int size);
extern void blk_queue_lld_busy(struct request_queue *q, lld_busy_fn *fn);
+extern void blk_queue_get_inflight(struct request_queue *q, get_inflight_fn *fn);
extern void blk_queue_segment_boundary(struct request_queue *, unsigned long);
extern void blk_queue_virt_boundary(struct request_queue *, unsigned long);
extern void blk_queue_prep_rq(struct request_queue *, prep_rq_fn *pfn);
Index: linux-2.6/block/blk-settings.c
===================================================================
--- linux-2.6.orig/block/blk-settings.c 2018-11-06 22:31:46.350000000 +0100
+++ linux-2.6/block/blk-settings.c 2018-11-06 22:31:46.350000000 +0100
@@ -79,6 +79,12 @@ void blk_queue_lld_busy(struct request_q
}
EXPORT_SYMBOL_GPL(blk_queue_lld_busy);
+void blk_queue_get_inflight(struct request_queue *q, get_inflight_fn *fn)
+{
+ q->get_inflight_fn = fn;
+}
+EXPORT_SYMBOL_GPL(blk_queue_get_inflight);
+
/**
* blk_set_default_limits - reset limits to default values
* @lim: the queue_limits structure to reset
Index: linux-2.6/drivers/md/dm.c
===================================================================
--- linux-2.6.orig/drivers/md/dm.c 2018-11-06 22:31:46.350000000 +0100
+++ linux-2.6/drivers/md/dm.c 2018-11-06 22:31:46.350000000 +0100
@@ -662,6 +662,23 @@ static void end_io_acct(struct dm_io *io
}
}
+static void dm_get_inflight(struct request_queue *q, unsigned int inflight[2])
+{
+ struct mapped_device *md = q->queuedata;
+ int cpu;
+
+ inflight[READ] = inflight[WRITE] = 0;
+ for_each_possible_cpu(cpu) {
+ struct dm_percpu *p = per_cpu_ptr(md->counters, cpu);
+ inflight[READ] += p->inflight[READ];
+ inflight[WRITE] += p->inflight[WRITE];
+ }
+ if ((int)inflight[READ] < 0)
+ inflight[READ] = 0;
+ if ((int)inflight[WRITE] < 0)
+ inflight[WRITE] = 0;
+}
+
/*
* Add the bio to the list of deferred io.
*/
@@ -2242,6 +2259,7 @@ int dm_setup_md_queue(struct mapped_devi
case DM_TYPE_NVME_BIO_BASED:
dm_init_normal_md_queue(md);
blk_queue_make_request(md->queue, dm_make_request);
+ blk_queue_get_inflight(md->queue, dm_get_inflight);
break;
case DM_TYPE_NONE:
WARN_ON_ONCE(true);
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [patch 5/5] block: use a driver-specific handler for the "inflight" value
2018-11-06 21:35 [patch 5/5] block: use a driver-specific handler for the "inflight" value Mikulas Patocka
@ 2018-11-08 14:52 ` Christoph Hellwig
2018-11-08 17:07 ` Mike Snitzer
2018-11-14 22:35 ` Mikulas Patocka
0 siblings, 2 replies; 11+ messages in thread
From: Christoph Hellwig @ 2018-11-08 14:52 UTC (permalink / raw)
To: Mikulas Patocka; +Cc: Mike Snitzer, dm-devel, Alasdair G. Kergon
On Tue, Nov 06, 2018 at 10:35:03PM +0100, Mikulas Patocka wrote:
> Device mapper was converted to percpu inflight counters. In order to
> display the correct values in the "inflight" sysfs file, we need a custom
> callback that sums the percpu counters.
The attribute that calls this is per-partition, while your method
is per-queue, so there is some impedence mismatch here.
Is there any way you could look into just making the generic code
use percpu counters?
Also please cc linux-block on patches that touch the generic block
code.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 5/5] block: use a driver-specific handler for the "inflight" value
2018-11-08 14:52 ` Christoph Hellwig
@ 2018-11-08 17:07 ` Mike Snitzer
2018-11-14 22:35 ` Mikulas Patocka
1 sibling, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2018-11-08 17:07 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-block, axboe, dm-devel, Mikulas Patocka, Alasdair G. Kergon
On Thu, Nov 08 2018 at 9:52am -0500,
Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Nov 06, 2018 at 10:35:03PM +0100, Mikulas Patocka wrote:
> > Device mapper was converted to percpu inflight counters. In order to
> > display the correct values in the "inflight" sysfs file, we need a custom
> > callback that sums the percpu counters.
>
> The attribute that calls this is per-partition, while your method
> is per-queue, so there is some impedence mismatch here.
>
> Is there any way you could look into just making the generic code
> use percpu counters?
Discussed doing that with Jens and reported as much here:
https://www.redhat.com/archives/dm-devel/2018-November/msg00068.html
And Jens gave additional context for why yet another attempt to switch
block core's in_flight to percpu counters is doomed (having already been
proposed and rejected twice):
https://www.redhat.com/archives/dm-devel/2018-November/msg00071.html
And yes, definitely should've cc'd linux-block (now added).
Mike
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 5/5] block: use a driver-specific handler for the "inflight" value
@ 2018-11-08 17:07 ` Mike Snitzer
0 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2018-11-08 17:07 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mikulas Patocka, Alasdair G. Kergon, dm-devel, linux-block, axboe
On Thu, Nov 08 2018 at 9:52am -0500,
Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Nov 06, 2018 at 10:35:03PM +0100, Mikulas Patocka wrote:
> > Device mapper was converted to percpu inflight counters. In order to
> > display the correct values in the "inflight" sysfs file, we need a custom
> > callback that sums the percpu counters.
>
> The attribute that calls this is per-partition, while your method
> is per-queue, so there is some impedence mismatch here.
>
> Is there any way you could look into just making the generic code
> use percpu counters?
Discussed doing that with Jens and reported as much here:
https://www.redhat.com/archives/dm-devel/2018-November/msg00068.html
And Jens gave additional context for why yet another attempt to switch
block core's in_flight to percpu counters is doomed (having already been
proposed and rejected twice):
https://www.redhat.com/archives/dm-devel/2018-November/msg00071.html
And yes, definitely should've cc'd linux-block (now added).
Mike
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 5/5] block: use a driver-specific handler for the "inflight" value
2018-11-08 17:07 ` Mike Snitzer
@ 2018-11-14 15:18 ` Christoph Hellwig
-1 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2018-11-14 15:18 UTC (permalink / raw)
To: Mike Snitzer
Cc: axboe, linux-block, Christoph Hellwig, dm-devel, Mikulas Patocka,
Alasdair G. Kergon
On Thu, Nov 08, 2018 at 12:07:01PM -0500, Mike Snitzer wrote:
> Discussed doing that with Jens and reported as much here:
>
> https://www.redhat.com/archives/dm-devel/2018-November/msg00068.html
>
> And Jens gave additional context for why yet another attempt to switch
> block core's in_flight to percpu counters is doomed (having already been
> proposed and rejected twice):
>
> https://www.redhat.com/archives/dm-devel/2018-November/msg00071.html
>
> And yes, definitely should've cc'd linux-block (now added).
So how is dm different from the the other 3 handful of drivers using
the make_request interface that the per-cpu counters work for dm and
not the others?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 5/5] block: use a driver-specific handler for the "inflight" value
@ 2018-11-14 15:18 ` Christoph Hellwig
0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2018-11-14 15:18 UTC (permalink / raw)
To: Mike Snitzer
Cc: Christoph Hellwig, Mikulas Patocka, Alasdair G. Kergon, dm-devel,
linux-block, axboe
On Thu, Nov 08, 2018 at 12:07:01PM -0500, Mike Snitzer wrote:
> Discussed doing that with Jens and reported as much here:
>
> https://www.redhat.com/archives/dm-devel/2018-November/msg00068.html
>
> And Jens gave additional context for why yet another attempt to switch
> block core's in_flight to percpu counters is doomed (having already been
> proposed and rejected twice):
>
> https://www.redhat.com/archives/dm-devel/2018-November/msg00071.html
>
> And yes, definitely should've cc'd linux-block (now added).
So how is dm different from the the other 3 handful of drivers using
the make_request interface that the per-cpu counters work for dm and
not the others?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 5/5] block: use a driver-specific handler for the "inflight" value
2018-11-14 15:18 ` Christoph Hellwig
@ 2018-11-14 15:34 ` Mike Snitzer
-1 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2018-11-14 15:34 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-block, axboe, dm-devel, Mikulas Patocka, Alasdair G. Kergon
On Wed, Nov 14 2018 at 10:18am -0500,
Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Nov 08, 2018 at 12:07:01PM -0500, Mike Snitzer wrote:
> > Discussed doing that with Jens and reported as much here:
> >
> > https://www.redhat.com/archives/dm-devel/2018-November/msg00068.html
> >
> > And Jens gave additional context for why yet another attempt to switch
> > block core's in_flight to percpu counters is doomed (having already been
> > proposed and rejected twice):
> >
> > https://www.redhat.com/archives/dm-devel/2018-November/msg00071.html
> >
> > And yes, definitely should've cc'd linux-block (now added).
>
> So how is dm different from the the other 3 handful of drivers using
> the make_request interface that the per-cpu counters work for dm and
> not the others?
Think the big part of the historic reluctance to switch to percpu
in_flight counters was that until now (4.21) the legacy request path was
also using the in_flight counters.
Now that they are only used by bio-based (make_request) we likely have
more latitude (hopefully?). Though I cannot say for sure why they
performed so well in Mikulas' testing.. you'd thinking all the percpu
summing on every jiffie during IO completion would've still been
costly.. but Mikulas saw great results.
Mikulas and I have discussed a new way forward and he is actively
working through implementing it. Basically he'll still switch to percpu
in_flight counters but he'll change the algorithm for IO accounting
during completion so that it is more of an approximation of the
historically precise in_flight counters and io_ticks (io_ticks is
another problematic component that gets in the way of performance).
Basically the accounting done during IO completion would be much much
faster. Big part of this is the summation of the percpu in_flight
counters would happen on demand (via sysfs or /proc/diskstats access).
I could look back at my logs from my chat with Mikulas to give you more
details or we could just wait for Mikulas to post the patches (hopefully
within a week). Your call.
Coming off my Monday discussion with Mikulas I really think the approach
will work nicely and offer a nice performance win for bio-based.
Mike
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 5/5] block: use a driver-specific handler for the "inflight" value
@ 2018-11-14 15:34 ` Mike Snitzer
0 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2018-11-14 15:34 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mikulas Patocka, Alasdair G. Kergon, dm-devel, linux-block, axboe
On Wed, Nov 14 2018 at 10:18am -0500,
Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Nov 08, 2018 at 12:07:01PM -0500, Mike Snitzer wrote:
> > Discussed doing that with Jens and reported as much here:
> >
> > https://www.redhat.com/archives/dm-devel/2018-November/msg00068.html
> >
> > And Jens gave additional context for why yet another attempt to switch
> > block core's in_flight to percpu counters is doomed (having already been
> > proposed and rejected twice):
> >
> > https://www.redhat.com/archives/dm-devel/2018-November/msg00071.html
> >
> > And yes, definitely should've cc'd linux-block (now added).
>
> So how is dm different from the the other 3 handful of drivers using
> the make_request interface that the per-cpu counters work for dm and
> not the others?
Think the big part of the historic reluctance to switch to percpu
in_flight counters was that until now (4.21) the legacy request path was
also using the in_flight counters.
Now that they are only used by bio-based (make_request) we likely have
more latitude (hopefully?). Though I cannot say for sure why they
performed so well in Mikulas' testing.. you'd thinking all the percpu
summing on every jiffie during IO completion would've still been
costly.. but Mikulas saw great results.
Mikulas and I have discussed a new way forward and he is actively
working through implementing it. Basically he'll still switch to percpu
in_flight counters but he'll change the algorithm for IO accounting
during completion so that it is more of an approximation of the
historically precise in_flight counters and io_ticks (io_ticks is
another problematic component that gets in the way of performance).
Basically the accounting done during IO completion would be much much
faster. Big part of this is the summation of the percpu in_flight
counters would happen on demand (via sysfs or /proc/diskstats access).
I could look back at my logs from my chat with Mikulas to give you more
details or we could just wait for Mikulas to post the patches (hopefully
within a week). Your call.
Coming off my Monday discussion with Mikulas I really think the approach
will work nicely and offer a nice performance win for bio-based.
Mike
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 5/5] block: use a driver-specific handler for the "inflight" value
2018-11-14 15:18 ` Christoph Hellwig
@ 2018-11-14 22:49 ` Mikulas Patocka
-1 siblings, 0 replies; 11+ messages in thread
From: Mikulas Patocka @ 2018-11-14 22:49 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-block, axboe, dm-devel, Alasdair G. Kergon, Mike Snitzer
On Wed, 14 Nov 2018, Christoph Hellwig wrote:
> On Thu, Nov 08, 2018 at 12:07:01PM -0500, Mike Snitzer wrote:
> > Discussed doing that with Jens and reported as much here:
> >
> > https://www.redhat.com/archives/dm-devel/2018-November/msg00068.html
> >
> > And Jens gave additional context for why yet another attempt to switch
> > block core's in_flight to percpu counters is doomed (having already been
> > proposed and rejected twice):
> >
> > https://www.redhat.com/archives/dm-devel/2018-November/msg00071.html
> >
> > And yes, definitely should've cc'd linux-block (now added).
>
> So how is dm different from the the other 3 handful of drivers using
> the make_request interface that the per-cpu counters work for dm and
> not the others?
We want to make dm-linear (and dm-stripe) completely lockless, because it
is used often and we don't want it to degrade performance.
DM already uses srcu to handle table changes, so that the fast path
doesn't take any locks. And the only one "lock" that is remaining is the
"in_flight" variable.
As for other drivers, md-raid0 could probably be lockless too (using
percpu counting similar to dm). The other raid levels can't be lockless
because they need to check the status of the stripe that is being
accessed.
Mikulas
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 5/5] block: use a driver-specific handler for the "inflight" value
@ 2018-11-14 22:49 ` Mikulas Patocka
0 siblings, 0 replies; 11+ messages in thread
From: Mikulas Patocka @ 2018-11-14 22:49 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Snitzer, Alasdair G. Kergon, dm-devel, linux-block, axboe
On Wed, 14 Nov 2018, Christoph Hellwig wrote:
> On Thu, Nov 08, 2018 at 12:07:01PM -0500, Mike Snitzer wrote:
> > Discussed doing that with Jens and reported as much here:
> >
> > https://www.redhat.com/archives/dm-devel/2018-November/msg00068.html
> >
> > And Jens gave additional context for why yet another attempt to switch
> > block core's in_flight to percpu counters is doomed (having already been
> > proposed and rejected twice):
> >
> > https://www.redhat.com/archives/dm-devel/2018-November/msg00071.html
> >
> > And yes, definitely should've cc'd linux-block (now added).
>
> So how is dm different from the the other 3 handful of drivers using
> the make_request interface that the per-cpu counters work for dm and
> not the others?
We want to make dm-linear (and dm-stripe) completely lockless, because it
is used often and we don't want it to degrade performance.
DM already uses srcu to handle table changes, so that the fast path
doesn't take any locks. And the only one "lock" that is remaining is the
"in_flight" variable.
As for other drivers, md-raid0 could probably be lockless too (using
percpu counting similar to dm). The other raid levels can't be lockless
because they need to check the status of the stripe that is being
accessed.
Mikulas
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 5/5] block: use a driver-specific handler for the "inflight" value
2018-11-08 14:52 ` Christoph Hellwig
2018-11-08 17:07 ` Mike Snitzer
@ 2018-11-14 22:35 ` Mikulas Patocka
1 sibling, 0 replies; 11+ messages in thread
From: Mikulas Patocka @ 2018-11-14 22:35 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Mike Snitzer, dm-devel, Alasdair G. Kergon
On Thu, 8 Nov 2018, Christoph Hellwig wrote:
> On Tue, Nov 06, 2018 at 10:35:03PM +0100, Mikulas Patocka wrote:
> > Device mapper was converted to percpu inflight counters. In order to
> > display the correct values in the "inflight" sysfs file, we need a custom
> > callback that sums the percpu counters.
>
> The attribute that calls this is per-partition, while your method
> is per-queue, so there is some impedence mismatch here.
Device mapper doesn't use partitions.
> Is there any way you could look into just making the generic code
> use percpu counters?
In the next merge window, single-queue request-based block drivers will be
eliminated - all the drivers will be multiqueue. So, they won't use
the in_flight variables at all.
in_flight will be only use by bio-based stacking drivers like md.
> Also please cc linux-block on patches that touch the generic block
> code.
Mikulas
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2018-11-14 22:49 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-11-06 21:35 [patch 5/5] block: use a driver-specific handler for the "inflight" value Mikulas Patocka
2018-11-08 14:52 ` Christoph Hellwig
2018-11-08 17:07 ` Mike Snitzer
2018-11-08 17:07 ` Mike Snitzer
2018-11-14 15:18 ` Christoph Hellwig
2018-11-14 15:18 ` Christoph Hellwig
2018-11-14 15:34 ` Mike Snitzer
2018-11-14 15:34 ` Mike Snitzer
2018-11-14 22:49 ` Mikulas Patocka
2018-11-14 22:49 ` Mikulas Patocka
2018-11-14 22:35 ` Mikulas Patocka
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.