* Re: split scsi passthrough fields out of struct request V2
From: Bart Van Assche @ 2017-01-27 1:15 UTC (permalink / raw)
To: hch@lst.de, axboe@fb.com
Cc: linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
snitzer@redhat.com, linux-raid@vger.kernel.org,
dm-devel@redhat.com, j-nomura@ce.jp.nec.com
In-Reply-To: <2d971693-b79d-c1b9-fb2a-f5dd04128c68@fb.com>
On Thu, 2017-01-26 at 17:41 -0700, Jens Axboe wrote:
> On 01/26/2017 05:38 PM, Bart Van Assche wrote:
> > I see similar behavior with the blk-mq-sched branch of
> > git://git.kernel.dk/linux-block.git (git commit ID 0efe27068ecf):
> > booting happens much slower than usual and I/O hangs if I run the
> > srp-test software.
>
> Please don't run that, run for-4.11/block and merge it to master.
> Same behavior?
I have not yet had the chance to run the srp-test software against that
kernel. But I already see that booting takes more than ten times longer
than usual. Note: as far as I know the dm-mpath driver is not involved
in the boot process of my test system.
> > Regarding creating a similar dm setup: I hope that in the future it
> > will become possible to run the srp-test software without any special
> > hardware and with in-tree drivers. Today running the srp-test software
> > with in-tree drivers namely requires IB hardware. This is how to run the
> > srp-test software today with in-tree drivers:
> > * Find a system with at least two InfiniBand ports.
> > * Make sure that the appropriate IB driver in the kernel is enabled and
> > also that LIO (CONFIG_TARGET_CORE=m and CONFIG_TCM_FILEIO=m), ib_srp,
> > ib_srpt and dm-mpath are built as kernel modules.
> > * If none of the IB ports are connected to an IB switch, connect the
> > two ports to each other and configure and start the opensm software
> > such that the port states change from "Initializing" to "Active".
> > * Check with "ibstat | grep State: Active" that at least one port is
> > in the active state.
> > * Configure multipathd as explained in
> > https://github.com/bvanassche/srp-test/blob/master/README.md.
> > * Restart multipathd to make sure it picks up /etc/multipath.conf.
> > * Clone https://github.com/bvanassche/srp-test and start it as follows:
> > srp-test/run_tests -t 02-mq
>
> I can't run that. Any chance of a test case that doesn't require IB?
It is possible to run that test on top of the SoftRoCE driver. I will first
check myself whether the latest version of the SoftRoCE driver is stable
enough to run srp-test on top of it (see also
https://github.com/dledford/linux/commits/k.o/for-4.11).
Bart.
^ permalink raw reply
* Re: split scsi passthrough fields out of struct request V2
From: Jens Axboe @ 2017-01-27 1:22 UTC (permalink / raw)
To: Bart Van Assche, hch@lst.de
Cc: linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
snitzer@redhat.com, linux-raid@vger.kernel.org,
dm-devel@redhat.com, j-nomura@ce.jp.nec.com
In-Reply-To: <1485479738.2540.30.camel@sandisk.com>
On 01/26/2017 06:15 PM, Bart Van Assche wrote:
> On Thu, 2017-01-26 at 17:41 -0700, Jens Axboe wrote:
>> On 01/26/2017 05:38 PM, Bart Van Assche wrote:
>>> I see similar behavior with the blk-mq-sched branch of
>>> git://git.kernel.dk/linux-block.git (git commit ID 0efe27068ecf):
>>> booting happens much slower than usual and I/O hangs if I run the
>>> srp-test software.
>>
>> Please don't run that, run for-4.11/block and merge it to master.
>> Same behavior?
>
> I have not yet had the chance to run the srp-test software against that
> kernel. But I already see that booting takes more than ten times longer
> than usual. Note: as far as I know the dm-mpath driver is not involved
> in the boot process of my test system.
What's your boot device? I've been booting this on a variety of setups,
no problems observed. It's booting my laptop, and on SCSI and SATA as
well. What is your root drive? What is the queue depth of it?
Controller?
--
Jens Axboe
^ permalink raw reply
* Re: [PATCH 1/1] imsm: fix missing error message during migration
From: jes.sorensen @ 2017-01-27 1:59 UTC (permalink / raw)
To: Pawel Baldysiak; +Cc: linux-raid
In-Reply-To: <20170124132933.16943-1-pawel.baldysiak@intel.com>
Pawel Baldysiak <pawel.baldysiak@intel.com> writes:
> If user tries to migrate from raid0 to raid5 and there is no spare
> drive to perform it - mdadm will exit with errorcode, but
> no error message is printed.
>
> Print error instead of debug message when this condition occurs,
> so user is informed why requested migration is not started.
>
> Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
> ---
> super-intel.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Applied!
Thanks,
Jes
^ permalink raw reply
* Re: [dm-devel] split scsi passthrough fields out of struct request V2
From: Jens Axboe @ 2017-01-27 6:40 UTC (permalink / raw)
To: Bart Van Assche, hch@lst.de
Cc: linux-scsi@vger.kernel.org, linux-raid@vger.kernel.org,
dm-devel@redhat.com, linux-block@vger.kernel.org,
snitzer@redhat.com, j-nomura@ce.jp.nec.com
In-Reply-To: <ecca734c-7a9c-f060-9335-c2de089849c2@fb.com>
On 01/26/2017 06:22 PM, Jens Axboe wrote:
> On 01/26/2017 06:15 PM, Bart Van Assche wrote:
>> On Thu, 2017-01-26 at 17:41 -0700, Jens Axboe wrote:
>>> On 01/26/2017 05:38 PM, Bart Van Assche wrote:
>>>> I see similar behavior with the blk-mq-sched branch of
>>>> git://git.kernel.dk/linux-block.git (git commit ID 0efe27068ecf):
>>>> booting happens much slower than usual and I/O hangs if I run the
>>>> srp-test software.
>>>
>>> Please don't run that, run for-4.11/block and merge it to master.
>>> Same behavior?
>>
>> I have not yet had the chance to run the srp-test software against that
>> kernel. But I already see that booting takes more than ten times longer
>> than usual. Note: as far as I know the dm-mpath driver is not involved
>> in the boot process of my test system.
>
> What's your boot device? I've been booting this on a variety of setups,
> no problems observed. It's booting my laptop, and on SCSI and SATA as
> well. What is your root drive? What is the queue depth of it?
> Controller?
Are you using dm for your root device?
I think I see what is going on. The scheduler framework put the
insertion of flushes on the side, whereas it's integrated "nicely"
on the legacy side.
Can you try with this applied? This is on top of the previous two that
we already went through. Or, you can just pull:
git://git.kernel.dk/linux-block for-4.11/next
which is for-4.11/block with the next set of fixes on top that I haven't
pulled in yet.
commit 995447bfd14dd871e0c8771261ed7d1f2b5b4c86
Author: Jens Axboe <axboe@fb.com>
Date: Thu Jan 26 23:34:56 2017 -0700
blk-mq-sched: integrate flush insertion into blk_mq_sched_get_request()
Instead of letting the caller check this and handle the details
of inserting a flush request, put the logic in the scheduler
insertion function.
Outside of cleaning up the code, this handles the case where
outside callers insert a flush, like through
blk_insert_cloned_request().
Signed-off-by: Jens Axboe <axboe@fb.com>
diff --git a/block/blk-core.c b/block/blk-core.c
index a61f1407f4f6..78daf5b6d7cb 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2129,7 +2129,7 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
if (q->mq_ops) {
if (blk_queue_io_stat(q))
blk_account_io_start(rq, true);
- blk_mq_sched_insert_request(rq, false, true, false);
+ blk_mq_sched_insert_request(rq, false, true, false, false);
return 0;
}
diff --git a/block/blk-exec.c b/block/blk-exec.c
index 86656fdfa637..ed1f10165268 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -66,7 +66,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
* be reused after dying flag is set
*/
if (q->mq_ops) {
- blk_mq_sched_insert_request(rq, at_head, true, false);
+ blk_mq_sched_insert_request(rq, at_head, true, false, false);
return;
}
diff --git a/block/blk-flush.c b/block/blk-flush.c
index d7de34ee39c2..4427896641ac 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -456,7 +456,7 @@ void blk_insert_flush(struct request *rq)
if ((policy & REQ_FSEQ_DATA) &&
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
if (q->mq_ops)
- blk_mq_sched_insert_request(rq, false, true, false);
+ blk_mq_sched_insert_request(rq, false, true, false, false);
else
list_add_tail(&rq->queuelist, &q->queue_head);
return;
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index c27613de80c5..fa2ff0f458fa 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -336,6 +336,64 @@ void blk_mq_sched_restart_queues(struct blk_mq_hw_ctx *hctx)
}
}
+/*
+ * Add flush/fua to the queue. If we fail getting a driver tag, then
+ * punt to the requeue list. Requeue will re-invoke us from a context
+ * that's safe to block from.
+ */
+static void blk_mq_sched_insert_flush(struct blk_mq_hw_ctx *hctx,
+ struct request *rq, bool can_block)
+{
+ if (blk_mq_get_driver_tag(rq, &hctx, can_block)) {
+ blk_insert_flush(rq);
+ blk_mq_run_hw_queue(hctx, !can_block);
+ } else
+ blk_mq_add_to_requeue_list(rq, true, true);
+}
+
+void blk_mq_sched_insert_request(struct request *rq, bool at_head,
+ bool run_queue, bool async, bool can_block)
+{
+ struct request_queue *q = rq->q;
+ struct elevator_queue *e = q->elevator;
+ struct blk_mq_ctx *ctx = rq->mq_ctx;
+ struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
+
+ if (rq->tag == -1 && (rq->cmd_flags & (REQ_PREFLUSH | REQ_FUA))) {
+ blk_mq_sched_insert_flush(hctx, rq, can_block);
+ return;
+ }
+
+ if (e && e->type->ops.mq.insert_requests) {
+ LIST_HEAD(list);
+
+ list_add(&rq->queuelist, &list);
+ e->type->ops.mq.insert_requests(hctx, &list, at_head);
+ } else {
+ spin_lock(&ctx->lock);
+ __blk_mq_insert_request(hctx, rq, at_head);
+ spin_unlock(&ctx->lock);
+ }
+
+ if (run_queue)
+ blk_mq_run_hw_queue(hctx, async);
+}
+
+void blk_mq_sched_insert_requests(struct request_queue *q,
+ struct blk_mq_ctx *ctx,
+ struct list_head *list, bool run_queue_async)
+{
+ struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
+ struct elevator_queue *e = hctx->queue->elevator;
+
+ if (e && e->type->ops.mq.insert_requests)
+ e->type->ops.mq.insert_requests(hctx, list, false);
+ else
+ blk_mq_insert_requests(hctx, ctx, list);
+
+ blk_mq_run_hw_queue(hctx, run_queue_async);
+}
+
static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
struct blk_mq_hw_ctx *hctx,
unsigned int hctx_idx)
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index becbc7840364..9478aaeb48c5 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -21,6 +21,12 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio);
bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
void blk_mq_sched_restart_queues(struct blk_mq_hw_ctx *hctx);
+void blk_mq_sched_insert_request(struct request *rq, bool at_head,
+ bool run_queue, bool async, bool can_block);
+void blk_mq_sched_insert_requests(struct request_queue *q,
+ struct blk_mq_ctx *ctx,
+ struct list_head *list, bool run_queue_async);
+
void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx);
void blk_mq_sched_move_to_dispatch(struct blk_mq_hw_ctx *hctx,
struct list_head *rq_list,
@@ -62,45 +68,6 @@ static inline void blk_mq_sched_put_rq_priv(struct request_queue *q,
e->type->ops.mq.put_rq_priv(q, rq);
}
-static inline void
-blk_mq_sched_insert_request(struct request *rq, bool at_head, bool run_queue,
- bool async)
-{
- struct request_queue *q = rq->q;
- struct elevator_queue *e = q->elevator;
- struct blk_mq_ctx *ctx = rq->mq_ctx;
- struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
-
- if (e && e->type->ops.mq.insert_requests) {
- LIST_HEAD(list);
-
- list_add(&rq->queuelist, &list);
- e->type->ops.mq.insert_requests(hctx, &list, at_head);
- } else {
- spin_lock(&ctx->lock);
- __blk_mq_insert_request(hctx, rq, at_head);
- spin_unlock(&ctx->lock);
- }
-
- if (run_queue)
- blk_mq_run_hw_queue(hctx, async);
-}
-
-static inline void
-blk_mq_sched_insert_requests(struct request_queue *q, struct blk_mq_ctx *ctx,
- struct list_head *list, bool run_queue_async)
-{
- struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
- struct elevator_queue *e = hctx->queue->elevator;
-
- if (e && e->type->ops.mq.insert_requests)
- e->type->ops.mq.insert_requests(hctx, list, false);
- else
- blk_mq_insert_requests(hctx, ctx, list);
-
- blk_mq_run_hw_queue(hctx, run_queue_async);
-}
-
static inline bool
blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
struct bio *bio)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 1b156ca79af6..78bbacd129c9 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -106,6 +106,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
struct sbq_wait_state *ws;
DEFINE_WAIT(wait);
unsigned int tag_offset;
+ bool drop_ctx;
int tag;
if (data->flags & BLK_MQ_REQ_RESERVED) {
@@ -128,6 +129,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
return BLK_MQ_TAG_FAIL;
ws = bt_wait_ptr(bt, data->hctx);
+ drop_ctx = data->ctx == NULL;
do {
prepare_to_wait(&ws->wait, &wait, TASK_UNINTERRUPTIBLE);
@@ -150,7 +152,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
if (tag != -1)
break;
- blk_mq_put_ctx(data->ctx);
+ if (data->ctx)
+ blk_mq_put_ctx(data->ctx);
io_schedule();
@@ -166,6 +169,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
ws = bt_wait_ptr(bt, data->hctx);
} while (1);
+ if (drop_ctx && data->ctx)
+ blk_mq_put_ctx(data->ctx);
+
finish_wait(&ws->wait, &wait);
found_tag:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4df397910251..9046f7802de3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -568,13 +568,13 @@ static void blk_mq_requeue_work(struct work_struct *work)
rq->rq_flags &= ~RQF_SOFTBARRIER;
list_del_init(&rq->queuelist);
- blk_mq_sched_insert_request(rq, true, false, false);
+ blk_mq_sched_insert_request(rq, true, false, false, true);
}
while (!list_empty(&rq_list)) {
rq = list_entry(rq_list.next, struct request, queuelist);
list_del_init(&rq->queuelist);
- blk_mq_sched_insert_request(rq, false, false, false);
+ blk_mq_sched_insert_request(rq, false, false, false, true);
}
blk_mq_run_hw_queues(q, false);
@@ -847,12 +847,11 @@ static inline unsigned int queued_to_index(unsigned int queued)
return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1);
}
-static bool blk_mq_get_driver_tag(struct request *rq,
- struct blk_mq_hw_ctx **hctx, bool wait)
+bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
+ bool wait)
{
struct blk_mq_alloc_data data = {
.q = rq->q,
- .ctx = rq->mq_ctx,
.hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu),
.flags = wait ? 0 : BLK_MQ_REQ_NOWAIT,
};
@@ -1395,7 +1394,7 @@ static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie)
}
insert:
- blk_mq_sched_insert_request(rq, false, true, true);
+ blk_mq_sched_insert_request(rq, false, true, true, false);
}
/*
@@ -1445,12 +1444,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
cookie = request_to_qc_t(data.hctx, rq);
- if (unlikely(is_flush_fua)) {
- blk_mq_bio_to_request(rq, bio);
- blk_mq_get_driver_tag(rq, NULL, true);
- blk_insert_flush(rq);
- goto run_queue;
- }
+ if (unlikely(is_flush_fua))
+ goto insert;
plug = current->plug;
/*
@@ -1499,10 +1494,11 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
}
if (q->elevator) {
+insert:
blk_mq_put_ctx(data.ctx);
blk_mq_bio_to_request(rq, bio);
blk_mq_sched_insert_request(rq, false, true,
- !is_sync || is_flush_fua);
+ !is_sync || is_flush_fua, true);
goto done;
}
if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
@@ -1512,7 +1508,6 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
* latter allows for merging opportunities and more efficient
* dispatching.
*/
-run_queue:
blk_mq_run_hw_queue(data.hctx, !is_sync || is_flush_fua);
}
blk_mq_put_ctx(data.ctx);
@@ -1567,12 +1562,8 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
cookie = request_to_qc_t(data.hctx, rq);
- if (unlikely(is_flush_fua)) {
- blk_mq_bio_to_request(rq, bio);
- blk_mq_get_driver_tag(rq, NULL, true);
- blk_insert_flush(rq);
- goto run_queue;
- }
+ if (unlikely(is_flush_fua))
+ goto insert;
/*
* A task plug currently exists. Since this is completely lockless,
@@ -1609,10 +1600,11 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
}
if (q->elevator) {
+insert:
blk_mq_put_ctx(data.ctx);
blk_mq_bio_to_request(rq, bio);
blk_mq_sched_insert_request(rq, false, true,
- !is_sync || is_flush_fua);
+ !is_sync || is_flush_fua, true);
goto done;
}
if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
@@ -1622,7 +1614,6 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
* latter allows for merging opportunities and more efficient
* dispatching.
*/
-run_queue:
blk_mq_run_hw_queue(data.hctx, !is_sync || is_flush_fua);
}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index d19b0e75a129..d34929968071 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -34,6 +34,8 @@ void blk_mq_wake_waiters(struct request_queue *q);
bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *, struct list_head *);
void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
+bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
+ bool wait);
/*
* Internal helpers for allocating/freeing the request map
--
Jens Axboe
^ permalink raw reply related
* Re: [systemd-devel] Errorneous detection of degraded array
From: Andrei Borzenkov @ 2017-01-27 7:12 UTC (permalink / raw)
To: Luke Pyzowski, 'systemd-devel@lists.freedesktop.org',
linux-raid
In-Reply-To: <96A26C8C6786C341B83BC4F2BC5419E4795DE9A6@SRF-EXCH1.corp.sunrisefutures.com>
26.01.2017 21:02, Luke Pyzowski пишет:
> Hello,
> I have a large RAID6 device with 24 local drives on CentOS7.3. Randomly (around 50% of the time) systemd will unmount my RAID device thinking it is degraded after the mdadm-last-resort@.timer expires, however the device is working normally by all accounts, and I can immediately mount it manually upon boot completion. In the logs below /share is the RAID device. I can increase the timer in /usr/lib/systemd/system/mdadm-last-resort@.timer from 30 to 60 seconds, but this problem can randomly still occur.
>
> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
> systemd[1]: Starting Activate md array even though degraded...
> systemd[1]: Stopped target Local File Systems.
> systemd[1]: Stopping Local File Systems.
> systemd[1]: Unmounting /share...
> systemd[1]: Stopped (with error) /dev/md0.
> systemd[1]: Started Activate md array even though degraded.
> systemd[1]: Unmounted /share.
>
> When the system boots normally the following is in the logs:
> systemd[1]: Started Timer to wait for more drives before activating degraded array..
> systemd[1]: Starting Timer to wait for more drives before activating degraded array..
> ...
> systemd[1]: Stopped Timer to wait for more drives before activating degraded array..
> systemd[1]: Stopping Timer to wait for more drives before activating degraded array..
>
> The above occurs within the same second according to the timestamps and the timer ends prior to mounting any local filesystems, it properly detects that the RAID is valid and everything continues normally. The other RAID device - a RAID1 of 2 disks containing swap and / have never exhibited this failure.
>
> My question is, what are the conditions where systemd detects the RAID6 as being degraded? It seems to be a race condition somewhere, but I am not sure what configuration should be modified if any. If needed I can provide more verbose logs, just let me know if they might be useful.
>
It is not directly related to systemd. When block device that is part of
MD array is detected by kernel, udev rule queries array if it is
complete. If it is, it starts array (subject to general rules of which
arrays are auto-started); and if not, it (udev rule) starts timer to
assemble degraded array.
See udev-md-raid-assembly.rules in mdadm sources:
ACTION=="add|change", ENV{MD_STARTED}=="*unsafe*",
ENV{MD_FOREIGN}=="no",
ENV{SYSTEMD_WANTS}+="mdadm-last-resort@$env{MD_DEVICE}.timer"
So it looks like events for some array members either got lost or are
delivered late.
Note that there was discussion on openSUSE list where arrays would not
be auto-assembled on boot, even though triggering device change *after*
initial boot would correctly run these rules. This situation was
triggered by adding extra disk to the system (i.e. - boot with 3 disks
worked, with 4 disks - not). I could not find any hints even after
enabling full udev and systemd debug logs. Logs are available if anyone
wants to try it.
^ permalink raw reply
* Re: split scsi passthrough fields out of struct request V2
From: Jens Axboe @ 2017-01-27 8:04 UTC (permalink / raw)
To: Bart Van Assche, hch@lst.de
Cc: linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
snitzer@redhat.com, linux-raid@vger.kernel.org,
dm-devel@redhat.com, j-nomura@ce.jp.nec.com
In-Reply-To: <37ab009a-bc2d-d2ae-a875-269ab563a430@fb.com>
On 01/26/2017 11:40 PM, Jens Axboe wrote:
> On 01/26/2017 06:22 PM, Jens Axboe wrote:
>> On 01/26/2017 06:15 PM, Bart Van Assche wrote:
>>> On Thu, 2017-01-26 at 17:41 -0700, Jens Axboe wrote:
>>>> On 01/26/2017 05:38 PM, Bart Van Assche wrote:
>>>>> I see similar behavior with the blk-mq-sched branch of
>>>>> git://git.kernel.dk/linux-block.git (git commit ID 0efe27068ecf):
>>>>> booting happens much slower than usual and I/O hangs if I run the
>>>>> srp-test software.
>>>>
>>>> Please don't run that, run for-4.11/block and merge it to master.
>>>> Same behavior?
>>>
>>> I have not yet had the chance to run the srp-test software against that
>>> kernel. But I already see that booting takes more than ten times longer
>>> than usual. Note: as far as I know the dm-mpath driver is not involved
>>> in the boot process of my test system.
>>
>> What's your boot device? I've been booting this on a variety of setups,
>> no problems observed. It's booting my laptop, and on SCSI and SATA as
>> well. What is your root drive? What is the queue depth of it?
>> Controller?
>
> Are you using dm for your root device?
>
> I think I see what is going on. The scheduler framework put the
> insertion of flushes on the side, whereas it's integrated "nicely"
> on the legacy side.
>
> Can you try with this applied? This is on top of the previous two that
> we already went through. Or, you can just pull:
>
> git://git.kernel.dk/linux-block for-4.11/next
>
> which is for-4.11/block with the next set of fixes on top that I haven't
> pulled in yet.
The previous patch had a bug if you didn't use a scheduler, here's a
version that should work fine in both cases. I've also updated the
above mentioned branch, so feel free to pull that as well and merge to
master like before.
commit 2f54ba92a274a7c1a5ceb34a56565f84f7b994b7
Author: Jens Axboe <axboe@fb.com>
Date: Fri Jan 27 01:00:47 2017 -0700
blk-mq-sched: add flush insertion into blk_mq_sched_insert_request()
Instead of letting the caller check this and handle the details
of inserting a flush request, put the logic in the scheduler
insertion function. This fixes direct flush insertion outside
of the usual make_request_fn calls, like from dm via
blk_insert_cloned_request().
Signed-off-by: Jens Axboe <axboe@fb.com>
diff --git a/block/blk-core.c b/block/blk-core.c
index a61f1407f4f6..78daf5b6d7cb 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2129,7 +2129,7 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
if (q->mq_ops) {
if (blk_queue_io_stat(q))
blk_account_io_start(rq, true);
- blk_mq_sched_insert_request(rq, false, true, false);
+ blk_mq_sched_insert_request(rq, false, true, false, false);
return 0;
}
diff --git a/block/blk-exec.c b/block/blk-exec.c
index 86656fdfa637..ed1f10165268 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -66,7 +66,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
* be reused after dying flag is set
*/
if (q->mq_ops) {
- blk_mq_sched_insert_request(rq, at_head, true, false);
+ blk_mq_sched_insert_request(rq, at_head, true, false, false);
return;
}
diff --git a/block/blk-flush.c b/block/blk-flush.c
index d7de34ee39c2..4427896641ac 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -456,7 +456,7 @@ void blk_insert_flush(struct request *rq)
if ((policy & REQ_FSEQ_DATA) &&
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
if (q->mq_ops)
- blk_mq_sched_insert_request(rq, false, true, false);
+ blk_mq_sched_insert_request(rq, false, true, false, false);
else
list_add_tail(&rq->queuelist, &q->queue_head);
return;
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index c27613de80c5..5e91743e193a 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -336,6 +336,64 @@ void blk_mq_sched_restart_queues(struct blk_mq_hw_ctx *hctx)
}
}
+/*
+ * Add flush/fua to the queue. If we fail getting a driver tag, then
+ * punt to the requeue list. Requeue will re-invoke us from a context
+ * that's safe to block from.
+ */
+static void blk_mq_sched_insert_flush(struct blk_mq_hw_ctx *hctx,
+ struct request *rq, bool can_block)
+{
+ if (blk_mq_get_driver_tag(rq, &hctx, can_block)) {
+ blk_insert_flush(rq);
+ blk_mq_run_hw_queue(hctx, true);
+ } else
+ blk_mq_add_to_requeue_list(rq, true, true);
+}
+
+void blk_mq_sched_insert_request(struct request *rq, bool at_head,
+ bool run_queue, bool async, bool can_block)
+{
+ struct request_queue *q = rq->q;
+ struct elevator_queue *e = q->elevator;
+ struct blk_mq_ctx *ctx = rq->mq_ctx;
+ struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
+
+ if (rq->tag == -1 && (rq->cmd_flags & (REQ_PREFLUSH | REQ_FUA))) {
+ blk_mq_sched_insert_flush(hctx, rq, can_block);
+ return;
+ }
+
+ if (e && e->type->ops.mq.insert_requests) {
+ LIST_HEAD(list);
+
+ list_add(&rq->queuelist, &list);
+ e->type->ops.mq.insert_requests(hctx, &list, at_head);
+ } else {
+ spin_lock(&ctx->lock);
+ __blk_mq_insert_request(hctx, rq, at_head);
+ spin_unlock(&ctx->lock);
+ }
+
+ if (run_queue)
+ blk_mq_run_hw_queue(hctx, async);
+}
+
+void blk_mq_sched_insert_requests(struct request_queue *q,
+ struct blk_mq_ctx *ctx,
+ struct list_head *list, bool run_queue_async)
+{
+ struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
+ struct elevator_queue *e = hctx->queue->elevator;
+
+ if (e && e->type->ops.mq.insert_requests)
+ e->type->ops.mq.insert_requests(hctx, list, false);
+ else
+ blk_mq_insert_requests(hctx, ctx, list);
+
+ blk_mq_run_hw_queue(hctx, run_queue_async);
+}
+
static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
struct blk_mq_hw_ctx *hctx,
unsigned int hctx_idx)
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index becbc7840364..9478aaeb48c5 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -21,6 +21,12 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio);
bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
void blk_mq_sched_restart_queues(struct blk_mq_hw_ctx *hctx);
+void blk_mq_sched_insert_request(struct request *rq, bool at_head,
+ bool run_queue, bool async, bool can_block);
+void blk_mq_sched_insert_requests(struct request_queue *q,
+ struct blk_mq_ctx *ctx,
+ struct list_head *list, bool run_queue_async);
+
void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx);
void blk_mq_sched_move_to_dispatch(struct blk_mq_hw_ctx *hctx,
struct list_head *rq_list,
@@ -62,45 +68,6 @@ static inline void blk_mq_sched_put_rq_priv(struct request_queue *q,
e->type->ops.mq.put_rq_priv(q, rq);
}
-static inline void
-blk_mq_sched_insert_request(struct request *rq, bool at_head, bool run_queue,
- bool async)
-{
- struct request_queue *q = rq->q;
- struct elevator_queue *e = q->elevator;
- struct blk_mq_ctx *ctx = rq->mq_ctx;
- struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
-
- if (e && e->type->ops.mq.insert_requests) {
- LIST_HEAD(list);
-
- list_add(&rq->queuelist, &list);
- e->type->ops.mq.insert_requests(hctx, &list, at_head);
- } else {
- spin_lock(&ctx->lock);
- __blk_mq_insert_request(hctx, rq, at_head);
- spin_unlock(&ctx->lock);
- }
-
- if (run_queue)
- blk_mq_run_hw_queue(hctx, async);
-}
-
-static inline void
-blk_mq_sched_insert_requests(struct request_queue *q, struct blk_mq_ctx *ctx,
- struct list_head *list, bool run_queue_async)
-{
- struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
- struct elevator_queue *e = hctx->queue->elevator;
-
- if (e && e->type->ops.mq.insert_requests)
- e->type->ops.mq.insert_requests(hctx, list, false);
- else
- blk_mq_insert_requests(hctx, ctx, list);
-
- blk_mq_run_hw_queue(hctx, run_queue_async);
-}
-
static inline bool
blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
struct bio *bio)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 1b156ca79af6..78bbacd129c9 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -106,6 +106,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
struct sbq_wait_state *ws;
DEFINE_WAIT(wait);
unsigned int tag_offset;
+ bool drop_ctx;
int tag;
if (data->flags & BLK_MQ_REQ_RESERVED) {
@@ -128,6 +129,7 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
return BLK_MQ_TAG_FAIL;
ws = bt_wait_ptr(bt, data->hctx);
+ drop_ctx = data->ctx == NULL;
do {
prepare_to_wait(&ws->wait, &wait, TASK_UNINTERRUPTIBLE);
@@ -150,7 +152,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
if (tag != -1)
break;
- blk_mq_put_ctx(data->ctx);
+ if (data->ctx)
+ blk_mq_put_ctx(data->ctx);
io_schedule();
@@ -166,6 +169,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
ws = bt_wait_ptr(bt, data->hctx);
} while (1);
+ if (drop_ctx && data->ctx)
+ blk_mq_put_ctx(data->ctx);
+
finish_wait(&ws->wait, &wait);
found_tag:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4df397910251..888868b62018 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -568,13 +568,13 @@ static void blk_mq_requeue_work(struct work_struct *work)
rq->rq_flags &= ~RQF_SOFTBARRIER;
list_del_init(&rq->queuelist);
- blk_mq_sched_insert_request(rq, true, false, false);
+ blk_mq_sched_insert_request(rq, true, false, false, true);
}
while (!list_empty(&rq_list)) {
rq = list_entry(rq_list.next, struct request, queuelist);
list_del_init(&rq->queuelist);
- blk_mq_sched_insert_request(rq, false, false, false);
+ blk_mq_sched_insert_request(rq, false, false, false, true);
}
blk_mq_run_hw_queues(q, false);
@@ -847,12 +847,11 @@ static inline unsigned int queued_to_index(unsigned int queued)
return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1);
}
-static bool blk_mq_get_driver_tag(struct request *rq,
- struct blk_mq_hw_ctx **hctx, bool wait)
+bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
+ bool wait)
{
struct blk_mq_alloc_data data = {
.q = rq->q,
- .ctx = rq->mq_ctx,
.hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu),
.flags = wait ? 0 : BLK_MQ_REQ_NOWAIT,
};
@@ -1395,7 +1394,7 @@ static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie)
}
insert:
- blk_mq_sched_insert_request(rq, false, true, true);
+ blk_mq_sched_insert_request(rq, false, true, true, false);
}
/*
@@ -1446,10 +1445,12 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
cookie = request_to_qc_t(data.hctx, rq);
if (unlikely(is_flush_fua)) {
+ blk_mq_put_ctx(data.ctx);
blk_mq_bio_to_request(rq, bio);
blk_mq_get_driver_tag(rq, NULL, true);
blk_insert_flush(rq);
- goto run_queue;
+ blk_mq_run_hw_queue(data.hctx, true);
+ goto done;
}
plug = current->plug;
@@ -1502,7 +1503,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
blk_mq_put_ctx(data.ctx);
blk_mq_bio_to_request(rq, bio);
blk_mq_sched_insert_request(rq, false, true,
- !is_sync || is_flush_fua);
+ !is_sync || is_flush_fua, true);
goto done;
}
if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
@@ -1512,7 +1513,6 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
* latter allows for merging opportunities and more efficient
* dispatching.
*/
-run_queue:
blk_mq_run_hw_queue(data.hctx, !is_sync || is_flush_fua);
}
blk_mq_put_ctx(data.ctx);
@@ -1568,10 +1568,12 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
cookie = request_to_qc_t(data.hctx, rq);
if (unlikely(is_flush_fua)) {
+ blk_mq_put_ctx(data.ctx);
blk_mq_bio_to_request(rq, bio);
blk_mq_get_driver_tag(rq, NULL, true);
blk_insert_flush(rq);
- goto run_queue;
+ blk_mq_run_hw_queue(data.hctx, true);
+ goto done;
}
/*
@@ -1612,7 +1614,7 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
blk_mq_put_ctx(data.ctx);
blk_mq_bio_to_request(rq, bio);
blk_mq_sched_insert_request(rq, false, true,
- !is_sync || is_flush_fua);
+ !is_sync || is_flush_fua, true);
goto done;
}
if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
@@ -1622,7 +1624,6 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
* latter allows for merging opportunities and more efficient
* dispatching.
*/
-run_queue:
blk_mq_run_hw_queue(data.hctx, !is_sync || is_flush_fua);
}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index d19b0e75a129..d34929968071 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -34,6 +34,8 @@ void blk_mq_wake_waiters(struct request_queue *q);
bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *, struct list_head *);
void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
+bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
+ bool wait);
/*
* Internal helpers for allocating/freeing the request map
--
Jens Axboe
^ permalink raw reply related
* Re: Errorneous detection of degraded array
From: Martin Wilck @ 2017-01-27 8:25 UTC (permalink / raw)
To: Andrei Borzenkov, Luke Pyzowski,
'systemd-devel@lists.freedesktop.org', linux-raid
In-Reply-To: <b842c2ce-7d32-a392-e9d9-9f330fa6a7cf@gmail.com>
> 26.01.2017 21:02, Luke Pyzowski пишет:
> > Hello,
> > I have a large RAID6 device with 24 local drives on CentOS7.3.
> > Randomly (around 50% of the time) systemd will unmount my RAID
> > device thinking it is degraded after the mdadm-last-resort@.timer
> > expires, however the device is working normally by all accounts,
> > and I can immediately mount it manually upon boot completion. In
> > the logs below /share is the RAID device. I can increase the timer
> > in /usr/lib/systemd/system/mdadm-last-resort@.timer from 30 to 60
> > seconds, but this problem can randomly still occur.
It seems to me that you rather need to decrease the timeout value, or
(more reasonable) increase x-systemd.device-timeout for the /share
mount point.
Unfortunately your log excerpt contains to time stamps but I suppose
you're facing a race where the device times out before the "last
resort" timer starts it (and before the last devices appear).
Martin
--
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
^ permalink raw reply
* hi Linux
From: nimiak @ 2017-01-27 12:12 UTC (permalink / raw)
To: Linux Raid
greetings Linux
http://gilstrap-sneed.com/bookmark.php?love=2kfz2auv8hc1fd2g
nimiak
^ permalink raw reply
* Re: split scsi passthrough fields out of struct request V2
From: Jens Axboe @ 2017-01-27 16:11 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Snitzer, Junichi Nomura, linux-block, linux-scsi, linux-raid,
dm-devel
In-Reply-To: <1485365126-23210-1-git-send-email-hch@lst.de>
On Wed, Jan 25 2017, Christoph Hellwig wrote:
> Hi all,
>
> this series splits the support for SCSI passthrough commands from the
> main struct request used all over the block layer into a separate
> scsi_request structure that drivers that want to support SCSI passthough
> need to embedded as the first thing into their request-private data,
> similar to how we handle NVMe passthrough commands.
>
> To support this I've added support for that the private data after
> request structure to the legacy request path instead, so that it can
> be treated the same way as the blk-mq path. Compare to the current
> scsi_cmnd allocator that actually is a major simplification.
>
> Changes since V1:
> - fix handling of a NULL sense pointer in __scsi_execute
> - clean up handling of the flush flags in the block layer and MD
> - additional small cleanup in dm-rq
I've queued this up for 4.11. Since some of the patches had dependencies
on changes in master since for-4.11/block was forked, they are sitting
in a separate branch that has both for-4.11/block and v4.10-rc5 pulled
in first. for-next has everything, as usual.
--
Jens Axboe
^ permalink raw reply
* Re: [PATCH 05/18] block: allow specifying size for extra command data
From: Christoph Hellwig @ 2017-01-27 16:12 UTC (permalink / raw)
To: Martin K. Petersen
Cc: linux-block, linux-raid, linux-scsi, Mike Snitzer, Jens Axboe,
dm-devel, Junichi Nomura, Christoph Hellwig
In-Reply-To: <yq160l2a4us.fsf@oracle.com>
On Wed, Jan 25, 2017 at 10:15:55PM -0500, Martin K. Petersen wrote:
> +static void *alloc_request_size(gfp_t gfp_mask, void *data)
>
> I like alloc_request_simple() but alloc_request_size() seems a bit
> contrived. _reserve? _extra? _special? Don't have any good suggestions,
> I'm afraid.
Not that I'm a fan of _size, but I like the other suggestions even less.
^ permalink raw reply
* Re: split scsi passthrough fields out of struct request V2
From: Christoph Hellwig @ 2017-01-27 16:17 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-raid, linux-scsi, Mike Snitzer, linux-block, dm-devel,
Junichi Nomura, Christoph Hellwig
In-Reply-To: <20170127161114.GA501@kernel.dk>
On Fri, Jan 27, 2017 at 09:11:14AM -0700, Jens Axboe wrote:
> I've queued this up for 4.11. Since some of the patches had dependencies
> on changes in master since for-4.11/block was forked, they are sitting
> in a separate branch that has both for-4.11/block and v4.10-rc5 pulled
> in first. for-next has everything, as usual.
Eww. I just had a couple non-trivial updates that I now do again.
In case you haven't pushed it out yet can you let me repost first?
^ permalink raw reply
* Re: split scsi passthrough fields out of struct request V2
From: Jens Axboe @ 2017-01-27 16:21 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-raid, Mike Snitzer, linux-scsi, linux-block, dm-devel,
Junichi Nomura
In-Reply-To: <20170127161718.GA16911@lst.de>
On 01/27/2017 09:17 AM, Christoph Hellwig wrote:
> On Fri, Jan 27, 2017 at 09:11:14AM -0700, Jens Axboe wrote:
>> I've queued this up for 4.11. Since some of the patches had dependencies
>> on changes in master since for-4.11/block was forked, they are sitting
>> in a separate branch that has both for-4.11/block and v4.10-rc5 pulled
>> in first. for-next has everything, as usual.
>
> Eww. I just had a couple non-trivial updates that I now do again.
> In case you haven't pushed it out yet can you let me repost first?
Why the eww?! You can't fix this with a repost.
It's fine, I'll just ship off for-4.11/block first (as usual), then
for-4.11/rq-refactor.
The two issues is in virtio_blk and raid1. For some reason, raid1
included a refactor of a function later in the cycle (hrmpf). So there's
really no good way to solve this, unless I pull in v4.10-rc5 into
for-4.11/block. And I don't want to do that. Hence the topic branch for
this work.
I have pushed it out, but it's not merged into for-next yet, it's just
standalone. When I've done some sanity testing, I'll push it out.
--
Jens Axboe
^ permalink raw reply
* Re: split scsi passthrough fields out of struct request V2
From: Christoph Hellwig @ 2017-01-27 16:23 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-raid, linux-scsi, Mike Snitzer, linux-block, dm-devel,
Junichi Nomura, Christoph Hellwig
In-Reply-To: <775aff8d-6bcc-9cc5-cfa3-dd1c157c95c5@fb.com>
On Fri, Jan 27, 2017 at 09:21:46AM -0700, Jens Axboe wrote:
> On 01/27/2017 09:17 AM, Christoph Hellwig wrote:
> > On Fri, Jan 27, 2017 at 09:11:14AM -0700, Jens Axboe wrote:
> >> I've queued this up for 4.11. Since some of the patches had dependencies
> >> on changes in master since for-4.11/block was forked, they are sitting
> >> in a separate branch that has both for-4.11/block and v4.10-rc5 pulled
> >> in first. for-next has everything, as usual.
> >
> > Eww. I just had a couple non-trivial updates that I now do again.
> > In case you haven't pushed it out yet can you let me repost first?
>
> Why the eww?! You can't fix this with a repost.
Not because of the merge, mostly because I just spent two same
time adding all the ACKs, fixing typos and adding the removal of
the ->cmd tracing to the series and was getting ready for a repost.
^ permalink raw reply
* Re: split scsi passthrough fields out of struct request V2
From: Jens Axboe @ 2017-01-27 16:27 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-raid, Mike Snitzer, linux-scsi, linux-block, dm-devel,
Junichi Nomura
In-Reply-To: <20170127162316.GA17059@lst.de>
On 01/27/2017 09:23 AM, Christoph Hellwig wrote:
> On Fri, Jan 27, 2017 at 09:21:46AM -0700, Jens Axboe wrote:
>> On 01/27/2017 09:17 AM, Christoph Hellwig wrote:
>>> On Fri, Jan 27, 2017 at 09:11:14AM -0700, Jens Axboe wrote:
>>>> I've queued this up for 4.11. Since some of the patches had dependencies
>>>> on changes in master since for-4.11/block was forked, they are sitting
>>>> in a separate branch that has both for-4.11/block and v4.10-rc5 pulled
>>>> in first. for-next has everything, as usual.
>>>
>>> Eww. I just had a couple non-trivial updates that I now do again.
>>> In case you haven't pushed it out yet can you let me repost first?
>>
>> Why the eww?! You can't fix this with a repost.
>
> Not because of the merge, mostly because I just spent two same
> time adding all the ACKs, fixing typos and adding the removal of
> the ->cmd tracing to the series and was getting ready for a repost.
Feel free to repost it, I have no problem rebasing that branch as it's
standalone for now.
--
Jens Axboe
^ permalink raw reply
* Re: [PATCH 07/18] dm: always defer request allocation to the owner of the request_queue
From: Mike Snitzer @ 2017-01-27 16:34 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-block, linux-raid, linux-scsi, Jens Axboe, dm-devel,
Junichi Nomura
In-Reply-To: <1485365126-23210-8-git-send-email-hch@lst.de>
On Wed, Jan 25 2017 at 12:25pm -0500,
Christoph Hellwig <hch@lst.de> wrote:
> DM already calls blk_mq_alloc_request on the request_queue of the
> underlying device if it is a blk-mq device. But now that we allow drivers
> to allocate additional data and initialize it ahead of time we need to do
> the same for all drivers. Doing so and using the new cmd_size
> infrastructure in the block layer greatly simplifies the dm-rq and mpath
> code, and should also make arbitrary combinations of SQ and MQ devices
> with SQ or MQ device mapper tables easily possible as a further step.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Hannes Reinecke <hare@suse.com>
> Reviewed-by: Mike Snitzer <snitzer@redhat.com>
...
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index 3f12916..8d06834 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -185,7 +163,7 @@ static void end_clone_bio(struct bio *clone)
>
> static struct dm_rq_target_io *tio_from_request(struct request *rq)
> {
> - return (rq->q->mq_ops ? blk_mq_rq_to_pdu(rq) : rq->special);
> + return blk_mq_rq_to_pdu(rq);
> }
Noticed after further review that it seems a bit weird to have the non
blk-mq support in drivers calling blk_mq_rq_to_pdu(). But I'm not sure
a blk_rq_to_pdu() macro to blk_mq_rq_to_pdu() is the right thing. What
do you guys think?
^ permalink raw reply
* Re: split scsi passthrough fields out of struct request V2
From: Christoph Hellwig @ 2017-01-27 16:34 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-raid, linux-scsi, Mike Snitzer, linux-block, dm-devel,
Junichi Nomura, Christoph Hellwig
In-Reply-To: <195e1638-7d85-0618-0101-5cfff34b91e9@fb.com>
On Fri, Jan 27, 2017 at 09:27:02AM -0700, Jens Axboe wrote:
> Feel free to repost it, I have no problem rebasing that branch as it's
> standalone for now.
Ok, I'll repost what I have right now, which is on top of a merge
of your block/for-4.11/next and your for-next from this morning
my time.
Btw, I disagred with your patch to use op_is_flush in
generic_make_request_checks - given that we clear these flags just
below I think using the helper obsfucates what's really going on.
^ permalink raw reply
* split scsi passthrough fields out of struct request V3
From: Christoph Hellwig @ 2017-01-27 16:34 UTC (permalink / raw)
To: Jens Axboe
Cc: Mike Snitzer, Junichi Nomura, linux-block, linux-scsi, linux-raid,
dm-devel
Hi all,
this series splits the support for SCSI passthrough commands from the
main struct request used all over the block layer into a separate
scsi_request structure that drivers that want to support SCSI passthough
need to embedded as the first thing into their request-private data,
similar to how we handle NVMe passthrough commands.
To support this I've added support for that the private data after
request structure to the legacy request path instead, so that it can
be treated the same way as the blk-mq path. Compare to the current
scsi_cmnd allocator that actually is a major simplification.
Changes since V2:
- remove req->cmd tracing
- minor spelling fixes
Changes since V1:
- fix handling of a NULL sense pointer in __scsi_execute
- clean up handling of the flush flags in the block layer and MD
- additional small cleanup in dm-rq
^ permalink raw reply
* [PATCH 01/19] block: add a op_is_flush helper
From: Christoph Hellwig @ 2017-01-27 16:35 UTC (permalink / raw)
To: Jens Axboe
Cc: Mike Snitzer, Junichi Nomura, linux-block, linux-scsi, linux-raid,
dm-devel
In-Reply-To: <1485534918-18239-1-git-send-email-hch@lst.de>
This centralizes the checks for bios that needs to be go into the flush
state machine.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
block/blk-core.c | 8 ++++----
block/blk-mq-sched.c | 5 ++---
block/blk-mq.c | 4 ++--
drivers/md/bcache/request.c | 2 +-
drivers/md/dm-cache-target.c | 13 +++----------
drivers/md/dm-thin.c | 13 +++++--------
include/linux/blk_types.h | 9 +++++++++
7 files changed, 26 insertions(+), 28 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 78daf5b..4bfd867 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1035,7 +1035,7 @@ static bool blk_rq_should_init_elevator(struct bio *bio)
* Flush requests do not use the elevator so skip initialization.
* This allows a request to share the flush and elevator data.
*/
- if (bio->bi_opf & (REQ_PREFLUSH | REQ_FUA))
+ if (op_is_flush(bio->bi_opf))
return false;
return true;
@@ -1641,7 +1641,7 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
return BLK_QC_T_NONE;
}
- if (bio->bi_opf & (REQ_PREFLUSH | REQ_FUA)) {
+ if (op_is_flush(bio->bi_opf)) {
spin_lock_irq(q->queue_lock);
where = ELEVATOR_INSERT_FLUSH;
goto get_rq;
@@ -2145,7 +2145,7 @@ int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
*/
BUG_ON(blk_queued_rq(rq));
- if (rq->cmd_flags & (REQ_PREFLUSH | REQ_FUA))
+ if (op_is_flush(rq->cmd_flags))
where = ELEVATOR_INSERT_FLUSH;
add_acct_request(q, rq, where);
@@ -3256,7 +3256,7 @@ void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
/*
* rq is already accounted, so use raw insert
*/
- if (rq->cmd_flags & (REQ_PREFLUSH | REQ_FUA))
+ if (op_is_flush(rq->cmd_flags))
__elv_add_request(q, rq, ELEVATOR_INSERT_FLUSH);
else
__elv_add_request(q, rq, ELEVATOR_INSERT_SORT_MERGE);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 5e91743..1112752 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -111,7 +111,6 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
struct blk_mq_hw_ctx *hctx;
struct blk_mq_ctx *ctx;
struct request *rq;
- const bool is_flush = op & (REQ_PREFLUSH | REQ_FUA);
blk_queue_enter_live(q);
ctx = blk_mq_get_ctx(q);
@@ -126,7 +125,7 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
* Flush requests are special and go directly to the
* dispatch list.
*/
- if (!is_flush && e->type->ops.mq.get_request) {
+ if (!op_is_flush(op) && e->type->ops.mq.get_request) {
rq = e->type->ops.mq.get_request(q, op, data);
if (rq)
rq->rq_flags |= RQF_QUEUED;
@@ -139,7 +138,7 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
}
if (rq) {
- if (!is_flush) {
+ if (!op_is_flush(op)) {
rq->elv.icq = NULL;
if (e && e->type->icq_cache)
blk_mq_sched_assign_ioc(q, rq, bio);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 888868b..ff70219 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1405,7 +1405,7 @@ static void blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie)
static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
{
const int is_sync = op_is_sync(bio->bi_opf);
- const int is_flush_fua = bio->bi_opf & (REQ_PREFLUSH | REQ_FUA);
+ const int is_flush_fua = op_is_flush(bio->bi_opf);
struct blk_mq_alloc_data data = { .flags = 0 };
struct request *rq;
unsigned int request_count = 0, srcu_idx;
@@ -1527,7 +1527,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
{
const int is_sync = op_is_sync(bio->bi_opf);
- const int is_flush_fua = bio->bi_opf & (REQ_PREFLUSH | REQ_FUA);
+ const int is_flush_fua = op_is_flush(bio->bi_opf);
struct blk_plug *plug;
unsigned int request_count = 0;
struct blk_mq_alloc_data data = { .flags = 0 };
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 76d2087..01035e7 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -666,7 +666,7 @@ static inline struct search *search_alloc(struct bio *bio,
s->iop.write_prio = 0;
s->iop.error = 0;
s->iop.flags = 0;
- s->iop.flush_journal = (bio->bi_opf & (REQ_PREFLUSH|REQ_FUA)) != 0;
+ s->iop.flush_journal = op_is_flush(bio->bi_opf);
s->iop.wq = bcache_wq;
return s;
diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index e04c61e..5b9cf56 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -787,8 +787,7 @@ static void check_if_tick_bio_needed(struct cache *cache, struct bio *bio)
struct per_bio_data *pb = get_per_bio_data(bio, pb_data_size);
spin_lock_irqsave(&cache->lock, flags);
- if (cache->need_tick_bio &&
- !(bio->bi_opf & (REQ_FUA | REQ_PREFLUSH)) &&
+ if (cache->need_tick_bio && !op_is_flush(bio->bi_opf) &&
bio_op(bio) != REQ_OP_DISCARD) {
pb->tick = true;
cache->need_tick_bio = false;
@@ -828,11 +827,6 @@ static dm_oblock_t get_bio_block(struct cache *cache, struct bio *bio)
return to_oblock(block_nr);
}
-static int bio_triggers_commit(struct cache *cache, struct bio *bio)
-{
- return bio->bi_opf & (REQ_PREFLUSH | REQ_FUA);
-}
-
/*
* You must increment the deferred set whilst the prison cell is held. To
* encourage this, we ask for 'cell' to be passed in.
@@ -884,7 +878,7 @@ static void issue(struct cache *cache, struct bio *bio)
{
unsigned long flags;
- if (!bio_triggers_commit(cache, bio)) {
+ if (!op_is_flush(bio->bi_opf)) {
accounted_request(cache, bio);
return;
}
@@ -1069,8 +1063,7 @@ static void dec_io_migrations(struct cache *cache)
static bool discard_or_flush(struct bio *bio)
{
- return bio_op(bio) == REQ_OP_DISCARD ||
- bio->bi_opf & (REQ_PREFLUSH | REQ_FUA);
+ return bio_op(bio) == REQ_OP_DISCARD || op_is_flush(bio->bi_opf);
}
static void __cell_defer(struct cache *cache, struct dm_bio_prison_cell *cell)
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index d1c05c1..110982d 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -699,7 +699,7 @@ static void remap_to_origin(struct thin_c *tc, struct bio *bio)
static int bio_triggers_commit(struct thin_c *tc, struct bio *bio)
{
- return (bio->bi_opf & (REQ_PREFLUSH | REQ_FUA)) &&
+ return op_is_flush(bio->bi_opf) &&
dm_thin_changed_this_transaction(tc->td);
}
@@ -870,8 +870,7 @@ static void __inc_remap_and_issue_cell(void *context,
struct bio *bio;
while ((bio = bio_list_pop(&cell->bios))) {
- if (bio->bi_opf & (REQ_PREFLUSH | REQ_FUA) ||
- bio_op(bio) == REQ_OP_DISCARD)
+ if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD)
bio_list_add(&info->defer_bios, bio);
else {
inc_all_io_entry(info->tc->pool, bio);
@@ -1716,9 +1715,8 @@ static void __remap_and_issue_shared_cell(void *context,
struct bio *bio;
while ((bio = bio_list_pop(&cell->bios))) {
- if ((bio_data_dir(bio) == WRITE) ||
- (bio->bi_opf & (REQ_PREFLUSH | REQ_FUA) ||
- bio_op(bio) == REQ_OP_DISCARD))
+ if (bio_data_dir(bio) == WRITE || op_is_flush(bio->bi_opf) ||
+ bio_op(bio) == REQ_OP_DISCARD)
bio_list_add(&info->defer_bios, bio);
else {
struct dm_thin_endio_hook *h = dm_per_bio_data(bio, sizeof(struct dm_thin_endio_hook));;
@@ -2635,8 +2633,7 @@ static int thin_bio_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_SUBMITTED;
}
- if (bio->bi_opf & (REQ_PREFLUSH | REQ_FUA) ||
- bio_op(bio) == REQ_OP_DISCARD) {
+ if (op_is_flush(bio->bi_opf) || bio_op(bio) == REQ_OP_DISCARD) {
thin_defer_bio_with_throttle(tc, bio);
return DM_MAPIO_SUBMITTED;
}
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 0e5b1cd..37c9a43 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -221,6 +221,15 @@ static inline bool op_is_write(unsigned int op)
}
/*
+ * Check if the bio or request is one that needs special treatment in the
+ * flush state machine.
+ */
+static inline bool op_is_flush(unsigned int op)
+{
+ return op & (REQ_FUA | REQ_PREFLUSH);
+}
+
+/*
* Reads are always treated as synchronous, as are requests with the FUA or
* PREFLUSH flag. Other operations may be marked as synchronous using the
* REQ_SYNC flag.
--
2.1.4
^ permalink raw reply related
* [PATCH 02/19] md: cleanup bio op / flags handling in raid1_write_request
From: Christoph Hellwig @ 2017-01-27 16:35 UTC (permalink / raw)
To: Jens Axboe
Cc: Mike Snitzer, Junichi Nomura, linux-block, linux-scsi, linux-raid,
dm-devel
In-Reply-To: <1485534918-18239-1-git-send-email-hch@lst.de>
No need for the local variables, the bio is still live and we can just
assign the bits we want directly. Make me wonder why we can't assign
all the bio flags to start with.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
drivers/md/raid1.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7b0f647..67b0365 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1170,10 +1170,6 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
int i, disks;
struct bitmap *bitmap = mddev->bitmap;
unsigned long flags;
- const int op = bio_op(bio);
- const unsigned long do_sync = (bio->bi_opf & REQ_SYNC);
- const unsigned long do_flush_fua = (bio->bi_opf &
- (REQ_PREFLUSH | REQ_FUA));
struct md_rdev *blocked_rdev;
struct blk_plug_cb *cb;
struct raid1_plug_cb *plug = NULL;
@@ -1389,7 +1385,8 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
conf->mirrors[i].rdev->data_offset);
mbio->bi_bdev = conf->mirrors[i].rdev->bdev;
mbio->bi_end_io = raid1_end_write_request;
- bio_set_op_attrs(mbio, op, do_flush_fua | do_sync);
+ mbio->bi_opf = bio_op(bio) |
+ (bio->bi_opf & (REQ_SYNC | REQ_PREFLUSH | REQ_FUA));
if (test_bit(FailFast, &conf->mirrors[i].rdev->flags) &&
!test_bit(WriteMostly, &conf->mirrors[i].rdev->flags) &&
conf->raid_disks - mddev->degraded > 1)
--
2.1.4
^ permalink raw reply related
* Re: [PATCH 07/18] dm: always defer request allocation to the owner of the request_queue
From: Christoph Hellwig @ 2017-01-27 16:36 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-block, linux-raid, linux-scsi, Jens Axboe, dm-devel,
Junichi Nomura, Christoph Hellwig
In-Reply-To: <20170127163434.GA30221@redhat.com>
On Fri, Jan 27, 2017 at 11:34:34AM -0500, Mike Snitzer wrote:
> Noticed after further review that it seems a bit weird to have the non
> blk-mq support in drivers calling blk_mq_rq_to_pdu(). But I'm not sure
> a blk_rq_to_pdu() macro to blk_mq_rq_to_pdu() is the right thing. What
> do you guys think?
My first version had an additional name for it, but it caused more
confusion than help.
^ permalink raw reply
* Re: split scsi passthrough fields out of struct request V2
From: Jens Axboe @ 2017-01-27 16:38 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-raid, Mike Snitzer, linux-scsi, linux-block, dm-devel,
Junichi Nomura
In-Reply-To: <20170127163450.GA17311@lst.de>
On 01/27/2017 09:34 AM, Christoph Hellwig wrote:
> On Fri, Jan 27, 2017 at 09:27:02AM -0700, Jens Axboe wrote:
>> Feel free to repost it, I have no problem rebasing that branch as it's
>> standalone for now.
>
> Ok, I'll repost what I have right now, which is on top of a merge
> of your block/for-4.11/next and your for-next from this morning
> my time.
Perfect.
> Btw, I disagred with your patch to use op_is_flush in
> generic_make_request_checks - given that we clear these flags just
> below I think using the helper obsfucates what's really going on.
Why? It's the exact same check. The ugly part is the fact that
we strip the flags later on, imho.
--
Jens Axboe
^ permalink raw reply
* Re: split scsi passthrough fields out of struct request V2
From: Christoph Hellwig @ 2017-01-27 16:42 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-raid, linux-scsi, Mike Snitzer, linux-block, dm-devel,
Junichi Nomura, Christoph Hellwig
In-Reply-To: <f0305c67-4aca-e2cf-d729-3ceb93d2cc50@fb.com>
On Fri, Jan 27, 2017 at 09:38:40AM -0700, Jens Axboe wrote:
> > Ok, I'll repost what I have right now, which is on top of a merge
> > of your block/for-4.11/next and your for-next from this morning
> > my time.
>
> Perfect.
At least I tried, looks like the mail server is overloaded and crapped
out three mails into it. For now there is a git tree here:
http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/block-pc-refactor
>
> > Btw, I disagred with your patch to use op_is_flush in
> > generic_make_request_checks - given that we clear these flags just
> > below I think using the helper obsfucates what's really going on.
>
> Why? It's the exact same check. The ugly part is the fact that
> we strip the flags later on, imho.
But before it was pretty obvious that it clears exactly the flags checked
two lines earlier. Now it's not as obvious.
^ permalink raw reply
* Re: [PATCH 07/18] dm: always defer request allocation to the owner of the request_queue
From: Mike Snitzer @ 2017-01-27 16:44 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-raid, linux-scsi, Jens Axboe, linux-block, dm-devel,
Junichi Nomura
In-Reply-To: <20170127163604.GA17337@lst.de>
On Fri, Jan 27 2017 at 11:36am -0500,
Christoph Hellwig <hch@lst.de> wrote:
> On Fri, Jan 27, 2017 at 11:34:34AM -0500, Mike Snitzer wrote:
> > Noticed after further review that it seems a bit weird to have the non
> > blk-mq support in drivers calling blk_mq_rq_to_pdu(). But I'm not sure
> > a blk_rq_to_pdu() macro to blk_mq_rq_to_pdu() is the right thing. What
> > do you guys think?
>
> My first version had an additional name for it, but it caused more
> confusion than help.
And renaming blk_mq_rq_to_pdu() to blk_rq_to_pdu() tree-wide would be
too much churn?
I can live with blk_mq_rq_to_pdu(); just figured I'd ask.
^ permalink raw reply
* Re: split scsi passthrough fields out of struct request V2
From: Bart Van Assche @ 2017-01-27 16:52 UTC (permalink / raw)
To: hch@lst.de, axboe@fb.com
Cc: linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
snitzer@redhat.com, linux-raid@vger.kernel.org,
dm-devel@redhat.com, j-nomura@ce.jp.nec.com
In-Reply-To: <9cbf0ce5-ed79-0252-fd2d-34bebaafffa3@fb.com>
On Fri, 2017-01-27 at 01:04 -0700, Jens Axboe wrote:
> The previous patch had a bug if you didn't use a scheduler, here's a
> version that should work fine in both cases. I've also updated the
> above mentioned branch, so feel free to pull that as well and merge to
> master like before.
Booting time is back to normal with commit f3a8ab7d55bc merged with
v4.10-rc5. That's a great improvement. However, running the srp-test
software triggers now a new complaint:
[ 215.600386] sd 11:0:0:0: [sdh] Attached SCSI disk
[ 215.609485] sd 11:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
[ 215.722900] scsi 13:0:0:0: alua: Detached
[ 215.724452] general protection fault: 0000 [#1] SMP
[ 215.724484] Modules linked in: dm_service_time ib_srp scsi_transport_srp target_core_user uio target_core_pscsi target_core_file ib_srpt target_core_iblock target_core_mod brd netconsole xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat libcrc32c nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm msr configfs ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel hid_generic kvm usbhid irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel mlx4_core ghash_clmulni_intel iTCO_wdt dcdbas pcbc tg3
[ 215.724629] iTCO_vendor_support ptp aesni_intel pps_core aes_x86_64 pcspkr crypto_simd libphy ipmi_si glue_helper cryptd ipmi_devintf tpm_tis devlink fjes ipmi_msghandler tpm_tis_core tpm mei_me lpc_ich mei mfd_core button shpchp wmi mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm sr_mod cdrom ehci_pci ehci_hcd usbcore usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua autofs4
[ 215.724719] CPU: 9 PID: 8043 Comm: multipathd Not tainted 4.10.0-rc5-dbg+ #1
[ 215.724748] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
[ 215.724775] task: ffff8801717998c0 task.stack: ffffc90002a9c000
[ 215.724804] RIP: 0010:scsi_device_put+0xb/0x30
[ 215.724829] RSP: 0018:ffffc90002a9faa0 EFLAGS: 00010246
[ 215.724855] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88038bf85698 RCX: 0000000000000006
[ 215.724880] RDX: 0000000000000006 RSI: ffff88017179a108 RDI: ffff88038bf85698
[ 215.724906] RBP: ffffc90002a9faa8 R08: ffff880384786008 R09: 0000000100170007
[ 215.724932] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88038bf85698
[ 215.724958] R13: ffff88038919f090 R14: dead000000000100 R15: ffff88038a41dd28
[ 215.724983] FS: 00007fbf8c6cf700(0000) GS:ffff88046f440000(0000) knlGS:0000000000000000
[ 215.725010] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 215.725035] CR2: 00007f1262ef3ee0 CR3: 000000044f6cc000 CR4: 00000000001406e0
[ 215.725060] Call Trace:
[ 215.725086] scsi_disk_put+0x2d/0x40
[ 215.725110] sd_release+0x3d/0xb0
[ 215.725137] __blkdev_put+0x29e/0x360
[ 215.725163] blkdev_put+0x49/0x170
[ 215.725192] dm_put_table_device+0x58/0xc0 [dm_mod]
[ 215.725219] dm_put_device+0x70/0xc0 [dm_mod]
[ 215.725269] free_priority_group+0x92/0xc0 [dm_multipath]
[ 215.725295] free_multipath+0x70/0xc0 [dm_multipath]
[ 215.725320] multipath_dtr+0x19/0x20 [dm_multipath]
[ 215.725348] dm_table_destroy+0x67/0x120 [dm_mod]
[ 215.725379] dev_suspend+0xde/0x240 [dm_mod]
[ 215.725434] ctl_ioctl+0x1f5/0x520 [dm_mod]
[ 215.725489] dm_ctl_ioctl+0xe/0x20 [dm_mod]
[ 215.725515] do_vfs_ioctl+0x8f/0x700
[ 215.725589] SyS_ioctl+0x3c/0x70
[ 215.725614] entry_SYSCALL_64_fastpath+0x18/0xad
[ 215.725641] RIP: 0033:0x7fbf8aca0667
[ 215.725665] RSP: 002b:00007fbf8c6cd668 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 215.725692] RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007fbf8aca0667
[ 215.725716] RDX: 00007fbf8006b940 RSI: 00000000c138fd06 RDI: 0000000000000007
[ 215.725743] RBP: 0000000000000009 R08: 00007fbf8c6cb3c0 R09: 00007fbf8b68d8d8
[ 215.725768] R10: 0000000000000075 R11: 0000000000000246 R12: 00007fbf8c6cd770
[ 215.725793] R13: 0000000000000013 R14: 00000000006168f0 R15: 0000000000f74780
[ 215.725820] Code: bc 24 b8 00 00 00 e8 55 c8 1c 00 48 83 c4 08 48 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 00 55 48 89 e5 53 48 8b 07 48 89 fb <48> 8b 80 a8 01 00 00 48 8b 38 e8 f6 68 c5 ff 48 8d bb 38 02 00
[ 215.725903] RIP: scsi_device_put+0xb/0x30 RSP: ffffc90002a9faa0
(gdb) list *(scsi_device_put+0xb)
0xffffffff8149fc2b is in scsi_device_put (drivers/scsi/scsi.c:957).
952 * count of the underlying LLDD module. The device is freed once the last
953 * user vanishes.
954 */
955 void scsi_device_put(struct scsi_device *sdev)
956 {
957 module_put(sdev->host->hostt->module);
958 put_device(&sdev->sdev_gendev);
959 }
960 EXPORT_SYMBOL(scsi_device_put);
961
(gdb) disas scsi_device_put
Dump of assembler code for function scsi_device_put:
0xffffffff8149fc20 <+0>: push %rbp
0xffffffff8149fc21 <+1>: mov %rsp,%rbp
0xffffffff8149fc24 <+4>: push %rbx
0xffffffff8149fc25 <+5>: mov (%rdi),%rax
0xffffffff8149fc28 <+8>: mov %rdi,%rbx
0xffffffff8149fc2b <+11>: mov 0x1a8(%rax),%rax
0xffffffff8149fc32 <+18>: mov (%rax),%rdi
0xffffffff8149fc35 <+21>: callq 0xffffffff810f6530 <module_put>
0xffffffff8149fc3a <+26>: lea 0x238(%rbx),%rdi
0xffffffff8149fc41 <+33>: callq 0xffffffff814714b0 <put_device>
0xffffffff8149fc46 <+38>: pop %rbx
0xffffffff8149fc47 <+39>: pop %rbp
0xffffffff8149fc48 <+40>: retq
End of assembler dump.
(gdb) print &((struct Scsi_Host *)0)->hostt
$2 = (struct scsi_host_template **) 0x1a8 <irq_stack_union+424>
Apparently scsi_device_put() was called for a SCSI device that was already
freed (memory poisoning was enabled in my test). This is something I had
not yet seen before.
Bart.
^ permalink raw reply
* Re: split scsi passthrough fields out of struct request V2
From: Jens Axboe @ 2017-01-27 16:56 UTC (permalink / raw)
To: Bart Van Assche, hch@lst.de
Cc: linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
snitzer@redhat.com, linux-raid@vger.kernel.org,
dm-devel@redhat.com, j-nomura@ce.jp.nec.com
In-Reply-To: <1485535925.4267.1.camel@sandisk.com>
On 01/27/2017 09:52 AM, Bart Van Assche wrote:
> On Fri, 2017-01-27 at 01:04 -0700, Jens Axboe wrote:
>> The previous patch had a bug if you didn't use a scheduler, here's a
>> version that should work fine in both cases. I've also updated the
>> above mentioned branch, so feel free to pull that as well and merge to
>> master like before.
>
> Booting time is back to normal with commit f3a8ab7d55bc merged with
> v4.10-rc5. That's a great improvement. However, running the srp-test
> software triggers now a new complaint:
>
> [ 215.600386] sd 11:0:0:0: [sdh] Attached SCSI disk
> [ 215.609485] sd 11:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
> [ 215.722900] scsi 13:0:0:0: alua: Detached
> [ 215.724452] general protection fault: 0000 [#1] SMP
> [ 215.724484] Modules linked in: dm_service_time ib_srp scsi_transport_srp target_core_user uio target_core_pscsi target_core_file ib_srpt target_core_iblock target_core_mod brd netconsole xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat libcrc32c nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm msr configfs ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel hid_generic kvm usbhid irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel mlx4_core ghash_clmulni_intel iTCO_wdt d
cdbas pcbc tg3
> [ 215.724629] iTCO_vendor_support ptp aesni_intel pps_core aes_x86_64 pcspkr crypto_simd libphy ipmi_si glue_helper cryptd ipmi_devintf tpm_tis devlink fjes ipmi_msghandler tpm_tis_core tpm mei_me lpc_ich mei mfd_core button shpchp wmi mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm sr_mod cdrom ehci_pci ehci_hcd usbcore usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua autofs4
> [ 215.724719] CPU: 9 PID: 8043 Comm: multipathd Not tainted 4.10.0-rc5-dbg+ #1
> [ 215.724748] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
> [ 215.724775] task: ffff8801717998c0 task.stack: ffffc90002a9c000
> [ 215.724804] RIP: 0010:scsi_device_put+0xb/0x30
> [ 215.724829] RSP: 0018:ffffc90002a9faa0 EFLAGS: 00010246
> [ 215.724855] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88038bf85698 RCX: 0000000000000006
> [ 215.724880] RDX: 0000000000000006 RSI: ffff88017179a108 RDI: ffff88038bf85698
> [ 215.724906] RBP: ffffc90002a9faa8 R08: ffff880384786008 R09: 0000000100170007
> [ 215.724932] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88038bf85698
> [ 215.724958] R13: ffff88038919f090 R14: dead000000000100 R15: ffff88038a41dd28
> [ 215.724983] FS: 00007fbf8c6cf700(0000) GS:ffff88046f440000(0000) knlGS:0000000000000000
> [ 215.725010] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 215.725035] CR2: 00007f1262ef3ee0 CR3: 000000044f6cc000 CR4: 00000000001406e0
> [ 215.725060] Call Trace:
> [ 215.725086] scsi_disk_put+0x2d/0x40
> [ 215.725110] sd_release+0x3d/0xb0
> [ 215.725137] __blkdev_put+0x29e/0x360
> [ 215.725163] blkdev_put+0x49/0x170
> [ 215.725192] dm_put_table_device+0x58/0xc0 [dm_mod]
> [ 215.725219] dm_put_device+0x70/0xc0 [dm_mod]
> [ 215.725269] free_priority_group+0x92/0xc0 [dm_multipath]
> [ 215.725295] free_multipath+0x70/0xc0 [dm_multipath]
> [ 215.725320] multipath_dtr+0x19/0x20 [dm_multipath]
> [ 215.725348] dm_table_destroy+0x67/0x120 [dm_mod]
> [ 215.725379] dev_suspend+0xde/0x240 [dm_mod]
> [ 215.725434] ctl_ioctl+0x1f5/0x520 [dm_mod]
> [ 215.725489] dm_ctl_ioctl+0xe/0x20 [dm_mod]
> [ 215.725515] do_vfs_ioctl+0x8f/0x700
> [ 215.725589] SyS_ioctl+0x3c/0x70
> [ 215.725614] entry_SYSCALL_64_fastpath+0x18/0xad
> [ 215.725641] RIP: 0033:0x7fbf8aca0667
> [ 215.725665] RSP: 002b:00007fbf8c6cd668 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 215.725692] RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007fbf8aca0667
> [ 215.725716] RDX: 00007fbf8006b940 RSI: 00000000c138fd06 RDI: 0000000000000007
> [ 215.725743] RBP: 0000000000000009 R08: 00007fbf8c6cb3c0 R09: 00007fbf8b68d8d8
> [ 215.725768] R10: 0000000000000075 R11: 0000000000000246 R12: 00007fbf8c6cd770
> [ 215.725793] R13: 0000000000000013 R14: 00000000006168f0 R15: 0000000000f74780
> [ 215.725820] Code: bc 24 b8 00 00 00 e8 55 c8 1c 00 48 83 c4 08 48 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 00 55 48 89 e5 53 48 8b 07 48 89 fb <48> 8b 80 a8 01 00 00 48 8b 38 e8 f6 68 c5 ff 48 8d bb 38 02 00
> [ 215.725903] RIP: scsi_device_put+0xb/0x30 RSP: ffffc90002a9faa0
>
> (gdb) list *(scsi_device_put+0xb)
> 0xffffffff8149fc2b is in scsi_device_put (drivers/scsi/scsi.c:957).
> 952 * count of the underlying LLDD module. The device is freed once the last
> 953 * user vanishes.
> 954 */
> 955 void scsi_device_put(struct scsi_device *sdev)
> 956 {
> 957 module_put(sdev->host->hostt->module);
> 958 put_device(&sdev->sdev_gendev);
> 959 }
> 960 EXPORT_SYMBOL(scsi_device_put);
> 961
> (gdb) disas scsi_device_put
> Dump of assembler code for function scsi_device_put:
> 0xffffffff8149fc20 <+0>: push %rbp
> 0xffffffff8149fc21 <+1>: mov %rsp,%rbp
> 0xffffffff8149fc24 <+4>: push %rbx
> 0xffffffff8149fc25 <+5>: mov (%rdi),%rax
> 0xffffffff8149fc28 <+8>: mov %rdi,%rbx
> 0xffffffff8149fc2b <+11>: mov 0x1a8(%rax),%rax
> 0xffffffff8149fc32 <+18>: mov (%rax),%rdi
> 0xffffffff8149fc35 <+21>: callq 0xffffffff810f6530 <module_put>
> 0xffffffff8149fc3a <+26>: lea 0x238(%rbx),%rdi
> 0xffffffff8149fc41 <+33>: callq 0xffffffff814714b0 <put_device>
> 0xffffffff8149fc46 <+38>: pop %rbx
> 0xffffffff8149fc47 <+39>: pop %rbp
> 0xffffffff8149fc48 <+40>: retq
> End of assembler dump.
> (gdb) print &((struct Scsi_Host *)0)->hostt
> $2 = (struct scsi_host_template **) 0x1a8 <irq_stack_union+424>
>
> Apparently scsi_device_put() was called for a SCSI device that was already
> freed (memory poisoning was enabled in my test). This is something I had
> not yet seen before.
I have no idea what this is, I haven't messed with life time or devices
or queues at all in that branch.
--
Jens Axboe
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox