linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/1] bsg: fix regression resulting in panics when sending commands via BSG
@ 2017-08-23 23:57 Benjamin Block
  2017-08-23 23:57 ` [PATCH v2 1/1] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer Benjamin Block
  2017-08-24 14:22 ` [PATCH v2 0/1] bsg: fix regression resulting in panics when sending commands via BSG Jens Axboe
  0 siblings, 2 replies; 5+ messages in thread
From: Benjamin Block @ 2017-08-23 23:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Benjamin Block, Martin K . Petersen, linux-block, linux-kernel,
	linux-scsi, Johannes Thumshirn, Christoph Hellwig, Steffen Maier

Hello all,

This is the second try for fixing the regression in the BSG-interface that
exists since v4.11 (for more infos see the first series).

I separated my other changes from the bug-fix so that it is easier to apply
if judged good. I will rebase my cleanups I sent in v1 and send them when I
get a bit more time. But the regression-fix is more important, so here's
that.

I did some more tests on it than on v1, including some heavy parallel I/O
on the same blk-queue using both BSG and the normal SCSI-stack at the same
time (throwing some intentional bad commands in it too). That seemed to
work all well enough - i.e. it didn't crash and got the expected results. I
haven't done any external error-inject, but IMO that would be beyond the
scope right now.

The fix is based on Christoph's idea, I discussed this with him off-list
already.

I rebased the series on Jens' for-next.

Reviews are more than welcome :)


                                                Beste Grüße / Best regards,
                                                  - Benjamin Block

Benjamin Block (1):
  bsg-lib: fix kernel panic resulting from missing allocation of
    reply-buffer

 block/bsg-lib.c         | 74 +++++++++++++++++++++++++++++--------------------
 include/linux/blkdev.h  |  1 -
 include/linux/bsg-lib.h |  2 ++
 3 files changed, 46 insertions(+), 31 deletions(-)

-- 
Linux on z Systems Development         /         IBM Systems & Technology Group
               IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz     /        Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 1/1] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer
  2017-08-23 23:57 [PATCH v2 0/1] bsg: fix regression resulting in panics when sending commands via BSG Benjamin Block
@ 2017-08-23 23:57 ` Benjamin Block
  2017-08-24  8:45   ` Christoph Hellwig
  2017-08-24 14:22 ` [PATCH v2 0/1] bsg: fix regression resulting in panics when sending commands via BSG Jens Axboe
  1 sibling, 1 reply; 5+ messages in thread
From: Benjamin Block @ 2017-08-23 23:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Benjamin Block, Martin K . Petersen, linux-block, linux-kernel,
	linux-scsi, Johannes Thumshirn, Christoph Hellwig, Steffen Maier

Since we split the scsi_request out of struct request bsg fails to
provide a reply-buffer for the drivers. This was done via the pointer
for sense-data, that is not preallocated anymore.

Failing to allocate/assign it results in illegal dereferences because
LLDs use this pointer unquestioned.

An example panic on s390x, using the zFCP driver, looks like this (I had
debugging on, otherwise NULL-pointer dereferences wouldn't even panic on
s390x):

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403
Fault in home space mode while using kernel ASCE.
AS:0000000001590007 R3:0000000000000024
Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: <Long List>
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
task: 0000000065cb0100 task.stack: 0000000065cb4000
Krnl PSW : 0704e00180000000 000003ff801e4156 (zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp])
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000001 000000005fa9d0d0 000000005fa9d078 0000000000e16866
           000003ff00000290 6b6b6b6b6b6b6b6b 0000000059f78f00 000000000000000f
           00000000593a0958 00000000593a0958 0000000060d88800 000000005ddd4c38
           0000000058b50100 07000000659cba08 000003ff801e8556 00000000659cb9a8
Krnl Code: 000003ff801e4146: e31020500004        lg      %r1,80(%r2)
           000003ff801e414c: 58402040           l       %r4,64(%r2)
          #000003ff801e4150: e35020200004       lg      %r5,32(%r2)
          >000003ff801e4156: 50405004           st      %r4,4(%r5)
           000003ff801e415a: e54c50080000       mvhi    8(%r5),0
           000003ff801e4160: e33010280012       lt      %r3,40(%r1)
           000003ff801e4166: a718fffb           lhi     %r1,-5
           000003ff801e416a: 1803               lr      %r0,%r3
Call Trace:
([<000003ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp])
 [<000003ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp]
 [<000003ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp]
 [<00000000009b91b6>] qdio_kick_handler+0x2ae/0x2c8
 [<00000000009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10
 [<00000000001684c2>] tasklet_action+0x15a/0x1d8
 [<0000000000bd28ec>] __do_softirq+0x3ec/0x848
 [<00000000001675a4>] irq_exit+0x74/0xf8
 [<000000000010dd6a>] do_IRQ+0xba/0xf0
 [<0000000000bd19e8>] io_int_handler+0x104/0x2d4
 [<00000000001033b6>] enabled_wait+0xb6/0x188
([<000000000010339e>] enabled_wait+0x9e/0x188)
 [<000000000010396a>] arch_cpu_idle+0x32/0x50
 [<0000000000bd0112>] default_idle_call+0x52/0x68
 [<00000000001cd0fa>] do_idle+0x102/0x188
 [<00000000001cd41e>] cpu_startup_entry+0x3e/0x48
 [<0000000000118c64>] smp_start_secondary+0x11c/0x130
 [<0000000000bd2016>] restart_int_handler+0x62/0x78
 [<0000000000000000>]           (null)
INFO: lockdep is turned off.
Last Breaking-Event-Address:
 [<000003ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp]

Kernel panic - not syncing: Fatal exception in interrupt

This patch moves bsg-lib to allocate and setup struct bsg_job ahead of
time, including the allocation of a buffer for the reply-data.

This means, struct bsg_job is not allocated separately anymore, but as part
of struct request allocation - similar to struct scsi_cmd. Reflect this in
the function names that used to handle creation/destruction of struct
bsg_job.

Reported-by: Steffen Maier <maier@linux.vnet.ibm.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Cc: <stable@vger.kernel.org> #4.11+
---
 block/bsg-lib.c         | 74 +++++++++++++++++++++++++++++--------------------
 include/linux/blkdev.h  |  1 -
 include/linux/bsg-lib.h |  2 ++
 3 files changed, 46 insertions(+), 31 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index c4513b23f57a..dd56d7460cb9 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -29,26 +29,25 @@
 #include <scsi/scsi_cmnd.h>
 
 /**
- * bsg_destroy_job - routine to teardown/delete a bsg job
+ * bsg_teardown_job - routine to teardown a bsg job
  * @job: bsg_job that is to be torn down
  */
-static void bsg_destroy_job(struct kref *kref)
+static void bsg_teardown_job(struct kref *kref)
 {
 	struct bsg_job *job = container_of(kref, struct bsg_job, kref);
 	struct request *rq = job->req;
 
-	blk_end_request_all(rq, BLK_STS_OK);
-
 	put_device(job->dev);	/* release reference for the request */
 
 	kfree(job->request_payload.sg_list);
 	kfree(job->reply_payload.sg_list);
-	kfree(job);
+
+	blk_end_request_all(rq, BLK_STS_OK);
 }
 
 void bsg_job_put(struct bsg_job *job)
 {
-	kref_put(&job->kref, bsg_destroy_job);
+	kref_put(&job->kref, bsg_teardown_job);
 }
 EXPORT_SYMBOL_GPL(bsg_job_put);
 
@@ -100,7 +99,7 @@ EXPORT_SYMBOL_GPL(bsg_job_done);
  */
 static void bsg_softirq_done(struct request *rq)
 {
-	struct bsg_job *job = rq->special;
+	struct bsg_job *job = blk_mq_rq_to_pdu(rq);
 
 	bsg_job_put(job);
 }
@@ -122,33 +121,20 @@ static int bsg_map_buffer(struct bsg_buffer *buf, struct request *req)
 }
 
 /**
- * bsg_create_job - create the bsg_job structure for the bsg request
+ * bsg_prepare_job - create the bsg_job structure for the bsg request
  * @dev: device that is being sent the bsg request
  * @req: BSG request that needs a job structure
  */
-static int bsg_create_job(struct device *dev, struct request *req)
+static int bsg_prepare_job(struct device *dev, struct request *req)
 {
 	struct request *rsp = req->next_rq;
-	struct request_queue *q = req->q;
 	struct scsi_request *rq = scsi_req(req);
-	struct bsg_job *job;
+	struct bsg_job *job = blk_mq_rq_to_pdu(req);
 	int ret;
 
-	BUG_ON(req->special);
-
-	job = kzalloc(sizeof(struct bsg_job) + q->bsg_job_size, GFP_KERNEL);
-	if (!job)
-		return -ENOMEM;
-
-	req->special = job;
-	job->req = req;
-	if (q->bsg_job_size)
-		job->dd_data = (void *)&job[1];
 	job->request = rq->cmd;
 	job->request_len = rq->cmd_len;
-	job->reply = rq->sense;
-	job->reply_len = SCSI_SENSE_BUFFERSIZE;	/* Size of sense buffer
-						 * allocated */
+
 	if (req->bio) {
 		ret = bsg_map_buffer(&job->request_payload, req);
 		if (ret)
@@ -187,7 +173,6 @@ static void bsg_request_fn(struct request_queue *q)
 {
 	struct device *dev = q->queuedata;
 	struct request *req;
-	struct bsg_job *job;
 	int ret;
 
 	if (!get_device(dev))
@@ -199,7 +184,7 @@ static void bsg_request_fn(struct request_queue *q)
 			break;
 		spin_unlock_irq(q->queue_lock);
 
-		ret = bsg_create_job(dev, req);
+		ret = bsg_prepare_job(dev, req);
 		if (ret) {
 			scsi_req(req)->result = ret;
 			blk_end_request_all(req, BLK_STS_OK);
@@ -207,8 +192,7 @@ static void bsg_request_fn(struct request_queue *q)
 			continue;
 		}
 
-		job = req->special;
-		ret = q->bsg_job_fn(job);
+		ret = q->bsg_job_fn(blk_mq_rq_to_pdu(req));
 		spin_lock_irq(q->queue_lock);
 		if (ret)
 			break;
@@ -219,6 +203,35 @@ static void bsg_request_fn(struct request_queue *q)
 	spin_lock_irq(q->queue_lock);
 }
 
+static int bsg_init_rq(struct request_queue *q, struct request *req, gfp_t gfp)
+{
+	struct bsg_job *job = blk_mq_rq_to_pdu(req);
+	struct scsi_request *sreq = &job->sreq;
+
+	memset(job, 0, sizeof(*job));
+
+	scsi_req_init(sreq);
+	sreq->sense_len = SCSI_SENSE_BUFFERSIZE;
+	sreq->sense = kzalloc(sreq->sense_len, gfp);
+	if (!sreq->sense)
+		return -ENOMEM;
+
+	job->req = req;
+	job->reply = sreq->sense;
+	job->reply_len = sreq->sense_len;
+	job->dd_data = job + 1;
+
+	return 0;
+}
+
+static void bsg_exit_rq(struct request_queue *q, struct request *req)
+{
+	struct bsg_job *job = blk_mq_rq_to_pdu(req);
+	struct scsi_request *sreq = &job->sreq;
+
+	kfree(sreq->sense);
+}
+
 /**
  * bsg_setup_queue - Create and add the bsg hooks so we can receive requests
  * @dev: device to attach bsg device to
@@ -235,7 +248,9 @@ struct request_queue *bsg_setup_queue(struct device *dev, char *name,
 	q = blk_alloc_queue(GFP_KERNEL);
 	if (!q)
 		return ERR_PTR(-ENOMEM);
-	q->cmd_size = sizeof(struct scsi_request);
+	q->cmd_size = sizeof(struct bsg_job) + dd_job_size;
+	q->init_rq_fn = bsg_init_rq;
+	q->exit_rq_fn = bsg_exit_rq;
 	q->request_fn = bsg_request_fn;
 
 	ret = blk_init_allocated_queue(q);
@@ -243,7 +258,6 @@ struct request_queue *bsg_setup_queue(struct device *dev, char *name,
 		goto out_cleanup_queue;
 
 	q->queuedata = dev;
-	q->bsg_job_size = dd_job_size;
 	q->bsg_job_fn = job_fn;
 	queue_flag_set_unlocked(QUEUE_FLAG_BIDI, q);
 	queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f45f157b2910..6ae9aa6f93f0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -568,7 +568,6 @@ struct request_queue {
 
 #if defined(CONFIG_BLK_DEV_BSG)
 	bsg_job_fn		*bsg_job_fn;
-	int			bsg_job_size;
 	struct bsg_class_device bsg_dev;
 #endif
 
diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
index e34dde2da0ef..637a20cfb237 100644
--- a/include/linux/bsg-lib.h
+++ b/include/linux/bsg-lib.h
@@ -24,6 +24,7 @@
 #define _BLK_BSG_
 
 #include <linux/blkdev.h>
+#include <scsi/scsi_request.h>
 
 struct request;
 struct device;
@@ -37,6 +38,7 @@ struct bsg_buffer {
 };
 
 struct bsg_job {
+	struct scsi_request sreq;
 	struct device *dev;
 	struct request *req;
 
-- 
2.12.2

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 1/1] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer
  2017-08-23 23:57 ` [PATCH v2 1/1] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer Benjamin Block
@ 2017-08-24  8:45   ` Christoph Hellwig
  2017-08-24 13:36     ` Benjamin Block
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2017-08-24  8:45 UTC (permalink / raw)
  To: Benjamin Block
  Cc: Jens Axboe, Martin K . Petersen, linux-block, linux-kernel,
	linux-scsi, Johannes Thumshirn, Christoph Hellwig, Steffen Maier

>  /**
> - * bsg_destroy_job - routine to teardown/delete a bsg job
> + * bsg_teardown_job - routine to teardown a bsg job
>   * @job: bsg_job that is to be torn down
>   */
> -static void bsg_destroy_job(struct kref *kref)
> +static void bsg_teardown_job(struct kref *kref)

Why this rename?  The destroy name seems to be one of the most
common patterns for the kref_put callbacks.

Otherwise this looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 1/1] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer
  2017-08-24  8:45   ` Christoph Hellwig
@ 2017-08-24 13:36     ` Benjamin Block
  0 siblings, 0 replies; 5+ messages in thread
From: Benjamin Block @ 2017-08-24 13:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Martin K . Petersen, linux-block, linux-kernel,
	linux-scsi, Johannes Thumshirn, Steffen Maier

On Thu, Aug 24, 2017 at 10:45:56AM +0200, Christoph Hellwig wrote:
> >  /**
> > - * bsg_destroy_job - routine to teardown/delete a bsg job
> > + * bsg_teardown_job - routine to teardown a bsg job
> >   * @job: bsg_job that is to be torn down
> >   */
> > -static void bsg_destroy_job(struct kref *kref)
> > +static void bsg_teardown_job(struct kref *kref)
> 
> Why this rename?  The destroy name seems to be one of the most
> common patterns for the kref_put callbacks.
>

Hmm, I did it mostly so it is symmetric with bsg_prepare_job() and it
doesn't really itself destroy the job-struct anymore. If there are other
thing amiss I can change that along with them, if it bothers poeple.


                                                    Beste Gr��e / Best regards,
                                                      - Benjamin Block

> 
> Otherwise this looks fine:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 

-- 
Linux on z Systems Development         /         IBM Systems & Technology Group
		  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz     /        Gesch�ftsf�hrung: Dirk Wittkopp
Sitz der Gesellschaft: B�blingen / Registergericht: AmtsG Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 0/1] bsg: fix regression resulting in panics when sending commands via BSG
  2017-08-23 23:57 [PATCH v2 0/1] bsg: fix regression resulting in panics when sending commands via BSG Benjamin Block
  2017-08-23 23:57 ` [PATCH v2 1/1] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer Benjamin Block
@ 2017-08-24 14:22 ` Jens Axboe
  1 sibling, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2017-08-24 14:22 UTC (permalink / raw)
  To: Benjamin Block
  Cc: Martin K . Petersen, linux-block, linux-kernel, linux-scsi,
	Johannes Thumshirn, Christoph Hellwig, Steffen Maier

On 08/23/2017 05:57 PM, Benjamin Block wrote:
> Hello all,
> 
> This is the second try for fixing the regression in the BSG-interface that
> exists since v4.11 (for more infos see the first series).
> 
> I separated my other changes from the bug-fix so that it is easier to apply
> if judged good. I will rebase my cleanups I sent in v1 and send them when I
> get a bit more time. But the regression-fix is more important, so here's
> that.
> 
> I did some more tests on it than on v1, including some heavy parallel I/O
> on the same blk-queue using both BSG and the normal SCSI-stack at the same
> time (throwing some intentional bad commands in it too). That seemed to
> work all well enough - i.e. it didn't crash and got the expected results. I
> haven't done any external error-inject, but IMO that would be beyond the
> scope right now.
> 
> The fix is based on Christoph's idea, I discussed this with him off-list
> already.
> 
> I rebased the series on Jens' for-next.

Added for 4.13, thanks.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-08-24 14:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-08-23 23:57 [PATCH v2 0/1] bsg: fix regression resulting in panics when sending commands via BSG Benjamin Block
2017-08-23 23:57 ` [PATCH v2 1/1] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer Benjamin Block
2017-08-24  8:45   ` Christoph Hellwig
2017-08-24 13:36     ` Benjamin Block
2017-08-24 14:22 ` [PATCH v2 0/1] bsg: fix regression resulting in panics when sending commands via BSG Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).