Linux block layer

Linux block layer
 help / color / mirror / Atom feed

* Re: BIO request larger than our storage device supports in linux kernel 4.x
From: Jens Axboe @ 2017-03-08 18:22 UTC (permalink / raw)
  To: Umesh Patel, Christoph Hellwig; +Cc: linux-block@vger.kernel.org
In-Reply-To: <CY1PR0401MB1353D1D8B5FB4E58AE1ABA5EE12E0@CY1PR0401MB1353.namprd04.prod.outlook.com>

On 03/08/2017 11:20 AM, Umesh Patel wrote:
>> You are doing it wrong, you should be writing this as a blk-mq driver.
>> Not only would that fix your issue, it would also solve a host of
>> other issues that I'm sure your driver has since it hasn't even been
>> reviewed.
> 
> [Umesh] : This has been working since 2.x kernel to till 3.18. It has
> this issue only in 4.x.

And had your driver been in the kernel, it would still be working. I already
told you what is wrong, go back and read the first reply.

-- 
Jens Axboe

^ permalink raw reply

* RE: BIO request larger than our storage device supports in linux kernel 4.x
From: Umesh Patel @ 2017-03-08 18:20 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig; +Cc: linux-block@vger.kernel.org
In-Reply-To: <211c3042-5f72-def6-4ede-510fa99ef683@kernel.dk>

You are doing it wrong, you should be writing this as a blk-mq driver.
Not only would that fix your issue, it would also solve a host of other iss=
ues that I'm sure your driver has since it hasn't even been reviewed.

[Umesh] : This has been working since 2.x kernel to till 3.18. It has this =
issue only in 4.x.

Thanks
Western Digital Corporation (and its subsidiaries) E-mail Confidentiality N=
otice & Disclaimer:

This e-mail and any files transmitted with it may contain confidential or l=
egally privileged information of WDC and/or its affiliates, and are intende=
d solely for the use of the individual or entity to which they are addresse=
d. If you are not the intended recipient, any disclosure, copying, distribu=
tion or any action taken or omitted to be taken in reliance on it, is prohi=
bited. If you have received this e-mail in error, please notify the sender =
immediately and delete the e-mail in its entirety from your system.

^ permalink raw reply

* Re: BIO request larger than our storage device supports in linux kernel 4.x
From: Jens Axboe @ 2017-03-08 18:13 UTC (permalink / raw)
  To: Umesh Patel, Christoph Hellwig; +Cc: linux-block@vger.kernel.org
In-Reply-To: <CY1PR0401MB1353F97FF898FB07AB7E5AD3E12E0@CY1PR0401MB1353.namprd04.prod.outlook.com>

On 03/08/2017 11:08 AM, Umesh Patel wrote:
> 
> Yes. This is non-open driver for our Flash based PCI drive for enterprise storage solution.

Please don't top post.

> Some snippet of queue related code is below
> 
> struct request_queue *queue;
> 
> blk_queue_max_hw_sectors(dev->osdev.queue, (dev->aggr_max_size >> KERNEL_SECTOR_SHIFT));
> blk_queue_max_segments(dev->osdev.queue, BLK_MAX_SEGMENTS);
> blk_queue_bounce_limit(dev->osdev.queue, BLK_BOUNCE_ANY);
> blk_queue_logical_block_size(dev->osdev.queue, dev->hardsect_size);
> blk_queue_physical_block_size(dev->osdev.queue, dev->hardsect_size);
> 
> dev->osdev.queue = blk_alloc_queue(GFP_KERNEL);
> 
> blk_queue_make_request(dev->osdev.queue, fmm_bdev_make_request_wrapper);
> 
> Here queue is part of our internal structure struct fmm_bdev *dev

You are doing it wrong, you should be writing this as a blk-mq driver.
Not only would that fix your issue, it would also solve a host of other
issues that I'm sure your driver has since it hasn't even been
reviewed.

Generally, I have very little interest in providing support for non
open drivers. Scratch that, no interest.

-- 
Jens Axboe

^ permalink raw reply

* RE: BIO request larger than our storage device supports in linux kernel 4.x
From: Umesh Patel @ 2017-03-08 18:08 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig; +Cc: linux-block@vger.kernel.org
In-Reply-To: <04c47362-2e9d-b32a-fb98-14113b45b6f7@kernel.dk>


Yes. This is non-open driver for our Flash based PCI drive for enterprise s=
torage solution.

Some snippet of queue related code is below

struct request_queue *queue;

blk_queue_max_hw_sectors(dev->osdev.queue, (dev->aggr_max_size >> KERNEL_SE=
CTOR_SHIFT));
blk_queue_max_segments(dev->osdev.queue, BLK_MAX_SEGMENTS);
blk_queue_bounce_limit(dev->osdev.queue, BLK_BOUNCE_ANY);
blk_queue_logical_block_size(dev->osdev.queue, dev->hardsect_size);
blk_queue_physical_block_size(dev->osdev.queue, dev->hardsect_size);

dev->osdev.queue =3D blk_alloc_queue(GFP_KERNEL);

blk_queue_make_request(dev->osdev.queue, fmm_bdev_make_request_wrapper);

Here queue is part of our internal structure struct fmm_bdev *dev




-----Original Message-----
From: Jens Axboe [mailto:axboe@kernel.dk] =

Sent: Wednesday, March 8, 2017 8:35 PM
To: Umesh Patel; linux-block@vger.kernel.org
Subject: Re: BIO request larger than our storage device supports in linux k=
ernel 4.x

On 03/08/2017 02:34 AM, Umesh Patel wrote:
> Hello,
> =

> We are registering BIO size of our storage device through linux kernel =

> API blk_queue_max_hw_sectors worth of 104K. So max size of BIO that =

> our block device can handle is 104 KB but kernel block layer is =

> sending 256 KB worth of BIO which is more than we are supporting.
> This issue we are observing in 4.4 and also in latest 4.10.1 kernel =

> release. Until this we had not seen this issue.
> =

> From the kernel source code history I came to below things.  4.1 =

> kernel version onwards "nr_pages =3D min(sdio->pages_in_io, =

> bio_get_nr_vecs(map_bh->b_bdev))" has been removed which
> (bio_get_nr_vecs) was considering max_sectors_kb of queue which was
> 104 KB for our device.
> =

> Presently new code is something like "nr_pages =3D =

> min(sdio->pages_in_io, BIO_MAX_PAGES)" which is sending 256 KB
> (BIO_MAX_PAGES) of BIO size to our device which we are not supporting.
> =

> =

> So from some documentation I found out that 256 KB of bio size is the =

> fix (from 4.x kernel) and that has to support by any drive.  Please =

> let me know is there any way to change BIO size worth of 104 KB from
> 256 KB or any other way to register our BIO size with kernel or any =

> other area to look in to ?.

I'm assuming this is some not-open driver, and I'm also assuming that you a=
re bypassing the proper API and using make_request_fn. In which case you ar=
e missing a call to blk_queue_split().

--
Jens Axboe

Western Digital Corporation (and its subsidiaries) E-mail Confidentiality N=
otice & Disclaimer:

This e-mail and any files transmitted with it may contain confidential or l=
egally privileged information of WDC and/or its affiliates, and are intende=
d solely for the use of the individual or entity to which they are addresse=
d. If you are not the intended recipient, any disclosure, copying, distribu=
tion or any action taken or omitted to be taken in reliance on it, is prohi=
bited. If you have received this e-mail in error, please notify the sender =
immediately and delete the e-mail in its entirety from your system.

^ permalink raw reply

* Re: [PATCH v2] cfq-iosched: fix the delay of cfq_group's vdisktime under iops mode
From: Jens Axboe @ 2017-03-08 17:56 UTC (permalink / raw)
  To: Hou Tao, linux-block; +Cc: jack, jmoyer, stable, vgoyal
In-Reply-To: <1488975415-40417-1-git-send-email-houtao1@huawei.com>

On 03/08/2017 05:16 AM, Hou Tao wrote:
> When adding a cfq_group into the cfq service tree, we use CFQ_IDLE_DELAY
> as the delay of cfq_group's vdisktime if there have been other cfq_groups
> already.
> 
> When cfq is under iops mode, commit 9a7f38c42c2b ("cfq-iosched: Convert
> from jiffies to nanoseconds") could result in a large iops delay and
> lead to an abnormal io schedule delay for the added cfq_group. To fix
> it, we just need to revert to the old CFQ_IDLE_DELAY value: HZ / 5
> when iops mode is enabled.
> 
> Despite having the same value, the delay of a cfq_queue in idle class
> and the delay of cfq_queue are different things, so I define two new
> macros for the delay of a cfq_group under time-slice mode and IOPs mode.

Added, thanks.

-- 
Jens Axboe

^ permalink raw reply

* Re: [PATCH] axonram: Fix gendisk handling
From: Jens Axboe @ 2017-03-08 17:55 UTC (permalink / raw)
  To: Jan Kara, Michael Ellerman; +Cc: linuxppc-dev, linux-block, Dan Carpenter
In-Reply-To: <20170308135605.4250-1-jack@suse.cz>

On 03/08/2017 06:56 AM, Jan Kara wrote:
> It is invalid to call del_gendisk() when disk->queue is NULL. Fix error
> handling in axon_ram_probe() to avoid doing that.
> 
> Also del_gendisk() does not drop a reference to gendisk allocated by
> alloc_disk(). That has to be done by put_disk(). Add that call where
> needed.

Applied, thanks.

-- 
Jens Axboe

^ permalink raw reply

* Re: [PATCH 0/4] block: Fixes for bdi handling
From: Omar Sandoval @ 2017-03-08 17:39 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, Tejun Heo, linux-block, Dan Williams, Arthur Marsh,
	linux-scsi
In-Reply-To: <20170308165051.GA8598@vader>

On Wed, Mar 08, 2017 at 08:50:51AM -0800, Omar Sandoval wrote:
> On Wed, Mar 08, 2017 at 05:48:30PM +0100, Jan Kara wrote:
> > Hi!
> > 
> > patches in this series fix the most urgent bugs that were introduced by commit
> > 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()" and by
> > 0dba1314d4f8 "scsi, block: fix duplicate bdi name registration crashes".
> > In fact before these commits we had a different set of problems in the
> > code but they were less visible :).
> > 
> > I'm still waiting for test confirmation from Omar and Arthur Marsh who reported
> > issues but I'm not able to hit any problem anymore in my testing.  I think it
> > would be nice to get the patches to rc2 so to speed up things I'm posting the
> > patches now so that review can happen in parallel with the testing.
> > 
> > Other BDI handling fixes I have in my queue can wait a bit more since they are
> > either theoretical or long-standing issues. So I'll repost them once these four
> > are sorted out.
> > 
> > 								Honza
> 
> Applying patches 1 and 2 indeed fixed my virtio-scsi problem. I'll apply
> the whole series and test that my sd/sr test cases still work.

Yup, everything is working here. For the series:

Tested-by: Omar Sandoval <osandov@fb.com>

^ permalink raw reply

* Re: [PATCH 4/4] Revert "scsi, block: fix duplicate bdi name registration crashes"
From: Dan Williams @ 2017-03-08 17:12 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, Tejun Heo, linux-block, Omar Sandoval, Arthur Marsh,
	linux-scsi
In-Reply-To: <20170308164834.14302-5-jack@suse.cz>

On Wed, Mar 8, 2017 at 8:48 AM, Jan Kara <jack@suse.cz> wrote:
> This reverts commit 0dba1314d4f81115dce711292ec7981d17231064. It causes
> leaking of device numbers for SCSI when SCSI registers multiple gendisks
> for one request_queue in succession. It can be easily reproduced using
> Omar's script [1] on kernel with CONFIG_DEBUG_TEST_DRIVER_REMOVE.
> Furthermore the protection provided by this commit is not needed anymore
> as the problem it was fixing got also fixed by commit 165a5e22fafb
> "block: Move bdi_unregister() to del_gendisk()".
>
> [1]: http://marc.info/?l=linux-block&m=148554717109098&w=2
>
> Signed-off-by: Jan Kara <jack@suse.cz>

Acked-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply

* Re: [PATCH 0/4] block: Fixes for bdi handling
From: Jens Axboe @ 2017-03-08 16:55 UTC (permalink / raw)
  To: Jan Kara
  Cc: Tejun Heo, linux-block, Dan Williams, Omar Sandoval, Arthur Marsh,
	linux-scsi
In-Reply-To: <20170308164834.14302-1-jack@suse.cz>

On 03/08/2017 09:48 AM, Jan Kara wrote:
> Hi!
> 
> patches in this series fix the most urgent bugs that were introduced by commit
> 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()" and by
> 0dba1314d4f8 "scsi, block: fix duplicate bdi name registration crashes".
> In fact before these commits we had a different set of problems in the
> code but they were less visible :).
> 
> I'm still waiting for test confirmation from Omar and Arthur Marsh who reported
> issues but I'm not able to hit any problem anymore in my testing.  I think it
> would be nice to get the patches to rc2 so to speed up things I'm posting the
> patches now so that review can happen in parallel with the testing.
> 
> Other BDI handling fixes I have in my queue can wait a bit more since they are
> either theoretical or long-standing issues. So I'll repost them once these four
> are sorted out.

Agree, we'll need to get this sorted for -rc2, thanks a lot for looking
into it. I'll queue these up for testing as well, and hopefully ship
them off tomorrow (or Friday, at the latest).

-- 
Jens Axboe

^ permalink raw reply

* Re: [PATCH 0/4] block: Fixes for bdi handling
From: Omar Sandoval @ 2017-03-08 16:50 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, Tejun Heo, linux-block, Dan Williams, Arthur Marsh,
	linux-scsi
In-Reply-To: <20170308164834.14302-1-jack@suse.cz>

On Wed, Mar 08, 2017 at 05:48:30PM +0100, Jan Kara wrote:
> Hi!
> 
> patches in this series fix the most urgent bugs that were introduced by commit
> 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()" and by
> 0dba1314d4f8 "scsi, block: fix duplicate bdi name registration crashes".
> In fact before these commits we had a different set of problems in the
> code but they were less visible :).
> 
> I'm still waiting for test confirmation from Omar and Arthur Marsh who reported
> issues but I'm not able to hit any problem anymore in my testing.  I think it
> would be nice to get the patches to rc2 so to speed up things I'm posting the
> patches now so that review can happen in parallel with the testing.
> 
> Other BDI handling fixes I have in my queue can wait a bit more since they are
> either theoretical or long-standing issues. So I'll repost them once these four
> are sorted out.
> 
> 								Honza

Applying patches 1 and 2 indeed fixed my virtio-scsi problem. I'll apply
the whole series and test that my sd/sr test cases still work.

^ permalink raw reply

* [PATCH 4/4] Revert "scsi, block: fix duplicate bdi name registration crashes"
From: Jan Kara @ 2017-03-08 16:48 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, linux-block, Dan Williams, Omar Sandoval, Arthur Marsh,
	linux-scsi, Jan Kara
In-Reply-To: <20170308164834.14302-1-jack@suse.cz>

This reverts commit 0dba1314d4f81115dce711292ec7981d17231064. It causes
leaking of device numbers for SCSI when SCSI registers multiple gendisks
for one request_queue in succession. It can be easily reproduced using
Omar's script [1] on kernel with CONFIG_DEBUG_TEST_DRIVER_REMOVE.
Furthermore the protection provided by this commit is not needed anymore
as the problem it was fixing got also fixed by commit 165a5e22fafb
"block: Move bdi_unregister() to del_gendisk()".

[1]: http://marc.info/?l=linux-block&m=148554717109098&w=2

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-core.c       |  2 --
 block/genhd.c          | 21 ---------------------
 drivers/scsi/sd.c      | 41 ++++++++---------------------------------
 include/linux/blkdev.h |  1 -
 include/linux/genhd.h  |  8 --------
 5 files changed, 8 insertions(+), 65 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 1086dac8724c..a76895c9776d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -578,8 +578,6 @@ void blk_cleanup_queue(struct request_queue *q)
 		q->queue_lock = &q->__queue_lock;
 	spin_unlock_irq(lock);
 
-	put_disk_devt(q->disk_devt);
-
 	/* @q is and will stay empty, shutdown and put */
 	blk_put_queue(q);
 }
diff --git a/block/genhd.c b/block/genhd.c
index 94f323842b52..a9c516a8b37d 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -572,20 +572,6 @@ static void register_disk(struct device *parent, struct gendisk *disk)
 	disk_part_iter_exit(&piter);
 }
 
-void put_disk_devt(struct disk_devt *disk_devt)
-{
-	if (disk_devt && atomic_dec_and_test(&disk_devt->count))
-		disk_devt->release(disk_devt);
-}
-EXPORT_SYMBOL(put_disk_devt);
-
-void get_disk_devt(struct disk_devt *disk_devt)
-{
-	if (disk_devt)
-		atomic_inc(&disk_devt->count);
-}
-EXPORT_SYMBOL(get_disk_devt);
-
 /**
  * device_add_disk - add partitioning information to kernel list
  * @parent: parent device for the disk
@@ -626,13 +612,6 @@ void device_add_disk(struct device *parent, struct gendisk *disk)
 
 	disk_alloc_events(disk);
 
-	/*
-	 * Take a reference on the devt and assign it to queue since it
-	 * must not be reallocated while the bdi is registered
-	 */
-	disk->queue->disk_devt = disk->disk_devt;
-	get_disk_devt(disk->disk_devt);
-
 	/* Register BDI before referencing it from bdev */
 	bdi = disk->queue->backing_dev_info;
 	bdi_register_owner(bdi, disk_to_dev(disk));
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c7839f6c35cc..d277e8620e3e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3075,23 +3075,6 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
 	put_device(&sdkp->dev);
 }
 
-struct sd_devt {
-	int idx;
-	struct disk_devt disk_devt;
-};
-
-static void sd_devt_release(struct disk_devt *disk_devt)
-{
-	struct sd_devt *sd_devt = container_of(disk_devt, struct sd_devt,
-			disk_devt);
-
-	spin_lock(&sd_index_lock);
-	ida_remove(&sd_index_ida, sd_devt->idx);
-	spin_unlock(&sd_index_lock);
-
-	kfree(sd_devt);
-}
-
 /**
  *	sd_probe - called during driver initialization and whenever a
  *	new scsi device is attached to the system. It is called once
@@ -3113,7 +3096,6 @@ static void sd_devt_release(struct disk_devt *disk_devt)
 static int sd_probe(struct device *dev)
 {
 	struct scsi_device *sdp = to_scsi_device(dev);
-	struct sd_devt *sd_devt;
 	struct scsi_disk *sdkp;
 	struct gendisk *gd;
 	int index;
@@ -3139,13 +3121,9 @@ static int sd_probe(struct device *dev)
 	if (!sdkp)
 		goto out;
 
-	sd_devt = kzalloc(sizeof(*sd_devt), GFP_KERNEL);
-	if (!sd_devt)
-		goto out_free;
-
 	gd = alloc_disk(SD_MINORS);
 	if (!gd)
-		goto out_free_devt;
+		goto out_free;
 
 	do {
 		if (!ida_pre_get(&sd_index_ida, GFP_KERNEL))
@@ -3161,11 +3139,6 @@ static int sd_probe(struct device *dev)
 		goto out_put;
 	}
 
-	atomic_set(&sd_devt->disk_devt.count, 1);
-	sd_devt->disk_devt.release = sd_devt_release;
-	sd_devt->idx = index;
-	gd->disk_devt = &sd_devt->disk_devt;
-
 	error = sd_format_disk_name("sd", index, gd->disk_name, DISK_NAME_LEN);
 	if (error) {
 		sdev_printk(KERN_WARNING, sdp, "SCSI disk (sd) name length exceeded.\n");
@@ -3205,12 +3178,11 @@ static int sd_probe(struct device *dev)
 	return 0;
 
  out_free_index:
-	put_disk_devt(&sd_devt->disk_devt);
-	sd_devt = NULL;
+	spin_lock(&sd_index_lock);
+	ida_remove(&sd_index_ida, index);
+	spin_unlock(&sd_index_lock);
  out_put:
 	put_disk(gd);
- out_free_devt:
-	kfree(sd_devt);
  out_free:
 	kfree(sdkp);
  out:
@@ -3271,7 +3243,10 @@ static void scsi_disk_release(struct device *dev)
 	struct scsi_disk *sdkp = to_scsi_disk(dev);
 	struct gendisk *disk = sdkp->disk;
 	
-	put_disk_devt(disk->disk_devt);
+	spin_lock(&sd_index_lock);
+	ida_remove(&sd_index_ida, sdkp->index);
+	spin_unlock(&sd_index_lock);
+
 	disk->private_data = NULL;
 	put_disk(disk);
 	put_device(&sdkp->device->sdev_gendev);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 796016e63c1d..5a7da607ca04 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -435,7 +435,6 @@ struct request_queue {
 	struct delayed_work	delay_work;
 
 	struct backing_dev_info	*backing_dev_info;
-	struct disk_devt	*disk_devt;
 
 	/*
 	 * The queue owner gets to use this for whatever they like.
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index a999d281a2f1..76f39754e7b0 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -167,13 +167,6 @@ struct blk_integrity {
 };
 
 #endif	/* CONFIG_BLK_DEV_INTEGRITY */
-struct disk_devt {
-	atomic_t count;
-	void (*release)(struct disk_devt *disk_devt);
-};
-
-void put_disk_devt(struct disk_devt *disk_devt);
-void get_disk_devt(struct disk_devt *disk_devt);
 
 struct gendisk {
 	/* major, first_minor and minors are input parameters only,
@@ -183,7 +176,6 @@ struct gendisk {
 	int first_minor;
 	int minors;                     /* maximum number of minors, =1 for
                                          * disks that can't be partitioned. */
-	struct disk_devt *disk_devt;
 
 	char disk_name[DISK_NAME_LEN];	/* name of major driver */
 	char *(*devnode)(struct gendisk *gd, umode_t *mode);
-- 
2.10.2

^ permalink raw reply related

* [PATCH 3/4] block: Make del_gendisk() safer for disks without queues
From: Jan Kara @ 2017-03-08 16:48 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, linux-block, Dan Williams, Omar Sandoval, Arthur Marsh,
	linux-scsi, Jan Kara
In-Reply-To: <20170308164834.14302-1-jack@suse.cz>

Commit 165a5e22fafb "block: Move bdi_unregister() to del_gendisk()"
added disk->queue dereference to del_gendisk(). Although del_gendisk()
is not supposed to be called without disk->queue valid and
blk_unregister_queue() warns in that case, this change will make it oops
instead. Return to the old more robust behavior of just warning when
del_gendisk() gets called for gendisk with disk->queue being NULL.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/genhd.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index b26a5ea115d0..94f323842b52 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -681,12 +681,16 @@ void del_gendisk(struct gendisk *disk)
 	disk->flags &= ~GENHD_FL_UP;
 
 	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
-	/*
-	 * Unregister bdi before releasing device numbers (as they can get
-	 * reused and we'd get clashes in sysfs).
-	 */
-	bdi_unregister(disk->queue->backing_dev_info);
-	blk_unregister_queue(disk);
+	if (disk->queue) {
+		/*
+		 * Unregister bdi before releasing device numbers (as they can
+		 * get reused and we'd get clashes in sysfs).
+		 */
+		bdi_unregister(disk->queue->backing_dev_info);
+		blk_unregister_queue(disk);
+	} else {
+		WARN_ON(1);
+	}
 	blk_unregister_region(disk_devt(disk), disk->minors);
 
 	part_stat_set_all(&disk->part0, 0);
-- 
2.10.2

^ permalink raw reply related

* [PATCH 2/4] bdi: Fix use-after-free in wb_congested_put()
From: Jan Kara @ 2017-03-08 16:48 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, linux-block, Dan Williams, Omar Sandoval, Arthur Marsh,
	linux-scsi, Jan Kara
In-Reply-To: <20170308164834.14302-1-jack@suse.cz>

bdi_writeback_congested structures get created for each blkcg and bdi
regardless whether bdi is registered or not. When they are created in
unregistered bdi and the request queue (and thus bdi) is then destroyed
while blkg still holds reference to bdi_writeback_congested structure,
this structure will be referencing freed bdi and last wb_congested_put()
will try to remove the structure from already freed bdi.

With commit 165a5e22fafb "block: Move bdi_unregister() to
del_gendisk()", SCSI started to destroy bdis without calling
bdi_unregister() first (previously it was calling bdi_unregister() even
for unregistered bdis) and thus the code detaching
bdi_writeback_congested in cgwb_bdi_destroy() was not triggered and we
started hitting this use-after-free bug. It is enough to boot a KVM
instance with virtio-scsi device to trigger this behavior.

Fix the problem by detaching bdi_writeback_congested structures in
bdi_exit() instead of bdi_unregister(). This is also more logical as
they can get attached to bdi regardless whether it ever got registered
or not.

Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/backing-dev.c | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6ac932210f56..c6f2a37028c2 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -683,30 +683,18 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
 {
 	struct radix_tree_iter iter;
-	struct rb_node *rbn;
 	void **slot;
 
 	WARN_ON(test_bit(WB_registered, &bdi->wb.state));
 
 	spin_lock_irq(&cgwb_lock);
-
 	radix_tree_for_each_slot(slot, &bdi->cgwb_tree, &iter, 0)
 		cgwb_kill(*slot);
-
-	while ((rbn = rb_first(&bdi->cgwb_congested_tree))) {
-		struct bdi_writeback_congested *congested =
-			rb_entry(rbn, struct bdi_writeback_congested, rb_node);
-
-		rb_erase(rbn, &bdi->cgwb_congested_tree);
-		congested->bdi = NULL;	/* mark @congested unlinked */
-	}
-
 	spin_unlock_irq(&cgwb_lock);
 
 	/*
-	 * All cgwb's and their congested states must be shutdown and
-	 * released before returning.  Drain the usage counter to wait for
-	 * all cgwb's and cgwb_congested's ever created on @bdi.
+	 * All cgwb's must be shutdown and released before returning.  Drain
+	 * the usage counter to wait for all cgwb's ever created on @bdi.
 	 */
 	atomic_dec(&bdi->usage_cnt);
 	wait_event(cgwb_release_wait, !atomic_read(&bdi->usage_cnt));
@@ -754,6 +742,21 @@ void wb_blkcg_offline(struct blkcg *blkcg)
 	spin_unlock_irq(&cgwb_lock);
 }
 
+static void cgwb_bdi_exit(struct backing_dev_info *bdi)
+{
+	struct rb_node *rbn;
+
+	spin_lock_irq(&cgwb_lock);
+	while ((rbn = rb_first(&bdi->cgwb_congested_tree))) {
+		struct bdi_writeback_congested *congested =
+			rb_entry(rbn, struct bdi_writeback_congested, rb_node);
+
+		rb_erase(rbn, &bdi->cgwb_congested_tree);
+		congested->bdi = NULL;	/* mark @congested unlinked */
+	}
+	spin_unlock_irq(&cgwb_lock);
+}
+
 #else	/* CONFIG_CGROUP_WRITEBACK */
 
 static int cgwb_bdi_init(struct backing_dev_info *bdi)
@@ -774,7 +777,9 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 	return 0;
 }
 
-static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
+static void cgwb_bdi_destroy(struct backing_dev_info *bdi) { }
+
+static void cgwb_bdi_exit(struct backing_dev_info *bdi)
 {
 	wb_congested_put(bdi->wb_congested);
 }
@@ -905,6 +910,7 @@ static void bdi_exit(struct backing_dev_info *bdi)
 {
 	WARN_ON_ONCE(bdi->dev);
 	wb_exit(&bdi->wb);
+	cgwb_bdi_exit(bdi);
 }
 
 static void release_bdi(struct kref *ref)
-- 
2.10.2

^ permalink raw reply related

* [PATCH 1/4] block: Allow bdi re-registration
From: Jan Kara @ 2017-03-08 16:48 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, linux-block, Dan Williams, Omar Sandoval, Arthur Marsh,
	linux-scsi, Jan Kara
In-Reply-To: <20170308164834.14302-1-jack@suse.cz>

SCSI can call device_add_disk() several times for one request queue when
a device in unbound and bound, creating new gendisk each time. This will
lead to bdi being repeatedly registered and unregistered. This was not a
big problem until commit 165a5e22fafb "block: Move bdi_unregister() to
del_gendisk()" since bdi was only registered repeatedly (bdi_register()
handles repeated calls fine, only we ended up leaking reference to
gendisk due to overwriting bdi->owner) but unregistered only in
blk_cleanup_queue() which didn't get called repeatedly. After
165a5e22fafb we were doing correct bdi_register() - bdi_unregister()
cycles however bdi_unregister() is not prepared for it. So make sure
bdi_unregister() cleans up bdi in such a way that it is prepared for
a possible following bdi_register() call.

An easy way to provoke this behavior is to enable
CONFIG_DEBUG_TEST_DRIVER_REMOVE and use scsi_debug driver to create a
scsi disk which immediately hangs without this fix.

Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/backing-dev.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6d861d090e9f..6ac932210f56 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -710,6 +710,11 @@ static void cgwb_bdi_destroy(struct backing_dev_info *bdi)
 	 */
 	atomic_dec(&bdi->usage_cnt);
 	wait_event(cgwb_release_wait, !atomic_read(&bdi->usage_cnt));
+	/*
+	 * Grab back our reference so that we hold it when @bdi gets
+	 * re-registered.
+	 */
+	atomic_inc(&bdi->usage_cnt);
 }

 /**
@@ -857,6 +862,8 @@ int bdi_register_owner(struct backing_dev_info *bdi, struct device *owner)
 			MINOR(owner->devt));
 	if (rc)
 		return rc;
+	/* Leaking owner reference... */
+	WARN_ON(bdi->owner);
 	bdi->owner = owner;
 	get_device(owner);
 	return 0;
-- 
2.10.2

^ permalink raw reply related

* [PATCH 0/4] block: Fixes for bdi handling
From: Jan Kara @ 2017-03-08 16:48 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Tejun Heo, linux-block, Dan Williams, Omar Sandoval, Arthur Marsh,
	linux-scsi, Jan Kara

Hi!

patches in this series fix the most urgent bugs that were introduced by commit
165a5e22fafb "block: Move bdi_unregister() to del_gendisk()" and by
0dba1314d4f8 "scsi, block: fix duplicate bdi name registration crashes".
In fact before these commits we had a different set of problems in the
code but they were less visible :).

I'm still waiting for test confirmation from Omar and Arthur Marsh who reported
issues but I'm not able to hit any problem anymore in my testing.  I think it
would be nice to get the patches to rc2 so to speed up things I'm posting the
patches now so that review can happen in parallel with the testing.

Other BDI handling fixes I have in my queue can wait a bit more since they are
either theoretical or long-standing issues. So I'll repost them once these four
are sorted out.

								Honza

^ permalink raw reply

* Re: [PATCH 5/8] nowait aio: return on congested block device
From: Jens Axboe @ 2017-03-08 16:17 UTC (permalink / raw)
  To: Goldwyn Rodrigues, Sagi Grimberg, jack
  Cc: hch, linux-fsdevel, linux-block, linux-btrfs, linux-ext4,
	linux-xfs, Goldwyn Rodrigues
In-Reply-To: <44cc02b1-f7ac-cbc6-6f09-726ae8225358@suse.de>

On 03/08/2017 08:00 AM, Goldwyn Rodrigues wrote:
> 
> 
> On 03/08/2017 01:03 AM, Sagi Grimberg wrote:
>>
>>> -        if (likely(blk_queue_enter(q, false) == 0)) {
>>> +        if (likely(blk_queue_enter(q, bio_flagged(bio, BIO_NOWAIT))
>>> == 0)) {
>>>              ret = q->make_request_fn(q, bio);
>>
>> I think that for ->make_request to not block we'd need to set
>> BLK_MQ_REQ_NOWAIT in blk_mq_alloc_data to avoid blocking on a tag
>> allocation.
>>
>> Something like the untested addition below:
> 
> I did that in the first series, but there are too many reasons to block
> in blk-mq [1]. I dropped blk-mq work in v2.

That's complete nonsense, there are no more places in blk-mq that will
block that in the legacy path. Most of the examples from your URL:

> [1] https://patchwork.kernel.org/patch/9571051/

are not blk-mq, but writeback throttling, and drivers that explicitly
hook into ->make_request_fn.

As others have mentioned, it's a total non-starter to focus on the
deprecated IO path and just ignore the new one. Back to the drawing
board.

-- 
Jens Axboe

^ permalink raw reply

* Re: [PATCH 5/8] nowait aio: return on congested block device
From: Christoph Hellwig @ 2017-03-08 15:51 UTC (permalink / raw)
  To: Jan Kara
  Cc: Goldwyn Rodrigues, Sagi Grimberg, jack, hch, linux-fsdevel,
	linux-block, linux-btrfs, linux-ext4, linux-xfs,
	Goldwyn Rodrigues
In-Reply-To: <20170308152806.GA9288@quack2.suse.cz>

On Wed, Mar 08, 2017 at 04:28:06PM +0100, Jan Kara wrote:
> Well, that's not really good. If we cannot support this for both blk-mq and
> legacy block layer the feature will not be usable. So please work on blk-mq
> support as well.

Exactly.  In addition to that anything implementing a feature for the
legacy request request code but not blk-mq is grounds for an instant
NAK.

^ permalink raw reply

* Re: [PATCH] f2fs: allocate a bio for discarding when actually issuing it
From: Christoph Hellwig @ 2017-03-08 15:03 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-kernel, linux-fsdevel, linux-f2fs-devel, linux-block
In-Reply-To: <20170308023333.15034-1-jaegeuk@kernel.org>

On Tue, Mar 07, 2017 at 06:33:33PM -0800, Jaegeuk Kim wrote:
> Let's allocate a bio when issuing discard commands later.

Does this solve the issue with your queue stalls?

> 
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  fs/f2fs/f2fs.h    |   4 +-
>  fs/f2fs/segment.c | 113 ++++++++++++++++++++++++++++--------------------------
>  2 files changed, 62 insertions(+), 55 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index a58c2e43bd2a..870bb4d9bc65 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -197,10 +197,12 @@ enum {
>  struct discard_cmd {
>  	struct list_head list;		/* command list */
>  	struct completion wait;		/* compleation */
> +	struct block_device *bdev;	/* bdev */
>  	block_t lstart;			/* logical start address */
> +	block_t start;			/* actual start address in dev */
>  	block_t len;			/* length */
> -	struct bio *bio;		/* bio */
>  	int state;			/* state */
> +	int error;			/* bio error */
>  };
>  
>  struct discard_cmd_control {
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 50c65cc4645a..d8f9e6c895cd 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -636,7 +636,8 @@ static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno)
>  }
>  
>  static void __add_discard_cmd(struct f2fs_sb_info *sbi,
> -			struct bio *bio, block_t lstart, block_t len)
> +		struct block_device *bdev, block_t lstart,
> +		block_t start, block_t len)
>  {
>  	struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
>  	struct list_head *cmd_list = &(dcc->discard_cmd_list);
> @@ -644,9 +645,9 @@ static void __add_discard_cmd(struct f2fs_sb_info *sbi,
>  
>  	dc = f2fs_kmem_cache_alloc(discard_cmd_slab, GFP_NOFS);
>  	INIT_LIST_HEAD(&dc->list);
> -	dc->bio = bio;
> -	bio->bi_private = dc;
> +	dc->bdev = bdev;
>  	dc->lstart = lstart;
> +	dc->start = start;
>  	dc->len = len;
>  	dc->state = D_PREP;
>  	init_completion(&dc->wait);
> @@ -658,22 +659,66 @@ static void __add_discard_cmd(struct f2fs_sb_info *sbi,
>  
>  static void __remove_discard_cmd(struct f2fs_sb_info *sbi, struct discard_cmd *dc)
>  {
> -	int err = dc->bio->bi_error;
> -
>  	if (dc->state == D_DONE)
>  		atomic_dec(&(SM_I(sbi)->dcc_info->submit_discard));
>  
> -	if (err == -EOPNOTSUPP)
> -		err = 0;
> +	if (dc->error == -EOPNOTSUPP)
> +		dc->error = 0;
>  
> -	if (err)
> +	if (dc->error)
>  		f2fs_msg(sbi->sb, KERN_INFO,
> -				"Issue discard failed, ret: %d", err);
> -	bio_put(dc->bio);
> +				"Issue discard failed, ret: %d", dc->error);
>  	list_del(&dc->list);
>  	kmem_cache_free(discard_cmd_slab, dc);
>  }
>  
> +static void f2fs_submit_discard_endio(struct bio *bio)
> +{
> +	struct discard_cmd *dc = (struct discard_cmd *)bio->bi_private;
> +
> +	complete(&dc->wait);
> +	dc->error = bio->bi_error;
> +	dc->state = D_DONE;
> +	bio_put(bio);
> +}
> +
> +/* this function is copied from blkdev_issue_discard from block/blk-lib.c */
> +static int __submit_discard_cmd(struct discard_cmd *dc)
> +{
> +	struct bio *bio = NULL;
> +	int err;
> +
> +	err = __blkdev_issue_discard(dc->bdev,
> +				SECTOR_FROM_BLOCK(dc->start),
> +				SECTOR_FROM_BLOCK(dc->len),
> +				GFP_NOFS, 0, &bio);
> +	if (!err && bio) {
> +		bio->bi_private = dc;
> +		bio->bi_end_io = f2fs_submit_discard_endio;
> +		bio->bi_opf |= REQ_SYNC;
> +		submit_bio(bio);
> +	}
> +	dc->state = D_SUBMIT;
> +	return err;
> +}
> +
> +static int __queue_discard_cmd(struct f2fs_sb_info *sbi,
> +		struct block_device *bdev, block_t blkstart, block_t blklen)
> +{
> +	block_t lblkstart = blkstart;
> +
> +	trace_f2fs_issue_discard(bdev, blkstart, blklen);
> +
> +	if (sbi->s_ndevs) {
> +		int devi = f2fs_target_device_index(sbi, blkstart);
> +
> +		blkstart -= FDEV(devi).start_blk;
> +	}
> +	__add_discard_cmd(sbi, bdev, lblkstart, blkstart, blklen);
> +	wake_up(&SM_I(sbi)->dcc_info->discard_wait_queue);
> +	return 0;
> +}
> +
>  /* This should be covered by global mutex, &sit_i->sentry_lock */
>  void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, block_t blkaddr)
>  {
> @@ -690,8 +735,7 @@ void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, block_t blkaddr)
>  
>  		if (blkaddr == NULL_ADDR) {
>  			if (dc->state == D_PREP) {
> -				dc->state = D_SUBMIT;
> -				submit_bio(dc->bio);
> +				__submit_discard_cmd(dc);
>  				atomic_inc(&dcc->submit_discard);
>  			}
>  			continue;
> @@ -716,14 +760,6 @@ void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, block_t blkaddr)
>  	mutex_unlock(&dcc->cmd_lock);
>  }
>  
> -static void f2fs_submit_discard_endio(struct bio *bio)
> -{
> -	struct discard_cmd *dc = (struct discard_cmd *)bio->bi_private;
> -
> -	complete(&dc->wait);
> -	dc->state = D_DONE;
> -}
> -
>  static int issue_discard_thread(void *data)
>  {
>  	struct f2fs_sb_info *sbi = data;
> @@ -742,8 +778,7 @@ static int issue_discard_thread(void *data)
>  	mutex_lock(&dcc->cmd_lock);
>  	list_for_each_entry_safe(dc, tmp, cmd_list, list) {
>  		if (dc->state == D_PREP) {
> -			dc->state = D_SUBMIT;
> -			submit_bio(dc->bio);
> +			__submit_discard_cmd(dc);
>  			atomic_inc(&dcc->submit_discard);
>  			if (iter++ > DISCARD_ISSUE_RATE)
>  				break;
> @@ -763,36 +798,6 @@ static int issue_discard_thread(void *data)
>  	goto repeat;
>  }
>  
> -
> -/* this function is copied from blkdev_issue_discard from block/blk-lib.c */
> -static int __f2fs_issue_discard_async(struct f2fs_sb_info *sbi,
> -		struct block_device *bdev, block_t blkstart, block_t blklen)
> -{
> -	struct bio *bio = NULL;
> -	block_t lblkstart = blkstart;
> -	int err;
> -
> -	trace_f2fs_issue_discard(bdev, blkstart, blklen);
> -
> -	if (sbi->s_ndevs) {
> -		int devi = f2fs_target_device_index(sbi, blkstart);
> -
> -		blkstart -= FDEV(devi).start_blk;
> -	}
> -	err = __blkdev_issue_discard(bdev,
> -				SECTOR_FROM_BLOCK(blkstart),
> -				SECTOR_FROM_BLOCK(blklen),
> -				GFP_NOFS, 0, &bio);
> -	if (!err && bio) {
> -		bio->bi_end_io = f2fs_submit_discard_endio;
> -		bio->bi_opf |= REQ_SYNC;
> -
> -		__add_discard_cmd(sbi, bio, lblkstart, blklen);
> -		wake_up(&SM_I(sbi)->dcc_info->discard_wait_queue);
> -	}
> -	return err;
> -}
> -
>  #ifdef CONFIG_BLK_DEV_ZONED
>  static int __f2fs_issue_discard_zone(struct f2fs_sb_info *sbi,
>  		struct block_device *bdev, block_t blkstart, block_t blklen)
> @@ -816,7 +821,7 @@ static int __f2fs_issue_discard_zone(struct f2fs_sb_info *sbi,
>  	case BLK_ZONE_TYPE_CONVENTIONAL:
>  		if (!blk_queue_discard(bdev_get_queue(bdev)))
>  			return 0;
> -		return __f2fs_issue_discard_async(sbi, bdev, lblkstart, blklen);
> +		return __queue_discard_cmd(sbi, bdev, lblkstart, blklen);
>  	case BLK_ZONE_TYPE_SEQWRITE_REQ:
>  	case BLK_ZONE_TYPE_SEQWRITE_PREF:
>  		sector = SECTOR_FROM_BLOCK(blkstart);
> @@ -848,7 +853,7 @@ static int __issue_discard_async(struct f2fs_sb_info *sbi,
>  				bdev_zoned_model(bdev) != BLK_ZONED_NONE)
>  		return __f2fs_issue_discard_zone(sbi, bdev, blkstart, blklen);
>  #endif
> -	return __f2fs_issue_discard_async(sbi, bdev, blkstart, blklen);
> +	return __queue_discard_cmd(sbi, bdev, blkstart, blklen);
>  }
>  
>  static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
> -- 
> 2.11.0
> 
---end quoted text---

^ permalink raw reply

* Re: [PATCH 5/8] nowait aio: return on congested block device
From: Jan Kara @ 2017-03-08 15:28 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: Sagi Grimberg, jack, hch, linux-fsdevel, linux-block, linux-btrfs,
	linux-ext4, linux-xfs, Goldwyn Rodrigues
In-Reply-To: <44cc02b1-f7ac-cbc6-6f09-726ae8225358@suse.de>

On Wed 08-03-17 09:00:09, Goldwyn Rodrigues wrote:
> 
> 
> On 03/08/2017 01:03 AM, Sagi Grimberg wrote:
> > 
> >> -        if (likely(blk_queue_enter(q, false) == 0)) {
> >> +        if (likely(blk_queue_enter(q, bio_flagged(bio, BIO_NOWAIT))
> >> == 0)) {
> >>              ret = q->make_request_fn(q, bio);
> > 
> > I think that for ->make_request to not block we'd need to set
> > BLK_MQ_REQ_NOWAIT in blk_mq_alloc_data to avoid blocking on a tag
> > allocation.
> > 
> > Something like the untested addition below:
> 
> I did that in the first series, but there are too many reasons to block
> in blk-mq [1]. I dropped blk-mq work in v2.
> 
> [1] https://patchwork.kernel.org/patch/9571051/

Well, that's not really good. If we cannot support this for both blk-mq and
legacy block layer the feature will not be usable. So please work on blk-mq
support as well.

								Honza 

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: BIO request larger than our storage device supports in linux kernel 4.x
From: Jens Axboe @ 2017-03-08 15:04 UTC (permalink / raw)
  To: Umesh Patel, linux-block@vger.kernel.org
In-Reply-To: <CY1PR0401MB1353279ED2A07E69F8ED42B5E12E0@CY1PR0401MB1353.namprd04.prod.outlook.com>

On 03/08/2017 02:34 AM, Umesh Patel wrote:
> Hello,
> 
> We are registering BIO size of our storage device through linux kernel
> API blk_queue_max_hw_sectors worth of 104K. So max size of BIO that
> our block device can handle is 104 KB but kernel block layer is
> sending 256 KB worth of BIO which is more than we are supporting.
> This issue we are observing in 4.4 and also in latest 4.10.1 kernel
> release. Until this we had not seen this issue.
> 
> From the kernel source code history I came to below things.  4.1
> kernel version onwards "nr_pages = min(sdio->pages_in_io,
> bio_get_nr_vecs(map_bh->b_bdev))" has been removed which
> (bio_get_nr_vecs) was considering max_sectors_kb of queue which was
> 104 KB for our device.
> 
> Presently new code is something like "nr_pages =
> min(sdio->pages_in_io, BIO_MAX_PAGES)" which is sending 256 KB
> (BIO_MAX_PAGES) of BIO size to our device which we are not supporting.
> 
> 
> So from some documentation I found out that 256 KB of bio size is the
> fix (from 4.x kernel) and that has to support by any drive.  Please
> let me know is there any way to change BIO size worth of 104 KB from
> 256 KB or any other way to register our BIO size with kernel or any
> other area to look in to ?. 

I'm assuming this is some not-open driver, and I'm also assuming that
you are bypassing the proper API and using make_request_fn. In which
case you are missing a call to blk_queue_split().

-- 
Jens Axboe

^ permalink raw reply

* Re: [PATCH 5/8] nowait aio: return on congested block device
From: Goldwyn Rodrigues @ 2017-03-08 15:00 UTC (permalink / raw)
  To: Sagi Grimberg, jack
  Cc: hch, linux-fsdevel, linux-block, linux-btrfs, linux-ext4,
	linux-xfs, Goldwyn Rodrigues
In-Reply-To: <0d1ba678-69c0-16ce-6c3e-475cd8da203c@grimberg.me>



On 03/08/2017 01:03 AM, Sagi Grimberg wrote:
> 
>> -        if (likely(blk_queue_enter(q, false) == 0)) {
>> +        if (likely(blk_queue_enter(q, bio_flagged(bio, BIO_NOWAIT))
>> == 0)) {
>>              ret = q->make_request_fn(q, bio);
> 
> I think that for ->make_request to not block we'd need to set
> BLK_MQ_REQ_NOWAIT in blk_mq_alloc_data to avoid blocking on a tag
> allocation.
> 
> Something like the untested addition below:

I did that in the first series, but there are too many reasons to block
in blk-mq [1]. I dropped blk-mq work in v2.

[1] https://patchwork.kernel.org/patch/9571051/

-- 
Goldwyn

^ permalink raw reply

* Re: [Nbd] [PATCH 6/6] nbd: add a basic netlink interface
From: Josef Bacik @ 2017-03-08 14:56 UTC (permalink / raw)
  To: Wouter Verhelst; +Cc: axboe, nbd-general, linux-block, kernel-team
In-Reply-To: <20170308100704.xdxuhlpsttigfreq@grep.be>

On Wed, 2017-03-08 at 11:07 +0100, Wouter Verhelst wrote:
> On Tue, Feb 28, 2017 at 11:57:11AM -0500, Josef Bacik wrote:
> > 
> > The existing ioctl interface for configuring NBD devices is a bit
> > cumbersome and hard to extend.  The other problem is we leave a
> > userspace app sitting in it's syscall until the device disconnects,
> > which is less than ideal.
> True.
> 
> On the other hand, it has the advantage that you leave a userspace
> app
> sitting around until the device disconnects, which allows for some
> form
> of recovery in case you're doing root-on-NBD and the server has a
> hiccup. Don't underestimate the advantage of that.
> 
> (of course, that requires that the return value of NBD_DO_IT makes a
> difference between "unexpected connection drop" and "we sent
> NBD_CMD_DISC", but that's a different matter entirely)
> 

Stay tuned for further developments ;).  Yeah the problem is that even
though we can return and allow the user to reconnect, we completely
tear down the device and will return EIO to anything that comes in
while we're reconnecting, which sucks for users.  The patches that I'm
testing now will multi-cast messages over netlink when a link goes down
so a user space application can reconnect and provide a new connection
seamlessly.  The next step after that is to allow a complete failure of
all connections and we will simply sit there and queue IO until
userspace reconnects or the configured timeout elapses at which point
we'll tear down the device.  The end goal of all of this is seamless
reconnects without throwing errors.  Thanks,

Josef

^ permalink raw reply

* Re: BIO request larger than our storage device supports in linux kernel 4.x
From: Christoph Hellwig @ 2017-03-08 14:55 UTC (permalink / raw)
  To: Umesh Patel; +Cc: linux-block@vger.kernel.org
In-Reply-To: <CY1PR0401MB1353279ED2A07E69F8ED42B5E12E0@CY1PR0401MB1353.namprd04.prod.outlook.com>

Can you send the code for the driver, please?

^ permalink raw reply

* [PATCH] axonram: Fix gendisk handling
From: Jan Kara @ 2017-03-08 13:56 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, linux-block, Dan Carpenter, Jan Kara

It is invalid to call del_gendisk() when disk->queue is NULL. Fix error
handling in axon_ram_probe() to avoid doing that.

Also del_gendisk() does not drop a reference to gendisk allocated by
alloc_disk(). That has to be done by put_disk(). Add that call where
needed.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 arch/powerpc/sysdev/axonram.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Warning: The patch is not tested in any way. I just based the fix on Smatch
warning and how things should be...

diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index ada29eaed6e2..f523ac883150 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -274,7 +274,9 @@ static int axon_ram_probe(struct platform_device *device)
 			if (bank->disk->major > 0)
 				unregister_blkdev(bank->disk->major,
 						bank->disk->disk_name);
-			del_gendisk(bank->disk);
+			if (bank->disk->flags & GENHD_FL_UP)
+				del_gendisk(bank->disk);
+			put_disk(bank->disk);
 		}
 		device->dev.platform_data = NULL;
 		if (bank->io_addr != 0)
@@ -299,6 +301,7 @@ axon_ram_remove(struct platform_device *device)
 	device_remove_file(&device->dev, &dev_attr_ecc);
 	free_irq(bank->irq_id, device);
 	del_gendisk(bank->disk);
+	put_disk(bank->disk);
 	iounmap((void __iomem *) bank->io_addr);
 	kfree(bank);
 
-- 
2.10.2

^ permalink raw reply related

* Re: [PATCH 02/11] block: Fix race of bdev open with gendisk shutdown
From: Jan Kara @ 2017-03-08 14:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jan Kara, Jens Axboe, linux-block, Christoph Hellwig,
	Dan Williams, Thiago Jung Bauermann, Lekshmi Pillai, NeilBrown,
	Omar Sandoval
In-Reply-To: <20170307194349.GF31179@htj.duckdns.org>

On Tue 07-03-17 14:43:49, Tejun Heo wrote:
> Hello,
> 
> On Tue, Mar 07, 2017 at 12:14:20PM +0100, Jan Kara wrote:
> > > Given that this isn't a super hot path, I think it'd be far better to
> > > stick to a simpler synchronization mechanism.
> > 
> > I'd be happy to do so however struct gendisk does not have any lock in it
> > and introducing a special lock just for that seems awkward.
> 
> I think even a awkward shared lock is better than memory barriers.
> Barriers are the trickiest locking construct to get right and we
> really shouldn't using that unless absolutely necessary.

OK, I'll try to come up with something.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox