Linux block layer

Linux block layer
 help / color / mirror / Atom feed

* [PATCH 2/5] nvme: cleanup nvme_req_needs_retry
From: Christoph Hellwig @ 2017-04-05 14:18 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block, linux-scsi
In-Reply-To: <20170405141856.1862-1-hch@lst.de>

Don't pass the status explicitly but derive it from the requeust,
and unwind the complex condition to be more readable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0437f44d00f9..b225aacf4b89 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -67,11 +67,17 @@ static DEFINE_SPINLOCK(dev_list_lock);
 
 static struct class *nvme_class;
 
-static inline bool nvme_req_needs_retry(struct request *req, u16 status)
+static inline bool nvme_req_needs_retry(struct request *req)
 {
-	return !(status & NVME_SC_DNR || blk_noretry_request(req)) &&
-		(jiffies - req->start_time) < req->timeout &&
-		req->retries < nvme_max_retries;
+	if (blk_noretry_request(req))
+		return false;
+	if (req->errors & NVME_SC_DNR)
+		return false;
+	if (jiffies - req->start_time >= req->timeout)
+		return false;
+	if (req->retries >= nvme_max_retries)
+		return false;
+	return true;
 }
 
 void nvme_complete_rq(struct request *req)
@@ -79,7 +85,7 @@ void nvme_complete_rq(struct request *req)
 	int error = 0;
 
 	if (unlikely(req->errors)) {
-		if (nvme_req_needs_retry(req, req->errors)) {
+		if (nvme_req_needs_retry(req)) {
 			req->retries++;
 			blk_mq_requeue_request(req,
 					!blk_mq_queue_stopped(req->q));
-- 
2.11.0

^ permalink raw reply related

* [PATCH 1/5] nvme: move ->retries setup to nvme_setup_cmd
From: Christoph Hellwig @ 2017-04-05 14:18 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block, linux-scsi
In-Reply-To: <20170405141856.1862-1-hch@lst.de>

This way we get the behavior right for the non-PCIe transports.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 5 +++++
 drivers/nvme/host/pci.c  | 4 ----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3c908e1bc903..0437f44d00f9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -350,6 +350,11 @@ int nvme_setup_cmd(struct nvme_ns *ns, struct request *req,
 {
 	int ret = BLK_MQ_RQ_QUEUE_OK;
 
+	if (!(req->rq_flags & RQF_DONTPREP)) {
+		req->retries = 0;
+		req->rq_flags |= RQF_DONTPREP;
+	}
+
 	switch (req_op(req)) {
 	case REQ_OP_DRV_IN:
 	case REQ_OP_DRV_OUT:
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9e686a67d93b..3818ab15a26e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -326,10 +326,6 @@ static int nvme_init_iod(struct request *rq, struct nvme_dev *dev)
 	iod->nents = 0;
 	iod->length = size;
 
-	if (!(rq->rq_flags & RQF_DONTPREP)) {
-		rq->retries = 0;
-		rq->rq_flags |= RQF_DONTPREP;
-	}
 	return BLK_MQ_RQ_QUEUE_OK;
 }
 
-- 
2.11.0

^ permalink raw reply related

* ->retries fixups
From: Christoph Hellwig @ 2017-04-05 14:18 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: linux-nvme, linux-block, linux-scsi

This series fixes a few lose bits in terms of how nvme uses ->retries,
including fixing it for non-PCIe transports.  While at it I noticed that
nvme and scsi use the field in entirely different ways, and no other
driver uses it at all.  So I decided to move it into the nvme_request and
scsi_request structures instead.

^ permalink raw reply

* Re: [PATCH rfc 5/6] block: Add rdma affinity based queue mapping helper
From: Jens Axboe @ 2017-04-05 14:17 UTC (permalink / raw)
  To: Sagi Grimberg, linux-rdma, linux-nvme, linux-block
  Cc: netdev, Saeed Mahameed, Or Gerlitz, Christoph Hellwig
In-Reply-To: <1491140492-25703-6-git-send-email-sagi@grimberg.me>

On 04/02/2017 07:41 AM, Sagi Grimberg wrote:
> Like pci and virtio, we add a rdma helper for affinity
> spreading. This achieves optimal mq affinity assignments
> according to the underlying rdma device affinity maps.

Reviewed-by: Jens Axboe <axboe@fb.com>

-- 
Jens Axboe

^ permalink raw reply

* Re: [PATCH] cfq: Disable writeback throttling by default
From: Jens Axboe @ 2017-04-05 14:16 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-block
In-Reply-To: <20170404123130.23151-1-jack@suse.cz>

On 04/04/2017 06:31 AM, Jan Kara wrote:
> Writeback throttling does not play well with CFQ since that also tries
> to throttle async writes. As a result async writeback can get starved in
> presence of readers. As an example take a benchmark simulating
> postgreSQL database running over a standard rotating SATA drive. There
> are 16 processes doing random reads from a huge file (2*machine memory),
> 1 process doing random writes to the huge file and calling fsync once
> per 50000 writes and 1 process doing sequential 8k writes to a
> relatively small file wrapping around at the end of the file and calling
> fsync every 5 writes. Under this load read latency easily exceeds the
> target latency of 75 ms (just because there are so many reads happening
> against a relatively slow disk) and thus writeback is throttled to a
> point where only 1 write request is allowed at a time.

Thanks Jan, it's probably really hard to ever combine anything with
CFQ, since it does its own thing. I have applied the patch.

-- 
Jens Axboe

^ permalink raw reply

* Re: [PATCH] blk-mq: Remove blk_mq_queue_data.list
From: Jens Axboe @ 2017-04-05 14:07 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-block@vger.kernel.org
In-Reply-To: <CY1PR0401MB15363601FA034CE8627DA41881080@CY1PR0401MB1536.namprd04.prod.outlook.com>

On Mon, Apr 03 2017, Bart Van Assche wrote:
> The block layer core sets blk_mq_queue_data.list but no block
> drivers read that member. Hence remove it and also the code that
> is used to set this member.

Looks fine to me, might as well kill it. Your patch came through mangled
here though, on both my private and corporate email account. Did
something change with the setup at your end? Would be great if you could
resend it.

-- 
Jens Axboe

^ permalink raw reply

* [PATCH] remove the obsolete hd driver
From: Christoph Hellwig @ 2017-04-05 13:59 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel

This driver is for pre-IDE hardisk that are only found in PC from the
stoneage of personal computing, and which we don't support elsewhere
in the kernel these days.

It's also been marked broken forever.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/Kconfig  |  12 -
 drivers/block/Makefile |   1 -
 drivers/block/hd.c     | 819 -------------------------------------------------
 3 files changed, 832 deletions(-)
 delete mode 100644 drivers/block/hd.c

diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index f744de7a0f9b..a1c2e816128f 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -514,18 +514,6 @@ config VIRTIO_BLK_SCSI
 	  virtio protocol and not enabled by default by any hypervisor.
 	  Your probably want to virtio-scsi instead.
 
-config BLK_DEV_HD
-	bool "Very old hard disk (MFM/RLL/IDE) driver"
-	depends on HAVE_IDE
-	depends on !ARM || ARCH_RPC || BROKEN
-	help
-	  This is a very old hard disk driver that lacks the enhanced
-	  functionality of the newer ones.
-
-	  It is required for systems with ancient MFM/RLL/ESDI drives.
-
-	  If unsure, say N.
-
 config BLK_DEV_RBD
 	tristate "Rados block device (RBD)"
 	depends on INET && BLOCK
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 1e9661e26f29..b12c772bbeb3 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -30,7 +30,6 @@ obj-$(CONFIG_BLK_DEV_CRYPTOLOOP) += cryptoloop.o
 obj-$(CONFIG_VIRTIO_BLK)	+= virtio_blk.o
 
 obj-$(CONFIG_BLK_DEV_SX8)	+= sx8.o
-obj-$(CONFIG_BLK_DEV_HD)	+= hd.o
 
 obj-$(CONFIG_XEN_BLKDEV_FRONTEND)	+= xen-blkfront.o
 obj-$(CONFIG_XEN_BLKDEV_BACKEND)	+= xen-blkback/
diff --git a/drivers/block/hd.c b/drivers/block/hd.c
deleted file mode 100644
index 79d63b20a297..000000000000
--- a/drivers/block/hd.c
+++ /dev/null
@@ -1,819 +0,0 @@
-/*
- *  Copyright (C) 1991, 1992  Linus Torvalds
- *
- * This is the low-level hd interrupt support. It traverses the
- * request-list, using interrupts to jump between functions. As
- * all the functions are called within interrupts, we may not
- * sleep. Special care is recommended.
- *
- *  modified by Drew Eckhardt to check nr of hd's from the CMOS.
- *
- *  Thanks to Branko Lankester, lankeste@fwi.uva.nl, who found a bug
- *  in the early extended-partition checks and added DM partitions
- *
- *  IRQ-unmask, drive-id, multiple-mode, support for ">16 heads",
- *  and general streamlining by Mark Lord.
- *
- *  Removed 99% of above. Use Mark's ide driver for those options.
- *  This is now a lightweight ST-506 driver. (Paul Gortmaker)
- *
- *  Modified 1995 Russell King for ARM processor.
- *
- *  Bugfix: max_sectors must be <= 255 or the wheels tend to come
- *  off in a hurry once you queue things up - Paul G. 02/2001
- */
-
-/* Uncomment the following if you want verbose error reports. */
-/* #define VERBOSE_ERRORS */
-
-#include <linux/blkdev.h>
-#include <linux/errno.h>
-#include <linux/signal.h>
-#include <linux/interrupt.h>
-#include <linux/timer.h>
-#include <linux/fs.h>
-#include <linux/kernel.h>
-#include <linux/genhd.h>
-#include <linux/string.h>
-#include <linux/ioport.h>
-#include <linux/init.h>
-#include <linux/blkpg.h>
-#include <linux/ata.h>
-#include <linux/hdreg.h>
-
-#define HD_IRQ 14
-
-#define REALLY_SLOW_IO
-#include <asm/io.h>
-#include <linux/uaccess.h>
-
-#ifdef __arm__
-#undef  HD_IRQ
-#endif
-#include <asm/irq.h>
-#ifdef __arm__
-#define HD_IRQ IRQ_HARDDISK
-#endif
-
-/* Hd controller regster ports */
-
-#define HD_DATA		0x1f0		/* _CTL when writing */
-#define HD_ERROR	0x1f1		/* see err-bits */
-#define HD_NSECTOR	0x1f2		/* nr of sectors to read/write */
-#define HD_SECTOR	0x1f3		/* starting sector */
-#define HD_LCYL		0x1f4		/* starting cylinder */
-#define HD_HCYL		0x1f5		/* high byte of starting cyl */
-#define HD_CURRENT	0x1f6		/* 101dhhhh , d=drive, hhhh=head */
-#define HD_STATUS	0x1f7		/* see status-bits */
-#define HD_FEATURE	HD_ERROR	/* same io address, read=error, write=feature */
-#define HD_PRECOMP	HD_FEATURE	/* obsolete use of this port - predates IDE */
-#define HD_COMMAND	HD_STATUS	/* same io address, read=status, write=cmd */
-
-#define HD_CMD		0x3f6		/* used for resets */
-#define HD_ALTSTATUS	0x3f6		/* same as HD_STATUS but doesn't clear irq */
-
-/* Bits of HD_STATUS */
-#define ERR_STAT		0x01
-#define INDEX_STAT		0x02
-#define ECC_STAT		0x04	/* Corrected error */
-#define DRQ_STAT		0x08
-#define SEEK_STAT		0x10
-#define SERVICE_STAT		SEEK_STAT
-#define WRERR_STAT		0x20
-#define READY_STAT		0x40
-#define BUSY_STAT		0x80
-
-/* Bits for HD_ERROR */
-#define MARK_ERR		0x01	/* Bad address mark */
-#define TRK0_ERR		0x02	/* couldn't find track 0 */
-#define ABRT_ERR		0x04	/* Command aborted */
-#define MCR_ERR			0x08	/* media change request */
-#define ID_ERR			0x10	/* ID field not found */
-#define MC_ERR			0x20	/* media changed */
-#define ECC_ERR			0x40	/* Uncorrectable ECC error */
-#define BBD_ERR			0x80	/* pre-EIDE meaning:  block marked bad */
-#define ICRC_ERR		0x80	/* new meaning:  CRC error during transfer */
-
-static DEFINE_SPINLOCK(hd_lock);
-static unsigned int hd_queue;
-static struct request *hd_req;
-
-#define TIMEOUT_VALUE	(6*HZ)
-#define	HD_DELAY	0
-
-#define MAX_ERRORS     16	/* Max read/write errors/sector */
-#define RESET_FREQ      8	/* Reset controller every 8th retry */
-#define RECAL_FREQ      4	/* Recalibrate every 4th retry */
-#define MAX_HD		2
-
-#define STAT_OK		(READY_STAT|SEEK_STAT)
-#define OK_STATUS(s)	(((s)&(STAT_OK|(BUSY_STAT|WRERR_STAT|ERR_STAT)))==STAT_OK)
-
-static void recal_intr(void);
-static void bad_rw_intr(void);
-
-static int reset;
-static int hd_error;
-
-/*
- *  This struct defines the HD's and their types.
- */
-struct hd_i_struct {
-	unsigned int head, sect, cyl, wpcom, lzone, ctl;
-	int unit;
-	int recalibrate;
-	int special_op;
-};
-
-#ifdef HD_TYPE
-static struct hd_i_struct hd_info[] = { HD_TYPE };
-static int NR_HD = ARRAY_SIZE(hd_info);
-#else
-static struct hd_i_struct hd_info[MAX_HD];
-static int NR_HD;
-#endif
-
-static struct gendisk *hd_gendisk[MAX_HD];
-
-static struct timer_list device_timer;
-
-#define TIMEOUT_VALUE (6*HZ)
-
-#define SET_TIMER							\
-	do {								\
-		mod_timer(&device_timer, jiffies + TIMEOUT_VALUE);	\
-	} while (0)
-
-static void (*do_hd)(void) = NULL;
-#define SET_HANDLER(x) \
-if ((do_hd = (x)) != NULL) \
-	SET_TIMER; \
-else \
-	del_timer(&device_timer);
-
-
-#if (HD_DELAY > 0)
-
-#include <linux/i8253.h>
-
-unsigned long last_req;
-
-unsigned long read_timer(void)
-{
-	unsigned long t, flags;
-	int i;
-
-	raw_spin_lock_irqsave(&i8253_lock, flags);
-	t = jiffies * 11932;
-	outb_p(0, 0x43);
-	i = inb_p(0x40);
-	i |= inb(0x40) << 8;
-	raw_spin_unlock_irqrestore(&i8253_lock, flags);
-	return(t - i);
-}
-#endif
-
-static void __init hd_setup(char *str, int *ints)
-{
-	int hdind = 0;
-
-	if (ints[0] != 3)
-		return;
-	if (hd_info[0].head != 0)
-		hdind = 1;
-	hd_info[hdind].head = ints[2];
-	hd_info[hdind].sect = ints[3];
-	hd_info[hdind].cyl = ints[1];
-	hd_info[hdind].wpcom = 0;
-	hd_info[hdind].lzone = ints[1];
-	hd_info[hdind].ctl = (ints[2] > 8 ? 8 : 0);
-	NR_HD = hdind+1;
-}
-
-static bool hd_end_request(int err, unsigned int bytes)
-{
-	if (__blk_end_request(hd_req, err, bytes))
-		return true;
-	hd_req = NULL;
-	return false;
-}
-
-static bool hd_end_request_cur(int err)
-{
-	return hd_end_request(err, blk_rq_cur_bytes(hd_req));
-}
-
-static void dump_status(const char *msg, unsigned int stat)
-{
-	char *name = "hd?";
-	if (hd_req)
-		name = hd_req->rq_disk->disk_name;
-
-#ifdef VERBOSE_ERRORS
-	printk("%s: %s: status=0x%02x { ", name, msg, stat & 0xff);
-	if (stat & BUSY_STAT)	printk("Busy ");
-	if (stat & READY_STAT)	printk("DriveReady ");
-	if (stat & WRERR_STAT)	printk("WriteFault ");
-	if (stat & SEEK_STAT)	printk("SeekComplete ");
-	if (stat & DRQ_STAT)	printk("DataRequest ");
-	if (stat & ECC_STAT)	printk("CorrectedError ");
-	if (stat & INDEX_STAT)	printk("Index ");
-	if (stat & ERR_STAT)	printk("Error ");
-	printk("}\n");
-	if ((stat & ERR_STAT) == 0) {
-		hd_error = 0;
-	} else {
-		hd_error = inb(HD_ERROR);
-		printk("%s: %s: error=0x%02x { ", name, msg, hd_error & 0xff);
-		if (hd_error & BBD_ERR)		printk("BadSector ");
-		if (hd_error & ECC_ERR)		printk("UncorrectableError ");
-		if (hd_error & ID_ERR)		printk("SectorIdNotFound ");
-		if (hd_error & ABRT_ERR)	printk("DriveStatusError ");
-		if (hd_error & TRK0_ERR)	printk("TrackZeroNotFound ");
-		if (hd_error & MARK_ERR)	printk("AddrMarkNotFound ");
-		printk("}");
-		if (hd_error & (BBD_ERR|ECC_ERR|ID_ERR|MARK_ERR)) {
-			printk(", CHS=%d/%d/%d", (inb(HD_HCYL)<<8) + inb(HD_LCYL),
-				inb(HD_CURRENT) & 0xf, inb(HD_SECTOR));
-			if (hd_req)
-				printk(", sector=%ld", blk_rq_pos(hd_req));
-		}
-		printk("\n");
-	}
-#else
-	printk("%s: %s: status=0x%02x.\n", name, msg, stat & 0xff);
-	if ((stat & ERR_STAT) == 0) {
-		hd_error = 0;
-	} else {
-		hd_error = inb(HD_ERROR);
-		printk("%s: %s: error=0x%02x.\n", name, msg, hd_error & 0xff);
-	}
-#endif
-}
-
-static void check_status(void)
-{
-	int i = inb_p(HD_STATUS);
-
-	if (!OK_STATUS(i)) {
-		dump_status("check_status", i);
-		bad_rw_intr();
-	}
-}
-
-static int controller_busy(void)
-{
-	int retries = 100000;
-	unsigned char status;
-
-	do {
-		status = inb_p(HD_STATUS);
-	} while ((status & BUSY_STAT) && --retries);
-	return status;
-}
-
-static int status_ok(void)
-{
-	unsigned char status = inb_p(HD_STATUS);
-
-	if (status & BUSY_STAT)
-		return 1;	/* Ancient, but does it make sense??? */
-	if (status & WRERR_STAT)
-		return 0;
-	if (!(status & READY_STAT))
-		return 0;
-	if (!(status & SEEK_STAT))
-		return 0;
-	return 1;
-}
-
-static int controller_ready(unsigned int drive, unsigned int head)
-{
-	int retry = 100;
-
-	do {
-		if (controller_busy() & BUSY_STAT)
-			return 0;
-		outb_p(0xA0 | (drive<<4) | head, HD_CURRENT);
-		if (status_ok())
-			return 1;
-	} while (--retry);
-	return 0;
-}
-
-static void hd_out(struct hd_i_struct *disk,
-		   unsigned int nsect,
-		   unsigned int sect,
-		   unsigned int head,
-		   unsigned int cyl,
-		   unsigned int cmd,
-		   void (*intr_addr)(void))
-{
-	unsigned short port;
-
-#if (HD_DELAY > 0)
-	while (read_timer() - last_req < HD_DELAY)
-		/* nothing */;
-#endif
-	if (reset)
-		return;
-	if (!controller_ready(disk->unit, head)) {
-		reset = 1;
-		return;
-	}
-	SET_HANDLER(intr_addr);
-	outb_p(disk->ctl, HD_CMD);
-	port = HD_DATA;
-	outb_p(disk->wpcom >> 2, ++port);
-	outb_p(nsect, ++port);
-	outb_p(sect, ++port);
-	outb_p(cyl, ++port);
-	outb_p(cyl >> 8, ++port);
-	outb_p(0xA0 | (disk->unit << 4) | head, ++port);
-	outb_p(cmd, ++port);
-}
-
-static void hd_request (void);
-
-static int drive_busy(void)
-{
-	unsigned int i;
-	unsigned char c;
-
-	for (i = 0; i < 500000 ; i++) {
-		c = inb_p(HD_STATUS);
-		if ((c & (BUSY_STAT | READY_STAT | SEEK_STAT)) == STAT_OK)
-			return 0;
-	}
-	dump_status("reset timed out", c);
-	return 1;
-}
-
-static void reset_controller(void)
-{
-	int	i;
-
-	outb_p(4, HD_CMD);
-	for (i = 0; i < 1000; i++) barrier();
-	outb_p(hd_info[0].ctl & 0x0f, HD_CMD);
-	for (i = 0; i < 1000; i++) barrier();
-	if (drive_busy())
-		printk("hd: controller still busy\n");
-	else if ((hd_error = inb(HD_ERROR)) != 1)
-		printk("hd: controller reset failed: %02x\n", hd_error);
-}
-
-static void reset_hd(void)
-{
-	static int i;
-
-repeat:
-	if (reset) {
-		reset = 0;
-		i = -1;
-		reset_controller();
-	} else {
-		check_status();
-		if (reset)
-			goto repeat;
-	}
-	if (++i < NR_HD) {
-		struct hd_i_struct *disk = &hd_info[i];
-		disk->special_op = disk->recalibrate = 1;
-		hd_out(disk, disk->sect, disk->sect, disk->head-1,
-			disk->cyl, ATA_CMD_INIT_DEV_PARAMS, &reset_hd);
-		if (reset)
-			goto repeat;
-	} else
-		hd_request();
-}
-
-/*
- * Ok, don't know what to do with the unexpected interrupts: on some machines
- * doing a reset and a retry seems to result in an eternal loop. Right now I
- * ignore it, and just set the timeout.
- *
- * On laptops (and "green" PCs), an unexpected interrupt occurs whenever the
- * drive enters "idle", "standby", or "sleep" mode, so if the status looks
- * "good", we just ignore the interrupt completely.
- */
-static void unexpected_hd_interrupt(void)
-{
-	unsigned int stat = inb_p(HD_STATUS);
-
-	if (stat & (BUSY_STAT|DRQ_STAT|ECC_STAT|ERR_STAT)) {
-		dump_status("unexpected interrupt", stat);
-		SET_TIMER;
-	}
-}
-
-/*
- * bad_rw_intr() now tries to be a bit smarter and does things
- * according to the error returned by the controller.
- * -Mika Liljeberg (liljeber@cs.Helsinki.FI)
- */
-static void bad_rw_intr(void)
-{
-	struct request *req = hd_req;
-
-	if (req != NULL) {
-		struct hd_i_struct *disk = req->rq_disk->private_data;
-		if (++req->errors >= MAX_ERRORS || (hd_error & BBD_ERR)) {
-			hd_end_request_cur(-EIO);
-			disk->special_op = disk->recalibrate = 1;
-		} else if (req->errors % RESET_FREQ == 0)
-			reset = 1;
-		else if ((hd_error & TRK0_ERR) || req->errors % RECAL_FREQ == 0)
-			disk->special_op = disk->recalibrate = 1;
-		/* Otherwise just retry */
-	}
-}
-
-static inline int wait_DRQ(void)
-{
-	int retries;
-	int stat;
-
-	for (retries = 0; retries < 100000; retries++) {
-		stat = inb_p(HD_STATUS);
-		if (stat & DRQ_STAT)
-			return 0;
-	}
-	dump_status("wait_DRQ", stat);
-	return -1;
-}
-
-static void read_intr(void)
-{
-	struct request *req;
-	int i, retries = 100000;
-
-	do {
-		i = (unsigned) inb_p(HD_STATUS);
-		if (i & BUSY_STAT)
-			continue;
-		if (!OK_STATUS(i))
-			break;
-		if (i & DRQ_STAT)
-			goto ok_to_read;
-	} while (--retries > 0);
-	dump_status("read_intr", i);
-	bad_rw_intr();
-	hd_request();
-	return;
-
-ok_to_read:
-	req = hd_req;
-	insw(HD_DATA, bio_data(req->bio), 256);
-#ifdef DEBUG
-	printk("%s: read: sector %ld, remaining = %u, buffer=%p\n",
-	       req->rq_disk->disk_name, blk_rq_pos(req) + 1,
-	       blk_rq_sectors(req) - 1, bio_data(req->bio)+512);
-#endif
-	if (hd_end_request(0, 512)) {
-		SET_HANDLER(&read_intr);
-		return;
-	}
-
-	(void) inb_p(HD_STATUS);
-#if (HD_DELAY > 0)
-	last_req = read_timer();
-#endif
-	hd_request();
-}
-
-static void write_intr(void)
-{
-	struct request *req = hd_req;
-	int i;
-	int retries = 100000;
-
-	do {
-		i = (unsigned) inb_p(HD_STATUS);
-		if (i & BUSY_STAT)
-			continue;
-		if (!OK_STATUS(i))
-			break;
-		if ((blk_rq_sectors(req) <= 1) || (i & DRQ_STAT))
-			goto ok_to_write;
-	} while (--retries > 0);
-	dump_status("write_intr", i);
-	bad_rw_intr();
-	hd_request();
-	return;
-
-ok_to_write:
-	if (hd_end_request(0, 512)) {
-		SET_HANDLER(&write_intr);
-		outsw(HD_DATA, bio_data(req->bio), 256);
-		return;
-	}
-
-#if (HD_DELAY > 0)
-	last_req = read_timer();
-#endif
-	hd_request();
-}
-
-static void recal_intr(void)
-{
-	check_status();
-#if (HD_DELAY > 0)
-	last_req = read_timer();
-#endif
-	hd_request();
-}
-
-/*
- * This is another of the error-routines I don't know what to do with. The
- * best idea seems to just set reset, and start all over again.
- */
-static void hd_times_out(unsigned long dummy)
-{
-	char *name;
-
-	do_hd = NULL;
-
-	if (!hd_req)
-		return;
-
-	spin_lock_irq(&hd_lock);
-	reset = 1;
-	name = hd_req->rq_disk->disk_name;
-	printk("%s: timeout\n", name);
-	if (++hd_req->errors >= MAX_ERRORS) {
-#ifdef DEBUG
-		printk("%s: too many errors\n", name);
-#endif
-		hd_end_request_cur(-EIO);
-	}
-	hd_request();
-	spin_unlock_irq(&hd_lock);
-}
-
-static int do_special_op(struct hd_i_struct *disk, struct request *req)
-{
-	if (disk->recalibrate) {
-		disk->recalibrate = 0;
-		hd_out(disk, disk->sect, 0, 0, 0, ATA_CMD_RESTORE, &recal_intr);
-		return reset;
-	}
-	if (disk->head > 16) {
-		printk("%s: cannot handle device with more than 16 heads - giving up\n", req->rq_disk->disk_name);
-		hd_end_request_cur(-EIO);
-	}
-	disk->special_op = 0;
-	return 1;
-}
-
-static int set_next_request(void)
-{
-	struct request_queue *q;
-	int old_pos = hd_queue;
-
-	do {
-		q = hd_gendisk[hd_queue]->queue;
-		if (++hd_queue == NR_HD)
-			hd_queue = 0;
-		if (q) {
-			hd_req = blk_fetch_request(q);
-			if (hd_req)
-				break;
-		}
-	} while (hd_queue != old_pos);
-
-	return hd_req != NULL;
-}
-
-/*
- * The driver enables interrupts as much as possible.  In order to do this,
- * (a) the device-interrupt is disabled before entering hd_request(),
- * and (b) the timeout-interrupt is disabled before the sti().
- *
- * Interrupts are still masked (by default) whenever we are exchanging
- * data/cmds with a drive, because some drives seem to have very poor
- * tolerance for latency during I/O. The IDE driver has support to unmask
- * interrupts for non-broken hardware, so use that driver if required.
- */
-static void hd_request(void)
-{
-	unsigned int block, nsect, sec, track, head, cyl;
-	struct hd_i_struct *disk;
-	struct request *req;
-
-	if (do_hd)
-		return;
-repeat:
-	del_timer(&device_timer);
-
-	if (!hd_req && !set_next_request()) {
-		do_hd = NULL;
-		return;
-	}
-	req = hd_req;
-
-	if (reset) {
-		reset_hd();
-		return;
-	}
-	disk = req->rq_disk->private_data;
-	block = blk_rq_pos(req);
-	nsect = blk_rq_sectors(req);
-	if (block >= get_capacity(req->rq_disk) ||
-	    ((block+nsect) > get_capacity(req->rq_disk))) {
-		printk("%s: bad access: block=%d, count=%d\n",
-			req->rq_disk->disk_name, block, nsect);
-		hd_end_request_cur(-EIO);
-		goto repeat;
-	}
-
-	if (disk->special_op) {
-		if (do_special_op(disk, req))
-			goto repeat;
-		return;
-	}
-	sec   = block % disk->sect + 1;
-	track = block / disk->sect;
-	head  = track % disk->head;
-	cyl   = track / disk->head;
-#ifdef DEBUG
-	printk("%s: %sing: CHS=%d/%d/%d, sectors=%d, buffer=%p\n",
-		req->rq_disk->disk_name,
-		req_data_dir(req) == READ ? "read" : "writ",
-		cyl, head, sec, nsect, bio_data(req->bio));
-#endif
-
-	switch (req_op(req)) {
-	case REQ_OP_READ:
-		hd_out(disk, nsect, sec, head, cyl, ATA_CMD_PIO_READ,
-			&read_intr);
-		if (reset)
-			goto repeat;
-		break;
-	case REQ_OP_WRITE:
-		hd_out(disk, nsect, sec, head, cyl, ATA_CMD_PIO_WRITE,
-			&write_intr);
-		if (reset)
-			goto repeat;
-		if (wait_DRQ()) {
-			bad_rw_intr();
-			goto repeat;
-		}
-		outsw(HD_DATA, bio_data(req->bio), 256);
-		break;
-	default:
-		printk("unknown hd-command\n");
-		hd_end_request_cur(-EIO);
-		break;
-	}
-}
-
-static void do_hd_request(struct request_queue *q)
-{
-	hd_request();
-}
-
-static int hd_getgeo(struct block_device *bdev, struct hd_geometry *geo)
-{
-	struct hd_i_struct *disk = bdev->bd_disk->private_data;
-
-	geo->heads = disk->head;
-	geo->sectors = disk->sect;
-	geo->cylinders = disk->cyl;
-	return 0;
-}
-
-/*
- * Releasing a block device means we sync() it, so that it can safely
- * be forgotten about...
- */
-
-static irqreturn_t hd_interrupt(int irq, void *dev_id)
-{
-	void (*handler)(void) = do_hd;
-
-	spin_lock(&hd_lock);
-
-	do_hd = NULL;
-	del_timer(&device_timer);
-	if (!handler)
-		handler = unexpected_hd_interrupt;
-	handler();
-
-	spin_unlock(&hd_lock);
-
-	return IRQ_HANDLED;
-}
-
-static const struct block_device_operations hd_fops = {
-	.getgeo =	hd_getgeo,
-};
-
-static int __init hd_init(void)
-{
-	int drive;
-
-	if (register_blkdev(HD_MAJOR, "hd"))
-		return -1;
-
-	init_timer(&device_timer);
-	device_timer.function = hd_times_out;
-
-	if (!NR_HD) {
-		/*
-		 * We don't know anything about the drive.  This means
-		 * that you *MUST* specify the drive parameters to the
-		 * kernel yourself.
-		 *
-		 * If we were on an i386, we used to read this info from
-		 * the BIOS or CMOS.  This doesn't work all that well,
-		 * since this assumes that this is a primary or secondary
-		 * drive, and if we're using this legacy driver, it's
-		 * probably an auxiliary controller added to recover
-		 * legacy data off an ST-506 drive.  Either way, it's
-		 * definitely safest to have the user explicitly specify
-		 * the information.
-		 */
-		printk("hd: no drives specified - use hd=cyl,head,sectors"
-			" on kernel command line\n");
-		goto out;
-	}
-
-	for (drive = 0 ; drive < NR_HD ; drive++) {
-		struct gendisk *disk = alloc_disk(64);
-		struct hd_i_struct *p = &hd_info[drive];
-		if (!disk)
-			goto Enomem;
-		disk->major = HD_MAJOR;
-		disk->first_minor = drive << 6;
-		disk->fops = &hd_fops;
-		sprintf(disk->disk_name, "hd%c", 'a'+drive);
-		disk->private_data = p;
-		set_capacity(disk, p->head * p->sect * p->cyl);
-		disk->queue = blk_init_queue(do_hd_request, &hd_lock);
-		if (!disk->queue)
-			goto Enomem;
-		blk_queue_max_hw_sectors(disk->queue, 255);
-		blk_queue_logical_block_size(disk->queue, 512);
-		p->unit = drive;
-		hd_gendisk[drive] = disk;
-		printk("%s: %luMB, CHS=%d/%d/%d\n",
-			disk->disk_name, (unsigned long)get_capacity(disk)/2048,
-			p->cyl, p->head, p->sect);
-	}
-
-	if (request_irq(HD_IRQ, hd_interrupt, 0, "hd", NULL)) {
-		printk("hd: unable to get IRQ%d for the hard disk driver\n",
-			HD_IRQ);
-		goto out1;
-	}
-	if (!request_region(HD_DATA, 8, "hd")) {
-		printk(KERN_WARNING "hd: port 0x%x busy\n", HD_DATA);
-		goto out2;
-	}
-	if (!request_region(HD_CMD, 1, "hd(cmd)")) {
-		printk(KERN_WARNING "hd: port 0x%x busy\n", HD_CMD);
-		goto out3;
-	}
-
-	/* Let them fly */
-	for (drive = 0; drive < NR_HD; drive++)
-		add_disk(hd_gendisk[drive]);
-
-	return 0;
-
-out3:
-	release_region(HD_DATA, 8);
-out2:
-	free_irq(HD_IRQ, NULL);
-out1:
-	for (drive = 0; drive < NR_HD; drive++)
-		put_disk(hd_gendisk[drive]);
-	NR_HD = 0;
-out:
-	del_timer(&device_timer);
-	unregister_blkdev(HD_MAJOR, "hd");
-	return -1;
-Enomem:
-	for (drive = 0; drive < NR_HD; drive++) {
-		if (hd_gendisk[drive]) {
-			if (hd_gendisk[drive]->queue)
-				blk_cleanup_queue(hd_gendisk[drive]->queue);
-			put_disk(hd_gendisk[drive]);
-		}
-	}
-	goto out;
-}
-
-static int __init parse_hd_setup(char *line)
-{
-	int ints[6];
-
-	(void) get_options(line, ARRAY_SIZE(ints), ints);
-	hd_setup(NULL, ints);
-
-	return 1;
-}
-__setup("hd=", parse_hd_setup);
-
-late_initcall(hd_init);
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH 4/4] mtd: nand: nandsim: convert to memalloc_noreclaim_*()
From: Michal Hocko @ 2017-04-05 12:09 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Richard Weinberger, Andrew Morton, linux-mm, linux-kernel,
	Mel Gorman, Johannes Weiner, linux-block, nbd-general, open-iscsi,
	linux-scsi, netdev, Boris Brezillon, Adrian Hunter
In-Reply-To: <9b9d5bca-e125-e07b-b700-196cc800bbd7@suse.cz>

On Wed 05-04-17 13:39:16, Vlastimil Babka wrote:
> On 04/05/2017 01:36 PM, Richard Weinberger wrote:
> > Michal,
> > 
> > Am 05.04.2017 um 13:31 schrieb Michal Hocko:
> >> On Wed 05-04-17 09:47:00, Vlastimil Babka wrote:
> >>> Nandsim has own functions set_memalloc() and clear_memalloc() for robust
> >>> setting and clearing of PF_MEMALLOC. Replace them by the new generic helpers.
> >>> No functional change.
> >>
> >> This one smells like an abuser. Why the hell should read/write path
> >> touch memory reserves at all!
> > 
> > Could be. Let's ask Adrian, AFAIK he wrote that code.
> > Adrian, can you please clarify why nandsim needs to play with PF_MEMALLOC?
> 
> I was thinking about it and concluded that since the simulator can be
> used as a block device where reclaimed pages go to, writing the data out
> is a memalloc operation. Then reading can be called as part of r-m-w
> cycle, so reading as well. But it would be great if somebody more
> knowledgeable confirmed this.

then this deserves a big fat comment explaining all the details,
including how the complete depletion of reserves is prevented.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: always use REQ_OP_WRITE_ZEROES for zeroing offload
From: Christoph Hellwig @ 2017-04-05 12:08 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, axboe, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <yq1vaqjt74b.fsf@oracle.com>

On Wed, Apr 05, 2017 at 07:40:36AM -0400, Martin K. Petersen wrote:
> > This series makes REQ_OP_WRITE_ZEROES the only zeroing offload
> > supported by the block layer, and switches existing implementations
> > of REQ_OP_DISCARD that correctly set discard_zeroes_data to it,
> > removes incorrect discard_zeroes_data, and also switches WRITE SAME
> > based zeroing in SCSI to this new method.
> 
> Very, very nice. I think this is the correct approach.
> 
> I'm going to send two follow-up patches that allow us to use UNMAP for
> discards and WRITE SAME w/ UNMAP for zeroout. That appears to be the
> preferred configuration for most storage devices.

Thanks, I'll resend it with your nitpicks fixed, your follow on
patches tacked on and the various reviewed-by tags in a bit.

^ permalink raw reply

* Re: [PATCH] blk-mq: Remove blk_mq_queue_data.list
From: Sagi Grimberg @ 2017-04-05 12:02 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe; +Cc: linux-block@vger.kernel.org
In-Reply-To: <CY1PR0401MB15363601FA034CE8627DA41881080@CY1PR0401MB1536.namprd04.prod.outlook.com>

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply

* Re: [PATCH v3] blk-mq: remap queues when adding/removing hardware queues
From: Sagi Grimberg @ 2017-04-05 12:01 UTC (permalink / raw)
  To: Omar Sandoval, Jens Axboe, linux-block
  Cc: Christoph Hellwig, Keith Busch, Josef Bacik, kernel-team
In-Reply-To: <9753ebd0c51a9d49f110a6d0d00888170905d97a.1490993257.git.osandov@fb.com>

Looks good,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply

* Re: [PATCH 25/25] block: remove the discard_zeroes_data flag
From: Martin K. Petersen @ 2017-04-05 11:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170331163313.31821-26-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

> Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we
> can kill this hack.

Oh yeah!

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH 22/25] block: stop using discards for zeroing
From: Martin K. Petersen @ 2017-04-05 11:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170331163313.31821-23-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

> Now that we have REQ_OP_WRITE_ZEROES implemented for all devices that
> support efficient zeroing of devices we can remove the call to

s/of devices/

> blkdev_issue_discard.  This means we only have two ways of zeroing
> left and can simply the code.

simplify

> + *  Note that this function may fail with -EOPNOTSUPP if the driver supports
> + *  efficient zeroing operation, but the device capabilities can only be
> + *  discovered by trial and error.  In this case the caller should call the
> + *  function again, and it will use the fallback path.

Maybe:

"Note that this function may fail with -EOPNOTSUPP if the driver signals
zeroing offload support but the device fails to process the command (for
some devices there is no non-destructive way to verify whether this
operation is actually supported). If -EOPNOTSUPP is returned, the caller
should retry the blkdev_issue_zeroout() and the fallback path will be
used."

Otherwise OK.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH 15/25] nvme: implement REQ_OP_WRITE_ZEROES
From: Martin K. Petersen @ 2017-04-05 11:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170331163313.31821-16-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

> But now for the real NVMe Write Zeroes yet, just to get rid of the
> discard abuse for zeroing.  Also rename the quirk flag to be a bit
> more self-explanatory.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH 14/25] sd: implement unmapping Write Zeroes
From: Martin K. Petersen @ 2017-04-05 11:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170331163313.31821-15-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

> Try to use a write same with unmap bit variant if the device supports it
> and the caller allows for it.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH 13/25] block_dev: use blkdev_issue_zerout for hole punches
From: Martin K. Petersen @ 2017-04-05 11:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170331163313.31821-14-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

> This gets us support for non-discard efficient write of zeroes
> (e.g. NVMe) and prepare for removing the discard_zeroes_data flag.
                         s

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH 12/25] block: add a new BLKDEV_ZERO_NOFALLBACK flag
From: Martin K. Petersen @ 2017-04-05 11:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170331163313.31821-13-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

> This avoids fallbacks to explicit zeroing in (__)blkdev_issue_zeroout
> if the caller doesn't want them.
>
> Also clean up the convoluted check for the return condition that this
> new flag is added to.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH 11/25] block: add a REQ_UNMAP flag for REQ_OP_WRITE_ZEROES
From: Martin K. Petersen @ 2017-04-05 11:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170331163313.31821-12-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

> If this flag is set logical provisioning capable device should
> release space for the zeroed blocks if possible, if it is not set
> devices should keep the blocks anchored.
>
> Also remove an out of sync kerneldoc comment for a static function
> that would have become even more out of data with this change.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH 10/25] block: add a flags argument to (__)blkdev_issue_zeroout
From: Martin K. Petersen @ 2017-04-05 11:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170331163313.31821-11-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

> Turn the existin discard flag into a new BLKDEV_ZERO_UNMAP flag with
                  g

> similar semantics, but without referring to di=D1=95card.
                                                s

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

--=20
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH 09/25] block: stop using blkdev_issue_write_same for zeroing
From: Martin K. Petersen @ 2017-04-05 11:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170331163313.31821-10-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

> We'll always use the WRITE ZEROES code for zeroing now.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH 04/25] sd: implement REQ_OP_WRITE_ZEROES
From: Martin K. Petersen @ 2017-04-05 11:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170331163313.31821-5-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH 03/25] block: implement splitting of REQ_OP_WRITE_ZEROES bios
From: Martin K. Petersen @ 2017-04-05 11:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170331163313.31821-4-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

> Copy and past the REQ_OP_WRITE_SAME code to prepare to implementations
> that limit the write zeroes size.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH 02/25] block: renumber REQ_OP_WRITE_ZEROES
From: Martin K. Petersen @ 2017-04-05 11:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170331163313.31821-3-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

> Make life easy for implementations that needs to send a data buffer
> to the device (e.g. SCSI) by numbering it as a data out command.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH 01/25] ѕd: split sd_setup_discard_cmnd
From: Martin K. Petersen @ 2017-04-05 11:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170331163313.31821-2-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

> Split sd_setup_discard_cmnd into one function per provisioning type.  While
> this creates some very slight duplication of boilerplate code it keeps the
> code modular for additions of new provisioning types, and for reusing the
> write same functions for the upcoming scsi implementation of the Write Zeroes
> operation.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

Minor nit: Patch header should be "sd: ..." instead of " d: ...".

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: always use REQ_OP_WRITE_ZEROES for zeroing offload
From: Martin K. Petersen @ 2017-04-05 11:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, martin.petersen, agk, snitzer, shli, philipp.reisner,
	lars.ellenberg, linux-block, linux-scsi, drbd-dev, dm-devel,
	linux-raid
In-Reply-To: <20170331163313.31821-1-hch@lst.de>

Christoph Hellwig <hch@lst.de> writes:

Christoph,

> This series makes REQ_OP_WRITE_ZEROES the only zeroing offload
> supported by the block layer, and switches existing implementations
> of REQ_OP_DISCARD that correctly set discard_zeroes_data to it,
> removes incorrect discard_zeroes_data, and also switches WRITE SAME
> based zeroing in SCSI to this new method.

Very, very nice. I think this is the correct approach.

I'm going to send two follow-up patches that allow us to use UNMAP for
discards and WRITE SAME w/ UNMAP for zeroout. That appears to be the
preferred configuration for most storage devices.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox