linux-scsi.vger.kernel.org archive mirror
* [PATCH 00 of 16] Block/SCSI Data Integrity Support
@ 2008-04-25 23:12 Martin K. Petersen
  2008-04-25 23:12 ` [PATCH 01 of 16] Add support for the T10 Data Integrity Field CRC Martin K. Petersen
                   ` (15 more replies)
  0 siblings, 16 replies; 23+ messages in thread
From: Martin K. Petersen @ 2008-04-25 23:12 UTC
  To: linux-scsi


This is the first take of my data integrity patches.  It's quite hard
to explain everything in a brief couple of paragraphs but I'll try.

There's more information to be found in the docs section at:

	http://oss.oracle.com/projects/data-integrity/


Here's the executive summary:


What's This All About?
----------------------

These patches allow data integrity information (checksum and more) to
be attached to I/Os at the block/filesystem layers and transferred
through the entire I/O stack all the way to the physical storage
device.

The integrity metadata can be generated in close proximity to the
original data.  Capable host adapters, RAID arrays and physical disks
can verify the data integrity and abort I/Os in case of a mismatch.

Right now this is SCSI disk only, but similar efforts are in progress
for SATA and SCSI tape.  With a few minor nits due to protocol
limitations, the proposed SATA format is identical to the SCSI format
for easy interoperability.


T10 DIF
-------

SCSI drives can usually be reformatted to 520-byte sectors, yielding 8
extra bytes per sector.  These 8 bytes have traditionally been used by
RAID controllers to store internal protection information.

DIF (Data Integrity Field) is an extension to the SCSI Block Commands
that standardizes the format of the 8 extra bytes and defines ways to
interact with the contents at the protocol level.  We refer to the
extra information as "integrity metadata" or "IMD".

Each 8-byte DIF tuple is split into three chunks:

	- a 16-bit guard tag containing a CRC of the 512-byte data
	  portion of the sector.

	- a 16-bit application tag which is up for grabs.

	- a 32-bit reference tag which contains an incrementing
	  counter for each sector.  For DIF Type 1 it also needs to
	  match the physical LBA on the drive.
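For illustration, here is a minimal userspace sketch of the tuple
layout.  It assumes the usual SCSI big-endian byte order for all three
fields; struct dif_tuple and the pack/unpack helpers are hypothetical
names, not part of these patches.

```c
#include <assert.h>
#include <stdint.h>

/* One DIF tuple in host byte order (hypothetical helper type). */
struct dif_tuple {
	uint16_t guard_tag;	/* CRC of the 512-byte data portion */
	uint16_t app_tag;	/* application tag, free for use */
	uint32_t ref_tag;	/* Type 1: low 32 bits of the LBA */
};

/* Serialize a tuple into its 8-byte on-disk form, big-endian:
 * bytes 0-1 guard tag, 2-3 application tag, 4-7 reference tag. */
static void dif_tuple_pack(const struct dif_tuple *t, uint8_t out[8])
{
	out[0] = (uint8_t)(t->guard_tag >> 8);
	out[1] = (uint8_t)(t->guard_tag & 0xff);
	out[2] = (uint8_t)(t->app_tag >> 8);
	out[3] = (uint8_t)(t->app_tag & 0xff);
	out[4] = (uint8_t)(t->ref_tag >> 24);
	out[5] = (uint8_t)((t->ref_tag >> 16) & 0xff);
	out[6] = (uint8_t)((t->ref_tag >> 8) & 0xff);
	out[7] = (uint8_t)(t->ref_tag & 0xff);
}

/* Parse an 8-byte big-endian tuple back into host byte order. */
static void dif_tuple_unpack(const uint8_t in[8], struct dif_tuple *t)
{
	t->guard_tag = (uint16_t)((in[0] << 8) | in[1]);
	t->app_tag = (uint16_t)((in[2] << 8) | in[3]);
	t->ref_tag = ((uint32_t)in[4] << 24) | ((uint32_t)in[5] << 16) |
		     ((uint32_t)in[6] << 8) | (uint32_t)in[7];
}
```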

There are three types of DIF defined: Type 1, Type 2, and Type 3.  My
patches only support Type 1, although Type 3 devices should work.
Type 2 depends on 32-byte CDBs and is in progress.

Since the DIF tuple format is standardized, both initiators and
targets (as well as, potentially, transport switches in between) can
verify the integrity of the data going over the bus.

When writing, the HBA will DMA 512-byte sectors from host memory,
generate the matching integrity metadata and send out 520-byte sectors
on the wire.  The disk will verify the integrity of the data before
committing it to stable storage.

When reading, the drive will send 520-byte sectors to the HBA.  The
HBA will verify the data integrity and DMA 512-byte sectors to host
memory.
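A rough userspace model of the HBA behaviour described above, assuming
Type 1 with the T10 CRC (polynomial 0x8BB7) as guard tag, an
application tag of zero, and the low 32 bits of the LBA as reference
tag.  hba_write_sector() and hba_read_sector() are hypothetical names;
a real HBA does this in hardware while moving the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DATA_SZ 512
#define PI_SZ   8

/* Bitwise CRC-16 with the T10 DIF polynomial 0x8BB7: initial value 0,
 * MSB first, no final XOR. */
static uint16_t crc16_t10dif(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/* WRITE: 512 data bytes from host memory in, 520 bytes out on the
 * wire.  Guard = CRC, application tag = 0, reference tag = low 32
 * bits of the LBA. */
static void hba_write_sector(const uint8_t data[DATA_SZ], uint64_t lba,
			     uint8_t wire[DATA_SZ + PI_SZ])
{
	uint16_t guard = crc16_t10dif(data, DATA_SZ);
	uint32_t ref = (uint32_t)lba;
	uint8_t *pi = wire + DATA_SZ;

	memcpy(wire, data, DATA_SZ);
	pi[0] = (uint8_t)(guard >> 8);
	pi[1] = (uint8_t)guard;
	pi[2] = 0;
	pi[3] = 0;
	pi[4] = (uint8_t)(ref >> 24);
	pi[5] = (uint8_t)(ref >> 16);
	pi[6] = (uint8_t)(ref >> 8);
	pi[7] = (uint8_t)ref;
}

/* READ: check guard and reference tag of a 520-byte sector before
 * handing back the 512 data bytes.  Returns 0 if both match, -1 if
 * not (in which case nothing is copied). */
static int hba_read_sector(const uint8_t wire[DATA_SZ + PI_SZ], uint64_t lba,
			   uint8_t data[DATA_SZ])
{
	const uint8_t *pi = wire + DATA_SZ;
	uint16_t guard = (uint16_t)((pi[0] << 8) | pi[1]);
	uint32_t ref = ((uint32_t)pi[4] << 24) | ((uint32_t)pi[5] << 16) |
		       ((uint32_t)pi[6] << 8) | (uint32_t)pi[7];

	if (guard != crc16_t10dif(wire, DATA_SZ) || ref != (uint32_t)lba)
		return -1;
	memcpy(data, wire, DATA_SZ);
	return 0;
}
```

Note that only the low 32 bits of a 64-bit LBA end up in the reference
tag, so LBA 0x100000002 is tagged with 2.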

IOW, DIF provides a means for added integrity protection between HBA
and disk.


Data Integrity Extensions
-------------------------

In order to provide true end-to-end data integrity we need to be able
to get access to the integrity metadata from the OS.  Dealing with
520-byte sectors is quite inconvenient, so we have worked with HBA
manufacturers to separate the data buffer scatter-gather from the
integrity metadata scatter-gather.

Also, the CRC16 is somewhat expensive to calculate in software, so we
have also allowed alternate checksums to be used.  Currently the only
alternative supported is the IP checksum, which is fast and cheap to
calculate.
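For comparison with the CRC, a minimal sketch of the IP checksum
(RFC 1071 ones'-complement sum) over a buffer.  This is a plain
userspace version; the kernel has its own optimized checksum routines,
and the byte order in which the result is stored as a guard tag is not
covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: ones'-complement sum of 16-bit
 * big-endian words, folded and complemented.  An odd trailing byte is
 * padded with zero.  Fine for sector-sized buffers; very large
 * buffers would need a wider accumulator. */
static uint16_t ip_checksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;

	/* Fold carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

The loop is just additions and a final fold, which is why it is much
cheaper in software than a table- or bit-driven CRC16.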

These two features and a few more knobs constitute what is known as
DIX or the I/O Controller Data Integrity Extensions.

When writing, the HBA will DMA two scatterlists from host memory: One
containing the data as usual, and one containing the integrity
metadata.  The HBA will verify that the two are in agreement and
interleave them before sending them out on the wire as 520-byte
sectors.

When reading, the disk will return 520-byte sectors, the HBA will
verify the integrity, separate IMD from the data, and DMA to the two
separate scatterlists in host memory.
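The data movement can be modelled in userspace roughly as follows.
For simplicity the sketch assumes both buffers are contiguous rather
than scatterlists, and leaves out the verification step the HBA
performs; dix_interleave() and dix_split() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DATA_SZ 512
#define PI_SZ   8
#define WIRE_SZ (DATA_SZ + PI_SZ)

/* WRITE: merge nsect data sectors and nsect 8-byte tuples, kept in two
 * separate host buffers, into 520-byte sectors for the wire. */
static void dix_interleave(const uint8_t *data, const uint8_t *pi,
			   size_t nsect, uint8_t *wire)
{
	for (size_t s = 0; s < nsect; s++) {
		memcpy(wire + s * WIRE_SZ, data + s * DATA_SZ, DATA_SZ);
		memcpy(wire + s * WIRE_SZ + DATA_SZ, pi + s * PI_SZ, PI_SZ);
	}
}

/* READ: split 520-byte sectors back into a data buffer and a separate
 * integrity metadata buffer. */
static void dix_split(const uint8_t *wire, size_t nsect,
		      uint8_t *data, uint8_t *pi)
{
	for (size_t s = 0; s < nsect; s++) {
		memcpy(data + s * DATA_SZ, wire + s * WIRE_SZ, DATA_SZ);
		memcpy(pi + s * PI_SZ, wire + s * WIRE_SZ + DATA_SZ, PI_SZ);
	}
}
```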


SCSI Layer Changes
------------------

At the SCSI level, there are a few changes required to support this:

 - an extra scatterlist for the integrity metadata

 - tweaks to sd.c to detect and handle disks formatted with DIF

 - sd.c must issue the right READ/WRITE commands when DIF is on

 - helper functions for HBA drivers

 - extra fields in scsi_host to signal the HBA driver's DIF
   capabilities


Block Layer Changes
-------------------

The main idea of DIF/DIX is to allow integrity metadata to be
generated as close to the original data as possible.  So in the long
run we'd like this to happen in userland.  Given mmap(), direct I/O,
etc. this obviously poses some challenges.  *cough*

For now the integrity metadata is generated at the block layer when an
I/O is submitted by the filesystem.  There are also functions that
allow filesystems to use the application tag to mark sectors for
future recovery or similar.
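As a sketch of what block-layer generation amounts to, the code below
fills a separate protection buffer with one tuple per 512-byte sector
and verifies it again on the way back.  It assumes the IP checksum as
guard tag, an application tag of zero and reference tags counting up
from the I/O's start sector; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SZ 512
#define TUPLE_SZ  8

/* RFC 1071 Internet checksum over big-endian 16-bit words. */
static uint16_t ip_checksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
	put_be16(p, (uint16_t)(v >> 16));
	put_be16(p + 2, (uint16_t)(v & 0xffff));
}

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)get_be16(p) << 16) | get_be16(p + 2);
}

/* One tuple per sector into a separate buffer: guard = IP checksum,
 * application tag = 0, reference tag = start + n (low 32 bits). */
static void generate_pi(const uint8_t *data, size_t nsect, uint64_t start,
			uint8_t *prot)
{
	for (size_t n = 0; n < nsect; n++) {
		uint8_t *t = prot + n * TUPLE_SZ;

		put_be16(t, ip_checksum(data + n * SECTOR_SZ, SECTOR_SZ));
		put_be16(t + 2, 0);
		put_be32(t + 4, (uint32_t)(start + n));
	}
}

/* Returns the index of the first sector whose tuple does not match,
 * or -1 if every sector checks out. */
static long verify_pi(const uint8_t *data, size_t nsect, uint64_t start,
		      const uint8_t *prot)
{
	for (size_t n = 0; n < nsect; n++) {
		const uint8_t *t = prot + n * TUPLE_SZ;

		if (get_be16(t) != ip_checksum(data + n * SECTOR_SZ, SECTOR_SZ) ||
		    get_be32(t + 4) != (uint32_t)(start + n))
			return (long)n;
	}
	return -1;
}
```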

struct bio has been extended with a pointer to a struct bip which in
turn contains the integrity metadata.  The bip is essentially a
trimmed down bio with a bio_vec and some housekeeping.
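A toy model of that structure, to make the trimmed-down-bio idea
concrete.  The field comments point at bip_vec and bip_vcnt, but
struct toy_bip and toy_bip_add() are hypothetical; toy_bip_add()
follows the convention of bio_integrity_add_page() in this series of
returning the number of bytes added, or 0 when the vector is full.

```c
#include <assert.h>
#include <stddef.h>

/* One metadata fragment: which page, how many bytes, where. */
struct ivec {
	const void *page;
	unsigned int len;
	unsigned int offset;
};

struct toy_bip {
	struct ivec *vec;	/* like bip_vec */
	unsigned int vcnt;	/* like bip_vcnt: entries in use */
	unsigned int max_vecs;	/* capacity of vec[] */
};

/* Append one integrity metadata fragment to the vector. */
static unsigned int toy_bip_add(struct toy_bip *bip, const void *page,
				unsigned int len, unsigned int offset)
{
	if (bip->vcnt >= bip->max_vecs)
		return 0;

	bip->vec[bip->vcnt].page = page;
	bip->vec[bip->vcnt].len = len;
	bip->vec[bip->vcnt].offset = offset;
	bip->vcnt++;

	return len;
}
```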

There are a few hooks inserted in fs/bio.c and block/blk-* to allow
integrity metadata to be handled correctly when splitting, cloning and
merging.  Aside from that, the integrity stuff is completely opaque.

Because we don't want the block layer, filesystems, etc. to know about
DIF, DIX, tuple formats, etc., all the functions that interact with the
integrity metadata reside in the SCSI layer and are registered via a
callback handler template.  The block layer changes have been made so
that the upcoming standards for data integrity on SATA (T13 External
Path Protection) and SCSI tape will fit right in and can register
their own handlers.
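To illustrate the split, here is a toy userspace model of such a
callback template.  The names are loosely modelled on fields used by
the block layer code in this series (generate_fn, verify_fn,
tuple_size) but are simplified and hypothetical, and the one-byte XOR
"format" exists only for the example; it is not a real DIF/DIX format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exchange struct handed to the format handler. */
struct integrity_exchg {
	const uint8_t *data_buf;
	uint8_t *prot_buf;
	size_t data_size;
	unsigned int sector_size;
};

/* Hypothetical handler template: sizes plus format-specific callbacks. */
struct integrity_template {
	const char *name;
	unsigned int tuple_size;
	void (*generate_fn)(struct integrity_exchg *);
	int (*verify_fn)(struct integrity_exchg *);
};

/* Toy format: a 1-byte tuple holding the XOR of the sector's bytes. */
static uint8_t xor_sector(const uint8_t *p, unsigned int n)
{
	uint8_t x = 0;

	while (n--)
		x ^= *p++;
	return x;
}

static void toy_generate(struct integrity_exchg *ix)
{
	size_t nsect = ix->data_size / ix->sector_size;

	for (size_t s = 0; s < nsect; s++)
		ix->prot_buf[s] = xor_sector(ix->data_buf + s * ix->sector_size,
					     ix->sector_size);
}

static int toy_verify(struct integrity_exchg *ix)
{
	size_t nsect = ix->data_size / ix->sector_size;

	for (size_t s = 0; s < nsect; s++)
		if (ix->prot_buf[s] != xor_sector(ix->data_buf + s * ix->sector_size,
						  ix->sector_size))
			return -1;
	return 0;
}

static const struct integrity_template toy_template = {
	.name = "toy-xor8",
	.tuple_size = 1,
	.generate_fn = toy_generate,
	.verify_fn = toy_verify,
};

/* "Block layer" side: only sizes and callbacks, no format knowledge. */
static int generic_protect(const struct integrity_template *t,
			   const uint8_t *data, size_t len, uint8_t *prot,
			   int verify)
{
	struct integrity_exchg ix = {
		.data_buf = data, .prot_buf = prot,
		.data_size = len, .sector_size = 512,
	};

	if (verify)
		return t->verify_fn(&ix);
	t->generate_fn(&ix);
	return 0;
}
```

Swapping in another format means registering a different template;
generic_protect() itself does not change.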

I have included a more in-depth description of the block layer changes
in Documentation/block/data-integrity.txt.


The Patches
-----------

Since this patch set is quite intrusive all across the board, I
haven't been able to split it up into incremental pieces.  So the
patches are more or less a logical grouping and won't work
independently.

Right now everything has been set up so it can be toggled with
CONFIG_BLK_DEV_INTEGRITY and CONFIG_SCSI_PROTECTION.  There are a few
places where this makes things really icky in terms of either code or
#ifdefs.  I'm hoping we can eventually promote the data integrity
stuff to become a first class citizen and get rid of that cruft.

Other than that here are the patches.  I'm very interested in
comments, suggestions, etc.


How To Experiment Without DIF Hardware
--------------------------------------

# modprobe scsi_debug protection=1 guard=1 ato=1 dev_size_mb=1024

-- 
Martin K. Petersen	Oracle Linux Engineering





* [PATCH 01 of 16] Add support for the T10 Data Integrity Field CRC
From: Martin K. Petersen @ 2008-04-25 23:12 UTC
  To: linux-scsi

4 files changed, 84 insertions(+)
include/linux/crc-t10dif.h |    8 +++++
lib/Kconfig                |    7 ++++
lib/Makefile               |    1 
lib/crc-t10dif.c           |   68 ++++++++++++++++++++++++++++++++++++++++++++


Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
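The 256-entry lookup table in lib/crc-t10dif.c can be reproduced from
the generator polynomial 0x8BB7.  A minimal userspace sketch (helper
names are hypothetical) that builds the table and runs the same
byte-at-a-time loop as crc_t10dif(); the standard check value for
CRC-16/T10-DIF over the ASCII string "123456789" is 0xD0DB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY 0x8BB7	/* x^16 term implied */

static uint16_t t10dif_table[256];

/* Entry i is the CRC remainder of byte i placed in the top half of
 * the 16-bit register and shifted through eight rounds. */
static void t10dif_build_table(void)
{
	for (unsigned int i = 0; i < 256; i++) {
		uint16_t crc = (uint16_t)(i << 8);

		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ T10DIF_POLY)
					     : (uint16_t)(crc << 1);
		t10dif_table[i] = crc;
	}
}

/* Byte-at-a-time loop, same shape as crc_t10dif() in the patch. */
static uint16_t t10dif_crc(const unsigned char *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++)
		crc = (uint16_t)((crc << 8) ^
				 t10dif_table[((crc >> 8) ^ buf[i]) & 0xff]);
	return crc;
}
```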

diff -r 6065be53b4fd -r eb54ccf75103 include/linux/crc-t10dif.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/include/linux/crc-t10dif.h	Fri Apr 25 17:39:29 2008 -0400
@@ -0,0 +1,8 @@
+#ifndef _LINUX_CRC_T10DIF_H
+#define _LINUX_CRC_T10DIF_H
+
+#include <linux/types.h>
+
+__u16 crc_t10dif(unsigned char const *, size_t);
+
+#endif
diff -r 6065be53b4fd -r eb54ccf75103 lib/Kconfig
--- a/lib/Kconfig	Thu Apr 24 14:41:20 2008 -0700
+++ b/lib/Kconfig	Fri Apr 25 17:39:29 2008 -0400
@@ -22,6 +22,13 @@
 	  modules require CRC16 functions, but a module built outside
 	  the kernel tree does. Such modules that use library CRC16
 	  functions require M here.
+
+config CRC_T10DIF
+	tristate "CRC calculation for the T10 Data Integrity Field"
+	help
+	  This option is only needed if a module that's not in the
+	  kernel tree needs to calculate CRC checks for use with the
+	  SCSI data integrity subsystem.
 
 config CRC_ITU_T
 	tristate "CRC ITU-T V.41 functions"
diff -r 6065be53b4fd -r eb54ccf75103 lib/Makefile
--- a/lib/Makefile	Thu Apr 24 14:41:20 2008 -0700
+++ b/lib/Makefile	Fri Apr 25 17:39:29 2008 -0400
@@ -43,6 +43,7 @@
 obj-$(CONFIG_BITREVERSE) += bitrev.o
 obj-$(CONFIG_CRC_CCITT)	+= crc-ccitt.o
 obj-$(CONFIG_CRC16)	+= crc16.o
+obj-$(CONFIG_CRC_T10DIF)	+= crc-t10dif.o
 obj-$(CONFIG_CRC_ITU_T)	+= crc-itu-t.o
 obj-$(CONFIG_CRC32)	+= crc32.o
 obj-$(CONFIG_CRC7)	+= crc7.o
diff -r 6065be53b4fd -r eb54ccf75103 lib/crc-t10dif.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/lib/crc-t10dif.c	Fri Apr 25 17:39:29 2008 -0400
@@ -0,0 +1,68 @@
+/*
+ * T10 Data Integrity Field CRC16 calculation
+ *
+ * Copyright (c) 2007 Oracle Corporation.  All rights reserved.
+ * Written by Martin K. Petersen <martin.petersen@oracle.com>
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2. See the file COPYING for more details.
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/crc-t10dif.h>
+
+/* Table generated using the following polynomial:
+ * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
+ * gt: 0x8bb7
+ */
+static const __u16 t10_dif_crc_table[256] = {
+	0x0000, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
+	0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
+	0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
+	0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
+	0xA99A, 0x222D, 0x3543, 0xBEF4, 0x1B9F, 0x9028, 0x8746, 0x0CF1,
+	0x4627, 0xCD90, 0xDAFE, 0x5149, 0xF422, 0x7F95, 0x68FB, 0xE34C,
+	0xFD57, 0x76E0, 0x618E, 0xEA39, 0x4F52, 0xC4E5, 0xD38B, 0x583C,
+	0x12EA, 0x995D, 0x8E33, 0x0584, 0xA0EF, 0x2B58, 0x3C36, 0xB781,
+	0xD883, 0x5334, 0x445A, 0xCFED, 0x6A86, 0xE131, 0xF65F, 0x7DE8,
+	0x373E, 0xBC89, 0xABE7, 0x2050, 0x853B, 0x0E8C, 0x19E2, 0x9255,
+	0x8C4E, 0x07F9, 0x1097, 0x9B20, 0x3E4B, 0xB5FC, 0xA292, 0x2925,
+	0x63F3, 0xE844, 0xFF2A, 0x749D, 0xD1F6, 0x5A41, 0x4D2F, 0xC698,
+	0x7119, 0xFAAE, 0xEDC0, 0x6677, 0xC31C, 0x48AB, 0x5FC5, 0xD472,
+	0x9EA4, 0x1513, 0x027D, 0x89CA, 0x2CA1, 0xA716, 0xB078, 0x3BCF,
+	0x25D4, 0xAE63, 0xB90D, 0x32BA, 0x97D1, 0x1C66, 0x0B08, 0x80BF,
+	0xCA69, 0x41DE, 0x56B0, 0xDD07, 0x786C, 0xF3DB, 0xE4B5, 0x6F02,
+	0x3AB1, 0xB106, 0xA668, 0x2DDF, 0x88B4, 0x0303, 0x146D, 0x9FDA,
+	0xD50C, 0x5EBB, 0x49D5, 0xC262, 0x6709, 0xECBE, 0xFBD0, 0x7067,
+	0x6E7C, 0xE5CB, 0xF2A5, 0x7912, 0xDC79, 0x57CE, 0x40A0, 0xCB17,
+	0x81C1, 0x0A76, 0x1D18, 0x96AF, 0x33C4, 0xB873, 0xAF1D, 0x24AA,
+	0x932B, 0x189C, 0x0FF2, 0x8445, 0x212E, 0xAA99, 0xBDF7, 0x3640,
+	0x7C96, 0xF721, 0xE04F, 0x6BF8, 0xCE93, 0x4524, 0x524A, 0xD9FD,
+	0xC7E6, 0x4C51, 0x5B3F, 0xD088, 0x75E3, 0xFE54, 0xE93A, 0x628D,
+	0x285B, 0xA3EC, 0xB482, 0x3F35, 0x9A5E, 0x11E9, 0x0687, 0x8D30,
+	0xE232, 0x6985, 0x7EEB, 0xF55C, 0x5037, 0xDB80, 0xCCEE, 0x4759,
+	0x0D8F, 0x8638, 0x9156, 0x1AE1, 0xBF8A, 0x343D, 0x2353, 0xA8E4,
+	0xB6FF, 0x3D48, 0x2A26, 0xA191, 0x04FA, 0x8F4D, 0x9823, 0x1394,
+	0x5942, 0xD2F5, 0xC59B, 0x4E2C, 0xEB47, 0x60F0, 0x779E, 0xFC29,
+	0x4BA8, 0xC01F, 0xD771, 0x5CC6, 0xF9AD, 0x721A, 0x6574, 0xEEC3,
+	0xA415, 0x2FA2, 0x38CC, 0xB37B, 0x1610, 0x9DA7, 0x8AC9, 0x017E,
+	0x1F65, 0x94D2, 0x83BC, 0x080B, 0xAD60, 0x26D7, 0x31B9, 0xBA0E,
+	0xF0D8, 0x7B6F, 0x6C01, 0xE7B6, 0x42DD, 0xC96A, 0xDE04, 0x55B3
+};
+
+__u16 crc_t10dif(const unsigned char *buffer, size_t len)
+{
+	__u16 crc = 0;
+	unsigned int i;
+
+	for (i=0 ; i < len ; i++)
+		crc = (crc << 8) ^ t10_dif_crc_table[((crc >> 8) ^ buffer[i]) & 0xff];
+
+	return crc;
+}
+
+EXPORT_SYMBOL(crc_t10dif);
+
+MODULE_DESCRIPTION("T10 DIF CRC calculation");
+MODULE_LICENSE("GPL");




* [PATCH 02 of 16] Globalize bio_set and bio_vec_slab
From: Martin K. Petersen @ 2008-04-25 23:12 UTC
  To: linux-scsi

2 files changed, 38 insertions(+), 28 deletions(-)
fs/bio.c            |   36 ++++++++----------------------------
include/linux/bio.h |   30 ++++++++++++++++++++++++++++++


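Move struct bio_set and struct biovec_slab definitions to bio.h so
they can be used outside of bio.c.  Also make fs_bio_set and
bvec_alloc_bs() non-static and add a bvec_nr_vecs() helper so the
integrity code can allocate its vectors from a bio_set.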

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

diff -r eb54ccf75103 -r b9f133a520ea fs/bio.c
--- a/fs/bio.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/fs/bio.c	Fri Apr 25 17:39:29 2008 -0400
@@ -28,24 +28,9 @@
 #include <linux/blktrace_api.h>
 #include <scsi/sg.h>		/* for struct sg_iovec */
 
-#define BIO_POOL_SIZE 2
-
 static struct kmem_cache *bio_slab __read_mostly;
 
-#define BIOVEC_NR_POOLS 6
-
-/*
- * a small number of entries is fine, not going to be performance critical.
- * basically we just need to survive
- */
-#define BIO_SPLIT_ENTRIES 2
 mempool_t *bio_split_pool __read_mostly;
-
-struct biovec_slab {
-	int nr_vecs;
-	char *name; 
-	struct kmem_cache *slab;
-};
 
 /*
  * if you change this list, also change bvec_alloc or things will
@@ -60,23 +45,18 @@
 #undef BV
 
 /*
- * bio_set is used to allow other portions of the IO system to
- * allocate their own private memory pools for bio and iovec structures.
- * These memory pools in turn all allocate from the bio_slab
- * and the bvec_slabs[].
- */
-struct bio_set {
-	mempool_t *bio_pool;
-	mempool_t *bvec_pools[BIOVEC_NR_POOLS];
-};
-
-/*
  * fs_bio_set is the bio_set containing bio and iovec memory pools used by
  * IO code that does not need private memory pools.
  */
-static struct bio_set *fs_bio_set;
+struct bio_set *fs_bio_set;
 
-static inline struct bio_vec *bvec_alloc_bs(gfp_t gfp_mask, int nr, unsigned long *idx, struct bio_set *bs)
+inline int bvec_nr_vecs(int idx)
+{
+	return bvec_slabs[idx].nr_vecs;
+}
+EXPORT_SYMBOL(bvec_nr_vecs);
+
+struct bio_vec *bvec_alloc_bs(gfp_t gfp_mask, int nr, unsigned long *idx, struct bio_set *bs)
 {
 	struct bio_vec *bvl;
 
diff -r eb54ccf75103 -r b9f133a520ea include/linux/bio.h
--- a/include/linux/bio.h	Fri Apr 25 17:39:29 2008 -0400
+++ b/include/linux/bio.h	Fri Apr 25 17:39:29 2008 -0400
@@ -331,6 +331,36 @@
 				     int, int);
 extern int bio_uncopy_user(struct bio *);
 void zero_fill_bio(struct bio *bio);
+extern struct bio_vec *bvec_alloc_bs(gfp_t, int, unsigned long *, struct bio_set *);
+extern int bvec_nr_vecs(int idx);
+
+/*
+ * bio_set is used to allow other portions of the IO system to
+ * allocate their own private memory pools for bio and iovec structures.
+ * These memory pools in turn all allocate from the bio_slab
+ * and the bvec_slabs[].
+ */
+#define BIO_POOL_SIZE 2
+#define BIOVEC_NR_POOLS 6
+
+struct bio_set {
+	mempool_t *bio_pool;
+	mempool_t *bvec_pools[BIOVEC_NR_POOLS];
+};
+
+struct biovec_slab {
+	int nr_vecs;
+	char *name; 
+	struct kmem_cache *slab;
+};
+
+extern struct bio_set *fs_bio_set;
+
+/*
+ * a small number of entries is fine, not going to be performance critical.
+ * basically we just need to survive
+ */
+#define BIO_SPLIT_ENTRIES 2
 
 #ifdef CONFIG_HIGHMEM
 /*




* [PATCH 03 of 16] Find bio sector offset given idx and offset
From: Martin K. Petersen @ 2008-04-25 23:12 UTC
  To: linux-scsi

2 files changed, 26 insertions(+)
fs/bio.c            |   24 ++++++++++++++++++++++++
include/linux/bio.h |    2 ++


Helper function to find the sector offset in a bio given bvec index
and page offset.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
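In userspace terms the helper added here boils down to the following
sketch.  It assumes iteration starts at the first bio_vec (bi_idx is
0) and uses a stripped-down struct seg in place of struct bio_vec;
both sector_offset() and struct seg are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for a bio_vec: only the fields the helper reads. */
struct seg {
	unsigned int len;	/* like bv_len */
	unsigned int offset;	/* like bv_offset */
};

/* Sectors preceding position (index, offset): whole sectors of all
 * earlier segments, plus the sectors between the segment's start
 * offset and 'offset' inside segment 'index'. */
static uint64_t sector_offset(const struct seg *segs, unsigned int nsegs,
			      unsigned int index, unsigned int offset,
			      unsigned int sector_sz)
{
	uint64_t sectors = 0;

	assert(index < nsegs);
	for (unsigned int i = 0; i < index; i++)
		sectors += segs[i].len / sector_sz;
	if (offset > segs[index].offset)
		sectors += (offset - segs[index].offset) / sector_sz;
	return sectors;
}
```

For example, with segments of 4096, 2048 (starting at page offset
1024) and 4096 bytes, page offset 1024 in the third segment is 8 + 4
+ 2 = 14 sectors into the bio.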

diff -r b9f133a520ea -r 5d30928a2730 fs/bio.c
--- a/fs/bio.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/fs/bio.c	Fri Apr 25 17:39:29 2008 -0400
@@ -1142,6 +1142,30 @@
 	return bp;
 }
 
+sector_t bio_sector_offset(struct bio *bio, unsigned short index, unsigned int offset)
+{
+	struct bio_vec *bv;
+	unsigned int sector_sz = bio->bi_bdev->bd_disk->queue->hardsect_size;
+	sector_t sectors;
+	int i;
+
+	sectors = 0;
+
+	BUG_ON(index >= bio->bi_vcnt);
+	
+	bio_for_each_segment(bv, bio, i) {
+		if (i == index) {
+			if (offset > bv->bv_offset)
+				sectors += (offset - bv->bv_offset) / sector_sz;
+			return sectors;
+		}
+		
+		sectors += bv->bv_len / sector_sz;
+	}
+	
+	BUG();
+}
+EXPORT_SYMBOL(bio_sector_offset);
 
 /*
  * create memory pools for biovec's in a bio_set.
diff -r b9f133a520ea -r 5d30928a2730 include/linux/bio.h
--- a/include/linux/bio.h	Fri Apr 25 17:39:29 2008 -0400
+++ b/include/linux/bio.h	Fri Apr 25 17:39:29 2008 -0400
@@ -315,6 +315,8 @@
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
 			   unsigned int, unsigned int);
 extern int bio_get_nr_vecs(struct block_device *);
+extern sector_t bio_sector_offset(struct bio *, unsigned short, unsigned int);
+
 extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
 				unsigned long, unsigned int, int);
 struct sg_iovec;




* [PATCH 04 of 16] Block layer data integrity
From: Martin K. Petersen @ 2008-04-25 23:12 UTC
  To: linux-scsi

4 files changed, 822 insertions(+), 3 deletions(-)
fs/Makefile         |    1 
fs/bio-integrity.c  |  712 +++++++++++++++++++++++++++++++++++++++++++++++++++
fs/bio.c            |   27 +
include/linux/bio.h |   85 ++++++


Allows integrity metadata to be attached to a bio.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
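The sizing arithmetic used by bio_integrity_prep() and
bio_integrity_tag_size() in fs/bio-integrity.c can be checked with a
small userspace sketch.  It assumes 4 KiB pages (SK_PAGE_SHIFT is a
hypothetical stand-in for PAGE_SHIFT), and the helper names are
hypothetical; this is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12			/* assumed 4 KiB pages */
#define SK_PAGE_SIZE ((uintptr_t)1 << SK_PAGE_SHIFT)

/* Bytes of integrity metadata needed for 'sectors' sectors. */
static unsigned int prot_buf_len(unsigned int sectors, unsigned int tuple_size)
{
	return sectors * tuple_size;
}

/* Pages touched by a buffer at 'addr' of 'len' bytes, using the same
 * start/end page computation as bio_integrity_prep(). */
static unsigned int pages_spanned(uintptr_t addr, unsigned int len)
{
	uintptr_t end = (addr + len + SK_PAGE_SIZE - 1) >> SK_PAGE_SHIFT;
	uintptr_t start = addr >> SK_PAGE_SHIFT;

	return (unsigned int)(end - start);
}

/* Tag bytes available for a bio, as in bio_integrity_tag_size(). */
static unsigned int tag_space(unsigned int bi_size, unsigned int sector_size,
			      unsigned int tag_size)
{
	return tag_size * (bi_size / sector_size);
}
```

A 64 KiB bio of 512-byte sectors with 8-byte tuples thus needs 1 KiB
of integrity metadata, which may straddle two pages depending on where
kzalloc() places it.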

diff -r 5d30928a2730 -r c4c47b2f1539 fs/Makefile
--- a/fs/Makefile	Fri Apr 25 17:39:29 2008 -0400
+++ b/fs/Makefile	Fri Apr 25 17:39:29 2008 -0400
@@ -19,6 +19,7 @@
 obj-y +=	no-block.o
 endif
 
+obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o
 obj-$(CONFIG_INOTIFY)		+= inotify.o
 obj-$(CONFIG_INOTIFY_USER)	+= inotify_user.o
 obj-$(CONFIG_EPOLL)		+= eventpoll.o
diff -r 5d30928a2730 -r c4c47b2f1539 fs/bio-integrity.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/fs/bio-integrity.c	Fri Apr 25 17:39:29 2008 -0400
@@ -0,0 +1,712 @@
+/*
+ * bio-integrity.c - bio data integrity extensions
+ *
+ * Copyright (C) 2007, 2008 Oracle Corporation
+ * Written by: Martin K. Petersen <martin.petersen@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; see the file COPYING.  If not, write to
+ * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139,
+ * USA.
+ *
+ */
+
+#include <linux/blkdev.h>
+#include <linux/mempool.h>
+#include <linux/bio.h>
+#include <linux/workqueue.h>
+
+static struct kmem_cache *bio_integrity_slab __read_mostly;
+static struct workqueue_struct *kintegrityd_wq;
+
+/**
+ * bio_integrity_alloc_bioset - Allocate integrity payload and attach it to bio
+ * @bio:	bio to attach integrity metadata to
+ * @gfp_mask:	Memory allocation mask
+ * @nr_vecs:	Number of integrity metadata scatter-gather elements
+ * @bs:		bio_set to allocate from
+ *
+ * Description: This function prepares a bio for attaching integrity
+ * metadata.  nr_vecs specifies the maximum number of pages containing
+ * integrity metadata that can be attached.
+ */
+struct bip *bio_integrity_alloc_bioset(struct bio *bio, gfp_t gfp_mask, unsigned int nr_vecs, struct bio_set *bs)
+{
+	struct bip *bip;
+	struct bio_vec *bv;
+	unsigned long idx;
+
+	BUG_ON(bio == NULL);
+
+	bip = mempool_alloc(bs->bio_integrity_pool, gfp_mask);
+	if (unlikely(bip == NULL)) {
+		printk(KERN_ERR "%s: could not alloc bip\n", __func__);
+		return NULL;
+	}
+
+	memset(bip, 0, sizeof(*bip));
+	idx = 0;
+
+	bv = bvec_alloc_bs(gfp_mask, nr_vecs, &idx, bs);
+	if (unlikely(bv == NULL)) {
+		printk(KERN_ERR "%s: could not alloc bip_vec\n", __func__);
+		mempool_free(bip, bs->bio_integrity_pool);
+		return NULL;
+	}
+
+	bip->bip_pool = idx;
+	bip->bip_vec = bv;
+	bip->bip_bio = bio;
+	bio->bi_integrity = bip;
+
+	return bip;
+}
+EXPORT_SYMBOL(bio_integrity_alloc_bioset);
+
+/**
+ * bio_integrity_alloc - Allocate integrity payload and attach it to bio
+ * @bio:	bio to attach integrity metadata to
+ * @gfp_mask:	Memory allocation mask
+ * @nr_vecs:	Number of integrity metadata scatter-gather elements
+ *
+ * Description: This function prepares a bio for attaching integrity
+ * metadata.  nr_vecs specifies the maximum number of pages containing
+ * integrity metadata that can be attached.
+ */
+struct bip *bio_integrity_alloc(struct bio *bio, gfp_t gfp_mask,
+				unsigned int nr_vecs)
+{
+	return bio_integrity_alloc_bioset(bio, gfp_mask, nr_vecs, fs_bio_set);
+}
+EXPORT_SYMBOL(bio_integrity_alloc);
+
+/**
+ * bio_integrity_free - Free bio integrity payload
+ * @bio:	bio containing bip to be freed
+ * @bs:		bio_set this bio was allocated from
+ *
+ * Description: Used to free the integrity portion of a bio. Usually
+ * called from bio_free().
+ */
+void bio_integrity_free(struct bio *bio, struct bio_set *bs)
+{
+	struct bip *bip = bio->bi_integrity;
+
+	BUG_ON(bip == NULL);
+
+	/* A cloned bio doesn't own the integrity metadata */
+	if (!bio_flagged(bio, BIO_CLONED) && bip->bip_buf != NULL)
+		kfree(bip->bip_buf);
+
+	mempool_free(bip->bip_vec, bs->bvec_pools[bip->bip_pool]);
+	mempool_free(bip, bs->bio_integrity_pool);
+
+	bio->bi_integrity = NULL;
+}
+EXPORT_SYMBOL(bio_integrity_free);
+
+/**
+ * bio_integrity_add_page - Attach integrity metadata
+ * @bio:	bio to update
+ * @page:	page containing integrity metadata
+ * @len:	number of bytes of integrity metadata in page
+ * @offset:	start offset within page
+ *
+ * Description: Attach a page containing integrity metadata to bio.
+ */
+int bio_integrity_add_page(struct bio *bio, struct page *page,
+			   unsigned int len, unsigned int offset)
+{
+	struct bip *bip;
+	struct bio_vec *iv;
+
+	bip = bio->bi_integrity;
+
+	if (bip->bip_vcnt >= bvec_nr_vecs(bip->bip_pool)) {
+		printk(KERN_ERR "%s: bip_vec full\n", __func__);
+		return 0;
+	}
+
+	iv = bip_vec_idx(bip, bip->bip_vcnt);
+	BUG_ON(iv == NULL);
+	BUG_ON(iv->bv_page != NULL);
+
+	iv->bv_page = page;
+	iv->bv_len = len;
+	iv->bv_offset = offset;
+	bip->bip_vcnt++;
+
+	return len;
+}
+EXPORT_SYMBOL(bio_integrity_add_page);
+
+/**
+ * bio_integrity_enabled - Check whether integrity can be passed
+ * @bio:	bio to check
+ *
+ * Description: Determines whether bio_integrity_prep() can be called
+ * on this bio or not.  bio data direction and target device must be
+ * set prior to calling.  The function honors the write_generate and
+ * read_verify flags in sysfs.
+ */
+inline int bio_integrity_enabled(struct bio *bio)
+{
+	/* Already protected? */
+	if (bio_integrity(bio))
+		return 0;
+
+	return bdev_integrity_enabled(bio->bi_bdev, bio_data_dir(bio));
+}
+EXPORT_SYMBOL(bio_integrity_enabled);
+
+/**
+ * bio_integrity_tag_size - Retrieve integrity tag space
+ * @bio:	bio to inspect
+ *
+ * Description: Returns the maximum number of tag bytes that can be
+ * attached to this bio. Filesystems can use this to determine how
+ * much metadata to attach to an I/O.
+ */
+unsigned int bio_integrity_tag_size(struct bio *bio)
+{
+	struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
+
+	BUG_ON(bio->bi_size == 0);
+
+	return bi->tag_size * (bio->bi_size / bi->sector_size);
+}
+EXPORT_SYMBOL(bio_integrity_tag_size);
+
+/**
+ * bio_integrity_set_tag - Attach a tag buffer to a bio
+ * @bio:	bio to attach buffer to
+ * @tag_buf:	Pointer to a buffer containing tag data
+ * @len:	Length of the included buffer
+ *
+ * Description: Use this function to tag a bio by leveraging the extra
+ * space provided by devices formatted with integrity protection.  The
+ * size of the integrity buffer must be <= the size reported by
+ * bio_integrity_tag_size().
+ */
+int bio_integrity_set_tag(struct bio *bio, void *tag_buf, unsigned int len)
+{
+	struct bip *bip = bio->bi_integrity;
+	struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
+	unsigned int nr_sectors;
+
+	BUG_ON(bip->bip_buf == NULL);
+	BUG_ON(bio_data_dir(bio) != WRITE);
+
+	if (bi->tag_size == 0)
+		return -1;
+
+	nr_sectors = len / bi->tag_size;
+
+	if (len % 2)
+		nr_sectors++;
+
+	if (bi->sector_size == 4096)
+		nr_sectors >>= 3;
+
+	if (nr_sectors * bi->tuple_size > bip->bip_size) {
+		printk(KERN_ERR "%s: tag too big for bio: %u > %u\n",
+		       __func__, nr_sectors * bi->tuple_size, bip->bip_size);
+		return -1;
+	}
+
+	bi->set_tag_fn(bip->bip_buf, tag_buf, nr_sectors);
+
+	return 0;
+}
+EXPORT_SYMBOL(bio_integrity_set_tag);
+
+/**
+ * bio_integrity_get_tag - Retrieve a tag buffer from a bio
+ * @bio:	bio to retrieve buffer from
+ * @tag_buf:	Pointer to a buffer for the tag data
+ * @len:	Length of the target buffer
+ *
+ * Description: Use this function to retrieve the tag buffer from a
+ * completed I/O. The size of the integrity buffer must be <= the
+ * size reported by bio_integrity_tag_size().
+ */
+int bio_integrity_get_tag(struct bio *bio, void *tag_buf, unsigned int len)
+{
+	struct bip *bip = bio->bi_integrity;
+	struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
+	unsigned int nr_sectors;
+
+	BUG_ON(bip->bip_buf == NULL);
+	BUG_ON(bio_data_dir(bio) != READ);
+
+	if (bi->tag_size == 0)
+		return -1;
+
+	nr_sectors = len / bi->tag_size;
+
+	if (len % 2)
+		nr_sectors++;
+
+	if (bi->sector_size == 4096)
+		nr_sectors >>= 3;
+
+	if (nr_sectors * bi->tuple_size > bip->bip_size) {
+		printk(KERN_ERR "%s: tag too big for bio: %u > %u\n",
+		       __func__, nr_sectors * bi->tuple_size, bip->bip_size);
+		return -1;
+	}
+
+	bi->get_tag_fn(bip->bip_buf, tag_buf, nr_sectors);
+
+	return 0;
+}
+EXPORT_SYMBOL(bio_integrity_get_tag);
+
+/**
+ * bio_integrity_generate - Generate integrity metadata for a bio
+ * @bio:	bio to generate integrity metadata for
+ *
+ * Description: Generates integrity metadata for a bio by calling the
+ * block device's generation callback function.  The bio must have a
+ * bip attached with enough room to accommodate the generated integrity
+ * metadata.
+ */
+static void bio_integrity_generate(struct bio *bio)
+{
+	struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
+	struct blk_integrity_exchg bix;
+	struct bio_vec *bv;
+	sector_t sector = bio->bi_sector;
+	unsigned int i, sectors, total;
+	void *kaddr, *prot_buf = bio->bi_integrity->bip_buf;
+
+	total = 0;
+	bix.disk_name = bio->bi_bdev->bd_disk->disk_name;
+	bix.sector_size = bi->sector_size;
+
+	bio_for_each_segment(bv, bio, i) {
+		kaddr = kmap_atomic(bv->bv_page, KM_USER0);
+		bix.data_buf = kaddr + bv->bv_offset;
+		bix.data_size = bv->bv_len;
+		bix.prot_buf = prot_buf;
+		bix.sector = sector;
+
+		bi->generate_fn(&bix);
+
+		sectors = bv->bv_len / bi->sector_size;
+		sector += sectors;
+		prot_buf += sectors * bi->tuple_size;
+		total += sectors * bi->tuple_size;
+		BUG_ON(total > bio->bi_integrity->bip_size);
+
+		kunmap_atomic(kaddr, KM_USER0);
+	}
+}
+
+/**
+ * bio_integrity_prep - Prepare bio for integrity I/O
+ * @bio:	bio to prepare
+ *
+ * Description: Allocates a buffer for integrity metadata, maps the
+ * pages and attaches them to a bio.  The bio must have data
+ * direction, target device and start sector set priot to calling.  In
+ * the WRITE case, integrity metadata will be generated using the
+ * block device's integrity function.  In the READ case, the buffer
+ * will be prepared for DMA and a suitable end_io handler set up.
+ */
+int bio_integrity_prep(struct bio *bio)
+{
+	struct bip *bip;
+	struct blk_integrity *bi;
+	struct request_queue *q;
+	void *buf;
+	unsigned long start, end;
+	unsigned int len, nr_pages;
+	unsigned int bytes, offset, i;
+	unsigned int sectors = bio_sectors(bio);
+
+	bi = bdev_get_integrity(bio->bi_bdev);
+	q = bdev_get_queue(bio->bi_bdev);
+	BUG_ON(bi == NULL);
+	BUG_ON(bio_integrity(bio));
+
+	/* Allocate kernel buffer for protection data */
+	len = sectors * blk_integrity_tuple_size(bi);
+	buf = kzalloc(len, GFP_NOIO | q->bounce_gfp);
+	if (unlikely(buf == NULL)) {
+		printk(KERN_ERR "could not allocate integrity buffer\n");
+		return -EIO;
+	}
+
+	end = (((unsigned long) buf) + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	start = ((unsigned long) buf) >> PAGE_SHIFT;
+	nr_pages = end - start;
+
+	/* Allocate bio integrity payload and integrity vectors */
+	bip = bio_integrity_alloc(bio, GFP_NOIO, nr_pages);
+	if (unlikely(bip == NULL)) {
+		printk(KERN_ERR "could not allocate data integrity bioset\n");
+		kfree(buf);
+		return -EIO;
+	}
+
+	bip->bip_buf = buf;
+	bip->bip_size = len;
+	bip->bip_sector = bio->bi_sector;
+
+	/* Map it */
+	offset = offset_in_page(buf);
+	for (i = 0 ; i < nr_pages ; i++) {
+		int ret;
+		bytes = PAGE_SIZE - offset;
+
+		if (len <= 0)
+			break;
+
+		if (bytes > len)
+			bytes = len;
+
+		ret = bio_integrity_add_page(bio, virt_to_page(buf),
+					     bytes, offset);
+
+		if (ret == 0)
+			return 0;
+
+		if (ret < bytes)
+			break;
+
+		buf += bytes;
+		len -= bytes;
+		offset = 0;
+	}
+
+	/* Install custom I/O completion handler if read verify is enabled */
+	if (bio_data_dir(bio) == READ) {
+		bip->bip_end_io = bio->bi_end_io;
+		bio->bi_end_io = bio_integrity_endio;
+	}
+
+	/* Auto-generate integrity metadata if this is a write */
+	if (bio_data_dir(bio) == WRITE)
+		bio_integrity_generate(bio);
+
+	return 0;
+}
+EXPORT_SYMBOL(bio_integrity_prep);
+
+/**
+ * bio_integrity_verify - Verify integrity metadata for a bio
+ * @bio:	bio to verify
+ *
+ * Description: This function is called to verify the integrity of a
+ * bio.  The data in the bio io_vec is compared to the integrity
+ * metadata returned by the HBA.
+ */
+static int bio_integrity_verify(struct bio *bio)
+{
+	struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
+	struct blk_integrity_exchg bix;
+	struct bio_vec *bv;
+	sector_t sector = bio->bi_integrity->bip_sector;
+	unsigned int i, sectors, total, ret;
+	void *kaddr, *prot_buf = bio->bi_integrity->bip_buf;
+
+	total = 0;
+	bix.disk_name = bio->bi_bdev->bd_disk->disk_name;
+	bix.sector_size = bi->sector_size;
+
+	bio_for_each_segment(bv, bio, i) {
+		kaddr = kmap_atomic(bv->bv_page, KM_USER0);
+		bix.data_buf = kaddr + bv->bv_offset;
+		bix.data_size = bv->bv_len;
+		bix.prot_buf = prot_buf;
+		bix.sector = sector;
+
+		ret = bi->verify_fn(&bix);
+
+		if (ret) {
+			kunmap_atomic(kaddr, KM_USER0);
+			return ret;
+		}
+
+		sectors = bv->bv_len / bi->sector_size;
+		sector += sectors;
+		prot_buf += sectors * bi->tuple_size;
+		total += sectors * bi->tuple_size;
+		BUG_ON(total > bio->bi_integrity->bip_size);
+
+		kunmap_atomic(kaddr, KM_USER0);
+	}
+
+	return 0;
+}
+
+/**
+ * bio_integrity_verify_fn - Integrity I/O completion worker
+ * @work:	Work struct stored in bio to be verified
+ *
+ * Description: This workqueue function is called to complete a READ
+ * request.  The function verifies the transferred integrity metadata
+ * and then calls the original bio end_io function.
+ */
+static void bio_integrity_verify_fn(struct work_struct *work)
+{
+	struct bip *bip = container_of(work, struct bip, bip_work);
+	struct bio *bio = bip->bip_bio;
+	int error = bip->bip_error;
+
+	if (bio_integrity_verify(bio)) {
+		clear_bit(BIO_UPTODATE, &bio->bi_flags);
+		error = -EIO;
+	}
+
+	/* Restore original bio completion handler */
+	bio->bi_end_io = bip->bip_end_io;
+
+	if (bio->bi_end_io)
+		bio->bi_end_io(bio, error);
+}
+
+/**
+ * bio_integrity_endio - Integrity I/O completion function
+ * @bio:	Protected bio
+ * @error:	errno-style I/O completion status
+ *
+ * Description: Completion for integrity I/O
+ *
+ * Normally I/O completion is done in interrupt context.  However,
+ * verifying I/O integrity is a time-consuming task which must be run
+ * in process context.  This function postpones completion
+ * accordingly.
+ */
+void bio_integrity_endio(struct bio *bio, int error)
+{
+	struct bip *bip = bio->bi_integrity;
+
+	BUG_ON(bip->bip_bio != bio);
+
+	bip->bip_error = error;
+	INIT_WORK(&bip->bip_work, bio_integrity_verify_fn);
+	queue_work(kintegrityd_wq, &bip->bip_work);
+}
+EXPORT_SYMBOL(bio_integrity_endio);
+
+/**
+ * bio_integrity_advance - Advance integrity vector
+ * @bio:	bio whose integrity vector to update
+ * @bytes_done:	number of data bytes that have been completed
+ *
+ * Description: This function calculates how many integrity bytes the
+ * number of completed data bytes correspond to and advances the
+ * integrity vector accordingly.
+ */
+void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
+{
+	struct bip *bip = bio->bi_integrity;
+	struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
+	struct bio_vec *iv;
+	unsigned int i, skip, nr_sectors;
+
+	BUG_ON(bip == NULL);
+	BUG_ON(bi == NULL);
+
+	nr_sectors = bytes_done >> 9;
+
+	if (bi->sector_size == 4096)
+		nr_sectors >>= 3;
+
+	skip = nr_sectors * bi->tuple_size;
+
+	bip_for_each_vec(iv, bip, i) {
+		if (skip == 0) {
+			bip->bip_idx = i;
+			return;
+		} else if (skip >= iv->bv_len) {
+			skip -= iv->bv_len;
+		} else { /* skip < iv->bv_len */
+			iv->bv_offset += skip;
+			iv->bv_len -= skip;
+			bip->bip_idx = i;
+			return;
+		}
+	}
+}
+EXPORT_SYMBOL(bio_integrity_advance);
+
+/**
+ * bio_integrity_trim - Trim integrity vector
+ * @bio:	bio whose integrity vector to update
+ * @offset:	offset to first data sector
+ * @sectors:	number of data sectors
+ *
+ * Description: Used to trim the integrity vector in a cloned bio.
+ * The ivec will be advanced corresponding to 'offset' data sectors
+ * and the length will be truncated corresponding to 'sectors' data
+ * sectors.
+ */
+void bio_integrity_trim(struct bio *bio, unsigned int offset, unsigned int sectors)
+{
+	struct bip *bip = bio->bi_integrity;
+	struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
+	struct bio_vec *iv;
+	unsigned int i, skip, nr_bytes;
+
+	BUG_ON(bip == NULL);
+	BUG_ON(bi == NULL);
+	BUG_ON(!bio_flagged(bio, BIO_CLONED));
+
+	if (bi->sector_size == 4096)
+		sectors >>= 3;
+
+	bip->bip_sector = bip->bip_sector + offset;
+	skip = offset * bi->tuple_size;
+	nr_bytes = sectors * bi->tuple_size;
+
+	/* Mark head */
+	bip_for_each_vec(iv, bip, i) {
+		if (skip == 0) {
+			bip->bip_idx = i;
+			break;
+		} else if (skip >= iv->bv_len) {
+			skip -= iv->bv_len;
+		} else { /* skip < iv->bv_len */
+			iv->bv_offset += skip;
+			iv->bv_len -= skip;
+			bip->bip_idx = i;
+			break;
+		}
+	}
+
+	/* Mark tail */
+	bip_for_each_vec(iv, bip, i) {
+		if (nr_bytes == 0) {
+			bip->bip_vcnt = i;
+			break;
+		} else if (nr_bytes >= iv->bv_len) {
+			nr_bytes -= iv->bv_len;
+		} else { /* nr_bytes < iv->bv_len */
+			iv->bv_len = nr_bytes;
+			nr_bytes = 0;
+		}
+	}
+}
+EXPORT_SYMBOL(bio_integrity_trim);
+
+/**
+ * bio_integrity_split - Split integrity metadata
+ * @bio:	Protected bio
+ * @bp:		Resulting bio_pair
+ * @sectors:	Number of sectors in the first bio of the pair
+ *
+ * Description: Splits an integrity page into a bio_pair.
+ */
+void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
+{
+	struct blk_integrity *bi;
+	struct bip *bip = bio->bi_integrity;
+
+	if (bio_integrity(bio) == 0)
+		return;
+
+	bi = bdev_get_integrity(bio->bi_bdev);
+	BUG_ON(bi == NULL);
+	BUG_ON(bip->bip_vcnt != 1);
+
+	if (bi->sector_size == 4096)
+		sectors >>= 3;
+
+	bp->bio1.bi_integrity = &bp->bip1;
+	bp->bio2.bi_integrity = &bp->bip2;
+
+	bp->iv1 = bip->bip_vec[0];
+	bp->iv2 = bip->bip_vec[0];
+
+	bp->bip1.bip_vec = &bp->iv1;
+	bp->bip2.bip_vec = &bp->iv2;
+
+	bp->iv1.bv_len = sectors * bi->tuple_size;
+	bp->iv2.bv_offset += sectors * bi->tuple_size;
+	bp->iv2.bv_len -= sectors * bi->tuple_size;
+
+	bp->bip1.bip_sector = bio->bi_integrity->bip_sector;
+	bp->bip2.bip_sector = bio->bi_integrity->bip_sector + sectors;
+
+	bp->bip1.bip_vcnt = bp->bip2.bip_vcnt = 1;
+	bp->bip1.bip_idx = bp->bip2.bip_idx = 0;
+}
+EXPORT_SYMBOL(bio_integrity_split);
+
+/**
+ * bio_integrity_clone - Callback for cloning bios with integrity metadata
+ * @bio:	New bio
+ * @bio_src:	Original bio
+ * @bs:		bio_set to allocate bip from
+ *
+ * Description:	Called to allocate a bip when cloning a bio
+ */
+int bio_integrity_clone(struct bio *bio, struct bio *bio_src, struct bio_set *bs)
+{
+	struct bip *bip_src = bio_src->bi_integrity;
+	struct bip *bip;
+
+	BUG_ON(bip_src == NULL);
+
+	bip = bio_integrity_alloc_bioset(bio, GFP_NOIO, bip_src->bip_vcnt, bs);
+
+	if (bip == NULL)
+		return -EIO;
+
+	memcpy(bip->bip_vec, bip_src->bip_vec,
+	       bip_src->bip_vcnt * sizeof(struct bio_vec));
+
+	bip->bip_sector = bip_src->bip_sector;
+	bip->bip_vcnt = bip_src->bip_vcnt;
+	bip->bip_idx = bip_src->bip_idx;
+
+	return 0;
+}
+EXPORT_SYMBOL(bio_integrity_clone);
+
+int bioset_integrity_create(struct bio_set *bs, int pool_size)
+{
+	bs->bio_integrity_pool = mempool_create_slab_pool(pool_size,
+							  bio_integrity_slab);
+	if (!bs->bio_integrity_pool)
+		return -1;
+
+	return 0;
+}
+EXPORT_SYMBOL(bioset_integrity_create);
+
+void bioset_integrity_free(struct bio_set *bs)
+{
+	if (bs->bio_integrity_pool)
+		mempool_destroy(bs->bio_integrity_pool);
+}
+EXPORT_SYMBOL(bioset_integrity_free);
+
+void __init bio_integrity_init_slab(void)
+{
+	bio_integrity_slab = KMEM_CACHE(bip, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
+}
+EXPORT_SYMBOL(bio_integrity_init_slab);
+
+static int __init integrity_init(void)
+{
+	kintegrityd_wq = create_workqueue("kintegrityd");
+
+	if (!kintegrityd_wq)
+		panic("Failed to create kintegrityd\n");
+
+	return 0;
+}
+subsys_initcall(integrity_init);
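bio_integrity_prep() sizes the integrity vector by counting the pages that the kzalloc()ed protection buffer touches, then maps the buffer one page-sized chunk at a time. The arithmetic can be checked in userspace with a minimal sketch, assuming 4 KiB pages; integrity_nr_pages() and integrity_split_pages() are hypothetical helpers, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)

/* Pages touched by [addr, addr + len), computed as in bio_integrity_prep(). */
static unsigned long integrity_nr_pages(uintptr_t addr, unsigned long len)
{
	unsigned long end = (addr + len + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
	unsigned long start = addr >> SKETCH_PAGE_SHIFT;

	return end - start;
}

/*
 * Mirror of the mapping loop: the first chunk starts at the in-page
 * offset, later chunks start at offset 0, and the final chunk is
 * clipped to the bytes that remain.  Returns the number of chunks
 * written to 'chunks' (at most 'max').
 */
static unsigned int integrity_split_pages(uintptr_t addr, unsigned long len,
					  unsigned long *chunks, unsigned int max)
{
	unsigned long offset = addr & (SKETCH_PAGE_SIZE - 1);
	unsigned int n = 0;

	while (len > 0) {
		unsigned long bytes = SKETCH_PAGE_SIZE - offset;

		if (bytes > len)
			bytes = len;

		assert(n < max);
		chunks[n++] = bytes;

		len -= bytes;
		offset = 0;
	}

	return n;
}
```

A 32-byte buffer (four sectors' worth of 8-byte tuples) that starts 16 bytes before a page boundary touches two pages and is mapped as two 16-byte chunks, which is why nr_pages is derived from the buffer's address range rather than from len alone.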
diff -r 5d30928a2730 -r c4c47b2f1539 fs/bio.c
--- a/fs/bio.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/fs/bio.c	Fri Apr 25 17:39:29 2008 -0400
@@ -96,6 +96,9 @@
 
 		mempool_free(bio->bi_io_vec, bio_set->bvec_pools[pool_idx]);
 	}
+
+	if (bio_integrity(bio))
+		bio_integrity_free(bio, bio_set);
 
 	mempool_free(bio, bio_set->bio_pool);
 }
@@ -255,9 +258,19 @@
 {
 	struct bio *b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, fs_bio_set);
 
-	if (b) {
-		b->bi_destructor = bio_fs_destructor;
-		__bio_clone(b, bio);
+	if (!b)
+		return NULL;
+
+	b->bi_destructor = bio_fs_destructor;
+	__bio_clone(b, bio);
+
+	if (bio_integrity(bio)) {
+		int ret = bio_integrity_clone(b, bio, fs_bio_set);
+
+		if (ret < 0) {
+			bio_put(b);
+			return NULL;
+		}
 	}
 
 	return b;
@@ -1139,6 +1152,9 @@
 	bp->bio1.bi_private = bi;
 	bp->bio2.bi_private = pool;
 
+	if (bio_integrity(bi))
+		bio_integrity_split(bi, bp, first_sectors);
+
 	return bp;
 }
 
@@ -1204,6 +1220,7 @@
 	if (bs->bio_pool)
 		mempool_destroy(bs->bio_pool);
 
+	bioset_integrity_free(bs);
 	biovec_free_pools(bs);
 
 	kfree(bs);
@@ -1218,6 +1235,9 @@
 
 	bs->bio_pool = mempool_create_slab_pool(bio_pool_size, bio_slab);
 	if (!bs->bio_pool)
+		goto bad;
+
+	if (bioset_integrity_create(bs, bio_pool_size))
 		goto bad;
 
 	if (!biovec_create_pools(bs, bvec_pool_size))
@@ -1246,6 +1266,7 @@
 {
 	bio_slab = KMEM_CACHE(bio, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
 
+	bio_integrity_init_slab();
 	biovec_init_slabs();
 
 	fs_bio_set = bioset_create(BIO_POOL_SIZE, 2);
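bio_split() now passes first_sectors to bio_integrity_split(), which copies the single integrity vector (the patch BUG()s unless bip_vcnt == 1) into both halves of the bio_pair and moves the boundary by that many tuples. A userspace model of the vector arithmetic, assuming a simplified struct ivec in place of struct bio_vec; split_ivec() is hypothetical.

```c
#include <assert.h>

/* Simplified stand-in for struct bio_vec: only offset and length matter. */
struct ivec {
	unsigned int bv_offset;
	unsigned int bv_len;
};

/*
 * Model of the vector arithmetic in bio_integrity_split(): both halves
 * start as copies of the source vector, the first is cut to 'sectors'
 * tuples and the second starts right after them.
 */
static void split_ivec(const struct ivec *src, unsigned int sectors,
		       unsigned int sector_size, unsigned int tuple_size,
		       struct ivec *iv1, struct ivec *iv2)
{
	unsigned int split;

	/* 'sectors' counts 512-byte units; convert for 4 KiB devices. */
	if (sector_size == 4096)
		sectors >>= 3;

	split = sectors * tuple_size;
	assert(split <= src->bv_len);

	*iv1 = *src;
	*iv2 = *src;

	iv1->bv_len = split;
	iv2->bv_offset += split;
	iv2->bv_len -= split;
}
```

Splitting a 64-byte vector at offset 100 after three 512-byte sectors with 8-byte tuples gives {100, 24} and {124, 40}. On a 4 KiB-sector device the 512-byte sector count is first shifted right by 3, matching the patch.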
diff -r 5d30928a2730 -r c4c47b2f1539 include/linux/bio.h
--- a/include/linux/bio.h	Fri Apr 25 17:39:29 2008 -0400
+++ b/include/linux/bio.h	Fri Apr 25 17:39:29 2008 -0400
@@ -64,6 +64,7 @@
 
 struct bio_set;
 struct bio;
+struct bip;
 typedef void (bio_end_io_t) (struct bio *, int);
 typedef void (bio_destructor_t) (struct bio *);
 
@@ -112,6 +113,9 @@
 	atomic_t		bi_cnt;		/* pin count */
 
 	void			*bi_private;
+#if defined(CONFIG_BLK_DEV_INTEGRITY)
+	struct bip		*bi_integrity;  /* data integrity */
+#endif
 
 	bio_destructor_t	*bi_destructor;	/* destructor */
 };
@@ -271,6 +275,29 @@
  */
 #define bio_get(bio)	atomic_inc(&(bio)->bi_cnt)
 
+#if defined(CONFIG_BLK_DEV_INTEGRITY)
+/*
+ * bio integrity payload
+ */
+struct bip {
+	struct bio		*bip_bio;	/* parent bio */
+	struct bio_vec		*bip_vec;	/* integrity data vector */
+
+	sector_t		bip_sector;	/* virtual start sector */
+
+	void			*bip_buf;	/* generated integrity data */
+	bio_end_io_t		*bip_end_io;	/* saved I/O completion fn */
+
+	int			bip_error;	/* saved I/O error */
+	unsigned int		bip_size;
+
+	unsigned short		bip_pool;	/* pool the ivec came from */
+	unsigned short		bip_vcnt;	/* # of integrity bio_vecs */
+	unsigned short		bip_idx;	/* current bip_vec index */
+
+	struct work_struct	bip_work;	/* I/O completion */
+};
+#endif /* CONFIG_BLK_DEV_INTEGRITY */
 
 /*
  * A bio_pair is used when we need to split a bio.
@@ -285,6 +312,10 @@
 struct bio_pair {
 	struct bio	bio1, bio2;
 	struct bio_vec	bv1, bv2;
+#if defined(CONFIG_BLK_DEV_INTEGRITY)
+	struct bip	bip1, bip2;
+	struct bio_vec	iv1, iv2;
+#endif
 	atomic_t	cnt;
 	int		error;
 };
@@ -347,6 +378,9 @@
 
 struct bio_set {
 	mempool_t *bio_pool;
+#if defined(CONFIG_BLK_DEV_INTEGRITY)
+	mempool_t *bio_integrity_pool;
+#endif
 	mempool_t *bvec_pools[BIOVEC_NR_POOLS];
 };
 
@@ -411,5 +445,56 @@
 	__bio_kmap_irq((bio), (bio)->bi_idx, (flags))
 #define bio_kunmap_irq(buf,flags)	__bio_kunmap_irq(buf, flags)
 
+#if defined(CONFIG_BLK_DEV_INTEGRITY)
+
+#define bip_vec_idx(bip, idx)	(&(bip->bip_vec[(idx)]))
+#define bip_vec(bip)		bip_vec_idx(bip, 0)
+
+#define __bip_for_each_vec(bvl, bip, i, start_idx)			\
+	for (bvl = bip_vec_idx((bip), (start_idx)), i = (start_idx);	\
+	     i < (bip)->bip_vcnt;					\
+	     bvl++, i++)
+
+#define bip_for_each_vec(bvl, bip, i)					\
+	__bip_for_each_vec(bvl, bip, i, (bip)->bip_idx)
+
+#define bio_integrity(bio)	((bio)->bi_integrity ? 1 : 0)
+
+extern struct bip *bio_integrity_alloc_bioset(struct bio *, gfp_t, unsigned int, struct bio_set *);
+extern struct bip *bio_integrity_alloc(struct bio *, gfp_t, unsigned int);
+extern void bio_integrity_free(struct bio *, struct bio_set *);
+extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int);
+extern inline int bio_integrity_enabled(struct bio *bio);
+extern int bio_integrity_set_tag(struct bio *, void *, unsigned int);
+extern int bio_integrity_get_tag(struct bio *, void *, unsigned int);
+extern int bio_integrity_prep(struct bio *);
+extern void bio_integrity_endio(struct bio *, int);
+extern void bio_integrity_advance(struct bio *, unsigned int);
+extern void bio_integrity_trim(struct bio *, unsigned int, unsigned int);
+extern void bio_integrity_split(struct bio *, struct bio_pair *, int);
+extern int bio_integrity_clone(struct bio *, struct bio *, struct bio_set *);
+extern int bioset_integrity_create(struct bio_set *, int);
+extern void bioset_integrity_free(struct bio_set *);
+extern void bio_integrity_init_slab(void);
+
+#else /* CONFIG_BLK_DEV_INTEGRITY */
+
+#define bio_integrity(a)		(0)
+#define bioset_integrity_create(a, b)	(0)
+#define bio_integrity_prep(a)		(0)
+#define bio_integrity_enabled(a)	(0)
+#define bio_integrity_clone(a, b, c)	(0)
+#define bioset_integrity_free(a)	do { } while (0)
+#define bio_integrity_free(a, b)	do { } while (0)
+#define bio_integrity_endio(a, b)	do { } while (0)
+#define bio_integrity_advance(a, b)	do { } while (0)
+#define bio_integrity_trim(a, b, c)	do { } while (0)
+#define bio_integrity_split(a, b, c)	do { } while (0)
+#define bio_integrity_set_tag(a, b, c)	do { } while (0)
+#define bio_integrity_get_tag(a, b, c)	do { } while (0)
+#define bio_integrity_init_slab(a)	do { } while (0)
+
+#endif /* CONFIG_BLK_DEV_INTEGRITY */
+
 #endif /* CONFIG_BLOCK */
 #endif /* __LINUX_BIO_H */
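Because bip_for_each_vec() starts at bip_idx rather than at 0, bio_integrity_advance() can resume partway through the integrity vector after a partial completion. The model below copies the iterator macros from this hunk; the trimmed-down struct bio_vec and struct bip and the function bip_advance_model() are simplified userspace stand-ins, not the kernel definitions.

```c
#include <assert.h>

/* Simplified stand-ins: only the fields the iterator needs. */
struct bio_vec {
	unsigned int bv_offset;
	unsigned int bv_len;
};

struct bip {
	struct bio_vec *bip_vec;
	unsigned short bip_vcnt;
	unsigned short bip_idx;
};

/* Iterator macros as defined in the bio.h hunk. */
#define bip_vec_idx(bip, idx)	(&(bip->bip_vec[(idx)]))

#define __bip_for_each_vec(bvl, bip, i, start_idx)			\
	for (bvl = bip_vec_idx((bip), (start_idx)), i = (start_idx);	\
	     i < (bip)->bip_vcnt;					\
	     bvl++, i++)

#define bip_for_each_vec(bvl, bip, i)					\
	__bip_for_each_vec(bvl, bip, i, (bip)->bip_idx)

/* Model of bio_integrity_advance(): skip tuples for completed data bytes. */
static void bip_advance_model(struct bip *bip, unsigned int bytes_done,
			      unsigned int sector_size, unsigned int tuple_size)
{
	struct bio_vec *iv;
	unsigned int i, skip, nr_sectors = bytes_done >> 9;

	assert(bip->bip_idx <= bip->bip_vcnt);

	if (sector_size == 4096)
		nr_sectors >>= 3;

	skip = nr_sectors * tuple_size;

	bip_for_each_vec(iv, bip, i) {
		if (skip == 0) {
			bip->bip_idx = i;
			return;
		} else if (skip >= iv->bv_len) {
			skip -= iv->bv_len;	/* vector fully consumed */
		} else {
			iv->bv_offset += skip;	/* trim partially consumed vector */
			iv->bv_len -= skip;
			bip->bip_idx = i;
			return;
		}
	}
}
```

Completing three 512-byte sectors against two 16-byte integrity vectors skips 24 bytes of 8-byte tuples: the first vector is consumed entirely, and the second is trimmed to 8 bytes at offset 8 with bip_idx set to 1.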




* [PATCH 05 of 16] Block layer data integrity
  2008-04-25 23:12 [PATCH 00 of 16] Block/SCSI Data Integrity Support Martin K. Petersen
                   ` (3 preceding siblings ...)
  2008-04-25 23:12 ` [PATCH 04 of 16] Block layer data integrity Martin K. Petersen
@ 2008-04-25 23:12 ` Martin K. Petersen
  2008-05-06 20:29   ` malahal
  2008-04-25 23:12 ` [PATCH 06 of 16] Detect devices with protection information turned on in INQUIRY Martin K. Petersen
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 23+ messages in thread
From: Martin K. Petersen @ 2008-04-25 23:12 UTC
  To: linux-scsi

10 files changed, 849 insertions(+)
Documentation/block/data-integrity.txt |  327 +++++++++++++++++++++++++++
block/Kconfig                          |   12 
block/Makefile                         |    1 
block/blk-core.c                       |    7 
block/blk-integrity.c                  |  385 ++++++++++++++++++++++++++++++++
block/blk-merge.c                      |    3 
block/blk.h                            |    8 
block/elevator.c                       |    6 
include/linux/blkdev.h                 |   97 ++++++++
include/linux/genhd.h                  |    3 


Support for merging and mapping bio integrity metadata.

Block device integrity type registration.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

diff -r c4c47b2f1539 -r 26ccaf2ccdc5 Documentation/block/data-integrity.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Documentation/block/data-integrity.txt	Fri Apr 25 17:39:29 2008 -0400
@@ -0,0 +1,327 @@
+----------------------------------------------------------------------
+1. INTRODUCTION
+
+Modern filesystems feature checksumming of data and metadata to
+protect against data corruption.  However, the detection of the
+corruption is done at read time, which could potentially be months
+after the data was written.  At that point the original data that the
+application tried to write is most likely lost.
+
+The solution is to ensure that the disk is actually storing what the
+application meant it to.  Recent additions to both the SCSI family of
+protocols (SBC Data Integrity Field, SCC protection proposal) and
+SATA/T13 (External Path Protection) try to remedy this by adding
+support for appending integrity metadata to an I/O.  The integrity
+metadata (or protection information in SCSI terminology) includes a
+checksum for each sector as well as an incrementing counter that
+ensures the individual sectors are written in the right order.  For
+some protection schemes, it also ensures that the I/O is written to
+the right place on disk.
+
+Current storage controllers and devices implement various protective
+measures, for instance checksumming and scrubbing.  But these
+technologies are working in their own isolated domains or at best
+between adjacent nodes in the I/O path.  The interesting thing about
+DIF and the other integrity extensions is that the protection format
+is well defined and every node in the I/O path can verify the
+integrity of the I/O and reject it if corruption is detected.  This
+allows not only corruption prevention but also isolation of the point
+of failure.
+
+----------------------------------------------------------------------
+2. THE DATA INTEGRITY EXTENSIONS
+
+As written, the protocol extensions only protect the path between
+controller and storage device.  However, many controllers actually
+allow the operating system to interact with the integrity metadata
+(IMD).  We have been working with several FC/SAS HBA vendors to enable
+the protection information to be transferred to and from their
+controllers.
+
+The SCSI Data Integrity Field works by appending 8 bytes of protection
+information to each sector.  The data + integrity metadata is stored
+in 520-byte sectors on disk.  Data + IMD are interleaved when
+transferred between the controller and target.  The T13 proposal is
+similar.
+
+Because it is highly inconvenient for operating systems to deal with
+520 (and 4104) byte sectors, we approached several HBA vendors and
+encouraged them to allow separation of the data and integrity metadata
+scatter-gather lists.
+
+The controller will interleave the buffers on write and split them on
+read.  This means that Linux can DMA the data buffers to and from
+host memory without changes to the page cache.
+
+Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs
+is somewhat heavy to compute in software.  Benchmarks found that
+calculating this checksum had a significant impact on system
+performance for a number of workloads.  Some controllers allow a
+lighter-weight checksum to be used when interfacing with the operating
+system.  Emulex, for instance, supports the TCP/IP checksum instead.
+The IP checksum received from the OS is converted to the 16-bit CRC
+when writing and vice versa.  This allows the integrity metadata to be
+generated by Linux or the application at very low cost (comparable to
+software RAID5).
+
+The IP checksum is weaker than the CRC in terms of detecting bit
+errors.  However, the strength is really in the separation of the data
+buffers and the integrity metadata.  These two distinct buffers must
+match up for an I/O to complete.
+
+The separation of the data and integrity metadata buffers as well as
+the choice in checksums is referred to as the Data Integrity
+Extensions.  As these extensions are outside the scope of the protocol
+bodies (T10, T13), Oracle and its partners are trying to standardize
+them within the Storage Networking Industry Association.
+
+----------------------------------------------------------------------
+3. KERNEL CHANGES
+
+The data integrity framework in Linux enables protection information
+to be pinned to I/Os and sent to/received from controllers that
+support it.
+
+The advantage of the integrity extensions in SCSI and SATA is that
+they enable us to protect the entire path from application to storage
+device.  However, at the same time this is also the biggest
+disadvantage. It means that the protection information must be in a
+format that can be understood by the disk.
+
+Generally Linux/POSIX applications are agnostic to the intricacies of
+the storage devices they are accessing.  The virtual filesystem switch
+and the block layer make things like hardware sector size and
+transport protocols completely transparent to the application.
+
+However, this level of detail is required when preparing the
+protection information to send to a disk.  Consequently, the very
+concept of an end-to-end protection scheme is a layering violation.
+It is completely unreasonable to expect an application to know
+whether it is accessing a SCSI or SATA disk.
+
+The data integrity support implemented in Linux attempts to hide this
+from the application.  As far as the application (and to some extent
+the kernel) is concerned, the integrity metadata is opaque information
+that's attached to the I/O.
+
+The current implementation allows the block layer to automatically
+generate the protection information for any I/O.  Eventually the
+intent is to move the integrity metadata calculation to userspace for
+user data.  Metadata and other I/O that originates within the kernel
+will still use the automatic generation interface.
+
+Some storage devices allow each hardware sector to be tagged with a
+16-bit value.  The owner of this tag space is the owner of the block
+device, i.e. the filesystem in most cases.  The filesystem can use
+this extra space to tag sectors as it sees fit.  Because the tag
+space is limited, the block interface allows tagging bigger chunks by
+way of interleaving.  This way, 8*16 bits (16 bytes) of information
+can be attached to a typical 4KB filesystem block (8 sectors).
+
+This also means that applications such as fsck and mkfs will need
+access to manipulate the tags from user space.  A passthrough
+interface for this is being worked on.
+
+
+----------------------------------------------------------------------
+4. BLOCK LAYER IMPLEMENTATION DETAILS
+
+4.1 BIO
+
+The data integrity patches add a new field to struct bio when
+CONFIG_BLK_DEV_INTEGRITY is enabled.  bio->bi_integrity is a pointer
+to a struct bip which contains the bio integrity payload.  Essentially
+a bip is a trimmed down struct bio which holds a bio_vec containing
+the integrity metadata and the required housekeeping information (bvec
+pool, vector count, etc.)
+
+A kernel subsystem can enable data integrity protection on a bio by
+calling bio_integrity_alloc(bio).  This will allocate and attach the
+bip to the bio.
+
+Individual pages containing integrity metadata can subsequently be
+attached using bio_integrity_add_page().
+
+bio_free() will automatically free the bip.
+
+
+4.2 BLOCK DEVICE
+
+Because the format of the protection data is tied to the physical
+disk, each block device has been extended with a block integrity
+profile (struct blk_integrity).  This optional profile is registered
+with the block layer using blk_integrity_register().
+
+The profile contains callback functions for generating and verifying
+the protection data, as well as getting and setting application tags.
+The profile also contains a few constants to aid in completing,
+merging and splitting the integrity metadata.
+
+Layered block devices will need to pick a profile that's appropriate
+for all subdevices.  blk_integrity_compare() can help with that.  DM
+and MD linear, RAID0 and RAID1 are currently supported.  RAID4/5/6
+will require extra work due to the application tag.
+
+
+----------------------------------------------------------------------
+5. BLOCK LAYER INTEGRITY API
+
+5.1 NORMAL FILESYSTEM
+
+    The normal filesystem is unaware that the underlying block device
+    is capable of sending/receiving integrity metadata.  The IMD will
+    be automatically generated by the block layer at submit_bio() time
+    in case of a WRITE.  A READ request will cause the I/O integrity
+    to be verified upon completion.
+
+    IMD generation and verification can be toggled using the
+
+      /sys/class/block/<bdev>/integrity/write_generate
+
+    and
+
+      /sys/class/block/<bdev>/integrity/read_verify
+
+    flags.
+
+
+5.2 INTEGRITY-AWARE FILESYSTEM
+
+    A filesystem that is integrity-aware can prepare I/Os with IMD
+    attached.  It can also use the application tag space if this is
+    supported by the block device.
+
+
+    int bdev_integrity_enabled(block_device, int rw);
+
+      bdev_integrity_enabled() will return 1 if the block device
+      supports integrity metadata transfer for the data direction
+      specified in 'rw'.
+
+      bdev_integrity_enabled() honors the write_generate and
+      read_verify flags in sysfs and will respond accordingly.
+
+
+    int bio_integrity_prep(bio);
+
+      To generate IMD for WRITE and to set up buffers for READ, the
+      filesystem must call bio_integrity_prep(bio).
+
+      Prior to calling this function, the bio data direction and start
+      sector must be set, and the bio should have all data pages
+      added.  It is up to the caller to ensure that the bio does not
+      change while I/O is in progress.
+
+      bio_integrity_prep() should only be called if
+      bio_integrity_enabled() returned 1.
+
+
+    int bio_integrity_tag_size(bio);
+
+      If the filesystem wants to use the application tag space it will
+      first have to find out how much storage space is available.
+      Because tag space is generally limited (usually 2 bytes per
+      sector regardless of sector size), the integrity framework
+      supports interleaving the information between the sectors in an
+      I/O.
+
+      Filesystems can call bio_integrity_tag_size(bio) to find out how
+      many bytes of storage are available for that particular bio.
+
+      Another option is bdev_get_tag_size(block_device) which will
+      return the number of available bytes per hardware sector.
+
+
+    int bio_integrity_set_tag(bio, void *tag_buf, len);
+
+      After a successful return from bio_integrity_prep(),
+      bio_integrity_set_tag() can be used to attach an opaque tag
+      buffer to a bio.  Obviously this only makes sense if the I/O is
+      a WRITE.
+
+
+    int bio_integrity_get_tag(bio, void *tag_buf, len);
+
+      Similarly, at READ I/O completion time the filesystem can
+      retrieve the tag buffer using bio_integrity_get_tag().
+
+
+5.3 PASSING EXISTING INTEGRITY METADATA
+
+    Filesystems that either generate their own integrity metadata or
+    are capable of transferring IMD from user space can use the
+    following calls:
+
+
+    struct bip * bio_integrity_alloc(bio, gfp_mask, nr_pages);
+
+      Allocates the bio integrity payload and hangs it off of the bio.
+      nr_pages indicates how many pages of protection data need to be
+      stored in the integrity bio_vec list (similar to bio_alloc()).
+
+      The integrity payload will be freed at bio_free() time.
+
+
+    int bio_integrity_add_page(bio, page, len, offset);
+
+      Attaches a page containing integrity metadata to an existing
+      bio.  The bio must have an existing bip,
+      i.e. bio_integrity_alloc() must have been called.  For a WRITE,
+      the integrity metadata in the pages must be in a format
+      understood by the target device with the notable exception that
+      the sector numbers will be remapped as the request traverses the
+      I/O stack.  This implies that the pages added using this call
+      will be modified during I/O!  The first reference tag in the
+      integrity metadata must have a value of bip->bip_sector.
+
+      Pages can be added using bio_integrity_add_page() as long as
+      there is room in the bip bio_vec array (nr_pages).
+
+      Upon completion of a READ operation, the attached pages will
+      contain the integrity metadata received from the storage device.
+      It is up to the receiver to process them and verify data
+      integrity upon completion.
+
+
+5.4 REGISTERING A BLOCK DEVICE AS CAPABLE OF EXCHANGING INTEGRITY
+    METADATA
+
+    To enable integrity exchange on a block device the gendisk must be
+    registered as capable:
+
+    int blk_integrity_register(gendisk, blk_integrity);
+
+      The blk_integrity struct is a template and should contain the
+      following:
+
+        static struct blk_integrity my_profile = {
+            .name                   = "STANDARDSBODY-TYPE-VARIANT",
+            .generate_fn            = my_generate_fn,
+            .verify_fn              = my_verify_fn,
+            .get_tag_fn             = my_get_tag_fn,
+            .set_tag_fn             = my_set_tag_fn,
+            .tuple_size             = sizeof(struct my_tuple),
+            .tag_size               = <tag bytes per hw sector>,
+        };
+
+      'name' is a text string which will be visible in sysfs.  This is
+      part of the userland API so choose it carefully and never change
+      it.  The format is standards body-type-variant.  E.g. T10-DIF-IP
+      or T13-EPP-CRC.
+
+      'generate_fn' generates appropriate integrity metadata (for WRITE).
+
+      'verify_fn' verifies that the data buffer matches the integrity
+      metadata.
+
+      'tuple_size' must be set to match the size of the integrity
+      metadata per sector.  I.e. 8 for DIF and EPP.
+
+      'tag_size' must be set to identify how many bytes of tag space
+      are available per hardware sector.  For DIF this is either 2 or
+      0 depending on the value of the Control Mode Page ATO bit.
+
+      See 5.2 for a description of get_tag_fn and set_tag_fn.
+
+----------------------------------------------------------------------
+2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>
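Section 2 of the documentation notes that the 16-bit CRC mandated for the guard tag is expensive in software. A minimal bit-at-a-time sketch using the T10 DIF polynomial 0x8BB7 (initial value 0, MSB first, no final XOR) shows why: every data byte costs eight shift/conditional-XOR steps, so a 512-byte sector takes 4096 of them. crc16_t10dif() is a hypothetical name here; production code typically uses a lookup table or hardware support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY	0x8BB7	/* CRC-16 polynomial used by T10 DIF */

/*
 * Bit-at-a-time CRC-16: initial value 0, MSB first, no reflection,
 * no final XOR.  Each byte is folded into the top of the register
 * and then shifted through eight conditional XORs.
 */
static uint16_t crc16_t10dif(const unsigned char *buf, size_t len)
{
	uint16_t crc = 0;

	assert(buf != NULL || len == 0);

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000)
				crc = (uint16_t)((crc << 1) ^ T10DIF_POLY);
			else
				crc = (uint16_t)(crc << 1);
		}
	}

	return crc;
}
```

Because this CRC has no final XOR, appending it to the data in big-endian order makes the CRC over the combined buffer zero, which makes a convenient self-check.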
diff -r c4c47b2f1539 -r 26ccaf2ccdc5 block/Kconfig
--- a/block/Kconfig	Fri Apr 25 17:39:29 2008 -0400
+++ b/block/Kconfig	Fri Apr 25 17:39:29 2008 -0400
@@ -81,6 +81,18 @@
 
 	  If unsure, say N.
 
+config BLK_DEV_INTEGRITY
+	bool "Block layer data integrity support"
+	---help---
+	Some storage devices allow extra information to be
+	stored/retrieved to help protect the data.  The block layer
+	data integrity option provides hooks which can be used by
+	filesystems to ensure better data integrity.
+
+	Say yes here if you have a storage device that provides the
+	T10/SCSI Data Integrity Field or the T13/ATA External Path
+	Protection.
+
 endif # BLOCK
 
 config BLOCK_COMPAT
diff -r c4c47b2f1539 -r 26ccaf2ccdc5 block/Makefile
--- a/block/Makefile	Fri Apr 25 17:39:29 2008 -0400
+++ b/block/Makefile	Fri Apr 25 17:39:29 2008 -0400
@@ -14,3 +14,4 @@
 
 obj-$(CONFIG_BLK_DEV_IO_TRACE)	+= blktrace.o
 obj-$(CONFIG_BLOCK_COMPAT)	+= compat_ioctl.o
+obj-$(CONFIG_BLK_DEV_INTEGRITY)	+= blk-integrity.o
diff -r c4c47b2f1539 -r 26ccaf2ccdc5 block/blk-core.c
--- a/block/blk-core.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/block/blk-core.c	Fri Apr 25 17:39:29 2008 -0400
@@ -162,6 +162,10 @@
 
 		bio->bi_size -= nbytes;
 		bio->bi_sector += (nbytes >> 9);
+
+		if (bio_integrity(bio))
+			bio_integrity_advance(bio, nbytes);
+
 		if (bio->bi_size == 0)
 			bio_endio(bio, error);
 	} else {
@@ -1389,6 +1393,9 @@
 		 */
 		blk_partition_remap(bio);
 
+		if (bio_integrity_enabled(bio) && bio_integrity_prep(bio))
+			goto end_io;
+
 		if (old_sector != -1)
 			blk_add_trace_remap(q, bio, old_dev, bio->bi_sector,
 					    old_sector);
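With this hunk, generic_make_request() calls bio_integrity_prep() for every bio for which bio_integrity_enabled() returns true, and for a WRITE that ends in bio_integrity_generate(), which presumably invokes the profile's generate_fn (its body is not part of this excerpt). A minimal userspace sketch of what such a function might emit, assuming the T10 DIF tuple layout (16-bit guard tag, 16-bit application tag, 32-bit reference tag, all big-endian), an RFC 1071 IP checksum as the guard, and the low 32 bits of the sector number as the incrementing reference tag. ip_csum(), put_be16(), put_be32() and generate_tuples() are hypothetical names; byte order and application tag handling are device-specific in practice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE	512
#define SKETCH_TUPLE_SIZE	8

/* RFC 1071 Internet checksum over big-endian 16-bit words. */
static uint16_t ip_csum(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;	/* cannot overflow for sector-sized buffers */

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);

	return (uint16_t)~sum;
}

static void put_be16(unsigned char *p, uint16_t v)
{
	p[0] = (unsigned char)(v >> 8);
	p[1] = (unsigned char)v;
}

static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/*
 * Write one 8-byte tuple per data sector: guard tag, application tag
 * (left zero here), and a reference tag that increments per sector.
 */
static void generate_tuples(const unsigned char *data, size_t data_len,
			    uint64_t start_sector, unsigned char *prot)
{
	assert(data_len % SKETCH_SECTOR_SIZE == 0);

	for (size_t s = 0; s < data_len / SKETCH_SECTOR_SIZE; s++) {
		unsigned char *t = prot + s * SKETCH_TUPLE_SIZE;

		put_be16(t, ip_csum(data + s * SKETCH_SECTOR_SIZE,
				    SKETCH_SECTOR_SIZE));
		put_be16(t + 2, 0);
		put_be32(t + 4, (uint32_t)(start_sector + s));
	}
}
```

Keeping the guard computation behind its own function mirrors the profile design: swapping ip_csum() for a CRC changes the guard without touching the tuple layout.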
diff -r c4c47b2f1539 -r 26ccaf2ccdc5 block/blk-integrity.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/block/blk-integrity.c	Fri Apr 25 17:39:29 2008 -0400
@@ -0,0 +1,385 @@
+/*
+ * blk-integrity.c - Block layer data integrity extensions
+ *
+ * Copyright (C) 2007, 2008 Oracle Corporation
+ * Written by: Martin K. Petersen <martin.petersen@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; see the file COPYING.  If not, write to
+ * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139,
+ * USA.
+ *
+ */
+
+#include <linux/blkdev.h>
+#include <linux/mempool.h>
+#include <linux/bio.h>
+#include <linux/scatterlist.h>
+
+#include "blk.h"
+
+static struct kmem_cache *integrity_cachep;
+
+/**
+ * blk_rq_count_integrity_sg - Count number of integrity scatterlist elements
+ * @rq:		request with integrity metadata attached
+ *
+ * Description: Returns the number of elements required in a
+ * scatterlist corresponding to the integrity metadata in a request.
+ */
+int blk_rq_count_integrity_sg(struct request *rq)
+{
+	struct bio_vec *iv, *ivprv;
+	struct req_iterator iter;
+	unsigned int segments;
+
+	ivprv = NULL;
+	segments = 0;
+
+	rq_for_each_integrity_segment(iv, rq, iter) {
+
+		if (ivprv && BIOVEC_PHYS_MERGEABLE(ivprv, iv))
+			;
+		else
+			segments++;
+
+		ivprv = iv;
+	}
+
+	return segments;
+}
+EXPORT_SYMBOL(blk_rq_count_integrity_sg);
+
+/**
+ * blk_rq_map_integrity_sg - Map integrity metadata into a scatterlist
+ * @rq:		request with integrity metadata attached
+ * @sglist:	target scatterlist
+ *
+ * Description: Map the integrity vectors in a request into a
+ * scatterlist.  The scatterlist must be big enough to hold all
+ * elements, i.e. sized using blk_rq_count_integrity_sg().
+ */
+int blk_rq_map_integrity_sg(struct request *rq, struct scatterlist *sglist)
+{
+	struct bio_vec *iv, *ivprv;
+	struct req_iterator iter;
+	struct scatterlist *sg;
+	unsigned int segments;
+
+	ivprv = NULL;
+	sg = NULL;
+	segments = 0;
+
+	rq_for_each_integrity_segment(iv, rq, iter) {
+
+		if (ivprv) {
+			if (!BIOVEC_PHYS_MERGEABLE(ivprv, iv))
+				goto new_segment;
+
+			sg->length += iv->bv_len;
+		} else {
+new_segment:
+			if (!sg)
+				sg = sglist;
+			else {
+				sg->page_link &= ~0x02;	/* clear stale end marker */
+				sg = sg_next(sg);
+			}
+
+			sg_set_page(sg, iv->bv_page, iv->bv_len, iv->bv_offset);
+			segments++;
+		}
+
+		ivprv = iv;
+	}
+
+	if (sg)
+		sg_mark_end(sg);
+
+	return segments;
+}
+EXPORT_SYMBOL(blk_rq_map_integrity_sg);
+
+/**
+ * blk_integrity_compare - Compare integrity profile of two block devices
+ * @bd1:	Device to compare
+ * @bd2:	Device to compare
+ *
+ * Description: Meta-devices like DM and MD need to verify that all
+ * sub-devices use the same integrity format before advertising to
+ * upper layers that they can send/receive integrity metadata.  This
+ * function can be used to check whether two block devices have
+ * compatible integrity formats.
+ */
+int blk_integrity_compare(struct block_device *bd1, struct block_device *bd2)
+{
+	struct blk_integrity *b1, *b2;
+
+	BUG_ON(bd1->bd_disk == NULL || bd2->bd_disk == NULL);
+	b1 = bd1->bd_disk->integrity;
+	b2 = bd2->bd_disk->integrity;
+
+	if (!b1 || !b2)
+		return 0;
+
+	if (b1->sector_size != b2->sector_size) {
+		printk(KERN_ERR "%s: %s/%s sector sz %u != %u\n", __func__,
+		       bd1->bd_disk->disk_name, bd2->bd_disk->disk_name,
+		       b1->sector_size, b2->sector_size);
+		return -1;
+	}
+
+	if (b1->tuple_size != b2->tuple_size) {
+		printk(KERN_ERR "%s: %s/%s tuple sz %u != %u\n", __func__,
+		       bd1->bd_disk->disk_name, bd2->bd_disk->disk_name,
+		       b1->tuple_size, b2->tuple_size);
+		return -1;
+	}
+
+	if (b1->tag_size && b2->tag_size && (b1->tag_size != b2->tag_size)) {
+		printk(KERN_ERR "%s: %s/%s tag sz %u != %u\n", __func__,
+		       bd1->bd_disk->disk_name, bd2->bd_disk->disk_name,
+		       b1->tag_size, b2->tag_size);
+		return -1;
+	}
+
+	if (strcmp(b1->name, b2->name)) {
+		printk(KERN_ERR "%s: %s/%s type %s != %s\n", __func__,
+		       bd1->bd_disk->disk_name, bd2->bd_disk->disk_name,
+		       b1->name, b2->name);
+		return -1;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(blk_integrity_compare);
+
+struct integrity_sysfs_entry {
+	struct attribute attr;
+	ssize_t (*show)(struct blk_integrity *, char *);
+	ssize_t (*store)(struct blk_integrity *, const char *, size_t);
+};
+
+static ssize_t integrity_attr_show(struct kobject *kobj, struct attribute *attr,
+				   char *page)
+{
+	struct blk_integrity *bi =
+		container_of(kobj, struct blk_integrity, kobj);
+	struct integrity_sysfs_entry *entry =
+		container_of(attr, struct integrity_sysfs_entry, attr);
+	ssize_t ret = -EIO;
+
+	if (entry->show)
+		ret = entry->show(bi, page);
+
+	return ret;
+}
+
+static ssize_t integrity_attr_store(struct kobject *kobj, struct attribute *attr,
+				    const char *page, size_t count)
+{
+	struct blk_integrity *bi =
+		container_of(kobj, struct blk_integrity, kobj);
+	struct integrity_sysfs_entry *entry =
+		container_of(attr, struct integrity_sysfs_entry, attr);
+	ssize_t ret = 0;
+
+	if (entry->store)
+		ret = entry->store(bi, page, count);
+
+	return ret;
+}
+
+static ssize_t integrity_format_show(struct blk_integrity *bi, char *page)
+{
+	if (bi != NULL && bi->name != NULL)
+		return sprintf(page, "%s\n", bi->name);
+	else
+		return sprintf(page, "none\n");
+}
+
+static ssize_t integrity_tag_size_show(struct blk_integrity *bi, char *page)
+{
+	if (bi != NULL)
+		return sprintf(page, "%u\n", bi->tag_size);
+	else
+		return sprintf(page, "0\n");
+}
+
+static ssize_t integrity_read_store(struct blk_integrity *bi,
+				    const char *page, size_t count)
+{
+	char *p = (char *) page;
+	unsigned long val = simple_strtoul(p, &p, 10);
+
+	if (val == 1)
+		set_bit(INTEGRITY_FLAG_READ, &bi->flags);
+	else
+		clear_bit(INTEGRITY_FLAG_READ, &bi->flags);
+
+	return count;
+}
+
+static ssize_t integrity_read_show(struct blk_integrity *bi, char *page)
+{
+	return sprintf(page, "%d\n",
+		       test_bit(INTEGRITY_FLAG_READ, &bi->flags) ? 1 : 0);
+}
+
+static ssize_t integrity_write_store(struct blk_integrity *bi,
+				     const char *page, size_t count)
+{
+	char *p = (char *) page;
+	unsigned long val = simple_strtoul(p, &p, 10);
+
+	if (val == 1)
+		set_bit(INTEGRITY_FLAG_WRITE, &bi->flags);
+	else
+		clear_bit(INTEGRITY_FLAG_WRITE, &bi->flags);
+
+	return count;
+}
+
+static ssize_t integrity_write_show(struct blk_integrity *bi, char *page)
+{
+	return sprintf(page, "%d\n",
+		       test_bit(INTEGRITY_FLAG_WRITE, &bi->flags) ? 1 : 0);
+}
+
+static struct integrity_sysfs_entry integrity_format_entry = {
+	.attr = { .name = "format", .mode = S_IRUGO },
+	.show = integrity_format_show,
+};
+
+static struct integrity_sysfs_entry integrity_tag_size_entry = {
+	.attr = { .name = "tag_size", .mode = S_IRUGO },
+	.show = integrity_tag_size_show,
+};
+
+static struct integrity_sysfs_entry integrity_read_entry = {
+	.attr = { .name = "read_verify", .mode = S_IRUGO | S_IWUSR },
+	.show = integrity_read_show,
+	.store = integrity_read_store,
+};
+
+static struct integrity_sysfs_entry integrity_write_entry = {
+	.attr = { .name = "write_generate", .mode = S_IRUGO | S_IWUSR },
+	.show = integrity_write_show,
+	.store = integrity_write_store,
+};
+
+static struct attribute *integrity_attrs[] = {
+	&integrity_format_entry.attr,
+	&integrity_tag_size_entry.attr,
+	&integrity_read_entry.attr,
+	&integrity_write_entry.attr,
+	NULL,
+};
+
+static struct sysfs_ops integrity_ops = {
+	.show	= &integrity_attr_show,
+	.store	= &integrity_attr_store,
+};
+
+static int __init blk_dev_integrity_init(void)
+{
+	integrity_cachep = kmem_cache_create("blkdev_integrity",
+					     sizeof(struct blk_integrity),
+					     0, SLAB_PANIC, NULL);
+	return 0;
+}
+subsys_initcall(blk_dev_integrity_init);
+
+static void blk_integrity_release(struct kobject *kobj)
+{
+	struct blk_integrity *bi =
+		container_of(kobj, struct blk_integrity, kobj);
+
+	kmem_cache_free(integrity_cachep, bi);
+}
+
+static struct kobj_type integrity_ktype = {
+	.default_attrs	= integrity_attrs,
+	.sysfs_ops	= &integrity_ops,
+	.release	= blk_integrity_release,
+};
+
+/**
+ * blk_integrity_register - Register a gendisk as being integrity-capable
+ * @disk:	struct gendisk pointer to make integrity-aware
+ * @template:	integrity profile
+ *
+ * Description: When a device needs to advertise itself as being able
+ * to send/receive integrity metadata it must use this function to
+ * register the capability with the block layer.  The template is a
+ * blk_integrity struct with values appropriate for the underlying
+ * hardware.  See Documentation/block/data-integrity.txt.
+ */
+int blk_integrity_register(struct gendisk *disk, struct blk_integrity *template)
+{
+	struct blk_integrity *bi;
+
+	BUG_ON(disk == NULL);
+	BUG_ON(template == NULL);
+
+	if (disk->integrity == NULL) {
+		bi = kmem_cache_alloc(integrity_cachep, GFP_KERNEL | __GFP_ZERO);
+		if (!bi)
+			return -1;
+		if (kobject_init_and_add(&bi->kobj, &integrity_ktype,
+					 &disk->dev.kobj, "%s", "integrity")) {
+			kobject_put(&bi->kobj);	/* ->release() frees bi */
+			return -1;
+		}
+		kobject_uevent(&bi->kobj, KOBJ_ADD);
+
+		set_bit(INTEGRITY_FLAG_READ, &bi->flags);
+		set_bit(INTEGRITY_FLAG_WRITE, &bi->flags);
+		bi->sector_size = disk->queue->hardsect_size;
+		disk->integrity = bi;
+	} else
+		bi = disk->integrity;
+
+	/* Use the provided profile as template */
+	bi->name = template->name;
+	bi->generate_fn = template->generate_fn;
+	bi->verify_fn = template->verify_fn;
+	bi->tuple_size = template->tuple_size;
+	bi->set_tag_fn = template->set_tag_fn;
+	bi->get_tag_fn = template->get_tag_fn;
+	bi->tag_size = template->tag_size;
+
+	return 0;
+}
+EXPORT_SYMBOL(blk_integrity_register);
+
+/**
+ * blk_integrity_unregister - Remove block integrity profile
+ * @disk:	disk whose integrity profile to deallocate
+ *
+ * Description: This function frees all memory used by the block
+ * integrity profile.  To be called at device teardown.
+ */
+void blk_integrity_unregister(struct gendisk *disk)
+{
+	struct blk_integrity *bi;
+
+	if (!disk || !disk->integrity)
+		return;
+
+	bi = disk->integrity;
+
+	kobject_uevent(&bi->kobj, KOBJ_REMOVE);
+	kobject_del(&bi->kobj);
+	kobject_put(&disk->dev.kobj);
+}
+EXPORT_SYMBOL(blk_integrity_unregister);
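blk_rq_count_integrity_sg() and blk_rq_map_integrity_sg() walk the same integrity vectors twice: first to size the scatterlist, then to fill it, coalescing neighbours that BIOVEC_PHYS_MERGEABLE() accepts. The sketch below reproduces that two-pass pattern in userspace. As a simplifying assumption, two vectors count as mergeable exactly when they are physically adjacent; struct seg, count_sg() and map_sg() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a bio_vec: physical address and length. */
struct seg {
	uint64_t addr;
	unsigned int len;
};

/*
 * Assumption: "mergeable" means physically adjacent.  The kernel's
 * BIOVEC_PHYS_MERGEABLE() may apply further architecture constraints.
 */
static int phys_mergeable(const struct seg *prev, const struct seg *cur)
{
	return prev->addr + prev->len == cur->addr;
}

/* Pass 1: how many scatterlist entries are needed. */
static unsigned int count_sg(const struct seg *v, size_t n)
{
	unsigned int segments = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (i == 0 || !phys_mergeable(&v[i - 1], &v[i]))
			segments++;
	return segments;
}

/* Pass 2: fill out[], which must hold count_sg(v, n) entries. */
static unsigned int map_sg(const struct seg *v, size_t n, struct seg *out)
{
	unsigned int segments = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i > 0 && phys_mergeable(&v[i - 1], &v[i]))
			out[segments - 1].len += v[i].len;	/* extend entry */
		else
			out[segments++] = v[i];			/* new entry */
	}
	return segments;
}
```

Sizing first and filling second lets a caller allocate an exactly sized table, which is how scsi_protect_io() later in this series uses the pair.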
diff -r c4c47b2f1539 -r 26ccaf2ccdc5 block/blk-merge.c
--- a/block/blk-merge.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/block/blk-merge.c	Fri Apr 25 17:39:29 2008 -0400
@@ -441,6 +441,9 @@
 	    || next->special)
 		return 0;
 
+	if (blk_integrity_rq(req) != blk_integrity_rq(next))
+		return 0;
+
 	/*
 	 * If we are allowed to merge, then append bio list
 	 * from next to rq and release next. merge_requests_fn
diff -r c4c47b2f1539 -r 26ccaf2ccdc5 block/blk.h
--- a/block/blk.h	Fri Apr 25 17:39:29 2008 -0400
+++ b/block/blk.h	Fri Apr 25 17:39:29 2008 -0400
@@ -52,4 +52,12 @@
 	return q->nr_congestion_off;
 }
 
+#if defined(CONFIG_BLK_DEV_INTEGRITY)
+
+#define rq_for_each_integrity_segment(bvl, _rq, _iter)		\
+	__rq_for_each_bio(_iter.bio, _rq)			\
+		bip_for_each_vec(bvl, _iter.bio->bi_integrity, _iter.i)
+
+#endif /* CONFIG_BLK_DEV_INTEGRITY */
+
 #endif
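rq_for_each_integrity_segment() is two nested loops: __rq_for_each_bio() over the bios of a request, and bip_for_each_vec() over each bio's integrity vectors. The self-contained sketch below imitates that shape using hypothetical miniature types rather than the real bio structures; for_each_ivec() and total_integrity_bytes() are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the request -> bio -> integrity-vector chain. */
struct ivec  { unsigned int len; };
struct mbio  { struct ivec *iv; unsigned int vcnt; struct mbio *next; };
struct mreq  { struct mbio *bio; };
struct miter { struct mbio *bio; unsigned int i; };

/* Outer loop over bios, inner loop over each bio's integrity vectors. */
#define for_each_ivec(v, rq, it)					\
	for ((it).bio = (rq)->bio; (it).bio; (it).bio = (it).bio->next)	\
		for ((it).i = 0;					\
		     (it).i < (it).bio->vcnt &&				\
		     ((v) = &(it).bio->iv[(it).i], 1);			\
		     (it).i++)

static unsigned int total_integrity_bytes(struct mreq *rq)
{
	struct miter it;
	struct ivec *v;
	unsigned int total = 0;

	for_each_ivec(v, rq, it)
		total += v->len;
	return total;
}
```

As with any macro built from nested for loops, a break inside the body only leaves the inner loop.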
diff -r c4c47b2f1539 -r 26ccaf2ccdc5 block/elevator.c
--- a/block/elevator.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/block/elevator.c	Fri Apr 25 17:39:29 2008 -0400
@@ -84,6 +84,12 @@
 	 * must be same device and not a special request
 	 */
 	if (rq->rq_disk != bio->bi_bdev->bd_disk || rq->special)
+		return 0;
+
+	/*
+	 * only merge a bio into a rq with the same integrity state
+	 */
+	if (bio_integrity(bio) != blk_integrity_rq(rq))
 		return 0;
 
 	if (!elv_iosched_allow_merge(rq, bio))
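Together with the blk-merge.c hunk, this check keeps integrity-protected and unprotected I/O out of the same request, presumably because a merged request would otherwise carry metadata for only part of its data. A pared-down model of the bio-into-request decision follows. The types and bio_merge_ok() are hypothetical, and the real elevator.c code additionally consults the I/O scheduler through elv_iosched_allow_merge().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, pared-down request and bio descriptors. */
struct mini_rq  { int disk; bool special; bool integrity; };
struct mini_bio { int disk; bool integrity; };

/* Same device, not a special request, and matching integrity state. */
static bool bio_merge_ok(const struct mini_rq *rq, const struct mini_bio *bio)
{
	if (rq->disk != bio->disk || rq->special)
		return false;
	if (rq->integrity != bio->integrity)
		return false;
	return true;
}
```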
diff -r c4c47b2f1539 -r 26ccaf2ccdc5 include/linux/blkdev.h
--- a/include/linux/blkdev.h	Fri Apr 25 17:39:29 2008 -0400
+++ b/include/linux/blkdev.h	Fri Apr 25 17:39:29 2008 -0400
@@ -113,6 +113,7 @@
 	__REQ_ALLOCED,		/* request came from our alloc pool */
 	__REQ_RW_META,		/* metadata io request */
 	__REQ_COPY_USER,	/* contains copies of user pages */
+	__REQ_INTEGRITY,	/* integrity metadata has been remapped */
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -135,6 +136,7 @@
 #define REQ_ALLOCED	(1 << __REQ_ALLOCED)
 #define REQ_RW_META	(1 << __REQ_RW_META)
 #define REQ_COPY_USER	(1 << __REQ_COPY_USER)
+#define REQ_INTEGRITY	(1 << __REQ_INTEGRITY)
 
 #define BLK_MAX_CDB	16
 
@@ -827,6 +829,101 @@
 	MODULE_ALIAS("block-major-" __stringify(major) "-*")
 
 
+#if defined(CONFIG_BLK_DEV_INTEGRITY)
+
+#define INTEGRITY_FLAG_READ	1	/* verify data integrity on read */
+#define INTEGRITY_FLAG_WRITE	2	/* generate data integrity on write */
+
+struct blk_integrity_exchg {
+	void			*prot_buf;
+	void			*data_buf;
+	sector_t		sector;
+	unsigned int		data_size;
+	unsigned short		sector_size;
+	const char		*disk_name;
+};
+
+typedef void (integrity_gen_fn) (struct blk_integrity_exchg *);
+typedef int (integrity_vrfy_fn) (struct blk_integrity_exchg *);
+typedef void (integrity_set_tag_fn) (void *, void *, unsigned int);
+typedef void (integrity_get_tag_fn) (void *, void *, unsigned int);
+
+struct blk_integrity {
+	integrity_gen_fn	*generate_fn;
+	integrity_vrfy_fn	*verify_fn;
+	integrity_set_tag_fn	*set_tag_fn;
+	integrity_get_tag_fn	*get_tag_fn;
+
+	unsigned long		flags;
+	unsigned short		tuple_size;
+	unsigned short		sector_size;
+	unsigned short		tag_size;
+
+	const char		*name;
+
+	struct kobject		kobj;
+};
+
+extern int blk_integrity_register(struct gendisk *, struct blk_integrity *);
+extern void blk_integrity_unregister(struct gendisk *);
+extern int blk_integrity_compare(struct block_device *, struct block_device *);
+extern int blk_rq_map_integrity_sg(struct request *, struct scatterlist *);
+extern int blk_rq_count_integrity_sg(struct request *);
+
+static inline unsigned short blk_integrity_tuple_size(struct blk_integrity *bi)
+{
+	return (bi == NULL) ? 0 : bi->tuple_size;
+}
+
+static inline struct blk_integrity *bdev_get_integrity(struct block_device *bdev)
+{
+	return bdev->bd_disk->integrity;
+}
+
+static inline unsigned int bdev_get_tag_size(struct block_device *bdev)
+{
+	struct blk_integrity *bi = bdev_get_integrity(bdev);
+
+	return (bi == NULL) ? 0 : bi->tag_size;
+}
+
+static inline int bdev_integrity_enabled(struct block_device *bdev, int rw)
+{
+	struct blk_integrity *bi = bdev_get_integrity(bdev);
+
+	if (bi == NULL)
+		return 0;
+
+	if (rw == READ && bi->verify_fn != NULL &&
+	    test_bit(INTEGRITY_FLAG_READ, &bi->flags))
+		return 1;
+
+	if (rw == WRITE && bi->generate_fn != NULL &&
+	    test_bit(INTEGRITY_FLAG_WRITE, &bi->flags))
+		return 1;
+
+	return 0;
+}
+
+static inline int blk_integrity_rq(struct request *rq)
+{
+	BUG_ON(rq->bio == NULL);
+
+	return bio_integrity(rq->bio);
+}
+
+#else /* CONFIG_BLK_DEV_INTEGRITY */
+
+#define blk_integrity_rq(rq)			(0)
+#define bdev_get_integrity(a)			(0)
+#define bdev_get_tag_size(a)			(0)
+#define blk_integrity_compare(a, b)		(0)
+#define blk_integrity_register(a, b)		(0)
+#define blk_integrity_unregister(a)		do { } while (0)
+
+#endif /* CONFIG_BLK_DEV_INTEGRITY */
+
+
 #else /* CONFIG_BLOCK */
 /*
  * stubs for when the block layer is configured out
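bdev_integrity_enabled() requires both a callback and a flag: verification on READ needs verify_fn plus INTEGRITY_FLAG_READ, and generation on WRITE needs generate_fn plus INTEGRITY_FLAG_WRITE. The flags are toggled through the read_verify and write_generate sysfs attributes in blk-integrity.c, where writing 1 sets the flag and any other value clears it. The sketch below models both pieces in userspace. The flag values are bit numbers for set_bit()/test_bit(), which operate on unsigned long words, so the sketch keeps its flags in an unsigned long; struct profile and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Bit numbers (not masks), as with INTEGRITY_FLAG_READ/WRITE. */
#define FLAG_READ	1
#define FLAG_WRITE	2

enum io_dir { DIR_READ, DIR_WRITE };

/* Hypothetical stand-in for struct blk_integrity. */
struct profile {
	unsigned long flags;
	bool has_verify_fn;
	bool has_generate_fn;
};

static bool flag_test(unsigned long flags, int bit)
{
	return (flags >> bit) & 1UL;
}

/* Same decision as bdev_integrity_enabled(). */
static bool integrity_enabled(const struct profile *bi, enum io_dir rw)
{
	if (bi == NULL)
		return false;
	if (rw == DIR_READ)
		return bi->has_verify_fn && flag_test(bi->flags, FLAG_READ);
	return bi->has_generate_fn && flag_test(bi->flags, FLAG_WRITE);
}

/* Like the sysfs store handlers: "1" sets the flag, anything else clears it. */
static void flag_store(struct profile *bi, int bit, const char *buf)
{
	if (strtoul(buf, NULL, 10) == 1)
		bi->flags |= 1UL << bit;
	else
		bi->flags &= ~(1UL << bit);
}
```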
diff -r c4c47b2f1539 -r 26ccaf2ccdc5 include/linux/genhd.h
--- a/include/linux/genhd.h	Fri Apr 25 17:39:29 2008 -0400
+++ b/include/linux/genhd.h	Fri Apr 25 17:39:29 2008 -0400
@@ -141,6 +141,9 @@
 	struct disk_stats dkstats;
 #endif
 	struct work_struct async_notify;
+#ifdef  CONFIG_BLK_DEV_INTEGRITY
+	struct blk_integrity *integrity;
+#endif
 };
 
 /* 




* [PATCH 06 of 16] Detect devices with protection information turned on in INQUIRY
From: Martin K. Petersen @ 2008-04-25 23:12 UTC
  To: linux-scsi

2 files changed, 4 insertions(+)
drivers/scsi/scsi_scan.c   |    3 +++
include/scsi/scsi_device.h |    1 +


Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

diff -r 26ccaf2ccdc5 -r bfbea544d342 drivers/scsi/scsi_scan.c
--- a/drivers/scsi/scsi_scan.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/scsi/scsi_scan.c	Fri Apr 25 17:39:29 2008 -0400
@@ -877,6 +877,9 @@
 
 	if (*bflags & BLIST_USE_10_BYTE_MS)
 		sdev->use_10_for_ms = 1;
+
+	if (inq_result[5] & 0x1)
+		sdev->protection = 1;
 
 	/* set the device running here so that slave configure
 	 * may do I/O */
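The scsi_scan.c hunk tests bit 0 of byte 5 of the standard INQUIRY data, which SPC-3 defines as the PROTECT bit. A standalone version of that test follows; the function name is hypothetical and the length check is an addition of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Standard INQUIRY data: PROTECT is bit 0 of byte 5 (SPC-3). */
static bool inquiry_has_protect(const unsigned char *inq, size_t len)
{
	assert(inq != NULL);
	if (len < 6)
		return false;	/* too short to contain byte 5 */
	return (inq[5] & 0x01) != 0;
}
```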
diff -r 26ccaf2ccdc5 -r bfbea544d342 include/scsi/scsi_device.h
--- a/include/scsi/scsi_device.h	Fri Apr 25 17:39:29 2008 -0400
+++ b/include/scsi/scsi_device.h	Fri Apr 25 17:39:29 2008 -0400
@@ -140,6 +140,7 @@
 	unsigned guess_capacity:1;	/* READ_CAPACITY might be too high by 1 */
 	unsigned retry_hwerror:1;	/* Retry HARDWARE_ERROR */
 	unsigned last_sector_bug:1;	/* Always read last sector in a 1 sector read */
+	unsigned protection:1;		/* Data Integrity Field */
 
 	DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */
 	struct list_head event_list;	/* asserted events */




* [PATCH 07 of 16] Rename scsi_bidi_sdb_cache
From: Martin K. Petersen @ 2008-04-25 23:12 UTC
  To: linux-scsi

1 file changed, 13 insertions(+), 13 deletions(-)
drivers/scsi/scsi_lib.c |   26 +++++++++++++-------------


The data integrity changes also need to allocate scsi_data_buffers
dynamically, so the cache is no longer bidi-specific.  Rename
scsi_bidi_sdb_cache to scsi_sdb_cache for clarity.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

diff -r bfbea544d342 -r f37e1616176b drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/scsi/scsi_lib.c	Fri Apr 25 17:39:29 2008 -0400
@@ -65,7 +65,7 @@
 };
 #undef SP
 
-static struct kmem_cache *scsi_bidi_sdb_cache;
+static struct kmem_cache *scsi_sdb_cache;
 
 static void scsi_run_queue(struct request_queue *q);
 
@@ -771,7 +771,7 @@
 		struct scsi_data_buffer *bidi_sdb =
 			cmd->request->next_rq->special;
 		scsi_free_sgtable(bidi_sdb);
-		kmem_cache_free(scsi_bidi_sdb_cache, bidi_sdb);
+		kmem_cache_free(scsi_sdb_cache, bidi_sdb);
 		cmd->request->next_rq->special = NULL;
 	}
 }
@@ -1046,7 +1046,7 @@
 
 	if (blk_bidi_rq(cmd->request)) {
 		struct scsi_data_buffer *bidi_sdb = kmem_cache_zalloc(
-			scsi_bidi_sdb_cache, GFP_ATOMIC);
+			scsi_sdb_cache, GFP_ATOMIC);
 		if (!bidi_sdb) {
 			error = BLKPREP_DEFER;
 			goto err_exit;
@@ -1678,11 +1678,11 @@
 		return -ENOMEM;
 	}
 
-	scsi_bidi_sdb_cache = kmem_cache_create("scsi_bidi_sdb",
-					sizeof(struct scsi_data_buffer),
-					0, 0, NULL);
-	if (!scsi_bidi_sdb_cache) {
-		printk(KERN_ERR "SCSI: can't init scsi bidi sdb cache\n");
+	scsi_sdb_cache = kmem_cache_create("scsi_sdb",
+					   sizeof(struct scsi_data_buffer),
+					   0, 0, NULL);
+	if (!scsi_sdb_cache) {
+		printk(KERN_ERR "SCSI: can't init scsi sdb cache\n");
 		goto cleanup_io_context;
 	}
 
@@ -1695,7 +1695,7 @@
 		if (!sgp->slab) {
 			printk(KERN_ERR "SCSI: can't init sg slab %s\n",
 					sgp->name);
-			goto cleanup_bidi_sdb;
+			goto cleanup_sdb;
 		}
 
 		sgp->pool = mempool_create_slab_pool(SG_MEMPOOL_SIZE,
@@ -1703,13 +1703,13 @@
 		if (!sgp->pool) {
 			printk(KERN_ERR "SCSI: can't init sg mempool %s\n",
 					sgp->name);
-			goto cleanup_bidi_sdb;
+			goto cleanup_sdb;
 		}
 	}
 
 	return 0;
 
-cleanup_bidi_sdb:
+cleanup_sdb:
 	for (i = 0; i < SG_MEMPOOL_NR; i++) {
 		struct scsi_host_sg_pool *sgp = scsi_sg_pools + i;
 		if (sgp->pool)
@@ -1717,7 +1717,7 @@
 		if (sgp->slab)
 			kmem_cache_destroy(sgp->slab);
 	}
-	kmem_cache_destroy(scsi_bidi_sdb_cache);
+	kmem_cache_destroy(scsi_sdb_cache);
 cleanup_io_context:
 	kmem_cache_destroy(scsi_io_context_cache);
 
@@ -1729,7 +1729,7 @@
 	int i;
 
 	kmem_cache_destroy(scsi_io_context_cache);
-	kmem_cache_destroy(scsi_bidi_sdb_cache);
+	kmem_cache_destroy(scsi_sdb_cache);
 
 	for (i = 0; i < SG_MEMPOOL_NR; i++) {
 		struct scsi_host_sg_pool *sgp = scsi_sg_pools + i;




* [PATCH 08 of 16] SCSI protection information scatterlist handling
From: Martin K. Petersen @ 2008-04-25 23:12 UTC
  To: linux-scsi

3 files changed, 86 insertions(+)
drivers/scsi/Kconfig     |   15 +++++++++++++++
drivers/scsi/scsi_lib.c  |   42 ++++++++++++++++++++++++++++++++++++++++++
include/scsi/scsi_cmnd.h |   29 +++++++++++++++++++++++++++++


Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

diff -r f37e1616176b -r ea489bb64376 drivers/scsi/Kconfig
--- a/drivers/scsi/Kconfig	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/scsi/Kconfig	Fri Apr 25 17:39:29 2008 -0400
@@ -260,6 +260,21 @@
 	default m
 	depends on SCSI
 	depends on MODULES
+
+config SCSI_PROTECTION
+       bool "SCSI Data Integrity Protection"
+       depends on SCSI
+       depends on BLK_DEV_INTEGRITY
+       help 
+	 Some SCSI devices support data protection features above and
+	 beyond those implemented in the transport.  Select this
+	 option to enable protection information to be transferred to
+	 and from a device.  Specifically, this option will enable DIF
+	 (Data Integrity Field) for SCSI disks.
+
+	 The SCSI protection features depend on the block layer data
+	 integrity infrastructure so the latter must be enabled for
+	 this option to work.
 
 menu "SCSI Transports"
 	depends on SCSI
diff -r f37e1616176b -r ea489bb64376 drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/scsi/scsi_lib.c	Fri Apr 25 17:39:29 2008 -0400
@@ -774,6 +774,13 @@
 		kmem_cache_free(scsi_sdb_cache, bidi_sdb);
 		cmd->request->next_rq->special = NULL;
 	}
+#if defined(CONFIG_SCSI_PROTECTION)
+	if (scsi_prot_sg_count(cmd)) {
+		scsi_free_sgtable(cmd->prot_sdb);
+		kmem_cache_free(scsi_sdb_cache, cmd->prot_sdb);
+		cmd->prot_sdb = NULL;
+	}
+#endif
 }
 EXPORT_SYMBOL(scsi_release_buffers);
 
@@ -1027,6 +1034,32 @@
 	return BLKPREP_OK;
 }
 
+#if defined(CONFIG_SCSI_PROTECTION)
+static int scsi_protect_io(struct scsi_cmnd *cmd, gfp_t gfp_mask)
+{
+	struct request *req = cmd->request;
+	struct scsi_data_buffer *pdb;
+	int ivecs, count;
+
+	pdb = kmem_cache_zalloc(scsi_sdb_cache, gfp_mask);
+	if (unlikely(pdb == NULL))
+		return BLKPREP_DEFER;
+
+	ivecs = blk_rq_count_integrity_sg(req);
+	if (unlikely(scsi_alloc_sgtable(pdb, ivecs, gfp_mask))) {
+		kmem_cache_free(scsi_sdb_cache, pdb);
+		return BLKPREP_DEFER;
+	}
+	count = blk_rq_map_integrity_sg(req, pdb->table.sgl);
+	BUG_ON(count > ivecs);
+
+	cmd->prot_sdb = pdb;
+	cmd->prot_sdb->table.nents = count;
+
+	return BLKPREP_OK;
+}
+#endif
+
 /*
  * Function:    scsi_init_io()
  *
@@ -1058,6 +1091,15 @@
 		if (error)
 			goto err_exit;
 	}
+
+#if defined(CONFIG_SCSI_PROTECTION)
+	if (blk_integrity_rq(cmd->request)) {
+		error = scsi_protect_io(cmd, gfp_mask);
+
+		if (error != BLKPREP_OK)
+			goto err_exit;
+	}
+#endif
 
 	return BLKPREP_OK ;
 
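scsi_protect_io() allocates a scsi_data_buffer, sizes its table with blk_rq_count_integrity_sg() and fills it with blk_rq_map_integrity_sg(). Any allocation that succeeded before a later step fails has to be released on the error path, otherwise every deferred request leaks memory. The userspace sketch below shows that allocate, fill and unwind pattern with calloc()/free() standing in for the kernel allocators; struct prot_buf, prot_setup() and prot_release() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a scsi_data_buffer holding nents entries. */
struct prot_buf {
	unsigned int nents;
	unsigned int *lens;
};

/*
 * Allocate the descriptor, then the table, then fill it.  On failure,
 * undo what was already allocated and return NULL; a real prep
 * function would then ask the block layer to retry (BLKPREP_DEFER).
 */
static struct prot_buf *prot_setup(const unsigned int *vec_lens,
				   unsigned int nvecs)
{
	struct prot_buf *pb;
	unsigned int i;

	assert(vec_lens != NULL || nvecs == 0);
	pb = calloc(1, sizeof(*pb));
	if (pb == NULL)
		return NULL;
	pb->lens = calloc(nvecs ? nvecs : 1, sizeof(*pb->lens));
	if (pb->lens == NULL) {
		free(pb);		/* do not leak the descriptor */
		return NULL;
	}
	for (i = 0; i < nvecs; i++)
		pb->lens[i] = vec_lens[i];
	pb->nents = nvecs;
	return pb;
}

/* Free and clear the caller's pointer so a second release is a no-op. */
static void prot_release(struct prot_buf **pbp)
{
	if (*pbp != NULL) {
		free((*pbp)->lens);
		free(*pbp);
		*pbp = NULL;
	}
}
```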
diff -r f37e1616176b -r ea489bb64376 include/scsi/scsi_cmnd.h
--- a/include/scsi/scsi_cmnd.h	Fri Apr 25 17:39:29 2008 -0400
+++ b/include/scsi/scsi_cmnd.h	Fri Apr 25 17:39:29 2008 -0400
@@ -71,6 +71,9 @@
 
 	/* These elements define the operation we ultimately want to perform */
 	struct scsi_data_buffer sdb;
+#if defined(CONFIG_SCSI_PROTECTION)
+	struct scsi_data_buffer *prot_sdb;
+#endif
 	unsigned underflow;	/* Return error if less than
 				   this amount is transferred */
 
@@ -192,4 +195,30 @@
 				 buf, buflen);
 }
 
+#if defined(CONFIG_SCSI_PROTECTION)
+
+static inline unsigned scsi_prot_sg_count(struct scsi_cmnd *cmd)
+{
+	return cmd->prot_sdb ? cmd->prot_sdb->table.nents : 0;
+}
+
+static inline struct scatterlist *scsi_prot_sglist(struct scsi_cmnd *cmd)
+{
+	return cmd->prot_sdb ? cmd->prot_sdb->table.sgl : NULL;
+}
+
+static inline struct scsi_data_buffer *scsi_prot(struct scsi_cmnd *cmd)
+{
+	return cmd->prot_sdb;
+}
+
+#define scsi_for_each_prot_sg(cmd, sg, nseg, __i)		\
+	for_each_sg(scsi_prot_sglist(cmd), sg, nseg, __i)
+
+#else /* CONFIG_SCSI_PROTECTION */
+
+#define scsi_prot_sg_count(a)		(0)
+
+#endif /* CONFIG_SCSI_PROTECTION */
+
 #endif /* _SCSI_SCSI_CMND_H */
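scsi_prot_sg_count() and scsi_prot_sglist() return 0 and NULL when no protection buffer is attached, so a low-level driver can call them on every command without first checking whether it is protected. The sketch below uses hypothetical miniature types to total the protection bytes a driver would transfer. With one 8-byte tuple per 512-byte sector, a 4 KiB protected transfer should carry 64 bytes of protection information.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniatures of scatterlist, scsi_data_buffer and scsi_cmnd. */
struct mini_sg  { unsigned int length; };
struct mini_sdb { struct mini_sg *sgl; unsigned int nents; };
struct mini_cmd { struct mini_sdb *prot_sdb; };

/* NULL-safe, like scsi_prot_sg_count(): unprotected commands report 0. */
static unsigned int prot_sg_count(const struct mini_cmd *cmd)
{
	return cmd->prot_sdb ? cmd->prot_sdb->nents : 0;
}

/* Total protection bytes an HBA would transfer for this command. */
static unsigned long prot_bytes(const struct mini_cmd *cmd)
{
	unsigned long total = 0;
	unsigned int i;

	assert(cmd != NULL);
	for (i = 0; i < prot_sg_count(cmd); i++)
		total += cmd->prot_sdb->sgl[i].length;
	return total;
}
```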




* [PATCH 09 of 16] Support for the SBC Data Integrity Field format
From: Martin K. Petersen @ 2008-04-25 23:12 UTC
  To: linux-scsi

9 files changed, 729 insertions(+)
Documentation/scsi/data-integrity.txt |   57 +++
drivers/scsi/Kconfig                  |    1 
drivers/scsi/Makefile                 |    2 
drivers/scsi/scsi_dif.c               |  496 +++++++++++++++++++++++++++++++++
drivers/scsi/scsi_error.c             |    3 
drivers/scsi/scsi_lib.c               |    4 
drivers/scsi/scsi_sysfs.c             |    4 
include/scsi/scsi_dif.h               |  158 ++++++++++
include/scsi/scsi_host.h              |    4 


Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

diff -r ea489bb64376 -r 7d9a353f8b7c Documentation/scsi/data-integrity.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Documentation/scsi/data-integrity.txt	Fri Apr 25 17:39:29 2008 -0400
@@ -0,0 +1,57 @@
+----------------------------------------------------------------------
+1.0 INTRODUCTION
+
+For a general overview of the data integrity framework please consult
+Documentation/block/data-integrity.txt.
+
+----------------------------------------------------------------------
+2.0 SCSI LAYER IMPLEMENTATION DETAILS
+
+struct scsi_cmnd has been extended with a scatterlist for the
+integrity metadata.  Note that the SCSI midlayer changes refer to
+it as "protection information", which is the term used in the T10
+specification.
+
+The term DIF (Data Integrity Field) is specific to SCSI disks (SBC).
+The SCSI midlayer neither knows nor cares about the contents of the
+protection scatterlist; it merely calls blk_rq_map_integrity_sg()
+during command initialization.
+
+
+2.1 SCSI DEVICE SCANNING
+
+A SCSI device sets the PROTECT bit in its standard INQUIRY data if
+it supports protection information.  The state of this bit is saved
+in the protection field of struct scsi_device.
+
+
+2.2 SCSI DISK SETUP
+
+In the case of a SCSI disk, the actual DIF protection format is
+reported in the result of READ CAPACITY(16).  Consequently, we have
+to use the 16-byte READ CAPACITY variant if the device is
+protection-capable.
+
+If DIF is enabled on the device, we negotiate capabilities with the
+HBA.  If the HBA is capable of protection DMA, the blk_integrity
+profile is registered.
+
+Currently we only support Type 1 and Type 3.  Type 2 is only defined
+for 32-byte CDBs and is awaiting varlen CDB support.
+
+The controller may support checksum conversion as an optimization.
+Initial benchmarks showed that calculating a 16-bit CRC for every 512
+bytes of an I/O was expensive.  Emulex hardware had the capability to
+convert an IP checksum to the T10 CRC on the wire.  So, as part of
+the negotiation process, the checksum algorithm is selected and the
+blk_integrity profile is set accordingly.
+
+----------------------------------------------------------------------
+3.0 HBA INTERFACE
+
+See the following doc:
+
+http://oss.oracle.com/projects/data-integrity/dist/documentation/linux-hba.pdf
+
+----------------------------------------------------------------------
+2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>
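Section 2.2 says the DIF format comes from READ CAPACITY(16). Assuming the SBC-3 layout of the returned parameter data (logical block length in bytes 8-11, PROT_EN in bit 0 of byte 12, P_TYPE in bits 3:1 of the same byte, with P_TYPE 0 meaning Type 1), the decoding can be sketched as follows. The struct and function names are hypothetical; this is not the sd.c code from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of READ CAPACITY(16) parameter data relevant to DIF. */
struct rc16_info {
	uint32_t block_len;	/* bytes 8-11: logical block length */
	int prot_en;		/* byte 12, bit 0 */
	int dif_type;		/* 1..8 if prot_en, else 0 */
};

/* Assumes SBC-3: P_TYPE 0 = Type 1, 1 = Type 2, 2 = Type 3. */
static int rc16_parse(const unsigned char *buf, size_t len,
		      struct rc16_info *out)
{
	assert(buf != NULL && out != NULL);
	if (len < 13)
		return -1;
	out->block_len = (uint32_t)buf[8] << 24 | (uint32_t)buf[9] << 16 |
			 (uint32_t)buf[10] << 8 | buf[11];
	out->prot_en = buf[12] & 0x01;
	out->dif_type = out->prot_en ? ((buf[12] >> 1) & 0x07) + 1 : 0;
	return 0;
}

/* Per the text above: Types 1 and 3 only; Type 2 needs 32-byte CDBs. */
static int dif_type_supported(int type)
{
	return type == 1 || type == 3;
}
```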
diff -r ea489bb64376 -r 7d9a353f8b7c drivers/scsi/Kconfig
--- a/drivers/scsi/Kconfig	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/scsi/Kconfig	Fri Apr 25 17:39:29 2008 -0400
@@ -265,6 +265,7 @@
        bool "SCSI Data Integrity Protection"
        depends on SCSI
        depends on BLK_DEV_INTEGRITY
+       select CRC_T10DIF
        help 
 	 Some SCSI devices support data protection features above and
 	 beyond those implemented in the transport.  Select this
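SCSI_PROTECTION now selects CRC_T10DIF, which provides the crc_t10dif() guard-tag checksum; the IP checksum is the cheaper alternative described in the documentation added by this patch. For reference, a slow bitwise CRC-16/T10-DIF (polynomial 0x8BB7, initial value 0, no reflection, no final XOR) and an RFC 1071 style Internet checksum are sketched below. These are illustrative implementations, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/T10-DIF, MSB first.  Slow but easy to check. */
static uint16_t crc16_t10dif(const unsigned char *buf, size_t len)
{
	uint16_t crc = 0;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/*
 * RFC 1071 Internet checksum over big-endian 16-bit words.  The
 * kernel's ip_compute_csum() works on native-endian memory, so its
 * numeric result may be byte-swapped relative to this one.
 */
static uint16_t inet_csum(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* pad odd byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* end-around carry */
	return (uint16_t)~sum;
}
```

A CRC-16/T10-DIF implementation should return 0xD0DB for the ASCII string "123456789", which makes a convenient self-test.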
diff -r ea489bb64376 -r 7d9a353f8b7c drivers/scsi/Makefile
--- a/drivers/scsi/Makefile	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/scsi/Makefile	Fri Apr 25 17:39:29 2008 -0400
@@ -148,6 +148,8 @@
 scsi_tgt-y			+= scsi_tgt_lib.o scsi_tgt_if.o
 
 sd_mod-objs	:= sd.o
+sd_mod-$(CONFIG_SCSI_PROTECTION) += scsi_dif.o
+
 sr_mod-objs	:= sr.o sr_ioctl.o sr_vendor.o
 ncr53c8xx-flags-$(CONFIG_SCSI_ZALON) \
 		:= -DCONFIG_NCR53C8XX_PREFETCH -DSCSI_NCR_BIG_ENDIAN \
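With SCSI_PROTECTION enabled, sd_mod also links in scsi_dif.o, which fills in and checks one 8-byte tuple per sector: a 16-bit guard tag, a 16-bit application tag and a 32-bit reference tag holding the low 32 bits of the sector number. The sketch below performs the same generate and verify steps on explicit big-endian bytes. All names are hypothetical, and unlike scsi_dif_verify() it omits the check for a 0xffffffff reference tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE	512
#define TUPLE_SIZE	8	/* guard(2) + app(2) + ref(4), big-endian */

typedef uint16_t (*csum_fn)(const unsigned char *, size_t);

/* Bitwise CRC-16/T10-DIF (poly 0x8BB7, init 0). */
static uint16_t crc16_t10dif(const unsigned char *buf, size_t len)
{
	uint16_t crc = 0;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

static void put_be16(unsigned char *p, uint16_t v)
{
	p[0] = (unsigned char)(v >> 8);
	p[1] = (unsigned char)v;
}

static void put_be32(unsigned char *p, uint32_t v)
{
	put_be16(p, (uint16_t)(v >> 16));
	put_be16(p + 2, (uint16_t)v);
}

static uint16_t get_be16(const unsigned char *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}

static uint32_t get_be32(const unsigned char *p)
{
	return (uint32_t)get_be16(p) << 16 | get_be16(p + 2);
}

/* One tuple per sector: guard = checksum, app = 0, ref = low 32 bits of LBA. */
static void dif_generate(const unsigned char *data, size_t len,
			 uint64_t sector, unsigned char *prot, csum_fn fn)
{
	size_t i;

	assert(len % SECTOR_SIZE == 0);
	for (i = 0; i < len; i += SECTOR_SIZE, prot += TUPLE_SIZE, sector++) {
		put_be16(prot, fn(data + i, SECTOR_SIZE));
		put_be16(prot + 2, 0);
		put_be32(prot + 4, (uint32_t)sector);
	}
}

/* 0 = OK, -1 = ref tag mismatch, -2 = guard mismatch.  As in
 * scsi_dif_verify(), an app tag of 0xffff stops checking. */
static int dif_verify(const unsigned char *data, size_t len,
		      uint64_t sector, const unsigned char *prot, csum_fn fn)
{
	size_t i;

	assert(len % SECTOR_SIZE == 0);
	for (i = 0; i < len; i += SECTOR_SIZE, prot += TUPLE_SIZE, sector++) {
		if (get_be16(prot + 2) == 0xffff)
			return 0;
		if (get_be32(prot + 4) != (uint32_t)sector)
			return -1;
		if (get_be16(prot) != fn(data + i, SECTOR_SIZE))
			return -2;
	}
	return 0;
}
```

Passing the checksum as a function pointer mirrors how scsi_dif.c shares one generate/verify loop between its CRC and IP-checksum profiles.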
diff -r ea489bb64376 -r 7d9a353f8b7c drivers/scsi/scsi_dif.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/drivers/scsi/scsi_dif.c	Fri Apr 25 17:39:29 2008 -0400
@@ -0,0 +1,496 @@
+/*
+ * scsi_dif.c - SCSI Data Integrity Field
+ *
+ * Copyright (C) 2007, 2008 Oracle Corporation
+ * Written by: Martin K. Petersen <martin.petersen@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; see the file COPYING.  If not, write to
+ * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139,
+ * USA.
+ *
+ */
+
+#include <linux/blkdev.h>
+#include <linux/crc-t10dif.h>
+
+#include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_dbg.h>
+#include <scsi/scsi_device.h>
+#include <scsi/scsi_driver.h>
+#include <scsi/scsi_eh.h>
+#include <scsi/scsi_host.h>
+#include <scsi/scsi_ioctl.h>
+#include <scsi/scsicam.h>
+#include <scsi/sd.h>
+#include <scsi/scsi_dif.h>
+
+#include <net/checksum.h>
+
+typedef __u16 (csum_fn) (void *, unsigned int);
+
+static __u16 scsi_dif_crc_fn(void *data, unsigned int len)
+{
+	return cpu_to_be16(crc_t10dif(data, len));
+}
+
+static __u16 scsi_dif_ip_fn(void *data, unsigned int len)
+{
+	return ip_compute_csum(data, len);
+}
+
+/*
+ * Generate protection information
+ */
+static void scsi_dif_generate(struct blk_integrity_exchg *bix, csum_fn *fn)
+{
+	void *buf = bix->data_buf;
+	struct scsi_dif_tuple *sdt = bix->prot_buf;
+	sector_t sector = bix->sector;
+	unsigned int i;
+
+	for (i = 0 ; i < bix->data_size ; i += bix->sector_size, sdt++) {
+		sdt->guard_tag = fn(buf, bix->sector_size);
+		sdt->ref_tag = cpu_to_be32(sector & 0xffffffff);
+		sdt->app_tag = 0;
+
+		buf += bix->sector_size;
+		sector++;
+	}
+}
+
+static void scsi_dif_generate_crc(struct blk_integrity_exchg *bix)
+{
+	scsi_dif_generate(bix, scsi_dif_crc_fn);
+}
+
+static void scsi_dif_generate_ip(struct blk_integrity_exchg *bix)
+{
+	scsi_dif_generate(bix, scsi_dif_ip_fn);
+}
+
+/* 
+ * Verify protection information
+ */
+static int scsi_dif_verify(struct blk_integrity_exchg *bix, csum_fn *fn)
+{
+	void *buf = bix->data_buf;
+	struct scsi_dif_tuple *sdt = bix->prot_buf;
+	sector_t sector = bix->sector;
+	unsigned int i;
+	__u16 csum;
+
+	for (i = 0 ; i < bix->data_size ; i += bix->sector_size, sdt++) {
+		/* Unwritten sectors */
+		if (sdt->app_tag == 0xffff)
+			return 0;
+
+		/* Bad ref tag received from disk */
+		if (sdt->ref_tag == 0xffffffff) {
+			printk(KERN_ERR
+			       "%s: bad phys ref tag on sector %llu\n",
+			       bix->disk_name, (unsigned long long)sector);
+			return -EIO;
+		}
+
+		if (be32_to_cpu(sdt->ref_tag) != (sector & 0xffffffff)) {
+			printk(KERN_ERR
+			       "%s: ref tag error on sector %llu (rcvd %u)\n",
+			       bix->disk_name, (unsigned long long)sector,
+			       be32_to_cpu(sdt->ref_tag));
+			return -EIO;
+		}
+
+		csum = fn(buf, bix->sector_size);
+
+		if (sdt->guard_tag != csum) {
+			printk(KERN_ERR "%s: guard tag error on sector %llu "
+			       "(rcvd %04x, data %04x)\n", bix->disk_name,
+			       (unsigned long long)sector,
+			       be16_to_cpu(sdt->guard_tag), be16_to_cpu(csum));
+			return -EIO;
+		}
+
+		buf += bix->sector_size;
+		sector++;
+	}
+
+	return 0;
+}
+
+static int scsi_dif_verify_crc(struct blk_integrity_exchg *bix)
+{
+	return scsi_dif_verify(bix, scsi_dif_crc_fn);
+}
+
+static int scsi_dif_verify_ip(struct blk_integrity_exchg *bix)
+{
+	return scsi_dif_verify(bix, scsi_dif_ip_fn);
+}
+
+/*
+ * Interleave tag buffer between app tags
+ */
+static void scsi_dif_set_tag(void *prot, void *tag_buf, unsigned int sectors)
+{
+	struct scsi_dif_tuple *sdt = prot;
+	unsigned char *tag = tag_buf;
+	unsigned int i, j;
+
+	for (i = 0, j = 0 ; i < sectors ; i++, j += 2, sdt++)
+		sdt->app_tag = tag[j] << 8 | tag[j+1];
+}
+
+/*
+ * Reassemble tag buffer from app tags
+ */
+static void scsi_dif_get_tag(void *prot, void *tag_buf, unsigned int sectors)
+{
+	struct scsi_dif_tuple *sdt = prot;
+	unsigned char *tag = tag_buf;
+	unsigned int i, j;
+
+	for (i = 0, j = 0 ; i < sectors ; i++, j += 2, sdt++) {
+		tag[j] = (sdt->app_tag & 0xff00) >> 8;
+		tag[j+1] = sdt->app_tag & 0xff;
+	}
+}
+
+static struct blk_integrity dif_integrity_crc = {
+	.name			= "T10-DIF-CRC",
+	.generate_fn		= scsi_dif_generate_crc,
+	.verify_fn		= scsi_dif_verify_crc,
+	.get_tag_fn		= scsi_dif_get_tag,
+	.set_tag_fn		= scsi_dif_set_tag,
+	.tuple_size		= sizeof(struct scsi_dif_tuple),
+	.tag_size		= 0,
+};
+
+static struct blk_integrity dif_integrity_ip = {
+	.name			= "T10-DIF-IP",
+	.generate_fn		= scsi_dif_generate_ip,
+	.verify_fn		= scsi_dif_verify_ip,
+	.get_tag_fn		= scsi_dif_get_tag,
+	.set_tag_fn		= scsi_dif_set_tag,
+	.tuple_size		= sizeof(struct scsi_dif_tuple),
+	.tag_size		= 0,
+};
+
+/*
+ * The ATO bit indicates whether the application tag is available to
+ * the OS
+ */
+void scsi_dif_app_tag_own(struct scsi_disk *sdkp, unsigned char *buffer)
+{
+	int res, offset;
+	struct scsi_device *sdp = sdkp->device;
+	struct scsi_mode_data data;
+	struct scsi_sense_hdr sshdr;
+
+	if (sdp->type != TYPE_DISK)
+		return;
+
+	if (sdkp->protection_type == 0)
+		return;
+
+	res = scsi_mode_sense(sdp, 1, 0x0a, buffer, 36, SD_TIMEOUT,
+			      SD_MAX_RETRIES, &data, &sshdr);
+
+	if (!scsi_status_is_good(res) || !data.header_length ||
+	    data.length < 6) {
+		sd_printk(KERN_WARNING, sdkp,
+			  "getting Control mode page failed, assume no ATO\n");
+
+		if (scsi_sense_valid(&sshdr))
+			sd_print_sense_hdr(sdkp, &sshdr);
+
+		goto no_ato;
+	}
+
+	offset = data.header_length + data.block_descriptor_length;
+
+	if ((buffer[offset] & 0x3f) != 0x0a) {
+		sd_printk(KERN_ERR, sdkp, "ATO Got wrong page\n");
+		goto no_ato;
+	}
+
+	if ((buffer[offset + 5] & 0x80) == 0)
+		goto no_ato;
+
+	sdkp->ATO = 1;
+	sd_printk(KERN_NOTICE, sdkp, "ATO Enabled\n");
+
+	return;
+
+no_ato:
+	sd_printk(KERN_NOTICE, sdkp, "ATO Disabled\n");
+}
+EXPORT_SYMBOL(scsi_dif_app_tag_own);
+
+/*
+ * Determine whether disk supports Data Integrity Field
+ */
+void scsi_dif_config_disk(struct scsi_disk *sdkp, unsigned char *buffer)
+{
+	struct scsi_device *sdp = sdkp->device;
+	u8 type;
+
+	if (sdp->protection == 0 || (buffer[12] & 1) == 0)
+		type = 0;
+	else
+		type = ((buffer[12] >> 1) & 7) + 1; /* P_TYPE 0 = Type 1 */
+
+	switch (type) {
+	case SCSI_DIF_TYPE0_PROTECTION:
+		sd_printk(KERN_NOTICE, sdkp, "formatted without data " \
+			  "integrity protection\n");
+		sdkp->protection_type = 0;
+		break;
+
+	case SCSI_DIF_TYPE1_PROTECTION:
+	case SCSI_DIF_TYPE3_PROTECTION:
+		sd_printk(KERN_NOTICE, sdkp, "formatted with DIF Type %d " \
+			  "protection\n", type);
+		sdkp->protection_type = type;
+		break;
+
+	case SCSI_DIF_TYPE2_PROTECTION:
+		sd_printk(KERN_ERR, sdkp, "formatted with DIF Type 2 "	\
+			  "protection which is currently unsupported. "	\
+			  "Disabling disk!\n");
+		goto disable;
+
+	default:
+		sd_printk(KERN_ERR, sdkp, "formatted with unknown "	\
+			  "protection type %d. Disabling disk!\n", type);
+		goto disable;
+	}
+
+	return;
+
+disable:
+	sdkp->protection_type = 0;
+	sdkp->capacity = 0;
+}
+EXPORT_SYMBOL(scsi_dif_config_disk);
+
+/*
+ * Configure exchange of protection information between OS and HBA
+ */
+void scsi_dif_config_host(struct scsi_disk *sdkp)
+{
+	struct scsi_device *sdp = sdkp->device;
+	struct gendisk *disk = sdkp->disk;
+	u8 type = sdkp->protection_type;
+
+	if (type == SCSI_DIF_TYPE0_PROTECTION)
+		return;
+
+	if (scsi_host_dif_dma(sdp->host) == 0) {
+		sd_printk(KERN_NOTICE, sdkp, "Type %d protection "	\
+			  "unsupported by HBA. No protection DMA!\n",
+			  type);
+		sdkp->protection_type = 0;
+		return;
+	}
+
+	if (scsi_host_dif_type(sdp->host, type) == 0) {
+		sd_printk(KERN_NOTICE, sdkp, "Type %d protection "	\
+			  "unsupported by HBA. Disabling DIF!\n",
+			  type);
+		sdkp->protection_type = 0;
+		return;
+	}
+
+	if (scsi_host_guard_type(sdkp->device->host) & SCSI_DIF_GUARD_IP)
+		blk_integrity_register(disk, &dif_integrity_ip);
+	else
+		blk_integrity_register(disk, &dif_integrity_crc);
+
+	sd_printk(KERN_INFO, sdkp,
+		  "Enabling %s data integrity protection between OS and HBA\n",
+		  disk->integrity->name);
+
+	/* Signal to block layer that we can store a 16 bit tag per sector */
+	if (sdkp->ATO)
+		disk->integrity->tag_size = sizeof(u16);
+}
+EXPORT_SYMBOL(scsi_dif_config_host);
+
+/*
+ * DIF DMA operation magic decoder ring.  DIF-capable HBA drivers
+ * should call this function in their queuecommand to determine how to
+ * handle the I/O.
+ */
+unsigned char scsi_dif_op(struct scsi_cmnd *scmd)
+{
+	struct request *rq = scmd->request;
+	struct scsi_disk *sdkp;
+	int hba_to_disk, os_to_hba, csum_convert;
+
+	if (rq->cmd_type != REQ_TYPE_FS)
+		return SCSI_DIF_NORMAL;
+
+	/* Protection information between HBA and storage device */
+	sdkp = scsi_disk(rq->rq_disk);
+	hba_to_disk = sdkp->protection_type;
+
+	/* Protection information passed between OS and HBA */
+	os_to_hba = scsi_prot_sg_count(scmd);
+
+	/* Convert checksum? */
+	if (scsi_host_guard_type(scmd->device->host) & SCSI_DIF_GUARD_IP)
+		csum_convert = 1;
+	else
+		csum_convert = 0;
+
+	switch (scmd->cmnd[0]) {
+	case READ_10:
+	case READ_12:
+	case READ_16:
+		if (hba_to_disk && os_to_hba)
+			return	csum_convert ?
+				SCSI_DIF_READ_CONVERT :
+				SCSI_DIF_READ_PASS;
+
+		else if (hba_to_disk && !os_to_hba)
+			return SCSI_DIF_READ_STRIP;
+
+		else if (!hba_to_disk && os_to_hba)
+			return SCSI_DIF_READ_INSERT;
+
+		break;
+
+	case WRITE_10:
+	case WRITE_12:
+	case WRITE_16:
+		if (hba_to_disk && os_to_hba)
+			return csum_convert ?
+				SCSI_DIF_WRITE_CONVERT :
+				SCSI_DIF_WRITE_PASS;
+
+		else if (hba_to_disk && !os_to_hba)
+			return SCSI_DIF_WRITE_INSERT;
+
+		else if (!hba_to_disk && os_to_hba)
+			return SCSI_DIF_WRITE_STRIP;
+
+		break;
+	}
+
+	return SCSI_DIF_NORMAL;
+}
+EXPORT_SYMBOL(scsi_dif_op);
+
+/*
+ * The virtual start sector is the one that was originally submitted
+ * by the block layer.  Due to partitioning, MD/DM cloning, etc. the
+ * actual physical start sector is likely to be different.  Remap
+ * protection information to match the physical LBA.
+ */
+int scsi_dif_prepare(struct request *rq, sector_t hw_sector, unsigned int sector_sz)
+{
+	const int tuple_sz = sizeof(struct scsi_dif_tuple);
+	struct bio *bio;
+	struct scsi_disk *sdkp;
+	struct scsi_dif_tuple *sdt;
+	unsigned int i, j;
+	u32 phys, virt, ref = 0;
+
+	/* Already remapped? */
+	if (rq->cmd_flags & REQ_INTEGRITY)
+		return 0;
+
+	sdkp = rq->bio->bi_bdev->bd_disk->private_data;
+	rq->cmd_flags |= REQ_INTEGRITY;
+	phys = hw_sector & 0xffffffff;
+
+	__rq_for_each_bio(bio, rq) {
+		struct bio_vec *iv;
+
+		virt = bio->bi_integrity->bip_sector & 0xffffffff;
+
+		bip_for_each_vec(iv, bio->bi_integrity, i) {
+			sdt = kmap_atomic(iv->bv_page, KM_USER0) + iv->bv_offset;
+
+			for (j = 0 ; j < iv->bv_len ; j += tuple_sz, sdt++) {
+				if (be32_to_cpu(sdt->ref_tag) != virt) {
+					ref = be32_to_cpu(sdt->ref_tag);
+					kunmap_atomic(iv->bv_page, KM_USER0);
+					goto error;
+				}
+				sdt->ref_tag = cpu_to_be32(phys);
+				virt++;
+				phys++;
+			}
+			kunmap_atomic(iv->bv_page, KM_USER0);
+		}
+	}
+
+	return 0;
+
+error:
+	sd_printk(KERN_ERR, sdkp, "%s: virt %u, phys %u, ref %u\n",
+		  __func__, virt, phys, ref);
+
+	return -EIO;
+}
+
+/*
+ * Remap physical sector values in the reference tag to the virtual
+ * values expected by the block layer.
+ */
+void scsi_dif_complete(struct scsi_cmnd *scmd, unsigned int good_bytes)
+{
+	const int tuple_sz = sizeof(struct scsi_dif_tuple);
+	struct bio *bio;
+	struct scsi_dif_tuple *sdt;
+	unsigned int i, j, sectors, sector_sz;
+	u32 phys, virt;
+
+	sector_sz = scmd->device->sector_size;
+	sectors = good_bytes / sector_sz;
+
+	phys = scmd->request->sector & 0xffffffff;
+	if (sector_sz == 4096)
+		phys >>= 3;
+
+	__rq_for_each_bio(bio, scmd->request) {
+		struct bio_vec *iv;
+
+		virt = bio->bi_integrity->bip_sector & 0xffffffff;
+
+		bip_for_each_vec(iv, bio->bi_integrity, i) {
+			sdt = kmap_atomic(iv->bv_page, KM_USER0) + iv->bv_offset;
+
+			for (j = 0 ; j < iv->bv_len ; j += tuple_sz, sdt++) {
+
+				if (sectors == 0)
+					break;
+
+				if (be32_to_cpu(sdt->ref_tag) != phys &&
+				    sdt->app_tag != 0xffff) {
+					sdt->ref_tag = 0xffffffff; /* Bad ref */
+				}
+				else
+					sdt->ref_tag = cpu_to_be32(virt);
+
+				virt++;
+				phys++;
+				sectors--;
+			}
+
+			kunmap_atomic(iv->bv_page, KM_USER0);
+		}
+	}
+}
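
For readers without DIF hardware, the per-sector work done by scsi_dif_generate() and scsi_dif_verify() above can be modelled in user space. This is a sketch under stated assumptions, not the kernel code: crc_t10dif_sw(), dif_generate(), dif_verify() and struct dif_tuple_be are hypothetical names, the CRC is a plain bitwise implementation of the T10 polynomial 0x8BB7 (the kernel's crc_t10dif() is a separate implementation), the 0xffff app tag escape for unwritten sectors is left out, and a Type 1 layout is assumed: 16-bit guard, 16-bit app tag and 32-bit reference tag holding the low 32 bits of the LBA, all kept as big-endian bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 8-byte DIF tuple, kept as big-endian bytes */
struct dif_tuple_be {
	uint8_t guard[2];	/* CRC-16/T10-DIF of the sector data */
	uint8_t app[2];		/* application tag, zeroed on generation */
	uint8_t ref[4];		/* low 32 bits of the LBA (Type 1) */
};

/* Bitwise CRC-16/T10-DIF: poly 0x8BB7, init 0, MSB first, no final XOR */
static uint16_t crc_t10dif_sw(const uint8_t *p, size_t len)
{
	uint16_t crc = 0;

	while (len--) {
		crc ^= (uint16_t)(*p++ << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

static void put_be16(uint8_t *b, uint16_t v)
{
	b[0] = (uint8_t)(v >> 8);
	b[1] = (uint8_t)v;
}

static uint16_t get_be16(const uint8_t *b)
{
	return (uint16_t)(b[0] << 8 | b[1]);
}

static void put_be32(uint8_t *b, uint32_t v)
{
	for (int i = 3; i >= 0; i--, v >>= 8)
		b[i] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *b)
{
	return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
	       (uint32_t)b[2] << 8 | b[3];
}

/* One tuple per sector, as in scsi_dif_generate(): guard, app = 0, ref = LBA */
static void dif_generate(const uint8_t *data, size_t len, size_t sector_size,
			 uint64_t lba, struct dif_tuple_be *t)
{
	assert(sector_size != 0 && len % sector_size == 0);

	for (size_t off = 0; off < len; off += sector_size, t++, lba++) {
		put_be16(t->guard, crc_t10dif_sw(data + off, sector_size));
		put_be16(t->app, 0);
		put_be32(t->ref, (uint32_t)lba);	/* truncated to 32 bits */
	}
}

/* Returns 0 if every tuple matches, -1 on the first ref or guard mismatch */
static int dif_verify(const uint8_t *data, size_t len, size_t sector_size,
		      uint64_t lba, const struct dif_tuple_be *t)
{
	assert(sector_size != 0 && len % sector_size == 0);

	for (size_t off = 0; off < len; off += sector_size, t++, lba++) {
		if (get_be32(t->ref) != (uint32_t)lba)
			return -1;
		if (get_be16(t->guard) != crc_t10dif_sw(data + off, sector_size))
			return -1;
	}
	return 0;
}
```

A common self-check for this CRC variant is that the nine ASCII bytes "123456789" hash to 0xD0DB. Flipping any single data bit makes dif_verify() fail on the guard, while passing the wrong starting LBA makes it fail on the reference tag.
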
diff -r ea489bb64376 -r 7d9a353f8b7c drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/scsi/scsi_error.c	Fri Apr 25 17:39:29 2008 -0400
@@ -333,6 +333,9 @@
 		return /* soft_error */ SUCCESS;
 
 	case ABORTED_COMMAND:
+		if (sshdr.asc == 0x10) /* DIF */
+			return SUCCESS;
+
 		return NEEDS_RETRY;
 	case NOT_READY:
 	case UNIT_ATTENTION:
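
The scsi_error.c hunk above stops retrying ABORTED COMMAND when the additional sense code is 0x10, presumably because re-issuing an I/O whose protection information does not match would fail the same way. The patch only tests the ASC. The hypothetical helpers below also decode the ASCQ values that, as far as I know, SBC assigns to the three tag checks (10h/01h guard, 10h/02h application tag, 10h/03h reference tag); check the sense code table of the SBC revision you target before relying on them.

```c
#include <assert.h>
#include <stdint.h>

#define SENSE_KEY_ABORTED_COMMAND 0x0b	/* SPC sense key value */

/* Hypothetical classification of a protection information failure */
enum pi_error {
	PI_ERR_NONE,		/* not ABORTED COMMAND with ASC 10h */
	PI_ERR_GUARD,		/* 10h/01h: guard check failed */
	PI_ERR_APP_TAG,		/* 10h/02h: application tag check failed */
	PI_ERR_REF_TAG,		/* 10h/03h: reference tag check failed */
	PI_ERR_OTHER,		/* any other ASCQ under ASC 10h */
};

static enum pi_error classify_pi_sense(uint8_t key, uint8_t asc, uint8_t ascq)
{
	assert(key <= 0x0f);	/* sense keys are 4 bits wide */

	if (key != SENSE_KEY_ABORTED_COMMAND || asc != 0x10)
		return PI_ERR_NONE;

	switch (ascq) {
	case 0x01:
		return PI_ERR_GUARD;
	case 0x02:
		return PI_ERR_APP_TAG;
	case 0x03:
		return PI_ERR_REF_TAG;
	default:
		return PI_ERR_OTHER;
	}
}

/* For ABORTED COMMAND only: the patch retries unless the ASC is 10h */
static int aborted_command_needs_retry(uint8_t asc)
{
	return asc != 0x10;
}
```
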
diff -r ea489bb64376 -r 7d9a353f8b7c drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/scsi/scsi_lib.c	Fri Apr 25 17:39:29 2008 -0400
@@ -943,6 +943,10 @@
 				scsi_requeue_command(q, cmd);
 				return;
 			} else {
+				if (sshdr.asc == 0x10) { /* DIF */
+					scsi_print_result(cmd);
+					scsi_print_sense("", cmd);
+				}
 				scsi_end_request(cmd, -EIO, this_count, 1);
 				return;
 			}
diff -r ea489bb64376 -r 7d9a353f8b7c drivers/scsi/scsi_sysfs.c
--- a/drivers/scsi/scsi_sysfs.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/scsi/scsi_sysfs.c	Fri Apr 25 17:39:29 2008 -0400
@@ -247,6 +247,8 @@
 shost_rd_attr(can_queue, "%hd\n");
 shost_rd_attr(sg_tablesize, "%hu\n");
 shost_rd_attr(unchecked_isa_dma, "%d\n");
+shost_rd_attr(dif_capabilities, "%hd\n");
+shost_rd_attr(dif_guard_type, "%hd\n");
 shost_rd_attr2(proc_name, hostt->proc_name, "%s\n");
 
 static struct device_attribute *scsi_sysfs_shost_attrs[] = {
@@ -261,6 +263,8 @@
 	&dev_attr_hstate,
 	&dev_attr_supported_mode,
 	&dev_attr_active_mode,
+	&dev_attr_dif_capabilities,
+	&dev_attr_dif_guard_type,
 	NULL
 };
 
diff -r ea489bb64376 -r 7d9a353f8b7c include/scsi/scsi_dif.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/include/scsi/scsi_dif.h	Fri Apr 25 17:39:29 2008 -0400
@@ -0,0 +1,158 @@
+/*
+ * scsi_dif.h - SCSI Data Integrity Field
+ *
+ * Copyright (C) 2007 Oracle Corporation
+ * Written by: Martin K. Petersen <martin.petersen@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; see the file COPYING.  If not, write to
+ * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139,
+ * USA.
+ *
+ */
+
+#ifndef _SCSI_SCSI_DIF_H
+#define _SCSI_SCSI_DIF_H
+
+#include <scsi/sd.h>
+
+/*
+ * Type 1 through 3 indicate the DIF format. The DMA flag indicates
+ * that the initiator is capable of transferring protection data to
+ * and from host memory.
+ */
+
+enum scsi_dif_host_capabilities {
+	SHOST_DIF_TYPE1_PROTECTION = 1 << 0,
+	SHOST_DIF_TYPE2_PROTECTION = 1 << 1,
+	SHOST_DIF_TYPE3_PROTECTION = 1 << 2,
+	SHOST_DIF_PROTECTION_DMA   = 1 << 7,
+};
+
+static inline void scsi_host_set_dif_caps(struct Scsi_Host *shost, unsigned char mask)
+{
+	shost->dif_capabilities = mask;
+}
+
+static inline unsigned char scsi_host_dif_type(struct Scsi_Host *shost, unsigned int target_type)
+{
+	if (target_type == 0)
+		return 0;
+
+	return shost->dif_capabilities & (1 << (target_type - 1));
+}
+
+static inline unsigned char scsi_host_dif_dma(struct Scsi_Host *shost)
+{
+	return shost->dif_capabilities & SHOST_DIF_PROTECTION_DMA;
+}
+
+/*
+ * All DIF-capable initiators must support the T10-mandated CRC
+ * checksum.  Controllers can optionally implement the IP checksum
+ * scheme which has much lower impact on system performance.  Note
+ * that the main rationale for the checksum is to match integrity
+ * metadata with data.  Detecting bit errors is a job for ECC memory
+ * and buses.
+ */
+
+enum scsi_dif_guard_types {
+	SCSI_DIF_GUARD_CRC = 1 << 0,
+	SCSI_DIF_GUARD_IP  = 1 << 1,
+};
+
+static inline void scsi_host_set_guard_type(struct Scsi_Host *shost, unsigned char type)
+{
+	shost->dif_guard_type = type;
+}
+
+static inline unsigned char scsi_host_guard_type(struct Scsi_Host *shost)
+{
+	return shost->dif_guard_type;
+}
+
+/*
+ * Depending on the protection scheme implemented by initiator and
+ * target device, the request needs to be routed accordingly.  The
+ * host operations below are hints that tell the controller driver how
+ * to handle the I/O.
+ */
+
+enum scsi_dif_host_operations {
+	/* Normal I/O */
+	SCSI_DIF_NORMAL = 0,
+
+	/* OS-HBA: Protected, HBA-Target: Unprotected */
+	SCSI_DIF_READ_INSERT,
+	SCSI_DIF_WRITE_STRIP,
+
+	/* OS-HBA: Unprotected, HBA-Target: Protected */
+	SCSI_DIF_READ_STRIP,
+	SCSI_DIF_WRITE_INSERT,
+
+	/* OS-HBA: Protected, HBA-Target: Protected */
+	SCSI_DIF_READ_PASS,
+	SCSI_DIF_WRITE_PASS,
+
+	/* OS-HBA: Protected, HBA-Target: Protected, checksum conversion */
+	SCSI_DIF_READ_CONVERT,
+	SCSI_DIF_WRITE_CONVERT,
+};
+
+/* A DIF-capable target device can be formatted with different
+ * protection schemes.  Currently 0 through 3 are defined:
+ *
+ * Type 0 is regular (unprotected) I/O
+ *
+ * Type 1 defines the contents of the guard and reference tags
+ *
+ * Type 2 defines the contents of the guard and reference tags and
+ * uses 32-byte commands to seed the latter
+ *
+ * Type 3 defines the contents of the guard tag only
+ */
+
+enum scsi_dif_target_protection_types {
+	SCSI_DIF_TYPE0_PROTECTION = 0x0,
+	SCSI_DIF_TYPE1_PROTECTION = 0x1,
+	SCSI_DIF_TYPE2_PROTECTION = 0x2,
+	SCSI_DIF_TYPE3_PROTECTION = 0x3,
+};
+
+/* DIF contents are considered data and consequently host-endian */
+struct scsi_dif_tuple {
+	__u16 guard_tag;
+	__u16 app_tag;
+	__u32 ref_tag;
+};
+
+#if defined(CONFIG_SCSI_PROTECTION)
+
+extern unsigned char scsi_dif_op(struct scsi_cmnd *);
+extern void scsi_dif_app_tag_own(struct scsi_disk *, unsigned char *);
+extern void scsi_dif_config_disk(struct scsi_disk *, unsigned char *);
+extern void scsi_dif_config_host(struct scsi_disk *);
+extern int scsi_dif_prepare(struct request *rq, sector_t, unsigned int);
+extern void scsi_dif_complete(struct scsi_cmnd *, unsigned int);
+
+#else /* CONFIG_SCSI_PROTECTION */
+
+#define scsi_dif_op(a)				(0)
+#define scsi_dif_app_tag_own(a, b)		do { } while (0)
+#define scsi_dif_config_disk(a, b)		do { } while (0)
+#define scsi_dif_config_host(a)			do { } while (0)
+#define scsi_dif_prepare(a, b, c)		(0)
+#define scsi_dif_complete(a, b)			do { } while (0)
+
+#endif /* CONFIG_SCSI_PROTECTION */
+
+#endif /* _SCSI_SCSI_DIF_H */
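
scsi_host_dif_type() maps target Type N to host capability bit N-1, and scsi_dif_op() picks one of the host operations above from two facts: whether the disk is formatted with protection (the HBA-to-target leg) and whether the OS attached protection scatterlists (the OS-to-HBA leg). The sketch below restates both in plain user-space C under hypothetical names. It only models READ versus WRITE and ignores the request-type and opcode checks scsi_dif_op() also makes; the ip_guard flag stands in for the host advertising the IP checksum.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values copied from enum scsi_dif_host_capabilities */
enum {
	HOST_CAP_TYPE1 = 1 << 0,
	HOST_CAP_TYPE2 = 1 << 1,
	HOST_CAP_TYPE3 = 1 << 2,
	HOST_CAP_DMA   = 1 << 7,
};

/* Same order as enum scsi_dif_host_operations */
enum dif_op {
	DIF_OP_NORMAL,
	DIF_OP_READ_INSERT, DIF_OP_WRITE_STRIP,
	DIF_OP_READ_STRIP, DIF_OP_WRITE_INSERT,
	DIF_OP_READ_PASS, DIF_OP_WRITE_PASS,
	DIF_OP_READ_CONVERT, DIF_OP_WRITE_CONVERT,
};

/* As scsi_host_dif_type(): target Type N is capability bit N - 1 */
static bool host_supports_type(unsigned char caps, unsigned int type)
{
	assert(type <= 3);
	return type != 0 && (caps & (1u << (type - 1))) != 0;
}

/*
 * disk_prot: target formatted with protection (HBA <-> disk leg)
 * os_prot:   OS attached protection buffers  (OS <-> HBA leg)
 * ip_guard:  HBA exchanges IP checksums with the OS
 */
static enum dif_op dif_route(bool is_write, bool disk_prot, bool os_prot,
			     bool ip_guard)
{
	if (disk_prot && os_prot) {
		if (is_write)
			return ip_guard ? DIF_OP_WRITE_CONVERT : DIF_OP_WRITE_PASS;
		return ip_guard ? DIF_OP_READ_CONVERT : DIF_OP_READ_PASS;
	}
	if (disk_prot)	/* OS has no PI: HBA generates on write, strips on read */
		return is_write ? DIF_OP_WRITE_INSERT : DIF_OP_READ_STRIP;
	if (os_prot)	/* disk has no PI: HBA strips on write, generates on read */
		return is_write ? DIF_OP_WRITE_STRIP : DIF_OP_READ_INSERT;
	return DIF_OP_NORMAL;
}
```
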
diff -r ea489bb64376 -r 7d9a353f8b7c include/scsi/scsi_host.h
--- a/include/scsi/scsi_host.h	Fri Apr 25 17:39:29 2008 -0400
+++ b/include/scsi/scsi_host.h	Fri Apr 25 17:39:29 2008 -0400
@@ -638,6 +638,10 @@
 	 */
 	unsigned int max_host_blocked;
 
+	/* Data Integrity Field */
+	unsigned char dif_capabilities;
+	unsigned char dif_guard_type;
+
 	/*
 	 * q used for scsi_tgt msgs, async events or any other requests that
 	 * need to be processed in userspace
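
scsi_dif_prepare() and scsi_dif_complete(), added in scsi_dif.c by this patch, rewrite reference tags between the virtual sector numbering seen by the block layer (partitions, MD/DM) and the physical LBA on the device. Stripped of bio iteration and kmap handling, the arithmetic is a pair of loops. The sketch below uses hypothetical names and host-order uint32_t arrays instead of big-endian tuples; uint32_t wraps modulo 2^32, matching the & 0xffffffff truncation in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Submission side (cf. scsi_dif_prepare): every tag must equal the
 * expected virtual sector; it is then replaced by the physical LBA.
 * Returns 0 on success, -1 on the first unexpected tag.
 */
static int ref_virt_to_phys(uint32_t *ref, size_t n, uint32_t virt,
			    uint32_t phys)
{
	assert(ref != NULL || n == 0);

	for (size_t i = 0; i < n; i++, virt++, phys++) {
		if (ref[i] != virt)
			return -1;
		ref[i] = phys;
	}
	return 0;
}

/*
 * Completion side (cf. scsi_dif_complete): tags that match the physical
 * LBA, or belong to sectors whose app tag is 0xffff, get the virtual
 * number back; anything else is poisoned with 0xffffffff so that the
 * later verify step reports it.
 */
static void ref_phys_to_virt(uint32_t *ref, const uint16_t *app, size_t n,
			     uint32_t virt, uint32_t phys)
{
	for (size_t i = 0; i < n; i++, virt++, phys++)
		ref[i] = (ref[i] != phys && app[i] != 0xffff) ? 0xffffffffu : virt;
}
```
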




* [PATCH 10 of 16] Allow sd_print_sense_hdr to be called outside of sd.c
  2008-04-25 23:12 [PATCH 00 of 16] Block/SCSI Data Integrity Support Martin K. Petersen
                   ` (8 preceding siblings ...)
  2008-04-25 23:12 ` [PATCH 09 of 16] Support for the SBC Data Integrity Field format Martin K. Petersen
@ 2008-04-25 23:12 ` Martin K. Petersen
  2008-04-25 23:12 ` [PATCH 11 of 16] Move scsi_disk() accessor function to sd.h Martin K. Petersen
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 23+ messages in thread
From: Martin K. Petersen @ 2008-04-25 23:12 UTC (permalink / raw)
  To: linux-scsi

2 files changed, 4 insertions(+), 3 deletions(-)
drivers/scsi/sd.c |    5 ++---
include/scsi/sd.h |    2 ++


Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

diff -r 7d9a353f8b7c -r d13d7571165e drivers/scsi/sd.c
--- a/drivers/scsi/sd.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/scsi/sd.c	Fri Apr 25 17:39:29 2008 -0400
@@ -96,7 +96,6 @@
 static int sd_done(struct scsi_cmnd *);
 static void sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer);
 static void scsi_disk_release(struct device *cdev);
-static void sd_print_sense_hdr(struct scsi_disk *, struct scsi_sense_hdr *);
 static void sd_print_result(struct scsi_disk *, int);
 
 static DEFINE_IDR(sd_index_idr);
@@ -1930,14 +1929,14 @@
 module_init(init_sd);
 module_exit(exit_sd);
 
-static void sd_print_sense_hdr(struct scsi_disk *sdkp,
-			       struct scsi_sense_hdr *sshdr)
+void sd_print_sense_hdr(struct scsi_disk *sdkp, struct scsi_sense_hdr *sshdr)
 {
 	sd_printk(KERN_INFO, sdkp, "");
 	scsi_show_sense_hdr(sshdr);
 	sd_printk(KERN_INFO, sdkp, "");
 	scsi_show_extd_sense(sshdr->asc, sshdr->ascq);
 }
+EXPORT_SYMBOL(sd_print_sense_hdr);
 
 static void sd_print_result(struct scsi_disk *sdkp, int result)
 {
diff -r 7d9a353f8b7c -r d13d7571165e include/scsi/sd.h
--- a/include/scsi/sd.h	Fri Apr 25 17:39:29 2008 -0400
+++ b/include/scsi/sd.h	Fri Apr 25 17:39:29 2008 -0400
@@ -48,6 +48,8 @@
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
 
+extern void sd_print_sense_hdr(struct scsi_disk *, struct scsi_sense_hdr *);
+
 #define sd_printk(prefix, sdsk, fmt, a...)				\
         (sdsk)->disk ?							\
 	sdev_printk(prefix, (sdsk)->device, "[%s] " fmt,		\




* [PATCH 11 of 16] Move scsi_disk() accessor function to sd.h
  2008-04-25 23:12 [PATCH 00 of 16] Block/SCSI Data Integrity Support Martin K. Petersen
                   ` (9 preceding siblings ...)
  2008-04-25 23:12 ` [PATCH 10 of 16] Allow sd_print_sense_hdr to be called outside of sd.c Martin K. Petersen
@ 2008-04-25 23:12 ` Martin K. Petersen
  2008-04-26  6:23   ` Christoph Hellwig
  2008-04-25 23:12 ` [PATCH 12 of 16] SCSI host driver DIF helpers Martin K. Petersen
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 23+ messages in thread
From: Martin K. Petersen @ 2008-04-25 23:12 UTC (permalink / raw)
  To: linux-scsi

2 files changed, 5 insertions(+), 5 deletions(-)
drivers/scsi/sd.c |    5 -----
include/scsi/sd.h |    5 +++++


Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

diff -r d13d7571165e -r b512cd02f8b0 drivers/scsi/sd.c
--- a/drivers/scsi/sd.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/scsi/sd.c	Fri Apr 25 17:39:29 2008 -0400
@@ -292,11 +292,6 @@
 		BUG();
 		return 0;	/* shut up gcc */
 	}
-}
-
-static inline struct scsi_disk *scsi_disk(struct gendisk *disk)
-{
-	return container_of(disk->private_data, struct scsi_disk, driver);
 }
 
 static struct scsi_disk *__scsi_disk_get(struct gendisk *disk)
diff -r d13d7571165e -r b512cd02f8b0 include/scsi/sd.h
--- a/include/scsi/sd.h	Fri Apr 25 17:39:29 2008 -0400
+++ b/include/scsi/sd.h	Fri Apr 25 17:39:29 2008 -0400
@@ -50,6 +50,11 @@
 
 extern void sd_print_sense_hdr(struct scsi_disk *, struct scsi_sense_hdr *);
 
+static inline struct scsi_disk *scsi_disk(struct gendisk *disk)
+{
+	return container_of(disk->private_data, struct scsi_disk, driver);
+}
+
 #define sd_printk(prefix, sdsk, fmt, a...)				\
         (sdsk)->disk ?							\
 	sdev_printk(prefix, (sdsk)->device, "[%s] " fmt,		\
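
scsi_disk() recovers the enclosing struct scsi_disk from gendisk->private_data, which, judging by the container_of() call, points at the driver member. The sketch below shows the same idiom in user space with made-up stand-in structs (demo_driver, demo_scsi_disk and demo_gendisk are hypothetical) and a simplified container_of() without the kernel macro's type checking. The member order is invented so that the pointer subtraction is non-zero.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(); the kernel macro also type-checks ptr */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct scsi_driver, scsi_disk and gendisk */
struct demo_driver {
	const char *name;
};

struct demo_scsi_disk {
	unsigned long long capacity;
	struct demo_driver *driver;	/* private_data points at this member */
};

struct demo_gendisk {
	void *private_data;
};

static struct demo_scsi_disk *demo_scsi_disk(struct demo_gendisk *disk)
{
	assert(disk->private_data != NULL);
	return container_of(disk->private_data, struct demo_scsi_disk, driver);
}
```
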




* [PATCH 12 of 16] SCSI host driver DIF helpers
  2008-04-25 23:12 [PATCH 00 of 16] Block/SCSI Data Integrity Support Martin K. Petersen
                   ` (10 preceding siblings ...)
  2008-04-25 23:12 ` [PATCH 11 of 16] Move scsi_disk() accessor function to sd.h Martin K. Petersen
@ 2008-04-25 23:12 ` Martin K. Petersen
  2008-04-25 23:12 ` [PATCH 13 of 16] Support for SCSI disk (SBC) Data Integrity Field Martin K. Petersen
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 23+ messages in thread
From: Martin K. Petersen @ 2008-04-25 23:12 UTC (permalink / raw)
  To: linux-scsi

1 file changed, 23 insertions(+)
include/scsi/sd.h |   23 +++++++++++++++++++++++


HBA drivers need to poke in the CDB to prepare DIF commands.  Provide
helpers for extracting {RD,WR}PROTECT and start LBA.
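
A user-space sketch of what such helpers decode, assuming a raw CDB byte array: RDPROTECT/WRPROTECT occupy bits 7-5 of byte 1 in READ/WRITE(10), (12) and (16), and a READ/WRITE(10) CDB carries its 32-bit start LBA big-endian in bytes 2-5. The function names are hypothetical and the opcode values match <scsi/scsi.h>. The field helper returns int so that -1 ("no such field") is representable. Unlike this sketch, sd_start_lba() in the patch takes the LBA from the request rather than parsing the CDB.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode values as in <scsi/scsi.h> */
#define OP_READ_10	0x28
#define OP_WRITE_10	0x2a
#define OP_READ_12	0xa8
#define OP_WRITE_12	0xaa
#define OP_READ_16	0x88
#define OP_WRITE_16	0x8a

/* RDPROTECT/WRPROTECT (CDB byte 1, bits 7-5), or -1 if the opcode has none */
static int cdb_protect_field(const uint8_t *cdb)
{
	switch (cdb[0]) {
	case OP_READ_10:
	case OP_READ_12:
	case OP_READ_16:
	case OP_WRITE_10:
	case OP_WRITE_12:
	case OP_WRITE_16:
		return (cdb[1] >> 5) & 0x7;
	default:
		return -1;
	}
}

/* 32-bit start LBA of a READ/WRITE(10) CDB: bytes 2-5, big-endian */
static uint32_t cdb10_lba(const uint8_t *cdb)
{
	assert(cdb[0] == OP_READ_10 || cdb[0] == OP_WRITE_10);
	return (uint32_t)cdb[2] << 24 | (uint32_t)cdb[3] << 16 |
	       (uint32_t)cdb[4] << 8 | cdb[5];
}
```

sd.c later in this series sets the field to 001b (SCpnt->cmnd[1] = 1 << 5) for protected I/O, which cdb_protect_field() reports as 1.
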

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

diff -r b512cd02f8b0 -r 5262eba570f4 include/scsi/sd.h
--- a/include/scsi/sd.h	Fri Apr 25 17:39:29 2008 -0400
+++ b/include/scsi/sd.h	Fri Apr 25 17:39:29 2008 -0400
@@ -1,5 +1,7 @@
 #ifndef _SCSI_DISK_H
 #define _SCSI_DISK_H
+
+#include <scsi/scsi_cmnd.h>
 
 /*
  * More than enough for everybody ;)  The huge number of majors
@@ -61,4 +63,25 @@
 		    (sdsk)->disk->disk_name, ##a) :			\
 	sdev_printk(prefix, (sdsk)->device, fmt, ##a)
 
+static inline int sd_protect_field(struct scsi_cmnd *scmd)
+{
+	switch (scmd->cmnd[0]) {
+	case READ_10:
+	case READ_12:
+	case READ_16:
+	case WRITE_10:
+	case WRITE_12:
+	case WRITE_16:
+		return (scmd->cmnd[1] & 0xe0) >> 5;
+
+	default:
+		return -1;
+	}
+}
+
+static inline sector_t sd_start_lba(struct scsi_cmnd *scmd)
+{
+	return scmd->request->sector;
+}
+
 #endif /* _SCSI_DISK_H */




* [PATCH 13 of 16] Support for SCSI disk (SBC) Data Integrity Field
  2008-04-25 23:12 [PATCH 00 of 16] Block/SCSI Data Integrity Support Martin K. Petersen
                   ` (11 preceding siblings ...)
  2008-04-25 23:12 ` [PATCH 12 of 16] SCSI host driver DIF helpers Martin K. Petersen
@ 2008-04-25 23:12 ` Martin K. Petersen
  2008-04-25 23:12 ` [PATCH 14 of 16] Implement support for DIF in SCSI debug driver Martin K. Petersen
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 23+ messages in thread
From: Martin K. Petersen @ 2008-04-25 23:12 UTC (permalink / raw)
  To: linux-scsi

2 files changed, 53 insertions(+), 7 deletions(-)
drivers/scsi/sd.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++-------
include/scsi/sd.h |    2 +


Configure DMA of protection information and issue READ/WRITE commands
with RDPROTECT/WRPROTECT set accordingly.

Force READ CAPACITY(16) if the target has the PROTECT bit set and grab
an extra byte of response (P_TYPE and PROT_EN are in byte 12).

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

diff -r 5262eba570f4 -r b912d7bb3c47 drivers/scsi/sd.c
--- a/drivers/scsi/sd.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/scsi/sd.c	Fri Apr 25 17:39:29 2008 -0400
@@ -59,6 +59,7 @@
 #include <scsi/scsi_ioctl.h>
 #include <scsi/scsicam.h>
 #include <scsi/sd.h>
+#include <scsi/scsi_dif.h>
 
 #include "scsi_logging.h"
 
@@ -233,6 +234,24 @@
 	return snprintf(buf, 40, "%d\n", sdkp->device->allow_restart);
 }
 
+static ssize_t
+sd_show_protection_type(struct device *dev, struct device_attribute *attr,
+			char *buf)
+{
+	struct scsi_disk *sdkp = to_scsi_disk(dev);
+
+	return snprintf(buf, 20, "%u\n", sdkp->protection_type);
+}
+
+static ssize_t
+sd_show_app_tag_own(struct device *dev, struct device_attribute *attr,
+		    char *buf)
+{
+	struct scsi_disk *sdkp = to_scsi_disk(dev);
+
+	return snprintf(buf, 20, "%u\n", sdkp->ATO);
+}
+
 static struct device_attribute sd_disk_attrs[] = {
 	__ATTR(cache_type, S_IRUGO|S_IWUSR, sd_show_cache_type,
 	       sd_store_cache_type),
@@ -241,6 +260,8 @@
 	       sd_store_allow_restart),
 	__ATTR(manage_start_stop, S_IRUGO|S_IWUSR, sd_show_manage_start_stop,
 	       sd_store_manage_start_stop),
+	__ATTR(protection_type, S_IRUGO, sd_show_protection_type, NULL),
+	__ATTR(app_tag_own, S_IRUGO, sd_show_app_tag_own, NULL),
 	__ATTR_NULL,
 };
 
@@ -353,6 +374,7 @@
 	struct scsi_cmnd *SCpnt;
 	struct scsi_device *sdp = q->queuedata;
 	struct gendisk *disk = rq->rq_disk;
+	struct scsi_disk *sdkp;
 	sector_t block = rq->sector;
 	unsigned int this_count = rq->nr_sectors;
 	unsigned int timeout = sdp->timeout;
@@ -369,6 +391,7 @@
 	if (ret != BLKPREP_OK)
 		goto out;
 	SCpnt = rq->special;
+	sdkp = scsi_disk(disk);
 
 	/* from here on until we're complete, any goto out
 	 * is used for a killable error condition */
@@ -458,6 +481,11 @@
 		}
 		SCpnt->cmnd[0] = WRITE_6;
 		SCpnt->sc_data_direction = DMA_TO_DEVICE;
+
+		if (blk_integrity_rq(rq) &&
+		    scsi_dif_prepare(rq, block, sdp->sector_size) == -EIO)
+			goto out;
+
 	} else if (rq_data_dir(rq) == READ) {
 		SCpnt->cmnd[0] = READ_6;
 		SCpnt->sc_data_direction = DMA_FROM_DEVICE;
@@ -472,8 +500,11 @@
 					"writing" : "reading", this_count,
 					rq->nr_sectors));
 
-	SCpnt->cmnd[1] = 0;
-	
+	if (scsi_host_dif_type(sdp->host, sdkp->protection_type))
+		SCpnt->cmnd[1] = 1 << 5;
+	else
+		SCpnt->cmnd[1] = 0;
+
 	if (block > 0xffffffff) {
 		SCpnt->cmnd[0] += READ_16 - READ_6;
 		SCpnt->cmnd[1] |= blk_fua_rq(rq) ? 0x8 : 0;
@@ -491,6 +522,7 @@
 		SCpnt->cmnd[13] = (unsigned char) this_count & 0xff;
 		SCpnt->cmnd[14] = SCpnt->cmnd[15] = 0;
 	} else if ((this_count > 0xff) || (block > 0x1fffff) ||
+		   SCpnt->device->protection ||
 		   SCpnt->device->use_10_for_rw) {
 		if (this_count > 0xffff)
 			this_count = 0xffff;
@@ -516,6 +548,8 @@
 				    "FUA write on READ/WRITE(6) drive\n");
 			goto out;
 		}
+
+		BUG_ON(sdkp->protection_type);
 
 		SCpnt->cmnd[1] |= (unsigned char) ((block >> 16) & 0x1f);
 		SCpnt->cmnd[2] = (unsigned char) ((block >> 8) & 0xff);
@@ -1005,7 +1039,8 @@
 		good_bytes = xfer_size;
 		break;
 	case ILLEGAL_REQUEST:
-		if (SCpnt->device->use_10_for_rw &&
+		if (SCpnt->device->protection == 0 &&
+		    SCpnt->device->use_10_for_rw &&
 		    (SCpnt->cmnd[0] == READ_10 ||
 		     SCpnt->cmnd[0] == WRITE_10))
 			SCpnt->device->use_10_for_rw = 0;
@@ -1018,6 +1053,9 @@
 		break;
 	}
  out:
+	if (rq_data_dir(SCpnt->request) == READ && scsi_prot_sg_count(SCpnt))
+		scsi_dif_complete(SCpnt, good_bytes);
+
 	return good_bytes;
 }
 
@@ -1172,7 +1210,8 @@
 	unsigned char cmd[16];
 	int the_result, retries;
 	int sector_size = 0;
-	int longrc = 0;
+	/* Force READ CAPACITY(16) when PROTECT=1 */
+	int longrc = sdkp->device->protection ? 1 : 0;
 	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
 	struct scsi_device *sdp = sdkp->device;
@@ -1184,8 +1223,8 @@
 			memset((void *) cmd, 0, 16);
 			cmd[0] = SERVICE_ACTION_IN;
 			cmd[1] = SAI_READ_CAPACITY_16;
-			cmd[13] = 12;
-			memset((void *) buffer, 0, 12);
+			cmd[13] = 13;
+			memset((void *) buffer, 0, 13);
 		} else {
 			cmd[0] = READ_CAPACITY;
 			memset((void *) &cmd[1], 0, 9);
@@ -1193,7 +1232,7 @@
 		}
 		
 		the_result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE,
-					      buffer, longrc ? 12 : 8, &sshdr,
+					      buffer, longrc ? 13 : 8, &sshdr,
 					      SD_TIMEOUT, SD_MAX_RETRIES);
 
 		if (media_not_present(sdkp, &sshdr))
@@ -1268,6 +1307,8 @@
 			
 		sector_size = (buffer[8] << 24) |
 			(buffer[9] << 16) | (buffer[10] << 8) | buffer[11];
+
+		scsi_dif_config_disk(sdkp, buffer);
 	}	
 
 	/* Some devices return the total number of sectors, not the
@@ -1565,6 +1606,7 @@
 	sdkp->write_prot = 0;
 	sdkp->WCE = 0;
 	sdkp->RCD = 0;
+	sdkp->ATO = 0;
 
 	sd_spinup_disk(sdkp);
 
@@ -1576,6 +1618,7 @@
 		sd_read_capacity(sdkp, buffer);
 		sd_read_write_protect_flag(sdkp, buffer);
 		sd_read_cache_type(sdkp, buffer);
+		scsi_dif_app_tag_own(sdkp, buffer);
 	}
 
 	/*
@@ -1709,6 +1752,7 @@
 
 	dev_set_drvdata(dev, sdkp);
 	add_disk(gd);
+	scsi_dif_config_host(sdkp);
 
 	sd_printk(KERN_NOTICE, sdkp, "Attached SCSI %sdisk\n",
 		  sdp->removable ? "removable " : "");
diff -r 5262eba570f4 -r b912d7bb3c47 include/scsi/sd.h
--- a/include/scsi/sd.h	Fri Apr 25 17:39:29 2008 -0400
+++ b/include/scsi/sd.h	Fri Apr 25 17:39:29 2008 -0400
@@ -43,7 +43,9 @@
 	u32		index;
 	u8		media_present;
 	u8		write_prot;
+	u8		protection_type; /* Data Integrity Field */
 	unsigned	previous_state : 1;
+	unsigned	ATO : 1;	/* state of disk ATO bit */
 	unsigned	WCE : 1;	/* state of disk WCE bit */
 	unsigned	RCD : 1;	/* state of disk RCD bit, unused */
 	unsigned	DPOFUA : 1;	/* state of disk DPOFUA bit */
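
When ATO is set, scsi_dif_config_host() advertises a 16-bit tag per sector, and scsi_dif_set_tag()/scsi_dif_get_tag() in scsi_dif.c spread a caller's tag buffer across the per-sector app tags, first byte as the high-order byte. The user-space sketch below (set_app_tags() and get_app_tags() are hypothetical names, and plain uint16_t values stand in for the tuple's app_tag field) shows the round trip. It reads the buffer through unsigned bytes so that a byte of 0x80 or above cannot sign-extend into the high-order byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two tag-buffer bytes per sector into a 16-bit app tag */
static void set_app_tags(uint16_t *app, const uint8_t *tag, size_t sectors)
{
	assert(app != NULL && tag != NULL);

	for (size_t i = 0; i < sectors; i++)
		app[i] = (uint16_t)(tag[2 * i] << 8 | tag[2 * i + 1]);
}

/* Reassemble the tag buffer from the per-sector app tags */
static void get_app_tags(const uint16_t *app, uint8_t *tag, size_t sectors)
{
	for (size_t i = 0; i < sectors; i++) {
		tag[2 * i] = (uint8_t)(app[i] >> 8);
		tag[2 * i + 1] = (uint8_t)(app[i] & 0xff);
	}
}
```
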




* [PATCH 14 of 16] Implement support for DIF in SCSI debug driver
  2008-04-25 23:12 [PATCH 00 of 16] Block/SCSI Data Integrity Support Martin K. Petersen
                   ` (12 preceding siblings ...)
  2008-04-25 23:12 ` [PATCH 13 of 16] Support for SCSI disk (SBC) Data Integrity Field Martin K. Petersen
@ 2008-04-25 23:12 ` Martin K. Petersen
  2008-04-25 23:12 ` [PATCH 15 of 16] Add support for data integrity to DM Martin K. Petersen
  2008-04-25 23:12 ` [PATCH 16 of 16] Add support for data integrity to MD Martin K. Petersen
  15 siblings, 0 replies; 23+ messages in thread
From: Martin K. Petersen @ 2008-04-25 23:12 UTC (permalink / raw)
  To: linux-scsi

1 file changed, 385 insertions(+), 4 deletions(-)
drivers/scsi/scsi_debug.c |  389 ++++++++++++++++++++++++++++++++++++++++++++-


Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

diff -r b912d7bb3c47 -r f32469624774 drivers/scsi/scsi_debug.c
--- a/drivers/scsi/scsi_debug.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/scsi/scsi_debug.c	Fri Apr 25 17:39:29 2008 -0400
@@ -40,6 +40,8 @@
 #include <linux/moduleparam.h>
 #include <linux/scatterlist.h>
 #include <linux/blkdev.h>
+#include <linux/crc-t10dif.h>
+#include <linux/ctype.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -47,8 +49,12 @@
 #include <scsi/scsi_host.h>
 #include <scsi/scsicam.h>
 #include <scsi/scsi_eh.h>
+#include <scsi/sd.h>
+#include <scsi/scsi_dif.h>
+#include <scsi/scsi_cmnd.h>
 
 #include <linux/stat.h>
+#include <net/checksum.h>
 
 #include "scsi_logging.h"
 
@@ -94,6 +100,9 @@
 #define DEF_VIRTUAL_GB   0
 #define DEF_FAKE_RW	0
 #define DEF_VPD_USE_HOSTNO 1
+#define DEF_PROTECTION 0
+#define DEF_GUARD 1
+#define DEF_ATO 1
 
 /* bit mask values for scsi_debug_opts */
 #define SCSI_DEBUG_OPT_NOISE   1
@@ -142,6 +151,9 @@
 static int scsi_debug_virtual_gb = DEF_VIRTUAL_GB;
 static int scsi_debug_fake_rw = DEF_FAKE_RW;
 static int scsi_debug_vpd_use_hostno = DEF_VPD_USE_HOSTNO;
+static int scsi_debug_protection = DEF_PROTECTION;
+static int scsi_debug_guard = DEF_GUARD;
+static int scsi_debug_ato = DEF_ATO;
 
 static int scsi_debug_cmnd_count = 0;
 
@@ -207,11 +219,15 @@
 static struct sdebug_queued_cmd queued_arr[SCSI_DEBUG_CANQUEUE];
 
 static unsigned char * fake_storep;	/* ramdisk storage */
+static unsigned char * dif_storep;	/* protection info */
 
 static int num_aborts = 0;
 static int num_dev_resets = 0;
 static int num_bus_resets = 0;
 static int num_host_resets = 0;
+static int dif_writes = 0;
+static int dif_reads = 0;
+static int dif_errors = 0;
 
 static DEFINE_SPINLOCK(queued_arr_lock);
 static DEFINE_RWLOCK(atomic_rw);
@@ -220,6 +236,11 @@
 
 static struct bus_type pseudo_lld_bus;
 
+static inline sector_t dif_offset(sector_t sector)
+{
+	return sector << 3;
+}
+
 static struct device_driver sdebug_driverfs_driver = {
 	.name 		= sdebug_proc_name,
 	.bus		= &pseudo_lld_bus,
@@ -227,6 +248,10 @@
 
 static const int check_condition_result =
 		(DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION;
+
+static const int illegal_condition_result =
+	((DRIVER_SENSE|SUGGEST_DIE) << 24) | (DID_ABORT << 16 ) 
+	| SAM_STAT_CHECK_CONDITION;
 
 static unsigned char ctrl_m_pg[] = {0xa, 10, 2, 0, 0, 0, 0, 0,
 				    0, 0, 0x2, 0x4b};
@@ -720,7 +745,11 @@
 		} else if (0x86 == cmd[2]) { /* extended inquiry */
 			arr[1] = cmd[2];	/*sanity */
 			arr[3] = 0x3c;	/* number of following entries */
-			arr[4] = 0x0;   /* no protection stuff */
+			if (scsi_debug_protection)
+				arr[4] = 0x5;   /* SPT: Type 1 Protection,
+						 * GRD_CHK:1, REF_CHK:1 */
+			else
+				arr[4] = 0x0;   /* no protection stuff */
 			arr[5] = 0x7;   /* head of q, ordered + simple q's */
 		} else if (0x87 == cmd[2]) { /* mode page policy */
 			arr[1] = cmd[2];	/*sanity */
@@ -758,6 +787,7 @@
 	arr[2] = scsi_debug_scsi_level;
 	arr[3] = 2;    /* response_data_format==2 */
 	arr[4] = SDEBUG_LONG_INQ_SZ - 5;
+	arr[5] = scsi_debug_protection ? 1 : 0; /* PROTECT bit */
 	if (0 == scsi_debug_vpd_use_hostno)
 		arr[5] = 0x10; /* claim: implicit TGPS */
 	arr[6] = 0x10; /* claim: MultiP */
@@ -906,6 +936,11 @@
 	arr[9] = (SECT_SIZE_PER(target) >> 16) & 0xff;
 	arr[10] = (SECT_SIZE_PER(target) >> 8) & 0xff;
 	arr[11] = SECT_SIZE_PER(target) & 0xff;
+	
+	if (scsi_debug_protection) {
+		arr[12] = (scsi_debug_protection - 1) << 1; /* P_TYPE */
+		arr[12] |= 1; /* PROT_EN */
+	}
 	return fill_from_dev_buffer(scp, arr,
 				    min(alloc_len, SDEBUG_READCAP16_ARR_SZ));
 }
@@ -1057,6 +1092,10 @@
 		ctrl_m_pg[2] |= 0x4;
 	else
 		ctrl_m_pg[2] &= ~0x4;
+
+	if (scsi_debug_ato)
+		ctrl_m_pg[5] |= 0x80; /* ATO=1 */
+
 	memcpy(p, ctrl_m_pg, sizeof(ctrl_m_pg));
 	if (1 == pcontrol)
 		memcpy(p + 2, ch_ctrl_m_pg, sizeof(ch_ctrl_m_pg));
@@ -1527,6 +1566,75 @@
 	return ret;
 }
 
+#ifdef CONFIG_SCSI_PROTECTION
+static int prot_verify_read(struct scsi_cmnd * SCpnt, sector_t start_sec,
+			    unsigned int sectors)
+{
+	unsigned int i, resid;
+	struct scatterlist *psgl;
+	struct scsi_dif_tuple *sdt;
+	sector_t sector;
+	void *paddr;
+
+	sdt = (struct scsi_dif_tuple *)(dif_storep + dif_offset(start_sec));
+
+	for (i=0 ; i<sectors ; i++) {
+		u16 csum;
+
+		if (sdt[i].app_tag == 0xffff)
+			continue;
+
+		sector = start_sec + i;
+
+		switch (scsi_debug_guard) {
+		case 1:
+			csum = ip_compute_csum(fake_storep + (sector * SECT_SIZE),
+					       SECT_SIZE);
+			break;
+		case 0:
+			csum = cpu_to_be16(crc_t10dif(fake_storep + (sector * SECT_SIZE),
+						      SECT_SIZE));
+			break;
+		default:
+			BUG();
+		}
+
+		if (sdt[i].guard_tag != csum) {
+			printk(KERN_ERR "%s: GUARD check failed on sector %lu " \
+			       "rcvd 0x%04x, data 0x%04x\n", __func__, sector, 
+			       be16_to_cpu(sdt[i].guard_tag), be16_to_cpu(csum));
+			dif_errors++;
+			return 0x01;
+		}
+
+		if (be32_to_cpu(sdt[i].ref_tag) != (sector & 0xffffffff)) {
+			printk(KERN_ERR "%s: REF check failed on sector %lu\n",
+			       __func__, sector);
+			dif_errors++;
+			return 0x03;
+		}
+	}
+
+	resid = sectors * 8; /* Bytes of protection data to copy into sgl */
+	sector = start_sec;
+
+	scsi_for_each_prot_sg(SCpnt, psgl, scsi_prot_sg_count(SCpnt), i) {
+		int len = min(psgl->length, resid);
+
+		paddr = kmap_atomic(sg_page(psgl), KM_IRQ0) + psgl->offset;
+		memcpy(paddr, dif_storep + dif_offset(sector), len);
+
+		sector += len >> 3;
+		resid -= len;
+		kunmap_atomic(sg_page(psgl), KM_IRQ0);
+	}
+
+	dif_reads++;
+
+	return 0;
+}
+#endif
+
 static int resp_read(struct scsi_cmnd *SCpnt, unsigned long long lba,
 		     unsigned int num, struct sdebug_dev_info *devip)
 {
@@ -1554,11 +1662,152 @@
 		}
 		return check_condition_result;
 	}
+
+	/* T10 DIF */
+#ifdef CONFIG_SCSI_PROTECTION
+	if (scsi_debug_protection && scsi_prot_sg_count(SCpnt)) {
+		int prot_ret;
+
+		if ((prot_ret = prot_verify_read(SCpnt, lba, num))) {
+			mk_sense_buffer(devip, ABORTED_COMMAND, 0x10, prot_ret);
+			return illegal_condition_result;
+		}
+	}
+#endif
+
 	read_lock_irqsave(&atomic_rw, iflags);
 	ret = do_device_access(SCpnt, devip, lba, num, 0);
 	read_unlock_irqrestore(&atomic_rw, iflags);
 	return ret;
 }
+
+#ifdef CONFIG_SCSI_PROTECTION
+
+static void dump_sector(unsigned char *buf, int len)
+{
+	int i, j;
+
+	printk(KERN_ERR ">>> Sector Dump <<<\n");
+
+	for (i=0 ; i<len ; i+=16) {
+		printk(KERN_ERR "%04d: ", i);
+
+		for (j=0 ; j<16 ; j++) {
+			unsigned char c = buf[i+j];
+			if (c >= 0x20 && c <= 0x7e)
+				printk(" %c ",buf[i+j]);
+			else
+				printk("%02x ", buf[i+j]);
+		}
+
+		printk("\n");
+	}
+}
+
+static int prot_verify_write(struct scsi_cmnd * SCpnt, sector_t start_sec,
+			     unsigned int sectors)
+{
+	int i, j, ret;
+	struct scsi_dif_tuple *sdt;
+	struct scatterlist *dsgl = scsi_sglist(SCpnt);
+	struct scatterlist *psgl = scsi_prot_sglist(SCpnt);
+	void *daddr, *paddr;
+	sector_t sector = start_sec;
+	int ppage_offset;
+	unsigned short csum;
+
+	if (((SCpnt->cmnd[1] >> 5) & 7) != 1) {
+		printk(KERN_WARNING "scsi_debug: WRPROTECT != 1\n");
+		return 0;
+	}
+
+	BUG_ON(scsi_sg_count(SCpnt) == 0);
+	BUG_ON(scsi_prot_sg_count(SCpnt) == 0);
+
+	paddr = kmap_atomic(sg_page(psgl), KM_IRQ1) + psgl->offset;
+	ppage_offset = 0;
+
+	/* For each data page */
+	scsi_for_each_sg(SCpnt, dsgl, scsi_sg_count(SCpnt), i) {
+		daddr = kmap_atomic(sg_page(dsgl), KM_IRQ0) + dsgl->offset;
+
+		/* For each sector-sized chunk in data page */
+		for (j=0 ; j<dsgl->length ; j+=SECT_SIZE) {
+
+			/* If we're at the end of the current
+			 * protection page advance to the next one
+			 */
+			if (ppage_offset >= psgl->length) {
+				kunmap_atomic(sg_page(psgl), KM_IRQ1);
+				psgl = sg_next(psgl);
+				BUG_ON(psgl == NULL);
+				paddr = kmap_atomic(sg_page(psgl), KM_IRQ1) 
+					+ psgl->offset;
+				ppage_offset = 0;
+			}
+
+			sdt = paddr + ppage_offset;
+
+			switch (scsi_debug_guard) {
+			case 1:
+				csum = ip_compute_csum(daddr, SECT_SIZE);
+				break;
+			case 0:
+				csum = cpu_to_be16(crc_t10dif(daddr, SECT_SIZE));
+				break;
+			default:
+				BUG();
+				ret = 0;
+				goto out;
+			}
+
+			if (sdt->guard_tag != csum) {
+				printk(KERN_ERR "%s: GUARD check failed on sector %lu " \
+				       "rcvd 0x%04x, calculated 0x%04x\n",
+				       __func__, sector, 
+				       be16_to_cpu(sdt->guard_tag), be16_to_cpu(csum));
+				ret = 0x01;
+				dump_sector(daddr, SECT_SIZE);
+				goto out;
+			}
+
+			if (be32_to_cpu(sdt->ref_tag) != (sector & 0xffffffff)) {
+				printk(KERN_ERR 
+				       "%s: REF check failed on sector %lu\n",
+				       __func__, sector);
+				ret = 0x03;
+				dump_sector(daddr, SECT_SIZE);
+				goto out;
+			}
+
+			/* Would be great to copy this in bigger
+			 * chunks.  However, for the sake of
+			 * correctness we need to verify each sector
+			 * before writing it to "stable" storage
+			 */
+			memcpy(dif_storep + dif_offset(sector), sdt, 8);
+
+			sector++;
+			daddr += SECT_SIZE;
+			ppage_offset += sizeof(struct scsi_dif_tuple);
+		}
+
+		kunmap_atomic(sg_page(dsgl), KM_IRQ0);
+	}
+
+	kunmap_atomic(sg_page(psgl), KM_IRQ1);
+
+	dif_writes++;
+
+	return 0;
+
+out:
+	dif_errors++;
+	kunmap_atomic(sg_page(dsgl), KM_IRQ0);
+	kunmap_atomic(sg_page(psgl), KM_IRQ1);
+	return ret;
+}
+#endif
 
 static int resp_write(struct scsi_cmnd *SCpnt, unsigned long long lba,
 		      unsigned int num, struct sdebug_dev_info *devip)
@@ -1569,6 +1818,18 @@
 	ret = check_device_access_params(devip, lba, num);
 	if (ret)
 		return ret;
+
+#ifdef CONFIG_SCSI_PROTECTION
+	/* T10 DIF */
+	if (scsi_debug_protection && scsi_prot_sg_count(SCpnt)) {
+		int prot_ret;
+
+		if ((prot_ret = prot_verify_write(SCpnt, lba, num))) {
+			mk_sense_buffer(devip, ILLEGAL_REQUEST, 0x10, prot_ret);
+			return illegal_condition_result;
+		}
+	}
+#endif
 
 	write_lock_irqsave(&atomic_rw, iflags);
 	ret = do_device_access(SCpnt, devip, lba, num, 1);
@@ -2085,6 +2346,9 @@
 module_param_named(virtual_gb, scsi_debug_virtual_gb, int, S_IRUGO | S_IWUSR);
 module_param_named(vpd_use_hostno, scsi_debug_vpd_use_hostno, int,
 		   S_IRUGO | S_IWUSR);
+module_param_named(protection, scsi_debug_protection, int, S_IRUGO | S_IWUSR);
+module_param_named(guard, scsi_debug_guard, int, S_IRUGO | S_IWUSR);
+module_param_named(ato, scsi_debug_ato, int, S_IRUGO | S_IWUSR);
 
 MODULE_AUTHOR("Eric Youngdale + Douglas Gilbert");
 MODULE_DESCRIPTION("SCSI debug adapter driver");
@@ -2106,7 +2370,8 @@
 MODULE_PARM_DESC(scsi_level, "SCSI level to simulate(def=5[SPC-3])");
 MODULE_PARM_DESC(virtual_gb, "virtual gigabyte size (def=0 -> use dev_size_mb)");
 MODULE_PARM_DESC(vpd_use_hostno, "0 -> dev ids ignore hostno (def=1 -> unique dev ids)");
-
+MODULE_PARM_DESC(protection, "enable support for data integrity field (def=0)");
+MODULE_PARM_DESC(guard, "protection checksum: 0=crc, 1=ip (def=1)");
 
 static char sdebug_info[256];
 
@@ -2153,13 +2418,14 @@
 	    "delay=%d, max_luns=%d, scsi_level=%d\n"
 	    "sector_size=%d bytes, cylinders=%d, heads=%d, sectors=%d\n"
 	    "number of aborts=%d, device_reset=%d, bus_resets=%d, "
-	    "host_resets=%d\n",
+	    "host_resets=%d\ndif_reads=%d dif_writes=%d dif_errors=%d\n",
 	    SCSI_DEBUG_VERSION, scsi_debug_version_date, scsi_debug_num_tgts,
 	    scsi_debug_dev_size_mb, scsi_debug_opts, scsi_debug_every_nth,
 	    scsi_debug_cmnd_count, scsi_debug_delay,
 	    scsi_debug_max_luns, scsi_debug_scsi_level,
 	    SECT_SIZE, sdebug_cylinders_per, sdebug_heads, sdebug_sectors_per,
-	    num_aborts, num_dev_resets, num_bus_resets, num_host_resets);
+	    num_aborts, num_dev_resets, num_bus_resets, num_host_resets,
+	    dif_reads, dif_writes, dif_errors);
 	if (pos < offset) {
 		len = 0;
 		begin = pos;
@@ -2434,6 +2700,83 @@
 DRIVER_ATTR(vpd_use_hostno, S_IRUGO | S_IWUSR, sdebug_vpd_use_hostno_show,
 	    sdebug_vpd_use_hostno_store);
 
+static ssize_t sdebug_protection_show(struct device_driver * ddp, char * buf)
+{
+        return scnprintf(buf, PAGE_SIZE, "%d\n", scsi_debug_protection);
+}
+
+static ssize_t sdebug_protection_store(struct device_driver * ddp,
+				       const char * buf, size_t count)
+{
+        int n;
+	
+	if ((count > 0) && (1 == sscanf(buf, "%d", &n)) && (n >= 0)) {
+		switch (n) {
+		case 0: /* Type 0 Protection (None) */
+		case 1: /* Type 1 Protection (GRD, APP, REF) */
+		case 2: /* Type 2 Protection (READ32/WRITE32) */
+		case 3: /* Type 3 Protection (GRD only) */
+			scsi_debug_protection = n;
+			return count;
+		default:
+			break;
+		}
+	}
+	return -EINVAL;
+}
+DRIVER_ATTR(protection, S_IRUGO | S_IWUSR, sdebug_protection_show,
+	    sdebug_protection_store);
+
+static ssize_t sdebug_guard_show(struct device_driver * ddp, char * buf)
+{
+        return scnprintf(buf, PAGE_SIZE, "%u\n", scsi_debug_guard);
+}
+
+static ssize_t sdebug_guard_store(struct device_driver * ddp,
+				  const char * buf, size_t count)
+{
+        int n;
+	
+	if ((count > 0) && (1 == sscanf(buf, "%d", &n)) && (n >= 0)) {
+		switch (n) {
+		case 0: /* T10 DIF CRC */
+		case 1: /* IP checksum */
+			scsi_debug_guard = n;
+			return count;
+		default:
+			break;
+		}
+	}
+	return -EINVAL;
+}
+DRIVER_ATTR(guard, S_IRUGO | S_IWUSR, sdebug_guard_show,
+	    sdebug_guard_store);
+
+static ssize_t sdebug_ato_show(struct device_driver * ddp, char * buf)
+{
+        return scnprintf(buf, PAGE_SIZE, "%u\n", scsi_debug_ato);
+}
+
+static ssize_t sdebug_ato_store(struct device_driver * ddp,
+				const char * buf, size_t count)
+{
+        int n;
+	
+	if ((count > 0) && (1 == sscanf(buf, "%u", &n)) && (n >= 0)) {
+		switch (n) {
+		case 0: scsi_debug_ato = 0;
+			return count;
+		case 1: scsi_debug_ato = 1;
+			return count;
+		default:
+			break;
+		}
+	}
+	return -EINVAL;
+}
+DRIVER_ATTR(ato, S_IRUGO | S_IWUSR, sdebug_ato_show,
+	    sdebug_ato_store);
+
 /* Note: The following function creates attribute files in the
    /sys/bus/pseudo/drivers/scsi_debug directory. The advantage of these
    files (over those found in the /sys/module/scsi_debug/parameters
@@ -2499,6 +2842,8 @@
 	int k;
 	int ret;
 
+	ret = 0;
+
 	if (scsi_debug_dev_size_mb < 1)
 		scsi_debug_dev_size_mb = 1;  /* force minimum 1 MB ramdisk */
 	sz = (unsigned long)scsi_debug_dev_size_mb * 1048576;
@@ -2530,6 +2875,25 @@
 	memset(fake_storep, 0, sz);
 	if (scsi_debug_num_parts > 0)
 		sdebug_build_parts(fake_storep, sz);
+
+#ifdef CONFIG_SCSI_PROTECTION
+	if (scsi_debug_protection) {
+		int dif_size;
+
+		dif_size = sdebug_store_sectors * sizeof(struct scsi_dif_tuple);
+		dif_storep = vmalloc(dif_size);
+
+		printk(KERN_ERR "scsi_debug_init: dif_storep %u bytes @ %p\n", 
+		       dif_size, dif_storep);
+
+		if (dif_storep == NULL) {
+			printk(KERN_ERR "scsi_debug_init: out of mem. (DIF)\n");
+			goto free_vm;
+		}
+
+		memset(dif_storep, 0xff, dif_size);
+	}
+#endif
 
 	ret = device_register(&pseudo_primary);
 	if (ret < 0) {
@@ -2583,6 +2947,8 @@
 dev_unreg:
 	device_unregister(&pseudo_primary);
 free_vm:
+	if (dif_storep)
+		vfree(dif_storep);
 	vfree(fake_storep);
 
 	return ret;
@@ -2600,6 +2966,8 @@
 	bus_unregister(&pseudo_lld_bus);
 	device_unregister(&pseudo_primary);
 
+	if (dif_storep)
+		vfree(dif_storep);
 	vfree(fake_storep);
 }
 
@@ -2968,6 +3336,19 @@
 		hpnt->max_id = scsi_debug_num_tgts;
 	hpnt->max_lun = SAM2_WLUN_REPORT_LUNS;	/* = scsi_debug_max_luns; */
 
+#ifdef CONFIG_SCSI_PROTECTION
+	if (scsi_debug_protection) {
+		scsi_host_set_dif_caps(hpnt, 
+				       SHOST_DIF_TYPE1_PROTECTION
+				       |SHOST_DIF_PROTECTION_DMA);
+
+		if (scsi_debug_guard == 1)
+			scsi_host_set_guard_type(hpnt, SCSI_DIF_GUARD_IP);
+		else
+			scsi_host_set_guard_type(hpnt, SCSI_DIF_GUARD_CRC);
+	}
+#endif
+
         error = scsi_add_host(hpnt, &sdbg_host->dev);
         if (error) {
                 printk(KERN_ERR "%s: scsi_add_host failed\n", __FUNCTION__);

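For anyone checking guard tags outside the kernel: with guard=0, prot_verify_read() and prot_verify_write() compare each tuple's guard tag against crc_t10dif() of the 512-byte sector, stored big-endian via cpu_to_be16(). Below is a minimal user-space sketch of that CRC, assuming the standard CRC-16/T10-DIF parameters (polynomial 0x8BB7, initial value 0, no bit reflection, no final XOR). t10dif_crc() is a hypothetical name, not the kernel function, and it works bit by bit rather than with a lookup table.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-16/T10-DIF: polynomial 0x8BB7, initial value 0,
 * MSB first (no reflection), no final XOR. */
static uint16_t t10dif_crc(const unsigned char *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);	/* feed next byte into the top */
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000)
				crc = (uint16_t)((crc << 1) ^ 0x8BB7);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}
```

With these parameters the usual check string "123456789" yields 0xD0DB, and an all-zero sector yields a guard of 0.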

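The per-sector checks in prot_verify_read() and prot_verify_write() boil down to three steps: skip the tuple if its application tag is 0xffff, compare the guard tag, then compare the reference tag with the low 32 bits of the LBA. The non-zero return values are the ASCQs the patch pairs with ASC 0x10 in mk_sense_buffer() (0x01 guard check failed, 0x03 reference tag check failed). A user-space sketch, assuming the tuple is held as raw big-endian bytes; struct dif_tuple_be and dif_verify() are hypothetical names, and the guard is passed in precomputed so the sketch does not depend on which checksum is selected.

```c
#include <stdint.h>

/* Hypothetical mirror of an 8-byte DIF tuple as stored on the wire. */
struct dif_tuple_be {
	uint8_t guard[2];	/* guard tag: checksum of the sector data */
	uint8_t app[2];		/* application tag */
	uint8_t ref[4];		/* reference tag: low 32 bits of the LBA */
};

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 if the tuple is valid or escaped, else the ASCQ for ASC 0x10. */
static int dif_verify(const struct dif_tuple_be *t, uint16_t computed_guard,
		      uint64_t lba)
{
	if (get_be16(t->app) == 0xffff)		/* escape: do not check */
		return 0;
	if (get_be16(t->guard) != computed_guard)
		return 0x01;			/* guard check failed */
	if (get_be32(t->ref) != (uint32_t)(lba & 0xffffffffu))
		return 0x03;			/* reference tag check failed */
	return 0;
}
```

Because scsi_debug_init() fills dif_storep with 0xff, sectors that have never been written with protection information carry an application tag of 0xffff and are skipped by the read-side check instead of being reported as errors.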

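On the discovery side the patch advertises protection in the READ CAPACITY(16) response: byte 12 gets PROT_EN in bit 0 and P_TYPE (protection type minus one) in bits 3:1. A small sketch of encoding and decoding that byte, roughly as an initiator would interpret it; rc16_byte12() and rc16_prot_type() are hypothetical helpers, not part of sd.c or scsi_debug.

```c
#include <stdint.h>

/* Encode byte 12 the way the patch does: PROT_EN = 1, P_TYPE = type - 1. */
static uint8_t rc16_byte12(int prot_type)
{
	if (prot_type <= 0)
		return 0;			/* Type 0: protection disabled */
	return (uint8_t)(((prot_type - 1) << 1) | 0x01);
}

/* Decode: returns P_TYPE + 1 when PROT_EN is set, 0 when it is clear. */
static int rc16_prot_type(uint8_t byte12)
{
	if (!(byte12 & 0x01))			/* PROT_EN clear */
		return 0;
	return ((byte12 >> 1) & 0x07) + 1;	/* P_TYPE lives in bits 3:1 */
}
```

With protection=1, scsi_debug therefore reports 0x01 in byte 12, which decodes as Type 1.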

* [PATCH 15 of 16] Add support for data integrity to DM
  2008-04-25 23:12 [PATCH 00 of 16] Block/SCSI Data Integrity Support Martin K. Petersen
                   ` (13 preceding siblings ...)
  2008-04-25 23:12 ` [PATCH 14 of 16] Implement support for DIF in SCSI debug driver Martin K. Petersen
@ 2008-04-25 23:12 ` Martin K. Petersen
  2008-04-25 23:12 ` [PATCH 16 of 16] Add support for data integrity to MD Martin K. Petersen
  15 siblings, 0 replies; 23+ messages in thread
From: Martin K. Petersen @ 2008-04-25 23:12 UTC
  To: linux-scsi

4 files changed, 60 insertions(+), 4 deletions(-)
drivers/md/dm-table.c         |   35 ++++++++++++++++++++++++++++++++++-
drivers/md/dm.c               |   26 ++++++++++++++++++++++++--
drivers/md/dm.h               |    2 +-
include/linux/device-mapper.h |    1 +


If all subdevices support the same protection format, the DM device is
flagged as integrity capable.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

diff -r f32469624774 -r 49afddbc7220 drivers/md/dm-table.c
--- a/drivers/md/dm-table.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/md/dm-table.c	Fri Apr 25 17:39:29 2008 -0400
@@ -897,8 +897,12 @@
 	return &t->targets[(KEYS_PER_NODE * n) + k];
 }
 
-void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q)
+void dm_table_set_restrictions(struct dm_table *t, struct mapped_device *md)
 {
+	struct request_queue *q = dm_queue(md);
+	struct list_head *devices = dm_table_get_devices(t);
+	struct dm_dev *prev, *cur;
+
 	/*
 	 * Make sure we obey the optimistic sub devices
 	 * restrictions.
@@ -916,6 +920,35 @@
 	else
 		q->queue_flags |= (1 << QUEUE_FLAG_CLUSTER);
 
+	/*
+	 * Run through all devices to ensure they have matching
+	 * integrity profile
+	 */
+	cur = prev = NULL;
+
+	list_for_each_entry(cur, devices, list) {
+
+		if (prev && blk_integrity_compare(prev->bdev, cur->bdev) < 0) {
+			printk(KERN_ERR "%s: %s %s Integrity mismatch!\n",
+			       __func__, prev->bdev->bd_disk->disk_name,
+			       cur->bdev->bd_disk->disk_name);
+			return;
+		}
+		prev = cur;
+	}
+
+	/* Register dm device as being integrity capable */
+	if (prev && bdev_get_integrity(prev->bdev)) {
+		struct gendisk *disk = dm_disk(md);
+
+		if (blk_integrity_register(dm_disk(md), 
+					   bdev_get_integrity(prev->bdev)))
+			printk(KERN_ERR "%s: %s Could not register integrity!\n",
+			       __func__, disk->disk_name);
+		else
+			printk(KERN_INFO "Enabling data integrity on %s\n",
+			       disk->disk_name);
+	}
 }
 
 unsigned int dm_table_get_num_targets(struct dm_table *t)
diff -r f32469624774 -r 49afddbc7220 drivers/md/dm.c
--- a/drivers/md/dm.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/md/dm.c	Fri Apr 25 17:39:29 2008 -0400
@@ -665,6 +665,12 @@
 	clone->bi_size = to_bytes(len);
 	clone->bi_io_vec->bv_offset = offset;
 	clone->bi_io_vec->bv_len = clone->bi_size;
+	clone->bi_flags |= 1 << BIO_CLONED;
+
+	if (bio_integrity(bio)) {
+		bio_integrity_clone(clone, bio, bs);
+		bio_integrity_trim(clone, bio_sector_offset(bio, idx, offset), len);
+	}
 
 	return clone;
 }
@@ -686,6 +692,13 @@
 	clone->bi_vcnt = idx + bv_count;
 	clone->bi_size = to_bytes(len);
 	clone->bi_flags &= ~(1 << BIO_SEG_VALID);
+
+	if (bio_integrity(bio)) {
+		bio_integrity_clone(clone, bio, bs);
+
+		if (idx != bio->bi_idx || clone->bi_size < bio->bi_size)
+			bio_integrity_trim(clone, bio_sector_offset(bio, idx, 0), len);
+	}
 
 	return clone;
 }
@@ -1059,6 +1072,7 @@
 	md->disk->queue = md->queue;
 	md->disk->private_data = md;
 	sprintf(md->disk->disk_name, "dm-%d", minor);
+	printk(KERN_ERR "DM: Created %s\n", md->disk->disk_name);
 	add_disk(md->disk);
 	format_dev_t(md->name, MKDEV(_major, minor));
 
@@ -1108,6 +1122,7 @@
 	mempool_destroy(md->tio_pool);
 	mempool_destroy(md->io_pool);
 	bioset_free(md->bs);
+	blk_integrity_unregister(md->disk);
 	del_gendisk(md->disk);
 	free_minor(minor);
 
@@ -1151,7 +1166,6 @@
 
 static int __bind(struct mapped_device *md, struct dm_table *t)
 {
-	struct request_queue *q = md->queue;
 	sector_t size;
 
 	size = dm_table_get_size(t);
@@ -1172,7 +1186,7 @@
 
 	write_lock(&md->map_lock);
 	md->map = t;
-	dm_table_set_restrictions(t, q);
+	dm_table_set_restrictions(t, md);
 	write_unlock(&md->map_lock);
 
 	return 0;
@@ -1624,7 +1638,15 @@
  */
 struct gendisk *dm_disk(struct mapped_device *md)
 {
+	BUG_ON(md == NULL);
+	BUG_ON(md->disk == NULL);
+
 	return md->disk;
+}
+
+struct request_queue *dm_queue(struct mapped_device *md)
+{
+	return md->queue;
 }
 
 int dm_suspended(struct mapped_device *md)
diff -r f32469624774 -r 49afddbc7220 drivers/md/dm.h
--- a/drivers/md/dm.h	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/md/dm.h	Fri Apr 25 17:39:29 2008 -0400
@@ -104,7 +104,7 @@
 			     void (*fn)(void *), void *context);
 struct dm_target *dm_table_get_target(struct dm_table *t, unsigned int index);
 struct dm_target *dm_table_find_target(struct dm_table *t, sector_t sector);
-void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q);
+void dm_table_set_restrictions(struct dm_table *t, struct mapped_device *md);
 struct list_head *dm_table_get_devices(struct dm_table *t);
 void dm_table_presuspend_targets(struct dm_table *t);
 void dm_table_postsuspend_targets(struct dm_table *t);
diff -r f32469624774 -r 49afddbc7220 include/linux/device-mapper.h
--- a/include/linux/device-mapper.h	Fri Apr 25 17:39:29 2008 -0400
+++ b/include/linux/device-mapper.h	Fri Apr 25 17:39:29 2008 -0400
@@ -194,6 +194,7 @@
 const char *dm_device_name(struct mapped_device *md);
 int dm_copy_name_and_uuid(struct mapped_device *md, char *name, char *uuid);
 struct gendisk *dm_disk(struct mapped_device *md);
+struct request_queue *dm_queue(struct mapped_device *md);
 int dm_suspended(struct mapped_device *md);
 int dm_noflush_suspending(struct dm_target *ti);
 

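dm_table_set_restrictions() above walks the table's devices, compares each one with its predecessor via blk_integrity_compare(), and registers an integrity profile on the DM disk only if no mismatch turns up and the last device has one. Below is a simplified user-space model of that rule, assuming a member's profile can be summarised by a name and a tuple size. struct member and common_profile() are hypothetical, the real notion of "matching" is whatever blk_integrity_compare() checks, and this sketch is deliberately strict in treating a member without any profile as a mismatch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a member device's integrity profile. */
struct member {
	const char *name;
	const char *profile;		/* NULL: no integrity support */
	unsigned int tuple_size;	/* bytes of IMD per sector */
};

/* Returns the first member if every member shares its profile, else NULL. */
static const struct member *common_profile(const struct member *m, size_t n)
{
	if (n == 0 || m[0].profile == NULL)
		return NULL;

	for (size_t i = 1; i < n; i++) {
		if (m[i].profile == NULL ||
		    strcmp(m[i].profile, m[0].profile) != 0 ||
		    m[i].tuple_size != m[0].tuple_size)
			return NULL;	/* mismatch: stacked device stays plain */
	}
	return &m[0];
}
```

Comparing every member against the first is equivalent to the patch's adjacent-pair walk as long as the comparison behaves like an equality test.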



* [PATCH 16 of 16] Add support for data integrity to MD
  2008-04-25 23:12 [PATCH 00 of 16] Block/SCSI Data Integrity Support Martin K. Petersen
                   ` (14 preceding siblings ...)
  2008-04-25 23:12 ` [PATCH 15 of 16] Add support for data integrity to DM Martin K. Petersen
@ 2008-04-25 23:12 ` Martin K. Petersen
  15 siblings, 0 replies; 23+ messages in thread
From: Martin K. Petersen @ 2008-04-25 23:12 UTC
  To: linux-scsi

1 file changed, 28 insertions(+), 1 deletion(-)
drivers/md/md.c |   29 ++++++++++++++++++++++++++++-


If all subdevices support the same protection format, the MD device is
flagged as integrity capable.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

diff -r 49afddbc7220 -r 927db99c5b35 drivers/md/md.c
--- a/drivers/md/md.c	Fri Apr 25 17:39:29 2008 -0400
+++ b/drivers/md/md.c	Fri Apr 25 17:39:29 2008 -0400
@@ -3284,7 +3284,7 @@
 	int err;
 	int chunk_size;
 	struct list_head *tmp;
-	mdk_rdev_t *rdev;
+	mdk_rdev_t *rdev, *prev;
 	struct gendisk *disk;
 	struct mdk_personality *pers;
 	char b[BDEVNAME_SIZE];
@@ -3541,6 +3541,32 @@
 	mddev->changed = 1;
 	md_new_event(mddev);
 	kobject_uevent(&mddev->gendisk->dev.kobj, KOBJ_CHANGE);
+
+	prev = NULL;
+
+	/* Data Integrity */
+	rdev_for_each(rdev, tmp, mddev) {
+
+		if (prev && blk_integrity_compare(prev->bdev, rdev->bdev) < 0) {
+			printk(KERN_ERR "%s: %s %s Integrity mismatch!\n",
+			       __func__, prev->bdev->bd_disk->disk_name,
+			       rdev->bdev->bd_disk->disk_name);
+			return 0;
+		}
+
+		prev = rdev;
+	}
+
+	if (prev && bdev_get_integrity(prev->bdev)) {
+
+		if (blk_integrity_register(disk, prev->bdev->bd_disk->integrity))
+			printk(KERN_ERR "%s: %s Could not register integrity!\n",
+			       __func__, disk->disk_name);
+		else
+			printk(KERN_INFO "Enabling data integrity on %s\n",
+			       disk->disk_name);
+	}
+
 	return 0;
 }
 
@@ -3716,6 +3742,7 @@
 		printk(KERN_INFO "md: %s switched to read-only mode.\n",
 			mdname(mddev));
 	err = 0;
+	blk_integrity_unregister(disk);
 	md_new_event(mddev);
 out:
 	return err;




* Re: [PATCH 11 of 16] Move scsi_disk() accessor function to sd.h
  2008-04-25 23:12 ` [PATCH 11 of 16] Move scsi_disk() accessor function to sd.h Martin K. Petersen
@ 2008-04-26  6:23   ` Christoph Hellwig
  2008-04-26 13:01     ` Martin K. Petersen
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2008-04-26  6:23 UTC
  To: Martin K. Petersen; +Cc: linux-scsi

On Fri, Apr 25, 2008 at 07:12:13PM -0400, Martin K. Petersen wrote:
> 2 files changed, 5 insertions(+), 5 deletions(-)
> drivers/scsi/sd.c |    5 -----
> include/scsi/sd.h |    5 +++++

Nack.  Creating sd.h was a big mistake already and should be reversed.
No one but sd.c has any business playing with these.



* Re: [PATCH 11 of 16] Move scsi_disk() accessor function to sd.h
  2008-04-26  6:23   ` Christoph Hellwig
@ 2008-04-26 13:01     ` Martin K. Petersen
  0 siblings, 0 replies; 23+ messages in thread
From: Martin K. Petersen @ 2008-04-26 13:01 UTC
  To: Christoph Hellwig; +Cc: Martin K. Petersen, linux-scsi

>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:

Christoph> Nack.  Creating sd.h was a big mistake already and
Christoph> should be reversed.  No one but sd.c has any business
Christoph> playing with these.

DIF is part of SBC, hence scsi_dif.c is logically a subset of sd.c.

I could roll scsi_dif.c into sd.c but I don't necessarily think that's
prettier.

Maybe I should rename scsi_dif.c to sd_dif.c?


An alternative to exporting scsi_disk() is to introduce a field in
scsi_cmnd to let the HBA driver know how to handle the request.
That's how I originally did it, but I didn't want to grow scsi_cmnd
when I could avoid it.  And it seemed DIF-specific.

However, given the impending "I-can't-believe-it's-not-DIF" extensions
for SCSI tape maybe there's merit in having a generic routing field?
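
Purely as an illustration of the generic routing field idea: what the HBA needs per command could be reduced to one enum derived from the data direction, whether the OS attached integrity metadata, and whether the target is formatted with protection information. Everything below (enum hyp_prot_op, pick_prot_op()) is hypothetical and not an existing scsi_cmnd member or kernel API.

```c
#include <stdbool.h>

/* Hypothetical routing hint an HBA driver could read per command. */
enum hyp_prot_op {
	HYP_PROT_NORMAL,	/* no integrity metadata on either side */
	HYP_PROT_WRITE_INSERT,	/* HBA generates IMD for a formatted disk */
	HYP_PROT_WRITE_STRIP,	/* HBA verifies OS IMD, disk is unformatted */
	HYP_PROT_WRITE_PASS,	/* OS IMD is forwarded to the disk */
	HYP_PROT_READ_STRIP,	/* HBA verifies disk IMD and drops it */
	HYP_PROT_READ_INSERT,	/* HBA generates IMD for the OS */
	HYP_PROT_READ_PASS,	/* disk IMD is returned to the OS */
};

static enum hyp_prot_op pick_prot_op(bool is_write, bool os_has_imd,
				     bool disk_formatted)
{
	if (!os_has_imd && !disk_formatted)
		return HYP_PROT_NORMAL;

	if (is_write) {
		if (os_has_imd && disk_formatted)
			return HYP_PROT_WRITE_PASS;
		return os_has_imd ? HYP_PROT_WRITE_STRIP : HYP_PROT_WRITE_INSERT;
	}

	if (os_has_imd && disk_formatted)
		return HYP_PROT_READ_PASS;
	return os_has_imd ? HYP_PROT_READ_INSERT : HYP_PROT_READ_STRIP;
}
```

In this model sd would compute the value once per command, and the HBA driver would no longer need scsi_disk() to find out how the target is formatted.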

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCH 05 of 16] Block layer data integrity
  2008-04-25 23:12 ` [PATCH 05 " Martin K. Petersen
@ 2008-05-06 20:29   ` malahal
  2008-05-07  1:56     ` Martin K. Petersen
  0 siblings, 1 reply; 23+ messages in thread
From: malahal @ 2008-05-06 20:29 UTC
  To: linux-scsi

Martin K. Petersen [martin.petersen@oracle.com] wrote:
>  		 */
>  		blk_partition_remap(bio);
>  
> +		if (bio_integrity_enabled(bio) && bio_integrity_prep(bio))
> +			goto end_io;
> +
>  		if (old_sector != -1)
>  			blk_add_trace_remap(q, bio, old_dev, bio->bi_sector,
>  					    old_sector);

It is expected that the bio's data should NOT be changed until this I/O
is sent out to the HBA for WRITES. How do you ensure that applications
or file systems don't modify the data of a bio that is in progress?

--Malahal.


* Re: [PATCH 05 of 16] Block layer data integrity
  2008-05-06 20:29   ` malahal
@ 2008-05-07  1:56     ` Martin K. Petersen
  2008-05-07  2:50       ` malahal
  0 siblings, 1 reply; 23+ messages in thread
From: Martin K. Petersen @ 2008-05-07  1:56 UTC
  To: linux-scsi

>>>>> "Malahal" == malahal  <malahal@us.ibm.com> writes:

Malahal> Martin K. Petersen [martin.petersen@oracle.com] wrote:
>>  		 */
>>  		blk_partition_remap(bio);
>>  
>> +		if (bio_integrity_enabled(bio) && bio_integrity_prep(bio))
>> +			goto end_io;
>> +
>>  		if (old_sector != -1)
>>  			blk_add_trace_remap(q, bio, old_dev, bio->bi_sector,
>>  					    old_sector);

Malahal> It is expected that the bio's data should NOT be changed
Malahal> until this I/O is sent out to the HBA for WRITES. How do you
Malahal> ensure that applications or file systems don't modify the
Malahal> data of a bio that is in progress?

As I mentioned on dm-devel a few weeks ago, this is a big problem for
ext2 in particular because it does not lock the page down while I/O is
in flight.  So ext2 fails spectacularly.

I have been unable to trip XFS and btrfs.  I had one mismatch with
ext3, but that might have been something else.  In any case I have not
been able to reproduce it, even with artificially induced delays in
queuecommand.

But generally, yes.  It is up to the submitter to guarantee that the
page isn't being modified during I/O.  This requirement will trickle
all the way to the top of the stack for things like direct I/O.
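
To make the failure mode concrete: the guard is computed when the bio is prepared, but the HBA or disk verifies it later, so any store to the page in between makes verification fail even though nothing was corrupted in transit. A user-space sketch using an RFC 1071-style Internet checksum as the guard, standing in for scsi_debug's guard=1 (IP checksum) option; the byte order is an assumption and may differ from ip_compute_csum(). inet_csum() and guard_mismatch_after_store() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071-style Internet checksum over big-endian 16-bit words. */
static uint16_t inet_csum(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)				/* pad a trailing odd byte */
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)			/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Simulates the race: guard generated at submit time, then the page is
 * modified (e.g. by an mmap store) before the device checks it.
 * Returns 1 if verification would now fail. */
static int guard_mismatch_after_store(unsigned char *sector, size_t len,
				      size_t off, unsigned char newval)
{
	uint16_t guard = inet_csum(sector, len);	/* at submission */

	sector[off] = newval;				/* write during I/O */
	return inet_csum(sector, len) != guard;		/* at verification */
}
```

A store that happens to write the byte's existing value changes nothing and passes; any store that actually changes the data during the window is reported as a guard error.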

-- 
Martin K. Petersen	Oracle Linux Engineering



* Re: [PATCH 05 of 16] Block layer data integrity
  2008-05-07  1:56     ` Martin K. Petersen
@ 2008-05-07  2:50       ` malahal
  2008-05-07 20:22         ` Martin K. Petersen
  0 siblings, 1 reply; 23+ messages in thread
From: malahal @ 2008-05-07  2:50 UTC
  To: linux-scsi

Martin K. Petersen [martin.petersen@oracle.com] wrote:
> But generally, yes.  It is up to the submitter to guarantee that the
> page isn't being modified during I/O.  This requirement will trickle
> all the way to the top of the stack for things like direct I/O.

Understandable for direct I/O. How can an application that writes via
mmap control this? Does that mean mmap writes are not really
supported?

--Malahal.


* Re: [PATCH 05 of 16] Block layer data integrity
  2008-05-07  2:50       ` malahal
@ 2008-05-07 20:22         ` Martin K. Petersen
  0 siblings, 0 replies; 23+ messages in thread
From: Martin K. Petersen @ 2008-05-07 20:22 UTC
  To: linux-scsi

>>>>> "Malahal" == malahal  <malahal@us.ibm.com> writes:

Malahal> Understandable for direct I/O. How can an application that
Malahal> writes via mmap control this?

There's no way to control it :/


Malahal> Does that mean mmap writes are not really supported?

For now, yes.  Zach proposed unmapping the pages while I/O is in
progress.  I'll dig into that can of worms shortly...

-- 
Martin K. Petersen	Oracle Linux Engineering




Thread overview: 23+ messages
2008-04-25 23:12 [PATCH 00 of 16] Block/SCSI Data Integrity Support Martin K. Petersen
2008-04-25 23:12 ` [PATCH 01 of 16] Add support for the T10 Data Integrity Field CRC Martin K. Petersen
2008-04-25 23:12 ` [PATCH 02 of 16] Globalize bio_set and bio_vec_slab Martin K. Petersen
2008-04-25 23:12 ` [PATCH 03 of 16] Find bio sector offset given idx and offset Martin K. Petersen
2008-04-25 23:12 ` [PATCH 04 of 16] Block layer data integrity Martin K. Petersen
2008-04-25 23:12 ` [PATCH 05 " Martin K. Petersen
2008-05-06 20:29   ` malahal
2008-05-07  1:56     ` Martin K. Petersen
2008-05-07  2:50       ` malahal
2008-05-07 20:22         ` Martin K. Petersen
2008-04-25 23:12 ` [PATCH 06 of 16] Detect devices with protection information turned on in INQUIRY Martin K. Petersen
2008-04-25 23:12 ` [PATCH 07 of 16] Rename scsi_bidi_sdb_cache Martin K. Petersen
2008-04-25 23:12 ` [PATCH 08 of 16] SCSI protection information scatterlist handling Martin K. Petersen
2008-04-25 23:12 ` [PATCH 09 of 16] Support for the SBC Data Integrity Field format Martin K. Petersen
2008-04-25 23:12 ` [PATCH 10 of 16] Allow sd_print_sense_hdr to be called outside of sd.c Martin K. Petersen
2008-04-25 23:12 ` [PATCH 11 of 16] Move scsi_disk() accessor function to sd.h Martin K. Petersen
2008-04-26  6:23   ` Christoph Hellwig
2008-04-26 13:01     ` Martin K. Petersen
2008-04-25 23:12 ` [PATCH 12 of 16] SCSI host driver DIF helpers Martin K. Petersen
2008-04-25 23:12 ` [PATCH 13 of 16] Support for SCSI disk (SBC) Data Integrity Field Martin K. Petersen
2008-04-25 23:12 ` [PATCH 14 of 16] Implement support for DIF in SCSI debug driver Martin K. Petersen
2008-04-25 23:12 ` [PATCH 15 of 16] Add support for data integrity to DM Martin K. Petersen
2008-04-25 23:12 ` [PATCH 16 of 16] Add support for data integrity to MD Martin K. Petersen
