* [PATCH 0/6] Support DAX for device-mapper dm-linear devices
From: Toshi Kani @ 2016-06-13 22:21 UTC
To: agk, snitzer, dan.j.williams
Cc: ross.zwisler, viro, axboe, toshi.kani, linux-nvdimm, dm-devel, linux-raid, linux-kernel

This patch set adds DAX support to the device-mapper dm-linear devices
used by LVM.  It works with LVM commands as follows:
 - Creating a logical volume from all DAX-capable devices (such as
   pmem) makes the logical volume DAX capable as well.
 - Once a logical volume is DAX capable, the volume may not be
   extended with non-DAX-capable devices.

The direct_access interface is added to dm and dm-linear to map
a request to a target device.

 - Patches 1-2 introduce the GENHD_FL_DAX flag to indicate DAX capability.
 - Patches 3-4 add direct_access functions to dm and dm-linear.
 - Patches 5-6 set GENHD_FL_DAX on the dm device when all targets are
   DAX capable.

---
Toshi Kani (6):
  1/6 genhd: Add GENHD_FL_DAX to gendisk flags
  2/6 block: Check GENHD_FL_DAX for DAX capability
  3/6 dm: Add dm_blk_direct_access() for mapped device
  4/6 dm-linear: Add linear_direct_access()
  5/6 dm, dm-linear: Add dax_supported to dm_target
  6/6 dm: Enable DAX support for mapper device

---
 drivers/block/brd.c           |  2 +-
 drivers/md/dm-linear.c        | 19 +++++++++++++++++++
 drivers/md/dm-table.c         | 12 +++++++++---
 drivers/md/dm.c               | 38 ++++++++++++++++++++++++++++++++++++--
 drivers/md/dm.h               |  7 +++++++
 drivers/nvdimm/pmem.c         |  2 +-
 drivers/s390/block/dcssblk.c  |  1 +
 fs/block_dev.c                |  5 +++--
 include/linux/device-mapper.h | 15 +++++++++++++++
 include/linux/genhd.h         |  2 +-
 10 files changed, 93 insertions(+), 10 deletions(-)
* [PATCH 1/6] genhd: Add GENHD_FL_DAX to gendisk flags
From: Toshi Kani @ 2016-06-13 22:21 UTC
To: agk, snitzer, dan.j.williams
Cc: ross.zwisler, viro, axboe, toshi.kani, linux-nvdimm, dm-devel, linux-raid, linux-kernel, Martin Schwidefsky, Heiko Carstens, linux-s390

Currently, the presence of direct_access() in block_device_operations
indicates that its block device supports DAX.  Because
block_device_operations is instantiated with 'const', this DAX
capability cannot be enabled conditionally.

In preparation for supporting DAX on device-mapper devices, add
GENHD_FL_DAX to the gendisk flags to indicate that a block device
supports DAX.  This allows the DAX capability to be set based on how
the mapped device is composed.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <linux-s390@vger.kernel.org>
---
 drivers/block/brd.c          | 2 +-
 drivers/nvdimm/pmem.c        | 2 +-
 drivers/s390/block/dcssblk.c | 1 +
 include/linux/genhd.h        | 2 +-
 4 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index c04bd9b..43a6765 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -518,7 +518,7 @@ static struct brd_device *brd_alloc(int i)
 	disk->fops = &brd_fops;
 	disk->private_data = brd;
 	disk->queue = brd->brd_queue;
-	disk->flags = GENHD_FL_EXT_DEVT;
+	disk->flags = GENHD_FL_EXT_DEVT | GENHD_FL_DAX;
 	sprintf(disk->disk_name, "ram%d", i);
 	set_capacity(disk, rd_size * 2);
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 608fc44..3847449 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -295,7 +295,7 @@ static int pmem_attach_disk(struct device *dev,
 
 	disk->fops = &pmem_fops;
 	disk->queue = q;
-	disk->flags = GENHD_FL_EXT_DEVT;
+	disk->flags = GENHD_FL_EXT_DEVT | GENHD_FL_DAX;
 	nvdimm_namespace_disk_name(ndns, disk->disk_name);
 	disk->driverfs_dev = dev;
 	set_capacity(disk, (pmem->size - pmem->pfn_pad - pmem->data_offset)
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index bed53c4..60b1857 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -616,6 +616,7 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
 	dev_info->gd->queue = dev_info->dcssblk_queue;
 	dev_info->gd->private_data = dev_info;
 	dev_info->gd->driverfs_dev = &dev_info->dev;
+	dev_info->gd->flags = GENHD_FL_DAX;
 	blk_queue_make_request(dev_info->dcssblk_queue, dcssblk_make_request);
 	blk_queue_logical_block_size(dev_info->dcssblk_queue, 4096);
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 359a8e4..9dc3a3d 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -131,7 +131,7 @@ struct hd_struct {
 };
 
 #define GENHD_FL_REMOVABLE	1
-/* 2 is unused */
+#define GENHD_FL_DAX	2
 #define GENHD_FL_MEDIA_CHANGE_NOTIFY	4
 #define GENHD_FL_CD	8
 #define GENHD_FL_UP	16
* [PATCH 2/6] block: Check GENHD_FL_DAX for DAX capability
From: Toshi Kani @ 2016-06-13 22:21 UTC
To: agk, snitzer, dan.j.williams
Cc: ross.zwisler, viro, axboe, toshi.kani, linux-nvdimm, dm-devel, linux-raid, linux-kernel, linux-fsdevel

Now that GENHD_FL_DAX is set by all drivers supporting DAX, change
bdev_direct_access() and __blkdev_get() to check this flag.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: <linux-fsdevel@vger.kernel.org>
---
 fs/block_dev.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 71ccab1..61935ee 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -493,7 +493,7 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
 
 	if (size < 0)
 		return size;
-	if (!ops->direct_access)
+	if (!(bdev->bd_disk->flags & GENHD_FL_DAX) || !ops->direct_access)
 		return -EOPNOTSUPP;
 	if ((sector + DIV_ROUND_UP(size, 512)) >
 	    part_nr_sects_read(bdev->bd_part))
@@ -1287,7 +1287,8 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 		bdev->bd_disk = disk;
 		bdev->bd_queue = disk->queue;
 		bdev->bd_contains = bdev;
-		if (IS_ENABLED(CONFIG_BLK_DEV_DAX) && disk->fops->direct_access)
+		if (IS_ENABLED(CONFIG_BLK_DEV_DAX) &&
+		    disk->flags & GENHD_FL_DAX)
 			bdev->bd_inode->i_flags = S_DAX;
 		else
 			bdev->bd_inode->i_flags = 0;
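For context, a minimal sketch of how an in-kernel caller would probe a
device through this path, assuming the 4.7-era blk_dax_ctl API (the
helper name bdev_supports_dax below is made up for illustration and is
not part of the series):

#include <linux/blkdev.h>

/* Illustrative helper, not from the series: probe whether a block
 * device can hand back a DAX mapping for its first page. */
static bool bdev_supports_dax(struct block_device *bdev)
{
	struct blk_dax_ctl dax = {
		.sector = 0,
		.size = PAGE_SIZE,
	};

	/*
	 * With this patch, bdev_direct_access() fails with -EOPNOTSUPP
	 * unless the disk advertises GENHD_FL_DAX in addition to
	 * implementing ->direct_access().  On success, dax.addr and
	 * dax.pfn describe the mapping and the return value is the
	 * number of accessible bytes.
	 */
	return bdev_direct_access(bdev, &dax) > 0;
}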
* [PATCH 3/6] dm: Add dm_blk_direct_access() for mapped device
From: Toshi Kani @ 2016-06-13 22:21 UTC
To: agk, snitzer, dan.j.williams
Cc: ross.zwisler, viro, axboe, toshi.kani, linux-nvdimm, dm-devel, linux-raid, linux-kernel

Change the mapped device to implement a direct_access function,
dm_blk_direct_access(), which calls a target's direct_access function.
'struct target_type' is extended with a target direct_access
interface.  dm_blk_direct_access() limits the directly accessible size
to the dm_target's limit with max_io_len().

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 drivers/md/dm.c               | 29 +++++++++++++++++++++++++++++
 include/linux/device-mapper.h | 10 ++++++++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1b2f962..6e9f958 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1473,6 +1473,34 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
 }
 EXPORT_SYMBOL_GPL(dm_set_target_max_io_len);
 
+static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
+				 void __pmem **kaddr, pfn_t *pfn, long size)
+{
+	struct mapped_device *md = bdev->bd_disk->private_data;
+	struct dm_table *map;
+	struct dm_target *ti;
+	int srcu_idx;
+	long len, ret = -EIO;
+
+	map = dm_get_live_table(md, &srcu_idx);
+	if (!map)
+		return ret;
+
+	ti = dm_table_find_target(map, sector);
+	if (!dm_target_is_valid(ti))
+		goto out;
+
+	len = max_io_len(sector, ti) << SECTOR_SHIFT;
+	size = min(len, size);
+
+	if (ti->type->direct_access)
+		ret = ti->type->direct_access(ti, sector, kaddr, pfn, size);
+
+out:
+	dm_put_live_table(md, srcu_idx);
+	return min(ret, size);
+}
+
 /*
  * A target may call dm_accept_partial_bio only from the map routine.  It is
  * allowed for all bio types except REQ_FLUSH.
@@ -3721,6 +3749,7 @@ static const struct block_device_operations dm_blk_dops = {
 	.open = dm_blk_open,
 	.release = dm_blk_close,
 	.ioctl = dm_blk_ioctl,
+	.direct_access = dm_blk_direct_access,
 	.getgeo = dm_blk_getgeo,
 	.pr_ops = &dm_pr_ops,
 	.owner = THIS_MODULE
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 0830c9e..16e6c8c 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -116,6 +116,15 @@ typedef void (*dm_io_hints_fn) (struct dm_target *ti,
  */
 typedef int (*dm_busy_fn) (struct dm_target *ti);
 
+/*
+ * Returns:
+ *    < 0 : error
+ *   >= 0 : the number of bytes accessible at the address
+ */
+typedef long (*dm_direct_access_fn) (struct dm_target *ti, sector_t sector,
+				     void __pmem **kaddr, pfn_t *pfn,
+				     long size);
+
 void dm_error(const char *message);
 
 struct dm_dev {
@@ -162,6 +171,7 @@ struct target_type {
 	dm_busy_fn busy;
 	dm_iterate_devices_fn iterate_devices;
 	dm_io_hints_fn io_hints;
+	dm_direct_access_fn direct_access;
 
 	/* For internal device-mapper use. */
 	struct list_head list;
* [PATCH 4/6] dm-linear: Add linear_direct_access()
From: Toshi Kani @ 2016-06-13 22:21 UTC
To: agk, snitzer, dan.j.williams
Cc: ross.zwisler, viro, axboe, toshi.kani, linux-nvdimm, dm-devel, linux-raid, linux-kernel

Change dm-linear to implement a direct_access function,
linear_direct_access(), which maps the sector and calls the
direct_access function of its target device.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 drivers/md/dm-linear.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 05c35aa..49bd7d2 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -141,6 +141,22 @@ static int linear_iterate_devices(struct dm_target *ti,
 	return fn(ti, lc->dev, lc->start, ti->len, data);
 }
 
+static long linear_direct_access(struct dm_target *ti, sector_t sector,
+				 void __pmem **kaddr, pfn_t *pfn, long size)
+{
+	struct linear_c *lc;
+	struct block_device *tbdev;
+	const struct block_device_operations *tops;
+	sector_t tsector;
+
+	lc = ti->private;
+	tbdev = lc->dev->bdev;
+	tops = tbdev->bd_disk->fops;
+	tsector = linear_map_sector(ti, sector);
+
+	return tops->direct_access(tbdev, tsector, kaddr, pfn, size);
+}
+
 static struct target_type linear_target = {
 	.name   = "linear",
 	.version = {1, 2, 1},
@@ -151,6 +167,7 @@ static struct target_type linear_target = {
 	.status = linear_status,
 	.prepare_ioctl = linear_prepare_ioctl,
 	.iterate_devices = linear_iterate_devices,
+	.direct_access = linear_direct_access,
};

 int __init dm_linear_init(void)
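For readers tracing the mapping: linear_map_sector() is an existing
dm-linear helper not shown in the diff.  Paraphrased from the
dm-linear.c of that era (see the kernel tree for the authoritative
version), it translates a mapped-device sector into the backing
device's sector space:

static sector_t linear_map_sector(struct dm_target *ti, sector_t bi_sector)
{
	struct linear_c *lc = ti->private;

	/* Backing-device start plus the offset within this target. */
	return lc->start + dm_target_offset(ti, bi_sector);
}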
* [PATCH 5/6] dm, dm-linear: Add dax_supported to dm_target
From: Toshi Kani @ 2016-06-13 22:21 UTC
To: agk, snitzer, dan.j.williams
Cc: ross.zwisler, viro, axboe, toshi.kani, linux-nvdimm, dm-devel, linux-raid, linux-kernel

Extend 'struct dm_target' with a dax_supported bit, which allows
dm-table to check whether a dm-target supports DAX.  Change dm-linear
to set this bit when its target physical device supports DAX.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 drivers/md/dm-linear.c        | 2 ++
 include/linux/device-mapper.h | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 49bd7d2..6fdbbc8 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -59,6 +59,8 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	ti->num_flush_bios = 1;
 	ti->num_discard_bios = 1;
 	ti->num_write_same_bios = 1;
+	if (lc->dev->bdev->bd_disk->flags & GENHD_FL_DAX)
+		ti->dax_supported = 1;
 	ti->private = lc;
 	return 0;
 
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 16e6c8c..9b72989 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -290,6 +290,11 @@ struct dm_target {
 	 * Set if this target does not return zeroes on discarded blocks.
 	 */
 	bool discard_zeroes_data_unsupported:1;
+
+	/*
+	 * Set if the target supports DAX (direct access).
+	 */
+	bool dax_supported:1;
 };
 
 /* Each target can link one of these into the table */
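A hypothetical sketch of how another bio-based target could opt in
under this scheme (dm-stripe comes up later in this thread); the
stripe_check_dax() name and the device-array layout are illustrative,
not part of the posted series:

static void stripe_check_dax(struct dm_target *ti,
			     struct dm_dev **devs, unsigned int count)
{
	unsigned int i;

	/* A striped target is DAX capable only if every leg is. */
	for (i = 0; i < count; i++)
		if (!(devs[i]->bdev->bd_disk->flags & GENHD_FL_DAX))
			return;

	ti->dax_supported = 1;
}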
* [PATCH 6/6] dm: Enable DAX support for mapper device
From: Toshi Kani @ 2016-06-13 22:21 UTC
To: agk, snitzer, dan.j.williams
Cc: ross.zwisler, viro, axboe, toshi.kani, linux-nvdimm, dm-devel, linux-raid, linux-kernel

Add a new dm type, DM_TYPE_DAX_BIO_BASED, which indicates that the
mapped device supports DAX and is bio based.  This new type is used to
assure that all target devices have DAX support and remain that way
after GENHD_FL_DAX is set on the mapped device.

At initial table load, GENHD_FL_DAX is set on the mapped device when
its type is set to DM_TYPE_DAX_BIO_BASED.  Any subsequent table load
to the mapped device must have the same type, or else it fails per the
check in table_load().

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 drivers/md/dm-table.c | 12 +++++++++---
 drivers/md/dm.c       |  9 +++++++--
 drivers/md/dm.h       |  7 +++++++
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 626a5ec..81138e7 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -833,7 +833,7 @@ static bool __table_type_request_based(unsigned table_type)
 static int dm_table_set_type(struct dm_table *t)
 {
 	unsigned i;
-	unsigned bio_based = 0, request_based = 0, hybrid = 0;
+	unsigned bio_based = 0, request_based = 0, hybrid = 0, num_dax = 0;
 	bool use_blk_mq = false;
 	struct dm_target *tgt;
 	struct dm_dev_internal *dd;
@@ -849,6 +849,9 @@ static int dm_table_set_type(struct dm_table *t)
 		else
 			bio_based = 1;
 
+		if (tgt->dax_supported)
+			num_dax++;
+
 		if (bio_based && request_based) {
 			DMWARN("Inconsistent table: different target types"
 			       " can't be mixed up");
@@ -870,7 +873,10 @@ static int dm_table_set_type(struct dm_table *t)
 
 	if (bio_based) {
 		/* We must use this table as bio-based */
-		t->type = DM_TYPE_BIO_BASED;
+		if (num_dax && num_dax == t->num_targets)
+			t->type = DM_TYPE_DAX_BIO_BASED;
+		else
+			t->type = DM_TYPE_BIO_BASED;
 		return 0;
 	}
 
@@ -978,7 +984,7 @@ static int dm_table_alloc_md_mempools(struct dm_table *t, struct mapped_device *
 		return -EINVAL;
 	}
 
-	if (type == DM_TYPE_BIO_BASED)
+	if (dm_type_bio_based(type))
 		for (i = 0; i < t->num_targets; i++) {
 			tgt = t->targets + i;
 			per_io_data_size = max(per_io_data_size, tgt->per_io_data_size);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 6e9f958..41b8912 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2492,7 +2492,7 @@ static void __bind_mempools(struct mapped_device *md, struct dm_table *t)
 
 	if (md->bs) {
 		/* The md already has necessary mempools. */
-		if (dm_table_get_type(t) == DM_TYPE_BIO_BASED) {
+		if (dm_type_bio_based(dm_table_get_type(t))) {
 			/*
 			 * Reload bioset because front_pad may have changed
 			 * because a different table was loaded.
@@ -2659,6 +2659,9 @@ void dm_set_md_type(struct mapped_device *md, unsigned type)
 {
 	BUG_ON(!mutex_is_locked(&md->type_lock));
 	md->type = type;
+
+	if (type == DM_TYPE_DAX_BIO_BASED)
+		dm_disk(md)->flags |= GENHD_FL_DAX;
 }
 
 unsigned dm_get_md_type(struct mapped_device *md)
@@ -2837,7 +2840,7 @@ out_kfree_tag_set:
 
 static unsigned filter_md_type(unsigned type, struct mapped_device *md)
 {
-	if (type == DM_TYPE_BIO_BASED)
+	if (dm_type_bio_based(type))
 		return type;
 
 	return !md->use_blk_mq ? DM_TYPE_REQUEST_BASED : DM_TYPE_MQ_REQUEST_BASED;
@@ -2867,6 +2870,7 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
 		}
 		break;
 	case DM_TYPE_BIO_BASED:
+	case DM_TYPE_DAX_BIO_BASED:
 		dm_init_normal_md_queue(md);
 		blk_queue_make_request(md->queue, dm_make_request);
 		/*
@@ -3573,6 +3577,7 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, unsigned t
 
 	switch (type) {
 	case DM_TYPE_BIO_BASED:
+	case DM_TYPE_DAX_BIO_BASED:
 		cachep = _io_cache;
 		pool_size = dm_get_reserved_bio_based_ios();
 		front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone);
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 13a758e..6d22667 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -39,6 +39,13 @@
 #define DM_TYPE_BIO_BASED	1
 #define DM_TYPE_REQUEST_BASED	2
 #define DM_TYPE_MQ_REQUEST_BASED	3
+#define DM_TYPE_DAX_BIO_BASED	4
+
+/*
+ * To check whether the DM type is bio-based or not.
+ */
+#define dm_type_bio_based(type)	(((type) == DM_TYPE_BIO_BASED) || \
+				 ((type) == DM_TYPE_DAX_BIO_BASED))
 
 /*
  * List of devices that a metadevice uses and should open/close.
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
From: Mike Snitzer @ 2016-06-13 22:57 UTC
To: Toshi Kani
Cc: agk, dan.j.williams, ross.zwisler, viro, axboe, linux-nvdimm, dm-devel, linux-raid, linux-kernel

On Mon, Jun 13 2016 at  6:21pm -0400,
Toshi Kani <toshi.kani@hpe.com> wrote:

> This patch set adds DAX support to the device-mapper dm-linear devices
> used by LVM.  It works with LVM commands as follows:
>  - Creating a logical volume from all DAX-capable devices (such as
>    pmem) makes the logical volume DAX capable as well.
>  - Once a logical volume is DAX capable, the volume may not be
>    extended with non-DAX-capable devices.
>
> The direct_access interface is added to dm and dm-linear to map
> a request to a target device.
>
>  - Patches 1-2 introduce the GENHD_FL_DAX flag to indicate DAX capability.
>  - Patches 3-4 add direct_access functions to dm and dm-linear.
>  - Patches 5-6 set GENHD_FL_DAX on the dm device when all targets are
>    DAX capable.
>
> ---
> Toshi Kani (6):
>   1/6 genhd: Add GENHD_FL_DAX to gendisk flags
>   2/6 block: Check GENHD_FL_DAX for DAX capability
>   3/6 dm: Add dm_blk_direct_access() for mapped device
>   4/6 dm-linear: Add linear_direct_access()
>   5/6 dm, dm-linear: Add dax_supported to dm_target
>   6/6 dm: Enable DAX support for mapper device

Thanks a lot for doing this.  I recently added it to my TODO, so your
patches come at a great time.

I'll try to get to reviewing/testing your work by the end of this week.

Mike
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
From: Mike Snitzer @ 2016-06-20 18:00 UTC
To: Toshi Kani
Cc: axboe, linux-nvdimm, linux-kernel, linux-raid, dm-devel, viro, agk

On Mon, Jun 13 2016 at  6:57pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Jun 13 2016 at  6:21pm -0400,
> Toshi Kani <toshi.kani@hpe.com> wrote:
>
> > This patch set adds DAX support to the device-mapper dm-linear devices
> > used by LVM.
> :
> > Toshi Kani (6):
> >   1/6 genhd: Add GENHD_FL_DAX to gendisk flags
> >   2/6 block: Check GENHD_FL_DAX for DAX capability
> >   3/6 dm: Add dm_blk_direct_access() for mapped device
> >   4/6 dm-linear: Add linear_direct_access()
> >   5/6 dm, dm-linear: Add dax_supported to dm_target
> >   6/6 dm: Enable DAX support for mapper device
>
> Thanks a lot for doing this.  I recently added it to my TODO, so your
> patches come at a great time.
>
> I'll try to get to reviewing/testing your work by the end of this week.

I rebased your patches on linux-dm.git's 'for-next' (which includes what
I've already staged for the 4.8 merge window), and I folded/changed some
of the DM patches so that there are only 2 now (1 for DM core and 1 for
dm-linear).  Please see the 4 topmost commits in my 'wip' branch here:

http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip

Feel free to pick these patches up to use as the basis for continued
work or re-posting of this set, or I could post them as v2 on your
behalf.

As for testing, I've verified that basic IO works to a pmem-based DM
linear device and that mixed table types are rejected as expected.

Mike
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
From: Kani, Toshimitsu @ 2016-06-20 18:31 UTC
To: Mike Snitzer
Cc: axboe, linux-nvdimm, linux-kernel, linux-raid, dm-devel, viro, agk

On Mon, 2016-06-20 at 14:00 -0400, Mike Snitzer wrote:
> On Mon, Jun 13 2016 at  6:57pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> :
> I rebased your patches on linux-dm.git's 'for-next' (which includes what
> I've already staged for the 4.8 merge window), and I folded/changed some
> of the DM patches so that there are only 2 now (1 for DM core and 1 for
> dm-linear).  Please see the 4 topmost commits in my 'wip' branch here:
>
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip
>
> Feel free to pick these patches up to use as the basis for continued
> work or re-posting of this set, or I could post them as v2 on your
> behalf.
>
> As for testing, I've verified that basic IO works to a pmem-based DM
> linear device and that mixed table types are rejected as expected.

Great!  I will send an additional patch adding DAX support to dm-stripe
on top of these once I finish my testing.

Thanks,
-Toshi
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
From: Mike Snitzer @ 2016-06-20 19:40 UTC
To: Kani, Toshimitsu
Cc: axboe, sandeen, linux-nvdimm, linux-kernel, linux-raid, dm-devel, viro, agk

On Mon, Jun 20 2016 at  2:31pm -0400,
Kani, Toshimitsu <toshi.kani@hpe.com> wrote:

> On Mon, 2016-06-20 at 14:00 -0400, Mike Snitzer wrote:
> :
> > As for testing, I've verified that basic IO works to a pmem-based DM
> > linear device and that mixed table types are rejected as expected.
>
> Great!  I will send an additional patch adding DAX support to dm-stripe
> on top of these once I finish my testing.

I did some further testing and am seeing some XFS corruption when
testing a DM linear device that spans multiple pmem devices.

I created 2 partitions on top of /dev/pmem0 (which I created using the
howto from https://nvdimm.wiki.kernel.org).  Then I did:

# pvcreate /dev/pmem0p1
# pvcreate /dev/pmem0p2
# vgcreate pmem /dev/pmem0p1 /dev/pmem0p2
# lvcreate -L 2.9G -n lv pmem

# lsblk /dev/pmem0
NAME        MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
pmem0       259:0    0   6G  0 disk
├─pmem0p1   259:1    0   1G  0 part
│ └─pmem-lv 253:4    0 2.9G  0 lvm
└─pmem0p2   259:2    0   2G  0 part
  └─pmem-lv 253:4    0 2.9G  0 lvm

# dmsetup table pmem-lv
0 4186112 linear 259:2 2048
4186112 1900544 linear 259:1 2048

# mkfs.xfs /dev/pmem/lv
# mount -o dax -t xfs /dev/pmem/lv /mnt/dax
[11452.212034] XFS (dm-4): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[11452.220323] XFS (dm-4): Mounting V4 Filesystem
[11452.226526] XFS (dm-4): Ending clean mount

# dd if=/dev/zero of=/mnt/dax/meh bs=1024K oflag=direct
[11729.754671] XFS (dm-4): Metadata corruption detected at xfs_agf_read_verify+0x70/0x120 [xfs], xfs_agf block 0x45a808
[11729.766423] XFS (dm-4): Unmount and run xfs_repair
[11729.771774] XFS (dm-4): First 64 bytes of corrupted metadata buffer:
[11729.778869] ffff8800b8038000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[11729.788582] ffff8800b8038010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[11729.798293] ffff8800b8038020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[11729.808002] ffff8800b8038030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[11729.817715] XFS (dm-4): metadata I/O error: block 0x45a808 ("xfs_trans_read_buf_map") error 117 numblks 8

When this XFS corruption occurs, corruption also manifests in lvm2's
metadata:

# vgremove pmem
Do you really want to remove volume group "pmem" containing 1 logical volumes? [y/n]: y
Do you really want to remove active logical volume lv? [y/n]: y
Incorrect metadata area header checksum on /dev/pmem0p1 at offset 4096
WARNING: Failed to write an MDA of VG pmem.
Incorrect metadata area header checksum on /dev/pmem0p2 at offset 4096
WARNING: Failed to write an MDA of VG pmem.
Failed to write VG pmem.
Incorrect metadata area header checksum on /dev/pmem0p2 at offset 4096
Incorrect metadata area header checksum on /dev/pmem0p1 at offset 4096

If I don't use XFS, and only issue IO directly to /dev/pmem/lv, I don't
see this corruption.

Mike
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
From: Mike Snitzer @ 2016-06-20 19:52 UTC
To: Kani, Toshimitsu
Cc: axboe, sandeen, linux-nvdimm, linux-kernel, linux-raid, dm-devel, viro, agk

On Mon, Jun 20 2016 at  3:40pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> # dd if=/dev/zero of=/mnt/dax/meh bs=1024K oflag=direct
> [11729.754671] XFS (dm-4): Metadata corruption detected at xfs_agf_read_verify+0x70/0x120 [xfs], xfs_agf block 0x45a808
> [11729.766423] XFS (dm-4): Unmount and run xfs_repair
> :
> When this XFS corruption occurs, corruption also manifests in lvm2's
> metadata:
> :
> If I don't use XFS, and only issue IO directly to /dev/pmem/lv, I don't
> see this corruption.

I did the same test with ext4 instead of xfs and it resulted in the same
type of systemic corruption (lvm2 metadata corrupted too):

[12816.407147] EXT4-fs (dm-4): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[12816.416123] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: dax
[12816.766855] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 9, block bitmap and bg descriptor inconsistent: 32768 vs 32395 free clusters
[12816.782016] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 10, block bitmap and bg descriptor inconsistent: 32768 vs 16384 free clusters
[12816.797491] JBD2: Spotted dirty metadata buffer (dev = dm-4, blocknr = 0). There's a risk of filesystem corruption in case of system crash.

# vgremove pmem
Do you really want to remove volume group "pmem" containing 1 logical volumes? [y/n]: y
Do you really want to remove active logical volume lv? [y/n]: y
Incorrect metadata area header checksum on /dev/pmem0p1 at offset 4096
WARNING: Failed to write an MDA of VG pmem.
Incorrect metadata area header checksum on /dev/pmem0p2 at offset 4096
WARNING: Failed to write an MDA of VG pmem.
Failed to write VG pmem.
Incorrect metadata area header checksum on /dev/pmem0p2 at offset 4096
Incorrect metadata area header checksum on /dev/pmem0p1 at offset 4096
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
From: Kani, Toshimitsu @ 2016-06-20 20:11 UTC
To: Mike Snitzer
Cc: axboe, sandeen, linux-nvdimm, linux-kernel, linux-raid, dm-devel, viro, agk

On Mon, 2016-06-20 at 15:52 -0400, Mike Snitzer wrote:
> On Mon, Jun 20 2016 at  3:40pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> :
> > If I don't use XFS, and only issue IO directly to /dev/pmem/lv, I
> > don't see this corruption.
>
> I did the same test with ext4 instead of xfs and it resulted in the same
> type of systemic corruption (lvm2 metadata corrupted too):
> :
> # vgremove pmem
> Do you really want to remove volume group "pmem" containing 1 logical volumes?
> [y/n]: y
> Do you really want to remove active logical volume lv? [y/n]: y
> Incorrect metadata area header checksum on /dev/pmem0p1 at offset 4096
> WARNING: Failed to write an MDA of VG pmem.
> Incorrect metadata area header checksum on /dev/pmem0p2 at offset 4096
> WARNING: Failed to write an MDA of VG pmem.
> Failed to write VG pmem.
> Incorrect metadata area header checksum on /dev/pmem0p2 at offset 4096
> Incorrect metadata area header checksum on /dev/pmem0p1 at offset 4096

I will look into the issue.  Thanks for the testing!

-Toshi
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
From: Kani, Toshimitsu @ 2016-06-20 21:28 UTC
To: Mike Snitzer
Cc: axboe, sandeen, linux-nvdimm, linux-kernel, linux-raid, dm-devel, viro, agk

On Mon, 2016-06-20 at 14:01 -0600, Kani, Toshimitsu wrote:
> On Mon, 2016-06-20 at 15:52 -0400, Mike Snitzer wrote:
> > On Mon, Jun 20 2016 at  3:40pm -0400,
> > Mike Snitzer <snitzer@redhat.com> wrote:
> > :
> > > If I don't use XFS, and only issue IO directly to /dev/pmem/lv, I
> > > don't see this corruption.
> >
> > I did the same test with ext4 instead of xfs and it resulted in the same
> > type of systemic corruption (lvm2 metadata corrupted too):
> :
> I will look into the issue.

Hi Mike,

Can you fold the following patch into the dm-linear patch?

Thanks,
-Toshi

------
Subject: [PATCH] dm-linear: Fix partition handling for DAX

Partition handling was missing in linear_direct_access().  Call
bdev_direct_access() instead of calling the target's direct_access
function directly.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
---
 drivers/md/dm-linear.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 325aa06..38323e4 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -148,10 +148,16 @@ static long linear_direct_access(struct dm_target *ti, sector_t sector,
 {
 	struct linear_c *lc = ti->private;
 	struct block_device *bdev = lc->dev->bdev;
-	const struct block_device_operations *bd_ops = bdev->bd_disk->fops;
-
-	return bd_ops->direct_access(bdev, linear_map_sector(ti, sector),
-				     kaddr, pfn, size);
+	struct blk_dax_ctl dax = {
+		.sector = linear_map_sector(ti, sector),
+		.size = size,
+	};
+	long ret;
+
+	ret = bdev_direct_access(bdev, &dax);
+	*kaddr = dax.addr;
+	*pfn = dax.pfn;
+	return ret;
 }
 
 static struct target_type linear_target = {
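Why this fixes the corruption: bdev_direct_access() performs the bounds
and alignment checks and, crucially, translates the partition-relative
sector into a whole-disk sector before calling the driver's
direct_access().  A simplified sketch of the 4.7-era helper, paraphrased
from fs/block_dev.c with the GENHD_FL_DAX check from patch 2/6 applied
(not a verbatim copy):

long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
{
	sector_t sector = dax->sector;
	long avail, size = dax->size;
	const struct block_device_operations *ops = bdev->bd_disk->fops;

	if (size < 0)
		return size;
	if (!(bdev->bd_disk->flags & GENHD_FL_DAX) || !ops->direct_access)
		return -EOPNOTSUPP;
	if ((sector + DIV_ROUND_UP(size, 512)) >
	    part_nr_sects_read(bdev->bd_part))
		return -ERANGE;

	/* This is the step linear_direct_access() was missing: offset
	 * the partition-relative sector by the partition start. */
	sector += get_start_sect(bdev);
	if (sector % (PAGE_SIZE / 512))
		return -EINVAL;

	avail = ops->direct_access(bdev, sector, &dax->addr, &dax->pfn, size);
	if (!avail)
		return -ERANGE;
	return min(avail, size);
}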
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
From: Mike Snitzer @ 2016-06-20 22:22 UTC
To: Kani, Toshimitsu
Cc: axboe, sandeen, linux-nvdimm, linux-kernel, linux-raid, dm-devel, viro, agk

On Mon, Jun 20 2016 at  5:28pm -0400,
Kani, Toshimitsu <toshi.kani@hpe.com> wrote:

> Hi Mike,
>
> Can you fold the following patch into the dm-linear patch?
> :
> ------
> Subject: [PATCH] dm-linear: Fix partition handling for DAX
>
> Partition handling was missing in linear_direct_access().  Call
> bdev_direct_access() instead of calling the target's direct_access
> function directly.
> :
> +	ret = bdev_direct_access(bdev, &dax);
> +	*kaddr = dax.addr;
> +	*pfn = dax.pfn;
> +	return ret;
>  }

Looks good, I folded it in and tested it to work.  Pushed to my 'wip'
branch.

No longer seeing any corruption in my test that was using partitions to
span pmem devices with a dm-linear device.

Jens, any chance you'd be open to picking up the first 2 patches in this
series?  Or would you like to see them folded or something different?

Mike
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
From: Mike Snitzer @ 2016-06-21 13:41 UTC
To: Kani, Toshimitsu; axboe
Cc: sandeen, linux-nvdimm, linux-kernel, linux-raid, dm-devel, viro, agk

On Mon, Jun 20 2016 at  6:22pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Jun 20 2016 at  5:28pm -0400,
> Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> :
> Looks good, I folded it in and tested it to work.  Pushed to my 'wip'
> branch.
>
> No longer seeing any corruption in my test that was using partitions to
> span pmem devices with a dm-linear device.
>
> Jens, any chance you'd be open to picking up the first 2 patches in this
> series?  Or would you like to see them folded or something different?

I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
rather than establishing GENHD_FL_DAX on the genhd.

It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
check for a queue flag.
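A hypothetical sketch of the alternative Mike is floating: advertise
DAX via a request_queue flag instead of a gendisk flag.  The flag name,
bit number, and example_* helper names below are illustrative; none of
this is in the posted series:

#define QUEUE_FLAG_DAX	26	/* illustrative bit number */
#define blk_queue_dax(q)	test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)

/* Driver side: set the flag when the queue is created. */
static void example_enable_dax(struct request_queue *q)
{
	queue_flag_set_unlocked(QUEUE_FLAG_DAX, q);
}

/* Upper layers (e.g. a filesystem at mount time) could then test: */
static bool example_bdev_dax(struct block_device *bdev)
{
	return blk_queue_dax(bdev_get_queue(bdev));
}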
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
From: Kani, Toshimitsu @ 2016-06-21 15:44 UTC
To: axboe; Mike Snitzer
Cc: sandeen, linux-nvdimm, linux-kernel, linux-raid, dm-devel, viro, agk

On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
> On Mon, Jun 20 2016 at  6:22pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> :
> > Looks good, I folded it in and tested it to work.  Pushed to my 'wip'
> > branch.
> >
> > No longer seeing any corruption in my test that was using partitions to
> > span pmem devices with a dm-linear device.
> >
> > Jens, any chance you'd be open to picking up the first 2 patches in this
> > series?  Or would you like to see them folded or something different?
>
> I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
> rather than establishing GENHD_FL_DAX on the genhd.
>
> It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
> check for a queue flag.

I think GENHD_FL_DAX is more appropriate since DAX does not use a
request queue, except for protecting the underlying device from being
disabled while direct_access() is called (b2e0d1625e19).

About protecting direct_access: this patch assumes that the underlying
device cannot be disabled until dtr() is called.  Is this correct?  If
not, I will need to call dax_map_atomic().

Thanks,
-Toshi
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
From: Kani, Toshimitsu @ 2016-06-21 15:50 UTC
To: axboe; Mike Snitzer
Cc: sandeen, linux-nvdimm, linux-kernel, linux-raid, dm-devel, viro, agk

On Tue, 2016-06-21 at 09:34 -0600, Kani, Toshimitsu wrote:
> On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
> >
> > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
> > rather than establishing GENHD_FL_DAX on the genhd.
> >
> > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
> > check for a queue flag.
>
> I think GENHD_FL_DAX is more appropriate since DAX does not use a
> request queue, except for protecting the underlying device from being
> disabled while direct_access() is called (b2e0d1625e19).

Forgot to mention that there are bdev_dax_supported() and
bdev_dax_capable() interfaces that can be called from upper layers.
They both call bdev_direct_access(), which checks GENHD_FL_DAX.

Thanks,
-Toshi

> About protecting direct_access: this patch assumes that the underlying
> device cannot be disabled until dtr() is called.  Is this correct?  If
> not, I will need to call dax_map_atomic().
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
From: Dan Williams @ 2016-06-21 16:25 UTC
To: Kani, Toshimitsu
Cc: axboe, sandeen, linux-raid, snitzer, linux-nvdimm, linux-kernel, dm-devel, viro, agk

On Tue, Jun 21, 2016 at 8:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote:
> On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote:
> > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX
> > rather than establishing GENHD_FL_DAX on the genhd.
> >
> > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to
> > check for a queue flag.
>
> I think GENHD_FL_DAX is more appropriate since DAX does not use a
> request queue, except for protecting the underlying device from being
> disabled while direct_access() is called (b2e0d1625e19).
>
> About protecting direct_access: this patch assumes that the underlying
> device cannot be disabled until dtr() is called.  Is this correct?  If
> not, I will need to call dax_map_atomic().

Kernel-internal usages of DAX should be using dax_map_atomic() to
safely resolve device-removal races.
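For reference, a condensed sketch of the helpers Dan is referring to,
paraphrased from the fs/dax.c of that era (see commit b2e0d1625e19
cited above; this is not a verbatim copy).  They bracket the
direct_access call with blk_queue_enter()/blk_queue_exit() so the
device cannot be torn down while the mapping is in use:

static void __pmem *dax_map_atomic(struct block_device *bdev,
				   struct blk_dax_ctl *dax)
{
	struct request_queue *q = bdev->bd_queue;
	long rc;

	dax->addr = (void __pmem *) ERR_PTR(-EIO);
	/* Take a queue reference; blocks device teardown until exit. */
	if (blk_queue_enter(q, true) != 0)
		return dax->addr;

	rc = bdev_direct_access(bdev, dax);
	if (rc < 0) {
		dax->addr = (void __pmem *) ERR_PTR(rc);
		blk_queue_exit(q);
	}
	return dax->addr;
}

static void dax_unmap_atomic(struct block_device *bdev,
			     struct blk_dax_ctl *dax)
{
	if (IS_ERR(dax->addr))
		return;
	blk_queue_exit(bdev->bd_queue);
}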
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices [not found] ` <CAPcyv4gFREc94ANuFD_Lyddx3iqRTN2UDebgeJe3LqPL8xrVzg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2016-06-21 16:35 ` Kani, Toshimitsu [not found] ` <1466526342.3504.270.camel-ZPxbGqLxI0U@public.gmane.org> 0 siblings, 1 reply; 40+ messages in thread From: Kani, Toshimitsu @ 2016-06-21 16:35 UTC (permalink / raw) To: dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org Cc: axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org, sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, axboe-b10kYP2dOMg@public.gmane.org, linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-raid-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org, agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org On Tue, 2016-06-21 at 09:25 -0700, Dan Williams wrote: > On Tue, Jun 21, 2016 at 8:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com> > wrote: > > > > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote: > > > > > > On Mon, Jun 20 2016 at 6:22pm -0400, > > > Mike Snitzer <snitzer@redhat.com> wrote: > > > > > > > > On Mon, Jun 20 2016 at 5:28pm -0400, > > > > Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > > : > > > > Looks good, I folded it in and tested it to work. Pushed to my 'wip' > > > > branch. > > > > > > > > No longer seeing any corruption in my test that was using partitions > > > > to span pmem devices with a dm-linear device. > > > > > > > > Jens, any chance you'd be open to picking up the first 2 patches in > > > > this series? Or would you like to see them folded or something > > > > different? > > > > > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX > > > rather than establish GENHD_FL_DAX on the genhd? > > > > > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to > > > check for a queue flag. > > > > I think GENHD_FL_DAX is more appropriate since DAX does not use a request > > queue, except for protecting the underlining device being disabled while > > direct_access() is called (b2e0d1625e19). > > > > About protecting direct_access, this patch assumes that the underlining > > device cannot be disabled until dtr() is called. Is this correct? If > > not, I will need to call dax_map_atomic(). > > Kernel internal usages of dax should be using dax_map_atomic() to > safely resolve device removal races. Will do. In such case, shall I move dax_[un]map_atomic() to block_dev.c and rename them to bdev_dax_[un]map_atomic()? Thanks, -Toshi _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices [not found] ` <1466526342.3504.270.camel-ZPxbGqLxI0U@public.gmane.org> @ 2016-06-21 16:45 ` Dan Williams [not found] ` <CAPcyv4ht8B7dHe1ckv5d=bOrRzCy3=ZDVSTD0rRsak_LYD8r8g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 40+ messages in thread From: Dan Williams @ 2016-06-21 16:45 UTC (permalink / raw) To: Kani, Toshimitsu Cc: axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org, sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, axboe-b10kYP2dOMg@public.gmane.org, linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-raid-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org, agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org On Tue, Jun 21, 2016 at 9:35 AM, Kani, Toshimitsu <toshi.kani-ZPxbGqLxI0U@public.gmane.org> wrote: > On Tue, 2016-06-21 at 09:25 -0700, Dan Williams wrote: >> On Tue, Jun 21, 2016 at 8:44 AM, Kani, Toshimitsu <toshi.kani-ZPxbGqLxI0U@public.gmane.org> >> wrote: >> > >> > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote: >> > > >> > > On Mon, Jun 20 2016 at 6:22pm -0400, >> > > Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: >> > > > >> > > > On Mon, Jun 20 2016 at 5:28pm -0400, >> > > > Kani, Toshimitsu <toshi.kani-ZPxbGqLxI0U@public.gmane.org> wrote: >> > : >> > > > Looks good, I folded it in and tested it to work. Pushed to my 'wip' >> > > > branch. >> > > > >> > > > No longer seeing any corruption in my test that was using partitions >> > > > to span pmem devices with a dm-linear device. >> > > > >> > > > Jens, any chance you'd be open to picking up the first 2 patches in >> > > > this series? Or would you like to see them folded or something >> > > > different? >> > > >> > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX >> > > rather than establish GENHD_FL_DAX on the genhd? >> > > >> > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to >> > > check for a queue flag. >> > >> > I think GENHD_FL_DAX is more appropriate since DAX does not use a request >> > queue, except for protecting the underlining device being disabled while >> > direct_access() is called (b2e0d1625e19). >> > >> > About protecting direct_access, this patch assumes that the underlining >> > device cannot be disabled until dtr() is called. Is this correct? If >> > not, I will need to call dax_map_atomic(). >> >> Kernel internal usages of dax should be using dax_map_atomic() to >> safely resolve device removal races. > > Will do. In such case, shall I move dax_[un]map_atomic() to block_dev.c and > rename them to bdev_dax_[un]map_atomic()? Sounds good to me. I know Jeff and Christoph don't like the current calling convention of passing in a structure. Just note that they might ask you to change it back to a list of parameters if it moves to bdev_dax_map_atomic(). ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices [not found] ` <CAPcyv4ht8B7dHe1ckv5d=bOrRzCy3=ZDVSTD0rRsak_LYD8r8g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2016-06-21 16:56 ` Kani, Toshimitsu 0 siblings, 0 replies; 40+ messages in thread From: Kani, Toshimitsu @ 2016-06-21 16:56 UTC (permalink / raw) To: dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org Cc: axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org, sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, axboe-b10kYP2dOMg@public.gmane.org, linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-raid-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org, agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org On Tue, 2016-06-21 at 09:45 -0700, Dan Williams wrote: > On Tue, Jun 21, 2016 at 9:35 AM, Kani, Toshimitsu <toshi.kani@hpe.com> > wrote: > > On Tue, 2016-06-21 at 09:25 -0700, Dan Williams wrote: > > > On Tue, Jun 21, 2016 at 8:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com> > > > wrote: > > > > : > > > > I think GENHD_FL_DAX is more appropriate since DAX does not use a > > > > request queue, except for protecting the underlining device being > > > > disabled while direct_access() is called (b2e0d1625e19). > > > > > > > > About protecting direct_access, this patch assumes that the > > > > underlining device cannot be disabled until dtr() is called. Is this > > > > correct? If not, I will need to call dax_map_atomic(). > > > > > > Kernel internal usages of dax should be using dax_map_atomic() to > > > safely resolve device removal races. > > > > Will do. In such case, shall I move dax_[un]map_atomic() to block_dev.c > > and rename them to bdev_dax_[un]map_atomic()? > > Sounds good to me. I know Jeff and Christoph don't like the current > calling convention of passing in a structure. Just note that they > might ask you to change it back to a list of parameters if it moves to > bdev_dax_map_atomic(). OK, I will change it back to a list of parameters as well. Thanks, -Toshi _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
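A sketch of the direction agreed on here; nothing with this exact shape existed at the time, so both names and the flattened parameter list are hypothetical:

/*
 * Hypothetical: dax_map_atomic() relocated to fs/block_dev.c and
 * flattened from a struct blk_dax_ctl argument to a parameter list,
 * per the preference Dan mentions.
 */
long bdev_dax_map_atomic(struct block_device *bdev, sector_t sector,
			 long size, void **kaddr, pfn_t *pfn)
{
	struct blk_dax_ctl dax = {
		.sector = sector,
		.size = size,
	};
	long rc;

	if (blk_queue_enter(bdev->bd_queue, true) != 0)
		return -EIO;

	rc = bdev_direct_access(bdev, &dax);
	if (rc < 0) {
		blk_queue_exit(bdev->bd_queue);
		return rc;
	}
	*kaddr = dax.addr;
	*pfn = dax.pfn;
	return rc;		/* bytes usable at *kaddr */
}

void bdev_dax_unmap_atomic(struct block_device *bdev)
{
	blk_queue_exit(bdev->bd_queue);
}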
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices [not found] ` <1466523280.3504.262.camel-ZPxbGqLxI0U@public.gmane.org> 2016-06-21 15:50 ` Kani, Toshimitsu 2016-06-21 16:25 ` Dan Williams @ 2016-06-21 18:17 ` Mike Snitzer 2016-06-22 17:44 ` Kani, Toshimitsu 2 siblings, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2016-06-21 18:17 UTC (permalink / raw) To: Kani, Toshimitsu Cc: axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org, sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, linux-raid-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, axboe-b10kYP2dOMg@public.gmane.org, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org, agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org On Tue, Jun 21 2016 at 11:44am -0400, Kani, Toshimitsu <toshi.kani-ZPxbGqLxI0U@public.gmane.org> wrote: > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote: > > On Mon, Jun 20 2016 at 6:22pm -0400, > > Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > > > > > > On Mon, Jun 20 2016 at 5:28pm -0400, > > > Kani, Toshimitsu <toshi.kani-ZPxbGqLxI0U@public.gmane.org> wrote: > > > > : > > > Looks good, I folded it in and tested it to work. Pushed to my 'wip' > > > branch. > > > > > > No longer seeing any corruption in my test that was using partitions to > > > span pmem devices with a dm-linear device. > > > > > > Jens, any chance you'd be open to picking up the first 2 patches in this > > > series? Or would you like to see them folded or something different? > > > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX > > rather than establish GENHD_FL_DAX on the genhd? > > > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to > > check for a queue flag. > > I think GENHD_FL_DAX is more appropriate since DAX does not use a request > queue, except for protecting the underlying device being disabled while > direct_access() is called (b2e0d1625e19). The devices in question have a request_queue. All bio-based devices have a request_queue. I don't have a big problem with GENHD_FL_DAX. Just wanted to point out that such block device capabilities are generally advertised in terms of a QUEUE_FLAG. > About protecting direct_access, this patch assumes that the underlying > device cannot be disabled until dtr() is called. Is this correct? If not, > I will need to call dax_map_atomic(). One of the big design considerations for DM is that a DM device can be suspended (with or without flush) and any new IO will be blocked until the DM device is resumed. So ideally DM should be able to have the same capability even if using DAX. But that is different than what commit b2e0d1625e19 is addressing. For DM, I wouldn't think you'd need the extra protections that dax_map_atomic() is providing given that the underlying block device lifetime is managed via DM core's dm_get_device/dm_put_device (see also: dm.c:open_table_device/close_table_device). ^ permalink raw reply [flat|nested] 40+ messages in thread
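The lifetime rule Mike describes, sketched roughly along the lines of dm-linear's constructor and destructor (simplified; argument parsing and error messages trimmed):

/*
 * A target takes a reference on its underlying device in the
 * constructor and drops it only in the destructor, so the bdev cannot
 * be closed out from under a live table.
 */
struct linear_c {
	struct dm_dev *dev;
	sector_t start;
};

static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
	struct linear_c *lc;

	lc = kmalloc(sizeof(*lc), GFP_KERNEL);
	if (!lc)
		return -ENOMEM;

	/* pins the underlying block device until dm_put_device() */
	if (dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
			  &lc->dev)) {
		kfree(lc);
		return -ENODEV;
	}

	ti->private = lc;
	return 0;
}

static void linear_dtr(struct dm_target *ti)
{
	struct linear_c *lc = ti->private;

	dm_put_device(ti, lc->dev);
	kfree(lc);
}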
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices 2016-06-21 18:17 ` Mike Snitzer @ 2016-06-22 17:44 ` Kani, Toshimitsu [not found] ` <1466616868.3504.320.camel-ZPxbGqLxI0U@public.gmane.org> 0 siblings, 1 reply; 40+ messages in thread From: Kani, Toshimitsu @ 2016-06-22 17:44 UTC (permalink / raw) To: snitzer@redhat.com Cc: axboe@kernel.dk, sandeen@redhat.com, axboe@fb.com, linux-nvdimm@ml01.01.org, linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, dm-devel@redhat.com, viro@zeniv.linux.org.uk, dan.j.williams@intel.com, ross.zwisler@linux.intel.com, agk@redhat.com On Tue, 2016-06-21 at 14:17 -0400, Mike Snitzer wrote: > On Tue, Jun 21 2016 at 11:44am -0400, > Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > > > > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote: > > > > > > On Mon, Jun 20 2016 at 6:22pm -0400, > > > Mike Snitzer <snitzer@redhat.com> wrote: : > > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX > > > rather than establish GENHD_FL_DAX on the genhd? > > > > > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to > > > check for a queue flag. > > > > I think GENHD_FL_DAX is more appropriate since DAX does not use a request > > queue, except for protecting the underlying device being disabled while > > direct_access() is called (b2e0d1625e19). > > The devices in question have a request_queue. All bio-based devices have > a request_queue. DAX-capable devices have two operation modes, bio-based and DAX. I agree that bio-based operation is associated with a request queue, and its capabilities should be set on it. DAX, on the other hand, is rather independent of a request queue. > I don't have a big problem with GENHD_FL_DAX. Just wanted to point out > that such block device capabilities are generally advertised in terms of > a QUEUE_FLAG. I do not have a strong opinion, but it feels a bit odd to associate DAX with a request queue. > > About protecting direct_access, this patch assumes that the underlying > > device cannot be disabled until dtr() is called. Is this correct? If > > not, I will need to call dax_map_atomic(). > > One of the big design considerations for DM is that a DM device can be > suspended (with or without flush) and any new IO will be blocked until > the DM device is resumed. > > So ideally DM should be able to have the same capability even if using > DAX. Supporting suspend for DAX is challenging since it allows user applications to access a device directly. Once a device range is mmap'd, there is no kernel intervention to access the range, unless we invalidate user mappings. This isn't done today even after a driver is unbound from a device. > But that is different than what commit b2e0d1625e19 is addressing. For > DM, I wouldn't think you'd need the extra protections that > dax_map_atomic() is providing given that the underlying block device > lifetime is managed via DM core's dm_get_device/dm_put_device (see also: > dm.c:open_table_device/close_table_device). I thought so as well. But I realized that there is (almost) nothing that can prevent the unbind operation. It cannot fail, either. This unbind proceeds even when a device is in use. In case of a pmem device, it is only protected by pmem_release_queue(), which is called when a pmem device is being deleted and calls blk_cleanup_queue() to serialize a critical section between blk_queue_enter() and blk_queue_exit() per b2e0d1625e19. This prevents a kernel DTLB fault, but does not prevent the device from disappearing while in use. 
Protecting DM's underlying device with blk_queue_enter() (or something similar) requires more thought... blk_queue_enter() on a DM device cannot be redirected to its underlying device. So, this is TBD for now. But I do not think this is a blocker issue since doing an unbind of an underlying device is quite harmful no matter what we do - even if it is protected with blk_queue_enter(). Thanks, -Toshi ^ permalink raw reply [flat|nested] 40+ messages in thread
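For reference, the pmem teardown hook Toshi describes is essentially this small in drivers/nvdimm/pmem.c of that era. blk_cleanup_queue() drains every blk_queue_enter() holder, so an in-flight direct_access() window completes or fails cleanly, but nothing stops the unbind itself while longer-lived users (a DM table, a user mmap) still hold references:

static void pmem_release_queue(void *q)
{
	/* waits for the queue usage count to drain, then kills the queue */
	blk_cleanup_queue(q);
}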
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices [not found] ` <1466616868.3504.320.camel-ZPxbGqLxI0U@public.gmane.org> @ 2016-06-22 19:15 ` Dan Williams 2016-06-22 20:16 ` Kani, Toshimitsu 0 siblings, 1 reply; 40+ messages in thread From: Dan Williams @ 2016-06-22 19:15 UTC (permalink / raw) To: Kani, Toshimitsu Cc: axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org, sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, axboe-b10kYP2dOMg@public.gmane.org, linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-raid-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org, agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org On Wed, Jun 22, 2016 at 10:44 AM, Kani, Toshimitsu <toshi.kani-ZPxbGqLxI0U@public.gmane.org> wrote: > On Tue, 2016-06-21 at 14:17 -0400, Mike Snitzer wrote: >> On Tue, Jun 21 2016 at 11:44am -0400, >> Kani, Toshimitsu <toshi.kani-ZPxbGqLxI0U@public.gmane.org> wrote: >> > >> > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote: >> > > >> > > On Mon, Jun 20 2016 at 6:22pm -0400, >> > > Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: > : >> > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX >> > > rather than establish GENHD_FL_DAX on the genhd? >> > > >> > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) to >> > > check for a queue flag. >> > >> > I think GENHD_FL_DAX is more appropriate since DAX does not use a request >> > queue, except for protecting the underlining device being disabled while >> > direct_access() is called (b2e0d1625e19). >> >> The devices in question have a request_queue. All bio-based device have >> a request_queue. > > DAX-capable devices have two operation modes, bio-based and DAX. I agree that > bio-based operation is associated with a request queue, and its capabilities > should be set to it. DAX, on the other hand, is rather independent from a > request queue. > >> I don't have a big problem with GENHD_FL_DAX. Just wanted to point out >> that such block device capabilities are generally advertised in terms of >> a QUEUE_FLAG. > > I do not have a strong opinion, but feel a bit odd to associate DAX to a > request queue. Given that we do not support dax to a raw block device [1] it seems a gendisk flag is more misleading than request_queue flag that specifies what requests can be made of the device. [1]: acc93d30d7d4 Revert "block: enable dax for raw block devices" >> > About protecting direct_access, this patch assumes that the underlining >> > device cannot be disabled until dtr() is called. Is this correct? If >> > not, I will need to call dax_map_atomic(). >> >> One of the big design considerations for DM that a DM device can be >> suspended (with or without flush) and any new IO will be blocked until >> the DM device is resumed. >> >> So ideally DM should be able to have the same capability even if using >> DAX. > > Supporting suspend for DAX is challenging since it allows user applications to > access a device directly. Once a device range is mmap'd, there is no kernel > intervention to access the range, unless we invalidate user mappings. This > isn't done today even after a driver is unbind'd from a device. > >> But that is different than what commit b2e0d1625e19 is addressing. 
For >> DM, I wouldn't think you'd need the extra protections that >> dax_map_atomic() is providing given that the underlying block device >> lifetime is managed via DM core's dm_get_device/dm_put_device (see also: >> dm.c:open_table_device/close_table_device). > > I thought so as well. But I realized that there is (almost) nothing that can > prevent the unbind operation. It cannot fail, either. This unbind proceeds > even when a device is in-use. In case of a pmem device, it is only protected > by pmem_release_queue(), which is called when a pmem device is being deleted > and calls blk_cleanup_queue() to serialize a critical section between > blk_queue_enter() and blk_queue_exit() per b2e0d1625e19. This prevents from a > kernel DTLB fault, but does not prevent a device disappeared while in-use. > > Protecting DM's underlining device with blk_queue_enter() (or something > similar) requires more thoughts... blk_queue_enter() to a DM device cannot be > redirected to its underlining device. So, this is TBD for now. But I do not > think this is a blocker issue since doing unbind to a underlining device is > quite harmful no matter what we do - even if it is protected with > blk_queue_enter(). I still have the "block device removed" notification patches on my todo list. It's not a blocker, but there are scenarios where we can keep accessing memory via dax of a disabled device leading to memory corruption. I'll bump that up in my queue now that we are looking at additional scenarios where letting DAX mappings leak past the reconfiguration of a block device could lead to trouble. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices 2016-06-22 19:15 ` Dan Williams @ 2016-06-22 20:16 ` Kani, Toshimitsu 2016-06-22 22:38 ` Mike Snitzer 0 siblings, 1 reply; 40+ messages in thread From: Kani, Toshimitsu @ 2016-06-22 20:16 UTC (permalink / raw) To: dan.j.williams@intel.com Cc: linux-kernel@vger.kernel.org, sandeen@redhat.com, linux-nvdimm@ml01.01.org, agk@redhat.com, linux-raid@vger.kernel.org, snitzer@redhat.com, viro@zeniv.linux.org.uk, axboe@kernel.dk, axboe@fb.com, ross.zwisler@linux.intel.com, dm-devel@redhat.com On Wed, 2016-06-22 at 12:15 -0700, Dan Williams wrote: > On Wed, Jun 22, 2016 at 10:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com> > wrote: > > On Tue, 2016-06-21 at 14:17 -0400, Mike Snitzer wrote: > > > > > > On Tue, Jun 21 2016 at 11:44am -0400, > > > Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > > > > > > > > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote: > > > > > On Mon, Jun 20 2016 at 6:22pm -0400, > > > > > Mike Snitzer <snitzer@redhat.com> wrote: > > > > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX > > > > > rather than establish GENHD_FL_DAX on the genhd? > > > > > > > > > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) > > > > > to check for a queue flag. > > > > > > > > I think GENHD_FL_DAX is more appropriate since DAX does not use a > > > > request queue, except for protecting the underlining device being > > > > disabled while direct_access() is called (b2e0d1625e19). > > > > > > The devices in question have a request_queue. All bio-based device have > > > a request_queue. > > > > DAX-capable devices have two operation modes, bio-based and DAX. I agree > > that bio-based operation is associated with a request queue, and its > > capabilities should be set to it. DAX, on the other hand, is rather > > independent from a request queue. > > > > > I don't have a big problem with GENHD_FL_DAX. Just wanted to point out > > > that such block device capabilities are generally advertised in terms of > > > a QUEUE_FLAG. > > > > I do not have a strong opinion, but feel a bit odd to associate DAX to a > > request queue. > > Given that we do not support dax to a raw block device [1] it seems a > gendisk flag is more misleading than request_queue flag that specifies > what requests can be made of the device. > > [1]: acc93d30d7d4 Revert "block: enable dax for raw block devices" Oh, I see. I will change to use request_queue flag. > > > > About protecting direct_access, this patch assumes that the > > > > underlining device cannot be disabled until dtr() is called. Is this > > > > correct? If not, I will need to call dax_map_atomic(). > > > > > > One of the big design considerations for DM that a DM device can be > > > suspended (with or without flush) and any new IO will be blocked until > > > the DM device is resumed. > > > > > > So ideally DM should be able to have the same capability even if using > > > DAX. > > > > Supporting suspend for DAX is challenging since it allows user > > applications to access a device directly. Once a device range is mmap'd, > > there is no kernel intervention to access the range, unless we invalidate > > user mappings. This isn't done today even after a driver is unbind'd from > > a device. > > > > > But that is different than what commit b2e0d1625e19 is addressing. 
For > > > DM, I wouldn't think you'd need the extra protections that > > > dax_map_atomic() is providing given that the underlying block device > > > lifetime is managed via DM core's dm_get_device/dm_put_device (see also: > > > dm.c:open_table_device/close_table_device). > > > > I thought so as well. But I realized that there is (almost) nothing that > > can prevent the unbind operation. It cannot fail, either. This unbind > > proceeds even when a device is in use. In case of a pmem device, it is > > only protected by pmem_release_queue(), which is called when a pmem device > > is being deleted and calls blk_cleanup_queue() to serialize a critical > > section between > > blk_queue_enter() and blk_queue_exit() per b2e0d1625e19. This prevents > > a kernel DTLB fault, but does not prevent the device from disappearing while > > in use. > > > > Protecting DM's underlying device with blk_queue_enter() (or something > > similar) requires more thought... blk_queue_enter() on a DM device > > cannot be redirected to its underlying device. So, this is TBD for > > now. But I do not think this is a blocker issue since doing an unbind of an > > underlying device is quite harmful no matter what we do - even if it is > > protected with blk_queue_enter(). > > I still have the "block device removed" notification patches on my > todo list. It's not a blocker, but there are scenarios where we can > keep accessing memory via dax of a disabled device leading to memory > corruption. Right, I noticed that user applications can access mmap'd ranges on a disabled device. > I'll bump that up in my queue now that we are looking at > additional scenarios where letting DAX mappings leak past the > reconfiguration of a block device could lead to trouble. Great. With DM, removing an underlying device while it is in use can lead to trouble, esp. with RAID0. Users need to remove a device from DM first... Thanks, -Toshi ^ permalink raw reply [flat|nested] 40+ messages in thread
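A sketch of the invalidation step alluded to above, assuming a hook that could zap userspace mappings of the raw block device when it disappears; no such hook existed at the time, and file-backed DAX mappings would additionally need the filesystem's cooperation:

#include <linux/mm.h>

/*
 * Illustrative only: tear down user PTEs for the bdev's address_space
 * so the next access faults instead of touching unplugged memory.
 */
static void invalidate_bdev_dax_mappings(struct block_device *bdev)
{
	struct address_space *mapping = bdev->bd_inode->i_mapping;

	/* holelen == 0 means "to the end"; even_cows == 1 zaps all PTEs */
	unmap_mapping_range(mapping, 0, 0, 1);
}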
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices 2016-06-22 20:16 ` Kani, Toshimitsu @ 2016-06-22 22:38 ` Mike Snitzer [not found] ` <20160622223842.GA34512-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2016-06-22 22:38 UTC (permalink / raw) To: Kani, Toshimitsu Cc: dan.j.williams@intel.com, linux-kernel@vger.kernel.org, sandeen@redhat.com, linux-nvdimm@ml01.01.org, agk@redhat.com, linux-raid@vger.kernel.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, axboe@fb.com, ross.zwisler@linux.intel.com, dm-devel@redhat.com On Wed, Jun 22 2016 at 4:16P -0400, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > On Wed, 2016-06-22 at 12:15 -0700, Dan Williams wrote: > > On Wed, Jun 22, 2016 at 10:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com> > > wrote: > > > On Tue, 2016-06-21 at 14:17 -0400, Mike Snitzer wrote: > > > > > > > > On Tue, Jun 21 2016 at 11:44am -0400, > > > > Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > > > > > > > > > > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote: > > > > > > On Mon, Jun 20 2016 at 6:22pm -0400, > > > > > > Mike Snitzer <snitzer@redhat.com> wrote: > > > > > > I'm now wondering if we'd be better off setting a new QUEUE_FLAG_DAX > > > > > > rather than establish GENHD_FL_DAX on the genhd? > > > > > > > > > > > > It'd be quite a bit easier to allow upper layers (e.g. XFS and ext4) > > > > > > to check for a queue flag. > > > > > > > > > > I think GENHD_FL_DAX is more appropriate since DAX does not use a > > > > > request queue, except for protecting the underlining device being > > > > > disabled while direct_access() is called (b2e0d1625e19). > > > > > > > > The devices in question have a request_queue. All bio-based device have > > > > a request_queue. > > > > > > DAX-capable devices have two operation modes, bio-based and DAX. I agree > > > that bio-based operation is associated with a request queue, and its > > > capabilities should be set to it. DAX, on the other hand, is rather > > > independent from a request queue. > > > > > > > I don't have a big problem with GENHD_FL_DAX. Just wanted to point out > > > > that such block device capabilities are generally advertised in terms of > > > > a QUEUE_FLAG. > > > > > > I do not have a strong opinion, but feel a bit odd to associate DAX to a > > > request queue. > > > > Given that we do not support dax to a raw block device [1] it seems a > > gendisk flag is more misleading than request_queue flag that specifies > > what requests can be made of the device. > > > > [1]: acc93d30d7d4 Revert "block: enable dax for raw block devices" > > Oh, I see. I will change to use request_queue flag. I implemented the block patch for this yesterday but based on your feedback I stopped there (didn't carry the change through to the DM core and DM linear -- can easily do so tomorrow though). 
Feel free to use this as a starting point and fix/extend and add a proper header: From e88736ce322f248157da6c7d402e940adafffa1e Mon Sep 17 00:00:00 2001 From: Mike Snitzer <snitzer@redhat.com> Date: Tue, 21 Jun 2016 12:23:29 -0400 Subject: [PATCH] block: add QUEUE_FLAG_DAX for devices to advertise their DAX support Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- drivers/block/brd.c | 3 +++ drivers/nvdimm/pmem.c | 1 + drivers/s390/block/dcssblk.c | 1 + fs/block_dev.c | 5 +++-- include/linux/blkdev.h | 2 ++ 5 files changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/block/brd.c b/drivers/block/brd.c index f5b0d6f..13eee12 100644 --- a/drivers/block/brd.c +++ b/drivers/block/brd.c @@ -508,6 +508,9 @@ static struct brd_device *brd_alloc(int i) brd->brd_queue->limits.discard_granularity = PAGE_SIZE; blk_queue_max_discard_sectors(brd->brd_queue, UINT_MAX); brd->brd_queue->limits.discard_zeroes_data = 1; +#ifdef CONFIG_BLK_DEV_RAM_DAX + queue_flag_set_unlocked(QUEUE_FLAG_DAX, brd->brd_queue); +#endif queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, brd->brd_queue); disk = brd->brd_disk = alloc_disk(max_part); diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 608fc44..53b701b 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -283,6 +283,7 @@ static int pmem_attach_disk(struct device *dev, blk_queue_max_hw_sectors(q, UINT_MAX); blk_queue_bounce_limit(q, BLK_BOUNCE_ANY); queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q); + queue_flag_set_unlocked(QUEUE_FLAG_DAX, q); q->queuedata = pmem; disk = alloc_disk_node(0, nid); diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c index bed53c4..093e9e1 100644 --- a/drivers/s390/block/dcssblk.c +++ b/drivers/s390/block/dcssblk.c @@ -618,6 +618,7 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char dev_info->gd->driverfs_dev = &dev_info->dev; blk_queue_make_request(dev_info->dcssblk_queue, dcssblk_make_request); blk_queue_logical_block_size(dev_info->dcssblk_queue, 4096); + queue_flag_set_unlocked(QUEUE_FLAG_DAX, dev_info->dcssblk_queue); seg_byte_size = (dev_info->end - dev_info->start + 1); set_capacity(dev_info->gd, seg_byte_size >> 9); // size in sectors diff --git a/fs/block_dev.c b/fs/block_dev.c index 71ccab1..9bcb3a9 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -484,6 +484,7 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax) sector_t sector = dax->sector; long avail, size = dax->size; const struct block_device_operations *ops = bdev->bd_disk->fops; + struct request_queue *q = bdev_get_queue(bdev); /* * The device driver is allowed to sleep, in order to make the @@ -493,7 +494,7 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax) if (size < 0) return size; - if (!ops->direct_access) + if (!blk_queue_dax(q) || !ops->direct_access) return -EOPNOTSUPP; if ((sector + DIV_ROUND_UP(size, 512)) > part_nr_sects_read(bdev->bd_part)) @@ -1287,7 +1288,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part) bdev->bd_disk = disk; bdev->bd_queue = disk->queue; bdev->bd_contains = bdev; - if (IS_ENABLED(CONFIG_BLK_DEV_DAX) && disk->fops->direct_access) + if (IS_ENABLED(CONFIG_BLK_DEV_DAX) && blk_queue_dax(disk->queue)) bdev->bd_inode->i_flags = S_DAX; else bdev->bd_inode->i_flags = 0; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 9746d22..d5cb326 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -505,6 +505,7 @@ struct request_queue { #define QUEUE_FLAG_WC 
23 /* Write back caching */ #define QUEUE_FLAG_FUA 24 /* device supports FUA writes */ #define QUEUE_FLAG_FLUSH_NQ 25 /* flush not queueuable */ +#define QUEUE_FLAG_DAX 26 /* device supports DAX */ #define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \ (1 << QUEUE_FLAG_STACKABLE) | \ @@ -594,6 +595,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q) #define blk_queue_discard(q) test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags) #define blk_queue_secdiscard(q) (blk_queue_discard(q) && \ test_bit(QUEUE_FLAG_SECDISCARD, &(q)->queue_flags)) +#define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags) #define blk_noretry_request(rq) \ ((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \ -- 2.7.4 (Apple Git-66) ^ permalink raw reply related [flat|nested] 40+ messages in thread
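Assuming the QUEUE_FLAG_DAX patch above, the upper-layer check Mike alludes to reduces to a one-line test of the queue flag; the helper below is a hypothetical sketch of what an XFS/ext4 mount path could call:

static bool sb_supports_dax(struct super_block *sb)
{
	struct request_queue *q = bdev_get_queue(sb->s_bdev);

	/* blk_queue_dax() is the macro added by the patch above */
	return q && blk_queue_dax(q);
}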
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices [not found] ` <20160622223842.GA34512-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2016-06-22 22:59 ` Kani, Toshimitsu 0 siblings, 0 replies; 40+ messages in thread From: Kani, Toshimitsu @ 2016-06-22 22:59 UTC (permalink / raw) To: snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org Cc: axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org, sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, axboe-b10kYP2dOMg@public.gmane.org, linux-nvdimm-y27Ovi1pjclAfugRpC6u6w@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-raid-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org, agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org On Wed, 2016-06-22 at 18:38 -0400, Mike Snitzer wrote: > On Wed, Jun 22 2016 at 4:16P -0400, > Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > > > > On Wed, 2016-06-22 at 12:15 -0700, Dan Williams wrote: > > > > > > On Wed, Jun 22, 2016 at 10:44 AM, Kani, Toshimitsu <toshi.kani@hpe.com> > > > wrote: > > > > > > > > On Tue, 2016-06-21 at 14:17 -0400, Mike Snitzer wrote: > > > > > > > > > > On Tue, Jun 21 2016 at 11:44am -0400, > > > > > Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > > > > > > > > > > > > On Tue, 2016-06-21 at 09:41 -0400, Mike Snitzer wrote: > > > > > > > > > > > > The devices in question have a request_queue. All bio-based device > > > > > have a request_queue. > > > > > > > > DAX-capable devices have two operation modes, bio-based and DAX. I > > > > agree that bio-based operation is associated with a request queue, and > > > > its capabilities should be set to it. DAX, on the other hand, is > > > > rather independent from a request queue. > > > > > > > > > > > > > > I don't have a big problem with GENHD_FL_DAX. Just wanted to point > > > > > out that such block device capabilities are generally advertised in > > > > > terms of a QUEUE_FLAG. > > > > > > > > I do not have a strong opinion, but feel a bit odd to associate DAX to > > > > a request queue. > > > > > > Given that we do not support dax to a raw block device [1] it seems a > > > gendisk flag is more misleading than request_queue flag that specifies > > > what requests can be made of the device. > > > > > > [1]: acc93d30d7d4 Revert "block: enable dax for raw block devices" > > > > Oh, I see. I will change to use request_queue flag. > > I implemented the block patch for this yesterday but based on your > feedback I stopped there (didn't carry the change through to the DM core > and DM linear -- can easily do so tomorrow though). > > Feel free to use this as a starting point and fix/extend and add a > proper header: Thanks Mike! I made similar changes as well. I will take yours and finish up the rest. :) -Toshi _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices 2016-06-13 22:21 [PATCH 0/6] Support DAX for device-mapper dm-linear devices Toshi Kani ` (6 preceding siblings ...) 2016-06-13 22:57 ` [PATCH 0/6] Support DAX for device-mapper dm-linear devices Mike Snitzer @ 2016-06-13 23:18 ` Dan Williams 2016-06-13 23:59 ` Kani, Toshimitsu 7 siblings, 1 reply; 40+ messages in thread From: Dan Williams @ 2016-06-13 23:18 UTC (permalink / raw) To: Toshi Kani Cc: Alasdair Kergon, Mike Snitzer, Ross Zwisler, Al Viro, Jens Axboe, linux-nvdimm@lists.01.org, dm-devel, linux-raid, linux-kernel@vger.kernel.org Thanks Toshi! On Mon, Jun 13, 2016 at 3:21 PM, Toshi Kani <toshi.kani@hpe.com> wrote: > This patch-set adds DAX support to device-mapper dm-linear devices > used by LVM. It works with LVM commands as follows: > - Creation of a logical volume with all DAX capable devices (such > as pmem) sets the logical volume DAX capable as well. > - Once a logical volume is set to DAX capable, the volume may not > be extended with non-DAX capable devices. I don't mind this, but it seems a policy decision that the kernel does not need to make. A sufficiently sophisticated user could cope with DAX being available at varying LBAs. Would it be sufficient to move this policy decision to userspace tooling? > The direct_access interface is added to dm and dm-linear to map > a request to a target device. I had dm-linear and md-raid0 support on my list of things to look at, did you have raid0 in your plans? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices 2016-06-13 23:18 ` Dan Williams @ 2016-06-13 23:59 ` Kani, Toshimitsu 2016-06-14 0:02 ` Dan Williams 2016-06-14 13:50 ` Jeff Moyer 0 siblings, 2 replies; 40+ messages in thread From: Kani, Toshimitsu @ 2016-06-13 23:59 UTC (permalink / raw) To: dan.j.williams@intel.com Cc: linux-kernel@vger.kernel.org, agk@redhat.com, linux-raid@vger.kernel.org, snitzer@redhat.com, viro@zeniv.linux.org.uk, axboe@kernel.dk, linux-nvdimm@lists.01.org, ross.zwisler@linux.intel.com, dm-devel@redhat.com On Mon, 2016-06-13 at 16:18 -0700, Dan Williams wrote: > Thanks Toshi! > > On Mon, Jun 13, 2016 at 3:21 PM, Toshi Kani <toshi.kani@hpe.com> wrote: > > > > This patch-set adds DAX support to device-mapper dm-linear devices > > used by LVM. It works with LVM commands as follows: > > - Creation of a logical volume with all DAX capable devices (such > > as pmem) sets the logical volume DAX capable as well. > > - Once a logical volume is set to DAX capable, the volume may not > > be extended with non-DAX capable devices. > > I don't mind this, but it seems a policy decision that the kernel does > not need to make. A sufficiently sophisticated user could cope with > DAX being available at varying LBAs. Would it be sufficient to move > this policy decision to userspace tooling? I think this is a kernel restriction. When a block device is declared as DAX capable, it should mean that the whole device is DAX capable. So, I think we need to ensure the same for a mapped device. At the LVM level, a volume group may contain both DAX and non-DAX capable devices. There is no restriction for creating/extending a volume group. > > The direct_access interface is added to dm and dm-linear to map > > a request to a target device. > > I had dm-linear and md-raid0 support on my list of things to look at, > did you have raid0 in your plans? Yes, I hope to extend further and raid0 is a good candidate. Thanks, -Toshi ^ permalink raw reply [flat|nested] 40+ messages in thread
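The whole-device invariant Toshi describes maps naturally onto a table-level check, sketched here against the dm_table iteration helpers; the helper name and its reliance on the series' new per-target direct_access hook are illustrative, not the final upstream code:

/*
 * A mapped device may advertise DAX only if every target in its table
 * supports direct_access; any non-DAX component disqualifies the whole
 * device, which is what forbids extending a DAX LV with non-DAX PVs.
 */
static bool dm_table_supports_dax(struct dm_table *t)
{
	unsigned int i;

	for (i = 0; i < dm_table_get_num_targets(t); i++) {
		struct dm_target *ti = dm_table_get_target(t, i);

		/* every target must implement and claim direct_access */
		if (!ti->type->direct_access)
			return false;
	}
	return true;
}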
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices 2016-06-13 23:59 ` Kani, Toshimitsu @ 2016-06-14 0:02 ` Dan Williams 2016-06-14 7:30 ` Dan Williams 2016-06-14 13:50 ` Jeff Moyer 1 sibling, 1 reply; 40+ messages in thread From: Dan Williams @ 2016-06-14 0:02 UTC (permalink / raw) To: Kani, Toshimitsu Cc: linux-kernel@vger.kernel.org, agk@redhat.com, linux-raid@vger.kernel.org, snitzer@redhat.com, viro@zeniv.linux.org.uk, axboe@kernel.dk, linux-nvdimm@lists.01.org, ross.zwisler@linux.intel.com, dm-devel@redhat.com On Mon, Jun 13, 2016 at 4:59 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: > On Mon, 2016-06-13 at 16:18 -0700, Dan Williams wrote: >> Thanks Toshi! >> >> On Mon, Jun 13, 2016 at 3:21 PM, Toshi Kani <toshi.kani@hpe.com> wrote: >> > >> > This patch-set adds DAX support to device-mapper dm-linear devices >> > used by LVM. It works with LVM commands as follows: >> > - Creation of a logical volume with all DAX capable devices (such >> > as pmem) sets the logical volume DAX capable as well. >> > - Once a logical volume is set to DAX capable, the volume may not >> > be extended with non-DAX capable devices. >> >> I don't mind this, but it seems a policy decision that the kernel does >> not need to make. A sufficiently sophisticated user could cope with >> DAX being available at varying LBAs. Would it be sufficient to move >> this policy decision to userspace tooling? > > I think this is a kernel restriction. When a block device is declared as > DAX capable, it should mean that the whole device is DAX capable. So, I > think we need to assure the same to a mapped device. Hmm, but we already violate this with badblocks. The device is DAX capable, but certain LBAs will return an error if direct_access is attempted. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices 2016-06-14 0:02 ` Dan Williams @ 2016-06-14 7:30 ` Dan Williams 0 siblings, 0 replies; 40+ messages in thread From: Dan Williams @ 2016-06-14 7:30 UTC (permalink / raw) To: Kani, Toshimitsu Cc: linux-kernel@vger.kernel.org, agk@redhat.com, linux-raid@vger.kernel.org, snitzer@redhat.com, viro@zeniv.linux.org.uk, axboe@kernel.dk, linux-nvdimm@lists.01.org, ross.zwisler@linux.intel.com, dm-devel@redhat.com On Mon, Jun 13, 2016 at 5:02 PM, Dan Williams <dan.j.williams@intel.com> wrote: > On Mon, Jun 13, 2016 at 4:59 PM, Kani, Toshimitsu <toshi.kani@hpe.com> wrote: >> On Mon, 2016-06-13 at 16:18 -0700, Dan Williams wrote: >>> Thanks Toshi! >>> >>> On Mon, Jun 13, 2016 at 3:21 PM, Toshi Kani <toshi.kani@hpe.com> wrote: >>> > >>> > This patch-set adds DAX support to device-mapper dm-linear devices >>> > used by LVM. It works with LVM commands as follows: >>> > - Creation of a logical volume with all DAX capable devices (such >>> > as pmem) sets the logical volume DAX capable as well. >>> > - Once a logical volume is set to DAX capable, the volume may not >>> > be extended with non-DAX capable devices. >>> >>> I don't mind this, but it seems a policy decision that the kernel does >>> not need to make. A sufficiently sophisticated user could cope with >>> DAX being available at varying LBAs. Would it be sufficient to move >>> this policy decision to userspace tooling? >> >> I think this is a kernel restriction. When a block device is declared as >> DAX capable, it should mean that the whole device is DAX capable. So, I >> think we need to assure the same to a mapped device. > > Hmm, but we already violate this with badblocks. The device is DAX > capable, but certain LBAs will return an error if direct_access is > attempted. Nevermind, for this to be useful we would need to fallback to regular mmap for a portion of the linear span. That's different than the badblocks case. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices 2016-06-13 23:59 ` Kani, Toshimitsu 2016-06-14 0:02 ` Dan Williams @ 2016-06-14 13:50 ` Jeff Moyer 2016-06-14 15:41 ` Mike Snitzer 2016-06-14 15:53 ` Kani, Toshimitsu 1 sibling, 2 replies; 40+ messages in thread From: Jeff Moyer @ 2016-06-14 13:50 UTC (permalink / raw) To: Kani, Toshimitsu Cc: dan.j.williams@intel.com, linux-kernel@vger.kernel.org, agk@redhat.com, linux-raid@vger.kernel.org, snitzer@redhat.com, viro@zeniv.linux.org.uk, axboe@kernel.dk, linux-nvdimm@lists.01.org, ross.zwisler@linux.intel.com, dm-devel@redhat.com "Kani, Toshimitsu" <toshi.kani@hpe.com> writes: >> I had dm-linear and md-raid0 support on my list of things to look at, >> did you have raid0 in your plans? > > Yes, I hope to extend further and raid0 is a good candidate. dm-flakey would allow more xfstests test cases to run. I'd say that's more important than linear or raid0. ;-) Also, the next step in this work is to then decide how to determine on what numa node an LBA resides. We had discussed this at a prior plumbers conference, and I think the consensus was to use xattrs. Toshi, do you also plan to do that work? Cheers, Jeff ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices 2016-06-14 13:50 ` Jeff Moyer @ 2016-06-14 15:41 ` Mike Snitzer 2016-06-14 18:00 ` Kani, Toshimitsu 2016-06-14 20:19 ` Jeff Moyer 1 sibling, 2 replies; 40+ messages in thread From: Mike Snitzer @ 2016-06-14 15:41 UTC (permalink / raw) To: Jeff Moyer Cc: Kani, Toshimitsu, axboe@kernel.dk, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, dm-devel@redhat.com, viro@zeniv.linux.org.uk, dan.j.williams@intel.com, ross.zwisler@linux.intel.com, agk@redhat.com On Tue, Jun 14 2016 at 9:50am -0400, Jeff Moyer <jmoyer@redhat.com> wrote: > "Kani, Toshimitsu" <toshi.kani@hpe.com> writes: > > >> I had dm-linear and md-raid0 support on my list of things to look at, > >> did you have raid0 in your plans? > > > > Yes, I hope to extend further and raid0 is a good candidate. > > dm-flakey would allow more xfstests test cases to run. I'd say that's > more important than linear or raid0. ;-) Regardless of which target(s) grow DAX support, the most pressing initial concern is getting the DM device stacking correct, and verifying that IO that crosses pmem device boundaries is properly split by DM core (via drivers/md/dm.c:__split_and_process_non_flush()'s call to max_io_len). My hope is to nail down the DM core and its dependencies in block etc. Doing so in terms of dm-linear doesn't seem like wasted effort considering you told me it'd be useful to have for pmem devices. > Also, the next step in this work is to then decide how to determine on > what numa node an LBA resides. We had discussed this at a prior > plumbers conference, and I think the consensus was to use xattrs. > Toshi, do you also plan to do that work? How does the associated NUMA node relate to this? Does the DM request_queue need to be set up to only allocate from the NUMA node the pmem device is attached to? I recently added support for this to DM. But there will likely be some code needed to propagate the NUMA node id accordingly. Mike ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices 2016-06-14 15:41 ` Mike Snitzer @ 2016-06-14 18:00 ` Kani, Toshimitsu 2016-06-14 20:19 ` Jeff Moyer 1 sibling, 0 replies; 40+ messages in thread From: Kani, Toshimitsu @ 2016-06-14 18:00 UTC (permalink / raw) To: jmoyer@redhat.com, snitzer@redhat.com Cc: linux-kernel@vger.kernel.org, linux-nvdimm@ml01.01.org, agk@redhat.com, linux-raid@vger.kernel.org, viro@zeniv.linux.org.uk, dan.j.williams@intel.com, axboe@kernel.dk, ross.zwisler@linux.intel.com, dm-devel@redhat.com On Tue, 2016-06-14 at 11:41 -0400, Mike Snitzer wrote: > On Tue, Jun 14 2016 at 9:50am -0400, > Jeff Moyer <jmoyer@redhat.com> wrote: > > "Kani, Toshimitsu" <toshi.kani@hpe.com> writes: > > > > I had dm-linear and md-raid0 support on my list of things to look > > > > at, did you have raid0 in your plans? > > > > > > Yes, I hope to extend further and raid0 is a good candidate. > > > > dm-flakey would allow more xfstests test cases to run. I'd say that's > > more important than linear or raid0. ;-) > > Regardless of which target(s) grow DAX support, the most pressing initial > concern is getting the DM device stacking correct, and verifying that > IO that crosses pmem device boundaries is properly split by DM > core (via drivers/md/dm.c:__split_and_process_non_flush()'s call to > max_io_len). Agreed. I've briefly tested stacking and it seems to work fine. As for IO crossing pmem device boundaries, __split_and_process_non_flush() is used when the device is mounted without the DAX option. With DAX, this case is handled by dm_blk_direct_access(), which limits the returned size. This leads the caller to iterate (read/write) or fall back to a smaller size (mmap page fault). > My hope is to nail down the DM core and its dependencies in block etc. > Doing so in terms of dm-linear doesn't seem like wasted effort > considering you told me it'd be useful to have for pmem devices. Yes, I think dm-linear is useful as it gives more flexibility, e.g. it allows creating a large device from multiple pmem devices. > > Also, the next step in this work is to then decide how to determine on > > what numa node an LBA resides. We had discussed this at a prior > > plumbers conference, and I think the consensus was to use xattrs. > > Toshi, do you also plan to do that work? > > How does the associated NUMA node relate to this? Does the > DM request_queue need to be set up to only allocate from the NUMA node > the pmem device is attached to? I recently added support for this to > DM. But there will likely be some code needed to propagate the NUMA node > id accordingly. Each pmem device has a sysfs "numa_node" attribute so that tools like numactl can be used to bind an application to the same locality as the pmem device (since the CPU accesses pmem directly). This won't work well with a mapped device since it can be composed from multiple localities. Locality info would need to be managed on a per-file basis, as Jeff mentioned. Thanks, -Toshi ^ permalink raw reply [flat|nested] 40+ messages in thread
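A sketch of the boundary handling described here, shaped after the form the series' dm_blk_direct_access() takes (simplified: locking follows dm_get_live_table(), error paths are abbreviated, and the exact signatures in the posted patches may differ):

/*
 * Resolve the target for the starting sector and clamp the mapped
 * length with max_io_len(), so a request that would cross into the
 * next target is truncated and the caller must iterate (or, for a
 * page fault, retry with a smaller mapping).
 */
static long dm_blk_direct_access(struct block_device *bdev, sector_t sector,
				 void **kaddr, pfn_t *pfn, long size)
{
	struct mapped_device *md = bdev->bd_disk->private_data;
	struct dm_table *map;
	struct dm_target *ti;
	int srcu_idx;
	long len, ret = -EIO;

	map = dm_get_live_table(md, &srcu_idx);
	if (!map)
		return ret;

	ti = dm_table_find_target(map, sector);
	if (dm_target_is_valid(ti) && ti->type->direct_access) {
		/* clamp to the bytes remaining in this target */
		len = max_io_len(sector, ti) << SECTOR_SHIFT;
		ret = ti->type->direct_access(ti, sector, kaddr, pfn,
					      min(len, size));
	}

	dm_put_live_table(md, srcu_idx);
	return ret;
}

For dm-linear the target-side hook then only has to offset the sector and forward to the underlying device's direct_access(), which is why the boundary clamp belongs in DM core rather than in each target.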
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices 2016-06-14 15:41 ` Mike Snitzer 2016-06-14 18:00 ` Kani, Toshimitsu @ 2016-06-14 20:19 ` Jeff Moyer 2016-06-15 1:46 ` Mike Snitzer 1 sibling, 1 reply; 40+ messages in thread From: Jeff Moyer @ 2016-06-14 20:19 UTC (permalink / raw) To: Mike Snitzer Cc: Kani, Toshimitsu, axboe@kernel.dk, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, dm-devel@redhat.com, viro@zeniv.linux.org.uk, dan.j.williams@intel.com, ross.zwisler@linux.intel.com, agk@redhat.com Mike Snitzer <snitzer@redhat.com> writes: > On Tue, Jun 14 2016 at 9:50am -0400, > Jeff Moyer <jmoyer@redhat.com> wrote: > >> "Kani, Toshimitsu" <toshi.kani@hpe.com> writes: >> >> >> I had dm-linear and md-raid0 support on my list of things to look at, >> >> did you have raid0 in your plans? >> > >> > Yes, I hope to extend further and raid0 is a good candidate. >> >> dm-flakey would allow more xfstests test cases to run. I'd say that's >> more important than linear or raid0. ;-) > > Regardless of which target(s) grow DAX support the most pressing initial > concern is getting the DM device stacking correct. And verifying that > IO that cross pmem device boundaries are being properly split by DM > core (via drivers/md/dm.c:__split_and_process_non_flush()'s call to > max_io_len). That was a tongue-in-cheek comment. You're reading way too much into it. >> Also, the next step in this work is to then decide how to determine on >> what numa node an LBA resides. We had discussed this at a prior >> plumbers conference, and I think the consensus was to use xattrs. >> Toshi, do you also plan to do that work? > > How does the associated NUMA node relate to this? Does the > DM requests_queue need to be setup to only allocate from the NUMA node > the pmem device is attached to? I recently added support for this to > DM. But there will likely be some code need to propagate the NUMA node > id accordingly. I assume you mean allocate memory (the volatile kind). That should work the same between pmem and regular block devices, no? What I was getting at was that applications may want to know on which node their data resides. Right now, it's easy to tell because a single device cannot span numa nodes, or, if it does, it does so via an interleave, so numa information isn't interesting. However, once data on a single file system can be placed on multiple different numa nodes, applications may want to query and/or control that placement. Here's a snippet from a blog post I never finished: There are two essential questions that need to be answered regarding persistent memory and NUMA: first, would an application benefit from being able to query the NUMA locality of its data, and second, would an application benefit from being able to specify a placement policy for its data? This article is an attempt to summarize the current state of hardware and software in order to consider the above two questions. We begin with a short list of use cases for these interfaces, which will frame the discussion. First, let's consider an interface that allows an application to query the NUMA placement of existing data. With such information, an application may want to perform the following actions: - relocate application processes to the same NUMA node as their data. (Interfaces for moving a process are readily available.) - specify a memory (RAM) allocation policy so that memory allocations come from the same NUMA node as the data. 
Second, we consider an interface that allows an application to specify a placement policy for new data. Using this interface, an application may: - ensure data is stored on the same NUMA node as the one on which the application is running - ensure data is stored on the same NUMA node as an I/O adapter such as a network card, that is a producer of data stored to NVM. - ensure data is stored on a different NUMA node: - so that the data is stored on the same NUMA node as related data - because the data does not need the faster access afforded by local NUMA placement. Presumably this is a trade-off, and other data will require local placement to meet the performance goals of the application. Cheers, Jeff -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices 2016-06-14 20:19 ` Jeff Moyer @ 2016-06-15 1:46 ` Mike Snitzer 2016-06-15 2:07 ` Dan Williams 0 siblings, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2016-06-15 1:46 UTC (permalink / raw) To: Jeff Moyer Cc: Kani, Toshimitsu, axboe@kernel.dk, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, dm-devel@redhat.com, viro@zeniv.linux.org.uk, dan.j.williams@intel.com, ross.zwisler@linux.intel.com, agk@redhat.com On Tue, Jun 14 2016 at 4:19pm -0400, Jeff Moyer <jmoyer@redhat.com> wrote: > Mike Snitzer <snitzer@redhat.com> writes: > > > On Tue, Jun 14 2016 at 9:50am -0400, > > Jeff Moyer <jmoyer@redhat.com> wrote: > > > >> "Kani, Toshimitsu" <toshi.kani@hpe.com> writes: > >> > >> >> I had dm-linear and md-raid0 support on my list of things to look at, > >> >> did you have raid0 in your plans? > >> > > >> > Yes, I hope to extend further and raid0 is a good candidate. > >> > >> dm-flakey would allow more xfstests test cases to run. I'd say that's > >> more important than linear or raid0. ;-) > > > > Regardless of which target(s) grow DAX support, the most pressing initial > > concern is getting the DM device stacking correct, and verifying that > > IO that crosses pmem device boundaries is properly split by DM > > core (via drivers/md/dm.c:__split_and_process_non_flush()'s call to > > max_io_len). > > That was a tongue-in-cheek comment. You're reading way too much into > it. > > >> Also, the next step in this work is to then decide how to determine on > >> what numa node an LBA resides. We had discussed this at a prior > >> plumbers conference, and I think the consensus was to use xattrs. > >> Toshi, do you also plan to do that work? > > > > How does the associated NUMA node relate to this? Does the > > DM request_queue need to be set up to only allocate from the NUMA node > > the pmem device is attached to? I recently added support for this to > > DM. But there will likely be some code needed to propagate the NUMA node > > id accordingly. > > I assume you mean allocate memory (the volatile kind). That should work > the same between pmem and regular block devices, no? This is the commit I made to train DM to be numa node aware: 115485e83f497fdf9b4 ("dm: add 'dm_numa_node' module parameter") As is, the DM code is focused on memory allocations. But I think blk-mq may use the NUMA node via tag_set->numa_node. But that is moot given pmem is bio-based, right? Steps could be taken to make all threads DM creates for a given device get pinned to the specified NUMA node too. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
  2016-06-15  1:46 ` Mike Snitzer
@ 2016-06-15  2:07 ` Dan Williams
  2016-06-15  2:35 ` Mike Snitzer
  0 siblings, 1 reply; 40+ messages in thread
From: Dan Williams @ 2016-06-15 2:07 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jeff Moyer, Kani, Toshimitsu, axboe@kernel.dk,
      linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
      linux-raid@vger.kernel.org, dm-devel@redhat.com,
      viro@zeniv.linux.org.uk, ross.zwisler@linux.intel.com,
      agk@redhat.com

On Tue, Jun 14, 2016 at 6:46 PM, Mike Snitzer <snitzer@redhat.com> wrote:
> On Tue, Jun 14 2016 at 4:19pm -0400,
> Jeff Moyer <jmoyer@redhat.com> wrote:
>
> [...]
>
>> I assume you mean allocate memory (the volatile kind). That should work
>> the same between pmem and regular block devices, no?
>
> This is the commit I made to train DM to be NUMA node aware:
> 115485e83f497fdf9b4 ("dm: add 'dm_numa_node' module parameter")

Hmm, but this is global for all DM device instances.

> As is, the DM code is focused on memory allocations. But I think blk-mq
> may use the NUMA node via tag_set->numa_node. But that is moot given
> pmem is bio-based, right?

Right.

> Steps could be taken to make all threads DM creates for a given device
> get pinned to the specified NUMA node too.

I think it would be useful if a DM instance inherited the NUMA node
from the component devices by default (assuming they're all from the
same node). A "dev_to_node(disk_to_dev(disk))" conversion works for
pmem devices.

As far as I understand, Jeff wants to go further and have a linear
span across component devices from different nodes, with an interface
to do an LBA-to-NUMA-node conversion.
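A sketch of the inheritance Dan describes could look like the
following; the helper and its calling convention are hypothetical,
since the real implementation would walk the dm_table's target devices:

#include <linux/device.h>
#include <linux/genhd.h>
#include <linux/numa.h>

/* Resolve a common NUMA node for a set of component disks. */
static int example_table_numa_node(struct gendisk **disks, int count)
{
        int i, node = NUMA_NO_NODE;

        for (i = 0; i < count; i++) {
                int dev_node = dev_to_node(disk_to_dev(disks[i]));

                if (node == NUMA_NO_NODE)
                        node = dev_node;
                else if (node != dev_node)
                        return NUMA_NO_NODE; /* mixed nodes: don't inherit */
        }
        return node;
}

Returning NUMA_NO_NODE for mixed-node tables keeps today's
allocate-anywhere behavior as the safe default.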
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
  2016-06-15  2:07 ` Dan Williams
@ 2016-06-15  2:35 ` Mike Snitzer
  0 siblings, 0 replies; 40+ messages in thread
From: Mike Snitzer @ 2016-06-15 2:35 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jeff Moyer, Kani, Toshimitsu, axboe@kernel.dk,
      linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
      linux-raid@vger.kernel.org, dm-devel@redhat.com,
      viro@zeniv.linux.org.uk, ross.zwisler@linux.intel.com,
      agk@redhat.com

On Tue, Jun 14 2016 at 10:07pm -0400,
Dan Williams <dan.j.williams@intel.com> wrote:

> On Tue, Jun 14, 2016 at 6:46 PM, Mike Snitzer <snitzer@redhat.com> wrote:
>
> [...]
>
> > This is the commit I made to train DM to be NUMA node aware:
> > 115485e83f497fdf9b4 ("dm: add 'dm_numa_node' module parameter")
>
> Hmm, but this is global for all DM device instances.

Right, only because I didn't have a convenient way to allow the user to
specify it on a per-device level. But I'll defer skinning that cat for
now, since in this pmem case we'd inherit from the underlying device(s).

> > As is, the DM code is focused on memory allocations. But I think blk-mq
> > may use the NUMA node via tag_set->numa_node. But that is moot given
> > pmem is bio-based, right?
>
> Right.
>
> > Steps could be taken to make all threads DM creates for a given device
> > get pinned to the specified NUMA node too.
>
> I think it would be useful if a DM instance inherited the NUMA node
> from the component devices by default (assuming they're all from the
> same node). A "dev_to_node(disk_to_dev(disk))" conversion works for
> pmem devices.

OK, I can look to make that happen.
> As far as I understand, Jeff wants to go further and have a linear
> span across component devices from different nodes, with an interface
> to do an LBA-to-NUMA-node conversion.

All that variability makes DM's ability to do anything sane with it
close to impossible, considering memory pools, threads, etc. are all
pinned during the first activation of the DM device.
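For the pinning Mike mentions, a per-device worker thread could be both
allocated on and affined to the resolved node. A sketch, assuming the
node was settled at table load (the function and thread names are
placeholders):

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/topology.h>

static struct task_struct *example_start_worker(int (*thread_fn)(void *),
                                                void *data, int node)
{
        struct task_struct *task;

        /* Allocate the thread's stack and task_struct on the node... */
        task = kthread_create_on_node(thread_fn, data, node,
                                      "example_worker");
        if (IS_ERR(task))
                return task;

        /* ...and restrict it to that node's CPUs before waking it. */
        set_cpus_allowed_ptr(task, cpumask_of_node(node));
        wake_up_process(task);
        return task;
}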
* Re: [PATCH 0/6] Support DAX for device-mapper dm-linear devices
  2016-06-14 13:50 ` Jeff Moyer
  2016-06-14 15:41 ` Mike Snitzer
@ 2016-06-14 15:53 ` Kani, Toshimitsu
  1 sibling, 0 replies; 40+ messages in thread
From: Kani, Toshimitsu @ 2016-06-14 15:53 UTC (permalink / raw)
  To: jmoyer@redhat.com
  Cc: linux-kernel@vger.kernel.org, linux-nvdimm@ml01.01.org,
      agk@redhat.com, linux-raid@vger.kernel.org, snitzer@redhat.com,
      viro@zeniv.linux.org.uk, dan.j.williams@intel.com,
      axboe@kernel.dk, ross.zwisler@linux.intel.com, dm-devel@redhat.com

On Tue, 2016-06-14 at 09:50 -0400, Jeff Moyer wrote:
> "Kani, Toshimitsu" <toshi.kani@hpe.com> writes:
>
> > > I had dm-linear and md-raid0 support on my list of things to look
> > > at, did you have raid0 in your plans?
> >
> > Yes, I hope to extend further and raid0 is a good candidate.
>
> dm-flakey would allow more xfstests test cases to run. I'd say that's
> more important than linear or raid0. ;-)

That's an interesting one. We can emulate badblocks by failing
direct_access with -EIO, but I do not think we can emulate the
"drop_writes" and "corrupt_bio_byte" features with DAX...

> Also, the next step in this work is to then decide how to determine
> on which NUMA node an LBA resides. We had discussed this at a prior
> plumbers conference, and I think the consensus was to use xattrs.
> Toshi, do you also plan to do that work?

No, it's not my plan at this point.

Thanks,
-Toshi
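The badblocks emulation Toshi mentions could be as small as the
following sketch of a flakey-style direct_access, borrowing the
direct_access prototype this series adds to dm_target; the up/down
bookkeeping is reduced to a single flag for illustration:

#include <linux/blkdev.h>
#include <linux/device-mapper.h>
#include <linux/pfn_t.h>

struct flakey_ctx {
        struct dm_dev *dev;
        bool down_interval; /* stand-in for dm-flakey's up/down timer */
};

static long flakey_direct_access(struct dm_target *ti, sector_t sector,
                                 void __pmem **kaddr, pfn_t *pfn,
                                 long size)
{
        struct flakey_ctx *fc = ti->private;
        struct blk_dax_ctl dax = {
                .sector = sector, /* real code would remap first */
                .size = size,
        };
        long ret;

        /* Emulate a bad block: fail the mapping while "down". */
        if (fc->down_interval)
                return -EIO;

        ret = bdev_direct_access(fc->dev->bdev, &dax);
        *kaddr = dax.addr;
        *pfn = dax.pfn;
        return ret;
}

There is no equivalent hook for drop_writes or corrupt_bio_byte, which
matches the point above: once an application has the DAX mapping,
loads and stores bypass the target entirely.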