* Snapshot target and DAX-capable devices
@ 2018-08-27 16:07 ` Jan Kara
  0 siblings, 0 replies; 83+ messages in thread
From: Jan Kara @ 2018-08-27 16:07 UTC (permalink / raw)
  To: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw
  Cc: Mike Snitzer, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
      linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

Hi,

I've been analyzing why fstest generic/081 fails when the backing device is
capable of DAX. The problem boils down to the failure of:

  lvm vgcreate -f vg0 /dev/pmem0
  lvm lvcreate -L 128M -n lv0 vg0
  lvm lvcreate -s -L 4M -n snap0 vg0/lv0

The last command fails like:

  device-mapper: reload ioctl on (253:0) failed: Invalid argument
  Failed to lock logical volume vg0/lv0.
  Aborting. Manual intervention required.

And the core of the problem is that volume vg0/lv0 is originally of
DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to
switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX.
The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent
DAX mounts if not supported".

The question is whether / how this should be fixed. The current inability
to create snapshots of DAX-capable devices looks weird and the cryptic
failure makes it even worse (it took me quite a while to understand what is
failing and why). OTOH I see the rationale behind Ross' change as well.

                                                                Honza
--
Jan Kara <jack-IBi9RG/b67k@public.gmane.org>
SUSE Labs, CR

^ permalink raw reply [flat|nested] 83+ messages in thread
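For anyone wanting to replay the failure, the three commands from the message above can be wrapped in a small script. This is only a sketch: the `run` and `repro_snapshot_on_dax` helpers and the `DRY_RUN` switch are names made up here, not part of LVM or xfstests. With the default `DRY_RUN=1` nothing is executed and the commands are only printed. Running it for real needs root and a scratch pmem device whose contents may be destroyed.

```shell
#!/bin/sh
# Sketch of the reproducer from the report. The helper names and DRY_RUN
# are illustrative assumptions, not part of LVM or xfstests.

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: DAX-capable backing device (defaults to /dev/pmem0 as in the report).
repro_snapshot_on_dax() {
    dev=${1:-/dev/pmem0}
    run lvm vgcreate -f vg0 "$dev" &&
    run lvm lvcreate -L 128M -n lv0 vg0 &&
    # This is the step the report says fails with "Invalid argument".
    run lvm lvcreate -s -L 4M -n snap0 vg0/lv0
}
```

Calling `repro_snapshot_on_dax /dev/pmem0` prints the three commands. Setting `DRY_RUN=0` beforehand would run them instead.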
* Re: Snapshot target and DAX-capable devices
  2018-08-27 16:07 ` Jan Kara
  (?)
@ 2018-08-27 16:43 ` Kani, Toshi
  -1 siblings, 0 replies; 83+ messages in thread
From: Kani, Toshi @ 2018-08-27 16:43 UTC (permalink / raw)
  To: jack-AlSwsSmVLrQ@public.gmane.org,
      linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
      dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
      snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org

On Mon, 2018-08-27 at 18:07 +0200, Jan Kara wrote:
> Hi,
>
> I've been analyzing why fstest generic/081 fails when the backing device is
> capable of DAX. The problem boils down to the failure of:
>
>   lvm vgcreate -f vg0 /dev/pmem0
>   lvm lvcreate -L 128M -n lv0 vg0
>   lvm lvcreate -s -L 4M -n snap0 vg0/lv0
>
> The last command fails like:
>
>   device-mapper: reload ioctl on (253:0) failed: Invalid argument
>   Failed to lock logical volume vg0/lv0.
>   Aborting. Manual intervention required.
>
> And the core of the problem is that volume vg0/lv0 is originally of
> DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to
> switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX.
> The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent
> DAX mounts if not supported".
>
> The question is whether / how this should be fixed. The current inability
> to create snapshots of DAX-capable devices looks weird and the cryptic
> failure makes it even worse (it took me quite a while to understand what is
> failing and why). OTOH I see the rationale behind Ross' change as well.

Here are the dm-snap changes that went along with the original DAX
support.

commit b5ab4a9ba55
commit f6e629bd237

Basically, snapshots can be added/removed to DAX-capable devices, but
snapshots need to be mounted without dax option.

Thanks,
-Toshi

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Snapshot target and DAX-capable devices
  2018-08-27 16:43 ` Kani, Toshi
  (?)
@ 2018-08-28 7:50 ` Jan Kara
  -1 siblings, 0 replies; 83+ messages in thread
From: Jan Kara @ 2018-08-28 7:50 UTC (permalink / raw)
  To: Kani, Toshi
  Cc: jack-AlSwsSmVLrQ@public.gmane.org,
      snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
      linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
      dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
      linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Mon 27-08-18 16:43:28, Kani, Toshi wrote:
> On Mon, 2018-08-27 at 18:07 +0200, Jan Kara wrote:
> > Hi,
> >
> > I've been analyzing why fstest generic/081 fails when the backing device is
> > capable of DAX. The problem boils down to the failure of:
> >
> >   lvm vgcreate -f vg0 /dev/pmem0
> >   lvm lvcreate -L 128M -n lv0 vg0
> >   lvm lvcreate -s -L 4M -n snap0 vg0/lv0
> >
> > The last command fails like:
> >
> >   device-mapper: reload ioctl on (253:0) failed: Invalid argument
> >   Failed to lock logical volume vg0/lv0.
> >   Aborting. Manual intervention required.
> >
> > And the core of the problem is that volume vg0/lv0 is originally of
> > DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to
> > switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX.
> > The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent
> > DAX mounts if not supported".
> >
> > The question is whether / how this should be fixed. The current inability
> > to create snapshots of DAX-capable devices looks weird and the cryptic
> > failure makes it even worse (it took me quite a while to understand what is
> > failing and why). OTOH I see the rationale behind Ross' change as well.
>
> Here are the dm-snap changes that went along with the original DAX
> support.
>
> commit b5ab4a9ba55
> commit f6e629bd237
>
> Basically, snapshots can be added/removed to DAX-capable devices, but
> snapshots need to be mounted without dax option.

Yes, and after these two commits things were working. But then commit
dbc626597 broke things again so currently snapshotting DAX-capable devices
does not work. Just try with 4.18...

                                                                Honza
--
Jan Kara <jack-IBi9RG/b67k@public.gmane.org>
SUSE Labs, CR

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Snapshot target and DAX-capable devices
  2018-08-28 7:50 ` Jan Kara
  (?)
@ 2018-08-28 17:56 ` Mike Snitzer
  -1 siblings, 0 replies; 83+ messages in thread
From: Mike Snitzer @ 2018-08-28 17:56 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
      dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
      linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Tue, Aug 28 2018 at 3:50am -0400,
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote:

> On Mon 27-08-18 16:43:28, Kani, Toshi wrote:
> > On Mon, 2018-08-27 at 18:07 +0200, Jan Kara wrote:
> > > Hi,
> > >
> > > I've been analyzing why fstest generic/081 fails when the backing device is
> > > capable of DAX. The problem boils down to the failure of:
> > >
> > >   lvm vgcreate -f vg0 /dev/pmem0
> > >   lvm lvcreate -L 128M -n lv0 vg0
> > >   lvm lvcreate -s -L 4M -n snap0 vg0/lv0
> > >
> > > The last command fails like:
> > >
> > >   device-mapper: reload ioctl on (253:0) failed: Invalid argument
> > >   Failed to lock logical volume vg0/lv0.
> > >   Aborting. Manual intervention required.
> > >
> > > And the core of the problem is that volume vg0/lv0 is originally of
> > > DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to
> > > switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX.
> > > The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent
> > > DAX mounts if not supported".
> > >
> > > The question is whether / how this should be fixed. The current inability
> > > to create snapshots of DAX-capable devices looks weird and the cryptic
> > > failure makes it even worse (it took me quite a while to understand what is
> > > failing and why). OTOH I see the rationale behind Ross' change as well.
> >
> > Here are the dm-snap changes that went along with the original DAX
> > support.
> >
> > commit b5ab4a9ba55
> > commit f6e629bd237
> >
> > Basically, snapshots can be added/removed to DAX-capable devices, but
> > snapshots need to be mounted without dax option.
>
> Yes, and after these two commits things were working. But then commit
> dbc626597 broke things again so currently snapshotting DAX-capable devices
> does not work. Just try with 4.18...

Commit f6e629bd237 was a nasty hack, and commit dbc626597 exposed it as
such. But commit dbc626597 has caused us to regress.. so we need to fix
it.

We could remove DM_TYPE_DAX_BIO_BASED completely. But in the past I was
reluctant to do so because it really is unclear how/if we can even
support a device switching from DAX to non-DAX while IO is in-flight. DM
supports suspending without flushing (via dmsetup suspend --noflush) and
that could really be problematic if we leave DAX IO inflight and then
switch the DM table such that the DM device no longer supports DAX.

I'm open to suggestions.

Mike

^ permalink raw reply [flat|nested] 83+ messages in thread
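The sequence Mike is worried about can be sketched as a table swap done by hand with dmsetup. This is an illustration only: `swap_table_noflush`, `run`, and `DRY_RUN` are hypothetical names, the device name and table line are whatever the caller passes in, and by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of a manual DM table swap around a no-flush suspend. Helper names
# and DRY_RUN are illustrative assumptions; only the dmsetup subcommands
# (suspend --noflush, load --table, resume) are real.

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: DM device name, $2: new table line (both chosen by the caller).
swap_table_noflush() {
    name=$1
    table=$2
    # Suspend without flushing outstanding I/O (the case discussed above).
    run dmsetup suspend --noflush "$name" &&
    # Load the replacement table for the device.
    run dmsetup load "$name" --table "$table" &&
    # Resume the device so the newly loaded table takes effect.
    run dmsetup resume "$name"
}
```

If the replacement table no longer supports DAX, the load step is presumably where the "reload ioctl ... Invalid argument" error from the original report would appear on 4.18, since that is where the device type change is refused.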
* Re: Snapshot target and DAX-capable devices
  2018-08-28 17:56 ` Mike Snitzer
  (?)
@ 2018-08-28 22:38 ` Kani, Toshi
  -1 siblings, 0 replies; 83+ messages in thread
From: Kani, Toshi @ 2018-08-28 22:38 UTC (permalink / raw)
  To: jack-AlSwsSmVLrQ@public.gmane.org,
      snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
      dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
      linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org

On Tue, 2018-08-28 at 13:56 -0400, Mike Snitzer wrote:
> On Tue, Aug 28 2018 at 3:50am -0400,
> Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote:
>
> > On Mon 27-08-18 16:43:28, Kani, Toshi wrote:
> > > On Mon, 2018-08-27 at 18:07 +0200, Jan Kara wrote:
> > > > Hi,
> > > >
> > > > I've been analyzing why fstest generic/081 fails when the backing device is
> > > > capable of DAX. The problem boils down to the failure of:
> > > >
> > > >   lvm vgcreate -f vg0 /dev/pmem0
> > > >   lvm lvcreate -L 128M -n lv0 vg0
> > > >   lvm lvcreate -s -L 4M -n snap0 vg0/lv0
> > > >
> > > > The last command fails like:
> > > >
> > > >   device-mapper: reload ioctl on (253:0) failed: Invalid argument
> > > >   Failed to lock logical volume vg0/lv0.
> > > >   Aborting. Manual intervention required.
> > > >
> > > > And the core of the problem is that volume vg0/lv0 is originally of
> > > > DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to
> > > > switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX.
> > > > The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent
> > > > DAX mounts if not supported".
> > > >
> > > > The question is whether / how this should be fixed. The current inability
> > > > to create snapshots of DAX-capable devices looks weird and the cryptic
> > > > failure makes it even worse (it took me quite a while to understand what is
> > > > failing and why). OTOH I see the rationale behind Ross' change as well.
> > >
> > > Here are the dm-snap changes that went along with the original DAX
> > > support.
> > >
> > > commit b5ab4a9ba55
> > > commit f6e629bd237
> > >
> > > Basically, snapshots can be added/removed to DAX-capable devices, but
> > > snapshots need to be mounted without dax option.
> >
> > Yes, and after these two commits things were working. But then commit
> > dbc626597 broke things again so currently snapshotting DAX-capable devices
> > does not work. Just try with 4.18...
>
> Commit f6e629bd237 was a nasty hack, and commit dbc626597 exposed it as
> such. But commit dbc626597 has caused us to regress.. so we need to fix
> it.
>
> We could remove DM_TYPE_DAX_BIO_BASED completely. But in the past I was
> reluctant to do so because it really is unclear how/if we can even
> support a device switching from DAX to non-DAX while IO is in-flight. DM
> supports suspending without flushing (via dmsetup suspend --noflush) and
> that could really be problematic if we leave DAX IO inflight and then
> switch the DM table such that the DM device no longer supports DAX.
>
> I'm open to suggestions.

Right, commit f6e629bd237 is a hack, but I do not have a better idea at
this point... For now, I am afraid that reverting commit dbc626597 may
be an option.

Thanks,
-Toshi

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Snapshot target and DAX-capable devices 2018-08-28 17:56 ` Mike Snitzer (?) @ 2018-08-30 9:30 ` Jan Kara -1 siblings, 0 replies; 83+ messages in thread From: Jan Kara @ 2018-08-30 9:30 UTC (permalink / raw) To: Mike Snitzer Cc: Jan Kara, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Tue 28-08-18 13:56:30, Mike Snitzer wrote: > On Tue, Aug 28 2018 at 3:50am -0400, > Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote: > > > On Mon 27-08-18 16:43:28, Kani, Toshi wrote: > > > On Mon, 2018-08-27 at 18:07 +0200, Jan Kara wrote: > > > > Hi, > > > > > > > > I've been analyzing why fstest generic/081 fails when the backing device is > > > > capable of DAX. The problem boils down to the failure of: > > > > > > > > lvm vgcreate -f vg0 /dev/pmem0 > > > > lvm lvcreate -L 128M -n lv0 vg0 > > > > lvm lvcreate -s -L 4M -n snap0 vg0/lv0 > > > > > > > > The last command fails like: > > > > > > > > device-mapper: reload ioctl on (253:0) failed: Invalid argument > > > > Failed to lock logical volume vg0/lv0. > > > > Aborting. Manual intervention required. > > > > > > > > And the core of the problem is that volume vg0/lv0 is originally of > > > > DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to > > > > switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX. > > > > The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent > > > > DAX mounts if not supported". > > > > > > > > The question is whether / how this should be fixed. The current inability > > > > to create snapshots of DAX-capable devices looks weird and the cryptic > > > > failure makes it even worse (it took me quite a while to understand what is > > > > failing and why). OTOH I see the rationale behind Ross' change as well. > > > > > > Here are the dm-snap changes that went along with the original DAX > > > support. 
> > >
> > > commit b5ab4a9ba55
> > > commit f6e629bd237
> > >
> > > Basically, snapshots can be added/removed to DAX-capable devices, but
> > > snapshots need to be mounted without dax option.
> >
> > Yes, and after these two commits things were working. But then commit
> > dbc626597 broke things again so currently snapshotting DAX-capable devices
> > does not work. Just try with 4.18...
>
> Commit f6e629bd237 was a nasty hack, and commit dbc626597 exposed it as
> such. But commit dbc626597 has caused us to regress.. so we need to fix
> it.
>
> We could remove DM_TYPE_DAX_BIO_BASED completely. But in the past I was
> reluctant to do so because it really is unclear how/if we can even
> support a device switching from DAX to non-DAX while IO is in-flight. DM
> supports suspending without flushing (via dmsetup suspend --noflush) and
> that could really be problematic if we leave DAX IO inflight and then
> switch the DM table such that the DM device no longer supports DAX.

Well, changing a device from DAX-capable to DAX-incapable is problematic for the filesystem on top of it as well. Filesystems simply don't expect that this feature of a device can change, so they would fail in unexpected ways. Also, PFNs from the pmem (DAX-capable) device that are already mapped into user page tables won't magically become unmapped, so those processes will still have DAX access to those areas of the device.

But if both the original bdev and the COW device are DAX-capable, we *should* be able to support snapshotting (and refusing to mix DAX-capable and DAX-incapable devices in a snapshot is IMHO not very surprising to users). When creating a snapshot of a device, we need to freeze the filesystem using it. That will write-protect all page tables, so we are sure we'll get page faults (and thus ->direct_access requests from the DM POV) for each write attempt to any mapping. Then the ->direct_access method of snapshot-origin can make sure to copy the original contents to the COW device before returning a PFN from ->direct_access. Similarly, ->direct_access of the COW device can provide a remapped PFN, so everything should work seamlessly from the user's POV.

So something like the above would seem like the best solution from the user's POV. Implementing it would not be completely trivial though, as far as I can tell from looking into the DM code. We'd have to implement ->direct_access paths for dm-snap, and I also have a vague memory that ->direct_access is not allowed to sleep these days, while DM uses sleeping locks all around... Dan should know how big an obstacle it would be to reintroduce the possibility of sleeping (I'm not currently aware of any particular problem with that, but I'm not paying close attention to those parts of the NVDIMM code).

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply [flat|nested] 83+ messages in thread
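The copy-before-write idea in the message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not dm-snap code: every name (toy_snapshot, toy_origin_direct_access, toy_snapshot_direct_access) is hypothetical, the "devices" are plain arrays, and the `write` flag is an assumption of the model standing in for "this request came from a write fault". It also ignores locking and sleeping entirely, which is exactly the hard part the message points at.

```c
#include <stdbool.h>
#include <string.h>

#define NR_CHUNKS  4
#define CHUNK_SIZE 8

/* Toy stand-ins for the origin device and the COW device (hypothetical). */
struct toy_snapshot {
	char origin[NR_CHUNKS][CHUNK_SIZE];
	char cow[NR_CHUNKS][CHUNK_SIZE];
	bool copied[NR_CHUNKS];	/* has this chunk already been preserved? */
};

/*
 * Hypothetical origin-side handler: before handing out an address that
 * will be written through, preserve the old chunk contents in the COW area.
 */
static char *toy_origin_direct_access(struct toy_snapshot *s,
				      unsigned int chunk, bool write)
{
	if (chunk >= NR_CHUNKS)
		return NULL;
	if (write && !s->copied[chunk]) {
		memcpy(s->cow[chunk], s->origin[chunk], CHUNK_SIZE);
		s->copied[chunk] = true;
	}
	return s->origin[chunk];
}

/*
 * Hypothetical snapshot-side handler: return the remapped (preserved)
 * chunk if one exists, otherwise fall through to the origin.
 */
static const char *toy_snapshot_direct_access(const struct toy_snapshot *s,
					      unsigned int chunk)
{
	if (chunk >= NR_CHUNKS)
		return NULL;
	return s->copied[chunk] ? s->cow[chunk] : s->origin[chunk];
}
```

In this model a writer that obtains a chunk through toy_origin_direct_access() with write set can modify the origin freely, while a reader going through toy_snapshot_direct_access() keeps seeing the contents as they were when the snapshot was taken.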
* Re: Snapshot target and DAX-capable devices 2018-08-30 9:30 ` Jan Kara (?) @ 2018-08-30 18:49 ` Mike Snitzer -1 siblings, 0 replies; 83+ messages in thread From: Mike Snitzer @ 2018-08-30 18:49 UTC (permalink / raw) To: Jan Kara, Mikulas Patocka, Jeff Moyer Cc: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Thu, Aug 30 2018 at 5:30am -0400, Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote: > On Tue 28-08-18 13:56:30, Mike Snitzer wrote: > > On Tue, Aug 28 2018 at 3:50am -0400, > > Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote: > > > > > On Mon 27-08-18 16:43:28, Kani, Toshi wrote: > > > > On Mon, 2018-08-27 at 18:07 +0200, Jan Kara wrote: > > > > > Hi, > > > > > > > > > > I've been analyzing why fstest generic/081 fails when the backing device is > > > > > capable of DAX. The problem boils down to the failure of: > > > > > > > > > > lvm vgcreate -f vg0 /dev/pmem0 > > > > > lvm lvcreate -L 128M -n lv0 vg0 > > > > > lvm lvcreate -s -L 4M -n snap0 vg0/lv0 > > > > > > > > > > The last command fails like: > > > > > > > > > > device-mapper: reload ioctl on (253:0) failed: Invalid argument > > > > > Failed to lock logical volume vg0/lv0. > > > > > Aborting. Manual intervention required. > > > > > > > > > > And the core of the problem is that volume vg0/lv0 is originally of > > > > > DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to > > > > > switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX. > > > > > The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent > > > > > DAX mounts if not supported". > > > > > > > > > > The question is whether / how this should be fixed. The current inability > > > > > to create snapshots of DAX-capable devices looks weird and the cryptic > > > > > failure makes it even worse (it took me quite a while to understand what is > > > > > failing and why). 
OTOH I see the rationale behind Ross' change as well. > > > > > > > > Here are the dm-snap changes that went along with the original DAX > > > > support. > > > > > > > > commit b5ab4a9ba55 > > > > commit f6e629bd237 > > > > > > > > Basically, snapshots can be added/removed to DAX-capable devices, but > > > > snapshots need to be mounted without dax option. > > > > > > Yes, and after these two commits things were working. But then commit > > > dbc626597 broke things again so currently snapshotting DAX-capable devices > > > does not work. Just try with 4.18... > > > > Commit f6e629bd237 was a nasty hack, and commit dbc626597 exposed it as > > such. But commit dbc626597 has caused us to regress.. so we need to fix > > it. > > > > We could remove DM_TYPE_DAX_BIO_BASED completely. But in the past I was > > reluctant to do so because it really is unclear how/if we can even > > support a device switching from DAX to non-DAX while IO is in-flight. DM > > supports suspending without flushing (via dmsetup suspend --noflush) and > > that could really be problematic if we leave DAX IO inflight and then > > switch the DM table such that the DM device no longer supports DAX. > > Well, changing device from DAX-capable to DAX-incapable is problematic for > filesystem on top of it as well. Filesystems simply don't expect this > feature of a device can change so they would fail in unexpected ways. Also > PFNs from the pmem (DAX-capable) device that are already mapped to user page > tables won't magically become unmapped so those processes will still have > DAX access to those areas of the device. > > But, if both original bdev and COW device are DAX-capable, we *should* be > able to support snapshotting (and refusing mixing of DAX-capable and > DAX-incapable devices in a snapshot is IMHO not very surprising to users). > When creating a snapshot of a device, we need to freeze the filesystem > using it. 
That will writeprotect all page tables so we are sure we'll get
> page faults (and thus ->direct_access requests from DM POV) for each write
> attempt to any mapping. Then ->direct_access method of snapshot-origin can
> make sure to copy original contents to the COW-device before returning PFN
> from ->direct_access. Similarly ->direct_access of COW-device can provide
> remapped PFN so everything should work seamlessly from user POV.
>
> So something like the above would seem like the best solution from user
> POV. Implementation of the above would not be completely trivial though as
> far as I'm looking into DM code. We'd have to implement ->direct_access
> paths for dm-snap and also I have a vague memory ->direct_access is not
> allowed to sleep these days and DM uses sleeping locks all around... Dan
> should know how big obstacle would it be to reintroduce the sleeping
> possibility (I'm not currently aware of any particular problem with that
> but I'm not paying close attention to those parts of NVDIMM code).

Thanks for these details, Jan. I think Dan is on sabbatical, so we'll need Ross to weigh in.

As you point out, how are the upper layers (e.g. filesystems) supposed to reliably cope with this runtime switch from DAX to non-DAX access?

It does look like we'll need the more elaborate work you outlined above. It could be that Mikulas will have the interest, DAX expertise, and time to do the work.

Restating the issue: 4.18 commit dbc626597 switched device_supports_dax() in drivers/md/dm-table.c to perform a much more detailed verification of the device's DAX capabilities by calling bdev_dax_supported() -- which will actually issue read IO via dax_direct_access() to validate the DAX support. dm-snapshot-origin's origin_direct_access() returns -EIO. When trying to create a snapshot of a DAX-enabled linear device, this results in the following error:

kernel: device-mapper: ioctl: can't change device type (old=4 vs new=1) after initial table load.

This is because the active DM device's table is being switched from the linear target to snapshot-origin, so the corresponding DM type switches from DM_TYPE_DAX_BIO_BASED to DM_TYPE_BIO_BASED (again because bdev_dax_supported()'s call to dm-snapshot-origin's origin_direct_access() returns -EIO).

In general, I _never_ should have taken commit f6e629bd237 ("dm snap: add fake origin_direct_access"). It gave the illusion that DAX is supported by dm-snapshot-origin when in reality it simply returns -EIO. Expecting that this will "just work" because the bio-based path would be used instead is extremely fragile.

Until we properly add DAX support to dm-snapshot, I'm afraid we really do need to tolerate this "regression". The reality is that the original support for snapshots of a DAX DM device never worked in a robust way.

I'm running the risk of making people's heads explode, but I cannot just drop everything and scramble to implement all the required DAX changes in dm-snapshot.

Contributions are welcome!

Mike

^ permalink raw reply [flat|nested] 83+ messages in thread
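The failure restated above can be reduced to a few lines of logic. The following is a simplified userspace model, not the dm-table.c implementation: the toy_ names are hypothetical, and the only values taken from the thread are the type numbers in the kernel log (old=4 for the DAX bio-based type, new=1 for the plain bio-based type) and the "Invalid argument" error reported by lvm. It assumes a table counts as DAX-typed only if every target passes the DAX check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Type numbers as they appear in the log message quoted in the thread. */
enum toy_dm_type {
	TOY_DM_TYPE_BIO_BASED     = 1,
	TOY_DM_TYPE_DAX_BIO_BASED = 4,
};

/*
 * Hypothetical type selection: the table is DAX-typed only if every
 * target reports working DAX. A target whose direct_access handler
 * fails (as snapshot-origin's does with -EIO) forces the plain type.
 */
static enum toy_dm_type toy_table_type(const bool *target_supports_dax,
				       int nr_targets)
{
	if (nr_targets <= 0)
		return TOY_DM_TYPE_BIO_BASED;
	for (int i = 0; i < nr_targets; i++)
		if (!target_supports_dax[i])
			return TOY_DM_TYPE_BIO_BASED;
	return TOY_DM_TYPE_DAX_BIO_BASED;
}

/*
 * Hypothetical reload check: once a table is active, a new table of a
 * different type is rejected instead of being switched in.
 */
static int toy_table_reload(const enum toy_dm_type *active,
			    enum toy_dm_type incoming)
{
	if (*active != incoming) {
		fprintf(stderr,
			"can't change device type (old=%d vs new=%d) after initial table load.\n",
			(int)*active, (int)incoming);
		return -EINVAL;
	}
	return 0;
}
```

With a single DAX-capable linear target the model yields type 4; replacing it with a single target that fails the DAX check yields type 1, and the reload is refused with -EINVAL, which mirrors the lvcreate failure reported at the start of the thread.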
* Re: Snapshot target and DAX-capable devices @ 2018-08-30 18:49 ` Mike Snitzer 0 siblings, 0 replies; 83+ messages in thread From: Mike Snitzer @ 2018-08-30 18:49 UTC (permalink / raw) To: Jan Kara, Mikulas Patocka, Jeff Moyer Cc: Kani, Toshi, linux-nvdimm@lists.01.org, dm-devel@redhat.com, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, dan.j.williams@intel.com On Thu, Aug 30 2018 at 5:30am -0400, Jan Kara <jack@suse.cz> wrote: > On Tue 28-08-18 13:56:30, Mike Snitzer wrote: > > On Tue, Aug 28 2018 at 3:50am -0400, > > Jan Kara <jack@suse.cz> wrote: > > > > > On Mon 27-08-18 16:43:28, Kani, Toshi wrote: > > > > On Mon, 2018-08-27 at 18:07 +0200, Jan Kara wrote: > > > > > Hi, > > > > > > > > > > I've been analyzing why fstest generic/081 fails when the backing device is > > > > > capable of DAX. The problem boils down to the failure of: > > > > > > > > > > lvm vgcreate -f vg0 /dev/pmem0 > > > > > lvm lvcreate -L 128M -n lv0 vg0 > > > > > lvm lvcreate -s -L 4M -n snap0 vg0/lv0 > > > > > > > > > > The last command fails like: > > > > > > > > > > device-mapper: reload ioctl on (253:0) failed: Invalid argument > > > > > Failed to lock logical volume vg0/lv0. > > > > > Aborting. Manual intervention required. > > > > > > > > > > And the core of the problem is that volume vg0/lv0 is originally of > > > > > DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to > > > > > switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX. > > > > > The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent > > > > > DAX mounts if not supported". > > > > > > > > > > The question is whether / how this should be fixed. The current inability > > > > > to create snapshots of DAX-capable devices looks weird and the cryptic > > > > > failure makes it even worse (it took me quite a while to understand what is > > > > > failing and why). OTOH I see the rationale behind Ross' change as well. 
> > > > > > > > Here are the dm-snap changes that went along with the original DAX > > > > support. > > > > > > > > commit b5ab4a9ba55 > > > > commit f6e629bd237 > > > > > > > > Basically, snapshots can be added/removed to DAX-capable devices, but > > > > snapshots need to be mounted without dax option. > > > > > > Yes, and after these two commits things were working. But then commit > > > dbc626597 broke things again so currently snapshotting DAX-capable devices > > > does not work. Just try with 4.18... > > > > Commit f6e629bd237 was a nasty hack, and commit dbc626597 exposed it as > > such. But commit dbc626597 has caused us to regress.. so we need to fix > > it. > > > > We could remove DM_TYPE_DAX_BIO_BASED completely. But in the past I was > > reluctant to do so because it really is unclear how/if we can even > > support a device switching from DAX to non-DAX while IO is in-flight. DM > > supports suspending without flushing (via dmsetup suspend --noflush) and > > that could really be problematic if we leave DAX IO inflight and then > > switch the DM table such that the DM device no longer supports DAX. > > Well, changing device from DAX-capable to DAX-incapable is problematic for > filesystem on top of it as well. Filesystems simply don't expect this > feature of a device can change so they would fail in unexpected ways. Also > PFNs from the pmem (DAX-capable) device that are already mapped to user page > tables won't magically become unmapped so those processes will still have > DAX access to those areas of the device. > > But, if both original bdev and COW device are DAX-capable, we *should* be > able to support snapshotting (and refusing mixing of DAX-capable and > DAX-incapable devices in a snapshot is IMHO not very surprising to users). > When creating a snapshot of a device, we need to freeze the filesystem > using it. 
That will writeprotect all page tables so we are sure we'll get > page faults (and thus ->direct_access requests from DM POV) for each write > attempt to any mapping. Then ->direct_access method of snapshot-origin can > make sure to copy original contents to the COW-device before returning PFN > from ->direct_access. Similarly ->direct_access of COW-device can provide > remapped PFN so everything should work seamlessly from user POV. > > So something like the above would seem like the best solution from user > POV. Implementation of the above would not be completely trivial though as > far as I'm looking into DM code. We'd have to implement ->direct_access > paths for dm-snap and also I have a vague memory ->direct_access is not > allowed to sleep these days and DM uses sleeping locks all around... Dan > should know how big obstacle would it be to reintroduce the sleeping > possibility (I'm not currently aware of any particular problem with that > but I'm not paying close attention to those parts of NVDIMM code). Thanks for these details Jan. Think Dan is on sabbatical so we'll need Ross to weigh in. As you point out, how are the upper layers (e.g. filesystems) supposed to reliably cope with this runtime switch to from DAX to non-DAX access? It does look like we'll need the more elaborate work you outlined above. It could be that Mikulas will have interest, DAX expertise and time to do the work. Restating the issue: 4.18 commit dbc626597 switched drivers/md/dm-table.cdevice_supports_dax() to perform a much more detailed verification of the device's DAX capabilities by calling bdev_dax_supported() -- which will actually issue read IO via dax_direct_access() to validate the DAX support. dm-snapshot-origin's origin_direct_access() returns -EIO. When trying to create a snapshot of a DAX enabled linear device, this results in the following error: kernel: device-mapper: ioctl: can't change device type (old=4 vs new=1) after initial table load. 
This is because the active DM device's table is being switched from using the linear target to snapshot-origin. Because the corresponding DM type switches from DM_TYPE_DAX_BIO_BASED to DM_TYPE_BIO_BASED (again because bdev_dax_supported()'s call to dm-snapshot-origin's origin_direct_access() returns -EIO). In general I _never_ should have taken commit f6e629bd237 ("dm snap: add fake origin_direct_access"). It gave the elusion that DAX is supported by dm-snapshot-origin when in reality it simply returns -EIO. Expecting that this will "just work" because the bio-based path would be used instead is extremely fragile. Until we properly add DAX support to dm-snapshot I'm afraid we really do need to tolerate this "regression". Since reality is the original support for snapshot of a DAX DM device never worked in a robust way. I'm running the risk of making peoples' heads explode but I cannot just drop everything and scramble to implement all the required DAX changes in dm-snapshot. Contributions are welcome! Mike ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Snapshot target and DAX-capable devices @ 2018-08-30 18:49 ` Mike Snitzer 0 siblings, 0 replies; 83+ messages in thread From: Mike Snitzer @ 2018-08-30 18:49 UTC (permalink / raw) To: Jan Kara, Mikulas Patocka, Jeff Moyer Cc: linux-nvdimm@lists.01.org, dm-devel@redhat.com, linux-fsdevel@vger.kernel.org On Thu, Aug 30 2018 at 5:30am -0400, Jan Kara <jack@suse.cz> wrote: > On Tue 28-08-18 13:56:30, Mike Snitzer wrote: > > On Tue, Aug 28 2018 at 3:50am -0400, > > Jan Kara <jack@suse.cz> wrote: > > > > > On Mon 27-08-18 16:43:28, Kani, Toshi wrote: > > > > On Mon, 2018-08-27 at 18:07 +0200, Jan Kara wrote: > > > > > Hi, > > > > > > > > > > I've been analyzing why fstest generic/081 fails when the backing device is > > > > > capable of DAX. The problem boils down to the failure of: > > > > > > > > > > lvm vgcreate -f vg0 /dev/pmem0 > > > > > lvm lvcreate -L 128M -n lv0 vg0 > > > > > lvm lvcreate -s -L 4M -n snap0 vg0/lv0 > > > > > > > > > > The last command fails like: > > > > > > > > > > device-mapper: reload ioctl on (253:0) failed: Invalid argument > > > > > Failed to lock logical volume vg0/lv0. > > > > > Aborting. Manual intervention required. > > > > > > > > > > And the core of the problem is that volume vg0/lv0 is originally of > > > > > DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to > > > > > switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX. > > > > > The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent > > > > > DAX mounts if not supported". > > > > > > > > > > The question is whether / how this should be fixed. The current inability > > > > > to create snapshots of DAX-capable devices looks weird and the cryptic > > > > > failure makes it even worse (it took me quite a while to understand what is > > > > > failing and why). OTOH I see the rationale behind Ross' change as well. > > > > > > > > Here are the dm-snap changes that went along with the original DAX > > > > support. 
> > > > > > > > commit b5ab4a9ba55 > > > > commit f6e629bd237 > > > > > > > > Basically, snapshots can be added/removed to DAX-capable devices, but > > > > snapshots need to be mounted without dax option. > > > > > > Yes, and after these two commits things were working. But then commit > > > dbc626597 broke things again so currently snapshotting DAX-capable devices > > > does not work. Just try with 4.18... > > > > Commit f6e629bd237 was a nasty hack, and commit dbc626597 exposed it as > > such. But commit dbc626597 has caused us to regress.. so we need to fix > > it. > > > > We could remove DM_TYPE_DAX_BIO_BASED completely. But in the past I was > > reluctant to do so because it really is unclear how/if we can even > > support a device switching from DAX to non-DAX while IO is in-flight. DM > > supports suspending without flushing (via dmsetup suspend --noflush) and > > that could really be problematic if we leave DAX IO inflight and then > > switch the DM table such that the DM device no longer supports DAX. > > Well, changing device from DAX-capable to DAX-incapable is problematic for > filesystem on top of it as well. Filesystems simply don't expect this > feature of a device can change so they would fail in unexpected ways. Also > PFNs from the pmem (DAX-capable) device that are already mapped to user page > tables won't magically become unmapped so those processes will still have > DAX access to those areas of the device. > > But, if both original bdev and COW device are DAX-capable, we *should* be > able to support snapshotting (and refusing mixing of DAX-capable and > DAX-incapable devices in a snapshot is IMHO not very surprising to users). > When creating a snapshot of a device, we need to freeze the filesystem > using it. That will writeprotect all page tables so we are sure we'll get > page faults (and thus ->direct_access requests from DM POV) for each write > attempt to any mapping. 
Then ->direct_access method of snapshot-origin can > make sure to copy original contents to the COW-device before returning PFN > from ->direct_access. Similarly ->direct_access of COW-device can provide > remapped PFN so everything should work seamlessly from user POV. > > So something like the above would seem like the best solution from user > POV. Implementation of the above would not be completely trivial though as > far as I'm looking into DM code. We'd have to implement ->direct_access > paths for dm-snap and also I have a vague memory ->direct_access is not > allowed to sleep these days and DM uses sleeping locks all around... Dan > should know how big obstacle would it be to reintroduce the sleeping > possibility (I'm not currently aware of any particular problem with that > but I'm not paying close attention to those parts of NVDIMM code). Thanks for these details, Jan. Think Dan is on sabbatical so we'll need Ross to weigh in. As you point out, how are the upper layers (e.g. filesystems) supposed to reliably cope with this runtime switch from DAX to non-DAX access? It does look like we'll need the more elaborate work you outlined above. It could be that Mikulas will have interest, DAX expertise and time to do the work. Restating the issue: 4.18 commit dbc626597 switched drivers/md/dm-table.c:device_supports_dax() to perform a much more detailed verification of the device's DAX capabilities by calling bdev_dax_supported() -- which will actually issue read IO via dax_direct_access() to validate the DAX support. dm-snapshot-origin's origin_direct_access() returns -EIO. When trying to create a snapshot of a DAX enabled linear device, this results in the following error: kernel: device-mapper: ioctl: can't change device type (old=4 vs new=1) after initial table load. This is because the active DM device's table is being switched from using the linear target to snapshot-origin,
which in turn switches the corresponding DM type from DM_TYPE_DAX_BIO_BASED to DM_TYPE_BIO_BASED (again because bdev_dax_supported()'s call to dm-snapshot-origin's origin_direct_access() returns -EIO). In general I _never_ should have taken commit f6e629bd237 ("dm snap: add fake origin_direct_access"). It gave the illusion that DAX is supported by dm-snapshot-origin when in reality it simply returns -EIO. Expecting that this will "just work" because the bio-based path would be used instead is extremely fragile. Until we properly add DAX support to dm-snapshot I'm afraid we really do need to tolerate this "regression". Since reality is the original support for snapshot of a DAX DM device never worked in a robust way. I'm running the risk of making people's heads explode but I cannot just drop everything and scramble to implement all the required DAX changes in dm-snapshot. Contributions are welcome! Mike _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 83+ messages in thread
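The failure mode restated in the message above can be modelled in a few lines. What follows is a simplified userspace sketch in Python, not kernel code: the names `DMType`, `table_type` and `MappedDevice` are invented for illustration, and only the two numeric type values are taken from the "old=4 vs new=1" kernel message quoted in the thread. It assumes a bio-based table is typed as DAX only when every target passes the DAX check.

```python
from enum import IntEnum


class DMType(IntEnum):
    # Values match the "old=4 vs new=1" message quoted in the thread;
    # everything else in this model is an illustrative assumption.
    BIO_BASED = 1
    DAX_BIO_BASED = 4


def table_type(targets_support_dax):
    """Type a table as DAX only if it has targets and all of them support DAX."""
    if targets_support_dax and all(targets_support_dax):
        return DMType.DAX_BIO_BASED
    return DMType.BIO_BASED


class MappedDevice:
    """Toy stand-in for an active DM device whose type is fixed at first load."""

    def __init__(self, targets_support_dax):
        self.type = table_type(targets_support_dax)

    def reload(self, targets_support_dax):
        new_type = table_type(targets_support_dax)
        if new_type != self.type:
            # Mirrors the refusal reported in the kernel log.
            raise ValueError(
                "can't change device type "
                f"(old={int(self.type)} vs new={int(new_type)}) "
                "after initial table load."
            )
        # Same type: the reload would be accepted.
```

In this model, a device first loaded with a single DAX-capable linear target (`MappedDevice([True])`) is typed DAX_BIO_BASED, and reloading it with a snapshot-origin target whose DAX check fails (`reload([False])`) is rejected, which is the shape of the lvcreate failure Jan reported.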
* Re: Snapshot target and DAX-capable devices @ 2018-08-30 19:32 ` Jeff Moyer 0 siblings, 0 replies; 83+ messages in thread From: Jeff Moyer @ 2018-08-30 19:32 UTC (permalink / raw) To: Mike Snitzer Cc: Jan Kara, Mikulas Patocka, Kani, Toshi, linux-nvdimm@lists.01.org, dm-devel@redhat.com, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, dan.j.williams@intel.com Mike Snitzer <snitzer@redhat.com> writes: > Until we properly add DAX support to dm-snapshot I'm afraid we really do > need to tolerate this "regression". Since reality is the original > support for snapshot of a DAX DM device never worked in a robust way. Agreed. -Jeff ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Snapshot target and DAX-capable devices @ 2018-08-30 19:47 ` Mikulas Patocka 0 siblings, 0 replies; 83+ messages in thread From: Mikulas Patocka @ 2018-08-30 19:47 UTC (permalink / raw) To: Jeff Moyer Cc: Mike Snitzer, Jan Kara, Kani, Toshi, linux-nvdimm@lists.01.org, dm-devel@redhat.com, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, dan.j.williams@intel.com On Thu, 30 Aug 2018, Jeff Moyer wrote: > Mike Snitzer <snitzer@redhat.com> writes: > > > Until we properly add DAX support to dm-snapshot I'm afraid we really do > > need to tolerate this "regression". Since reality is the original > > support for snapshot of a DAX DM device never worked in a robust way. > > Agreed. > > -Jeff You can't support dax on snapshot - if someone maps a block and the block needs to be moved, then what? Mikulas ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Snapshot target and DAX-capable devices @ 2018-08-30 19:53 ` Jeff Moyer 0 siblings, 0 replies; 83+ messages in thread From: Jeff Moyer @ 2018-08-30 19:53 UTC (permalink / raw) To: Mikulas Patocka Cc: Mike Snitzer, Jan Kara, Kani, Toshi, linux-nvdimm@lists.01.org, dm-devel@redhat.com, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, dan.j.williams@intel.com Mikulas Patocka <mpatocka@redhat.com> writes: > On Thu, 30 Aug 2018, Jeff Moyer wrote: > >> Mike Snitzer <snitzer@redhat.com> writes: >> >> > Until we properly add DAX support to dm-snapshot I'm afraid we really do >> > need to tolerate this "regression". Since reality is the original >> > support for snapshot of a DAX DM device never worked in a robust way. >> >> Agreed. >> >> -Jeff > > You can't support dax on snapshot - if someone maps a block and the block > needs to be moved, then what? That's exactly the point I brought up in my reply to Jan. You'd have to unmap all mappings of the page/block. -Jeff ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Snapshot target and DAX-capable devices @ 2018-08-30 23:38 ` Dave Chinner 0 siblings, 0 replies; 83+ messages in thread From: Dave Chinner @ 2018-08-30 23:38 UTC (permalink / raw) To: Mikulas Patocka Cc: Jeff Moyer, Mike Snitzer, Jan Kara, Kani, Toshi, linux-nvdimm@lists.01.org, dm-devel@redhat.com, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, dan.j.williams@intel.com On Thu, Aug 30, 2018 at 03:47:32PM -0400, Mikulas Patocka wrote: > > > On Thu, 30 Aug 2018, Jeff Moyer wrote: > > > Mike Snitzer <snitzer@redhat.com> writes: > > > > > Until we properly add DAX support to dm-snapshot I'm afraid we really do > > > need to tolerate this "regression". Since reality is the original > > > support for snapshot of a DAX DM device never worked in a robust way. > > > > Agreed. > > > > -Jeff > > You can't support dax on snapshot - if someone maps a block and the block > needs to be moved, then what? This is only a problem for access via mmap and page faults. At the filesystem level, it's no different to the existing direct IO algorithm for read/write IO - we simply allocate new space, copy the data we need to copy into the new space (may be no copy needed), and then write the new data into the new space. I'm pretty sure that for bio-based IO to dm-snapshot devices the algorithm will be exactly the same. However, for direct access via mmap, we have to modify how the userspace virtual address is mapped to the physical location. IOWs, during the COW operation, we have to invalidate all existing user mappings we have for that physical address. This means we have to do an invalidation after the allocate/copy part of the COW operation. If we are doing this during a page fault, it means we'll probably have to restart the page fault so it can look up the new physical address associated with the faulting user address. 
After we've done the invalidation, any new (or restarted) page fault finds the location of new copy we just made, maps it into the user address space, updates the ptes and we're all good. Well, that's the theory. We haven't implemented this for XFS yet, so it might end up a little different, and we might yet hit unexpected problems (it's DAX, that's what happens :/). It's a whole different ballgame for a dm-snapshot device - block devices are completely unaware of page faults to DAX file mappings. We'll need the filesystem to be aware it's on a remappable block device, and when we take a DAX write fault we'll need to ask the underlying device to remap the block and treat it like the filesystem COW case above. We'll need to do this remap/invalidate dance in the write IO path, too, because a COW by the block device is no different to filesystem COW in that path. Basically, it's the same algorithm as the filesystem COW case, we just get the physical location of the data and the notification of block changing physical location from a different interface. Hmmmm. ISTR that someone has been making a few noises recently about virtual block address space mapping interfaces that could help solve this problem.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 83+ messages in thread
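The filesystem-level COW sequence described in the message above (allocate new space, copy, remap, invalidate the old user mappings, restart the fault) can be sketched as a toy model. This is an illustrative Python simulation with invented names (`ToyCowStore`, `fault`, `invalidate`); it does not reflect how XFS or dm-snapshot implement anything, and as the message itself notes, a real design may end up looking different.

```python
class ToyCowStore:
    """Toy model: logical blocks map to physical blocks that may be shared."""

    def __init__(self, nblocks):
        self.data = {p: f"orig-{p}" for p in range(nblocks)}  # physical contents
        self.l2p = {b: b for b in range(nblocks)}             # logical -> physical
        self.shared = set(range(nblocks))   # physical blocks a snapshot still needs
        self.next_free = nblocks            # next unallocated physical block
        self.page_tables = {}               # (process, logical block) -> physical

    def invalidate(self, phys):
        """Drop every user mapping that still points at physical block `phys`."""
        stale = [key for key, p in self.page_tables.items() if p == phys]
        for key in stale:
            del self.page_tables[key]
        return len(stale)

    def fault(self, proc, block, write):
        """Resolve a page fault, doing COW first when a shared block is written."""
        while True:
            phys = self.l2p[block]
            if write and phys in self.shared:
                new = self.next_free              # allocate new space
                self.next_free += 1
                self.data[new] = self.data[phys]  # copy the old contents
                self.l2p[block] = new             # remap the logical block
                self.invalidate(phys)             # invalidate after allocate/copy
                continue                          # restart the fault
            self.page_tables[(proc, block)] = phys
            return phys
```

The point of the model is the ordering: the invalidation happens only after the copy is in place, so any process that faults again, including the one whose write triggered the COW, looks up the new physical location, while the old block keeps the snapshot's view of the data.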
* Re: Snapshot target and DAX-capable devices @ 2018-08-31 9:42 ` Jan Kara 0 siblings, 0 replies; 83+ messages in thread From: Jan Kara @ 2018-08-31 9:42 UTC (permalink / raw) To: Dave Chinner Cc: Mikulas Patocka, Jeff Moyer, Mike Snitzer, Jan Kara, Kani, Toshi, linux-nvdimm@lists.01.org, dm-devel@redhat.com, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, dan.j.williams@intel.com On Fri 31-08-18 09:38:09, Dave Chinner wrote: > On Thu, Aug 30, 2018 at 03:47:32PM -0400, Mikulas Patocka wrote: > > > > > > On Thu, 30 Aug 2018, Jeff Moyer wrote: > > > > > Mike Snitzer <snitzer@redhat.com> writes: > > > > > > > Until we properly add DAX support to dm-snapshot I'm afraid we really do > > > > need to tolerate this "regression". Since reality is the original > > > > support for snapshot of a DAX DM device never worked in a robust way. > > > > > > Agreed. > > > > > > -Jeff > > > > You can't support dax on snapshot - if someone maps a block and the block > > needs to be moved, then what? > > This is only a problem for access via mmap and page faults. > > At the filesystem level, it's no different to the existing direct IO > algorithm for read/write IO - we simply allocate new space, copy the > data we need to copy into the new space (may be no copy needed), and > then write the new data into the new space. I'm pretty sure that for > bio-based IO to dm-snapshot devices the algorithm will be exactly > the same. > > However, for direct access via mmap, we have to modify how the > userspace virtual address is mapped to the physical location. IOWs, > during the COW operation, we have to invalidate all existing user > mappings we have for that physical address. This means we have to do > an invalidation after the allocate/copy part of the COW operation. > > If we are doing this during a page fault, it means we'll probably > have to restart the page fault so it can look up the new physical > address associated with the faulting user address. 
After we've done > the invalidation, any new (or restarted) page fault finds the > location of new copy we just made, maps it into the user address > space, updates the ptes and we're all good. > > Well, that's the theory. We haven't implemented this for XFS yet, so > it might end up a little different, and we might yet hit unexpected > problems (it's DAX, that's what happens :/). Yes, that's the outline of a plan :) > It's a whole different ballgame for a dm-snapshot device - block > devices are completely unaware of page faults to DAX file mappings. Actually, block devices are not completely unaware of DAX page faults - they will get ->direct_access callback for the fault range. It does not currently convey enough information - we also need to inform the block device whether it is read or write. But that's about all that's needed to add AFAICT. And by comparing returned PFN with the one we have stored in the radix tree (which we have if that file offset is mapped by anybody), the filesystem / DAX code can tell whether remapping happened and do the unmapping. > We'll need the filesystem to be aware it's on a remappable block > device, and when we take a DAX write fault we'll need to ask the > underlying device to remap the block and treat it like the > filesystem COW case above. We'll need to do this remap/invalidate > dance in the write IO path, too, because a COW by the block device > is no different to filesystem COW in that path. Right. > Basically, it's the same algorithm as the filesystem COW case, we > just get the physical location of the data and the notification of > block changing physical location from a different interface. > > Hmmmm. ISTR that someone has been making a few noises recently about > virtual block address space mapping interfaces that could help solve > this problem.... :-) Yes, virtual block address space mapping would be a nice solution for this. But that's a bit larger overhaul, isn't it? 
Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 83+ messages in thread
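The PFN comparison suggested in the message above can be illustrated with a small model. This is a hedged Python sketch, not kernel code: `ToyRemappingDevice` and `handle_dax_fault` are invented names, a plain dict stands in for the radix tree, and the `write` argument to `direct_access` is the hypothetical extension proposed in the thread rather than something the existing ->direct_access interface provides.

```python
class ToyRemappingDevice:
    """Stand-in for a block device that may relocate a block on first write."""

    def __init__(self, pfn_of, needs_cow):
        self.pfn_of = dict(pfn_of)       # offset -> pfn currently backing it
        self.needs_cow = set(needs_cow)  # offsets that must be copied before a write
        self._next_free_pfn = max(self.pfn_of.values(), default=0) + 1

    def direct_access(self, offset, write):
        # Hypothetical signature: the thread notes that today's callback
        # carries no read/write information.
        if write and offset in self.needs_cow:
            self.pfn_of[offset] = self._next_free_pfn  # "copy" to a fresh location
            self._next_free_pfn += 1
            self.needs_cow.discard(offset)
        return self.pfn_of[offset]


def handle_dax_fault(pfn_tree, device, offset, write):
    """Return (pfn, remapped); `pfn_tree` plays the role of the radix tree."""
    pfn = device.direct_access(offset, write)
    old_pfn = pfn_tree.get(offset)
    # A remap is detected only if some mapping of this offset was recorded.
    remapped = old_pfn is not None and old_pfn != pfn
    # On a remap, real code would unmap all user mappings of old_pfn here.
    pfn_tree[offset] = pfn
    return pfn, remapped
```

In this model the caller learns about a relocation purely by noticing that the returned PFN differs from the recorded one, so no separate notification interface is needed; whether that is acceptable inside a fast translation path is exactly what the follow-up message questions.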
* Re: Snapshot target and DAX-capable devices @ 2018-08-31 9:42 ` Jan Kara 0 siblings, 0 replies; 83+ messages in thread From: Jan Kara @ 2018-08-31 9:42 UTC (permalink / raw) To: Dave Chinner Cc: Jan Kara, Mike Snitzer, linux-nvdimm@lists.01.org, dm-devel@redhat.com, Mikulas Patocka, linux-fsdevel@vger.kernel.org On Fri 31-08-18 09:38:09, Dave Chinner wrote: > On Thu, Aug 30, 2018 at 03:47:32PM -0400, Mikulas Patocka wrote: > > > > > > On Thu, 30 Aug 2018, Jeff Moyer wrote: > > > > > Mike Snitzer <snitzer@redhat.com> writes: > > > > > > > Until we properly add DAX support to dm-snapshot I'm afraid we really do > > > > need to tolerate this "regression". Since reality is the original > > > > support for snapshot of a DAX DM device never worked in a robust way. > > > > > > Agreed. > > > > > > -Jeff > > > > You can't support dax on snapshot - if someone maps a block and the block > > needs to be moved, then what? > > This is only a problem for access via mmap and page faults. > > At the filesystem level, it's no different to the existing direct IO > algorithm for read/write IO - we simply allocate new space, copy the > data we need to copy into the new space (may be no copy needed), and > then write the new data into the new space. I'm pretty sure that for > bio-based IO to dm-snapshot devices the algorithm will be exactly > the same. > > However, for direct access via mmap, we have to modify how the > userspace virtual address is mapped to the physical location. IOWs, > during the COW operation, we have to invalidate all existing user > mappings we have for that physical address. This means we have to do > an invalidation after the allocate/copy part of the COW operation. > > If we are doing this during a page fault, it means we'll probably > have to restart the page fault so it can look up the new physical > address associated with the faulting user address. 
After we've done > the invalidation, any new (or restarted) page fault finds the > location of new copy we just made, maps it into the user address > space, updates the ptes and we're all good. > > Well, that's the theory. We haven't implemented this for XFS yet, so > it might end up a little different, and we might yet hit unexpected > problems (it's DAX, that's what happens :/). Yes, that's outline of a plan :) > It's a whole different ballgame for a dm-snapshot device - block > devices are completely unaware of page faults to DAX file mappings. Actually, block devices are not completely unaware of DAX page faults - they will get ->direct_access callback for the fault range. It does not currently convey enough information - we also need to inform the block device whether it is read or write. But that's about all that's needed to add AFAICT. And by comparing returned PFN with the one we have stored in the radix tree (which we have if that file offset is mapped by anybody), the filesystem / DAX code can tell whether remapping happened and do the unmapping. > We'll need the filesystem to be aware it's on a remappable block > device, and when we take a DAX write fault we'll need to ask the > underlying device to remap the block and treat it like the > filesystem COW case above. We'll need to do this remap/invalidate > dance in the write IO path, too, because a COW by the block device > is no different to filesystem COW in that path. Right. > Basically, it's the same algorithm as the filesystem COW case, we > just get the physical location of the data and the notification of > block changing physical location from a different interface. > > Hmmmm. ISTR that someone has been making a few noises recently about > virtual block address space mapping interfaces that could help solve > this problem.... :-) Yes, virtual block address space mapping would be a nice solution for this. But that's a bit larger overhaul, isn't it? 
Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 83+ messages in thread
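The PFN comparison described above can be sketched as a toy userspace model. This is only an illustration under stated assumptions, not kernel code: every name below (toy_dev, toy_mapping, toy_direct_access, toy_fault) is hypothetical, the "write" flag is the extra information the message says ->direct_access would need, and the model covers the generic move-on-write case rather than dm-snapshot's actual exception store.

```c
/* Toy userspace model of the remap/invalidate idea; not kernel code.
 * Every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 4
#define NPFNS   8
#define BLKSZ   16

struct toy_dev {
	int  blk_to_pfn[NBLOCKS]; /* current physical page of each block */
	bool shared[NBLOCKS];     /* true while a snapshot still references the page */
	int  next_free_pfn;       /* trivial bump allocator */
	char mem[NPFNS][BLKSZ];   /* stands in for persistent memory */
};

struct toy_mapping {
	int cached_pfn[NBLOCKS];  /* stands in for the radix tree entry; -1 = none */
	int invalidations;        /* how often stale user mappings were dropped */
};

static void toy_init(struct toy_dev *d, struct toy_mapping *m)
{
	memset(d, 0, sizeof(*d));
	for (int i = 0; i < NBLOCKS; i++) {
		d->blk_to_pfn[i] = i;
		m->cached_pfn[i] = -1;
	}
	d->next_free_pfn = NBLOCKS;
	m->invalidations = 0;
}

/* Hypothetical direct_access variant that is told whether the fault is a write. */
static int toy_direct_access(struct toy_dev *d, int blk, bool write)
{
	if (write && d->shared[blk]) {
		assert(d->next_free_pfn < NPFNS);
		int new_pfn = d->next_free_pfn++;
		/* allocate + copy part of the COW operation */
		memcpy(d->mem[new_pfn], d->mem[d->blk_to_pfn[blk]], BLKSZ);
		d->blk_to_pfn[blk] = new_pfn;
		d->shared[blk] = false;
	}
	return d->blk_to_pfn[blk];
}

/* Fault handler: compare the returned PFN with the cached one. */
static int toy_fault(struct toy_dev *d, struct toy_mapping *m, int blk, bool write)
{
	int pfn = toy_direct_access(d, blk, write);

	if (m->cached_pfn[blk] != -1 && m->cached_pfn[blk] != pfn)
		m->invalidations++;   /* remap detected: unmap stale user mappings */
	m->cached_pfn[blk] = pfn;     /* (re)map at the current location */
	return pfn;
}
```

In this model a read fault caches the original PFN, a later write fault on a shared block returns a different PFN, and the mismatch is what triggers the invalidation. A second write fault on the same block sees no change and invalidates nothing.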
* Re: Snapshot target and DAX-capable devices 2018-08-31 9:42 ` Jan Kara (?) @ 2018-09-05 1:25 ` Dave Chinner -1 siblings, 0 replies; 83+ messages in thread From: Dave Chinner @ 2018-09-05 1:25 UTC (permalink / raw) To: Jan Kara Cc: Mike Snitzer, Kani, Toshi, linux-nvdimm@lists.01.org, dm-devel@redhat.com, Jeff Moyer, Mikulas Patocka, linux-fsdevel@vger.kernel.org, ross.zwisler@linux.intel.com, dan.j.williams@intel.com On Fri, Aug 31, 2018 at 11:42:55AM +0200, Jan Kara wrote: > On Fri 31-08-18 09:38:09, Dave Chinner wrote: > > On Thu, Aug 30, 2018 at 03:47:32PM -0400, Mikulas Patocka wrote: > > > You can't support dax on snapshot - if someone maps a block and the block > > > needs to be moved, then what? > > > > This is only a problem for access via mmap and page faults. .... > > It's a whole different ballgame for a dm-snapshot device - block > > devices are completely unaware of page faults to DAX file mappings. > > Actually, block devices are not completely unaware of DAX page faults - > they will get ->direct_access callback for the fault range. It does not > currently convey enough information - we also need to inform the block > device whether it is read or write. But that's about all that's needed to > add AFAICT. And by comparing returned PFN with the one we have stored in > the radix tree (which we have if that file offset is mapped by anybody), > the filesystem / DAX code can tell whether remapping happened and do the > unmapping. I forgot about the direct access call. But it seems like a hack to redefine the simple, fast sector-to-pfn translation into a slow and potentially resource hungry interface for physical storage reallocation. Doing storage layer COW operations inside direct_access takes us straight back to the bad ways of get_block() interfaces. 
We moved all the filesystem allocation to iomap so that the storage management is separated from the mm/physical address translation side of DAX - doing block device storage management operations inside ->direct_access effectively reverts that separation and so just seems like a hack to me. Oh, right, DAX. Silly me. :/ Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 83+ messages in thread
[parent not found: <20180831094255.GB11622-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>]
* Re: Snapshot target and DAX-capable devices 2018-08-31 9:42 ` Jan Kara (?) @ 2018-12-12 16:11 ` Huaisheng Ye -1 siblings, 0 replies; 83+ messages in thread From: Huaisheng Ye @ 2018-12-12 16:11 UTC (permalink / raw) To: Jan Kara Cc: Mike Snitzer, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org, chengnt, Dave Chinner, colyli, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Mikulas Patocka, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org ---- On Fri, 31 Aug 2018 17:42:55 +0800 Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote ---- > On Fri 31-08-18 09:38:09, Dave Chinner wrote: > > On Thu, Aug 30, 2018 at 03:47:32PM -0400, Mikulas Patocka wrote: > > > > > > > > > On Thu, 30 Aug 2018, Jeff Moyer wrote: > > > > > > > Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes: > > > > > > > > > Until we properly add DAX support to dm-snapshot I'm afraid we really do > > > > > need to tolerate this "regression". Since reality is the original > > > > > support for snapshot of a DAX DM device never worked in a robust way. > > > > > > > > Agreed. > > > > > > > > -Jeff > > > > > > You can't support dax on snapshot - if someone maps a block and the block > > > needs to be moved, then what? > > > > This is only a problem for access via mmap and page faults. > > > > At the filesystem level, it's no different to the existing direct IO > > algorithm for read/write IO - we simply allocate new space, copy the > > data we need to copy into the new space (may be no copy needed), and > > then write the new data into the new space. I'm pretty sure that for > > bio-based IO to dm-snapshot devices the algorithm will be exactly > > the same. > > > > However, for direct access via mmap, we have to modify how the > > userspace virtual address is mapped to the physical location. IOWs, > > during the COW operation, we have to invalidate all existing user > > mappings we have for that physical address. 
This means we have to do > > an invalidation after the allocate/copy part of the COW operation. > > > > If we are doing this during a page fault, it means we'll probably > > have to restart the page fault so it can look up the new physical > > address associated with the faulting user address. After we've done > > the invalidation, any new (or restarted) page fault finds the > > location of new copy we just made, maps it into the user address > > space, updates the ptes and we're all good. > > > > Well, that's the theory. We haven't implemented this for XFS yet, so > > it might end up a little different, and we might yet hit unexpected > > problems (it's DAX, that's what happens :/). > > Yes, that's outline of a plan :) > > > It's a whole different ballgame for a dm-snapshot device - block > > devices are completely unaware of page faults to DAX file mappings. > > Actually, block devices are not completely unaware of DAX page faults - > they will get ->direct_access callback for the fault range. It does not > currently convey enough information - we also need to inform the block > device whether it is read or write. But that's about all that's needed to > add AFAICT. And by comparing returned PFN with the one we have stored in > the radix tree (which we have if that file offset is mapped by anybody), > the filesystem / DAX code can tell whether remapping happened and do the > unmapping. Hi Jan, I am trying to investigate how to make dm-snapshot to support DAX, and I dropped a patchset to upstream for comments. Any suggestion is welcome. # https://lkml.org/lkml/2018/11/21/281 In the beginning, I haven't considered the situation of mmap write faults. From Dan's reply and this email thread, now I have a more clear understanding. 
The question is this: even if the virtual dm block device has been informed, through PROT_WRITE, that the mmap may be written to, dm-snapshot has no chance to detect it when userspace operates directly on the virtual address of the origin device, e.g. with memcpy. dm-snapshot does get a chance to prepare a COW area for backing up the origin's blocks within the ->direct_access callback for the fault range, but how does it get the opportunity to read the data from the origin device and save it to the COW area? --- Cheers, Huaisheng Ye ^ permalink raw reply [flat|nested] 83+ messages in thread
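The visibility problem raised in this message can be illustrated with a small toy model. It is a sketch under assumptions, not kernel code: toy_vma and toy_store are hypothetical names, and the model only shows that once a writable mapping exists, later stores through it never reach the kernel or the device.

```c
/* Toy userspace model; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_vma {
	bool pte_writable; /* is there a writable PTE for this page? */
	int  faults;       /* write faults the kernel (and so the device) saw */
	char page[16];     /* the DAX page, mapped directly */
};

/* One userspace store to the mapping, as memcpy() would issue. */
static void toy_store(struct toy_vma *v, int off, char c)
{
	assert(off >= 0 && off < (int)sizeof(v->page));
	if (!v->pte_writable) {
		v->faults++;            /* the only point where the device is consulted */
		v->pte_writable = true;
	}
	v->page[off] = c;               /* plain CPU store, no kernel involvement */
}
```

Three stores to the same page produce a single fault in this model, so any copy of the old data would have to happen at that first write fault, before the mapping becomes writable.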
[parent not found: <167a3303a01.11a848ab768799.5161498967766415143-ytc+IHgoah0@public.gmane.org>]
* Re: Snapshot target and DAX-capable devices 2018-12-12 16:11 ` Huaisheng Ye (?) @ 2018-12-12 16:12 ` Christoph Hellwig -1 siblings, 0 replies; 83+ messages in thread From: Christoph Hellwig @ 2018-12-12 16:12 UTC (permalink / raw) To: Huaisheng Ye Cc: Jan Kara, Mike Snitzer, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org, chengnt, Dave Chinner, colyli, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Mikulas Patocka, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Does it really make sense to enhance dm-snapshot? I thought all serious users of snapshots had moved on to dm-thinp? ^ permalink raw reply [flat|nested] 83+ messages in thread
[parent not found: <20181212161254.GA20790-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>]
* Re: Snapshot target and DAX-capable devices 2018-12-12 16:12 ` Christoph Hellwig (?) @ 2018-12-12 17:50 ` Mike Snitzer -1 siblings, 0 replies; 83+ messages in thread From: Mike Snitzer @ 2018-12-12 17:50 UTC (permalink / raw) To: Christoph Hellwig Cc: Jan Kara, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org, chengnt, Dave Chinner, colyli, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Mikulas Patocka, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Wed, Dec 12 2018 at 11:12am -0500, Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote: > Does it really make sense to enhance dm-snapshot? I thought all serious > users of snapshots had moved on to dm-thinp? There are cases where dm-snapshot is still useful for people. But those are very niche users. I'm not opposed to others proposing enhancements for dm-snapshot in general but it is definitely not a priority (Google's dm-bow is an example of a case where dm-snapshot may get extended to fulfill google's needs). But for this specific DAX case, I can only assume efforts to prop up dm-snapshot like this are born out of legacy use-cases. The reality is getting DAX to work with dm-snapshot is pretty involved (due to mmap, etc). This thread got into a lot of the details: https://www.redhat.com/archives/dm-devel/2018-August/msg00211.html So any new attempt to reintroduce DAX support to dm-snapshot (or any more complex DM target) will have a very high bar. One that requires much more extensive mm and block (interlock) prep work to make DAX safe. Mike ^ permalink raw reply [flat|nested] 83+ messages in thread
[parent not found: <20181212175047.GA24962-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: Snapshot target and DAX-capable devices 2018-12-12 17:50 ` Mike Snitzer (?) @ 2018-12-12 19:49 ` Kani, Toshi -1 siblings, 0 replies; 83+ messages in thread From: Kani, Toshi @ 2018-12-12 19:49 UTC (permalink / raw) To: hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org, snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org Cc: jack-AlSwsSmVLrQ@public.gmane.org, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org, chengnt-6jq1YtArVR3QT0dZR+AlfA@public.gmane.org, david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org, colyli-l3A5Bk7waGM@public.gmane.org, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, mpatocka-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Wed, 2018-12-12 at 12:50 -0500, Mike Snitzer wrote: > On Wed, Dec 12 2018 at 11:12am -0500, > Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote: > > > Does it really make sense to enhance dm-snapshot? I thought all serious > > users of snapshots had moved on to dm-thinp? > > There are cases where dm-snapshot is still useful for people. But those > are very niche users. I'm not opposed to others proposing enhancements > for dm-snapshot in general but it is definitely not a priority (Google's > dm-bow is an example of a case where dm-snapshot may get extended to > fulfill google's needs). > > But for this specific DAX case, I can only assume efforts to prop up > dm-snapshot like this are born out of legacy use-cases. The reality is > getting DAX to work with dm-snapshot is pretty involved (due to mmap, > etc). This thread got into a lot of the details: > https://www.redhat.com/archives/dm-devel/2018-August/msg00211.html > > So any new attempt to reintroduce DAX support to dm-snapshot (or any > more complex DM target) will have a very high bar. One that requires > much more extensive mm and block (interlock) prep work to make DAX > safe. Just to be clear, "reintroduce DAX support to dm-snapshot" is a bit misleading because there wasn't such attempt before. 
My original hack/change in dm-snapshot was meant to let non-DAX use-cases work with DAX-capable devices. Thanks, -Toshi ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Snapshot target and DAX-capable devices 2018-12-12 17:50 ` Mike Snitzer @ 2018-12-12 21:15 ` Theodore Y. Ts'o -1 siblings, 0 replies; 83+ messages in thread From: Theodore Y. Ts'o @ 2018-12-12 21:15 UTC (permalink / raw) To: Mike Snitzer Cc: Huaisheng Ye, Jan Kara, yehs1, linux-nvdimm@lists.01.org, chengnt, Dave Chinner, colyli, Christoph Hellwig, dm-devel@redhat.com, Mikulas Patocka, linux-fsdevel@vger.kernel.org On Wed, Dec 12, 2018 at 12:50:47PM -0500, Mike Snitzer wrote: > On Wed, Dec 12 2018 at 11:12am -0500, > Christoph Hellwig <hch@infradead.org> wrote: > > > Does it really make sense to enhance dm-snapshot? I thought all serious > > users of snapshots had moved on to dm-thinp? > > There are cases where dm-snapshot is still useful for people. But those > are very niche users. I'm not opposed to others proposing enhancements > for dm-snapshot in general but it is definitely not a priority (Google's > dm-bow is an example of a case where dm-snapshot may get extended to > fulfill google's needs). I would expect that dm-snapshot will be used quite a lot for short-lived snapshots (that only live during a database backup or an fsck run). I would hardly call that a "niche use case". One other major advantage that dm-snapshot has is that you can take a snapshot for any LVM volume. For dm-thinp you have to migrate your storage to a thinp pool, and that adds a fair amount of friction to users migrating to dm-thinp. - Ted ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Snapshot target and DAX-capable devices @ 2018-12-12 22:43 ` Mike Snitzer 0 siblings, 0 replies; 83+ messages in thread From: Mike Snitzer @ 2018-12-12 22:43 UTC (permalink / raw) To: Theodore Y. Ts'o Cc: Huaisheng Ye, Jan Kara, yehs1, linux-nvdimm@lists.01.org, chengnt, Dave Chinner, colyli, Christoph Hellwig, dm-devel@redhat.com, Mikulas Patocka, linux-fsdevel@vger.kernel.org On Wed, Dec 12 2018 at 4:15pm -0500, Theodore Y. Ts'o <tytso@mit.edu> wrote: > On Wed, Dec 12, 2018 at 12:50:47PM -0500, Mike Snitzer wrote: > > On Wed, Dec 12 2018 at 11:12am -0500, > > Christoph Hellwig <hch@infradead.org> wrote: > > > > > Does it really make sense to enhance dm-snapshot? I thought all serious > > > users of snapshots had moved on to dm-thinp? > > > > There are cases where dm-snapshot is still useful for people. But those > > are very niche users. I'm not opposed to others proposing enhancements > > for dm-snapshot in general but it is definitely not a priority (Google's > > dm-bow is an example of a case where dm-snapshot may get extended to > > fulfill google's needs). > > I would expect that dm-snapshot will be used quite a lot for > short-lived snapshots (that only live during a database backup or an > fsck run). I would hardly call that a "niche use case". dm-snapshot is only ~60% performant for 1 snapshot. Try to do additional snapshots and performance crawls to a stop (though I haven't reassessed performance in a while). dm-snapshot has been in Linux since before 2005, I don't know of all the users of it -- maybe there are a ton of users who only take a single temporary snapshot and we're all oblivious. Definitely not seeing many bugs against it (but it has been around forever). I do know that there are relatively few people showing interest in it. 
But for 4.21 I did stage a couple of useful performance fixes: https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.21&id=61d594bb7e1cf86dca49cbc9524eb80169d9fca6 https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.21&id=d1f7898c7a1b24aa9ae670f9cc21b65e730827eb > One other major advantage that dm-snapshot has is that you can take a > snapshot for any LVM volume. For dm-thinp you have to migrate your > storage to a thinp pool, and that adds a fair amount of friction to > users migrating to dm-thinp. dm-thinp has the concept of an "external origin". Changes to the origin LVM volume get copied out to the thin-pool (same copy cost as old dm-snapshot). But IIRC from that point on your LVM volume is a dm-thin device. Mike ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: [dm-devel] Snapshot target and DAX-capable devices @ 2018-12-14 4:11 ` Theodore Y. Ts'o 0 siblings, 0 replies; 83+ messages in thread From: Theodore Y. Ts'o @ 2018-12-14 4:11 UTC (permalink / raw) To: Mike Snitzer Cc: Huaisheng Ye, Jan Kara, yehs1, linux-nvdimm@lists.01.org, chengnt, Dave Chinner, colyli, Christoph Hellwig, dm-devel@redhat.com, Mikulas Patocka, linux-fsdevel@vger.kernel.org On Wed, Dec 12, 2018 at 05:43:22PM -0500, Mike Snitzer wrote: > > I would expect that dm-snapshot will be used quite a lot for > > short-lived snapshots (that only live during a database backup or an > > fsck run). I would hardly call that a "niche use case". > > dm-snapshot is only ~60% performant for 1 snapshot. Try to do > additional snapshots and performance crawls to a stop (though I haven't > reassessed performance in a while). > > dm-snapshot has been in Linux since before 2005, I don't know of all the > users of it -- maybe there are a ton of users who only take a single > temporary snapshot and we're all oblivious. Well, here's one user: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/scrub/e2scrub.in This is a spiffed up version of my original script: https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/contrib/e2croncheck I suppose we should look into enhancing e2scrub so it can deal with volumes stored on dm-thin pools..... > dm-thinp has the concept of an "external origin". Changes to origin LVM > volume get copied out to the thin-pool (same copy cost as old > dm-snapshot). But IIRC from that point on your LVM volume is a dm-thin > device. I would think the *snapshot* would have to be the dm-thin device, not the origin volume, correct? The original LVM module would still be mounted through the original LVM device-mapper device, so it couldn't get transmogrified to be a dm-thin device, right? - Ted ^ permalink raw reply [flat|nested] 83+ messages in thread
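Note on the short-lived snapshot use case mentioned above: the following is a minimal sketch of the pattern (take a classic dm-snapshot, check it, drop it), loosely modeled on the idea behind e2croncheck rather than copied from it. The volume group, volume name, snapshot size and the `-snap` suffix are hypothetical, as are the `run` and `check_with_snapshot` helpers. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: names and sizes are illustrative assumptions, not taken from
# e2scrub or e2croncheck. With DRY_RUN=1 (the default) commands are printed
# instead of executed, so this is safe to try without root or LVM.

DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

check_with_snapshot() {
    vg="$1"
    lv="$2"
    size="$3"
    snap="${lv}-snap"

    # Create a classic copy-on-write (dm-snapshot) snapshot of the origin.
    run lvcreate -s -L "$size" -n "$snap" "$vg/$lv" || return 1

    # Force a check of the snapshot without modifying it (-n), while the
    # origin volume stays mounted and in use.
    run e2fsck -fn "/dev/$vg/$snap"
    status=$?

    # Remove the snapshot right away so the copy-on-write cost is short-lived.
    run lvremove -f "$vg/$snap"
    return "$status"
}
```

Calling `check_with_snapshot vg0 lv0 4M` in dry-run mode prints the three commands in order. On the DAX-capable setup described at the start of this thread, the `lvcreate -s` step is the one reported to fail.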
* RE: [External] Re: Snapshot target and DAX-capable devices @ 2018-12-14 8:24 ` Huaisheng HS1 Ye 0 siblings, 0 replies; 83+ messages in thread From: Huaisheng HS1 Ye @ 2018-12-14 8:24 UTC (permalink / raw) To: Mike Snitzer Cc: Huaisheng Ye, Jan Kara, linux-nvdimm@lists.01.org, NingTing Cheng, Dave Chinner, colyli, Christoph Hellwig, dm-devel@redhat.com, Mikulas Patocka, linux-fsdevel@vger.kernel.org, Theodore Y. Ts'o From: Mike Snitzer <snitzer@redhat.com> Sent: Thursday, December 13, 2018 6:43 AM > On Wed, Dec 12 2018 at 4:15pm -0500, > Theodore Y. Ts'o <tytso@mit.edu> wrote: > > > On Wed, Dec 12, 2018 at 12:50:47PM -0500, Mike Snitzer wrote: > > > On Wed, Dec 12 2018 at 11:12am -0500, > > > Christoph Hellwig <hch@infradead.org> wrote: > > > > > > > Does it really make sense to enhance dm-snapshot? I thought all serious > > > > users of snapshots had moved on to dm-thinp? > > > > > > There are cases where dm-snapshot is still useful for people. But those > > > are very niche users. I'm not opposed to others proposing enhancements > > > for dm-snapshot in general but it is definitely not a priority (Google's > > > dm-bow is an example of a case where dm-snapshot may get extended to > > > fulfill google's needs). > > > > I would expect that dm-snapshot will be used quite a lot for > > short-lived snapshots (that only live during a database backup or an > > fsck run). I would hardly call that a "niche use case". > > dm-snapshot is only ~60% performant for 1 snapshot. Try to do > additional snapshots and performance crawls to a stop (though I haven't > reassessed performance in a while). > > dm-snapshot has been in Linux since before 2005, I don't know of all the > users of it -- maybe there are a ton of users who only take a single > temporary snapshot and we're all oblivious. > > Definitely not seeing many bugs against it (but it has been around > forever). I do know that there are relatively few people showing > interest in it. 
But for 4.21 I did stage a couple useful performance > fixes: > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.21&id=61d594bb7e1cf86dca49cbc9524eb80169d9fca6 > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.21&id=d1f7898c7a1b24aa9ae670f9cc21b65e730827eb Hi Mike, Could these two patches be applied to current code of LVM? Although there is a difficult problem as mmap for dm-snapshot with DAX-capable, the two patches can be used for other complex DM targets when trying to implement DAX. [RFC PATCH v2 2/3] dm: expand hc_map in mapped_device for lack of map https://lkml.org/lkml/2018/11/21/273 [RFC PATCH v2 3/3] dm: expand valid types for dm-ioctl https://lkml.org/lkml/2018/11/21/276 Cheers, Huaisheng Ye ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Snapshot target and DAX-capable devices @ 2018-12-18 19:49 ` Mike Snitzer 0 siblings, 0 replies; 83+ messages in thread From: Mike Snitzer @ 2018-12-18 19:49 UTC (permalink / raw) To: Huaisheng HS1 Ye Cc: Huaisheng Ye, Jan Kara, linux-nvdimm@lists.01.org, NingTing Cheng, Dave Chinner, colyli, Christoph Hellwig, dm-devel@redhat.com, Mikulas Patocka, linux-fsdevel@vger.kernel.org, Theodore Y. Ts'o On Fri, Dec 14 2018 at 3:24am -0500, Huaisheng HS1 Ye <yehs1@lenovo.com> wrote: > From: Mike Snitzer <snitzer@redhat.com> > Sent: Thursday, December 13, 2018 6:43 AM > > On Wed, Dec 12 2018 at 4:15pm -0500, > > Theodore Y. Ts'o <tytso@mit.edu> wrote: > > > > > On Wed, Dec 12, 2018 at 12:50:47PM -0500, Mike Snitzer wrote: > > > > On Wed, Dec 12 2018 at 11:12am -0500, > > > > Christoph Hellwig <hch@infradead.org> wrote: > > > > > > > > > Does it really make sense to enhance dm-snapshot? I thought all serious > > > > > users of snapshots had moved on to dm-thinp? > > > > > > > > There are cases where dm-snapshot is still useful for people. But those > > > > are very niche users. I'm not opposed to others proposing enhancements > > > > for dm-snapshot in general but it is definitely not a priority (Google's > > > > dm-bow is an example of a case where dm-snapshot may get extended to > > > > fulfill google's needs). > > > > > > I would expect that dm-snapshot will be used quite a lot for > > > short-lived snapshots (that only live during a database backup or an > > > fsck run). I would hardly call that a "niche use case". > > > > dm-snapshot is only ~60% performant for 1 snapshot. Try to do > > additional snapshots and performance crawls to a stop (though I haven't > > reassessed performance in a while). > > > > dm-snapshot has been in Linux since before 2005, I don't know of all the > > users of it -- maybe there are a ton of users who only take a single > > temporary snapshot and we're all oblivious. 
> > > > Definitely not seeing many bugs against it (but it has been around > > forever). I do know that there are relatively few people showing > > interest in it. But for 4.21 I did stage a couple useful performance > > fixes: > > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.21&id=61d594bb7e1cf86dca49cbc9524eb80169d9fca6 > > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.21&id=d1f7898c7a1b24aa9ae670f9cc21b65e730827eb > > Hi Mike, > > Could these two patches be applied to current code of LVM? > Although there is a difficult problem as mmap for dm-snapshot with > DAX-capable, the two patches can be used for other complex DM targets > when trying to implement DAX. > > [RFC PATCH v2 2/3] dm: expand hc_map in mapped_device for lack of map > https://lkml.org/lkml/2018/11/21/273 > > [RFC PATCH v2 3/3] dm: expand valid types for dm-ioctl > https://lkml.org/lkml/2018/11/21/276 No, I'm not taking these patches. They are hacks that allow DM targets to do things that aren't supportable (yet). ^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Snapshot target and DAX-capable devices 2018-08-30 18:49 ` Mike Snitzer (?) @ 2018-08-30 19:44 ` Mikulas Patocka -1 siblings, 0 replies; 83+ messages in thread From: Mikulas Patocka @ 2018-08-30 19:44 UTC (permalink / raw) To: Mike Snitzer Cc: Jan Kara, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Thu, 30 Aug 2018, Mike Snitzer wrote: > On Thu, Aug 30 2018 at 5:30am -0400, > Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote: > > > On Tue 28-08-18 13:56:30, Mike Snitzer wrote: > > > On Tue, Aug 28 2018 at 3:50am -0400, > > > Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote: > > > > > > > On Mon 27-08-18 16:43:28, Kani, Toshi wrote: > > > > > On Mon, 2018-08-27 at 18:07 +0200, Jan Kara wrote: > > > > > > Hi, > > > > > > > > > > > > I've been analyzing why fstest generic/081 fails when the backing device is > > > > > > capable of DAX. The problem boils down to the failure of: > > > > > > > > > > > > lvm vgcreate -f vg0 /dev/pmem0 > > > > > > lvm lvcreate -L 128M -n lv0 vg0 > > > > > > lvm lvcreate -s -L 4M -n snap0 vg0/lv0 > > > > > > > > > > > > The last command fails like: > > > > > > > > > > > > device-mapper: reload ioctl on (253:0) failed: Invalid argument > > > > > > Failed to lock logical volume vg0/lv0. > > > > > > Aborting. Manual intervention required. > > > > > > > > > > > > And the core of the problem is that volume vg0/lv0 is originally of > > > > > > DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to > > > > > > switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX. > > > > > > The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent > > > > > > DAX mounts if not supported". > > > > > > > > > > > > The question is whether / how this should be fixed. 
The current inability > > > > > > to create snapshots of DAX-capable devices looks weird and the cryptic > > > > > > failure makes it even worse (it took me quite a while to understand what is > > > > > > failing and why). OTOH I see the rationale behind Ross' change as well. > > > > > > > > > > Here are the dm-snap changes that went along with the original DAX > > > > > support. > > > > > > > > > > commit b5ab4a9ba55 > > > > > commit f6e629bd237 > > > > > > > > > > Basically, snapshots can be added/removed to DAX-capable devices, but > > > > > snapshots need to be mounted without dax option. > > > > > > > > Yes, and after these two commits things were working. But then commit > > > > dbc626597 broke things again so currently snapshotting DAX-capable devices > > > > does not work. Just try with 4.18... > > > > > > Commit f6e629bd237 was a nasty hack, and commit dbc626597 exposed it as > > > such. But commit dbc626597 has caused us to regress.. so we need to fix > > > it. > > > > > > We could remove DM_TYPE_DAX_BIO_BASED completely. But in the past I was > > > reluctant to do so because it really is unclear how/if we can even > > > support a device switching from DAX to non-DAX while IO is in-flight. DM > > > supports suspending without flushing (via dmsetup suspend --noflush) and > > > that could really be problematic if we leave DAX IO inflight and then > > > switch the DM table such that the DM device no longer supports DAX. > > > > Well, changing device from DAX-capable to DAX-incapable is problematic for > > filesystem on top of it as well. Filesystems simply don't expect this > > feature of a device can change so they would fail in unexpected ways. Also > > PFNs from the pmem (DAX-capable) device that are already mapped to user page > > tables won't magically become unmapped so those processes will still have > > DAX access to those areas of the device. 
> > > > But, if both original bdev and COW device are DAX-capable, we *should* be > > able to support snapshotting (and refusing mixing of DAX-capable and > > DAX-incapable devices in a snapshot is IMHO not very surprising to users). > > When creating a snapshot of a device, we need to freeze the filesystem > > using it. That will writeprotect all page tables so we are sure we'll get > > page faults (and thus ->direct_access requests from DM POV) for each write > > attempt to any mapping. Then ->direct_access method of snapshot-origin can > > make sure to copy original contents to the COW-device before returning PFN > > from ->direct_access. Similarly ->direct_access of COW-device can provide > > remapped PFN so everything should work seamlessly from user POV. > > > > So something like the above would seem like the best solution from user > > POV. Implementation of the above would not be completely trivial though as > > far as I'm looking into DM code. We'd have to implement ->direct_access > > paths for dm-snap and also I have a vague memory ->direct_access is not > > allowed to sleep these days and DM uses sleeping locks all around... Dan > > should know how big obstacle would it be to reintroduce the sleeping > > possibility (I'm not currently aware of any particular problem with that > > but I'm not paying close attention to those parts of NVDIMM code). > > Thanks for these details Jan. Think Dan is on sabbatical so we'll need > Ross to weigh in. > > As you point out, how are the upper layers (e.g. filesystems) supposed > to reliably cope with this runtime switch from DAX to non-DAX access? > > It does look like we'll need the more elaborate work you outlined > above. It could be that Mikulas will have interest, DAX expertise and > time to do the work. 
> > Restating the issue: 4.18 commit dbc626597 switched > drivers/md/dm-table.c:device_supports_dax() to perform a much more > detailed verification of the device's DAX capabilities by calling > bdev_dax_supported() -- which will actually issue read IO via > dax_direct_access() to validate the DAX support. dm-snapshot-origin's > origin_direct_access() returns -EIO. When trying to create a snapshot > of a DAX enabled linear device, this results in the following error: > kernel: device-mapper: ioctl: can't change device type (old=4 vs new=1) after initial table load. > > This is because the active DM device's table is being switched from > using the linear target to snapshot-origin. Because the corresponding > DM type switches from DM_TYPE_DAX_BIO_BASED to DM_TYPE_BIO_BASED > (again because bdev_dax_supported()'s call to dm-snapshot-origin's > origin_direct_access() returns -EIO). > > In general I _never_ should have taken commit f6e629bd237 ("dm snap: add > fake origin_direct_access"). It gave the illusion that DAX is supported > by dm-snapshot-origin when in reality it simply returns -EIO. Expecting > that this will "just work" because the bio-based path would be used > instead is extremely fragile. > > Until we properly add DAX support to dm-snapshot I'm afraid we really do > need to tolerate this "regression". Since reality is the original > support for snapshot of a DAX DM device never worked in a robust way. > I'm running the risk of making people's heads explode but I cannot just > drop everything and scramble to implement all the required DAX changes > in dm-snapshot. > > Contributions are welcome! > > Mike I think a proper fix would be to add functions such as start_dax(struct block_device *) and stop_dax(struct block_device *). start_dax would be used by a (filesystem or other) driver that intends to use dax - stop_dax would be used when the driver is being unloaded and it no longer needs dax. 
Device mapper would then maintain a counter how many dax users are there and prevent reloading the table if there are any. Do the persistent memory maintainers intend to add such functions? Mikulas ^ permalink raw reply [flat|nested] 83+ messages in thread
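The counter idea proposed in the message above could look roughly like the following user-space sketch. Everything here is an assumption for illustration: model_dev, model_start_dax(), model_stop_dax() and model_reload_table() are hypothetical names, not kernel APIs, and start_dax()/stop_dax() themselves were only proposed in this thread. The sketch follows the proposal literally and refuses any table reload while DAX users exist.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model of a DM device; NOT struct mapped_device. */
struct model_dev {
    int  dax_users;     /* how many users have announced they rely on DAX */
    bool table_is_dax;  /* whether the active table supports DAX */
};

/* Model of the proposed start_dax(): register one DAX user. */
static int model_start_dax(struct model_dev *d)
{
    if (!d->table_is_dax)
        return -EOPNOTSUPP;  /* device cannot do DAX right now */
    d->dax_users++;
    return 0;
}

/* Model of the proposed stop_dax(): drop one DAX user. */
static void model_stop_dax(struct model_dev *d)
{
    assert(d->dax_users > 0);  /* unbalanced stop would be a caller bug */
    d->dax_users--;
}

/* Model of a table reload: refused while any DAX user is registered. */
static int model_reload_table(struct model_dev *d, bool new_table_is_dax)
{
    if (d->dax_users > 0)
        return -EBUSY;
    d->table_is_dax = new_table_is_dax;
    return 0;
}
```

With this shape, a filesystem mounted without the dax option never registers as a user, so loading a snapshot-origin table onto the device would still be possible in that case.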
* Re: Snapshot target and DAX-capable devices 2018-08-30 19:44 ` Mikulas Patocka (?) @ 2018-08-31 10:01 ` Jan Kara -1 siblings, 0 replies; 83+ messages in thread From: Jan Kara @ 2018-08-31 10:01 UTC (permalink / raw) To: Mikulas Patocka Cc: Jan Kara, Mike Snitzer, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Thu 30-08-18 15:44:57, Mikulas Patocka wrote: > On Thu, 30 Aug 2018, Mike Snitzer wrote: > > > On Thu, Aug 30 2018 at 5:30am -0400, > > Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote: > > > > > On Tue 28-08-18 13:56:30, Mike Snitzer wrote: > > > > On Tue, Aug 28 2018 at 3:50am -0400, > > > > Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote: > > > > > > > > > On Mon 27-08-18 16:43:28, Kani, Toshi wrote: > > > > > > On Mon, 2018-08-27 at 18:07 +0200, Jan Kara wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I've been analyzing why fstest generic/081 fails when the backing device is > > > > > > > capable of DAX. The problem boils down to the failure of: > > > > > > > > > > > > > > lvm vgcreate -f vg0 /dev/pmem0 > > > > > > > lvm lvcreate -L 128M -n lv0 vg0 > > > > > > > lvm lvcreate -s -L 4M -n snap0 vg0/lv0 > > > > > > > > > > > > > > The last command fails like: > > > > > > > > > > > > > > device-mapper: reload ioctl on (253:0) failed: Invalid argument > > > > > > > Failed to lock logical volume vg0/lv0. > > > > > > > Aborting. Manual intervention required. > > > > > > > > > > > > > > And the core of the problem is that volume vg0/lv0 is originally of > > > > > > > DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to > > > > > > > switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX. > > > > > > > The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent > > > > > > > DAX mounts if not supported". > > > > > > > > > > > > > > The question is whether / how this should be fixed. 
The current inability > > > > > > > to create snapshots of DAX-capable devices looks weird and the cryptic > > > > > > > failure makes it even worse (it took me quite a while to understand what is > > > > > > > failing and why). OTOH I see the rationale behind Ross' change as well. > > > > > > > > > > > > Here are the dm-snap changes that went along with the original DAX > > > > > > support. > > > > > > > > > > > > commit b5ab4a9ba55 > > > > > > commit f6e629bd237 > > > > > > > > > > > > Basically, snapshots can be added/removed to DAX-capable devices, but > > > > > > snapshots need to be mounted without dax option. > > > > > > > > > > Yes, and after these two commits things were working. But then commit > > > > > dbc626597 broke things again so currently snapshotting DAX-capable devices > > > > > does not work. Just try with 4.18... > > > > > > > > Commit f6e629bd237 was a nasty hack, and commit dbc626597 exposed it as > > > > such. But commit dbc626597 has caused us to regress.. so we need to fix > > > > it. > > > > > > > > We could remove DM_TYPE_DAX_BIO_BASED completely. But in the past I was > > > > reluctant to do so because it really is unclear how/if we can even > > > > support a device switching from DAX to non-DAX while IO is in-flight. DM > > > > supports suspending without flushing (via dmsetup suspend --noflush) and > > > > that could really be problematic if we leave DAX IO inflight and then > > > > switch the DM table such that the DM device no longer supports DAX. > > > > > > Well, changing device from DAX-capable to DAX-incapable is problematic for > > > filesystem on top of it as well. Filesystems simply don't expect this > > > feature of a device can change so they would fail in unexpected ways. Also > > > PFNs from the pmem (DAX-capable) device that are already mapped to user page > > > tables won't magically become unmapped so those processes will still have > > > DAX access to those areas of the device. 
> > > > > > But, if both original bdev and COW device are DAX-capable, we *should* be > > > able to support snapshotting (and refusing mixing of DAX-capable and > > > DAX-incapable devices in a snapshot is IMHO not very surprising to users). > > > When creating a snapshot of a device, we need to freeze the filesystem > > > using it. That will writeprotect all page tables so we are sure we'll get > > > page faults (and thus ->direct_access requests from DM POV) for each write > > > attempt to any mapping. Then ->direct_access method of snapshot-origin can > > > make sure to copy original contents to the COW-device before returning PFN > > > from ->direct_access. Similarly ->direct_access of COW-device can provide > > > remapped PFN so everything should work seamlessly from user POV. > > > > > > So something like the above would seem like the best solution from user > > > POV. Implementation of the above would not be completely trivial though as > > > far as I'm looking into DM code. We'd have to implement ->direct_access > > > paths for dm-snap and also I have a vague memory ->direct_access is not > > > allowed to sleep these days and DM uses sleeping locks all around... Dan > > > should know how big obstacle would it be to reintroduce the sleeping > > > possibility (I'm not currently aware of any particular problem with that > > > but I'm not paying close attention to those parts of NVDIMM code). > > > > Thanks for these details Jan. Think Dan is on sabbatical so we'll need > > Ross to weigh in. > > > > As you point out, how are the upper layers (e.g. filesystems) supposed > > to reliably cope with this runtime switch from DAX to non-DAX access? > > > > It does look like we'll need the more elaborate work you outlined > > above. It could be that Mikulas will have interest, DAX expertise and > > time to do the work. 
> > > > Restating the issue: 4.18 commit dbc626597 switched > > drivers/md/dm-table.c's device_supports_dax() to perform a much more > > detailed verification of the device's DAX capabilities by calling > > bdev_dax_supported() -- which will actually issue read IO via > > dax_direct_access() to validate the DAX support. dm-snapshot-origin's > > origin_direct_access() returns -EIO. When trying to create a snapshot > > of a DAX enabled linear device, this results in the following error: > > kernel: device-mapper: ioctl: can't change device type (old=4 vs new=1) after initial table load. > > > > This is because the active DM device's table is being switched from > > using the linear target to snapshot-origin. Because the corresponding > > DM type switches from DM_TYPE_DAX_BIO_BASED to DM_TYPE_BIO_BASED > > (again because bdev_dax_supported()'s call to dm-snapshot-origin's > > origin_direct_access() returns -EIO). > > > > In general I _never_ should have taken commit f6e629bd237 ("dm snap: add > > fake origin_direct_access"). It gave the illusion that DAX is supported > > by dm-snapshot-origin when in reality it simply returns -EIO. Expecting > > that this will "just work" because the bio-based path would be used > > instead is extremely fragile. > > > > Until we properly add DAX support to dm-snapshot I'm afraid we really do > > need to tolerate this "regression". Since reality is the original > > support for snapshot of a DAX DM device never worked in a robust way. > > I'm running the risk of making people's heads explode but I cannot just > > drop everything and scramble to implement all the required DAX changes > > in dm-snapshot. > > > > Contributions are welcome! > > > > Mike > > I think a proper fix would be to add functions such as start_dax(struct > block_device *) and stop_dax(struct block_device *). 
> > start_dax would be used by a (filesystem or other) driver that intends to > use dax - stop_dax would be used when the driver is being unloaded and it > no longer needs dax. Device mapper would then maintain a counter how many > dax users are there and prevent reloading the table if there are any. > > Do the persistent memory maintainers intend to add such functions? So that would be a quick way of at least somehow supporting snapshots for dax-capable devices. Actually these "start_dax / stop_dax" functions already exist in filesystems - they are fs_dax_get_by_bdev() and fs_put_dax(). So by plumbing these two calls down into the block layer, you could easily get the functionality you want. I'm fine with that as a short term solution for the regression. Longer term I'd like to see something like we've outlined with Dave to be implemented but obviously that's more work and also on fs / DAX side, not only DM. Honza -- Jan Kara <jack-IBi9RG/b67k@public.gmane.org> SUSE Labs, CR ^ permalink raw reply [flat|nested] 83+ messages in thread
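
The counter idea discussed in the message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the class `DaxDevice` and its methods are hypothetical names chosen for illustration (`start_dax` / `stop_dax` follow the names proposed in the thread, they are not existing kernel functions), and the model ignores locking, suspend/resume and everything else a real DM implementation would need. It shows only the rule "a table reload is refused while any DAX user holds the device".

```python
class DaxDevice:
    """Hypothetical userspace model of a DM device that counts DAX users."""

    def __init__(self, table, dax_capable):
        self.table = table              # name of the currently loaded target
        self.dax_capable = dax_capable  # whether the current table supports DAX
        self.dax_users = 0              # number of active DAX users

    def start_dax(self):
        """Called by a user (e.g. a filesystem) that intends to use DAX."""
        if not self.dax_capable:
            raise OSError("device does not support DAX")
        self.dax_users += 1

    def stop_dax(self):
        """Called when the user no longer needs DAX."""
        if self.dax_users == 0:
            raise RuntimeError("stop_dax without matching start_dax")
        self.dax_users -= 1

    def reload_table(self, table, dax_capable):
        """Refuse any table reload while DAX users exist; return success."""
        if self.dax_users > 0:
            return False
        self.table = table
        self.dax_capable = dax_capable
        return True


# Usage: a DAX user pins the device, so the switch to snapshot-origin is
# refused until the user is gone.
lv0 = DaxDevice("linear", dax_capable=True)
lv0.start_dax()
refused = lv0.reload_table("snapshot-origin", dax_capable=False)
lv0.stop_dax()
accepted = lv0.reload_table("snapshot-origin", dax_capable=False)
```

In this model a filesystem mounted with the dax option would keep the count above zero for the lifetime of the mount, so snapshot creation would fail cleanly instead of changing the device type underneath the filesystem; a mount without dax would leave the count at zero and the reload would go through.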
* Re: Snapshot target and DAX-capable devices 2018-08-30 18:49 ` Mike Snitzer (?) @ 2018-08-30 22:55 ` Dave Chinner -1 siblings, 0 replies; 83+ messages in thread From: Dave Chinner @ 2018-08-30 22:55 UTC (permalink / raw) To: Mike Snitzer Cc: Jan Kara, linux-nvdimm@lists.01.org, dm-devel@redhat.com, Mikulas Patocka, linux-fsdevel@vger.kernel.org

On Thu, Aug 30, 2018 at 02:49:07PM -0400, Mike Snitzer wrote:
> On Thu, Aug 30 2018 at 5:30am -0400, Jan Kara <jack@suse.cz> wrote:
> > Well, changing device from DAX-capable to DAX-incapable is problematic for
> > filesystem on top of it as well. Filesystems simply don't expect this
> > feature of a device can change so they would fail in unexpected ways. Also
> > PFNs from the pmem (DAX-capable) device that are already mapped to user page
> > tables won't magically become unmapped so those processes will still have
> > DAX access to those areas of the device.
....
> As you point out, how are the upper layers (e.g. filesystems) supposed
> to reliably cope with this runtime switch from DAX to non-DAX access?

They can't right now. There are unsolved races between page faults, invalidations and changing the file operations to/from DAX dynamically. This is the entire problem facing the dynamic per-inode DAX on/off flag - if it happens globally to the filesystem without warning, then the filesystem is screwed.

To support the block device changing between DAX and non-DAX dynamically, the filesystem needs to first invalidate the entire filesystem cache, eject all cached inodes from memory, any cached metadata that is using DAX, etc to clear out all the DAX mappings it has. And it has to do it without racing with new page faults or IO that might map new DAX pages.

And I'm ignoring the fact that we can't eject referenced inodes (i.e. open files) from the inode cache and so we currently cannot safely change the DAX state on such files. That's a blocker right now.

Once we can safely change the DAX state of open files, we've got to co-ordinate the block device state change with the filesystem - the filesystem-wide invalidation has to be done before the block device can start the change of state, and the filesystem must remain completely stopped until the block device has completed its change of state.

So AFAICT this ends up being "stop the world instantly, eject the world from memory, rebuild the world from scratch, start the world again". Freezing the filesystem doesn't stop the world - we can still do read IO and page faults, so that doesn't prevent pagefault races with the invalidation leaving DAX references in the page cache. Hence we currently have no valid "stop the world" mechanism in the kernel other than unmount, which we can't do while there are open files.

What about MAP_SYNC applications? If we turn off DAX with those applications still running, we silently break them and users won't know until the system loses power and they see data corruption after the system comes back. However, applications SEGVing unpredictably because of "transparent" storage state changes is almost as unfriendly.

Dynamically changing block device DAX support seems like a non-starter to me. At least, it's a non-starter until we add a lot more infrastructure, solve a bunch of really hard problems and define how active userspace controlled DAX-only features behave when DAX is no longer available...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply [flat|nested] 83+ messages in thread
* Re: Snapshot target and DAX-capable devices 2018-08-30 18:49 ` Mike Snitzer (?) @ 2018-08-31 9:54 ` Jan Kara -1 siblings, 0 replies; 83+ messages in thread From: Jan Kara @ 2018-08-31 9:54 UTC (permalink / raw) To: Mike Snitzer Cc: Jan Kara, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Mikulas Patocka, Ross Zwisler, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Thu 30-08-18 14:49:07, Mike Snitzer wrote: > On Thu, Aug 30 2018 at 5:30am -0400, > Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote: > > > On Tue 28-08-18 13:56:30, Mike Snitzer wrote: > > > On Tue, Aug 28 2018 at 3:50am -0400, > > > Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote: > > > > > > > On Mon 27-08-18 16:43:28, Kani, Toshi wrote: > > > > > On Mon, 2018-08-27 at 18:07 +0200, Jan Kara wrote: > > > > > > Hi, > > > > > > > > > > > > I've been analyzing why fstest generic/081 fails when the backing device is > > > > > > capable of DAX. The problem boils down to the failure of: > > > > > > > > > > > > lvm vgcreate -f vg0 /dev/pmem0 > > > > > > lvm lvcreate -L 128M -n lv0 vg0 > > > > > > lvm lvcreate -s -L 4M -n snap0 vg0/lv0 > > > > > > > > > > > > The last command fails like: > > > > > > > > > > > > device-mapper: reload ioctl on (253:0) failed: Invalid argument > > > > > > Failed to lock logical volume vg0/lv0. > > > > > > Aborting. Manual intervention required. > > > > > > > > > > > > And the core of the problem is that volume vg0/lv0 is originally of > > > > > > DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to > > > > > > switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX. > > > > > > The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent > > > > > > DAX mounts if not supported". > > > > > > > > > > > > The question is whether / how this should be fixed. 
The current inability > > > > > > to create snapshots of DAX-capable devices looks weird and the cryptic > > > > > > failure makes it even worse (it took me quite a while to understand what is > > > > > > failing and why). OTOH I see the rationale behind Ross' change as well. > > > > > > > > > > Here are the dm-snap changes that went along with the original DAX > > > > > support. > > > > > > > > > > commit b5ab4a9ba55 > > > > > commit f6e629bd237 > > > > > > > > > > Basically, snapshots can be added/removed to DAX-capable devices, but > > > > > snapshots need to be mounted without dax option. > > > > > > > > Yes, and after these two commits things were working. But then commit > > > > dbc626597 broke things again so currently snapshotting DAX-capable devices > > > > does not work. Just try with 4.18... > > > > > > Commit f6e629bd237 was a nasty hack, and commit dbc626597 exposed it as > > > such. But commit dbc626597 has caused us to regress.. so we need to fix > > > it. > > > > > > We could remove DM_TYPE_DAX_BIO_BASED completely. But in the past I was > > > reluctant to do so because it really is unclear how/if we can even > > > support a device switching from DAX to non-DAX while IO is in-flight. DM > > > supports suspending without flushing (via dmsetup suspend --noflush) and > > > that could really be problematic if we leave DAX IO inflight and then > > > switch the DM table such that the DM device no longer supports DAX. > > > > Well, changing device from DAX-capable to DAX-incapable is problematic for > > filesystem on top of it as well. Filesystems simply don't expect this > > feature of a device can change so they would fail in unexpected ways. Also > > PFNs from the pmem (DAX-capable) device that are already mapped to user page > > tables won't magically become unmapped so those processes will still have > > DAX access to those areas of the device. 
> > > > But, if both original bdev and COW device are DAX-capable, we *should* be > > able to support snapshotting (and refusing mixing of DAX-capable and > > DAX-incapable devices in a snapshot is IMHO not very surprising to users). > > When creating a snapshot of a device, we need to freeze the filesystem > > using it. That will writeprotect all page tables so we are sure we'll get > > page faults (and thus ->direct_access requests from DM POV) for each write > > attempt to any mapping. Then ->direct_access method of snapshot-origin can > > make sure to copy original contents to the COW-device before returning PFN > > from ->direct_access. Similarly ->direct_access of COW-device can provide > > remapped PFN so everything should work seamlessly from user POV. > > > > So something like the above would seem like the best solution from user > > POV. Implementation of the above would not be completely trivial though as > > far as I'm looking into DM code. We'd have to implement ->direct_access > > paths for dm-snap and also I have a vague memory ->direct_access is not > > allowed to sleep these days and DM uses sleeping locks all around... Dan > > should know how big an obstacle it would be to reintroduce the sleeping > > possibility (I'm not currently aware of any particular problem with that > > but I'm not paying close attention to those parts of NVDIMM code). > > Thanks for these details Jan. Think Dan is on sabbatical so we'll need > Ross to weigh in. Ross was on vacation as well and I didn't get any email from him for a few weeks. I'm not sure when he'll be back. So I guess we are on our own. > As you point out, how are the upper layers (e.g. filesystems) supposed > to reliably cope with this runtime switch from DAX to non-DAX access? As Dave wrote, switching of underlying device from DAX to non-DAX would be very difficult to implement currently so that's IMHO a no-go. > It does look like we'll need the more elaborate work you outlined > above. 
It could be that Mikulas will have interest, DAX expertise and > time to do the work. OK, thanks for including him. > In general I _never_ should have taken commit f6e629bd237 ("dm snap: add > fake origin_direct_access"). It gave the illusion that DAX is supported > by dm-snapshot-origin when in reality it simply returns -EIO. Expecting > that this will "just work" because the bio-based path would be used > instead is extremely fragile. > > Until we properly add DAX support to dm-snapshot I'm afraid we really do > need to tolerate this "regression". Since reality is the original > support for snapshot of a DAX DM device never worked in a robust way. > I'm running the risk of making people's heads explode but I cannot just > drop everything and scramble to implement all the required DAX changes > in dm-snapshot. Yeah, I don't think the regression is critical. I just wanted to point out that it exists and that we should look into fixing it... Honza -- Jan Kara <jack-IBi9RG/b67k@public.gmane.org> SUSE Labs, CR ^ permalink raw reply [flat|nested] 83+ messages in thread
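[Editorial sketch] The copy-before-write scheme quoted in the message above (freeze, write-protect all mappings, then have ->direct_access of snapshot-origin preserve the original contents before handing out a writable PFN) can be modelled in userspace roughly as follows. This is a toy model under stated assumptions, not kernel or device-mapper code: the class ToySnapshotOrigin, its methods, and the 4-byte chunk size are all hypothetical names chosen for illustration, and the sleeping-lock problem raised in the thread is not modelled at all.

```python
CHUNK = 4  # toy chunk size in bytes (assumption; real snapshot chunks are far larger)


class ToySnapshotOrigin:
    """Userspace model of copy-before-write on a write fault (not kernel code)."""

    def __init__(self, data: bytes):
        # The snapshot is considered taken at construction time: the freeze has
        # write-protected every mapping, so no chunk is writable yet.
        self.origin = bytearray(data)  # stands in for the DAX-capable origin device
        self.cow = {}                  # chunk index -> preserved original bytes
        self.writable = set()          # chunks whose mapping is writable again

    def direct_access(self, chunk: int) -> memoryview:
        # Hypothetical stand-in for the path taken on a write fault:
        # preserve the original chunk first, then hand out the mapping.
        start = chunk * CHUNK
        if chunk not in self.cow:
            self.cow[chunk] = bytes(self.origin[start:start + CHUNK])
        self.writable.add(chunk)
        return memoryview(self.origin)[start:start + CHUNK]

    def store(self, offset: int, value: int) -> None:
        chunk = offset // CHUNK
        if chunk in self.writable:
            # Mapping is already writable: the store proceeds without a fault.
            self.origin[offset] = value
        else:
            # First store after the freeze faults and goes through direct_access.
            view = self.direct_access(chunk)
            view[offset % CHUNK] = value

    def read_snapshot(self, offset: int) -> int:
        # The snapshot sees preserved bytes where a copy exists, origin otherwise.
        chunk = offset // CHUNK
        if chunk in self.cow:
            return self.cow[chunk][offset % CHUNK]
        return self.origin[offset]
```

In this model only the first store to a chunk pays for the copy; later stores to the same chunk modify the origin directly, which is the property that makes the write-protect step necessary in the first place.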
* Re: [dm-devel] Snapshot target and DAX-capable devices 2018-08-30 9:30 ` Jan Kara (?) @ 2018-08-30 19:17 ` Jeff Moyer -1 siblings, 0 replies; 83+ messages in thread From: Jeff Moyer @ 2018-08-30 19:17 UTC (permalink / raw) To: Jan Kara Cc: Mike Snitzer, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org, dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> writes: > On Tue 28-08-18 13:56:30, Mike Snitzer wrote: >> On Tue, Aug 28 2018 at 3:50am -0400, >> Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org> wrote: >> >> > On Mon 27-08-18 16:43:28, Kani, Toshi wrote: >> > > On Mon, 2018-08-27 at 18:07 +0200, Jan Kara wrote: >> > > > Hi, >> > > > >> > > > I've been analyzing why fstest generic/081 fails when the backing device is >> > > > capable of DAX. The problem boils down to the failure of: >> > > > >> > > > lvm vgcreate -f vg0 /dev/pmem0 >> > > > lvm lvcreate -L 128M -n lv0 vg0 >> > > > lvm lvcreate -s -L 4M -n snap0 vg0/lv0 >> > > > >> > > > The last command fails like: >> > > > >> > > > device-mapper: reload ioctl on (253:0) failed: Invalid argument >> > > > Failed to lock logical volume vg0/lv0. >> > > > Aborting. Manual intervention required. >> > > > >> > > > And the core of the problem is that volume vg0/lv0 is originally of >> > > > DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to >> > > > switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX. >> > > > The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent >> > > > DAX mounts if not supported". >> > > > >> > > > The question is whether / how this should be fixed. The current inability >> > > > to create snapshots of DAX-capable devices looks weird and the cryptic >> > > > failure makes it even worse (it took me quite a while to understand what is >> > > > failing and why). OTOH I see the rationale behind Ross' change as well. 
>> > > >> > > Here are the dm-snap changes that went along with the original DAX >> > > support. >> > > >> > > commit b5ab4a9ba55 >> > > commit f6e629bd237 >> > > >> > > Basically, snapshots can be added/removed to DAX-capable devices, but >> > > snapshots need to be mounted without dax option. >> > >> > Yes, and after these two commits things were working. But then commit >> > dbc626597 broke things again so currently snapshotting DAX-capable devices >> > does not work. Just try with 4.18... >> >> Commit f6e629bd237 was a nasty hack, and commit dbc626597 exposed it as >> such. But commit dbc626597 has caused us to regress.. so we need to fix >> it. >> >> We could remove DM_TYPE_DAX_BIO_BASED completely. But in the past I was >> reluctant to do so because it really is unclear how/if we can even >> support a device switching from DAX to non-DAX while IO is in-flight. DM >> supports suspending without flushing (via dmsetup suspend --noflush) and >> that could really be problematic if we leave DAX IO inflight and then >> switch the DM table such that the DM device no longer supports DAX. > > Well, changing device from DAX-capable to DAX-incapable is problematic for > filesystem on top of it as well. Filesystems simply don't expect this > feature of a device can change so they would fail in unexpected ways. Also > PFNs from the pmem (DAX-capable) device that are already mapped to user page > tables won't magically become unmapped so those processes will still have > DAX access to those areas of the device. > > But, if both original bdev and COW device are DAX-capable, we *should* be > able to support snapshotting (and refusing mixing of DAX-capable and > DAX-incapable devices in a snapshot is IMHO not very surprising to users). > When creating a snapshot of a device, we need to freeze the filesystem > using it. 
That will writeprotect all page tables so we are sure we'll get > page faults (and thus ->direct_access requests from DM POV) for each write > attempt to any mapping. Then ->direct_access method of snapshot-origin can > make sure to copy original contents to the COW-device before returning PFN > from ->direct_access. Similarly ->direct_access of COW-device can provide > remapped PFN so everything should work seamlessly from user POV. In your example above, if two processes have a file mapped with MAP_SHARED, and P1 does a store, the new contents will not be reflected in P2, right? This is different from what is expected, and different from what happens when the page cache is involved. I think you'd need to unmap all mappings on a CoW, whether triggered by a store to an existing mapping or a write(2). -Jeff ^ permalink raw reply [flat|nested] 83+ messages in thread
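[Editorial sketch] The MAP_SHARED behaviour that the message above calls "what is expected" can be checked from userspace with a few lines. This is only a minimal illustration of the baseline semantics on an ordinary temporary file (page cache or tmpfs, not DAX), and it uses two mappings inside one process as stand-ins for P1 and P2; the function name shared_store_is_visible is made up for this sketch. It assumes a Unix-like system, since the flags argument of Python's mmap is Unix-only.

```python
import mmap
import tempfile


def shared_store_is_visible() -> bool:
    """Store through one MAP_SHARED mapping and read it back through another."""
    with tempfile.TemporaryFile() as f:
        # Give the file one page of backing data so it can be mapped.
        f.write(b"\0" * mmap.PAGESIZE)
        f.flush()
        # Two independent shared mappings of the same file stand in for P1 and P2.
        p1 = mmap.mmap(f.fileno(), mmap.PAGESIZE, flags=mmap.MAP_SHARED)
        p2 = mmap.mmap(f.fileno(), mmap.PAGESIZE, flags=mmap.MAP_SHARED)
        try:
            p1[0] = 0x42          # "P1" does a store
            return p2[0] == 0x42  # "P2" is expected to observe it
        finally:
            p1.close()
            p2.close()
```

Any DAX snapshot design would have to preserve this property, which is why the message suggests unmapping all existing mappings when a CoW is triggered.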
* Re: [dm-devel] Snapshot target and DAX-capable devices
  2018-08-30 19:17 ` Jeff Moyer
  (?)
@ 2018-08-31 9:14 ` Jan Kara
  -1 siblings, 0 replies; 83+ messages in thread
From: Jan Kara @ 2018-08-31 9:14 UTC
To: Jeff Moyer
Cc: Jan Kara, Mike Snitzer, linux-nvdimm@lists.01.org, dm-devel@redhat.com, linux-fsdevel@vger.kernel.org

On Thu 30-08-18 15:17:16, Jeff Moyer wrote:
> Jan Kara <jack@suse.cz> writes:
>
> > On Tue 28-08-18 13:56:30, Mike Snitzer wrote:
> >> On Tue, Aug 28 2018 at 3:50am -0400,
> >> Jan Kara <jack@suse.cz> wrote:
> >>
> >> > On Mon 27-08-18 16:43:28, Kani, Toshi wrote:
> >> > > On Mon, 2018-08-27 at 18:07 +0200, Jan Kara wrote:
> >> > > > Hi,
> >> > > >
> >> > > > I've been analyzing why fstest generic/081 fails when the backing device is
> >> > > > capable of DAX. The problem boils down to the failure of:
> >> > > >
> >> > > > lvm vgcreate -f vg0 /dev/pmem0
> >> > > > lvm lvcreate -L 128M -n lv0 vg0
> >> > > > lvm lvcreate -s -L 4M -n snap0 vg0/lv0
> >> > > >
> >> > > > The last command fails like:
> >> > > >
> >> > > > device-mapper: reload ioctl on (253:0) failed: Invalid argument
> >> > > > Failed to lock logical volume vg0/lv0.
> >> > > > Aborting. Manual intervention required.
> >> > > >
> >> > > > And the core of the problem is that volume vg0/lv0 is originally of
> >> > > > DM_TYPE_DAX_BIO_BASED type but when the snapshot gets created, we try to
> >> > > > switch it to DM_TYPE_BIO_BASED because now the device stops supporting DAX.
> >> > > > The problem seems to be introduced by Ross' commit dbc626597 "dm: prevent
> >> > > > DAX mounts if not supported".
> >> > > >
> >> > > > The question is whether / how this should be fixed. The current inability
> >> > > > to create snapshots of DAX-capable devices looks weird and the cryptic
> >> > > > failure makes it even worse (it took me quite a while to understand what is
> >> > > > failing and why). OTOH I see the rationale behind Ross' change as well.
> >> > >
> >> > > Here are the dm-snap changes that went along with the original DAX
> >> > > support.
> >> > >
> >> > > commit b5ab4a9ba55
> >> > > commit f6e629bd237
> >> > >
> >> > > Basically, snapshots can be added/removed to DAX-capable devices, but
> >> > > snapshots need to be mounted without dax option.
> >> >
> >> > Yes, and after these two commits things were working. But then commit
> >> > dbc626597 broke things again so currently snapshotting DAX-capable devices
> >> > does not work. Just try with 4.18...
> >>
> >> Commit f6e629bd237 was a nasty hack, and commit dbc626597 exposed it as
> >> such. But commit dbc626597 has caused us to regress.. so we need to fix
> >> it.
> >>
> >> We could remove DM_TYPE_DAX_BIO_BASED completely. But in the past I was
> >> reluctant to do so because it really is unclear how/if we can even
> >> support a device switching from DAX to non-DAX while IO is in-flight. DM
> >> supports suspending without flushing (via dmsetup suspend --noflush) and
> >> that could really be problematic if we leave DAX IO inflight and then
> >> switch the DM table such that the DM device no longer supports DAX.
> >
> > Well, changing device from DAX-capable to DAX-incapable is problematic for
> > filesystem on top of it as well. Filesystems simply don't expect this
> > feature of a device can change so they would fail in unexpected ways. Also
> > PFNs from the pmem (DAX-capable) device that are already mapped to user page
> > tables won't magically become unmapped so those processes will still have
> > DAX access to those areas of the device.
> >
> > But, if both original bdev and COW device are DAX-capable, we *should* be
> > able to support snapshotting (and refusing mixing of DAX-capable and
> > DAX-incapable devices in a snapshot is IMHO not very surprising to users).
> > When creating a snapshot of a device, we need to freeze the filesystem
> > using it. That will writeprotect all page tables so we are sure we'll get
> > page faults (and thus ->direct_access requests from DM POV) for each write
> > attempt to any mapping. Then ->direct_access method of snapshot-origin can
> > make sure to copy original contents to the COW-device before returning PFN
> > from ->direct_access. Similarly ->direct_access of COW-device can provide
> > remapped PFN so everything should work seamlessly from user POV.
>
> In your example above, if two processes have a file mapped with
> MAP_SHARED, and P1 does a store, the new contents will not be reflected
> in P2, right?. This is different from what is expected, and different
> from what happens when the page cache is involved.
>
> I think you'd need to unmap all mappings on a CoW, whether triggered by
> a store to an existing mapping or a write(2).

Yes, you are right. For COW-device we need to unmap all DAX mappings before
doing CoW. But for snapshot-origin device, we don't need that, right? As for
that case no block actually changes location. So there notification to DM on
first write access should be enough. Am I understanding the problem right?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
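[Editorial sketch, not part of the original message.] The distinction Jan draws between the snapshot-origin device and the COW device can be modelled with a toy example. Everything here is an assumption made for illustration: `ToySnapshot`, its methods, and the `("origin", n)` / `("cow", n)` tuples standing in for PFNs are hypothetical and do not correspond to any device-mapper API. The sketch only shows why a write on the origin leaves existing mappings valid (the block keeps its location), while a write on the snapshot moves the block and leaves any mapping that still holds the old location stale.

```python
class ToySnapshot:
    """Toy model only: tuples stand in for PFNs, a list and dict for devices."""

    def __init__(self, origin_blocks):
        self.origin = list(origin_blocks)  # contents of the origin device
        self.cow = {}                      # chunk index -> block in the COW store

    def origin_location(self, chunk):
        # The origin never relocates a block.
        return ("origin", chunk)

    def snapshot_location(self, chunk):
        # The snapshot reads through to the origin until the chunk is copied.
        return ("cow", chunk) if chunk in self.cow else ("origin", chunk)

    def write_origin(self, chunk, data):
        # Preserve the old contents for the snapshot before the first overwrite.
        if chunk not in self.cow:
            self.cow[chunk] = self.origin[chunk]
        self.origin[chunk] = data

    def write_snapshot(self, chunk, data):
        # The write lands in the COW store, so the chunk changes location.
        self.cow[chunk] = data

    def read(self, location):
        kind, chunk = location
        return self.origin[chunk] if kind == "origin" else self.cow[chunk]


snap = ToySnapshot(["A", "B"])

# Origin side: the location is stable across the write.
origin_before = snap.origin_location(0)
snap.write_origin(0, "A2")
origin_after = snap.origin_location(0)
snapshot_view_of_0 = snap.read(snap.snapshot_location(0))

# Snapshot side: a mapping established before the write holds a stale location.
stale = snap.snapshot_location(1)
snap.write_snapshot(1, "B2")
fresh = snap.snapshot_location(1)
```

In this model a second mapper of the snapshot that still holds `stale` keeps reading the old contents, which matches the reason given above for unmapping all DAX mappings before CoW on the COW device. On the origin side the location never changes, so a notification on first write is enough for the model to preserve the snapshot's view.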
end of thread, other threads:[~2018-12-18 19:49 UTC | newest]
Thread overview: 83+ messages
2018-08-27 16:07 Snapshot target and DAX-capable devices Jan Kara
2018-08-27 16:43 ` Kani, Toshi
2018-08-28 7:50 ` Jan Kara
2018-08-28 17:56 ` Mike Snitzer
2018-08-28 22:38 ` Kani, Toshi
2018-08-30 9:30 ` Jan Kara
2018-08-30 18:49 ` Mike Snitzer
2018-08-30 19:32 ` Jeff Moyer
2018-08-30 19:47 ` Mikulas Patocka
2018-08-30 19:53 ` Jeff Moyer
2018-08-30 23:38 ` Dave Chinner
2018-08-31 9:42 ` Jan Kara
2018-09-05 1:25 ` Dave Chinner
2018-12-12 16:11 ` Huaisheng Ye
2018-12-12 16:12 ` Christoph Hellwig
2018-12-12 17:50 ` Mike Snitzer
2018-12-12 19:49 ` Kani, Toshi
2018-12-12 21:15 ` Theodore Y. Ts'o
2018-12-12 22:43 ` Mike Snitzer
2018-12-14 4:11 ` [dm-devel] " Theodore Y. Ts'o
2018-12-14 8:24 ` [External] " Huaisheng HS1 Ye
2018-12-18 19:49 ` Mike Snitzer
2018-08-30 19:44 ` Mikulas Patocka
2018-08-31 10:01 ` Jan Kara
2018-08-30 22:55 ` Dave Chinner
2018-08-31 9:54 ` Jan Kara
2018-08-30 19:17 ` [dm-devel] " Jeff Moyer
2018-08-31 9:14 ` Jan Kara