* Re: [dm-devel] [patch 4/4] dm-writecache: use new API for flushing [not found] ` <20180519052635.567438191@debian.vm> @ 2018-05-22 6:39 ` Christoph Hellwig 2018-05-22 18:41 ` Mike Snitzer [not found] ` <CAPcyv4iEtfuVGPR0QMKcafv2XFwSj3nzxjX8cuXpXe00akAvYA@mail.gmail.com> 1 sibling, 1 reply; 40+ messages in thread From: Christoph Hellwig @ 2018-05-22 6:39 UTC (permalink / raw) To: Mikulas Patocka; +Cc: Mike Snitzer, dm-devel, linux-nvdimm On Sat, May 19, 2018 at 07:25:07AM +0200, Mikulas Patocka wrote: > Use new API for flushing persistent memory. The sentence doesnt make much sense. 'A new API', 'A better abstraction' maybe? > > The problem is this: > * on X86-64, non-temporal stores have the best performance > * ARM64 doesn't have non-temporal stores, so we must flush cache. We > should flush cache as late as possible, because it performs better this > way. > > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit > data persistently, all three functions must be called. > > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. > > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and > pmem_commit is wmb. > > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and > pmem_commit is empty. All these should be provided by the pmem layer, and be properly documented. And be sorted before adding your new target that uses them. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
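For orientation, a minimal sketch of how the four helpers described in the quoted commit message could map onto primitives the kernel already exports (memcpy_flushcache(), arch_wb_cache_pmem(), wmb()). The bodies below are illustrative only and are not the actual dm-writecache code:

#include <linux/string.h>	/* memcpy(), memcpy_flushcache() */
#include <linux/libnvdimm.h>	/* arch_wb_cache_pmem() */
#include <asm/barrier.h>	/* wmb() */

#if defined(CONFIG_X86_64)
/* Non-temporal stores bypass the CPU cache, so there is nothing left to
 * flush later; only a store fence is needed at commit time. */
static inline void pmem_memcpy(void *dest, const void *src, size_t len)
{
	memcpy_flushcache(dest, src, len);
}
static inline void pmem_flush(void *addr, size_t len) { }
static inline void pmem_commit(void) { wmb(); }
#else
/* Cached stores now; the cache is written back as late as possible via
 * pmem_flush().  As in the commit message, arch_wb_cache_pmem() is assumed
 * to include whatever barrier it needs, so pmem_commit() is empty. */
static inline void pmem_memcpy(void *dest, const void *src, size_t len)
{
	memcpy(dest, src, len);
}
static inline void pmem_flush(void *addr, size_t len)
{
	arch_wb_cache_pmem(addr, len);
}
static inline void pmem_commit(void) { }
#endif

/* The commit message requires 8-byte values stored through pmem_assign()
 * to be written atomically; how each architecture guarantees that is not
 * shown here. */
#define pmem_assign(dest, src)					\
do {								\
	typeof(dest) uniq = (src);				\
	pmem_memcpy(&(dest), &uniq, sizeof(dest));		\
} while (0)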
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 6:39 ` [dm-devel] [patch 4/4] dm-writecache: use new API for flushing Christoph Hellwig @ 2018-05-22 18:41 ` Mike Snitzer 2018-05-22 19:00 ` Dan Williams 0 siblings, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-22 18:41 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Mikulas Patocka, dm-devel, linux-nvdimm On Tue, May 22 2018 at 2:39am -0400, Christoph Hellwig <hch@infradead.org> wrote: > On Sat, May 19, 2018 at 07:25:07AM +0200, Mikulas Patocka wrote: > > Use new API for flushing persistent memory. > > The sentence doesnt make much sense. 'A new API', 'A better > abstraction' maybe? > > > > > The problem is this: > > * on X86-64, non-temporal stores have the best performance > > * ARM64 doesn't have non-temporal stores, so we must flush cache. We > > should flush cache as late as possible, because it performs better this > > way. > > > > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit > > data persistently, all three functions must be called. > > > > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign > > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. > > > > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and > > pmem_commit is wmb. > > > > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and > > pmem_commit is empty. > > All these should be provided by the pmem layer, and be properly > documented. And be sorted before adding your new target that uses > them. I don't see that as a hard requirement. Mikulas did the work to figure out what is more optimal on x86_64 vs amd64. It makes a difference for his target and that is sufficient to carry it locally until/when it is either elevated to pmem. We cannot even get x86 and swait maintainers to reply to repeat requests for review. Stacking up further deps on pmem isn't high on my list. Mike _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 18:41 ` Mike Snitzer @ 2018-05-22 19:00 ` Dan Williams 2018-05-22 19:19 ` Mike Snitzer 2018-05-24 8:15 ` Mikulas Patocka 0 siblings, 2 replies; 40+ messages in thread From: Dan Williams @ 2018-05-22 19:00 UTC (permalink / raw) To: Mike Snitzer Cc: Christoph Hellwig, device-mapper development, Mikulas Patocka, linux-nvdimm On Tue, May 22, 2018 at 11:41 AM, Mike Snitzer <snitzer@redhat.com> wrote: > On Tue, May 22 2018 at 2:39am -0400, > Christoph Hellwig <hch@infradead.org> wrote: > >> On Sat, May 19, 2018 at 07:25:07AM +0200, Mikulas Patocka wrote: >> > Use new API for flushing persistent memory. >> >> The sentence doesnt make much sense. 'A new API', 'A better >> abstraction' maybe? >> >> > >> > The problem is this: >> > * on X86-64, non-temporal stores have the best performance >> > * ARM64 doesn't have non-temporal stores, so we must flush cache. We >> > should flush cache as late as possible, because it performs better this >> > way. >> > >> > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit >> > data persistently, all three functions must be called. >> > >> > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign >> > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. >> > >> > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and >> > pmem_commit is wmb. >> > >> > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and >> > pmem_commit is empty. >> >> All these should be provided by the pmem layer, and be properly >> documented. And be sorted before adding your new target that uses >> them. > > I don't see that as a hard requirement. Mikulas did the work to figure > out what is more optimal on x86_64 vs amd64. It makes a difference for > his target and that is sufficient to carry it locally until/when it is > either elevated to pmem. > > We cannot even get x86 and swait maintainers to reply to repeat requests > for review. Stacking up further deps on pmem isn't high on my list. > Except I'm being responsive. I agree with Christoph that we should build pmem helpers at an architecture level and not per-driver. Let's make this driver depend on ARCH_HAS_PMEM_API and require ARM to catch up to x86 in this space. We already have PowerPC enabling PMEM API, so I don't see an unreasonable barrier to ask the same of ARM. This patch is not even cc'd to linux-arm-kernel. Has the subject been broached with them? _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 19:00 ` Dan Williams @ 2018-05-22 19:19 ` Mike Snitzer 2018-05-22 19:27 ` Dan Williams 2018-05-24 8:15 ` Mikulas Patocka 1 sibling, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-22 19:19 UTC (permalink / raw) To: Dan Williams Cc: Christoph Hellwig, device-mapper development, Mikulas Patocka, linux-nvdimm On Tue, May 22 2018 at 3:00pm -0400, Dan Williams <dan.j.williams@intel.com> wrote: > On Tue, May 22, 2018 at 11:41 AM, Mike Snitzer <snitzer@redhat.com> wrote: > > On Tue, May 22 2018 at 2:39am -0400, > > Christoph Hellwig <hch@infradead.org> wrote: > > > >> On Sat, May 19, 2018 at 07:25:07AM +0200, Mikulas Patocka wrote: > >> > Use new API for flushing persistent memory. > >> > >> The sentence doesnt make much sense. 'A new API', 'A better > >> abstraction' maybe? > >> > >> > > >> > The problem is this: > >> > * on X86-64, non-temporal stores have the best performance > >> > * ARM64 doesn't have non-temporal stores, so we must flush cache. We > >> > should flush cache as late as possible, because it performs better this > >> > way. > >> > > >> > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit > >> > data persistently, all three functions must be called. > >> > > >> > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign > >> > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. > >> > > >> > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and > >> > pmem_commit is wmb. > >> > > >> > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and > >> > pmem_commit is empty. > >> > >> All these should be provided by the pmem layer, and be properly > >> documented. And be sorted before adding your new target that uses > >> them. > > > > I don't see that as a hard requirement. Mikulas did the work to figure > > out what is more optimal on x86_64 vs amd64. It makes a difference for > > his target and that is sufficient to carry it locally until/when it is > > either elevated to pmem. > > > > We cannot even get x86 and swait maintainers to reply to repeat requests > > for review. Stacking up further deps on pmem isn't high on my list. > > > > Except I'm being responsive. Except you're looking to immediately punt to linux-arm-kernel ;) > I agree with Christoph that we should > build pmem helpers at an architecture level and not per-driver. Let's > make this driver depend on ARCH_HAS_PMEM_API and require ARM to catch > up to x86 in this space. We already have PowerPC enabling PMEM API, so > I don't see an unreasonable barrier to ask the same of ARM. This patch > is not even cc'd to linux-arm-kernel. Has the subject been broached > with them? No idea. Not by me. The thing is, I'm no expert in pmem. You are. Coordinating the change with ARM et al feels unnecessarily limiting and quicky moves outside my control. Serious question: Why can't this code land in this dm-writecache target and then be lifted (or obsoleted)? But if you think it worthwhile to force ARM to step up then fine. That does limit the availability of using writecache on ARM while they get the PMEM API together. I'll do whatever you want.. just put the smack down and tell me how it is ;) Thanks, Mike _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 19:19 ` Mike Snitzer @ 2018-05-22 19:27 ` Dan Williams 2018-05-22 20:52 ` Mike Snitzer 2018-05-28 13:52 ` Mikulas Patocka 0 siblings, 2 replies; 40+ messages in thread From: Dan Williams @ 2018-05-22 19:27 UTC (permalink / raw) To: Mike Snitzer Cc: Christoph Hellwig, device-mapper development, Mikulas Patocka, linux-nvdimm On Tue, May 22, 2018 at 12:19 PM, Mike Snitzer <snitzer@redhat.com> wrote: > On Tue, May 22 2018 at 3:00pm -0400, > Dan Williams <dan.j.williams@intel.com> wrote: > >> On Tue, May 22, 2018 at 11:41 AM, Mike Snitzer <snitzer@redhat.com> wrote: >> > On Tue, May 22 2018 at 2:39am -0400, >> > Christoph Hellwig <hch@infradead.org> wrote: >> > >> >> On Sat, May 19, 2018 at 07:25:07AM +0200, Mikulas Patocka wrote: >> >> > Use new API for flushing persistent memory. >> >> >> >> The sentence doesnt make much sense. 'A new API', 'A better >> >> abstraction' maybe? >> >> >> >> > >> >> > The problem is this: >> >> > * on X86-64, non-temporal stores have the best performance >> >> > * ARM64 doesn't have non-temporal stores, so we must flush cache. We >> >> > should flush cache as late as possible, because it performs better this >> >> > way. >> >> > >> >> > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit >> >> > data persistently, all three functions must be called. >> >> > >> >> > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign >> >> > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. >> >> > >> >> > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and >> >> > pmem_commit is wmb. >> >> > >> >> > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and >> >> > pmem_commit is empty. >> >> >> >> All these should be provided by the pmem layer, and be properly >> >> documented. And be sorted before adding your new target that uses >> >> them. >> > >> > I don't see that as a hard requirement. Mikulas did the work to figure >> > out what is more optimal on x86_64 vs amd64. It makes a difference for >> > his target and that is sufficient to carry it locally until/when it is >> > either elevated to pmem. >> > >> > We cannot even get x86 and swait maintainers to reply to repeat requests >> > for review. Stacking up further deps on pmem isn't high on my list. >> > >> >> Except I'm being responsive. > > Except you're looking to immediately punt to linux-arm-kernel ;) Well, I'm not, not really. I'm saying drop ARM support, it's not ready. > >> I agree with Christoph that we should >> build pmem helpers at an architecture level and not per-driver. Let's >> make this driver depend on ARCH_HAS_PMEM_API and require ARM to catch >> up to x86 in this space. We already have PowerPC enabling PMEM API, so >> I don't see an unreasonable barrier to ask the same of ARM. This patch >> is not even cc'd to linux-arm-kernel. Has the subject been broached >> with them? > > No idea. Not by me. > > The thing is, I'm no expert in pmem. You are. Coordinating the change > with ARM et al feels unnecessarily limiting and quicky moves outside my > control. > > Serious question: Why can't this code land in this dm-writecache target > and then be lifted (or obsoleted)? Because we already have an API, and we don't want to promote local solutions to global problems, or carry unnecessary technical debt. > > But if you think it worthwhile to force ARM to step up then fine. 
That > does limit the availability of using writecache on ARM while they get > the PMEM API together. > > I'll do whatever you want.. just put the smack down and tell me how it > is ;) I'd say just control the variables you can control. Drop the ARM support if you want to move forward and propose extensions / updates to the pmem api for x86 and I'll help push those since I was involved in pushing the x86 pmem api in the first instance. That way you don't need to touch this driver as new archs add their pmem api enabling. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 19:27 ` Dan Williams @ 2018-05-22 20:52 ` Mike Snitzer 2018-05-22 22:53 ` [dm-devel] " Jeff Moyer 2018-05-28 13:52 ` Mikulas Patocka 1 sibling, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-22 20:52 UTC (permalink / raw) To: Dan Williams Cc: Christoph Hellwig, device-mapper development, Mikulas Patocka, linux-nvdimm On Tue, May 22 2018 at 3:27pm -0400, Dan Williams <dan.j.williams@intel.com> wrote: > On Tue, May 22, 2018 at 12:19 PM, Mike Snitzer <snitzer@redhat.com> wrote: > > On Tue, May 22 2018 at 3:00pm -0400, > > Dan Williams <dan.j.williams@intel.com> wrote: > > > >> On Tue, May 22, 2018 at 11:41 AM, Mike Snitzer <snitzer@redhat.com> wrote: > >> > On Tue, May 22 2018 at 2:39am -0400, > >> > Christoph Hellwig <hch@infradead.org> wrote: > >> > > >> >> On Sat, May 19, 2018 at 07:25:07AM +0200, Mikulas Patocka wrote: > >> >> > Use new API for flushing persistent memory. > >> >> > >> >> The sentence doesnt make much sense. 'A new API', 'A better > >> >> abstraction' maybe? > >> >> > >> >> > > >> >> > The problem is this: > >> >> > * on X86-64, non-temporal stores have the best performance > >> >> > * ARM64 doesn't have non-temporal stores, so we must flush cache. We > >> >> > should flush cache as late as possible, because it performs better this > >> >> > way. > >> >> > > >> >> > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit > >> >> > data persistently, all three functions must be called. > >> >> > > >> >> > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign > >> >> > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. > >> >> > > >> >> > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and > >> >> > pmem_commit is wmb. > >> >> > > >> >> > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and > >> >> > pmem_commit is empty. > >> >> > >> >> All these should be provided by the pmem layer, and be properly > >> >> documented. And be sorted before adding your new target that uses > >> >> them. > >> > > >> > I don't see that as a hard requirement. Mikulas did the work to figure > >> > out what is more optimal on x86_64 vs amd64. It makes a difference for > >> > his target and that is sufficient to carry it locally until/when it is > >> > either elevated to pmem. > >> > > >> > We cannot even get x86 and swait maintainers to reply to repeat requests > >> > for review. Stacking up further deps on pmem isn't high on my list. > >> > > >> > >> Except I'm being responsive. > > > > Except you're looking to immediately punt to linux-arm-kernel ;) > > Well, I'm not, not really. I'm saying drop ARM support, it's not ready. > > > > >> I agree with Christoph that we should > >> build pmem helpers at an architecture level and not per-driver. Let's > >> make this driver depend on ARCH_HAS_PMEM_API and require ARM to catch > >> up to x86 in this space. We already have PowerPC enabling PMEM API, so > >> I don't see an unreasonable barrier to ask the same of ARM. This patch > >> is not even cc'd to linux-arm-kernel. Has the subject been broached > >> with them? > > > > No idea. Not by me. > > > > The thing is, I'm no expert in pmem. You are. Coordinating the change > > with ARM et al feels unnecessarily limiting and quicky moves outside my > > control. > > > > Serious question: Why can't this code land in this dm-writecache target > > and then be lifted (or obsoleted)? 
> > Because we already have an API, and we don't want to promote local > solutions to global problems, or carry unnecessary technical debt. > > > > > But if you think it worthwhile to force ARM to step up then fine. That > > does limit the availability of using writecache on ARM while they get > > the PMEM API together. > > > > I'll do whatever you want.. just put the smack down and tell me how it > > is ;) > > I'd say just control the variables you can control. Drop the ARM > support if you want to move forward and propose extensions / updates > to the pmem api for x86 and I'll help push those since I was involved > in pushing the x86 pmem api in the first instance. That way you don't > need to touch this driver as new archs add their pmem api enabling. Looking at Mikulas' wrapper API that you and hch are calling into question: For ARM it is using arch/arm64/mm/flush.c:arch_wb_cache_pmem(). (And ARM does seem to be providing CONFIG_ARCH_HAS_PMEM_API.) Whereas x86_64 is using memcpy_flushcache() as provided by CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE. (Yet ARM does provide arch/arm64/lib/uaccess_flushcache.c:memcpy_flushcache) Just seems this isn't purely about ARM lacking on an API level (given on x86_64 Mikulas isn't only using CONFIG_ARCH_HAS_PMEM_API). Seems this is more to do with x86_64 having efficient Non-temporal stores? Anyway, I'm still trying to appreciate the details here before I can make any forward progress. Mike _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dm-devel] [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 20:52 ` Mike Snitzer @ 2018-05-22 22:53 ` Jeff Moyer 2018-05-23 20:57 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Jeff Moyer @ 2018-05-22 22:53 UTC (permalink / raw) To: Mike Snitzer, Mikulas Patocka Cc: Christoph Hellwig, device-mapper development, linux-nvdimm Hi, Mike, Mike Snitzer <snitzer@redhat.com> writes: > Looking at Mikulas' wrapper API that you and hch are calling into > question: > > For ARM it is using arch/arm64/mm/flush.c:arch_wb_cache_pmem(). > (And ARM does seem to be providing CONFIG_ARCH_HAS_PMEM_API.) > > Whereas x86_64 is using memcpy_flushcache() as provided by > CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE. > (Yet ARM does provide arch/arm64/lib/uaccess_flushcache.c:memcpy_flushcache) > > Just seems this isn't purely about ARM lacking on an API level (given on > x86_64 Mikulas isn't only using CONFIG_ARCH_HAS_PMEM_API). > > Seems this is more to do with x86_64 having efficient Non-temporal > stores? Yeah, I think you've got that all right. > Anyway, I'm still trying to appreciate the details here before I can > make any forward progress. Making data persistent on x64 requires 3 steps: 1) copy the data into pmem (store instructions) 2) flush the cache lines associated with the data (clflush, clflush_opt, clwb) 3) wait on the flush to complete (sfence) I'm not sure if other architectures require step 3. Mikulas' implementation seems to imply that arm64 doesn't require the fence. The current pmem api provides: memcpy* -- step 1 memcpy_flushcache -- this combines steps 1 and 2 dax_flush -- step 2 wmb* -- step 3 * not strictly part of the pmem api So, if you didn't care about performance, you could write generic code that only used memcpy, dax_flush, and wmb (assuming other arches actually need the wmb). What Mikulas did was to abstract out an API that could be called by generic code that would work optimally on all architectures. This looks like a worth-while addition to the PMEM API, to me. Mikulas, what do you think about refactoring the code as Christoph suggested? Cheers, Jeff _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
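Spelled out as code, the "didn't care about performance" sequence Jeff describes, using only the architecture-neutral helpers he lists, would look roughly like the sketch below; dax_dev and the destination mapping are assumed to have been obtained elsewhere:

#include <linux/string.h>	/* memcpy() */
#include <linux/dax.h>		/* dax_flush() */
#include <asm/barrier.h>	/* wmb() */

static void persist_copy(struct dax_device *dax_dev, void *pmem_dst,
			 const void *src, size_t len)
{
	memcpy(pmem_dst, src, len);		/* step 1: store the data into pmem */
	dax_flush(dax_dev, pmem_dst, len);	/* step 2: write back the dirtied cache lines */
	wmb();					/* step 3: wait for the stores/flushes to complete */
}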
* Re: [dm-devel] [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 22:53 ` [dm-devel] " Jeff Moyer @ 2018-05-23 20:57 ` Mikulas Patocka 0 siblings, 0 replies; 40+ messages in thread From: Mikulas Patocka @ 2018-05-23 20:57 UTC (permalink / raw) To: Jeff Moyer Cc: Christoph Hellwig, device-mapper development, Mike Snitzer, linux-nvdimm On Tue, 22 May 2018, Jeff Moyer wrote: > Hi, Mike, > > Mike Snitzer <snitzer@redhat.com> writes: > > > Looking at Mikulas' wrapper API that you and hch are calling into > > question: > > > > For ARM it is using arch/arm64/mm/flush.c:arch_wb_cache_pmem(). > > (And ARM does seem to be providing CONFIG_ARCH_HAS_PMEM_API.) > > > > Whereas x86_64 is using memcpy_flushcache() as provided by > > CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE. > > (Yet ARM does provide arch/arm64/lib/uaccess_flushcache.c:memcpy_flushcache) > > > > Just seems this isn't purely about ARM lacking on an API level (given on > > x86_64 Mikulas isn't only using CONFIG_ARCH_HAS_PMEM_API). > > > > Seems this is more to do with x86_64 having efficient Non-temporal > > stores? > > Yeah, I think you've got that all right. > > > Anyway, I'm still trying to appreciate the details here before I can > > make any forward progress. > > Making data persistent on x64 requires 3 steps: > 1) copy the data into pmem (store instructions) > 2) flush the cache lines associated with the data (clflush, clflush_opt, clwb) > 3) wait on the flush to complete (sfence) In theory it works this way. In practice, this sequence is useless because the cache flusing instructions are horribly slow. So, the dm-writecache driver uses non-temporal stores instead of cache flushing. Now, the problem with arm64 is that it doesn't have non-temporal stores. So, memcpy_flushcache on arm64 does cached stores and flushes the cache afterwards. And this eager flushing is slower than late flushing. On arm4, you want to do cached stores, then do something else, and flush the cache as late as possible. > I'm not sure if other architectures require step 3. Mikulas' > implementation seems to imply that arm64 doesn't require the fence. I suppose that arch_wb_cache_pmem() does whatever it needs to do to flush the cache. If not, add something like arch_wb_cache_pmem_commit(). > The current pmem api provides: > > memcpy* -- step 1 > memcpy_flushcache -- this combines steps 1 and 2 > dax_flush -- step 2 > wmb* -- step 3 > > * not strictly part of the pmem api > > So, if you didn't care about performance, you could write generic code > that only used memcpy, dax_flush, and wmb (assuming other arches > actually need the wmb). What Mikulas did was to abstract out an API > that could be called by generic code that would work optimally on all > architectures. > > This looks like a worth-while addition to the PMEM API, to me. Mikulas, > what do you think about refactoring the code as Christoph suggested? I sent this patch https://www.redhat.com/archives/dm-devel/2018-May/msg00054.html so that you can take the functions pmem_memcpy, pmem_assign, pmem_flush and pmem_commit and move them to the generic linux headers. If you want to do it, do it. > Cheers, > Jeff Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 19:27 ` Dan Williams 2018-05-22 20:52 ` Mike Snitzer @ 2018-05-28 13:52 ` Mikulas Patocka 2018-05-28 17:41 ` Dan Williams 1 sibling, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-28 13:52 UTC (permalink / raw) To: Dan Williams Cc: Christoph Hellwig, device-mapper development, Mike Snitzer, linux-nvdimm On Tue, 22 May 2018, Dan Williams wrote: > >> Except I'm being responsive. > > > > Except you're looking to immediately punt to linux-arm-kernel ;) > > Well, I'm not, not really. I'm saying drop ARM support, it's not ready. This is the worst thing to do - because once late cache flushing is dropped from the dm-writecache target, it could hardly be reintroduced again. > >> I agree with Christoph that we should > >> build pmem helpers at an architecture level and not per-driver. Let's > >> make this driver depend on ARCH_HAS_PMEM_API and require ARM to catch > >> up to x86 in this space. We already have PowerPC enabling PMEM API, so > >> I don't see an unreasonable barrier to ask the same of ARM. This patch > >> is not even cc'd to linux-arm-kernel. Has the subject been broached > >> with them? > > > > No idea. Not by me. > > > > The thing is, I'm no expert in pmem. You are. Coordinating the change > > with ARM et al feels unnecessarily limiting and quicky moves outside my > > control. > > > > Serious question: Why can't this code land in this dm-writecache target > > and then be lifted (or obsoleted)? > > Because we already have an API, and we don't want to promote local > solutions to global problems, or carry unnecessary technical debt. > > > > > But if you think it worthwhile to force ARM to step up then fine. That > > does limit the availability of using writecache on ARM while they get > > the PMEM API together. > > > > I'll do whatever you want.. just put the smack down and tell me how it > > is ;) > > I'd say just control the variables you can control. Drop the ARM > support if you want to move forward and propose extensions / updates What do we gain by dropping it? > to the pmem api for x86 and I'll help push those since I was involved > in pushing the x86 pmem api in the first instance. That way you don't > need to touch this driver as new archs add their pmem api enabling. The pmem API is x86-centric - that the problem. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-28 13:52 ` Mikulas Patocka @ 2018-05-28 17:41 ` Dan Williams 2018-05-30 13:42 ` [dm-devel] " Jeff Moyer 0 siblings, 1 reply; 40+ messages in thread From: Dan Williams @ 2018-05-28 17:41 UTC (permalink / raw) To: Mikulas Patocka Cc: Christoph Hellwig, device-mapper development, Mike Snitzer, linux-nvdimm On Mon, May 28, 2018 at 6:52 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Tue, 22 May 2018, Dan Williams wrote: > >> >> Except I'm being responsive. >> > >> > Except you're looking to immediately punt to linux-arm-kernel ;) >> >> Well, I'm not, not really. I'm saying drop ARM support, it's not ready. > > This is the worst thing to do - because once late cache flushing is > dropped from the dm-writecache target, it could hardly be reintroduced > again. > >> >> I agree with Christoph that we should >> >> build pmem helpers at an architecture level and not per-driver. Let's >> >> make this driver depend on ARCH_HAS_PMEM_API and require ARM to catch >> >> up to x86 in this space. We already have PowerPC enabling PMEM API, so >> >> I don't see an unreasonable barrier to ask the same of ARM. This patch >> >> is not even cc'd to linux-arm-kernel. Has the subject been broached >> >> with them? >> > >> > No idea. Not by me. >> > >> > The thing is, I'm no expert in pmem. You are. Coordinating the change >> > with ARM et al feels unnecessarily limiting and quicky moves outside my >> > control. >> > >> > Serious question: Why can't this code land in this dm-writecache target >> > and then be lifted (or obsoleted)? >> >> Because we already have an API, and we don't want to promote local >> solutions to global problems, or carry unnecessary technical debt. >> >> > >> > But if you think it worthwhile to force ARM to step up then fine. That >> > does limit the availability of using writecache on ARM while they get >> > the PMEM API together. >> > >> > I'll do whatever you want.. just put the smack down and tell me how it >> > is ;) >> >> I'd say just control the variables you can control. Drop the ARM >> support if you want to move forward and propose extensions / updates > > What do we gain by dropping it? > >> to the pmem api for x86 and I'll help push those since I was involved >> in pushing the x86 pmem api in the first instance. That way you don't >> need to touch this driver as new archs add their pmem api enabling. > > The pmem API is x86-centric - that the problem. When I read your patch I came away with the impression that ARM had not added memcpy_flushcache() yet and you were working around that fact. Now that I look, ARM *does* define memcpy_flushcache() and you're avoiding it. You use memcpy+arch_wb_pmem where arch_wb_pmem on ARM64 is defined as __clean_dcache_area_pop(dst, cnt). The ARM memcpy_flushcache() implementation is: memcpy(dst, src, cnt); __clean_dcache_area_pop(dst, cnt); So, I do not see how what you're doing is any less work unless you are flushing less than you copy? If memcpy_flushcache() is slower than memcpy + arch_wb_pmem then the ARM implementation is broken and that needs to be addressed not worked around in a driver. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dm-devel] [patch 4/4] dm-writecache: use new API for flushing 2018-05-28 17:41 ` Dan Williams @ 2018-05-30 13:42 ` Jeff Moyer 2018-05-30 13:51 ` Mikulas Patocka 2018-05-30 13:52 ` Jeff Moyer 0 siblings, 2 replies; 40+ messages in thread From: Jeff Moyer @ 2018-05-30 13:42 UTC (permalink / raw) To: Dan Williams Cc: Christoph Hellwig, device-mapper development, Mikulas Patocka, Mike Snitzer, linux-nvdimm Dan Williams <dan.j.williams@intel.com> writes: > When I read your patch I came away with the impression that ARM had > not added memcpy_flushcache() yet and you were working around that > fact. Now that I look, ARM *does* define memcpy_flushcache() and > you're avoiding it. You use memcpy+arch_wb_pmem where arch_wb_pmem on > ARM64 is defined as __clean_dcache_area_pop(dst, cnt). The ARM > memcpy_flushcache() implementation is: > > memcpy(dst, src, cnt); > __clean_dcache_area_pop(dst, cnt); > > So, I do not see how what you're doing is any less work unless you are > flushing less than you copy? > > If memcpy_flushcache() is slower than memcpy + arch_wb_pmem then the > ARM implementation is broken and that needs to be addressed not worked > around in a driver. I think Mikulas wanted to batch up multiple copies and flush at the end. According to his commit message, that batching gained him 2% performance. -Jeff _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dm-devel] [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:42 ` [dm-devel] " Jeff Moyer @ 2018-05-30 13:51 ` Mikulas Patocka 2018-05-30 13:52 ` Jeff Moyer 1 sibling, 0 replies; 40+ messages in thread From: Mikulas Patocka @ 2018-05-30 13:51 UTC (permalink / raw) To: Jeff Moyer Cc: Christoph Hellwig, device-mapper development, Mike Snitzer, linux-nvdimm On Wed, 30 May 2018, Jeff Moyer wrote: > Dan Williams <dan.j.williams@intel.com> writes: > > > When I read your patch I came away with the impression that ARM had > > not added memcpy_flushcache() yet and you were working around that > > fact. Now that I look, ARM *does* define memcpy_flushcache() and > > you're avoiding it. You use memcpy+arch_wb_pmem where arch_wb_pmem on > > ARM64 is defined as __clean_dcache_area_pop(dst, cnt). The ARM > > memcpy_flushcache() implementation is: > > > > memcpy(dst, src, cnt); > > __clean_dcache_area_pop(dst, cnt); > > > > So, I do not see how what you're doing is any less work unless you are > > flushing less than you copy? > > > > If memcpy_flushcache() is slower than memcpy + arch_wb_pmem then the > > ARM implementation is broken and that needs to be addressed not worked > > around in a driver. > > I think Mikulas wanted to batch up multiple copies and flush at the > end. According to his commit message, that batching gained him 2% > performance. > > -Jeff No - this 2% difference is inlined memcpy_flushcache() vs out-of-line memcpy_flushcache(). I thought that dax_flush() performed 12% better memcpy_flushcache() - but the reason why it performed better was - that it was not flushing the cache at all. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dm-devel] [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:42 ` [dm-devel] " Jeff Moyer 2018-05-30 13:51 ` Mikulas Patocka @ 2018-05-30 13:52 ` Jeff Moyer 1 sibling, 0 replies; 40+ messages in thread From: Jeff Moyer @ 2018-05-30 13:52 UTC (permalink / raw) To: Dan Williams Cc: Christoph Hellwig, device-mapper development, Mikulas Patocka, Mike Snitzer, linux-nvdimm Jeff Moyer <jmoyer@redhat.com> writes: > Dan Williams <dan.j.williams@intel.com> writes: > >> When I read your patch I came away with the impression that ARM had >> not added memcpy_flushcache() yet and you were working around that >> fact. Now that I look, ARM *does* define memcpy_flushcache() and >> you're avoiding it. You use memcpy+arch_wb_pmem where arch_wb_pmem on >> ARM64 is defined as __clean_dcache_area_pop(dst, cnt). The ARM >> memcpy_flushcache() implementation is: >> >> memcpy(dst, src, cnt); >> __clean_dcache_area_pop(dst, cnt); >> >> So, I do not see how what you're doing is any less work unless you are >> flushing less than you copy? >> >> If memcpy_flushcache() is slower than memcpy + arch_wb_pmem then the >> ARM implementation is broken and that needs to be addressed not worked >> around in a driver. > > I think Mikulas wanted to batch up multiple copies and flush at the > end. According to his commit message, that batching gained him 2% > performance. Nevermind me, I just caught up with the rest of the thread. :) -Jeff _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 19:00 ` Dan Williams 2018-05-22 19:19 ` Mike Snitzer @ 2018-05-24 8:15 ` Mikulas Patocka 1 sibling, 0 replies; 40+ messages in thread From: Mikulas Patocka @ 2018-05-24 8:15 UTC (permalink / raw) To: Dan Williams Cc: Christoph Hellwig, device-mapper development, Mike Snitzer, linux-nvdimm On Tue, 22 May 2018, Dan Williams wrote: > On Tue, May 22, 2018 at 11:41 AM, Mike Snitzer <snitzer@redhat.com> wrote: > > On Tue, May 22 2018 at 2:39am -0400, > > Christoph Hellwig <hch@infradead.org> wrote: > > > >> On Sat, May 19, 2018 at 07:25:07AM +0200, Mikulas Patocka wrote: > >> > Use new API for flushing persistent memory. > >> > >> The sentence doesnt make much sense. 'A new API', 'A better > >> abstraction' maybe? > >> > >> > > >> > The problem is this: > >> > * on X86-64, non-temporal stores have the best performance > >> > * ARM64 doesn't have non-temporal stores, so we must flush cache. We > >> > should flush cache as late as possible, because it performs better this > >> > way. > >> > > >> > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit > >> > data persistently, all three functions must be called. > >> > > >> > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign > >> > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. > >> > > >> > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and > >> > pmem_commit is wmb. > >> > > >> > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and > >> > pmem_commit is empty. > >> > >> All these should be provided by the pmem layer, and be properly > >> documented. And be sorted before adding your new target that uses > >> them. > > > > I don't see that as a hard requirement. Mikulas did the work to figure > > out what is more optimal on x86_64 vs amd64. It makes a difference for > > his target and that is sufficient to carry it locally until/when it is > > either elevated to pmem. > > > > We cannot even get x86 and swait maintainers to reply to repeat requests > > for review. Stacking up further deps on pmem isn't high on my list. > > > > Except I'm being responsive. I agree with Christoph that we should > build pmem helpers at an architecture level and not per-driver. Let's > make this driver depend on ARCH_HAS_PMEM_API and require ARM to catch > up to x86 in this space. We already have PowerPC enabling PMEM API, so > I don't see an unreasonable barrier to ask the same of ARM. This patch > is not even cc'd to linux-arm-kernel. Has the subject been broached > with them? The ARM code can't "catch-up" with X86. On X86 - non-temporal stores (i.e. memcpy_flushcache) are faster than cached write and cache flushing. The ARM architecture doesn't have non-temporal stores. So, memcpy_flushcache on ARM does memcpy (that writes data to the cache) and then flushes the cache. But this eager cache flushig is slower than late cache flushing. The optimal code sequence on ARM to write to persistent memory is to call memcpy, then do something else, and then call arch_wb_cache_pmem as late as possible. And this ARM-optimized code sequence is just horribly slow on X86. This issue can't be "fixed" in ARM-specific source code. The ARM processor have such a characteristics that eager cache flushing is slower than late cache flushing - and that's it - you can't change processor behavior. 
If you don't want '#if defined(CONFIG_X86_64)' in the dm-writecache driver, then just take the functions that are in this conditional block and move them to some generic linux header. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
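As a sketch of what that lift could look like: a generic header that keys the choice off an architecture opt-in instead of testing CONFIG_X86_64 inside the driver. Both the header name and CONFIG_ARCH_WANTS_NT_PMEM_STORES are invented here for illustration; no such Kconfig symbol exists today.

/* include/linux/pmem_write.h -- hypothetical, for illustration only */
#ifndef _LINUX_PMEM_WRITE_H
#define _LINUX_PMEM_WRITE_H

#include <linux/string.h>
#include <linux/libnvdimm.h>
#include <asm/barrier.h>

#ifdef CONFIG_ARCH_WANTS_NT_PMEM_STORES
/* Architectures with usable non-temporal stores (today: x86-64) would
 * select this symbol: copy eagerly, fence once at commit time. */
#define pmem_memcpy(d, s, n)	memcpy_flushcache((d), (s), (n))
#define pmem_flush(a, n)	do { } while (0)
#define pmem_commit()		wmb()
#else
/* Everyone else: cached copy now, cache write-back as late as possible,
 * with any needed barrier folded into arch_wb_cache_pmem(). */
#define pmem_memcpy(d, s, n)	memcpy((d), (s), (n))
#define pmem_flush(a, n)	arch_wb_cache_pmem((a), (n))
#define pmem_commit()		do { } while (0)
#endif

#endif /* _LINUX_PMEM_WRITE_H */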
[parent not found: <CAPcyv4iEtfuVGPR0QMKcafv2XFwSj3nzxjX8cuXpXe00akAvYA@mail.gmail.com>]
[parent not found: <alpine.LRH.2.02.1805250213270.13894@file01.intranet.prod.int.rdu2.redhat.com>]
* Re: [patch 4/4] dm-writecache: use new API for flushing [not found] ` <alpine.LRH.2.02.1805250213270.13894@file01.intranet.prod.int.rdu2.redhat.com> @ 2018-05-25 12:51 ` Mike Snitzer 2018-05-25 15:57 ` Dan Williams 0 siblings, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-25 12:51 UTC (permalink / raw) To: Dan Williams; +Cc: device-mapper development, Mikulas Patocka, linux-nvdimm On Fri, May 25 2018 at 2:17am -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Thu, 24 May 2018, Dan Williams wrote: > > > On Fri, May 18, 2018 at 10:25 PM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > Use new API for flushing persistent memory. > > > > > > The problem is this: > > > * on X86-64, non-temporal stores have the best performance > > > * ARM64 doesn't have non-temporal stores, so we must flush cache. We > > > should flush cache as late as possible, because it performs better this > > > way. > > > > > > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit > > > data persistently, all three functions must be called. > > > > > > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign > > > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. > > > > > > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and > > > pmem_commit is wmb. > > > > > > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and > > > pmem_commit is empty. > > > > I don't want to grow driver-local wrappers for pmem. You should use > > memcpy_flushcache directly() and if an architecture does not define > > memcpy_flushcache() then don't allow building dm-writecache, i.e. this > > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't > > see a need to add a standalone flush operation if all relevant archs > > provide memcpy_flushcache(). As for commit, I'd say just use wmb() > > directly since all archs define it. Alternatively we could introduce > > memcpy_flushcache_relaxed() to be the un-ordered version of the copy > > routine and memcpy_flushcache() would imply a wmb(). > > But memcpy_flushcache() on ARM64 is slow. Yes, Dan can you please take some time to fully appreciate what this small wrapper API is providing (maybe you've done that, but your recent reply is mixed-message). Seems you're keeping the innovation it provides at arms-length. Basically the PMEM APIs you've helped construct are lacking, forcing DM developers to own fixing them is an inversion that only serves to stonewall. Please, less time on stonewalling and more time lifting this wrapper API; otherwise the dm-writecache local wrapper API is a near-term means to an end. I revised the dm-writecache patch header yesterday, please review: https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.18&id=2105231db61b08752bc4247d2fe7838657700b0d (the last paragraph in particular, I intend to move forward with this wrapper unless someone on the PMEM side of the house steps up and lifts it up between now and when the 4.18 merge window opens) dm: add writecache target The writecache target caches writes on persistent memory or SSD. It is intended for databases or other programs that need extremely low commit latency. The writecache target doesn't cache reads because reads are supposed to be cached in page cache in normal RAM. The following describes the approach used to provide the most efficient flushing of persistent memory on X86_64 vs ARM64: * On X86_64 non-temporal stores (i.e. 
memcpy_flushcache) are faster than cached writes and cache flushing. * The ARM64 architecture doesn't have non-temporal stores. So, memcpy_flushcache on ARM does memcpy (that writes data to the cache) and then flushes the cache. But this eager cache flushig is slower than late cache flushing. The optimal code sequence on ARM to write to persistent memory is to call memcpy, then do something else, and then call arch_wb_cache_pmem as late as possible. And this ARM-optimized code sequence is just horribly slow on X86. This issue can't be "fixed" in ARM-specific source code. The ARM processor have characteristics such that eager cache flushing is slower than late cache flushing - and that's it - you can't change processor behavior. We introduce a wrapper API for flushing persistent memory with functions pmem_memcpy, pmem_flush and pmem_commit. To commit data persistently, all three functions must be called. The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. On X86_64, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and pmem_commit is wmb. On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and pmem_commit is empty. It is clear that this wrapper API for flushing persistent memory needs to be elevated out of this dm-writecache driver. But that can happen later without requiring DM developers to blaze new trails on pmem specific implementation details/quirks (pmem developers need to clean up their APIs given they are already spread across CONFIG_ARCH_HAS_PMEM_API and CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE and they absolutely don't take into account the duality of the different programming models needed to achieve optimal cross-architecture use of persistent memory). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <msnitzer@redhat.com> _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
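As a usage illustration of the contract spelled out in that header -- to commit data persistently, all three functions must be called, with pmem_assign() for fields that must not be torn -- a caller committing a payload plus an 8-byte sequence number might look like the sketch below. The structure and field names are invented for the example, and it only shows which helper goes where; it is not a complete crash-consistency protocol.

struct wc_sb {
	__le64 seq_count;	/* 8-byte field published via pmem_assign() */
};

static void commit_entry(struct wc_sb *sb, void *pmem_data,
			 const void *buf, size_t len, u64 new_seq)
{
	pmem_memcpy(pmem_data, buf, len);	/* stage the payload */
	pmem_flush(pmem_data, len);		/* no-op on x86-64, cache write-back on ARM64 */

	/* a reader must never observe a torn sequence number */
	pmem_assign(sb->seq_count, cpu_to_le64(new_seq));
	pmem_flush(&sb->seq_count, sizeof(sb->seq_count));

	pmem_commit();				/* wmb() on x86-64, empty on ARM64 */
}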
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-25 12:51 ` Mike Snitzer @ 2018-05-25 15:57 ` Dan Williams 2018-05-26 7:02 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Dan Williams @ 2018-05-25 15:57 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, Mikulas Patocka, linux-nvdimm On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: > On Fri, May 25 2018 at 2:17am -0400, > Mikulas Patocka <mpatocka@redhat.com> wrote: > >> >> >> On Thu, 24 May 2018, Dan Williams wrote: >> >> > On Fri, May 18, 2018 at 10:25 PM, Mikulas Patocka <mpatocka@redhat.com> wrote: >> > > Use new API for flushing persistent memory. >> > > >> > > The problem is this: >> > > * on X86-64, non-temporal stores have the best performance >> > > * ARM64 doesn't have non-temporal stores, so we must flush cache. We >> > > should flush cache as late as possible, because it performs better this >> > > way. >> > > >> > > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit >> > > data persistently, all three functions must be called. >> > > >> > > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign >> > > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. >> > > >> > > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and >> > > pmem_commit is wmb. >> > > >> > > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and >> > > pmem_commit is empty. >> > >> > I don't want to grow driver-local wrappers for pmem. You should use >> > memcpy_flushcache directly() and if an architecture does not define >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't >> > see a need to add a standalone flush operation if all relevant archs >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() >> > directly since all archs define it. Alternatively we could introduce >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy >> > routine and memcpy_flushcache() would imply a wmb(). >> >> But memcpy_flushcache() on ARM64 is slow. > > Yes, Dan can you please take some time to fully appreciate what this > small wrapper API is providing (maybe you've done that, but your recent > reply is mixed-message). Seems you're keeping the innovation it > provides at arms-length. Basically the PMEM APIs you've helped > construct are lacking, forcing DM developers to own fixing them is an > inversion that only serves to stonewall. > > Please, less time on stonewalling and more time lifting this wrapper > API; otherwise the dm-writecache local wrapper API is a near-term means > to an end. > > I revised the dm-writecache patch header yesterday, please review: > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.18&id=2105231db61b08752bc4247d2fe7838657700b0d > (the last paragraph in particular, I intend to move forward with this > wrapper unless someone on the PMEM side of the house steps up and lifts > it up between now and when the 4.18 merge window opens) > > dm: add writecache target > The writecache target caches writes on persistent memory or SSD. > It is intended for databases or other programs that need extremely low > commit latency. > > The writecache target doesn't cache reads because reads are supposed to > be cached in page cache in normal RAM. 
> > The following describes the approach used to provide the most efficient > flushing of persistent memory on X86_64 vs ARM64: > > * On X86_64 non-temporal stores (i.e. memcpy_flushcache) are faster > than cached writes and cache flushing. > > * The ARM64 architecture doesn't have non-temporal stores. So, > memcpy_flushcache on ARM does memcpy (that writes data to the cache) > and then flushes the cache. But this eager cache flushig is slower > than late cache flushing. > > The optimal code sequence on ARM to write to persistent memory is to > call memcpy, then do something else, and then call arch_wb_cache_pmem as > late as possible. And this ARM-optimized code sequence is just horribly > slow on X86. > > This issue can't be "fixed" in ARM-specific source code. The ARM > processor have characteristics such that eager cache flushing is slower > than late cache flushing - and that's it - you can't change processor > behavior. > > We introduce a wrapper API for flushing persistent memory with functions > pmem_memcpy, pmem_flush and pmem_commit. To commit data persistently, > all three functions must be called. > > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign > (unlike pmem_memcpy) guarantees that 8-byte values are written > atomically. > > On X86_64, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and > pmem_commit is wmb. > > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and > pmem_commit is empty. > > It is clear that this wrapper API for flushing persistent memory needs > to be elevated out of this dm-writecache driver. But that can happen > later without requiring DM developers to blaze new trails on pmem > specific implementation details/quirks (pmem developers need to clean up > their APIs given they are already spread across CONFIG_ARCH_HAS_PMEM_API > and CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE and they absolutely don't take > into account the duality of the different programming models needed to > achieve optimal cross-architecture use of persistent memory). Right, so again, what is wrong with memcpy_flushcache_relaxed() + wmb() or otherwise making memcpy_flushcache() ordered. I do not see that as a trailblazing requirement, I see that as typical review and a reduction of the operation space that you are proposing. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
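Concretely, the alternative being suggested here would look something like the sketch below; the names are invented (memcpy_flushcache_relaxed() does not exist in the kernel today) and are only meant to make the proposal explicit:

/* The "relaxed" variant is today's memcpy_flushcache() behaviour: copy and
 * push the data toward pmem, but leave ordering/draining to the caller. */
static inline void pmem_copy_relaxed(void *dst, const void *src, size_t cnt)
{
	memcpy_flushcache(dst, src, cnt);
}

/* Under the proposal, memcpy_flushcache() itself would take on this role:
 * the store fence is implied, so a driver needs no separate commit call. */
static inline void pmem_copy_ordered(void *dst, const void *src, size_t cnt)
{
	pmem_copy_relaxed(dst, src, cnt);
	wmb();
}

Mikulas's objection in the reply below is that on ARM64 either variant still performs the cache write-back eagerly inside the copy, which is exactly the cost the separate pmem_flush() call is meant to defer.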
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-25 15:57 ` Dan Williams @ 2018-05-26 7:02 ` Mikulas Patocka 2018-05-26 15:26 ` Dan Williams 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-26 7:02 UTC (permalink / raw) To: Dan Williams; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Fri, 25 May 2018, Dan Williams wrote: > On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: > > On Fri, May 25 2018 at 2:17am -0400, > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > >> On Thu, 24 May 2018, Dan Williams wrote: > >> > >> > I don't want to grow driver-local wrappers for pmem. You should use > >> > memcpy_flushcache directly() and if an architecture does not define > >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this > >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't > >> > see a need to add a standalone flush operation if all relevant archs > >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() > >> > directly since all archs define it. Alternatively we could introduce > >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy > >> > routine and memcpy_flushcache() would imply a wmb(). > >> > >> But memcpy_flushcache() on ARM64 is slow. > > Right, so again, what is wrong with memcpy_flushcache_relaxed() + > wmb() or otherwise making memcpy_flushcache() ordered. I do not see > that as a trailblazing requirement, I see that as typical review and a > reduction of the operation space that you are proposing. memcpy_flushcache on ARM64 is generally wrong thing to do, because it is slower than memcpy and explicit cache flush. Suppose that you want to write data to a block device and make it persistent. So you send a WRITE bio and then a FLUSH bio. Now - how to implement these two bios on persistent memory: On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does wmb() - this is the optimal implementation. But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal implementation is that the WRITE bio does just memcpy() and the FLUSH bio does arch_wb_cache_pmem() on the affected range. Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture doesn't have non-temporal stores. So, memcpy_flushcache() is implemented as memcpy() followed by a cache flush. Now - if you flush the cache immediatelly after memcpy, the cache is full of dirty lines and the cache-flushing code has to write these lines back and that is slow. If you flush the cache some time after memcpy (i.e. when the FLUSH bio is received), the processor already flushed some part of the cache on its own, so the cache-flushing function has less work to do and it is faster. So the conclusion is - don't use memcpy_flushcache on ARM. This problem cannot be fixed by a better implementation of memcpy_flushcache. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
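The two bio-level sequences contrasted above, written out for a hypothetical pmem-backed driver; the structure, the single dirty-range tracking and the function names are all invented for illustration, and locking/initialisation are ignored:

#include <linux/kernel.h>	/* min(), max(), SIZE_MAX */
#include <linux/string.h>
#include <linux/libnvdimm.h>
#include <asm/barrier.h>

struct pmem_dev {
	void *virt;			/* kernel mapping of the persistent memory */
	size_t dirty_start, dirty_end;	/* range still needing write-back (cached-store path) */
};

static void pmem_handle_write_bio(struct pmem_dev *p, size_t off,
				  const void *buf, size_t len)
{
#if defined(CONFIG_X86_64)
	/* non-temporal stores: nothing is left dirty in the cache */
	memcpy_flushcache(p->virt + off, buf, len);
#else
	/* cached stores now; just remember what needs writing back later */
	memcpy(p->virt + off, buf, len);
	p->dirty_start = min(p->dirty_start, off);
	p->dirty_end = max(p->dirty_end, off + len);
#endif
}

static void pmem_handle_flush_bio(struct pmem_dev *p)
{
#if defined(CONFIG_X86_64)
	wmb();		/* drain the non-temporal stores */
#else
	/* flushing as late as possible: by now the CPU has typically written
	 * back part of the range on its own, so there is less left to do */
	if (p->dirty_end > p->dirty_start)
		arch_wb_cache_pmem(p->virt + p->dirty_start,
				   p->dirty_end - p->dirty_start);
	p->dirty_start = SIZE_MAX;
	p->dirty_end = 0;
#endif
}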
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-26 7:02 ` Mikulas Patocka @ 2018-05-26 15:26 ` Dan Williams 2018-05-28 13:32 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Dan Williams @ 2018-05-26 15:26 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Sat, May 26, 2018 at 12:02 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Fri, 25 May 2018, Dan Williams wrote: > >> On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: >> > On Fri, May 25 2018 at 2:17am -0400, >> > Mikulas Patocka <mpatocka@redhat.com> wrote: >> > >> >> On Thu, 24 May 2018, Dan Williams wrote: >> >> >> >> > I don't want to grow driver-local wrappers for pmem. You should use >> >> > memcpy_flushcache directly() and if an architecture does not define >> >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this >> >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't >> >> > see a need to add a standalone flush operation if all relevant archs >> >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() >> >> > directly since all archs define it. Alternatively we could introduce >> >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy >> >> > routine and memcpy_flushcache() would imply a wmb(). >> >> >> >> But memcpy_flushcache() on ARM64 is slow. >> >> Right, so again, what is wrong with memcpy_flushcache_relaxed() + >> wmb() or otherwise making memcpy_flushcache() ordered. I do not see >> that as a trailblazing requirement, I see that as typical review and a >> reduction of the operation space that you are proposing. > > memcpy_flushcache on ARM64 is generally wrong thing to do, because it is > slower than memcpy and explicit cache flush. > > Suppose that you want to write data to a block device and make it > persistent. So you send a WRITE bio and then a FLUSH bio. > > Now - how to implement these two bios on persistent memory: > > On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does > wmb() - this is the optimal implementation. > > But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal > implementation is that the WRITE bio does just memcpy() and the FLUSH bio > does arch_wb_cache_pmem() on the affected range. > > Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture > doesn't have non-temporal stores. So, memcpy_flushcache() is implemented > as memcpy() followed by a cache flush. > > Now - if you flush the cache immediatelly after memcpy, the cache is full > of dirty lines and the cache-flushing code has to write these lines back > and that is slow. > > If you flush the cache some time after memcpy (i.e. when the FLUSH bio is > received), the processor already flushed some part of the cache on its > own, so the cache-flushing function has less work to do and it is faster. > > So the conclusion is - don't use memcpy_flushcache on ARM. This problem > cannot be fixed by a better implementation of memcpy_flushcache. It sounds like ARM might be better off with mapping its pmem as write-through rather than write-back, and skip the explicit cache management altogether. You speak of "optimal" and "sub-optimal", but what would be more clear is fio measurements of the relative IOPs and latency profiles of the different approaches. 
The reason I am continuing to push here is that reducing the operation space from 'copy-flush-commit' to just 'copy' or 'copy-commit' simplifies the maintenance long term. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
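A sketch of the reduced 'copy + commit' interface proposed above; memcpy_flushcache_relaxed() is only a suggested name at this point in the discussion, not an existing kernel API, so the fallback definition and the wc_copy_and_commit() wrapper below are assumptions for illustration.

/* assumed fallback until an arch provides a genuinely relaxed (unordered) version */
#ifndef memcpy_flushcache_relaxed
#define memcpy_flushcache_relaxed(dst, src, cnt) memcpy_flushcache(dst, src, cnt)
#endif

static void wc_copy_and_commit(void *pmem_dst, const void *buf, size_t len)
{
	memcpy_flushcache_relaxed(pmem_dst, buf, len);	/* 'copy': unordered flushing copy */
	wmb();						/* 'commit': order it before the next metadata update */
}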
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-26 15:26 ` Dan Williams @ 2018-05-28 13:32 ` Mikulas Patocka 2018-05-28 18:14 ` Dan Williams 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-28 13:32 UTC (permalink / raw) To: Dan Williams; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Sat, 26 May 2018, Dan Williams wrote: > On Sat, May 26, 2018 at 12:02 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > On Fri, 25 May 2018, Dan Williams wrote: > > > >> On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: > >> > On Fri, May 25 2018 at 2:17am -0400, > >> > Mikulas Patocka <mpatocka@redhat.com> wrote: > >> > > >> >> On Thu, 24 May 2018, Dan Williams wrote: > >> >> > >> >> > I don't want to grow driver-local wrappers for pmem. You should use > >> >> > memcpy_flushcache directly() and if an architecture does not define > >> >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this > >> >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't > >> >> > see a need to add a standalone flush operation if all relevant archs > >> >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() > >> >> > directly since all archs define it. Alternatively we could introduce > >> >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy > >> >> > routine and memcpy_flushcache() would imply a wmb(). > >> >> > >> >> But memcpy_flushcache() on ARM64 is slow. > >> > >> Right, so again, what is wrong with memcpy_flushcache_relaxed() + > >> wmb() or otherwise making memcpy_flushcache() ordered. I do not see > >> that as a trailblazing requirement, I see that as typical review and a > >> reduction of the operation space that you are proposing. > > > > memcpy_flushcache on ARM64 is generally wrong thing to do, because it is > > slower than memcpy and explicit cache flush. > > > > Suppose that you want to write data to a block device and make it > > persistent. So you send a WRITE bio and then a FLUSH bio. > > > > Now - how to implement these two bios on persistent memory: > > > > On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does > > wmb() - this is the optimal implementation. > > > > But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal > > implementation is that the WRITE bio does just memcpy() and the FLUSH bio > > does arch_wb_cache_pmem() on the affected range. > > > > Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture > > doesn't have non-temporal stores. So, memcpy_flushcache() is implemented > > as memcpy() followed by a cache flush. > > > > Now - if you flush the cache immediatelly after memcpy, the cache is full > > of dirty lines and the cache-flushing code has to write these lines back > > and that is slow. > > > > If you flush the cache some time after memcpy (i.e. when the FLUSH bio is > > received), the processor already flushed some part of the cache on its > > own, so the cache-flushing function has less work to do and it is faster. > > > > So the conclusion is - don't use memcpy_flushcache on ARM. This problem > > cannot be fixed by a better implementation of memcpy_flushcache. > > It sounds like ARM might be better off with mapping its pmem as > write-through rather than write-back, and skip the explicit cache I doubt it would perform well - write combining combines the writes into a larger segments - and write-through doesn't. > management altogether. 
You speak of "optimal" and "sub-optimal", but > what would be more clear is fio measurements of the relative IOPs and > latency profiles of the different approaches. The reason I am > continuing to push here is that reducing the operation space from > 'copy-flush-commit' to just 'copy' or 'copy-commit' simplifies the > maintenance long term. I measured it (with nvme backing store) and late cache flushing has 12% better performance than eager flushing with memcpy_flushcache(). 131836 4k iops - vs - 117016. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-28 13:32 ` Mikulas Patocka @ 2018-05-28 18:14 ` Dan Williams 2018-05-30 13:07 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Dan Williams @ 2018-05-28 18:14 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Mon, May 28, 2018 at 6:32 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Sat, 26 May 2018, Dan Williams wrote: > >> On Sat, May 26, 2018 at 12:02 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: >> > >> > >> > On Fri, 25 May 2018, Dan Williams wrote: >> > >> >> On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: >> >> > On Fri, May 25 2018 at 2:17am -0400, >> >> > Mikulas Patocka <mpatocka@redhat.com> wrote: >> >> > >> >> >> On Thu, 24 May 2018, Dan Williams wrote: >> >> >> >> >> >> > I don't want to grow driver-local wrappers for pmem. You should use >> >> >> > memcpy_flushcache directly() and if an architecture does not define >> >> >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this >> >> >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't >> >> >> > see a need to add a standalone flush operation if all relevant archs >> >> >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() >> >> >> > directly since all archs define it. Alternatively we could introduce >> >> >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy >> >> >> > routine and memcpy_flushcache() would imply a wmb(). >> >> >> >> >> >> But memcpy_flushcache() on ARM64 is slow. >> >> >> >> Right, so again, what is wrong with memcpy_flushcache_relaxed() + >> >> wmb() or otherwise making memcpy_flushcache() ordered. I do not see >> >> that as a trailblazing requirement, I see that as typical review and a >> >> reduction of the operation space that you are proposing. >> > >> > memcpy_flushcache on ARM64 is generally wrong thing to do, because it is >> > slower than memcpy and explicit cache flush. >> > >> > Suppose that you want to write data to a block device and make it >> > persistent. So you send a WRITE bio and then a FLUSH bio. >> > >> > Now - how to implement these two bios on persistent memory: >> > >> > On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does >> > wmb() - this is the optimal implementation. >> > >> > But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal >> > implementation is that the WRITE bio does just memcpy() and the FLUSH bio >> > does arch_wb_cache_pmem() on the affected range. >> > >> > Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture >> > doesn't have non-temporal stores. So, memcpy_flushcache() is implemented >> > as memcpy() followed by a cache flush. >> > >> > Now - if you flush the cache immediatelly after memcpy, the cache is full >> > of dirty lines and the cache-flushing code has to write these lines back >> > and that is slow. >> > >> > If you flush the cache some time after memcpy (i.e. when the FLUSH bio is >> > received), the processor already flushed some part of the cache on its >> > own, so the cache-flushing function has less work to do and it is faster. >> > >> > So the conclusion is - don't use memcpy_flushcache on ARM. This problem >> > cannot be fixed by a better implementation of memcpy_flushcache. 
>> >> It sounds like ARM might be better off with mapping its pmem as >> write-through rather than write-back, and skip the explicit cache > > I doubt it would perform well - write combining combines the writes into a > larger segments - and write-through doesn't. > Last I checked write-through caching does not disable write combining >> management altogether. You speak of "optimal" and "sub-optimal", but >> what would be more clear is fio measurements of the relative IOPs and >> latency profiles of the different approaches. The reason I am >> continuing to push here is that reducing the operation space from >> 'copy-flush-commit' to just 'copy' or 'copy-commit' simplifies the >> maintenance long term. > > I measured it (with nvme backing store) and late cache flushing has 12% > better performance than eager flushing with memcpy_flushcache(). I assume what you're seeing is ARM64 over-flushing the amount of dirty data so it becomes more efficient to do an amortized flush at the end? However, that effectively makes memcpy_flushcache() unusable in the way it can be used on x86. You claimed that ARM does not support non-temporal stores, but it does, see the STNP instruction. I do not want to see arch specific optimizations in drivers, so either write-through mappings is a potential answer to remove the need to explicitly manage flushing, or just implement STNP hacks in memcpy_flushcache() like you did with MOVNT on x86. > 131836 4k iops - vs - 117016. To be clear this is memcpy_flushcache() vs memcpy + flush? _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-28 18:14 ` Dan Williams @ 2018-05-30 13:07 ` Mikulas Patocka 2018-05-30 13:16 ` Mike Snitzer 2018-05-30 15:58 ` Dan Williams 0 siblings, 2 replies; 40+ messages in thread From: Mikulas Patocka @ 2018-05-30 13:07 UTC (permalink / raw) To: Dan Williams; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Mon, 28 May 2018, Dan Williams wrote: > On Mon, May 28, 2018 at 6:32 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > On Sat, 26 May 2018, Dan Williams wrote: > > > >> On Sat, May 26, 2018 at 12:02 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > >> > > >> > > >> > On Fri, 25 May 2018, Dan Williams wrote: > >> > > >> >> On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: > >> >> > On Fri, May 25 2018 at 2:17am -0400, > >> >> > Mikulas Patocka <mpatocka@redhat.com> wrote: > >> >> > > >> >> >> On Thu, 24 May 2018, Dan Williams wrote: > >> >> >> > >> >> >> > I don't want to grow driver-local wrappers for pmem. You should use > >> >> >> > memcpy_flushcache directly() and if an architecture does not define > >> >> >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this > >> >> >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't > >> >> >> > see a need to add a standalone flush operation if all relevant archs > >> >> >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() > >> >> >> > directly since all archs define it. Alternatively we could introduce > >> >> >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy > >> >> >> > routine and memcpy_flushcache() would imply a wmb(). > >> >> >> > >> >> >> But memcpy_flushcache() on ARM64 is slow. > >> >> > >> >> Right, so again, what is wrong with memcpy_flushcache_relaxed() + > >> >> wmb() or otherwise making memcpy_flushcache() ordered. I do not see > >> >> that as a trailblazing requirement, I see that as typical review and a > >> >> reduction of the operation space that you are proposing. > >> > > >> > memcpy_flushcache on ARM64 is generally wrong thing to do, because it is > >> > slower than memcpy and explicit cache flush. > >> > > >> > Suppose that you want to write data to a block device and make it > >> > persistent. So you send a WRITE bio and then a FLUSH bio. > >> > > >> > Now - how to implement these two bios on persistent memory: > >> > > >> > On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does > >> > wmb() - this is the optimal implementation. > >> > > >> > But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal > >> > implementation is that the WRITE bio does just memcpy() and the FLUSH bio > >> > does arch_wb_cache_pmem() on the affected range. > >> > > >> > Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture > >> > doesn't have non-temporal stores. So, memcpy_flushcache() is implemented > >> > as memcpy() followed by a cache flush. > >> > > >> > Now - if you flush the cache immediatelly after memcpy, the cache is full > >> > of dirty lines and the cache-flushing code has to write these lines back > >> > and that is slow. > >> > > >> > If you flush the cache some time after memcpy (i.e. when the FLUSH bio is > >> > received), the processor already flushed some part of the cache on its > >> > own, so the cache-flushing function has less work to do and it is faster. > >> > > >> > So the conclusion is - don't use memcpy_flushcache on ARM. 
This problem > >> > cannot be fixed by a better implementation of memcpy_flushcache. > >> > >> It sounds like ARM might be better off with mapping its pmem as > >> write-through rather than write-back, and skip the explicit cache > > > > I doubt it would perform well - write combining combines the writes into a > > larger segments - and write-through doesn't. > > > > Last I checked write-through caching does not disable write combining > > >> management altogether. You speak of "optimal" and "sub-optimal", but > >> what would be more clear is fio measurements of the relative IOPs and > >> latency profiles of the different approaches. The reason I am > >> continuing to push here is that reducing the operation space from > >> 'copy-flush-commit' to just 'copy' or 'copy-commit' simplifies the > >> maintenance long term. > > > > I measured it (with nvme backing store) and late cache flushing has 12% > > better performance than eager flushing with memcpy_flushcache(). > > I assume what you're seeing is ARM64 over-flushing the amount of dirty > data so it becomes more efficient to do an amortized flush at the end? > However, that effectively makes memcpy_flushcache() unusable in the > way it can be used on x86. You claimed that ARM does not support > non-temporal stores, but it does, see the STNP instruction. I do not > want to see arch specific optimizations in drivers, so either > write-through mappings is a potential answer to remove the need to > explicitly manage flushing, or just implement STNP hacks in > memcpy_flushcache() like you did with MOVNT on x86. > > > 131836 4k iops - vs - 117016. > > To be clear this is memcpy_flushcache() vs memcpy + flush? I found out what caused the difference. I used dax_flush on the version of dm-writecache that I had on the ARM machine (with the kernel 4.14, because it is the last version where dax on ramdisk works) - and I thought that dax_flush flushes the cache, but it doesn't. When I replaced dax_flush with arch_wb_cache_pmem, the performance difference between early flushing and late flushing disappeared. So I think we can remove this per-architecture switch from dm-writecache. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
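The substitution described above amounts to the following; writecache_flush_region() is a hypothetical wrapper used only to show the change. A plausible explanation, not stated in the thread, is that dax_flush() performs the cache clean only when the dax device advertises a write cache (which a ramdisk does not), whereas arch_wb_cache_pmem() cleans the given range unconditionally.

/* hypothetical wrapper illustrating the change described above */
static void writecache_flush_region(struct dm_writecache *wc, void *ptr, size_t size)
{
	/* was: dax_flush(wc->ssd_dev->dax_dev, ptr, size); -- could return without flushing */
	arch_wb_cache_pmem(ptr, size);	/* always cleans the dirty cache lines for this range */
}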
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:07 ` Mikulas Patocka @ 2018-05-30 13:16 ` Mike Snitzer 2018-05-30 13:21 ` Mikulas Patocka 2018-05-30 15:58 ` Dan Williams 1 sibling, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-30 13:16 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, linux-nvdimm On Wed, May 30 2018 at 9:07am -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Mon, 28 May 2018, Dan Williams wrote: > > > On Mon, May 28, 2018 at 6:32 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > I measured it (with nvme backing store) and late cache flushing has 12% > > > better performance than eager flushing with memcpy_flushcache(). > > > > I assume what you're seeing is ARM64 over-flushing the amount of dirty > > data so it becomes more efficient to do an amortized flush at the end? > > However, that effectively makes memcpy_flushcache() unusable in the > > way it can be used on x86. You claimed that ARM does not support > > non-temporal stores, but it does, see the STNP instruction. I do not > > want to see arch specific optimizations in drivers, so either > > write-through mappings is a potential answer to remove the need to > > explicitly manage flushing, or just implement STNP hacks in > > memcpy_flushcache() like you did with MOVNT on x86. > > > > > 131836 4k iops - vs - 117016. > > > > To be clear this is memcpy_flushcache() vs memcpy + flush? > > I found out what caused the difference. I used dax_flush on the version of > dm-writecache that I had on the ARM machine (with the kernel 4.14, because > it is the last version where dax on ramdisk works) - and I thought that > dax_flush flushes the cache, but it doesn't. > > When I replaced dax_flush with arch_wb_cache_pmem, the performance > difference between early flushing and late flushing disappeared. > > So I think we can remove this per-architecture switch from dm-writecache. That is really great news, can you submit an incremental patch that layers ontop of the linux-dm.git 'dm-4.18' branch? Thanks, Mike _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:16 ` Mike Snitzer @ 2018-05-30 13:21 ` Mikulas Patocka 2018-05-30 13:26 ` Mike Snitzer 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-30 13:21 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, linux-nvdimm On Wed, 30 May 2018, Mike Snitzer wrote: > That is really great news, can you submit an incremental patch that > layers ontop of the linux-dm.git 'dm-4.18' branch? > > Thanks, > Mike I've sent the current version that I have. I fixed the bugs that were reported here (missing DAX, dm_bufio_client_create, __branch_check__ long->int truncation). Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:21 ` Mikulas Patocka @ 2018-05-30 13:26 ` Mike Snitzer 2018-05-30 13:33 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-30 13:26 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, linux-nvdimm On Wed, May 30 2018 at 9:21am -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > That is really great news, can you submit an incremental patch that > > layers ontop of the linux-dm.git 'dm-4.18' branch? > > > > Thanks, > > Mike > > I've sent the current version that I have. I fixed the bugs that were > reported here (missing DAX, dm_bufio_client_create, __branch_check__ > long->int truncation). OK, but a monolithic dm-writecache.c is no longer useful to me. I can drop Arnd's gcc warning fix (with the idea that Ingo or Steve will take your __branch_check__ patch). Not sure what the dm_bufio_client_create fix is... must've missed a report about that. ANyway, point is we're on too a different phase of dm-writecache.c's development. I've picked it up and am trying to get it ready for the 4.18 merge window (likely opening Sunday). Therefore it needs to be in a git tree, and incremental changes overlayed. I cannot be rebasing at this late stage in the 4.18 development window. Thanks, Mike _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:26 ` Mike Snitzer @ 2018-05-30 13:33 ` Mikulas Patocka 2018-05-30 13:54 ` Mike Snitzer 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-30 13:33 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, linux-nvdimm On Wed, 30 May 2018, Mike Snitzer wrote: > On Wed, May 30 2018 at 9:21am -0400, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > That is really great news, can you submit an incremental patch that > > > layers ontop of the linux-dm.git 'dm-4.18' branch? > > > > > > Thanks, > > > Mike > > > > I've sent the current version that I have. I fixed the bugs that were > > reported here (missing DAX, dm_bufio_client_create, __branch_check__ > > long->int truncation). > > OK, but a monolithic dm-writecache.c is no longer useful to me. I can > drop Arnd's gcc warning fix (with the idea that Ingo or Steve will take > your __branch_check__ patch). Not sure what the dm_bufio_client_create > fix is... must've missed a report about that. > > ANyway, point is we're on too a different phase of dm-writecache.c's > development. I've picked it up and am trying to get it ready for the > 4.18 merge window (likely opening Sunday). Therefore it needs to be in > a git tree, and incremental changes overlayed. I cannot be rebasing at > this late stage in the 4.18 development window. > > Thanks, > Mike I downloaded dm-writecache from your git repository some times ago - but you changed a lot of useless things (i.e. reordering the fields in the structure) since that time - so, you'll have to merge the changes. You dropped the latency measuring code - why? We still would like to benchmark the driver. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:33 ` Mikulas Patocka @ 2018-05-30 13:54 ` Mike Snitzer 2018-05-30 14:09 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-30 13:54 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, linux-nvdimm On Wed, May 30 2018 at 9:33am -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > On Wed, May 30 2018 at 9:21am -0400, > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > > > That is really great news, can you submit an incremental patch that > > > > layers ontop of the linux-dm.git 'dm-4.18' branch? > > > > > > > > Thanks, > > > > Mike > > > > > > I've sent the current version that I have. I fixed the bugs that were > > > reported here (missing DAX, dm_bufio_client_create, __branch_check__ > > > long->int truncation). > > > > OK, but a monolithic dm-writecache.c is no longer useful to me. I can > > drop Arnd's gcc warning fix (with the idea that Ingo or Steve will take > > your __branch_check__ patch). Not sure what the dm_bufio_client_create > > fix is... must've missed a report about that. > > > > ANyway, point is we're on too a different phase of dm-writecache.c's > > development. I've picked it up and am trying to get it ready for the > > 4.18 merge window (likely opening Sunday). Therefore it needs to be in > > a git tree, and incremental changes overlayed. I cannot be rebasing at > > this late stage in the 4.18 development window. > > > > Thanks, > > Mike > > I downloaded dm-writecache from your git repository some times ago - but > you changed a lot of useless things (i.e. reordering the fields in the > structure) since that time - so, you'll have to merge the changes. Fine I'll deal with it. reordering the fields eliminated holes in the structure and reduced struct members spanning cache lines. > You dropped the latency measuring code - why? We still would like to > benchmark the driver. I saved that patch. It is a development-only patch. I tried to have you work on formalizing the code further by making the functions actually get called, and by wrapping the code with CONFIG_DM_WRITECACHE_MEASURE_LATENCY. In the end I dropped it for now and we'll just have to apply the patch when we need to benchmark. Here is the current patch if you'd like to improve it (e.g. actually call measure_latency_start and measure_latency_end in places) -- I seemed to have lost my incremental change to switch over to CONFIG_DM_WRITECACHE_MEASURE_LATENCY (likely due to rebase); can worry about that later. This is based on the dm-4.18 branch. >From 20a7c123271741cb7260154b68730942417e803a Mon Sep 17 00:00:00 2001 From: Mikulas Patocka <mpatocka@redhat.com> Date: Tue, 22 May 2018 15:54:53 -0400 Subject: [PATCH] dm writecache: add the ability to measure some latencies Developer-only code that won't go upstream (as-is). 
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> drivers/md/dm-writecache.c | 94 ++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 87 insertions(+), 7 deletions(-) diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c index 844c4fb2fcfc..e733a14faf8f 100644 --- a/drivers/md/dm-writecache.c +++ b/drivers/md/dm-writecache.c @@ -28,6 +28,8 @@ #define AUTOCOMMIT_BLOCKS_PMEM 64 #define AUTOCOMMIT_MSEC 1000 +//#define WC_MEASURE_LATENCY + #define BITMAP_GRANULARITY 65536 #if BITMAP_GRANULARITY < PAGE_SIZE #undef BITMAP_GRANULARITY @@ -217,6 +219,15 @@ struct dm_writecache { struct dm_kcopyd_client *dm_kcopyd; unsigned long *dirty_bitmap; unsigned dirty_bitmap_size; + +#ifdef WC_MEASURE_LATENCY + ktime_t lock_acquired_time; + ktime_t max_lock_held; + ktime_t max_lock_wait; + ktime_t max_freelist_wait; + ktime_t measure_latency_time; + ktime_t max_measure_latency; +#endif }; #define WB_LIST_INLINE 16 @@ -243,15 +254,60 @@ struct copy_struct { DECLARE_DM_KCOPYD_THROTTLE_WITH_MODULE_PARM(dm_writecache_throttle, "A percentage of time allocated for data copying"); -static void wc_lock(struct dm_writecache *wc) +static inline void measure_latency_start(struct dm_writecache *wc) +{ +#ifdef WC_MEASURE_LATENCY + wc->measure_latency_time = ktime_get(); +#endif +} + +static inline void measure_latency_end(struct dm_writecache *wc, unsigned long n) { +#ifdef WC_MEASURE_LATENCY + ktime_t now = ktime_get(); + if (now - wc->measure_latency_time > wc->max_measure_latency) { + wc->max_measure_latency = now - wc->measure_latency_time; + printk(KERN_DEBUG "dm-writecache: measured latency %lld.%03lldus, %lu steps\n", + wc->max_measure_latency / 1000, wc->max_measure_latency % 1000, n); + } +#endif +} + +static void __wc_lock(struct dm_writecache *wc, int line) +{ +#ifdef WC_MEASURE_LATENCY + ktime_t before, after; + before = ktime_get(); +#endif mutex_lock(&wc->lock); +#ifdef WC_MEASURE_LATENCY + after = ktime_get(); + if (unlikely(after - before > wc->max_lock_wait)) { + wc->max_lock_wait = after - before; + printk(KERN_DEBUG "dm-writecache: waiting for lock for %lld.%03lldus at %d\n", + wc->max_lock_wait / 1000, wc->max_lock_wait % 1000, line); + after = ktime_get(); + } + wc->lock_acquired_time = after; +#endif } +#define wc_lock(wc) __wc_lock(wc, __LINE__) -static void wc_unlock(struct dm_writecache *wc) +static void __wc_unlock(struct dm_writecache *wc, int line) { +#ifdef WC_MEASURE_LATENCY + ktime_t now = ktime_get(); + if (now - wc->lock_acquired_time > wc->max_lock_held) { + wc->max_lock_held = now - wc->lock_acquired_time; + printk(KERN_DEBUG "dm-writecache: lock held for %lld.%03lldus at %d\n", + wc->max_lock_held / 1000, wc->max_lock_held % 1000, line); + } +#endif mutex_unlock(&wc->lock); } +#define wc_unlock(wc) __wc_unlock(wc, __LINE__) + +#define wc_unlock_long(wc) mutex_unlock(&wc->lock) #if IS_ENABLED(CONFIG_DAX_DRIVER) static int persistent_memory_claim(struct dm_writecache *wc) @@ -293,6 +349,10 @@ static int persistent_memory_claim(struct dm_writecache *wc) r = -EOPNOTSUPP; goto err2; } +#ifdef WC_MEASURE_LATENCY + printk(KERN_DEBUG "dm-writecache: device %s, pfn %016llx\n", + wc->ssd_dev->name, pfn.val); +#endif if (da != p) { long i; wc->memory_map = NULL; @@ -701,16 +761,35 @@ static void writecache_free_entry(struct dm_writecache *wc, struct wc_entry *e) swake_up(&wc->freelist_wait); } -static void writecache_wait_on_freelist(struct dm_writecache *wc) +static void __writecache_wait_on_freelist(struct 
dm_writecache *wc, bool measure, int line) { DECLARE_SWAITQUEUE(wait); +#ifdef WC_MEASURE_LATENCY + ktime_t before, after; +#endif prepare_to_swait(&wc->freelist_wait, &wait, TASK_UNINTERRUPTIBLE); wc_unlock(wc); +#ifdef WC_MEASURE_LATENCY + if (measure) + before = ktime_get(); +#endif io_schedule(); finish_swait(&wc->freelist_wait, &wait); +#ifdef WC_MEASURE_LATENCY + if (measure) { + after = ktime_get(); + if (unlikely(after - before > wc->max_freelist_wait)) { + wc->max_freelist_wait = after - before; + printk(KERN_DEBUG "dm-writecache: waiting on freelist for %lld.%03lldus at %d\n", + wc->max_freelist_wait / 1000, wc->max_freelist_wait % 1000, line); + } + } +#endif wc_lock(wc); } +#define writecache_wait_on_freelist(wc) __writecache_wait_on_freelist(wc, true, __LINE__) +#define writecache_wait_on_freelist_long(wc) __writecache_wait_on_freelist(wc, false, __LINE__) static void writecache_poison_lists(struct dm_writecache *wc) { @@ -890,7 +969,7 @@ static void writecache_suspend(struct dm_target *ti) writecache_poison_lists(wc); - wc_unlock(wc); + wc_unlock_long(wc); } static int writecache_alloc_entries(struct dm_writecache *wc) @@ -1001,7 +1080,7 @@ static void writecache_resume(struct dm_target *ti) writecache_commit_flushed(wc); } - wc_unlock(wc); + wc_unlock_long(wc); } static int process_flush_mesg(unsigned argc, char **argv, struct dm_writecache *wc) @@ -1472,8 +1551,9 @@ static void __writeback_throttle(struct dm_writecache *wc, struct writeback_list if (unlikely(wc->max_writeback_jobs)) { if (READ_ONCE(wc->writeback_size) - wbl->size >= wc->max_writeback_jobs) { wc_lock(wc); - while (wc->writeback_size - wbl->size >= wc->max_writeback_jobs) - writecache_wait_on_freelist(wc); + while (wc->writeback_size - wbl->size >= wc->max_writeback_jobs) { + writecache_wait_on_freelist_long(wc); + } wc_unlock(wc); } } _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply related [flat|nested] 40+ messages in thread
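As the message above notes, the patch defines measure_latency_start()/measure_latency_end() but never calls them. One way they might be wired in is sketched below; the call site is illustrative only (it assumes dm-writecache's existing bio_copy_block() data-copy helper and is not part of the posted patch).

/* illustrative only: time the data copy in the I/O path */
static void bio_copy_block_timed(struct dm_writecache *wc, struct bio *bio, void *data)
{
	measure_latency_start(wc);
	bio_copy_block(wc, bio, data);
	measure_latency_end(wc, bio->bi_iter.bi_size >> SECTOR_SHIFT);	/* "steps" = sectors copied */
}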
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:54 ` Mike Snitzer @ 2018-05-30 14:09 ` Mikulas Patocka 2018-05-30 14:21 ` Mike Snitzer 2018-05-31 3:39 ` Mike Snitzer 0 siblings, 2 replies; 40+ messages in thread From: Mikulas Patocka @ 2018-05-30 14:09 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, linux-nvdimm On Wed, 30 May 2018, Mike Snitzer wrote: > On Wed, May 30 2018 at 9:33am -0400, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > On Wed, May 30 2018 at 9:21am -0400, > > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > > > > > That is really great news, can you submit an incremental patch that > > > > > layers ontop of the linux-dm.git 'dm-4.18' branch? > > > > > > > > > > Thanks, > > > > > Mike > > > > > > > > I've sent the current version that I have. I fixed the bugs that were > > > > reported here (missing DAX, dm_bufio_client_create, __branch_check__ > > > > long->int truncation). > > > > > > OK, but a monolithic dm-writecache.c is no longer useful to me. I can > > > drop Arnd's gcc warning fix (with the idea that Ingo or Steve will take > > > your __branch_check__ patch). Not sure what the dm_bufio_client_create > > > fix is... must've missed a report about that. > > > > > > ANyway, point is we're on too a different phase of dm-writecache.c's > > > development. I've picked it up and am trying to get it ready for the > > > 4.18 merge window (likely opening Sunday). Therefore it needs to be in > > > a git tree, and incremental changes overlayed. I cannot be rebasing at > > > this late stage in the 4.18 development window. > > > > > > Thanks, > > > Mike > > > > I downloaded dm-writecache from your git repository some times ago - but > > you changed a lot of useless things (i.e. reordering the fields in the > > structure) since that time - so, you'll have to merge the changes. > > Fine I'll deal with it. reordering the fields eliminated holes in the > structure and reduced struct members spanning cache lines. And what about this? #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) The code that I had just allowed the compiler to optimize out persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you deleted it. Most architectures don't have persistent memory and the dm-writecache driver could work in ssd-only mode on them. On these architectures, I define #define WC_MODE_PMEM(wc) false - and the compiler will just automatically remove the tests for that condition and the unused branch. It does also eliminate unused static functions. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 14:09 ` Mikulas Patocka @ 2018-05-30 14:21 ` Mike Snitzer 2018-05-30 14:46 ` Mikulas Patocka 2018-05-31 3:39 ` Mike Snitzer 1 sibling, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-30 14:21 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, linux-nvdimm On Wed, May 30 2018 at 10:09am -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > On Wed, May 30 2018 at 9:33am -0400, > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > > > On Wed, May 30 2018 at 9:21am -0400, > > > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > > > > > > > That is really great news, can you submit an incremental patch that > > > > > > layers ontop of the linux-dm.git 'dm-4.18' branch? > > > > > > > > > > > > Thanks, > > > > > > Mike > > > > > > > > > > I've sent the current version that I have. I fixed the bugs that were > > > > > reported here (missing DAX, dm_bufio_client_create, __branch_check__ > > > > > long->int truncation). > > > > > > > > OK, but a monolithic dm-writecache.c is no longer useful to me. I can > > > > drop Arnd's gcc warning fix (with the idea that Ingo or Steve will take > > > > your __branch_check__ patch). Not sure what the dm_bufio_client_create > > > > fix is... must've missed a report about that. > > > > > > > > ANyway, point is we're on too a different phase of dm-writecache.c's > > > > development. I've picked it up and am trying to get it ready for the > > > > 4.18 merge window (likely opening Sunday). Therefore it needs to be in > > > > a git tree, and incremental changes overlayed. I cannot be rebasing at > > > > this late stage in the 4.18 development window. > > > > > > > > Thanks, > > > > Mike > > > > > > I downloaded dm-writecache from your git repository some times ago - but > > > you changed a lot of useless things (i.e. reordering the fields in the > > > structure) since that time - so, you'll have to merge the changes. > > > > Fine I'll deal with it. reordering the fields eliminated holes in the > > structure and reduced struct members spanning cache lines. > > And what about this? > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > The code that I had just allowed the compiler to optimize out > persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you > deleted it. > > Most architectures don't have persistent memory and the dm-writecache > driver could work in ssd-only mode on them. On these architectures, I > define > #define WC_MODE_PMEM(wc) false > - and the compiler will just automatically remove the tests for that > condition and the unused branch. It does also eliminate unused static > functions. This level of microoptimization can be backfilled. But as it was, there were too many #defines. And I'm really not concerned with eliminating unused static functions for this case. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 14:21 ` Mike Snitzer @ 2018-05-30 14:46 ` Mikulas Patocka 2018-05-31 3:42 ` Mike Snitzer 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-30 14:46 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, linux-nvdimm On Wed, 30 May 2018, Mike Snitzer wrote: > On Wed, May 30 2018 at 10:09am -0400, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > On Wed, May 30 2018 at 9:33am -0400, > > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > > > > > On Wed, May 30 2018 at 9:21am -0400, > > > > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > > > > > > > > > That is really great news, can you submit an incremental patch that > > > > > > > layers ontop of the linux-dm.git 'dm-4.18' branch? > > > > > > > > > > > > > > Thanks, > > > > > > > Mike > > > > > > > > > > > > I've sent the current version that I have. I fixed the bugs that were > > > > > > reported here (missing DAX, dm_bufio_client_create, __branch_check__ > > > > > > long->int truncation). > > > > > > > > > > OK, but a monolithic dm-writecache.c is no longer useful to me. I can > > > > > drop Arnd's gcc warning fix (with the idea that Ingo or Steve will take > > > > > your __branch_check__ patch). Not sure what the dm_bufio_client_create > > > > > fix is... must've missed a report about that. > > > > > > > > > > ANyway, point is we're on too a different phase of dm-writecache.c's > > > > > development. I've picked it up and am trying to get it ready for the > > > > > 4.18 merge window (likely opening Sunday). Therefore it needs to be in > > > > > a git tree, and incremental changes overlayed. I cannot be rebasing at > > > > > this late stage in the 4.18 development window. > > > > > > > > > > Thanks, > > > > > Mike > > > > > > > > I downloaded dm-writecache from your git repository some times ago - but > > > > you changed a lot of useless things (i.e. reordering the fields in the > > > > structure) since that time - so, you'll have to merge the changes. > > > > > > Fine I'll deal with it. reordering the fields eliminated holes in the > > > structure and reduced struct members spanning cache lines. > > > > And what about this? > > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > > > The code that I had just allowed the compiler to optimize out > > persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you > > deleted it. > > > > Most architectures don't have persistent memory and the dm-writecache > > driver could work in ssd-only mode on them. On these architectures, I > > define > > #define WC_MODE_PMEM(wc) false > > - and the compiler will just automatically remove the tests for that > > condition and the unused branch. It does also eliminate unused static > > functions. > > This level of microoptimization can be backfilled. But as it was, there > were too many #defines. And I'm really not concerned with eliminating > unused static functions for this case. I don't see why "too many defines" would be a problem. If I compile it with and without pmem support, the difference is 15kB-vs-12kB. If we look at just one function (writecache_map), the difference is 1595 bytes - vs - 1280 bytes. So, it produces real savings in code size. 
The problem with performance is not caused by a condition that always jumps the same way (that is predicted by the CPU and it causes no delays in the pipeline) - the problem is that a bigger function consumes more i-cache. There is no reason to include code that can't be executed. Note that we should also redefine pmem_assign on architectures that don't support persistent memory: #ifndef DM_WRITECACHE_ONLY_SSD #define pmem_assign(dest, src) \ do { \ typeof(dest) uniq = (src); \ memcpy_flushcache(&(dest), &uniq, sizeof(dest)); \ } while (0) #else #define pmem_assign(dest, src) ((dest) = (src)) #endif I.e. we should not call memcpy_flushcache if we can't have persistent memory. Cache flushing is slow and we should not do it if we don't have to. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
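A usage sketch for the macro above: with the pmem definition, an aligned 8-byte field such as a superblock sequence counter is written through memcpy_flushcache() (atomically for an aligned u64, and flushed), while the SSD-only definition degenerates to a plain assignment with no cache management. The function and field names below are illustrative, not an exact excerpt from the driver.

static void wc_bump_seq_count(struct dm_writecache *wc)
{
	wc->seq_count++;
	/* pmem build: 8-byte flushing store; SSD-only build: plain assignment */
	pmem_assign(sb(wc)->seq_count, cpu_to_le64(wc->seq_count));
}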
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 14:46 ` Mikulas Patocka @ 2018-05-31 3:42 ` Mike Snitzer 2018-06-03 15:03 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-31 3:42 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, linux-nvdimm On Wed, May 30 2018 at 10:46P -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > Fine I'll deal with it. reordering the fields eliminated holes in the > > > > structure and reduced struct members spanning cache lines. > > > > > > And what about this? > > > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > > > > > The code that I had just allowed the compiler to optimize out > > > persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you > > > deleted it. > > > > > > Most architectures don't have persistent memory and the dm-writecache > > > driver could work in ssd-only mode on them. On these architectures, I > > > define > > > #define WC_MODE_PMEM(wc) false > > > - and the compiler will just automatically remove the tests for that > > > condition and the unused branch. It does also eliminate unused static > > > functions. > > > > This level of microoptimization can be backfilled. But as it was, there > > were too many #defines. And I'm really not concerned with eliminating > > unused static functions for this case. > > I don't see why "too many defines" would be a problem. > > If I compile it with and without pmem support, the difference is > 15kB-vs-12kB. If we look at just one function (writecache_map), the > difference is 1595 bytes - vs - 1280 bytes. So, it produces real savings > in code size. > > The problem with performance is not caused a condition that always jumps > the same way (that is predicted by the CPU and it causes no delays in the > pipeline) - the problem is that a bigger function consumes more i-cache. > There is no reason to include code that can't be executed. Please double check you see the reduced code size you're expecting using the latest dm-writecache.c in linux-dm.git's dm-4.18 branch. Thanks, Mike _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-31 3:42 ` Mike Snitzer @ 2018-06-03 15:03 ` Mikulas Patocka 0 siblings, 0 replies; 40+ messages in thread From: Mikulas Patocka @ 2018-06-03 15:03 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, linux-nvdimm On Wed, 30 May 2018, Mike Snitzer wrote: > On Wed, May 30 2018 at 10:46P -0400, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > > > Fine I'll deal with it. reordering the fields eliminated holes in the > > > > > structure and reduced struct members spanning cache lines. > > > > > > > > And what about this? > > > > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > > > > > > > The code that I had just allowed the compiler to optimize out > > > > persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you > > > > deleted it. > > > > > > > > Most architectures don't have persistent memory and the dm-writecache > > > > driver could work in ssd-only mode on them. On these architectures, I > > > > define > > > > #define WC_MODE_PMEM(wc) false > > > > - and the compiler will just automatically remove the tests for that > > > > condition and the unused branch. It does also eliminate unused static > > > > functions. > > > > > > This level of microoptimization can be backfilled. But as it was, there > > > were too many #defines. And I'm really not concerned with eliminating > > > unused static functions for this case. > > > > I don't see why "too many defines" would be a problem. > > > > If I compile it with and without pmem support, the difference is > > 15kB-vs-12kB. If we look at just one function (writecache_map), the > > difference is 1595 bytes - vs - 1280 bytes. So, it produces real savings > > in code size. > > > > The problem with performance is not caused a condition that always jumps > > the same way (that is predicted by the CPU and it causes no delays in the > > pipeline) - the problem is that a bigger function consumes more i-cache. > > There is no reason to include code that can't be executed. > > Please double check you see the reduced code size you're expecting using > the latest dm-writecache.c in linux-dm.git's dm-4.18 branch. > > Thanks, > Mike I checked that - it's OK. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 14:09 ` Mikulas Patocka 2018-05-30 14:21 ` Mike Snitzer @ 2018-05-31 3:39 ` Mike Snitzer 2018-05-31 8:16 ` Mikulas Patocka 1 sibling, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-31 3:39 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, linux-nvdimm On Wed, May 30 2018 at 10:09P -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > And what about this? > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > The code that I had just allowed the compiler to optimize out > persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you > deleted it. > > Most architectures don't have persistent memory and the dm-writecache > driver could work in ssd-only mode on them. On these architectures, I > define > #define WC_MODE_PMEM(wc) false > - and the compiler will just automatically remove the tests for that > condition and the unused branch. It does also eliminate unused static > functions. Here is the patch that I just folded into the rebased version of dm-writecache now in the dm-4.18 branch. (I rebased ontop of Jens' latest block tree for 4.18 that now includes the mempool_init changes, etc.) --- drivers/md/dm-writecache.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c index fcbfaf7c27ec..691b5ffb799f 100644 --- a/drivers/md/dm-writecache.c +++ b/drivers/md/dm-writecache.c @@ -34,13 +34,21 @@ #define BITMAP_GRANULARITY PAGE_SIZE #endif +#if defined(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) +#define DM_WRITECACHE_HAS_PMEM +#endif + +#ifdef DM_WRITECACHE_HAS_PMEM #define pmem_assign(dest, src) \ do { \ typeof(dest) uniq = (src); \ memcpy_flushcache(&(dest), &uniq, sizeof(dest)); \ } while (0) +#else +#define pmem_assign(dest, src) ((dest) = (src)) +#endif -#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(CONFIG_ARCH_HAS_PMEM_API) +#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(DM_WRITECACHE_HAS_PMEM) #define DM_WRITECACHE_HANDLE_HARDWARE_ERRORS #endif @@ -87,8 +95,13 @@ struct wc_entry { #endif }; +#ifdef DM_WRITECACHE_HAS_PMEM #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) #define WC_MODE_FUA(wc) ((wc)->writeback_fua) +#else +#define WC_MODE_PMEM(wc) false +#define WC_MODE_FUA(wc) false +#endif #define WC_MODE_SORT_FREELIST(wc) (!WC_MODE_PMEM(wc)) struct dm_writecache { @@ -1857,7 +1870,7 @@ static int writecache_ctr(struct dm_target *ti, unsigned argc, char **argv) if (!strcasecmp(string, "s")) { wc->pmem_mode = false; } else if (!strcasecmp(string, "p")) { -#if defined(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) +#ifdef DM_WRITECACHE_HAS_PMEM wc->pmem_mode = true; wc->writeback_fua = true; #else -- 2.15.0 _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply related [flat|nested] 40+ messages in thread
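The effect of the constant-false fallbacks in the patch above can be seen in a fragment like the following (illustrative, not a verbatim excerpt from dm-writecache.c; both helper names here are stand-ins): in an SSD-only build WC_MODE_PMEM(wc) is the compile-time constant false, so the compiler discards the pmem branch and any static functions reachable only from it.

static void wc_flush_block(struct dm_writecache *wc, void *data)
{
	if (WC_MODE_PMEM(wc))
		/* dead code when DM_WRITECACHE_HAS_PMEM is not defined */
		persistent_memory_flush_cache(data, wc->block_size);
	else
		writecache_flush_ssd_block(wc, data);	/* hypothetical SSD-path helper */
}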
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-31 3:39 ` Mike Snitzer @ 2018-05-31 8:16 ` Mikulas Patocka 2018-05-31 12:09 ` Mike Snitzer 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-31 8:16 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, linux-nvdimm On Wed, 30 May 2018, Mike Snitzer wrote: > On Wed, May 30 2018 at 10:09P -0400, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > And what about this? > > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > > > The code that I had just allowed the compiler to optimize out > > persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you > > deleted it. > > > > Most architectures don't have persistent memory and the dm-writecache > > driver could work in ssd-only mode on them. On these architectures, I > > define > > #define WC_MODE_PMEM(wc) false > > - and the compiler will just automatically remove the tests for that > > condition and the unused branch. It does also eliminate unused static > > functions. > > Here is the patch that I just folded into the rebased version of > dm-writecache now in the dm-4.18 branch. > > (I rebased ontop of Jens' latest block tree for 4.18 that now includes > the mempool_init changes, etc.) > > --- > drivers/md/dm-writecache.c | 17 +++++++++++++++-- > 1 file changed, 15 insertions(+), 2 deletions(-) > > diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c > index fcbfaf7c27ec..691b5ffb799f 100644 > --- a/drivers/md/dm-writecache.c > +++ b/drivers/md/dm-writecache.c > @@ -34,13 +34,21 @@ > #define BITMAP_GRANULARITY PAGE_SIZE > #endif > > +#if defined(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) > +#define DM_WRITECACHE_HAS_PMEM > +#endif > + > +#ifdef DM_WRITECACHE_HAS_PMEM > #define pmem_assign(dest, src) \ > do { \ > typeof(dest) uniq = (src); \ > memcpy_flushcache(&(dest), &uniq, sizeof(dest)); \ > } while (0) > +#else > +#define pmem_assign(dest, src) ((dest)) = (src)) > +#endif > > -#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(CONFIG_ARCH_HAS_PMEM_API) > +#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(DM_WRITECACHE_HAS_PMEM) > #define DM_WRITECACHE_HANDLE_HARDWARE_ERRORS > #endif > > @@ -87,8 +95,13 @@ struct wc_entry { > #endif > }; > > +#ifdef DM_WRITECACHE_HAS_PMEM > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > #define WC_MODE_FUA(wc) ((wc)->writeback_fua) > +#else > +#define WC_MODE_PMEM(wc) false > +#define WC_MODE_FUA(wc) false > +#endif > #define WC_MODE_SORT_FREELIST(wc) (!WC_MODE_PMEM(wc)) > > struct dm_writecache { > @@ -1857,7 +1870,7 @@ static int writecache_ctr(struct dm_target *ti, unsigned argc, char **argv) > if (!strcasecmp(string, "s")) { > wc->pmem_mode = false; > } else if (!strcasecmp(string, "p")) { > -#if defined(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) > +#ifdef DM_WRITECACHE_HAS_PMEM > wc->pmem_mode = true; > wc->writeback_fua = true; > #else > -- > 2.15.0 OK. I think that persistent_memory_claim should also be conditioned based on DM_WRITECACHE_HAS_PMEM - i.e. if we have DM_WRITECACHE_HAS_PMEM, we don't need to use other ifdefs. Is there some difference between "#if IS_ENABLED(CONFIG_OPTION)" and "#if defined(CONFIG_OPTION)"? Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-31 8:16 ` Mikulas Patocka @ 2018-05-31 12:09 ` Mike Snitzer 0 siblings, 0 replies; 40+ messages in thread From: Mike Snitzer @ 2018-05-31 12:09 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, linux-nvdimm On Thu, May 31 2018 at 4:16am -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > On Wed, May 30 2018 at 10:09P -0400, > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > And what about this? > > > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > > > > > The code that I had just allowed the compiler to optimize out > > > persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you > > > deleted it. > > > > > > Most architectures don't have persistent memory and the dm-writecache > > > driver could work in ssd-only mode on them. On these architectures, I > > > define > > > #define WC_MODE_PMEM(wc) false > > > - and the compiler will just automatically remove the tests for that > > > condition and the unused branch. It does also eliminate unused static > > > functions. > > > > Here is the patch that I just folded into the rebased version of > > dm-writecache now in the dm-4.18 branch. > > > > (I rebased ontop of Jens' latest block tree for 4.18 that now includes > > the mempool_init changes, etc.) > > > > --- > > drivers/md/dm-writecache.c | 17 +++++++++++++++-- > > 1 file changed, 15 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c > > index fcbfaf7c27ec..691b5ffb799f 100644 > > --- a/drivers/md/dm-writecache.c > > +++ b/drivers/md/dm-writecache.c > > @@ -34,13 +34,21 @@ > > #define BITMAP_GRANULARITY PAGE_SIZE > > #endif > > > > +#if defined(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) > > +#define DM_WRITECACHE_HAS_PMEM > > +#endif > > + > > +#ifdef DM_WRITECACHE_HAS_PMEM > > #define pmem_assign(dest, src) \ > > do { \ > > typeof(dest) uniq = (src); \ > > memcpy_flushcache(&(dest), &uniq, sizeof(dest)); \ > > } while (0) > > +#else > > +#define pmem_assign(dest, src) ((dest)) = (src)) > > +#endif > > > > -#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(CONFIG_ARCH_HAS_PMEM_API) > > +#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(DM_WRITECACHE_HAS_PMEM) > > #define DM_WRITECACHE_HANDLE_HARDWARE_ERRORS > > #endif > > > > @@ -87,8 +95,13 @@ struct wc_entry { > > #endif > > }; > > > > +#ifdef DM_WRITECACHE_HAS_PMEM > > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > #define WC_MODE_FUA(wc) ((wc)->writeback_fua) > > +#else > > +#define WC_MODE_PMEM(wc) false > > +#define WC_MODE_FUA(wc) false > > +#endif > > #define WC_MODE_SORT_FREELIST(wc) (!WC_MODE_PMEM(wc)) > > > > struct dm_writecache { > > @@ -1857,7 +1870,7 @@ static int writecache_ctr(struct dm_target *ti, unsigned argc, char **argv) > > if (!strcasecmp(string, "s")) { > > wc->pmem_mode = false; > > } else if (!strcasecmp(string, "p")) { > > -#if defined(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) > > +#ifdef DM_WRITECACHE_HAS_PMEM > > wc->pmem_mode = true; > > wc->writeback_fua = true; > > #else > > -- > > 2.15.0 > > OK. > > I think that persistent_memory_claim should also be conditioned based on > DM_WRITECACHE_HAS_PMEM - i.e. if we have DM_WRITECACHE_HAS_PMEM, we don't > need to use other ifdefs. > > Is there some difference between "#if IS_ENABLED(CONFIG_OPTION)" and > "#if defined(CONFIG_OPTION)"? 
Not that I'm aware of: #define IS_ENABLED(option) __or(IS_BUILTIN(option), IS_MODULE(option)) Here is the incremental patch I just folded in: diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c index fd3bc232b7d6..f2ae02f22c43 100644 --- a/drivers/md/dm-writecache.c +++ b/drivers/md/dm-writecache.c @@ -34,7 +34,7 @@ #define BITMAP_GRANULARITY PAGE_SIZE #endif -#if defined(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) +#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) #define DM_WRITECACHE_HAS_PMEM #endif @@ -218,7 +218,7 @@ static void wc_unlock(struct dm_writecache *wc) mutex_unlock(&wc->lock); } -#if IS_ENABLED(CONFIG_DAX_DRIVER) +#ifdef DM_WRITECACHE_HAS_PMEM static int persistent_memory_claim(struct dm_writecache *wc) { int r; _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply related [flat|nested] 40+ messages in thread
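For completeness, the surrounding definitions in include/linux/kconfig.h of that era expand roughly as shown below. The practical difference is that IS_ENABLED() also evaluates to 1 when a tristate option is built as a module (CONFIG_FOO_MODULE), while defined(CONFIG_FOO) covers only the built-in case; for a plain bool symbol the two behave the same.

/* include/linux/kconfig.h (abridged) */
#define IS_BUILTIN(option)	__is_defined(option)
#define IS_MODULE(option)	__is_defined(option##_MODULE)
#define IS_ENABLED(option)	__or(IS_BUILTIN(option), IS_MODULE(option))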
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:07 ` Mikulas Patocka 2018-05-30 13:16 ` Mike Snitzer @ 2018-05-30 15:58 ` Dan Williams 2018-05-30 22:39 ` Dan Williams 1 sibling, 1 reply; 40+ messages in thread From: Dan Williams @ 2018-05-30 15:58 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Wed, May 30, 2018 at 6:07 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Mon, 28 May 2018, Dan Williams wrote: > >> On Mon, May 28, 2018 at 6:32 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: >> > >> > >> > On Sat, 26 May 2018, Dan Williams wrote: >> > >> >> On Sat, May 26, 2018 at 12:02 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: >> >> > >> >> > >> >> > On Fri, 25 May 2018, Dan Williams wrote: >> >> > >> >> >> On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: >> >> >> > On Fri, May 25 2018 at 2:17am -0400, >> >> >> > Mikulas Patocka <mpatocka@redhat.com> wrote: >> >> >> > >> >> >> >> On Thu, 24 May 2018, Dan Williams wrote: >> >> >> >> >> >> >> >> > I don't want to grow driver-local wrappers for pmem. You should use >> >> >> >> > memcpy_flushcache directly() and if an architecture does not define >> >> >> >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this >> >> >> >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't >> >> >> >> > see a need to add a standalone flush operation if all relevant archs >> >> >> >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() >> >> >> >> > directly since all archs define it. Alternatively we could introduce >> >> >> >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy >> >> >> >> > routine and memcpy_flushcache() would imply a wmb(). >> >> >> >> >> >> >> >> But memcpy_flushcache() on ARM64 is slow. >> >> >> >> >> >> Right, so again, what is wrong with memcpy_flushcache_relaxed() + >> >> >> wmb() or otherwise making memcpy_flushcache() ordered. I do not see >> >> >> that as a trailblazing requirement, I see that as typical review and a >> >> >> reduction of the operation space that you are proposing. >> >> > >> >> > memcpy_flushcache on ARM64 is generally wrong thing to do, because it is >> >> > slower than memcpy and explicit cache flush. >> >> > >> >> > Suppose that you want to write data to a block device and make it >> >> > persistent. So you send a WRITE bio and then a FLUSH bio. >> >> > >> >> > Now - how to implement these two bios on persistent memory: >> >> > >> >> > On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does >> >> > wmb() - this is the optimal implementation. >> >> > >> >> > But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal >> >> > implementation is that the WRITE bio does just memcpy() and the FLUSH bio >> >> > does arch_wb_cache_pmem() on the affected range. >> >> > >> >> > Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture >> >> > doesn't have non-temporal stores. So, memcpy_flushcache() is implemented >> >> > as memcpy() followed by a cache flush. >> >> > >> >> > Now - if you flush the cache immediatelly after memcpy, the cache is full >> >> > of dirty lines and the cache-flushing code has to write these lines back >> >> > and that is slow. >> >> > >> >> > If you flush the cache some time after memcpy (i.e. 
when the FLUSH bio is >> >> > received), the processor already flushed some part of the cache on its >> >> > own, so the cache-flushing function has less work to do and it is faster. >> >> > >> >> > So the conclusion is - don't use memcpy_flushcache on ARM. This problem >> >> > cannot be fixed by a better implementation of memcpy_flushcache. >> >> >> >> It sounds like ARM might be better off with mapping its pmem as >> >> write-through rather than write-back, and skip the explicit cache >> > >> > I doubt it would perform well - write combining combines the writes into a >> > larger segments - and write-through doesn't. >> > >> >> Last I checked write-through caching does not disable write combining >> >> >> management altogether. You speak of "optimal" and "sub-optimal", but >> >> what would be more clear is fio measurements of the relative IOPs and >> >> latency profiles of the different approaches. The reason I am >> >> continuing to push here is that reducing the operation space from >> >> 'copy-flush-commit' to just 'copy' or 'copy-commit' simplifies the >> >> maintenance long term. >> > >> > I measured it (with nvme backing store) and late cache flushing has 12% >> > better performance than eager flushing with memcpy_flushcache(). >> >> I assume what you're seeing is ARM64 over-flushing the amount of dirty >> data so it becomes more efficient to do an amortized flush at the end? >> However, that effectively makes memcpy_flushcache() unusable in the >> way it can be used on x86. You claimed that ARM does not support >> non-temporal stores, but it does, see the STNP instruction. I do not >> want to see arch specific optimizations in drivers, so either >> write-through mappings is a potential answer to remove the need to >> explicitly manage flushing, or just implement STNP hacks in >> memcpy_flushcache() like you did with MOVNT on x86. >> >> > 131836 4k iops - vs - 117016. >> >> To be clear this is memcpy_flushcache() vs memcpy + flush? > > I found out what caused the difference. I used dax_flush on the version of > dm-writecache that I had on the ARM machine (with the kernel 4.14, because > it is the last version where dax on ramdisk works) - and I thought that > dax_flush flushes the cache, but it doesn't. > > When I replaced dax_flush with arch_wb_cache_pmem, the performance > difference between early flushing and late flushing disappeared. > > So I think we can remove this per-architecture switch from dm-writecache. Great find! Thanks for the due diligence. Feel free to add: Acked-by: Dan Williams <dan.j.williams@intel.com> ...on the reworks to unify ARM and x86. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
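A sketch of the two strategies being compared in this exchange, with the WRITE/FLUSH bio mapping exactly as described above; the wrapper names are invented, and the real driver of course plumbs this through its bio handling:

/* x86-64: non-temporal stores make eager flushing cheap. */
static void write_bio_x86(void *pmem_dst, const void *src, size_t len)
{
	memcpy_flushcache(pmem_dst, src, len);	/* data leaves the cache now */
}

static void flush_bio_x86(void)
{
	wmb();					/* just order/commit the earlier stores */
}

/* ARM64, per the argument above: plain memcpy on WRITE, write back the
 * affected range only when the FLUSH bio arrives, by which time the CPU
 * has already cleaned part of it on its own. */
static void write_bio_arm64(void *pmem_dst, const void *src, size_t len)
{
	memcpy(pmem_dst, src, len);
}

static void flush_bio_arm64(void *pmem_range, size_t len)
{
	arch_wb_cache_pmem(pmem_range, len);
}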
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 15:58 ` Dan Williams @ 2018-05-30 22:39 ` Dan Williams 2018-05-31 8:19 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Dan Williams @ 2018-05-30 22:39 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Wed, May 30, 2018 at 8:58 AM, Dan Williams <dan.j.williams@intel.com> wrote: > On Wed, May 30, 2018 at 6:07 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: >> >> >> On Mon, 28 May 2018, Dan Williams wrote: >> >>> On Mon, May 28, 2018 at 6:32 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: >>> > >>> > >>> > On Sat, 26 May 2018, Dan Williams wrote: >>> > >>> >> On Sat, May 26, 2018 at 12:02 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: >>> >> > >>> >> > >>> >> > On Fri, 25 May 2018, Dan Williams wrote: >>> >> > >>> >> >> On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: >>> >> >> > On Fri, May 25 2018 at 2:17am -0400, >>> >> >> > Mikulas Patocka <mpatocka@redhat.com> wrote: >>> >> >> > >>> >> >> >> On Thu, 24 May 2018, Dan Williams wrote: >>> >> >> >> >>> >> >> >> > I don't want to grow driver-local wrappers for pmem. You should use >>> >> >> >> > memcpy_flushcache directly() and if an architecture does not define >>> >> >> >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this >>> >> >> >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't >>> >> >> >> > see a need to add a standalone flush operation if all relevant archs >>> >> >> >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() >>> >> >> >> > directly since all archs define it. Alternatively we could introduce >>> >> >> >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy >>> >> >> >> > routine and memcpy_flushcache() would imply a wmb(). >>> >> >> >> >>> >> >> >> But memcpy_flushcache() on ARM64 is slow. >>> >> >> >>> >> >> Right, so again, what is wrong with memcpy_flushcache_relaxed() + >>> >> >> wmb() or otherwise making memcpy_flushcache() ordered. I do not see >>> >> >> that as a trailblazing requirement, I see that as typical review and a >>> >> >> reduction of the operation space that you are proposing. >>> >> > >>> >> > memcpy_flushcache on ARM64 is generally wrong thing to do, because it is >>> >> > slower than memcpy and explicit cache flush. >>> >> > >>> >> > Suppose that you want to write data to a block device and make it >>> >> > persistent. So you send a WRITE bio and then a FLUSH bio. >>> >> > >>> >> > Now - how to implement these two bios on persistent memory: >>> >> > >>> >> > On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does >>> >> > wmb() - this is the optimal implementation. >>> >> > >>> >> > But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal >>> >> > implementation is that the WRITE bio does just memcpy() and the FLUSH bio >>> >> > does arch_wb_cache_pmem() on the affected range. >>> >> > >>> >> > Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture >>> >> > doesn't have non-temporal stores. So, memcpy_flushcache() is implemented >>> >> > as memcpy() followed by a cache flush. >>> >> > >>> >> > Now - if you flush the cache immediatelly after memcpy, the cache is full >>> >> > of dirty lines and the cache-flushing code has to write these lines back >>> >> > and that is slow. >>> >> > >>> >> > If you flush the cache some time after memcpy (i.e. 
when the FLUSH bio is >>> >> > received), the processor already flushed some part of the cache on its >>> >> > own, so the cache-flushing function has less work to do and it is faster. >>> >> > >>> >> > So the conclusion is - don't use memcpy_flushcache on ARM. This problem >>> >> > cannot be fixed by a better implementation of memcpy_flushcache. >>> >> >>> >> It sounds like ARM might be better off with mapping its pmem as >>> >> write-through rather than write-back, and skip the explicit cache >>> > >>> > I doubt it would perform well - write combining combines the writes into a >>> > larger segments - and write-through doesn't. >>> > >>> >>> Last I checked write-through caching does not disable write combining >>> >>> >> management altogether. You speak of "optimal" and "sub-optimal", but >>> >> what would be more clear is fio measurements of the relative IOPs and >>> >> latency profiles of the different approaches. The reason I am >>> >> continuing to push here is that reducing the operation space from >>> >> 'copy-flush-commit' to just 'copy' or 'copy-commit' simplifies the >>> >> maintenance long term. >>> > >>> > I measured it (with nvme backing store) and late cache flushing has 12% >>> > better performance than eager flushing with memcpy_flushcache(). >>> >>> I assume what you're seeing is ARM64 over-flushing the amount of dirty >>> data so it becomes more efficient to do an amortized flush at the end? >>> However, that effectively makes memcpy_flushcache() unusable in the >>> way it can be used on x86. You claimed that ARM does not support >>> non-temporal stores, but it does, see the STNP instruction. I do not >>> want to see arch specific optimizations in drivers, so either >>> write-through mappings is a potential answer to remove the need to >>> explicitly manage flushing, or just implement STNP hacks in >>> memcpy_flushcache() like you did with MOVNT on x86. >>> >>> > 131836 4k iops - vs - 117016. >>> >>> To be clear this is memcpy_flushcache() vs memcpy + flush? >> >> I found out what caused the difference. I used dax_flush on the version of >> dm-writecache that I had on the ARM machine (with the kernel 4.14, because >> it is the last version where dax on ramdisk works) - and I thought that >> dax_flush flushes the cache, but it doesn't. >> >> When I replaced dax_flush with arch_wb_cache_pmem, the performance >> difference between early flushing and late flushing disappeared. >> >> So I think we can remove this per-architecture switch from dm-writecache. > > Great find! Thanks for the due diligence. Feel free to add: > > Acked-by: Dan Williams <dan.j.williams@intel.com> > > ...on the reworks to unify ARM and x86. One more note. The side effect of not using dax_flush() is that you may end up flushing caches on systems where the platform has asserted it will take responsibility for flushing caches at power loss. If / when those systems become more prevalent we may want to think of a way to combine the non-temporal optimization and the cache-flush-bypass optimizations. However that is something that can wait for a later change beyond 4.18. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
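The "dax_flush doesn't flush" finding and the power-loss remark above fit together: by the 4.14 era dax_flush() is gated on the dax device's write-cache flag, so on a device that does not advertise a volatile write cache (a ramdisk, or a platform asserting it flushes caches at power loss) it returns without touching the cache. Roughly what drivers/dax/super.c looked like at the time, worth checking against the exact tree:

void dax_flush(struct dax_device *dax_dev, void *addr, size_t size)
{
	if (unlikely(!dax_write_cache_enabled(dax_dev)))
		return;		/* device/platform says no cache flushing needed */

	arch_wb_cache_pmem(addr, size);
}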
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 22:39 ` Dan Williams @ 2018-05-31 8:19 ` Mikulas Patocka 2018-05-31 14:51 ` Dan Williams 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-31 8:19 UTC (permalink / raw) To: Dan Williams; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Wed, 30 May 2018, Dan Williams wrote: > > Great find! Thanks for the due diligence. Feel free to add: > > > > Acked-by: Dan Williams <dan.j.williams@intel.com> > > > > ...on the reworks to unify ARM and x86. > > One more note. The side effect of not using dax_flush() is that you > may end up flushing caches on systems where the platform has asserted > it will take responsibility for flushing caches at power loss. If / > when those systems become more prevalent we may want to think of a way > to combine the non-temporal optimization and the cache-flush-bypass > optimizations. However that is something that can wait for a later > change beyond 4.18. We could define memcpy_flushpmem, that falls back to memcpy or memcpy_flushcache, depending on whether the platform flushes the caches at power loss or not. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
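Purely as an illustration of this suggestion -- no such helper exists in the kernel at this point, and pmem_should_flush() is an invented stand-in for "does this platform flush CPU caches at power loss?":

static inline void memcpy_flushpmem(void *dst, const void *src, size_t n)
{
	if (pmem_should_flush())
		memcpy_flushcache(dst, src, n);	/* caches are not power-fail safe */
	else
		memcpy(dst, src, n);		/* platform flushes caches at power loss */
}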
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-31 8:19 ` Mikulas Patocka @ 2018-05-31 14:51 ` Dan Williams 2018-05-31 15:31 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Dan Williams @ 2018-05-31 14:51 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Thu, May 31, 2018 at 1:19 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 30 May 2018, Dan Williams wrote: > >> > Great find! Thanks for the due diligence. Feel free to add: >> > >> > Acked-by: Dan Williams <dan.j.williams@intel.com> >> > >> > ...on the reworks to unify ARM and x86. >> >> One more note. The side effect of not using dax_flush() is that you >> may end up flushing caches on systems where the platform has asserted >> it will take responsibility for flushing caches at power loss. If / >> when those systems become more prevalent we may want to think of a way >> to combine the non-temporal optimization and the cache-flush-bypass >> optimizations. However that is something that can wait for a later >> change beyond 4.18. > > We could define memcpy_flushpmem, that falls back to memcpy or > memcpy_flushcache, depending on whether the platform flushes the caches at > power loss or not. The problem is that some platforms only power fail protect a subset of the physical address range, but yes, if the platform makes a global assertion we can globally replace memcpy_flushpmem() with plain memcpy. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-31 14:51 ` Dan Williams @ 2018-05-31 15:31 ` Mikulas Patocka 2018-05-31 16:39 ` Dan Williams 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-31 15:31 UTC (permalink / raw) To: Dan Williams; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Thu, 31 May 2018, Dan Williams wrote: > On Thu, May 31, 2018 at 1:19 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > On Wed, 30 May 2018, Dan Williams wrote: > > > >> > Great find! Thanks for the due diligence. Feel free to add: > >> > > >> > Acked-by: Dan Williams <dan.j.williams@intel.com> > >> > > >> > ...on the reworks to unify ARM and x86. > >> > >> One more note. The side effect of not using dax_flush() is that you > >> may end up flushing caches on systems where the platform has asserted > >> it will take responsibility for flushing caches at power loss. If / > >> when those systems become more prevalent we may want to think of a way > >> to combine the non-temporal optimization and the cache-flush-bypass > >> optimizations. However that is something that can wait for a later > >> change beyond 4.18. > > > > We could define memcpy_flushpmem, that falls back to memcpy or > > memcpy_flushcache, depending on whether the platform flushes the caches at > > power loss or not. > > The problem is that some platforms only power fail protect a subset of > the physical address range, How can this be? A psysical address may be cached on any CPU, so either there is enough power to flush all the CPUs' caches or there isn't. How does the CPU design that protects only a part of physical addresses look like? > but yes, if the platform makes a global > assertion we can globally replace memcpy_flushpmem() with plain > memcpy. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-31 15:31 ` Mikulas Patocka @ 2018-05-31 16:39 ` Dan Williams 0 siblings, 0 replies; 40+ messages in thread From: Dan Williams @ 2018-05-31 16:39 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Thu, May 31, 2018 at 8:31 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Thu, 31 May 2018, Dan Williams wrote: > >> On Thu, May 31, 2018 at 1:19 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: >> > >> > >> > On Wed, 30 May 2018, Dan Williams wrote: >> > >> >> > Great find! Thanks for the due diligence. Feel free to add: >> >> > >> >> > Acked-by: Dan Williams <dan.j.williams@intel.com> >> >> > >> >> > ...on the reworks to unify ARM and x86. >> >> >> >> One more note. The side effect of not using dax_flush() is that you >> >> may end up flushing caches on systems where the platform has asserted >> >> it will take responsibility for flushing caches at power loss. If / >> >> when those systems become more prevalent we may want to think of a way >> >> to combine the non-temporal optimization and the cache-flush-bypass >> >> optimizations. However that is something that can wait for a later >> >> change beyond 4.18. >> > >> > We could define memcpy_flushpmem, that falls back to memcpy or >> > memcpy_flushcache, depending on whether the platform flushes the caches at >> > power loss or not. >> >> The problem is that some platforms only power fail protect a subset of >> the physical address range, > > How can this be? A psysical address may be cached on any CPU, so either > there is enough power to flush all the CPUs' caches or there isn't. > > How does the CPU design that protects only a part of physical addresses > look like? It's not necessarily a CPU problem, it may be a problem of having enough stored energy to potentially software loops to flush caches. There's also the consideration that a general purpose platform may mix persistent memory technologies from different vendors where some might be flash-backed DRAM and some might be persistent media directly. For now I don't think we need to worry about it, but I don't want to make the assumption that this property is platform global given the history of how persistent memory has been deployed to date. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
Thread overview: 40+ messages
[not found] <20180519052503.325953342@debian.vm>
[not found] ` <20180519052635.567438191@debian.vm>
2018-05-22 6:39 ` [dm-devel] [patch 4/4] dm-writecache: use new API for flushing Christoph Hellwig
2018-05-22 18:41 ` Mike Snitzer
2018-05-22 19:00 ` Dan Williams
2018-05-22 19:19 ` Mike Snitzer
2018-05-22 19:27 ` Dan Williams
2018-05-22 20:52 ` Mike Snitzer
2018-05-22 22:53 ` [dm-devel] " Jeff Moyer
2018-05-23 20:57 ` Mikulas Patocka
2018-05-28 13:52 ` Mikulas Patocka
2018-05-28 17:41 ` Dan Williams
2018-05-30 13:42 ` [dm-devel] " Jeff Moyer
2018-05-30 13:51 ` Mikulas Patocka
2018-05-30 13:52 ` Jeff Moyer
2018-05-24 8:15 ` Mikulas Patocka
[not found] ` <CAPcyv4iEtfuVGPR0QMKcafv2XFwSj3nzxjX8cuXpXe00akAvYA@mail.gmail.com>
[not found] ` <alpine.LRH.2.02.1805250213270.13894@file01.intranet.prod.int.rdu2.redhat.com>
2018-05-25 12:51 ` Mike Snitzer
2018-05-25 15:57 ` Dan Williams
2018-05-26 7:02 ` Mikulas Patocka
2018-05-26 15:26 ` Dan Williams
2018-05-28 13:32 ` Mikulas Patocka
2018-05-28 18:14 ` Dan Williams
2018-05-30 13:07 ` Mikulas Patocka
2018-05-30 13:16 ` Mike Snitzer
2018-05-30 13:21 ` Mikulas Patocka
2018-05-30 13:26 ` Mike Snitzer
2018-05-30 13:33 ` Mikulas Patocka
2018-05-30 13:54 ` Mike Snitzer
2018-05-30 14:09 ` Mikulas Patocka
2018-05-30 14:21 ` Mike Snitzer
2018-05-30 14:46 ` Mikulas Patocka
2018-05-31 3:42 ` Mike Snitzer
2018-06-03 15:03 ` Mikulas Patocka
2018-05-31 3:39 ` Mike Snitzer
2018-05-31 8:16 ` Mikulas Patocka
2018-05-31 12:09 ` Mike Snitzer
2018-05-30 15:58 ` Dan Williams
2018-05-30 22:39 ` Dan Williams
2018-05-31 8:19 ` Mikulas Patocka
2018-05-31 14:51 ` Dan Williams
2018-05-31 15:31 ` Mikulas Patocka
2018-05-31 16:39 ` Dan Williams