* Re: [dm-devel] [patch 4/4] dm-writecache: use new API for flushing [not found] ` <20180519052635.567438191@debian.vm> @ 2018-05-22 6:39 ` Christoph Hellwig 2018-05-22 18:41 ` Mike Snitzer [not found] ` <CAPcyv4iEtfuVGPR0QMKcafv2XFwSj3nzxjX8cuXpXe00akAvYA@mail.gmail.com> 1 sibling, 1 reply; 40+ messages in thread From: Christoph Hellwig @ 2018-05-22 6:39 UTC (permalink / raw) To: Mikulas Patocka; +Cc: Mike Snitzer, dm-devel, linux-nvdimm On Sat, May 19, 2018 at 07:25:07AM +0200, Mikulas Patocka wrote: > Use new API for flushing persistent memory. The sentence doesnt make much sense. 'A new API', 'A better abstraction' maybe? > > The problem is this: > * on X86-64, non-temporal stores have the best performance > * ARM64 doesn't have non-temporal stores, so we must flush cache. We > should flush cache as late as possible, because it performs better this > way. > > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit > data persistently, all three functions must be called. > > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. > > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and > pmem_commit is wmb. > > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and > pmem_commit is empty. All these should be provided by the pmem layer, and be properly documented. And be sorted before adding your new target that uses them. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
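For orientation, a minimal sketch of how the four helpers described in the quoted commit message could map onto primitives the kernel already exports (memcpy_flushcache(), arch_wb_cache_pmem(), wmb()). The bodies below are illustrative only and are not the actual dm-writecache code:

#include <linux/string.h>	/* memcpy(), memcpy_flushcache() */
#include <linux/libnvdimm.h>	/* arch_wb_cache_pmem() */
#include <asm/barrier.h>	/* wmb() */

#if defined(CONFIG_X86_64)
/* Non-temporal stores bypass the CPU cache, so there is nothing left to
 * flush later; only a store fence is needed at commit time. */
static inline void pmem_memcpy(void *dest, const void *src, size_t len)
{
	memcpy_flushcache(dest, src, len);
}
static inline void pmem_flush(void *addr, size_t len) { }
static inline void pmem_commit(void) { wmb(); }
#else
/* Cached stores now; the cache is written back as late as possible via
 * pmem_flush().  As in the commit message, arch_wb_cache_pmem() is assumed
 * to include whatever barrier it needs, so pmem_commit() is empty. */
static inline void pmem_memcpy(void *dest, const void *src, size_t len)
{
	memcpy(dest, src, len);
}
static inline void pmem_flush(void *addr, size_t len)
{
	arch_wb_cache_pmem(addr, len);
}
static inline void pmem_commit(void) { }
#endif

/* The commit message requires 8-byte values stored through pmem_assign()
 * to be written atomically; how each architecture guarantees that is not
 * shown here. */
#define pmem_assign(dest, src)					\
do {								\
	typeof(dest) uniq = (src);				\
	pmem_memcpy(&(dest), &uniq, sizeof(dest));		\
} while (0)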
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 6:39 ` [dm-devel] [patch 4/4] dm-writecache: use new API for flushing Christoph Hellwig @ 2018-05-22 18:41 ` Mike Snitzer 2018-05-22 19:00 ` Dan Williams 0 siblings, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-22 18:41 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Mikulas Patocka, dm-devel, linux-nvdimm On Tue, May 22 2018 at 2:39am -0400, Christoph Hellwig <hch@infradead.org> wrote: > On Sat, May 19, 2018 at 07:25:07AM +0200, Mikulas Patocka wrote: > > Use new API for flushing persistent memory. > > The sentence doesnt make much sense. 'A new API', 'A better > abstraction' maybe? > > > > > The problem is this: > > * on X86-64, non-temporal stores have the best performance > > * ARM64 doesn't have non-temporal stores, so we must flush cache. We > > should flush cache as late as possible, because it performs better this > > way. > > > > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit > > data persistently, all three functions must be called. > > > > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign > > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. > > > > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and > > pmem_commit is wmb. > > > > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and > > pmem_commit is empty. > > All these should be provided by the pmem layer, and be properly > documented. And be sorted before adding your new target that uses > them. I don't see that as a hard requirement. Mikulas did the work to figure out what is more optimal on x86_64 vs amd64. It makes a difference for his target and that is sufficient to carry it locally until/when it is either elevated to pmem. We cannot even get x86 and swait maintainers to reply to repeat requests for review. Stacking up further deps on pmem isn't high on my list. Mike _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 18:41 ` Mike Snitzer @ 2018-05-22 19:00 ` Dan Williams 2018-05-22 19:19 ` Mike Snitzer 2018-05-24 8:15 ` Mikulas Patocka 0 siblings, 2 replies; 40+ messages in thread From: Dan Williams @ 2018-05-22 19:00 UTC (permalink / raw) To: Mike Snitzer Cc: Christoph Hellwig, device-mapper development, Mikulas Patocka, linux-nvdimm On Tue, May 22, 2018 at 11:41 AM, Mike Snitzer <snitzer@redhat.com> wrote: > On Tue, May 22 2018 at 2:39am -0400, > Christoph Hellwig <hch@infradead.org> wrote: > >> On Sat, May 19, 2018 at 07:25:07AM +0200, Mikulas Patocka wrote: >> > Use new API for flushing persistent memory. >> >> The sentence doesnt make much sense. 'A new API', 'A better >> abstraction' maybe? >> >> > >> > The problem is this: >> > * on X86-64, non-temporal stores have the best performance >> > * ARM64 doesn't have non-temporal stores, so we must flush cache. We >> > should flush cache as late as possible, because it performs better this >> > way. >> > >> > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit >> > data persistently, all three functions must be called. >> > >> > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign >> > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. >> > >> > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and >> > pmem_commit is wmb. >> > >> > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and >> > pmem_commit is empty. >> >> All these should be provided by the pmem layer, and be properly >> documented. And be sorted before adding your new target that uses >> them. > > I don't see that as a hard requirement. Mikulas did the work to figure > out what is more optimal on x86_64 vs amd64. It makes a difference for > his target and that is sufficient to carry it locally until/when it is > either elevated to pmem. > > We cannot even get x86 and swait maintainers to reply to repeat requests > for review. Stacking up further deps on pmem isn't high on my list. > Except I'm being responsive. I agree with Christoph that we should build pmem helpers at an architecture level and not per-driver. Let's make this driver depend on ARCH_HAS_PMEM_API and require ARM to catch up to x86 in this space. We already have PowerPC enabling PMEM API, so I don't see an unreasonable barrier to ask the same of ARM. This patch is not even cc'd to linux-arm-kernel. Has the subject been broached with them? _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 19:00 ` Dan Williams @ 2018-05-22 19:19 ` Mike Snitzer 2018-05-22 19:27 ` Dan Williams 2018-05-24 8:15 ` Mikulas Patocka 1 sibling, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-22 19:19 UTC (permalink / raw) To: Dan Williams Cc: Christoph Hellwig, device-mapper development, Mikulas Patocka, linux-nvdimm On Tue, May 22 2018 at 3:00pm -0400, Dan Williams <dan.j.williams@intel.com> wrote: > On Tue, May 22, 2018 at 11:41 AM, Mike Snitzer <snitzer@redhat.com> wrote: > > On Tue, May 22 2018 at 2:39am -0400, > > Christoph Hellwig <hch@infradead.org> wrote: > > > >> On Sat, May 19, 2018 at 07:25:07AM +0200, Mikulas Patocka wrote: > >> > Use new API for flushing persistent memory. > >> > >> The sentence doesnt make much sense. 'A new API', 'A better > >> abstraction' maybe? > >> > >> > > >> > The problem is this: > >> > * on X86-64, non-temporal stores have the best performance > >> > * ARM64 doesn't have non-temporal stores, so we must flush cache. We > >> > should flush cache as late as possible, because it performs better this > >> > way. > >> > > >> > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit > >> > data persistently, all three functions must be called. > >> > > >> > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign > >> > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. > >> > > >> > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and > >> > pmem_commit is wmb. > >> > > >> > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and > >> > pmem_commit is empty. > >> > >> All these should be provided by the pmem layer, and be properly > >> documented. And be sorted before adding your new target that uses > >> them. > > > > I don't see that as a hard requirement. Mikulas did the work to figure > > out what is more optimal on x86_64 vs amd64. It makes a difference for > > his target and that is sufficient to carry it locally until/when it is > > either elevated to pmem. > > > > We cannot even get x86 and swait maintainers to reply to repeat requests > > for review. Stacking up further deps on pmem isn't high on my list. > > > > Except I'm being responsive. Except you're looking to immediately punt to linux-arm-kernel ;) > I agree with Christoph that we should > build pmem helpers at an architecture level and not per-driver. Let's > make this driver depend on ARCH_HAS_PMEM_API and require ARM to catch > up to x86 in this space. We already have PowerPC enabling PMEM API, so > I don't see an unreasonable barrier to ask the same of ARM. This patch > is not even cc'd to linux-arm-kernel. Has the subject been broached > with them? No idea. Not by me. The thing is, I'm no expert in pmem. You are. Coordinating the change with ARM et al feels unnecessarily limiting and quicky moves outside my control. Serious question: Why can't this code land in this dm-writecache target and then be lifted (or obsoleted)? But if you think it worthwhile to force ARM to step up then fine. That does limit the availability of using writecache on ARM while they get the PMEM API together. I'll do whatever you want.. just put the smack down and tell me how it is ;) Thanks, Mike _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 19:19 ` Mike Snitzer @ 2018-05-22 19:27 ` Dan Williams 2018-05-22 20:52 ` Mike Snitzer 2018-05-28 13:52 ` Mikulas Patocka 0 siblings, 2 replies; 40+ messages in thread From: Dan Williams @ 2018-05-22 19:27 UTC (permalink / raw) To: Mike Snitzer Cc: Christoph Hellwig, device-mapper development, Mikulas Patocka, linux-nvdimm On Tue, May 22, 2018 at 12:19 PM, Mike Snitzer <snitzer@redhat.com> wrote: > On Tue, May 22 2018 at 3:00pm -0400, > Dan Williams <dan.j.williams@intel.com> wrote: > >> On Tue, May 22, 2018 at 11:41 AM, Mike Snitzer <snitzer@redhat.com> wrote: >> > On Tue, May 22 2018 at 2:39am -0400, >> > Christoph Hellwig <hch@infradead.org> wrote: >> > >> >> On Sat, May 19, 2018 at 07:25:07AM +0200, Mikulas Patocka wrote: >> >> > Use new API for flushing persistent memory. >> >> >> >> The sentence doesnt make much sense. 'A new API', 'A better >> >> abstraction' maybe? >> >> >> >> > >> >> > The problem is this: >> >> > * on X86-64, non-temporal stores have the best performance >> >> > * ARM64 doesn't have non-temporal stores, so we must flush cache. We >> >> > should flush cache as late as possible, because it performs better this >> >> > way. >> >> > >> >> > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit >> >> > data persistently, all three functions must be called. >> >> > >> >> > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign >> >> > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. >> >> > >> >> > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and >> >> > pmem_commit is wmb. >> >> > >> >> > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and >> >> > pmem_commit is empty. >> >> >> >> All these should be provided by the pmem layer, and be properly >> >> documented. And be sorted before adding your new target that uses >> >> them. >> > >> > I don't see that as a hard requirement. Mikulas did the work to figure >> > out what is more optimal on x86_64 vs amd64. It makes a difference for >> > his target and that is sufficient to carry it locally until/when it is >> > either elevated to pmem. >> > >> > We cannot even get x86 and swait maintainers to reply to repeat requests >> > for review. Stacking up further deps on pmem isn't high on my list. >> > >> >> Except I'm being responsive. > > Except you're looking to immediately punt to linux-arm-kernel ;) Well, I'm not, not really. I'm saying drop ARM support, it's not ready. > >> I agree with Christoph that we should >> build pmem helpers at an architecture level and not per-driver. Let's >> make this driver depend on ARCH_HAS_PMEM_API and require ARM to catch >> up to x86 in this space. We already have PowerPC enabling PMEM API, so >> I don't see an unreasonable barrier to ask the same of ARM. This patch >> is not even cc'd to linux-arm-kernel. Has the subject been broached >> with them? > > No idea. Not by me. > > The thing is, I'm no expert in pmem. You are. Coordinating the change > with ARM et al feels unnecessarily limiting and quicky moves outside my > control. > > Serious question: Why can't this code land in this dm-writecache target > and then be lifted (or obsoleted)? Because we already have an API, and we don't want to promote local solutions to global problems, or carry unnecessary technical debt. > > But if you think it worthwhile to force ARM to step up then fine. 
That > does limit the availability of using writecache on ARM while they get > the PMEM API together. > > I'll do whatever you want.. just put the smack down and tell me how it > is ;) I'd say just control the variables you can control. Drop the ARM support if you want to move forward and propose extensions / updates to the pmem api for x86 and I'll help push those since I was involved in pushing the x86 pmem api in the first instance. That way you don't need to touch this driver as new archs add their pmem api enabling. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 19:27 ` Dan Williams @ 2018-05-22 20:52 ` Mike Snitzer 2018-05-22 22:53 ` [dm-devel] " Jeff Moyer 2018-05-28 13:52 ` Mikulas Patocka 1 sibling, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-22 20:52 UTC (permalink / raw) To: Dan Williams Cc: Christoph Hellwig, device-mapper development, Mikulas Patocka, linux-nvdimm On Tue, May 22 2018 at 3:27pm -0400, Dan Williams <dan.j.williams@intel.com> wrote: > On Tue, May 22, 2018 at 12:19 PM, Mike Snitzer <snitzer@redhat.com> wrote: > > On Tue, May 22 2018 at 3:00pm -0400, > > Dan Williams <dan.j.williams@intel.com> wrote: > > > >> On Tue, May 22, 2018 at 11:41 AM, Mike Snitzer <snitzer@redhat.com> wrote: > >> > On Tue, May 22 2018 at 2:39am -0400, > >> > Christoph Hellwig <hch@infradead.org> wrote: > >> > > >> >> On Sat, May 19, 2018 at 07:25:07AM +0200, Mikulas Patocka wrote: > >> >> > Use new API for flushing persistent memory. > >> >> > >> >> The sentence doesnt make much sense. 'A new API', 'A better > >> >> abstraction' maybe? > >> >> > >> >> > > >> >> > The problem is this: > >> >> > * on X86-64, non-temporal stores have the best performance > >> >> > * ARM64 doesn't have non-temporal stores, so we must flush cache. We > >> >> > should flush cache as late as possible, because it performs better this > >> >> > way. > >> >> > > >> >> > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit > >> >> > data persistently, all three functions must be called. > >> >> > > >> >> > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign > >> >> > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. > >> >> > > >> >> > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and > >> >> > pmem_commit is wmb. > >> >> > > >> >> > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and > >> >> > pmem_commit is empty. > >> >> > >> >> All these should be provided by the pmem layer, and be properly > >> >> documented. And be sorted before adding your new target that uses > >> >> them. > >> > > >> > I don't see that as a hard requirement. Mikulas did the work to figure > >> > out what is more optimal on x86_64 vs amd64. It makes a difference for > >> > his target and that is sufficient to carry it locally until/when it is > >> > either elevated to pmem. > >> > > >> > We cannot even get x86 and swait maintainers to reply to repeat requests > >> > for review. Stacking up further deps on pmem isn't high on my list. > >> > > >> > >> Except I'm being responsive. > > > > Except you're looking to immediately punt to linux-arm-kernel ;) > > Well, I'm not, not really. I'm saying drop ARM support, it's not ready. > > > > >> I agree with Christoph that we should > >> build pmem helpers at an architecture level and not per-driver. Let's > >> make this driver depend on ARCH_HAS_PMEM_API and require ARM to catch > >> up to x86 in this space. We already have PowerPC enabling PMEM API, so > >> I don't see an unreasonable barrier to ask the same of ARM. This patch > >> is not even cc'd to linux-arm-kernel. Has the subject been broached > >> with them? > > > > No idea. Not by me. > > > > The thing is, I'm no expert in pmem. You are. Coordinating the change > > with ARM et al feels unnecessarily limiting and quicky moves outside my > > control. > > > > Serious question: Why can't this code land in this dm-writecache target > > and then be lifted (or obsoleted)? 
> > Because we already have an API, and we don't want to promote local > solutions to global problems, or carry unnecessary technical debt. > > > > > But if you think it worthwhile to force ARM to step up then fine. That > > does limit the availability of using writecache on ARM while they get > > the PMEM API together. > > > > I'll do whatever you want.. just put the smack down and tell me how it > > is ;) > > I'd say just control the variables you can control. Drop the ARM > support if you want to move forward and propose extensions / updates > to the pmem api for x86 and I'll help push those since I was involved > in pushing the x86 pmem api in the first instance. That way you don't > need to touch this driver as new archs add their pmem api enabling. Looking at Mikulas' wrapper API that you and hch are calling into question: For ARM it is using arch/arm64/mm/flush.c:arch_wb_cache_pmem(). (And ARM does seem to be providing CONFIG_ARCH_HAS_PMEM_API.) Whereas x86_64 is using memcpy_flushcache() as provided by CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE. (Yet ARM does provide arch/arm64/lib/uaccess_flushcache.c:memcpy_flushcache) Just seems this isn't purely about ARM lacking on an API level (given on x86_64 Mikulas isn't only using CONFIG_ARCH_HAS_PMEM_API). Seems this is more to do with x86_64 having efficient Non-temporal stores? Anyway, I'm still trying to appreciate the details here before I can make any forward progress. Mike _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dm-devel] [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 20:52 ` Mike Snitzer @ 2018-05-22 22:53 ` Jeff Moyer 2018-05-23 20:57 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Jeff Moyer @ 2018-05-22 22:53 UTC (permalink / raw) To: Mike Snitzer, Mikulas Patocka Cc: Christoph Hellwig, device-mapper development, linux-nvdimm Hi, Mike, Mike Snitzer <snitzer@redhat.com> writes: > Looking at Mikulas' wrapper API that you and hch are calling into > question: > > For ARM it is using arch/arm64/mm/flush.c:arch_wb_cache_pmem(). > (And ARM does seem to be providing CONFIG_ARCH_HAS_PMEM_API.) > > Whereas x86_64 is using memcpy_flushcache() as provided by > CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE. > (Yet ARM does provide arch/arm64/lib/uaccess_flushcache.c:memcpy_flushcache) > > Just seems this isn't purely about ARM lacking on an API level (given on > x86_64 Mikulas isn't only using CONFIG_ARCH_HAS_PMEM_API). > > Seems this is more to do with x86_64 having efficient Non-temporal > stores? Yeah, I think you've got that all right. > Anyway, I'm still trying to appreciate the details here before I can > make any forward progress. Making data persistent on x64 requires 3 steps: 1) copy the data into pmem (store instructions) 2) flush the cache lines associated with the data (clflush, clflush_opt, clwb) 3) wait on the flush to complete (sfence) I'm not sure if other architectures require step 3. Mikulas' implementation seems to imply that arm64 doesn't require the fence. The current pmem api provides: memcpy* -- step 1 memcpy_flushcache -- this combines steps 1 and 2 dax_flush -- step 2 wmb* -- step 3 * not strictly part of the pmem api So, if you didn't care about performance, you could write generic code that only used memcpy, dax_flush, and wmb (assuming other arches actually need the wmb). What Mikulas did was to abstract out an API that could be called by generic code that would work optimally on all architectures. This looks like a worth-while addition to the PMEM API, to me. Mikulas, what do you think about refactoring the code as Christoph suggested? Cheers, Jeff _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
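Spelled out as code, the "didn't care about performance" sequence Jeff describes, using only the architecture-neutral helpers he lists, would look roughly like the sketch below; dax_dev and the destination mapping are assumed to have been obtained elsewhere:

#include <linux/string.h>	/* memcpy() */
#include <linux/dax.h>		/* dax_flush() */
#include <asm/barrier.h>	/* wmb() */

static void persist_copy(struct dax_device *dax_dev, void *pmem_dst,
			 const void *src, size_t len)
{
	memcpy(pmem_dst, src, len);		/* step 1: store the data into pmem */
	dax_flush(dax_dev, pmem_dst, len);	/* step 2: write back the dirtied cache lines */
	wmb();					/* step 3: wait for the stores/flushes to complete */
}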
* Re: [dm-devel] [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 22:53 ` [dm-devel] " Jeff Moyer @ 2018-05-23 20:57 ` Mikulas Patocka 0 siblings, 0 replies; 40+ messages in thread From: Mikulas Patocka @ 2018-05-23 20:57 UTC (permalink / raw) To: Jeff Moyer Cc: Christoph Hellwig, device-mapper development, Mike Snitzer, linux-nvdimm On Tue, 22 May 2018, Jeff Moyer wrote: > Hi, Mike, > > Mike Snitzer <snitzer@redhat.com> writes: > > > Looking at Mikulas' wrapper API that you and hch are calling into > > question: > > > > For ARM it is using arch/arm64/mm/flush.c:arch_wb_cache_pmem(). > > (And ARM does seem to be providing CONFIG_ARCH_HAS_PMEM_API.) > > > > Whereas x86_64 is using memcpy_flushcache() as provided by > > CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE. > > (Yet ARM does provide arch/arm64/lib/uaccess_flushcache.c:memcpy_flushcache) > > > > Just seems this isn't purely about ARM lacking on an API level (given on > > x86_64 Mikulas isn't only using CONFIG_ARCH_HAS_PMEM_API). > > > > Seems this is more to do with x86_64 having efficient Non-temporal > > stores? > > Yeah, I think you've got that all right. > > > Anyway, I'm still trying to appreciate the details here before I can > > make any forward progress. > > Making data persistent on x64 requires 3 steps: > 1) copy the data into pmem (store instructions) > 2) flush the cache lines associated with the data (clflush, clflush_opt, clwb) > 3) wait on the flush to complete (sfence) In theory it works this way. In practice, this sequence is useless because the cache flusing instructions are horribly slow. So, the dm-writecache driver uses non-temporal stores instead of cache flushing. Now, the problem with arm64 is that it doesn't have non-temporal stores. So, memcpy_flushcache on arm64 does cached stores and flushes the cache afterwards. And this eager flushing is slower than late flushing. On arm4, you want to do cached stores, then do something else, and flush the cache as late as possible. > I'm not sure if other architectures require step 3. Mikulas' > implementation seems to imply that arm64 doesn't require the fence. I suppose that arch_wb_cache_pmem() does whatever it needs to do to flush the cache. If not, add something like arch_wb_cache_pmem_commit(). > The current pmem api provides: > > memcpy* -- step 1 > memcpy_flushcache -- this combines steps 1 and 2 > dax_flush -- step 2 > wmb* -- step 3 > > * not strictly part of the pmem api > > So, if you didn't care about performance, you could write generic code > that only used memcpy, dax_flush, and wmb (assuming other arches > actually need the wmb). What Mikulas did was to abstract out an API > that could be called by generic code that would work optimally on all > architectures. > > This looks like a worth-while addition to the PMEM API, to me. Mikulas, > what do you think about refactoring the code as Christoph suggested? I sent this patch https://www.redhat.com/archives/dm-devel/2018-May/msg00054.html so that you can take the functions pmem_memcpy, pmem_assign, pmem_flush and pmem_commit and move them to the generic linux headers. If you want to do it, do it. > Cheers, > Jeff Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 19:27 ` Dan Williams 2018-05-22 20:52 ` Mike Snitzer @ 2018-05-28 13:52 ` Mikulas Patocka 2018-05-28 17:41 ` Dan Williams 1 sibling, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-28 13:52 UTC (permalink / raw) To: Dan Williams Cc: Christoph Hellwig, device-mapper development, Mike Snitzer, linux-nvdimm On Tue, 22 May 2018, Dan Williams wrote: > >> Except I'm being responsive. > > > > Except you're looking to immediately punt to linux-arm-kernel ;) > > Well, I'm not, not really. I'm saying drop ARM support, it's not ready. This is the worst thing to do - because once late cache flushing is dropped from the dm-writecache target, it could hardly be reintroduced again. > >> I agree with Christoph that we should > >> build pmem helpers at an architecture level and not per-driver. Let's > >> make this driver depend on ARCH_HAS_PMEM_API and require ARM to catch > >> up to x86 in this space. We already have PowerPC enabling PMEM API, so > >> I don't see an unreasonable barrier to ask the same of ARM. This patch > >> is not even cc'd to linux-arm-kernel. Has the subject been broached > >> with them? > > > > No idea. Not by me. > > > > The thing is, I'm no expert in pmem. You are. Coordinating the change > > with ARM et al feels unnecessarily limiting and quicky moves outside my > > control. > > > > Serious question: Why can't this code land in this dm-writecache target > > and then be lifted (or obsoleted)? > > Because we already have an API, and we don't want to promote local > solutions to global problems, or carry unnecessary technical debt. > > > > > But if you think it worthwhile to force ARM to step up then fine. That > > does limit the availability of using writecache on ARM while they get > > the PMEM API together. > > > > I'll do whatever you want.. just put the smack down and tell me how it > > is ;) > > I'd say just control the variables you can control. Drop the ARM > support if you want to move forward and propose extensions / updates What do we gain by dropping it? > to the pmem api for x86 and I'll help push those since I was involved > in pushing the x86 pmem api in the first instance. That way you don't > need to touch this driver as new archs add their pmem api enabling. The pmem API is x86-centric - that the problem. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-28 13:52 ` Mikulas Patocka @ 2018-05-28 17:41 ` Dan Williams 2018-05-30 13:42 ` [dm-devel] " Jeff Moyer 0 siblings, 1 reply; 40+ messages in thread From: Dan Williams @ 2018-05-28 17:41 UTC (permalink / raw) To: Mikulas Patocka Cc: Christoph Hellwig, device-mapper development, Mike Snitzer, linux-nvdimm On Mon, May 28, 2018 at 6:52 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Tue, 22 May 2018, Dan Williams wrote: > >> >> Except I'm being responsive. >> > >> > Except you're looking to immediately punt to linux-arm-kernel ;) >> >> Well, I'm not, not really. I'm saying drop ARM support, it's not ready. > > This is the worst thing to do - because once late cache flushing is > dropped from the dm-writecache target, it could hardly be reintroduced > again. > >> >> I agree with Christoph that we should >> >> build pmem helpers at an architecture level and not per-driver. Let's >> >> make this driver depend on ARCH_HAS_PMEM_API and require ARM to catch >> >> up to x86 in this space. We already have PowerPC enabling PMEM API, so >> >> I don't see an unreasonable barrier to ask the same of ARM. This patch >> >> is not even cc'd to linux-arm-kernel. Has the subject been broached >> >> with them? >> > >> > No idea. Not by me. >> > >> > The thing is, I'm no expert in pmem. You are. Coordinating the change >> > with ARM et al feels unnecessarily limiting and quicky moves outside my >> > control. >> > >> > Serious question: Why can't this code land in this dm-writecache target >> > and then be lifted (or obsoleted)? >> >> Because we already have an API, and we don't want to promote local >> solutions to global problems, or carry unnecessary technical debt. >> >> > >> > But if you think it worthwhile to force ARM to step up then fine. That >> > does limit the availability of using writecache on ARM while they get >> > the PMEM API together. >> > >> > I'll do whatever you want.. just put the smack down and tell me how it >> > is ;) >> >> I'd say just control the variables you can control. Drop the ARM >> support if you want to move forward and propose extensions / updates > > What do we gain by dropping it? > >> to the pmem api for x86 and I'll help push those since I was involved >> in pushing the x86 pmem api in the first instance. That way you don't >> need to touch this driver as new archs add their pmem api enabling. > > The pmem API is x86-centric - that the problem. When I read your patch I came away with the impression that ARM had not added memcpy_flushcache() yet and you were working around that fact. Now that I look, ARM *does* define memcpy_flushcache() and you're avoiding it. You use memcpy+arch_wb_pmem where arch_wb_pmem on ARM64 is defined as __clean_dcache_area_pop(dst, cnt). The ARM memcpy_flushcache() implementation is: memcpy(dst, src, cnt); __clean_dcache_area_pop(dst, cnt); So, I do not see how what you're doing is any less work unless you are flushing less than you copy? If memcpy_flushcache() is slower than memcpy + arch_wb_pmem then the ARM implementation is broken and that needs to be addressed not worked around in a driver. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dm-devel] [patch 4/4] dm-writecache: use new API for flushing 2018-05-28 17:41 ` Dan Williams @ 2018-05-30 13:42 ` Jeff Moyer 2018-05-30 13:51 ` Mikulas Patocka 2018-05-30 13:52 ` Jeff Moyer 0 siblings, 2 replies; 40+ messages in thread From: Jeff Moyer @ 2018-05-30 13:42 UTC (permalink / raw) To: Dan Williams Cc: Christoph Hellwig, device-mapper development, Mikulas Patocka, Mike Snitzer, linux-nvdimm Dan Williams <dan.j.williams@intel.com> writes: > When I read your patch I came away with the impression that ARM had > not added memcpy_flushcache() yet and you were working around that > fact. Now that I look, ARM *does* define memcpy_flushcache() and > you're avoiding it. You use memcpy+arch_wb_pmem where arch_wb_pmem on > ARM64 is defined as __clean_dcache_area_pop(dst, cnt). The ARM > memcpy_flushcache() implementation is: > > memcpy(dst, src, cnt); > __clean_dcache_area_pop(dst, cnt); > > So, I do not see how what you're doing is any less work unless you are > flushing less than you copy? > > If memcpy_flushcache() is slower than memcpy + arch_wb_pmem then the > ARM implementation is broken and that needs to be addressed not worked > around in a driver. I think Mikulas wanted to batch up multiple copies and flush at the end. According to his commit message, that batching gained him 2% performance. -Jeff _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dm-devel] [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:42 ` [dm-devel] " Jeff Moyer @ 2018-05-30 13:51 ` Mikulas Patocka 2018-05-30 13:52 ` Jeff Moyer 1 sibling, 0 replies; 40+ messages in thread From: Mikulas Patocka @ 2018-05-30 13:51 UTC (permalink / raw) To: Jeff Moyer Cc: Christoph Hellwig, device-mapper development, Mike Snitzer, linux-nvdimm On Wed, 30 May 2018, Jeff Moyer wrote: > Dan Williams <dan.j.williams@intel.com> writes: > > > When I read your patch I came away with the impression that ARM had > > not added memcpy_flushcache() yet and you were working around that > > fact. Now that I look, ARM *does* define memcpy_flushcache() and > > you're avoiding it. You use memcpy+arch_wb_pmem where arch_wb_pmem on > > ARM64 is defined as __clean_dcache_area_pop(dst, cnt). The ARM > > memcpy_flushcache() implementation is: > > > > memcpy(dst, src, cnt); > > __clean_dcache_area_pop(dst, cnt); > > > > So, I do not see how what you're doing is any less work unless you are > > flushing less than you copy? > > > > If memcpy_flushcache() is slower than memcpy + arch_wb_pmem then the > > ARM implementation is broken and that needs to be addressed not worked > > around in a driver. > > I think Mikulas wanted to batch up multiple copies and flush at the > end. According to his commit message, that batching gained him 2% > performance. > > -Jeff No - this 2% difference is inlined memcpy_flushcache() vs out-of-line memcpy_flushcache(). I thought that dax_flush() performed 12% better memcpy_flushcache() - but the reason why it performed better was - that it was not flushing the cache at all. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [dm-devel] [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:42 ` [dm-devel] " Jeff Moyer 2018-05-30 13:51 ` Mikulas Patocka @ 2018-05-30 13:52 ` Jeff Moyer 1 sibling, 0 replies; 40+ messages in thread From: Jeff Moyer @ 2018-05-30 13:52 UTC (permalink / raw) To: Dan Williams Cc: Christoph Hellwig, device-mapper development, Mikulas Patocka, Mike Snitzer, linux-nvdimm Jeff Moyer <jmoyer@redhat.com> writes: > Dan Williams <dan.j.williams@intel.com> writes: > >> When I read your patch I came away with the impression that ARM had >> not added memcpy_flushcache() yet and you were working around that >> fact. Now that I look, ARM *does* define memcpy_flushcache() and >> you're avoiding it. You use memcpy+arch_wb_pmem where arch_wb_pmem on >> ARM64 is defined as __clean_dcache_area_pop(dst, cnt). The ARM >> memcpy_flushcache() implementation is: >> >> memcpy(dst, src, cnt); >> __clean_dcache_area_pop(dst, cnt); >> >> So, I do not see how what you're doing is any less work unless you are >> flushing less than you copy? >> >> If memcpy_flushcache() is slower than memcpy + arch_wb_pmem then the >> ARM implementation is broken and that needs to be addressed not worked >> around in a driver. > > I think Mikulas wanted to batch up multiple copies and flush at the > end. According to his commit message, that batching gained him 2% > performance. Nevermind me, I just caught up with the rest of the thread. :) -Jeff _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-22 19:00 ` Dan Williams 2018-05-22 19:19 ` Mike Snitzer @ 2018-05-24 8:15 ` Mikulas Patocka 1 sibling, 0 replies; 40+ messages in thread From: Mikulas Patocka @ 2018-05-24 8:15 UTC (permalink / raw) To: Dan Williams Cc: Christoph Hellwig, device-mapper development, Mike Snitzer, linux-nvdimm On Tue, 22 May 2018, Dan Williams wrote: > On Tue, May 22, 2018 at 11:41 AM, Mike Snitzer <snitzer@redhat.com> wrote: > > On Tue, May 22 2018 at 2:39am -0400, > > Christoph Hellwig <hch@infradead.org> wrote: > > > >> On Sat, May 19, 2018 at 07:25:07AM +0200, Mikulas Patocka wrote: > >> > Use new API for flushing persistent memory. > >> > >> The sentence doesnt make much sense. 'A new API', 'A better > >> abstraction' maybe? > >> > >> > > >> > The problem is this: > >> > * on X86-64, non-temporal stores have the best performance > >> > * ARM64 doesn't have non-temporal stores, so we must flush cache. We > >> > should flush cache as late as possible, because it performs better this > >> > way. > >> > > >> > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit > >> > data persistently, all three functions must be called. > >> > > >> > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign > >> > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. > >> > > >> > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and > >> > pmem_commit is wmb. > >> > > >> > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and > >> > pmem_commit is empty. > >> > >> All these should be provided by the pmem layer, and be properly > >> documented. And be sorted before adding your new target that uses > >> them. > > > > I don't see that as a hard requirement. Mikulas did the work to figure > > out what is more optimal on x86_64 vs amd64. It makes a difference for > > his target and that is sufficient to carry it locally until/when it is > > either elevated to pmem. > > > > We cannot even get x86 and swait maintainers to reply to repeat requests > > for review. Stacking up further deps on pmem isn't high on my list. > > > > Except I'm being responsive. I agree with Christoph that we should > build pmem helpers at an architecture level and not per-driver. Let's > make this driver depend on ARCH_HAS_PMEM_API and require ARM to catch > up to x86 in this space. We already have PowerPC enabling PMEM API, so > I don't see an unreasonable barrier to ask the same of ARM. This patch > is not even cc'd to linux-arm-kernel. Has the subject been broached > with them? The ARM code can't "catch-up" with X86. On X86 - non-temporal stores (i.e. memcpy_flushcache) are faster than cached write and cache flushing. The ARM architecture doesn't have non-temporal stores. So, memcpy_flushcache on ARM does memcpy (that writes data to the cache) and then flushes the cache. But this eager cache flushig is slower than late cache flushing. The optimal code sequence on ARM to write to persistent memory is to call memcpy, then do something else, and then call arch_wb_cache_pmem as late as possible. And this ARM-optimized code sequence is just horribly slow on X86. This issue can't be "fixed" in ARM-specific source code. The ARM processor have such a characteristics that eager cache flushing is slower than late cache flushing - and that's it - you can't change processor behavior. 
If you don't want '#if defined(CONFIG_X86_64)' in the dm-writecache driver, then just take the functions that are in this conditional block and move them to some generic linux header. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
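As a sketch of what that lift could look like: a generic header that keys the choice off an architecture opt-in instead of testing CONFIG_X86_64 inside the driver. Both the header name and CONFIG_ARCH_WANTS_NT_PMEM_STORES are invented here for illustration; no such Kconfig symbol exists today.

/* include/linux/pmem_write.h -- hypothetical, for illustration only */
#ifndef _LINUX_PMEM_WRITE_H
#define _LINUX_PMEM_WRITE_H

#include <linux/string.h>
#include <linux/libnvdimm.h>
#include <asm/barrier.h>

#ifdef CONFIG_ARCH_WANTS_NT_PMEM_STORES
/* Architectures with usable non-temporal stores (today: x86-64) would
 * select this symbol: copy eagerly, fence once at commit time. */
#define pmem_memcpy(d, s, n)	memcpy_flushcache((d), (s), (n))
#define pmem_flush(a, n)	do { } while (0)
#define pmem_commit()		wmb()
#else
/* Everyone else: cached copy now, cache write-back as late as possible,
 * with any needed barrier folded into arch_wb_cache_pmem(). */
#define pmem_memcpy(d, s, n)	memcpy((d), (s), (n))
#define pmem_flush(a, n)	arch_wb_cache_pmem((a), (n))
#define pmem_commit()		do { } while (0)
#endif

#endif /* _LINUX_PMEM_WRITE_H */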
[parent not found: <CAPcyv4iEtfuVGPR0QMKcafv2XFwSj3nzxjX8cuXpXe00akAvYA@mail.gmail.com>]
[parent not found: <alpine.LRH.2.02.1805250213270.13894@file01.intranet.prod.int.rdu2.redhat.com>]
* Re: [patch 4/4] dm-writecache: use new API for flushing [not found] ` <alpine.LRH.2.02.1805250213270.13894@file01.intranet.prod.int.rdu2.redhat.com> @ 2018-05-25 12:51 ` Mike Snitzer 2018-05-25 15:57 ` Dan Williams 0 siblings, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-25 12:51 UTC (permalink / raw) To: Dan Williams; +Cc: device-mapper development, Mikulas Patocka, linux-nvdimm On Fri, May 25 2018 at 2:17am -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Thu, 24 May 2018, Dan Williams wrote: > > > On Fri, May 18, 2018 at 10:25 PM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > Use new API for flushing persistent memory. > > > > > > The problem is this: > > > * on X86-64, non-temporal stores have the best performance > > > * ARM64 doesn't have non-temporal stores, so we must flush cache. We > > > should flush cache as late as possible, because it performs better this > > > way. > > > > > > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit > > > data persistently, all three functions must be called. > > > > > > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign > > > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. > > > > > > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and > > > pmem_commit is wmb. > > > > > > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and > > > pmem_commit is empty. > > > > I don't want to grow driver-local wrappers for pmem. You should use > > memcpy_flushcache directly() and if an architecture does not define > > memcpy_flushcache() then don't allow building dm-writecache, i.e. this > > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't > > see a need to add a standalone flush operation if all relevant archs > > provide memcpy_flushcache(). As for commit, I'd say just use wmb() > > directly since all archs define it. Alternatively we could introduce > > memcpy_flushcache_relaxed() to be the un-ordered version of the copy > > routine and memcpy_flushcache() would imply a wmb(). > > But memcpy_flushcache() on ARM64 is slow. Yes, Dan can you please take some time to fully appreciate what this small wrapper API is providing (maybe you've done that, but your recent reply is mixed-message). Seems you're keeping the innovation it provides at arms-length. Basically the PMEM APIs you've helped construct are lacking, forcing DM developers to own fixing them is an inversion that only serves to stonewall. Please, less time on stonewalling and more time lifting this wrapper API; otherwise the dm-writecache local wrapper API is a near-term means to an end. I revised the dm-writecache patch header yesterday, please review: https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.18&id=2105231db61b08752bc4247d2fe7838657700b0d (the last paragraph in particular, I intend to move forward with this wrapper unless someone on the PMEM side of the house steps up and lifts it up between now and when the 4.18 merge window opens) dm: add writecache target The writecache target caches writes on persistent memory or SSD. It is intended for databases or other programs that need extremely low commit latency. The writecache target doesn't cache reads because reads are supposed to be cached in page cache in normal RAM. The following describes the approach used to provide the most efficient flushing of persistent memory on X86_64 vs ARM64: * On X86_64 non-temporal stores (i.e. 
memcpy_flushcache) are faster than cached writes and cache flushing. * The ARM64 architecture doesn't have non-temporal stores. So, memcpy_flushcache on ARM does memcpy (that writes data to the cache) and then flushes the cache. But this eager cache flushig is slower than late cache flushing. The optimal code sequence on ARM to write to persistent memory is to call memcpy, then do something else, and then call arch_wb_cache_pmem as late as possible. And this ARM-optimized code sequence is just horribly slow on X86. This issue can't be "fixed" in ARM-specific source code. The ARM processor have characteristics such that eager cache flushing is slower than late cache flushing - and that's it - you can't change processor behavior. We introduce a wrapper API for flushing persistent memory with functions pmem_memcpy, pmem_flush and pmem_commit. To commit data persistently, all three functions must be called. The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. On X86_64, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and pmem_commit is wmb. On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and pmem_commit is empty. It is clear that this wrapper API for flushing persistent memory needs to be elevated out of this dm-writecache driver. But that can happen later without requiring DM developers to blaze new trails on pmem specific implementation details/quirks (pmem developers need to clean up their APIs given they are already spread across CONFIG_ARCH_HAS_PMEM_API and CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE and they absolutely don't take into account the duality of the different programming models needed to achieve optimal cross-architecture use of persistent memory). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <msnitzer@redhat.com> _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
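As a usage illustration of the contract spelled out in that header -- to commit data persistently, all three functions must be called, with pmem_assign() for fields that must not be torn -- a caller committing a payload plus an 8-byte sequence number might look like the sketch below. The structure and field names are invented for the example, and it only shows which helper goes where; it is not a complete crash-consistency protocol.

struct wc_sb {
	__le64 seq_count;	/* 8-byte field published via pmem_assign() */
};

static void commit_entry(struct wc_sb *sb, void *pmem_data,
			 const void *buf, size_t len, u64 new_seq)
{
	pmem_memcpy(pmem_data, buf, len);	/* stage the payload */
	pmem_flush(pmem_data, len);		/* no-op on x86-64, cache write-back on ARM64 */

	/* a reader must never observe a torn sequence number */
	pmem_assign(sb->seq_count, cpu_to_le64(new_seq));
	pmem_flush(&sb->seq_count, sizeof(sb->seq_count));

	pmem_commit();				/* wmb() on x86-64, empty on ARM64 */
}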
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-25 12:51 ` Mike Snitzer @ 2018-05-25 15:57 ` Dan Williams 2018-05-26 7:02 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Dan Williams @ 2018-05-25 15:57 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, Mikulas Patocka, linux-nvdimm On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: > On Fri, May 25 2018 at 2:17am -0400, > Mikulas Patocka <mpatocka@redhat.com> wrote: > >> >> >> On Thu, 24 May 2018, Dan Williams wrote: >> >> > On Fri, May 18, 2018 at 10:25 PM, Mikulas Patocka <mpatocka@redhat.com> wrote: >> > > Use new API for flushing persistent memory. >> > > >> > > The problem is this: >> > > * on X86-64, non-temporal stores have the best performance >> > > * ARM64 doesn't have non-temporal stores, so we must flush cache. We >> > > should flush cache as late as possible, because it performs better this >> > > way. >> > > >> > > We introduce functions pmem_memcpy, pmem_flush and pmem_commit. To commit >> > > data persistently, all three functions must be called. >> > > >> > > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign >> > > (unlike pmem_memcpy) guarantees that 8-byte values are written atomically. >> > > >> > > On X86, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and >> > > pmem_commit is wmb. >> > > >> > > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and >> > > pmem_commit is empty. >> > >> > I don't want to grow driver-local wrappers for pmem. You should use >> > memcpy_flushcache directly() and if an architecture does not define >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't >> > see a need to add a standalone flush operation if all relevant archs >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() >> > directly since all archs define it. Alternatively we could introduce >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy >> > routine and memcpy_flushcache() would imply a wmb(). >> >> But memcpy_flushcache() on ARM64 is slow. > > Yes, Dan can you please take some time to fully appreciate what this > small wrapper API is providing (maybe you've done that, but your recent > reply is mixed-message). Seems you're keeping the innovation it > provides at arms-length. Basically the PMEM APIs you've helped > construct are lacking, forcing DM developers to own fixing them is an > inversion that only serves to stonewall. > > Please, less time on stonewalling and more time lifting this wrapper > API; otherwise the dm-writecache local wrapper API is a near-term means > to an end. > > I revised the dm-writecache patch header yesterday, please review: > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.18&id=2105231db61b08752bc4247d2fe7838657700b0d > (the last paragraph in particular, I intend to move forward with this > wrapper unless someone on the PMEM side of the house steps up and lifts > it up between now and when the 4.18 merge window opens) > > dm: add writecache target > The writecache target caches writes on persistent memory or SSD. > It is intended for databases or other programs that need extremely low > commit latency. > > The writecache target doesn't cache reads because reads are supposed to > be cached in page cache in normal RAM. 
> > The following describes the approach used to provide the most efficient > flushing of persistent memory on X86_64 vs ARM64: > > * On X86_64 non-temporal stores (i.e. memcpy_flushcache) are faster > than cached writes and cache flushing. > > * The ARM64 architecture doesn't have non-temporal stores. So, > memcpy_flushcache on ARM does memcpy (that writes data to the cache) > and then flushes the cache. But this eager cache flushig is slower > than late cache flushing. > > The optimal code sequence on ARM to write to persistent memory is to > call memcpy, then do something else, and then call arch_wb_cache_pmem as > late as possible. And this ARM-optimized code sequence is just horribly > slow on X86. > > This issue can't be "fixed" in ARM-specific source code. The ARM > processor have characteristics such that eager cache flushing is slower > than late cache flushing - and that's it - you can't change processor > behavior. > > We introduce a wrapper API for flushing persistent memory with functions > pmem_memcpy, pmem_flush and pmem_commit. To commit data persistently, > all three functions must be called. > > The macro pmem_assign may be used instead of pmem_memcpy. pmem_assign > (unlike pmem_memcpy) guarantees that 8-byte values are written > atomically. > > On X86_64, pmem_memcpy is memcpy_flushcache, pmem_flush is empty and > pmem_commit is wmb. > > On ARM64, pmem_memcpy is memcpy, pmem_flush is arch_wb_cache_pmem and > pmem_commit is empty. > > It is clear that this wrapper API for flushing persistent memory needs > to be elevated out of this dm-writecache driver. But that can happen > later without requiring DM developers to blaze new trails on pmem > specific implementation details/quirks (pmem developers need to clean up > their APIs given they are already spread across CONFIG_ARCH_HAS_PMEM_API > and CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE and they absolutely don't take > into account the duality of the different programming models needed to > achieve optimal cross-architecture use of persistent memory). Right, so again, what is wrong with memcpy_flushcache_relaxed() + wmb() or otherwise making memcpy_flushcache() ordered. I do not see that as a trailblazing requirement, I see that as typical review and a reduction of the operation space that you are proposing. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
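Concretely, the alternative being suggested here would look something like the sketch below; the names are invented (memcpy_flushcache_relaxed() does not exist in the kernel today) and are only meant to make the proposal explicit:

/* The "relaxed" variant is today's memcpy_flushcache() behaviour: copy and
 * push the data toward pmem, but leave ordering/draining to the caller. */
static inline void pmem_copy_relaxed(void *dst, const void *src, size_t cnt)
{
	memcpy_flushcache(dst, src, cnt);
}

/* Under the proposal, memcpy_flushcache() itself would take on this role:
 * the store fence is implied, so a driver needs no separate commit call. */
static inline void pmem_copy_ordered(void *dst, const void *src, size_t cnt)
{
	pmem_copy_relaxed(dst, src, cnt);
	wmb();
}

Mikulas's objection in the reply below is that on ARM64 either variant still performs the cache write-back eagerly inside the copy, which is exactly the cost the separate pmem_flush() call is meant to defer.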
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-25 15:57 ` Dan Williams @ 2018-05-26 7:02 ` Mikulas Patocka 2018-05-26 15:26 ` Dan Williams 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-26 7:02 UTC (permalink / raw) To: Dan Williams; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Fri, 25 May 2018, Dan Williams wrote: > On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: > > On Fri, May 25 2018 at 2:17am -0400, > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > >> On Thu, 24 May 2018, Dan Williams wrote: > >> > >> > I don't want to grow driver-local wrappers for pmem. You should use > >> > memcpy_flushcache directly() and if an architecture does not define > >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this > >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't > >> > see a need to add a standalone flush operation if all relevant archs > >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() > >> > directly since all archs define it. Alternatively we could introduce > >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy > >> > routine and memcpy_flushcache() would imply a wmb(). > >> > >> But memcpy_flushcache() on ARM64 is slow. > > Right, so again, what is wrong with memcpy_flushcache_relaxed() + > wmb() or otherwise making memcpy_flushcache() ordered. I do not see > that as a trailblazing requirement, I see that as typical review and a > reduction of the operation space that you are proposing. memcpy_flushcache on ARM64 is generally wrong thing to do, because it is slower than memcpy and explicit cache flush. Suppose that you want to write data to a block device and make it persistent. So you send a WRITE bio and then a FLUSH bio. Now - how to implement these two bios on persistent memory: On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does wmb() - this is the optimal implementation. But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal implementation is that the WRITE bio does just memcpy() and the FLUSH bio does arch_wb_cache_pmem() on the affected range. Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture doesn't have non-temporal stores. So, memcpy_flushcache() is implemented as memcpy() followed by a cache flush. Now - if you flush the cache immediatelly after memcpy, the cache is full of dirty lines and the cache-flushing code has to write these lines back and that is slow. If you flush the cache some time after memcpy (i.e. when the FLUSH bio is received), the processor already flushed some part of the cache on its own, so the cache-flushing function has less work to do and it is faster. So the conclusion is - don't use memcpy_flushcache on ARM. This problem cannot be fixed by a better implementation of memcpy_flushcache. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
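The two bio-level sequences contrasted above, written out for a hypothetical pmem-backed driver; the structure, the single dirty-range tracking and the function names are all invented for illustration, and locking/initialisation are ignored:

#include <linux/kernel.h>	/* min(), max(), SIZE_MAX */
#include <linux/string.h>
#include <linux/libnvdimm.h>
#include <asm/barrier.h>

struct pmem_dev {
	void *virt;			/* kernel mapping of the persistent memory */
	size_t dirty_start, dirty_end;	/* range still needing write-back (cached-store path) */
};

static void pmem_handle_write_bio(struct pmem_dev *p, size_t off,
				  const void *buf, size_t len)
{
#if defined(CONFIG_X86_64)
	/* non-temporal stores: nothing is left dirty in the cache */
	memcpy_flushcache(p->virt + off, buf, len);
#else
	/* cached stores now; just remember what needs writing back later */
	memcpy(p->virt + off, buf, len);
	p->dirty_start = min(p->dirty_start, off);
	p->dirty_end = max(p->dirty_end, off + len);
#endif
}

static void pmem_handle_flush_bio(struct pmem_dev *p)
{
#if defined(CONFIG_X86_64)
	wmb();		/* drain the non-temporal stores */
#else
	/* flushing as late as possible: by now the CPU has typically written
	 * back part of the range on its own, so there is less left to do */
	if (p->dirty_end > p->dirty_start)
		arch_wb_cache_pmem(p->virt + p->dirty_start,
				   p->dirty_end - p->dirty_start);
	p->dirty_start = SIZE_MAX;
	p->dirty_end = 0;
#endif
}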
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-26 7:02 ` Mikulas Patocka @ 2018-05-26 15:26 ` Dan Williams 2018-05-28 13:32 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Dan Williams @ 2018-05-26 15:26 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Sat, May 26, 2018 at 12:02 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Fri, 25 May 2018, Dan Williams wrote: > >> On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: >> > On Fri, May 25 2018 at 2:17am -0400, >> > Mikulas Patocka <mpatocka@redhat.com> wrote: >> > >> >> On Thu, 24 May 2018, Dan Williams wrote: >> >> >> >> > I don't want to grow driver-local wrappers for pmem. You should use >> >> > memcpy_flushcache directly() and if an architecture does not define >> >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this >> >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't >> >> > see a need to add a standalone flush operation if all relevant archs >> >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() >> >> > directly since all archs define it. Alternatively we could introduce >> >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy >> >> > routine and memcpy_flushcache() would imply a wmb(). >> >> >> >> But memcpy_flushcache() on ARM64 is slow. >> >> Right, so again, what is wrong with memcpy_flushcache_relaxed() + >> wmb() or otherwise making memcpy_flushcache() ordered. I do not see >> that as a trailblazing requirement, I see that as typical review and a >> reduction of the operation space that you are proposing. > > memcpy_flushcache on ARM64 is generally wrong thing to do, because it is > slower than memcpy and explicit cache flush. > > Suppose that you want to write data to a block device and make it > persistent. So you send a WRITE bio and then a FLUSH bio. > > Now - how to implement these two bios on persistent memory: > > On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does > wmb() - this is the optimal implementation. > > But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal > implementation is that the WRITE bio does just memcpy() and the FLUSH bio > does arch_wb_cache_pmem() on the affected range. > > Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture > doesn't have non-temporal stores. So, memcpy_flushcache() is implemented > as memcpy() followed by a cache flush. > > Now - if you flush the cache immediatelly after memcpy, the cache is full > of dirty lines and the cache-flushing code has to write these lines back > and that is slow. > > If you flush the cache some time after memcpy (i.e. when the FLUSH bio is > received), the processor already flushed some part of the cache on its > own, so the cache-flushing function has less work to do and it is faster. > > So the conclusion is - don't use memcpy_flushcache on ARM. This problem > cannot be fixed by a better implementation of memcpy_flushcache. It sounds like ARM might be better off with mapping its pmem as write-through rather than write-back, and skip the explicit cache management altogether. You speak of "optimal" and "sub-optimal", but what would be more clear is fio measurements of the relative IOPs and latency profiles of the different approaches. 
The reason I am continuing to push here is that reducing the operation space from 'copy-flush-commit' to just 'copy' or 'copy-commit' simplifies the maintenance long term. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
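A sketch of the reduced 'copy + commit' interface proposed above; memcpy_flushcache_relaxed() is only a suggested name at this point in the discussion, not an existing kernel API, so the fallback definition and the wc_copy_and_commit() wrapper below are assumptions for illustration.

/* assumed fallback until an arch provides a genuinely relaxed (unordered) version */
#ifndef memcpy_flushcache_relaxed
#define memcpy_flushcache_relaxed(dst, src, cnt) memcpy_flushcache(dst, src, cnt)
#endif

static void wc_copy_and_commit(void *pmem_dst, const void *buf, size_t len)
{
	memcpy_flushcache_relaxed(pmem_dst, buf, len);	/* 'copy': unordered flushing copy */
	wmb();						/* 'commit': order it before the next metadata update */
}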
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-26 15:26 ` Dan Williams @ 2018-05-28 13:32 ` Mikulas Patocka 2018-05-28 18:14 ` Dan Williams 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-28 13:32 UTC (permalink / raw) To: Dan Williams; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Sat, 26 May 2018, Dan Williams wrote: > On Sat, May 26, 2018 at 12:02 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > On Fri, 25 May 2018, Dan Williams wrote: > > > >> On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: > >> > On Fri, May 25 2018 at 2:17am -0400, > >> > Mikulas Patocka <mpatocka@redhat.com> wrote: > >> > > >> >> On Thu, 24 May 2018, Dan Williams wrote: > >> >> > >> >> > I don't want to grow driver-local wrappers for pmem. You should use > >> >> > memcpy_flushcache directly() and if an architecture does not define > >> >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this > >> >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't > >> >> > see a need to add a standalone flush operation if all relevant archs > >> >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() > >> >> > directly since all archs define it. Alternatively we could introduce > >> >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy > >> >> > routine and memcpy_flushcache() would imply a wmb(). > >> >> > >> >> But memcpy_flushcache() on ARM64 is slow. > >> > >> Right, so again, what is wrong with memcpy_flushcache_relaxed() + > >> wmb() or otherwise making memcpy_flushcache() ordered. I do not see > >> that as a trailblazing requirement, I see that as typical review and a > >> reduction of the operation space that you are proposing. > > > > memcpy_flushcache on ARM64 is generally wrong thing to do, because it is > > slower than memcpy and explicit cache flush. > > > > Suppose that you want to write data to a block device and make it > > persistent. So you send a WRITE bio and then a FLUSH bio. > > > > Now - how to implement these two bios on persistent memory: > > > > On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does > > wmb() - this is the optimal implementation. > > > > But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal > > implementation is that the WRITE bio does just memcpy() and the FLUSH bio > > does arch_wb_cache_pmem() on the affected range. > > > > Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture > > doesn't have non-temporal stores. So, memcpy_flushcache() is implemented > > as memcpy() followed by a cache flush. > > > > Now - if you flush the cache immediatelly after memcpy, the cache is full > > of dirty lines and the cache-flushing code has to write these lines back > > and that is slow. > > > > If you flush the cache some time after memcpy (i.e. when the FLUSH bio is > > received), the processor already flushed some part of the cache on its > > own, so the cache-flushing function has less work to do and it is faster. > > > > So the conclusion is - don't use memcpy_flushcache on ARM. This problem > > cannot be fixed by a better implementation of memcpy_flushcache. > > It sounds like ARM might be better off with mapping its pmem as > write-through rather than write-back, and skip the explicit cache I doubt it would perform well - write combining combines the writes into a larger segments - and write-through doesn't. > management altogether. 
You speak of "optimal" and "sub-optimal", but > what would be more clear is fio measurements of the relative IOPs and > latency profiles of the different approaches. The reason I am > continuing to push here is that reducing the operation space from > 'copy-flush-commit' to just 'copy' or 'copy-commit' simplifies the > maintenance long term. I measured it (with nvme backing store) and late cache flushing has 12% better performance than eager flushing with memcpy_flushcache(). 131836 4k iops - vs - 117016. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-28 13:32 ` Mikulas Patocka @ 2018-05-28 18:14 ` Dan Williams 2018-05-30 13:07 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Dan Williams @ 2018-05-28 18:14 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Mon, May 28, 2018 at 6:32 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Sat, 26 May 2018, Dan Williams wrote: > >> On Sat, May 26, 2018 at 12:02 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: >> > >> > >> > On Fri, 25 May 2018, Dan Williams wrote: >> > >> >> On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: >> >> > On Fri, May 25 2018 at 2:17am -0400, >> >> > Mikulas Patocka <mpatocka@redhat.com> wrote: >> >> > >> >> >> On Thu, 24 May 2018, Dan Williams wrote: >> >> >> >> >> >> > I don't want to grow driver-local wrappers for pmem. You should use >> >> >> > memcpy_flushcache directly() and if an architecture does not define >> >> >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this >> >> >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't >> >> >> > see a need to add a standalone flush operation if all relevant archs >> >> >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() >> >> >> > directly since all archs define it. Alternatively we could introduce >> >> >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy >> >> >> > routine and memcpy_flushcache() would imply a wmb(). >> >> >> >> >> >> But memcpy_flushcache() on ARM64 is slow. >> >> >> >> Right, so again, what is wrong with memcpy_flushcache_relaxed() + >> >> wmb() or otherwise making memcpy_flushcache() ordered. I do not see >> >> that as a trailblazing requirement, I see that as typical review and a >> >> reduction of the operation space that you are proposing. >> > >> > memcpy_flushcache on ARM64 is generally wrong thing to do, because it is >> > slower than memcpy and explicit cache flush. >> > >> > Suppose that you want to write data to a block device and make it >> > persistent. So you send a WRITE bio and then a FLUSH bio. >> > >> > Now - how to implement these two bios on persistent memory: >> > >> > On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does >> > wmb() - this is the optimal implementation. >> > >> > But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal >> > implementation is that the WRITE bio does just memcpy() and the FLUSH bio >> > does arch_wb_cache_pmem() on the affected range. >> > >> > Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture >> > doesn't have non-temporal stores. So, memcpy_flushcache() is implemented >> > as memcpy() followed by a cache flush. >> > >> > Now - if you flush the cache immediatelly after memcpy, the cache is full >> > of dirty lines and the cache-flushing code has to write these lines back >> > and that is slow. >> > >> > If you flush the cache some time after memcpy (i.e. when the FLUSH bio is >> > received), the processor already flushed some part of the cache on its >> > own, so the cache-flushing function has less work to do and it is faster. >> > >> > So the conclusion is - don't use memcpy_flushcache on ARM. This problem >> > cannot be fixed by a better implementation of memcpy_flushcache. 
>> >> It sounds like ARM might be better off with mapping its pmem as >> write-through rather than write-back, and skip the explicit cache > > I doubt it would perform well - write combining combines the writes into a > larger segments - and write-through doesn't. > Last I checked write-through caching does not disable write combining >> management altogether. You speak of "optimal" and "sub-optimal", but >> what would be more clear is fio measurements of the relative IOPs and >> latency profiles of the different approaches. The reason I am >> continuing to push here is that reducing the operation space from >> 'copy-flush-commit' to just 'copy' or 'copy-commit' simplifies the >> maintenance long term. > > I measured it (with nvme backing store) and late cache flushing has 12% > better performance than eager flushing with memcpy_flushcache(). I assume what you're seeing is ARM64 over-flushing the amount of dirty data so it becomes more efficient to do an amortized flush at the end? However, that effectively makes memcpy_flushcache() unusable in the way it can be used on x86. You claimed that ARM does not support non-temporal stores, but it does, see the STNP instruction. I do not want to see arch specific optimizations in drivers, so either write-through mappings is a potential answer to remove the need to explicitly manage flushing, or just implement STNP hacks in memcpy_flushcache() like you did with MOVNT on x86. > 131836 4k iops - vs - 117016. To be clear this is memcpy_flushcache() vs memcpy + flush? _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-28 18:14 ` Dan Williams @ 2018-05-30 13:07 ` Mikulas Patocka 2018-05-30 13:16 ` Mike Snitzer 2018-05-30 15:58 ` Dan Williams 0 siblings, 2 replies; 40+ messages in thread From: Mikulas Patocka @ 2018-05-30 13:07 UTC (permalink / raw) To: Dan Williams; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Mon, 28 May 2018, Dan Williams wrote: > On Mon, May 28, 2018 at 6:32 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > On Sat, 26 May 2018, Dan Williams wrote: > > > >> On Sat, May 26, 2018 at 12:02 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > >> > > >> > > >> > On Fri, 25 May 2018, Dan Williams wrote: > >> > > >> >> On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: > >> >> > On Fri, May 25 2018 at 2:17am -0400, > >> >> > Mikulas Patocka <mpatocka@redhat.com> wrote: > >> >> > > >> >> >> On Thu, 24 May 2018, Dan Williams wrote: > >> >> >> > >> >> >> > I don't want to grow driver-local wrappers for pmem. You should use > >> >> >> > memcpy_flushcache directly() and if an architecture does not define > >> >> >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this > >> >> >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't > >> >> >> > see a need to add a standalone flush operation if all relevant archs > >> >> >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() > >> >> >> > directly since all archs define it. Alternatively we could introduce > >> >> >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy > >> >> >> > routine and memcpy_flushcache() would imply a wmb(). > >> >> >> > >> >> >> But memcpy_flushcache() on ARM64 is slow. > >> >> > >> >> Right, so again, what is wrong with memcpy_flushcache_relaxed() + > >> >> wmb() or otherwise making memcpy_flushcache() ordered. I do not see > >> >> that as a trailblazing requirement, I see that as typical review and a > >> >> reduction of the operation space that you are proposing. > >> > > >> > memcpy_flushcache on ARM64 is generally wrong thing to do, because it is > >> > slower than memcpy and explicit cache flush. > >> > > >> > Suppose that you want to write data to a block device and make it > >> > persistent. So you send a WRITE bio and then a FLUSH bio. > >> > > >> > Now - how to implement these two bios on persistent memory: > >> > > >> > On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does > >> > wmb() - this is the optimal implementation. > >> > > >> > But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal > >> > implementation is that the WRITE bio does just memcpy() and the FLUSH bio > >> > does arch_wb_cache_pmem() on the affected range. > >> > > >> > Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture > >> > doesn't have non-temporal stores. So, memcpy_flushcache() is implemented > >> > as memcpy() followed by a cache flush. > >> > > >> > Now - if you flush the cache immediatelly after memcpy, the cache is full > >> > of dirty lines and the cache-flushing code has to write these lines back > >> > and that is slow. > >> > > >> > If you flush the cache some time after memcpy (i.e. when the FLUSH bio is > >> > received), the processor already flushed some part of the cache on its > >> > own, so the cache-flushing function has less work to do and it is faster. > >> > > >> > So the conclusion is - don't use memcpy_flushcache on ARM. 
This problem > >> > cannot be fixed by a better implementation of memcpy_flushcache. > >> > >> It sounds like ARM might be better off with mapping its pmem as > >> write-through rather than write-back, and skip the explicit cache > > > > I doubt it would perform well - write combining combines the writes into a > > larger segments - and write-through doesn't. > > > > Last I checked write-through caching does not disable write combining > > >> management altogether. You speak of "optimal" and "sub-optimal", but > >> what would be more clear is fio measurements of the relative IOPs and > >> latency profiles of the different approaches. The reason I am > >> continuing to push here is that reducing the operation space from > >> 'copy-flush-commit' to just 'copy' or 'copy-commit' simplifies the > >> maintenance long term. > > > > I measured it (with nvme backing store) and late cache flushing has 12% > > better performance than eager flushing with memcpy_flushcache(). > > I assume what you're seeing is ARM64 over-flushing the amount of dirty > data so it becomes more efficient to do an amortized flush at the end? > However, that effectively makes memcpy_flushcache() unusable in the > way it can be used on x86. You claimed that ARM does not support > non-temporal stores, but it does, see the STNP instruction. I do not > want to see arch specific optimizations in drivers, so either > write-through mappings is a potential answer to remove the need to > explicitly manage flushing, or just implement STNP hacks in > memcpy_flushcache() like you did with MOVNT on x86. > > > 131836 4k iops - vs - 117016. > > To be clear this is memcpy_flushcache() vs memcpy + flush? I found out what caused the difference. I used dax_flush on the version of dm-writecache that I had on the ARM machine (with the kernel 4.14, because it is the last version where dax on ramdisk works) - and I thought that dax_flush flushes the cache, but it doesn't. When I replaced dax_flush with arch_wb_cache_pmem, the performance difference between early flushing and late flushing disappeared. So I think we can remove this per-architecture switch from dm-writecache. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
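The substitution described above amounts to the following; writecache_flush_region() is a hypothetical wrapper used only to show the change. A plausible explanation, not stated in the thread, is that dax_flush() performs the cache clean only when the dax device advertises a write cache (which a ramdisk does not), whereas arch_wb_cache_pmem() cleans the given range unconditionally.

/* hypothetical wrapper illustrating the change described above */
static void writecache_flush_region(struct dm_writecache *wc, void *ptr, size_t size)
{
	/* was: dax_flush(wc->ssd_dev->dax_dev, ptr, size); -- could return without flushing */
	arch_wb_cache_pmem(ptr, size);	/* always cleans the dirty cache lines for this range */
}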
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:07 ` Mikulas Patocka @ 2018-05-30 13:16 ` Mike Snitzer 2018-05-30 13:21 ` Mikulas Patocka 2018-05-30 15:58 ` Dan Williams 1 sibling, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-30 13:16 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, linux-nvdimm On Wed, May 30 2018 at 9:07am -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Mon, 28 May 2018, Dan Williams wrote: > > > On Mon, May 28, 2018 at 6:32 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > I measured it (with nvme backing store) and late cache flushing has 12% > > > better performance than eager flushing with memcpy_flushcache(). > > > > I assume what you're seeing is ARM64 over-flushing the amount of dirty > > data so it becomes more efficient to do an amortized flush at the end? > > However, that effectively makes memcpy_flushcache() unusable in the > > way it can be used on x86. You claimed that ARM does not support > > non-temporal stores, but it does, see the STNP instruction. I do not > > want to see arch specific optimizations in drivers, so either > > write-through mappings is a potential answer to remove the need to > > explicitly manage flushing, or just implement STNP hacks in > > memcpy_flushcache() like you did with MOVNT on x86. > > > > > 131836 4k iops - vs - 117016. > > > > To be clear this is memcpy_flushcache() vs memcpy + flush? > > I found out what caused the difference. I used dax_flush on the version of > dm-writecache that I had on the ARM machine (with the kernel 4.14, because > it is the last version where dax on ramdisk works) - and I thought that > dax_flush flushes the cache, but it doesn't. > > When I replaced dax_flush with arch_wb_cache_pmem, the performance > difference between early flushing and late flushing disappeared. > > So I think we can remove this per-architecture switch from dm-writecache. That is really great news, can you submit an incremental patch that layers ontop of the linux-dm.git 'dm-4.18' branch? Thanks, Mike _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:16 ` Mike Snitzer @ 2018-05-30 13:21 ` Mikulas Patocka 2018-05-30 13:26 ` Mike Snitzer 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-30 13:21 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, linux-nvdimm On Wed, 30 May 2018, Mike Snitzer wrote: > That is really great news, can you submit an incremental patch that > layers ontop of the linux-dm.git 'dm-4.18' branch? > > Thanks, > Mike I've sent the current version that I have. I fixed the bugs that were reported here (missing DAX, dm_bufio_client_create, __branch_check__ long->int truncation). Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:21 ` Mikulas Patocka @ 2018-05-30 13:26 ` Mike Snitzer 2018-05-30 13:33 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-30 13:26 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, linux-nvdimm On Wed, May 30 2018 at 9:21am -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > That is really great news, can you submit an incremental patch that > > layers ontop of the linux-dm.git 'dm-4.18' branch? > > > > Thanks, > > Mike > > I've sent the current version that I have. I fixed the bugs that were > reported here (missing DAX, dm_bufio_client_create, __branch_check__ > long->int truncation). OK, but a monolithic dm-writecache.c is no longer useful to me. I can drop Arnd's gcc warning fix (with the idea that Ingo or Steve will take your __branch_check__ patch). Not sure what the dm_bufio_client_create fix is... must've missed a report about that. ANyway, point is we're on too a different phase of dm-writecache.c's development. I've picked it up and am trying to get it ready for the 4.18 merge window (likely opening Sunday). Therefore it needs to be in a git tree, and incremental changes overlayed. I cannot be rebasing at this late stage in the 4.18 development window. Thanks, Mike _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:26 ` Mike Snitzer @ 2018-05-30 13:33 ` Mikulas Patocka 2018-05-30 13:54 ` Mike Snitzer 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-30 13:33 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, linux-nvdimm On Wed, 30 May 2018, Mike Snitzer wrote: > On Wed, May 30 2018 at 9:21am -0400, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > That is really great news, can you submit an incremental patch that > > > layers ontop of the linux-dm.git 'dm-4.18' branch? > > > > > > Thanks, > > > Mike > > > > I've sent the current version that I have. I fixed the bugs that were > > reported here (missing DAX, dm_bufio_client_create, __branch_check__ > > long->int truncation). > > OK, but a monolithic dm-writecache.c is no longer useful to me. I can > drop Arnd's gcc warning fix (with the idea that Ingo or Steve will take > your __branch_check__ patch). Not sure what the dm_bufio_client_create > fix is... must've missed a report about that. > > ANyway, point is we're on too a different phase of dm-writecache.c's > development. I've picked it up and am trying to get it ready for the > 4.18 merge window (likely opening Sunday). Therefore it needs to be in > a git tree, and incremental changes overlayed. I cannot be rebasing at > this late stage in the 4.18 development window. > > Thanks, > Mike I downloaded dm-writecache from your git repository some times ago - but you changed a lot of useless things (i.e. reordering the fields in the structure) since that time - so, you'll have to merge the changes. You dropped the latency measuring code - why? We still would like to benchmark the driver. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:33 ` Mikulas Patocka @ 2018-05-30 13:54 ` Mike Snitzer 2018-05-30 14:09 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-30 13:54 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, linux-nvdimm On Wed, May 30 2018 at 9:33am -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > On Wed, May 30 2018 at 9:21am -0400, > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > > > That is really great news, can you submit an incremental patch that > > > > layers ontop of the linux-dm.git 'dm-4.18' branch? > > > > > > > > Thanks, > > > > Mike > > > > > > I've sent the current version that I have. I fixed the bugs that were > > > reported here (missing DAX, dm_bufio_client_create, __branch_check__ > > > long->int truncation). > > > > OK, but a monolithic dm-writecache.c is no longer useful to me. I can > > drop Arnd's gcc warning fix (with the idea that Ingo or Steve will take > > your __branch_check__ patch). Not sure what the dm_bufio_client_create > > fix is... must've missed a report about that. > > > > ANyway, point is we're on too a different phase of dm-writecache.c's > > development. I've picked it up and am trying to get it ready for the > > 4.18 merge window (likely opening Sunday). Therefore it needs to be in > > a git tree, and incremental changes overlayed. I cannot be rebasing at > > this late stage in the 4.18 development window. > > > > Thanks, > > Mike > > I downloaded dm-writecache from your git repository some times ago - but > you changed a lot of useless things (i.e. reordering the fields in the > structure) since that time - so, you'll have to merge the changes. Fine I'll deal with it. reordering the fields eliminated holes in the structure and reduced struct members spanning cache lines. > You dropped the latency measuring code - why? We still would like to > benchmark the driver. I saved that patch. It is a development-only patch. I tried to have you work on formalizing the code further by making the functions actually get called, and by wrapping the code with CONFIG_DM_WRITECACHE_MEASURE_LATENCY. In the end I dropped it for now and we'll just have to apply the patch when we need to benchmark. Here is the current patch if you'd like to improve it (e.g. actually call measure_latency_start and measure_latency_end in places) -- I seemed to have lost my incremental change to switch over to CONFIG_DM_WRITECACHE_MEASURE_LATENCY (likely due to rebase); can worry about that later. This is based on the dm-4.18 branch. >From 20a7c123271741cb7260154b68730942417e803a Mon Sep 17 00:00:00 2001 From: Mikulas Patocka <mpatocka@redhat.com> Date: Tue, 22 May 2018 15:54:53 -0400 Subject: [PATCH] dm writecache: add the ability to measure some latencies Developer-only code that won't go upstream (as-is). 
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> drivers/md/dm-writecache.c | 94 ++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 87 insertions(+), 7 deletions(-) diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c index 844c4fb2fcfc..e733a14faf8f 100644 --- a/drivers/md/dm-writecache.c +++ b/drivers/md/dm-writecache.c @@ -28,6 +28,8 @@ #define AUTOCOMMIT_BLOCKS_PMEM 64 #define AUTOCOMMIT_MSEC 1000 +//#define WC_MEASURE_LATENCY + #define BITMAP_GRANULARITY 65536 #if BITMAP_GRANULARITY < PAGE_SIZE #undef BITMAP_GRANULARITY @@ -217,6 +219,15 @@ struct dm_writecache { struct dm_kcopyd_client *dm_kcopyd; unsigned long *dirty_bitmap; unsigned dirty_bitmap_size; + +#ifdef WC_MEASURE_LATENCY + ktime_t lock_acquired_time; + ktime_t max_lock_held; + ktime_t max_lock_wait; + ktime_t max_freelist_wait; + ktime_t measure_latency_time; + ktime_t max_measure_latency; +#endif }; #define WB_LIST_INLINE 16 @@ -243,15 +254,60 @@ struct copy_struct { DECLARE_DM_KCOPYD_THROTTLE_WITH_MODULE_PARM(dm_writecache_throttle, "A percentage of time allocated for data copying"); -static void wc_lock(struct dm_writecache *wc) +static inline void measure_latency_start(struct dm_writecache *wc) +{ +#ifdef WC_MEASURE_LATENCY + wc->measure_latency_time = ktime_get(); +#endif +} + +static inline void measure_latency_end(struct dm_writecache *wc, unsigned long n) { +#ifdef WC_MEASURE_LATENCY + ktime_t now = ktime_get(); + if (now - wc->measure_latency_time > wc->max_measure_latency) { + wc->max_measure_latency = now - wc->measure_latency_time; + printk(KERN_DEBUG "dm-writecache: measured latency %lld.%03lldus, %lu steps\n", + wc->max_measure_latency / 1000, wc->max_measure_latency % 1000, n); + } +#endif +} + +static void __wc_lock(struct dm_writecache *wc, int line) +{ +#ifdef WC_MEASURE_LATENCY + ktime_t before, after; + before = ktime_get(); +#endif mutex_lock(&wc->lock); +#ifdef WC_MEASURE_LATENCY + after = ktime_get(); + if (unlikely(after - before > wc->max_lock_wait)) { + wc->max_lock_wait = after - before; + printk(KERN_DEBUG "dm-writecache: waiting for lock for %lld.%03lldus at %d\n", + wc->max_lock_wait / 1000, wc->max_lock_wait % 1000, line); + after = ktime_get(); + } + wc->lock_acquired_time = after; +#endif } +#define wc_lock(wc) __wc_lock(wc, __LINE__) -static void wc_unlock(struct dm_writecache *wc) +static void __wc_unlock(struct dm_writecache *wc, int line) { +#ifdef WC_MEASURE_LATENCY + ktime_t now = ktime_get(); + if (now - wc->lock_acquired_time > wc->max_lock_held) { + wc->max_lock_held = now - wc->lock_acquired_time; + printk(KERN_DEBUG "dm-writecache: lock held for %lld.%03lldus at %d\n", + wc->max_lock_held / 1000, wc->max_lock_held % 1000, line); + } +#endif mutex_unlock(&wc->lock); } +#define wc_unlock(wc) __wc_unlock(wc, __LINE__) + +#define wc_unlock_long(wc) mutex_unlock(&wc->lock) #if IS_ENABLED(CONFIG_DAX_DRIVER) static int persistent_memory_claim(struct dm_writecache *wc) @@ -293,6 +349,10 @@ static int persistent_memory_claim(struct dm_writecache *wc) r = -EOPNOTSUPP; goto err2; } +#ifdef WC_MEASURE_LATENCY + printk(KERN_DEBUG "dm-writecache: device %s, pfn %016llx\n", + wc->ssd_dev->name, pfn.val); +#endif if (da != p) { long i; wc->memory_map = NULL; @@ -701,16 +761,35 @@ static void writecache_free_entry(struct dm_writecache *wc, struct wc_entry *e) swake_up(&wc->freelist_wait); } -static void writecache_wait_on_freelist(struct dm_writecache *wc) +static void __writecache_wait_on_freelist(struct 
dm_writecache *wc, bool measure, int line) { DECLARE_SWAITQUEUE(wait); +#ifdef WC_MEASURE_LATENCY + ktime_t before, after; +#endif prepare_to_swait(&wc->freelist_wait, &wait, TASK_UNINTERRUPTIBLE); wc_unlock(wc); +#ifdef WC_MEASURE_LATENCY + if (measure) + before = ktime_get(); +#endif io_schedule(); finish_swait(&wc->freelist_wait, &wait); +#ifdef WC_MEASURE_LATENCY + if (measure) { + after = ktime_get(); + if (unlikely(after - before > wc->max_freelist_wait)) { + wc->max_freelist_wait = after - before; + printk(KERN_DEBUG "dm-writecache: waiting on freelist for %lld.%03lldus at %d\n", + wc->max_freelist_wait / 1000, wc->max_freelist_wait % 1000, line); + } + } +#endif wc_lock(wc); } +#define writecache_wait_on_freelist(wc) __writecache_wait_on_freelist(wc, true, __LINE__) +#define writecache_wait_on_freelist_long(wc) __writecache_wait_on_freelist(wc, false, __LINE__) static void writecache_poison_lists(struct dm_writecache *wc) { @@ -890,7 +969,7 @@ static void writecache_suspend(struct dm_target *ti) writecache_poison_lists(wc); - wc_unlock(wc); + wc_unlock_long(wc); } static int writecache_alloc_entries(struct dm_writecache *wc) @@ -1001,7 +1080,7 @@ static void writecache_resume(struct dm_target *ti) writecache_commit_flushed(wc); } - wc_unlock(wc); + wc_unlock_long(wc); } static int process_flush_mesg(unsigned argc, char **argv, struct dm_writecache *wc) @@ -1472,8 +1551,9 @@ static void __writeback_throttle(struct dm_writecache *wc, struct writeback_list if (unlikely(wc->max_writeback_jobs)) { if (READ_ONCE(wc->writeback_size) - wbl->size >= wc->max_writeback_jobs) { wc_lock(wc); - while (wc->writeback_size - wbl->size >= wc->max_writeback_jobs) - writecache_wait_on_freelist(wc); + while (wc->writeback_size - wbl->size >= wc->max_writeback_jobs) { + writecache_wait_on_freelist_long(wc); + } wc_unlock(wc); } } _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply related [flat|nested] 40+ messages in thread
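As the message above notes, the patch defines measure_latency_start()/measure_latency_end() but never calls them. One way they might be wired in is sketched below; the call site is illustrative only (it assumes dm-writecache's existing bio_copy_block() data-copy helper and is not part of the posted patch).

/* illustrative only: time the data copy in the I/O path */
static void bio_copy_block_timed(struct dm_writecache *wc, struct bio *bio, void *data)
{
	measure_latency_start(wc);
	bio_copy_block(wc, bio, data);
	measure_latency_end(wc, bio->bi_iter.bi_size >> SECTOR_SHIFT);	/* "steps" = sectors copied */
}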
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:54 ` Mike Snitzer @ 2018-05-30 14:09 ` Mikulas Patocka 2018-05-30 14:21 ` Mike Snitzer 2018-05-31 3:39 ` Mike Snitzer 0 siblings, 2 replies; 40+ messages in thread From: Mikulas Patocka @ 2018-05-30 14:09 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, linux-nvdimm On Wed, 30 May 2018, Mike Snitzer wrote: > On Wed, May 30 2018 at 9:33am -0400, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > On Wed, May 30 2018 at 9:21am -0400, > > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > > > > > That is really great news, can you submit an incremental patch that > > > > > layers ontop of the linux-dm.git 'dm-4.18' branch? > > > > > > > > > > Thanks, > > > > > Mike > > > > > > > > I've sent the current version that I have. I fixed the bugs that were > > > > reported here (missing DAX, dm_bufio_client_create, __branch_check__ > > > > long->int truncation). > > > > > > OK, but a monolithic dm-writecache.c is no longer useful to me. I can > > > drop Arnd's gcc warning fix (with the idea that Ingo or Steve will take > > > your __branch_check__ patch). Not sure what the dm_bufio_client_create > > > fix is... must've missed a report about that. > > > > > > ANyway, point is we're on too a different phase of dm-writecache.c's > > > development. I've picked it up and am trying to get it ready for the > > > 4.18 merge window (likely opening Sunday). Therefore it needs to be in > > > a git tree, and incremental changes overlayed. I cannot be rebasing at > > > this late stage in the 4.18 development window. > > > > > > Thanks, > > > Mike > > > > I downloaded dm-writecache from your git repository some times ago - but > > you changed a lot of useless things (i.e. reordering the fields in the > > structure) since that time - so, you'll have to merge the changes. > > Fine I'll deal with it. reordering the fields eliminated holes in the > structure and reduced struct members spanning cache lines. And what about this? #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) The code that I had just allowed the compiler to optimize out persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you deleted it. Most architectures don't have persistent memory and the dm-writecache driver could work in ssd-only mode on them. On these architectures, I define #define WC_MODE_PMEM(wc) false - and the compiler will just automatically remove the tests for that condition and the unused branch. It does also eliminate unused static functions. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 14:09 ` Mikulas Patocka @ 2018-05-30 14:21 ` Mike Snitzer 2018-05-30 14:46 ` Mikulas Patocka 2018-05-31 3:39 ` Mike Snitzer 1 sibling, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-30 14:21 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, linux-nvdimm On Wed, May 30 2018 at 10:09am -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > On Wed, May 30 2018 at 9:33am -0400, > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > > > On Wed, May 30 2018 at 9:21am -0400, > > > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > > > > > > > That is really great news, can you submit an incremental patch that > > > > > > layers ontop of the linux-dm.git 'dm-4.18' branch? > > > > > > > > > > > > Thanks, > > > > > > Mike > > > > > > > > > > I've sent the current version that I have. I fixed the bugs that were > > > > > reported here (missing DAX, dm_bufio_client_create, __branch_check__ > > > > > long->int truncation). > > > > > > > > OK, but a monolithic dm-writecache.c is no longer useful to me. I can > > > > drop Arnd's gcc warning fix (with the idea that Ingo or Steve will take > > > > your __branch_check__ patch). Not sure what the dm_bufio_client_create > > > > fix is... must've missed a report about that. > > > > > > > > ANyway, point is we're on too a different phase of dm-writecache.c's > > > > development. I've picked it up and am trying to get it ready for the > > > > 4.18 merge window (likely opening Sunday). Therefore it needs to be in > > > > a git tree, and incremental changes overlayed. I cannot be rebasing at > > > > this late stage in the 4.18 development window. > > > > > > > > Thanks, > > > > Mike > > > > > > I downloaded dm-writecache from your git repository some times ago - but > > > you changed a lot of useless things (i.e. reordering the fields in the > > > structure) since that time - so, you'll have to merge the changes. > > > > Fine I'll deal with it. reordering the fields eliminated holes in the > > structure and reduced struct members spanning cache lines. > > And what about this? > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > The code that I had just allowed the compiler to optimize out > persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you > deleted it. > > Most architectures don't have persistent memory and the dm-writecache > driver could work in ssd-only mode on them. On these architectures, I > define > #define WC_MODE_PMEM(wc) false > - and the compiler will just automatically remove the tests for that > condition and the unused branch. It does also eliminate unused static > functions. This level of microoptimization can be backfilled. But as it was, there were too many #defines. And I'm really not concerned with eliminating unused static functions for this case. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 14:21 ` Mike Snitzer @ 2018-05-30 14:46 ` Mikulas Patocka 2018-05-31 3:42 ` Mike Snitzer 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-30 14:46 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, linux-nvdimm On Wed, 30 May 2018, Mike Snitzer wrote: > On Wed, May 30 2018 at 10:09am -0400, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > On Wed, May 30 2018 at 9:33am -0400, > > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > > > > > On Wed, May 30 2018 at 9:21am -0400, > > > > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > > > > > > > > > That is really great news, can you submit an incremental patch that > > > > > > > layers ontop of the linux-dm.git 'dm-4.18' branch? > > > > > > > > > > > > > > Thanks, > > > > > > > Mike > > > > > > > > > > > > I've sent the current version that I have. I fixed the bugs that were > > > > > > reported here (missing DAX, dm_bufio_client_create, __branch_check__ > > > > > > long->int truncation). > > > > > > > > > > OK, but a monolithic dm-writecache.c is no longer useful to me. I can > > > > > drop Arnd's gcc warning fix (with the idea that Ingo or Steve will take > > > > > your __branch_check__ patch). Not sure what the dm_bufio_client_create > > > > > fix is... must've missed a report about that. > > > > > > > > > > ANyway, point is we're on too a different phase of dm-writecache.c's > > > > > development. I've picked it up and am trying to get it ready for the > > > > > 4.18 merge window (likely opening Sunday). Therefore it needs to be in > > > > > a git tree, and incremental changes overlayed. I cannot be rebasing at > > > > > this late stage in the 4.18 development window. > > > > > > > > > > Thanks, > > > > > Mike > > > > > > > > I downloaded dm-writecache from your git repository some times ago - but > > > > you changed a lot of useless things (i.e. reordering the fields in the > > > > structure) since that time - so, you'll have to merge the changes. > > > > > > Fine I'll deal with it. reordering the fields eliminated holes in the > > > structure and reduced struct members spanning cache lines. > > > > And what about this? > > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > > > The code that I had just allowed the compiler to optimize out > > persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you > > deleted it. > > > > Most architectures don't have persistent memory and the dm-writecache > > driver could work in ssd-only mode on them. On these architectures, I > > define > > #define WC_MODE_PMEM(wc) false > > - and the compiler will just automatically remove the tests for that > > condition and the unused branch. It does also eliminate unused static > > functions. > > This level of microoptimization can be backfilled. But as it was, there > were too many #defines. And I'm really not concerned with eliminating > unused static functions for this case. I don't see why "too many defines" would be a problem. If I compile it with and without pmem support, the difference is 15kB-vs-12kB. If we look at just one function (writecache_map), the difference is 1595 bytes - vs - 1280 bytes. So, it produces real savings in code size. 
The problem with performance is not caused by a condition that always jumps the same way (that is predicted by the CPU and it causes no delays in the pipeline) - the problem is that a bigger function consumes more i-cache. There is no reason to include code that can't be executed. Note that we should also redefine pmem_assign on architectures that don't support persistent memory: #ifndef DM_WRITECACHE_ONLY_SSD #define pmem_assign(dest, src) \ do { \ typeof(dest) uniq = (src); \ memcpy_flushcache(&(dest), &uniq, sizeof(dest)); \ } while (0) #else #define pmem_assign(dest, src) ((dest) = (src)) #endif I.e. we should not call memcpy_flushcache if we can't have persistent memory. Cache flushing is slow and we should not do it if we don't have to. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
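A usage sketch for the macro above: with the pmem definition, an aligned 8-byte field such as a superblock sequence counter is written through memcpy_flushcache() (atomically for an aligned u64, and flushed), while the SSD-only definition degenerates to a plain assignment with no cache management. The function and field names below are illustrative, not an exact excerpt from the driver.

static void wc_bump_seq_count(struct dm_writecache *wc)
{
	wc->seq_count++;
	/* pmem build: 8-byte flushing store; SSD-only build: plain assignment */
	pmem_assign(sb(wc)->seq_count, cpu_to_le64(wc->seq_count));
}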
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 14:46 ` Mikulas Patocka @ 2018-05-31 3:42 ` Mike Snitzer 2018-06-03 15:03 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-31 3:42 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, linux-nvdimm On Wed, May 30 2018 at 10:46P -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > Fine I'll deal with it. reordering the fields eliminated holes in the > > > > structure and reduced struct members spanning cache lines. > > > > > > And what about this? > > > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > > > > > The code that I had just allowed the compiler to optimize out > > > persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you > > > deleted it. > > > > > > Most architectures don't have persistent memory and the dm-writecache > > > driver could work in ssd-only mode on them. On these architectures, I > > > define > > > #define WC_MODE_PMEM(wc) false > > > - and the compiler will just automatically remove the tests for that > > > condition and the unused branch. It does also eliminate unused static > > > functions. > > > > This level of microoptimization can be backfilled. But as it was, there > > were too many #defines. And I'm really not concerned with eliminating > > unused static functions for this case. > > I don't see why "too many defines" would be a problem. > > If I compile it with and without pmem support, the difference is > 15kB-vs-12kB. If we look at just one function (writecache_map), the > difference is 1595 bytes - vs - 1280 bytes. So, it produces real savings > in code size. > > The problem with performance is not caused a condition that always jumps > the same way (that is predicted by the CPU and it causes no delays in the > pipeline) - the problem is that a bigger function consumes more i-cache. > There is no reason to include code that can't be executed. Please double check you see the reduced code size you're expecting using the latest dm-writecache.c in linux-dm.git's dm-4.18 branch. Thanks, Mike _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-31 3:42 ` Mike Snitzer @ 2018-06-03 15:03 ` Mikulas Patocka 0 siblings, 0 replies; 40+ messages in thread From: Mikulas Patocka @ 2018-06-03 15:03 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, linux-nvdimm On Wed, 30 May 2018, Mike Snitzer wrote: > On Wed, May 30 2018 at 10:46P -0400, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > > > > > Fine I'll deal with it. reordering the fields eliminated holes in the > > > > > structure and reduced struct members spanning cache lines. > > > > > > > > And what about this? > > > > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > > > > > > > The code that I had just allowed the compiler to optimize out > > > > persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you > > > > deleted it. > > > > > > > > Most architectures don't have persistent memory and the dm-writecache > > > > driver could work in ssd-only mode on them. On these architectures, I > > > > define > > > > #define WC_MODE_PMEM(wc) false > > > > - and the compiler will just automatically remove the tests for that > > > > condition and the unused branch. It does also eliminate unused static > > > > functions. > > > > > > This level of microoptimization can be backfilled. But as it was, there > > > were too many #defines. And I'm really not concerned with eliminating > > > unused static functions for this case. > > > > I don't see why "too many defines" would be a problem. > > > > If I compile it with and without pmem support, the difference is > > 15kB-vs-12kB. If we look at just one function (writecache_map), the > > difference is 1595 bytes - vs - 1280 bytes. So, it produces real savings > > in code size. > > > > The problem with performance is not caused a condition that always jumps > > the same way (that is predicted by the CPU and it causes no delays in the > > pipeline) - the problem is that a bigger function consumes more i-cache. > > There is no reason to include code that can't be executed. > > Please double check you see the reduced code size you're expecting using > the latest dm-writecache.c in linux-dm.git's dm-4.18 branch. > > Thanks, > Mike I checked that - it's OK. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 14:09 ` Mikulas Patocka 2018-05-30 14:21 ` Mike Snitzer @ 2018-05-31 3:39 ` Mike Snitzer 2018-05-31 8:16 ` Mikulas Patocka 1 sibling, 1 reply; 40+ messages in thread From: Mike Snitzer @ 2018-05-31 3:39 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, linux-nvdimm On Wed, May 30 2018 at 10:09P -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > And what about this? > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > The code that I had just allowed the compiler to optimize out > persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you > deleted it. > > Most architectures don't have persistent memory and the dm-writecache > driver could work in ssd-only mode on them. On these architectures, I > define > #define WC_MODE_PMEM(wc) false > - and the compiler will just automatically remove the tests for that > condition and the unused branch. It does also eliminate unused static > functions. Here is the patch that I just folded into the rebased version of dm-writecache now in the dm-4.18 branch. (I rebased ontop of Jens' latest block tree for 4.18 that now includes the mempool_init changes, etc.) --- drivers/md/dm-writecache.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c index fcbfaf7c27ec..691b5ffb799f 100644 --- a/drivers/md/dm-writecache.c +++ b/drivers/md/dm-writecache.c @@ -34,13 +34,21 @@ #define BITMAP_GRANULARITY PAGE_SIZE #endif +#if defined(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) +#define DM_WRITECACHE_HAS_PMEM +#endif + +#ifdef DM_WRITECACHE_HAS_PMEM #define pmem_assign(dest, src) \ do { \ typeof(dest) uniq = (src); \ memcpy_flushcache(&(dest), &uniq, sizeof(dest)); \ } while (0) +#else +#define pmem_assign(dest, src) ((dest) = (src)) +#endif -#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(CONFIG_ARCH_HAS_PMEM_API) +#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(DM_WRITECACHE_HAS_PMEM) #define DM_WRITECACHE_HANDLE_HARDWARE_ERRORS #endif @@ -87,8 +95,13 @@ struct wc_entry { #endif }; +#ifdef DM_WRITECACHE_HAS_PMEM #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) #define WC_MODE_FUA(wc) ((wc)->writeback_fua) +#else +#define WC_MODE_PMEM(wc) false +#define WC_MODE_FUA(wc) false +#endif #define WC_MODE_SORT_FREELIST(wc) (!WC_MODE_PMEM(wc)) struct dm_writecache { @@ -1857,7 +1870,7 @@ static int writecache_ctr(struct dm_target *ti, unsigned argc, char **argv) if (!strcasecmp(string, "s")) { wc->pmem_mode = false; } else if (!strcasecmp(string, "p")) { -#if defined(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) +#ifdef DM_WRITECACHE_HAS_PMEM wc->pmem_mode = true; wc->writeback_fua = true; #else -- 2.15.0 _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply related [flat|nested] 40+ messages in thread
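The effect of the constant-false fallbacks in the patch above can be seen in a fragment like the following (illustrative, not a verbatim excerpt from dm-writecache.c; both helper names here are stand-ins): in an SSD-only build WC_MODE_PMEM(wc) is the compile-time constant false, so the compiler discards the pmem branch and any static functions reachable only from it.

static void wc_flush_block(struct dm_writecache *wc, void *data)
{
	if (WC_MODE_PMEM(wc))
		/* dead code when DM_WRITECACHE_HAS_PMEM is not defined */
		persistent_memory_flush_cache(data, wc->block_size);
	else
		writecache_flush_ssd_block(wc, data);	/* hypothetical SSD-path helper */
}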
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-31 3:39 ` Mike Snitzer @ 2018-05-31 8:16 ` Mikulas Patocka 2018-05-31 12:09 ` Mike Snitzer 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-31 8:16 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, linux-nvdimm On Wed, 30 May 2018, Mike Snitzer wrote: > On Wed, May 30 2018 at 10:09P -0400, > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > And what about this? > > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > > > The code that I had just allowed the compiler to optimize out > > persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you > > deleted it. > > > > Most architectures don't have persistent memory and the dm-writecache > > driver could work in ssd-only mode on them. On these architectures, I > > define > > #define WC_MODE_PMEM(wc) false > > - and the compiler will just automatically remove the tests for that > > condition and the unused branch. It does also eliminate unused static > > functions. > > Here is the patch that I just folded into the rebased version of > dm-writecache now in the dm-4.18 branch. > > (I rebased ontop of Jens' latest block tree for 4.18 that now includes > the mempool_init changes, etc.) > > --- > drivers/md/dm-writecache.c | 17 +++++++++++++++-- > 1 file changed, 15 insertions(+), 2 deletions(-) > > diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c > index fcbfaf7c27ec..691b5ffb799f 100644 > --- a/drivers/md/dm-writecache.c > +++ b/drivers/md/dm-writecache.c > @@ -34,13 +34,21 @@ > #define BITMAP_GRANULARITY PAGE_SIZE > #endif > > +#if defined(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) > +#define DM_WRITECACHE_HAS_PMEM > +#endif > + > +#ifdef DM_WRITECACHE_HAS_PMEM > #define pmem_assign(dest, src) \ > do { \ > typeof(dest) uniq = (src); \ > memcpy_flushcache(&(dest), &uniq, sizeof(dest)); \ > } while (0) > +#else > +#define pmem_assign(dest, src) ((dest)) = (src)) > +#endif > > -#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(CONFIG_ARCH_HAS_PMEM_API) > +#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(DM_WRITECACHE_HAS_PMEM) > #define DM_WRITECACHE_HANDLE_HARDWARE_ERRORS > #endif > > @@ -87,8 +95,13 @@ struct wc_entry { > #endif > }; > > +#ifdef DM_WRITECACHE_HAS_PMEM > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > #define WC_MODE_FUA(wc) ((wc)->writeback_fua) > +#else > +#define WC_MODE_PMEM(wc) false > +#define WC_MODE_FUA(wc) false > +#endif > #define WC_MODE_SORT_FREELIST(wc) (!WC_MODE_PMEM(wc)) > > struct dm_writecache { > @@ -1857,7 +1870,7 @@ static int writecache_ctr(struct dm_target *ti, unsigned argc, char **argv) > if (!strcasecmp(string, "s")) { > wc->pmem_mode = false; > } else if (!strcasecmp(string, "p")) { > -#if defined(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) > +#ifdef DM_WRITECACHE_HAS_PMEM > wc->pmem_mode = true; > wc->writeback_fua = true; > #else > -- > 2.15.0 OK. I think that persistent_memory_claim should also be conditioned based on DM_WRITECACHE_HAS_PMEM - i.e. if we have DM_WRITECACHE_HAS_PMEM, we don't need to use other ifdefs. Is there some difference between "#if IS_ENABLED(CONFIG_OPTION)" and "#if defined(CONFIG_OPTION)"? Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-31 8:16 ` Mikulas Patocka @ 2018-05-31 12:09 ` Mike Snitzer 0 siblings, 0 replies; 40+ messages in thread From: Mike Snitzer @ 2018-05-31 12:09 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, linux-nvdimm On Thu, May 31 2018 at 4:16am -0400, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 30 May 2018, Mike Snitzer wrote: > > > On Wed, May 30 2018 at 10:09P -0400, > > Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > And what about this? > > > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > > > > > The code that I had just allowed the compiler to optimize out > > > persistent-memory code if we have DM_WRITECACHE_ONLY_SSD defined - and you > > > deleted it. > > > > > > Most architectures don't have persistent memory and the dm-writecache > > > driver could work in ssd-only mode on them. On these architectures, I > > > define > > > #define WC_MODE_PMEM(wc) false > > > - and the compiler will just automatically remove the tests for that > > > condition and the unused branch. It does also eliminate unused static > > > functions. > > > > Here is the patch that I just folded into the rebased version of > > dm-writecache now in the dm-4.18 branch. > > > > (I rebased ontop of Jens' latest block tree for 4.18 that now includes > > the mempool_init changes, etc.) > > > > --- > > drivers/md/dm-writecache.c | 17 +++++++++++++++-- > > 1 file changed, 15 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c > > index fcbfaf7c27ec..691b5ffb799f 100644 > > --- a/drivers/md/dm-writecache.c > > +++ b/drivers/md/dm-writecache.c > > @@ -34,13 +34,21 @@ > > #define BITMAP_GRANULARITY PAGE_SIZE > > #endif > > > > +#if defined(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) > > +#define DM_WRITECACHE_HAS_PMEM > > +#endif > > + > > +#ifdef DM_WRITECACHE_HAS_PMEM > > #define pmem_assign(dest, src) \ > > do { \ > > typeof(dest) uniq = (src); \ > > memcpy_flushcache(&(dest), &uniq, sizeof(dest)); \ > > } while (0) > > +#else > > +#define pmem_assign(dest, src) ((dest)) = (src)) > > +#endif > > > > -#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(CONFIG_ARCH_HAS_PMEM_API) > > +#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(DM_WRITECACHE_HAS_PMEM) > > #define DM_WRITECACHE_HANDLE_HARDWARE_ERRORS > > #endif > > > > @@ -87,8 +95,13 @@ struct wc_entry { > > #endif > > }; > > > > +#ifdef DM_WRITECACHE_HAS_PMEM > > #define WC_MODE_PMEM(wc) ((wc)->pmem_mode) > > #define WC_MODE_FUA(wc) ((wc)->writeback_fua) > > +#else > > +#define WC_MODE_PMEM(wc) false > > +#define WC_MODE_FUA(wc) false > > +#endif > > #define WC_MODE_SORT_FREELIST(wc) (!WC_MODE_PMEM(wc)) > > > > struct dm_writecache { > > @@ -1857,7 +1870,7 @@ static int writecache_ctr(struct dm_target *ti, unsigned argc, char **argv) > > if (!strcasecmp(string, "s")) { > > wc->pmem_mode = false; > > } else if (!strcasecmp(string, "p")) { > > -#if defined(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) > > +#ifdef DM_WRITECACHE_HAS_PMEM > > wc->pmem_mode = true; > > wc->writeback_fua = true; > > #else > > -- > > 2.15.0 > > OK. > > I think that persistent_memory_claim should also be conditioned based on > DM_WRITECACHE_HAS_PMEM - i.e. if we have DM_WRITECACHE_HAS_PMEM, we don't > need to use other ifdefs. > > Is there some difference between "#if IS_ENABLED(CONFIG_OPTION)" and > "#if defined(CONFIG_OPTION)"? 
Not that I'm aware of: #define IS_ENABLED(option) __or(IS_BUILTIN(option), IS_MODULE(option)) Here is the incremental patch I just folded in: diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c index fd3bc232b7d6..f2ae02f22c43 100644 --- a/drivers/md/dm-writecache.c +++ b/drivers/md/dm-writecache.c @@ -34,7 +34,7 @@ #define BITMAP_GRANULARITY PAGE_SIZE #endif -#if defined(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) +#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) #define DM_WRITECACHE_HAS_PMEM #endif @@ -218,7 +218,7 @@ static void wc_unlock(struct dm_writecache *wc) mutex_unlock(&wc->lock); } -#if IS_ENABLED(CONFIG_DAX_DRIVER) +#ifdef DM_WRITECACHE_HAS_PMEM static int persistent_memory_claim(struct dm_writecache *wc) { int r; _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply related [flat|nested] 40+ messages in thread
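For completeness, the surrounding definitions in include/linux/kconfig.h of that era expand roughly as shown below. The practical difference is that IS_ENABLED() also evaluates to 1 when a tristate option is built as a module (CONFIG_FOO_MODULE), while defined(CONFIG_FOO) covers only the built-in case; for a plain bool symbol the two behave the same.

/* include/linux/kconfig.h (abridged) */
#define IS_BUILTIN(option)	__is_defined(option)
#define IS_MODULE(option)	__is_defined(option##_MODULE)
#define IS_ENABLED(option)	__or(IS_BUILTIN(option), IS_MODULE(option))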
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 13:07 ` Mikulas Patocka 2018-05-30 13:16 ` Mike Snitzer @ 2018-05-30 15:58 ` Dan Williams 2018-05-30 22:39 ` Dan Williams 1 sibling, 1 reply; 40+ messages in thread From: Dan Williams @ 2018-05-30 15:58 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Wed, May 30, 2018 at 6:07 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Mon, 28 May 2018, Dan Williams wrote: > >> On Mon, May 28, 2018 at 6:32 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: >> > >> > >> > On Sat, 26 May 2018, Dan Williams wrote: >> > >> >> On Sat, May 26, 2018 at 12:02 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: >> >> > >> >> > >> >> > On Fri, 25 May 2018, Dan Williams wrote: >> >> > >> >> >> On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: >> >> >> > On Fri, May 25 2018 at 2:17am -0400, >> >> >> > Mikulas Patocka <mpatocka@redhat.com> wrote: >> >> >> > >> >> >> >> On Thu, 24 May 2018, Dan Williams wrote: >> >> >> >> >> >> >> >> > I don't want to grow driver-local wrappers for pmem. You should use >> >> >> >> > memcpy_flushcache directly() and if an architecture does not define >> >> >> >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this >> >> >> >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't >> >> >> >> > see a need to add a standalone flush operation if all relevant archs >> >> >> >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() >> >> >> >> > directly since all archs define it. Alternatively we could introduce >> >> >> >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy >> >> >> >> > routine and memcpy_flushcache() would imply a wmb(). >> >> >> >> >> >> >> >> But memcpy_flushcache() on ARM64 is slow. >> >> >> >> >> >> Right, so again, what is wrong with memcpy_flushcache_relaxed() + >> >> >> wmb() or otherwise making memcpy_flushcache() ordered. I do not see >> >> >> that as a trailblazing requirement, I see that as typical review and a >> >> >> reduction of the operation space that you are proposing. >> >> > >> >> > memcpy_flushcache on ARM64 is generally wrong thing to do, because it is >> >> > slower than memcpy and explicit cache flush. >> >> > >> >> > Suppose that you want to write data to a block device and make it >> >> > persistent. So you send a WRITE bio and then a FLUSH bio. >> >> > >> >> > Now - how to implement these two bios on persistent memory: >> >> > >> >> > On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does >> >> > wmb() - this is the optimal implementation. >> >> > >> >> > But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal >> >> > implementation is that the WRITE bio does just memcpy() and the FLUSH bio >> >> > does arch_wb_cache_pmem() on the affected range. >> >> > >> >> > Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture >> >> > doesn't have non-temporal stores. So, memcpy_flushcache() is implemented >> >> > as memcpy() followed by a cache flush. >> >> > >> >> > Now - if you flush the cache immediatelly after memcpy, the cache is full >> >> > of dirty lines and the cache-flushing code has to write these lines back >> >> > and that is slow. >> >> > >> >> > If you flush the cache some time after memcpy (i.e. 
when the FLUSH bio is >> >> > received), the processor already flushed some part of the cache on its >> >> > own, so the cache-flushing function has less work to do and it is faster. >> >> > >> >> > So the conclusion is - don't use memcpy_flushcache on ARM. This problem >> >> > cannot be fixed by a better implementation of memcpy_flushcache. >> >> >> >> It sounds like ARM might be better off with mapping its pmem as >> >> write-through rather than write-back, and skip the explicit cache >> > >> > I doubt it would perform well - write combining combines the writes into a >> > larger segments - and write-through doesn't. >> > >> >> Last I checked write-through caching does not disable write combining >> >> >> management altogether. You speak of "optimal" and "sub-optimal", but >> >> what would be more clear is fio measurements of the relative IOPs and >> >> latency profiles of the different approaches. The reason I am >> >> continuing to push here is that reducing the operation space from >> >> 'copy-flush-commit' to just 'copy' or 'copy-commit' simplifies the >> >> maintenance long term. >> > >> > I measured it (with nvme backing store) and late cache flushing has 12% >> > better performance than eager flushing with memcpy_flushcache(). >> >> I assume what you're seeing is ARM64 over-flushing the amount of dirty >> data so it becomes more efficient to do an amortized flush at the end? >> However, that effectively makes memcpy_flushcache() unusable in the >> way it can be used on x86. You claimed that ARM does not support >> non-temporal stores, but it does, see the STNP instruction. I do not >> want to see arch specific optimizations in drivers, so either >> write-through mappings is a potential answer to remove the need to >> explicitly manage flushing, or just implement STNP hacks in >> memcpy_flushcache() like you did with MOVNT on x86. >> >> > 131836 4k iops - vs - 117016. >> >> To be clear this is memcpy_flushcache() vs memcpy + flush? > > I found out what caused the difference. I used dax_flush on the version of > dm-writecache that I had on the ARM machine (with the kernel 4.14, because > it is the last version where dax on ramdisk works) - and I thought that > dax_flush flushes the cache, but it doesn't. > > When I replaced dax_flush with arch_wb_cache_pmem, the performance > difference between early flushing and late flushing disappeared. > > So I think we can remove this per-architecture switch from dm-writecache. Great find! Thanks for the due diligence. Feel free to add: Acked-by: Dan Williams <dan.j.williams@intel.com> ...on the reworks to unify ARM and x86. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
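A sketch of the two strategies being compared in this exchange, with the WRITE/FLUSH bio mapping exactly as described above; the wrapper names are invented, and the real driver of course plumbs this through its bio handling:

/* x86-64: non-temporal stores make eager flushing cheap. */
static void write_bio_x86(void *pmem_dst, const void *src, size_t len)
{
	memcpy_flushcache(pmem_dst, src, len);	/* data leaves the cache now */
}

static void flush_bio_x86(void)
{
	wmb();					/* just order/commit the earlier stores */
}

/* ARM64, per the argument above: plain memcpy on WRITE, write back the
 * affected range only when the FLUSH bio arrives, by which time the CPU
 * has already cleaned part of it on its own. */
static void write_bio_arm64(void *pmem_dst, const void *src, size_t len)
{
	memcpy(pmem_dst, src, len);
}

static void flush_bio_arm64(void *pmem_range, size_t len)
{
	arch_wb_cache_pmem(pmem_range, len);
}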
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 15:58 ` Dan Williams @ 2018-05-30 22:39 ` Dan Williams 2018-05-31 8:19 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Dan Williams @ 2018-05-30 22:39 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Wed, May 30, 2018 at 8:58 AM, Dan Williams <dan.j.williams@intel.com> wrote: > On Wed, May 30, 2018 at 6:07 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: >> >> >> On Mon, 28 May 2018, Dan Williams wrote: >> >>> On Mon, May 28, 2018 at 6:32 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: >>> > >>> > >>> > On Sat, 26 May 2018, Dan Williams wrote: >>> > >>> >> On Sat, May 26, 2018 at 12:02 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: >>> >> > >>> >> > >>> >> > On Fri, 25 May 2018, Dan Williams wrote: >>> >> > >>> >> >> On Fri, May 25, 2018 at 5:51 AM, Mike Snitzer <snitzer@redhat.com> wrote: >>> >> >> > On Fri, May 25 2018 at 2:17am -0400, >>> >> >> > Mikulas Patocka <mpatocka@redhat.com> wrote: >>> >> >> > >>> >> >> >> On Thu, 24 May 2018, Dan Williams wrote: >>> >> >> >> >>> >> >> >> > I don't want to grow driver-local wrappers for pmem. You should use >>> >> >> >> > memcpy_flushcache directly() and if an architecture does not define >>> >> >> >> > memcpy_flushcache() then don't allow building dm-writecache, i.e. this >>> >> >> >> > driver should 'depends on CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE'. I don't >>> >> >> >> > see a need to add a standalone flush operation if all relevant archs >>> >> >> >> > provide memcpy_flushcache(). As for commit, I'd say just use wmb() >>> >> >> >> > directly since all archs define it. Alternatively we could introduce >>> >> >> >> > memcpy_flushcache_relaxed() to be the un-ordered version of the copy >>> >> >> >> > routine and memcpy_flushcache() would imply a wmb(). >>> >> >> >> >>> >> >> >> But memcpy_flushcache() on ARM64 is slow. >>> >> >> >>> >> >> Right, so again, what is wrong with memcpy_flushcache_relaxed() + >>> >> >> wmb() or otherwise making memcpy_flushcache() ordered. I do not see >>> >> >> that as a trailblazing requirement, I see that as typical review and a >>> >> >> reduction of the operation space that you are proposing. >>> >> > >>> >> > memcpy_flushcache on ARM64 is generally wrong thing to do, because it is >>> >> > slower than memcpy and explicit cache flush. >>> >> > >>> >> > Suppose that you want to write data to a block device and make it >>> >> > persistent. So you send a WRITE bio and then a FLUSH bio. >>> >> > >>> >> > Now - how to implement these two bios on persistent memory: >>> >> > >>> >> > On X86, the WRITE bio does memcpy_flushcache() and the FLUSH bio does >>> >> > wmb() - this is the optimal implementation. >>> >> > >>> >> > But on ARM64, memcpy_flushcache() is suboptimal. On ARM64, the optimal >>> >> > implementation is that the WRITE bio does just memcpy() and the FLUSH bio >>> >> > does arch_wb_cache_pmem() on the affected range. >>> >> > >>> >> > Why is memcpy_flushcache() is suboptimal on ARM? The ARM architecture >>> >> > doesn't have non-temporal stores. So, memcpy_flushcache() is implemented >>> >> > as memcpy() followed by a cache flush. >>> >> > >>> >> > Now - if you flush the cache immediatelly after memcpy, the cache is full >>> >> > of dirty lines and the cache-flushing code has to write these lines back >>> >> > and that is slow. >>> >> > >>> >> > If you flush the cache some time after memcpy (i.e. 
when the FLUSH bio is >>> >> > received), the processor already flushed some part of the cache on its >>> >> > own, so the cache-flushing function has less work to do and it is faster. >>> >> > >>> >> > So the conclusion is - don't use memcpy_flushcache on ARM. This problem >>> >> > cannot be fixed by a better implementation of memcpy_flushcache. >>> >> >>> >> It sounds like ARM might be better off with mapping its pmem as >>> >> write-through rather than write-back, and skip the explicit cache >>> > >>> > I doubt it would perform well - write combining combines the writes into a >>> > larger segments - and write-through doesn't. >>> > >>> >>> Last I checked write-through caching does not disable write combining >>> >>> >> management altogether. You speak of "optimal" and "sub-optimal", but >>> >> what would be more clear is fio measurements of the relative IOPs and >>> >> latency profiles of the different approaches. The reason I am >>> >> continuing to push here is that reducing the operation space from >>> >> 'copy-flush-commit' to just 'copy' or 'copy-commit' simplifies the >>> >> maintenance long term. >>> > >>> > I measured it (with nvme backing store) and late cache flushing has 12% >>> > better performance than eager flushing with memcpy_flushcache(). >>> >>> I assume what you're seeing is ARM64 over-flushing the amount of dirty >>> data so it becomes more efficient to do an amortized flush at the end? >>> However, that effectively makes memcpy_flushcache() unusable in the >>> way it can be used on x86. You claimed that ARM does not support >>> non-temporal stores, but it does, see the STNP instruction. I do not >>> want to see arch specific optimizations in drivers, so either >>> write-through mappings is a potential answer to remove the need to >>> explicitly manage flushing, or just implement STNP hacks in >>> memcpy_flushcache() like you did with MOVNT on x86. >>> >>> > 131836 4k iops - vs - 117016. >>> >>> To be clear this is memcpy_flushcache() vs memcpy + flush? >> >> I found out what caused the difference. I used dax_flush on the version of >> dm-writecache that I had on the ARM machine (with the kernel 4.14, because >> it is the last version where dax on ramdisk works) - and I thought that >> dax_flush flushes the cache, but it doesn't. >> >> When I replaced dax_flush with arch_wb_cache_pmem, the performance >> difference between early flushing and late flushing disappeared. >> >> So I think we can remove this per-architecture switch from dm-writecache. > > Great find! Thanks for the due diligence. Feel free to add: > > Acked-by: Dan Williams <dan.j.williams@intel.com> > > ...on the reworks to unify ARM and x86. One more note. The side effect of not using dax_flush() is that you may end up flushing caches on systems where the platform has asserted it will take responsibility for flushing caches at power loss. If / when those systems become more prevalent we may want to think of a way to combine the non-temporal optimization and the cache-flush-bypass optimizations. However that is something that can wait for a later change beyond 4.18. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
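The "dax_flush doesn't flush" finding and the power-loss remark above fit together: by the 4.14 era dax_flush() is gated on the dax device's write-cache flag, so on a device that does not advertise a volatile write cache (a ramdisk, or a platform asserting it flushes caches at power loss) it returns without touching the cache. Roughly what drivers/dax/super.c looked like at the time, worth checking against the exact tree:

void dax_flush(struct dax_device *dax_dev, void *addr, size_t size)
{
	if (unlikely(!dax_write_cache_enabled(dax_dev)))
		return;		/* device/platform says no cache flushing needed */

	arch_wb_cache_pmem(addr, size);
}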
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-30 22:39 ` Dan Williams @ 2018-05-31 8:19 ` Mikulas Patocka 2018-05-31 14:51 ` Dan Williams 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-31 8:19 UTC (permalink / raw) To: Dan Williams; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Wed, 30 May 2018, Dan Williams wrote: > > Great find! Thanks for the due diligence. Feel free to add: > > > > Acked-by: Dan Williams <dan.j.williams@intel.com> > > > > ...on the reworks to unify ARM and x86. > > One more note. The side effect of not using dax_flush() is that you > may end up flushing caches on systems where the platform has asserted > it will take responsibility for flushing caches at power loss. If / > when those systems become more prevalent we may want to think of a way > to combine the non-temporal optimization and the cache-flush-bypass > optimizations. However that is something that can wait for a later > change beyond 4.18. We could define memcpy_flushpmem, that falls back to memcpy or memcpy_flushcache, depending on whether the platform flushes the caches at power loss or not. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
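Purely as an illustration of this suggestion -- no such helper exists in the kernel at this point, and pmem_should_flush() is an invented stand-in for "does this platform flush CPU caches at power loss?":

static inline void memcpy_flushpmem(void *dst, const void *src, size_t n)
{
	if (pmem_should_flush())
		memcpy_flushcache(dst, src, n);	/* caches are not power-fail safe */
	else
		memcpy(dst, src, n);		/* platform flushes caches at power loss */
}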
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-31 8:19 ` Mikulas Patocka @ 2018-05-31 14:51 ` Dan Williams 2018-05-31 15:31 ` Mikulas Patocka 0 siblings, 1 reply; 40+ messages in thread From: Dan Williams @ 2018-05-31 14:51 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Thu, May 31, 2018 at 1:19 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Wed, 30 May 2018, Dan Williams wrote: > >> > Great find! Thanks for the due diligence. Feel free to add: >> > >> > Acked-by: Dan Williams <dan.j.williams@intel.com> >> > >> > ...on the reworks to unify ARM and x86. >> >> One more note. The side effect of not using dax_flush() is that you >> may end up flushing caches on systems where the platform has asserted >> it will take responsibility for flushing caches at power loss. If / >> when those systems become more prevalent we may want to think of a way >> to combine the non-temporal optimization and the cache-flush-bypass >> optimizations. However that is something that can wait for a later >> change beyond 4.18. > > We could define memcpy_flushpmem, that falls back to memcpy or > memcpy_flushcache, depending on whether the platform flushes the caches at > power loss or not. The problem is that some platforms only power fail protect a subset of the physical address range, but yes, if the platform makes a global assertion we can globally replace memcpy_flushpmem() with plain memcpy. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-31 14:51 ` Dan Williams @ 2018-05-31 15:31 ` Mikulas Patocka 2018-05-31 16:39 ` Dan Williams 0 siblings, 1 reply; 40+ messages in thread From: Mikulas Patocka @ 2018-05-31 15:31 UTC (permalink / raw) To: Dan Williams; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Thu, 31 May 2018, Dan Williams wrote: > On Thu, May 31, 2018 at 1:19 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > On Wed, 30 May 2018, Dan Williams wrote: > > > >> > Great find! Thanks for the due diligence. Feel free to add: > >> > > >> > Acked-by: Dan Williams <dan.j.williams@intel.com> > >> > > >> > ...on the reworks to unify ARM and x86. > >> > >> One more note. The side effect of not using dax_flush() is that you > >> may end up flushing caches on systems where the platform has asserted > >> it will take responsibility for flushing caches at power loss. If / > >> when those systems become more prevalent we may want to think of a way > >> to combine the non-temporal optimization and the cache-flush-bypass > >> optimizations. However that is something that can wait for a later > >> change beyond 4.18. > > > > We could define memcpy_flushpmem, that falls back to memcpy or > > memcpy_flushcache, depending on whether the platform flushes the caches at > > power loss or not. > > The problem is that some platforms only power fail protect a subset of > the physical address range, How can this be? A psysical address may be cached on any CPU, so either there is enough power to flush all the CPUs' caches or there isn't. How does the CPU design that protects only a part of physical addresses look like? > but yes, if the platform makes a global > assertion we can globally replace memcpy_flushpmem() with plain > memcpy. Mikulas _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [patch 4/4] dm-writecache: use new API for flushing 2018-05-31 15:31 ` Mikulas Patocka @ 2018-05-31 16:39 ` Dan Williams 0 siblings, 0 replies; 40+ messages in thread From: Dan Williams @ 2018-05-31 16:39 UTC (permalink / raw) To: Mikulas Patocka; +Cc: device-mapper development, Mike Snitzer, linux-nvdimm On Thu, May 31, 2018 at 8:31 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Thu, 31 May 2018, Dan Williams wrote: > >> On Thu, May 31, 2018 at 1:19 AM, Mikulas Patocka <mpatocka@redhat.com> wrote: >> > >> > >> > On Wed, 30 May 2018, Dan Williams wrote: >> > >> >> > Great find! Thanks for the due diligence. Feel free to add: >> >> > >> >> > Acked-by: Dan Williams <dan.j.williams@intel.com> >> >> > >> >> > ...on the reworks to unify ARM and x86. >> >> >> >> One more note. The side effect of not using dax_flush() is that you >> >> may end up flushing caches on systems where the platform has asserted >> >> it will take responsibility for flushing caches at power loss. If / >> >> when those systems become more prevalent we may want to think of a way >> >> to combine the non-temporal optimization and the cache-flush-bypass >> >> optimizations. However that is something that can wait for a later >> >> change beyond 4.18. >> > >> > We could define memcpy_flushpmem, that falls back to memcpy or >> > memcpy_flushcache, depending on whether the platform flushes the caches at >> > power loss or not. >> >> The problem is that some platforms only power fail protect a subset of >> the physical address range, > > How can this be? A psysical address may be cached on any CPU, so either > there is enough power to flush all the CPUs' caches or there isn't. > > How does the CPU design that protects only a part of physical addresses > look like? It's not necessarily a CPU problem, it may be a problem of having enough stored energy to potentially software loops to flush caches. There's also the consideration that a general purpose platform may mix persistent memory technologies from different vendors where some might be flash-backed DRAM and some might be persistent media directly. For now I don't think we need to worry about it, but I don't want to make the assumption that this property is platform global given the history of how persistent memory has been deployed to date. _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 40+ messages in thread
Thread overview: 40+ messages
[not found] <20180519052503.325953342@debian.vm>
[not found] ` <20180519052635.567438191@debian.vm>
2018-05-22 6:39 ` [dm-devel] [patch 4/4] dm-writecache: use new API for flushing Christoph Hellwig
2018-05-22 18:41 ` Mike Snitzer
2018-05-22 19:00 ` Dan Williams
2018-05-22 19:19 ` Mike Snitzer
2018-05-22 19:27 ` Dan Williams
2018-05-22 20:52 ` Mike Snitzer
2018-05-22 22:53 ` [dm-devel] " Jeff Moyer
2018-05-23 20:57 ` Mikulas Patocka
2018-05-28 13:52 ` Mikulas Patocka
2018-05-28 17:41 ` Dan Williams
2018-05-30 13:42 ` [dm-devel] " Jeff Moyer
2018-05-30 13:51 ` Mikulas Patocka
2018-05-30 13:52 ` Jeff Moyer
2018-05-24 8:15 ` Mikulas Patocka
[not found] ` <CAPcyv4iEtfuVGPR0QMKcafv2XFwSj3nzxjX8cuXpXe00akAvYA@mail.gmail.com>
[not found] ` <alpine.LRH.2.02.1805250213270.13894@file01.intranet.prod.int.rdu2.redhat.com>
2018-05-25 12:51 ` Mike Snitzer
2018-05-25 15:57 ` Dan Williams
2018-05-26 7:02 ` Mikulas Patocka
2018-05-26 15:26 ` Dan Williams
2018-05-28 13:32 ` Mikulas Patocka
2018-05-28 18:14 ` Dan Williams
2018-05-30 13:07 ` Mikulas Patocka
2018-05-30 13:16 ` Mike Snitzer
2018-05-30 13:21 ` Mikulas Patocka
2018-05-30 13:26 ` Mike Snitzer
2018-05-30 13:33 ` Mikulas Patocka
2018-05-30 13:54 ` Mike Snitzer
2018-05-30 14:09 ` Mikulas Patocka
2018-05-30 14:21 ` Mike Snitzer
2018-05-30 14:46 ` Mikulas Patocka
2018-05-31 3:42 ` Mike Snitzer
2018-06-03 15:03 ` Mikulas Patocka
2018-05-31 3:39 ` Mike Snitzer
2018-05-31 8:16 ` Mikulas Patocka
2018-05-31 12:09 ` Mike Snitzer
2018-05-30 15:58 ` Dan Williams
2018-05-30 22:39 ` Dan Williams
2018-05-31 8:19 ` Mikulas Patocka
2018-05-31 14:51 ` Dan Williams
2018-05-31 15:31 ` Mikulas Patocka
2018-05-31 16:39 ` Dan Williams