From: Dan Williams <dan.j.williams@intel.com>
To: Ross Zwisler <zwisler@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	linux-nvdimm <linux-nvdimm@ml01.01.org>, X86 ML <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linux ACPI <linux-acpi@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 3/6] x86, pmem: add PMEM API for persistent memory
Date: Fri, 29 May 2015 08:48:01 -0700
Message-ID: <CAPcyv4ibjpHF_2mGoBnELajHR_1m8QqPiD1_k8Znmjsv2oF==g@mail.gmail.com>
In-Reply-To: <1432901277.4282.22.camel@gmail.com>

On Fri, May 29, 2015 at 5:07 AM, Ross Zwisler <zwisler@gmail.com> wrote:
> On Thu, 2015-05-28 at 16:20 -0700, H. Peter Anvin wrote:
>> On 05/28/2015 03:35 PM, Ross Zwisler wrote:
>> > Add a new PMEM API to x86, and allow for architectures that do
>> > not implement this API.  Architectures that implement the PMEM
>> > API should define ARCH_HAS_PMEM_API in their kernel
>> > configuration and must provide implementations for
>> > persistent_copy(), persistent_flush() and persistent_sync().
>>
>> >
>> >  void clflush_cache_range(void *addr, unsigned int size);
>> >
>>
>> No, no, no, no, no.  Include the proper header file.
>
> I'm confused - I did include <asm/cacheflush.h> in pmem.h?  The line
> you're quoting above was an unmodified line from asm/cacheflush.h - I
> didn't redefine the prototype for clflush_cache_range() anywhere.
>
> Or does this comment mean that you think we shouldn't have an
> architecture-agnostic PMEM API, and that you think the PMEM and ND_BLK
> drivers should just directly include asm/cacheflush.h and use the x86
> APIs directly?
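
For reference, the arch-agnostic shape being described is roughly the
following; this is only a sketch of the proposed arrangement, and the
fallback bodies are my own illustration, not code from the patch:

#ifdef CONFIG_ARCH_HAS_PMEM_API
#include <asm/pmem.h>		/* arch supplies the three primitives */
#else
/* no arch support: plain copies, and no durability guarantee */
static inline void persistent_copy(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}
static inline void persistent_flush(void *vaddr, size_t size)
{
}
static inline void persistent_sync(void)
{
}
#endif
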
>
>> > +static inline void arch_persistent_flush(void *vaddr, size_t size)
>> > +{
>> > +   clflush_cache_range(vaddr, size);
>> > +}
>>
>> Shouldn't this really be using clwb() -- we really need a
>> clwb_cache_range() I guess?
>
> I think we will need a clwb_cache_range() for DAX, for when it responds
> to an msync() or fsync() call and needs to rip through a bunch of
> memory, writing it back to the DIMMs.  I just didn't add it yet because
> I didn't have a consumer.
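
A clwb_cache_range() would presumably just mirror clflush_cache_range()
with clwb() substituted, along these lines (untested sketch):

void clwb_cache_range(void *vaddr, unsigned int size)
{
	void *vend = vaddr + size - 1;

	mb();
	vaddr = (void *)((unsigned long)vaddr
			 & ~(boot_cpu_data.x86_clflush_size - 1));
	/* write lines back to memory without necessarily evicting them */
	for (; vaddr <= vend; vaddr += boot_cpu_data.x86_clflush_size)
		clwb(vaddr);
	mb();
}
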
>
> It turns out that for the block aperture I/O case we really do need a
> flush instead of a writeback, though, so clflush_cache_range() is
> perfect.  Here's the flow, which is a read from a block window
> aperture:
>
> 1) The nd_blk driver gets a read request, and selects a block window to
> use.  It's entirely possible that this block window's aperture has
> clean cache lines associated with it in the processor cache hierarchy.
>  It shouldn't be possible that it has dirty cache lines - we either
> just did a read, or we did a write and would have used NT stores.
>
> 2) Write a new address into the block window control register.  The
> memory backing the aperture moves to the new address.  Any clean lines
> held in the processor cache are now out of sync.
>
> 3) Flush the cache lines associated with the aperture.  The lines are
> guaranteed to be clean, so the flush will just discard them and no
> writebacks will occur.
>
> 4) Read the contents of the block aperture, servicing the read.
>
> This is the read flow outlined in the "driver writer's guide":
>
> http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
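
For what it's worth, steps 1-4 above reduce to something like the
following pseudo-C; the window structure, register accessor, and
function name are invented purely for illustration:

/* hypothetical sketch of the BLK aperture read flow, not patch code */
static void nd_blk_read_sketch(struct nd_blk_win *win, void *dst,
			       resource_size_t dev_addr, size_t len)
{
	/* 2) retarget the aperture; any cached lines are now stale */
	writeq(dev_addr, win->ctrl_reg);

	/*
	 * 3) the stale lines are clean by construction, so flushing
	 * just discards them and no writebacks occur
	 */
	clflush_cache_range(win->aperture, len);

	/* 4) service the read from the freshly retargeted window */
	memcpy(dst, win->aperture, len);
}
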
>
>> Incidentally, clflush_cache_range() seems to have the same flaw as
>> the proposed use case for clwb() had... if the buffer is aligned it
>> will needlessly flush the last line twice.  It should really look
>> something like this (which would be a good standalone patch):
>>
>> void clflush_cache_range(void *vaddr, unsigned int size)
>> {
>>         void *vend = vaddr + size - 1;
>>
>>         mb();
>>
>>         vaddr = (void *)
>>                 ((unsigned long)vaddr
>>                  & ~(boot_cpu_data.x86_clflush_size - 1));
>>
>>         for (; vaddr <= vend; vaddr += boot_cpu_data.x86_clflush_size)
>>                 clflushopt(vaddr);
>>
>>         mb();
>> }
>> EXPORT_SYMBOL_GPL(clflush_cache_range);
>
> Ah, yep, I saw the same thing and already submitted patches to fix.  I
> think this change should be in the TIP tree:
>
> https://lkml.org/lkml/2015/5/11/336
>
>
>> I also note that with your implementation we have a wmb() in
>> arch_persistent_sync() and an mb() in arch_persistent_flush()...
>> surely one is redundant?
>
> Actually, given the way we need to use arch_persistent_flush() for our
> block window read case, the fencing works out so that nothing is
> redundant.  We never actually use both a persistent_sync() call and a
> persistent_flush() call during the same I/O.  Reads use
> persistent_flush() to invalidate obsolete lines in the cache before
> reading real data from the aperture of the DIMM, and writes use a bunch
> of NT stores followed by a persistent_sync().
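
So the write side of an I/O, per the proposed API, is effectively just
this pair (the aperture pointer is illustrative):

	persistent_copy(win->aperture, src, len);	/* NT stores */
	persistent_sync();	/* fence; make the stores durable */
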
>
> The PMEM driver doesn't use persistent_flush() at all - this API is
> only needed for the block window read case.

Then that's not a "persistence flush", that's a shootdown of the
previous MMIO block window setting.  If it's only for BLK reads I
think we need to use clflush_cache_range() directly.  Given that BLK
mode already depends on ACPI I think it's fine for now to make BLK
mode depend on x86.  Otherwise, we need a new cross-arch generic cache
flush primitive like io_flush_cache_range() and have BLK mode depend
on ARCH_HAS_IO_FLUSH_CACHE_RANGE.
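
Concretely, something like this untested sketch; the names are only
what I would propose, nothing here exists in the tree:

/* generic declaration, built only when the arch opts in */
#ifdef CONFIG_ARCH_HAS_IO_FLUSH_CACHE_RANGE
void io_flush_cache_range(void *vaddr, size_t size);
#endif

/* the x86 version would trivially wrap the existing primitive */
void io_flush_cache_range(void *vaddr, size_t size)
{
	clflush_cache_range(vaddr, size);
}

...and BLK mode's Kconfig would then say "depends on
ARCH_HAS_IO_FLUSH_CACHE_RANGE" instead of depending on X86.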

