Re: [PATCH 3/6] x86, pmem: add PMEM API for persistent memory

linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Ross Zwisler <zwisler@gmail.com>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-nvdimm@ml01.01.org, x86@kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 3/6] x86, pmem: add PMEM API for persistent memory
Date: Fri, 29 May 2015 06:07:57 -0600	[thread overview]
Message-ID: <1432901277.4282.22.camel@gmail.com> (raw)
In-Reply-To: <5567A2C4.3040407@zytor.com>

On Thu, 2015-05-28 at 16:20 -0700, H. Peter Anvin wrote:
> On 05/28/2015 03:35 PM, Ross Zwisler wrote:
> > Add a new PMEM API to x86, and allow for architectures that do not
> > implement this API.  Architectures that implement the PMEM API 
> > should
> > define ARCH_HAS_PMEM_API in their kernel configuration and must 
> > provide
> > implementations for persistent_copy(), persistent_flush() and
> > persistent_sync().
> 
> >  
> >  void clflush_cache_range(void *addr, unsigned int size);
> >  
> 
> No, no, no, no, no.  Include the proper header file.

I'm confused - I did inlcude <asm/cacheflush.h> in pmem.h?  The line
you're quoting above was an unmodified line from asm/cacheflush.h - I
didn't redefine the prototype for clflush_cache_range() anywhere.

Or does this comment mean that you think we shouldn't have an
architecture agnostic PMEM API, and that you think the PMEM and ND_BLK
drivers should just directly include asm/cacheflush.h and use the x86
APIs directly?

> > +static inline void arch_persistent_flush(void *vaddr, size_t size)
> > +{
> > +   clflush_cache_range(vaddr, size);
> > +}
> 
> Shouldn't this really be using clwb() -- we really need a
> clwb_cache_range() I guess?

I think we will need a clwb_cache_range() for DAX, for when it responds
to a msync() or fsync() call and needs to rip through a bunch of
memory, writing it back to the DIMMs.  I just didn't add it yet because
I didn't have a consumer.

It turns out that for the block aperture I/O case we really do need a
flush instead of a writeback, though, so clflush_cache_range() is
perfect.  Here's the flow, which is a read from a block window
aperture:

1) The nd_blk driver gets a read request, and selects a block window to
use.  It's entirely possible that this block window's aperture has
clean cache lines associated with it in the processor cache hierarchy. 
 It shouldn't be possible that it has dirty cache lines - we either
just did a read, or we did a write and would have used NT stores.

2) Write a new address into the block window control register.  The
memory backing the aperture moves to the new address.  Any clean lines
held in the processor cache are now out of sync.

3) Flush the cache lines associated with the aperture.  The lines are
guaranteed to be clean, so the flush will just discard them and no
writebacks will occur.

4) Read the contents of the block aperture, servicing the read.

This is the read flow outlined it the "driver writer's guide":

http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf  

> Incidentally, clflush_cache_range() seems to have the same flaw as 
> the
> proposed use case for clwb() had... if the buffer is aligned it will
> needlessly flush the last line twice.  It should really look 
> something
> like this (which would be a good standalone patch):
> 
> void clflush_cache_range(void *vaddr, unsigned int size)
> {
>         void *vend = vaddr + size - 1;
> 
>         mb();
> 
>       vaddr = (void *)
>               ((unsigned long)vaddr
>                & ~(boot_cpu_data.x86_clflush_size - 1));
> 
>         for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
>                 clflushopt(vaddr);
> 
>         mb();
> }
> EXPORT_SYMBOL_GPL(clflush_cache_range);

Ah, yep, I saw the same thing and already submitted patches to fix.  I
think this change should be in the TIP tree:

https://lkml.org/lkml/2015/5/11/336

> I also note that with your implementation we have a wmb() in
> arch_persistent_sync() and an mb() in arch_persistent_flush()... 
> surely one is redundant?

Actually, the way that we need to use arch_persistent_flush() for our
block window read case, the fencing works out so that nothing is
redundant.  We never actually use both a persistent_sync() call and a
persistent_flush() call during the same I/O.  Reads use
persistent_flush() to invalidate obsolete lines in the cache before
reading real data from the aperture of ete DIMM, and writes use a bunch
of NT stores followed by a persistent_sync().

The PMEM driver doesn't use persistent_flush() at all - this API is
only needed for the block window read case.

next prev parent reply	other threads:[~2015-05-29 12:07 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-28 22:35 [PATCH 0/6] I/O path improvements for ND_BLK and PMEM Ross Zwisler
2015-05-28 22:35 ` [PATCH 1/6] pmem: add force casts to avoid __iomem annotation Ross Zwisler
2015-05-28 22:47   ` Dan Williams
2015-05-29 11:39     ` Ross Zwisler
2015-05-29 12:53       ` Dan Williams
2015-05-29 13:22         ` Dan Williams
2015-05-28 22:35 ` [PATCH 2/6] nfit: Fix up address spaces, sparse warnings Ross Zwisler
2015-05-28 22:40   ` Dan Williams
2015-05-28 22:35 ` [PATCH 3/6] x86, pmem: add PMEM API for persistent memory Ross Zwisler
2015-05-28 23:20   ` H. Peter Anvin
2015-05-29  0:02     ` Dan Williams
2015-05-29  4:19       ` H. Peter Anvin
2015-05-29 12:11         ` Ross Zwisler
2015-05-29 12:07     ` Ross Zwisler [this message]
2015-05-29 15:48       ` Dan Williams
2015-05-28 22:35 ` [PATCH 4/6] pmem, nd_blk: update I/O paths to use PMEM API Ross Zwisler
2015-05-29 14:11   ` Dan Williams
2015-05-28 22:35 ` [PATCH 5/6] nd_blk: add support for flush hints Ross Zwisler
     [not found] ` <1432852553-24865-1-git-send-email-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2015-05-28 22:35   ` [PATCH 6/6] nd_blk: add support for NVDIMM flags Ross Zwisler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1432901277.4282.22.camel@gmail.com \
    --to=zwisler@gmail.com \
    --cc=hpa@zytor.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@ml01.01.org \
    --cc=mingo@redhat.com \
    --cc=ross.zwisler@linux.intel.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).