All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
To: Palmer Dabbelt <palmer@dabbelt.com>
Cc: liu@jiuyang.me, alex@ghiti.fr, waterman@eecs.berkeley.edu,
	Paul Walmsley <paul.walmsley@sifive.com>,
	aou@eecs.berkeley.edu, akpm@linux-foundation.org,
	geert@linux-m68k.org, linux-riscv@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] implement flush_cache_vmap and flush_cache_vunmap for RISC-V
Date: Mon, 12 Apr 2021 14:22:03 +0800	[thread overview]
Message-ID: <20210412142203.6d86e5c6@xhacker.debian> (raw)
In-Reply-To: <mhng-92e28f5c-ced0-4a92-949f-0fd865c0bbf5@palmerdabbelt-glaptop>

On Sun, 11 Apr 2021 14:41:07 -0700 (PDT) 
Palmer Dabbelt <palmer@dabbelt.com> wrote:


> 
> 
> On Sun, 28 Mar 2021 18:55:09 PDT (-0700), liu@jiuyang.me wrote:
> > This patch implements flush_cache_vmap and flush_cache_vunmap for
> > RISC-V, since these functions might modify PTE. Without this patch,
> > SFENCE.VMA won't be added to related codes, which might introduce a bug
> > in some out-of-order micro-architecture implementations.
> >
> > Signed-off-by: Jiuyang Liu <liu@jiuyang.me>
> > ---
> >  arch/riscv/include/asm/cacheflush.h | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
> > index 23ff70350992..4adf25248c43 100644
> > --- a/arch/riscv/include/asm/cacheflush.h
> > +++ b/arch/riscv/include/asm/cacheflush.h
> > @@ -8,6 +8,14 @@
> >
> >  #include <linux/mm.h>
> >
> > +/*
> > + * flush_cache_vmap and flush_cache_vunmap might modify PTE, needs SFENCE.VMA.
> > + * - flush_cache_vmap is invoked after map_kernel_range() has installed the page table entries.
> > + * - flush_cache_vunmap is invoked before unmap_kernel_range() deletes the page table entries  
> 
> These should have line breaks.
> 
> > + */
> > +#define flush_cache_vmap(start, end) flush_tlb_all()  
> 
> We shouldn't need cache flushes for permission upgrades: the ISA allows
> the old mappings to be visible until a fence, but the theory is that
> window will be sort for reasonable architectures so the overhead of
> flushing the entire TLB will overwhelm the extra faults.  There are a
> handful of places where we preemptively flush, but those are generally
> because we can't handle the faults correctly.
> 
> If you have some benchmark that demonstrates a performance issue on real
> hardware here then I'm happy to talk about this further, but this
> assumption is all over arch/riscv so I'd prefer to keep things
> consistent for now.

IMHO the flush_cache_vmap() isn't necessary. From previous discussion, it
seems the reason to implement flush_cache_vmap() is we missed sfence.vma
in vmalloc related code path. But...
The riscv privileged spec says "In particular, if a leaf PTE is modified but
a subsuming SFENCE.VMA is not executed, either the old translation or the
new translation will be used, but the choice is unpredictable. The behavior
is otherwise well-defined"

*If old translation, we do have a page fault, but the vmalloc_fault() will
take care of it, then local_flush_tlb_page() will sfence.vma properly.

*If new translation, we don't do anything.

In both cases, we don't need to implement the flush_cache_vmap()

From another side, even we insert sfence.vma() in advance rather than
rely on the vmalloc_fault() we still can't ensure other harts use the
new translation. Take below small window case for example:

	cpu0				cpu1
map_kernel_range()
  map_kernel_range_noflush()
					access the new vmalloced space.

  flush_cache_vmap()

That's to say, we sill rely on the vmalloc_fault().


> 
> > +#define flush_cache_vunmap(start, end) flush_tlb_all()  
> 

In flush_cache_vunmap() caller's code path, the translation is modified
*after* the flush_cache_vunmap(), for example:

unmap_kernel_range()
  flush_cache_vunmap()
  vunmap_page_range()
  flush_tlb_kernel_range()

IOW, when we call flush_cache_vunmap(), the translation has not changed.
Instead, I believe it's the flush_tlb_kernel_range() to flush the translations
after we changed the translation in vunmap_page_range()

Regards

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

WARNING: multiple messages have this Message-ID (diff)
From: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
To: Palmer Dabbelt <palmer@dabbelt.com>
Cc: liu@jiuyang.me, alex@ghiti.fr, waterman@eecs.berkeley.edu,
	Paul Walmsley <paul.walmsley@sifive.com>,
	aou@eecs.berkeley.edu, akpm@linux-foundation.org,
	geert@linux-m68k.org, linux-riscv@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] implement flush_cache_vmap and flush_cache_vunmap for RISC-V
Date: Mon, 12 Apr 2021 14:22:03 +0800	[thread overview]
Message-ID: <20210412142203.6d86e5c6@xhacker.debian> (raw)
In-Reply-To: <mhng-92e28f5c-ced0-4a92-949f-0fd865c0bbf5@palmerdabbelt-glaptop>

On Sun, 11 Apr 2021 14:41:07 -0700 (PDT) 
Palmer Dabbelt <palmer@dabbelt.com> wrote:


> 
> 
> On Sun, 28 Mar 2021 18:55:09 PDT (-0700), liu@jiuyang.me wrote:
> > This patch implements flush_cache_vmap and flush_cache_vunmap for
> > RISC-V, since these functions might modify PTE. Without this patch,
> > SFENCE.VMA won't be added to related codes, which might introduce a bug
> > in some out-of-order micro-architecture implementations.
> >
> > Signed-off-by: Jiuyang Liu <liu@jiuyang.me>
> > ---
> >  arch/riscv/include/asm/cacheflush.h | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
> > index 23ff70350992..4adf25248c43 100644
> > --- a/arch/riscv/include/asm/cacheflush.h
> > +++ b/arch/riscv/include/asm/cacheflush.h
> > @@ -8,6 +8,14 @@
> >
> >  #include <linux/mm.h>
> >
> > +/*
> > + * flush_cache_vmap and flush_cache_vunmap might modify PTE, needs SFENCE.VMA.
> > + * - flush_cache_vmap is invoked after map_kernel_range() has installed the page table entries.
> > + * - flush_cache_vunmap is invoked before unmap_kernel_range() deletes the page table entries  
> 
> These should have line breaks.
> 
> > + */
> > +#define flush_cache_vmap(start, end) flush_tlb_all()  
> 
> We shouldn't need cache flushes for permission upgrades: the ISA allows
> the old mappings to be visible until a fence, but the theory is that
> window will be sort for reasonable architectures so the overhead of
> flushing the entire TLB will overwhelm the extra faults.  There are a
> handful of places where we preemptively flush, but those are generally
> because we can't handle the faults correctly.
> 
> If you have some benchmark that demonstrates a performance issue on real
> hardware here then I'm happy to talk about this further, but this
> assumption is all over arch/riscv so I'd prefer to keep things
> consistent for now.

IMHO the flush_cache_vmap() isn't necessary. From previous discussion, it
seems the reason to implement flush_cache_vmap() is we missed sfence.vma
in vmalloc related code path. But...
The riscv privileged spec says "In particular, if a leaf PTE is modified but
a subsuming SFENCE.VMA is not executed, either the old translation or the
new translation will be used, but the choice is unpredictable. The behavior
is otherwise well-defined"

*If old translation, we do have a page fault, but the vmalloc_fault() will
take care of it, then local_flush_tlb_page() will sfence.vma properly.

*If new translation, we don't do anything.

In both cases, we don't need to implement the flush_cache_vmap()

From another side, even we insert sfence.vma() in advance rather than
rely on the vmalloc_fault() we still can't ensure other harts use the
new translation. Take below small window case for example:

	cpu0				cpu1
map_kernel_range()
  map_kernel_range_noflush()
					access the new vmalloced space.

  flush_cache_vmap()

That's to say, we sill rely on the vmalloc_fault().


> 
> > +#define flush_cache_vunmap(start, end) flush_tlb_all()  
> 

In flush_cache_vunmap() caller's code path, the translation is modified
*after* the flush_cache_vunmap(), for example:

unmap_kernel_range()
  flush_cache_vunmap()
  vunmap_page_range()
  flush_tlb_kernel_range()

IOW, when we call flush_cache_vunmap(), the translation has not changed.
Instead, I believe it's the flush_tlb_kernel_range() to flush the translations
after we changed the translation in vunmap_page_range()

Regards

  parent reply	other threads:[~2021-04-12  6:22 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-29  1:55 [PATCH] implement flush_cache_vmap and flush_cache_vunmap for RISC-V Jiuyang Liu
2021-03-29  1:55 ` Jiuyang Liu
2021-03-30  7:02 ` Alex Ghiti
2021-03-30  7:02   ` Alex Ghiti
2021-04-01  6:37 ` Christoph Hellwig
2021-04-01  6:37   ` Christoph Hellwig
2021-04-11 21:41 ` Palmer Dabbelt
2021-04-11 21:41   ` Palmer Dabbelt
2021-04-12  0:13   ` Jiuyang Liu
2021-04-12  0:13     ` Jiuyang Liu
2021-04-12  6:22   ` Jisheng Zhang [this message]
2021-04-12  6:22     ` Jisheng Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210412142203.6d86e5c6@xhacker.debian \
    --to=jisheng.zhang@synaptics.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex@ghiti.fr \
    --cc=aou@eecs.berkeley.edu \
    --cc=geert@linux-m68k.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=liu@jiuyang.me \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=waterman@eecs.berkeley.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.