All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	David Miller <davem@davemloft.net>,
	kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	peterx@redhat.com, linux-mm@kvack.org,
	linux-arm-kernel@lists.infradead.org,
	linux-parisc@vger.kernel.org
Subject: Re: [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap()
Date: Thu, 14 Mar 2019 06:42:21 -0400	[thread overview]
Message-ID: <20190314064004-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <1552495028.3022.37.camel@HansenPartnership.com>

On Wed, Mar 13, 2019 at 09:37:08AM -0700, James Bottomley wrote:
> On Wed, 2019-03-13 at 09:05 -0700, Christoph Hellwig wrote:
> > On Tue, Mar 12, 2019 at 01:53:37PM -0700, James Bottomley wrote:
> > > I've got to say: optimize what?  What code do we ever have in the
> > > kernel that kmap's a page and then doesn't do anything with it? You
> > > can
> > > guarantee that on kunmap the page is either referenced (needs
> > > invalidating) or updated (needs flushing). The in-kernel use of
> > > kmap is
> > > always
> > > 
> > > kmap
> > > do something with the mapped page
> > > kunmap
> > > 
> > > In a very short interval.  It seems just a simplification to make
> > > kunmap do the flush if needed rather than try to have the users
> > > remember.  The thing which makes this really simple is that on most
> > > architectures flush and invalidate is the same operation.  If you
> > > really want to optimize you can use the referenced and dirty bits
> > > on the kmapped pte to tell you what operation to do, but if your
> > > flush is your invalidate, you simply assume the data needs flushing
> > > on kunmap without checking anything.
> > 
> > I agree that this would be a good way to simplify the API.   Now
> > we'd just need volunteers to implement this for all architectures
> > that need cache flushing and then remove the explicit flushing in
> > the callers..
> 
> Well, it's already done on parisc ...  I can help with this if we agree
> it's the best way forward.  It's really only architectures that
> implement flush_dcache_page that would need modifying.
> 
> It may also improve performance because some kmap/use/flush/kunmap
> sequences have flush_dcache_page() instead of
> flush_kernel_dcache_page() and the former is hugely expensive and
> usually unnecessary because GUP already flushed all the user aliases.
> 
> In the interests of full disclosure the reason we do it for parisc is
> because our later machines have problems even with clean aliases.  So
> on most VIPT systems, doing kmap/read/kunmap creates a fairly harmless
> clean alias.  Technically it should be invalidated, because if you
> remap the same page to the same colour you get cached stale data, but
> in practice the data is expired from the cache long before that
> happens, so the problem is almost never seen if the flush is forgotten.
>  Our problem is on the P9xxx processor: they have a L1/L2 VIPT L3 PIPT
> cache.  As the L1/L2 caches expire clean data, they place the expiring
> contents into L3, but because L3 is PIPT, the stale alias suddenly
> becomes the default for any read of they physical page because any
> update which dirtied the cache line often gets written to main memory
> and placed into the L3 as clean *before* the clean alias in L1/L2 gets
> expired, so the older clean alias replaces it.
> 
> Our only recourse is to kill all aliases with prejudice before the
> kernel loses ownership.
> 
> > > > Which means after we fix vhost to add the flush_dcache_page after
> > > > kunmap, Parisc will get a double hit (but it also means Parisc
> > > > was the only one of those archs needed explicit cache flushes,
> > > > where vhost worked correctly so far.. so it kinds of proofs your
> > > > point of giving up being the safe choice).
> > > 
> > > What double hit?  If there's no cache to flush then cache flush is
> > > a no-op.  It's also a highly piplineable no-op because the CPU has
> > > the L1 cache within easy reach.  The only event when flush takes a
> > > large amount time is if we actually have dirty data to write back
> > > to main memory.
> > 
> > I've heard people complaining that on some microarchitectures even
> > no-op cache flushes are relatively expensive.  Don't ask me why,
> > but if we can easily avoid double flushes we should do that.
> 
> It's still not entirely free for us.  Our internal cache line is around
> 32 bytes (some have 16 and some have 64) but that means we need 128
> flushes for a page ... we definitely can't pipeline them all.  So I
> agree duplicate flush elimination would be a small improvement.
> 
> James

I suspect we'll keep the copyXuser path around for 32 bit anyway -
right Jason?
So we can also keep using that on parisc...

-- 
MST

WARNING: multiple messages have this Message-ID (diff)
From: "Michael S. Tsirkin" <mst@redhat.com>
To: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	linux-parisc@vger.kernel.org, kvm@vger.kernel.org,
	netdev@vger.kernel.org, Jason Wang <jasowang@redhat.com>,
	linux-kernel@vger.kernel.org, peterx@redhat.com,
	virtualization@lists.linux-foundation.org,
	Christoph Hellwig <hch@infradead.org>,
	linux-mm@kvack.org, David Miller <davem@davemloft.net>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap()
Date: Thu, 14 Mar 2019 06:42:21 -0400	[thread overview]
Message-ID: <20190314064004-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <1552495028.3022.37.camel@HansenPartnership.com>

On Wed, Mar 13, 2019 at 09:37:08AM -0700, James Bottomley wrote:
> On Wed, 2019-03-13 at 09:05 -0700, Christoph Hellwig wrote:
> > On Tue, Mar 12, 2019 at 01:53:37PM -0700, James Bottomley wrote:
> > > I've got to say: optimize what?  What code do we ever have in the
> > > kernel that kmap's a page and then doesn't do anything with it? You
> > > can
> > > guarantee that on kunmap the page is either referenced (needs
> > > invalidating) or updated (needs flushing). The in-kernel use of
> > > kmap is
> > > always
> > > 
> > > kmap
> > > do something with the mapped page
> > > kunmap
> > > 
> > > In a very short interval.  It seems just a simplification to make
> > > kunmap do the flush if needed rather than try to have the users
> > > remember.  The thing which makes this really simple is that on most
> > > architectures flush and invalidate is the same operation.  If you
> > > really want to optimize you can use the referenced and dirty bits
> > > on the kmapped pte to tell you what operation to do, but if your
> > > flush is your invalidate, you simply assume the data needs flushing
> > > on kunmap without checking anything.
> > 
> > I agree that this would be a good way to simplify the API.   Now
> > we'd just need volunteers to implement this for all architectures
> > that need cache flushing and then remove the explicit flushing in
> > the callers..
> 
> Well, it's already done on parisc ...  I can help with this if we agree
> it's the best way forward.  It's really only architectures that
> implement flush_dcache_page that would need modifying.
> 
> It may also improve performance because some kmap/use/flush/kunmap
> sequences have flush_dcache_page() instead of
> flush_kernel_dcache_page() and the former is hugely expensive and
> usually unnecessary because GUP already flushed all the user aliases.
> 
> In the interests of full disclosure the reason we do it for parisc is
> because our later machines have problems even with clean aliases.  So
> on most VIPT systems, doing kmap/read/kunmap creates a fairly harmless
> clean alias.  Technically it should be invalidated, because if you
> remap the same page to the same colour you get cached stale data, but
> in practice the data is expired from the cache long before that
> happens, so the problem is almost never seen if the flush is forgotten.
>  Our problem is on the P9xxx processor: they have a L1/L2 VIPT L3 PIPT
> cache.  As the L1/L2 caches expire clean data, they place the expiring
> contents into L3, but because L3 is PIPT, the stale alias suddenly
> becomes the default for any read of they physical page because any
> update which dirtied the cache line often gets written to main memory
> and placed into the L3 as clean *before* the clean alias in L1/L2 gets
> expired, so the older clean alias replaces it.
> 
> Our only recourse is to kill all aliases with prejudice before the
> kernel loses ownership.
> 
> > > > Which means after we fix vhost to add the flush_dcache_page after
> > > > kunmap, Parisc will get a double hit (but it also means Parisc
> > > > was the only one of those archs needed explicit cache flushes,
> > > > where vhost worked correctly so far.. so it kinds of proofs your
> > > > point of giving up being the safe choice).
> > > 
> > > What double hit?  If there's no cache to flush then cache flush is
> > > a no-op.  It's also a highly piplineable no-op because the CPU has
> > > the L1 cache within easy reach.  The only event when flush takes a
> > > large amount time is if we actually have dirty data to write back
> > > to main memory.
> > 
> > I've heard people complaining that on some microarchitectures even
> > no-op cache flushes are relatively expensive.  Don't ask me why,
> > but if we can easily avoid double flushes we should do that.
> 
> It's still not entirely free for us.  Our internal cache line is around
> 32 bytes (some have 16 and some have 64) but that means we need 128
> flushes for a page ... we definitely can't pipeline them all.  So I
> agree duplicate flush elimination would be a small improvement.
> 
> James

I suspect we'll keep the copyXuser path around for 32 bit anyway -
right Jason?
So we can also keep using that on parisc...

-- 
MST

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2019-03-14 10:42 UTC|newest]

Thread overview: 175+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-06  7:18 [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap() Jason Wang
2019-03-06  7:18 ` Jason Wang
2019-03-06  7:18 ` [RFC PATCH V2 1/5] vhost: generalize adding used elem Jason Wang
2019-03-06  7:18 ` Jason Wang
2019-03-06  7:18 ` [RFC PATCH V2 2/5] vhost: fine grain userspace memory accessors Jason Wang
2019-03-06  7:18 ` Jason Wang
2019-03-06 10:45   ` Christophe de Dinechin
2019-03-06 10:45   ` Christophe de Dinechin
2019-03-07  2:38     ` Jason Wang
2019-03-07  2:38       ` Jason Wang
2019-03-06  7:18 ` [RFC PATCH V2 3/5] vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch() Jason Wang
2019-03-06  7:18 ` Jason Wang
2019-03-06  7:18 ` [RFC PATCH V2 4/5] vhost: introduce helpers to get the size of metadata area Jason Wang
2019-03-06 10:56   ` Christophe de Dinechin
2019-03-06 10:56   ` Christophe de Dinechin
2019-03-07  2:40     ` Jason Wang
2019-03-07  2:40     ` Jason Wang
2019-03-06 18:43   ` Souptick Joarder
2019-03-07  2:42     ` Jason Wang
2019-03-07  2:42     ` Jason Wang
2019-03-06  7:18 ` Jason Wang
2019-03-06  7:18 ` [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address Jason Wang
2019-03-06  7:18 ` Jason Wang
2019-03-06 16:31   ` Michael S. Tsirkin
2019-03-06 16:31   ` Michael S. Tsirkin
2019-03-07  2:45     ` Jason Wang
2019-03-07  2:45     ` Jason Wang
2019-03-07 15:34       ` Michael S. Tsirkin
2019-03-07 15:34       ` Michael S. Tsirkin
2019-03-07 19:09         ` Jerome Glisse
2019-03-07 19:38           ` Andrea Arcangeli
2019-03-07 19:38             ` Andrea Arcangeli
2019-03-07 20:17             ` Jerome Glisse
2019-03-07 21:27               ` Andrea Arcangeli
2019-03-08  9:13                 ` Jason Wang
2019-03-08  9:13                 ` Jason Wang
2019-03-08 19:11                   ` Andrea Arcangeli
2019-03-08 19:11                   ` Andrea Arcangeli
2019-03-11  7:21                     ` Jason Wang
2019-03-11  7:21                     ` Jason Wang
2019-03-11 14:45                 ` Jan Kara
2019-03-11 14:45                 ` Jan Kara
2019-03-07 21:27               ` Andrea Arcangeli
2019-03-07 20:17             ` Jerome Glisse
2019-03-07 19:09         ` Jerome Glisse
2019-03-08  8:31         ` Jason Wang
2019-03-08  8:31         ` Jason Wang
2019-03-07 15:47   ` Michael S. Tsirkin
2019-03-07 15:47   ` Michael S. Tsirkin
2019-03-07 17:56     ` Michael S. Tsirkin
2019-03-07 19:16       ` Andrea Arcangeli
2019-03-07 19:16       ` Andrea Arcangeli
2019-03-08  8:50         ` Jason Wang
2019-03-08  8:50           ` Jason Wang
2019-03-08 14:58           ` Jerome Glisse
2019-03-08 14:58           ` Jerome Glisse
2019-03-11  7:18             ` Jason Wang
2019-03-11  7:18             ` Jason Wang
2019-03-08 19:48           ` Andrea Arcangeli
2019-03-08 20:06             ` Jerome Glisse
2019-03-08 20:06             ` Jerome Glisse
2019-03-11  7:40             ` Jason Wang
2019-03-11  7:40               ` Jason Wang
2019-03-11 12:48               ` Michael S. Tsirkin
2019-03-11 12:48               ` Michael S. Tsirkin
2019-03-11 13:43                 ` Andrea Arcangeli
2019-03-12  2:56                   ` Jason Wang
2019-03-12  2:56                   ` Jason Wang
2019-03-12  3:51                     ` Michael S. Tsirkin
2019-03-12  3:51                     ` Michael S. Tsirkin
2019-03-11 13:43                 ` Andrea Arcangeli
2019-03-12  2:52                 ` Jason Wang
2019-03-12  2:52                   ` Jason Wang
2019-03-12  3:50                   ` Michael S. Tsirkin
2019-03-12  3:50                   ` Michael S. Tsirkin
2019-03-12  7:15                     ` Jason Wang
2019-03-12  7:15                     ` Jason Wang
2019-03-08 19:48           ` Andrea Arcangeli
2019-03-07 19:17       ` Jerome Glisse
2019-03-07 19:17       ` Jerome Glisse
2019-03-08  2:21         ` Michael S. Tsirkin
2019-03-08  2:21           ` Michael S. Tsirkin
2019-03-08  2:55           ` Jerome Glisse
2019-03-08  3:16             ` Michael S. Tsirkin
2019-03-08  3:40               ` Jerome Glisse
2019-03-08  3:43                 ` Michael S. Tsirkin
2019-03-08  3:45                   ` Jerome Glisse
2019-03-08  3:45                   ` Jerome Glisse
2019-03-08  9:15                     ` Jason Wang
2019-03-08  9:15                     ` Jason Wang
2019-03-08  3:43                 ` Michael S. Tsirkin
2019-03-08  3:40               ` Jerome Glisse
2019-03-08  3:16             ` Michael S. Tsirkin
2019-03-08  2:55           ` Jerome Glisse
2019-03-08  8:58         ` Jason Wang
2019-03-08  8:58         ` Jason Wang
2019-03-08 12:56           ` Michael S. Tsirkin
2019-03-08 15:02             ` Jerome Glisse
2019-03-08 15:02             ` Jerome Glisse
2019-03-08 12:56           ` Michael S. Tsirkin
2019-03-08 19:13           ` Andrea Arcangeli
2019-03-08 19:13           ` Andrea Arcangeli
2019-03-07 17:56     ` Michael S. Tsirkin
2019-03-08 14:12 ` [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap() Christoph Hellwig
2019-03-08 14:12 ` Christoph Hellwig
2019-03-08 14:12   ` Christoph Hellwig
2019-03-11  7:13   ` Jason Wang
2019-03-11  7:13   ` Jason Wang
2019-03-11  7:13     ` Jason Wang
2019-03-11 13:59     ` Michael S. Tsirkin
2019-03-11 13:59     ` Michael S. Tsirkin
2019-03-11 13:59       ` Michael S. Tsirkin
2019-03-11 18:14       ` David Miller
2019-03-11 18:14         ` David Miller
2019-03-12  2:59         ` Jason Wang
2019-03-12  2:59         ` Jason Wang
2019-03-12  2:59           ` Jason Wang
2019-03-12  3:52           ` Michael S. Tsirkin
2019-03-12  3:52             ` Michael S. Tsirkin
2019-03-12  3:52             ` Michael S. Tsirkin
2019-03-12  7:17             ` Jason Wang
2019-03-12  7:17               ` Jason Wang
2019-03-12 11:54               ` Michael S. Tsirkin
2019-03-12 11:54               ` Michael S. Tsirkin
2019-03-12 11:54                 ` Michael S. Tsirkin
2019-03-12 15:46                 ` James Bottomley
2019-03-12 15:46                   ` James Bottomley
2019-03-12 20:04                   ` Andrea Arcangeli
2019-03-12 20:04                     ` Andrea Arcangeli
2019-03-12 20:04                     ` Andrea Arcangeli
2019-03-12 20:53                     ` James Bottomley
2019-03-12 20:53                       ` James Bottomley
2019-03-12 21:11                       ` Andrea Arcangeli
2019-03-12 21:11                       ` Andrea Arcangeli
2019-03-12 21:11                         ` Andrea Arcangeli
2019-03-12 21:19                         ` James Bottomley
2019-03-12 21:19                           ` James Bottomley
2019-03-12 21:53                           ` Andrea Arcangeli
2019-03-12 21:53                           ` Andrea Arcangeli
2019-03-12 21:53                             ` Andrea Arcangeli
2019-03-12 22:02                             ` James Bottomley
2019-03-12 22:02                               ` James Bottomley
2019-03-12 22:50                               ` Andrea Arcangeli
2019-03-12 22:50                               ` Andrea Arcangeli
2019-03-12 22:50                                 ` Andrea Arcangeli
2019-03-12 22:57                                 ` James Bottomley
2019-03-12 22:57                                   ` James Bottomley
2019-03-13 16:05                       ` Christoph Hellwig
2019-03-13 16:05                         ` Christoph Hellwig
2019-03-13 16:05                         ` Christoph Hellwig
2019-03-13 16:37                         ` James Bottomley
2019-03-13 16:37                           ` James Bottomley
2019-03-14 10:42                           ` Michael S. Tsirkin
2019-03-14 10:42                           ` Michael S. Tsirkin [this message]
2019-03-14 10:42                             ` Michael S. Tsirkin
2019-03-14 13:49                             ` Jason Wang
2019-03-14 13:49                             ` Jason Wang
2019-03-14 13:49                               ` Jason Wang
2019-03-14 19:33                               ` Andrea Arcangeli
2019-03-14 19:33                                 ` Andrea Arcangeli
2019-03-15  4:39                                 ` Jason Wang
2019-03-15  4:39                                   ` Jason Wang
2019-03-15  4:39                                 ` Jason Wang
2019-03-14 19:33                               ` Andrea Arcangeli
2019-03-13 16:05                       ` Christoph Hellwig
2019-03-12  7:17             ` Jason Wang
2019-03-12  5:14           ` James Bottomley
2019-03-12  5:14             ` James Bottomley
2019-03-12  7:51             ` Jason Wang
2019-03-12  7:51               ` Jason Wang
2019-03-12  7:53               ` Jason Wang
2019-03-12  7:53                 ` Jason Wang
2019-03-12  7:53               ` Jason Wang
2019-03-12  7:51             ` Jason Wang
2019-03-11 18:14       ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190314064004-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=aarcange@redhat.com \
    --cc=davem@davemloft.net \
    --cc=hch@infradead.org \
    --cc=jasowang@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=peterx@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.