From: Andrea Arcangeli <aarcange@redhat.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
David Miller <davem@davemloft.net>,
hch@infradead.org, kvm@vger.kernel.org,
virtualization@lists.linux-foundation.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
peterx@redhat.com, linux-mm@kvack.org,
linux-arm-kernel@lists.infradead.org,
linux-parisc@vger.kernel.org
Subject: Re: [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap()
Date: Tue, 12 Mar 2019 17:11:17 -0400
Message-ID: <20190312211117.GB25147@redhat.com>
In-Reply-To: <1552424017.14432.11.camel@HansenPartnership.com>
On Tue, Mar 12, 2019 at 01:53:37PM -0700, James Bottomley wrote:
> I've got to say: optimize what? What code do we ever have in the
> kernel that kmap's a page and then doesn't do anything with it? You can
> guarantee that on kunmap the page is either referenced (needs
> invalidating) or updated (needs flushing). The in-kernel use of kmap is
> always
>
> kmap
> do something with the mapped page
> kunmap
>
> In a very short interval. It seems just a simplification to make
> kunmap do the flush if needed rather than try to have the users
> remember. The thing which makes this really simple is that on most
> architectures flush and invalidate is the same operation. If you
> really want to optimize you can use the referenced and dirty bits on
> the kmapped pte to tell you what operation to do, but if your flush is
> your invalidate, you simply assume the data needs flushing on kunmap
> without checking anything.
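For concreteness, the scheme quoted above would look roughly like this
sketch (kmap_pte_of() and __invalidate_dcache_page() are invented names
for illustration, not existing kernel interfaces):

/*
 * Hypothetical "flush or invalidate in kunmap" variant of the scheme
 * quoted above; a sketch, not real kernel code.
 */
static inline void kunmap_autoflush(struct page *page)
{
        pte_t pte = kmap_pte_of(page);          /* invented helper */

        if (pte_dirty(pte))
                flush_dcache_page(page);        /* written: flush */
        else if (pte_young(pte))
                __invalidate_dcache_page(page); /* only read: invalidate */
        kunmap(page);
}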
Except that other archs, like arm64 and sparc, do the cache flushing in
copy_to_user_page() and copy_user_page(), not in kunmap():
/* arm64 */
#define copy_user_page(to, from, vaddr, pg) __cpu_copy_user_page(to, from, vaddr)

void __cpu_copy_user_page(void *kto, const void *kfrom, unsigned long vaddr)
{
        struct page *page = virt_to_page(kto);

        copy_page(kto, kfrom);
        flush_dcache_page(page);
}
/* sparc */
#define copy_user_page(to, from, vaddr, page)   \
        do {    copy_page(to, from);            \
                sparc_flush_page_to_ram(page);  \
        } while (0)
And they do no cache flushing in kunmap:
/* sparc (highmem): only pkmap bookkeeping, no cache flushing */
static inline void kunmap(struct page *page)
{
        BUG_ON(in_interrupt());
        if (!PageHighMem(page))
                return;
        kunmap_high(page);
}
void kunmap_high(struct page *page)
{
        unsigned long vaddr;
        unsigned long nr;
        unsigned long flags;
        int need_wakeup;
        unsigned int color = get_pkmap_color(page);
        wait_queue_head_t *pkmap_map_wait;

        lock_kmap_any(flags);
        vaddr = (unsigned long)page_address(page);
        BUG_ON(!vaddr);
        nr = PKMAP_NR(vaddr);

        /*
         * A count must never go down to zero
         * without a TLB flush!
         */
        need_wakeup = 0;
        switch (--pkmap_count[nr]) {
        case 0:
                BUG();
        case 1:
                /*
                 * Avoid an unnecessary wake_up() function call.
                 * The common case is pkmap_count[] == 1, but
                 * no waiters.
                 * The tasks queued in the wait-queue are guarded
                 * by both the lock in the wait-queue-head and by
                 * the kmap_lock.  As the kmap_lock is held here,
                 * no need for the wait-queue-head's lock.  Simply
                 * test if the queue is empty.
                 */
                pkmap_map_wait = get_pkmap_wait_queue_head(color);
                need_wakeup = waitqueue_active(pkmap_map_wait);
        }
        unlock_kmap_any(flags);

        /* do wake-up, if needed, race-free outside of the spin lock */
        if (need_wakeup)
                wake_up(pkmap_map_wait);
}
/* arm64 (no highmem): kunmap() is an empty no-op */
static inline void kunmap(struct page *page)
{
}
because they already did the flushing just above, in copy_user_page().
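In other words, on those archs the flush is the caller's job; the
expected pattern is roughly this (a minimal sketch for a page that may
also be mapped into userland):

        void *vaddr = kmap(page);

        memcpy(vaddr, src, len);        /* write through the kernel mapping */
        kunmap(page);                   /* no cache flushing happens here */
        flush_dcache_page(page);        /* the caller must flush explicitly */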
> > Which means after we fix vhost to add the flush_dcache_page after
> > kunmap, Parisc will get a double hit (but it also means Parisc was
> > the only one of those archs that needed explicit cache flushes, where
> > vhost worked correctly so far.. so it kind of proves your point that
> > giving up is the safe choice).
>
> What double hit? If there's no cache to flush then cache flush is a
> no-op. It's also a highly pipelineable no-op because the CPU has the L1
> cache within easy reach. The only event when flush takes a large
> amount time is if we actually have dirty data to write back to main
> memory.
The double hit is in parisc copy_to_user_page:
#define copy_to_user_page(vma, page, vaddr, dst, src, len)       \
        do {                                                      \
                flush_cache_page(vma, vaddr, page_to_pfn(page));  \
                memcpy(dst, src, len);                            \
                flush_kernel_dcache_range_asm((unsigned long)dst, \
                                (unsigned long)dst + len);        \
        } while (0)
That is executed just before kunmap:
static inline void kunmap(struct page *page)
{
        flush_kernel_dcache_page_addr(page_address(page));
}
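So with an unconditional flush added after kunmap, a vhost-style
sequence would flush the same data twice on parisc (an illustrative
sketch, not actual vhost code):

        void *dst = kmap(page);

        copy_to_user_page(vma, page, vaddr, dst, src, len);
                                /* flush #1: flush_kernel_dcache_range_asm() */
        kunmap(page);           /* flush #2: flush_kernel_dcache_page_addr() */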
I can't argue with the fact that your "safer" kunmap is indeed safer,
but we cannot rely on the common code unless we remove some of the
optimizations from the common code abstractions and make all archs do
kunmap the way parisc does.
Thanks,
Andrea