public inbox for linux-kernel@vger.kernel.org
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org,
	Stephen Hemminger <sthemmin@microsoft.com>,
	Jork Loeser <Jork.Loeser@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>, X86 ML <x86@kernel.org>,
	"linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH v3 08/10] x86/hyper-v: use hypercall for remote TLB flush
Date: Tue, 23 May 2017 14:36:42 +0200	[thread overview]
Message-ID: <87bmqju4v9.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <CALCETrWApganOO+Tqi63u-HW1d=j0iELFbioNaEf_hBUMoYhpQ@mail.gmail.com> (Andy Lutomirski's message of "Mon, 22 May 2017 11:28:06 -0700")

[-- Attachment #1: Type: text/plain, Size: 5498 bytes --]

Andy Lutomirski <luto@kernel.org> writes:

> On Mon, May 22, 2017 at 3:43 AM, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>> Andy Lutomirski <luto@kernel.org> writes:
>>
>>> On 05/19/2017 07:09 AM, Vitaly Kuznetsov wrote:
>>>> The Hyper-V host can suggest that we use a hypercall for remote TLB
>>>> flushes; this is supposed to be faster than IPIs.
>>>>
>>>> Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
>>>> we need to put the input somewhere in memory, and we don't really want to
>>>> allocate memory on each call, so we pre-allocate per-cpu memory areas
>>>> on boot. These areas are of fixed size, limited to an arbitrary number
>>>> of 16 (16 gvas are able to specify 16 * 4096 pages).
>>>>
>>>> pv_ops patching is happening very early so we need to separate
>>>> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
>>>>
>>>> It is possible and easy to implement local TLB flushing too and there is
>>>> even a hint for that. However, I don't see a room for optimization on the
>>>> host side as both hypercall and native tlb flush will result in vmexit. The
>>>> hint is also not set on modern Hyper-V versions.
>>>
>>> Why do local flushes exit?
>>
>> "exist"? I don't know, to be honest. To me it makes no difference from
>> the hypervisor's point of view, as intercepting TLB-flushing instructions
>> is not any different from implementing a hypercall.
>>
>> Hyper-V gives its guests 'hints' to indicate if they need to use
>> hypercalls for remote/local TLB flushes, and I don't remember seeing
>> the 'local' bit set.
>
> What I meant was: why aren't local flushes handled directly in the
> guest without exiting to the host?  Or are they?  In principle,
> INVPCID should just work, right?  Even reading and writing CR3 back
> should work if the hypervisor sets up the magic list of allowed CR3
> values, right?
>
> I guess on older CPUs there might not be any way to flush the local
> TLB without exiting, but I'm not *that* familiar with the details of
> the virtualization extensions.
>

Right, local flushes should 'just work'. If for whatever reason the
hypervisor decides to trap us, there's nothing we can do about it.

>>
>>>
>>>> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>>>> +                                struct mm_struct *mm, unsigned long start,
>>>> +                                unsigned long end)
>>>> +{
>>>
>>> What tree will this go through?  I'm about to send a signature change
>>> for this function for tip:x86/mm.
>>
>> I think this was going to get through Greg's char-misc tree but if we
>> need to synchronize I think we can push this through x86.
>
> Works for me.  Linus can probably resolve the trivial conflict.  But
> going through the x86 tree might make sense here if that's okay with
> you.
>

Definitely fine with me, I'll leave this decision up to x86 maintainers,
Hyper-V maintainers, and Greg.

>>
>>>
>>> Also, how would this interact with PCID?  I have PCID patches that I'm
>>> pretty happy with now, and I'm hoping to support PCID in 4.13.
>>>
>>
>> Sorry, I wasn't following this work closely. The .flush_tlb_others() hook
>> is not going away from pv_mmu_ops, right? In that case we can have both in
>> 4.13. Or do you see any other clashes?
>>
>
> The issue is that I'm changing the whole flush algorithm.  The main
> patch that affects this is here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/pcid&id=a67bff42e1e55666fdbaddf233a484a8773688c1
>
> The interactions between that patch and paravirt flush helpers may be
> complex, and it'll need some thought.  PCID makes everything even more
> subtle, so just turning off PCID when paravirt flush is involved seems
> the safest for now.  Ideally we'd eventually support PCID and paravirt
> flushes together (and even eventual native remote flushes assuming
> they ever get added).

I see. On Hyper-V, the HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST hypercall's
interface is:
1) A list of entries to flush. Each entry is a PFN, and its lower 12 bits
encode the number of pages after this one (the one defined by the PFN)
we'd like to flush. We can flush up to 509 entries with one
hypercall (this can be extended, but that requires a pre-allocated memory
region).

2) A processor mask.

3) An address space id (all 64 bits of CR3; not sure how it's used within
the hypervisor).

HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX is more or less the same, but we
need more space to specify > 64 vCPUs, so we'll be able to pass fewer
than 509 entries.

The main advantage over sending IPIs, as far as I understand, is that
virtual CPUs which are not currently scheduled don't need flushing, and
we can't know which ones those are from within the guest.

I agree that disabling PCID for paravirt flush users is a good option
for now; let's have it merged and tested without this additional
complexity and make another round afterwards.

>
> Also, can you share the benchmark you used for these patches?

I didn't do much while writing the patchset; mostly I was running the
attached dumb thrasher (32 pthreads doing mmap/munmap). On a 16-vCPU
Hyper-V 2016 guest I get the following (just re-did the test with
4.12-rc1):

Before the patchset:
# time ./pthread_mmap ./randfile 

real	3m33.118s
user	0m3.698s
sys	3m16.624s

After the patchset:
# time ./pthread_mmap ./randfile 

real	2m19.920s
user	0m2.662s
sys	2m9.948s

K. Y.'s team at Microsoft did additional testing of the patchset on
different Hyper-V deployments, including Azure; they may share their
findings too.

-- 
  Vitaly


[-- Attachment #2: pthread_mmap.c --]
[-- Type: text/plain, Size: 1195 bytes --]

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define nthreads 32
#define pagecount 16384
#define nrounds 10000
#define nchunks 10
#define PAGE_SIZE 4096

int fd;
unsigned long v;

void *threadf(void *ptr)
{
	unsigned long *addr;
	int i, j;

	for (j = 0; j < nrounds; j++) {
		for (i = 0; i < nchunks; i++) {
			addr = mmap(NULL, PAGE_SIZE * pagecount, PROT_READ, MAP_SHARED, fd, i * PAGE_SIZE);
			if (addr == MAP_FAILED) {
				fprintf(stderr, "mmap\n");
				exit(1);
			}
			v += *addr;
			munmap(addr, PAGE_SIZE * pagecount);
		}
	}

	return NULL;
}

int main(int argc, char *argv[]) {
	pthread_t thr[nthreads];
	int i;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <some-big-file>\n", argv[0]);
		exit(1);
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		fprintf(stderr, "open\n");
		exit(1);
	}
	
	for (i = 0; i < nthreads; i++) {
		if(pthread_create(&thr[i], NULL, threadf, NULL)) {
			fprintf(stderr, "pthread_create\n");
			exit(1);
		}
	}

	for (i = 0; i < nthreads; i++) {
		if(pthread_join(thr[i], NULL)) {
			fprintf(stderr, "pthread_join\n");
			exit(1);
		}
	}

	return 0;
}

