linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jon Doron <arilou@gmail.com>
To: Roman Kagan <rvkagan@yandex-team.ru>,
	kvm@vger.kernel.org, linux-hyperv@vger.kernel.org,
	vkuznets@redhat.com
Subject: Re: [PATCH v2 0/1] x86/kvm/hyper-v: Add support to SYNIC exit on EOM
Date: Sat, 2 May 2020 17:47:15 +0300	[thread overview]
Message-ID: <20200502144715.GA2862@jondnuc> (raw)
In-Reply-To: <20200425061637.GF1917435@jondnuc>

On 25/04/2020, Jon Doron wrote:
>On 24/04/2020, Roman Kagan wrote:
>>On Sat, Apr 18, 2020 at 09:41:27AM +0300, Jon Doron wrote:
>>>On 17/04/2020, Roman Kagan wrote:
>>>> On Thu, Apr 16, 2020 at 03:54:30PM +0300, Jon Doron wrote:
>>>> > On 16/04/2020, Roman Kagan wrote:
>>>> > > On Thu, Apr 16, 2020 at 11:38:46AM +0300, Jon Doron wrote:
>>>> > > > According to the TLFS:
>>>> > > > "A write to the end of message (EOM) register by the guest causes the
>>>> > > > hypervisor to scan the internal message buffer queue(s) associated with
>>>> > > > the virtual processor.
>>>> > > >
>>>> > > > If a message buffer queue contains a queued message buffer, the hypervisor
>>>> > > > attempts to deliver the message.
>>>> > > >
>>>> > > > Message delivery succeeds if the SIM page is enabled and the message slot
>>>> > > > corresponding to the SINTx is empty (that is, the message type in the
>>>> > > > header is set to HvMessageTypeNone).
>>>> > > > If a message is successfully delivered, its corresponding internal message
>>>> > > > buffer is dequeued and marked free.
>>>> > > > If the corresponding SINTx is not masked, an edge-triggered interrupt is
>>>> > > > delivered (that is, the corresponding bit in the IRR is set).
>>>> > > >
>>>> > > > This register can be used by guests to poll for messages. It can also be
>>>> > > > used as a way to drain the message queue for a SINTx that has
>>>> > > > been disabled (that is, masked)."
>>>> > >
>>>> > > Doesn't this work already?
>>>> > >
>>>> >
>>>> > Well if you dont have SCONTROL and a GSI associated with the SINT then it
>>>> > does not...
>>>>
>>>> Yes you do need both of these.
>>>>
>>>> > > > So basically this means that we need to exit on EOM so the hypervisor
>>>> > > > will have a chance to send all the pending messages regardless of the
>>>> > > > SCONTROL mechnaisim.
>>>> > >
>>>> > > I might be misinterpreting the spec, but my understanding is that
>>>> > > SCONTROL {en,dis}ables the message queueing completely.  What the quoted
>>>> > > part means is that a write to EOM should trigger the message source to
>>>> > > push a new message into the slot, regardless of whether the SINT was
>>>> > > masked or not.
>>>> > >
>>>> > > And this (I think, haven't tested) should already work.  The userspace
>>>> > > just keeps using the SINT route as it normally does, posting
>>>> > > notifications to the corresponding irqfd when posting a message, and
>>>> > > waiting on the resamplerfd for the message slot to become free.  If the
>>>> > > SINT is masked KVM will skip injecting the interrupt, that's it.
>>>> > >
>>>> > > Roman.
>>>> >
>>>> > That's what I was thinking originally as well, but then i noticed KDNET as a
>>>> > VMBus client (and it basically runs before anything else) is working in this
>>>> > polling mode, where SCONTROL is disabled and it just loops, and if it saw
>>>> > there is a PENDING message flag it will issue an EOM to indicate it has free
>>>> > the slot.
>>>>
>>>> Who sets up the message page then?  Doesn't it enabe SCONTROL as well?
>>>>
>>>
>>>KdNet is the one setting the SIMP and it's not setting the SCONTROL, ill
>>>paste output of KVM traces for the relevant MSRs
>>>
>>>> Note that, even if you don't see it being enabled by Windows, it can be
>>>> enabled by the firmware and/or by the bootloader.
>>>>
>>>> Can you perhaps try with the SeaBIOS from
>>>> https://src.openvz.org/projects/UP/repos/seabios branch hv-scsi?  It
>>>> enables SCONTROL and leaves it that way.
>>>>
>>>> I'd also suggest tracing kvm_msr events (both reads and writes) for
>>>> SCONTROL and SIMP msrs, to better understand the picture.
>>>>
>>>> So far the change you propose appears too heavy to work around the
>>>> problem of disabled SCONTROL.  You seem to be better off just making
>>>> sure it's enabled (either by the firmware or slighly violating the spec
>>>> and initializing to enabled from the start), and sticking to the
>>>> existing infrastructure for posting messages.
>>>>
>>>
>>>I guess there is something I'm missing here but let's say the BIOS would
>>>have set the SCONTROL but the OS is not setting it, who is in charge of
>>>handling the interrupts?
>>
>>SCONTROL doesn't enable the interrupts, it enables SynIC as a whole.
>>The interrupts are enabled via individual SINTx msrs.  This SeaBIOS
>>branch does exactly this: it enables the SynIC via SCONTROL, and then
>>specific SynIC functionality via SIMP/SIEFP, but doesn't activate SINTx
>>and works in polling mode.
>>
>>I agree that this global SCONTROL switch seems redundant but it appears
>>to match the spec.
>>
>>>> > (There are a bunch of patches i sent on the QEMU mailing list as well  where
>>>> > i CCed you, I will probably revise it a bit but was hoping to get  KVM
>>>> > sorted out first).
>>>>
>>>> I'll look through the archive, should be there, thanks.
>>>>
>>>> Roman.
>>>
>>>I tried testing with both the SeaBIOS branch you have suggested and the
>>>EDK2, unfortunately I could not get the EDK2 build to identify my VM drive
>>>to boot from (not sure why)
>>>
>>>Here is an output of KVM trace for the relevant MSRs (SCONTROL and SIMP)
>>>
>>>QEMU Default BIOS
>>>-----------------
>>> qemu-system-x86-613   [000] ....  1121.080722: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000080 data 0x0 host 1
>>> qemu-system-x86-613   [000] ....  1121.080722: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x0 host 1
>>> qemu-system-x86-613   [000] .N..  1121.095592: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000080 data 0x0 host 1
>>> qemu-system-x86-613   [000] .N..  1121.095592: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x0 host 1
>>>Choose Windows DebugEntry
>>> qemu-system-x86-613   [001] ....  1165.185227: kvm_msr: msr_read 40000083 = 0x0
>>> qemu-system-x86-613   [001] ....  1165.185255: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0xfa1001 host 0
>>> qemu-system-x86-613   [001] ....  1165.185255: kvm_msr: msr_write 40000083 = 0xfa1001
>>> qemu-system-x86-613   [001] ....  1165.193206: kvm_msr: msr_read 40000083 = 0xfa1001
>>> qemu-system-x86-613   [001] ....  1165.193236: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0xfa1000 host 0
>>> qemu-system-x86-613   [001] ....  1165.193237: kvm_msr: msr_write 40000083 = 0xfa1000
>>>
>>>
>>>SeaBIOS hv-scsci
>>>----------------
>>> qemu-system-x86-656   [001] ....  1313.072714: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000080 data 0x0 host 1
>>> qemu-system-x86-656   [001] ....  1313.072714: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x0 host 1
>>> qemu-system-x86-656   [001] ....  1313.087752: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000080 data 0x0 host 1
>>> qemu-system-x86-656   [001] ....  1313.087752: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x0 host 1
>>
>>Initialization (host == 1)
>>
>>> qemu-system-x86-656   [001] ....  1313.156675: kvm_msr: msr_read 40000083 = 0x0
>>> qemu-system-x86-656   [001] ....  1313.156680: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x7fffe001 host 0
>>>Choose Windows DebugEntry
>>
>>I guess this is a bit misplaced timewise, BIOS is still working here
>>
>>> qemu-system-x86-656   [001] ....  1313.156680: kvm_msr: msr_write 40000083 = 0x7fffe001
>>
>>BIOS sets up message page
>>
>>> qemu-system-x86-656   [001] ....  1313.162111: kvm_msr: msr_read 40000080 = 0x0
>>> qemu-system-x86-656   [001] ....  1313.162118: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000080 data 0x1 host 0
>>> qemu-system-x86-656   [001] ....  1313.162119: kvm_msr: msr_write 40000080 = 0x1
>>
>>BIOS activates SCONTROL
>>
>>> qemu-system-x86-656   [001] ....  1313.246758: kvm_msr: msr_read 40000083 = 0x7fffe001
>>> qemu-system-x86-656   [001] ....  1313.246764: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x0 host 0
>>> qemu-system-x86-656   [001] ....  1313.246764: kvm_msr: msr_write 40000083 = 0x0
>>
>>BIOS clears message page (it's not needed once the VMBus device was
>>brought up)
>>
>>I guess the choice of Windows DebugEntry appeared somewhere here.
>>
>>> qemu-system-x86-656   [001] ....  1348.904727: kvm_msr: msr_read 40000083 = 0x0
>>> qemu-system-x86-656   [001] ....  1348.904771: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0xfa1001 host 0
>>> qemu-system-x86-656   [001] ....  1348.904772: kvm_msr: msr_write 40000083 = 0xfa1001
>>
>>Bootloader (debug stub?) sets up the message page
>>
>>> qemu-system-x86-656   [001] ....  1348.919170: kvm_msr: msr_read 40000083 = 0xfa1001
>>> qemu-system-x86-656   [001] ....  1348.919183: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0xfa1000 host 0
>>> qemu-system-x86-656   [001] ....  1348.919183: kvm_msr: msr_write 40000083 = 0xfa1000
>>
>>Message page is being disabled again.
>>
>>I guess you only filtered SCONTROL and SIMP, skipping e.g. SVERSION,
>>GUEST_OS_ID, HYPERCALL, etc., which are also part of the exchange here.
>>
>
>Right my bad :( if you want I can re-run the test with the others as 
>well (do you need me to?)
>
>>> I could not get the EDK2 setup to work though
>>> (https://src.openvz.org/projects/UP/repos/edk2 branch hv-scsi)
>>>
>>>It does not detect my VM hard drive not sure why (this is how i  configured
>>>it:
>>> -drive file=./win10.qcow2,format=qcow2,if=none,id=drive_disk0 \
>>> -device virtio-blk-pci,drive=drive_disk0 \
>>>
>>>(Is there something special i need to configure it order for it to  work?, I
>>>tried building EDK2 with and without SMM_REQUIRE and  SECURE_BOOT_ENABLE)
>>
>>No special configuration I can think of.
>>
>>>But in general it sounds like there is something I dont fully understand
>>>when SCONTROL is enabled, then a GSI is associated with this SintRoute.
>>>
>>>Then when the guest triggers an EOI via the APIC we will trigger the GSI
>>>notification, which will give us another go on trying to copy the message
>>>into it's slot.
>>
>>Right.
>>
>>>So is it the OS that is in charge of setting the EOI?
>>
>>Yes.
>>
>>>If so then it needs to
>>>be aware of SCONTROL being enabled and just having it left set by the BIOS
>>>might not be enough?
>>
>>Yes it needs to be aware of SCONTROL being enabled.  However, this
>>awareness may be based on a pure assumption that the previous entity
>>(BIOS or bootloader) did it already.
>>
>>>Also in the TLFS (looking at v6) they mention that message queueing has "3
>>>exit conditions", which will cause the hypervisor to try and attempt to
>>>deliver the additional messages.
>>>
>>>The 3 exit conditions they refer to are:
>>>* Another message buffer is queued.
>>>* The guest indicates the “end of interrupt” by writing to the APIC’s   EOI
>>>register.
>>>* The guest indicates the “end of message” by writing to the SynIC’s EOM
>>>register.
>>>
>>>Also notice this additional exit is only if there is a pending message and
>>>not for every EOM.
>>
>>This meaning of "exit" doesn't trivially correspond to what we have in
>>KVM.  A write to an msr does cause a vmexit.  Then KVM notifies resample
>>eventfds for all SINTs that have them set up, no matter if there's a
>>pending message in the slot.  It may be slightly more optimal to only
>>notify those having indicated a pending message, but I don't see the
>>current behavior break anything or violate the spec, so, as EOMs are not
>>used on fast paths, I woudn't bother optimizing.
>>
>>Roman.
>
>Hi Roman,
>
>So based on your answer I got to the following conclusions (correct if 
>they are wrong).
>
>First of the one in charge of setting the SCONTROL in the 1st place is 
>the BIOS (I dont have a real Hyper-V setup so I cannot really debug it 
>and see, not sure which BIOS they have or if we can "rip" it out and 
>run it through KVM and see how things look like this way).
>
>If the BIOS has not set the SCONTROL I would expect the OS to have 
>something along the lines:
>if (!(get_scontrol() & ENABLED))
>    set_scontrol(ENABLED);
>
>So I started looking through the entire Windows system looking what 
>can set SCONTROL, I believe I have found the flow to be the following:
>
>VMBus.sys imports winhv.sys (which is an export library) winhv.sys 
>will set the SCONTROL prior to VMBus DriverEntry starting here is the 
>complete flow:
>winhv!DllInitialize -> winhv!WinHvpInitialize -> 
>winhv!WinHvReportPresentHypervisor -> winhv!WinHvpConnectToSynic -> 
>winhv!WinHvpEnableSynic
>
>Eventually WinHvpEnableSynic will simply set SCONTROL (for future 
>reference if anyone needs to look into how HyperV register access 
>works in Windows it seems like there is an enum representing all the 
>HyperV registers and to access it there are helper functions to 
>Get/Set.
>SCONTROL value in the enum is 0x0a0010 .
>
>winhv.sys simply provides very simple API to access the Sints i.e 
>(WinHvSetSint / WinHvSetEndOfMessage / WinHvSetSintOnCurrentProcessor 
>/  WinHvGetSintMessage / etc.)
>
>So basically it seems like the OS does not really care if the BIOS has 
>setup the SCONTROL or not, and does so always (if it can) 
>unfortunately in my flow (via kdnet) VMBus is not loaded yet and so 
>does winhv.sys so they "fallback" into this Polling mode.
>
>So that covers the OS part, after that I have tried looking for 
>relevant code in bootmgr and winload (which are Windows boot loader 
>part (like grub) and I could not find any code that might setup 
>SCONTROL.
>
>From your experience with this did you see Hyper-V BIOS simply setting 
>the SCONTROL? Perhaps if that's the case then the correct fix needs to 
>be in the SeaBIOS and the EDK .
>
>I tried to see if Hyper-V supports giving it a BIOS but could not find 
>anyway of doing this, so it just might be that Hyper-V assumes the 
>BIOS is in charge of setting up SCONTROL for all the boot loader 
>components.
>
>But in a way it sounds weird because I would expect to see KDNet 
>working with the ACPI to trigger the GSI but I could not find any 
>relevant code that might do that.
>
>As I write this I think I'm starting to get your point just to make 
>sure I understand it:
>
>1. When a new SintRoute is created we associate it with a GSI
>2. When an EOM is set, we trigger all the GSIs so QEMU will get    
>execution time and send all pending messages if it can.
>
>So basically like you said everything "works" from our perspective 
>regardless if the system has setup SCONTROL or not, because you 
>trigger the interrupt to QEMU regardless of SCONTROL so it can clear 
>the pending message.
>
>If that's indeed the case then probably the only thing needs fixing in 
>my scenario is in QEMU where it should not really care for the 
>SCONTROL if it's enabled or not.
>
>Sounds about right?
>
>Thanks,
>-- Jon.

Hi Roman, any chance you can have a quick look at this and see if I 
understood you correctly?

Thanks,
-- Jon.

  reply	other threads:[~2020-05-02 14:47 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-16  8:38 [PATCH v2 0/1] x86/kvm/hyper-v: Add support to SYNIC exit on EOM Jon Doron
2020-04-16  8:38 ` [PATCH v2 1/1] " Jon Doron
2020-04-16 12:00 ` [PATCH v2 0/1] " Roman Kagan
2020-04-16 12:54   ` Jon Doron
2020-04-17 10:42     ` Roman Kagan
2020-04-18  6:41       ` Jon Doron
2020-04-24 12:20         ` Jon Doron
2020-04-24 13:37         ` Roman Kagan
2020-04-25  6:16           ` Jon Doron
2020-05-02 14:47             ` Jon Doron [this message]
2020-05-03 19:19             ` Roman Kagan
2020-05-04 15:55               ` Vitaly Kuznetsov
2020-05-05  8:01                 ` Roman Kagan
2020-05-05 10:38                   ` Jon Doron
2020-05-05 20:00                     ` Roman Kagan
2020-05-06  4:49                       ` Jon Doron
2020-05-06  8:46                         ` Roman Kagan
2020-05-07  3:00                           ` Jon Doron
2020-05-08 14:29                             ` Jon Doron
2020-05-08 16:56                               ` Roman Kagan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200502144715.GA2862@jondnuc \
    --to=arilou@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=rvkagan@yandex-team.ru \
    --cc=vkuznets@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).