qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: "Michael S . Tsirkin" <mst@redhat.com>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Thomas Huth" <thuth@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	qemu-devel@nongnu.org, "Paolo Bonzini" <pbonzini@redhat.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Eduardo Habkost" <eduardo@habkost.net>
Subject: Re: [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)
Date: Fri, 23 Jun 2023 14:23:58 +0200	[thread overview]
Message-ID: <dbb4c59a-bd50-5f26-27ed-a00ae844a48e@redhat.com> (raw)
In-Reply-To: <2ac86eb2-0f30-e654-25b3-f38793e0fba3@maciej.szmigiero.name>

On 22.06.23 20:45, Maciej S. Szmigiero wrote:
> On 22.06.2023 14:52, David Hildenbrand wrote:
>> On 22.06.23 14:14, Maciej S. Szmigiero wrote:
>>> On 22.06.2023 14:06, David Hildenbrand wrote:
>>>> On 22.06.23 13:17, Maciej S. Szmigiero wrote:
>>>>> On 22.06.2023 13:15, David Hildenbrand wrote:
>>>>>> On 22.06.23 13:12, Maciej S. Szmigiero wrote:
>>>>>>> On 22.06.2023 13:01, David Hildenbrand wrote:
>>>>>>>> [...]
>>>>>>>>
>>>>>>>>>>>> We'd use a memory region container as device memory region (like [1]) and would have to handle the !memdev case (I can help with that). > Into that, you can map the RAM memory region on demand (and eventually even using multiple slots like [1]).
>>>>>>>>>>>>
>>>>>>>>>>>> (2) Use a single virtual DIMM and (un)plug that on demand. Let the machine code handle (un)plugging of the device.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> (1) feels cleanest to me, although it will require a bit more work.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I also think approach (1) makes more sense as it avoids memslot metadata
>>>>>>>>>>> overhead for not-yet-hot-added parts of the memory backing device.
>>>>>>>>>>>
>>>>>>>>>>> Not sure what you mean that the !memdev case would be problematic in this
>>>>>>>>>>> case - it is working in the current driver shape so why would adding
>>>>>>>>>>> potential memory subregions (used in the memdev case) change that?
>>>>>>>>>>
>>>>>>>>>> I'm thinking about the case where you have a hv-balloon device without a memdev.
>>>>>>>>>>
>>>>>>>>>> Without -m X,maxmem=y we don't currently expect to have memory devices around
>>>>>>>>>> (and especially them getting (un)plugged. But why should we "force" to set the
>>>>>>>>>> "maxmem" option
>>>>>>>>>
>>>>>>>>> I guess it's only a small change to QEMU to allow having hv-balloon
>>>>>>>>> device (without a memdev) even in the case where there's no "maxmem"
>>>>>>>>> option given on the QEMU command line.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I hope I'll find some time soonish to prototype what I have in mind, to see
>>>>>>>>>> if it could be made working.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Okay, so I'll wait for your prototype before commencing further work on
>>>>>>>>> the next version of this driver.
>>>>>>>>
>>>>>>>> About to have something simplistic running -- I think. Want to test with a Linux VM, but I don't seem to get it working (also without my changes).
>>>>>>>>
>>>>>>>>
>>>>>>>> #!/bin/bash
>>>>>>>>
>>>>>>>> build/qemu-system-x86_64 \
>>>>>>>>          --enable-kvm \
>>>>>>>>          -m 4G,maxmem=36G \
>>>>>>>>          -cpu host,hv-syndbg=on,hv-synic,hv-relaxed,hv-vpindex \
>>>>>>>>          -smp 16 \
>>>>>>>>          -nographic \
>>>>>>>>          -nodefaults \
>>>>>>>>          -net nic -net user \
>>>>>>>>          -chardev stdio,nosignal,id=serial \
>>>>>>>>          -hda Fedora-Cloud-Base-37-1.7.x86_64.qcow2 \
>>>>>>>>          -cdrom /home/dhildenb/git/cloud-init/cloud-init.iso \
>>>>>>>>          -device isa-serial,chardev=serial \
>>>>>>>>          -chardev socket,id=monitor,path=/var/tmp/mon_src,server,nowait \
>>>>>>>>          -mon chardev=monitor,mode=readline \
>>>>>>>>          -device vmbus-bridge \
>>>>>>>>          -object memory-backend-ram,size=2G,id=mem0 \
>>>>>>>>          -device hv-balloon,id=hv1,memdev=mem0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [root@vm-0 ~]# uname -r
>>>>>>>> 6.3.5-100.fc37.x86_64
>>>>>>>> [root@vm-0 ~]# modprobe hv_balloon
>>>>>>>> modprobe: ERROR: could not insert 'hv_balloon': No such device
>>>>>>>>
>>>>>>>>
>>>>>>>> Any magic flag I am missing? Or is there something preventing this to work with Linux VMs?
>>>>>>>>
>>>>>>>
>>>>>>> Haven't tested the driver with Linux guests in a long time (as it is
>>>>>>> targeting Windows), but I think you need to disable KVM PV interface for
>>>>>>> the Hyper-V one to be detected by Linux.
>>>>>>>
>>>>>>> Something like adding "kvm=off" to "-cpu" and seeing in the dmesg whether
>>>>>>> the detected hypervisor is now Hyper-V.
>>>>>>>
>>>>>>> Also, you need to disable S4 in the guest for hot-add capability to work
>>>>>>> (I'm adding "-global ICH9-LPC.disable_s4=1" with q35 machine for this).
>>>>>>>
>>>>>>> Would also suggest adding "--trace 'hv_balloon_*' --trace 'memory_device_*'"
>>>>>>> to QEMU command line to see what's happening.
>>>>>>
>>>>>> VM is not happy:
>>>>>>
>>>>>> [    1.908595] BUG: kernel NULL pointer dereference, address: 0000000000000007
>>>>>> [    1.908837] #PF: supervisor read access in kernel mode
>>>>>> [    1.908837] #PF: error_code(0x0000) - not-present page
>>>>>> [    1.908837] PGD 0 P4D 0
>>>>>> [    1.908837] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>>>> [    1.908837] CPU: 13 PID: 492 Comm: (udev-worker) Not tainted 6.3.5-100.fc37.x86_64 #1
>>>>>> [    1.908837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-p4
>>>>>> [    1.908837] RIP: 0010:acpi_ns_lookup+0x8f/0x4c0
>>>>>> [    1.908837] Code: 8b 3d f5 eb 1c 03 83 05 52 ec 1c 03 01 48 85 ff 0f 84 51 03 00 00 44 89 c3 4c 89 cb
>>>>>> [    1.908837] RSP: 0018:ffff95b680ad7950 EFLAGS: 00010286
>>>>>> [    1.908837] RAX: ffff95b680ad79e0 RBX: 0000000000000002 RCX: 0000000000000003
>>>>>> [    1.908837] RDX: 0000000000000000 RSI: ffff8a0283a3c558 RDI: ffffffffa4b376e0
>>>>>> [    1.908837] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000
>>>>>> [    1.908837] R10: ffff8a02811034ec R11: 0000000000000000 R12: ffffffffffffffff
>>>>>> [    1.908837] R13: ffff8a02811034e8 R14: ffff8a02811034e8 R15: 0000000000000000
>>>>>> [    1.908837] FS:  00007f3bb2e7d0c0(0000) GS:ffff8a02bbd40000(0000) knlGS:0000000000000000
>>>>>> [    1.908837] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [    1.908837] CR2: 0000000000000007 CR3: 0000000100a58002 CR4: 0000000000770ee0
>>>>>> [    1.908837] PKRU: 55555554
>>>>>> [    1.908837] Call Trace:
>>>>>> [    1.908837]  <TASK>
>>>>>> [    1.908837]  ? __die+0x23/0x70
>>>>>> [    1.908837]  ? page_fault_oops+0x171/0x4e0
>>>>>> [    1.908837]  ? prepare_alloc_pages.constprop.0+0xf6/0x1a0
>>>>>> [    1.908837]  ? exc_page_fault+0x74/0x170
>>>>>> [    1.908837]  ? asm_exc_page_fault+0x26/0x30
>>>>>> [    1.908837]  ? acpi_ns_lookup+0x8f/0x4c0
>>>>>> [    1.908837]  acpi_ns_get_node_unlocked+0xdd/0x110
>>>>>> [    1.908837]  ? down_timeout+0x3e/0x60
>>>>>> [    1.908837]  ? acpi_ns_get_node+0x3e/0x60
>>>>>> [    1.908837]  acpi_ns_get_node+0x3e/0x60
>>>>>> [    1.908837]  acpi_ns_evaluate+0x1cb/0x2d0
>>>>>> [    1.908837]  acpi_ut_evaluate_object+0x68/0x1c0
>>>>>> [    1.908837]  acpi_rs_get_method_data+0x37/0x80
>>>>>> [    1.908837]  ? __pfx_vmbus_walk_resources+0x10/0x10 [hv_vmbus]
>>>>>> [    1.908837]  acpi_walk_resources+0x91/0xe0
>>>>>> [    1.908837]  vmbus_acpi_add+0x87/0x170 [hv_vmbus]
>>>>>> [    1.908837]  acpi_device_probe+0x47/0x160
>>>>>> [    1.908837]  really_probe+0x19f/0x400
>>>>>> [    1.908837]  ? __pfx___driver_attach+0x10/0x10
>>>>>> [    1.908837]  __driver_probe_device+0x78/0x160
>>>>>> [    1.908837]  driver_probe_device+0x1f/0x90
>>>>>> [    1.908837]  __driver_attach+0xd2/0x1c0
>>>>>> [    1.908837]  bus_for_each_dev+0x85/0xd0
>>>>>> [    1.908837]  bus_add_driver+0x116/0x220
>>>>>> [    1.908837]  driver_register+0x59/0x100
>>>>>> [    1.908837]  ? __pfx_init_module+0x10/0x10 [hv_vmbus]
>>>>>> [    1.908837]  hv_acpi_init+0x39/0xff0 [hv_vmbus]
>>>>>> [    1.908837]  ? __pfx_init_module+0x10/0x10 [hv_vmbus]
>>>>>> [    1.908837]  do_one_initcall+0x5a/0x240
>>>>>> [    1.908837]  do_init_module+0x4a/0x210
>>>>>> [    1.908837]  __do_sys_init_module+0x17f/0x1b0
>>>>>> [    1.908837]  do_syscall_64+0x5c/0x90
>>>>>> [    1.908837]  ? handle_mm_fault+0x11e/0x310
>>>>>> [    1.908837]  ? do_user_addr_fault+0x1e0/0x720
>>>>>> [    1.908837]  ? exc_page_fault+0x74/0x170
>>>>>> [    1.908837]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>>>>
>>>>>
>>>>> I guess *few* people run Linux with QEMU Hyper-V interfaces
>>>>> implementation..
>>>>>> Maybe I'll have to install a Windows guest :/
>>>>>>
>>>>> I think that makes more sense, since we're targeting Windows anyway.
>>>>>
>>>>
>>>> Having installed fairly recent Win10 and running with master+your patches, I still can't get it to work.
>>>>
>>>> Windows is stuck booting (before the little circle starts turning).
>>>>
>>>> Removing the hv-balloon device makes it work again (well, at least the circle spins again my windows installation now seems to be broken and I have to reinstall ... windows).
>>>>
>>>> Do you have a working cmdline for Windows I can try?
>>>>
>>>
>>> Do you get any tracing output from the driver when it is loaded or does its mere presence
>>> make your Windows installation fail to boot?
>>>
>>> I'm was testing using "qemu-system-x86_64 -machine q35,accel=kvm \
>>>              -cpu host,vmx,-mpx,-pmu,check,hv-crash,hv-relaxed,hv-time,hv-vapic,hv-spinlocks=0x1fff,hv-
>>
>> ^ That magic collection of flags seems to make Win10 happy.
> 
> I will try to determine the minimum set of Hyper-V flags required
> for Windows guests since having to manually specify the "right"
> set seems like a poor UX (and also try to determine why Linux
> guest crashes in ACPI when booted with above flags).
> 
> But I think both of these problems are related more to VMBus support
> itself rather than hv-balloon.

That's also my feeling. It would be great to extend

https://www.qemu.org/docs/master/system/i386/hyperv.html

and include some of the dirty details in you next submission, so it's 
easier to get running.

[...]

>> Maybe the VM needs some time after start-up to actually
>> process requests.
> 
> Yes, there's a waiting period immediately after the guest
> boot before executing any protocol operations in the guest:
>> /*
>>   * Some Windows versions (at least Server 2019) will crash with various
>>   * error codes when receiving DM protocol requests (at least
>>   * DM_MEM_HOT_ADD_REQUEST) immediately after boot.
>>   *
>>   * It looks like Hyper-V from Server 2016 uses a 50-second after-boot
>>   * delay, probably to workaround this issue, so we'll use this value, too.
>>   */
>> #define HV_BALLOON_POST_INIT_WAIT (50 * 1000)
> 
> 
> That's why I suggested running with tracing on, since then
> you can observe the device waiting in the "POST_INIT_WAIT" state
> (and also how things are proceeding in general).

Ah, makes sense.

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2023-06-23 12:24 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-12 14:00 [PATCH][RESEND v5 0/3] Hyper-V Dynamic Memory Protocol driver (hv-balloon 🎈️) Maciej S. Szmigiero
2023-06-12 14:00 ` [PATCH][RESEND v5 1/3] error: define g_autoptr() cleanup function for the Error type Maciej S. Szmigiero
2023-06-12 14:00 ` [PATCH][RESEND v5 2/3] Add Hyper-V Dynamic Memory Protocol definitions Maciej S. Szmigiero
2023-06-12 14:00 ` [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon) Maciej S. Szmigiero
2023-06-12 17:42   ` David Hildenbrand
2023-06-13 17:57     ` Maciej S. Szmigiero
2023-06-19 15:58       ` David Hildenbrand
2023-06-20 20:13         ` Maciej S. Szmigiero
2023-06-21 10:32           ` David Hildenbrand
2023-06-21 18:17             ` Maciej S. Szmigiero
2023-06-22 11:01               ` David Hildenbrand
2023-06-22 11:12                 ` Maciej S. Szmigiero
2023-06-22 11:15                   ` David Hildenbrand
2023-06-22 11:17                     ` Maciej S. Szmigiero
2023-06-22 12:06                       ` David Hildenbrand
2023-06-22 12:14                         ` Maciej S. Szmigiero
2023-06-22 12:27                           ` Maciej S. Szmigiero
2023-06-22 12:52                           ` David Hildenbrand
2023-06-22 18:45                             ` Maciej S. Szmigiero
2023-06-23 12:23                               ` David Hildenbrand [this message]
2023-06-23 18:04                               ` Maciej S. Szmigiero

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dbb4c59a-bd50-5f26-27ed-a00ae844a48e@redhat.com \
    --to=david@redhat.com \
    --cc=alex.bennee@linaro.org \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=eblake@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=mail@maciej.szmigiero.name \
    --cc=marcandre.lureau@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).