From: David Hildenbrand <david@redhat.com>
To: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: "Michael S . Tsirkin" <mst@redhat.com>,
"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
"Alex Bennée" <alex.bennee@linaro.org>,
"Thomas Huth" <thuth@redhat.com>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Eric Blake" <eblake@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
qemu-devel@nongnu.org, "Paolo Bonzini" <pbonzini@redhat.com>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Eduardo Habkost" <eduardo@habkost.net>
Subject: Re: [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)
Date: Fri, 23 Jun 2023 14:23:58 +0200
Message-ID: <dbb4c59a-bd50-5f26-27ed-a00ae844a48e@redhat.com>
In-Reply-To: <2ac86eb2-0f30-e654-25b3-f38793e0fba3@maciej.szmigiero.name>

On 22.06.23 20:45, Maciej S. Szmigiero wrote:
> On 22.06.2023 14:52, David Hildenbrand wrote:
>> On 22.06.23 14:14, Maciej S. Szmigiero wrote:
>>> On 22.06.2023 14:06, David Hildenbrand wrote:
>>>> On 22.06.23 13:17, Maciej S. Szmigiero wrote:
>>>>> On 22.06.2023 13:15, David Hildenbrand wrote:
>>>>>> On 22.06.23 13:12, Maciej S. Szmigiero wrote:
>>>>>>> On 22.06.2023 13:01, David Hildenbrand wrote:
>>>>>>>> [...]
>>>>>>>>
>>>>>>>>>>>> We'd use a memory region container as device memory region (like [1]) and would have to handle the !memdev case (I can help with that). Into that, you can map the RAM memory region on demand (and eventually even using multiple slots like [1]).
>>>>>>>>>>>>
>>>>>>>>>>>> (2) Use a single virtual DIMM and (un)plug that on demand. Let the machine code handle (un)plugging of the device.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> (1) feels cleanest to me, although it will require a bit more work.
>>>>>>>>>>>>
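
To make (1) a bit more concrete, here is a rough, untested sketch of what I
have in mind (made-up names, not the actual implementation): the device
memory region is an initially empty container, and the memdev RAM only gets
mapped into it once the guest accepts hot-added memory, so no KVM memslot
exists before that point.

#include "qemu/osdep.h"
#include "exec/memory.h"
#include "sysemu/hostmem.h"

typedef struct HvBalloonSketch {
    MemoryRegion container;     /* what md->get_memory_region() would return */
    HostMemoryBackend *memdev;  /* optional; NULL means ballooning only */
    bool ram_mapped;
} HvBalloonSketch;

static void sketch_init_container(HvBalloonSketch *s, Object *owner,
                                  uint64_t size)
{
    /* Empty container: reserves guest address space, creates no memslot. */
    memory_region_init(&s->container, owner, "hv-balloon-container", size);
}

static void sketch_map_ram(HvBalloonSketch *s)
{
    if (!s->memdev || s->ram_mapped) {
        return;
    }
    /* Only now does the backend RAM become guest-visible (one memslot);
     * memory_region_del_subregion() would undo it again. */
    memory_region_add_subregion(&s->container, 0,
                                host_memory_backend_get_memory(s->memdev));
    s->ram_mapped = true;
}
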
>>>>>>>>>>>
>>>>>>>>>>> I also think approach (1) makes more sense as it avoids memslot metadata
>>>>>>>>>>> overhead for not-yet-hot-added parts of the memory backing device.
>>>>>>>>>>>
>>>>>>>>>>> Not sure what you mean that the !memdev case would be problematic in this
>>>>>>>>>>> case - it is working in the current driver shape so why would adding
>>>>>>>>>>> potential memory subregions (used in the memdev case) change that?
>>>>>>>>>>
>>>>>>>>>> I'm thinking about the case where you have a hv-balloon device without a memdev.
>>>>>>>>>>
>>>>>>>>>> Without -m X,maxmem=y we don't currently expect to have memory devices around
>>>>>>>>>> (and especially not them getting (un)plugged). But why should we "force" users to
>>>>>>>>>> set the "maxmem" option?
>>>>>>>>>
>>>>>>>>> I guess it's only a small change to QEMU to allow having a hv-balloon
>>>>>>>>> device (without a memdev) even in the case where there's no "maxmem"
>>>>>>>>> option given on the QEMU command line.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I hope I'll find some time soonish to prototype what I have in mind, to see
>>>>>>>>>> if it could be made to work.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Okay, so I'll wait for your prototype before commencing further work on
>>>>>>>>> the next version of this driver.
>>>>>>>>
>>>>>>>> About to have something simplistic running -- I think. Want to test with a Linux VM, but I don't seem to get it working (even without my changes).
>>>>>>>>
>>>>>>>>
>>>>>>>> #!/bin/bash
>>>>>>>>
>>>>>>>> build/qemu-system-x86_64 \
>>>>>>>> --enable-kvm \
>>>>>>>> -m 4G,maxmem=36G \
>>>>>>>> -cpu host,hv-syndbg=on,hv-synic,hv-relaxed,hv-vpindex \
>>>>>>>> -smp 16 \
>>>>>>>> -nographic \
>>>>>>>> -nodefaults \
>>>>>>>> -net nic -net user \
>>>>>>>> -chardev stdio,nosignal,id=serial \
>>>>>>>> -hda Fedora-Cloud-Base-37-1.7.x86_64.qcow2 \
>>>>>>>> -cdrom /home/dhildenb/git/cloud-init/cloud-init.iso \
>>>>>>>> -device isa-serial,chardev=serial \
>>>>>>>> -chardev socket,id=monitor,path=/var/tmp/mon_src,server,nowait \
>>>>>>>> -mon chardev=monitor,mode=readline \
>>>>>>>> -device vmbus-bridge \
>>>>>>>> -object memory-backend-ram,size=2G,id=mem0 \
>>>>>>>> -device hv-balloon,id=hv1,memdev=mem0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [root@vm-0 ~]# uname -r
>>>>>>>> 6.3.5-100.fc37.x86_64
>>>>>>>> [root@vm-0 ~]# modprobe hv_balloon
>>>>>>>> modprobe: ERROR: could not insert 'hv_balloon': No such device
>>>>>>>>
>>>>>>>>
>>>>>>>> Any magic flag I am missing? Or is there something preventing this from working with Linux VMs?
>>>>>>>>
>>>>>>>
>>>>>>> Haven't tested the driver with Linux guests in a long time (as it is
>>>>>>> targeting Windows), but I think you need to disable the KVM PV interface for
>>>>>>> the Hyper-V one to be detected by Linux.
>>>>>>>
>>>>>>> Something like adding "kvm=off" to "-cpu" and seeing in the dmesg whether
>>>>>>> the detected hypervisor is now Hyper-V.
>>>>>>>
>>>>>>> Also, you need to disable S4 in the guest for hot-add capability to work
>>>>>>> (I'm adding "-global ICH9-LPC.disable_s4=1" with q35 machine for this).
>>>>>>>
>>>>>>> Would also suggest adding "--trace 'hv_balloon_*' --trace 'memory_device_*'"
>>>>>>> to the QEMU command line to see what's happening.
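
(For reference, those suggestions combined amount to roughly this delta
against my command line above -- a sketch only; -machine q35 is needed so
that the ICH9-LPC property exists:)

    -machine q35 \
    -cpu host,kvm=off,hv-syndbg=on,hv-synic,hv-relaxed,hv-vpindex \
    -global ICH9-LPC.disable_s4=1 \
    --trace 'hv_balloon_*' --trace 'memory_device_*' \

In the guest, "dmesg | grep -i 'hypervisor detected'" should then report
Microsoft Hyper-V rather than KVM.
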
>>>>>>
>>>>>> VM is not happy:
>>>>>>
>>>>>> [ 1.908595] BUG: kernel NULL pointer dereference, address: 0000000000000007
>>>>>> [ 1.908837] #PF: supervisor read access in kernel mode
>>>>>> [ 1.908837] #PF: error_code(0x0000) - not-present page
>>>>>> [ 1.908837] PGD 0 P4D 0
>>>>>> [ 1.908837] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>>>> [ 1.908837] CPU: 13 PID: 492 Comm: (udev-worker) Not tainted 6.3.5-100.fc37.x86_64 #1
>>>>>> [ 1.908837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-p4
>>>>>> [ 1.908837] RIP: 0010:acpi_ns_lookup+0x8f/0x4c0
>>>>>> [ 1.908837] Code: 8b 3d f5 eb 1c 03 83 05 52 ec 1c 03 01 48 85 ff 0f 84 51 03 00 00 44 89 c3 4c 89 cb
>>>>>> [ 1.908837] RSP: 0018:ffff95b680ad7950 EFLAGS: 00010286
>>>>>> [ 1.908837] RAX: ffff95b680ad79e0 RBX: 0000000000000002 RCX: 0000000000000003
>>>>>> [ 1.908837] RDX: 0000000000000000 RSI: ffff8a0283a3c558 RDI: ffffffffa4b376e0
>>>>>> [ 1.908837] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000
>>>>>> [ 1.908837] R10: ffff8a02811034ec R11: 0000000000000000 R12: ffffffffffffffff
>>>>>> [ 1.908837] R13: ffff8a02811034e8 R14: ffff8a02811034e8 R15: 0000000000000000
>>>>>> [ 1.908837] FS: 00007f3bb2e7d0c0(0000) GS:ffff8a02bbd40000(0000) knlGS:0000000000000000
>>>>>> [ 1.908837] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [ 1.908837] CR2: 0000000000000007 CR3: 0000000100a58002 CR4: 0000000000770ee0
>>>>>> [ 1.908837] PKRU: 55555554
>>>>>> [ 1.908837] Call Trace:
>>>>>> [ 1.908837] <TASK>
>>>>>> [ 1.908837] ? __die+0x23/0x70
>>>>>> [ 1.908837] ? page_fault_oops+0x171/0x4e0
>>>>>> [ 1.908837] ? prepare_alloc_pages.constprop.0+0xf6/0x1a0
>>>>>> [ 1.908837] ? exc_page_fault+0x74/0x170
>>>>>> [ 1.908837] ? asm_exc_page_fault+0x26/0x30
>>>>>> [ 1.908837] ? acpi_ns_lookup+0x8f/0x4c0
>>>>>> [ 1.908837] acpi_ns_get_node_unlocked+0xdd/0x110
>>>>>> [ 1.908837] ? down_timeout+0x3e/0x60
>>>>>> [ 1.908837] ? acpi_ns_get_node+0x3e/0x60
>>>>>> [ 1.908837] acpi_ns_get_node+0x3e/0x60
>>>>>> [ 1.908837] acpi_ns_evaluate+0x1cb/0x2d0
>>>>>> [ 1.908837] acpi_ut_evaluate_object+0x68/0x1c0
>>>>>> [ 1.908837] acpi_rs_get_method_data+0x37/0x80
>>>>>> [ 1.908837] ? __pfx_vmbus_walk_resources+0x10/0x10 [hv_vmbus]
>>>>>> [ 1.908837] acpi_walk_resources+0x91/0xe0
>>>>>> [ 1.908837] vmbus_acpi_add+0x87/0x170 [hv_vmbus]
>>>>>> [ 1.908837] acpi_device_probe+0x47/0x160
>>>>>> [ 1.908837] really_probe+0x19f/0x400
>>>>>> [ 1.908837] ? __pfx___driver_attach+0x10/0x10
>>>>>> [ 1.908837] __driver_probe_device+0x78/0x160
>>>>>> [ 1.908837] driver_probe_device+0x1f/0x90
>>>>>> [ 1.908837] __driver_attach+0xd2/0x1c0
>>>>>> [ 1.908837] bus_for_each_dev+0x85/0xd0
>>>>>> [ 1.908837] bus_add_driver+0x116/0x220
>>>>>> [ 1.908837] driver_register+0x59/0x100
>>>>>> [ 1.908837] ? __pfx_init_module+0x10/0x10 [hv_vmbus]
>>>>>> [ 1.908837] hv_acpi_init+0x39/0xff0 [hv_vmbus]
>>>>>> [ 1.908837] ? __pfx_init_module+0x10/0x10 [hv_vmbus]
>>>>>> [ 1.908837] do_one_initcall+0x5a/0x240
>>>>>> [ 1.908837] do_init_module+0x4a/0x210
>>>>>> [ 1.908837] __do_sys_init_module+0x17f/0x1b0
>>>>>> [ 1.908837] do_syscall_64+0x5c/0x90
>>>>>> [ 1.908837] ? handle_mm_fault+0x11e/0x310
>>>>>> [ 1.908837] ? do_user_addr_fault+0x1e0/0x720
>>>>>> [ 1.908837] ? exc_page_fault+0x74/0x170
>>>>>> [ 1.908837] entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>>>>
>>>>>
>>>>> I guess *few* people run Linux on top of QEMU's Hyper-V interface
>>>>> implementation...
>>>>>> Maybe I'll have to install a Windows guest :/
>>>>>>
>>>>> I think that makes more sense, since we're targeting Windows anyway.
>>>>>
>>>>
>>>> Having installed a fairly recent Win10 and running with master + your patches, I still can't get it to work.
>>>>
>>>> Windows is stuck booting (before the little circle starts turning).
>>>>
>>>> Removing the hv-balloon device makes it work again (well, at least the circle spins again; my Windows installation now seems to be broken and I have to reinstall ... Windows).
>>>>
>>>> Do you have a working cmdline for Windows I can try?
>>>>
>>>
>>> Do you get any tracing output from the driver when it is loaded or does its mere presence
>>> make your Windows installation fail to boot?
>>>
>>> I was testing using "qemu-system-x86_64 -machine q35,accel=kvm \
>>> -cpu host,vmx,-mpx,-pmu,check,hv-crash,hv-relaxed,hv-time,hv-vapic,hv-spinlocks=0x1fff,hv-
>>
>> ^ That magic collection of flags seems to make Win10 happy.
>
> I will try to determine the minimum set of Hyper-V flags required
> for Windows guests since having to manually specify the "right"
> set seems like a poor UX (and also try to determine why the Linux
> guest crashes in ACPI when booted with the above flags).
>
> But I think both of these problems are related more to VMBus support
> itself than to hv-balloon.

That's also my feeling. It would be great to extend
https://www.qemu.org/docs/master/system/i386/hyperv.html
and include some of the dirty details in your next submission, so it's
easier to get it running.
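
(For quick local testing, something like

    -cpu host,hv-passthrough

might also be worth a mention there: it enables all the enlightenments the
host supports without hand-picking flags, although it is KVM-only and not
migration-safe, so it's more of a debugging aid than a recommendation.)
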
[...]
>> Maybe the VM needs some time after start-up to actually
>> process requests.
>
> Yes, there's a waiting period immediately after the guest
> boot before executing any protocol operations in the guest:
>> /*
>> * Some Windows versions (at least Server 2019) will crash with various
>> * error codes when receiving DM protocol requests (at least
>> * DM_MEM_HOT_ADD_REQUEST) immediately after boot.
>> *
>> * It looks like Hyper-V from Server 2016 uses a 50-second after-boot
>> * delay, probably to workaround this issue, so we'll use this value, too.
>> */
>> #define HV_BALLOON_POST_INIT_WAIT (50 * 1000)
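
(FWIW, on the QEMU side such a post-boot grace period boils down to
something like the following -- a rough sketch with made-up names, not
necessarily how the actual driver implements it:)

#include "qemu/osdep.h"
#include "qemu/timer.h"

#define HV_BALLOON_POST_INIT_WAIT (50 * 1000)   /* ms, as quoted above */

/* Runs once the post-boot delay has expired. */
static void post_init_timer_cb(void *opaque)
{
    /* leave the POST_INIT_WAIT state and start serving DM requests */
}

static void arm_post_init_timer(QEMUTimer *timer)
{
    /* "timer" created beforehand with
     * timer_new_ms(QEMU_CLOCK_VIRTUAL, post_init_timer_cb, state) */
    timer_mod(timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
                     HV_BALLOON_POST_INIT_WAIT);
}
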
>
>
> That's why I suggested running with tracing on, since then
> you can observe the device waiting in the "POST_INIT_WAIT" state
> (and also how things are proceeding in general).

Ah, makes sense.
--
Cheers,
David / dhildenb