From: David Hildenbrand <david@redhat.com>
To: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: "Michael S . Tsirkin" <mst@redhat.com>,
"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
"Alex Bennée" <alex.bennee@linaro.org>,
"Thomas Huth" <thuth@redhat.com>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Eric Blake" <eblake@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
qemu-devel@nongnu.org, "Paolo Bonzini" <pbonzini@redhat.com>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Eduardo Habkost" <eduardo@habkost.net>
Subject: Re: [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)
Date: Thu, 22 Jun 2023 14:52:13 +0200 [thread overview]
Message-ID: <4da58c74-cc42-6159-87df-905d4441537c@redhat.com> (raw)
In-Reply-To: <08d68f19-5570-464e-a79e-6d79cb046c40@maciej.szmigiero.name>
On 22.06.23 14:14, Maciej S. Szmigiero wrote:
> On 22.06.2023 14:06, David Hildenbrand wrote:
>> On 22.06.23 13:17, Maciej S. Szmigiero wrote:
>>> On 22.06.2023 13:15, David Hildenbrand wrote:
>>>> On 22.06.23 13:12, Maciej S. Szmigiero wrote:
>>>>> On 22.06.2023 13:01, David Hildenbrand wrote:
>>>>>> [...]
>>>>>>
>>>>>>>>>> We'd use a memory region container as device memory region (like [1]) and would have to handle the !memdev case (I can help with that). > Into that, you can map the RAM memory region on demand (and eventually even using multiple slots like [1]).
>>>>>>>>>>
>>>>>>>>>> (2) Use a single virtual DIMM and (un)plug that on demand. Let the machine code handle (un)plugging of the device.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (1) feels cleanest to me, although it will require a bit more work.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I also think approach (1) makes more sense as it avoids memslot metadata
>>>>>>>>> overhead for not-yet-hot-added parts of the memory backing device.
>>>>>>>>>
>>>>>>>>> Not sure what you mean that the !memdev case would be problematic in this
>>>>>>>>> case - it is working in the current driver shape so why would adding
>>>>>>>>> potential memory subregions (used in the memdev case) change that?
>>>>>>>>
>>>>>>>> I'm thinking about the case where you have a hv-balloon device without a memdev.
>>>>>>>>
>>>>>>>> Without -m X,maxmem=y we don't currently expect to have memory devices around
>>>>>>>> (and especially them getting (un)plugged. But why should we "force" to set the
>>>>>>>> "maxmem" option
>>>>>>>
>>>>>>> I guess it's only a small change to QEMU to allow having hv-balloon
>>>>>>> device (without a memdev) even in the case where there's no "maxmem"
>>>>>>> option given on the QEMU command line.
>>>>>>>
>>>>>>>>
>>>>>>>> I hope I'll find some time soonish to prototype what I have in mind, to see
>>>>>>>> if it could be made working.
>>>>>>>>
>>>>>>>
>>>>>>> Okay, so I'll wait for your prototype before commencing further work on
>>>>>>> the next version of this driver.
>>>>>>
>>>>>> About to have something simplistic running -- I think. Want to test with a Linux VM, but I don't seem to get it working (also without my changes).
>>>>>>
>>>>>>
>>>>>> #!/bin/bash
>>>>>>
>>>>>> build/qemu-system-x86_64 \
>>>>>> --enable-kvm \
>>>>>> -m 4G,maxmem=36G \
>>>>>> -cpu host,hv-syndbg=on,hv-synic,hv-relaxed,hv-vpindex \
>>>>>> -smp 16 \
>>>>>> -nographic \
>>>>>> -nodefaults \
>>>>>> -net nic -net user \
>>>>>> -chardev stdio,nosignal,id=serial \
>>>>>> -hda Fedora-Cloud-Base-37-1.7.x86_64.qcow2 \
>>>>>> -cdrom /home/dhildenb/git/cloud-init/cloud-init.iso \
>>>>>> -device isa-serial,chardev=serial \
>>>>>> -chardev socket,id=monitor,path=/var/tmp/mon_src,server,nowait \
>>>>>> -mon chardev=monitor,mode=readline \
>>>>>> -device vmbus-bridge \
>>>>>> -object memory-backend-ram,size=2G,id=mem0 \
>>>>>> -device hv-balloon,id=hv1,memdev=mem0
>>>>>>
>>>>>>
>>>>>>
>>>>>> [root@vm-0 ~]# uname -r
>>>>>> 6.3.5-100.fc37.x86_64
>>>>>> [root@vm-0 ~]# modprobe hv_balloon
>>>>>> modprobe: ERROR: could not insert 'hv_balloon': No such device
>>>>>>
>>>>>>
>>>>>> Any magic flag I am missing? Or is there something preventing this to work with Linux VMs?
>>>>>>
>>>>>
>>>>> Haven't tested the driver with Linux guests in a long time (as it is
>>>>> targeting Windows), but I think you need to disable KVM PV interface for
>>>>> the Hyper-V one to be detected by Linux.
>>>>>
>>>>> Something like adding "kvm=off" to "-cpu" and seeing in the dmesg whether
>>>>> the detected hypervisor is now Hyper-V.
>>>>>
>>>>> Also, you need to disable S4 in the guest for hot-add capability to work
>>>>> (I'm adding "-global ICH9-LPC.disable_s4=1" with q35 machine for this).
>>>>>
>>>>> Would also suggest adding "--trace 'hv_balloon_*' --trace 'memory_device_*'"
>>>>> to QEMU command line to see what's happening.
>>>>
>>>> VM is not happy:
>>>>
>>>> [ 1.908595] BUG: kernel NULL pointer dereference, address: 0000000000000007
>>>> [ 1.908837] #PF: supervisor read access in kernel mode
>>>> [ 1.908837] #PF: error_code(0x0000) - not-present page
>>>> [ 1.908837] PGD 0 P4D 0
>>>> [ 1.908837] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>> [ 1.908837] CPU: 13 PID: 492 Comm: (udev-worker) Not tainted 6.3.5-100.fc37.x86_64 #1
>>>> [ 1.908837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-p4
>>>> [ 1.908837] RIP: 0010:acpi_ns_lookup+0x8f/0x4c0
>>>> [ 1.908837] Code: 8b 3d f5 eb 1c 03 83 05 52 ec 1c 03 01 48 85 ff 0f 84 51 03 00 00 44 89 c3 4c 89 cb
>>>> [ 1.908837] RSP: 0018:ffff95b680ad7950 EFLAGS: 00010286
>>>> [ 1.908837] RAX: ffff95b680ad79e0 RBX: 0000000000000002 RCX: 0000000000000003
>>>> [ 1.908837] RDX: 0000000000000000 RSI: ffff8a0283a3c558 RDI: ffffffffa4b376e0
>>>> [ 1.908837] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000
>>>> [ 1.908837] R10: ffff8a02811034ec R11: 0000000000000000 R12: ffffffffffffffff
>>>> [ 1.908837] R13: ffff8a02811034e8 R14: ffff8a02811034e8 R15: 0000000000000000
>>>> [ 1.908837] FS: 00007f3bb2e7d0c0(0000) GS:ffff8a02bbd40000(0000) knlGS:0000000000000000
>>>> [ 1.908837] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 1.908837] CR2: 0000000000000007 CR3: 0000000100a58002 CR4: 0000000000770ee0
>>>> [ 1.908837] PKRU: 55555554
>>>> [ 1.908837] Call Trace:
>>>> [ 1.908837] <TASK>
>>>> [ 1.908837] ? __die+0x23/0x70
>>>> [ 1.908837] ? page_fault_oops+0x171/0x4e0
>>>> [ 1.908837] ? prepare_alloc_pages.constprop.0+0xf6/0x1a0
>>>> [ 1.908837] ? exc_page_fault+0x74/0x170
>>>> [ 1.908837] ? asm_exc_page_fault+0x26/0x30
>>>> [ 1.908837] ? acpi_ns_lookup+0x8f/0x4c0
>>>> [ 1.908837] acpi_ns_get_node_unlocked+0xdd/0x110
>>>> [ 1.908837] ? down_timeout+0x3e/0x60
>>>> [ 1.908837] ? acpi_ns_get_node+0x3e/0x60
>>>> [ 1.908837] acpi_ns_get_node+0x3e/0x60
>>>> [ 1.908837] acpi_ns_evaluate+0x1cb/0x2d0
>>>> [ 1.908837] acpi_ut_evaluate_object+0x68/0x1c0
>>>> [ 1.908837] acpi_rs_get_method_data+0x37/0x80
>>>> [ 1.908837] ? __pfx_vmbus_walk_resources+0x10/0x10 [hv_vmbus]
>>>> [ 1.908837] acpi_walk_resources+0x91/0xe0
>>>> [ 1.908837] vmbus_acpi_add+0x87/0x170 [hv_vmbus]
>>>> [ 1.908837] acpi_device_probe+0x47/0x160
>>>> [ 1.908837] really_probe+0x19f/0x400
>>>> [ 1.908837] ? __pfx___driver_attach+0x10/0x10
>>>> [ 1.908837] __driver_probe_device+0x78/0x160
>>>> [ 1.908837] driver_probe_device+0x1f/0x90
>>>> [ 1.908837] __driver_attach+0xd2/0x1c0
>>>> [ 1.908837] bus_for_each_dev+0x85/0xd0
>>>> [ 1.908837] bus_add_driver+0x116/0x220
>>>> [ 1.908837] driver_register+0x59/0x100
>>>> [ 1.908837] ? __pfx_init_module+0x10/0x10 [hv_vmbus]
>>>> [ 1.908837] hv_acpi_init+0x39/0xff0 [hv_vmbus]
>>>> [ 1.908837] ? __pfx_init_module+0x10/0x10 [hv_vmbus]
>>>> [ 1.908837] do_one_initcall+0x5a/0x240
>>>> [ 1.908837] do_init_module+0x4a/0x210
>>>> [ 1.908837] __do_sys_init_module+0x17f/0x1b0
>>>> [ 1.908837] do_syscall_64+0x5c/0x90
>>>> [ 1.908837] ? handle_mm_fault+0x11e/0x310
>>>> [ 1.908837] ? do_user_addr_fault+0x1e0/0x720
>>>> [ 1.908837] ? exc_page_fault+0x74/0x170
>>>> [ 1.908837] entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>>
>>>
>>> I guess *few* people run Linux with QEMU Hyper-V interfaces
>>> implementation..
>>>> Maybe I'll have to install a Windows guest :/
>>>>
>>> I think that makes more sense, since we're targeting Windows anyway.
>>>
>>
>> Having installed fairly recent Win10 and running with master+your patches, I still can't get it to work.
>>
>> Windows is stuck booting (before the little circle starts turning).
>>
>> Removing the hv-balloon device makes it work again (well, at least the circle spins again my windows installation now seems to be broken and I have to reinstall ... windows).
>>
>> Do you have a working cmdline for Windows I can try?
>>
>
> Do you get any tracing output from the driver when it is loaded or does its mere presence
> make your Windows installation fail to boot?
>
> I'm was testing using "qemu-system-x86_64 -machine q35,accel=kvm \
> -cpu host,vmx,-mpx,-pmu,check,hv-crash,hv-relaxed,hv-time,hv-vapic,hv-spinlocks=0x1fff,hv-
^ That magic collection of flags seems to make Win10 happy.
Master+your patches
(qemu) balloon 3000
(qemu) info balloon
balloon: actual=3000
(qemu) balloon 8000
(qemu) info balloon
balloon: actual=6144
As the VM has 4G and the memdev 2G, that's expected (although I'd have
expected an error when trying to set above what's possible).
On my prototype:
(qemu) balloon 3000
(qemu) info balloon
balloon: actual=3000
(qemu) balloon 8000
(qemu) info balloon
balloon: actual=6144
Further (some adjustments from me):
(qemu) info memory-devices
Memory device [hv-balloon]: "hv1"
memaddr: 0x800000000
max-size: 2147483648
memdev: /objects/mem0
(qemu) info mtree
...
address-space: device-memory
0000000180000000-000000097fffffff (prio 0, i/o): device-memory
0000000800000000-000000087fffffff (prio 0, i/o): hv-balloon
0000000800000000-000000087fffffff (prio 0, ram): mem0
...
(qemu) info memory_size_summary
base memory: 4294967296
plugged memory: 2147483648
Without a memdev:
(qemu) info memory-devices
Memory device [hv-balloon]: "hv1"
max-size: 0
I noticed that sometimes balloon requests get essentially ignored.
Even retrying to set a target doesn't change anything. Only after
some time it eventually works.
(qemu) balloon 3000
(qemu) info balloon
balloon: actual=4096
...
(qemu) info balloon
balloon: actual=4096
(qemu) balloon 4000
(qemu) info balloon
balloon: actual=4096
...
(qemu) balloon 3000
(qemu) info balloon
balloon: actual=4096
(qemu) balloon 5000
(qemu) balloon 2000
(qemu) info balloon
balloon: actual=3152
(qemu) info balloon
balloon: actual=2000
Maybe the VM needs some time after start-up to actually
process requests.
Anyhow, the prototype lives at:
https://github.com/davidhildenbrand/qemu/tree/hv-balloon
Right now it's on top of:
https://github.com/davidhildenbrand/qemu/tree/virtio-mem-memslots
Patches/commits:
1) memory-device: Support empty memory devices
2) memory-device: Drop size alignment check
-> Two added patches
3) error: define g_autoptr() cleanup function for the Error type
4) Add Hyper-V Dynamic Memory Protocol definitions
5) Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)
-> Your unmodified patches
6) ea0e4775ca tmp
-> Me going wild
The alignment thingy is indeed ugly, but unless we want to fail hotadd for devices
with crazy alignment requirements, it's probably best the way it is on that branch.
You should definitely split up patch #3 into self-contained logical chunks
(e.g., split off the whole hotadd part, common code changes, ...). Also,
I'm not sure if the whole dynamic OurRangePlugged allocation thingy is
not really required, looks like that can be optimized.
I have some other cleanups in mind, but that is best kept to later, once the
patches are in an easier-to-review-sized shape.
--
Cheers,
David / dhildenb
next prev parent reply other threads:[~2023-06-22 12:53 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-12 14:00 [PATCH][RESEND v5 0/3] Hyper-V Dynamic Memory Protocol driver (hv-balloon 🎈️) Maciej S. Szmigiero
2023-06-12 14:00 ` [PATCH][RESEND v5 1/3] error: define g_autoptr() cleanup function for the Error type Maciej S. Szmigiero
2023-06-12 14:00 ` [PATCH][RESEND v5 2/3] Add Hyper-V Dynamic Memory Protocol definitions Maciej S. Szmigiero
2023-06-12 14:00 ` [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon) Maciej S. Szmigiero
2023-06-12 17:42 ` David Hildenbrand
2023-06-13 17:57 ` Maciej S. Szmigiero
2023-06-19 15:58 ` David Hildenbrand
2023-06-20 20:13 ` Maciej S. Szmigiero
2023-06-21 10:32 ` David Hildenbrand
2023-06-21 18:17 ` Maciej S. Szmigiero
2023-06-22 11:01 ` David Hildenbrand
2023-06-22 11:12 ` Maciej S. Szmigiero
2023-06-22 11:15 ` David Hildenbrand
2023-06-22 11:17 ` Maciej S. Szmigiero
2023-06-22 12:06 ` David Hildenbrand
2023-06-22 12:14 ` Maciej S. Szmigiero
2023-06-22 12:27 ` Maciej S. Szmigiero
2023-06-22 12:52 ` David Hildenbrand [this message]
2023-06-22 18:45 ` Maciej S. Szmigiero
2023-06-23 12:23 ` David Hildenbrand
2023-06-23 18:04 ` Maciej S. Szmigiero
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4da58c74-cc42-6159-87df-905d4441537c@redhat.com \
--to=david@redhat.com \
--cc=alex.bennee@linaro.org \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=eblake@redhat.com \
--cc=eduardo@habkost.net \
--cc=mail@maciej.szmigiero.name \
--cc=marcandre.lureau@redhat.com \
--cc=marcel.apfelbaum@gmail.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=richard.henderson@linaro.org \
--cc=thuth@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).