* Re: QGA fsfreeze blocks QMP: need async fsfreeze or alternative?
From: Yan Vugenfirer @ 2026-02-03 14:20 UTC
To: Kostiantyn Kostiuk
Cc: Noam Assouline, qemu-devel, michael.roth, Qianqian Zhu,
Stefan Hajnoczi
Hi Noam,
I think Kostiantyn explained the deficiencies of the current qemu-ga quite well.
I can suggest an alternative: there is a virtio-vsock-to-SSH bridge
for Windows. It could be used to develop a snapshot mechanism that
gives you more control over what happens on the Windows side.
Best regards,
Yan.
On Tue, Feb 3, 2026 at 2:58 PM Kostiantyn Kostiuk <kkostiuk@redhat.com> wrote:
>
> Hi Noam
>
> QEMU agent was developed as a tool with a synchronous API, and adding any async commands requires a redesign of the API. QGA also does not support sending any event to the host.
>
> First of all, if you send 2 async "file-open-async" commands and get 2 responses, each with an FD, how can you know which FD is for which file? Yes, you asked about the FS freeze API, but the idea is the same. FS-freeze allows you to provide a list of volumes to freeze, so you can have 2 requests to freeze 2 sets of volumes, and you run into the same question.
>
> Regarding multiple agents, this is theoretically possible because QGA is an independent application. If you run each QGA instance with its own state folder and its own communication channel, it should work. The main problem is that the QGA instances will be independent: when QGA1 blocks all API execution because the guest has a frozen FS, QGA2 will still allow any command, including FS freeze.
>
> Unfortunately, I have no good answer for you. Windows VSS has a lot of limitations, and we are trying to somehow work with it. Windows VSS doesn't even have an API to report the FS state, so QGA builds and uses internal knowledge that will be out of sync after a snapshot is restored.
>
> CC: @Yan Vugenfirer @Qianqian Zhu Do you have any idea?
>
> Best Regards,
> Kostiantyn Kostiuk.
>
>
> On Tue, Feb 3, 2026 at 12:23 PM Noam Assouline <nassouli@redhat.com> wrote:
>>
>> Hello qemu-devel!
>>
>> I’m working on a KubeVirt fix for Windows VSS fsfreeze timeouts (PR #16653). Up to now we’ve relied on libvirt’s default QEMU agent response timeout of 5 seconds, and that often isn’t enough for VSS fsfreeze to complete. This PR proposes increasing the timeout to 60 seconds so the freeze can finish successfully.
>>
>> The challenge and the reason for this email is that qemu-ga processes commands synchronously on a single connection. While guest-fsfreeze-freeze is running, the agent is effectively busy and other commands (e.g. ping, status) will hang until it returns, which can impact pod readiness probes. I’m checking what we can do about this.
>>
>> I’m mainly looking to understand whether this can be addressed in qemu-ga, and to get guidance on the right direction. Is there a supported way to use multiple agent connections/channels, or is an async guest-fsfreeze-freeze with a completion event the more appropriate solution? More generally, any best‑practice guidance around Windows fsfreeze timeouts and responsiveness would be very helpful!
>>
>> Thanks in advance, and cc’ing qemu-ga maintainers.
>>
>> Noam
>> KubeVirt Storage Ecosystem team
* Re: QGA fsfreeze blocks QMP: need async fsfreeze or alternative?
From: Stefan Hajnoczi @ 2026-02-03 15:27 UTC
To: Kostiantyn Kostiuk
Cc: Noam Assouline, Yan Vugenfirer, qemu-devel, michael.roth,
Qianqian Zhu, Stefan Hajnoczi
On Tue, Feb 03, 2026 at 02:58:29PM +0200, Kostiantyn Kostiuk wrote:
> Hi Noam
>
> QEMU agent was developed as a tool with a synchronous API, and adding any
> async commands requires a redesign of the API. QGA also does not support
> sending any event to the host.
Regarding the lack of QMP events: this is a bummer because polling for
completion introduces latency, but polling is still a possibility.
Enabling QMP events might also be an option?
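To make the polling idea concrete, here is a minimal sketch in Python. It
assumes a hypothetical query_status callable that asks the agent about one
freeze request and returns a string; no such status query exists in QGA
today, and the "done" value, the 60 second timeout and the 0.5 second
interval are illustrative only.

```python
import time
from typing import Callable


def poll_until_done(query_status: Callable[[], str],
                    timeout: float = 60.0,
                    interval: float = 0.5,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll query_status() until it reports "done" or the timeout expires.

    query_status is a hypothetical callable standing in for a status
    query sent to the agent. Returns True on completion, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        # Always check the status at least once, even with timeout=0.
        if query_status() == "done":
            return True
        if clock() >= deadline:
            return False
        # Completion is noticed up to `interval` seconds late; this is
        # the latency cost of polling instead of receiving an event.
        sleep(interval)
```

A shorter interval reduces the latency but puts more traffic on the agent
channel, so the right value depends on how responsive the caller has to be.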
> First of all, if you send 2 async "file-open-async" commands and get 2
> responses, each with an FD, how can you know which FD is for which file?
> Yes, you asked about the FS freeze API, but the idea is the same. FS-freeze
> allows you to provide a list of volumes to freeze, so you can have 2
> requests to freeze 2 sets of volumes, and you run into the same question.
An async freeze command could take a unique identifier argument that is
passed back to the client when completion is reported. This way the
client can correlate the completion to a specific command.
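A rough sketch of that correlation on the client side, in Python. The
command name "guest-fsfreeze-freeze-async", the "id" and "volumes"
arguments and the shape of the completion message are all hypothetical;
nothing like this exists in QGA today.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FreezeClient:
    # Maps request ID -> the volume list that request asked to freeze.
    pending: Dict[str, List[str]] = field(default_factory=dict)

    def start_freeze(self, volumes: List[str],
                     request_id: Optional[str] = None) -> dict:
        """Build a (hypothetical) async freeze request and remember it."""
        request_id = request_id or uuid.uuid4().hex
        self.pending[request_id] = list(volumes)
        # Hypothetical wire format, not a real QGA command.
        return {"execute": "guest-fsfreeze-freeze-async",
                "arguments": {"id": request_id, "volumes": list(volumes)}}

    def on_completion(self, message: dict) -> List[str]:
        """Match a completion to its request using the echoed ID."""
        return self.pending.pop(message["id"])


client = FreezeClient()
client.start_freeze(["C:"], request_id="req-a")
client.start_freeze(["D:", "E:"], request_id="req-b")
# Completions may arrive in any order; the ID resolves the ambiguity.
finished = client.on_completion({"id": "req-b"})  # ["D:", "E:"]
```

Letting the client choose the ID means it can record the request before
sending it, so a completion can never arrive for an unknown ID.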
There are existing async QAPI APIs that can be used as a reference. For
example, qapi/jobs.json. It's a 3-part API where jobs are launched, can
be queried, and can be managed (pause/cancel/dismiss). Querying is
read-only, so the dismiss command can be used to actually reap the job
and make it go away. Something similar could be done for fsfreeze. The
job API was supposed to be generic, but it's only used by the block
layer as far as I'm aware - maybe it could be reused here too?
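To illustrate the launch/query/dismiss split, here is a toy model in
Python. It is only a sketch of the lifecycle, not the QAPI job
implementation: the JobRegistry class and its method names are made up,
and the two status values are a simplification of the real job states.

```python
from enum import Enum
from typing import Dict


class JobStatus(Enum):
    RUNNING = "running"
    CONCLUDED = "concluded"


class JobRegistry:
    """Toy model: jobs are launched, queried, and finally dismissed."""

    def __init__(self) -> None:
        self._jobs: Dict[str, JobStatus] = {}

    def launch(self, job_id: str) -> None:
        if job_id in self._jobs:
            raise ValueError(f"job {job_id!r} already exists")
        self._jobs[job_id] = JobStatus.RUNNING

    def conclude(self, job_id: str) -> None:
        # Called by the worker when the freeze finishes (or fails).
        self._jobs[job_id] = JobStatus.CONCLUDED

    def query(self) -> Dict[str, str]:
        # Read-only: querying never removes a job.
        return {job_id: status.value for job_id, status in self._jobs.items()}

    def dismiss(self, job_id: str) -> None:
        # Only a concluded job can be reaped.
        if self._jobs.get(job_id) is not JobStatus.CONCLUDED:
            raise ValueError(f"job {job_id!r} is not concluded")
        del self._jobs[job_id]
```

Because query() has no side effects, a client that loses its connection
can reconnect and still find the result; the job only goes away once the
client explicitly dismisses it.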
>
> Regarding multiple agents, this is theoretically possible because QGA is an
> independent application. If you run each QGA instance with its own state
> folder and its own communication channel, it should work. The main problem
> is that the QGA instances will be independent: when QGA1 blocks all API
> execution because the guest has a frozen FS, QGA2 will still allow any
> command, including FS freeze.
>
> Unfortunately, I have no good answer for you. Windows VSS has a lot of
> limitations, and we are trying to somehow work with it. Windows VSS doesn't
> even have an API to report the FS state, so QGA builds and uses internal
> knowledge that will be out of sync after a snapshot is restored.
>
> CC: @Yan Vugenfirer <yvugenfi@redhat.com> @Qianqian Zhu <qizhu@redhat.com> Do
> you have any idea?
>
> Best Regards,
> Kostiantyn Kostiuk.
>
>
> On Tue, Feb 3, 2026 at 12:23 PM Noam Assouline <nassouli@redhat.com> wrote:
>
> > Hello qemu-devel!
> >
> > I’m working on a KubeVirt fix for Windows VSS fsfreeze timeouts (PR #16653
> > <https://github.com/kubevirt/kubevirt/pull/16653>). Up to now we’ve
> > relied on libvirt’s default QEMU agent response timeout of 5 seconds, and
> > that often isn’t enough for VSS fsfreeze to complete. This PR proposes
> > increasing the timeout to 60 seconds so the freeze can finish successfully.
> >
> > The challenge and the reason for this email is that qemu-ga processes
> > commands synchronously on a single connection. While guest-fsfreeze-freeze
> > is running, the agent is effectively busy and other commands (e.g. ping,
> > status) will hang until it returns, which can impact pod readiness probes.
> > I’m checking what we can do about this.
> >
> > I’m mainly looking to understand whether this can be addressed in qemu-ga,
> > and to get guidance on the right direction. Is there a supported way to use
> > multiple agent connections/channels, or is an async guest-fsfreeze-freeze
> > with a completion event the more appropriate solution? More generally, any
> > best‑practice guidance around Windows fsfreeze timeouts and responsiveness
> > would be very helpful!
> >
> > Thanks in advance, and cc’ing qemu-ga maintainers.
> >
> > Noam
> > KubeVirt Storage Ecosystem team
> >