* Issue with QEMU Live Migration
@ 2024-08-21 13:32 Arisetty, Chakri
  2024-08-21 13:56 ` Fabiano Rosas
  0 siblings, 1 reply; 10+ messages in thread

From: Arisetty, Chakri @ 2024-08-21 13:32 UTC (permalink / raw)
To: qemu-devel@nongnu.org, qemu-block@nongnu.org

[-- Attachment #1: Type: text/plain, Size: 946 bytes --]

Hello,

I’m having trouble with live migration; I’m using QEMU 7.2.0 on Debian 11.

The migration state switches to the pre-switchover state during the RAM migration.

My assumption is that the disks are already migrated by then and there are no further dirty pages to be transferred from the source host to the destination host. Therefore, the NBD client on the source host closes the connection to the NBD server on the destination host. But we observe that there are still some dirty pages being transferred, and closing the NBD connection prematurely results in a block-job error.

In the RAM migration code (migration/migration.c), I’d like to check the block mirror job’s status before the RAM migration state is moved to pre-switchover, but I’m unable to find any block-job-related code in the RAM migration code.

Could someone help me figure out what might be going wrong, or suggest troubleshooting steps or a way to get around the issue?

Thanks
Chakri

[-- Attachment #2: Type: text/html, Size: 2767 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: Issue with QEMU Live Migration
  2024-08-21 13:32 Issue with QEMU Live Migration Arisetty, Chakri
@ 2024-08-21 13:56 ` Fabiano Rosas
  2024-08-21 16:55   ` Arisetty, Chakri
  0 siblings, 1 reply; 10+ messages in thread

From: Fabiano Rosas @ 2024-08-21 13:56 UTC (permalink / raw)
To: Arisetty, Chakri, qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: "Peter Xu <peterx, "Kevin Wolf <kwolf, "Eric Blake <eblake

"Arisetty, Chakri" <carisett@akamai.com> writes:

> Hello,
>
> I’m having trouble with live migration; I’m using QEMU 7.2.0 on Debian 11.
>
> The migration state switches to the pre-switchover state during the RAM migration.
>
> My assumption is that the disks are already migrated by then and there are no further dirty pages to be transferred from the source host to the destination host. Therefore, the NBD client on the source host closes the connection to the NBD server on the destination host. But we observe that there are still some dirty pages being transferred, and closing the NBD connection prematurely results in a block-job error.
>
> In the RAM migration code (migration/migration.c), I’d like to check the block mirror job’s status before the RAM migration state is moved to pre-switchover, but I’m unable to find any block-job-related code in the RAM migration code.
>
> Could someone help me figure out what might be going wrong, or suggest troubleshooting steps or a way to get around the issue?
>
> Thanks
> Chakri

Hi, I believe it was you who opened this bug as well?

https://gitlab.com/qemu-project/qemu/-/issues/2482

So the core of the issue here is that the block job is transitioning to
ready while the migration is still ongoing, so there's still dirtying
happening.

Have you looked at the documentation at
docs/interop/live-block-operations.rst? Section "QMP invocation for live
storage migration with ``drive-mirror`` + NBD", point 4 says that a
block-job-cancel should be issued after BLOCK_JOB_READY is
reached. Although there is no mention of when the migration should be
performed.

^ permalink raw reply	[flat|nested] 10+ messages in thread
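
For reference, the sequence that document describes looks roughly like the sketch below. This is only an outline of the documented drive-mirror + NBD flow, not a verified recipe: the device name "drive-scsi-disk-0", the export name, host names and ports are placeholders, and the exact arguments should be taken from live-block-operations.rst for the QEMU version in use.

  # On the destination (started with '-incoming defer' or similar),
  # expose the empty target disk over NBD:
  -> { "execute": "nbd-server-start",
       "arguments": { "addr": { "type": "inet",
                                "data": { "host": "dest-host", "port": "49153" } } } }
  -> { "execute": "nbd-server-add",
       "arguments": { "device": "target-disk0", "writable": true } }

  # On the source, mirror the disk into that export:
  -> { "execute": "drive-mirror",
       "arguments": { "device": "drive-scsi-disk-0",
                      "target": "nbd:dest-host:49153:exportname=target-disk0",
                      "sync": "full", "format": "raw", "mode": "existing" } }
  # ... wait for the BLOCK_JOB_READY event ...

  # Only after BLOCK_JOB_READY: migrate RAM/device state, then cancel the
  # ready mirror job (for mirror, cancel at this point completes the copy
  # without pivoting the source onto the target):
  -> { "execute": "migrate", "arguments": { "uri": "tcp:dest-host:4444" } }
  -> { "execute": "block-job-cancel", "arguments": { "device": "drive-scsi-disk-0" } }
  # ... wait for the BLOCK_JOB_COMPLETED event ...

As noted above, the document pins down when to issue block-job-cancel relative to BLOCK_JOB_READY, but not where the migrate command belongs relative to the block job, which is the ambiguity the rest of this thread is about.
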
* Re: Issue with QEMU Live Migration
  2024-08-21 13:56 ` Fabiano Rosas
@ 2024-08-21 16:55   ` Arisetty, Chakri
  2024-08-22 13:47     ` Fabiano Rosas
  0 siblings, 1 reply; 10+ messages in thread

From: Arisetty, Chakri @ 2024-08-21 16:55 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: "Peter Xu <peterx@redhat.com>"@imap1.dmz-prg2.suse.org,
	"Kevin Wolf <kwolf@redhat.com>"@imap1.dmz-prg2.suse.org,
	"Eric Blake <eblake@redhat.com>"@imap1.dmz-prg2.suse.org,
	Blew III, Will, Massry, Abraham, Tottenham, Max, Greve, Mark

[-- Attachment #1: Type: text/plain, Size: 4111 bytes --]

Hi,

Thank you for getting back to me.

Yes, I have opened the ticket https://gitlab.com/qemu-project/qemu/-/issues/2482

> So the core of the issue here is that the block job is transitioning to
> ready while the migration is still ongoing, so there's still dirtying
> happening.

Yes, this is the problem I have. The RAM migration state is already moved to pre-switchover, and the mirror block job has moved to the "READY" state on the assumption that there are no more dirty blocks. But there are still dirty blocks, and these dirty blocks are being transferred to the destination host.

I've created a small patch (attached) in mirror.c to put the mirror job back into the "RUNNING" state if there are any dirty pages. But I would still like to prevent the RAM migration state from being moved to pre-switchover while there are dirty blocks.

> docs/interop/live-block-operations.rst? Section "QMP invocation for live
> storage migration with ``drive-mirror`` + NBD", point 4 says that a
> block-job-cancel should be issued after BLOCK_JOB_READY is
> reached. Although there is no mention of when the migration should be
> performed.

Thanks for the pointer, I've looked at this part (block-job-cancel). The problem is that QEMU on the source host is still transferring the dirty blocks. That is the reason I am trying to avoid moving the RAM migration state to pre-switchover while there are any dirty pages.

Is there a way in QEMU to know that the disk transfer is complete and to stop dirty pages from being transferred?

Thanks
Chakri

[-- Attachment #2: qemu-block-job-running.patch --]
[-- Type: application/octet-stream, Size: 2051 bytes --]

diff --git a/block/mirror.c b/block/mirror.c
index 251adc5ae..3457afe1d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1089,6 +1089,10 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
             break;
         }
 
+        if (cnt != 0 && job_is_ready(&s->common.job)) {
+            job_transition_to_running(&s->common.job);
+        }
+
         if (job_is_ready(&s->common.job) && !should_complete) {
             delay_ns = (s->in_flight == 0 &&
                         cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
diff --git a/include/qemu/job.h b/include/qemu/job.h
index e502787dd..87dbef0d2 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -641,6 +641,12 @@ int job_apply_verb_locked(Job *job, JobVerb verb, Error **errp);
  */
 void job_early_fail(Job *job);
 
+/**
+ * Moves the @job from READY back to RUNNING.
+ * Called with job_mutex *not* held.
+ */
+void job_transition_to_running(Job *job);
+
 /**
  * Moves the @job from RUNNING to READY.
  * Called with job_mutex *not* held.
diff --git a/job.c b/job.c
index 72d57f093..298d90817 100644
--- a/job.c
+++ b/job.c
@@ -62,7 +62,7 @@ bool JobSTT[JOB_STATUS__MAX][JOB_STATUS__MAX] = {
     /* C: */ [JOB_STATUS_CREATED] = {0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1},
     /* R: */ [JOB_STATUS_RUNNING] = {0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0},
     /* P: */ [JOB_STATUS_PAUSED] = {0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0},
-    /* Y: */ [JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0},
+    /* Y: */ [JOB_STATUS_READY] = {0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0},
     /* S: */ [JOB_STATUS_STANDBY] = {0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0},
     /* W: */ [JOB_STATUS_WAITING] = {0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0},
     /* D: */ [JOB_STATUS_PENDING] = {0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0},
@@ -1035,6 +1035,12 @@ static int job_transition_to_pending_locked(Job *job)
     return 0;
 }
 
+void job_transition_to_running(Job *job)
+{
+    JOB_LOCK_GUARD();
+    job_state_transition_locked(job, JOB_STATUS_RUNNING);
+}
+
 void job_transition_to_ready(Job *job)
 {
     JOB_LOCK_GUARD();

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: Issue with QEMU Live Migration
  2024-08-21 16:55 ` Arisetty, Chakri
@ 2024-08-22 13:47   ` Fabiano Rosas
  2024-08-23 13:30     ` Arisetty, Chakri
  0 siblings, 1 reply; 10+ messages in thread

From: Fabiano Rosas @ 2024-08-22 13:47 UTC (permalink / raw)
To: Arisetty, Chakri, qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: Peter Xu, Kevin Wolf, Eric Blake, Blew III, Will, Massry, Abraham,
	Tottenham, Max, Greve, Mark

"Arisetty, Chakri" <carisett@akamai.com> writes:

Ugh, it seems I messed up the CC addresses, let's see if this time they
go out right. For those new to the thread, we're discussing this bug:

https://gitlab.com/qemu-project/qemu/-/issues/2482

> Hi,
>
> Thank you for getting back to me.
>
> Yes, I have opened the ticket https://gitlab.com/qemu-project/qemu/-/issues/2482
>
>> So the core of the issue here is that the block job is transitioning to
>> ready while the migration is still ongoing, so there's still dirtying
>> happening.
>
> Yes, this is the problem I have. The RAM migration state is already moved to pre-switchover, and the mirror block job has moved to the "READY" state on the assumption that there are no more dirty blocks.
> But there are still dirty blocks, and these dirty blocks are being transferred to the destination host.
>
> I've created a small patch (attached) in mirror.c to put the mirror job back into the "RUNNING" state if there are any dirty pages.
> But I would still like to prevent the RAM migration state from being moved to pre-switchover while there are dirty blocks.

It's still not entirely clear to me what the situation is here. When the
migration reaches the pre-switchover state the VM is stopped, so there would
be no more IO happening. Is this a matter of a race condition (of sorts)
because pre-switchover happens while the block mirror job is still
transferring the final blocks? Or is it instead about the data being in
traffic over the network and not yet having reached the destination machine?

Is the disk inactivation after the pre-switchover affecting this at all?

>
>> docs/interop/live-block-operations.rst? Section "QMP invocation for live
>> storage migration with ``drive-mirror`` + NBD", point 4 says that a
>> block-job-cancel should be issued after BLOCK_JOB_READY is
>> reached. Although there is no mention of when the migration should be
>> performed.
>
> Thanks for the pointer, I've looked at this part (block-job-cancel). The problem is that QEMU on the source host is still transferring the dirty blocks.
> That is the reason I am trying to avoid moving the RAM migration state to pre-switchover while there are any dirty pages.
>
> Is there a way in QEMU to know that the disk transfer is complete and to stop dirty pages from being transferred?

Sorry, I can't help here. We have block layer people in CC, they might
be able to advise.

>
> Thanks
> Chakri

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: Issue with QEMU Live Migration
  2024-08-22 13:47 ` Fabiano Rosas
@ 2024-08-23 13:30   ` Arisetty, Chakri
  2024-08-23 13:41     ` Arisetty, Chakri
  0 siblings, 1 reply; 10+ messages in thread

From: Arisetty, Chakri @ 2024-08-23 13:30 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: Peter Xu, Kevin Wolf, Eric Blake, Blew III, Will, Massry, Abraham,
	Tottenham, Max, Greve, Mark

Hi,

Thank you once again!

> It's still not entirely clear to me what the situation is here. When the
> migration reaches the pre-switchover state the VM is stopped, so there would
> be no more IO happening. Is this a matter of a race condition (of sorts)
> because pre-switchover happens while the block mirror job is still
> transferring the final blocks? Or is it instead about the data being in
> traffic over the network and not yet having reached the destination machine?

When the migration reaches pre-switchover with the block job already cancelled, there are no dirty blocks. But if the block job is NOT cancelled, there are dirty blocks, and those blocks are still being transferred to the NBD server.

# When the block mirror job is running, before entering the pre-switchover state,
# the dirty count is '0' and the job has entered the 'ready' state from 'running'.
# block-job-cancel is NOT issued in this test.

1695226@1724348063.794485:mirror_run < s 0x55e5b9ffbe40 in_flight: 0 bytes_in_flight: 0 dirty count 0 active_write_bytes_in_flight 0 total 5368709120 current 5368709120 deltla 1630 iostatus 0

1695226@1724348063.795152:job_state_transition job 0x55e5b9ffbe40 (ret: 0) attempting allowed transition (running-->ready)

# QMP command 'query-block-jobs'
1695226@1724348063.845789:qmp_exit_query_block_jobs [{"auto-finalize": true, "io-status": "ok", "device": "drive-scsi-disk-0", "auto-dismiss": true, "busy": false, "len": 5368709120, "offset": 5368709120, "status": "ready", "paused": false, "speed": 100000000, "ready": true, "type": "mirror"}] 1

# RAM migration enters 'pre-switchover'; the dirty count keeps incrementing and the
# NBD client keeps sending the dirty blocks to the NBD server.

1695226@1724348070.968831:mirror_run < s 0x55e5b9ffbe40 in_flight: 0 bytes_in_flight: 0 dirty count 131072 active_write_bytes_in_flight 0 total 5368840192 current 5368709120 deltla 950 iostatus 0
...
1695226@1724348070.970540:mirror_run < s 0x55e5b9ffbe40 in_flight: 0 bytes_in_flight: 0 dirty count 2162688 active_write_bytes_in_flight 0 total 5371002880 current 5368840192 deltla 1547585 iostatus 0
...

For a VM with a lot of RAM it takes a very long time for the RAM migration to go from 'pre-switchover' to 'completed'. Stopping/cancelling the block job during that period means the disk writes made over that entire duration are lost.

Is there a way, or an API/callback in QEMU, to indicate that there are no dirty blocks, which the RAM migration code could invoke?

I'd appreciate it if anyone could help me with this.

Thanks
Chakri

^ permalink raw reply	[flat|nested] 10+ messages in thread
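
As far as this thread establishes, there is no such callback wired into the RAM migration code today. One thing a management application can already do from the outside (a sketch only, not a recommendation made in the thread) is to poll query-block-jobs, using the same fields shown in the qmp_exit_query_block_jobs trace above, and only proceed once the mirror job reports no outstanding work:

  # Sketch: poll from the management layer, not from inside QEMU.
  # Field names match the query-block-jobs output shown in the trace above.
  -> { "execute": "query-block-jobs" }
  <- { "return": [ { "device": "drive-scsi-disk-0", "type": "mirror",
                     "status": "ready", "ready": true, "busy": false,
                     "len": 5368709120, "offset": 5368709120,
                     "speed": 100000000, "paused": false,
                     "io-status": "ok", "auto-finalize": true,
                     "auto-dismiss": true } ] }
  # Proceed with block-job-cancel and then migrate-continue only once
  # "offset" == "len" and "busy" is false for the mirror job.

Whether offset == len at that instant actually guarantees that no further dirty blocks will appear before the switchover is exactly the open question in this thread, so such a check would only narrow the window rather than provably close it.
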
* Re: Issue with QEMU Live Migration
  2024-08-23 13:30 ` Arisetty, Chakri
@ 2024-08-23 13:41   ` Arisetty, Chakri
  2024-08-23 14:42     ` Fabiano Rosas
  0 siblings, 1 reply; 10+ messages in thread

From: Arisetty, Chakri @ 2024-08-23 13:41 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: Peter Xu, Kevin Wolf, Eric Blake, Blew III, Will, Massry, Abraham,
	Tottenham, Max, Greve, Mark

Hello,

Here is more data, in case my earlier mail did not provide enough details. I apologize for not providing the critical data points in my previous mail.

- Created a file (dd if=/dev/urandom of=/orig.img bs=1M count=1000) before starting live migration
- Started migration, with the block-job-cancel command issued before entering pre-switchover
- During the RAM migration, copied the original file to a new file (dd of=/migration.img if=/orig.img bs=1M count=1000)
- During the RAM migration, also started stress-ng (stress-ng --hdd 10 --hdd-bytes 4G -i 8 -t 72000s)
- Issued the sync command to flush the new buffer contents to disk; the VM stalled completely
- Migration completed successfully
- Rebooted the VM and checked for the file (/migration.img). The file does not exist, so the block device contents were NOT synced.

So we have a potential for customer data loss. This is the problem we currently have.

Can someone advise?

Thanks
Chakri

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: Issue with QEMU Live Migration
  2024-08-23 13:41 ` Arisetty, Chakri
@ 2024-08-23 14:42   ` Fabiano Rosas
  2024-08-25 17:09     ` Arisetty, Chakri
  0 siblings, 1 reply; 10+ messages in thread

From: Fabiano Rosas @ 2024-08-23 14:42 UTC (permalink / raw)
To: Arisetty, Chakri, qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: Peter Xu, Kevin Wolf, Eric Blake, Blew III, Will, Massry, Abraham,
	Tottenham, Max, Greve, Mark

"Arisetty, Chakri" <carisett@akamai.com> writes:

> Hello,
>
> Here is more data, in case my earlier mail did not provide enough details. I apologize for not providing the critical data points in my previous mail.
>
> - Created a file (dd if=/dev/urandom of=/orig.img bs=1M count=1000) before starting live migration
> - Started migration, with the block-job-cancel command issued before entering pre-switchover

Is this a type of migration that you have attempted before, and it used
to work? Or is this the first time you're using the mirror job for live
migration?

I was expecting something like:

- start the mirror job
- qmp_migrate
- once PRE_SWITCHOVER is reached, issue block-job-cancel
- qmp_migrate_continue

To be clear, at this point I don't think probing the job status from the
migration code to wait for the job to finish is the right thing to
do. Let's first attempt to rule out any bugs or incorrect usage.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: Issue with QEMU Live Migration 2024-08-23 14:42 ` Fabiano Rosas @ 2024-08-25 17:09 ` Arisetty, Chakri 2024-08-26 12:04 ` Prasad Pandit 0 siblings, 1 reply; 10+ messages in thread From: Arisetty, Chakri @ 2024-08-25 17:09 UTC (permalink / raw) To: Fabiano Rosas, qemu-devel@nongnu.org, qemu-block@nongnu.org Cc: Peter Xu, Kevin Wolf, Eric Blake, Blew III, Will, Massry, Abraham, Tottenham, Max, Greve, Mark Hello, > Is this a type of migration that you have attempted before and it used > to work? Or is this the first time you're using the mirror job for live > migration? We have been using live migration for quite some time (about 5 years) and this issue has been present for many years; it is not something that broke recently. > I was expecting something like: > > > - start the mirror job > - qmp_migrate > - once PRE_SWITCHOVER is reached, issue block-job-cancel > - qmp_migrate_continue We use exactly the same steps to do live migration, and I have now repeated the test. > > To be clear, at this point I don't think probing the job status from the > migration code to wait for the job to finish is the right thing to > do. Let's first attempt to rule out any bugs or incorrect usage. Sure, as you suggested, to rule out any incorrect usage I repeated the test with the above steps. Once the RAM migration state moved to pre-switchover, I issued block-job-cancel. There are no more dirty blocks. But all the disk writes from the 'pre-switchover' state to the 'complete' state are lost. Thus, it is causing loss of customer data. From block job cancel (1724604086.815220) to RAM migration completion (1724604103.950049) took 17 seconds, and the disk contents written during those 17 seconds were lost. The length of that window also depends on the network bandwidth at the time. Here is the trace information for the issue. ------------------------------------------------------- # Mirror block job is completed, entering 'ready' 2710491@1724604083.474928:job_state_transition job 0x5623bb524ef0 (ret: 0) attempting allowed transition (running-->ready) ... # NBD flush request is sent 230140@1724604084.808079:nbd_send_request Sending request to server: { .from = 0, .len = 0, .handle = 94711463923264, .flags = 0x0, .type = 3 (flush) } 230140@1724604084.808658:nbd_receive_simple_reply Got simple reply: { .error = 0 (success), handle = 94711463923264 } ... 2710491@1724604085.009137:mirror_run < s 0x5623bb524ef0 in_flight: 0 bytes_in_flight: 0 dirty count 0 active_write_bytes_in_flight 0 total 5411962880 current 5411962880 deltla 1360 iostatus 0 2710491@1724604085.009144:mirror_before_flush s 0x5623bb524ef0 2710491@1724604085.009149:mirror_before_sleep s 0x5623bb524ef0 dirty count 0 synced 1 delay 100000000ns ...
# RAM migration state moved to pre-switchover 2710491@1724604084.811015:qmp_exit_query_migrate {"expected-downtime": 803, "compression": {"compression-rate": 1.2139466235218377, "busy": 0, "compressed-size": 2997148660, "pages": 888276, "busy-rate": 0}, "xbzrle-cache": {"encoding-rate": 0, "bytes": 17798, "cache-size": 67108864, "cache-miss-rate": 0, "pages": 519, "overflow": 0, "cache-miss": 12285}, "status": "pre-switchover", "setup-time": 9, "total-time": 29787, "ram": {"total": 17197506560, "postcopy-requests": 0, "dirty-sync-count": 4, "multifd-bytes": 0, "pages-per-second": 16320, "downtime-bytes": 0, "page-size": 4096, "remaining": 2764800, "postcopy-bytes": 0, "mbps": 516.37569811320759, "transferred": 3084518925, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 3084518925, "duplicate": 3310849, "dirty-pages-rate": 436, "skipped": 0, "normal-bytes": 50343936, "normal": 12291}} 1 # Block job cancel is issued after RAM migration state reached to pre-switchover 2710491@1724604086.815220:job_apply_verb job 0x5623bb524ef0 in state ready; applying verb cancel (allowed) 2710491@1724604086.815236:mirror_yield s 0x5623bb524ef0 dirty count 0 free buffers 256 in_flight 0 2710491@1724604086.815240:mirror_run < s 0x5623bb524ef0 in_flight: 0 bytes_in_flight: 0 dirty count 0 active_write_bytes_in_flight 0 total 5411962880 current 5411962880 deltla 790 iostatus 0 2710491@1724604086.815243:mirror_before_flush s 0x5623bb524ef0 2710491@1724604086.815247:qmp_exit_block_job_cancel {} 1 ... 2710491@1724604086.815292:nbd_send_request Sending request to server: { .from = 0, .len = 0, .handle = 94711463923264, .flags = 0x0, .type = 3 (flush) } 2710491@1724604086.815792:nbd_receive_simple_reply Got simple reply: { .error = 0 (success), handle = 94711463923264 } 2710491@1724604086.818617:job_completed job 0x5623bb524ef0 ret 0 ... # Migration completed successfully, Writing the disk is performed during the window. 
2710491@1724604086.818629:job_state_transition job 0x5623bb524ef0 (ret: 0) attempting allowed transition (ready-->waiting) 2710491@1724604086.818714:job_state_transition job 0x5623bb524ef0 (ret: 0) attempting allowed transition (waiting-->pending) 2710491@1724604086.818974:job_state_transition job 0x5623bb524ef0 (ret: 0) attempting allowed transition (pending-->concluded) 2710491@1724604086.819004:job_state_transition job 0x5623bb524ef0 (ret: 0) attempting allowed transition (concluded-->null) 2710491@1724604087.596415:qmp_exit_query_block_jobs [] 1 2710491@1724604089.542782:qmp_exit_query_migrate {"expected-downtime": 803, "compression": {"compression-rate": 1.2139466235218377, "busy": 0, "compressed-size": 2997148660, "pages": 888276, "busy-rate": 0}, "xbzrle-cache": {"encoding-rate": 0, "bytes": 17798, "cache-size": 67108864, "cache-miss-rate": 0, "pages": 519, "overflow": 0, "cache-miss": 12285}, "status": "device", "setup-time": 9, "total-time": 34518, "ram": {"total": 17197506560, "postcopy-requests": 0, "dirty-sync-count": 4, "multifd-bytes": 0, "pages-per-second": 16320, "downtime-bytes": 0, "page-size": 4096, "remaining": 2764800, "postcopy-bytes": 0, "mbps": 516.37569811320759, "transferred": 3084518925, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 3084518925, "duplicate": 3310849, "dirty-pages-rate": 436, "skipped": 0, "normal-bytes": 50343936, "normal": 12291}} 1 2710491@1724604103.950049:qmp_exit_query_migrate {"compression": {"compression-rate": 1.2139466235218377, "busy": 0, "compressed-size": 2997148660, "pages": 888276, "busy-rate": 0}, "xbzrle-cache": {"encoding-rate": 119.44173502640746, "bytes": 28901, "cache-size": 67108864, "cache-miss-rate": 0.94703977798334871, "pages": 1021, "overflow": 0, "cache-miss": 12474}, "status": "completed", "setup-time": 9, "downtime": 5277, "total-time": 34535, "ram": {"total": 17197506560, "postcopy-requests": 0, "dirty-sync-count": 5, "multifd-bytes": 0, "pages-per-second": 16320, "downtime-bytes": 788359, "page-size": 4096, "remaining": 0, "postcopy-bytes": 0, "mbps": 714.95416184904138, "transferred": 3085307284, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 3084518925, "duplicate": 3310849, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 51118080, "normal": 12480}} 1 Thanks, Chakri > - During the RAM migration, I copied the original file to new file (dd of=/migration.img if=/orig.img bs=1M count=1000) > - During the RAM migration, I also started stress-ng (stress-ng --hdd 10 --hdd-bytes 4G -i 8 -t 72000s) > - Issued sync command to flush the new buffer contents into the disk. VM stalled completely > - Migration was completed successfully > - Rebooted the VM and checked for the file (/migration.img). The file does not exist. So, block device contents are NOT synced. > > So, we have a potential for customer data loss. This is the problem we currently have. > > Can someone advice? > > Thanks > Chakri > > > On 8/23/24, 6:30 AM, "Arisetty, Chakri" <carisett@akamai.com <mailto:carisett@akamai.com> <mailto:carisett@akamai.com <mailto:carisett@akamai.com>>> wrote: > > > Hi, > > > Thank you once again! > > >> It's still not entirely clear to me what the situation is here. When the >> migration reaches pre-switchover state the VM is stopped, so there would >> be no more IO happening. Is this a matter of a race condition (of sorts) >> because pre-switchover happens while the block mirror job is still >> transferring the final blocks? 
Or is it instead about the data being in >> traffic over the netword and not yet reaching the destination machine? > > > When the migration reaches to pre-switchover with block-job-cancelled, there are no dirty blocks, But, there are dirty blocks if the block-job is NOT cancelled > and there are dirty blocks, and those blocks are transferred to NBD server. > > > # When the block mirror job is running before enter pre-switchover state, the dirty count is '0' and job entered into 'ready' state from 'running' state. > # block-job-cancel is NOT issued with the test. > 1695226@1724348063.794485:mirror_run < s 0x55e5b9ffbe40 in_flight: 0 bytes_in_flight: 0 dirty count 0 active_write_bytes_in_flight 0 total 5368709120 current 5368709120 deltla 1630 iostatus 0 > > > 1695226@1724348063.795152:job_state_transition job 0x55e5b9ffbe40 (ret: 0) attempting allowed transition (running-->ready) > > > # QMP command for 'query-block-jobs' > 1695226@1724348063.845789:qmp_exit_query_block_jobs [{"auto-finalize": true, "io-status": "ok", "device": "drive-scsi-disk-0", "auto-dismiss": true, "busy": false, "len": 5368709120, "offset": 5368709120, "status": "ready", "paused": false, "speed": 100000000, "ready": true, "type": "mirror"}] 1 > > > # RAM migration enters 'pre-switchover', dirty count keeps incrementing and NBD client sending the block pages to NBD server. > > > 1695226@1724348070.968831:mirror_run < s 0x55e5b9ffbe40 in_flight: 0 bytes_in_flight: 0 dirty count 131072 active_write_bytes_in_flight 0 total 5368840192 current 5368709120 deltla 950 iostatus 0 > ... > 1695226@1724348070.970540:mirror_run < s 0x55e5b9ffbe40 in_flight: 0 bytes_in_flight: 0 dirty count 2162688 active_write_bytes_in_flight 0 total 5371002880 current 5368840192 deltla 1547585 iostatus 0 > .. > > > RAM migration to enter 'completion' state from 'pre-switchover' takes a very long time for VM with bigger RAM. Stopping/Cancelling block-job during the period causes the disk contents to be lost entire duration. > > > Is there a way or API/callback in qemu to indicate there are no dirty blocks and invoke the API from RAM migration code? > > > I'd appreciate if anyone can help me with it. > > > Thanks > Chakri > > > > > On 8/22/24, 6:47 AM, "Fabiano Rosas" <farosas@suse.de <mailto:farosas@suse.de> <mailto:farosas@suse.de <mailto:farosas@suse.de>> <mailto:farosas@suse.de <mailto:farosas@suse.de> <mailto:farosas@suse.de <mailto:farosas@suse.de>>>> wrote: > > > > > !-------------------------------------------------------------------| > This Message Is From an External Sender > This message came from outside your organization. > |-------------------------------------------------------------------! > > > > > "Arisetty, Chakri" <carisett@akamai.com <mailto:carisett@akamai.com> <mailto:carisett@akamai.com <mailto:carisett@akamai.com>> <mailto:carisett@akamai.com <mailto:carisett@akamai.com> <mailto:carisett@akamai.com <mailto:carisett@akamai.com>>>> writes: > > > > > Ugh, it seems I messed up the CC addresses, let's see if this time they > go out right. 
For those new to the thread, we're discussing this bug: > > > > > https://gitlab.com/qemu-project/qemu/-/issues/2482 > > > > >> Hi, >> >> Thank you for getting back to me. >> >> Yes, I have opened the ticket https://gitlab.com/qemu-project/qemu/-/issues/2482
>> >>> So the core of the issue here is that the block job is transitioning to >>> ready while the migration is still ongoing so there's still dirtying >>> happening. >> >> Yes, this is the problem I have. RAM migration state is already moved to pre-switchover and mirror block job is moved to "READY" state assuming that there are no more dirty blocks. >> But there are still dirty blocks and these dirty block blocks are being transferred to destination host. >> >> I've created a small patch(attached) in mirror.c to put the mirror job back into the "RUNNING" state if there are any dirty pages. >> But I still would like to prevent RAM migration state to be moved to pre-switchover when there are dirty blocks. > > > > > It's still not entirely clear to me what the situation is here. When the > migration reaches pre-switchover state the VM is stopped, so there would > be no more IO happening. Is this a matter of a race condition (of sorts) > because pre-switchover happens while the block mirror job is still > transferring the final blocks? Or is it instead about the data being in > traffic over the netword and not yet reaching the destination machine? > > > > > Is the disk inactivation after the pre-switchover affecting this at all? > > > > >> >>> docs/interop/live-block-operations.rst? Section "QMP invocation for live >>> storage migration with ``drive-mirror`` + NBD", point 4 says that a >>> block-job-cancel should be issues after BLOCK_JOB_READY is >>> reached. Although there is mention of when the migration should be >>> performed. >> >> Thanks for the pointer, I've looked at this part (block-job-cancel). The problem is that QEMU on the source host is still transferring the dirty blocks. >> That is the reason I am trying to avoid moving RAM migration state to pre-switchover when there are any dirty pages. >> >> is there a way in QEMU to know if the disk transfer is completed and stop dirty pages being transferred? > > > > > Sorry, I can't help here. We have block layer people in CC, they might > be able to advise.
> > > > >> >> Thanks >> Chakri >> >> On 8/21/24, 6:56 AM, "Fabiano Rosas" <farosas@suse.de> wrote: >> >> >> "Arisetty, Chakri" <carisett@akamai.com> writes: >> >> >>> Hello, >>> >>> I’m having trouble with live migration and I’m using QEMU 7.2.0 on Debian 11. >>> >>> Migration state switches to pre-switchover state during the RAM migration. >>> >>> My assumption is that disks are already migrated and there are no further dirty pages to be transferred from source host to destination host. Therefore, NBD client on the source host closes the connection to the NBD server on the destination host. But we observe that there are still some dirty pages being transferred. >>> Closing prematurely NBD connection results in BLOCK JOB error. >>> In the RAM migration code (migration/migration.c), I’d like to check for block mirror job’s status before RAM migration state is moved to pre-switchover. I’m unable to find any block job related code in RAM migration code. >>> >>> Could someone help me figuring out what might be going wrong or suggest any troubleshooting steps or advice to get around the issue? >>> >>> Thanks >>> Chakri >> >> >> Hi, I believe it was you who opened this bug as well?
>> >> >> https://gitlab.com/qemu-project/qemu/-/issues/2482 >> >> So the core of the issue here is that the block job is transitioning to >> ready while the migration is still ongoing so there's still dirtying >> happening. >> >> >> Have you looked at the documentation at >> docs/interop/live-block-operations.rst? Section "QMP invocation for live >> storage migration with ``drive-mirror`` + NBD", point 4 says that a >> block-job-cancel should be issues after BLOCK_JOB_READY is >> reached. Although there is mention of when the migration should be >> performed.
>> >> >> >> diff --git a/block/mirror.c b/block/mirror.c >> index 251adc5ae..3457afe1d 100644 >> --- a/block/mirror.c >> +++ b/block/mirror.c >> @@ -1089,6 +1089,10 @@ static int coroutine_fn mirror_run(Job *job, Error **errp) >> break; >> } >> >> + if (cnt != 0 && job_is_ready(&s->common.job)) { >> + job_transition_to_running(&s->common.job); >> + } >> + >> if (job_is_ready(&s->common.job) && !should_complete) { >> delay_ns = (s->in_flight == 0 && >> cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0); >> diff --git a/include/qemu/job.h b/include/qemu/job.h >> index e502787dd..87dbef0d2 100644 >> --- a/include/qemu/job.h >> +++ b/include/qemu/job.h >> @@ -641,6 +641,12 @@ int job_apply_verb_locked(Job *job, JobVerb verb, Error **errp); >> */ >> void job_early_fail(Job *job); >> >> +/** >> + * Moves the @job from RUNNING to READY. >> + * Called with job_mutex *not* held. >> + */ >> +void job_transition_to_running(Job *job); >> + >> /** >> * Moves the @job from RUNNING to READY. >> * Called with job_mutex *not* held. >> diff --git a/job.c b/job.c >> index 72d57f093..298d90817 100644 >> --- a/job.c >> +++ b/job.c >> @@ -62,7 +62,7 @@ bool JobSTT[JOB_STATUS__MAX][JOB_STATUS__MAX] = { >> /* C: */ [JOB_STATUS_CREATED] = {0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1}, >> /* R: */ [JOB_STATUS_RUNNING] = {0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0}, >> /* P: */ [JOB_STATUS_PAUSED] = {0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0}, >> - /* Y: */ [JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0}, >> + /* Y: */ [JOB_STATUS_READY] = {0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0}, >> /* S: */ [JOB_STATUS_STANDBY] = {0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0}, >> /* W: */ [JOB_STATUS_WAITING] = {0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0}, >> /* D: */ [JOB_STATUS_PENDING] = {0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0}, >> @@ -1035,6 +1035,12 @@ static int job_transition_to_pending_locked(Job *job) >> return 0; >> } >> >> +void job_transition_to_running(Job *job) >> +{ >> + JOB_LOCK_GUARD(); >> + job_state_transition_locked(job, JOB_STATUS_RUNNING); >> +} >> + >> void job_transition_to_ready(Job *job) >> { >> JOB_LOCK_GUARD(); On 8/23/24, 7:42 AM, "Fabiano Rosas" <farosas@suse.de <mailto:farosas@suse.de>> wrote: !-------------------------------------------------------------------| This Message Is From an External Sender This message came from outside your organization. |-------------------------------------------------------------------! "Arisetty, Chakri" <carisett@akamai.com <mailto:carisett@akamai.com>> writes: > Hello, > > Here is more data if my earlier mail did not provide enough details. I apologize for not providing the critical data points in my previous mail. > > - Created a file (dd if=/dev/urandom of=/orig.img bs=1M count=1000) before starting live migration > - Started migration with block-job-cancel command before entering into pre-switchover Is this a type of migration that you have attempted before and it used to work? Or is this the first time you're using the mirror job for live migration? I was expecting something like: - start the mirror job - qmp_migrate - once PRE_SWITCHOVER is reached, issue block-job-cancel - qmp_migrate_continue To be clear, at this point I don't think probing the job status from the migration code to wait for the job to finish is the right thing to do. Let's first attempt to rule out any bugs or incorrect usage. 
> - During the RAM migration, I copied the original file to new file (dd of=/migration.img if=/orig.img bs=1M count=1000) > - During the RAM migration, I also started stress-ng (stress-ng --hdd 10 --hdd-bytes 4G -i 8 -t 72000s) > - Issued sync command to flush the new buffer contents into the disk. VM stalled completely > - Migration was completed successfully > - Rebooted the VM and checked for the file (/migration.img). The file does not exist. So, block device contents are NOT synced. > > So, we have a potential for customer data loss. This is the problem we currently have. > > Can someone advice? > > Thanks > Chakri > > > On 8/23/24, 6:30 AM, "Arisetty, Chakri" <carisett@akamai.com <mailto:carisett@akamai.com> <mailto:carisett@akamai.com <mailto:carisett@akamai.com>>> wrote: > > > Hi, > > > Thank you once again! > > >> It's still not entirely clear to me what the situation is here. When the >> migration reaches pre-switchover state the VM is stopped, so there would >> be no more IO happening. Is this a matter of a race condition (of sorts) >> because pre-switchover happens while the block mirror job is still >> transferring the final blocks? Or is it instead about the data being in >> traffic over the netword and not yet reaching the destination machine? > > > When the migration reaches to pre-switchover with block-job-cancelled, there are no dirty blocks, But, there are dirty blocks if the block-job is NOT cancelled > and there are dirty blocks, and those blocks are transferred to NBD server. > > > # When the block mirror job is running before enter pre-switchover state, the dirty count is '0' and job entered into 'ready' state from 'running' state. > # block-job-cancel is NOT issued with the test. > 1695226@1724348063.794485:mirror_run < s 0x55e5b9ffbe40 in_flight: 0 bytes_in_flight: 0 dirty count 0 active_write_bytes_in_flight 0 total 5368709120 current 5368709120 deltla 1630 iostatus 0 > > > 1695226@1724348063.795152:job_state_transition job 0x55e5b9ffbe40 (ret: 0) attempting allowed transition (running-->ready) > > > # QMP command for 'query-block-jobs' > 1695226@1724348063.845789:qmp_exit_query_block_jobs [{"auto-finalize": true, "io-status": "ok", "device": "drive-scsi-disk-0", "auto-dismiss": true, "busy": false, "len": 5368709120, "offset": 5368709120, "status": "ready", "paused": false, "speed": 100000000, "ready": true, "type": "mirror"}] 1 > > > # RAM migration enters 'pre-switchover', dirty count keeps incrementing and NBD client sending the block pages to NBD server. > > > 1695226@1724348070.968831:mirror_run < s 0x55e5b9ffbe40 in_flight: 0 bytes_in_flight: 0 dirty count 131072 active_write_bytes_in_flight 0 total 5368840192 current 5368709120 deltla 950 iostatus 0 > ... > 1695226@1724348070.970540:mirror_run < s 0x55e5b9ffbe40 in_flight: 0 bytes_in_flight: 0 dirty count 2162688 active_write_bytes_in_flight 0 total 5371002880 current 5368840192 deltla 1547585 iostatus 0 > .. > > > RAM migration to enter 'completion' state from 'pre-switchover' takes a very long time for VM with bigger RAM. Stopping/Cancelling block-job during the period causes the disk contents to be lost entire duration. > > > Is there a way or API/callback in qemu to indicate there are no dirty blocks and invoke the API from RAM migration code? > > > I'd appreciate if anyone can help me with it. 
> > > Thanks > Chakri > > > On 8/22/24, 6:47 AM, "Fabiano Rosas" <farosas@suse.de> wrote: > > > > > "Arisetty, Chakri" <carisett@akamai.com> writes: > > > > > Ugh, it seems I messed up the CC addresses, let's see if this time they > go out right. For those new to the thread, we're discussing this bug: > > > > > https://gitlab.com/qemu-project/qemu/-/issues/2482 > > > > >> Hi, >> >> Thank you for getting back to me.
>> >> Yes, I have opened the ticket https://gitlab.com/qemu-project/qemu/-/issues/2482 >> >>> So the core of the issue here is that the block job is transitioning to >>> ready while the migration is still ongoing so there's still dirtying >>> happening. >> >> Yes, this is the problem I have. RAM migration state is already moved to pre-switchover and mirror block job is moved to "READY" state assuming that there are no more dirty blocks. >> But there are still dirty blocks and these dirty block blocks are being transferred to destination host. >> >> I've created a small patch(attached) in mirror.c to put the mirror job back into the "RUNNING" state if there are any dirty pages. >> But I still would like to prevent RAM migration state to be moved to pre-switchover when there are dirty blocks.
> > > > It's still not entirely clear to me what the situation is here. When the > migration reaches pre-switchover state the VM is stopped, so there would > be no more IO happening. Is this a matter of a race condition (of sorts) > because pre-switchover happens while the block mirror job is still > transferring the final blocks? Or is it instead about the data being in > traffic over the netword and not yet reaching the destination machine? > > > > > Is the disk inactivation after the pre-switchover affecting this at all? > > > > >> >>> docs/interop/live-block-operations.rst? Section "QMP invocation for live >>> storage migration with ``drive-mirror`` + NBD", point 4 says that a >>> block-job-cancel should be issues after BLOCK_JOB_READY is >>> reached. Although there is mention of when the migration should be >>> performed. >> >> Thanks for the pointer, I've looked at this part (block-job-cancel). The problem is that QEMU on the source host is still transferring the dirty blocks. >> That is the reason I am trying to avoid moving RAM migration state to pre-switchover when there are any dirty pages. >> >> is there a way in QEMU to know if the disk transfer is completed and stop dirty pages being transferred? > > > > > Sorry, I can't help here. We have block layer people in CC, they might > be able to advise. > > > > >> >> Thanks >> Chakri >> >> On 8/21/24, 6:56 AM, "Fabiano Rosas" <farosas@suse.de> wrote: >> >> "Arisetty, Chakri" <carisett@akamai.com> writes: >> >> >>> Hello, >>> >>> I’m having trouble with live migration and I’m using QEMU 7.2.0 on Debian 11. >>> >>> Migration state switches to pre-switchover state during the RAM migration. >>> >>> My assumption is that disks are already migrated and there are no further dirty pages to be transferred from source host to destination host. Therefore, NBD client on the source host closes the connection to the NBD server on the destination host. But we observe that there are still some dirty pages being transferred. >>> Closing prematurely NBD connection results in BLOCK JOB error. >>> In the RAM migration code (migration/migration.c), I’d like to check for block mirror job’s status before RAM migration state is moved to pre-switchover. I’m unable to find any block job related code in RAM migration code. >>> >>> Could someone help me figuring out what might be going wrong or suggest any troubleshooting steps or advice to get around the issue?
>>> >>> Thanks >>> Chakri >> >> >> Hi, I believe it was you who opened this bug as well? >> >> >> https://gitlab.com/qemu-project/qemu/-/issues/2482 >> >> So the core of the issue here is that the block job is transitioning to >> ready while the migration is still ongoing so there's still dirtying >> happening. >> >> >> Have you looked at the documentation at >> docs/interop/live-block-operations.rst? Section "QMP invocation for live >> storage migration with ``drive-mirror`` + NBD", point 4 says that a >> block-job-cancel should be issues after BLOCK_JOB_READY is >> reached. Although there is mention of when the migration should be >> performed.
>> >> >> >> diff --git a/block/mirror.c b/block/mirror.c >> index 251adc5ae..3457afe1d 100644 >> --- a/block/mirror.c >> +++ b/block/mirror.c >> @@ -1089,6 +1089,10 @@ static int coroutine_fn mirror_run(Job *job, Error **errp) >> break; >> } >> >> + if (cnt != 0 && job_is_ready(&s->common.job)) { >> + job_transition_to_running(&s->common.job); >> + } >> + >> if (job_is_ready(&s->common.job) && !should_complete) { >> delay_ns = (s->in_flight == 0 && >> cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0); >> diff --git a/include/qemu/job.h b/include/qemu/job.h >> index e502787dd..87dbef0d2 100644 >> --- a/include/qemu/job.h >> +++ b/include/qemu/job.h >> @@ -641,6 +641,12 @@ int job_apply_verb_locked(Job *job, JobVerb verb, Error **errp); >> */ >> void job_early_fail(Job *job); >> >> +/** >> + * Moves the @job from RUNNING to READY. >> + * Called with job_mutex *not* held. >> + */ >> +void job_transition_to_running(Job *job); >> + >> /** >> * Moves the @job from RUNNING to READY. >> * Called with job_mutex *not* held. >> diff --git a/job.c b/job.c >> index 72d57f093..298d90817 100644 >> --- a/job.c >> +++ b/job.c >> @@ -62,7 +62,7 @@ bool JobSTT[JOB_STATUS__MAX][JOB_STATUS__MAX] = { >> /* C: */ [JOB_STATUS_CREATED] = {0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1}, >> /* R: */ [JOB_STATUS_RUNNING] = {0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0}, >> /* P: */ [JOB_STATUS_PAUSED] = {0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0}, >> - /* Y: */ [JOB_STATUS_READY] = {0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0}, >> + /* Y: */ [JOB_STATUS_READY] = {0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0}, >> /* S: */ [JOB_STATUS_STANDBY] = {0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0}, >> /* W: */ [JOB_STATUS_WAITING] = {0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0}, >> /* D: */ [JOB_STATUS_PENDING] = {0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0}, >> @@ -1035,6 +1035,12 @@ static int job_transition_to_pending_locked(Job *job) >> return 0; >> } >> >> +void job_transition_to_running(Job *job) >> +{ >> + JOB_LOCK_GUARD(); >> + job_state_transition_locked(job, JOB_STATUS_RUNNING); >> +} >> + >> void job_transition_to_ready(Job *job) >> { >> JOB_LOCK_GUARD(); ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Issue with QEMU Live Migration 2024-08-25 17:09 ` Arisetty, Chakri @ 2024-08-26 12:04 ` Prasad Pandit 2024-08-26 19:05 ` Arisetty, Chakri 0 siblings, 1 reply; 10+ messages in thread From: Prasad Pandit @ 2024-08-26 12:04 UTC (permalink / raw) To: Arisetty, Chakri Cc: Fabiano Rosas, qemu-devel@nongnu.org, qemu-block@nongnu.org, Peter Xu, Kevin Wolf, Eric Blake, Blew III, Will, Massry, Abraham, Tottenham, Max, Greve, Mark On Sun, 25 Aug 2024 at 22:40, Arisetty, Chakri <carisett@akamai.com> wrote: > > - start the mirror job > > - qmp_migrate > > - once PRE_SWITCHOVER is reached, issue block-job-cancel > > - qmp_migrate_continue > > We use exactly the same steps to do live migration, and I have now repeated the test. > > Sure, as you suggested, to rule out any incorrect usage I repeated the test with the above steps. > Once the RAM migration state moved to pre-switchover, I issued block-job-cancel. There are no more dirty blocks. > But all the disk writes from the 'pre-switchover' state to the 'complete' state are lost. > Thus, it is causing loss of customer data. > * How exactly is the 'block-job-cancel' command issued at the PRE_SWITCHOVER stage? virsh blockjob --abort? * Recently there was a postcopy issue wherein the migrated guest on the destination machine sometimes hangs with migrate-postcopy but not with virsh --postcopy-after-precopy. It seems virsh(1) handles the switch better. Wondering if it's similar with 'block-job-cancel'. Thank you. --- - Prasad ^ permalink raw reply [flat|nested] 10+ messages in thread
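One way to catch the pre-switchover transition precisely, without polling query-migrate, is the 'events' migration capability, which makes QEMU emit a MIGRATION event on every state change. A rough sketch, with the device name as a placeholder:

# Enable state-change events in addition to the pre-switchover pause
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"capability": "events", "state": true}, {"capability": "pause-before-switchover", "state": true}]}}

# ... start the mirror job and the migration as usual ...

# QEMU then emits, among others:
#   {"event": "MIGRATION", "data": {"status": "pre-switchover"}, "timestamp": {...}}
# at which point the management software can issue:
{"execute": "block-job-cancel", "arguments": {"device": "drive-scsi-disk-0"}}
# and, once the block job has concluded:
{"execute": "migrate-continue", "arguments": {"state": "pre-switchover"}}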
* Re: Issue with QEMU Live Migration 2024-08-26 12:04 ` Prasad Pandit @ 2024-08-26 19:05 ` Arisetty, Chakri 0 siblings, 0 replies; 10+ messages in thread From: Arisetty, Chakri @ 2024-08-26 19:05 UTC (permalink / raw) To: Prasad Pandit Cc: Fabiano Rosas, qemu-devel@nongnu.org, qemu-block@nongnu.org, Peter Xu, Kevin Wolf, Eric Blake, Blew III, Will, Massry, Abraham, Tottenham, Max, Greve, Mark > * How exactly is the 'block-job-cancel' command issued at the > PRE_SWITCHOVER stage? virsh blockjob --abort? We are currently not using libvirt/virsh to issue the QMP command. Our software makes a QMP connection and sends the QMP command to cancel the job when the RAM migration state is in pre-switchover:

qemu_blockdev_cancel($user, "node-ssci-disk-0");

# Send the QMP 'block-job-cancel' command for the given block device
# over our management connection.
sub qemu_blockdev_cancel {
    my $user   = shift;
    my $device = shift;
    return qemu_control(
        $user,
        {
            "execute"   => "block-job-cancel",
            "arguments" => { "device" => $device, }
        }
    );
}

>* Recently there was a postcopy issue wherein the migrated guest on the > destination machine sometimes hangs with migrate-postcopy but not with > virsh --postcopy-after-precopy. It seems virsh(1) handles the switch > better. Wondering if it's similar with 'block-job-cancel'. Thank you for the pointer; we are currently using pre-switchover. I will look more into how libvirt implements it. Thanks Chakri On 8/26/24, 5:05 AM, "Prasad Pandit" <ppandit@redhat.com> wrote: On Sun, 25 Aug 2024 at 22:40, Arisetty, Chakri <carisett@akamai.com> wrote: > > - start the mirror job > > - qmp_migrate > > - once PRE_SWITCHOVER is reached, issue block-job-cancel > > - qmp_migrate_continue > > We use exactly the same steps to do live migration, and I have now repeated the test. > > Sure, as you suggested, to rule out any incorrect usage I repeated the test with the above steps. > Once the RAM migration state moved to pre-switchover, I issued block-job-cancel. There are no more dirty blocks. > But all the disk writes from the 'pre-switchover' state to the 'complete' state are lost. > Thus, it is causing loss of customer data. > * How exactly is the 'block-job-cancel' command issued at the PRE_SWITCHOVER stage? virsh blockjob --abort? * Recently there was a postcopy issue wherein the migrated guest on the destination machine sometimes hangs with migrate-postcopy but not with virsh --postcopy-after-precopy. It seems virsh(1) handles the switch better. Wondering if it's similar with 'block-job-cancel'. Thank you. --- - Prasad ^ permalink raw reply [flat|nested] 10+ messages in thread
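Tying the two conditions together: since the documentation referenced earlier in the thread says block-job-cancel should only be issued after BLOCK_JOB_READY, management code such as the helper above may want to gate the cancel on both the mirror job being ready and the migration sitting in pre-switchover. The checks can be expressed with standard QMP queries (device name as a placeholder):

# 1. Mirror job has caught up: "ready": true in query-block-jobs,
#    or the BLOCK_JOB_READY event has been seen for the device.
{"execute": "query-block-jobs"}

# 2. Migration is paused: "status": "pre-switchover" in query-migrate.
{"execute": "query-migrate"}

# 3. Only then cancel the mirror and resume the migration.
{"execute": "block-job-cancel", "arguments": {"device": "drive-scsi-disk-0"}}
{"execute": "migrate-continue", "arguments": {"state": "pre-switchover"}}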
end of thread, other threads:[~2024-08-26 19:06 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-08-21 13:32 Issue with QEMU Live Migration Arisetty, Chakri 2024-08-21 13:56 ` Fabiano Rosas 2024-08-21 16:55 ` Arisetty, Chakri 2024-08-22 13:47 ` Fabiano Rosas 2024-08-23 13:30 ` Arisetty, Chakri 2024-08-23 13:41 ` Arisetty, Chakri 2024-08-23 14:42 ` Fabiano Rosas 2024-08-25 17:09 ` Arisetty, Chakri 2024-08-26 12:04 ` Prasad Pandit 2024-08-26 19:05 ` Arisetty, Chakri
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).