* MultiFD and default channel out of order mapping on receive side.
@ 2022-10-12 19:53 manish.mishra
  2022-10-13  8:15 ` Daniel P. Berrangé
  0 siblings, 1 reply; 13+ messages in thread
From: manish.mishra @ 2022-10-12 19:53 UTC (permalink / raw)
To: qemu-devel; +Cc: Juan Quintela

Hi Everyone,

Hope everyone is doing great. I have seen some live migration issues with qemu-4.2 when using multiFD. The signature of the issue is something like this:

2022-10-01T09:57:53.972864Z qemu-kvm: failed to receive packet via multifd channel 0: multifd: received packet magic 5145564d expected 11223344

Basically, a default live migration channel packet is received on a multiFD channel. I see an older patch explaining a potential reason for this behavior:
https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg05920.html
> [PATCH 3/3] migration/multifd: fix potential wrong acception order of IO.

But I see this patch was not merged. Looking at the qemu master code, I could not find any other patch which handles this issue either. So, as per my understanding, this is still a potential issue even in qemu master. I mainly wanted to check why this patch was dropped? Sorry if I misunderstood something. It would be great if someone could provide some pointers on this.

Thanks
Manish Mishra

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: MultiFD and default channel out of order mapping on receive side.
  2022-10-12 19:53 MultiFD and default channel out of order mapping on receive side manish.mishra
@ 2022-10-13  8:15 ` Daniel P. Berrangé
  2022-10-13  8:56   ` manish.mishra
  2022-10-17  7:36   ` manish.mishra
  0 siblings, 2 replies; 13+ messages in thread
From: Daniel P. Berrangé @ 2022-10-13 8:15 UTC (permalink / raw)
To: manish.mishra; +Cc: qemu-devel, Juan Quintela

On Thu, Oct 13, 2022 at 01:23:40AM +0530, manish.mishra wrote:
> Hi Everyone,
> Hope everyone is doing great. I have seen some live migration issues with qemu-4.2 when using multiFD. Signature of issue is something like this.
> 2022-10-01T09:57:53.972864Z qemu-kvm: failed to receive packet via multifd channel 0: multifd: received packet magic 5145564d expected 11223344
>
> Basically default live migration channel packet is received on multiFD channel. I see a older patch explaining potential reason for this behavior.
> https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg05920.html
> > [PATCH 3/3] migration/multifd: fix potential wrong acception order of IO.
>
> But i see this patch was not merged. By looking at qemu master code, i
> could not find any other patch too which can handle this issue. So as
> per my understanding this is still a potential issue even in qemu
> master. I mainly wanted to check why this patch was dropped?

See my replies in that message - it broke compatibility of data on
the wire, meaning old QEMU can't talk to new QEMU and vice versa.

We need a fix for this issue, but it needs to take into account
wire compatibility.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: MultiFD and default channel out of order mapping on receive side.
  2022-10-13  8:15 ` Daniel P. Berrangé
@ 2022-10-13  8:56   ` manish.mishra
  2022-10-17  7:36   ` manish.mishra
  1 sibling, 0 replies; 13+ messages in thread
From: manish.mishra @ 2022-10-13 8:56 UTC (permalink / raw)
To: Daniel P. Berrangé; +Cc: qemu-devel, Juan Quintela

On 13/10/22 1:45 pm, Daniel P. Berrangé wrote:
> On Thu, Oct 13, 2022 at 01:23:40AM +0530, manish.mishra wrote:
>> Hi Everyone,
>> Hope everyone is doing great. I have seen some live migration issues with qemu-4.2 when using multiFD. Signature of issue is something like this.
>> 2022-10-01T09:57:53.972864Z qemu-kvm: failed to receive packet via multifd channel 0: multifd: received packet magic 5145564d expected 11223344
>>
>> Basically default live migration channel packet is received on multiFD channel. I see a older patch explaining potential reason for this behavior.
>> https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg05920.html
>>> [PATCH 3/3] migration/multifd: fix potential wrong acception order of IO.
>> But i see this patch was not merged. By looking at qemu master code, i
>> could not find any other patch too which can handle this issue. So as
>> per my understanding this is still a potential issue even in qemu
>> master. I mainly wanted to check why this patch was dropped?
> See my repllies in that message - it broke compatilibity of data on
> the wire, meaning old QEMU can't talk to new QEMU and vica-verca.
>
> We need a fix for this issue, but it needs to take into account
> wire compatibility.
>
> With regards,
> Daniel

Ok, got it. Thank you so much, Daniel. In that case I will try to create a patch considering backward compatibility and send it for review.

Mainly I wanted to understand whether it is handled somehow differently in upstream master, but from manually looking at the code it did not look like that, so I just wanted to confirm.

Thanks
Manish Mishra

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: MultiFD and default channel out of order mapping on receive side.
  2022-10-13  8:15 ` Daniel P. Berrangé
  2022-10-13  8:56   ` manish.mishra
@ 2022-10-17  7:36   ` manish.mishra
  2022-10-17 11:38     ` Daniel P. Berrangé
  1 sibling, 1 reply; 13+ messages in thread
From: manish.mishra @ 2022-10-17 7:36 UTC (permalink / raw)
To: Daniel P. Berrangé; +Cc: qemu-devel, Juan Quintela, Peter Xu

[-- Attachment #1: Type: text/plain, Size: 2877 bytes --]

Hi Daniel,

I was thinking of some solutions for this, so I wanted to discuss them before going ahead. Also added Juan and Peter to the loop.

1. Earlier I was thinking: on the destination side, as of now, for the default and multiFD channels the first data to be sent is MAGIC_NUMBER and VERSION, so maybe we can decide the mapping based on that. But that does not work for the newly added postcopy preempt channel, as it does not send any magic number. Also, even for multiFD, the magic number alone does not tell which multifd channel number it is, even though, as per my thinking, that does not matter. So should the magic number be good enough for identifying a default vs a multiFD channel?

2. For postcopy preempt, maybe we can initiate this channel only after we have received a request from the remote, e.g. a remote page fault. This looks safest to me, considering the postcopy recovery case too. I cannot think of any dependency on the postcopy preempt channel which requires it to be initialised very early. Maybe Peter can confirm this.

3. Another thing we can do is to have a 2-way handshake on every channel creation with some additional metadata; this looks to me like the cleanest and most durable approach. I understand that it can break migration to/from old qemu, but then that could come as a migration capability?

Please let me know if any of these works, or if you have some other suggestions?

Thanks
Manish Mishra

On 13/10/22 1:45 pm, Daniel P. Berrangé wrote:
> On Thu, Oct 13, 2022 at 01:23:40AM +0530, manish.mishra wrote:
>> Hi Everyone,
>> Hope everyone is doing great. I have seen some live migration issues with qemu-4.2 when using multiFD. Signature of issue is something like this.
>> 2022-10-01T09:57:53.972864Z qemu-kvm: failed to receive packet via multifd channel 0: multifd: received packet magic 5145564d expected 11223344
>>
>> Basically default live migration channel packet is received on multiFD channel. I see a older patch explaining potential reason for this behavior.
>> https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg05920.html
>>> [PATCH 3/3] migration/multifd: fix potential wrong acception order of IO.
>> But i see this patch was not merged. By looking at qemu master code, i
>> could not find any other patch too which can handle this issue. So as
>> per my understanding this is still a potential issue even in qemu
>> master. I mainly wanted to check why this patch was dropped?
> See my repllies in that message - it broke compatilibity of data on
> the wire, meaning old QEMU can't talk to new QEMU and vica-verca.
>
> We need a fix for this issue, but it needs to take into account
> wire compatibility.
>
> With regards,
> Daniel

[-- Attachment #2: Type: text/html, Size: 4375 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: MultiFD and default channel out of order mapping on receive side.
  2022-10-17  7:36   ` manish.mishra
@ 2022-10-17 11:38     ` Daniel P. Berrangé
  2022-10-17 21:15       ` Peter Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel P. Berrangé @ 2022-10-17 11:38 UTC (permalink / raw)
To: manish.mishra; +Cc: qemu-devel, Juan Quintela, Peter Xu, Dr. David Alan Gilbert

On Mon, Oct 17, 2022 at 01:06:00PM +0530, manish.mishra wrote:
> Hi Daniel,
>
> I was thinking for some solutions for this so wanted to discuss that before going ahead. Also added Juan and Peter in loop.
>
> 1. Earlier i was thinking, on destination side as of now for default
> and multi-FD channel first data to be sent is MAGIC_NUMBER and VERSION
> so may be we can decide mapping based on that. But then that does not
> work for newly added post copy preempt channel as it does not send
> any MAGIC number. Also even for multiFD just MAGIC number does not
> tell which multifd channel number is it, even though as per my thinking
> it does not matter. So MAGIC number should be good for indentifying
> default vs multiFD channel?

Yep, you don't need to know more than the MAGIC value.

In migration_io_process_incoming, we need to use MSG_PEEK to look at
the first 4 bytes pending on the wire. If those bytes are 'QEVM', that's
the primary channel; if those bytes are big-endian 0x11223344, that's
a multifd channel. Using MSG_PEEK avoids the need to modify the later
code that actually reads this data.

The challenge is how long to wait with the MSG_PEEK. If we do it
in blocking mode, it's fine for the main channel and multifd, but
IIUC for the post-copy pre-empt channel we'd be waiting for
something that will never arrive.

Having suggested MSG_PEEK though, this may well not work if the
channel has TLS present. In fact it almost definitely won't work.

To cope with TLS, migration_io_process_incoming would need to
actually read the data off the wire, and the later methods be
taught to skip reading the magic.

> 2. For post-copy preempt may be we can initiate this channel only
> after we have received a request from remote e.g. remote page fault.
> This to me looks safest considering post-copy recorvery case too.
> I can not think of any depedency on post copy preempt channel which
> requires it to be initialised very early. May be Peter can confirm
> this.

I guess that could work.

> 3. Another thing we can do is to have 2-way handshake on every
> channel creation with some additional metadata, this to me looks
> like cleanest approach and durable, i understand that can break
> migration to/from old qemu, but then that can come as migration
> capability?

The benefit of (1) is that the fix can be deployed for all existing
QEMU releases by backporting it. (3) will meanwhile need mgmt app
updates to make it work, which is much more work to deploy.

We really should have had a more formal handshake, and I've described
ways to achieve this in the past, but it is quite a lot of work.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: MultiFD and default channel out of order mapping on receive side. 2022-10-17 11:38 ` Daniel P. Berrangé @ 2022-10-17 21:15 ` Peter Xu 2022-10-18 8:18 ` Daniel P. Berrangé 0 siblings, 1 reply; 13+ messages in thread From: Peter Xu @ 2022-10-17 21:15 UTC (permalink / raw) To: Daniel P. Berrangé, Manish Mishra Cc: manish.mishra, qemu-devel, Juan Quintela, Dr. David Alan Gilbert On Mon, Oct 17, 2022 at 12:38:30PM +0100, Daniel P. Berrangé wrote: > On Mon, Oct 17, 2022 at 01:06:00PM +0530, manish.mishra wrote: > > Hi Daniel, > > > > I was thinking for some solutions for this so wanted to discuss that before going ahead. Also added Juan and Peter in loop. > > > > 1. Earlier i was thinking, on destination side as of now for default > > and multi-FD channel first data to be sent is MAGIC_NUMBER and VERSION > > so may be we can decide mapping based on that. But then that does not > > work for newly added post copy preempt channel as it does not send > > any MAGIC number. Also even for multiFD just MAGIC number does not > > tell which multifd channel number is it, even though as per my thinking > > it does not matter. So MAGIC number should be good for indentifying > > default vs multiFD channel? > > Yep, you don't need to know more than the MAGIC value. > > In migration_io_process_incoming, we need to use MSG_PEEK to look at > the first 4 bytes pendingon the wire. If those bytes are 'QEVM' that's > the primary channel, if those bytes are big endian 0x11223344, that's > a multifd channel. Using MSG_PEEK aviods need to modify thue later > code that actually reads this data. > > The challenge is how long to wait with the MSG_PEEK. If we do it > in a blocking mode, its fine for main channel and multifd, but > IIUC for the post-copy pre-empt channel we'd be waiting for > something that will never arrive. > > Having suggested MSG_PEEK though, this may well not work if the > channel has TLS present. In fact it almost definitely won't work. 
>
> To cope with TLS migration_io_process_incoming would need to
> actually read the data off the wire, and later methods be
> taught to skip reading the magic.
>
> > 2. For post-copy preempt may be we can initiate this channel only
> > after we have received a request from remote e.g. remote page fault.
> > This to me looks safest considering post-copy recorvery case too.
> > I can not think of any depedency on post copy preempt channel which
> > requires it to be initialised very early. May be Peter can confirm
> > this.
>
> I guess that could work

Currently all preempt code still assumes that when postcopy is activated
it is in preempt mode. IIUC such a change will bring an extra phase of
postcopy with no preempt before preempt is enabled. We may need to teach
qemu to understand that, if it's needed.

Meanwhile, the initial page requests will not be able to benefit from the
new preempt channel either.

> > 3. Another thing we can do is to have 2-way handshake on every
> > channel creation with some additional metadata, this to me looks
> > like cleanest approach and durable, i understand that can break
> > migration to/from old qemu, but then that can come as migration
> > capability?
>
> The benefit of (1) is that the fix can be deployed for all existing
> QEMU releases by backporting it. (3) will meanwhile need mgmt app
> updates to make it work, which is much more work to deploy.
>
> We really shoulud have had a more formal handshake, and I've described
> ways to achieve this in the past, but it is quite alot of work.

I don't know whether (1) is a valid option if there are use cases that it
cannot cover (on either tls or preempt). The handshake is definitely the
clean approach.

What's the outcome of such wrongly ordered connections? Will migration
fail immediately and safely?

For multifd, I think it should fail immediately after the connection is
established.

For preempt, I'd also expect the same thing, because the only wrong order
that can happen right now is the preempt channel becoming the migration
channel, and then it should also fail immediately on the first
qemu_get_byte().

Hopefully that's still not too bad - I mean, if we can fail consistently
and safely (never fail during postcopy), we can always retry, and as long
as the connections are created successfully we can start the migration
safely. But please correct me if that's not the case.

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: MultiFD and default channel out of order mapping on receive side. 2022-10-17 21:15 ` Peter Xu @ 2022-10-18 8:18 ` Daniel P. Berrangé 2022-10-18 14:51 ` Peter Xu 0 siblings, 1 reply; 13+ messages in thread From: Daniel P. Berrangé @ 2022-10-18 8:18 UTC (permalink / raw) To: Peter Xu; +Cc: Manish Mishra, qemu-devel, Juan Quintela, Dr. David Alan Gilbert On Mon, Oct 17, 2022 at 05:15:35PM -0400, Peter Xu wrote: > On Mon, Oct 17, 2022 at 12:38:30PM +0100, Daniel P. Berrangé wrote: > > On Mon, Oct 17, 2022 at 01:06:00PM +0530, manish.mishra wrote: > > > Hi Daniel, > > > > > > I was thinking for some solutions for this so wanted to discuss that before going ahead. Also added Juan and Peter in loop. > > > > > > 1. Earlier i was thinking, on destination side as of now for default > > > and multi-FD channel first data to be sent is MAGIC_NUMBER and VERSION > > > so may be we can decide mapping based on that. But then that does not > > > work for newly added post copy preempt channel as it does not send > > > any MAGIC number. Also even for multiFD just MAGIC number does not > > > tell which multifd channel number is it, even though as per my thinking > > > it does not matter. So MAGIC number should be good for indentifying > > > default vs multiFD channel? > > > > Yep, you don't need to know more than the MAGIC value. > > > > In migration_io_process_incoming, we need to use MSG_PEEK to look at > > the first 4 bytes pendingon the wire. If those bytes are 'QEVM' that's > > the primary channel, if those bytes are big endian 0x11223344, that's > > a multifd channel. Using MSG_PEEK aviods need to modify thue later > > code that actually reads this data. > > > > The challenge is how long to wait with the MSG_PEEK. If we do it > > in a blocking mode, its fine for main channel and multifd, but > > IIUC for the post-copy pre-empt channel we'd be waiting for > > something that will never arrive. 
> > > > Having suggested MSG_PEEK though, this may well not work if the > > channel has TLS present. In fact it almost definitely won't work. > > > > To cope with TLS migration_io_process_incoming would need to > > actually read the data off the wire, and later methods be > > taught to skip reading the magic. > > > > > 2. For post-copy preempt may be we can initiate this channel only > > > after we have received a request from remote e.g. remote page fault. > > > This to me looks safest considering post-copy recorvery case too. > > > I can not think of any depedency on post copy preempt channel which > > > requires it to be initialised very early. May be Peter can confirm > > > this. > > > > I guess that could work > > Currently all preempt code still assumes when postcopy activated it's in > preempt mode. IIUC such a change will bring an extra phase of postcopy > with no-preempt before preempt enabled. We may need to teach qemu to > understand that if it's needed. > > Meanwhile the initial page requests will not be able to benefit from the > new preempt channel too. > > > > > > 3. Another thing we can do is to have 2-way handshake on every > > > channel creation with some additional metadata, this to me looks > > > like cleanest approach and durable, i understand that can break > > > migration to/from old qemu, but then that can come as migration > > > capability? > > > > The benefit of (1) is that the fix can be deployed for all existing > > QEMU releases by backporting it. (3) will meanwhile need mgmt app > > updates to make it work, which is much more work to deploy. > > > > We really shoulud have had a more formal handshake, and I've described > > ways to achieve this in the past, but it is quite alot of work. > > I don't know whether (1) is a valid option if there are use cases that it > cannot cover (on either tls or preempt). The handshake is definitely the > clean approach. > > What's the outcome of such wrongly ordered connections? 
Will migration fail immediately and safely?
>
> For multifd, I think it should fail immediately after the connection
> established.
>
> For preempt, I'd also expect the same thing because the only wrong order to
> happen right now is having the preempt channel to be the migration channel,
> then it should also fail immediately on the first qemu_get_byte().
>
> Hopefully that's still not too bad - I mean, if we can fail constantly and
> safely (never fail during postcopy), we can always retry and as long as
> connections created successfully we can start the migration safely. But
> please correct me if it's not the case.

It should typically fail, as the magic bytes are different and will not
pass validation. The exception is the postcopy pre-empt channel, which
may well cause migration to stall, as nothing will be sent initially by
the src.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: MultiFD and default channel out of order mapping on receive side. 2022-10-18 8:18 ` Daniel P. Berrangé @ 2022-10-18 14:51 ` Peter Xu 2022-10-18 21:00 ` Peter Xu 0 siblings, 1 reply; 13+ messages in thread From: Peter Xu @ 2022-10-18 14:51 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Manish Mishra, qemu-devel, Juan Quintela, Dr. David Alan Gilbert On Tue, Oct 18, 2022 at 09:18:28AM +0100, Daniel P. Berrangé wrote: > On Mon, Oct 17, 2022 at 05:15:35PM -0400, Peter Xu wrote: > > On Mon, Oct 17, 2022 at 12:38:30PM +0100, Daniel P. Berrangé wrote: > > > On Mon, Oct 17, 2022 at 01:06:00PM +0530, manish.mishra wrote: > > > > Hi Daniel, > > > > > > > > I was thinking for some solutions for this so wanted to discuss that before going ahead. Also added Juan and Peter in loop. > > > > > > > > 1. Earlier i was thinking, on destination side as of now for default > > > > and multi-FD channel first data to be sent is MAGIC_NUMBER and VERSION > > > > so may be we can decide mapping based on that. But then that does not > > > > work for newly added post copy preempt channel as it does not send > > > > any MAGIC number. Also even for multiFD just MAGIC number does not > > > > tell which multifd channel number is it, even though as per my thinking > > > > it does not matter. So MAGIC number should be good for indentifying > > > > default vs multiFD channel? > > > > > > Yep, you don't need to know more than the MAGIC value. > > > > > > In migration_io_process_incoming, we need to use MSG_PEEK to look at > > > the first 4 bytes pendingon the wire. If those bytes are 'QEVM' that's > > > the primary channel, if those bytes are big endian 0x11223344, that's > > > a multifd channel. Using MSG_PEEK aviods need to modify thue later > > > code that actually reads this data. > > > > > > The challenge is how long to wait with the MSG_PEEK. 
If we do it > > > in a blocking mode, its fine for main channel and multifd, but > > > IIUC for the post-copy pre-empt channel we'd be waiting for > > > something that will never arrive. > > > > > > Having suggested MSG_PEEK though, this may well not work if the > > > channel has TLS present. In fact it almost definitely won't work. > > > > > > To cope with TLS migration_io_process_incoming would need to > > > actually read the data off the wire, and later methods be > > > taught to skip reading the magic. > > > > > > > 2. For post-copy preempt may be we can initiate this channel only > > > > after we have received a request from remote e.g. remote page fault. > > > > This to me looks safest considering post-copy recorvery case too. > > > > I can not think of any depedency on post copy preempt channel which > > > > requires it to be initialised very early. May be Peter can confirm > > > > this. > > > > > > I guess that could work > > > > Currently all preempt code still assumes when postcopy activated it's in > > preempt mode. IIUC such a change will bring an extra phase of postcopy > > with no-preempt before preempt enabled. We may need to teach qemu to > > understand that if it's needed. > > > > Meanwhile the initial page requests will not be able to benefit from the > > new preempt channel too. > > > > > > > > > 3. Another thing we can do is to have 2-way handshake on every > > > > channel creation with some additional metadata, this to me looks > > > > like cleanest approach and durable, i understand that can break > > > > migration to/from old qemu, but then that can come as migration > > > > capability? > > > > > > The benefit of (1) is that the fix can be deployed for all existing > > > QEMU releases by backporting it. (3) will meanwhile need mgmt app > > > updates to make it work, which is much more work to deploy. 
> > > > > > We really shoulud have had a more formal handshake, and I've described > > > ways to achieve this in the past, but it is quite alot of work. > > > > I don't know whether (1) is a valid option if there are use cases that it > > cannot cover (on either tls or preempt). The handshake is definitely the > > clean approach. > > > > What's the outcome of such wrongly ordered connections? Will migration > > fail immediately and safely? > > > > For multifd, I think it should fail immediately after the connection > > established. > > > > For preempt, I'd also expect the same thing because the only wrong order to > > happen right now is having the preempt channel to be the migration channel, > > then it should also fail immediately on the first qemu_get_byte(). > > > > Hopefully that's still not too bad - I mean, if we can fail constantly and > > safely (never fail during postcopy), we can always retry and as long as > > connections created successfully we can start the migration safely. But > > please correct me if it's not the case. > > It should typically fail as the magic bytes are different, which will not > pass validation. The exception being the postcopy pre-empt channel which > may well cause migration to stall as nothing will be sent initially by > the src. Hmm right.. Actually if preempt channel is special we can fix it alone. As both of you discussed, we can postpone the preempt channel setup, maybe not as late as when we receive the 1st page request, but: (1) For newly established migration, we can postpone preempt channel setup (postcopy_preempt_setup, resume=false) to the entrance of postcopy_start(). (2) For a postcopy recovery process, we can postpone preempt channel setup (postcopy_preempt_setup, resume=true) to postcopy_do_resume(), maybe between qemu_savevm_state_resume_prepare() and the final handshake of postcopy_resume_handshake(). I need to try and test a bit for above idea. But the same trick may not play well on multifd even if it works. 
-- Peter Xu ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: MultiFD and default channel out of order mapping on receive side. 2022-10-18 14:51 ` Peter Xu @ 2022-10-18 21:00 ` Peter Xu 2022-10-20 14:44 ` manish.mishra 0 siblings, 1 reply; 13+ messages in thread From: Peter Xu @ 2022-10-18 21:00 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Manish Mishra, qemu-devel, Juan Quintela, Dr. David Alan Gilbert On Tue, Oct 18, 2022 at 10:51:12AM -0400, Peter Xu wrote: > On Tue, Oct 18, 2022 at 09:18:28AM +0100, Daniel P. Berrangé wrote: > > On Mon, Oct 17, 2022 at 05:15:35PM -0400, Peter Xu wrote: > > > On Mon, Oct 17, 2022 at 12:38:30PM +0100, Daniel P. Berrangé wrote: > > > > On Mon, Oct 17, 2022 at 01:06:00PM +0530, manish.mishra wrote: > > > > > Hi Daniel, > > > > > > > > > > I was thinking for some solutions for this so wanted to discuss that before going ahead. Also added Juan and Peter in loop. > > > > > > > > > > 1. Earlier i was thinking, on destination side as of now for default > > > > > and multi-FD channel first data to be sent is MAGIC_NUMBER and VERSION > > > > > so may be we can decide mapping based on that. But then that does not > > > > > work for newly added post copy preempt channel as it does not send > > > > > any MAGIC number. Also even for multiFD just MAGIC number does not > > > > > tell which multifd channel number is it, even though as per my thinking > > > > > it does not matter. So MAGIC number should be good for indentifying > > > > > default vs multiFD channel? > > > > > > > > Yep, you don't need to know more than the MAGIC value. > > > > > > > > In migration_io_process_incoming, we need to use MSG_PEEK to look at > > > > the first 4 bytes pendingon the wire. If those bytes are 'QEVM' that's > > > > the primary channel, if those bytes are big endian 0x11223344, that's > > > > a multifd channel. Using MSG_PEEK aviods need to modify thue later > > > > code that actually reads this data. > > > > > > > > The challenge is how long to wait with the MSG_PEEK. 
If we do it > > > > in a blocking mode, its fine for main channel and multifd, but > > > > IIUC for the post-copy pre-empt channel we'd be waiting for > > > > something that will never arrive. > > > > > > > > Having suggested MSG_PEEK though, this may well not work if the > > > > channel has TLS present. In fact it almost definitely won't work. > > > > > > > > To cope with TLS migration_io_process_incoming would need to > > > > actually read the data off the wire, and later methods be > > > > taught to skip reading the magic. > > > > > > > > > 2. For post-copy preempt may be we can initiate this channel only > > > > > after we have received a request from remote e.g. remote page fault. > > > > > This to me looks safest considering post-copy recorvery case too. > > > > > I can not think of any depedency on post copy preempt channel which > > > > > requires it to be initialised very early. May be Peter can confirm > > > > > this. > > > > > > > > I guess that could work > > > > > > Currently all preempt code still assumes when postcopy activated it's in > > > preempt mode. IIUC such a change will bring an extra phase of postcopy > > > with no-preempt before preempt enabled. We may need to teach qemu to > > > understand that if it's needed. > > > > > > Meanwhile the initial page requests will not be able to benefit from the > > > new preempt channel too. > > > > > > > > > > > > 3. Another thing we can do is to have 2-way handshake on every > > > > > channel creation with some additional metadata, this to me looks > > > > > like cleanest approach and durable, i understand that can break > > > > > migration to/from old qemu, but then that can come as migration > > > > > capability? > > > > > > > > The benefit of (1) is that the fix can be deployed for all existing > > > > QEMU releases by backporting it. (3) will meanwhile need mgmt app > > > > updates to make it work, which is much more work to deploy. 
> > > > > > > > We really shoulud have had a more formal handshake, and I've described > > > > ways to achieve this in the past, but it is quite alot of work. > > > > > > I don't know whether (1) is a valid option if there are use cases that it > > > cannot cover (on either tls or preempt). The handshake is definitely the > > > clean approach. > > > > > > What's the outcome of such wrongly ordered connections? Will migration > > > fail immediately and safely? > > > > > > For multifd, I think it should fail immediately after the connection > > > established. > > > > > > For preempt, I'd also expect the same thing because the only wrong order to > > > happen right now is having the preempt channel to be the migration channel, > > > then it should also fail immediately on the first qemu_get_byte(). > > > > > > Hopefully that's still not too bad - I mean, if we can fail constantly and > > > safely (never fail during postcopy), we can always retry and as long as > > > connections created successfully we can start the migration safely. But > > > please correct me if it's not the case. > > > > It should typically fail as the magic bytes are different, which will not > > pass validation. The exception being the postcopy pre-empt channel which > > may well cause migration to stall as nothing will be sent initially by > > the src. > > Hmm right.. > > Actually if preempt channel is special we can fix it alone. As both of you > discussed, we can postpone the preempt channel setup, maybe not as late as > when we receive the 1st page request, but: > > (1) For newly established migration, we can postpone preempt channel > setup (postcopy_preempt_setup, resume=false) to the entrance of > postcopy_start(). > > (2) For a postcopy recovery process, we can postpone preempt channel > setup (postcopy_preempt_setup, resume=true) to postcopy_do_resume(), > maybe between qemu_savevm_state_resume_prepare() and the final > handshake of postcopy_resume_handshake(). 
> > I need to try and test a bit for the above idea. But the same trick may not > > play well on multifd even if it works. The sender side is relatively easy because the migration thread can move on without the preempt channel; then the main thread will keep taking care of it, and when connected it can notify the migration thread. It seems trickier with the dest node where the migration loading thread is only a coroutine of the main thread, so during loading of the VM I don't really see how further socket connections can be established. Now it's okay with the thread being shared because we only do migration_incoming_process() and enter the coroutine if all channels are ready. -- Peter Xu ^ permalink raw reply [flat|nested] 13+ messages in thread
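The MSG_PEEK probe Daniel describes in this exchange could look roughly like the sketch below. This is a minimal illustration over a raw socket, not QEMU's actual implementation; the function and enum names are invented, and — as noted above — a plain peek like this cannot work once TLS is layered on, and would block forever on the preempt channel, which sends nothing initially. The two magic values are the ones from the error report (`5145564d` is ASCII 'QEVM', `11223344` is the multifd magic):

```c
#include <stdint.h>
#include <sys/socket.h>
#include <arpa/inet.h>

#define QEMU_VM_FILE_MAGIC 0x5145564dU /* big endian 'Q' 'E' 'V' 'M' */
#define MULTIFD_MAGIC      0x11223344U /* what multifd channels send first */

enum chan_kind { CHAN_UNKNOWN, CHAN_MAIN, CHAN_MULTIFD };

/* Peek at the first 4 bytes without consuming them, so the existing
 * channel code can still read the magic itself afterwards. */
static enum chan_kind classify_channel(int fd)
{
    uint32_t magic;
    ssize_t n = recv(fd, &magic, sizeof(magic), MSG_PEEK | MSG_WAITALL);

    if (n != (ssize_t)sizeof(magic)) {
        return CHAN_UNKNOWN; /* error or short read */
    }
    magic = ntohl(magic);    /* both magics travel big endian */
    if (magic == QEMU_VM_FILE_MAGIC) {
        return CHAN_MAIN;
    }
    if (magic == MULTIFD_MAGIC) {
        return CHAN_MULTIFD;
    }
    return CHAN_UNKNOWN;
}
```

Because MSG_PEEK leaves the bytes on the wire, the later `qemu_get_byte()`-style readers need no change — which is exactly the appeal of option (1) discussed here.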
* Re: MultiFD and default channel out of order mapping on receive side. 2022-10-18 21:00 ` Peter Xu @ 2022-10-20 14:44 ` manish.mishra 2022-10-20 16:32 ` Peter Xu 0 siblings, 1 reply; 13+ messages in thread From: manish.mishra @ 2022-10-20 14:44 UTC (permalink / raw) To: Peter Xu, Daniel P. Berrangé Cc: qemu-devel, Juan Quintela, Dr. David Alan Gilbert, Prerna Saxena, "utkarsh.tripathi [-- Attachment #1: Type: text/plain, Size: 6946 bytes --] On 19/10/22 2:30 am, Peter Xu wrote: > On Tue, Oct 18, 2022 at 10:51:12AM -0400, Peter Xu wrote: >> On Tue, Oct 18, 2022 at 09:18:28AM +0100, Daniel P. Berrangé wrote: >>> On Mon, Oct 17, 2022 at 05:15:35PM -0400, Peter Xu wrote: >>>> On Mon, Oct 17, 2022 at 12:38:30PM +0100, Daniel P. Berrangé wrote: >>>>> On Mon, Oct 17, 2022 at 01:06:00PM +0530, manish.mishra wrote: >>>>>> Hi Daniel, >>>>>> >>>>>> I was thinking of some solutions for this, so wanted to discuss them before going ahead. Also added Juan and Peter in the loop. >>>>>> >>>>>> 1. Earlier I was thinking, on the destination side as of now for the default >>>>>> and multi-FD channels the first data to be sent is MAGIC_NUMBER and VERSION, >>>>>> so maybe we can decide the mapping based on that. But then that does not >>>>>> work for the newly added post copy preempt channel as it does not send >>>>>> any MAGIC number. Also even for multiFD just the MAGIC number does not >>>>>> tell which multifd channel number it is, even though as per my thinking >>>>>> it does not matter. So the MAGIC number should be good for identifying >>>>>> default vs multiFD channels? >>>>> Yep, you don't need to know more than the MAGIC value. >>>>> >>>>> In migration_io_process_incoming, we need to use MSG_PEEK to look at >>>>> the first 4 bytes pending on the wire. If those bytes are 'QEVM' that's >>>>> the primary channel, if those bytes are big endian 0x11223344, that's >>>>> a multifd channel. Using MSG_PEEK avoids the need to modify the later >>>>> code that actually reads this data.
>>>>> >>>>> The challenge is how long to wait with the MSG_PEEK. If we do it >>>>> in a blocking mode, it's fine for main channel and multifd, but >>>>> IIUC for the post-copy pre-empt channel we'd be waiting for >>>>> something that will never arrive. >>>>> >>>>> Having suggested MSG_PEEK though, this may well not work if the >>>>> channel has TLS present. In fact it almost definitely won't work. >>>>> >>>>> To cope with TLS migration_io_process_incoming would need to >>>>> actually read the data off the wire, and later methods be >>>>> taught to skip reading the magic. >>>>> >>>>>> 2. For post-copy preempt maybe we can initiate this channel only >>>>>> after we have received a request from remote e.g. remote page fault. >>>>>> This to me looks safest considering the post-copy recovery case too. >>>>>> I cannot think of any dependency on the post-copy preempt channel which >>>>>> requires it to be initialised very early. Maybe Peter can confirm >>>>>> this. >>>>> I guess that could work >>>> Currently all preempt code still assumes when postcopy activated it's in >>>> preempt mode. IIUC such a change will bring an extra phase of postcopy >>>> with no-preempt before preempt enabled. We may need to teach qemu to >>>> understand that if it's needed. >>>> >>>> Meanwhile the initial page requests will not be able to benefit from the >>>> new preempt channel too. >>>> >>>>>> 3. Another thing we can do is to have a 2-way handshake on every >>>>>> channel creation with some additional metadata; this to me looks >>>>>> like the cleanest approach and durable. I understand that can break >>>>>> migration to/from old qemu, but then that can come as a migration >>>>>> capability? >>>>> The benefit of (1) is that the fix can be deployed for all existing >>>>> QEMU releases by backporting it. (3) will meanwhile need mgmt app >>>>> updates to make it work, which is much more work to deploy.
>>>>> >>>>> We really should have had a more formal handshake, and I've described >>>>> ways to achieve this in the past, but it is quite a lot of work. >>>> I don't know whether (1) is a valid option if there are use cases that it >>>> cannot cover (on either tls or preempt). The handshake is definitely the >>>> clean approach. >>>> >>>> What's the outcome of such wrongly ordered connections? Will migration >>>> fail immediately and safely? >>>> >>>> For multifd, I think it should fail immediately after the connection >>>> is established. >>>> >>>> For preempt, I'd also expect the same thing because the only wrong order to >>>> happen right now is having the preempt channel be the migration channel, >>>> then it should also fail immediately on the first qemu_get_byte(). >>>> >>>> Hopefully that's still not too bad - I mean, if we can fail consistently and >>>> safely (never fail during postcopy), we can always retry and as long as >>>> connections are created successfully we can start the migration safely. But >>>> please correct me if it's not the case. >>> It should typically fail as the magic bytes are different, which will not >>> pass validation. The exception being the postcopy pre-empt channel which >>> may well cause migration to stall as nothing will be sent initially by >>> the src. >> Hmm right.. >> >> Actually if the preempt channel is special we can fix it alone. As both of you >> discussed, we can postpone the preempt channel setup, maybe not as late as >> when we receive the 1st page request, but: >> >> (1) For newly established migration, we can postpone preempt channel >> setup (postcopy_preempt_setup, resume=false) to the entrance of >> postcopy_start(). >> >> (2) For a postcopy recovery process, we can postpone preempt channel >> setup (postcopy_preempt_setup, resume=true) to postcopy_do_resume(), >> maybe between qemu_savevm_state_resume_prepare() and the final >> handshake of postcopy_resume_handshake().
Yes Peter, agree postcopy_start and postcopy_do_resume should also work, as by then we already have some 2-way communication; e.g. for the non-recovery case we send a ping cmd, so probably we can block in postcopy_start till we get the pong reply. Similarly for postcopy_do_resume, probably after the response to MIG_CMD_POSTCOPY_RESUME. >> >> I need to try and test a bit for the above idea. But the same trick may not >> play well on multifd even if it works. I had one concern: during recovery we do not send any magic. As of now we do not support multifd with postcopy, so it should be fine; we can do explicit checking for the non-recovery case. But I remember from some discussion that in future there may be support for multiFD with postcopy, or multiple postcopy preempt channels too, and then a proper handshake will be required? So at some point we want to take that path? For now I agree approach 1 will be good; as suggested by Daniel, it can be backported easily to older qemu's too. > The sender side is relatively easy because the migration thread can move on > without the preempt channel; then the main thread will keep taking care of > it, and when connected it can notify the migration thread. > > It seems trickier with the dest node where the migration loading thread is only > a coroutine of the main thread, so during loading of the VM I don't really see > how further socket connections can be established. Now it's okay with the > thread being shared because we only do migration_incoming_process() and > enter the coroutine if all channels are ready. > [-- Attachment #2: Type: text/html, Size: 8847 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
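The postponement Peter proposes and manish agrees with here can be sketched as a small gate: the preempt channel is only opened once the main channel has completed a round trip (ping/pong, or the MIG_CMD_POSTCOPY_RESUME ack in the recovery case). This is illustrative pseudocode, not QEMU's actual implementation — the state struct and helper names are hypothetical; only postcopy_start() corresponds to a real QEMU function named in the thread:

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified migration state. */
typedef struct {
    bool main_handshake_done; /* ping/pong (or RESUME ack) completed */
    bool preempt_connected;
} MigState;

/* Hypothetical stand-in for postcopy_preempt_setup(). Refuses to
 * connect before the main channel round trip, since the destination
 * could otherwise misclassify the incoming connection. */
static int preempt_channel_connect(MigState *s)
{
    if (!s->main_handshake_done) {
        return -1; /* too early */
    }
    s->preempt_connected = true;
    return 0;
}

/* Deferred setup at the entrance of postcopy_start(): by this point
 * the main channel has done a full round trip, so no other incoming
 * connection can be confused with the preempt one. */
static int postcopy_start(MigState *s)
{
    s->main_handshake_done = true; /* ping/pong finished */
    return preempt_channel_connect(s);
}
```

The recovery path would gate the same way, just on the postcopy_do_resume() handshake instead of ping/pong.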
* Re: MultiFD and default channel out of order mapping on receive side. 2022-10-20 14:44 ` manish.mishra @ 2022-10-20 16:32 ` Peter Xu 2022-10-20 22:07 ` Daniel P. Berrangé 0 siblings, 1 reply; 13+ messages in thread From: Peter Xu @ 2022-10-20 16:32 UTC (permalink / raw) To: manish.mishra Cc: Daniel P. Berrangé, qemu-devel, Juan Quintela, Dr. David Alan Gilbert, Prerna Saxena On Thu, Oct 20, 2022 at 08:14:19PM +0530, manish.mishra wrote: > I had one concern: during recovery we do not send any magic. As of now we do not support multifd with postcopy, so it should be fine; we can do explicit checking for the non-recovery case. But I remember from some discussion that in future there may be support for multiFD with postcopy, or multiple postcopy preempt channels too, and then a proper handshake will be required? So at some point we want to take that path? For now I agree approach 1 will be good; as suggested by Daniel, it can be backported easily to older qemu's too. Yes for the long run I think we should provide a generic solution for all the channels to be established for migration purposes. Not to mention that, as I replied previously to my original email, the trick won't easily work with dest QEMU, where we need further changes to allow qemu to accept new channels during loading of the VM. Considering the complexity that it'll take just to resolve the preempt channel ordering, I think maybe it's cleaner we just look for the long term goal. -- Peter Xu ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: MultiFD and default channel out of order mapping on receive side. 2022-10-20 16:32 ` Peter Xu @ 2022-10-20 22:07 ` Daniel P. Berrangé 2022-10-21 8:13 ` manish.mishra 0 siblings, 1 reply; 13+ messages in thread From: Daniel P. Berrangé @ 2022-10-20 22:07 UTC (permalink / raw) To: Peter Xu Cc: manish.mishra, qemu-devel, Juan Quintela, Dr. David Alan Gilbert, Prerna Saxena On Thu, Oct 20, 2022 at 12:32:06PM -0400, Peter Xu wrote: > On Thu, Oct 20, 2022 at 08:14:19PM +0530, manish.mishra wrote: > > I had one concern: during recovery we do not send any magic. As of now we > do not support multifd with postcopy, so it should be fine; we can do > explicit checking for the non-recovery case. But I remember from some > discussion that in future there may be support for multiFD with postcopy or > multiple postcopy preempt channels too, and then a proper handshake will > be required? So at some point we want to take that path? For now I agree > approach 1 will be good; as suggested by Daniel, it can be backported > easily to older qemu's too. > > Yes for the long run I think we should provide a generic solution for all > the channels to be established for migration purposes. > > Not to mention that, as I replied previously to my original email, the trick > won't easily work with dest QEMU, where we need further changes to allow qemu > to accept new channels during loading of the VM. > > Considering the complexity that it'll take just to resolve the preempt > channel ordering, I think maybe it's cleaner we just look for the long term > goal. I think we should just ignore the preempt channel. Let's just do the easy bit and fix the main vs multifd channel mixup, as that's the one that is definitely actively hitting people today. We can solve that as a quick win in a way that is easy to backport to existing releases of QEMU for those affected.
Separately from that, let's define a clean slate migration protocol to solve many of our historic problems and mistakes that can't be dealt with through retrofitting, not limited to just this ordering mistake. We had a significant discussion about it at the start of the year in this thread, which I think we should take forward and write into a formal protocol spec. https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03655.html With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 13+ messages in thread
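One concrete shape the formal handshake argued for above could take — offered purely as illustration, with invented names and an invented magic value, not anything from the discussed spec — is a fixed-size greeting sent first on every migration channel, carrying a shared magic plus explicit channel type and index, so the receiver never has to infer the channel's role from connection ordering:

```c
#include <stdint.h>

enum chan_type { CHAN_MAIN = 0, CHAN_MULTIFD = 1, CHAN_PREEMPT = 2 };

/* Hypothetical per-channel greeting, 12 bytes on the wire, big endian. */
struct chan_hello {
    uint32_t magic;   /* one shared magic for every migration channel */
    uint32_t version; /* for cross-version compatibility negotiation */
    uint16_t type;    /* enum chan_type */
    uint16_t id;      /* e.g. multifd channel index */
};

#define CHAN_HELLO_MAGIC 0x4d474851u /* invented value for this sketch */
#define CHAN_HELLO_LEN 12

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = v;
}
static void put_be16(uint8_t *p, uint16_t v) { p[0] = v >> 8; p[1] = v; }
static uint32_t get_be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}
static uint16_t get_be16(const uint8_t *p) { return (uint16_t)(p[0] << 8 | p[1]); }

static void hello_to_wire(const struct chan_hello *h, uint8_t buf[CHAN_HELLO_LEN])
{
    put_be32(buf, h->magic);
    put_be32(buf + 4, h->version);
    put_be16(buf + 8, h->type);
    put_be16(buf + 10, h->id);
}

static int hello_from_wire(const uint8_t buf[CHAN_HELLO_LEN], struct chan_hello *h)
{
    h->magic = get_be32(buf);
    if (h->magic != CHAN_HELLO_MAGIC) {
        return -1; /* not a migration channel greeting */
    }
    h->version = get_be32(buf + 4);
    h->type = get_be16(buf + 8);
    h->id = get_be16(buf + 10);
    return 0;
}
```

As the thread notes, such a greeting changes the wire format, so it would have to be gated behind a migration capability negotiated by the management app rather than enabled unconditionally.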
* Re: MultiFD and default channel out of order mapping on receive side. 2022-10-20 22:07 ` Daniel P. Berrangé @ 2022-10-21 8:13 ` manish.mishra 0 siblings, 0 replies; 13+ messages in thread From: manish.mishra @ 2022-10-21 8:13 UTC (permalink / raw) To: Daniel P. Berrangé, Peter Xu Cc: qemu-devel, Juan Quintela, Dr. David Alan Gilbert, Prerna Saxena On 21/10/22 3:37 am, Daniel P. Berrangé wrote: > On Thu, Oct 20, 2022 at 12:32:06PM -0400, Peter Xu wrote: >> On Thu, Oct 20, 2022 at 08:14:19PM +0530, manish.mishra wrote: >>> I had one concern: during recovery we do not send any magic. As of now we >> do not support multifd with postcopy, so it should be fine; we can do >> explicit checking for the non-recovery case. But I remember from some >> discussion that in future there may be support for multiFD with postcopy or >> multiple postcopy preempt channels too, and then a proper handshake will >> be required? So at some point we want to take that path? For now I agree >> approach 1 will be good; as suggested by Daniel, it can be backported >> easily to older qemu's too. >> >> Yes for the long run I think we should provide a generic solution for all >> the channels to be established for migration purposes. >> >> Not to mention that, as I replied previously to my original email, the trick >> won't easily work with dest QEMU, where we need further changes to allow qemu >> to accept new channels during loading of the VM. >> >> Considering the complexity that it'll take just to resolve the preempt >> channel ordering, I think maybe it's cleaner we just look for the long term >> goal. > I think we should just ignore the preempt channel. Let's just do the > easy bit and fix the main vs multifd channel mixup, as that's the one > that is definitely actively hitting people today. We can solve that as > a quick win in a way that is easy to backport to existing releases of > QEMU for those affected. Yes, that works for now, Daniel.
I can send a patch as per your earlier suggestions on this early next week, if that is fine? > Separately from that, let's define a clean slate migration protocol to > solve many of our historic problems and mistakes that can't be dealt > with through retrofitting, not limited to just this ordering mistake. > > We had a significant discussion about it at the start of the year > in this thread, which I think we should take forward and write into > a formal protocol spec. > > https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03655.html This was a good read for me. :) > > With regards, > Daniel Thanks Manish Mishra ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2022-10-21 8:21 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-10-12 19:53 MultiFD and default channel out of order mapping on receive side manish.mishra 2022-10-13 8:15 ` Daniel P. Berrangé 2022-10-13 8:56 ` manish.mishra 2022-10-17 7:36 ` manish.mishra 2022-10-17 11:38 ` Daniel P. Berrangé 2022-10-17 21:15 ` Peter Xu 2022-10-18 8:18 ` Daniel P. Berrangé 2022-10-18 14:51 ` Peter Xu 2022-10-18 21:00 ` Peter Xu 2022-10-20 14:44 ` manish.mishra 2022-10-20 16:32 ` Peter Xu 2022-10-20 22:07 ` Daniel P. Berrangé 2022-10-21 8:13 ` manish.mishra