From: Li Zhang <lizhang@suse.de>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: quintela@redhat.com, berrange@redhat.com, qemu-devel@nongnu.org,
cfontana@suse.de
Subject: Re: [PATCH v2 0/1] migration: multifd live migration improvement
Date: Tue, 7 Dec 2021 14:45:10 +0100
Message-ID: <e55634a9-bb30-de28-9dec-2dee15d9cb41@suse.de>
In-Reply-To: <Ya5qgYpDrN79A+jl@work-vm>
On 12/6/21 8:54 PM, Dr. David Alan Gilbert wrote:
> * Li Zhang (lizhang@suse.de) wrote:
>> When testing live migration with multifd channels (8, 16, or a bigger number)
>> and using qemu -incoming (without "defer"), if a network error occurs
>> (for example, triggering the kernel SYN flooding detection),
>> the migration fails and the guest hangs forever.
>>
>> The test environment and the command lines are as follows:
>>
>> QEMU version: QEMU emulator version 6.2.91 (v6.2.0-rc1-47-gc5fbdd60cf)
>> Host OS: SLE 15 with kernel: 5.14.5-1-default
>> Network Card: mlx5 100Gbps
>> Network card: Intel Corporation I350 Gigabit (1Gbps)
>>
>> Source:
>> qemu-system-x86_64 -M q35 -smp 32 -nographic \
>> -serial telnet:10.156.208.153:4321,server,nowait \
>> -m 4096 -enable-kvm -hda /var/lib/libvirt/images/openSUSE-15.3.img \
>> -monitor stdio
>> Dest:
>> qemu-system-x86_64 -M q35 -smp 32 -nographic \
>> -serial telnet:10.156.208.154:4321,server,nowait \
>> -m 4096 -enable-kvm -hda /var/lib/libvirt/images/openSUSE-15.3.img \
>> -monitor stdio \
>> -incoming tcp:1.0.8.154:4000
>>
>> (qemu) migrate_set_parameter max-bandwidth 100G
>> (qemu) migrate_set_capability multifd on
>> (qemu) migrate_set_parameter multifd-channels 16
>>
>> The guest hangs when executing the command: migrate -d tcp:1.0.8.154:4000.
>>
>> If a network problem happens, the TCP ACK is not received by the
>> destination, and the destination resets the connection with an RST.
>>
>> No. Time Source Destination Protocol Length Info
>> 119 1.021169 1.0.8.153 1.0.8.154 TCP 1410 60166 → 4000 [PSH, ACK] Seq=65 Ack=1 Win=62720 Len=1344 TSval=1338662881 TSecr=1399531897
>> 125 1.021181 1.0.8.154 1.0.8.153 TCP 54 4000 → 60166 [RST] Seq=1 Win=0 Len=0
>>
>> kernel log:
>> [334520.229445] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies. Check SNMP counters.
>> [334562.994919] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies. Check SNMP counters.
>> [334695.519927] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies. Check SNMP counters.
>> [334734.689511] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies. Check SNMP counters.
>> [335687.740415] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies. Check SNMP counters.
>> [335730.013598] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies. Check SNMP counters.
> Should we document somewhere how to avoid that? Is there something we
> should be doing in the connection code to avoid it?
We should start the destination QEMU with -incoming defer instead of
-incoming ip:port. With defer, the multifd parameters are set before the
listening socket is opened (by migrate_incoming), so the backlog of the
socket is set to the number of multifd channels. The problem does not
occur in my testing with this setup.
If we use -incoming ip:port on the QEMU command line, the backlog of
the socket is always 1, which triggers the SYN flooding detection.
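
To illustrate the backlog point, here is a minimal sketch with plain
sockets. It is not QEMU code; the function names and the loopback setup
are assumptions made for the example. It opens a listener whose backlog
equals the channel count and then connects all channels in one burst
before the listener accepts any of them.

```python
import socket


def listen_for_channels(channels, host="127.0.0.1"):
    """Open a TCP listener whose backlog equals the number of channels."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, 0))       # port 0: let the kernel pick a free port
    srv.listen(channels)      # backlog sized to the expected burst
    srv.settimeout(5.0)       # avoid hanging forever in accept()
    return srv


def connect_burst(addr, channels, timeout=5.0):
    """Open all channel connections back to back, before any accept()."""
    return [socket.create_connection(addr, timeout=timeout)
            for _ in range(channels)]


def demo(channels=16):
    """Return how many of the burst connections the listener accepted."""
    srv = listen_for_channels(channels)
    conns = connect_burst(srv.getsockname(), channels)
    # Nothing has been accepted yet: the handshakes completed because the
    # accept queue had room for every pending connection.
    accepted = [srv.accept()[0] for _ in range(channels)]
    count = len(accepted)
    for s in conns + accepted:
        s.close()
    srv.close()
    return count
```

With a backlog of 1, the same burst can overflow the accept queue; how
the kernel reacts (dropped SYNs, SYN cookies, resets) depends on its
configuration, so the sketch only shows the case where the queue is
large enough.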
>
> Dave
>
>> There are two problems here:
>> 1. On the send side, the main thread is blocked on qemu_thread_join and
>> send threads are blocked on sendmsg
>> 2. On the receive side, the receive threads are blocked on qemu_sem_wait,
>> waiting for a semaphore.
>>
>> This patch fixes the first problem, and the guest no longer hangs.
>> There is no good solution for the second problem yet.
>>
>> Li Zhang (1):
>> multifd: Shut down the QIO channels to avoid blocking the send threads
>> when they are terminated.
>>
>> migration/multifd.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> --
>> 2.31.1
>>
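
The first problem above (send threads stuck in sendmsg while the main
thread waits in a join) can be reproduced with plain sockets. The
sketch below is not the patch itself and does not use the QIO channel
API; the socketpair, the thread, and all names are assumptions made for
the example. It shows that shutting down the socket wakes a blocked
sender with an error, so the join returns.

```python
import socket
import threading
import time


def shutdown_unblocks_sender(wait=0.5):
    """Return what a blocked sender observes after shutdown()."""
    tx, rx = socket.socketpair()   # rx never reads, so tx's buffer fills up
    outcome = {}

    def sender():
        chunk = b"\0" * 65536
        try:
            while True:
                tx.sendall(chunk)  # eventually blocks in the send syscall
        except OSError as exc:
            outcome["error"] = exc  # raised once the socket is shut down

    t = threading.Thread(target=sender, daemon=True)
    t.start()
    time.sleep(wait)               # give the sender time to fill the buffer
    tx.shutdown(socket.SHUT_RDWR)  # wake the blocked send with an error
    t.join(timeout=5.0)            # returns once the sender has exited
    outcome["joined"] = not t.is_alive()
    tx.close()
    rx.close()
    return outcome
```

Without the shutdown() call, the join would wait as long as the peer
never drains the socket, which matches the hang described for the send
side.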
Thread overview: 11+ messages
2021-12-03 11:55 [PATCH v2 0/1] migration: multifd live migration improvement Li Zhang
2021-12-03 11:55 ` [PATCH v2 1/1] multifd: Shut down the QIO channels to avoid blocking the send threads when they are terminated Li Zhang
2021-12-03 14:00 ` Daniel P. Berrangé
2021-12-06 8:55 ` Li Zhang
2021-12-06 19:50 ` Dr. David Alan Gilbert
2021-12-07 13:49 ` Li Zhang
2021-12-09 10:52 ` Juan Quintela
2021-12-06 19:54 ` [PATCH v2 0/1] migration: multifd live migration improvement Dr. David Alan Gilbert
2021-12-07 13:45 ` Li Zhang [this message]
2021-12-07 14:16 ` Daniel P. Berrangé
2021-12-07 15:32 ` Li Zhang