From: Li Zhang <lizhang@suse.de>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: cfontana@suse.de, quintela@redhat.com,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
qemu-devel@nongnu.org
Subject: Re: [PATCH v2 0/1] migration: multifd live migration improvement
Date: Tue, 7 Dec 2021 16:32:36 +0100
Message-ID: <3be2bdc7-d7c2-2a8a-66e1-d889f5bca5c2@suse.de>
In-Reply-To: <Ya9sx9AMwQ2Kwooj@redhat.com>
On 12/7/21 3:16 PM, Daniel P. Berrangé wrote:
> On Tue, Dec 07, 2021 at 02:45:10PM +0100, Li Zhang wrote:
>> On 12/6/21 8:54 PM, Dr. David Alan Gilbert wrote:
>>> * Li Zhang (lizhang@suse.de) wrote:
>>>> When testing live migration with multifd channels (8, 16, or a bigger number)
>>>> and using qemu -incoming (without "defer"), if a network error occurs
>>>> (for example, triggering the kernel SYN flooding detection),
>>>> the migration fails and the guest hangs forever.
>>>>
>>>> The test environment and the command lines are as follows:
>>>>
>>>> QEMU version: QEMU emulator version 6.2.91 (v6.2.0-rc1-47-gc5fbdd60cf)
>>>> Host OS: SLE 15 with kernel: 5.14.5-1-default
>>>> Network Card: mlx5 100Gbps
>>>> Network card: Intel Corporation I350 Gigabit (1Gbps)
>>>>
>>>> Source:
>>>> qemu-system-x86_64 -M q35 -smp 32 -nographic \
>>>> -serial telnet:10.156.208.153:4321,server,nowait \
>>>> -m 4096 -enable-kvm -hda /var/lib/libvirt/images/openSUSE-15.3.img \
>>>> -monitor stdio
>>>> Dest:
>>>> qemu-system-x86_64 -M q35 -smp 32 -nographic \
>>>> -serial telnet:10.156.208.154:4321,server,nowait \
>>>> -m 4096 -enable-kvm -hda /var/lib/libvirt/images/openSUSE-15.3.img \
>>>> -monitor stdio \
>>>> -incoming tcp:1.0.8.154:4000
>>>>
>>>> (qemu) migrate_set_parameter max-bandwidth 100G
>>>> (qemu) migrate_set_capability multifd on
>>>> (qemu) migrate_set_parameter multifd-channels 16
>>>>
>>>> The guest hangs when executing the command: migrate -d tcp:1.0.8.154:4000.
>>>>
>>>> If a network problem happens, TCP ACK is not received by destination
>>>> and the destination resets the connection with RST.
>>>>
>>>> No. Time Source Destination Protocol Length Info
>>>> 119 1.021169 1.0.8.153 1.0.8.154 TCP 1410 60166 → 4000 [PSH, ACK] Seq=65 Ack=1 Win=62720 Len=1344 TSval=1338662881 TSecr=1399531897
>>>> No. Time Source Destination Protocol Length Info
>>>> 125 1.021181 1.0.8.154 1.0.8.153 TCP 54 4000 → 60166 [RST] Seq=1 Win=0 Len=0
>>>>
>>>> kernel log:
>>>> [334520.229445] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies. Check SNMP counters.
>>>> [334562.994919] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies. Check SNMP counters.
>>>> [334695.519927] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies. Check SNMP counters.
>>>> [334734.689511] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies. Check SNMP counters.
>>>> [335687.740415] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies. Check SNMP counters.
>>>> [335730.013598] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies. Check SNMP counters.
>>> Should we document somewhere how to avoid that? Is there something we
>>> should be doing in the connection code to avoid it?
>> We should use -incoming defer on the QEMU command line instead of
>> -incoming ip:port.
>>
>> The backlog of the socket is then set to the number of multifd channels,
>> and the problem does not occur as far as I have tested.
>>
>> If we use -incoming ip:port on the QEMU command line, the backlog of the
>> socket is always 1, which causes the SYN flooding.
> Do we send migration parameters from the src to the dst QEMU ?

No, I don't think we send migration parameters from the src to the dest
QEMU.
I set the migration parameters on both sides separately from the QEMU
monitor.

> There are a bunch of things that we need to set to the same
> value on the src and dst. If we sent any relevant MigrationParameters
> fields to the dst, when the first/main migration channel is opened, it
> could validate that it is configured in a way that is compatible with
> the src. If it isn't, it can drop the main channel immediately. This
> would trigger the src to fail the migration and we couldn't get stuck
> setting up the secondary data channels for multifd.

OK. Currently, the parameters match on both sides only if we set them to
the same values ourselves.

With -incoming tcp:ip:port, multifd is still disabled (the default) when
the socket is created, so the backlog is 1.

Here is the function that sets the backlog:

static void
socket_start_incoming_migration_internal(SocketAddress *saddr,
                                         Error **errp)
{
    QIONetListener *listener = qio_net_listener_new();
    MigrationIncomingState *mis = migration_incoming_get_current();
    size_t i;
    int num = 1;

    qio_net_listener_set_name(listener, "migration-socket-listener");

    if (migrate_use_multifd()) {
        num = migrate_multifd_channels();
    }
    ...
}
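
To illustrate why the value of num matters, here is a minimal standalone sketch in plain POSIX C. It is not QEMU code. It assumes, as described above, that num ends up as the backlog of the listening socket. The names pick_backlog and open_listener are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the choice made in the function quoted
 * above: the backlog is 1 unless multifd is already enabled, in which
 * case it is the number of multifd channels.
 */
int pick_backlog(bool multifd_enabled, int multifd_channels)
{
    assert(multifd_channels >= 1);
    return multifd_enabled ? multifd_channels : 1;
}

/*
 * Hypothetical helper: open a TCP listener on 127.0.0.1:port with the
 * given backlog. The backlog is handed to the kernel in the listen()
 * call. Returns the socket fd, or -1 on failure.
 */
int open_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

If 16 multifd connections arrive at nearly the same time, a listener opened with pick_backlog(false, 16), which is 1, has far less room for pending connections than one opened with pick_backlog(true, 16), which is 16.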

The sequence with -incoming tcp:ip:port is as follows:
1. Start the QEMU process with -incoming tcp:ip:port on the command line.
2. socket_start_incoming_migration_internal is called. Multifd is still
   disabled, so num = migrate_multifd_channels() is not executed and the
   backlog is num = 1.
3. Enable multifd and set the multifd parameters. The backlog is still 1,
   because it can no longer be changed.
4. Run the migration.

The sequence with -incoming defer is as follows:
1. Start the QEMU process with -incoming defer on the command line.
2. Enable multifd and set the multifd parameters.
3. Execute the command (qemu) migrate_incoming tcp:ip:port
4. socket_start_incoming_migration_internal is called, and the backlog is
   set by num = migrate_multifd_channels();
5. Run the migration.
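
The difference between the two sequences can be sketched with a small model in C. This is only an illustration of the ordering, not QEMU code: struct incoming_model and all function names below are hypothetical, and the model assumes that the backlog is computed once, at the moment the listener is opened.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the destination state; not a QEMU structure. */
struct incoming_model {
    bool multifd_enabled;   /* multifd capability, off by default */
    int multifd_channels;   /* multifd-channels parameter */
    int listen_backlog;     /* 0 means the listener is not open yet */
};

/* Models enabling multifd and setting multifd-channels in the monitor. */
void model_set_multifd(struct incoming_model *m, int channels)
{
    assert(channels >= 1);
    m->multifd_enabled = true;
    m->multifd_channels = channels;
}

/* Models opening the listener: the backlog is decided here, once. */
void model_start_listening(struct incoming_model *m)
{
    m->listen_backlog = m->multifd_enabled ? m->multifd_channels : 1;
}

/* Sequence 1: -incoming tcp:ip:port, the listener opens at startup. */
int backlog_with_incoming_uri(int channels)
{
    struct incoming_model m = { false, 1, 0 };

    model_start_listening(&m);       /* multifd still disabled */
    model_set_multifd(&m, channels); /* too late for the backlog */
    return m.listen_backlog;
}

/* Sequence 2: -incoming defer, the listener opens on migrate_incoming. */
int backlog_with_incoming_defer(int channels)
{
    struct incoming_model m = { false, 1, 0 };

    model_set_multifd(&m, channels); /* parameters are set first */
    model_start_listening(&m);       /* backlog follows the channels */
    return m.listen_backlog;
}
```

With 16 channels, backlog_with_incoming_uri(16) leaves the model with a backlog of 1, while backlog_with_incoming_defer(16) gives 16.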
>
> Regards,
> Daniel