qemu-devel.nongnu.org archive mirror
* [Qemu-devel] dropped pkts with Qemu on tap interace (RX)
@ 2018-01-02 11:17 Stefan Priebe - Profihost AG
  2018-01-02 14:20 ` Wei Xu
  2018-01-03  8:14 ` Alexandre DERUMIER
  0 siblings, 2 replies; 10+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-01-02 11:17 UTC (permalink / raw)
  To: qemu-devel

Hello,

Currently I'm trying to fix a problem where we have "random" missing
packets.

We're doing an ssh connect from machine a to machine b every 5 minutes
via rsync and ssh.

Sometimes it happens that we get this cron message:
"Connection to 192.168.0.2 closed by remote host.
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]
ssh: connect to host 192.168.0.2 port 22: Connection refused"

The tap devices of the target VM show dropped RX packets on BOTH tap
interfaces - strangely, with the same number of packets?

# ifconfig tap317i0; ifconfig tap317i1
tap317i0  Link encap:Ethernet  HWaddr 6e:cb:65:94:bb:bf
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:2238445 errors:0 dropped:13159 overruns:0 frame:0
          TX packets:9655853 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:177991267 (169.7 MiB)  TX bytes:910412749 (868.2 MiB)

tap317i1  Link encap:Ethernet  HWaddr 96:f8:b5:d0:9a:07
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:1516085 errors:0 dropped:13159 overruns:0 frame:0
          TX packets:1446964 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1597564313 (1.4 GiB)  TX bytes:3517734365 (3.2 GiB)

Any ideas how to inspect this issue?

Greets,
Stefan


* Re: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)
  2018-01-02 11:17 [Qemu-devel] dropped pkts with Qemu on tap interace (RX) Stefan Priebe - Profihost AG
@ 2018-01-02 14:20 ` Wei Xu
  2018-01-02 15:24   ` Stefan Priebe - Profihost AG
  2018-01-03  8:14 ` Alexandre DERUMIER
  1 sibling, 1 reply; 10+ messages in thread
From: Wei Xu @ 2018-01-02 14:20 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: qemu-devel

On Tue, Jan 02, 2018 at 12:17:29PM +0100, Stefan Priebe - Profihost AG wrote:
> Hello,
> 
> currently i'm trying to fix a problem where we have "random" missing
> packets.
> 
> We're doing an ssh connect from machine a to machine b every 5 minutes
> via rsync and ssh.
> 
> Sometimes it happens that we get this cron message:
> "Connection to 192.168.0.2 closed by remote host.
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]
> ssh: connect to host 192.168.0.2 port 22: Connection refused"

Hi Stefan,
What kind of virtio-net backend are you using? Can you paste your qemu
command line here?

'Connection refused' usually means that the client gets a TCP Reset rather
than losing packets, so this might not be a relevant issue.

Also you can do a tcpdump on both guests and see what happened to SSH packets
(tcpdump -i tapXXX port 22).
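
If a full capture of port 22 turns out to be too large, one option is to record
only TCP SYN and RST segments and to rotate the output files. The following is a
minimal Python sketch that only builds the tcpdump command line. The helper name
build_tcpdump_argv, the output path and the size limits are illustrative
assumptions, not something taken from this thread.

```python
import shlex


def build_tcpdump_argv(iface, port=22, out_prefix="/tmp/tap-ssh",
                       file_mb=50, files=10):
    """Build a tcpdump command that records only SYN and RST segments."""
    # pcap filter: match the port, then keep segments with SYN or RST set.
    bpf = f"tcp port {port} and (tcp[tcpflags] & (tcp-syn|tcp-rst) != 0)"
    return [
        "tcpdump",
        "-i", iface,           # interface to capture on (host-side tap)
        "-n",                  # do not resolve addresses to names
        "-C", str(file_mb),    # rotate after this many millions of bytes
        "-W", str(files),      # keep at most this many files (ring buffer)
        "-w", f"{out_prefix}-{iface}.pcap",  # hypothetical output path
        bpf,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it as root.
    print(shlex.join(build_tcpdump_argv("tap317i1")))
```

A refused connection should show up in such a capture as a SYN followed by an
RST from the target, which keeps the capture small even while rsync is running.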

> 
> The tap devices of the target VM show dropped RX packets on BOTH tap
> interfaces - strangely, with the same number of packets?
> 
> # ifconfig tap317i0; ifconfig tap317i1
> tap317i0  Link encap:Ethernet  HWaddr 6e:cb:65:94:bb:bf
>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>           RX packets:2238445 errors:0 dropped:13159 overruns:0 frame:0
>           TX packets:9655853 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:177991267 (169.7 MiB)  TX bytes:910412749 (868.2 MiB)
> 
> tap317i1  Link encap:Ethernet  HWaddr 96:f8:b5:d0:9a:07
>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>           RX packets:1516085 errors:0 dropped:13159 overruns:0 frame:0
>           TX packets:1446964 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:1597564313 (1.4 GiB)  TX bytes:3517734365 (3.2 GiB)
> 
> Any ideas how to inspect this issue?

It seems both tap interfaces lose RX packets. Dropped RX packets mean the
host (backend) can't receive packets from the guest as fast as the guest sends them.

Are you running some symmetrical test on both guests? 

Wei

> 
> Greets,
> Stefan
> 


* Re: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)
  2018-01-02 14:20 ` Wei Xu
@ 2018-01-02 15:24   ` Stefan Priebe - Profihost AG
  2018-01-02 17:04     ` Wei Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-01-02 15:24 UTC (permalink / raw)
  To: Wei Xu; +Cc: qemu-devel

Hi,
Am 02.01.2018 um 15:20 schrieb Wei Xu:
> On Tue, Jan 02, 2018 at 12:17:29PM +0100, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> currently i'm trying to fix a problem where we have "random" missing
>> packets.
>>
>> We're doing an ssh connect from machine a to machine b every 5 minutes
>> via rsync and ssh.
>>
>> Sometimes it happens that we get this cron message:
>> "Connection to 192.168.0.2 closed by remote host.
>> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]
>> ssh: connect to host 192.168.0.2 port 22: Connection refused"
> 
> Hi Stefan,
> What kind of virtio-net backend are you using? Can you paste your qemu
> command line here?

Sure netdev part:
-netdev
type=tap,id=net0,ifname=tap317i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on
-device
virtio-net-pci,mac=EA:37:42:5C:F3:33,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300
-netdev
type=tap,id=net1,ifname=tap317i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on,queues=4
-device
virtio-net-pci,mac=6A:8E:74:45:1A:0B,netdev=net1,bus=pci.0,addr=0x13,id=net1,vectors=10,mq=on,bootindex=301


> 'Connection refused' usually means that the client gets a TCP Reset rather
> than losing packets, so this might not be a relevant issue.

Mhm, so you mean these might be two separate issues?

> Also you can do a tcpdump on both guests and see what happened to SSH packets
> (tcpdump -i tapXXX port 22).

Sadly not as there's too much traffic on that part as rsync is syncing
every 5 minutes through ssh.

>> The tap devices of the target VM show dropped RX packets on BOTH tap
>> interfaces - strangely, with the same number of packets?
>>
>> # ifconfig tap317i0; ifconfig tap317i1
>> tap317i0  Link encap:Ethernet  HWaddr 6e:cb:65:94:bb:bf
>>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>>           RX packets:2238445 errors:0 dropped:13159 overruns:0 frame:0
>>           TX packets:9655853 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:177991267 (169.7 MiB)  TX bytes:910412749 (868.2 MiB)
>>
>> tap317i1  Link encap:Ethernet  HWaddr 96:f8:b5:d0:9a:07
>>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>>           RX packets:1516085 errors:0 dropped:13159 overruns:0 frame:0
>>           TX packets:1446964 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:1597564313 (1.4 GiB)  TX bytes:3517734365 (3.2 GiB)
>>
>> Any ideas how to inspect this issue?
> 
> It seems both tap interfaces lose RX packets. Dropped RX packets mean the
> host (backend) can't receive packets from the guest as fast as the guest sends them.

Inside the guest I see no dropped packets at all. It's only on the host,
and strangely both taps show the same value. Both are connected to
completely different networks.
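
One way to check whether the two drop counters really move together is to sample
them from sysfs over time. The following is a minimal Python sketch, assuming the
standard Linux path /sys/class/net/<iface>/statistics/rx_dropped; the function
names are hypothetical and the sampling interval is left to the caller.

```python
from pathlib import Path


def read_rx_dropped(iface, sysfs_root="/sys/class/net"):
    """Return the kernel's rx_dropped counter for one interface."""
    path = Path(sysfs_root) / iface / "statistics" / "rx_dropped"
    return int(path.read_text().strip())


def deltas(before, after):
    """Per-interface increase between two {iface: counter} snapshots."""
    return {name: after[name] - before[name] for name in before}


def moved_in_lockstep(before, after):
    """True if every counter grew by the same non-zero amount."""
    values = set(deltas(before, after).values())
    return len(values) == 1 and values != {0}


# Example use on the host (interface names taken from this thread):
#   first = {t: read_rx_dropped(t) for t in ("tap317i0", "tap317i1")}
#   ... wait for a while ...
#   second = {t: read_rx_dropped(t) for t in ("tap317i0", "tap317i1")}
#   print(deltas(first, second), moved_in_lockstep(first, second))
```

If both counters keep increasing by identical amounts, that would point to a
common source of the dropped frames rather than two independent overloads.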

> Are you running some symmetrical test on both guests? 

No.

Stefan


> Wei
> 
>>
>> Greets,
>> Stefan
>>


* Re: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)
  2018-01-02 15:24   ` Stefan Priebe - Profihost AG
@ 2018-01-02 17:04     ` Wei Xu
  2018-01-02 21:17       ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 10+ messages in thread
From: Wei Xu @ 2018-01-02 17:04 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: qemu-devel

On Tue, Jan 02, 2018 at 04:24:33PM +0100, Stefan Priebe - Profihost AG wrote:
> Hi,
> Am 02.01.2018 um 15:20 schrieb Wei Xu:
> > On Tue, Jan 02, 2018 at 12:17:29PM +0100, Stefan Priebe - Profihost AG wrote:
> >> Hello,
> >>
> >> currently i'm trying to fix a problem where we have "random" missing
> >> packets.
> >>
> >> We're doing an ssh connect from machine a to machine b every 5 minutes
> >> via rsync and ssh.
> >>
> >> Sometimes it happens that we get this cron message:
> >> "Connection to 192.168.0.2 closed by remote host.
> >> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> >> rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]
> >> ssh: connect to host 192.168.0.2 port 22: Connection refused"
> > 
> > Hi Stefan,
> > What kind of virtio-net backend are you using? Can you paste your qemu
> > command line here?
> 
> Sure netdev part:
> -netdev
> type=tap,id=net0,ifname=tap317i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on
> -device
> virtio-net-pci,mac=EA:37:42:5C:F3:33,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300
> -netdev
> type=tap,id=net1,ifname=tap317i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on,queues=4
> -device
> virtio-net-pci,mac=6A:8E:74:45:1A:0B,netdev=net1,bus=pci.0,addr=0x13,id=net1,vectors=10,mq=on,bootindex=301

From what you have described, the traffic is not heavy for the guests,
so drops shouldn't happen in the regular case.

What is your hardware platform? Which versions are you using for the
guest/host kernel and QEMU? Are there other VMs on the same host?

> 
> 
> > 'Connection refused' usually means that the client gets a TCP Reset rather
> > than losing packets, so this might not be a relevant issue.
> 
> Mhm, so you mean these might be two separate issues?

Yes.

> 
> > Also you can do a tcpdump on both guests and see what happened to SSH packets
> > (tcpdump -i tapXXX port 22).
> 
> Sadly not as there's too much traffic on that part as rsync is syncing
> every 5 minutes through ssh.

You can do a tcpdump for the entire traffic from the guest and host and compare
what kind of packets are dropped if the traffic is not overloaded.
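
The comparison of two captures can be sketched as a multiset difference. This is
a minimal Python illustration, assuming each packet has already been reduced to
a hashable fingerprint such as (src, dst, protocol, ip_id). How the fingerprints
are extracted from the capture files is left open, and the function names and
sample data are made up for the example.

```python
from collections import Counter


def missing_packets(capture_a, capture_b):
    """Fingerprints present in capture_a but absent from capture_b.

    Counters are used so that repeated fingerprints are counted, not merged.
    """
    return Counter(capture_a) - Counter(capture_b)


def by_protocol(missing):
    """Group missing fingerprints by their protocol field (index 2)."""
    totals = Counter()
    for fingerprint, count in missing.items():
        totals[fingerprint[2]] += count
    return totals


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the input.
    side_a = [
        ("192.168.0.1", "192.168.0.2", "tcp", 1),
        ("192.168.0.1", "192.168.0.2", "tcp", 2),
        ("192.168.0.1", "192.168.0.255", "udp", 3),
    ]
    side_b = [("192.168.0.1", "192.168.0.2", "tcp", 1)]
    print(by_protocol(missing_packets(side_a, side_b)))
```

Grouping the difference by protocol gives a first hint whether the drops hit
random packets or one specific kind of traffic.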

Wei

> 
> >> The tap devices of the target VM show dropped RX packets on BOTH tap
> >> interfaces - strangely, with the same number of packets?
> >>
> >> # ifconfig tap317i0; ifconfig tap317i1
> >> tap317i0  Link encap:Ethernet  HWaddr 6e:cb:65:94:bb:bf
> >>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
> >>           RX packets:2238445 errors:0 dropped:13159 overruns:0 frame:0
> >>           TX packets:9655853 errors:0 dropped:0 overruns:0 carrier:0
> >>           collisions:0 txqueuelen:1000
> >>           RX bytes:177991267 (169.7 MiB)  TX bytes:910412749 (868.2 MiB)
> >>
> >> tap317i1  Link encap:Ethernet  HWaddr 96:f8:b5:d0:9a:07
> >>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
> >>           RX packets:1516085 errors:0 dropped:13159 overruns:0 frame:0
> >>           TX packets:1446964 errors:0 dropped:0 overruns:0 carrier:0
> >>           collisions:0 txqueuelen:1000
> >>           RX bytes:1597564313 (1.4 GiB)  TX bytes:3517734365 (3.2 GiB)
> >>
> >> Any ideas how to inspect this issue?
> > 
> > It seems both tap interfaces lose RX packets. Dropped RX packets mean the
> > host (backend) can't receive packets from the guest as fast as the guest sends them.
> 
> Inside the guest i see no dropped packets at all. It's only on the host
> and strangely on both taps at the same value? And both are connected to
> absolutely different networks.
> 
> > Are you running some symmetrical test on both guests? 
> 
> No.
> 
> Stefan
> 
> 
> > Wei
> > 
> >>
> >> Greets,
> >> Stefan
> >>


* Re: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)
  2018-01-02 17:04     ` Wei Xu
@ 2018-01-02 21:17       ` Stefan Priebe - Profihost AG
  2018-01-03  3:57         ` Wei Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-01-02 21:17 UTC (permalink / raw)
  To: Wei Xu; +Cc: qemu-devel


Am 02.01.2018 um 18:04 schrieb Wei Xu:
> On Tue, Jan 02, 2018 at 04:24:33PM +0100, Stefan Priebe - Profihost AG wrote:
>> Hi,
>> Am 02.01.2018 um 15:20 schrieb Wei Xu:
>>> On Tue, Jan 02, 2018 at 12:17:29PM +0100, Stefan Priebe - Profihost AG wrote:
>>>> Hello,
>>>>
>>>> currently i'm trying to fix a problem where we have "random" missing
>>>> packets.
>>>>
>>>> We're doing an ssh connect from machine a to machine b every 5 minutes
>>>> via rsync and ssh.
>>>>
>>>> Sometimes it happens that we get this cron message:
>>>> "Connection to 192.168.0.2 closed by remote host.
>>>> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>>>> rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]
>>>> ssh: connect to host 192.168.0.2 port 22: Connection refused"
>>>
>>> Hi Stefan,
>>> What kind of virtio-net backend are you using? Can you paste your qemu
>>> command line here?
>>
>> Sure netdev part:
>> -netdev
>> type=tap,id=net0,ifname=tap317i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on
>> -device
>> virtio-net-pci,mac=EA:37:42:5C:F3:33,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300
>> -netdev
>> type=tap,id=net1,ifname=tap317i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on,queues=4
>> -device
>> virtio-net-pci,mac=6A:8E:74:45:1A:0B,netdev=net1,bus=pci.0,addr=0x13,id=net1,vectors=10,mq=on,bootindex=301
> 
> According to what you have mentioned, the traffic is not heavy for the guests,
> the dropping shouldn't happen for regular case.

The avg traffic is around 300kb/s.

> What is your hardware platform?

Dual Intel Xeon E5-2680 v4

> and Which versions are you using for both
> guest/host kernel
Kernel v4.4.103

> and qemu?
2.9.1

> Are there other VMs on the same host?
Yes.


>>> 'Connection refused' usually means that the client gets a TCP Reset rather
>>> than losing packets, so this might not be a relevant issue.
>>
>> Mhm, so you mean these might be two separate issues?
> 
> Yes.
> 
>>
>>> Also you can do a tcpdump on both guests and see what happened to SSH packets
>>> (tcpdump -i tapXXX port 22).
>>
>> Sadly not as there's too much traffic on that part as rsync is syncing
>> every 5 minutes through ssh.
> 
> You can do a tcpdump for the entire traffic from the guest and host and compare
> what kind of packets are dropped if the traffic is not overloaded.

Are you sure? I don't get why the same number and same kind of packets
should be received by both taps, which are connected to different bridges
on different hardware and physical interfaces.

Stefan

> Wei
> 
>>
>>>> The tap devices of the target VM show dropped RX packets on BOTH tap
>>>> interfaces - strangely, with the same number of packets?
>>>>
>>>> # ifconfig tap317i0; ifconfig tap317i1
>>>> tap317i0  Link encap:Ethernet  HWaddr 6e:cb:65:94:bb:bf
>>>>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>>>>           RX packets:2238445 errors:0 dropped:13159 overruns:0 frame:0
>>>>           TX packets:9655853 errors:0 dropped:0 overruns:0 carrier:0
>>>>           collisions:0 txqueuelen:1000
>>>>           RX bytes:177991267 (169.7 MiB)  TX bytes:910412749 (868.2 MiB)
>>>>
>>>> tap317i1  Link encap:Ethernet  HWaddr 96:f8:b5:d0:9a:07
>>>>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>>>>           RX packets:1516085 errors:0 dropped:13159 overruns:0 frame:0
>>>>           TX packets:1446964 errors:0 dropped:0 overruns:0 carrier:0
>>>>           collisions:0 txqueuelen:1000
>>>>           RX bytes:1597564313 (1.4 GiB)  TX bytes:3517734365 (3.2 GiB)
>>>>
>>>> Any ideas how to inspect this issue?
>>>
>>> It seems both tap interfaces lose RX packets. Dropped RX packets mean the
>>> host (backend) can't receive packets from the guest as fast as the guest sends them.
>>
>> Inside the guest i see no dropped packets at all. It's only on the host
>> and strangely on both taps at the same value? And both are connected to
>> absolutely different networks.
>>
>>> Are you running some symmetrical test on both guests? 
>>
>> No.
>>
>> Stefan
>>
>>
>>> Wei
>>>
>>>>
>>>> Greets,
>>>> Stefan
>>>>


* Re: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)
  2018-01-02 21:17       ` Stefan Priebe - Profihost AG
@ 2018-01-03  3:57         ` Wei Xu
  2018-01-03 15:07           ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 10+ messages in thread
From: Wei Xu @ 2018-01-03  3:57 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: qemu-devel

On Tue, Jan 02, 2018 at 10:17:25PM +0100, Stefan Priebe - Profihost AG wrote:
> 
> Am 02.01.2018 um 18:04 schrieb Wei Xu:
> > On Tue, Jan 02, 2018 at 04:24:33PM +0100, Stefan Priebe - Profihost AG wrote:
> >> Hi,
> >> Am 02.01.2018 um 15:20 schrieb Wei Xu:
> >>> On Tue, Jan 02, 2018 at 12:17:29PM +0100, Stefan Priebe - Profihost AG wrote:
> >>>> Hello,
> >>>>
> >>>> currently i'm trying to fix a problem where we have "random" missing
> >>>> packets.
> >>>>
> >>>> We're doing an ssh connect from machine a to machine b every 5 minutes
> >>>> via rsync and ssh.
> >>>>
> >>>> Sometimes it happens that we get this cron message:
> >>>> "Connection to 192.168.0.2 closed by remote host.
> >>>> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> >>>> rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]
> >>>> ssh: connect to host 192.168.0.2 port 22: Connection refused"
> >>>
> >>> Hi Stefan,
> >>> What kind of virtio-net backend are you using? Can you paste your qemu
> >>> command line here?
> >>
> >> Sure netdev part:
> >> -netdev
> >> type=tap,id=net0,ifname=tap317i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on
> >> -device
> >> virtio-net-pci,mac=EA:37:42:5C:F3:33,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300
> >> -netdev
> >> type=tap,id=net1,ifname=tap317i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on,queues=4
> >> -device
> >> virtio-net-pci,mac=6A:8E:74:45:1A:0B,netdev=net1,bus=pci.0,addr=0x13,id=net1,vectors=10,mq=on,bootindex=301
> > 
> > According to what you have mentioned, the traffic is not heavy for the guests,
> > the dropping shouldn't happen for regular case.
> 
> The avg traffic is around 300kb/s.
> 
> > What is your hardware platform?
> 
> Dual Intel Xeon E5-2680 v4
> 
> > and Which versions are you using for both
> > guest/host kernel
> Kernel v4.4.103
> 
> > and qemu?
> 2.9.1
> 
> > Are there other VMs on the same host?
> Yes.

What about the CPU load? 

> 
> 
> >>> 'Connection refused' usually means that the client gets a TCP Reset rather
> >>> than losing packets, so this might not be a relevant issue.
> >>
> >> Mhm, so you mean these might be two separate issues?
> > 
> > Yes.
> > 
> >>
> >>> Also you can do a tcpdump on both guests and see what happened to SSH packets
> >>> (tcpdump -i tapXXX port 22).
> >>
> >> Sadly not as there's too much traffic on that part as rsync is syncing
> >> every 5 minutes through ssh.
> > 
> > You can do a tcpdump for the entire traffic from the guest and host and compare
> > what kind of packets are dropped if the traffic is not overloaded.
> 
> Are you sure? I don't get why the same amount and same kind of packets
> should be received by both tap which are connected to different bridges
> to different HW and physical interfaces.

Exactly. This is more likely a host or guest kernel bug than a QEMU issue,
because you are using the vhost kernel backend, and the two stats are independent.
You might have to check what is happening inside the traffic.

Wei


* Re: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)
  2018-01-02 11:17 [Qemu-devel] dropped pkts with Qemu on tap interace (RX) Stefan Priebe - Profihost AG
  2018-01-02 14:20 ` Wei Xu
@ 2018-01-03  8:14 ` Alexandre DERUMIER
  2018-01-03 15:10   ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 10+ messages in thread
From: Alexandre DERUMIER @ 2018-01-03  8:14 UTC (permalink / raw)
  To: Stefan Priebe, Profihost AG; +Cc: qemu-devel

Hi Stefan,

>>The tap devices of the target VM show dropped RX packets on BOTH tap
>>interfaces - strangely, with the same number of packets?

That's strange indeed.
If you tcpdump the tap interfaces, do you see incoming traffic on only one interface, or randomly on both?

(Can you provide the guest's network configuration for both interfaces?)


I see that you have enabled multiqueue on one of the interfaces. Have you set up the multiqueue part correctly inside the guest?
Do you have enough vCPUs to handle all the queues?
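
For reference, the usual guest-side step for virtio-net multiqueue is to raise
the number of combined channels with ethtool, and the commonly recommended MSI-X
vector count on the QEMU side is 2 * N + 2 for N queue pairs. The following is a
minimal Python sketch of both. The helper names are hypothetical, and eth1 as the
guest interface behind the multiqueue tap is an assumption.

```python
def virtio_mq_vectors(queue_pairs):
    """Vectors for N queue pairs: one per RX and TX queue, plus two extra
    (configuration changes and the control queue)."""
    return 2 * queue_pairs + 2


def guest_enable_queues_cmd(guest_iface, queue_pairs):
    """ethtool call asking the guest driver to use N combined channels."""
    return ["ethtool", "-L", guest_iface, "combined", str(queue_pairs)]


if __name__ == "__main__":
    # queues=4 is taken from the -netdev line posted in this thread.
    print(virtio_mq_vectors(4))
    # eth1 is assumed to be the guest NIC behind the multiqueue tap.
    print(" ".join(guest_enable_queues_cmd("eth1", 4)))
```

With queues=4 this gives 10 vectors, which matches the vectors=10 setting in the
posted command line. Running "ethtool -l eth1" inside the guest shows how many
channels are currently in use.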


----- Original message -----
From: "Stefan Priebe, Profihost AG" <s.priebe@profihost.ag>
To: "qemu-devel" <qemu-devel@nongnu.org>
Sent: Tuesday, 2 January 2018 12:17:29
Subject: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)

Hello, 

currently i'm trying to fix a problem where we have "random" missing 
packets. 

We're doing an ssh connect from machine a to machine b every 5 minutes 
via rsync and ssh. 

Sometimes it happens that we get this cron message: 
"Connection to 192.168.0.2 closed by remote host. 
rsync: connection unexpectedly closed (0 bytes received so far) [sender] 
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2] 
ssh: connect to host 192.168.0.2 port 22: Connection refused" 

The tap devices of the target VM show dropped RX packets on BOTH tap
interfaces - strangely, with the same number of packets?

# ifconfig tap317i0; ifconfig tap317i1 
tap317i0  Link encap:Ethernet  HWaddr 6e:cb:65:94:bb:bf
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:2238445 errors:0 dropped:13159 overruns:0 frame:0
          TX packets:9655853 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:177991267 (169.7 MiB)  TX bytes:910412749 (868.2 MiB)

tap317i1  Link encap:Ethernet  HWaddr 96:f8:b5:d0:9a:07
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:1516085 errors:0 dropped:13159 overruns:0 frame:0
          TX packets:1446964 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1597564313 (1.4 GiB)  TX bytes:3517734365 (3.2 GiB)

Any ideas how to inspect this issue? 

Greets, 
Stefan 


* Re: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)
  2018-01-03  3:57         ` Wei Xu
@ 2018-01-03 15:07           ` Stefan Priebe - Profihost AG
  2018-01-04  3:09             ` Wei Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-01-03 15:07 UTC (permalink / raw)
  To: Wei Xu; +Cc: qemu-devel


Am 03.01.2018 um 04:57 schrieb Wei Xu:
> On Tue, Jan 02, 2018 at 10:17:25PM +0100, Stefan Priebe - Profihost AG wrote:
>>
>> Am 02.01.2018 um 18:04 schrieb Wei Xu:
>>> On Tue, Jan 02, 2018 at 04:24:33PM +0100, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>> Am 02.01.2018 um 15:20 schrieb Wei Xu:
>>>>> On Tue, Jan 02, 2018 at 12:17:29PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>> Hello,
>>>>>>
>>>>>> currently i'm trying to fix a problem where we have "random" missing
>>>>>> packets.
>>>>>>
>>>>>> We're doing an ssh connect from machine a to machine b every 5 minutes
>>>>>> via rsync and ssh.
>>>>>>
>>>>>> Sometimes it happens that we get this cron message:
>>>>>> "Connection to 192.168.0.2 closed by remote host.
>>>>>> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>>>>>> rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]
>>>>>> ssh: connect to host 192.168.0.2 port 22: Connection refused"
>>>>>
>>>>> Hi Stefan,
>>>>> What kind of virtio-net backend are you using? Can you paste your qemu
>>>>> command line here?
>>>>
>>>> Sure netdev part:
>>>> -netdev
>>>> type=tap,id=net0,ifname=tap317i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on
>>>> -device
>>>> virtio-net-pci,mac=EA:37:42:5C:F3:33,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300
>>>> -netdev
>>>> type=tap,id=net1,ifname=tap317i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on,queues=4
>>>> -device
>>>> virtio-net-pci,mac=6A:8E:74:45:1A:0B,netdev=net1,bus=pci.0,addr=0x13,id=net1,vectors=10,mq=on,bootindex=301
>>>
>>> According to what you have mentioned, the traffic is not heavy for the guests,
>>> the dropping shouldn't happen for regular case.
>>
>> The avg traffic is around 300kb/s.
>>
>>> What is your hardware platform?
>>
>> Dual Intel Xeon E5-2680 v4
>>
>>> and Which versions are you using for both
>>> guest/host kernel
>> Kernel v4.4.103
>>
>>> and qemu?
>> 2.9.1
>>
>>> Are there other VMs on the same host?
>> Yes.
> 
> What about the CPU load? 

Host:
80-90% Idle
LoadAvg: 6-7

VM:
97%-99% Idle

>>>>> 'Connection refused' usually means that the client gets a TCP Reset rather
>>>>> than losing packets, so this might not be a relevant issue.
>>>>
>>>> Mhm, so you mean these might be two separate issues?
>>>
>>> Yes.
>>>
>>>>
>>>>> Also you can do a tcpdump on both guests and see what happened to SSH packets
>>>>> (tcpdump -i tapXXX port 22).
>>>>
>>>> Sadly not as there's too much traffic on that part as rsync is syncing
>>>> every 5 minutes through ssh.
>>>
>>> You can do a tcpdump for the entire traffic from the guest and host and compare
>>> what kind of packets are dropped if the traffic is not overloaded.
>>
>> Are you sure? I don't get why the same amount and same kind of packets
>> should be received by both tap which are connected to different bridges
>> to different HW and physical interfaces.
> 
> Exactly. This is more likely a host or guest kernel bug than a QEMU issue,
> because you are using the vhost kernel backend, and the two stats are independent.
> You might have to check what is happening inside the traffic.

What do you mean by inside the traffic?

Stefan


* Re: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)
  2018-01-03  8:14 ` Alexandre DERUMIER
@ 2018-01-03 15:10   ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan Priebe - Profihost AG @ 2018-01-03 15:10 UTC (permalink / raw)
  To: Alexandre DERUMIER; +Cc: qemu-devel

Am 03.01.2018 um 09:14 schrieb Alexandre DERUMIER:
> Hi Stefan,
> 
>>> The tap devices of the target VM show dropped RX packets on BOTH tap
>>> interfaces - strangely, with the same number of packets?
> 
> That's strange indeed.
> If you tcpdump the tap interfaces, do you see incoming traffic on only one interface, or randomly on both?

Completely independent, random traffic, as it should be.

> (Can you provide the guest's network configuration for both interfaces?)

Inside the guest, where the drop counter stays at 0?

auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet static
        address 192.168.0.2
        netmask 255.255.255.0

that's it.

> I see that you have enabled multiqueue on one of the interfaces. Have you set up the multiqueue part correctly inside the guest?
Uh oh? What is needed inside the guest?

> Do you have enough vCPUs to handle all the queues?
Yes.

Stefan

> ----- Original message -----
> From: "Stefan Priebe, Profihost AG" <s.priebe@profihost.ag>
> To: "qemu-devel" <qemu-devel@nongnu.org>
> Sent: Tuesday, 2 January 2018 12:17:29
> Subject: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)
> 
> Hello, 
> 
> currently i'm trying to fix a problem where we have "random" missing 
> packets. 
> 
> We're doing an ssh connect from machine a to machine b every 5 minutes 
> via rsync and ssh. 
> 
> Sometimes it happens that we get this cron message: 
> "Connection to 192.168.0.2 closed by remote host. 
> rsync: connection unexpectedly closed (0 bytes received so far) [sender] 
> rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2] 
> ssh: connect to host 192.168.0.2 port 22: Connection refused" 
> 
> The tap devices of the target VM show dropped RX packets on BOTH tap
> interfaces - strangely, with the same number of packets?
> 
> # ifconfig tap317i0; ifconfig tap317i1 
> tap317i0  Link encap:Ethernet  HWaddr 6e:cb:65:94:bb:bf
>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>           RX packets:2238445 errors:0 dropped:13159 overruns:0 frame:0
>           TX packets:9655853 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:177991267 (169.7 MiB)  TX bytes:910412749 (868.2 MiB)
>
> tap317i1  Link encap:Ethernet  HWaddr 96:f8:b5:d0:9a:07
>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>           RX packets:1516085 errors:0 dropped:13159 overruns:0 frame:0
>           TX packets:1446964 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:1597564313 (1.4 GiB)  TX bytes:3517734365 (3.2 GiB)
> 
> Any ideas how to inspect this issue? 
> 
> Greets, 
> Stefan 
> 


* Re: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)
  2018-01-03 15:07           ` Stefan Priebe - Profihost AG
@ 2018-01-04  3:09             ` Wei Xu
  0 siblings, 0 replies; 10+ messages in thread
From: Wei Xu @ 2018-01-04  3:09 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: qemu-devel

On Wed, Jan 03, 2018 at 04:07:44PM +0100, Stefan Priebe - Profihost AG wrote:
> 
> Am 03.01.2018 um 04:57 schrieb Wei Xu:
> > On Tue, Jan 02, 2018 at 10:17:25PM +0100, Stefan Priebe - Profihost AG wrote:
> >>
> >> Am 02.01.2018 um 18:04 schrieb Wei Xu:
> >>> On Tue, Jan 02, 2018 at 04:24:33PM +0100, Stefan Priebe - Profihost AG wrote:
> >>>> Hi,
> >>>> Am 02.01.2018 um 15:20 schrieb Wei Xu:
> >>>>> On Tue, Jan 02, 2018 at 12:17:29PM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> currently i'm trying to fix a problem where we have "random" missing
> >>>>>> packets.
> >>>>>>
> >>>>>> We're doing an ssh connect from machine a to machine b every 5 minutes
> >>>>>> via rsync and ssh.
> >>>>>>
> >>>>>> Sometimes it happens that we get this cron message:
> >>>>>> "Connection to 192.168.0.2 closed by remote host.
> >>>>>> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> >>>>>> rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]
> >>>>>> ssh: connect to host 192.168.0.2 port 22: Connection refused"
> >>>>>
> >>>>> Hi Stefan,
> >>>>> What kind of virtio-net backend are you using? Can you paste your qemu
> >>>>> command line here?
> >>>>
> >>>> Sure netdev part:
> >>>> -netdev
> >>>> type=tap,id=net0,ifname=tap317i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on
> >>>> -device
> >>>> virtio-net-pci,mac=EA:37:42:5C:F3:33,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300
> >>>> -netdev
> >>>> type=tap,id=net1,ifname=tap317i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on,queues=4
> >>>> -device
> >>>> virtio-net-pci,mac=6A:8E:74:45:1A:0B,netdev=net1,bus=pci.0,addr=0x13,id=net1,vectors=10,mq=on,bootindex=301
> >>>
> >>> According to what you have mentioned, the traffic is not heavy for the guests,
> >>> the dropping shouldn't happen for regular case.
> >>
> >> The avg traffic is around 300kb/s.
> >>
> >>> What is your hardware platform?
> >>
> >> Dual Intel Xeon E5-2680 v4
> >>
> >>> and Which versions are you using for both
> >>> guest/host kernel
> >> Kernel v4.4.103
> >>
> >>> and qemu?
> >> 2.9.1
> >>
> >>> Are there other VMs on the same host?
> >> Yes.
> > 
> > What about the CPU load? 
> 
> Host:
> 80-90% Idle
> LoadAvg: 6-7
> 
> VM:
> 97%-99% Idle
> 

OK, then this shouldn't be a concern.

> >>>>> 'Connection refused' usually means that the client gets a TCP Reset rather
> >>>>> than losing packets, so this might not be a relevant issue.
> >>>>
> >>>> Mhm, so you mean these might be two separate issues?
> >>>
> >>> Yes.
> >>>
> >>>>
> >>>>> Also you can do a tcpdump on both guests and see what happened to SSH packets
> >>>>> (tcpdump -i tapXXX port 22).
> >>>>
> >>>> Sadly not as there's too much traffic on that part as rsync is syncing
> >>>> every 5 minutes through ssh.
> >>>
> >>> You can do a tcpdump for the entire traffic from the guest and host and compare
> >>> what kind of packets are dropped if the traffic is not overloaded.
> >>
> >> Are you sure? I don't get why the same amount and same kind of packets
> >> should be received by both tap which are connected to different bridges
> >> to different HW and physical interfaces.
> > 
> > Exactly. This is more likely a host or guest kernel bug than a QEMU issue,
> > because you are using the vhost kernel backend, and the two stats are independent.
> > You might have to check what is happening inside the traffic.
> 
> What do you mean by inside the traffic?

You might need to figure out what kind of packets are dropped on the host tap
interface: are they random packets or specific packets?

Besides triaging the traffic, a few other tests can help show what is happening,
or you can try alternative tests that suit your test bed.

1). Upgrade the host and guest kernels to the latest kernel and see if the
problem still occurs; you can use the net-next tree.
    git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git

2). Run some throughput tests (netperf, iperf, etc.) on both guests (traffic from
guest to host if the guests are isolated, as your comments suggest) and check
the statistics.

Wei

> 
> Stefan
> 

