From mboxrd@z Thu Jan 1 00:00:00 1970
From: Matthew Rosato
Subject: Re: Regression in throughput between kvm guests over virtual bridge
Date: Mon, 20 Nov 2017 14:25:17 -0500
Message-ID:
References: <376f8939-1990-abf6-1f5f-57b3822f94fe@redhat.com>
 <20171026094415.uyogf2iw7yoavnoc@Wei-Dev>
 <20171031070717.wcbgrp6thrjmtrh3@Wei-Dev>
 <56710dc8-f289-0211-db97-1a1ea29e38f7@linux.vnet.ibm.com>
 <20171104233519.7jwja7t2itooyeak@Wei-Dev>
 <1611b26f-0997-3b22-95f5-debf57b7be8c@linux.vnet.ibm.com>
 <101d1fdf-9df1-44bd-73a7-e7d8fbc09160@linux.vnet.ibm.com>
 <20171112183406.zuuj7w3fmtb4eduf@Wei-Dev>
 <9996b0f1-ffa6-ff95-2e9c-0deccf4623ae@linux.vnet.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: Jason Wang, mst@redhat.com, netdev@vger.kernel.org, davem@davemloft.net
To: Wei Xu
In-Reply-To: <9996b0f1-ffa6-ff95-2e9c-0deccf4623ae@linux.vnet.ibm.com>
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID:

On 11/14/2017 03:11 PM, Matthew Rosato wrote:
> On 11/12/2017 01:34 PM, Wei Xu wrote:
>> On Sat, Nov 11, 2017 at 03:59:54PM -0500, Matthew Rosato wrote:
>>>>> This case should be quite similar with pktgen; if you got an improvement
>>>>> with pktgen, usually it was also the same for UDP. Could you please try to
>>>>> disable tso, gso, gro, ufo on all host tap devices and guest virtio-net
>>>>> devices? Currently the most significant tests would be like this AFAICT:
>>>>>
>>>>> Host->VM     4.12    4.13
>>>>> TCP:
>>>>> UDP:
>>>>> pktgen:

So, I automated these scenarios for extended overnight runs and started
experiencing OOM conditions overnight on a 40G system.  I did a bisect and it
also points to c67df11f.  I can see a leak in at least all of the Host->VM
testcases (TCP, UDP, pktgen), but the pktgen scenario shows the fastest leak.

I enabled slub_debug on base 4.13 and ran my pktgen scenario in short
intervals until a large % of host memory was consumed.  Numbers below are from
after the last pktgen run completed.  The summary is that a very large number
of active skbuff_head_cache entries can be seen: the sums of alloc/free calls
match up, but the number of active skbuff_head_cache entries keeps growing
each time the workload is run and never goes back down in between runs.
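(For reference, the offload knobs Wei asked to disable can be flipped per
device with ethtool; a sketch, where tap0 and eth0 are example names for the
host tap device and the guest virtio-net device:)

```shell
# Disable segmentation/receive offloads on the devices under test so
# every packet traverses the full datapath.  tap0 (host side) and
# eth0 (guest side) are example device names, not taken from this thread.
for dev in tap0 eth0; do
    ethtool -K "$dev" tso off gso off gro off ufo off
done
```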
free -h:
              total        used        free      shared  buff/cache   available
Mem:            39G         31G        6.6G        472K        1.4G        6.8G

slabtop:
   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
1001952 1000610  99%    0.75K  23856       42    763392K skbuff_head_cache
 126192  126153  99%    0.36K   2868       44     45888K ksm_rmap_item
 100485  100435  99%    0.41K   1305       77     41760K kernfs_node_cache
  63294   39598  62%    0.48K    959       66     30688K dentry
  31968   31719  99%    0.88K    888       36     28416K inode_cache

/sys/kernel/slab/skbuff_head_cache/alloc_calls:
     259 __alloc_skb+0x68/0x188 age=1/135076/135741 pid=0-11776 cpus=0,2,4,18
 1000351 __build_skb+0x42/0xb0 age=8114/63172/117830 pid=0-11863 cpus=0,10

/sys/kernel/slab/skbuff_head_cache/free_calls:
   13492 age=4295073614 pid=0 cpus=0
  978298 tun_do_read.part.10+0x18c/0x6a0 age=8532/63624/110571 pid=11733 cpus=1-19
       6 skb_free_datagram+0x32/0x78 age=11648/73253/110173 pid=11325 cpus=4,8,10,12,14
       3 __dev_kfree_skb_any+0x5e/0x70 age=108957/115043/118269 pid=0-11605 cpus=5,7,12
       1 netlink_broadcast_filtered+0x172/0x470 age=136165 pid=1 cpus=4
       2 netlink_dump+0x268/0x2a8 age=73236/86857/100479 pid=11325 cpus=4,12
       1 netlink_unicast+0x1ae/0x220 age=12991 pid=9922 cpus=12
       1 tcp_recvmsg+0x2e2/0xa60 age=0 pid=11776 cpus=6
       3 unix_stream_read_generic+0x810/0x908 age=15443/50904/118273 pid=9915-11581 cpus=8,16,18
       2 tap_do_read+0x16a/0x488 [tap] age=42338/74246/106155 pid=11605-11699 cpus=2,9
       1 macvlan_process_broadcast+0x17e/0x1e0 [macvlan] age=18835 pid=331 cpus=11
    8800 pktgen_thread_worker+0x80a/0x16d8 [pktgen] age=8545/62184/110571 pid=11863 cpus=0

By comparison, when running 4.13 with c67df11f reverted, here's the same
output after the exact same test:

free -h:
              total        used        free      shared  buff/cache   available
Mem:            39G        783M         37G        472K        637M         37G

slabtop:
   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
    714     256  35%    0.75K     17       42       544K skbuff_head_cache

/sys/kernel/slab/skbuff_head_cache/alloc_calls:
     257 __alloc_skb+0x68/0x188 age=0/65252/65507 pid=1-11768 cpus=10,15

/sys/kernel/slab/skbuff_head_cache/free_calls:
     255 age=4295003081 pid=0 cpus=0
       1 netlink_broadcast_filtered+0x2e8/0x4e0 age=65601 pid=1 cpus=15
       1 tcp_recvmsg+0x2e2/0xa60 age=0 pid=11768 cpus=16
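For anyone trying to reproduce this, the slub_debug bookkeeping above can be
watched with a couple of small helpers; a sketch, assuming the standard
/sys/kernel/slab layout and a kernel booted with slub_debug:

```shell
#!/bin/sh
# Helpers for watching a slab cache across workload runs.  Requires a
# kernel booted with slub_debug so the per-cache alloc_calls/free_calls
# files under /sys/kernel/slab are populated.

# Print a timestamped object count for a cache directory, e.g.
#   sample_slab /sys/kernel/slab/skbuff_head_cache
sample_slab() {
    printf '%s objects=%s\n' "$(date +%T)" "$(cat "$1/objects")"
}

# Sum the leading call counts in an alloc_calls or free_calls file;
# each line starts with "<count> <symbol>+0x<off>/<size> ...".
sum_calls() {
    awk '{ total += $1 } END { print total + 0 }' "$1"
}
```

Running sample_slab between pktgen runs makes the monotonic growth visible;
summing alloc_calls and free_calls with sum_calls confirms the observation
above that the two totals match (both come to 1000610 in the leaking case,
matching the ACTIVE column) while the active object count never drops between
runs.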