From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752862AbcF2O0Q (ORCPT <rfc822;w@1wt.eu>);
	Wed, 29 Jun 2016 10:26:16 -0400
Received: from mail-wm0-f48.google.com ([74.125.82.48]:38145 "EHLO
	mail-wm0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752537AbcF2O0N (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Wed, 29 Jun 2016 10:26:13 -0400
Subject: Re: Unbounded growth of slab caches and how to shrink them
To: Christoph Lameter <cl@linux.com>
References: <5773A427.2080100@kyup.com>
 <alpine.DEB.2.20.1606290854500.14924@east.gentwo.org>
Cc: "Linux-Kernel@Vger. Kernel. Org" <linux-kernel@vger.kernel.org>
From: Nikolay Borisov <kernel@kyup.com>
Message-ID: <5773D88F.1030005@kyup.com>
Date: Wed, 29 Jun 2016 17:17:51 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.1.0
MIME-Version: 1.0
In-Reply-To: <alpine.DEB.2.20.1606290854500.14924@east.gentwo.org>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


On 06/29/2016 05:00 PM, Christoph Lameter wrote:
> On Wed, 29 Jun 2016, Nikolay Borisov wrote:
> 
>> I've observed a rather strange unbounded growth of the kmalloc-192
>> slab cache:
>>
>> OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>> 711124869 411527215   3%    0.19K 16934908       42 135479264K kmalloc-192
>>
>> Essentially the kmalloc-192 cache is around 130 GB, yet only 3 percent of
>> it is being used. In this case I'd like to shrink the overall size
>> of the cache. How is it possible to achieve that? I tried echoing '1'
>> to /sys/kernel/slab/kmalloc-192/shrink but nothing changed.
> 
> OK, this probably means that most slabs hold just one or a few objects?
> Some workloads can result in situations like that. Can you enable
> debugging and get a list of functions where these objects are allocated?

Right, so what debugging concretely do you have in mind? So far, what I
did was reboot the machine with SLUB merging disabled, since quite a lot
of slabs are being merged into that particular one:

:t-0000192   <- cred_jar pid_3 inet_peer_cache request_sock_TCPv6
kmalloc-192 file_lock_cache bio-0 ip_dst_cache key_jar
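
For reference, a rough Python sketch of how an alias list like the one
above could be collected. It assumes that merged caches show up under
/sys/kernel/slab as symlinks pointing at the shared ":t-..." directory;
the function name group_slab_aliases is made up for illustration.

```python
import os
from collections import defaultdict


def group_slab_aliases(slab_dir="/sys/kernel/slab"):
    """Map each merge target (e.g. ':t-0000192') to the cache names
    that are symlinked to it. Assumes the SLUB sysfs layout."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(slab_dir)):
        path = os.path.join(slab_dir, name)
        # Merged caches are symlinks; unmerged ones are real directories.
        if os.path.islink(path):
            target = os.path.basename(os.readlink(path))
            groups[target].append(name)
    return dict(groups)
```

Calling group_slab_aliases() and looking up ":t-0000192" in the result
should give the names sharing that cache, if the layout assumption
holds on the running kernel.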

I suspect it is either one of the networking caches or the bio-0 slab
cache, since the others generally seem to be lightly used.
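
With merging disabled each of those caches is accounted separately, so
ranking them by idle memory should narrow this down. A minimal Python
sketch, assuming the usual "slabinfo - version: 2.1" column order of
/proc/slabinfo (name, active_objs, num_objs, objsize, ...); the
function name rank_idle_slab_memory is hypothetical.

```python
def rank_idle_slab_memory(slabinfo_text):
    """Return (idle_bytes, use_percent, cache_name) tuples, largest
    idle footprint first, from the text of /proc/slabinfo."""
    rows = []
    for line in slabinfo_text.splitlines():
        # Skip the version line, the column header and blank lines.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        name = fields[0]
        active_objs, num_objs, objsize = (int(f) for f in fields[1:4])
        # Memory held by allocated-but-unused object slots.
        idle_bytes = (num_objs - active_objs) * objsize
        use_pct = 100.0 * active_objs / num_objs if num_objs else 0.0
        rows.append((idle_bytes, use_pct, name))
    rows.sort(reverse=True)
    return rows
```

Feeding it the contents of /proc/slabinfo (readable as root) and
printing the first few rows would list the caches holding the most
unused memory.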

> 
>> This is on 3.12, which is a rather old kernel, but still I believe it is
>> entirely possible for someone to find a way to flood a machine with
>> network requests which would cause a lot of objects to be allocated,
>> resulting in a particular slab cache growing, then later when the request
>> flood stops the cache would be almost empty, yet the memory won't be usable
>> for anything other than satisfying memory allocation from this cache.
> 
> True. Long known problem and all my attempts to facilitate a solution here
> did not go anywhere. The essential solution would require objects being
> movable or removable from the sparsely allocated page frames. And this
> goes way beyond my subsystem.
> 
> If you can figure out which subsystem allocates or frees these objects
> (through the call traces) then we may find a knob in the subsystem to
> clear those out once in a while.
> 
>
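
If the call traces turn out to be what is needed, here is a rough
Python sketch for summarising them. It assumes the cache was booted
with SLUB user tracking (slub_debug=U, optionally restricted to one
cache) and that /sys/kernel/slab/<cache>/alloc_calls prints lines of
the form "<count> <function>+<offset>/<size> age=... pid=... cpus=...".
The function name top_alloc_sites is made up for illustration.

```python
from collections import Counter


def top_alloc_sites(alloc_calls_text, limit=10):
    """Sum object counts per allocating function from the text of an
    alloc_calls file and return the largest ones first."""
    counts = Counter()
    for line in alloc_calls_text.splitlines():
        fields = line.split()
        # Skip anything without a leading count, e.g. "No data"
        # when tracking is not enabled for the cache.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        # Drop the "+0x.../0x..." offset so that call sites within
        # the same function are merged.
        function = fields[1].split("+", 1)[0]
        counts[function] += int(fields[0])
    return counts.most_common(limit)
```

Running it over the contents of
/sys/kernel/slab/kmalloc-192/alloc_calls (and free_calls for the free
side) should show which functions account for most of the objects.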