From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752862AbcF2O0Q (ORCPT ); Wed, 29 Jun 2016 10:26:16 -0400
Received: from mail-wm0-f48.google.com ([74.125.82.48]:38145 "EHLO
	mail-wm0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752537AbcF2O0N (ORCPT );
	Wed, 29 Jun 2016 10:26:13 -0400
Subject: Re: Unbounded growth of slab caches and how to shrink them
To: Christoph Lameter
References: <5773A427.2080100@kyup.com>
Cc: "Linux-Kernel@Vger. Kernel. Org"
From: Nikolay Borisov
Message-ID: <5773D88F.1030005@kyup.com>
Date: Wed, 29 Jun 2016 17:17:51 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/29/2016 05:00 PM, Christoph Lameter wrote:
> On Wed, 29 Jun 2016, Nikolay Borisov wrote:
>
>> I've observed a rather strange unbounded growth of the kmalloc-192
>> slab cache:
>>
>>      OBJS    ACTIVE  USE OBJ SIZE    SLABS OBJ/SLAB  CACHE SIZE NAME
>> 711124869 411527215   3%    0.19K 16934908       42  135479264K kmalloc-192
>>
>> Essentially the kmalloc is around 130 GB, yet only 3 percent of this are
>> being used. In this case I'd like to essentially shrink the overall size
>> of the cache. How is it possible to achieve that? I tried echoing '1'
>> to /sys/kernel/slab/kmalloc-192/shrink but nothing changed.
>
> Ok this probably means that most slabs have just a few or one objects?
> Some workloads can result in situations like that. Can you enable
> debugging and get a list of functions where these objects are allocated?

Right, so what debugging concretely do you have in mind?
So far what I did was reboot the machine with SLUB merging disabled, since
quite a lot of slabs are being merged into that particular one:

:t-0000192 <- cred_jar pid_3 inet_peer_cache request_sock_TCPv6
kmalloc-192 file_lock_cache bio-0 ip_dst_cache key_jar

I'm fairly sure it is either one of the networking caches or the bio-0 slab
cache, since the others generally don't seem to be used much.

>
>> This is on 3.12 which is rather old kernel, but still I believe it is
>> entirely possible for someone to find a way to flood a machine with
>> network requests which would cause a lot of objects to be allocated,
>> resulting in a particular slab cache growing, then later when the request
>> flood stops the cache would be almost empty, yet the memory won't be usable
>> for anything other than satisfying memory allocation from this cache.
>
> True. Long known problem and all my attempts to facilitate a solution here
> did not go anywhere. The essential solution would require objects being
> movable or removable from the sparsely allocated page frames. And this
> goes way beyond my subsystem.
>
> If you can figure out which subsystem allocates or frees these objects
> (through the call traces) then we may find a knob in the subsystem to
> clear those out once in a while.
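
For reference, one way the "call traces" suggestion above could be followed up is sketched below. It is only a sketch and rests on assumptions not confirmed in this thread: that the kernel is booted with slub_debug=U (user tracking) and that SLUB then exposes /sys/kernel/slab/<cache>/alloc_calls with one line per call site in the form "<count> <symbol>+<offset>/<size> age=... pid=... cpus=...". The helper names and the sample lines are made up for illustration.

```python
import re
from pathlib import Path

# Assumed layout of one alloc_calls line (SLUB with user tracking enabled):
#   "<count> <symbol>+<offset>/<size> age=... pid=... cpus=..."
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)")


def parse_alloc_calls(text):
    """Return (count, call_site) pairs, largest count first."""
    sites = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # lines such as "No data" are skipped
            sites.append((int(match.group(1)), match.group(2)))
    return sorted(sites, reverse=True)


def top_allocators(cache="kmalloc-192", limit=10):
    """Read alloc_calls for `cache`; assumes root and slub_debug=U."""
    path = Path("/sys/kernel/slab") / cache / "alloc_calls"
    return parse_alloc_calls(path.read_text())[:limit]


# Made-up sample in the assumed format, for illustration only.
SAMPLE = """\
     12 example_alloc_a+0x1c/0x40 age=100/200/300 pid=1-5 cpus=0-3
   4096 example_alloc_b+0x2d/0x130 age=10/20/30 pid=42 cpus=1
"""

if __name__ == "__main__":
    for count, site in parse_alloc_calls(SAMPLE):
        print(f"{count:>10}  {site}")
```

On a live machine, top_allocators("kmalloc-192") would replace the sample. With merging disabled, the same call could be pointed at the individual caches (for example bio-0 or ip_dst_cache) to see which one dominates.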