Date: Wed, 4 May 2022 13:50:41 -0700
From: Luis Chamberlain
To: Vasily Averin
Cc: Shakeel Butt, kernel@openvz.org, Florian Westphal, linux-kernel@vger.kernel.org, Roman Gushchin, Vlastimil Babka, Michal Hocko, cgroups@vger.kernel.org, netdev@vger.kernel.org, "David S.
Miller", Jakub Kicinski, Paolo Abeni, Kees Cook, Iurii Zaikin, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH memcg v2] memcg: accounting for objects allocated for new netdevice
References: <53613f02-75f2-0546-d84c-a5ed989327b6@openvz.org> <354a0a5f-9ec3-a25c-3215-304eab2157bc@openvz.org>
In-Reply-To: <354a0a5f-9ec3-a25c-3215-304eab2157bc@openvz.org>
X-Mailing-List: netdev@vger.kernel.org

On Mon, May 02, 2022 at 03:15:51PM +0300, Vasily Averin wrote:
> Creating a new netdevice allocates at least ~50Kb of memory for various
> kernel objects, but only ~5Kb of them are accounted to memcg. As a result,
> creating an unlimited number of netdevices inside a memcg-limited container
> does not fall within memcg restrictions, consumes a significant part
> of the host's memory, can cause global OOM and lead to random kills of
> host processes.
>
> The main consumers of non-accounted memory are:
>  ~10Kb   80+ kernfs nodes
>  ~6Kb    ipv6_add_dev() allocations
>   6Kb    __register_sysctl_table() allocations
>   4Kb    neigh_sysctl_register() allocations
>   4Kb    __devinet_sysctl_register() allocations
>   4Kb    __addrconf_sysctl_register() allocations
>
> Accounting for these objects increases the share of memcg-related
> memory up to 60-70% (~38Kb accounted vs ~54Kb total for dummy netdevice
> on typical VM with default Fedora 35 kernel) and this should be enough
> to somehow protect the host from misuse inside container.
>
> Other related objects are quite small and may not be taken into account
> to minimize the expected performance degradation.
>
> Separately, note the ~300 bytes of percpu allocation
> of struct ipstats_mib in snmp6_alloc_dev(): on huge multi-cpu nodes
> it can become the main consumer of memory.
>
> This patch does not enable kernfs accounting as it affects
> other parts of the kernel and should be discussed separately.
> However, even without kernfs, this patch significantly improves the
> current situation and makes it possible to account for more than half
> of all netdevice allocations.
>
> Signed-off-by: Vasily Averin
> ---
> v2: 1) kernfs accounting moved into separate patch, suggested by
>     Shakeel and mkoutny@.
>     2) in ipv6_add_dev() changed original "sizeof(struct inet6_dev)"
>     to "sizeof(*ndev)", according to checkpatch.pl recommendation:
>     CHECK: Prefer kzalloc(sizeof(*ndev)...) over kzalloc(sizeof
>         (struct inet6_dev)...)
> ---
>  fs/proc/proc_sysctl.c | 2 +-

for proc_sysctl:

Acked-by: Luis Chamberlain

  Luis
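
For anyone following along, two points in the quoted changelog can be illustrated with a small userspace sketch. This is not kernel code and not part of the patch: struct demo_dev, demo_dev_alloc() and accounted_share_percent() are hypothetical names made up for illustration, and calloc() only stands in for kzalloc(). The first helper shows the sizeof(*ndev) form mentioned in the v2 notes, where the allocation size follows the pointer's type; the second just redoes the "~38Kb accounted vs ~54Kb total" arithmetic.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct inet6_dev; the fields are illustrative only. */
struct demo_dev {
	int ifindex;
	unsigned long flags;
	char name[16];
};

/*
 * Allocate a zeroed object sized from the pointer it is assigned to,
 * mirroring the kzalloc(sizeof(*ndev), ...) form. If the type of ndev
 * ever changes, the size follows automatically.
 * calloc() is a userspace stand-in for kzalloc().
 */
static struct demo_dev *demo_dev_alloc(void)
{
	struct demo_dev *ndev;

	ndev = calloc(1, sizeof(*ndev));
	return ndev;	/* NULL on allocation failure */
}

/* Share of accounted memory as a whole percentage, rounded down. */
static unsigned int accounted_share_percent(unsigned int accounted_kb,
					    unsigned int total_kb)
{
	assert(total_kb > 0);
	return accounted_kb * 100 / total_kb;
}
```

Calling accounted_share_percent(38, 54) with the figures from the changelog gives 70, which matches the upper end of the quoted 60-70% range; the caller of demo_dev_alloc() is expected to free() the result.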