From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mateusz Guzik
To: brauner@kernel.org
Cc: viro@zeniv.linux.org.uk, jack@suse.cz, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Mateusz Guzik
Subject: [PATCH v2] fs: cache the string generated by reading /proc/filesystems
Date: Wed, 22 Apr 2026 20:17:11 +0200
Message-ID: <20260422181711.1340269-1-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

/proc/filesystems is read surprisingly often (e.g., by mkdir, ls and
even sed!).

Each read does lock-protected pointer chasing over a linked list and
pays for an sprintf for every fs (32 on my boxen).

Instead cache the result.

open+read+close cycle single-threaded (ops/s):
before:	442732
after:	1063462 (+140%)

Here the main bottleneck is memcg.

Scalability-wise, the problems are an avoidable lockref trip on open and
ref management for the file on the procfs side.

The file looks like stereotypical C from the 90s, right down to an
open-coded and slightly obfuscated linked list. I intentionally did not
clean up any of it -- I think the file will be best served by a Rust
rewrite when the time comes.

Signed-off-by: Mateusz Guzik
---

v2:
- drop the procfs bits
- touch up some comments

I posted v1 last year
https://lore.kernel.org/linux-fsdevel/20250329192821.822253-1-mjguzik@gmail.com/
but that ran into some procfs issues. The thing can be sped up
regardless of the procfs problem.

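The ops/s figures quoted in the commit message come from an
open+read+close loop; the benchmark program itself is not part of this
posting. A minimal userspace sketch of such a loop follows. The helper
name bench_open_read_close(), the 4096-byte buffer and the
clock_gettime()-based timing are illustrative assumptions, not the
author's actual tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): run `iters` open+read+close
 * cycles on `path` in a single thread and return operations per second,
 * or -1.0 on any error.
 */
double bench_open_read_close(const char *path, long iters)
{
	char buf[4096];
	struct timespec start, end;
	double elapsed;

	if (iters <= 0)
		return -1.0;
	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1.0;

	for (long i = 0; i < iters; i++) {
		ssize_t n;
		int fd = open(path, O_RDONLY);

		if (fd < 0)
			return -1.0;
		/* Drain the file so the kernel generates the full contents. */
		while ((n = read(fd, buf, sizeof(buf))) > 0)
			;
		close(fd);
		if (n < 0)
			return -1.0;
	}

	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return -1.0;

	elapsed = (double)(end.tv_sec - start.tv_sec) +
		  (double)(end.tv_nsec - start.tv_nsec) / 1e9;
	if (elapsed <= 0.0)
		elapsed = 1e-9;	/* guard against a zero interval */

	return (double)iters / elapsed;
}
```

Calling bench_open_read_close("/proc/filesystems", 1000000) on a kernel
with and without the patch would give a before/after pair of the same
kind as above; absolute numbers will depend on the machine and config.
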
 fs/filesystems.c | 144 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 137 insertions(+), 7 deletions(-)

diff --git a/fs/filesystems.c b/fs/filesystems.c
index 0c7d2b7ac26c..704fc6d49f80 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -34,6 +34,23 @@
 static struct file_system_type *file_systems;
 static DEFINE_RWLOCK(file_systems_lock);
 
+#ifdef CONFIG_PROC_FS
+static unsigned long file_systems_gen;
+
+struct file_systems_string {
+	struct rcu_head rcufree;
+	unsigned long gen;
+	size_t len;
+	char string[];
+};
+static struct file_systems_string *file_systems_string;
+static void invalidate_filesystems_string(void);
+#else
+static void invalidate_filesystems_string(void)
+{
+}
+#endif
+
 /* WARNING: This can be used only if we _already_ own a reference */
 struct file_system_type *get_filesystem(struct file_system_type *fs)
 {
@@ -83,10 +100,12 @@ int register_filesystem(struct file_system_type * fs)
 		return -EBUSY;
 	write_lock(&file_systems_lock);
 	p = find_filesystem(fs->name, strlen(fs->name));
-	if (*p)
+	if (*p) {
 		res = -EBUSY;
-	else
+	} else {
 		*p = fs;
+		invalidate_filesystems_string();
+	}
 	write_unlock(&file_systems_lock);
 	return res;
 }
@@ -115,6 +134,7 @@ int unregister_filesystem(struct file_system_type * fs)
 		if (fs == *tmp) {
 			*tmp = fs->next;
 			fs->next = NULL;
+			invalidate_filesystems_string();
 			write_unlock(&file_systems_lock);
 			synchronize_rcu();
 			return 0;
@@ -235,22 +255,132 @@ int __init list_bdev_fs_names(char *buf, size_t size)
 }
 
 #ifdef CONFIG_PROC_FS
-static int filesystems_proc_show(struct seq_file *m, void *v)
+/*
+ * The fs list gets queried a lot by userspace because of libselinux, including
+ * rather surprising programs (would you guess *sed* is on the list?). In order
+ * to reduce the overhead we cache the resulting string, which normally hangs
+ * around below 512 bytes in size.
+ *
+ * As the list almost never changes, its creation is not particularly optimized
+ * for simplicity.
+ *
+ * We sort it out on read in order to not introduce a failure point for fs
+ * registration (in principle we may be unable to alloc memory for the list).
+ */
+static void invalidate_filesystems_string(void)
 {
-	struct file_system_type * tmp;
+	struct file_systems_string *fss;
 
-	read_lock(&file_systems_lock);
+	lockdep_assert_held_write(&file_systems_lock);
+	file_systems_gen++;
+	fss = file_systems_string;
+	WRITE_ONCE(file_systems_string, NULL);
+	kfree_rcu(fss, rcufree);
+}
+
+static noinline int regen_filesystems_string(void)
+{
+	struct file_system_type *tmp;
+	struct file_systems_string *old, *new;
+	size_t newlen, usedlen;
+	unsigned long gen;
+
+retry:
+	lockdep_assert_not_held(&file_systems_lock);
+
+	newlen = 0;
+	write_lock(&file_systems_lock);
+	gen = file_systems_gen;
+	tmp = file_systems;
+
+	/* pre-calc space for "%s\t%s\n" for each fs */
+	while (tmp) {
+		if (!(tmp->fs_flags & FS_REQUIRES_DEV))
+			newlen += strlen("nodev");
+		newlen += strlen("\t");
+		newlen += strlen(tmp->name);
+		newlen += strlen("\n");
+		tmp = tmp->next;
+	}
+	write_unlock(&file_systems_lock);
+
+	new = kmalloc(offsetof(struct file_systems_string, string) + newlen + 1,
+		      GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+
+	new->gen = gen;
+	new->len = newlen;
+	new->string[newlen] = '\0';
+	write_lock(&file_systems_lock);
+	old = file_systems_string;
+
+	/*
+	 * Did someone beat us to it?
+	 */
+	if (old && old->gen == file_systems_gen) {
+		write_unlock(&file_systems_lock);
+		kfree(new);
+		return 0;
+	}
+
+	/*
+	 * Did the list change in the meantime?
+	 */
+	if (gen != file_systems_gen) {
+		write_unlock(&file_systems_lock);
+		kfree(new);
+		goto retry;
+	}
+
+	/*
+	 * Populate the string.
+	 *
+	 * We know we have just enough space because we calculated the right
+	 * size the previous time we had the lock and confirmed the list has
+	 * not changed after reacquiring it.
+	 */
+	usedlen = 0;
 	tmp = file_systems;
 	while (tmp) {
-		seq_printf(m, "%s\t%s\n",
+		usedlen += sprintf(&new->string[usedlen], "%s\t%s\n",
 			(tmp->fs_flags & FS_REQUIRES_DEV) ? "" : "nodev",
 			tmp->name);
 		tmp = tmp->next;
 	}
-	read_unlock(&file_systems_lock);
+	BUG_ON(new->len != strlen(new->string));
+
+	/*
+	 * Paired with consume fence in READ_ONCE() in filesystems_proc_show()
+	 */
+	smp_store_release(&file_systems_string, new);
+	write_unlock(&file_systems_lock);
+	kfree_rcu(old, rcufree);
 	return 0;
 }
 
+static int filesystems_proc_show(struct seq_file *m, void *v)
+{
+	struct file_systems_string *fss;
+
+	for (;;) {
+		scoped_guard(rcu) {
+			/*
+			 * Paired with smp_store_release() in regen_filesystems_string()
+			 */
+			fss = READ_ONCE(file_systems_string);
+			if (likely(fss)) {
+				seq_write(m, fss->string, fss->len);
+				return 0;
+			}
+		}
+
+		int err = regen_filesystems_string();
+		if (unlikely(err))
+			return err;
+	}
+}
+
 static int __init proc_filesystems_init(void)
 {
 	proc_create_single("filesystems", 0, NULL, filesystems_proc_show);
-- 
2.48.1