From: Chris Down <chris@chrisdown.name>
To: Matthew Wilcox <willy@infradead.org>
Cc: Amir Goldstein <amir73il@gmail.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Jeff Layton <jlayton@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
kernel-team@fb.com, Hugh Dickins <hughd@google.com>,
Miklos Szeredi <miklos@szeredi.hu>,
"zhengbin (A)" <zhengbin13@huawei.com>,
Roman Gushchin <guro@fb.com>
Subject: Re: [PATCH] fs: inode: Reduce volatile inode wraparound risk when ino_t is 64 bit
Date: Mon, 23 Dec 2019 20:45:51 +0000 [thread overview]
Message-ID: <20191223204551.GA272672@chrisdown.name> (raw)
In-Reply-To: <20191220195025.GA9469@bombadil.infradead.org>
Matthew Wilcox writes:
>On Fri, Dec 20, 2019 at 07:35:38PM +0200, Amir Goldstein wrote:
>> On Fri, Dec 20, 2019 at 6:46 PM Matthew Wilcox <willy@infradead.org> wrote:
>> >
>> > On Fri, Dec 20, 2019 at 03:41:11PM +0200, Amir Goldstein wrote:
>> > > Suggestion:
>> > > 1. Extend the kmem_cache API to let the ctor() know if it is
>> > > initializing an object
>> > > for the first time (new page) or recycling an object.
>> >
>> > Uh, what? The ctor is _only_ called when new pages are allocated.
>> > Part of the contract with the slab user is that objects are returned to
>> > the slab in an initialised state.
>>
>> Right. I mixed up the ctor() with alloc_inode().
>> So is there anything stopping us from reusing an existing non-zero
>> value of i_ino in shmem_get_inode()? for recycling shmem ino
>> numbers?
>
>I think that would be an excellent solution to the problem! At least,
>I can't think of any problems with it.
Thanks for the suggestions and feedback, Amir and Matthew :-)
The slab i_ino recycling approach works somewhat, but is unfortunately neutered
quite a lot by the fact that slab recycling is per-memcg. That is, replacing
with recycle_or_get_next_ino(old_ino)[0] for shmfs and a few other trivial
callsites only leads to about 10% slab reuse, which doesn't really stem the
bleeding of 32-bit inums on an affected workload:
# tail -5000 /sys/kernel/debug/tracing/trace | grep -o 'recycle_or_get_next_ino:.*' | sort | uniq -c
4454 recycle_or_get_next_ino: not recycled
546 recycle_or_get_next_ino: recycled
Roman (who I've just added to cc) tells me that currently we only have
per-memcg slab reuse instead of global when using CONFIG_MEMCG. This
contributes fairly significantly here since there are multiple tasks across
multiple cgroups which are contributing to the get_next_ino() thrash.
I think this is a good start, but we need something of a different magnitude in
order to actually solve this problem with the current slab infrastructure. How
about something like the following?
1. Add get_next_ino_full, which uses whatever the full width of ino_t is
2. Use get_next_ino_full in tmpfs (et al.)
3. Add a mount option to tmpfs (et al.), say `32bit-inums`, which people can
pass if they want the 32-bit inode numbers back. This would still allow
people who want to make this tradeoff to use xino.
4. (If you like) Also add a CONFIG option to disable this at compile time.
I'd appreciate your thoughts on that approach or others you have ideas about.
Thanks! :-)
0:
unsigned int recycle_or_get_next_ino(ino_t old_ino)
{
/*
* get_next_ino returns unsigned int. If this fires then i_ino must be
* >32 bits and have been changed later, so the caller shouldn't be
* recycling inode numbers
*/
WARN_ONCE(old_ino >> (sizeof(unsigned int) * 8),
"Recyclable i_ino uses more bits than unsigned int: %llu",
(u64)old_ino);
if (old_ino) {
if (prandom_u32() % 100 == 0)
trace_printk("recycled\n");
return old_ino;
} else {
if (prandom_u32() % 100 == 0)
trace_printk("not recycled\n");
return get_next_ino();
}
}
next prev parent reply other threads:[~2019-12-23 20:47 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-20 2:49 [PATCH] fs: inode: Reduce volatile inode wraparound risk when ino_t is 64 bit Chris Down
2019-12-20 3:05 ` zhengbin (A)
2019-12-20 8:32 ` Amir Goldstein
2019-12-20 12:16 ` Chris Down
2019-12-20 13:41 ` Amir Goldstein
2019-12-20 16:46 ` Matthew Wilcox
2019-12-20 17:35 ` Amir Goldstein
2019-12-20 19:50 ` Matthew Wilcox
2019-12-23 20:45 ` Chris Down [this message]
2019-12-24 3:04 ` Amir Goldstein
2019-12-25 12:54 ` Chris Down
2019-12-26 1:40 ` zhengbin (A)
2019-12-20 21:30 ` Darrick J. Wong
2019-12-21 8:43 ` Amir Goldstein
2019-12-21 18:05 ` Darrick J. Wong
2019-12-21 10:16 ` Chris Down
2020-01-07 17:35 ` J. Bruce Fields
2020-01-07 17:44 ` Chris Down
2020-01-08 3:00 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191223204551.GA272672@chrisdown.name \
--to=chris@chrisdown.name \
--cc=amir73il@gmail.com \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=jlayton@kernel.org \
--cc=kernel-team@fb.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=tj@kernel.org \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
--cc=zhengbin13@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.