From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>,
Josh Poimboeuf <jpoimboe@kernel.org>,
Jeff Layton <jlayton@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>, Kees Cook <kees@kernel.org>,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Andrew Morton <akpm@linux-foundation.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Shakeel Butt <shakeelb@google.com>,
Muchun Song <muchun.song@linux.dev>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
cgroups@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH RFC 4/4] UNFINISHED mm, fs: use kmem_cache_charge() in path_openat()
Date: Sun, 24 Mar 2024 02:27:31 +0000
Message-ID: <20240324022731.GR538574@ZenIV>
In-Reply-To: <CAHk-=whgFtbTxCAg2CWQtDj7n6CEyzvdV1wcCj2qpMfpw0=m1A@mail.gmail.com>
On Fri, Mar 01, 2024 at 09:51:18AM -0800, Linus Torvalds wrote:
> Right. I think the natural and logical way to deal with this is to
> just say "we account when we add the file to the fdtable".
>
> IOW, just have fd_install() do it. That's the really natural point,
> and also makes it very logical why alloc_empty_file_noaccount()
> wouldn't need to do the GFP_KERNEL_ACCOUNT.
We can have the same file occurring in many slots of many descriptor tables,
obviously. So it would have to be a flag (in ->f_mode?) set by it, for
"someone's already charged for it", or you'll end up with really insane
crap on each fork(), dup(), etc.
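A minimal userspace sketch of the "already charged" flag idea, assuming a
hypothetical FMODE_CHARGED_SKETCH bit and a plain counter standing in for the
memcg charge. None of these names exist in the kernel, and dup()/fork() are
modelled simply as repeated installs of the same file:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit in ->f_mode meaning "someone's already charged for it". */
#define FMODE_CHARGED_SKETCH 0x1u
#define NR_FDS_SKETCH 8

struct file_sketch {
	unsigned int f_mode;
};

struct fdtable_sketch {
	struct file_sketch *fd[NR_FDS_SKETCH];
};

/* Stand-in for the memcg counter; counts charged struct file objects. */
static unsigned long charged_objects;

/*
 * Install a file into a slot. The charge is taken only the first time the
 * file is installed anywhere, so further slots and tables do not recharge.
 * A real implementation would need an atomic test-and-set; this does not.
 */
static void fd_install_sketch(struct fdtable_sketch *t, unsigned int fd,
			      struct file_sketch *f)
{
	assert(fd < NR_FDS_SKETCH);
	if (!(f->f_mode & FMODE_CHARGED_SKETCH)) {
		f->f_mode |= FMODE_CHARGED_SKETCH;
		charged_objects++;
	}
	t->fd[fd] = f;
}
```

Without the flag test, every extra slot would add another charge for the same
object, which is the per-fork()/dup() mess described above.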
But there's also MAP_ANON with its setup_shmem_file(), with the resulting
file not going into descriptor tables at all, and that's not a rare thing.
> > - I don't know how to properly unwind the accounting failure case. It
> > seems like a new case because when we succeed the open, there's no
> > further error path at least in path_openat().
>
> Yeah, let me think about this part. Because fd_install() is the right
> point, but that too does not really allow for error handling.
>
> Yes, we could close things and fail it, but it really is much too late
> at this point.
That as well. For things like O_CREAT even do_dentry_open() would be too
late for unrolls.
> What I *think* I'd want for this case is
>
> (a) allow the accounting to go over by a bit
>
> (b) make sure there's a cheap way to ask (before) about "did we go
> over the limit"
>
> IOW, the accounting never needed to be byte-accurate to begin with,
> and making it fail (cheaply and early) on the next file allocation is
> fine.
>
> Just make it really cheap. Can we do that?
That might be reasonable, but TBH I would rather combine that with
do_dentry_open()/alloc_file() (i.e. the places where we set FMODE_OPENED)
as places to do that, rather than messing with fd_install().
How does the following sound?
	* those who allocate empty files mark them if they are intended
	  to be kernel-internal (see below for how to get the information there)
	* memcg charge happens when we set FMODE_OPENED, provided that
	  struct file instance is not marked kernel-internal.
	* exceeding the limit => pretend we'd succeeded and fail the
	  next allocation.
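The three points above can be modelled in userspace roughly as follows. This
is only a sketch under stated assumptions: the *_sketch names, the
FMODE_INTERNAL_SKETCH bit and the object-count limit are all hypothetical, and
it assumes kernel-internal allocations are not refused when the group is over
its limit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define FMODE_OPENED_SKETCH   0x1u	/* stands in for FMODE_OPENED */
#define FMODE_INTERNAL_SKETCH 0x2u	/* hypothetical "kernel-internal" mark */

struct memcg_sketch {
	unsigned long usage;	/* objects charged so far */
	unsigned long limit;	/* allowed number of charged objects */
};

struct file_sketch {
	unsigned int f_mode;
};

/*
 * Allocation is the only place that can fail on accounting grounds: a cheap
 * "did we already go over?" check before anything else has been done.
 */
static struct file_sketch *alloc_empty_file_sketch(const struct memcg_sketch *cg,
						   bool internal)
{
	struct file_sketch *f;

	if (!internal && cg->usage > cg->limit)
		return NULL;	/* the kernel would return an ERR_PTR() here */
	f = calloc(1, sizeof(*f));
	if (f && internal)
		f->f_mode |= FMODE_INTERNAL_SKETCH;
	return f;
}

/*
 * Marking the file opened is where the charge is taken. It never fails, so
 * usage may overshoot the limit; the next allocation pays for that.
 */
static void mark_opened_sketch(struct memcg_sketch *cg, struct file_sketch *f)
{
	assert(f != NULL);
	f->f_mode |= FMODE_OPENED_SKETCH;
	if (!(f->f_mode & FMODE_INTERNAL_SKETCH))
		cg->usage++;
}
```

With limit = 1, two userland opens succeed (the second one overshoots), the
third allocation fails, and a kernel-internal file is neither refused nor
charged.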
As for how to get the information down there... We have 6 functions
where "allocate" and "mark it opened" callchains converge -
alloc_file() (pipe(2) et al., mostly), path_openat() (normal opens,
but also filp_open() et al.), dentry_open(), kernel_file_open(),
kernel_tmpfile_open(), dentry_create(). The last 3 are all
kernel-internal; dentry_open() might or might not be.
For path_openat() we can add a bit somewhere in struct open_flags;
the places where we set struct open_flags up would be the ones that
might need to be annotated. That's
	file_open_name()
	file_open_root()
	do_sys_openat2() (definitely userland)
	io_openat2() (ditto)
	sys_uselib() (ditto)
	do_open_execat() (IMO can be considered userland in all cases)
For alloc_file() it's almost always userland. IMO things like
dma_buf_export() and setup_shmem_file() should be charged.
So it's a matter of propagating the information to dentry_open(),
file_open_name() and file_open_root(). That's about 70 callers
to annotate, adding filp_open() and file_open_root_mnt() into
the mix. <greps> 61, actually, and from the quick look it
seems that most of them are really obvious...
Comments?
Thread overview: 21+ messages
2024-03-01 17:07 [PATCH RFC 0/4] memcg_kmem hooks refactoring and kmem_cache_charge() Vlastimil Babka
2024-03-01 17:07 ` [PATCH RFC 1/4] mm, slab: move memcg charging to post-alloc hook Vlastimil Babka
2024-03-12 18:52 ` Roman Gushchin
2024-03-12 18:59 ` Matthew Wilcox
2024-03-12 20:35 ` Roman Gushchin
2024-03-13 10:55 ` Vlastimil Babka
2024-03-13 17:34 ` Roman Gushchin
2024-03-15 3:23 ` Chengming Zhou
2024-03-01 17:07 ` [PATCH RFC 2/4] mm, slab: move slab_memcg hooks to mm/memcontrol.c Vlastimil Babka
2024-03-12 18:56 ` Roman Gushchin
2024-03-12 19:32 ` Matthew Wilcox
2024-03-12 20:36 ` Roman Gushchin
2024-03-01 17:07 ` [PATCH RFC 3/4] mm, slab: introduce kmem_cache_charge() Vlastimil Babka
2024-03-01 17:07 ` [PATCH RFC 4/4] UNFINISHED mm, fs: use kmem_cache_charge() in path_openat() Vlastimil Babka
2024-03-01 17:51 ` Linus Torvalds
2024-03-01 18:53 ` Roman Gushchin
2024-03-12 9:22 ` Vlastimil Babka
2024-03-12 19:05 ` Roman Gushchin
2024-03-04 12:47 ` Christian Brauner
2024-03-24 2:27 ` Al Viro [this message]
2024-03-24 17:44 ` Linus Torvalds