From: Ted Ts'o <tytso@mit.edu>
To: David Rientjes <rientjes@google.com>
Cc: Jens Axboe <jaxboe@fusionio.com>,
Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Neil Brown <neilb@suse.de>, Alasdair G Kergon <agk@redhat.com>,
Chris Mason <chris.mason@oracle.com>,
Steven Whitehouse <swhiteho@redhat.com>, Jan Kara <jack@suse.cz>,
Frederic Weisbecker <fweisbec@gmail.com>,
"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
"cluster-devel@redhat.com" <cluster-devel@redhat.com>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
"reiserfs-devel@vger.kernel.org" <reiserfs-devel@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc
Date: Wed, 25 Aug 2010 07:24:33 -0400 [thread overview]
Message-ID: <20100825112433.GB4453@thunk.org> (raw)
In-Reply-To: <alpine.DEB.2.00.1008241309170.21242@chino.kir.corp.google.com>
On Tue, Aug 24, 2010 at 01:11:26PM -0700, David Rientjes wrote:
> On Tue, 24 Aug 2010, Jens Axboe wrote:
>
> > Should be possible to warn at build time for anyone using __GFP_NOFAIL
> > without wrapping it in a function.
> >
>
> We could make this __deprecated functions as Peter suggested if you think
> build time warnings for existing users would be helpful.
Let me take a few steps backwards and look at the problem from a
somewhat higher level.
Part of the problem is that we have a few places in the kernel where
failure is really not an option --- or rather, if we're going to fail
while we're in the middle of doing a commit, our choices really are
(a) retry the loop in the jbd layer (which Andrew really doesn't
like), (b) keep our own private cache of free memory so we don't fail
and/or loop, (c) fail the file system and mark it read-only, or (d)
panic.
There are other places where we can fail safely (for example, in jbd's
start_this_handle, although that just pushes the layer up the stack,
and ultimately, to userspace where most userspace programs don't
really expect ENOMEM to get returned by a random disk write --- how
much do _you_ trust a random GNOME or KDE developer to do correct
error checking and do something sane at the application?)
So we can mark the retry loop helper function as deprecated, and that
will make some of these cases go away, but ultimately if we're going
to fail the memory allocation, something bad is going to happen, and
the only question is whether we want to have something bad happen by
looping in the memory allocator, or to force the file system to
panic/oops the system, or have random application die and/or lose user
data because they don't expect write() to return ENOMEM.
So at some level it would be nice if we had a few different levels of
"we *really* need this memory". Level 0 might be, "if we can't get
it, no biggie, we'll figure out some other way around it. I doubt
there is much at level 0, but in theory, we could have some good
citizens that fall in that camp and who simply will bypass some
optimization if they can't get the memory. Level 1 might be, if you
can't get the memory, we will propagate a failure up to userspace, but
it's probably a relatively "safe" place to fail (i.e., when the user
is opening a file). Level 2 might be, "if you can't get the memory,
we will propgate a failure up to userspace, but it's at a space where
most applications are lazy f*ckers, and this may lead to serious
application errors" (example: close(2), and this is a file system that
only pushes the file to the server at close time, e.g. AFS). Level 3
might be, "if you can't get the memory, I'm going to fail the file
system, or some other global subsystem, that will probably result in
the system crashing or needing to be rebooted".
We can ignore this problem and pretend it doesn't exist at the memory
allocator level, but that means the callers are going to be doing
their own thing to try to avoid having really bad things happening at
really-low-memory occasions. And this may mean looping, whether we
mark the function as deprecated or not.
This is becoming even more of an issue now given that with
containerization, we have jokers who are forcing tasks to run in very
small memory containers, which means that failures can happen far more
frequently --- and in some cases, just because the process running the
task happens to be in an extremely constrained memory cgroup, doesn't
mean that failing the entire system is really such a great idea. Or
maybe that means memory cgroups are kinda busted. :-)
- Ted
next prev parent reply other threads:[~2010-08-25 11:24 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-08-24 10:50 [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc David Rientjes
2010-08-24 10:50 ` [patch 3/5] fs: add nofail variant of alloc_buffer_head David Rientjes
2010-08-24 12:17 ` Jan Kara
2010-08-24 12:15 ` [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc Jan Kara
2010-08-24 13:29 ` Peter Zijlstra
2010-08-24 13:33 ` Jens Axboe
2010-08-24 20:11 ` David Rientjes
2010-08-25 11:24 ` Ted Ts'o [this message]
2010-08-25 11:35 ` Peter Zijlstra
2010-08-25 11:57 ` Ted Ts'o
2010-08-25 12:48 ` Peter Zijlstra
2010-08-25 12:52 ` Peter Zijlstra
2010-08-25 13:20 ` Theodore Tso
2010-08-25 13:31 ` Peter Zijlstra
2010-08-25 20:43 ` David Rientjes
2010-08-25 20:55 ` Peter Zijlstra
2010-08-25 21:11 ` David Rientjes
2010-08-25 21:27 ` Peter Zijlstra
2010-08-25 23:11 ` David Rientjes
2010-08-26 0:19 ` Ted Ts'o
2010-08-26 0:30 ` David Rientjes
[not found] ` <alpine.DEB.2.00.1008251724360.25783@chino.kir.corp.google.com>
2010-08-26 1:48 ` Ted Ts'o
2010-08-26 3:09 ` David Rientjes
2010-08-26 7:06 ` Dave Chinner
2010-08-26 8:29 ` Peter Zijlstra
2010-08-26 6:38 ` Dave Chinner
2010-08-25 13:34 ` Peter Zijlstra
2010-08-25 13:24 ` Dave Chinner
2010-08-25 13:35 ` Peter Zijlstra
2010-08-25 20:53 ` Ted Ts'o
2010-08-25 20:59 ` David Rientjes
2010-08-25 21:35 ` Peter Zijlstra
2010-08-25 20:58 ` David Rientjes
2010-08-25 21:11 ` Christoph Lameter
2010-08-25 21:21 ` Peter Zijlstra
2010-08-25 21:23 ` David Rientjes
2010-08-25 21:35 ` Christoph Lameter
2010-08-25 23:05 ` David Rientjes
2010-08-26 1:30 ` Christoph Lameter
2010-08-26 3:12 ` David Rientjes
2010-08-26 14:16 ` Christoph Lameter
2010-08-26 22:31 ` David Rientjes
2010-08-26 0:09 ` Dave Chinner
2010-08-25 14:13 ` Peter Zijlstra
2010-08-24 13:55 ` Dave Chinner
2010-08-24 14:03 ` Peter Zijlstra
2010-08-24 20:12 ` David Rientjes
2010-08-24 20:08 ` David Rientjes
2010-09-02 1:02 ` [patch v2 " David Rientjes
2010-09-02 1:03 ` [patch v2 3/5] fs: add nofail variant of alloc_buffer_head David Rientjes
2010-09-02 7:59 ` [patch v2 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc Jiri Slaby
2010-09-02 14:51 ` Jan Kara
2010-09-02 21:15 ` Neil Brown
2010-09-05 23:03 ` David Rientjes
2010-09-05 23:01 ` David Rientjes
2010-09-06 9:05 ` David Rientjes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100825112433.GB4453@thunk.org \
--to=tytso@mit.edu \
--cc=agk@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=chris.mason@oracle.com \
--cc=cluster-devel@redhat.com \
--cc=fweisbec@gmail.com \
--cc=jack@suse.cz \
--cc=jaxboe@fusionio.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=neilb@suse.de \
--cc=peterz@infradead.org \
--cc=reiserfs-devel@vger.kernel.org \
--cc=rientjes@google.com \
--cc=swhiteho@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).