From: Dave Chinner <david@fromorbit.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ted Ts'o <tytso@mit.edu>, David Rientjes <rientjes@google.com>,
Jens Axboe <jaxboe@fusionio.com>,
Andrew Morton <akpm@linux-foundation.org>,
Neil Brown <neilb@suse.de>, Alasdair G Kergon <agk@redhat.com>,
Chris Mason <chris.mason@oracle.com>,
Steven Whitehouse <swhiteho@redhat.com>, Jan Kara <jack@suse.cz>,
Frederic Weisbecker <fweisbec@gmail.com>,
"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
"cluster-devel@redhat.com" <cluster-devel@redhat.com>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
"reiserfs-devel@vger.kernel.org" <reiserfs-devel@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [patch 1/5] mm: add nofail variants of kmalloc kcalloc and kzalloc
Date: Thu, 26 Aug 2010 10:09:40 +1000
Message-ID: <20100826000940.GR31488@dastard>
In-Reply-To: <1282743342.2605.3707.camel@laptop>
On Wed, Aug 25, 2010 at 03:35:42PM +0200, Peter Zijlstra wrote:
> On Wed, 2010-08-25 at 23:24 +1000, Dave Chinner wrote:
> >
> > That is, the guarantee that we will always make progress simply does
> > not exist in filesystems, so a mempool-like concept seems to me to
> > be doomed from the start....
>
> While I appreciate that it might be somewhat (a lot) harder for a
> filesystem to provide that guarantee, I'd be deeply worried about your
> claim that its impossible.
I didn't say impossible, just that there's no way we can always
guarantee forward progress with a specific, bounded pool of memory.
Sure, we know what the worst case amount of log space is needed for
each transaction (i.e. how many pages that will be dirtied), but
that does not take into account all the blocks that need to be read
to make those modifications, the memory needed for stuff like btree
cursors, log tickets, transaction commit vectors, btree blocks
needed to do the searches, etc. A typical transaction reservation
on a 4k block filesystem is between 200k and 400k (that's the worst
case), and if you add in all the other allocations that might be
required, we're on the order of requiring megabytes of RAM to
guarantee a single transaction will succeed in low memory
conditions. The exact
requirement is very difficult to quantify algorithmically, but for a
single transaction it should be possible.
However, consider the case of running a thousand concurrent
transactions and in the middle of that the system runs out of
memory. All the transactions need memory allocation to succeed, some
are blocked waiting for resources held in other transactions, etc.
Firstly, how do you stop all the transactions from making further
progress to serialise access to the low memory pool? Secondly, how
do you select which transaction gets to use the low memory pool?
What do you do if the selected transaction then blocks on a resource
held by another transaction (which you can't know ahead of time)? Do
you switch to another thread and hope the pool doesn't run dry? What
do you do when (not if) the memory pool runs dry?
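To put rough numbers on that scenario, here is a back-of-the-envelope
sketch. Only the 200k-400k reservation range and the count of a
thousand transactions come from the text above; the 2 MiB
per-transaction figure is an assumption standing in for "megabytes",
and reserve_bytes() is a hypothetical helper, not a kernel interface.

```python
# Back-of-the-envelope sizing for a hypothetical per-transaction reserve.
# Nothing here is kernel code; it only multiplies the figures out.
KIB = 1024
MIB = 1024 * KIB


def reserve_bytes(transactions, per_transaction_bytes):
    """Total reserve if every in-flight transaction is backed for its worst case."""
    return transactions * per_transaction_bytes


if __name__ == "__main__":
    concurrent = 1000          # "a thousand concurrent transactions"
    log_low = 200 * KIB        # low end of the quoted log reservation
    log_high = 400 * KIB       # high end of the quoted log reservation
    assumed_total = 2 * MIB    # ASSUMPTION: stand-in for "megabytes of RAM"

    low = reserve_bytes(concurrent, log_low) / MIB
    high = reserve_bytes(concurrent, log_high) / MIB
    full = reserve_bytes(concurrent, assumed_total) / MIB

    print(f"log reservations alone: {low:.0f} to {high:.0f} MiB")
    print(f"with the assumed 2 MiB per transaction: {full:.0f} MiB")
```

Backing every transaction for its worst case is clearly not affordable
at that scale, which is why the questions above are about serialising
access to a much smaller shared pool instead.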
I'm sure this could be done, but it's a lot of difficult, unrewarding
work that greatly increases code complexity, touches a massive
amount of the filesystem code base, exponentially increases the test
matrix, is likely to have significant operational overhead, and even
then there's no guarantee that we've got it right. That doesn't
sound like a good solution to me.
> It would render a system without swap very prone to deadlocks. Even with
> the very tight dirty page accounting we currently have you can fill all
> your memory with anonymous pages, at which point there's nothing free
> and you require writeout of dirty pages to succeed.
Then don't allow anonymous pages to fill all of memory when there is
no swap available - i.e. keep a larger pool of free memory in that
case. That's a much simpler solution than
turning all the filesystems upside down to try to make them not need
allocation....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com