Re: [PATCH 2/3] jbd2 : Fix journal start by passing a parameter to specify if the caller can deal with ENOMEM

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jan Kara <jack@suse.cz>
To: Ted Ts'o <tytso@mit.edu>
Cc: Andreas Dilger <adilger@dilger.ca>, Jan Kara <jack@suse.cz>,
	Manish Katiyar <mkatiyar@gmail.com>,
	linux-ext4@vger.kernel.org
Subject: Re: [PATCH 2/3] jbd2 : Fix journal start by passing a parameter to specify if the caller can deal with ENOMEM
Date: Thu, 26 May 2011 16:49:56 +0200	[thread overview]
Message-ID: <20110526144956.GB5123@quack.suse.cz> (raw)
In-Reply-To: <20110526140558.GJ9520@thunk.org>

On Thu 26-05-11 10:05:58, Ted Tso wrote:
> On Wed, May 25, 2011 at 10:07:20PM -0600, Andreas Dilger wrote:
> > > What JBD2_TOPLEVEL means is that caller is from a top-level file
> > > system function, such as ext4_symlink() or ext4_chmod(), such that
> > > start_this_handle() can use GFP_KERNEL instead of GFP_NOFS.  GFP_NOFS
> > > is needed for any function that might get called by the direct reclaim
> > > path (i.e., the writepage() function).  But for the top-level
> > > symlink() or chmod() function, it's actually OK to allocate memory
> > > using GFP_KERNEL, since there's no potential recursion problem.
> > 
> > At this point, why not just pass GFP_KERNEL or GFP_NOFS directly,
> > optionally with __GFP_NOFAIL?
> 
> Well, __GFP_NOFAIL is going away.  (At least there are a number of mm
> hackers, including akpm, who really want it to go away).
> 
> We could key off of GFP_NOFS, but GFP_NOFS doesn't mean loop and make
> sure that we don't fail.  The two concepts are in fact orthogonal;
> it's just at the moment that there are most places which are called in
> the fs writeback path which also can't fail.  
  Exactly.

> But just to give one example, ext4_bio_write_page() is an example of a
> function that allocates memory GFP_NOFS, but can fail with ENOMEM,
> because its caller, mpage_da_submit_io() in fs/ext4/inode.c is
> designed to cope with failure in a way that doesn't cause data loss.
> (We leave the page dirty, unlock it and back out of the writeback
> code, and later the bdi flusher threads will retry the writepages
> request.)
> 
>     	      	      	       	    	    	  - Ted
> 
> P.S.  That means that there are calls to ext4_journal_start() in the
> ext4_da_writepages() code path that probably could be converted to
> ext4_journal_start_failok() --- or to ext4_journal_start(inode,
> nblocks, JBD2_FAIL_OK) per my suggestion --- once we have fully
> audited the error return paths.
> 
> P.P.S.  Something that might be worth doing is having a sysfs tunable
> that causes ext4_journal_start() to return an ENOMEM failure on a per
> file system basis with a probability specified by the sysfs tunable.
> This would allow us to actually _test_ the error handling to make sure
> sane things happen....  
  No need to do this. If you make JBD2 use a separate slab for transaction
structures (trivial and makes some sense anyway), you can use
fault-injection framework to do exactly what you describe above (see
Documentation/fault-injection/fault-injection.txt and look for failslab).

> What I'd probably do is define a new flag, JBD2_WRITEBACK_TEST, and
> use it to make the ext4_journal_start() functions that are allowed to
> probabilistically fail (since the retry should be happening at a
> higher level), and then run a stress test with the syfs tunable
> enabled.  Since the flag would only cause ext4_journal_start()
> functions that should have automatic fallbacks, the stress test would
> be able to run to completion even though 10% of the
> ext4_journal_start(... JBD2_WRITEBACK_TEST) calls were failing.
> That's another example of why using a flag bitfield instead of a bool
> is much more powerful.
  But if we just fail all transaction allocations with say 10% probability,
it should work as well, shouldn't it? We'd just retry those allocations
whose failure we cannot handle and eventually succeed. Or do I miss
something?
								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

next prev parent reply	other threads:[~2011-05-26 14:49 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-25  7:26 [PATCH 2/3] jbd2 : Fix journal start by passing a parameter to specify if the caller can deal with ENOMEM Manish Katiyar
2011-05-25  7:44 ` Jan Kara
2011-05-25  7:47   ` Manish Katiyar
2011-05-25  8:13     ` Jan Kara
2011-05-26  2:22       ` Ted Ts'o
2011-05-26  4:07         ` Andreas Dilger
2011-05-26 14:05           ` Ted Ts'o
2011-05-26 14:49             ` Jan Kara [this message]
2011-05-26 15:08               ` Ted Ts'o
2011-05-26 15:37                 ` Jan Kara
2011-05-27  4:11                 ` Manish Katiyar
2011-05-26  5:29         ` Manish Katiyar
2011-05-26 14:51           ` Jan Kara
2011-05-28  6:16         ` Manish Katiyar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110526144956.GB5123@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=adilger@dilger.ca \
    --cc=linux-ext4@vger.kernel.org \
    --cc=mkatiyar@gmail.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).