linux-fsdevel.vger.kernel.org archive mirror
From: Steven Whitehouse <swhiteho@redhat.com>
To: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ZenIV.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	lsf-pc@lists.linux-foundation.org
Subject: Re: [Lsf-pc] [TOPIC] Last iput() from flusher thread, last fput() from munmap()...
Date: Wed, 28 Mar 2012 15:07:40 +0100
Message-ID: <1332943660.2728.66.camel@menhir>
In-Reply-To: <20120328115430.GF18751@quack.suse.cz>

Hi,

On Wed, 2012-03-28 at 13:54 +0200, Jan Kara wrote:
> Hi,
> 
> On Wed 28-03-12 10:04:15, Steven Whitehouse wrote:
> > On Wed, 2012-03-28 at 03:38 +0100, Al Viro wrote:
> > > On Tue, Mar 27, 2012 at 11:08:58PM +0200, Jan Kara wrote:
> > > >   Hello,
> > > > 
> > > >   maybe the name of this topic could be "How hard should be life of
> > > > filesystems?" but that's kind of broad topic and suggests too much of
> > > > bikeshedding. I'd like to concentrate on concrete possible pain points
> > > > between filesystems & VFS (possibly writeback or even generally MM).
> > > > Lately, I myself came across the two issues in $SUBJECT:
> > > > 1) dropping of last file reference can happen from munmap() and in that
> > > >    case mmap_sem will be held when ->release() is called. Even more it
> > > >    could be held when ->evict_inode() is called to delete inode because
> > > >    inode was unlinked.
> > > 
> > > Yes, it can.
> > > 
> > > > 2) since flusher thread takes inode reference when writing inode out, the
> > > >    last inode reference can be dropped from flusher thread. Thus inode may
> > > >    get deleted in the flusher thread context. This does not seem that
> > > >    problematic on its own but if we realize progress of memory reclaim
> > > >    depends (at least from a longterm perspective) on flusher thread making
> > > >    progress, things start looking a bit uncertain. Even more so when we
> > > >    would like to avoid ->writepage() calls from reclaim and let flusher thread
> > > >    do the work instead. That would then require filesystems to carefully
> > > >    design their ->evict_inode() routines so that things are not
> > > >    deadlockable.
> > > 
> > > You mean "use GFP_NOIO for allocations when holding fs-internal locks"?
> > > 
> > > >   Both these issues should be avoidable (we can postpone fput() after we
> > > > drop mmap_sem; we can tweak inode refcounting to avoid last iput() from
> > > > flusher thread) but obviously there's some cost in the complexity of generic
> > > > layer. So the question is, is it worth it?
> > > 
> > > I don't think it is.  ->i_mutex in ->release() is never needed; existing
> > > cases are racy and dropping preallocation that way is simply wrong.  And
> > > ->evict_inode() is a non-issue, since it has no reason whatsoever to take
> > > *any* locks in mutex - the damn thing is called when nobody has references
> > > to struct inode anymore.  Deadlocks with flusher... that's what NOIO and
> > > NOFS are for.
> > > 
> > For cluster filesystems, we have to take locks (cluster wide) in
> > ->evict_inode() in order to establish for certain whether we are the
> > last opener of the inode. Just because there are no references on the
> > local node, doesn't mean that a remote node doesn't hold the file open
> > still.
> > 
> > We do always use GFP_NOFS when allocating memory while holding such
> > locks, so I'm not quite sure from the above whether or not that will be
> > an issue,
>   Yeah, but you have to use networking to communicate with other nodes
> about locks and this creates another interesting dependency.
> 
> Currently, everything seems to work out just fine and I don't say I know
> about a particular deadlock. I just say that the dependencies are so
> complex that I don't know whether things will work OK e.g. if we change
> page reclaim to offload more to flusher thread. And that's what I feel
> uneasy about.
> 
> 								Honza

Yes, I agree. I've certainly seen some issues with this code path in
GFS2 in the past though, so making it more robust in this way seems to
be a good plan to me,

Steve.
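
For illustration only, here is a minimal user-space sketch of the first idea
mentioned in the quoted text: postponing the final fput() until after mmap_sem
has been dropped. Every name below (toy_file, toy_lock, toy_unmap and so on) is
a hypothetical stand-in, not a kernel API, and the reference count is a plain
int with no atomics, so this only shows the ordering, not a real
implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded stand-in for mmap_sem. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = 0; }

static struct toy_lock map_lock;

/* Hypothetical stand-in for struct file. */
struct toy_file {
    int refcount;                  /* plain int; a real one would be atomic */
    int released;                  /* set once the release hook has run */
    int released_under_lock;       /* was map_lock held when it ran? */
    struct toy_file *next_deferred;
};

/* Plays the role of ->release(): records the locking context it ran in. */
static void toy_release(struct toy_file *f)
{
    f->released = 1;
    f->released_under_lock = map_lock.held;
}

/* Drop one reference. If it was the last, queue the file instead of
 * releasing it, because the caller still holds map_lock. */
static void toy_put_deferred(struct toy_file *f, struct toy_file **pending)
{
    if (--f->refcount == 0) {
        f->next_deferred = *pending;
        *pending = f;
    }
}

/* Run the queued release hooks; intended to be called with map_lock dropped. */
static void toy_flush_deferred(struct toy_file **pending)
{
    while (*pending) {
        struct toy_file *f = *pending;
        *pending = f->next_deferred;
        f->next_deferred = NULL;
        toy_release(f);
    }
}

/* munmap()-like path: the reference is dropped under the lock, but the
 * release hook only runs after the lock has been released. */
void toy_unmap(struct toy_file *f)
{
    struct toy_file *pending = NULL;

    toy_lock_acquire(&map_lock);
    toy_put_deferred(f, &pending);
    toy_lock_release(&map_lock);

    toy_flush_deferred(&pending);
}
```

With a file that starts at two references, the first toy_unmap() call leaves it
unreleased, and the second runs the release hook with the lock no longer held.
The cost is the extra pending list in the generic path, which is the
complexity trade-off raised in the quoted text.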



Thread overview: 7+ messages
2012-03-27 21:08 [TOPIC] Last iput() from flusher thread, last fput() from munmap() Jan Kara
2012-03-28  2:38 ` Al Viro
2012-03-28  4:45   ` Dave Chinner
2012-03-28  9:04   ` Steven Whitehouse
2012-03-28 11:54     ` [Lsf-pc] " Jan Kara
2012-03-28 14:07       ` Steven Whitehouse [this message]
2012-03-28 12:10   ` Jan Kara
