From: Steven Whitehouse <swhiteho@redhat.com>
To: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ZenIV.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	lsf-pc@lists.linux-foundation.org
Subject: Re: [Lsf-pc] [TOPIC] Last iput() from flusher thread, last fput() from munmap()...
Date: Wed, 28 Mar 2012 15:07:40 +0100
Message-ID: <1332943660.2728.66.camel@menhir>
In-Reply-To: <20120328115430.GF18751@quack.suse.cz>

Hi,

On Wed, 2012-03-28 at 13:54 +0200, Jan Kara wrote:
> Hi,
> 
> On Wed 28-03-12 10:04:15, Steven Whitehouse wrote:
> > On Wed, 2012-03-28 at 03:38 +0100, Al Viro wrote:
> > > On Tue, Mar 27, 2012 at 11:08:58PM +0200, Jan Kara wrote:
> > > >   Hello,
> > > > 
> > > >   maybe the name of this topic could be "How hard should be life of
> > > > filesystems?" but that's kind of broad topic and suggests too much of
> > > > bikeshedding. I'd like to concentrate on concrete possible pain points
> > > > between filesystems & VFS (possibly writeback or even generally MM).
> > > > Lately, I've myself come across the two issues in $SUBJECT:
> > > > 1) dropping of last file reference can happen from munmap() and in that
> > > >    case mmap_sem will be held when ->release() is called. Even more it
> > > >    could be held when ->evict_inode() is called to delete inode because
> > > >    inode was unlinked.
> > > 
> > > Yes, it can.
> > > 
> > > > 2) since flusher thread takes inode reference when writing inode out, the
> > > >    last inode reference can be dropped from flusher thread. Thus inode may
> > > >    get deleted in the flusher thread context. This does not seem that
> > > >    problematic on its own but if we realize progress of memory reclaim
> > > >    depends (at least from a longterm perspective) on flusher thread making
> > > >    progress, things start looking a bit uncertain. Even more so when we
> > > >    would like to avoid ->writepage() calls from reclaim and let flusher thread
> > > >    do the work instead. That would then require filesystems to carefully
> > > >    design their ->evict_inode() routines so that things are not
> > > >    deadlockable.
> > > 
> > > You mean "use GFP_NOIO for allocations when holding fs-internal locks"?
> > > 
> > > >   Both these issues should be avoidable (we can postpone fput() after we
> > > > drop mmap_sem; we can tweak inode refcounting to avoid last iput() from
> > > > flusher thread) but obviously there's some cost in the complexity of generic
> > > > layer. So the question is, is it worth it?
> > > 
> > > I don't think it is.  ->i_mutex in ->release() is never needed; existing
> > > cases are racy and dropping preallocation that way is simply wrong.  And
> > > ->evict_inode() is a non-issue, since it has no reason whatsoever to take
> > > *any* locks in mutex - the damn thing is called when nobody has references
> > > to struct inode anymore.  Deadlocks with flusher... that's what NOIO and
> > > NOFS are for.
> > > 
> > For cluster filesystems, we have to take locks (cluster wide) in
> > ->evict_inode() in order to establish for certain whether we are the
> > last opener of the inode. Just because there are no references on the
> > local node, doesn't mean that a remote node doesn't hold the file open
> > still.
> > 
> > We do always use GFP_NOFS when allocating memory while holding such
> > locks, so I'm not quite sure from the above whether or not that will be
> > an issue,
>   Yeah, but you have to use networking to communicate with other nodes
> about locks and this creates another interesting dependency.
> 
> Currently, everything seems to work out just fine and I don't say I know
> about a particular deadlock. I just say that the dependencies are so
> complex that I don't know whether things will work OK e.g. if we change
> page reclaim to offload more to flusher thread. And that's what I feel
> uneasy about.
> 
> 								Honza

Yes, I agree. I've certainly seen some issues with this code path in
GFS2 in the past though, so making it more robust in this way seems to
be a good plan to me,

Steve.
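
For readers following the thread, here is a minimal user-space sketch of the
"postpone fput() after we drop mmap_sem" idea quoted above. It is not kernel
code and not a proposed patch: the lock is modelled as a plain flag, the model
is single-threaded, and every name in it (model_lock, model_file, put_now,
put_deferred, flush_deferred, model_munmap) is hypothetical. It only shows the
ordering difference: with an immediate put the release hook runs while the lock
is held, with a deferred put it runs after the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for mmap_sem; a flag suffices in a single-threaded model. */
struct model_lock {
    bool held;
};

/* Hypothetical stand-in for a refcounted struct file. */
struct model_file {
    int refcount;
    bool released;              /* set once the release hook has run */
    bool released_under_lock;   /* true if the hook ran with the lock held */
    struct model_lock *mm_lock; /* the lock we would rather not hold in release */
    struct model_file *next_deferred;
};

/* Models ->release(): records whether the lock was held when it was called. */
static void model_release(struct model_file *f)
{
    f->released = true;
    f->released_under_lock = f->mm_lock->held;
}

/* Immediate put: the last reference triggers release in the caller's context. */
static void put_now(struct model_file *f)
{
    assert(f->refcount > 0);
    if (--f->refcount == 0)
        model_release(f);
}

/* Deferred put: the last reference only queues the file for later release. */
static void put_deferred(struct model_file *f, struct model_file **queue)
{
    assert(f->refcount > 0);
    if (--f->refcount == 0) {
        f->next_deferred = *queue;
        *queue = f;
    }
}

/* Runs the release hook for every queued file; call this after unlocking. */
static void flush_deferred(struct model_file **queue)
{
    while (*queue != NULL) {
        struct model_file *f = *queue;

        *queue = f->next_deferred;
        f->next_deferred = NULL;
        model_release(f);
    }
}

/* Models munmap(): drop the mapping's file reference inside the locked region. */
static void model_munmap(struct model_file *f, bool defer)
{
    struct model_file *queue = NULL;

    f->mm_lock->held = true;
    if (defer)
        put_deferred(f, &queue);
    else
        put_now(f);
    f->mm_lock->held = false;

    /* With deferral, release happens here, outside the locked region. */
    flush_deferred(&queue);
}

/* Returns true if the release hook ran while the lock was held. */
static bool demo_released_under_lock(bool defer)
{
    struct model_lock lock = { .held = false };
    struct model_file f = { .refcount = 1, .mm_lock = &lock };

    model_munmap(&f, defer);
    assert(f.released);
    return f.released_under_lock;
}
```

Calling demo_released_under_lock(false) models the current behaviour and
demo_released_under_lock(true) models the deferred variant. The extra queue and
flush step are the "cost in the complexity of generic layer" that the thread is
weighing.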


