linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] change sb_writers to use percpu_rw_semaphore
Date: Tue, 11 Aug 2015 08:41:54 +1000	[thread overview]
Message-ID: <20150810224154.GK3902@dastard> (raw)
In-Reply-To: <20150810145942.GF3768@quack.suse.cz>

On Mon, Aug 10, 2015 at 04:59:42PM +0200, Jan Kara wrote:
> On Fri 07-08-15 21:55:52, Oleg Nesterov wrote:
> > I'll try to re-check/re-test, but so far I think that this and the
> > previous series are correct. However 3/4 from the previous series
> > (attached at the end just in case) uncovers (I think) some problems
> > in xfs locking.
....
> 
> Hum, I had a look at the lockdep report below and it's one of the
> peculiarities of the freeze protection. For record let me repeat the full
> argument for why I don't think there's a possibility for a real deadlock.
> Feel free to skip to the end if you get bored.  
> 
> One would like to construct the lock chain as:
> 
> CPU0 (chown foo dir)	CPU1 (readdir dir)	CPU2 (page fault)
> process Y		process X, thread 0	process X, thread 1
> 
> 			get ILOCK for dir
> gets freeze protection
> starts transaction in xfs_setattr_nonsize
> waits to get ILOCK on 'dir'
> 						get mmap_sem for X
> 			wait for mmap_sem for process X
> 			  in filldir()
> 						wait for freeze protection in
> 						  xfs_page_mkwrite
> 
> and CPU3 then being in freeze_super() blocking CPU2 and waiting for CPU0 to
> finish it's freeze-protected section. But this cannot happen. The reason is
> that we block writers level-by-level and thus while there are writers at
> level X, we do not block writers at level X+1. So in this particular case
> freeze_super() will block waiting for CPU0 to finish its freeze protected
> section while CPU2 is free to continue.
> 
> In general we have a chain like
> 
> freeze L0 -> freeze L1 -> freeze L2 -> ILOCK -> mmap_sem --\
>                 A                                          |
>                 \------------------------------------------/
> 
> But since ILOCK is always acquired with freeze protection at L0 and we can
> block at L1 only after there are no writers at L0, this loop can never
> happen.
> 
> Note that if we use the property of freezing that lock at level X+1 cannot
> block when we hold lock at level X, we can as well simplify the dependency
> graph and track in it only the lowest level of freeze lock that is
> currently acquired (since the levels above it cannot block and do not in
> any way influence blocking of other processes either and thus are
> irrelevant for the purpose of deadlock detection). Then the dependency
> graph we'd get would be:
> 
> freeze L0 -> ILOCK -> mmap_sem -> freeze L1
> 
> and we have a nice acyclic graph we like to see... So probably we have to
> hack the lockdep instrumentation some more and just don't tell lockdep
> about freeze locks at higher levels if we already hold a lock at lower
> level. Thoughts?

The XFS directory ilock->filldir->might_fault locking path has been
generating false positives in quite a lot of places because of
things we do on one side of the mmap_sem in filesystem paths vs
thigs we do on the other side of the mmap_sem in the page fault
path.

I'm getting sick of these stupid lockdep false positives. I think I
need to rework the XFS readdir locking patch I wrote a while back:

http://oss.sgi.com/archives/xfs/2014-03/msg00146.html

At the time it wasn't clear what the best way forward was. Right now
I think the method I originally used (IOLOCK for directory data and
hence readdir, ILOCK for everything else) will be sufficient; if we
need to do anything smarter that can be addressed further down the
road.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2015-08-10 22:41 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-22 21:15 [PATCH 0/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov
2015-07-22 21:15 ` [PATCH 1/4] percpu-rwsem: introduce percpu_down_read_trylock() Oleg Nesterov
2015-07-22 21:15 ` [PATCH 2/4] percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire() Oleg Nesterov
2015-07-31 10:20   ` Peter Zijlstra
2015-08-03 15:40     ` Oleg Nesterov
2015-07-22 21:15 ` [PATCH 3/4] shift percpu_counter_destroy() into destroy_super_work() Oleg Nesterov
2015-07-28  8:36   ` Jan Kara
2015-07-22 21:15 ` [PATCH 4/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov
2015-07-22 21:34   ` Oleg Nesterov
2015-07-28  8:34   ` Jan Kara
2015-08-03 17:30     ` Oleg Nesterov
2015-08-07 19:54   ` Oleg Nesterov
2015-08-07 19:55 ` [PATCH 0/4] " Oleg Nesterov
2015-08-10 14:59   ` Jan Kara
2015-08-10 22:41     ` Dave Chinner [this message]
2015-08-11 13:16       ` Oleg Nesterov
2015-08-11 13:29         ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150810224154.GK3902@dastard \
    --to=david@fromorbit.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).