Re: __sb_start_write() && force_trylock hack

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: __sb_start_write() && force_trylock hack
Date: Wed, 19 Aug 2015 13:11:58 +1000	[thread overview]
Message-ID: <20150819031158.GO714@dastard> (raw)
In-Reply-To: <20150818144900.GA26478@redhat.com>

On Tue, Aug 18, 2015 at 04:49:00PM +0200, Oleg Nesterov wrote:
> Jan, Dave, perhaps you can take a look...
> 
> On 08/14, Oleg Nesterov wrote:
> >
> > Plus another patch which removes the "trylock"
> > hack in __sb_start_write().
> 
> I meant the patch we already discussed (attached at the end). And yes,
> previously I reported it passed the tests. However, I only ran the same
> 'grep -il freeze tests/*/???' tests. When I tried to run all tests, I
> got the new reports from lockdep.
> 
> 	[ 2098.281171]  May be due to missing lock nesting notation

<groan>

> 	[ 2098.288744] 4 locks held by fsstress/50971:
> 	[ 2098.293408]  #0:  (sb_writers#13){++++++}, at: [<ffffffff81248d32>] __sb_start_write+0x32/0x40
> 	[ 2098.303085]  #1:  (&type->i_mutex_dir_key#4/1){+.+.+.}, at: [<ffffffff8125685f>] filename_create+0x7f/0x170
> 	[ 2098.314038]  #2:  (sb_internal#2){++++++}, at: [<ffffffff81248d32>] __sb_start_write+0x32/0x40
> 	[ 2098.323711]  #3:  (&type->s_umount_key#54){++++++}, at: [<ffffffffa05a638c>] xfs_flush_inodes+0x1c/0x40 [xfs]
> 	[ 2098.334898]
> 		       stack backtrace:
> 	[ 2098.339762] CPU: 3 PID: 50971 Comm: fsstress Not tainted 4.2.0-rc6+ #27
> 	[ 2098.347143] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013
> 	[ 2098.358303]  0000000000000000 00000000e70ee864 ffff880c05a2b9c8 ffffffff817ee692
> 	[ 2098.366603]  0000000000000000 ffffffff826f8030 ffff880c05a2bab8 ffffffff810f45be
> 	[ 2098.374900]  0000000000000000 ffff880c05a2bb20 0000000000000000 0000000000000000
> 	[ 2098.383197] Call Trace:
> 	[ 2098.385930]  [<ffffffff817ee692>] dump_stack+0x45/0x57
> 	[ 2098.391667]  [<ffffffff810f45be>] __lock_acquire+0x1e9e/0x2040
> 	[ 2098.398177]  [<ffffffff810f310d>] ? __lock_acquire+0x9ed/0x2040
> 	[ 2098.404787]  [<ffffffff811d4702>] ? pagevec_lookup_entries+0x22/0x30
> 	[ 2098.411879]  [<ffffffff811d5142>] ? truncate_inode_pages_range+0x2b2/0x7e0
> 	[ 2098.419551]  [<ffffffff810f542e>] lock_acquire+0xbe/0x150
> 	[ 2098.425566]  [<ffffffff81248d32>] ? __sb_start_write+0x32/0x40
> 	[ 2098.432079]  [<ffffffff810ede91>] percpu_down_read+0x51/0xa0
> 	[ 2098.438396]  [<ffffffff81248d32>] ? __sb_start_write+0x32/0x40
> 	[ 2098.444905]  [<ffffffff81248d32>] __sb_start_write+0x32/0x40
> 	[ 2098.451244]  [<ffffffffa05a7f84>] xfs_trans_alloc+0x24/0x40 [xfs]
> 	[ 2098.458076]  [<ffffffffa059e872>] xfs_inactive_truncate+0x32/0x110 [xfs]
> 	[ 2098.465580]  [<ffffffffa059f662>] xfs_inactive+0x102/0x120 [xfs]
> 	[ 2098.472308]  [<ffffffffa05a475b>] xfs_fs_evict_inode+0x7b/0xc0 [xfs]
> 	[ 2098.479401]  [<ffffffff812642ab>] evict+0xab/0x170
> 	[ 2098.484748]  [<ffffffff81264568>] iput+0x1a8/0x230
> 	[ 2098.490100]  [<ffffffff8127701c>] sync_inodes_sb+0x14c/0x1d0
> 	[ 2098.496439]  [<ffffffffa05a6398>] xfs_flush_inodes+0x28/0x40 [xfs]
> 	[ 2098.503361]  [<ffffffffa059e213>] xfs_create+0x5c3/0x6d0 [xfs]

Another false positive.  We have a transaction handle here, but
we've failed to reserve space for it so it's not an active
transaction yet. i.e. the filesystem is operating at ENOSPC.

On an ENOSPC error, we flush out dirty data to free up reserved
delalloc space, which obviously is obviously then racing with an
unlink of an inode in another thread and so drops the final
reference to the inode, and it goes down the eviction path that runs
transactions.

But there is no deadlock here - while we have a transaction handle
allocated at xfs_create() context, it is not yet active and holds no
reservation, so it is safe for eviction here to run new transactions
as there is no active transaction in progress.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

next prev parent reply	other threads:[~2015-08-19  3:12 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-18 14:49 __sb_start_write() && force_trylock hack Oleg Nesterov
2015-08-18 15:18 ` Oleg Nesterov
2015-08-19  3:25   ` Dave Chinner
2015-08-19  3:11 ` Dave Chinner [this message]
2015-08-19 15:00   ` Oleg Nesterov
2015-08-19 23:26     ` Dave Chinner
2015-08-21 18:30       ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150819031158.GO714@dastard \
    --to=david@fromorbit.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).