From: Byungchul Park <byungchul.park@lge.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>,
	mingo@kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@lge.com, Arnaldo Carvalho de Melo <acme@kernel.org>,
	Tejun Heo <tj@kernel.org>, Boqun Feng <boqun.feng@gmail.com>
Subject: Re: [PATCH v3 1/3] lockdep: Make LOCKDEP_CROSSRELEASE configs all part of PROVE_LOCKING
Date: Wed, 23 Aug 2017 15:11:33 +0900	[thread overview]
Message-ID: <20170823061133.GD22976@X58A-UD3R> (raw)
In-Reply-To: <20170823023118.GC3108@X58A-UD3R>

On Wed, Aug 23, 2017 at 11:31:18AM +0900, Byungchul Park wrote:
> > > IIUC, that's a problem because XFS does this all over the place.
> > > Let me pull the trace apart in reverse order to explain why XFS is
> > > going to throw endless false positives on this:
> > 
> > Yeah, and I agree that's not right or desired. Just trying to figure out
> > how to go about fixing that.
> > 
> > Because, with the possible exception of the single threaded workqueues,
> > there is no actual deadlock there unless it's the very same work
> > instance. It's the classic instance vs. class thing.
> 
> Currently, we do the following in process_one_work(),
> 
> lockdep_map_acquire for a workqueue
> lockdep_map_acquire for a work
> 
> But IMHO it should be,
> 
> lockdep_map_acquire for a pair of workqueue and work.

IMHO, we should remove either the workqueue's lockdep_map or the work's,
and make the remaining one account for the other, so that it behaves as
if lockdep_map_acquire were taken for the (workqueue, work) pair, where
needed.

> Right?
> 
> > And splitting up work classes is going to be a nightmare too.
> > 
> > > > [   39.159708] -> #0 ((&ioend->io_work)){+.+.}:
> > > > [   39.166126]        process_one_work+0x244/0x6b0
> > > > [   39.171184]        worker_thread+0x48/0x3f0
> > > > [   39.175845]        kthread+0x147/0x180
> > > > [   39.180020]        ret_from_fork+0x2a/0x40
> > > > [   39.184593]        0xffffffffffffffff
> > > 
> > > That's a data IO completion, which will take an inode lock if we
> > > have to run a transaction to update inode size or convert an
> > > unwritten extent.
> > > 
> > > > [   39.100611] -> #1 (&xfs_nondir_ilock_class){++++}:
> > > > [   39.107612]        __lock_acquire+0x10a1/0x1100
> > > > [   39.112660]        lock_acquire+0xea/0x1f0
> > > > [   39.117224]        down_write_nested+0x26/0x60
> > > > [   39.122184]        xfs_ilock+0x166/0x220
> > > > [   39.126563]        __xfs_setfilesize+0x30/0x200
> > > > [   39.131612]        xfs_setfilesize_ioend+0x7f/0xb0
> > > > [   39.136958]        xfs_end_io+0x49/0xf0
> > > > [   39.141240]        process_one_work+0x273/0x6b0
> > > > [   39.146288]        worker_thread+0x48/0x3f0
> > > > [   39.150960]        kthread+0x147/0x180
> > > > [   39.155146]        ret_from_fork+0x2a/0x40
> > > 
> > > That's the data IO completion locking the inode inside a transaction
> > > to update the inode size inside a workqueue.
> > > 
> > > 
> > > > [   38.962397] -> #2 ((complete)&bp->b_iowait){+.+.}:
> > > > [   38.969401]        __lock_acquire+0x10a1/0x1100
> > > > [   38.974460]        lock_acquire+0xea/0x1f0
> > > > [   38.979036]        wait_for_completion+0x3b/0x130
> > > > [   38.984285]        xfs_buf_submit_wait+0xb2/0x590
> > > > [   38.989535]        _xfs_buf_read+0x23/0x30
> > > > [   38.994108]        xfs_buf_read_map+0x14f/0x320
> > > > [   38.999169]        xfs_trans_read_buf_map+0x170/0x5d0
> > > > [   39.004812]        xfs_read_agf+0x97/0x1d0
> > > > [   39.009386]        xfs_alloc_read_agf+0x60/0x240
> > > > [   39.014541]        xfs_alloc_fix_freelist+0x32a/0x3d0
> > > > [   39.020180]        xfs_free_extent_fix_freelist+0x6b/0xa0
> > > > [   39.026206]        xfs_free_extent+0x48/0x120
> > > > [   39.031068]        xfs_trans_free_extent+0x57/0x200
> > > > [   39.036512]        xfs_extent_free_finish_item+0x26/0x40
> > > > [   39.042442]        xfs_defer_finish+0x174/0x770
> > > > [   39.047494]        xfs_itruncate_extents+0x126/0x470
> > > > [   39.053035]        xfs_setattr_size+0x2a1/0x310
> > > > [   39.058093]        xfs_vn_setattr_size+0x57/0x160
> > > > [   39.063342]        xfs_vn_setattr+0x50/0x70
> > > > [   39.068015]        notify_change+0x2ee/0x420
> > > > [   39.072785]        do_truncate+0x5d/0x90
> > > > [   39.077165]        path_openat+0xab2/0xc00
> > > > [   39.081737]        do_filp_open+0x8a/0xf0
> > > > [   39.086213]        do_sys_open+0x123/0x200
> > > > [   39.090787]        SyS_open+0x1e/0x20
> > > > [   39.094876]        entry_SYSCALL_64_fastpath+0x23/0xc2
> > > 
> > > And that's waiting for a metadata read IO to complete during a
> > > truncate transaction. This has no direct connection to the inode at
> > > all - it's an allocation group header that we have to lock to do
> > > block allocation. The completion it is waiting on doesn't even run
> > > through the same workqueue as the ioends - ioends go through
> > > mp->m_data_workqueue or mp->m_unwritten_workqueue, metadata buffers
> > > go through mp->m_buf_workqueue or mp->m_log_workqueue.
> > > 
> > > So from that perspective, an ioend blocked on an inode lock should
> > > not ever prevent metadata buffer completions from being run. Hence
> > > I'd call this a false positive at best, though I think it really
> > > indicates a bug in the new lockdep code because it isn't
> > > discriminating between different workqueue contexts properly.
> > 
> > Agreed. The interaction with workqueues is buggered.
> 
> I think the original uses of lockdep_map were already wrong. I mean it's

I meant 'the uses in the workqueue code'.

> not a problem with the newly introduced code.
> 
> > > Even if I ignore the fact that buffer completions are run on
> > > different workqueues, there seems to be a bigger problem with this
> > > sort of completion checking.
> > > 
> > > That is, the trace looks plausible because we definitely hold an
> > > inode locked deep inside a truncate operation where the completion
> > > is flagged.  Indeed, some transactions that would flag like this
> > > could be holding up to 5 inodes locked and have tens of other
> > > metadata objects locked. There are potentially tens (maybe even
> > > hundreds) of different paths into this IO wait point, and all have
> > > different combinations of objects locked when it triggers. So
> > > there's massive scope for potential deadlocks....
> > > 
> > > .... and so we must have some way of avoiding this whole class of
> > > problems that lockdep is unaware of.
> > 
> > 
> > So I did the below little hack, which basically wipes the entire lock
> > history when we start a work and thereby disregards/loses the
> > dependency on the work 'lock'.
> 
> We should not do the following. Why should we give up valuable
> dependencies?
> 
> > 
> > It makes my test box able to boot and build a kernel on XFS, so while I
> > see what you're saying (I think), it doesn't appear to instantly show.
> > 
> > Should I run xfstests or something to further verify things are OK? Does
> > that need a scratch partition (I keep forgetting how to run that stuff
> > :/).
> > 
> > ---
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index 66011c9f5df3..de91cdce9460 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -4756,10 +4756,14 @@ void crossrelease_hist_start(enum xhlock_context_t c)
> >  {
> >  	struct task_struct *cur = current;
> >  
> > -	if (cur->xhlocks) {
> > -		cur->xhlock_idx_hist[c] = cur->xhlock_idx;
> > -		cur->hist_id_save[c] = cur->hist_id;
> > -	}
> > +	if (!cur->xhlocks)
> > +		return;
> > +
> > +	if (c == XHLOCK_PROC)
> > +		invalidate_xhlock(&xhlock(cur->xhlock_idx));
> > +
> > +	cur->xhlock_idx_hist[c] = cur->xhlock_idx;
> > +	cur->hist_id_save[c] = cur->hist_id;
> >  }
> >  
> >  void crossrelease_hist_end(enum xhlock_context_t c)
