All of lore.kernel.org
 help / color / mirror / Atom feed
From: Byungchul Park <byungchul.park@lge.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>,
	mingo@kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@lge.com, Arnaldo Carvalho de Melo <acme@kernel.org>,
	Tejun Heo <tj@kernel.org>, Boqun Feng <boqun.feng@gmail.com>
Subject: Re: [PATCH v3 1/3] lockdep: Make LOCKDEP_CROSSRELEASE configs all part of PROVE_LOCKING
Date: Wed, 23 Aug 2017 15:11:33 +0900	[thread overview]
Message-ID: <20170823061133.GD22976@X58A-UD3R> (raw)
In-Reply-To: <20170823023118.GC3108@X58A-UD3R>

On Wed, Aug 23, 2017 at 11:31:18AM +0900, Byungchul Park wrote:
> > > IIUC, that's a problem because XFS does this all over the place.
> > > Let me pull the trace apart in reverse order to explain why XFS is
> > > going to throw endless false positives on this:
> > 
> > Yeah, and I agree that's not right or desired. Just trying to figure out
> > how to go about fixing that.
> > 
> > Because, with possible exception for the single threaded workqueues,
> > there is no actual deadlock there unless its the very same work
> > instance. Its the classic instance vs class thing.
> 
> Currently, we do the following in process_one_work(),
> 
> lockdep_map_acquire for a workqueue
> lockdep_map_acquire for a work
> 
> But IMHO it should be,
> 
> lockdep_map_acquire for a pair of workqueue and work.

IMHO, we should remove workqueue's lockdep_map or work's one, and make
the remaining consider the other, so that it works as if
lockdep_map_acquire was used with a pair of workqueue and work, if it's
needed.

> Right?
> 
> > And splitting up work classes is going to be a nightmare too.
> > 
> > > > [   39.159708] -> #0 ((&ioend->io_work)){+.+.}:
> > > > [   39.166126]        process_one_work+0x244/0x6b0
> > > > [   39.171184]        worker_thread+0x48/0x3f0
> > > > [   39.175845]        kthread+0x147/0x180
> > > > [   39.180020]        ret_from_fork+0x2a/0x40
> > > > [   39.184593]        0xffffffffffffffff
> > > 
> > > That's a data IO completion, which will take an inode lock if we
> > > have to run a transaction to update inode size or convert an
> > > unwritten extent.
> > > 
> > > > [   39.100611] -> #1 (&xfs_nondir_ilock_class){++++}:
> > > > [   39.107612]        __lock_acquire+0x10a1/0x1100
> > > > [   39.112660]        lock_acquire+0xea/0x1f0
> > > > [   39.117224]        down_write_nested+0x26/0x60
> > > > [   39.122184]        xfs_ilock+0x166/0x220
> > > > [   39.126563]        __xfs_setfilesize+0x30/0x200
> > > > [   39.131612]        xfs_setfilesize_ioend+0x7f/0xb0
> > > > [   39.136958]        xfs_end_io+0x49/0xf0
> > > > [   39.141240]        process_one_work+0x273/0x6b0
> > > > [   39.146288]        worker_thread+0x48/0x3f0
> > > > [   39.150960]        kthread+0x147/0x180
> > > > [   39.155146]        ret_from_fork+0x2a/0x40
> > > 
> > > That's the data IO completion locking the inode inside a transaction
> > > to update the inode size inside a workqueue.
> > > 
> > > 
> > > > [   38.962397] -> #2 ((complete)&bp->b_iowait){+.+.}:
> > > > [   38.969401]        __lock_acquire+0x10a1/0x1100
> > > > [   38.974460]        lock_acquire+0xea/0x1f0
> > > > [   38.979036]        wait_for_completion+0x3b/0x130
> > > > [   38.984285]        xfs_buf_submit_wait+0xb2/0x590
> > > > [   38.989535]        _xfs_buf_read+0x23/0x30
> > > > [   38.994108]        xfs_buf_read_map+0x14f/0x320
> > > > [   38.999169]        xfs_trans_read_buf_map+0x170/0x5d0
> > > > [   39.004812]        xfs_read_agf+0x97/0x1d0
> > > > [   39.009386]        xfs_alloc_read_agf+0x60/0x240
> > > > [   39.014541]        xfs_alloc_fix_freelist+0x32a/0x3d0
> > > > [   39.020180]        xfs_free_extent_fix_freelist+0x6b/0xa0
> > > > [   39.026206]        xfs_free_extent+0x48/0x120
> > > > [   39.031068]        xfs_trans_free_extent+0x57/0x200
> > > > [   39.036512]        xfs_extent_free_finish_item+0x26/0x40
> > > > [   39.042442]        xfs_defer_finish+0x174/0x770
> > > > [   39.047494]        xfs_itruncate_extents+0x126/0x470
> > > > [   39.053035]        xfs_setattr_size+0x2a1/0x310
> > > > [   39.058093]        xfs_vn_setattr_size+0x57/0x160
> > > > [   39.063342]        xfs_vn_setattr+0x50/0x70
> > > > [   39.068015]        notify_change+0x2ee/0x420
> > > > [   39.072785]        do_truncate+0x5d/0x90
> > > > [   39.077165]        path_openat+0xab2/0xc00
> > > > [   39.081737]        do_filp_open+0x8a/0xf0
> > > > [   39.086213]        do_sys_open+0x123/0x200
> > > > [   39.090787]        SyS_open+0x1e/0x20
> > > > [   39.094876]        entry_SYSCALL_64_fastpath+0x23/0xc2
> > > 
> > > And that's waiting for a metadata read IO to complete during a
> > > truncate transaction. This has no direct connection to the inode at
> > > all - it's a allocation group header that we have to lock to do
> > > block allocation. The completion it is waiting on doesn't even run
> > > through the same workqueue as the ioends - ioends go through
> > > mp->m_data_workqueue or mp->m_unwritten_workqueue, metadata buffers
> > > go through mp->m_buf_workqueue or mp->m_log_workqueue.
> > > 
> > > So from that perspective, an ioend blocked on an inode lock should
> > > not ever prevent metadata buffer completions from being run. Hence
> > > I'd call this a false positive at best, though I think it really
> > > indicates a bug in the new lockdep code because it isn't
> > > discriminating between different workqueue contexts properly.
> > 
> > Agreed. The interaction with workqueues is buggered.
> 
> I think original uses of lockdep_map were already wrong. I mean it's

I meant 'uses in workqueue'.

> not a problem of newly introduced code.
> 
> > > Even if I ignore the fact that buffer completions are run on
> > > different workqueues, there seems to be a bigger problem with this
> > > sort of completion checking.
> > > 
> > > That is, the trace looks plausible because we are definitely hold an
> > > inode locked deep inside a truncate operation where the completion
> > > if flagged.  Indeed, some transactions that would flag like this
> > > could be holding up to 5 inodes locked and have tens of other
> > > metadata objects locked. There are potentially tens (maybe even
> > > hundreds) of different paths into this IO wait point, and all have
> > > different combinations of objects locked when it triggers. So
> > > there's massive scope for potential deadlocks....
> > > 
> > > .... and so we must have some way of avoiding this whole class of
> > > problems that lockdep is unaware of.
> > 
> > 
> > So I did the below little hack, which basically wipes the entire lock
> > history when we start a work and thereby disregards/looses the
> > dependency on the work 'lock'.
> 
> We should not do the following. Why should we give up valuable
> dependencies?
> 
> > 
> > It makes my test box able to boot and build a kernel on XFS, so while I
> > see what you're saying (I think), it doesn't appear to instantly show.
> > 
> > Should I run xfstests or something to further verify things are OK? Does
> > that need a scratch partition (I keep forgetting how to run that stuff
> > :/).
> > 
> > ---
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index 66011c9f5df3..de91cdce9460 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -4756,10 +4756,14 @@ void crossrelease_hist_start(enum xhlock_context_t c)
> >  {
> >  	struct task_struct *cur = current;
> >  
> > -	if (cur->xhlocks) {
> > -		cur->xhlock_idx_hist[c] = cur->xhlock_idx;
> > -		cur->hist_id_save[c] = cur->hist_id;
> > -	}
> > +	if (!cur->xhlocks)
> > +		return;
> > +
> > +	if (c == XHLOCK_PROC)
> > +		invalidate_xhlock(&xhlock(cur->xhlock_idx));
> > +
> > +	cur->xhlock_idx_hist[c] = cur->xhlock_idx;
> > +	cur->hist_id_save[c] = cur->hist_id;
> >  }
> >  
> >  void crossrelease_hist_end(enum xhlock_context_t c)

  reply	other threads:[~2017-08-23  6:11 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-17  8:57 [PATCH v3 1/3] lockdep: Make LOCKDEP_CROSSRELEASE configs all part of PROVE_LOCKING Byungchul Park
2017-08-17  8:57 ` [PATCH v3 2/3] lockdep: Reword title of LOCKDEP_CROSSRELEASE config Byungchul Park
2017-08-17 10:21   ` [tip:locking/core] locking/lockdep: " tip-bot for Byungchul Park
2017-08-17  8:57 ` [PATCH v3 3/3] lockdep: Rename LOCKDEP_COMPLETE config Byungchul Park
2017-08-17 10:22   ` [tip:locking/core] locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS tip-bot for Byungchul Park
2017-08-17 10:21 ` [tip:locking/core] locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING tip-bot for Byungchul Park
2017-08-17 10:45   ` Ingo Molnar
2017-08-18  5:33     ` Byungchul Park
2017-08-21 15:46 ` [PATCH v3 1/3] lockdep: Make LOCKDEP_CROSSRELEASE configs all part of PROVE_LOCKING Peter Zijlstra
2017-08-22  5:14   ` Byungchul Park
2017-08-22  7:52     ` Peter Zijlstra
2017-08-22  8:51       ` Byungchul Park
2017-08-22  9:21         ` Peter Zijlstra
2017-08-22  9:33           ` Byungchul Park
2017-08-22 10:08             ` Peter Zijlstra
2017-08-22 13:49               ` Peter Zijlstra
2017-08-22 14:46                 ` Peter Zijlstra
2017-08-22 15:10                   ` Peter Zijlstra
2017-08-22 15:59                   ` Oleg Nesterov
2017-08-22 16:35                     ` Peter Zijlstra
2017-08-23 16:39                   ` Oleg Nesterov
2017-08-23 17:47                     ` Peter Zijlstra
2017-08-24  6:11                       ` Byungchul Park
2017-08-24  7:37                         ` Byungchul Park
2017-08-24  8:11                           ` Byungchul Park
2017-08-25  1:14                             ` Byungchul Park
2017-08-29 15:52                       ` Oleg Nesterov
2017-08-29 17:07                         ` lockdep && recursive-read Oleg Nesterov
2017-08-29 17:30                           ` Peter Zijlstra
2017-08-29 17:51                         ` [PATCH v3 1/3] lockdep: Make LOCKDEP_CROSSRELEASE configs all part of PROVE_LOCKING Peter Zijlstra
2017-08-23  2:43                 ` Byungchul Park
2017-08-23  6:31                   ` Byungchul Park
2017-08-23 10:26                   ` Peter Zijlstra
2017-08-24  5:07                     ` Byungchul Park
2017-08-22  5:46   ` Dave Chinner
2017-08-22  9:06     ` Peter Zijlstra
2017-08-22  9:22       ` Byungchul Park
2017-08-22  9:37         ` Peter Zijlstra
2017-08-22  9:42           ` Peter Zijlstra
2017-08-23  2:12           ` Byungchul Park
2017-08-23  6:03             ` Byungchul Park
2017-08-23 10:20             ` Peter Zijlstra
2017-08-24  2:02               ` Byungchul Park
2017-08-24  7:30                 ` Byungchul Park
2017-08-22 21:19       ` Dave Chinner
2017-08-23  2:31       ` Byungchul Park
2017-08-23  6:11         ` Byungchul Park [this message]
2017-08-23 10:46         ` Peter Zijlstra
2017-08-24  5:06           ` Byungchul Park
2017-08-23  1:56     ` Byungchul Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170823061133.GD22976@X58A-UD3R \
    --to=byungchul.park@lge.com \
    --cc=acme@kernel.org \
    --cc=boqun.feng@gmail.com \
    --cc=david@fromorbit.com \
    --cc=kernel-team@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.