* [RESEND PATCH v2 1/2] completion: Add support for initializing completion with lockdep_map
@ 2017-10-10 6:51 Byungchul Park
2017-10-10 6:51 ` [RESEND PATCH v2 2/2] lockdep: Remove unnecessary acquisitions wrt workqueue flush Byungchul Park
0 siblings, 1 reply; 10+ messages in thread
From: Byungchul Park @ 2017-10-10 6:51 UTC (permalink / raw)
To: tj, johannes.berg, peterz, mingo; +Cc: tglx, oleg, linux-kernel, kernel-team
Chagnes from v1
- Add a completion initialization function with a lockdep map
- Enhance readability of the workqueue code
----->8-----
>From e148617e20ebc9c9eefe7bb222b9bba07cb963bc Mon Sep 17 00:00:00 2001
From: Byungchul Park <byungchul.park@lge.com>
Date: Fri, 8 Sep 2017 17:39:48 +0900
Subject: [RESEND PATCH v2 1/2] completion: Add support for initializing completion
with lockdep_map
Sometimes, we want to initialize completions with sparate lockdep maps
to assign lock classes under control. For example, the workqueue code
manages lockdep maps, as it can classify lockdep maps properly.
Provided a function for that purpose.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
include/linux/completion.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/include/linux/completion.h b/include/linux/completion.h
index cae5400..182d56e 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -49,6 +49,13 @@ static inline void complete_release_commit(struct completion *x)
lock_commit_crosslock((struct lockdep_map *)&x->map);
}
+#define init_completion_with_map(x, m) \
+do { \
+ lockdep_init_map_crosslock((struct lockdep_map *)&(x)->map, \
+ (m)->name, (m)->key, 0); \
+ __init_completion(x); \
+} while (0)
+
#define init_completion(x) \
do { \
static struct lock_class_key __key; \
@@ -58,6 +65,7 @@ static inline void complete_release_commit(struct completion *x)
__init_completion(x); \
} while (0)
#else
+#define init_completion_with_map(x, m) __init_completion(x)
#define init_completion(x) __init_completion(x)
static inline void complete_acquire(struct completion *x) {}
static inline void complete_release(struct completion *x) {}
--
1.9.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RESEND PATCH v2 2/2] lockdep: Remove unnecessary acquisitions wrt workqueue flush
2017-10-10 6:51 [RESEND PATCH v2 1/2] completion: Add support for initializing completion with lockdep_map Byungchul Park
@ 2017-10-10 6:51 ` Byungchul Park
2017-10-12 7:54 ` Byungchul Park
0 siblings, 1 reply; 10+ messages in thread
From: Byungchul Park @ 2017-10-10 6:51 UTC (permalink / raw)
To: tj, johannes.berg, peterz, mingo; +Cc: tglx, oleg, linux-kernel, kernel-team
The workqueue added manual acquisitions to catch deadlock cases.
Now crossrelease was introduced, some of those are redundant, since
wait_for_completion() already includes the acquisition for itself.
Removed it.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
include/linux/workqueue.h | 4 ++--
kernel/workqueue.c | 20 ++++----------------
2 files changed, 6 insertions(+), 18 deletions(-)
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index db6dc9d..1bef13e 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -218,7 +218,7 @@ static inline void destroy_delayed_work_on_stack(struct delayed_work *work) { }
\
__init_work((_work), _onstack); \
(_work)->data = (atomic_long_t) WORK_DATA_INIT(); \
- lockdep_init_map(&(_work)->lockdep_map, #_work, &__key, 0); \
+ lockdep_init_map(&(_work)->lockdep_map, "(complete)"#_work, &__key, 0); \
INIT_LIST_HEAD(&(_work)->entry); \
(_work)->func = (_func); \
} while (0)
@@ -398,7 +398,7 @@ enum {
static struct lock_class_key __key; \
const char *__lock_name; \
\
- __lock_name = #fmt#args; \
+ __lock_name = "(complete)"#fmt#args; \
\
__alloc_workqueue_key((fmt), (flags), (max_active), \
&__key, __lock_name, ##args); \
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ab3c0dc..72f68b1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2497,15 +2497,8 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
INIT_WORK_ONSTACK(&barr->work, wq_barrier_func);
__set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&barr->work));
- /*
- * Explicitly init the crosslock for wq_barrier::done, make its lock
- * key a subkey of the corresponding work. As a result we won't
- * build a dependency between wq_barrier::done and unrelated work.
- */
- lockdep_init_map_crosslock((struct lockdep_map *)&barr->done.map,
- "(complete)wq_barr::done",
- target->lockdep_map.key, 1);
- __init_completion(&barr->done);
+ init_completion_with_map(&barr->done, &target->lockdep_map);
+
barr->task = current;
/*
@@ -2611,16 +2604,14 @@ void flush_workqueue(struct workqueue_struct *wq)
struct wq_flusher this_flusher = {
.list = LIST_HEAD_INIT(this_flusher.list),
.flush_color = -1,
- .done = COMPLETION_INITIALIZER_ONSTACK(this_flusher.done),
};
int next_color;
+ init_completion_with_map(&this_flusher.done, &wq->lockdep_map);
+
if (WARN_ON(!wq_online))
return;
- lock_map_acquire(&wq->lockdep_map);
- lock_map_release(&wq->lockdep_map);
-
mutex_lock(&wq->mutex);
/*
@@ -2883,9 +2874,6 @@ bool flush_work(struct work_struct *work)
if (WARN_ON(!wq_online))
return false;
- lock_map_acquire(&work->lockdep_map);
- lock_map_release(&work->lockdep_map);
-
if (start_flush_work(work, &barr)) {
wait_for_completion(&barr.done);
destroy_work_on_stack(&barr.work);
--
1.9.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RESEND PATCH v2 2/2] lockdep: Remove unnecessary acquisitions wrt workqueue flush
2017-10-10 6:51 ` [RESEND PATCH v2 2/2] lockdep: Remove unnecessary acquisitions wrt workqueue flush Byungchul Park
@ 2017-10-12 7:54 ` Byungchul Park
2017-10-12 15:38 ` Tejun Heo
0 siblings, 1 reply; 10+ messages in thread
From: Byungchul Park @ 2017-10-12 7:54 UTC (permalink / raw)
To: tj, johannes.berg, peterz, mingo; +Cc: tglx, oleg, linux-kernel, kernel-team
On Tue, Oct 10, 2017 at 03:51:37PM +0900, Byungchul Park wrote:
> The workqueue added manual acquisitions to catch deadlock cases.
> Now crossrelease was introduced, some of those are redundant, since
> wait_for_completion() already includes the acquisition for itself.
> Removed it.
I think manual annotations for wait_for_completion() should be removed
in this way, since it's already embedded in wait_for_completion(), now.
Don't you think so?
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
> include/linux/workqueue.h | 4 ++--
> kernel/workqueue.c | 20 ++++----------------
> 2 files changed, 6 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index db6dc9d..1bef13e 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -218,7 +218,7 @@ static inline void destroy_delayed_work_on_stack(struct delayed_work *work) { }
> \
> __init_work((_work), _onstack); \
> (_work)->data = (atomic_long_t) WORK_DATA_INIT(); \
> - lockdep_init_map(&(_work)->lockdep_map, #_work, &__key, 0); \
> + lockdep_init_map(&(_work)->lockdep_map, "(complete)"#_work, &__key, 0); \
> INIT_LIST_HEAD(&(_work)->entry); \
> (_work)->func = (_func); \
> } while (0)
> @@ -398,7 +398,7 @@ enum {
> static struct lock_class_key __key; \
> const char *__lock_name; \
> \
> - __lock_name = #fmt#args; \
> + __lock_name = "(complete)"#fmt#args; \
> \
> __alloc_workqueue_key((fmt), (flags), (max_active), \
> &__key, __lock_name, ##args); \
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index ab3c0dc..72f68b1 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2497,15 +2497,8 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
> INIT_WORK_ONSTACK(&barr->work, wq_barrier_func);
> __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&barr->work));
>
> - /*
> - * Explicitly init the crosslock for wq_barrier::done, make its lock
> - * key a subkey of the corresponding work. As a result we won't
> - * build a dependency between wq_barrier::done and unrelated work.
> - */
> - lockdep_init_map_crosslock((struct lockdep_map *)&barr->done.map,
> - "(complete)wq_barr::done",
> - target->lockdep_map.key, 1);
> - __init_completion(&barr->done);
> + init_completion_with_map(&barr->done, &target->lockdep_map);
> +
> barr->task = current;
>
> /*
> @@ -2611,16 +2604,14 @@ void flush_workqueue(struct workqueue_struct *wq)
> struct wq_flusher this_flusher = {
> .list = LIST_HEAD_INIT(this_flusher.list),
> .flush_color = -1,
> - .done = COMPLETION_INITIALIZER_ONSTACK(this_flusher.done),
> };
> int next_color;
>
> + init_completion_with_map(&this_flusher.done, &wq->lockdep_map);
> +
> if (WARN_ON(!wq_online))
> return;
>
> - lock_map_acquire(&wq->lockdep_map);
> - lock_map_release(&wq->lockdep_map);
> -
> mutex_lock(&wq->mutex);
>
> /*
> @@ -2883,9 +2874,6 @@ bool flush_work(struct work_struct *work)
> if (WARN_ON(!wq_online))
> return false;
>
> - lock_map_acquire(&work->lockdep_map);
> - lock_map_release(&work->lockdep_map);
> -
> if (start_flush_work(work, &barr)) {
> wait_for_completion(&barr.done);
> destroy_work_on_stack(&barr.work);
> --
> 1.9.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RESEND PATCH v2 2/2] lockdep: Remove unnecessary acquisitions wrt workqueue flush
2017-10-12 7:54 ` Byungchul Park
@ 2017-10-12 15:38 ` Tejun Heo
2017-10-12 15:56 ` Peter Zijlstra
0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2017-10-12 15:38 UTC (permalink / raw)
To: Byungchul Park
Cc: johannes.berg, peterz, mingo, tglx, oleg, linux-kernel,
kernel-team
Hello,
On Thu, Oct 12, 2017 at 04:54:35PM +0900, Byungchul Park wrote:
> On Tue, Oct 10, 2017 at 03:51:37PM +0900, Byungchul Park wrote:
> > The workqueue added manual acquisitions to catch deadlock cases.
> > Now crossrelease was introduced, some of those are redundant, since
> > wait_for_completion() already includes the acquisition for itself.
> > Removed it.
>
> I think manual annotations for wait_for_completion() should be removed
> in this way, since it's already embedded in wait_for_completion(), now.
> Don't you think so?
As long as we have the same level of protection, simpler code is of
course preferable. That said, I haven't followed the discussion
closely and don't want to apply it without Peter's ack. Peter?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RESEND PATCH v2 2/2] lockdep: Remove unnecessary acquisitions wrt workqueue flush
2017-10-12 15:38 ` Tejun Heo
@ 2017-10-12 15:56 ` Peter Zijlstra
2017-10-13 7:56 ` Byungchul Park
0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2017-10-12 15:56 UTC (permalink / raw)
To: Tejun Heo
Cc: Byungchul Park, johannes.berg, mingo, tglx, oleg, linux-kernel,
kernel-team
On Thu, Oct 12, 2017 at 08:38:17AM -0700, Tejun Heo wrote:
>
> As long as we have the same level of protection, simpler code is of
> course preferable. That said, I haven't followed the discussion
> closely and don't want to apply it without Peter's ack. Peter?
I'm really tied up atm; and feel we should be addressing the false
positives generated by the current code before we start doing new stuff
on top.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RESEND PATCH v2 2/2] lockdep: Remove unnecessary acquisitions wrt workqueue flush
2017-10-12 15:56 ` Peter Zijlstra
@ 2017-10-13 7:56 ` Byungchul Park
2017-10-13 8:27 ` Peter Zijlstra
0 siblings, 1 reply; 10+ messages in thread
From: Byungchul Park @ 2017-10-13 7:56 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Tejun Heo, johannes.berg, mingo, tglx, oleg, linux-kernel,
kernel-team
On Thu, Oct 12, 2017 at 05:56:35PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 12, 2017 at 08:38:17AM -0700, Tejun Heo wrote:
> >
> > As long as we have the same level of protection, simpler code is of
> > course preferable. That said, I haven't followed the discussion
> > closely and don't want to apply it without Peter's ack. Peter?
>
> I'm really tied up atm; and feel we should be addressing the false
> positives generated by the current code before we start doing new stuff
> on top.
We can never avoid adding false dependencies as long as we use
acquisitions in that way the workqueue code does, even though you
successfully replace write acquisitions with recursive-read ones after
making them work, as you know.
Speaking a bit more about the reason, it's because all write locks used
in every work->func() obviously generate false dependencies(links) with
'work' lockdep_map and 'wq' lockdep_map, when they do not involve flush.
This is why I used a word, 'speculative', whenever we were talking.
At the beginning of this issue, I suggested to use recursive-read
acquisitions instead of write ones, which you are working on, since
anyway it reduces false ones.
But, if it's allowed to add a new primitive that just suits that purpose,
I want to propose it to be used instead, which makes false ones reduced
more. A read acuisition is a real acquisition used for a read lock.
Semantics are similar to but not same as what we need for that
speculative purpose.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RESEND PATCH v2 2/2] lockdep: Remove unnecessary acquisitions wrt workqueue flush
2017-10-13 7:56 ` Byungchul Park
@ 2017-10-13 8:27 ` Peter Zijlstra
2017-10-13 8:48 ` Byungchul Park
2017-10-16 10:58 ` Byungchul Park
0 siblings, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2017-10-13 8:27 UTC (permalink / raw)
To: Byungchul Park
Cc: Tejun Heo, johannes.berg, mingo, tglx, oleg, linux-kernel,
kernel-team
On Fri, Oct 13, 2017 at 04:56:33PM +0900, Byungchul Park wrote:
> On Thu, Oct 12, 2017 at 05:56:35PM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 12, 2017 at 08:38:17AM -0700, Tejun Heo wrote:
> > >
> > > As long as we have the same level of protection, simpler code is of
> > > course preferable. That said, I haven't followed the discussion
> > > closely and don't want to apply it without Peter's ack. Peter?
> >
> > I'm really tied up atm; and feel we should be addressing the false
> > positives generated by the current code before we start doing new stuff
> > on top.
>
> We can never avoid adding false dependencies as long as we use
> acquisitions in that way the workqueue code does, even though you
> successfully replace write acquisitions with recursive-read ones after
> making them work, as you know.
Not the point; they still need to get annotated away. The block layer
and xfs are now fairly consistently triggering lockdep splats, that
needs to get sorted.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RESEND PATCH v2 2/2] lockdep: Remove unnecessary acquisitions wrt workqueue flush
2017-10-13 8:27 ` Peter Zijlstra
@ 2017-10-13 8:48 ` Byungchul Park
2017-10-16 10:58 ` Byungchul Park
1 sibling, 0 replies; 10+ messages in thread
From: Byungchul Park @ 2017-10-13 8:48 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Tejun Heo, johannes.berg, mingo, tglx, oleg, linux-kernel,
kernel-team
On Fri, Oct 13, 2017 at 10:27:50AM +0200, Peter Zijlstra wrote:
> On Fri, Oct 13, 2017 at 04:56:33PM +0900, Byungchul Park wrote:
> > On Thu, Oct 12, 2017 at 05:56:35PM +0200, Peter Zijlstra wrote:
> > > On Thu, Oct 12, 2017 at 08:38:17AM -0700, Tejun Heo wrote:
> > > >
> > > > As long as we have the same level of protection, simpler code is of
> > > > course preferable. That said, I haven't followed the discussion
> > > > closely and don't want to apply it without Peter's ack. Peter?
> > >
> > > I'm really tied up atm; and feel we should be addressing the false
> > > positives generated by the current code before we start doing new stuff
> > > on top.
> >
> > We can never avoid adding false dependencies as long as we use
> > acquisitions in that way the workqueue code does, even though you
> > successfully replace write acquisitions with recursive-read ones after
> > making them work, as you know.
>
> Not the point; they still need to get annotated away. The block layer
> and xfs are now fairly consistently triggering lockdep splats, that
> needs to get sorted.
I'm not sure if I can help, since I'm not familiar with that sub-systems,
but I want to do something for them if I can.
Could you share the issues and test cases? Or I suggest that don't wait
for recursive-read work to complete but use 'might' thing to avoid the
false annotations.
I don't say it to rush you to take my patches, but I do beacuse it does
exactly what we need in that case. I believe it would be helpful.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RESEND PATCH v2 2/2] lockdep: Remove unnecessary acquisitions wrt workqueue flush
2017-10-13 8:27 ` Peter Zijlstra
2017-10-13 8:48 ` Byungchul Park
@ 2017-10-16 10:58 ` Byungchul Park
2017-10-16 11:06 ` Byungchul Park
1 sibling, 1 reply; 10+ messages in thread
From: Byungchul Park @ 2017-10-16 10:58 UTC (permalink / raw)
To: Peter Zijlstra, david, darrick.wong
Cc: Tejun Heo, johannes.berg, mingo, tglx, oleg, linux-kernel,
kernel-team
On Fri, Oct 13, 2017 at 10:27:50AM +0200, Peter Zijlstra wrote:
> On Fri, Oct 13, 2017 at 04:56:33PM +0900, Byungchul Park wrote:
> > On Thu, Oct 12, 2017 at 05:56:35PM +0200, Peter Zijlstra wrote:
> > > On Thu, Oct 12, 2017 at 08:38:17AM -0700, Tejun Heo wrote:
> > > >
> > > > As long as we have the same level of protection, simpler code is of
> > > > course preferable. That said, I haven't followed the discussion
> > > > closely and don't want to apply it without Peter's ack. Peter?
> > >
> > > I'm really tied up atm; and feel we should be addressing the false
> > > positives generated by the current code before we start doing new stuff
> > > on top.
> >
> > We can never avoid adding false dependencies as long as we use
> > acquisitions in that way the workqueue code does, even though you
> > successfully replace write acquisitions with recursive-read ones after
> > making them work, as you know.
>
> Not the point; they still need to get annotated away. The block layer
> and xfs are now fairly consistently triggering lockdep splats, that
> needs to get sorted.
I believe that the following patch is helpful for that issue. Could you
check if it works, and give your opinion? I will add more comments on it
if you let me know if this patch is acceptable.
----->8-----
>From 340846e59c28457faf4d8051fb0c8a4c9da8c9b1 Mon Sep 17 00:00:00 2001
From: Byungchul Park <byungchul.park@lge.com>
Date: Mon, 16 Oct 2017 18:51:51 +0900
Subject: [RFC] lockdep: Assign a lock_class per gendisk used for
wait_for_completion()
Darrick and Dave Chinner posted the following warning:
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.14.0-rc1-fixes #1 Tainted: G W
> ------------------------------------------------------
> loop0/31693 is trying to acquire lock:
> (&(&ip->i_mmaplock)->mr_lock){++++}, at: [<ffffffffa00f1b0c>] xfs_ilock+0x23c/0x330 [xfs]
>
> but now in release context of a crosslock acquired at the following:
> ((complete)&ret.event){+.+.}, at: [<ffffffff81326c1f>] submit_bio_wait+0x7f/0xb0
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 ((complete)&ret.event){+.+.}:
> lock_acquire+0xab/0x200
> wait_for_completion_io+0x4e/0x1a0
> submit_bio_wait+0x7f/0xb0
> blkdev_issue_zeroout+0x71/0xa0
> xfs_bmapi_convert_unwritten+0x11f/0x1d0 [xfs]
> xfs_bmapi_write+0x374/0x11f0 [xfs]
> xfs_iomap_write_direct+0x2ac/0x430 [xfs]
> xfs_file_iomap_begin+0x20d/0xd50 [xfs]
> iomap_apply+0x43/0xe0
> dax_iomap_rw+0x89/0xf0
> xfs_file_dax_write+0xcc/0x220 [xfs]
> xfs_file_write_iter+0xf0/0x130 [xfs]
> __vfs_write+0xd9/0x150
> vfs_write+0xc8/0x1c0
> SyS_write+0x45/0xa0
> entry_SYSCALL_64_fastpath+0x1f/0xbe
>
> -> #1 (&xfs_nondir_ilock_class){++++}:
> lock_acquire+0xab/0x200
> down_write_nested+0x4a/0xb0
> xfs_ilock+0x263/0x330 [xfs]
> xfs_setattr_size+0x152/0x370 [xfs]
> xfs_vn_setattr+0x6b/0x90 [xfs]
> notify_change+0x27d/0x3f0
> do_truncate+0x5b/0x90
> path_openat+0x237/0xa90
> do_filp_open+0x8a/0xf0
> do_sys_open+0x11c/0x1f0
> entry_SYSCALL_64_fastpath+0x1f/0xbe
>
> -> #0 (&(&ip->i_mmaplock)->mr_lock){++++}:
> up_write+0x1c/0x40
> xfs_iunlock+0x1d0/0x310 [xfs]
> xfs_file_fallocate+0x8a/0x310 [xfs]
> loop_queue_work+0xb7/0x8d0
> kthread_worker_fn+0xb9/0x1f0
>
> Chain exists of:
> &(&ip->i_mmaplock)->mr_lock --> &xfs_nondir_ilock_class --> (complete)&ret.event
>
> Possible unsafe locking scenario by crosslock:
>
> CPU0 CPU1
> ---- ----
> lock(&xfs_nondir_ilock_class);
> lock((complete)&ret.event);
> lock(&(&ip->i_mmaplock)->mr_lock);
> unlock((complete)&ret.event);
>
> *** DEADLOCK ***
The warning is a false positive, caused by the fact that all
wait_for_completion()s in submit_bio_wait() are waiting with the same
lock class.
However, some bios have nothing to do with others, for example, the case
might happen while using loop devices, between bios of an upper device
and a lower device(=loop device).
The safest way to assign different lock classes to different devices is
to do it for each gendisk. In other words, this patch assigns a
lockdep_map per gendisk and uses it when initializing completion in
submit_bio_wait().
Of course, it might be too conservative. But, making it safest for now
and extended by block layer experts later is good, atm.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
block/bio.c | 4 ++--
block/genhd.c | 13 +++++--------
include/linux/genhd.h | 26 ++++++++++++++++++++++----
3 files changed, 29 insertions(+), 14 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 9a63597..0d4d6c0 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -941,7 +941,7 @@ int submit_bio_wait(struct bio *bio)
{
struct submit_bio_ret ret;
- init_completion(&ret.event);
+ init_completion_with_map(&ret.event, &bio->bi_disk->lockdep_map);
bio->bi_private = &ret;
bio->bi_end_io = submit_bio_wait_endio;
bio->bi_opf |= REQ_SYNC;
@@ -1382,7 +1382,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
if (len <= 0)
break;
-
+
if (bytes > len)
bytes = len;
diff --git a/block/genhd.c b/block/genhd.c
index 7f520fa..676c245 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1304,13 +1304,7 @@ dev_t blk_lookup_devt(const char *name, int partno)
}
EXPORT_SYMBOL(blk_lookup_devt);
-struct gendisk *alloc_disk(int minors)
-{
- return alloc_disk_node(minors, NUMA_NO_NODE);
-}
-EXPORT_SYMBOL(alloc_disk);
-
-struct gendisk *alloc_disk_node(int minors, int node_id)
+struct gendisk *__alloc_disk_node(int minors, int node_id, struct lock_class_key *key, const char *lock_name)
{
struct gendisk *disk;
@@ -1350,9 +1344,12 @@ struct gendisk *alloc_disk_node(int minors, int node_id)
disk_to_dev(disk)->type = &disk_type;
device_initialize(disk_to_dev(disk));
}
+
+ lockdep_init_map(&disk->lockdep_map, lock_name, key, 0);
+
return disk;
}
-EXPORT_SYMBOL(alloc_disk_node);
+EXPORT_SYMBOL(__alloc_disk_node);
struct kobject *get_disk(struct gendisk *disk)
{
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index e619fae..5225efc 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -3,7 +3,7 @@
/*
* genhd.h Copyright (C) 1992 Drew Eckhardt
- * Generic hard disk header file by
+ * Generic hard disk header file by
* Drew Eckhardt
*
* <drew@colorado.edu>
@@ -206,6 +206,9 @@ struct gendisk {
#endif /* CONFIG_BLK_DEV_INTEGRITY */
int node_id;
struct badblocks *bb;
+#ifdef CONFIG_LOCKDEP_COMPLETIONS
+ struct lockdep_map lockdep_map;
+#endif
};
static inline struct gendisk *part_to_disk(struct hd_struct *part)
@@ -483,7 +486,7 @@ struct bsd_disklabel {
__s16 d_type; /* drive type */
__s16 d_subtype; /* controller/d_type specific */
char d_typename[16]; /* type name, e.g. "eagle" */
- char d_packname[16]; /* pack identifier */
+ char d_packname[16]; /* pack identifier */
__u32 d_secsize; /* # of bytes per sector */
__u32 d_nsectors; /* # of data sectors per track */
__u32 d_ntracks; /* # of tracks per cylinder */
@@ -602,8 +605,7 @@ extern struct hd_struct * __must_check add_partition(struct gendisk *disk,
extern void delete_partition(struct gendisk *, int);
extern void printk_all_partitions(void);
-extern struct gendisk *alloc_disk_node(int minors, int node_id);
-extern struct gendisk *alloc_disk(int minors);
+extern struct gendisk *__alloc_disk_node(int minors, int node_id, struct lock_class_key *key, const char *lock_name);
extern struct kobject *get_disk(struct gendisk *disk);
extern void put_disk(struct gendisk *disk);
extern void blk_register_region(dev_t devt, unsigned long range,
@@ -627,6 +629,22 @@ extern ssize_t part_fail_store(struct device *dev,
const char *buf, size_t count);
#endif /* CONFIG_FAIL_MAKE_REQUEST */
+#ifdef CONFIG_LOCKDEP_COMPLETIONS
+#define alloc_disk_node(m, id) \
+({ \
+ static struct lock_class_key __key; \
+ const char *__lock_name; \
+ \
+ __lock_name = "(complete)"#m"("#id")"; \
+ \
+ __alloc_disk_node(m, id, &__key, __lock_name); \
+})
+#else
+#define alloc_disk_node(m, id) __alloc_disk_node(m, id, NULL, NULL)
+#endif
+
+#define alloc_disk(m) alloc_disk_node(m, NUMA_NO_NODE)
+
static inline int hd_ref_init(struct hd_struct *part)
{
if (percpu_ref_init(&part->ref, __delete_partition, 0,
--
1.9.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RESEND PATCH v2 2/2] lockdep: Remove unnecessary acquisitions wrt workqueue flush
2017-10-16 10:58 ` Byungchul Park
@ 2017-10-16 11:06 ` Byungchul Park
0 siblings, 0 replies; 10+ messages in thread
From: Byungchul Park @ 2017-10-16 11:06 UTC (permalink / raw)
To: Peter Zijlstra, david, darrick.wong
Cc: Tejun Heo, johannes.berg, mingo, tglx, oleg, linux-kernel,
kernel-team
On Mon, Oct 16, 2017 at 07:58:57PM +0900, Byungchul Park wrote:
> On Fri, Oct 13, 2017 at 10:27:50AM +0200, Peter Zijlstra wrote:
> > On Fri, Oct 13, 2017 at 04:56:33PM +0900, Byungchul Park wrote:
> > > On Thu, Oct 12, 2017 at 05:56:35PM +0200, Peter Zijlstra wrote:
> > > > On Thu, Oct 12, 2017 at 08:38:17AM -0700, Tejun Heo wrote:
> > > > >
> > > > > As long as we have the same level of protection, simpler code is of
> > > > > course preferable. That said, I haven't followed the discussion
> > > > > closely and don't want to apply it without Peter's ack. Peter?
> > > >
> > > > I'm really tied up atm; and feel we should be addressing the false
> > > > positives generated by the current code before we start doing new stuff
> > > > on top.
> > >
> > > We can never avoid adding false dependencies as long as we use
> > > acquisitions in that way the workqueue code does, even though you
> > > successfully replace write acquisitions with recursive-read ones after
> > > making them work, as you know.
> >
> > Not the point; they still need to get annotated away. The block layer
> > and xfs are now fairly consistently triggering lockdep splats, that
> > needs to get sorted.
>
> I believe that the following patch is helpful for that issue. Could you
> check if it works, and give your opinion? I will add more comments on it
> if you let me know if this patch is acceptable.
I missed something to tell you, the following patch should be applied on
top of the first patch of this patchset, "completion: Add support for
initializing completion with lockdep_map".
> ----->8-----
> >From 340846e59c28457faf4d8051fb0c8a4c9da8c9b1 Mon Sep 17 00:00:00 2001
> From: Byungchul Park <byungchul.park@lge.com>
> Date: Mon, 16 Oct 2017 18:51:51 +0900
> Subject: [RFC] lockdep: Assign a lock_class per gendisk used for
> wait_for_completion()
>
> Darrick and Dave Chinner posted the following warning:
>
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 4.14.0-rc1-fixes #1 Tainted: G W
> > ------------------------------------------------------
> > loop0/31693 is trying to acquire lock:
> > (&(&ip->i_mmaplock)->mr_lock){++++}, at: [<ffffffffa00f1b0c>] xfs_ilock+0x23c/0x330 [xfs]
> >
> > but now in release context of a crosslock acquired at the following:
> > ((complete)&ret.event){+.+.}, at: [<ffffffff81326c1f>] submit_bio_wait+0x7f/0xb0
> >
> > which lock already depends on the new lock.
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #2 ((complete)&ret.event){+.+.}:
> > lock_acquire+0xab/0x200
> > wait_for_completion_io+0x4e/0x1a0
> > submit_bio_wait+0x7f/0xb0
> > blkdev_issue_zeroout+0x71/0xa0
> > xfs_bmapi_convert_unwritten+0x11f/0x1d0 [xfs]
> > xfs_bmapi_write+0x374/0x11f0 [xfs]
> > xfs_iomap_write_direct+0x2ac/0x430 [xfs]
> > xfs_file_iomap_begin+0x20d/0xd50 [xfs]
> > iomap_apply+0x43/0xe0
> > dax_iomap_rw+0x89/0xf0
> > xfs_file_dax_write+0xcc/0x220 [xfs]
> > xfs_file_write_iter+0xf0/0x130 [xfs]
> > __vfs_write+0xd9/0x150
> > vfs_write+0xc8/0x1c0
> > SyS_write+0x45/0xa0
> > entry_SYSCALL_64_fastpath+0x1f/0xbe
> >
> > -> #1 (&xfs_nondir_ilock_class){++++}:
> > lock_acquire+0xab/0x200
> > down_write_nested+0x4a/0xb0
> > xfs_ilock+0x263/0x330 [xfs]
> > xfs_setattr_size+0x152/0x370 [xfs]
> > xfs_vn_setattr+0x6b/0x90 [xfs]
> > notify_change+0x27d/0x3f0
> > do_truncate+0x5b/0x90
> > path_openat+0x237/0xa90
> > do_filp_open+0x8a/0xf0
> > do_sys_open+0x11c/0x1f0
> > entry_SYSCALL_64_fastpath+0x1f/0xbe
> >
> > -> #0 (&(&ip->i_mmaplock)->mr_lock){++++}:
> > up_write+0x1c/0x40
> > xfs_iunlock+0x1d0/0x310 [xfs]
> > xfs_file_fallocate+0x8a/0x310 [xfs]
> > loop_queue_work+0xb7/0x8d0
> > kthread_worker_fn+0xb9/0x1f0
> >
> > Chain exists of:
> > &(&ip->i_mmaplock)->mr_lock --> &xfs_nondir_ilock_class --> (complete)&ret.event
> >
> > Possible unsafe locking scenario by crosslock:
> >
> > CPU0 CPU1
> > ---- ----
> > lock(&xfs_nondir_ilock_class);
> > lock((complete)&ret.event);
> > lock(&(&ip->i_mmaplock)->mr_lock);
> > unlock((complete)&ret.event);
> >
> > *** DEADLOCK ***
>
> The warning is a false positive, caused by the fact that all
> wait_for_completion()s in submit_bio_wait() are waiting with the same
> lock class.
>
> However, some bios have nothing to do with others, for example, the case
> might happen while using loop devices, between bios of an upper device
> and a lower device(=loop device).
>
> The safest way to assign different lock classes to different devices is
> to do it for each gendisk. In other words, this patch assigns a
> lockdep_map per gendisk and uses it when initializing completion in
> submit_bio_wait().
>
> Of course, it might be too conservative. But, making it safest for now
> and extended by block layer experts later is good, atm.
>
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
> block/bio.c | 4 ++--
> block/genhd.c | 13 +++++--------
> include/linux/genhd.h | 26 ++++++++++++++++++++++----
> 3 files changed, 29 insertions(+), 14 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 9a63597..0d4d6c0 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -941,7 +941,7 @@ int submit_bio_wait(struct bio *bio)
> {
> struct submit_bio_ret ret;
>
> - init_completion(&ret.event);
> + init_completion_with_map(&ret.event, &bio->bi_disk->lockdep_map);
> bio->bi_private = &ret;
> bio->bi_end_io = submit_bio_wait_endio;
> bio->bi_opf |= REQ_SYNC;
> @@ -1382,7 +1382,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
>
> if (len <= 0)
> break;
> -
> +
> if (bytes > len)
> bytes = len;
>
> diff --git a/block/genhd.c b/block/genhd.c
> index 7f520fa..676c245 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -1304,13 +1304,7 @@ dev_t blk_lookup_devt(const char *name, int partno)
> }
> EXPORT_SYMBOL(blk_lookup_devt);
>
> -struct gendisk *alloc_disk(int minors)
> -{
> - return alloc_disk_node(minors, NUMA_NO_NODE);
> -}
> -EXPORT_SYMBOL(alloc_disk);
> -
> -struct gendisk *alloc_disk_node(int minors, int node_id)
> +struct gendisk *__alloc_disk_node(int minors, int node_id, struct lock_class_key *key, const char *lock_name)
> {
> struct gendisk *disk;
>
> @@ -1350,9 +1344,12 @@ struct gendisk *alloc_disk_node(int minors, int node_id)
> disk_to_dev(disk)->type = &disk_type;
> device_initialize(disk_to_dev(disk));
> }
> +
> + lockdep_init_map(&disk->lockdep_map, lock_name, key, 0);
> +
> return disk;
> }
> -EXPORT_SYMBOL(alloc_disk_node);
> +EXPORT_SYMBOL(__alloc_disk_node);
>
> struct kobject *get_disk(struct gendisk *disk)
> {
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index e619fae..5225efc 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -3,7 +3,7 @@
>
> /*
> * genhd.h Copyright (C) 1992 Drew Eckhardt
> - * Generic hard disk header file by
> + * Generic hard disk header file by
> * Drew Eckhardt
> *
> * <drew@colorado.edu>
> @@ -206,6 +206,9 @@ struct gendisk {
> #endif /* CONFIG_BLK_DEV_INTEGRITY */
> int node_id;
> struct badblocks *bb;
> +#ifdef CONFIG_LOCKDEP_COMPLETIONS
> + struct lockdep_map lockdep_map;
> +#endif
> };
>
> static inline struct gendisk *part_to_disk(struct hd_struct *part)
> @@ -483,7 +486,7 @@ struct bsd_disklabel {
> __s16 d_type; /* drive type */
> __s16 d_subtype; /* controller/d_type specific */
> char d_typename[16]; /* type name, e.g. "eagle" */
> - char d_packname[16]; /* pack identifier */
> + char d_packname[16]; /* pack identifier */
> __u32 d_secsize; /* # of bytes per sector */
> __u32 d_nsectors; /* # of data sectors per track */
> __u32 d_ntracks; /* # of tracks per cylinder */
> @@ -602,8 +605,7 @@ extern struct hd_struct * __must_check add_partition(struct gendisk *disk,
> extern void delete_partition(struct gendisk *, int);
> extern void printk_all_partitions(void);
>
> -extern struct gendisk *alloc_disk_node(int minors, int node_id);
> -extern struct gendisk *alloc_disk(int minors);
> +extern struct gendisk *__alloc_disk_node(int minors, int node_id, struct lock_class_key *key, const char *lock_name);
> extern struct kobject *get_disk(struct gendisk *disk);
> extern void put_disk(struct gendisk *disk);
> extern void blk_register_region(dev_t devt, unsigned long range,
> @@ -627,6 +629,22 @@ extern ssize_t part_fail_store(struct device *dev,
> const char *buf, size_t count);
> #endif /* CONFIG_FAIL_MAKE_REQUEST */
>
> +#ifdef CONFIG_LOCKDEP_COMPLETIONS
> +#define alloc_disk_node(m, id) \
> +({ \
> + static struct lock_class_key __key; \
> + const char *__lock_name; \
> + \
> + __lock_name = "(complete)"#m"("#id")"; \
> + \
> + __alloc_disk_node(m, id, &__key, __lock_name); \
> +})
> +#else
> +#define alloc_disk_node(m, id) __alloc_disk_node(m, id, NULL, NULL)
> +#endif
> +
> +#define alloc_disk(m) alloc_disk_node(m, NUMA_NO_NODE)
> +
> static inline int hd_ref_init(struct hd_struct *part)
> {
> if (percpu_ref_init(&part->ref, __delete_partition, 0,
> --
> 1.9.1
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-10-16 11:06 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-10 6:51 [RESEND PATCH v2 1/2] completion: Add support for initializing completion with lockdep_map Byungchul Park
2017-10-10 6:51 ` [RESEND PATCH v2 2/2] lockdep: Remove unnecessary acquisitions wrt workqueue flush Byungchul Park
2017-10-12 7:54 ` Byungchul Park
2017-10-12 15:38 ` Tejun Heo
2017-10-12 15:56 ` Peter Zijlstra
2017-10-13 7:56 ` Byungchul Park
2017-10-13 8:27 ` Peter Zijlstra
2017-10-13 8:48 ` Byungchul Park
2017-10-16 10:58 ` Byungchul Park
2017-10-16 11:06 ` Byungchul Park
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.