* Re: false positive lockdep splat with loop device
       [not found] ` <20170926042421.GP10955@dastard>
@ 2017-09-26  8:35   ` Amir Goldstein
  2017-10-05 16:33     ` Christoph Hellwig
  0 siblings, 1 reply; 4+ messages in thread
From: Amir Goldstein @ 2017-09-26  8:35 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J. Wong, xfs, Peter Zijlstra, Byungchul Park,
	linux-kernel, linux-fsdevel, linux-block

On Tue, Sep 26, 2017 at 7:24 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Sep 21, 2017 at 09:43:41AM +0300, Amir Goldstein wrote:
>> On Thu, Sep 21, 2017 at 1:22 AM, Dave Chinner <david@fromorbit.com> wrote:
>> > [cc lkml, PeterZ and Byungchul]
>> ...
>> > The thing is, this IO completion has nothing to do with the lower
>> > filesystem - it's the IO completion for the filesystem on the loop
>> > device (the upper filesystem) and is not in any way related to the
>> > IO completion from the dax device the lower filesystem is waiting
>> > on.
>> >
>> > IOWs, this is a false positive.
>> >
>> > Peter, this is the sort of false positive I mentioned were likely to
>> > occur without some serious work to annotate the IO stack to prevent
>> > them.  We can nest multiple layers of IO completions and locking in
>> > the IO stack via things like loop and RAID devices.  They can be
>> > nested to arbitrary depths, too (e.g. loop on fs on loop on fs on
>> > dm-raid on n * (loop on fs) on bdev) so this new completion lockdep
>> > checking is going to be a source of false positives until there is
>> > an effective (and simple!) way of providing context based completion
>> > annotations to avoid them...
>> >
>>
>> IMO, the way to handle this is to add 'nesting_depth' information
>> on blockdev (or bdi?). 'nesting' in the sense of blockdev->fs->blockdev->fs.
>> AFAIK, the only blockdev drivers that need to bump nesting_depth
>> are loop and nbd??
>
> You're assuming that this sort of "completion inversion" can only
> happen with bdev->fs->bdev, and that submit_bio_wait() is the only
> place where completions are used in stackable block devices.
>
> AFAICT, this could happen with any block device that can be
> stacked multiple times that uses completions. e.g. MD has a function
> sync_page_io() that calls submit_bio_wait(), and that is called from
> places in the raid 5, raid 10, raid 1 and bitmap layers (plus others
> in DM). These can get stacked anywhere - even on top of loop devices
> - and so I think the issue has a much wider scope than just loop and
> nbd devices.

True. We can probably duplicate the struct file_system_type pattern,
something like:

struct block_device_type loop_blkdev_type = {
        .owner          = THIS_MODULE,
        .name           = "loop",
        .devt           = MKDEV(LOOP_MAJOR, 0),
        .probe          = loop_probe,
        .lock           = NULL,
};

...
blk_register_region(&loop_blkdev_type, range, NULL);

And then we have a static placeholder for a lock_class_key address
to be used when annotating bio completions.
I realize the same block device type can be nested via raid/dm/loop,
but as we can see in the splat, so can the same file system type via
loop/nbd, so we can think of a similar solution for both cases.
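
To make that placeholder concrete, something along these lines (a rough,
untested sketch; struct block_device_type does not exist today, the layout
is guessed from struct file_system_type and blk_register_region(), and
completion_key is the static key I have in mind):

struct block_device_type {
        struct module           *owner;
        const char              *name;
        dev_t                   devt;
        struct kobject          *(*probe)(dev_t devt, int *partno, void *data);
        int                     (*lock)(dev_t devt, void *data);
        struct lock_class_key   completion_key; /* per driver type, not per device */
};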

>
>> Not sure if the kernel should limit loop blockdev nesting depth??
>
> There's no way we should do that just because new lockdep
> functionality is unable to express such constructs.
>

Of course not. I was just wondering if there should be a limit to
blockdev nesting regardless (e.g. too much queuing in the io stack).

But even if there is no limit to blockdev nesting, we can define
CONFIG_LOCKDEP_MAX_NESTING and:

struct lock_class_nested_key {
        struct lock_class_key     keys[CONFIG_LOCKDEP_MAX_NESTING];
};

Then struct file_system_type and struct block_device_type keys
could be of type struct lock_class_nested_key.

nested_depth should be carried in struct gendisk, and it is the
responsibility of the blockdev driver to bump nested_depth if it
is a nested blockdev.
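
For loop, that could look roughly like the following (untested sketch;
nested_depth is the hypothetical new gendisk field from above and
loop_update_nested_depth() is a made-up helper that loop_set_fd() would
call):

/* Loop sits one level deeper than whatever device backs its file. */
static void loop_update_nested_depth(struct loop_device *lo, struct file *file)
{
        struct super_block *sb = file->f_mapping->host->i_sb;
        unsigned int below = 0;

        if (sb->s_bdev)
                below = sb->s_bdev->bd_disk->nested_depth;
        lo->lo_disk->nested_depth = below + 1;
}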

Callers of lockdep_set_class(), like lockdep_annotate_inode_mutex_key(),
should use a variant lockdep_set_class_nested(lock, key, nested_depth)
where appropriate.

For any depth beyond CONFIG_LOCKDEP_MAX_NESTING,
lockdep_set_class_nested() would reuse the last key in the
array, so we don't solve false positives for unbounded nesting depth,
but we do solve them up to the configured maximum.
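
A minimal sketch of that variant (hypothetical and untested; it just
clamps the depth to the last slot of the key array defined above):

/* Pick the key slot matching the nesting depth, clamping at the last
 * slot for anything beyond CONFIG_LOCKDEP_MAX_NESTING. */
#define lockdep_set_class_nested(lock, nkey, depth)                      \
        lockdep_set_class(lock,                                          \
                &(nkey)->keys[min_t(unsigned int, (depth),               \
                                    CONFIG_LOCKDEP_MAX_NESTING - 1)])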

Alas, even if this is workable, it will not be anytime soon that I can
back up this scribble with tested patches.

Amir.

* Re: false positive lockdep splat with loop device
  2017-09-26  8:35   ` false positive lockdep splat with loop device Amir Goldstein
@ 2017-10-05 16:33     ` Christoph Hellwig
  2017-10-10  9:16       ` Ilya Dryomov
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2017-10-05 16:33 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Dave Chinner, Darrick J. Wong, xfs, Peter Zijlstra,
	Byungchul Park, linux-kernel, linux-fsdevel, linux-block

Does the patch below fix the warning for you?

--
From 28aae7104425433d39e6142adcd5b88dc5b0ad5f Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Thu, 5 Oct 2017 18:31:02 +0200
Subject: block: use DECLARE_COMPLETION_ONSTACK in submit_bio_wait

This way we get our lockdep annotations right in case multiple layers
in the stack use submit_bio_wait.

It also happens to simplify the code by getting rid of the submit_bio_ret
structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/bio.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 8338304ea256..4e18e959fc0a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -917,17 +917,9 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 }
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
 
-struct submit_bio_ret {
-	struct completion event;
-	int error;
-};
-
 static void submit_bio_wait_endio(struct bio *bio)
 {
-	struct submit_bio_ret *ret = bio->bi_private;
-
-	ret->error = blk_status_to_errno(bio->bi_status);
-	complete(&ret->event);
+	complete(bio->bi_private);
 }
 
 /**
@@ -943,16 +935,15 @@ static void submit_bio_wait_endio(struct bio *bio)
  */
 int submit_bio_wait(struct bio *bio)
 {
-	struct submit_bio_ret ret;
+	DECLARE_COMPLETION_ONSTACK(done);
 
-	init_completion(&ret.event);
-	bio->bi_private = &ret;
+	bio->bi_private = &done;
 	bio->bi_end_io = submit_bio_wait_endio;
 	bio->bi_opf |= REQ_SYNC;
 	submit_bio(bio);
-	wait_for_completion_io(&ret.event);
+	wait_for_completion_io(&done);
 
-	return ret.error;
+	return blk_status_to_errno(bio->bi_status);
 }
 EXPORT_SYMBOL(submit_bio_wait);
 
-- 
2.14.1

* Re: false positive lockdep splat with loop device
  2017-10-05 16:33     ` Christoph Hellwig
@ 2017-10-10  9:16       ` Ilya Dryomov
  2017-10-10  9:43         ` Ilya Dryomov
  0 siblings, 1 reply; 4+ messages in thread
From: Ilya Dryomov @ 2017-10-10  9:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Amir Goldstein, Dave Chinner, Darrick J. Wong, xfs,
	Peter Zijlstra, Byungchul Park, linux-kernel, linux-fsdevel,
	linux-block

[-- Attachment #1: Type: text/plain, Size: 2093 bytes --]

On Thu, Oct 5, 2017 at 6:33 PM, Christoph Hellwig <hch@infradead.org> wrote:
> Does the patch below fix the warning for you?
>
> --
> From 28aae7104425433d39e6142adcd5b88dc5b0ad5f Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Thu, 5 Oct 2017 18:31:02 +0200
> Subject: block: use DECLARE_COMPLETION_ONSTACK in submit_bio_wait
>
> This way we get our lockdep annotations right in case multiple layers
> in the stack use submit_bio_wait.
>
> It also happens to simplify the code by getting rid of the submit_bio_ret
> structure.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/bio.c | 19 +++++--------------
>  1 file changed, 5 insertions(+), 14 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 8338304ea256..4e18e959fc0a 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -917,17 +917,9 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  }
>  EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
>
> -struct submit_bio_ret {
> -       struct completion event;
> -       int error;
> -};
> -
>  static void submit_bio_wait_endio(struct bio *bio)
>  {
> -       struct submit_bio_ret *ret = bio->bi_private;
> -
> -       ret->error = blk_status_to_errno(bio->bi_status);
> -       complete(&ret->event);
> +       complete(bio->bi_private);
>  }
>
>  /**
> @@ -943,16 +935,15 @@ static void submit_bio_wait_endio(struct bio *bio)
>   */
>  int submit_bio_wait(struct bio *bio)
>  {
> -       struct submit_bio_ret ret;
> +       DECLARE_COMPLETION_ONSTACK(done);
>
> -       init_completion(&ret.event);
> -       bio->bi_private = &ret;
> +       bio->bi_private = &done;
>         bio->bi_end_io = submit_bio_wait_endio;
>         bio->bi_opf |= REQ_SYNC;
>         submit_bio(bio);
> -       wait_for_completion_io(&ret.event);
> +       wait_for_completion_io(&done);
>
> -       return ret.error;
> +       return blk_status_to_errno(bio->bi_status);
>  }
>  EXPORT_SYMBOL(submit_bio_wait);

No, it doesn't -- the splat is a little bit more complicated, but
fundamentally the same thing.

Thanks,

                Ilya

[-- Attachment #2: pre.txt --]
[-- Type: text/plain, Size: 2948 bytes --]

run fstests shared/298 at 2017-10-09 16:54:45

======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc4-ceph-g0721a67fa48f #1 Tainted: G        W      
------------------------------------------------------
loop0/23166 is trying to acquire lock:
 (&ei->i_data_sem){++++}, at: [<ffffffffa6340226>] ext4_punch_hole+0x3b6/0x570

but now in release context of a crosslock acquired at the following:
 ((complete)&ret.event){+.+.}, at: [<ffffffffa645c7ab>] submit_bio_wait+0x8b/0xb0

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 ((complete)&ret.event){+.+.}:
       __lock_acquire+0x111f/0x1150
       lock_acquire+0xed/0x1e0
       wait_for_completion_io+0x53/0x1b0
       submit_bio_wait+0x8b/0xb0
       blkdev_issue_zeroout+0xcf/0x1b0
       ext4_issue_zeroout+0x44/0x60
       ext4_ext_zeroout+0x2f/0x40
       ext4_ext_convert_to_initialized+0x6ee/0x7a0
       ext4_ext_map_blocks+0x151d/0x1620
       ext4_map_blocks+0x14f/0x5b0
       ext4_writepages+0xa4a/0x11e0
       do_writepages+0x1c/0x80
       __filemap_fdatawrite_range+0xaa/0xe0
       file_write_and_wait_range+0x39/0xa0
       ext4_sync_file+0xae/0x580
       vfs_fsync_range+0x4b/0xb0
       do_fsync+0x3d/0x70
       SyS_fsync+0x10/0x20
       entry_SYSCALL_64_fastpath+0x23/0xc2

-> #0 (&ei->i_data_sem){++++}:
       down_write+0x40/0x70
       ext4_punch_hole+0x3b6/0x570
       ext4_fallocate+0x19d/0xb20
       loop_queue_work+0xd7/0x9b0
       kthread_worker_fn+0xbf/0x200

other info that might help us debug this:

 Possible unsafe locking scenario by crosslock:

       CPU0                    CPU1
       ----                    ----
  lock(&ei->i_data_sem);
  lock((complete)&ret.event);
                               lock(&ei->i_data_sem);
                               unlock((complete)&ret.event);

 *** DEADLOCK ***

1 lock held by loop0/23166:
 #0:  (&x->wait#17){..-.}, at: [<ffffffffa60d678d>] complete+0x1d/0x60

stack backtrace:
CPU: 0 PID: 23166 Comm: loop0 Tainted: G        W       4.14.0-rc4-ceph-g0721a67fa48f #1
Hardware name: Supermicro X8SIL/X8SIL, BIOS 1.0c 02/25/2010
Call Trace:
 dump_stack+0x85/0xc7
 print_circular_bug+0x1eb/0x2e0
 ? print_bfs_bug+0x40/0x40
 check_prev_add+0x396/0x7e0
 ? jbd2_journal_stop+0xd0/0x4d0
 lock_commit_crosslock+0x3db/0x5e0
 ? lock_commit_crosslock+0x3db/0x5e0
 complete+0x29/0x60
 submit_bio_wait_endio+0x25/0x30
 bio_endio+0x9f/0x1e0
 blk_update_request+0xc4/0x3c0
 blk_mq_end_request+0x1e/0x70
 lo_complete_rq+0x30/0x80
 __blk_mq_complete_request+0xa3/0x160
 blk_mq_complete_request+0x16/0x20
 loop_queue_work+0x5f/0x9b0
 ? find_held_lock+0x39/0xb0
 ? kthread_worker_fn+0x9c/0x200
 ? trace_hardirqs_on_caller+0x11f/0x190
 kthread_worker_fn+0xbf/0x200
 loop_kthread_worker_fn+0x1e/0x20
 kthread+0x152/0x190
 ? loop_get_status64+0x90/0x90
 ? kthread_stop+0x2a0/0x2a0
 ret_from_fork+0x2a/0x40

[-- Attachment #3: post.txt --]
[-- Type: text/plain, Size: 3047 bytes --]

run fstests shared/298 at 2017-10-09 20:38:50

======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc4-ceph-gc9ed9bcb0121 #1 Not tainted
------------------------------------------------------
loop0/15423 is trying to acquire lock:
 (jbd2_handle){++++}, at: [<ffffffffbd37b694>] start_this_handle+0x104/0x450

but now in release context of a crosslock acquired at the following:
 ((complete)&done#3){+.+.}, at: [<ffffffffbd45c79b>] submit_bio_wait+0x8b/0xc0

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 ((complete)&done#3){+.+.}:
       __lock_acquire+0x111f/0x1150
       lock_acquire+0xed/0x1e0
       wait_for_completion_io+0x53/0x1b0
       submit_bio_wait+0x8b/0xc0
       blkdev_issue_zeroout+0xcf/0x1b0
       ext4_init_inode_table+0x153/0x339
       ext4_lazyinit_thread+0x2bc/0x3d0
       kthread+0x152/0x190
       ret_from_fork+0x2a/0x40

-> #1 (&meta_group_info[i]->alloc_sem){++++}:
       __lock_acquire+0x111f/0x1150
       lock_acquire+0xed/0x1e0
       down_read+0x43/0x70
       __ext4_new_inode+0xc4a/0x1470
       ext4_create+0x103/0x190
       lookup_open+0x78a/0x880
       path_openat+0x3c0/0xad0
       do_filp_open+0x8a/0xf0
       do_sys_open+0x11b/0x1f0
       SyS_open+0x1e/0x20
       entry_SYSCALL_64_fastpath+0x23/0xc2

-> #0 (jbd2_handle){++++}:
       start_this_handle+0x167/0x450
       jbd2__journal_start+0xdb/0x2b0
       __ext4_journal_start_sb+0x89/0x1e0
       ext4_punch_hole+0x229/0x570
       ext4_fallocate+0x19d/0xb20

other info that might help us debug this:

Chain exists of:
  jbd2_handle --> &meta_group_info[i]->alloc_sem --> (complete)&done#3

 Possible unsafe locking scenario by crosslock:

       CPU0                    CPU1
       ----                    ----
  lock(&meta_group_info[i]->alloc_sem);
  lock((complete)&done#3);
                               lock(jbd2_handle);
                               unlock((complete)&done#3);

 *** DEADLOCK ***

1 lock held by loop0/15423:
 #0:  (&x->wait#17){..-.}, at: [<ffffffffbd0d678d>] complete+0x1d/0x60

stack backtrace:
CPU: 1 PID: 15423 Comm: loop0 Not tainted 4.14.0-rc4-ceph-gc9ed9bcb0121 #1
Hardware name: Supermicro X8SIL/X8SIL, BIOS 1.0c 02/25/2010
Call Trace:
 dump_stack+0x85/0xc7
 print_circular_bug+0x1eb/0x2e0
 ? print_bfs_bug+0x40/0x40
 check_prev_add+0x396/0x7e0
 ? __lock_acquire+0x705/0x1150
 lock_commit_crosslock+0x3db/0x5e0
 ? lock_commit_crosslock+0x3db/0x5e0
 complete+0x29/0x60
 submit_bio_wait_endio+0x12/0x20
 bio_endio+0x9f/0x1e0
 blk_update_request+0xc4/0x3c0
 blk_mq_end_request+0x1e/0x70
 lo_complete_rq+0x30/0x80
 __blk_mq_complete_request+0xa3/0x160
 blk_mq_complete_request+0x16/0x20
 loop_queue_work+0x5f/0x9b0
 ? find_held_lock+0x39/0xb0
 ? kthread_worker_fn+0x9c/0x200
 ? trace_hardirqs_on_caller+0x11f/0x190
 kthread_worker_fn+0xbf/0x200
 loop_kthread_worker_fn+0x1e/0x20
 kthread+0x152/0x190
 ? loop_get_status64+0x90/0x90
 ? kthread_stop+0x2a0/0x2a0
 ret_from_fork+0x2a/0x40

* Re: false positive lockdep splat with loop device
  2017-10-10  9:16       ` Ilya Dryomov
@ 2017-10-10  9:43         ` Ilya Dryomov
  0 siblings, 0 replies; 4+ messages in thread
From: Ilya Dryomov @ 2017-10-10  9:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Amir Goldstein, Dave Chinner, Darrick J. Wong, xfs,
	Peter Zijlstra, Byungchul Park, linux-kernel, linux-fsdevel,
	linux-block

On Tue, Oct 10, 2017 at 11:16 AM, Ilya Dryomov <idryomov@gmail.com> wrote:
> On Thu, Oct 5, 2017 at 6:33 PM, Christoph Hellwig <hch@infradead.org> wrote:
>> Does the patch below fix the warning for you?
>>
>> --
>> From 28aae7104425433d39e6142adcd5b88dc5b0ad5f Mon Sep 17 00:00:00 2001
>> From: Christoph Hellwig <hch@lst.de>
>> Date: Thu, 5 Oct 2017 18:31:02 +0200
>> Subject: block: use DECLARE_COMPLETION_ONSTACK in submit_bio_wait
>>
>> This way we get our lockdep annotations right in case multiple layers
>> in the stack use submit_bio_wait.
>>
>> It also happens to simplify the code by getting rid of the submit_bio_ret
>> structure.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>>  block/bio.c | 19 +++++--------------
>>  1 file changed, 5 insertions(+), 14 deletions(-)
>>
>> diff --git a/block/bio.c b/block/bio.c
>> index 8338304ea256..4e18e959fc0a 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -917,17 +917,9 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>>  }
>>  EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
>>
>> -struct submit_bio_ret {
>> -       struct completion event;
>> -       int error;
>> -};
>> -
>>  static void submit_bio_wait_endio(struct bio *bio)
>>  {
>> -       struct submit_bio_ret *ret = bio->bi_private;
>> -
>> -       ret->error = blk_status_to_errno(bio->bi_status);
>> -       complete(&ret->event);
>> +       complete(bio->bi_private);
>>  }
>>
>>  /**
>> @@ -943,16 +935,15 @@ static void submit_bio_wait_endio(struct bio *bio)
>>   */
>>  int submit_bio_wait(struct bio *bio)
>>  {
>> -       struct submit_bio_ret ret;
>> +       DECLARE_COMPLETION_ONSTACK(done);
>>
>> -       init_completion(&ret.event);
>> -       bio->bi_private = &ret;
>> +       bio->bi_private = &done;
>>         bio->bi_end_io = submit_bio_wait_endio;
>>         bio->bi_opf |= REQ_SYNC;
>>         submit_bio(bio);
>> -       wait_for_completion_io(&ret.event);
>> +       wait_for_completion_io(&done);
>>
>> -       return ret.error;
>> +       return blk_status_to_errno(bio->bi_status);
>>  }
>>  EXPORT_SYMBOL(submit_bio_wait);
>
> No, it doesn't -- the splat is a little bit more complicated, but
> fundamentally the same thing.

Easily triggered with generic/361 too, BTW.

Thanks,

                Ilya
