From: James Bottomley <jejb@linux.vnet.ibm.com>
To: Jan Kara <jack@suse.cz>, Fengguang Wu <fengguang.wu@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
LKP <lkp@01.org>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
linux-scsi@vger.kernel.org
Subject: Re: [bdi_unregister] 165a5e22fa INFO: task swapper:1 blocked for more than 120 seconds.
Date: Mon, 06 Mar 2017 06:35:21 -0800 [thread overview]
Message-ID: <1488810921.16832.6.camel@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170306120123.GD14932@quack2.suse.cz>
On Mon, 2017-03-06 at 13:01 +0100, Jan Kara wrote:
> On Mon 06-03-17 11:27:33, Jan Kara wrote:
> > Hi,
> >
> > On Sun 05-03-17 10:21:11, Wu Fengguang wrote:
> > > FYI next-20170303 is good while mainline is bad with this error.
> > > The attached reproduce-* may help reproduce the issue.
> >
> > Thanks for report! So from the stacktrace we are in the path
> > testing removal of a device immediately after it has been probed
> > and for some reason bdi_unregister() hangs - likely waiting for
> > cgroup-writeback references to drop. Given how early this happens
> > my guess is we fail to initialize something but for now I don't see
> > how my patch could make a difference. I'm trying to reproduce this
> > to be able to debug more...
>
> OK, so after some debugging I think this is yet another problem in
> SCSI initialization / destruction code which my patch only makes
> visible (added relevant maintainers).
>
> I can reproduce the problem reliably with enabling:
>
> CONFIG_DEBUG_TEST_DRIVER_REMOVE=y
> CONFIG_SCSI_DEBUG=m
> CONFIG_BLK_CGROUP=y
> CONFIG_MEMCG=y
> (and thus CONFIG_CGROUP_WRITEBACK=y)
>
> then 'modprobe scsi_debug' is all it takes to reproduce hang.
> Relevant kernel messages with some of my debugging added (attached is
> a patch that adds those debug messages):
This looks to be precisely the same problem Dan Williams was debugging
for us.
> [ 58.721765] scsi host0: scsi_debug: version 1.86 [20160430]
> [ 58.721765] dev_size_mb=8, opts=0x0, submit_queues=1,
> statistics=0
> [ 58.728946] CGWB init ffff88007fbb2000
> [ 58.730095] Created sdev ffff880078e1a000
> [ 58.731611] scsi 0:0:0:0: Direct-Access Linux scsi_debug
> 0186 PQ : 0 ANSI: 7
> [ 58.782246] sd 0:0:0:0: [sda] 16384 512-byte logical blocks: (8.39
> MB/8.00 MiB)
> [ 58.789687] sd 0:0:0:0: [sda] Write Protect is off
> [ 58.791140] sd 0:0:0:0: [sda] Mode Sense: 73 00 10 08
> [ 58.800879] sd 0:0:0:0: [sda] Write cache: enabled, read cache:
> enabled, supports DPO and FUA
> [ 58.893738] sd 0:0:0:0: [sda] Attached SCSI disk
> [ 58.896808] Unreg1
> [ 58.897960] Unreg2
> [ 58.898637] Unreg3
> [ 58.899100] CGWB ffff88007fbb2000 usage_cnt: 0
> [ 58.900004] Unreg4
> [ 58.904976] sd 0:0:0:0: [sda] Synchronizing SCSI cache
OK, can you put a WARN_ON trace in sd_shutdown and tell us where this
is coming from. For the device to be reused after this we have to be
calling sd_shutdown() without going into SDEV_DEL.
Thanks,
James
WARNING: multiple messages have this Message-ID (diff)
From: James Bottomley <jejb@linux.vnet.ibm.com>
To: lkp@lists.01.org
Subject: Re: [bdi_unregister] 165a5e22fa INFO: task swapper:1 blocked for more than 120 seconds.
Date: Mon, 06 Mar 2017 06:35:21 -0800 [thread overview]
Message-ID: <1488810921.16832.6.camel@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170306120123.GD14932@quack2.suse.cz>
[-- Attachment #1: Type: text/plain, Size: 2551 bytes --]
On Mon, 2017-03-06 at 13:01 +0100, Jan Kara wrote:
> On Mon 06-03-17 11:27:33, Jan Kara wrote:
> > Hi,
> >
> > On Sun 05-03-17 10:21:11, Wu Fengguang wrote:
> > > FYI next-20170303 is good while mainline is bad with this error.
> > > The attached reproduce-* may help reproduce the issue.
> >
> > Thanks for report! So from the stacktrace we are in the path
> > testing removal of a device immediately after it has been probed
> > and for some reason bdi_unregister() hangs - likely waiting for
> > cgroup-writeback references to drop. Given how early this happens
> > my guess is we fail to initialize something but for now I don't see
> > how my patch could make a difference. I'm trying to reproduce this
> > to be able to debug more...
>
> OK, so after some debugging I think this is yet another problem in
> SCSI initialization / destruction code which my patch only makes
> visible (added relevant maintainers).
>
> I can reproduce the problem reliably with enabling:
>
> CONFIG_DEBUG_TEST_DRIVER_REMOVE=y
> CONFIG_SCSI_DEBUG=m
> CONFIG_BLK_CGROUP=y
> CONFIG_MEMCG=y
> (and thus CONFIG_CGROUP_WRITEBACK=y)
>
> then 'modprobe scsi_debug' is all it takes to reproduce hang.
> Relevant kernel messages with some of my debugging added (attached is
> a patch that adds those debug messages):
This looks to be precisely the same problem Dan Williams was debugging
for us.
> [ 58.721765] scsi host0: scsi_debug: version 1.86 [20160430]
> [ 58.721765] dev_size_mb=8, opts=0x0, submit_queues=1,
> statistics=0
> [ 58.728946] CGWB init ffff88007fbb2000
> [ 58.730095] Created sdev ffff880078e1a000
> [ 58.731611] scsi 0:0:0:0: Direct-Access Linux scsi_debug
> 0186 PQ : 0 ANSI: 7
> [ 58.782246] sd 0:0:0:0: [sda] 16384 512-byte logical blocks: (8.39
> MB/8.00 MiB)
> [ 58.789687] sd 0:0:0:0: [sda] Write Protect is off
> [ 58.791140] sd 0:0:0:0: [sda] Mode Sense: 73 00 10 08
> [ 58.800879] sd 0:0:0:0: [sda] Write cache: enabled, read cache:
> enabled, supports DPO and FUA
> [ 58.893738] sd 0:0:0:0: [sda] Attached SCSI disk
> [ 58.896808] Unreg1
> [ 58.897960] Unreg2
> [ 58.898637] Unreg3
> [ 58.899100] CGWB ffff88007fbb2000 usage_cnt: 0
> [ 58.900004] Unreg4
> [ 58.904976] sd 0:0:0:0: [sda] Synchronizing SCSI cache
OK, can you put a WARN_ON trace in sd_shutdown and tell us where this
is coming from. For the device to be reused after this we have to be
calling sd_shutdown() without going into SDEV_DEL.
Thanks,
James
next prev parent reply other threads:[~2017-03-06 14:35 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-05 2:21 [bdi_unregister] 165a5e22fa INFO: task swapper:1 blocked for more than 120 seconds Fengguang Wu
2017-03-05 2:21 ` Fengguang Wu
2017-03-05 2:24 ` [bdi_unregister] 165a5e22fa BUG kmalloc-512 (Not tainted): Poison overwritten Fengguang Wu
2017-03-05 2:24 ` Fengguang Wu
2017-03-05 15:12 ` [bdi_unregister] 165a5e22fa INFO: task swapper:1 blocked for more than 120 seconds Jens Axboe
2017-03-06 1:19 ` Fengguang Wu
2017-03-06 1:19 ` Fengguang Wu
2017-03-06 10:27 ` Jan Kara
2017-03-06 10:27 ` Jan Kara
2017-03-06 12:01 ` Jan Kara
2017-03-06 12:01 ` Jan Kara
2017-03-06 14:35 ` James Bottomley [this message]
2017-03-06 14:35 ` James Bottomley
2017-03-06 15:14 ` Jan Kara
2017-03-06 15:14 ` Jan Kara
2017-03-06 15:44 ` James Bottomley
2017-03-06 15:44 ` James Bottomley
2017-03-06 16:13 ` Jan Kara
2017-03-06 16:13 ` Jan Kara
2017-03-06 17:25 ` James Bottomley
2017-03-06 17:25 ` James Bottomley
2017-03-07 14:41 ` Jan Kara
2017-03-07 14:41 ` Jan Kara
2017-03-07 16:10 ` James Bottomley
2017-03-07 16:10 ` James Bottomley
2017-03-07 16:26 ` Jan Kara
2017-03-07 16:26 ` Jan Kara
2017-03-08 9:59 ` Jan Kara
2017-03-08 9:59 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1488810921.16832.6.camel@linux.vnet.ibm.com \
--to=jejb@linux.vnet.ibm.com \
--cc=axboe@kernel.dk \
--cc=fengguang.wu@intel.com \
--cc=jack@suse.cz \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=lkp@01.org \
--cc=martin.petersen@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.