WARNING: at block/genhd.c:1556 disk_clear

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
@ 2011-04-04 21:16 Sitsofe Wheeler
  2011-04-05  1:51 ` Jens Axboe
  2011-04-06 13:04 ` Tejun Heo
  0 siblings, 2 replies; 14+ messages in thread
From: Sitsofe Wheeler @ 2011-04-04 21:16 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

I'm seeing the following warning on 2.6.39-rc1-00220-g148086b when I was
inserting an SD card (which is attached to the USB port) and USB stick
with loads of kernel debug options on:

[    7.526172] ------------[ cut here ]------------
[    7.526191] WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
[    7.526197] Hardware name: 900
[    7.526205] Pid: 1112, comm: udisks-part-id Not tainted 2.6.39-rc1-00227-gd7c764c-dirty #352
[    7.526211] Call Trace:
[    7.526224]  [<b0127208>] warn_slowpath_common+0x68/0xa0
[    7.526232]  [<b0254e8c>] ? disk_clear_events+0xdc/0xf0
[    7.526240]  [<b0254e8c>] ? disk_clear_events+0xdc/0xf0
[    7.526248]  [<b012725b>] warn_slowpath_null+0x1b/0x20
[    7.526255]  [<b0254e8c>] disk_clear_events+0xdc/0xf0
[    7.526265]  [<b01c29c5>] check_disk_change+0x25/0x70
[    7.526274]  [<b0345647>] sd_open+0x97/0x180
[    7.526282]  [<b01c38ac>] __blkdev_get+0x7c/0x310
[    7.526289]  [<b01c3b86>] blkdev_get+0x46/0x2c0
[    7.526298]  [<b051206d>] ? _raw_spin_unlock+0x1d/0x20
[    7.526306]  [<b01c3e55>] blkdev_open+0x55/0x70
[    7.526316]  [<b0193262>] __dentry_open+0x112/0x290
[    7.526323]  [<b0193446>] nameidata_to_filp+0x66/0x70
[    7.526331]  [<b01c3e00>] ? blkdev_get+0x2c0/0x2c0
[    7.526340]  [<b01a04fd>] do_last+0x19d/0x770
[    7.526348]  [<b01a1730>] path_openat+0x90/0x310
[    7.526356]  [<b01a1abd>] do_filp_open+0x5d/0x80
[    7.526364]  [<b051206d>] ? _raw_spin_unlock+0x1d/0x20
[    7.526373]  [<b01aec54>] ? alloc_fd+0x164/0x1b0
[    7.526381]  [<b0194581>] do_sys_open+0xf1/0x1c0
[    7.526389]  [<b01952ac>] ? vfs_write+0xfc/0x140
[    7.526397]  [<b01946b9>] sys_open+0x29/0x40
[    7.526404]  [<b0512817>] sysenter_do_call+0x12/0x36
[    7.526410] ---[ end trace 1aec3f97d06611cd ]---

A quick Google also turns up the same report in
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/748166 .

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
  2011-04-04 21:16 WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() Sitsofe Wheeler
@ 2011-04-05  1:51 ` Jens Axboe
  2011-04-06 13:04 ` Tejun Heo
  1 sibling, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2011-04-05  1:51 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: linux-kernel, Tejun Heo

On 2011-04-04 23:16, Sitsofe Wheeler wrote:
> I'm seeing the following warning on 2.6.39-rc1-00220-g148086b when I was
> inserting an SD card (which is attached to the USB port) and USB stick
> with loads of kernel debug options on:
> 
> [    7.526172] ------------[ cut here ]------------
> [    7.526191] WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
> [    7.526197] Hardware name: 900
> [    7.526205] Pid: 1112, comm: udisks-part-id Not tainted 2.6.39-rc1-00227-gd7c764c-dirty #352
> [    7.526211] Call Trace:
> [    7.526224]  [<b0127208>] warn_slowpath_common+0x68/0xa0
> [    7.526232]  [<b0254e8c>] ? disk_clear_events+0xdc/0xf0
> [    7.526240]  [<b0254e8c>] ? disk_clear_events+0xdc/0xf0
> [    7.526248]  [<b012725b>] warn_slowpath_null+0x1b/0x20
> [    7.526255]  [<b0254e8c>] disk_clear_events+0xdc/0xf0
> [    7.526265]  [<b01c29c5>] check_disk_change+0x25/0x70
> [    7.526274]  [<b0345647>] sd_open+0x97/0x180
> [    7.526282]  [<b01c38ac>] __blkdev_get+0x7c/0x310
> [    7.526289]  [<b01c3b86>] blkdev_get+0x46/0x2c0
> [    7.526298]  [<b051206d>] ? _raw_spin_unlock+0x1d/0x20
> [    7.526306]  [<b01c3e55>] blkdev_open+0x55/0x70
> [    7.526316]  [<b0193262>] __dentry_open+0x112/0x290
> [    7.526323]  [<b0193446>] nameidata_to_filp+0x66/0x70
> [    7.526331]  [<b01c3e00>] ? blkdev_get+0x2c0/0x2c0
> [    7.526340]  [<b01a04fd>] do_last+0x19d/0x770
> [    7.526348]  [<b01a1730>] path_openat+0x90/0x310
> [    7.526356]  [<b01a1abd>] do_filp_open+0x5d/0x80
> [    7.526364]  [<b051206d>] ? _raw_spin_unlock+0x1d/0x20
> [    7.526373]  [<b01aec54>] ? alloc_fd+0x164/0x1b0
> [    7.526381]  [<b0194581>] do_sys_open+0xf1/0x1c0
> [    7.526389]  [<b01952ac>] ? vfs_write+0xfc/0x140
> [    7.526397]  [<b01946b9>] sys_open+0x29/0x40
> [    7.526404]  [<b0512817>] sysenter_do_call+0x12/0x36
> [    7.526410] ---[ end trace 1aec3f97d06611cd ]---
> 
> A quick Google also turns up the same report in
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/748166 .

CC Tejun.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
  2011-04-04 21:16 WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() Sitsofe Wheeler
  2011-04-05  1:51 ` Jens Axboe
@ 2011-04-06 13:04 ` Tejun Heo
  2011-04-06 21:26   ` Sitsofe Wheeler
  1 sibling, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2011-04-06 13:04 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel

I think disk_block_events was a bit too agressive with optimization.
Can you please test the following patch and see whether the problem
goes away?

Thanks.

diff --git a/block/genhd.c b/block/genhd.c
index c91a2da..ae4079a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1418,18 +1418,15 @@ static void __disk_block_events(struct gendisk *disk, bool sync)
 {
 	struct disk_events *ev = disk->ev;
 	unsigned long flags;
-	bool cancel;
 
 	spin_lock_irqsave(&ev->lock, flags);
-	cancel = !ev->block++;
+	ev->block++;
 	spin_unlock_irqrestore(&ev->lock, flags);
 
-	if (cancel) {
-		if (sync)
-			cancel_delayed_work_sync(&disk->ev->dwork);
-		else
-			cancel_delayed_work(&disk->ev->dwork);
-	}
+	if (sync)
+		cancel_delayed_work_sync(&disk->ev->dwork);
+	else
+		cancel_delayed_work(&disk->ev->dwork);
 }
 
 static void __disk_unblock_events(struct gendisk *disk, bool check_now)

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
  2011-04-06 13:04 ` Tejun Heo
@ 2011-04-06 21:26   ` Sitsofe Wheeler
  2011-04-06 21:37     ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Sitsofe Wheeler @ 2011-04-06 21:26 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-kernel

On Wed, Apr 06, 2011 at 06:04:23AM -0700, Tejun Heo wrote:
> I think disk_block_events was a bit too agressive with optimization.
> Can you please test the following patch and see whether the problem
> goes away?

With the 2.6.39-rc2 the warning has mysteriously disappeared all by
itself. Could it be that other parts of the kernel were causing this to
manifest itself?

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
  2011-04-06 21:26   ` Sitsofe Wheeler
@ 2011-04-06 21:37     ` Tejun Heo
  2011-04-09  9:34       ` Sitsofe Wheeler
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2011-04-06 21:37 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel

On Wed, Apr 06, 2011 at 10:26:11PM +0100, Sitsofe Wheeler wrote:
> On Wed, Apr 06, 2011 at 06:04:23AM -0700, Tejun Heo wrote:
> > I think disk_block_events was a bit too agressive with optimization.
> > Can you please test the following patch and see whether the problem
> > goes away?
> 
> With the 2.6.39-rc2 the warning has mysteriously disappeared all by
> itself. Could it be that other parts of the kernel were causing this to
> manifest itself?

The problem would be quite timing dependent.  It would be great if you
can prepare a setup where you can reproduce the problem on some
regular basis and make sure that the problem goes away with the patch
applied.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
  2011-04-06 21:37     ` Tejun Heo
@ 2011-04-09  9:34       ` Sitsofe Wheeler
  2011-04-10 14:59         ` Sitsofe Wheeler
  0 siblings, 1 reply; 14+ messages in thread
From: Sitsofe Wheeler @ 2011-04-09  9:34 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-kernel

On Wed, Apr 06, 2011 at 02:37:18PM -0700, Tejun Heo wrote:
> 
> The problem would be quite timing dependent.  It would be great if you
> can prepare a setup where you can reproduce the problem on some
> regular basis and make sure that the problem goes away with the patch
> applied.

Try as I might I cannot even reproduce the problem with the original
problem kernels. I've tried all sorts (such as plugging in multiple USB
disks and killing/restarting services such as udisks) but the problem
has refused to show itself again. Originally I saw the message three
times over a two days period and two of those were within 10 seconds of
boot. I've booted the same kernel a dozen times but the error never
comes back...

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
  2011-04-09  9:34       ` Sitsofe Wheeler
@ 2011-04-10 14:59         ` Sitsofe Wheeler
  2011-04-13 20:00           ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Sitsofe Wheeler @ 2011-04-10 14:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-kernel

On Sat, Apr 09, 2011 at 10:34:49AM +0100, Sitsofe Wheeler wrote:
> On Wed, Apr 06, 2011 at 02:37:18PM -0700, Tejun Heo wrote:
> > 
> > The problem would be quite timing dependent.  It would be great if you
> > can prepare a setup where you can reproduce the problem on some
> > regular basis and make sure that the problem goes away with the patch
> > applied.
> 
> Try as I might I cannot even reproduce the problem with the original
> problem kernels. I've tried all sorts (such as plugging in multiple USB

Shortly after saying this the problem reappeared by itself. I've found a
method to reproduce it too: by running
for i in `seq 1 100`; do udevadm trigger --subsystem-match=block ; done
and then waiting at the desktop for a five or so minutes (so the
requests that are backed up drain away) then running it again etc. the
problem will usually trigger. The warning is still here in the most
recent 2.6.39-rc2-00120-g94c8a98 kernel too. Applying the patch to the
first kernel that showed th problem doesn't fix the issue (although it
appears to make the issue a lot more difficult to trigger).

[   40.370083] wlan0: no IPv6 routers present
[ 5217.888074] ------------[ cut here ]------------
[ 5217.888095] WARNING: at block/genhd.c:1553 disk_clear_events+0xdc/0xf0()
[ 5217.888101] Hardware name: 900
[ 5217.888109] Pid: 1740, comm: hald-addon-stor Not tainted 2.6.39-rc1-00220-g148086b-dirty #397
[ 5217.888115] Call Trace:
[ 5217.888127]  [<b0127218>] warn_slowpath_common+0x68/0xa0
[ 5217.888136]  [<b0254e8c>] ? disk_clear_events+0xdc/0xf0
[ 5217.888143]  [<b0254e8c>] ? disk_clear_events+0xdc/0xf0
[ 5217.888152]  [<b012726b>] warn_slowpath_null+0x1b/0x20
[ 5217.888159]  [<b0254e8c>] disk_clear_events+0xdc/0xf0
[ 5217.888169]  [<b01c29d5>] check_disk_change+0x25/0x70
[ 5217.888178]  [<b0345747>] sd_open+0x97/0x180
[ 5217.888185]  [<b03456b0>] ? sd_release+0x60/0x60
[ 5217.888192]  [<b01c3aa5>] __blkdev_get+0x265/0x310
[ 5217.888200]  [<b01c3b96>] blkdev_get+0x46/0x2c0
[ 5217.888209]  [<b051216d>] ? _raw_spin_unlock+0x1d/0x20
[ 5217.888217]  [<b01c3e65>] blkdev_open+0x55/0x70
[ 5217.888226]  [<b0193272>] __dentry_open+0x112/0x290
[ 5217.888234]  [<b0193456>] nameidata_to_filp+0x66/0x70
[ 5217.888242]  [<b01c3e10>] ? blkdev_get+0x2c0/0x2c0
[ 5217.888251]  [<b01a050d>] do_last+0x19d/0x770
[ 5217.888259]  [<b01a1740>] path_openat+0x90/0x310
[ 5217.888267]  [<b01a1acd>] do_filp_open+0x5d/0x80
[ 5217.888274]  [<b051216d>] ? _raw_spin_unlock+0x1d/0x20
[ 5217.888284]  [<b01aec64>] ? alloc_fd+0x164/0x1b0
[ 5217.888292]  [<b0194591>] do_sys_open+0xf1/0x1c0
[ 5217.888301]  [<b0181b0f>] ? might_fault+0x7f/0x90
[ 5217.888308]  [<b01946c9>] sys_open+0x29/0x40
[ 5217.888316]  [<b0512917>] sysenter_do_call+0x12/0x36
[ 5217.888322] ---[ end trace e99fcb86a68da1f4 ]---
[ 5793.762848] [drm:i915_gem_object_bind_to_gtt] *ERROR* Attempting to bind a purgeable object
[ 6153.518126] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 6153.519121] ata2.00: BMDMA stat 0x65
[ 6153.519673] ata2.00: cmd c8/00:08:a0:de:77/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 6153.519676]          res 51/44:08:a0:de:77/00:00:00:00:00/e0 Emask 0x9 (media error)

I will note that by EeePC's onboard "gen-0" SSD mysteriously developed
bad blocks after all this so perhaps the above is especially stressful
for disks...

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
  2011-04-10 14:59         ` Sitsofe Wheeler
@ 2011-04-13 20:00           ` Tejun Heo
  2011-04-15  9:01             ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2011-04-13 20:00 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel

Hello, Sitsofe.

Sorry about late replay.  I've been and still am travelling.  :-(

On Sun, Apr 10, 2011 at 03:59:20PM +0100, Sitsofe Wheeler wrote:
> Shortly after saying this the problem reappeared by itself. I've found a
> method to reproduce it too: by running
> for i in `seq 1 100`; do udevadm trigger --subsystem-match=block ; done
> and then waiting at the desktop for a five or so minutes (so the
> requests that are backed up drain away) then running it again etc. the
> problem will usually trigger. The warning is still here in the most
> recent 2.6.39-rc2-00120-g94c8a98 kernel too. Applying the patch to the
> first kernel that showed th problem doesn't fix the issue (although it
> appears to make the issue a lot more difficult to trigger).

Thanks a lot for finding out the test case.  I'll see whether I can
reproduce the problem in qemu.

> I will note that by EeePC's onboard "gen-0" SSD mysteriously developed
> bad blocks after all this so perhaps the above is especially stressful
> for disks...

Yeah, it probably hammers the same sector over and over again.
Earlier/cheap SSDs often lack proper wear leveling and writing
repeatedly to the same sector can easily create bad blocks.  Sorry
about that.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
  2011-04-13 20:00           ` Tejun Heo
@ 2011-04-15  9:01             ` Tejun Heo
  2011-04-15 18:20               ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2011-04-15  9:01 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel

On Thu, Apr 14, 2011 at 05:00:00AM +0900, Tejun Heo wrote:
> Thanks a lot for finding out the test case.  I'll see whether I can
> reproduce the problem in qemu.

Hmm... can't reproduce the problem with qemu.  Can you please apply
the following patch and see which warning messages trigger?

Thanks.

diff --git a/block/genhd.c b/block/genhd.c
index b364bd0..f061975 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1377,6 +1377,8 @@ struct disk_events {
 
 	long			poll_msecs;	/* interval, -1 for default */
 	struct delayed_work	dwork;
+	struct mutex		mutex;
+	atomic_t		cnt;
 };
 
 static const char *disk_events_strs[] = {
@@ -1572,6 +1574,9 @@ static void disk_events_workfn(struct work_struct *work)
 	unsigned long intv;
 	int nr_events = 0, i;
 
+	WARN_ON_ONCE(atomic_add_return(1, &ev->cnt) != 1);
+	mutex_lock(&ev->mutex);
+
 	/* check events */
 	events = disk->fops->check_events(disk, clearing);
 
@@ -1595,6 +1600,9 @@ static void disk_events_workfn(struct work_struct *work)
 
 	if (nr_events)
 		kobject_uevent_env(&disk_to_dev(disk)->kobj, KOBJ_CHANGE, envp);
+
+	mutex_unlock(&ev->mutex);
+	WARN_ON_ONCE(atomic_add_return(-1, &ev->cnt) != 0);
 }
 
 /*
@@ -1749,6 +1757,8 @@ static void disk_add_events(struct gendisk *disk)
 	ev->block = 1;
 	ev->poll_msecs = -1;
 	INIT_DELAYED_WORK(&ev->dwork, disk_events_workfn);
+	mutex_init(&ev->mutex);
+	atomic_set(&ev->cnt, 0);
 
 	mutex_lock(&disk_events_mutex);
 	list_add_tail(&ev->node, &disk_events);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
  2011-04-15  9:01             ` Tejun Heo
@ 2011-04-15 18:20               ` Tejun Heo
  2011-04-15 21:41                 ` Sitsofe Wheeler
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2011-04-15 18:20 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel

Please try the following patch instead.  Thanks.

diff --git a/block/genhd.c b/block/genhd.c
index b364bd0..6ed0c17 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1377,6 +1377,8 @@ struct disk_events {
 
 	long			poll_msecs;	/* interval, -1 for default */
 	struct delayed_work	dwork;
+	struct mutex		mutex;
+	atomic_t		cnt;
 };
 
 static const char *disk_events_strs[] = {
@@ -1540,6 +1542,9 @@ unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask)
 		return 0;
 	}
 
+	WARN_ON_ONCE(atomic_add_return(1, &ev->cnt) != 1);
+	mutex_lock(&ev->mutex);
+
 	/* tell the workfn about the events being cleared */
 	spin_lock_irq(&ev->lock);
 	ev->clearing |= mask;
@@ -1558,6 +1563,9 @@ unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask)
 	ev->pending &= ~mask;
 	spin_unlock_irq(&ev->lock);
 
+	WARN_ON_ONCE(atomic_add_return(-1, &ev->cnt) != 0);
+	mutex_unlock(&ev->mutex);
+
 	return pending;
 }
 
@@ -1749,6 +1757,8 @@ static void disk_add_events(struct gendisk *disk)
 	ev->block = 1;
 	ev->poll_msecs = -1;
 	INIT_DELAYED_WORK(&ev->dwork, disk_events_workfn);
+	mutex_init(&ev->mutex);
+	atomic_set(&ev->cnt, 0);
 
 	mutex_lock(&disk_events_mutex);
 	list_add_tail(&ev->node, &disk_events);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
  2011-04-15 18:20               ` Tejun Heo
@ 2011-04-15 21:41                 ` Sitsofe Wheeler
  2011-04-21 17:38                   ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Sitsofe Wheeler @ 2011-04-15 21:41 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-kernel

On Sat, Apr 16, 2011 at 03:20:06AM +0900, Tejun Heo wrote:
> Please try the following patch instead.  Thanks.

I couldn't reproduce the problem with the very latest kernel but here's
the output that appeared with an older kernel with your patch:

------------[ cut here ]------------
WARNING: at block/genhd.c:1561 disk_clear_events+0x164/0x170()
Hardware name: 900
Pid: 10200, comm: blkid Not tainted 2.6.39-rc1-00220-g148086b-dirty #407
Call Trace:
 [<b0127218>] warn_slowpath_common+0x68/0xa0
 [<b0254f24>] ? disk_clear_events+0x164/0x170
 [<b0254f24>] ? disk_clear_events+0x164/0x170
 [<b012726b>] warn_slowpath_null+0x1b/0x20
 [<b0254f24>] disk_clear_events+0x164/0x170
 [<b01c29d5>] check_disk_change+0x25/0x70
 [<b03457e7>] sd_open+0x97/0x180
 [<b0345750>] ? sd_release+0x60/0x60
 [<b01c3aa5>] __blkdev_get+0x265/0x310
 [<b01c2ee7>] ? bdget+0x37/0x100
 [<b01c3a0d>] __blkdev_get+0x1cd/0x310
 [<b01c3b96>] blkdev_get+0x46/0x2c0
 [<b05139dd>] ? _raw_spin_unlock+0x1d/0x20
 [<b01c3e65>] blkdev_open+0x55/0x70
 [<b0193272>] __dentry_open+0x112/0x290
 [<b0193456>] nameidata_to_filp+0x66/0x70
 [<b01c3e10>] ? blkdev_get+0x2c0/0x2c0
 [<b01a050d>] do_last+0x19d/0x770
 [<b01a1740>] path_openat+0x90/0x310
 [<b01a1acd>] do_filp_open+0x5d/0x80
 [<b05139dd>] ? _raw_spin_unlock+0x1d/0x20
 [<b01aec64>] ? alloc_fd+0x164/0x1b0
 [<b0194591>] do_sys_open+0xf1/0x1c0
 [<b01946c9>] sys_open+0x29/0x40
 [<b0514197>] sysenter_do_call+0x12/0x36
---[ end trace e94a40ecc7f65c23 ]---

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
  2011-04-15 21:41                 ` Sitsofe Wheeler
@ 2011-04-21 17:38                   ` Tejun Heo
  2011-04-21 18:43                     ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2011-04-21 17:38 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel

Hello,

On Fri, Apr 15, 2011 at 10:41:40PM +0100, Sitsofe Wheeler wrote:
> I couldn't reproduce the problem with the very latest kernel but here's
> the output that appeared with an older kernel with your patch:
> 
> ------------[ cut here ]------------
> WARNING: at block/genhd.c:1561 disk_clear_events+0x164/0x170()

Hmmm... that's really weird.  It indicates that the workqueue command
sequence there doesn't guarantee scheduling and completion of the work
item, which means either there's a bug in workqueue itself or I'm
confused about what's going on.  Hopefully, the latter.
Unfortunately, even with physical setup, I'm having problem
reproducing the problem.  Which kernel version are you using exactly?
Can you please let me know the commit ID?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
  2011-04-21 17:38                   ` Tejun Heo
@ 2011-04-21 18:43                     ` Tejun Heo
  2011-05-10 16:48                       ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2011-04-21 18:43 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel

On Thu, Apr 21, 2011 at 07:38:45PM +0200, Tejun Heo wrote:
> Hello,
> 
> On Fri, Apr 15, 2011 at 10:41:40PM +0100, Sitsofe Wheeler wrote:
> > I couldn't reproduce the problem with the very latest kernel but here's
> > the output that appeared with an older kernel with your patch:
> > 
> > ------------[ cut here ]------------
> > WARNING: at block/genhd.c:1561 disk_clear_events+0x164/0x170()
> 
> Hmmm... that's really weird.  It indicates that the workqueue command
> sequence there doesn't guarantee scheduling and completion of the work
> item, which means either there's a bug in workqueue itself or I'm
> confused about what's going on.  Hopefully, the latter.
> Unfortunately, even with physical setup, I'm having problem
> reproducing the problem.  Which kernel version are you using exactly?
> Can you please let me know the commit ID?

Also, please try the following patch.  It should show what's going on
with the flush machinery.

Thanks.

diff --git a/block/genhd.c b/block/genhd.c
index b364bd0..0add6d5 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1377,6 +1377,8 @@ struct disk_events {
 
 	long			poll_msecs;	/* interval, -1 for default */
 	struct delayed_work	dwork;
+	struct mutex		mutex;
+	atomic_t		cnt;
 };
 
 static const char *disk_events_strs[] = {
@@ -1540,6 +1542,9 @@ unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask)
 		return 0;
 	}
 
+	WARN_ON_ONCE(atomic_add_return(1, &ev->cnt) != 1);
+	mutex_lock(&ev->mutex);
+
 	/* tell the workfn about the events being cleared */
 	spin_lock_irq(&ev->lock);
 	ev->clearing |= mask;
@@ -1548,7 +1553,9 @@ unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask)
 	/* uncondtionally schedule event check and wait for it to finish */
 	__disk_block_events(disk, true);
 	queue_delayed_work(system_nrt_wq, &ev->dwork, 0);
-	flush_delayed_work(&ev->dwork);
+	if (!flush_delayed_work_dbg(&ev->dwork))
+		printk("disk_clear_events-%s: flush returned false\n",
+		       disk->disk_name);
 	__disk_unblock_events(disk, false);
 
 	/* then, fetch and clear pending events */
@@ -1558,6 +1565,9 @@ unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask)
 	ev->pending &= ~mask;
 	spin_unlock_irq(&ev->lock);
 
+	WARN_ON_ONCE(atomic_add_return(-1, &ev->cnt) != 0);
+	mutex_unlock(&ev->mutex);
+
 	return pending;
 }
 
@@ -1749,6 +1759,8 @@ static void disk_add_events(struct gendisk *disk)
 	ev->block = 1;
 	ev->poll_msecs = -1;
 	INIT_DELAYED_WORK(&ev->dwork, disk_events_workfn);
+	mutex_init(&ev->mutex);
+	atomic_set(&ev->cnt, 0);
 
 	mutex_lock(&disk_events_mutex);
 	list_add_tail(&ev->node, &disk_events);
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index f584aba..eade076 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -372,6 +372,7 @@ extern bool flush_work_sync(struct work_struct *work);
 extern bool cancel_work_sync(struct work_struct *work);
 
 extern bool flush_delayed_work(struct delayed_work *dwork);
+extern bool flush_delayed_work_dbg(struct delayed_work *dwork);
 extern bool flush_delayed_work_sync(struct delayed_work *work);
 extern bool cancel_delayed_work_sync(struct delayed_work *dwork);
 
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8859a41..20ac7d1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2376,7 +2376,7 @@ out_unlock:
 EXPORT_SYMBOL_GPL(flush_workqueue);
 
 static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
-			     bool wait_executing)
+			     bool wait_executing, bool dbg)
 {
 	struct worker *worker = NULL;
 	struct global_cwq *gcwq;
@@ -2384,8 +2384,11 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
 
 	might_sleep();
 	gcwq = get_work_gcwq(work);
-	if (!gcwq)
+	if (!gcwq) {
+		if (dbg)
+			printk("flush_work-%p: doesn't have gcwq set\n", work);
 		return false;
+	}
 
 	spin_lock_irq(&gcwq->lock);
 	if (!list_empty(&work->entry)) {
@@ -2396,15 +2399,26 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
 		 */
 		smp_rmb();
 		cwq = get_work_cwq(work);
-		if (unlikely(!cwq || gcwq != cwq->gcwq))
+		if (unlikely(!cwq || gcwq != cwq->gcwq)) {
+			if (dbg)
+				printk("flush_work-%p: cwq=%p gcwq@%d cwq->gcwq@%d\n",
+				       work, cwq, gcwq->cpu, cwq ? cwq->gcwq->cpu : -1);
 			goto already_gone;
+		}
 	} else if (wait_executing) {
 		worker = find_worker_executing_work(gcwq, work);
-		if (!worker)
+		if (!worker) {
+			if (dbg)
+				printk("flush_work-%p: wait_executing but not executing on %d\n",
+				       work, gcwq->cpu);
 			goto already_gone;
+		}
 		cwq = worker->current_cwq;
-	} else
+	} else {
+		if (dbg)
+			printk("flush_work-%p: idle, cpu=%d\n", work, gcwq->cpu);
 		goto already_gone;
+	}
 
 	insert_wq_barrier(cwq, barr, work, worker);
 	spin_unlock_irq(&gcwq->lock);
@@ -2449,7 +2463,7 @@ bool flush_work(struct work_struct *work)
 {
 	struct wq_barrier barr;
 
-	if (start_flush_work(work, &barr, true)) {
+	if (start_flush_work(work, &barr, true, false)) {
 		wait_for_completion(&barr.done);
 		destroy_work_on_stack(&barr.work);
 		return true;
@@ -2458,6 +2472,18 @@ bool flush_work(struct work_struct *work)
 }
 EXPORT_SYMBOL_GPL(flush_work);
 
+static bool flush_work_dbg(struct work_struct *work)
+{
+	struct wq_barrier barr;
+
+	if (start_flush_work(work, &barr, true, true)) {
+		wait_for_completion(&barr.done);
+		destroy_work_on_stack(&barr.work);
+		return true;
+	} else
+		return false;
+}
+
 static bool wait_on_cpu_work(struct global_cwq *gcwq, struct work_struct *work)
 {
 	struct wq_barrier barr;
@@ -2514,7 +2540,7 @@ bool flush_work_sync(struct work_struct *work)
 	bool pending, waited;
 
 	/* we'll wait for executions separately, queue barr only if pending */
-	pending = start_flush_work(work, &barr, false);
+	pending = start_flush_work(work, &barr, false, false);
 
 	/* wait for executions to finish */
 	waited = wait_on_work(work);
@@ -2632,6 +2658,15 @@ bool flush_delayed_work(struct delayed_work *dwork)
 }
 EXPORT_SYMBOL(flush_delayed_work);
 
+bool flush_delayed_work_dbg(struct delayed_work *dwork)
+{
+	if (del_timer_sync(&dwork->timer))
+		__queue_work(raw_smp_processor_id(),
+			     get_work_cwq(&dwork->work)->wq, &dwork->work);
+	return flush_work_dbg(&dwork->work);
+}
+EXPORT_SYMBOL(flush_delayed_work_dbg);
+
 /**
  * flush_delayed_work_sync - wait for a dwork to finish
  * @dwork: the delayed work to flush

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0()
  2011-04-21 18:43                     ` Tejun Heo
@ 2011-05-10 16:48                       ` Tejun Heo
  0 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2011-05-10 16:48 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel

On Thu, Apr 21, 2011 at 08:43:04PM +0200, Tejun Heo wrote:
> On Thu, Apr 21, 2011 at 07:38:45PM +0200, Tejun Heo wrote:
> > Hello,
> > 
> > On Fri, Apr 15, 2011 at 10:41:40PM +0100, Sitsofe Wheeler wrote:
> > > I couldn't reproduce the problem with the very latest kernel but here's
> > > the output that appeared with an older kernel with your patch:
> > > 
> > > ------------[ cut here ]------------
> > > WARNING: at block/genhd.c:1561 disk_clear_events+0x164/0x170()
> > 
> > Hmmm... that's really weird.  It indicates that the workqueue command
> > sequence there doesn't guarantee scheduling and completion of the work
> > item, which means either there's a bug in workqueue itself or I'm
> > confused about what's going on.  Hopefully, the latter.
> > Unfortunately, even with physical setup, I'm having problem
> > reproducing the problem.  Which kernel version are you using exactly?
> > Can you please let me know the commit ID?
> 
> Also, please try the following patch.  It should show what's going on
> with the flush machinery.

Never mind.  I now have a reliably reproducible test case.  Can you
please add yourself to bug 34662?  Let's track this issue there.

 https://bugzilla.kernel.org/show_bug.cgi?id=34662

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-05-10 16:48 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-04 21:16 WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() Sitsofe Wheeler
2011-04-05  1:51 ` Jens Axboe
2011-04-06 13:04 ` Tejun Heo
2011-04-06 21:26   ` Sitsofe Wheeler
2011-04-06 21:37     ` Tejun Heo
2011-04-09  9:34       ` Sitsofe Wheeler
2011-04-10 14:59         ` Sitsofe Wheeler
2011-04-13 20:00           ` Tejun Heo
2011-04-15  9:01             ` Tejun Heo
2011-04-15 18:20               ` Tejun Heo
2011-04-15 21:41                 ` Sitsofe Wheeler
2011-04-21 17:38                   ` Tejun Heo
2011-04-21 18:43                     ` Tejun Heo
2011-05-10 16:48                       ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).