* WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() @ 2011-04-04 21:16 Sitsofe Wheeler 2011-04-05 1:51 ` Jens Axboe 2011-04-06 13:04 ` Tejun Heo 0 siblings, 2 replies; 14+ messages in thread From: Sitsofe Wheeler @ 2011-04-04 21:16 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-kernel I'm seeing the following warning on 2.6.39-rc1-00220-g148086b when I was inserting an SD card (which is attached to the USB port) and USB stick with loads of kernel debug options on: [ 7.526172] ------------[ cut here ]------------ [ 7.526191] WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() [ 7.526197] Hardware name: 900 [ 7.526205] Pid: 1112, comm: udisks-part-id Not tainted 2.6.39-rc1-00227-gd7c764c-dirty #352 [ 7.526211] Call Trace: [ 7.526224] [<b0127208>] warn_slowpath_common+0x68/0xa0 [ 7.526232] [<b0254e8c>] ? disk_clear_events+0xdc/0xf0 [ 7.526240] [<b0254e8c>] ? disk_clear_events+0xdc/0xf0 [ 7.526248] [<b012725b>] warn_slowpath_null+0x1b/0x20 [ 7.526255] [<b0254e8c>] disk_clear_events+0xdc/0xf0 [ 7.526265] [<b01c29c5>] check_disk_change+0x25/0x70 [ 7.526274] [<b0345647>] sd_open+0x97/0x180 [ 7.526282] [<b01c38ac>] __blkdev_get+0x7c/0x310 [ 7.526289] [<b01c3b86>] blkdev_get+0x46/0x2c0 [ 7.526298] [<b051206d>] ? _raw_spin_unlock+0x1d/0x20 [ 7.526306] [<b01c3e55>] blkdev_open+0x55/0x70 [ 7.526316] [<b0193262>] __dentry_open+0x112/0x290 [ 7.526323] [<b0193446>] nameidata_to_filp+0x66/0x70 [ 7.526331] [<b01c3e00>] ? blkdev_get+0x2c0/0x2c0 [ 7.526340] [<b01a04fd>] do_last+0x19d/0x770 [ 7.526348] [<b01a1730>] path_openat+0x90/0x310 [ 7.526356] [<b01a1abd>] do_filp_open+0x5d/0x80 [ 7.526364] [<b051206d>] ? _raw_spin_unlock+0x1d/0x20 [ 7.526373] [<b01aec54>] ? alloc_fd+0x164/0x1b0 [ 7.526381] [<b0194581>] do_sys_open+0xf1/0x1c0 [ 7.526389] [<b01952ac>] ? vfs_write+0xfc/0x140 [ 7.526397] [<b01946b9>] sys_open+0x29/0x40 [ 7.526404] [<b0512817>] sysenter_do_call+0x12/0x36 [ 7.526410] ---[ end trace 1aec3f97d06611cd ]--- A quick Google also turns up the same report in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/748166 . -- Sitsofe | http://sucs.org/~sits/ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() 2011-04-04 21:16 WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() Sitsofe Wheeler @ 2011-04-05 1:51 ` Jens Axboe 2011-04-06 13:04 ` Tejun Heo 1 sibling, 0 replies; 14+ messages in thread From: Jens Axboe @ 2011-04-05 1:51 UTC (permalink / raw) To: Sitsofe Wheeler; +Cc: linux-kernel, Tejun Heo On 2011-04-04 23:16, Sitsofe Wheeler wrote: > I'm seeing the following warning on 2.6.39-rc1-00220-g148086b when I was > inserting an SD card (which is attached to the USB port) and USB stick > with loads of kernel debug options on: > > [ 7.526172] ------------[ cut here ]------------ > [ 7.526191] WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() > [ 7.526197] Hardware name: 900 > [ 7.526205] Pid: 1112, comm: udisks-part-id Not tainted 2.6.39-rc1-00227-gd7c764c-dirty #352 > [ 7.526211] Call Trace: > [ 7.526224] [<b0127208>] warn_slowpath_common+0x68/0xa0 > [ 7.526232] [<b0254e8c>] ? disk_clear_events+0xdc/0xf0 > [ 7.526240] [<b0254e8c>] ? disk_clear_events+0xdc/0xf0 > [ 7.526248] [<b012725b>] warn_slowpath_null+0x1b/0x20 > [ 7.526255] [<b0254e8c>] disk_clear_events+0xdc/0xf0 > [ 7.526265] [<b01c29c5>] check_disk_change+0x25/0x70 > [ 7.526274] [<b0345647>] sd_open+0x97/0x180 > [ 7.526282] [<b01c38ac>] __blkdev_get+0x7c/0x310 > [ 7.526289] [<b01c3b86>] blkdev_get+0x46/0x2c0 > [ 7.526298] [<b051206d>] ? _raw_spin_unlock+0x1d/0x20 > [ 7.526306] [<b01c3e55>] blkdev_open+0x55/0x70 > [ 7.526316] [<b0193262>] __dentry_open+0x112/0x290 > [ 7.526323] [<b0193446>] nameidata_to_filp+0x66/0x70 > [ 7.526331] [<b01c3e00>] ? blkdev_get+0x2c0/0x2c0 > [ 7.526340] [<b01a04fd>] do_last+0x19d/0x770 > [ 7.526348] [<b01a1730>] path_openat+0x90/0x310 > [ 7.526356] [<b01a1abd>] do_filp_open+0x5d/0x80 > [ 7.526364] [<b051206d>] ? _raw_spin_unlock+0x1d/0x20 > [ 7.526373] [<b01aec54>] ? alloc_fd+0x164/0x1b0 > [ 7.526381] [<b0194581>] do_sys_open+0xf1/0x1c0 > [ 7.526389] [<b01952ac>] ? vfs_write+0xfc/0x140 > [ 7.526397] [<b01946b9>] sys_open+0x29/0x40 > [ 7.526404] [<b0512817>] sysenter_do_call+0x12/0x36 > [ 7.526410] ---[ end trace 1aec3f97d06611cd ]--- > > A quick Google also turns up the same report in > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/748166 . CC Tejun. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() 2011-04-04 21:16 WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() Sitsofe Wheeler 2011-04-05 1:51 ` Jens Axboe @ 2011-04-06 13:04 ` Tejun Heo 2011-04-06 21:26 ` Sitsofe Wheeler 1 sibling, 1 reply; 14+ messages in thread From: Tejun Heo @ 2011-04-06 13:04 UTC (permalink / raw) To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel I think disk_block_events was a bit too agressive with optimization. Can you please test the following patch and see whether the problem goes away? Thanks. diff --git a/block/genhd.c b/block/genhd.c index c91a2da..ae4079a 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -1418,18 +1418,15 @@ static void __disk_block_events(struct gendisk *disk, bool sync) { struct disk_events *ev = disk->ev; unsigned long flags; - bool cancel; spin_lock_irqsave(&ev->lock, flags); - cancel = !ev->block++; + ev->block++; spin_unlock_irqrestore(&ev->lock, flags); - if (cancel) { - if (sync) - cancel_delayed_work_sync(&disk->ev->dwork); - else - cancel_delayed_work(&disk->ev->dwork); - } + if (sync) + cancel_delayed_work_sync(&disk->ev->dwork); + else + cancel_delayed_work(&disk->ev->dwork); } static void __disk_unblock_events(struct gendisk *disk, bool check_now) ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() 2011-04-06 13:04 ` Tejun Heo @ 2011-04-06 21:26 ` Sitsofe Wheeler 2011-04-06 21:37 ` Tejun Heo 0 siblings, 1 reply; 14+ messages in thread From: Sitsofe Wheeler @ 2011-04-06 21:26 UTC (permalink / raw) To: Tejun Heo; +Cc: Jens Axboe, linux-kernel On Wed, Apr 06, 2011 at 06:04:23AM -0700, Tejun Heo wrote: > I think disk_block_events was a bit too agressive with optimization. > Can you please test the following patch and see whether the problem > goes away? With the 2.6.39-rc2 the warning has mysteriously disappeared all by itself. Could it be that other parts of the kernel were causing this to manifest itself? -- Sitsofe | http://sucs.org/~sits/ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() 2011-04-06 21:26 ` Sitsofe Wheeler @ 2011-04-06 21:37 ` Tejun Heo 2011-04-09 9:34 ` Sitsofe Wheeler 0 siblings, 1 reply; 14+ messages in thread From: Tejun Heo @ 2011-04-06 21:37 UTC (permalink / raw) To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel On Wed, Apr 06, 2011 at 10:26:11PM +0100, Sitsofe Wheeler wrote: > On Wed, Apr 06, 2011 at 06:04:23AM -0700, Tejun Heo wrote: > > I think disk_block_events was a bit too agressive with optimization. > > Can you please test the following patch and see whether the problem > > goes away? > > With the 2.6.39-rc2 the warning has mysteriously disappeared all by > itself. Could it be that other parts of the kernel were causing this to > manifest itself? The problem would be quite timing dependent. It would be great if you can prepare a setup where you can reproduce the problem on some regular basis and make sure that the problem goes away with the patch applied. Thanks. -- tejun ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() 2011-04-06 21:37 ` Tejun Heo @ 2011-04-09 9:34 ` Sitsofe Wheeler 2011-04-10 14:59 ` Sitsofe Wheeler 0 siblings, 1 reply; 14+ messages in thread From: Sitsofe Wheeler @ 2011-04-09 9:34 UTC (permalink / raw) To: Tejun Heo; +Cc: Jens Axboe, linux-kernel On Wed, Apr 06, 2011 at 02:37:18PM -0700, Tejun Heo wrote: > > The problem would be quite timing dependent. It would be great if you > can prepare a setup where you can reproduce the problem on some > regular basis and make sure that the problem goes away with the patch > applied. Try as I might I cannot even reproduce the problem with the original problem kernels. I've tried all sorts (such as plugging in multiple USB disks and killing/restarting services such as udisks) but the problem has refused to show itself again. Originally I saw the message three times over a two days period and two of those were within 10 seconds of boot. I've booted the same kernel a dozen times but the error never comes back... -- Sitsofe | http://sucs.org/~sits/ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() 2011-04-09 9:34 ` Sitsofe Wheeler @ 2011-04-10 14:59 ` Sitsofe Wheeler 2011-04-13 20:00 ` Tejun Heo 0 siblings, 1 reply; 14+ messages in thread From: Sitsofe Wheeler @ 2011-04-10 14:59 UTC (permalink / raw) To: Tejun Heo; +Cc: Jens Axboe, linux-kernel On Sat, Apr 09, 2011 at 10:34:49AM +0100, Sitsofe Wheeler wrote: > On Wed, Apr 06, 2011 at 02:37:18PM -0700, Tejun Heo wrote: > > > > The problem would be quite timing dependent. It would be great if you > > can prepare a setup where you can reproduce the problem on some > > regular basis and make sure that the problem goes away with the patch > > applied. > > Try as I might I cannot even reproduce the problem with the original > problem kernels. I've tried all sorts (such as plugging in multiple USB Shortly after saying this the problem reappeared by itself. I've found a method to reproduce it too: by running for i in `seq 1 100`; do udevadm trigger --subsystem-match=block ; done and then waiting at the desktop for a five or so minutes (so the requests that are backed up drain away) then running it again etc. the problem will usually trigger. The warning is still here in the most recent 2.6.39-rc2-00120-g94c8a98 kernel too. Applying the patch to the first kernel that showed th problem doesn't fix the issue (although it appears to make the issue a lot more difficult to trigger). [ 40.370083] wlan0: no IPv6 routers present [ 5217.888074] ------------[ cut here ]------------ [ 5217.888095] WARNING: at block/genhd.c:1553 disk_clear_events+0xdc/0xf0() [ 5217.888101] Hardware name: 900 [ 5217.888109] Pid: 1740, comm: hald-addon-stor Not tainted 2.6.39-rc1-00220-g148086b-dirty #397 [ 5217.888115] Call Trace: [ 5217.888127] [<b0127218>] warn_slowpath_common+0x68/0xa0 [ 5217.888136] [<b0254e8c>] ? disk_clear_events+0xdc/0xf0 [ 5217.888143] [<b0254e8c>] ? disk_clear_events+0xdc/0xf0 [ 5217.888152] [<b012726b>] warn_slowpath_null+0x1b/0x20 [ 5217.888159] [<b0254e8c>] disk_clear_events+0xdc/0xf0 [ 5217.888169] [<b01c29d5>] check_disk_change+0x25/0x70 [ 5217.888178] [<b0345747>] sd_open+0x97/0x180 [ 5217.888185] [<b03456b0>] ? sd_release+0x60/0x60 [ 5217.888192] [<b01c3aa5>] __blkdev_get+0x265/0x310 [ 5217.888200] [<b01c3b96>] blkdev_get+0x46/0x2c0 [ 5217.888209] [<b051216d>] ? _raw_spin_unlock+0x1d/0x20 [ 5217.888217] [<b01c3e65>] blkdev_open+0x55/0x70 [ 5217.888226] [<b0193272>] __dentry_open+0x112/0x290 [ 5217.888234] [<b0193456>] nameidata_to_filp+0x66/0x70 [ 5217.888242] [<b01c3e10>] ? blkdev_get+0x2c0/0x2c0 [ 5217.888251] [<b01a050d>] do_last+0x19d/0x770 [ 5217.888259] [<b01a1740>] path_openat+0x90/0x310 [ 5217.888267] [<b01a1acd>] do_filp_open+0x5d/0x80 [ 5217.888274] [<b051216d>] ? _raw_spin_unlock+0x1d/0x20 [ 5217.888284] [<b01aec64>] ? alloc_fd+0x164/0x1b0 [ 5217.888292] [<b0194591>] do_sys_open+0xf1/0x1c0 [ 5217.888301] [<b0181b0f>] ? might_fault+0x7f/0x90 [ 5217.888308] [<b01946c9>] sys_open+0x29/0x40 [ 5217.888316] [<b0512917>] sysenter_do_call+0x12/0x36 [ 5217.888322] ---[ end trace e99fcb86a68da1f4 ]--- [ 5793.762848] [drm:i915_gem_object_bind_to_gtt] *ERROR* Attempting to bind a purgeable object [ 6153.518126] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 [ 6153.519121] ata2.00: BMDMA stat 0x65 [ 6153.519673] ata2.00: cmd c8/00:08:a0:de:77/00:00:00:00:00/e0 tag 0 dma 4096 in [ 6153.519676] res 51/44:08:a0:de:77/00:00:00:00:00/e0 Emask 0x9 (media error) I will note that by EeePC's onboard "gen-0" SSD mysteriously developed bad blocks after all this so perhaps the above is especially stressful for disks... -- Sitsofe | http://sucs.org/~sits/ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() 2011-04-10 14:59 ` Sitsofe Wheeler @ 2011-04-13 20:00 ` Tejun Heo 2011-04-15 9:01 ` Tejun Heo 0 siblings, 1 reply; 14+ messages in thread From: Tejun Heo @ 2011-04-13 20:00 UTC (permalink / raw) To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel Hello, Sitsofe. Sorry about late replay. I've been and still am travelling. :-( On Sun, Apr 10, 2011 at 03:59:20PM +0100, Sitsofe Wheeler wrote: > Shortly after saying this the problem reappeared by itself. I've found a > method to reproduce it too: by running > for i in `seq 1 100`; do udevadm trigger --subsystem-match=block ; done > and then waiting at the desktop for a five or so minutes (so the > requests that are backed up drain away) then running it again etc. the > problem will usually trigger. The warning is still here in the most > recent 2.6.39-rc2-00120-g94c8a98 kernel too. Applying the patch to the > first kernel that showed th problem doesn't fix the issue (although it > appears to make the issue a lot more difficult to trigger). Thanks a lot for finding out the test case. I'll see whether I can reproduce the problem in qemu. > I will note that by EeePC's onboard "gen-0" SSD mysteriously developed > bad blocks after all this so perhaps the above is especially stressful > for disks... Yeah, it probably hammers the same sector over and over again. Earlier/cheap SSDs often lack proper wear leveling and writing repeatedly to the same sector can easily create bad blocks. Sorry about that. Thanks. -- tejun ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() 2011-04-13 20:00 ` Tejun Heo @ 2011-04-15 9:01 ` Tejun Heo 2011-04-15 18:20 ` Tejun Heo 0 siblings, 1 reply; 14+ messages in thread From: Tejun Heo @ 2011-04-15 9:01 UTC (permalink / raw) To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel On Thu, Apr 14, 2011 at 05:00:00AM +0900, Tejun Heo wrote: > Thanks a lot for finding out the test case. I'll see whether I can > reproduce the problem in qemu. Hmm... can't reproduce the problem with qemu. Can you please apply the following patch and see which warning messages trigger? Thanks. diff --git a/block/genhd.c b/block/genhd.c index b364bd0..f061975 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -1377,6 +1377,8 @@ struct disk_events { long poll_msecs; /* interval, -1 for default */ struct delayed_work dwork; + struct mutex mutex; + atomic_t cnt; }; static const char *disk_events_strs[] = { @@ -1572,6 +1574,9 @@ static void disk_events_workfn(struct work_struct *work) unsigned long intv; int nr_events = 0, i; + WARN_ON_ONCE(atomic_add_return(1, &ev->cnt) != 1); + mutex_lock(&ev->mutex); + /* check events */ events = disk->fops->check_events(disk, clearing); @@ -1595,6 +1600,9 @@ static void disk_events_workfn(struct work_struct *work) if (nr_events) kobject_uevent_env(&disk_to_dev(disk)->kobj, KOBJ_CHANGE, envp); + + mutex_unlock(&ev->mutex); + WARN_ON_ONCE(atomic_add_return(-1, &ev->cnt) != 0); } /* @@ -1749,6 +1757,8 @@ static void disk_add_events(struct gendisk *disk) ev->block = 1; ev->poll_msecs = -1; INIT_DELAYED_WORK(&ev->dwork, disk_events_workfn); + mutex_init(&ev->mutex); + atomic_set(&ev->cnt, 0); mutex_lock(&disk_events_mutex); list_add_tail(&ev->node, &disk_events); ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() 2011-04-15 9:01 ` Tejun Heo @ 2011-04-15 18:20 ` Tejun Heo 2011-04-15 21:41 ` Sitsofe Wheeler 0 siblings, 1 reply; 14+ messages in thread From: Tejun Heo @ 2011-04-15 18:20 UTC (permalink / raw) To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel Please try the following patch instead. Thanks. diff --git a/block/genhd.c b/block/genhd.c index b364bd0..6ed0c17 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -1377,6 +1377,8 @@ struct disk_events { long poll_msecs; /* interval, -1 for default */ struct delayed_work dwork; + struct mutex mutex; + atomic_t cnt; }; static const char *disk_events_strs[] = { @@ -1540,6 +1542,9 @@ unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask) return 0; } + WARN_ON_ONCE(atomic_add_return(1, &ev->cnt) != 1); + mutex_lock(&ev->mutex); + /* tell the workfn about the events being cleared */ spin_lock_irq(&ev->lock); ev->clearing |= mask; @@ -1558,6 +1563,9 @@ unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask) ev->pending &= ~mask; spin_unlock_irq(&ev->lock); + WARN_ON_ONCE(atomic_add_return(-1, &ev->cnt) != 0); + mutex_unlock(&ev->mutex); + return pending; } @@ -1749,6 +1757,8 @@ static void disk_add_events(struct gendisk *disk) ev->block = 1; ev->poll_msecs = -1; INIT_DELAYED_WORK(&ev->dwork, disk_events_workfn); + mutex_init(&ev->mutex); + atomic_set(&ev->cnt, 0); mutex_lock(&disk_events_mutex); list_add_tail(&ev->node, &disk_events); ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() 2011-04-15 18:20 ` Tejun Heo @ 2011-04-15 21:41 ` Sitsofe Wheeler 2011-04-21 17:38 ` Tejun Heo 0 siblings, 1 reply; 14+ messages in thread From: Sitsofe Wheeler @ 2011-04-15 21:41 UTC (permalink / raw) To: Tejun Heo; +Cc: Jens Axboe, linux-kernel On Sat, Apr 16, 2011 at 03:20:06AM +0900, Tejun Heo wrote: > Please try the following patch instead. Thanks. I couldn't reproduce the problem with the very latest kernel but here's the output that appeared with an older kernel with your patch: ------------[ cut here ]------------ WARNING: at block/genhd.c:1561 disk_clear_events+0x164/0x170() Hardware name: 900 Pid: 10200, comm: blkid Not tainted 2.6.39-rc1-00220-g148086b-dirty #407 Call Trace: [<b0127218>] warn_slowpath_common+0x68/0xa0 [<b0254f24>] ? disk_clear_events+0x164/0x170 [<b0254f24>] ? disk_clear_events+0x164/0x170 [<b012726b>] warn_slowpath_null+0x1b/0x20 [<b0254f24>] disk_clear_events+0x164/0x170 [<b01c29d5>] check_disk_change+0x25/0x70 [<b03457e7>] sd_open+0x97/0x180 [<b0345750>] ? sd_release+0x60/0x60 [<b01c3aa5>] __blkdev_get+0x265/0x310 [<b01c2ee7>] ? bdget+0x37/0x100 [<b01c3a0d>] __blkdev_get+0x1cd/0x310 [<b01c3b96>] blkdev_get+0x46/0x2c0 [<b05139dd>] ? _raw_spin_unlock+0x1d/0x20 [<b01c3e65>] blkdev_open+0x55/0x70 [<b0193272>] __dentry_open+0x112/0x290 [<b0193456>] nameidata_to_filp+0x66/0x70 [<b01c3e10>] ? blkdev_get+0x2c0/0x2c0 [<b01a050d>] do_last+0x19d/0x770 [<b01a1740>] path_openat+0x90/0x310 [<b01a1acd>] do_filp_open+0x5d/0x80 [<b05139dd>] ? _raw_spin_unlock+0x1d/0x20 [<b01aec64>] ? alloc_fd+0x164/0x1b0 [<b0194591>] do_sys_open+0xf1/0x1c0 [<b01946c9>] sys_open+0x29/0x40 [<b0514197>] sysenter_do_call+0x12/0x36 ---[ end trace e94a40ecc7f65c23 ]--- -- Sitsofe | http://sucs.org/~sits/ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() 2011-04-15 21:41 ` Sitsofe Wheeler @ 2011-04-21 17:38 ` Tejun Heo 2011-04-21 18:43 ` Tejun Heo 0 siblings, 1 reply; 14+ messages in thread From: Tejun Heo @ 2011-04-21 17:38 UTC (permalink / raw) To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel Hello, On Fri, Apr 15, 2011 at 10:41:40PM +0100, Sitsofe Wheeler wrote: > I couldn't reproduce the problem with the very latest kernel but here's > the output that appeared with an older kernel with your patch: > > ------------[ cut here ]------------ > WARNING: at block/genhd.c:1561 disk_clear_events+0x164/0x170() Hmmm... that's really weird. It indicates that the workqueue command sequence there doesn't guarantee scheduling and completion of the work item, which means either there's a bug in workqueue itself or I'm confused about what's going on. Hopefully, the latter. Unfortunately, even with physical setup, I'm having problem reproducing the problem. Which kernel version are you using exactly? Can you please let me know the commit ID? Thanks. -- tejun ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() 2011-04-21 17:38 ` Tejun Heo @ 2011-04-21 18:43 ` Tejun Heo 2011-05-10 16:48 ` Tejun Heo 0 siblings, 1 reply; 14+ messages in thread From: Tejun Heo @ 2011-04-21 18:43 UTC (permalink / raw) To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel On Thu, Apr 21, 2011 at 07:38:45PM +0200, Tejun Heo wrote: > Hello, > > On Fri, Apr 15, 2011 at 10:41:40PM +0100, Sitsofe Wheeler wrote: > > I couldn't reproduce the problem with the very latest kernel but here's > > the output that appeared with an older kernel with your patch: > > > > ------------[ cut here ]------------ > > WARNING: at block/genhd.c:1561 disk_clear_events+0x164/0x170() > > Hmmm... that's really weird. It indicates that the workqueue command > sequence there doesn't guarantee scheduling and completion of the work > item, which means either there's a bug in workqueue itself or I'm > confused about what's going on. Hopefully, the latter. > Unfortunately, even with physical setup, I'm having problem > reproducing the problem. Which kernel version are you using exactly? > Can you please let me know the commit ID? Also, please try the following patch. It should show what's going on with the flush machinery. Thanks. diff --git a/block/genhd.c b/block/genhd.c index b364bd0..0add6d5 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -1377,6 +1377,8 @@ struct disk_events { long poll_msecs; /* interval, -1 for default */ struct delayed_work dwork; + struct mutex mutex; + atomic_t cnt; }; static const char *disk_events_strs[] = { @@ -1540,6 +1542,9 @@ unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask) return 0; } + WARN_ON_ONCE(atomic_add_return(1, &ev->cnt) != 1); + mutex_lock(&ev->mutex); + /* tell the workfn about the events being cleared */ spin_lock_irq(&ev->lock); ev->clearing |= mask; @@ -1548,7 +1553,9 @@ unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask) /* uncondtionally schedule event check and wait for it to finish */ __disk_block_events(disk, true); queue_delayed_work(system_nrt_wq, &ev->dwork, 0); - flush_delayed_work(&ev->dwork); + if (!flush_delayed_work_dbg(&ev->dwork)) + printk("disk_clear_events-%s: flush returned false\n", + disk->disk_name); __disk_unblock_events(disk, false); /* then, fetch and clear pending events */ @@ -1558,6 +1565,9 @@ unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask) ev->pending &= ~mask; spin_unlock_irq(&ev->lock); + WARN_ON_ONCE(atomic_add_return(-1, &ev->cnt) != 0); + mutex_unlock(&ev->mutex); + return pending; } @@ -1749,6 +1759,8 @@ static void disk_add_events(struct gendisk *disk) ev->block = 1; ev->poll_msecs = -1; INIT_DELAYED_WORK(&ev->dwork, disk_events_workfn); + mutex_init(&ev->mutex); + atomic_set(&ev->cnt, 0); mutex_lock(&disk_events_mutex); list_add_tail(&ev->node, &disk_events); diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index f584aba..eade076 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -372,6 +372,7 @@ extern bool flush_work_sync(struct work_struct *work); extern bool cancel_work_sync(struct work_struct *work); extern bool flush_delayed_work(struct delayed_work *dwork); +extern bool flush_delayed_work_dbg(struct delayed_work *dwork); extern bool flush_delayed_work_sync(struct delayed_work *work); extern bool cancel_delayed_work_sync(struct delayed_work *dwork); diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 8859a41..20ac7d1 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -2376,7 +2376,7 @@ out_unlock: EXPORT_SYMBOL_GPL(flush_workqueue); static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr, - bool wait_executing) + bool wait_executing, bool dbg) { struct worker *worker = NULL; struct global_cwq *gcwq; @@ -2384,8 +2384,11 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr, might_sleep(); gcwq = get_work_gcwq(work); - if (!gcwq) + if (!gcwq) { + if (dbg) + printk("flush_work-%p: doesn't have gcwq set\n", work); return false; + } spin_lock_irq(&gcwq->lock); if (!list_empty(&work->entry)) { @@ -2396,15 +2399,26 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr, */ smp_rmb(); cwq = get_work_cwq(work); - if (unlikely(!cwq || gcwq != cwq->gcwq)) + if (unlikely(!cwq || gcwq != cwq->gcwq)) { + if (dbg) + printk("flush_work-%p: cwq=%p gcwq@%d cwq->gcwq@%d\n", + work, cwq, gcwq->cpu, cwq ? cwq->gcwq->cpu : -1); goto already_gone; + } } else if (wait_executing) { worker = find_worker_executing_work(gcwq, work); - if (!worker) + if (!worker) { + if (dbg) + printk("flush_work-%p: wait_executing but not executing on %d\n", + work, gcwq->cpu); goto already_gone; + } cwq = worker->current_cwq; - } else + } else { + if (dbg) + printk("flush_work-%p: idle, cpu=%d\n", work, gcwq->cpu); goto already_gone; + } insert_wq_barrier(cwq, barr, work, worker); spin_unlock_irq(&gcwq->lock); @@ -2449,7 +2463,7 @@ bool flush_work(struct work_struct *work) { struct wq_barrier barr; - if (start_flush_work(work, &barr, true)) { + if (start_flush_work(work, &barr, true, false)) { wait_for_completion(&barr.done); destroy_work_on_stack(&barr.work); return true; @@ -2458,6 +2472,18 @@ bool flush_work(struct work_struct *work) } EXPORT_SYMBOL_GPL(flush_work); +static bool flush_work_dbg(struct work_struct *work) +{ + struct wq_barrier barr; + + if (start_flush_work(work, &barr, true, true)) { + wait_for_completion(&barr.done); + destroy_work_on_stack(&barr.work); + return true; + } else + return false; +} + static bool wait_on_cpu_work(struct global_cwq *gcwq, struct work_struct *work) { struct wq_barrier barr; @@ -2514,7 +2540,7 @@ bool flush_work_sync(struct work_struct *work) bool pending, waited; /* we'll wait for executions separately, queue barr only if pending */ - pending = start_flush_work(work, &barr, false); + pending = start_flush_work(work, &barr, false, false); /* wait for executions to finish */ waited = wait_on_work(work); @@ -2632,6 +2658,15 @@ bool flush_delayed_work(struct delayed_work *dwork) } EXPORT_SYMBOL(flush_delayed_work); +bool flush_delayed_work_dbg(struct delayed_work *dwork) +{ + if (del_timer_sync(&dwork->timer)) + __queue_work(raw_smp_processor_id(), + get_work_cwq(&dwork->work)->wq, &dwork->work); + return flush_work_dbg(&dwork->work); +} +EXPORT_SYMBOL(flush_delayed_work_dbg); + /** * flush_delayed_work_sync - wait for a dwork to finish * @dwork: the delayed work to flush ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() 2011-04-21 18:43 ` Tejun Heo @ 2011-05-10 16:48 ` Tejun Heo 0 siblings, 0 replies; 14+ messages in thread From: Tejun Heo @ 2011-05-10 16:48 UTC (permalink / raw) To: Sitsofe Wheeler; +Cc: Jens Axboe, linux-kernel On Thu, Apr 21, 2011 at 08:43:04PM +0200, Tejun Heo wrote: > On Thu, Apr 21, 2011 at 07:38:45PM +0200, Tejun Heo wrote: > > Hello, > > > > On Fri, Apr 15, 2011 at 10:41:40PM +0100, Sitsofe Wheeler wrote: > > > I couldn't reproduce the problem with the very latest kernel but here's > > > the output that appeared with an older kernel with your patch: > > > > > > ------------[ cut here ]------------ > > > WARNING: at block/genhd.c:1561 disk_clear_events+0x164/0x170() > > > > Hmmm... that's really weird. It indicates that the workqueue command > > sequence there doesn't guarantee scheduling and completion of the work > > item, which means either there's a bug in workqueue itself or I'm > > confused about what's going on. Hopefully, the latter. > > Unfortunately, even with physical setup, I'm having problem > > reproducing the problem. Which kernel version are you using exactly? > > Can you please let me know the commit ID? > > Also, please try the following patch. It should show what's going on > with the flush machinery. Never mind. I now have a reliably reproducible test case. Can you please add yourself to bug 34662? Let's track this issue there. https://bugzilla.kernel.org/show_bug.cgi?id=34662 Thanks. -- tejun ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2011-05-10 16:48 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-04-04 21:16 WARNING: at block/genhd.c:1556 disk_clear_events+0xdc/0xf0() Sitsofe Wheeler 2011-04-05 1:51 ` Jens Axboe 2011-04-06 13:04 ` Tejun Heo 2011-04-06 21:26 ` Sitsofe Wheeler 2011-04-06 21:37 ` Tejun Heo 2011-04-09 9:34 ` Sitsofe Wheeler 2011-04-10 14:59 ` Sitsofe Wheeler 2011-04-13 20:00 ` Tejun Heo 2011-04-15 9:01 ` Tejun Heo 2011-04-15 18:20 ` Tejun Heo 2011-04-15 21:41 ` Sitsofe Wheeler 2011-04-21 17:38 ` Tejun Heo 2011-04-21 18:43 ` Tejun Heo 2011-05-10 16:48 ` Tejun Heo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).