public inbox for cgroups@vger.kernel.org
* Race condition between "read CFQ stats" and "block device shutdown"
@ 2013-09-03 20:14 Anatol Pomozov
From: Anatol Pomozov @ 2013-09-03 20:14 UTC
  To: Cgroups, Tejun Heo, Hannes Reinecke, Jens Axboe

Hi,

I am running a program that checks "read CFQ stat files" for race
conditions with other events (e.g. device shutdown).

And I discovered an interesting bug. Here is the "double_unlock" crash for it:


print_unlock_imbalance_bug.isra.23+0x4/0x10
[ 261.453775] [<ffffffff810f7c65>] lock_release_non_nested.isra.39+0x2f5/0x300
[ 261.460900] [<ffffffff810f7cfe>] lock_release+0x8e/0x1f0
[ 261.466293] [<ffffffff81339030>] ? cfqg_prfill_service_level+0x60/0x60
[ 261.472894] [<ffffffff81005be3>] _raw_spin_unlock_irq+0x23/0x50
[ 261.478894] [<ffffffff8133559f>] blkcg_print_blkgs+0x8f/0x140
[ 261.484724] [<ffffffff81335515>] ? blkcg_print_blkgs+0x5/0x140
[ 261.490631] [<ffffffff81338a7f>] cfqg_print_weighted_queue_time+0x2f/0x40
[ 261.497489] [<ffffffff8110b793>] cgroup_seqfile_show+0x53/0x60
[ 261.503398] [<ffffffff811f1fe4>] seq_read+0x124/0x3a0
[ 261.508529] [<ffffffff811ce39d>] vfs_read+0xad/0x180
[ 261.513576] [<ffffffff811ce625>] SyS_read+0x55/0xa0
[ 261.518538] [<ffffffff81609f66>] cstar_dispatch+0x7/0x1f

blkcg_print_blkgs() fails with a double unlock? Hmm, I checked
cfqg_prfill_service_level() and did not find any place where an unlock
can happen.

After some debugging I found that in blkcg_print_blkgs() the spinlock
passed to spin_lock_irq() differs from the one passed to
spin_unlock_irq() just a few lines below. It means that the
request_queue->queue_lock pointer changed under the function's feet
while it was executing!
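
To illustrate the failure mode described above, here is a minimal
single-threaded userspace sketch. It is not kernel code: `toy_lock`,
`toy_queue`, `toy_shutdown()` and the two `print_*` helpers are
hypothetical names invented for this illustration, and the concurrent
shutdown is simulated by a deterministic call between lock and unlock.
It only shows why re-reading a lock pointer for the unlock can release
a lock that was never taken.

```c
#include <assert.h>

/* Toy stand-in for spinlock_t: tracks whether the lock is held. */
struct toy_lock {
    const char *name;
    int held;                      /* 1 while locked, 0 otherwise */
};

/* Toy stand-in for a request queue: the lock is reached via a pointer. */
struct toy_queue {
    struct toy_lock *queue_lock;   /* may be repointed during shutdown */
    struct toy_lock driver_lock;   /* assumed: lock supplied by a driver */
    struct toy_lock internal_lock; /* assumed: lock embedded in the queue */
};

static void toy_queue_init(struct toy_queue *q)
{
    q->driver_lock.name = "driver_lock";
    q->driver_lock.held = 0;
    q->internal_lock.name = "internal_lock";
    q->internal_lock.held = 0;
    q->queue_lock = &q->driver_lock;
}

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(l->held == 0);          /* model a non-recursive lock */
    l->held = 1;
}

/* Returns 0 on a balanced release, -1 if the lock was not held. */
static int toy_lock_release(struct toy_lock *l)
{
    if (l->held == 0)
        return -1;                 /* the "unlock imbalance" case */
    l->held = 0;
    return 0;
}

/* Simulated shutdown: switch the queue over to its internal lock. */
static void toy_shutdown(struct toy_queue *q)
{
    q->queue_lock = &q->internal_lock;
}

/* Re-reads q->queue_lock for the unlock, as in the reported code path. */
static int print_rereading_pointer(struct toy_queue *q, int shutdown_between)
{
    toy_lock_acquire(q->queue_lock);
    if (shutdown_between)
        toy_shutdown(q);           /* stands in for the racing shutdown */
    return toy_lock_release(q->queue_lock);
}

/* Snapshots the pointer once, so lock and unlock hit the same object. */
static int print_with_cached_pointer(struct toy_queue *q, int shutdown_between)
{
    struct toy_lock *lock = q->queue_lock;

    toy_lock_acquire(lock);
    if (shutdown_between)
        toy_shutdown(q);
    return toy_lock_release(lock);
}
```

In this toy model, `print_rereading_pointer()` with a shutdown in
between releases `internal_lock` (which was never taken) and leaves
`driver_lock` held, while `print_with_cached_pointer()` stays balanced.
Caching the pointer only removes the imbalance in the model; it says
nothing about whether the old lock is still valid after shutdown.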

To make sure, I added this debugging patch:

--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -465,10 +465,16 @@ void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg,

        rcu_read_lock();
        hlist_for_each_entry_rcu(blkg, n, &blkcg->blkg_list, blkcg_node) {
-               spin_lock_irq(blkg->q->queue_lock);
+               spinlock_t *lock = blkg->q->queue_lock;
+               spinlock_t *new_lock;
+               spin_lock_irq(lock);
                if (blkcg_policy_enabled(blkg->q, pol))
                        total += prfill(sf, blkg->pd[pol->plid], data);
-               spin_unlock_irq(blkg->q->queue_lock);
+               new_lock = blkg->q->queue_lock;
+               if (lock != new_lock) {
+                       pr_err("old lock %p %s  new lock %p %s\n", lock, lock->dep_map.name, new_lock, new_lock->dep_map.name);
+               }
+               spin_unlock_irq(lock);
        }
        rcu_read_unlock();



And indeed, it shows that the two locks are different.


It comes from change 777eb1bf1 "block: Free queue resources at
blk_release_queue()", which changes the lock when a device is shutting down.

What would be the best fix for the issue?


Thread overview: 12+ messages
2013-09-03 20:14 Race condition between "read CFQ stats" and "block device shutdown" Anatol Pomozov
2013-09-04  6:42   ` Hannes Reinecke
2013-09-04 15:45       ` Anatol Pomozov
2013-09-04 16:07           ` Tejun Heo
2013-09-25 20:37               ` Anatol Pomozov
2013-09-26 13:54                 ` Tejun Heo
2013-09-26 14:18                   ` Hannes Reinecke
2013-09-26 14:20                     ` Tejun Heo
2013-09-26 16:23                       ` Anatol Pomozov
2013-09-26 16:30                         ` Tejun Heo
2013-09-27  5:59                           ` Hannes Reinecke
2013-09-04 16:15       ` Anatol Pomozov
