* blk_throtl_exit taking q->queue_lock is problematic @ 2011-02-16 7:31 NeilBrown 2011-02-16 15:53 ` Vivek Goyal 2011-02-17 20:00 ` Vivek Goyal 0 siblings, 2 replies; 17+ messages in thread From: NeilBrown @ 2011-02-16 7:31 UTC (permalink / raw) To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel Hi, I recently discovered that blk_throtl_exit takes ->queue_lock when a blockdev is finally released. This is a problem because by that time the queue_lock doesn't exist any more. It is in a separate data structure controlled by the RAID personality, and by the time the block device is being destroyed the RAID personality has shut down and the data structure containing the lock has been freed. This has not been a problem before. Nothing else takes queue_lock after blk_cleanup_queue. I could of course set queue_lock to point to __queue_lock and initialise that, but it seems untidy and probably violates some locking requirements. Is there some way you could use some other lock - maybe a global lock, or maybe use __queue_lock directly? Thanks, NeilBrown ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic 2011-02-16 7:31 blk_throtl_exit taking q->queue_lock is problematic NeilBrown @ 2011-02-16 15:53 ` Vivek Goyal 2011-02-17 0:35 ` NeilBrown 2011-02-17 20:00 ` Vivek Goyal 1 sibling, 1 reply; 17+ messages in thread From: Vivek Goyal @ 2011-02-16 15:53 UTC (permalink / raw) To: NeilBrown; +Cc: Jens Axboe, linux-kernel On Wed, Feb 16, 2011 at 06:31:14PM +1100, NeilBrown wrote: > > > Hi, > > I recently discovered that blk_throtl_exit takes ->queue_lock when a blockdev > is finally released. > > This is a problem for because by that time the queue_lock doesn't exist any > more. It is in a separate data structure controlled by the RAID personality > and by the time that the block device is being destroyed the raid personality > has shutdown and the data structure containing the lock has been freed. > > This has not been a problem before. Nothing else takes queue_lock after > blk_cleanup_queue. I agree that this is a problem. blk_throtl_exit() needs the queue lock to avoid races with the cgroup code and to protect its lists etc. > > I could of course set queue_lock to point to __queue_lock and initialise that, > but it seems untidy and probably violates some locking requirements. > > Is there some way you could use some other lock - maybe a global lock, or > maybe used __queue_lock directly ??? Initially I had put blk_throtl_exit() in blk_cleanup_queue(), where it is known that ->queue_lock is still around. Due to a bug, Jens moved it to blk_release_queue(). I still think that blk_cleanup_queue() is a better place to call blk_throtl_exit(). I think the following patch should solve the issue. This patch is also not completely race free. I was wondering whether we can get rid of the throtl_shutdown_timer_wq() call in blk_sync_queue(). IOW, in what circumstances is blk_sync_queue() used? Thanks Vivek o Move blk_throtl_exit() into blk_cleanup_queue(), as blk_throtl_exit() is written in such a way that it needs the queue lock. 
In blk_release_queue() there is no guarantee that ->queue_lock is still around. o Initially blk_throtl_exit() was in blk_cleanup_queue() but Ingo reported one problem. https://lkml.org/lkml/2010/10/23/86 And a quick fix moved blk_throtl_exit() to blk_release_queue(). commit 7ad58c028652753814054f4e3ac58f925e7343f4 Author: Jens Axboe <jaxboe@fusionio.com> Date: Sat Oct 23 20:40:26 2010 +0200 block: fix use-after-free bug in blk throttle code o This patch reverts the above change and instead checks for q->td in throtl_shutdown_timer_wq(). o This is also not completely race free, as the check for q->td is done without the spinlock, and we can't take the spinlock here as it is called from blk_release_queue->blk_sync_queue() where ->queue_lock might have gone away. o So the question is: should we really call throtl_shutdown_timer_wq() from blk_sync_queue()? It might not make much sense because there might be queued bios in the throttling logic. The only way to clean up all bios and cancel all async activity is blk_throtl_exit(). I also don't see it being called to cancel async activity for CFQ. Who makes sure that async activity is cancelled? IOW, I am wondering in what circumstances blk_sync_queue() is called, and whether it is required to call throtl_shutdown_timer_wq() from blk_sync_queue(). If we can get rid of it, then we have taken care of all the races, AFAIK. 
Signed-off-by: Vivek Goyal <vgoyal@redhat.com> --- block/blk-core.c | 2 ++ block/blk-sysfs.c | 2 -- block/blk-throttle.c | 6 ++++++ 3 files changed, 8 insertions(+), 2 deletions(-) Index: linux-2.6/block/blk-core.c =================================================================== --- linux-2.6.orig/block/blk-core.c 2011-02-14 17:43:06.000000000 -0500 +++ linux-2.6/block/blk-core.c 2011-02-16 10:11:58.910022185 -0500 @@ -474,6 +474,8 @@ void blk_cleanup_queue(struct request_qu if (q->elevator) elevator_exit(q->elevator); + blk_throtl_exit(q); + blk_put_queue(q); } EXPORT_SYMBOL(blk_cleanup_queue); Index: linux-2.6/block/blk-sysfs.c =================================================================== --- linux-2.6.orig/block/blk-sysfs.c 2011-02-11 09:25:16.000000000 -0500 +++ linux-2.6/block/blk-sysfs.c 2011-02-16 10:12:16.379762988 -0500 @@ -471,8 +471,6 @@ static void blk_release_queue(struct kob blk_sync_queue(q); - blk_throtl_exit(q); - if (rl->rq_pool) mempool_destroy(rl->rq_pool); Index: linux-2.6/block/blk-throttle.c =================================================================== --- linux-2.6.orig/block/blk-throttle.c 2011-02-16 10:08:12.000000000 -0500 +++ linux-2.6/block/blk-throttle.c 2011-02-16 10:45:18.006119406 -0500 @@ -961,6 +961,9 @@ void throtl_shutdown_timer_wq(struct req { struct throtl_data *td = q->td; + if (!td) + return; + cancel_delayed_work_sync(&td->throtl_work); } @@ -1122,6 +1125,9 @@ void blk_throtl_exit(struct request_queu * it. */ throtl_shutdown_timer_wq(q); + + /* Decouple throtl data from queue. */ + q->td = NULL; throtl_td_free(td); } ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic 2011-02-16 15:53 ` Vivek Goyal @ 2011-02-17 0:35 ` NeilBrown 2011-02-17 1:10 ` Vivek Goyal 0 siblings, 1 reply; 17+ messages in thread From: NeilBrown @ 2011-02-17 0:35 UTC (permalink / raw) To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel On Wed, 16 Feb 2011 10:53:05 -0500 Vivek Goyal <vgoyal@redhat.com> wrote: > On Wed, Feb 16, 2011 at 06:31:14PM +1100, NeilBrown wrote: > > > > > > Hi, > > > > I recently discovered that blk_throtl_exit takes ->queue_lock when a blockdev > > is finally released. > > > > This is a problem for because by that time the queue_lock doesn't exist any > > more. It is in a separate data structure controlled by the RAID personality > > and by the time that the block device is being destroyed the raid personality > > has shutdown and the data structure containing the lock has been freed. > > > > This has not been a problem before. Nothing else takes queue_lock after > > blk_cleanup_queue. > > I agree that this is a problem. blk_throtl_exit() needs queue lock to > avoid other races with cgroup code and for avoiding races for its > lists etc. > > > > > I could of course set queue_lock to point to __queue_lock and initialise that, > > but it seems untidy and probably violates some locking requirements. > > > > Is there some way you could use some other lock - maybe a global lock, or > > maybe used __queue_lock directly ??? > > Initially I had put blk_throtl_exit() in blk_cleanup_queue() where it is > known that ->queue_lock is still around. Due to a bug, Jens moved it > to blk_release_queue(). I still think that blk_cleanup_queue() is a better > place to call blk_throtl_exit(). Why do you say that it is known that ->queue_lock is still around in blk_cleanup_queue? In md it isn't. :-( Is there some (other) reason that it needs to be? Thanks, NeilBrown ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic 2011-02-17 0:35 ` NeilBrown @ 2011-02-17 1:10 ` Vivek Goyal 2011-02-17 5:55 ` NeilBrown 0 siblings, 1 reply; 17+ messages in thread From: Vivek Goyal @ 2011-02-17 1:10 UTC (permalink / raw) To: NeilBrown; +Cc: Jens Axboe, linux-kernel On Thu, Feb 17, 2011 at 11:35:36AM +1100, NeilBrown wrote: > On Wed, 16 Feb 2011 10:53:05 -0500 Vivek Goyal <vgoyal@redhat.com> wrote: > > > On Wed, Feb 16, 2011 at 06:31:14PM +1100, NeilBrown wrote: > > > > > > > > > Hi, > > > > > > I recently discovered that blk_throtl_exit takes ->queue_lock when a blockdev > > > is finally released. > > > > > > This is a problem for because by that time the queue_lock doesn't exist any > > > more. It is in a separate data structure controlled by the RAID personality > > > and by the time that the block device is being destroyed the raid personality > > > has shutdown and the data structure containing the lock has been freed. > > > > > > This has not been a problem before. Nothing else takes queue_lock after > > > blk_cleanup_queue. > > > > I agree that this is a problem. blk_throtl_exit() needs queue lock to > > avoid other races with cgroup code and for avoiding races for its > > lists etc. > > > > > > > > I could of course set queue_lock to point to __queue_lock and initialise that, > > > but it seems untidy and probably violates some locking requirements. > > > > > > Is there some way you could use some other lock - maybe a global lock, or > > > maybe used __queue_lock directly ??? > > > > Initially I had put blk_throtl_exit() in blk_cleanup_queue() where it is > > known that ->queue_lock is still around. Due to a bug, Jens moved it > > to blk_release_queue(). I still think that blk_cleanup_queue() is a better > > place to call blk_throtl_exit(). > > Why do you say that it is known that ->queue_lock is still around in > blk_cleanup_queue? In md it isn't. :-( > Is there some (other) reason that it needs to be? 
I think this is only true for devices that have an elevator, because elevator_exit() will call cfq_exit_queue() and take the queue lock. So request-based multipath devices should still have it initialized at that point. But yes, for devices not running an elevator, there does not seem to be any other component requiring the queue lock to still be there. Like the elevator, the throttling logic has data structures which need to be cleaned up if the driver decides to clean up the queue. Until what point can we use queue->lock safely? If the driver provides a spinlock embedded in a structure, then it would make logical sense to call back into the queue at some point to say that the spinlock is going away, and clean up any dependencies. I thought blk_cleanup_queue() would be that call, but it looks like that is not true in all cases. So is it possible to keep the spinlock intact when md calls blk_cleanup_queue()? Thanks Vivek ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic 2011-02-17 1:10 ` Vivek Goyal @ 2011-02-17 5:55 ` NeilBrown 2011-02-17 15:01 ` Vivek Goyal 2011-02-17 16:59 ` Vivek Goyal 0 siblings, 2 replies; 17+ messages in thread From: NeilBrown @ 2011-02-17 5:55 UTC (permalink / raw) To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel On Wed, 16 Feb 2011 20:10:29 -0500 Vivek Goyal <vgoyal@redhat.com> wrote: > So is it possible to keep the spinlock intact when md is calling up > blk_cleanup_queue()? > It would be possible, yes - but messy. I would probably end up just making ->queue_lock always point to __queue_lock, and then only take it at the places where I call 'block' code which wants to test to see if it is currently held (like the plugging routines). The queue lock (and most of the request queue) is simply irrelevant for md. I would prefer to get away from having to touch it at all... I'll see how messy it would be to stop using it completely and it can just be __queue_lock. Though for me - it would be much easier if you just used __queue_lock ..... NeilBrown ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic 2011-02-17 5:55 ` NeilBrown @ 2011-02-17 15:01 ` Vivek Goyal 2011-02-17 16:59 ` Vivek Goyal 1 sibling, 0 replies; 17+ messages in thread From: Vivek Goyal @ 2011-02-17 15:01 UTC (permalink / raw) To: NeilBrown; +Cc: Jens Axboe, linux-kernel On Thu, Feb 17, 2011 at 04:55:01PM +1100, NeilBrown wrote: > On Wed, 16 Feb 2011 20:10:29 -0500 Vivek Goyal <vgoyal@redhat.com> wrote: > > > So is it possible to keep the spinlock intact when md is calling up > > blk_cleanup_queue()? > > > > It would be possible, yes - but messy. I would probably end up just making > ->queue_lock always point to __queue_lock, and then only take it at the > places where I call 'block' code which wants to test to see if it is > currently held (like the plugging routines). > > The queue lock (and most of the request queue) is simply irrelevant for md. > I would prefer to get away from having to touch it at all... > > I'll see how messy it would be to stop using it completely and it can just be > __queue_lock. > > Though for me - it would be much easier if you just used __queue_lock ..... OK, thinking more about it: how about introducing another spinlock in the request queue, say q->throtl_lock? I seem to be using the queue lock for synchronization between blk-cgroup and the throttling code, and also for the integrity of the throttling data structures when multiple threads are operating. Looks like I might be able to get away with using throtl_lock and not rely on queue_lock at all. I want to avoid using __queue_lock for two reasons. - It is not clear when we are actually sharing the lock with the driver and when we are not. In the future, if there are interactions with other code on the request queue, then it will be confusing. If we introduce throtl_lock, then for additional queue synchronization, one can take q->queue_lock also. - Splitting the queue lock should probably be good as it will reduce the contention on q->queue_lock. 
If it is beneficial, then we should not use __queue_lock, so that we actually reduce the contention. I will make this change and see whether it works or not. Thanks Vivek ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic 2011-02-17 5:55 ` NeilBrown 2011-02-17 15:01 ` Vivek Goyal @ 2011-02-17 16:59 ` Vivek Goyal 2011-02-18 2:40 ` NeilBrown 1 sibling, 1 reply; 17+ messages in thread From: Vivek Goyal @ 2011-02-17 16:59 UTC (permalink / raw) To: NeilBrown; +Cc: Jens Axboe, linux-kernel On Thu, Feb 17, 2011 at 04:55:01PM +1100, NeilBrown wrote: > On Wed, 16 Feb 2011 20:10:29 -0500 Vivek Goyal <vgoyal@redhat.com> wrote: > > > So is it possible to keep the spinlock intact when md is calling up > > blk_cleanup_queue()? > > > > It would be possible, yes - but messy. I would probably end up just making > ->queue_lock always point to __queue_lock, and then only take it at the > places where I call 'block' code which wants to test to see if it is > currently held (like the plugging routines). > > The queue lock (and most of the request queue) is simply irrelevant for md. > I would prefer to get away from having to touch it at all... If the queue lock is mostly irrelevant for md, then why should md provide an external lock instead of using the queue's internal spinlock? > > I'll see how messy it would be to stop using it completely and it can just be > __queue_lock. > > Though for me - it would be much easier if you just used __queue_lock ..... OK, here is a simple patch which splits the queue lock and uses throtl_lock for the throttling logic. I booted and it seems to be working. Having said that, this now introduces the possibility of races for any services I take from the request queue. I need to check whether I need to take the queue lock, and that makes me a little uncomfortable. I am using kblockd_workqueue to queue throtl work. Looks like I don't need the queue lock for that. I am also using the block tracing infrastructure, and my understanding is that I don't need the queue lock for that as well. 
If we make this change for performance reasons, it still makes sense; but making it only because md provided a q->queue_lock and then took that lock away without notifying the block layer is still not the right reason, IMHO. Thanks Vivek Yet-to-be-Signed-off-by: Vivek Goyal <vgoyal@redhat.com> --- block/blk-core.c | 1 + block/blk-throttle.c | 16 ++++++++-------- include/linux/blkdev.h | 3 +++ 3 files changed, 12 insertions(+), 8 deletions(-) Index: linux-2.6/include/linux/blkdev.h =================================================================== --- linux-2.6.orig/include/linux/blkdev.h 2011-02-14 17:43:07.000000000 -0500 +++ linux-2.6/include/linux/blkdev.h 2011-02-17 10:08:35.400922999 -0500 @@ -320,6 +320,9 @@ struct request_queue spinlock_t __queue_lock; spinlock_t *queue_lock; + /* Throttling lock. Protects throttling data structures on queue */ + spinlock_t throtl_lock; + /* * queue kobject */ Index: linux-2.6/block/blk-core.c =================================================================== --- linux-2.6.orig/block/blk-core.c 2011-02-16 13:07:30.000000000 -0500 +++ linux-2.6/block/blk-core.c 2011-02-17 10:09:44.236884133 -0500 @@ -547,6 +547,7 @@ struct request_queue *blk_alloc_queue_no mutex_init(&q->sysfs_lock); spin_lock_init(&q->__queue_lock); + spin_lock_init(&q->throtl_lock); return q; } Index: linux-2.6/block/blk-throttle.c =================================================================== --- linux-2.6.orig/block/blk-throttle.c 2011-02-17 10:01:47.000000000 -0500 +++ linux-2.6/block/blk-throttle.c 2011-02-17 10:12:51.949128895 -0500 @@ -763,7 +763,7 @@ static int throtl_dispatch(struct reques struct bio_list bio_list_on_stack; struct bio *bio; - spin_lock_irq(q->queue_lock); + spin_lock_irq(&q->throtl_lock); throtl_process_limit_change(td); @@ -783,7 +783,7 @@ static int throtl_dispatch(struct reques throtl_schedule_next_dispatch(td); out: - spin_unlock_irq(q->queue_lock); + spin_unlock_irq(&q->throtl_lock); /* * If we 
dispatched some requests, unplug the queue to make sure @@ -882,9 +882,9 @@ void throtl_unlink_blkio_group(void *key unsigned long flags; struct throtl_data *td = key; - spin_lock_irqsave(td->queue->queue_lock, flags); + spin_lock_irqsave(&td->queue->throtl_lock, flags); throtl_destroy_tg(td, tg_of_blkg(blkg)); - spin_unlock_irqrestore(td->queue->queue_lock, flags); + spin_unlock_irqrestore(&td->queue->throtl_lock, flags); } /* @@ -991,7 +991,7 @@ int blk_throtl_bio(struct request_queue return 0; } - spin_lock_irq(q->queue_lock); + spin_lock_irq(&q->throtl_lock); tg = throtl_get_tg(td); if (tg->nr_queued[rw]) { @@ -1032,7 +1032,7 @@ queue_bio: } out: - spin_unlock_irq(q->queue_lock); + spin_unlock_irq(&q->throtl_lock); return 0; } @@ -1093,14 +1093,14 @@ void blk_throtl_exit(struct request_queu throtl_shutdown_timer_wq(q); - spin_lock_irq(q->queue_lock); + spin_lock_irq(&q->throtl_lock); throtl_release_tgs(td); /* If there are other groups */ if (td->nr_undestroyed_grps > 0) wait = true; - spin_unlock_irq(q->queue_lock); + spin_unlock_irq(&q->throtl_lock); /* * Wait for tg->blkg->key accessors to exit their grace periods. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic 2011-02-17 16:59 ` Vivek Goyal @ 2011-02-18 2:40 ` NeilBrown 2011-02-18 3:19 ` Mike Snitzer 2011-02-18 15:05 ` Vivek Goyal 0 siblings, 2 replies; 17+ messages in thread From: NeilBrown @ 2011-02-18 2:40 UTC (permalink / raw) To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel On Thu, 17 Feb 2011 11:59:06 -0500 Vivek Goyal <vgoyal@redhat.com> wrote: > On Thu, Feb 17, 2011 at 04:55:01PM +1100, NeilBrown wrote: > > On Wed, 16 Feb 2011 20:10:29 -0500 Vivek Goyal <vgoyal@redhat.com> wrote: > > > > > So is it possible to keep the spinlock intact when md is calling up > > > blk_cleanup_queue()? > > > > > > > It would be possible, yes - but messy. I would probably end up just making > > ->queue_lock always point to __queue_lock, and then only take it at the > > places where I call 'block' code which wants to test to see if it is > > currently held (like the plugging routines). > > > > The queue lock (and most of the request queue) is simply irrelevant for md. > > I would prefer to get away from having to touch it at all... > > If queue lock is mostly irrelevant for md, then why md should provide an > external lock and not use queue's internal spin lock? See other email - historical reasons mostly. > > > > > I'll see how messy it would be to stop using it completely and it can just be > > __queue_lock. > > > > Though for me - it would be much easier if you just used __queue_lock ..... > > Ok, here is the simple patch which splits the queue lock and uses > throtl_lock for throttling logic. I booted and it seems to be working. > > Having said that, this now introduces the possibility of races for any > services I take from request queue. I need to see if I need to take > queue lock and that makes me little uncomfortable. > > I am using kblockd_workqueue to queue throtl work. Looks like I don't > need queue lock for that. 
I am also using block tracing infrastructure > and my understanding is that I don't need queue lock for that as well. > > So if we do this change for performance reasons, it still makes sense > but doing this change because md provided a q->queue_lock and took away that > lock without notifying block layer hence we do this change, is still not > the right reason, IMHO. Well...I like that patch, as it makes my life easier.... But I agree that md is doing something wrong. Now that ->queue_lock is always initialised, it is wrong to leave it in a state where it is not defined. So maybe I'll apply this (after testing it a bit). The only reason for taking queue_lock in a couple of places is to silence some warnings. Thanks, NeilBrown diff --git a/drivers/md/linear.c b/drivers/md/linear.c index 8a2f767..0ed7f6b 100644 --- a/drivers/md/linear.c +++ b/drivers/md/linear.c @@ -216,7 +216,6 @@ static int linear_run (mddev_t *mddev) if (md_check_no_bitmap(mddev)) return -EINVAL; - mddev->queue->queue_lock = &mddev->queue->__queue_lock; conf = linear_conf(mddev, mddev->raid_disks); if (!conf) diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c index 6d7ddf3..3a62d44 100644 --- a/drivers/md/multipath.c +++ b/drivers/md/multipath.c @@ -435,7 +435,6 @@ static int multipath_run (mddev_t *mddev) * bookkeeping area. 
[whatever we allocate in multipath_run(), * should be freed in multipath_stop()] */ - mddev->queue->queue_lock = &mddev->queue->__queue_lock; conf = kzalloc(sizeof(multipath_conf_t), GFP_KERNEL); mddev->private = conf; diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c index 75671df..c0ac457 100644 --- a/drivers/md/raid0.c +++ b/drivers/md/raid0.c @@ -361,7 +361,6 @@ static int raid0_run(mddev_t *mddev) if (md_check_no_bitmap(mddev)) return -EINVAL; blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors); - mddev->queue->queue_lock = &mddev->queue->__queue_lock; /* if private is not null, we are here after takeover */ if (mddev->private == NULL) { diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index a23ffa3..909282d 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -593,7 +593,9 @@ static int flush_pending_writes(conf_t *conf) if (conf->pending_bio_list.head) { struct bio *bio; bio = bio_list_get(&conf->pending_bio_list); + spin_lock(conf->mddev->queue->queue_lock); blk_remove_plug(conf->mddev->queue); + spin_unlock(conf->mddev->queue->queue_lock); spin_unlock_irq(&conf->device_lock); /* flush any pending bitmap writes to * disk before proceeding w/ I/O */ @@ -959,7 +961,9 @@ static int make_request(mddev_t *mddev, struct bio * bio) atomic_inc(&r1_bio->remaining); spin_lock_irqsave(&conf->device_lock, flags); bio_list_add(&conf->pending_bio_list, mbio); + spin_lock(mddev->queue->queue_lock); blk_plug_device(mddev->queue); + spin_unlock(mddev->queue->queue_lock); spin_unlock_irqrestore(&conf->device_lock, flags); } r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL); @@ -2021,7 +2025,6 @@ static int run(mddev_t *mddev) if (IS_ERR(conf)) return PTR_ERR(conf); - mddev->queue->queue_lock = &conf->device_lock; list_for_each_entry(rdev, &mddev->disks, same_set) { disk_stack_limits(mddev->gendisk, rdev->bdev, rdev->data_offset << 9); diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 3b607b2..60e6cb1 100644 --- 
a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -662,7 +662,9 @@ static int flush_pending_writes(conf_t *conf) if (conf->pending_bio_list.head) { struct bio *bio; bio = bio_list_get(&conf->pending_bio_list); + spin_lock(conf->mddev->queue->queue_lock); blk_remove_plug(conf->mddev->queue); + spin_unlock(conf->mddev->queue->queue_lock); spin_unlock_irq(&conf->device_lock); /* flush any pending bitmap writes to disk * before proceeding w/ I/O */ @@ -970,8 +972,10 @@ static int make_request(mddev_t *mddev, struct bio * bio) atomic_inc(&r10_bio->remaining); spin_lock_irqsave(&conf->device_lock, flags); + spin_lock(mddev->queue->queue_lock); bio_list_add(&conf->pending_bio_list, mbio); blk_plug_device(mddev->queue); + spin_unlock(mddev->queue->queue_lock); spin_unlock_irqrestore(&conf->device_lock, flags); } @@ -2304,8 +2308,6 @@ static int run(mddev_t *mddev) if (!conf) goto out; - mddev->queue->queue_lock = &conf->device_lock; - mddev->thread = conf->thread; conf->thread = NULL; diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 7028128..78536fd 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -5204,7 +5204,6 @@ static int run(mddev_t *mddev) mddev->queue->backing_dev_info.congested_data = mddev; mddev->queue->backing_dev_info.congested_fn = raid5_congested; - mddev->queue->queue_lock = &conf->device_lock; mddev->queue->unplug_fn = raid5_unplug_queue; chunk_size = mddev->chunk_sectors << 9; ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic 2011-02-18 2:40 ` NeilBrown @ 2011-02-18 3:19 ` Mike Snitzer 2011-02-18 3:33 ` NeilBrown 2011-02-18 15:05 ` Vivek Goyal 1 sibling, 1 reply; 17+ messages in thread From: Mike Snitzer @ 2011-02-18 3:19 UTC (permalink / raw) To: NeilBrown; +Cc: Vivek Goyal, Jens Axboe, linux-kernel On Thu, Feb 17, 2011 at 9:40 PM, NeilBrown <neilb@suse.de> wrote: > On Thu, 17 Feb 2011 11:59:06 -0500 Vivek Goyal <vgoyal@redhat.com> wrote: >> So if we do this change for performance reasons, it still makes sense >> but doing this change because md provided a q->queue_lock and took away that >> lock without notifying block layer hence we do this change, is still not >> the right reason, IMHO. > > Well...I like that patch, as it makes my life easier.... > > But I agree that md is doing something wrong. Now that ->queue_lock is > always initialised, it is wrong to leave it in a state where it not defined. > > So maybe I'll apply this (after testing it a bit. The only reason for taking > the lock queue_lock in a couple of places is to silence some warnings. > > Thanks, > NeilBrown > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index a23ffa3..909282d 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -959,7 +961,9 @@ static int make_request(mddev_t *mddev, struct bio * bio) > atomic_inc(&r1_bio->remaining); > spin_lock_irqsave(&conf->device_lock, flags); > bio_list_add(&conf->pending_bio_list, mbio); > + spin_lock(mddev->queue->queue_lock); > blk_plug_device(mddev->queue); > + spin_unlock(mddev->queue->queue_lock); > spin_unlock_irqrestore(&conf->device_lock, flags); > } > r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL); Noticed an inconsistency, raid10.c's additional locking also protects the bio_list_add() whereas raid1.c's doesn't. Seems the additional protection in raid10 isn't needed? 
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c > index 3b607b2..60e6cb1 100644 > --- a/drivers/md/raid10.c > +++ b/drivers/md/raid10.c > @@ -970,8 +972,10 @@ static int make_request(mddev_t *mddev, struct bio * bio) > > atomic_inc(&r10_bio->remaining); > spin_lock_irqsave(&conf->device_lock, flags); > + spin_lock(mddev->queue->queue_lock); > bio_list_add(&conf->pending_bio_list, mbio); > blk_plug_device(mddev->queue); > + spin_unlock(mddev->queue->queue_lock); > spin_unlock_irqrestore(&conf->device_lock, flags); > } > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic 2011-02-18 3:19 ` Mike Snitzer @ 2011-02-18 3:33 ` NeilBrown 2011-02-18 14:04 ` Mike Snitzer 2011-02-18 15:04 ` Vivek Goyal 0 siblings, 2 replies; 17+ messages in thread From: NeilBrown @ 2011-02-18 3:33 UTC (permalink / raw) To: Mike Snitzer; +Cc: Vivek Goyal, Jens Axboe, linux-kernel On Thu, 17 Feb 2011 22:19:52 -0500 Mike Snitzer <snitzer@redhat.com> wrote: > On Thu, Feb 17, 2011 at 9:40 PM, NeilBrown <neilb@suse.de> wrote: > > On Thu, 17 Feb 2011 11:59:06 -0500 Vivek Goyal <vgoyal@redhat.com> wrote: > >> So if we do this change for performance reasons, it still makes sense > >> but doing this change because md provided a q->queue_lock and took away that > >> lock without notifying block layer hence we do this change, is still not > >> the right reason, IMHO. > > > > Well...I like that patch, as it makes my life easier.... > > > > But I agree that md is doing something wrong. Now that ->queue_lock is > > always initialised, it is wrong to leave it in a state where it not defined. > > > > So maybe I'll apply this (after testing it a bit. The only reason for taking > > the lock queue_lock in a couple of places is to silence some warnings. 
> > > > Thanks, > > NeilBrown > > > > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > > index a23ffa3..909282d 100644 > > --- a/drivers/md/raid1.c > > +++ b/drivers/md/raid1.c > > @@ -959,7 +961,9 @@ static int make_request(mddev_t *mddev, struct bio * bio) > > atomic_inc(&r1_bio->remaining); > > spin_lock_irqsave(&conf->device_lock, flags); > > bio_list_add(&conf->pending_bio_list, mbio); > > + spin_lock(mddev->queue->queue_lock); > > blk_plug_device(mddev->queue); > > + spin_unlock(mddev->queue->queue_lock); > > spin_unlock_irqrestore(&conf->device_lock, flags); > > } > > r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL); > > Noticed an inconsistency, raid10.c's additional locking also protects > the bio_list_add() whereas raid1.c's doesn't. Seems the additional > protection in raid10 isn't needed? Correct - not needed at all. I put it there because it felt a little cleaner keeping the two 'lock's together like the two 'unlock's. Probably confusing though... My other thought is to stop using the block-layer plugging altogether like I have in RAID5 (which I needed to do to make it work with DM). Then I wouldn't need to touch queue_lock at all - very tempting. Thanks for the review. NeilBrown > > > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c > > index 3b607b2..60e6cb1 100644 > > --- a/drivers/md/raid10.c > > +++ b/drivers/md/raid10.c > > @@ -970,8 +972,10 @@ static int make_request(mddev_t *mddev, struct bio * bio) > > > > atomic_inc(&r10_bio->remaining); > > spin_lock_irqsave(&conf->device_lock, flags); > > + spin_lock(mddev->queue->queue_lock); > > bio_list_add(&conf->pending_bio_list, mbio); > > blk_plug_device(mddev->queue); > > + spin_unlock(mddev->queue->queue_lock); > > spin_unlock_irqrestore(&conf->device_lock, flags); > > } > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic
  2011-02-18  3:33 ` NeilBrown
@ 2011-02-18 14:04   ` Mike Snitzer
  2011-02-18 15:04   ` Vivek Goyal
  1 sibling, 0 replies; 17+ messages in thread
From: Mike Snitzer @ 2011-02-18 14:04 UTC (permalink / raw)
  To: NeilBrown; +Cc: Vivek Goyal, Jens Axboe, linux-kernel

On Thu, Feb 17 2011 at 10:33pm -0500,
NeilBrown <neilb@suse.de> wrote:

> On Thu, 17 Feb 2011 22:19:52 -0500 Mike Snitzer <snitzer@redhat.com> wrote:
>
> > On Thu, Feb 17, 2011 at 9:40 PM, NeilBrown <neilb@suse.de> wrote:
> > > On Thu, 17 Feb 2011 11:59:06 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
> > >> So if we do this change for performance reasons, it still makes sense
> > >> but doing this change because md provided a q->queue_lock and took away that
> > >> lock without notifying block layer hence we do this change, is still not
> > >> the right reason, IMHO.
> > >
> > > Well...I like that patch, as it makes my life easier....
> > >
> > > But I agree that md is doing something wrong.  Now that ->queue_lock is
> > > always initialised, it is wrong to leave it in a state where it is not defined.
> > >
> > > So maybe I'll apply this (after testing it a bit).  The only reason for taking
> > > the queue_lock in a couple of places is to silence some warnings.
> > >
> > > Thanks,
> > > NeilBrown
> > >
> > >
> > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > index a23ffa3..909282d 100644
> > > --- a/drivers/md/raid1.c
> > > +++ b/drivers/md/raid1.c
> > > @@ -959,7 +961,9 @@ static int make_request(mddev_t *mddev, struct bio * bio)
> > >  		atomic_inc(&r1_bio->remaining);
> > >  		spin_lock_irqsave(&conf->device_lock, flags);
> > >  		bio_list_add(&conf->pending_bio_list, mbio);
> > > +		spin_lock(mddev->queue->queue_lock);
> > >  		blk_plug_device(mddev->queue);
> > > +		spin_unlock(mddev->queue->queue_lock);
> > >  		spin_unlock_irqrestore(&conf->device_lock, flags);
> > >  	}
> > >  	r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL);
> >
> > Noticed an inconsistency, raid10.c's additional locking also protects
> > the bio_list_add() whereas raid1.c's doesn't.  Seems the additional
> > protection in raid10 isn't needed?
>
> Correct - not needed at all.
> I put it there because it felt a little cleaner keeping the two 'lock's
> together like the two 'unlock's.  Probably confusing though...
>
> My other thought is to stop using the block-layer plugging altogether like I
> have in RAID5 (which I needed to do to make it work with DM).  Then I
> wouldn't need to touch queue_lock at all - very tempting.

FYI, Jens has a considerable plugging overhaul staged in his tree for
2.6.39:
http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-2.6.39/stack-plug

So if you do wean MD off of the block-layer's plugging Jens will need to
adapt his patches that touch MD.. not a big deal.

Mike

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic
  2011-02-18  3:33 ` NeilBrown
  2011-02-18 14:04   ` Mike Snitzer
@ 2011-02-18 15:04   ` Vivek Goyal
  2011-02-21  7:24     ` NeilBrown
  1 sibling, 1 reply; 17+ messages in thread
From: Vivek Goyal @ 2011-02-18 15:04 UTC (permalink / raw)
  To: NeilBrown; +Cc: Mike Snitzer, Jens Axboe, linux-kernel

On Fri, Feb 18, 2011 at 02:33:25PM +1100, NeilBrown wrote:
> On Thu, 17 Feb 2011 22:19:52 -0500 Mike Snitzer <snitzer@redhat.com> wrote:
>
> > On Thu, Feb 17, 2011 at 9:40 PM, NeilBrown <neilb@suse.de> wrote:
> > > On Thu, 17 Feb 2011 11:59:06 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
> > >> So if we do this change for performance reasons, it still makes sense
> > >> but doing this change because md provided a q->queue_lock and took away that
> > >> lock without notifying block layer hence we do this change, is still not
> > >> the right reason, IMHO.
> > >
> > > Well...I like that patch, as it makes my life easier....
> > >
> > > But I agree that md is doing something wrong.  Now that ->queue_lock is
> > > always initialised, it is wrong to leave it in a state where it is not defined.
> > >
> > > So maybe I'll apply this (after testing it a bit).  The only reason for taking
> > > the queue_lock in a couple of places is to silence some warnings.
> > >
> > > Thanks,
> > > NeilBrown
> > >
> > >
> > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > index a23ffa3..909282d 100644
> > > --- a/drivers/md/raid1.c
> > > +++ b/drivers/md/raid1.c
> > > @@ -959,7 +961,9 @@ static int make_request(mddev_t *mddev, struct bio * bio)
> > >  		atomic_inc(&r1_bio->remaining);
> > >  		spin_lock_irqsave(&conf->device_lock, flags);
> > >  		bio_list_add(&conf->pending_bio_list, mbio);
> > > +		spin_lock(mddev->queue->queue_lock);
> > >  		blk_plug_device(mddev->queue);
> > > +		spin_unlock(mddev->queue->queue_lock);
> > >  		spin_unlock_irqrestore(&conf->device_lock, flags);
> > >  	}
> > >  	r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL);
> >
> > Noticed an inconsistency, raid10.c's additional locking also protects
> > the bio_list_add() whereas raid1.c's doesn't.  Seems the additional
> > protection in raid10 isn't needed?
>
> Correct - not needed at all.
> I put it there because it felt a little cleaner keeping the two 'lock's
> together like the two 'unlock's.  Probably confusing though...

I guess you could use blk_plug_device_unlocked() to get rid of the
ugliness, and this routine will take care of taking the queue lock.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic
  2011-02-18 15:04 ` Vivek Goyal
@ 2011-02-21  7:24   ` NeilBrown
  2011-02-21 14:42     ` Vivek Goyal
  0 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2011-02-21  7:24 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Mike Snitzer, Jens Axboe, linux-kernel

On Fri, 18 Feb 2011 10:04:29 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:

> On Fri, Feb 18, 2011 at 02:33:25PM +1100, NeilBrown wrote:
> > On Thu, 17 Feb 2011 22:19:52 -0500 Mike Snitzer <snitzer@redhat.com> wrote:
>
> > > Noticed an inconsistency, raid10.c's additional locking also protects
> > > the bio_list_add() whereas raid1.c's doesn't.  Seems the additional
> > > protection in raid10 isn't needed?
> >
> > Correct - not needed at all.
> > I put it there because it felt a little cleaner keeping the two 'lock's
> > together like the two 'unlock's.  Probably confusing though...
>
> I guess you could use blk_plug_device_unlocked() to get rid of the
> ugliness, and this routine will take care of taking the queue lock.
>

Yep, that gets rid of some ugliness.
I've made that change and will submit it in due course.
So blk_throtl doesn't need any change to avoid the problem with md - the
changes are made in md instead.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic
  2011-02-21  7:24 ` NeilBrown
@ 2011-02-21 14:42   ` Vivek Goyal
  0 siblings, 0 replies; 17+ messages in thread
From: Vivek Goyal @ 2011-02-21 14:42 UTC (permalink / raw)
  To: NeilBrown; +Cc: Mike Snitzer, Jens Axboe, linux-kernel

On Mon, Feb 21, 2011 at 06:24:19PM +1100, NeilBrown wrote:
> On Fri, 18 Feb 2011 10:04:29 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
>
> > On Fri, Feb 18, 2011 at 02:33:25PM +1100, NeilBrown wrote:
> > > On Thu, 17 Feb 2011 22:19:52 -0500 Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > > > Noticed an inconsistency, raid10.c's additional locking also protects
> > > > the bio_list_add() whereas raid1.c's doesn't.  Seems the additional
> > > > protection in raid10 isn't needed?
> > >
> > > Correct - not needed at all.
> > > I put it there because it felt a little cleaner keeping the two 'lock's
> > > together like the two 'unlock's.  Probably confusing though...
> >
> > I guess you could use blk_plug_device_unlocked() to get rid of the
> > ugliness, and this routine will take care of taking the queue lock.
> >
>
> Yep, that gets rid of some ugliness.
> I've made that change and will submit it in due course.
> So blk_throtl doesn't need any change to avoid the problem with md - the
> changes are made in md instead.

Thanks Neil. I might still end up moving blk_throtl_exit() to
blk_cleanup_queue() once I have sorted out blk_sync_queue(). Because at
the end of blk_cleanup_queue() the driver is free to release the spin
lock, and there are no guarantees that the lock is still there in
blk_release_queue().

Thanks
Vivek

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic
  2011-02-18  2:40 ` NeilBrown
  2011-02-18  3:19   ` Mike Snitzer
@ 2011-02-18 15:05   ` Vivek Goyal
  1 sibling, 0 replies; 17+ messages in thread
From: Vivek Goyal @ 2011-02-18 15:05 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-kernel

On Fri, Feb 18, 2011 at 01:40:25PM +1100, NeilBrown wrote:
> On Thu, 17 Feb 2011 11:59:06 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
>
> > On Thu, Feb 17, 2011 at 04:55:01PM +1100, NeilBrown wrote:
> > > On Wed, 16 Feb 2011 20:10:29 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > > So is it possible to keep the spinlock intact when md is calling up
> > > > blk_cleanup_queue()?
> > > >
> > >
> > > It would be possible, yes - but messy.  I would probably end up just making
> > > ->queue_lock always point to __queue_lock, and then only take it at the
> > > places where I call 'block' code which wants to test to see if it is
> > > currently held (like the plugging routines).
> > >
> > > The queue lock (and most of the request queue) is simply irrelevant for md.
> > > I would prefer to get away from having to touch it at all...
> >
> > If queue lock is mostly irrelevant for md, then why md should provide an
> > external lock and not use queue's internal spin lock?
>
> See other email - historical reasons mostly.
>
> >
> > > I'll see how messy it would be to stop using it completely and it can just be
> > > __queue_lock.
> > >
> > > Though for me - it would be much easier if you just used __queue_lock .....
> >
> > Ok, here is the simple patch which splits the queue lock and uses
> > throtl_lock for throttling logic.  I booted and it seems to be working.
> >
> > Having said that, this now introduces the possibility of races for any
> > services I take from request queue.  I need to see if I need to take
> > queue lock and that makes me little uncomfortable.
> >
> > I am using kblockd_workqueue to queue throtl work.  Looks like I don't
> > need queue lock for that.  I am also using block tracing infrastructure
> > and my understanding is that I don't need queue lock for that as well.
> >
> > So if we do this change for performance reasons, it still makes sense
> > but doing this change because md provided a q->queue_lock and took away that
> > lock without notifying block layer hence we do this change, is still not
> > the right reason, IMHO.
>
> Well...I like that patch, as it makes my life easier....
>
> But I agree that md is doing something wrong.  Now that ->queue_lock is
> always initialised, it is wrong to leave it in a state where it is not defined.
>
> So maybe I'll apply this (after testing it a bit).  The only reason for taking
> the queue_lock in a couple of places is to silence some warnings.
>
> Thanks,
> NeilBrown

Thanks Neil. This looks much better in the sense that now we know that
queue_lock is still valid at blk_cleanup_queue() time and did not vanish
unexpectedly.

Thanks
Vivek

>
>
> diff --git a/drivers/md/linear.c b/drivers/md/linear.c
> index 8a2f767..0ed7f6b 100644
> --- a/drivers/md/linear.c
> +++ b/drivers/md/linear.c
> @@ -216,7 +216,6 @@ static int linear_run (mddev_t *mddev)
>
>  	if (md_check_no_bitmap(mddev))
>  		return -EINVAL;
> -	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
>  	conf = linear_conf(mddev, mddev->raid_disks);
>
>  	if (!conf)
> diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
> index 6d7ddf3..3a62d44 100644
> --- a/drivers/md/multipath.c
> +++ b/drivers/md/multipath.c
> @@ -435,7 +435,6 @@ static int multipath_run (mddev_t *mddev)
>  	 * bookkeeping area. [whatever we allocate in multipath_run(),
>  	 * should be freed in multipath_stop()]
>  	 */
> -	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
>
>  	conf = kzalloc(sizeof(multipath_conf_t), GFP_KERNEL);
>  	mddev->private = conf;
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 75671df..c0ac457 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -361,7 +361,6 @@ static int raid0_run(mddev_t *mddev)
>  	if (md_check_no_bitmap(mddev))
>  		return -EINVAL;
>  	blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
> -	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
>
>  	/* if private is not null, we are here after takeover */
>  	if (mddev->private == NULL) {
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index a23ffa3..909282d 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -593,7 +593,9 @@ static int flush_pending_writes(conf_t *conf)
>  	if (conf->pending_bio_list.head) {
>  		struct bio *bio;
>  		bio = bio_list_get(&conf->pending_bio_list);
> +		spin_lock(conf->mddev->queue->queue_lock);
>  		blk_remove_plug(conf->mddev->queue);
> +		spin_unlock(conf->mddev->queue->queue_lock);
>  		spin_unlock_irq(&conf->device_lock);
>  		/* flush any pending bitmap writes to
>  		 * disk before proceeding w/ I/O */
> @@ -959,7 +961,9 @@ static int make_request(mddev_t *mddev, struct bio * bio)
>  		atomic_inc(&r1_bio->remaining);
>  		spin_lock_irqsave(&conf->device_lock, flags);
>  		bio_list_add(&conf->pending_bio_list, mbio);
> +		spin_lock(mddev->queue->queue_lock);
>  		blk_plug_device(mddev->queue);
> +		spin_unlock(mddev->queue->queue_lock);
>  		spin_unlock_irqrestore(&conf->device_lock, flags);
>  	}
>  	r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL);
> @@ -2021,7 +2025,6 @@ static int run(mddev_t *mddev)
>  	if (IS_ERR(conf))
>  		return PTR_ERR(conf);
>
> -	mddev->queue->queue_lock = &conf->device_lock;
>  	list_for_each_entry(rdev, &mddev->disks, same_set) {
>  		disk_stack_limits(mddev->gendisk, rdev->bdev,
>  				  rdev->data_offset << 9);
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 3b607b2..60e6cb1 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -662,7 +662,9 @@ static int flush_pending_writes(conf_t *conf)
>  	if (conf->pending_bio_list.head) {
>  		struct bio *bio;
>  		bio = bio_list_get(&conf->pending_bio_list);
> +		spin_lock(conf->mddev->queue->queue_lock);
>  		blk_remove_plug(conf->mddev->queue);
> +		spin_unlock(conf->mddev->queue->queue_lock);
>  		spin_unlock_irq(&conf->device_lock);
>  		/* flush any pending bitmap writes to disk
>  		 * before proceeding w/ I/O */
> @@ -970,8 +972,10 @@ static int make_request(mddev_t *mddev, struct bio * bio)
>
>  		atomic_inc(&r10_bio->remaining);
>  		spin_lock_irqsave(&conf->device_lock, flags);
> +		spin_lock(mddev->queue->queue_lock);
>  		bio_list_add(&conf->pending_bio_list, mbio);
>  		blk_plug_device(mddev->queue);
> +		spin_unlock(mddev->queue->queue_lock);
>  		spin_unlock_irqrestore(&conf->device_lock, flags);
>  	}
>
> @@ -2304,8 +2308,6 @@ static int run(mddev_t *mddev)
>  	if (!conf)
>  		goto out;
>
> -	mddev->queue->queue_lock = &conf->device_lock;
> -
>  	mddev->thread = conf->thread;
>  	conf->thread = NULL;
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 7028128..78536fd 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5204,7 +5204,6 @@ static int run(mddev_t *mddev)
>
>  	mddev->queue->backing_dev_info.congested_data = mddev;
>  	mddev->queue->backing_dev_info.congested_fn = raid5_congested;
> -	mddev->queue->queue_lock = &conf->device_lock;
>  	mddev->queue->unplug_fn = raid5_unplug_queue;
>
>  	chunk_size = mddev->chunk_sectors << 9;

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic
  2011-02-16  7:31 blk_throtl_exit taking q->queue_lock is problematic NeilBrown
  2011-02-16 15:53 ` Vivek Goyal
@ 2011-02-17 20:00 ` Vivek Goyal
  2011-02-18  1:57   ` NeilBrown
  1 sibling, 1 reply; 17+ messages in thread
From: Vivek Goyal @ 2011-02-17 20:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jens Axboe, linux-kernel

On Wed, Feb 16, 2011 at 06:31:14PM +1100, NeilBrown wrote:
>
>
> Hi,
>
>  I recently discovered that blk_throtl_exit takes ->queue_lock when a blockdev
>  is finally released.
>
>  This is a problem because by that time the queue_lock doesn't exist any
>  more.  It is in a separate data structure controlled by the RAID personality
>  and by the time that the block device is being destroyed the raid personality
>  has shutdown and the data structure containing the lock has been freed.

Hi Neil,

I am having a look at queue allocation in md and had few queries.

I was looking at md_alloc(), where we do

	mddev->queue = blk_alloc_queue(GFP_KERNEL);
	blk_queue_make_request(mddev->queue, md_make_request);

The call to blk_queue_make_request() will make sure queue_lock is
initialized to the internal __queue_lock.

Then I looked at raid0_run(), which is again setting the queue lock.

	mddev->queue->queue_lock = &mddev->queue->__queue_lock;

I think this is redundant now as during md_alloc() we already did it.
Similar seems to be the case for linear.c and multipath.c.

Following seem to be the cases where we override the default lock.

	raid1.c, raid5.c, raid10.c

I was going through raid1.c, and I see that q->queue_lock has been
initialized to &conf->device_lock.  Can we do the reverse?  Introduce a
spinlock pointer in conf and point it at queue->queue_lock?  Anyway, you
mentioned that the personality's data structures are freed before the
request queue is cleaned up, so there should not be any lifetime issues.

Also I was wondering how it helps sharing the lock between the request
queue and some other data structures in the driver.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: blk_throtl_exit taking q->queue_lock is problematic
  2011-02-17 20:00 ` Vivek Goyal
@ 2011-02-18  1:57   ` NeilBrown
  0 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2011-02-18  1:57 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Jens Axboe, linux-kernel

On Thu, 17 Feb 2011 15:00:11 -0500 Vivek Goyal <vgoyal@redhat.com> wrote:

> On Wed, Feb 16, 2011 at 06:31:14PM +1100, NeilBrown wrote:
> >
> >
> > Hi,
> >
> >  I recently discovered that blk_throtl_exit takes ->queue_lock when a blockdev
> >  is finally released.
> >
> >  This is a problem because by that time the queue_lock doesn't exist any
> >  more.  It is in a separate data structure controlled by the RAID personality
> >  and by the time that the block device is being destroyed the raid personality
> >  has shutdown and the data structure containing the lock has been freed.
>
> Hi Neil,
>
> I am having a look at queue allocation in md and had few queries.
>
> I was looking at md_alloc(), where we do
>
> 	mddev->queue = blk_alloc_queue(GFP_KERNEL);
> 	blk_queue_make_request(mddev->queue, md_make_request);
>
> The call to blk_queue_make_request() will make sure queue_lock is
> initialized to the internal __queue_lock.

That is a relatively recent change.  Commit a4e7d46407d in July 2009 by
the look of it.

>
> Then I looked at raid0_run(), which is again setting the queue lock.
>
> 	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
>
> I think this is redundant now as during md_alloc() we already did it.

Yep, it is now redundant.

>
> Similar seems to be the case for linear.c and multipath.c.
>
> Following seem to be the cases where we override the default lock.
>
> 	raid1.c, raid5.c, raid10.c
>
> I was going through raid1.c, and I see that q->queue_lock has been
> initialized to &conf->device_lock.  Can we do the reverse?  Introduce a
> spinlock pointer in conf and point it at queue->queue_lock?  Anyway, you
> mentioned that the personality's data structures are freed before the
> request queue is cleaned up, so there should not be any lifetime issues.

The reason it is this way is largely historical.  I don't recall the
details exactly but md data structures were always quite independent of
request_queue.

The only reason I set ->queue_lock at all is because some blk functions
started checking that it was held ... something to do with plugging and
stacking limits I think.  See commit e7e72bf641b1 (May 2008) for more
details.

>
> Also I was wondering how it helps sharing the lock between the request
> queue and some other data structures in the driver.

All it does is silence some warnings.

NeilBrown

>
> Thanks
> Vivek
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 17+ messages in thread
end of thread, other threads:[~2011-02-21 14:42 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-16  7:31 blk_throtl_exit taking q->queue_lock is problematic NeilBrown
2011-02-16 15:53 ` Vivek Goyal
2011-02-17  0:35   ` NeilBrown
2011-02-17  1:10     ` Vivek Goyal
2011-02-17  5:55       ` NeilBrown
2011-02-17 15:01         ` Vivek Goyal
2011-02-17 16:59         ` Vivek Goyal
2011-02-18  2:40           ` NeilBrown
2011-02-18  3:19             ` Mike Snitzer
2011-02-18  3:33               ` NeilBrown
2011-02-18 14:04                 ` Mike Snitzer
2011-02-18 15:04                 ` Vivek Goyal
2011-02-21  7:24                   ` NeilBrown
2011-02-21 14:42                     ` Vivek Goyal
2011-02-18 15:05           ` Vivek Goyal
2011-02-17 20:00 ` Vivek Goyal
2011-02-18  1:57   ` NeilBrown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox