* Re: WARNING in 2.6.25-07422-gb66e1f1 [not found] ` <20080504183839.GN12774@kernel.dk> @ 2008-05-05 7:24 ` Neil Brown 2008-05-05 18:03 ` Dan Williams 2008-05-05 19:02 ` Jens Axboe 0 siblings, 2 replies; 11+ messages in thread From: Neil Brown @ 2008-05-05 7:24 UTC (permalink / raw) To: Jens Axboe Cc: Jacek Luczak, Prakash Punnoor, Linux Kernel list, Dan Williams, linux-raid On Sunday May 4, jens.axboe@oracle.com wrote: > On Sun, May 04 2008, Jacek Luczak wrote: > > Hi, > > > > I've CC:-ed few guys which may help. > > > > Prakash Punnoor pisze: > > > Hi, I got this on boot: > > > > > > usb 2-1.3: new full speed USB device using ehci_hcd and address 3 > > > usb 2-1.3: configuration #1 chosen from 1 choice > > > Clocksource tsc unstable (delta = -117343945 ns) > > > ------------[ cut here ]------------ > > > WARNING: at include/linux/blkdev.h:443 blk_remove_plug+0x7d/0x90() ... > > Looks like it caught a real bug there - unfortunately we have to check > for ->queue_lock here as well, if this is another stacked devices and > not the bottom device. Does this make the warning go away for you? > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index 087eee0..958f26b 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -3264,6 +3264,8 @@ static void raid5_unplug_device(struct request_queue *q) > unsigned long flags; > > spin_lock_irqsave(&conf->device_lock, flags); > + if (q->queue_lock) > + spin_lock(q->queue_lock); > > if (blk_remove_plug(q)) { > conf->seq_flush++; > @@ -3271,6 +3273,8 @@ static void raid5_unplug_device(struct request_queue *q) > } > md_wakeup_thread(mddev->thread); > > + if (q->queue_lock) > + spin_unlock(q->queue_lock); > spin_unlock_irqrestore(&conf->device_lock, flags); > > unplug_slaves(mddev); > I suspect that will just cause more problems, as the 'q' for an md device never gets ->queue_lock initialised. I suspect the correct thing to do is set q->queue_lock = &conf->device_lock; at some stage, probably immediately after device_lock is initialised in 'run'. I was discussing this with Dan Williams starting http://marc.info/?l=linux-raid&m=120951839903995&w=4 though we don't have an agreed patch yet. I'm wondering why you mention the issues of stacked devices though. I don't see how it applies. Could you explain? Thanks, NeilBrown ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: WARNING in 2.6.25-07422-gb66e1f1 2008-05-05 7:24 ` WARNING in 2.6.25-07422-gb66e1f1 Neil Brown @ 2008-05-05 18:03 ` Dan Williams 2008-05-05 19:02 ` Jens Axboe 1 sibling, 0 replies; 11+ messages in thread From: Dan Williams @ 2008-05-05 18:03 UTC (permalink / raw) To: Neil Brown Cc: Jens Axboe, Jacek Luczak, Prakash Punnoor, Linux Kernel list, linux-raid On Mon, 2008-05-05 at 00:24 -0700, Neil Brown wrote: > On Sunday May 4, jens.axboe@oracle.com wrote: > > On Sun, May 04 2008, Jacek Luczak wrote: > > > Hi, > > > > > > I've CC:-ed few guys which may help. > > > > > > Prakash Punnoor pisze: > > > > Hi, I got this on boot: > > > > > > > > usb 2-1.3: new full speed USB device using ehci_hcd and address 3 > > > > usb 2-1.3: configuration #1 chosen from 1 choice > > > > Clocksource tsc unstable (delta = -117343945 ns) > > > > ------------[ cut here ]------------ > > > > WARNING: at include/linux/blkdev.h:443 blk_remove_plug+0x7d/0x90() > ... > > > > Looks like it caught a real bug there - unfortunately we have to check > > for ->queue_lock here as well, if this is another stacked devices and > > not the bottom device. Does this make the warning go away for you? > > [..] > I suspect that will just cause more problems, as the 'q' for an md > device never gets ->queue_lock initialised. > I suspect the correct thing to do is set > q->queue_lock = &conf->device_lock; > > at some stage, probably immediately after device_lock is initialised > in 'run'. > > I was discussing this with Dan Williams starting > http://marc.info/?l=linux-raid&m=120951839903995&w=4 > though we don't have an agreed patch yet. The patch below appears to work for the raid5 case, but I am encountering a new issue when testing linear arrays? raid0/1/10 are not triggering this issue. $ mdadm --create /dev/md0 /dev/loop[0-3] -n 4 -l linear mdadm: RUN_ARRAY failed: Invalid argument # huh? mdadm: stopped /dev/md0 $ cat /proc/mdstat Personalities : [raid0] [linear] unused devices: <none> $ mdadm --create /dev/md0 /dev/loop[0-3] -n 4 -l linear Segmentation fault [293399.915068] BUG: unable to handle kernel NULL pointer dereference at 00000000 [293399.931249] IP: [<c0441cfa>] find_usage_backwards+0x9c/0xb6 [293399.945735] *pde = 00000000 [293399.957323] Oops: 0000 [#1] SMP [293399.968978] Modules linked in: raid456 async_xor async_memcpy async_tx xor linear loop ipt_MASQUERADE iptable_nat nf_nat bridge rfcomm l2cap bluetooth ] [293400.093457] [293400.105809] Pid: 30652, comm: mdadm Not tainted (2.6.25-imsm #63) [293400.123339] EIP: 0060:[<c0441cfa>] EFLAGS: 00210046 CPU: 2 [293400.140261] EIP is at find_usage_backwards+0x9c/0xb6 [293400.156651] EAX: 00000002 EBX: 00000000 ECX: 00000001 EDX: 0000a9a8 [293400.174211] ESI: 00000000 EDI: d54f2400 EBP: d1db9ba8 ESP: d1db9b9c [293400.191645] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [293400.207967] Process mdadm (pid: 30652, ti=d1db9000 task=e0f28000 task.ti=d1db9000) [293400.216021] Stack: e0f284f0 e0f28000 00000004 d1db9bb8 c0441d2d e0f284f0 e0f28000 d1db9bd4 [293400.236094] c0442032 c06d1fed 00000010 00200246 e0f284f0 d54f2400 d1db9c24 c0442b63 [293400.256296] 0000025d 00000002 00000000 00000000 f72cd3ec 00000001 e0f28000 00000000 [293400.276699] Call Trace: [293400.302628] [<c0441d2d>] ? check_usage_backwards+0x19/0x3b [293400.320626] [<c0442032>] ? mark_lock+0x228/0x399 [293400.337629] [<c0442b63>] ? __lock_acquire+0x440/0xad5 [293400.355036] [<c04421e4>] ? mark_held_locks+0x41/0x5c [293400.372027] [<c0408124>] ? native_sched_clock+0x8d/0x9f [293400.389053] [<c04435a2>] ? lock_acquire+0x57/0x73 [293400.405617] [<f8cea220>] ? linear_conf+0xac/0x399 [linear] [293400.422874] [<c0628507>] ? _spin_lock+0x1c/0x49 [293400.439193] [<f8cea220>] ? linear_conf+0xac/0x399 [linear] [293400.456628] [<f8cea220>] ? linear_conf+0xac/0x399 [linear] [293400.474060] [<c04421e4>] ? mark_held_locks+0x41/0x5c [293400.491130] [<c0627061>] ? __mutex_unlock_slowpath+0xe1/0xe9 [293400.509098] [<c044093b>] ? lock_release_holdtime+0x3f/0x44 [293400.526942] [<c059524c>] ? do_md_run+0x514/0x9ea [293400.543989] [<f8cea5e2>] ? linear_run+0x11/0x71 [linear] [293400.561848] [<c0595407>] ? do_md_run+0x6cf/0x9ea [293400.579013] [<c0628863>] ? _spin_unlock_irq+0x22/0x26 [293400.596696] [<c04421e4>] ? mark_held_locks+0x41/0x5c [293400.614585] [<c0626f6c>] ? mutex_lock_interruptible_nested+0x25f/0x273 [293400.634244] [<c044235f>] ? trace_hardirqs_on+0xe1/0x102 [293400.652580] [<c0626f76>] ? mutex_lock_interruptible_nested+0x269/0x273 [293400.672573] [<c0599249>] ? md_ioctl+0xb8/0xdc6 [293400.690261] [<c0599d3d>] ? md_ioctl+0xbac/0xdc6 [293400.708073] [<c0408124>] ? native_sched_clock+0x8d/0x9f [293400.726798] [<c044093b>] ? lock_release_holdtime+0x3f/0x44 [293400.745947] [<c06289c2>] ? _spin_unlock_irqrestore+0x36/0x3c [293400.765174] [<c044235f>] ? trace_hardirqs_on+0xe1/0x102 [293400.783838] [<c043acec>] ? down+0x2b/0x2f [293400.801143] [<c04e8035>] ? blkdev_driver_ioctl+0x49/0x5b [293400.819931] [<c04e8762>] ? blkdev_ioctl+0x71b/0x769 [293400.837909] [<c0462006>] ? free_hot_cold_page+0x15c/0x185 [293400.856024] [<c0408124>] ? native_sched_clock+0x8d/0x9f [293400.873546] [<c044093b>] ? lock_release_holdtime+0x3f/0x44 [293400.891111] [<c06289c2>] ? _spin_unlock_irqrestore+0x36/0x3c [293400.908540] [<c044235f>] ? trace_hardirqs_on+0xe1/0x102 [293400.925100] [<c049f217>] ? block_ioctl+0x16/0x1b [293400.940642] [<c049f201>] ? block_ioctl+0x0/0x1b [293400.956000] [<c04887d2>] ? vfs_ioctl+0x22/0x67 [293400.971108] [<c0488a7b>] ? do_vfs_ioctl+0x264/0x27b [293400.986610] [<c0488ad2>] ? sys_ioctl+0x40/0x5a [293401.001599] [<c0403915>] ? sysenter_past_esp+0x6a/0xb1 [293401.017331] ======================= [293401.031194] Code: 89 3d 30 86 a2 c0 b8 02 00 00 00 eb 33 8b 9f b4 00 00 00 eb 16 8b 43 08 8d 56 01 e8 6f ff ff ff 83 f8 02 74 1b 85 c0 74 17 8b 1b <8b> [293401.073207] EIP: [<c0441cfa>] find_usage_backwards+0x9c/0xb6 SS:ESP 0068:d1db9b9c [293401.121680] ---[ end trace 6a498ad836843586 ]--- --- md: tell blk-core about device_lock for protecting the queue flags From: Dan Williams <dan.j.williams@intel.com> Now that queue flags are no longer atomic (commit: 75ad23bc0fcb4f992a5d06982bf0857ab1738e9e) blk-core checks the queue is locked via ->queue_lock. As noticed by Neil conf->device_lock already satisfies this requirement. Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/md/linear.c | 6 ++++++ drivers/md/multipath.c | 6 ++++++ drivers/md/raid0.c | 6 ++++++ drivers/md/raid1.c | 7 ++++++- drivers/md/raid10.c | 7 ++++++- drivers/md/raid5.c | 2 ++ include/linux/raid/linear.h | 1 + include/linux/raid/raid0.h | 1 + 8 files changed, 34 insertions(+), 2 deletions(-) diff --git a/drivers/md/linear.c b/drivers/md/linear.c index 0b85117..d026f08 100644 --- a/drivers/md/linear.c +++ b/drivers/md/linear.c @@ -122,6 +122,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks) cnt = 0; conf->array_size = 0; + spin_lock_init(&conf->device_lock); + /* blk-core uses queue_lock to verify protection of the queue flags */ + mddev->queue->queue_lock = &conf->device_lock; + rdev_for_each(rdev, tmp, mddev) { int j = rdev->raid_disk; dev_info_t *disk = conf->disks + j; @@ -133,8 +137,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks) disk->rdev = rdev; + spin_lock(&conf->device_lock); blk_queue_stack_limits(mddev->queue, rdev->bdev->bd_disk->queue); + spin_unlock(&conf->device_lock); /* as we don't honour merge_bvec_fn, we must never risk * violating it, so limit ->max_sector to one PAGE, as * a one page request is never in violation. diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c index 42ee1a2..ee7df38 100644 --- a/drivers/md/multipath.c +++ b/drivers/md/multipath.c @@ -436,6 +436,10 @@ static int multipath_run (mddev_t *mddev) goto out_free_conf; } + spin_lock_init(&conf->device_lock); + /* blk-core uses queue_lock to verify protection of the queue flags */ + mddev->queue->queue_lock = &conf->device_lock; + conf->working_disks = 0; rdev_for_each(rdev, tmp, mddev) { disk_idx = rdev->raid_disk; @@ -446,8 +450,10 @@ static int multipath_run (mddev_t *mddev) disk = conf->multipaths + disk_idx; disk->rdev = rdev; + spin_lock(&conf->device_lock); blk_queue_stack_limits(mddev->queue, rdev->bdev->bd_disk->queue); + spin_unlock(&conf->device_lock); /* as we don't honour merge_bvec_fn, we must never risk * violating it, not that we ever expect a device with * a merge_bvec_fn to be involved in multipath */ diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c index 818b482..deb5609 100644 --- a/drivers/md/raid0.c +++ b/drivers/md/raid0.c @@ -117,6 +117,10 @@ static int create_strip_zones (mddev_t *mddev) if (!conf->devlist) return 1; + spin_lock_init(&conf->device_lock); + /* blk-core uses queue_lock to verify protection of the queue flags */ + mddev->queue->queue_lock = &conf->device_lock; + /* The first zone must contain all devices, so here we check that * there is a proper alignment of slots to devices and find them all */ @@ -138,8 +142,10 @@ static int create_strip_zones (mddev_t *mddev) } zone->dev[j] = rdev1; + spin_lock(&conf->device_lock); blk_queue_stack_limits(mddev->queue, rdev1->bdev->bd_disk->queue); + spin_unlock(&conf->device_lock); /* as we don't honour merge_bvec_fn, we must never risk * violating it, so limit ->max_sector to one PAGE, as * a one page request is never in violation. diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 6778b7c..a01fc7e 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1935,6 +1935,10 @@ static int run(mddev_t *mddev) if (!conf->r1bio_pool) goto out_no_mem; + spin_lock_init(&conf->device_lock); + /* blk-core uses queue_lock to verify protection of the queue flags */ + mddev->queue->queue_lock = &conf->device_lock; + rdev_for_each(rdev, tmp, mddev) { disk_idx = rdev->raid_disk; if (disk_idx >= mddev->raid_disks @@ -1944,8 +1948,10 @@ static int run(mddev_t *mddev) disk->rdev = rdev; + spin_lock(&conf->device_lock); blk_queue_stack_limits(mddev->queue, rdev->bdev->bd_disk->queue); + spin_unlock(&conf->device_lock); /* as we don't honour merge_bvec_fn, we must never risk * violating it, so limit ->max_sector to one PAGE, as * a one page request is never in violation. @@ -1958,7 +1964,6 @@ static int run(mddev_t *mddev) } conf->raid_disks = mddev->raid_disks; conf->mddev = mddev; - spin_lock_init(&conf->device_lock); INIT_LIST_HEAD(&conf->retry_list); spin_lock_init(&conf->resync_lock); diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 5938fa9..c28af78 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -2082,6 +2082,10 @@ static int run(mddev_t *mddev) goto out_free_conf; } + spin_lock_init(&conf->device_lock); + /* blk-core uses queue_lock to verify protection of the queue flags */ + mddev->queue->queue_lock = &conf->device_lock; + rdev_for_each(rdev, tmp, mddev) { disk_idx = rdev->raid_disk; if (disk_idx >= mddev->raid_disks @@ -2091,8 +2095,10 @@ static int run(mddev_t *mddev) disk->rdev = rdev; + spin_lock(&conf->device_lock); blk_queue_stack_limits(mddev->queue, rdev->bdev->bd_disk->queue); + spin_unlock(&conf->device_lock); /* as we don't honour merge_bvec_fn, we must never risk * violating it, so limit ->max_sector to one PAGE, as * a one page request is never in violation. @@ -2103,7 +2109,6 @@ static int run(mddev_t *mddev) disk->head_position = 0; } - spin_lock_init(&conf->device_lock); INIT_LIST_HEAD(&conf->retry_list); spin_lock_init(&conf->resync_lock); diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index ee0ea91..59964a7 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -4257,6 +4257,8 @@ static int run(mddev_t *mddev) goto abort; } spin_lock_init(&conf->device_lock); + /* blk-core uses queue_lock to verify protection of the queue flags */ + mddev->queue->queue_lock = &conf->device_lock; init_waitqueue_head(&conf->wait_for_stripe); init_waitqueue_head(&conf->wait_for_overlap); INIT_LIST_HEAD(&conf->handle_list); diff --git a/include/linux/raid/linear.h b/include/linux/raid/linear.h index ba15469..1bb90cf 100644 --- a/include/linux/raid/linear.h +++ b/include/linux/raid/linear.h @@ -19,6 +19,7 @@ struct linear_private_data sector_t array_size; int preshift; /* shift before dividing by hash_spacing */ dev_info_t disks[0]; + spinlock_t device_lock; }; diff --git a/include/linux/raid/raid0.h b/include/linux/raid/raid0.h index 1b2dda0..3d20d14 100644 --- a/include/linux/raid/raid0.h +++ b/include/linux/raid/raid0.h @@ -21,6 +21,7 @@ struct raid0_private_data sector_t hash_spacing; int preshift; /* shift this before divide by hash_spacing */ + spinlock_t device_lock; }; typedef struct raid0_private_data raid0_conf_t; ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: WARNING in 2.6.25-07422-gb66e1f1 2008-05-05 7:24 ` WARNING in 2.6.25-07422-gb66e1f1 Neil Brown 2008-05-05 18:03 ` Dan Williams @ 2008-05-05 19:02 ` Jens Axboe 2008-05-08 18:39 ` Rafael J. Wysocki 1 sibling, 1 reply; 11+ messages in thread From: Jens Axboe @ 2008-05-05 19:02 UTC (permalink / raw) To: Neil Brown Cc: Jacek Luczak, Prakash Punnoor, Linux Kernel list, Dan Williams, linux-raid On Mon, May 05 2008, Neil Brown wrote: > On Sunday May 4, jens.axboe@oracle.com wrote: > > On Sun, May 04 2008, Jacek Luczak wrote: > > > Hi, > > > > > > I've CC:-ed few guys which may help. > > > > > > Prakash Punnoor pisze: > > > > Hi, I got this on boot: > > > > > > > > usb 2-1.3: new full speed USB device using ehci_hcd and address 3 > > > > usb 2-1.3: configuration #1 chosen from 1 choice > > > > Clocksource tsc unstable (delta = -117343945 ns) > > > > ------------[ cut here ]------------ > > > > WARNING: at include/linux/blkdev.h:443 blk_remove_plug+0x7d/0x90() > ... > > > > Looks like it caught a real bug there - unfortunately we have to check > > for ->queue_lock here as well, if this is another stacked devices and > > not the bottom device. Does this make the warning go away for you? > > > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > > index 087eee0..958f26b 100644 > > --- a/drivers/md/raid5.c > > +++ b/drivers/md/raid5.c > > @@ -3264,6 +3264,8 @@ static void raid5_unplug_device(struct request_queue *q) > > unsigned long flags; > > > > spin_lock_irqsave(&conf->device_lock, flags); > > + if (q->queue_lock) > > + spin_lock(q->queue_lock); > > > > if (blk_remove_plug(q)) { > > conf->seq_flush++; > > @@ -3271,6 +3273,8 @@ static void raid5_unplug_device(struct request_queue *q) > > } > > md_wakeup_thread(mddev->thread); > > > > + if (q->queue_lock) > > + spin_unlock(q->queue_lock); > > spin_unlock_irqrestore(&conf->device_lock, flags); > > > > unplug_slaves(mddev); > > > > I suspect that will just cause more problems, as the 'q' for an md > device never gets ->queue_lock initialised. > I suspect the correct thing to do is set > q->queue_lock = &conf->device_lock; > > at some stage, probably immediately after device_lock is initialised > in 'run'. > > I was discussing this with Dan Williams starting > http://marc.info/?l=linux-raid&m=120951839903995&w=4 > though we don't have an agreed patch yet. I agree with the usage of the device lock. I (mistakenly) thought that raid5 used the bottom device queue for that unplug - I see that it does not, so where does the warning come from? mddev->queue->queue_lock should be NULL, since md never sets it and it's zeroed to begin with?? > I'm wondering why you mention the issues of stacked devices though. I > don't see how it applies. Could you explain? See above, if the queue had been the bottom queue, ->queue_lock may or may not be NULL depending on whether this is the real device or (another) stacked device. -- Jens Axboe ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: WARNING in 2.6.25-07422-gb66e1f1 2008-05-05 19:02 ` Jens Axboe @ 2008-05-08 18:39 ` Rafael J. Wysocki 2008-05-08 18:46 ` Dan Williams 0 siblings, 1 reply; 11+ messages in thread From: Rafael J. Wysocki @ 2008-05-08 18:39 UTC (permalink / raw) To: Jens Axboe Cc: Neil Brown, Jacek Luczak, Prakash Punnoor, Linux Kernel list, Dan Williams, linux-raid On Monday, 5 of May 2008, Jens Axboe wrote: > On Mon, May 05 2008, Neil Brown wrote: > > On Sunday May 4, jens.axboe@oracle.com wrote: > > > On Sun, May 04 2008, Jacek Luczak wrote: > > > > Hi, > > > > > > > > I've CC:-ed few guys which may help. > > > > > > > > Prakash Punnoor pisze: > > > > > Hi, I got this on boot: > > > > > > > > > > usb 2-1.3: new full speed USB device using ehci_hcd and address 3 > > > > > usb 2-1.3: configuration #1 chosen from 1 choice > > > > > Clocksource tsc unstable (delta = -117343945 ns) > > > > > ------------[ cut here ]------------ > > > > > WARNING: at include/linux/blkdev.h:443 blk_remove_plug+0x7d/0x90() > > ... > > > > > > Looks like it caught a real bug there - unfortunately we have to check > > > for ->queue_lock here as well, if this is another stacked devices and > > > not the bottom device. Does this make the warning go away for you? > > > > > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > > > index 087eee0..958f26b 100644 > > > --- a/drivers/md/raid5.c > > > +++ b/drivers/md/raid5.c > > > @@ -3264,6 +3264,8 @@ static void raid5_unplug_device(struct request_queue *q) > > > unsigned long flags; > > > > > > spin_lock_irqsave(&conf->device_lock, flags); > > > + if (q->queue_lock) > > > + spin_lock(q->queue_lock); > > > > > > if (blk_remove_plug(q)) { > > > conf->seq_flush++; > > > @@ -3271,6 +3273,8 @@ static void raid5_unplug_device(struct request_queue *q) > > > } > > > md_wakeup_thread(mddev->thread); > > > > > > + if (q->queue_lock) > > > + spin_unlock(q->queue_lock); > > > spin_unlock_irqrestore(&conf->device_lock, flags); > > > > > > unplug_slaves(mddev); > > > > > > > I suspect that will just cause more problems, as the 'q' for an md > > device never gets ->queue_lock initialised. > > I suspect the correct thing to do is set > > q->queue_lock = &conf->device_lock; > > > > at some stage, probably immediately after device_lock is initialised > > in 'run'. > > > > I was discussing this with Dan Williams starting > > http://marc.info/?l=linux-raid&m=120951839903995&w=4 > > though we don't have an agreed patch yet. > > I agree with the usage of the device lock. I (mistakenly) thought that > raid5 used the bottom device queue for that unplug - I see that it does > not, so where does the warning come from? mddev->queue->queue_lock > should be NULL, since md never sets it and it's zeroed to begin with?? > > > I'm wondering why you mention the issues of stacked devices though. I > > don't see how it applies. Could you explain? > > See above, if the queue had been the bottom queue, ->queue_lock may or > may not be NULL depending on whether this is the real device or > (another) stacked device. I get a similar warning with RAID1 on one of my test boxes: WARNING: at /home/rafael/src/linux-2.6/include/linux/blkdev.h:443 blk_remove_plug+0x85/0xa0() Modules linked in: raid456 async_xor async_memcpy async_tx xor raid0 ehci_hcd ohci_hcd sd_mod edd raid1 ext3 jbd fan sata_uli pata_ali thermal processor Pid: 2159, comm: md1_raid1 Not tainted 2.6.26-rc1 #158 Call Trace: [<ffffffff80238bbf>] warn_on_slowpath+0x5f/0x80 [<ffffffff8025e8d8>] ? __lock_acquire+0x748/0x10d0 [<ffffffff80348f55>] blk_remove_plug+0x85/0xa0 [<ffffffffa004df64>] :raid1:flush_pending_writes+0x44/0xb0 [<ffffffffa004e649>] :raid1:raid1d+0x59/0xfe0 [<ffffffff8025e8d8>] ? __lock_acquire+0x748/0x10d0 [<ffffffff8025dc4f>] ? trace_hardirqs_on+0xbf/0x150 [<ffffffff8043ea8c>] md_thread+0x3c/0x110 [<ffffffff8024f6a0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8043ea50>] ? md_thread+0x0/0x110 [<ffffffff8024f23d>] kthread+0x4d/0x80 [<ffffffff8020c548>] child_rip+0xa/0x12 [<ffffffff8020bc5f>] ? restore_args+0x0/0x30 [<ffffffff8024f1f0>] ? kthread+0x0/0x80 [<ffffffff8020c53e>] ? child_rip+0x0/0x12 ---[ end trace 05d4e0844c61f45d ]--- This is the WARN_ON_ONCE(!queue_is_locked(q)) in queue_flag_clear(), apparently. Thanks, Rafael ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: WARNING in 2.6.25-07422-gb66e1f1 2008-05-08 18:39 ` Rafael J. Wysocki @ 2008-05-08 18:46 ` Dan Williams 2008-05-08 23:18 ` Dan Williams 0 siblings, 1 reply; 11+ messages in thread From: Dan Williams @ 2008-05-08 18:46 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Jens Axboe, Neil Brown, Jacek Luczak, Prakash Punnoor, Linux Kernel list, linux-raid On Thu, 2008-05-08 at 11:39 -0700, Rafael J. Wysocki wrote: > I get a similar warning with RAID1 on one of my test boxes: > > WARNING: at /home/rafael/src/linux-2.6/include/linux/blkdev.h:443 blk_remove_plug+0x85/0xa0() > Modules linked in: raid456 async_xor async_memcpy async_tx xor raid0 ehci_hcd ohci_hcd sd_mod edd raid1 ext3 jbd fan sata_uli pata_ali thermal processor > Pid: 2159, comm: md1_raid1 Not tainted 2.6.26-rc1 #158 > > Call Trace: > [<ffffffff80238bbf>] warn_on_slowpath+0x5f/0x80 > [<ffffffff8025e8d8>] ? __lock_acquire+0x748/0x10d0 > [<ffffffff80348f55>] blk_remove_plug+0x85/0xa0 > [<ffffffffa004df64>] :raid1:flush_pending_writes+0x44/0xb0 > [<ffffffffa004e649>] :raid1:raid1d+0x59/0xfe0 > [<ffffffff8025e8d8>] ? __lock_acquire+0x748/0x10d0 > [<ffffffff8025dc4f>] ? trace_hardirqs_on+0xbf/0x150 > [<ffffffff8043ea8c>] md_thread+0x3c/0x110 > [<ffffffff8024f6a0>] ? autoremove_wake_function+0x0/0x40 > [<ffffffff8043ea50>] ? md_thread+0x0/0x110 > [<ffffffff8024f23d>] kthread+0x4d/0x80 > [<ffffffff8020c548>] child_rip+0xa/0x12 > [<ffffffff8020bc5f>] ? restore_args+0x0/0x30 > [<ffffffff8024f1f0>] ? kthread+0x0/0x80 > [<ffffffff8020c53e>] ? child_rip+0x0/0x12 > > ---[ end trace 05d4e0844c61f45d ]--- > > This is the WARN_ON_ONCE(!queue_is_locked(q)) in queue_flag_clear(), > apparently. Yes, it triggers on all RAID levels. The patch in this message: http://marc.info/?l=linux-raid&m=121001065404056&w=2 > ...fixes the raid 0/1/10/5/6 cases, but I am still trying to isolate an issue (potentially unrelated) with linear arrays. -- Dan ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: WARNING in 2.6.25-07422-gb66e1f1 2008-05-08 18:46 ` Dan Williams @ 2008-05-08 23:18 ` Dan Williams 2008-05-09 2:15 ` Neil Brown 0 siblings, 1 reply; 11+ messages in thread From: Dan Williams @ 2008-05-08 23:18 UTC (permalink / raw) To: Neil Brown Cc: Jens Axboe, Rafael J. Wysocki, Jacek Luczak, Prakash Punnoor, Linux Kernel list, linux-raid On Thu, 2008-05-08 at 11:46 -0700, Dan Williams wrote: > On Thu, 2008-05-08 at 11:39 -0700, Rafael J. Wysocki wrote: > > I get a similar warning with RAID1 on one of my test boxes: > > > > WARNING: at /home/rafael/src/linux-2.6/include/linux/blkdev.h:443 blk_remove_plug+0x85/0xa0() > > Modules linked in: raid456 async_xor async_memcpy async_tx xor raid0 ehci_hcd ohci_hcd sd_mod edd raid1 ext3 jbd fan sata_uli pata_ali thermal processor > > Pid: 2159, comm: md1_raid1 Not tainted 2.6.26-rc1 #158 > > > > Call Trace: > > [<ffffffff80238bbf>] warn_on_slowpath+0x5f/0x80 > > [<ffffffff8025e8d8>] ? __lock_acquire+0x748/0x10d0 > > [<ffffffff80348f55>] blk_remove_plug+0x85/0xa0 > > [<ffffffffa004df64>] :raid1:flush_pending_writes+0x44/0xb0 > > [<ffffffffa004e649>] :raid1:raid1d+0x59/0xfe0 > > [<ffffffff8025e8d8>] ? __lock_acquire+0x748/0x10d0 > > [<ffffffff8025dc4f>] ? trace_hardirqs_on+0xbf/0x150 > > [<ffffffff8043ea8c>] md_thread+0x3c/0x110 > > [<ffffffff8024f6a0>] ? autoremove_wake_function+0x0/0x40 > > [<ffffffff8043ea50>] ? md_thread+0x0/0x110 > > [<ffffffff8024f23d>] kthread+0x4d/0x80 > > [<ffffffff8020c548>] child_rip+0xa/0x12 > > [<ffffffff8020bc5f>] ? restore_args+0x0/0x30 > > [<ffffffff8024f1f0>] ? kthread+0x0/0x80 > > [<ffffffff8020c53e>] ? child_rip+0x0/0x12 > > > > ---[ end trace 05d4e0844c61f45d ]--- > > > > This is the WARN_ON_ONCE(!queue_is_locked(q)) in queue_flag_clear(), > > apparently. > > Yes, it triggers on all RAID levels. The patch in this message: > > http://marc.info/?l=linux-raid&m=121001065404056&w=2 > > > ...fixes the raid 0/1/10/5/6 cases, but I am still trying to isolate an > issue (potentially unrelated) with linear arrays. > Gah, 'device_lock' can not come after 'disks[0]' in 'struct linear_private_data'. Updated patch below. Simple testing passes: 'mdadm --create /dev/md0; mkfs.ext3 /dev/md0' for each raid level linear, 0, 1, 10, 5, and 6. ---snip---> Subject: md: tell blk-core about device_lock for protecting the queue flags From: Dan Williams <dan.j.williams@intel.com> Now that queue flags are no longer atomic (commit: 75ad23bc0fcb4f992a5d06982bf0857ab1738e9e) blk-core checks the queue is locked via ->queue_lock. As noticed by Neil conf->device_lock already satisfies this requirement. Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/md/linear.c | 6 ++++++ drivers/md/multipath.c | 6 ++++++ drivers/md/raid0.c | 6 ++++++ drivers/md/raid1.c | 7 ++++++- drivers/md/raid10.c | 7 ++++++- drivers/md/raid5.c | 2 ++ include/linux/raid/linear.h | 3 ++- include/linux/raid/raid0.h | 1 + 8 files changed, 35 insertions(+), 3 deletions(-) diff --git a/drivers/md/linear.c b/drivers/md/linear.c index 0b85117..d026f08 100644 --- a/drivers/md/linear.c +++ b/drivers/md/linear.c @@ -122,6 +122,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks) cnt = 0; conf->array_size = 0; + spin_lock_init(&conf->device_lock); + /* blk-core uses queue_lock to verify protection of the queue flags */ + mddev->queue->queue_lock = &conf->device_lock; + rdev_for_each(rdev, tmp, mddev) { int j = rdev->raid_disk; dev_info_t *disk = conf->disks + j; @@ -133,8 +137,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks) disk->rdev = rdev; + spin_lock(&conf->device_lock); blk_queue_stack_limits(mddev->queue, rdev->bdev->bd_disk->queue); + spin_unlock(&conf->device_lock); /* as we don't honour merge_bvec_fn, we must never risk * violating it, so limit ->max_sector to one PAGE, as * a one page request is never in violation. diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c index 42ee1a2..ee7df38 100644 --- a/drivers/md/multipath.c +++ b/drivers/md/multipath.c @@ -436,6 +436,10 @@ static int multipath_run (mddev_t *mddev) goto out_free_conf; } + spin_lock_init(&conf->device_lock); + /* blk-core uses queue_lock to verify protection of the queue flags */ + mddev->queue->queue_lock = &conf->device_lock; + conf->working_disks = 0; rdev_for_each(rdev, tmp, mddev) { disk_idx = rdev->raid_disk; @@ -446,8 +450,10 @@ static int multipath_run (mddev_t *mddev) disk = conf->multipaths + disk_idx; disk->rdev = rdev; + spin_lock(&conf->device_lock); blk_queue_stack_limits(mddev->queue, rdev->bdev->bd_disk->queue); + spin_unlock(&conf->device_lock); /* as we don't honour merge_bvec_fn, we must never risk * violating it, not that we ever expect a device with * a merge_bvec_fn to be involved in multipath */ diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c index 818b482..deb5609 100644 --- a/drivers/md/raid0.c +++ b/drivers/md/raid0.c @@ -117,6 +117,10 @@ static int create_strip_zones (mddev_t *mddev) if (!conf->devlist) return 1; + spin_lock_init(&conf->device_lock); + /* blk-core uses queue_lock to verify protection of the queue flags */ + mddev->queue->queue_lock = &conf->device_lock; + /* The first zone must contain all devices, so here we check that * there is a proper alignment of slots to devices and find them all */ @@ -138,8 +142,10 @@ static int create_strip_zones (mddev_t *mddev) } zone->dev[j] = rdev1; + spin_lock(&conf->device_lock); blk_queue_stack_limits(mddev->queue, rdev1->bdev->bd_disk->queue); + spin_unlock(&conf->device_lock); /* as we don't honour merge_bvec_fn, we must never risk * violating it, so limit ->max_sector to one PAGE, as * a one page request is never in violation. diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 6778b7c..a01fc7e 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1935,6 +1935,10 @@ static int run(mddev_t *mddev) if (!conf->r1bio_pool) goto out_no_mem; + spin_lock_init(&conf->device_lock); + /* blk-core uses queue_lock to verify protection of the queue flags */ + mddev->queue->queue_lock = &conf->device_lock; + rdev_for_each(rdev, tmp, mddev) { disk_idx = rdev->raid_disk; if (disk_idx >= mddev->raid_disks @@ -1944,8 +1948,10 @@ static int run(mddev_t *mddev) disk->rdev = rdev; + spin_lock(&conf->device_lock); blk_queue_stack_limits(mddev->queue, rdev->bdev->bd_disk->queue); + spin_unlock(&conf->device_lock); /* as we don't honour merge_bvec_fn, we must never risk * violating it, so limit ->max_sector to one PAGE, as * a one page request is never in violation. @@ -1958,7 +1964,6 @@ static int run(mddev_t *mddev) } conf->raid_disks = mddev->raid_disks; conf->mddev = mddev; - spin_lock_init(&conf->device_lock); INIT_LIST_HEAD(&conf->retry_list); spin_lock_init(&conf->resync_lock); diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 5938fa9..c28af78 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -2082,6 +2082,10 @@ static int run(mddev_t *mddev) goto out_free_conf; } + spin_lock_init(&conf->device_lock); + /* blk-core uses queue_lock to verify protection of the queue flags */ + mddev->queue->queue_lock = &conf->device_lock; + rdev_for_each(rdev, tmp, mddev) { disk_idx = rdev->raid_disk; if (disk_idx >= mddev->raid_disks @@ -2091,8 +2095,10 @@ static int run(mddev_t *mddev) disk->rdev = rdev; + spin_lock(&conf->device_lock); blk_queue_stack_limits(mddev->queue, rdev->bdev->bd_disk->queue); + spin_unlock(&conf->device_lock); /* as we don't honour merge_bvec_fn, we must never risk * violating it, so limit ->max_sector to one PAGE, as * a one page request is never in violation. @@ -2103,7 +2109,6 @@ static int run(mddev_t *mddev) disk->head_position = 0; } - spin_lock_init(&conf->device_lock); INIT_LIST_HEAD(&conf->retry_list); spin_lock_init(&conf->resync_lock); diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index ee0ea91..59964a7 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -4257,6 +4257,8 @@ static int run(mddev_t *mddev) goto abort; } spin_lock_init(&conf->device_lock); + /* blk-core uses queue_lock to verify protection of the queue flags */ + mddev->queue->queue_lock = &conf->device_lock; init_waitqueue_head(&conf->wait_for_stripe); init_waitqueue_head(&conf->wait_for_overlap); INIT_LIST_HEAD(&conf->handle_list); diff --git a/include/linux/raid/linear.h b/include/linux/raid/linear.h index ba15469..3c35e1e 100644 --- a/include/linux/raid/linear.h +++ b/include/linux/raid/linear.h @@ -18,7 +18,8 @@ struct linear_private_data sector_t hash_spacing; sector_t array_size; int preshift; /* shift before dividing by hash_spacing */ - dev_info_t disks[0]; + spinlock_t device_lock; + dev_info_t disks[0]; /* grows depending on 'raid_disks' */ }; diff --git a/include/linux/raid/raid0.h b/include/linux/raid/raid0.h index 1b2dda0..3d20d14 100644 --- a/include/linux/raid/raid0.h +++ b/include/linux/raid/raid0.h @@ -21,6 +21,7 @@ struct raid0_private_data sector_t hash_spacing; int preshift; /* shift this before divide by hash_spacing */ + spinlock_t device_lock; }; typedef struct raid0_private_data raid0_conf_t; ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: WARNING in 2.6.25-07422-gb66e1f1 2008-05-08 23:18 ` Dan Williams @ 2008-05-09 2:15 ` Neil Brown 2008-05-09 4:59 ` Dan Williams 2008-05-09 5:38 ` Neil Brown 0 siblings, 2 replies; 11+ messages in thread From: Neil Brown @ 2008-05-09 2:15 UTC (permalink / raw) To: Dan Williams Cc: Jens Axboe, Rafael J. Wysocki, Jacek Luczak, Prakash Punnoor, Linux Kernel list, linux-raid On Thursday May 8, dan.j.williams@intel.com wrote: > On Thu, 2008-05-08 at 11:46 -0700, Dan Williams wrote: > Subject: md: tell blk-core about device_lock for protecting the queue flags > From: Dan Williams <dan.j.williams@intel.com> > > Now that queue flags are no longer atomic (commit: > 75ad23bc0fcb4f992a5d06982bf0857ab1738e9e) blk-core checks the queue is locked > via ->queue_lock. As noticed by Neil conf->device_lock already satisfies this > requirement. > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > --- > > drivers/md/linear.c | 6 ++++++ > drivers/md/multipath.c | 6 ++++++ > drivers/md/raid0.c | 6 ++++++ > drivers/md/raid1.c | 7 ++++++- > drivers/md/raid10.c | 7 ++++++- > drivers/md/raid5.c | 2 ++ > include/linux/raid/linear.h | 3 ++- > include/linux/raid/raid0.h | 1 + > 8 files changed, 35 insertions(+), 3 deletions(-) > > > diff --git a/drivers/md/linear.c b/drivers/md/linear.c > index 0b85117..d026f08 100644 > --- a/drivers/md/linear.c > +++ b/drivers/md/linear.c > @@ -122,6 +122,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks) > cnt = 0; > conf->array_size = 0; > > + spin_lock_init(&conf->device_lock); > + /* blk-core uses queue_lock to verify protection of the queue flags */ > + mddev->queue->queue_lock = &conf->device_lock; > + > rdev_for_each(rdev, tmp, mddev) { > int j = rdev->raid_disk; > dev_info_t *disk = conf->disks + j; > @@ -133,8 +137,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks) > > disk->rdev = rdev; > > + spin_lock(&conf->device_lock); > blk_queue_stack_limits(mddev->queue, > rdev->bdev->bd_disk->queue); > + spin_unlock(&conf->device_lock); > /* as we don't honour merge_bvec_fn, we must never risk > * violating it, so limit ->max_sector to one PAGE, as > * a one page request is never in violation. This shouldn't be necessary. There is no actual race here -- mddev->queue->queue_flags is not going to be accessed by anyone else until do_md_run does mddev->queue->make_request_fn = mddev->pers->make_request; which is much later. So we only need to be sure that "queue_is_locked" doesn't complain. And as q->queue_lock is still NULL at this point, it won't complain. I think that the *only* change that is needs is to put > + /* blk-core uses queue_lock to verify protection of the queue flags */ > + mddev->queue->queue_lock = &conf->device_lock; after each > + spin_lock_init(&conf->device_lock); i.e. in raid1.c, raid10.c and raid5.c ?? NeilBrown ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: WARNING in 2.6.25-07422-gb66e1f1 2008-05-09 2:15 ` Neil Brown @ 2008-05-09 4:59 ` Dan Williams 2008-05-09 5:38 ` Neil Brown 1 sibling, 0 replies; 11+ messages in thread From: Dan Williams @ 2008-05-09 4:59 UTC (permalink / raw) To: Neil Brown Cc: Jens Axboe, Rafael J. Wysocki, Jacek Luczak, Prakash Punnoor, Linux Kernel list, linux-raid On Thu, May 8, 2008 at 7:15 PM, Neil Brown <neilb@suse.de> wrote: > On Thursday May 8, dan.j.williams@intel.com wrote: > > @@ -133,8 +137,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks) > > > > disk->rdev = rdev; > > > > + spin_lock(&conf->device_lock); > > blk_queue_stack_limits(mddev->queue, > > rdev->bdev->bd_disk->queue); > > + spin_unlock(&conf->device_lock); > > /* as we don't honour merge_bvec_fn, we must never risk > > * violating it, so limit ->max_sector to one PAGE, as > > * a one page request is never in violation. > > This shouldn't be necessary. > There is no actual race here -- mddev->queue->queue_flags is not going to be > accessed by anyone else until do_md_run does > mddev->queue->make_request_fn = mddev->pers->make_request; > which is much later. > So we only need to be sure that "queue_is_locked" doesn't complain. > And as q->queue_lock is still NULL at this point, it won't complain. > > I think that the *only* change that is needs is to put > > > > + /* blk-core uses queue_lock to verify protection of the queue flags */ > > + mddev->queue->queue_lock = &conf->device_lock; > > after each > > > + spin_lock_init(&conf->device_lock); > > i.e. in raid1.c, raid10.c and raid5.c > > ?? Yes, locking shouldn't be needed at those points; however, the warning still fires because blk_queue_stack_limits() is using queue_flag_clear() instead of queue_flag_unlocked(). Taking a look at converting it to queue_flag_clear_unlocked() uncovered a couple more overlooked sites (multipath.c:multipath_add_disk and raid1.c:raid1_add_disk) where ->run has already been called... The options I am thinking of all seem ugly: 1/ keep the unnecessary locking in MD 2/ make blk_queue_stack_limits() use queue_flag_clear_unlocked() even though it needs to be locked sometimes 3/ conditionally use queue_flag_clear_unlocked if !t->queue_lock -- Dan I'm having a h ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: WARNING in 2.6.25-07422-gb66e1f1 2008-05-09 2:15 ` Neil Brown 2008-05-09 4:59 ` Dan Williams @ 2008-05-09 5:38 ` Neil Brown 2008-05-12 17:46 ` Dan Williams 1 sibling, 1 reply; 11+ messages in thread From: Neil Brown @ 2008-05-09 5:38 UTC (permalink / raw) To: Dan Williams, Jens Axboe, Rafael J. Wysocki, Jacek Luczak, Prakash Punnoor On Friday May 9, neilb@suse.de wrote: > On Thursday May 8, dan.j.williams@intel.com wrote: > > @@ -133,8 +137,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks) > > > > disk->rdev = rdev; > > > > + spin_lock(&conf->device_lock); > > blk_queue_stack_limits(mddev->queue, > > rdev->bdev->bd_disk->queue); > > + spin_unlock(&conf->device_lock); > > /* as we don't honour merge_bvec_fn, we must never risk > > * violating it, so limit ->max_sector to one PAGE, as > > * a one page request is never in violation. > > This shouldn't be necessary. > There is no actual race here -- mddev->queue->queue_flags is not going to be > accessed by anyone else until do_md_run does > mddev->queue->make_request_fn = mddev->pers->make_request; > which is much later. > So we only need to be sure that "queue_is_locked" doesn't complain. > And as q->queue_lock is still NULL at this point, it won't complain. Sorry, I got that backwards. It will complain, won't it. :-) I gotta say that I think it shouldn't. Introducing a spinlock in linear.c, raid0.c, multipath.c just to silence a "WARN_ON" seems like the wrong thing to do. Of course we could just use q->__queue_lock so we don't have to add a new lock, but we still have to take the lock unnecessarily. Unfortunately I cannot find a nice solution that both avoids clutter in md code and also protects against carelessly changing flags without a proper lock..... Maybe.... We could get blk_queue_stack_limits to lock the queue, and always spin_lock_init __queue_lock. Then the only change needed in linear.c et al would be to set ->queue_lock to &->__queue_lock. Jens: What do you think of this?? NeilBrown diff --git a/block/blk-core.c b/block/blk-core.c index b754a4a..2d31dc2 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -479,6 +479,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) kobject_init(&q->kobj, &blk_queue_ktype); mutex_init(&q->sysfs_lock); + spin_lock_init(&q->__queue_lock); return q; } @@ -541,10 +542,8 @@ blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id) * if caller didn't supply a lock, they get per-queue locking with * our embedded lock */ - if (!lock) { - spin_lock_init(&q->__queue_lock); + if (!lock) lock = &q->__queue_lock; - } q->request_fn = rfn; q->prep_rq_fn = NULL; diff --git a/block/blk-settings.c b/block/blk-settings.c index bb93d4c..488199a 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -286,8 +286,14 @@ void blk_queue_stack_limits(struct request_queue *t, struct request_queue *b) t->max_hw_segments = min(t->max_hw_segments, b->max_hw_segments); t->max_segment_size = min(t->max_segment_size, b->max_segment_size); t->hardsect_size = max(t->hardsect_size, b->hardsect_size); - if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) + if (!t->queue_lock) + WARN_ON_ONCE(1); + else if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) { + unsigned long flags; + spin_lock_irqsave(&t->queue_lock, flags); queue_flag_clear(QUEUE_FLAG_CLUSTER, t); + spin_unlock_irqrestore(&t->queue_lock, flags); + } } EXPORT_SYMBOL(blk_queue_stack_limits); diff --git a/drivers/md/linear.c b/drivers/md/linear.c index 0b85117..552f81b 100644 --- a/drivers/md/linear.c +++ b/drivers/md/linear.c @@ -250,6 +250,7 @@ static int linear_run (mddev_t *mddev) { linear_conf_t *conf; + mddev->queue_lock = &mddev->__queue_lock; conf = linear_conf(mddev, mddev->raid_disks); if (!conf) diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c index 42ee1a2..90f85e4 100644 --- a/drivers/md/multipath.c +++ b/drivers/md/multipath.c @@ -417,6 +417,7 @@ static int multipath_run (mddev_t *mddev) * bookkeeping area. [whatever we allocate in multipath_run(), * should be freed in multipath_stop()] */ + mddev->queue_lock = &mddev->__queue_lock; conf = kzalloc(sizeof(multipath_conf_t), GFP_KERNEL); mddev->private = conf; diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c index 818b482..a179c8f 100644 --- a/drivers/md/raid0.c +++ b/drivers/md/raid0.c @@ -280,6 +280,7 @@ static int raid0_run (mddev_t *mddev) (mddev->chunk_size>>1)-1); blk_queue_max_sectors(mddev->queue, mddev->chunk_size >> 9); blk_queue_segment_boundary(mddev->queue, (mddev->chunk_size>>1) - 1); + mddev->queue_lock = &mddev->__queue_lock; conf = kmalloc(sizeof (raid0_conf_t), GFP_KERNEL); if (!conf) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 6778b7c..ac409b7 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1935,6 +1935,9 @@ static int run(mddev_t *mddev) if (!conf->r1bio_pool) goto out_no_mem; + spin_lock_init(&conf->device_lock); + mddev->queue->queue_lock = &conf->device_lock; + rdev_for_each(rdev, tmp, mddev) { disk_idx = rdev->raid_disk; if (disk_idx >= mddev->raid_disks @@ -1958,7 +1961,6 @@ static int run(mddev_t *mddev) } conf->raid_disks = mddev->raid_disks; conf->mddev = mddev; - spin_lock_init(&conf->device_lock); INIT_LIST_HEAD(&conf->retry_list); spin_lock_init(&conf->resync_lock); diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 5938fa9..740f670 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -2082,6 +2082,9 @@ static int run(mddev_t *mddev) goto out_free_conf; } + spin_lock_init(&conf->device_lock); + mddev->queue->queue_lock = &mddev->queue->__queue_lock; + rdev_for_each(rdev, tmp, mddev) { disk_idx = rdev->raid_disk; if (disk_idx >= mddev->raid_disks @@ -2103,7 +2106,6 @@ static int run(mddev_t *mddev) disk->head_position = 0; } - spin_lock_init(&conf->device_lock); INIT_LIST_HEAD(&conf->retry_list); spin_lock_init(&conf->resync_lock); diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 087eee0..4fafc79 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -4256,6 +4256,7 @@ static int run(mddev_t *mddev) goto abort; } spin_lock_init(&conf->device_lock); + mddev->queue->queue_lock = &conf->device_lock; init_waitqueue_head(&conf->wait_for_stripe); init_waitqueue_head(&conf->wait_for_overlap); INIT_LIST_HEAD(&conf->handle_list); ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: WARNING in 2.6.25-07422-gb66e1f1 2008-05-09 5:38 ` Neil Brown @ 2008-05-12 17:46 ` Dan Williams 2008-05-13 1:08 ` Neil Brown 0 siblings, 1 reply; 11+ messages in thread From: Dan Williams @ 2008-05-12 17:46 UTC (permalink / raw) To: Neil Brown Cc: Jens Axboe, Rafael J. Wysocki, Jacek Luczak, Prakash Punnoor, Linux Kernel list, linux-raid On Thu, 2008-05-08 at 22:38 -0700, Neil Brown wrote: > On Friday May 9, neilb@suse.de wrote: > > On Thursday May 8, dan.j.williams@intel.com wrote: > > > @@ -133,8 +137,10 @@ static linear_conf_t *linear_conf(mddev_t > *mddev, int raid_disks) > > > > > > disk->rdev = rdev; > > > > > > + spin_lock(&conf->device_lock); > > > blk_queue_stack_limits(mddev->queue, > > > rdev->bdev->bd_disk->queue); > > > + spin_unlock(&conf->device_lock); > > > /* as we don't honour merge_bvec_fn, we must never > risk > > > * violating it, so limit ->max_sector to one PAGE, as > > > * a one page request is never in violation. > > > > This shouldn't be necessary. > > There is no actual race here -- mddev->queue->queue_flags is not > going to be > > accessed by anyone else until do_md_run does > > mddev->queue->make_request_fn = mddev->pers->make_request; > > which is much later. > > So we only need to be sure that "queue_is_locked" doesn't complain. > > And as q->queue_lock is still NULL at this point, it won't complain. > > Sorry, I got that backwards. It will complain, won't it. :-) > > I gotta say that I think it shouldn't. Introducing a spinlock in > linear.c, raid0.c, multipath.c just to silence a "WARN_ON" seems like > the wrong thing to do. Of course we could just use q->__queue_lock so > we don't have to add a new lock, but we still have to take the lock > unnecessarily. > > Unfortunately I cannot find a nice solution that both avoids clutter > in md code and also protects against carelessly changing flags without > a proper lock..... > > Maybe.... > We could get blk_queue_stack_limits to lock the queue, and always > spin_lock_init __queue_lock. Then the only change needed in linear.c > et al would be to set ->queue_lock to &->__queue_lock. > > Jens: What do you think of this?? > > diff --git a/block/blk-core.c b/block/blk-core.c > index b754a4a..2d31dc2 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -479,6 +479,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t > gfp_mask, int node_id) > kobject_init(&q->kobj, &blk_queue_ktype); > > mutex_init(&q->sysfs_lock); > + spin_lock_init(&q->__queue_lock); > > return q; > } > @@ -541,10 +542,8 @@ blk_init_queue_node(request_fn_proc *rfn, > spinlock_t *lock, int node_id) > * if caller didn't supply a lock, they get per-queue locking > with > * our embedded lock > */ > - if (!lock) { > - spin_lock_init(&q->__queue_lock); > + if (!lock) > lock = &q->__queue_lock; > - } > > q->request_fn = rfn; > q->prep_rq_fn = NULL; > diff --git a/block/blk-settings.c b/block/blk-settings.c > index bb93d4c..488199a 100644 > --- a/block/blk-settings.c > +++ b/block/blk-settings.c > @@ -286,8 +286,14 @@ void blk_queue_stack_limits(struct request_queue > *t, struct request_queue *b) > t->max_hw_segments = min(t->max_hw_segments, > b->max_hw_segments); > t->max_segment_size = min(t->max_segment_size, > b->max_segment_size); > t->hardsect_size = max(t->hardsect_size, b->hardsect_size); > - if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) > + if (!t->queue_lock) > + WARN_ON_ONCE(1); > + else if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) { > + unsigned long flags; > + spin_lock_irqsave(&t->queue_lock, flags); > queue_flag_clear(QUEUE_FLAG_CLUSTER, t); > + spin_unlock_irqrestore(&t->queue_lock, flags); > + } > } > EXPORT_SYMBOL(blk_queue_stack_limits); > > diff --git a/drivers/md/linear.c b/drivers/md/linear.c > index 0b85117..552f81b 100644 > --- a/drivers/md/linear.c > +++ b/drivers/md/linear.c > @@ -250,6 +250,7 @@ static int linear_run (mddev_t *mddev) > { > linear_conf_t *conf; > > + mddev->queue_lock = &mddev->__queue_lock; > conf = linear_conf(mddev, mddev->raid_disks); > > if (!conf) > diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c > index 42ee1a2..90f85e4 100644 > --- a/drivers/md/multipath.c > +++ b/drivers/md/multipath.c > @@ -417,6 +417,7 @@ static int multipath_run (mddev_t *mddev) > * bookkeeping area. [whatever we allocate in multipath_run(), > * should be freed in multipath_stop()] > */ > + mddev->queue_lock = &mddev->__queue_lock; > > conf = kzalloc(sizeof(multipath_conf_t), GFP_KERNEL); > mddev->private = conf; > diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c > index 818b482..a179c8f 100644 > --- a/drivers/md/raid0.c > +++ b/drivers/md/raid0.c > @@ -280,6 +280,7 @@ static int raid0_run (mddev_t *mddev) > (mddev->chunk_size>>1)-1); > blk_queue_max_sectors(mddev->queue, mddev->chunk_size >> 9); > blk_queue_segment_boundary(mddev->queue, > (mddev->chunk_size>>1) - 1); > + mddev->queue_lock = &mddev->__queue_lock; > > conf = kmalloc(sizeof (raid0_conf_t), GFP_KERNEL); > if (!conf) > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 6778b7c..ac409b7 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -1935,6 +1935,9 @@ static int run(mddev_t *mddev) > if (!conf->r1bio_pool) > goto out_no_mem; > > + spin_lock_init(&conf->device_lock); > + mddev->queue->queue_lock = &conf->device_lock; > + > rdev_for_each(rdev, tmp, mddev) { > disk_idx = rdev->raid_disk; > if (disk_idx >= mddev->raid_disks > @@ -1958,7 +1961,6 @@ static int run(mddev_t *mddev) > } > conf->raid_disks = mddev->raid_disks; > conf->mddev = mddev; > - spin_lock_init(&conf->device_lock); > INIT_LIST_HEAD(&conf->retry_list); > > spin_lock_init(&conf->resync_lock); > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c > index 5938fa9..740f670 100644 > --- a/drivers/md/raid10.c > +++ b/drivers/md/raid10.c > @@ -2082,6 +2082,9 @@ static int run(mddev_t *mddev) > goto out_free_conf; > } > > + spin_lock_init(&conf->device_lock); > + mddev->queue->queue_lock = &mddev->queue->__queue_lock; > + > rdev_for_each(rdev, tmp, mddev) { > disk_idx = rdev->raid_disk; > if (disk_idx >= mddev->raid_disks > @@ -2103,7 +2106,6 @@ static int run(mddev_t *mddev) > > disk->head_position = 0; > } > - spin_lock_init(&conf->device_lock); > INIT_LIST_HEAD(&conf->retry_list); > > spin_lock_init(&conf->resync_lock); > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index 087eee0..4fafc79 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -4256,6 +4256,7 @@ static int run(mddev_t *mddev) > goto abort; > } > spin_lock_init(&conf->device_lock); > + mddev->queue->queue_lock = &conf->device_lock; > init_waitqueue_head(&conf->wait_for_stripe); > init_waitqueue_head(&conf->wait_for_overlap); > INIT_LIST_HEAD(&conf->handle_list); > Yes, this is simpler than what I had... spotted some fixups. -- Dan diff --git a/block/blk-settings.c b/block/blk-settings.c index 488199a..8dd8641 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -290,9 +290,9 @@ void blk_queue_stack_limits(struct request_queue *t, struct request_queue *b) WARN_ON_ONCE(1); else if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) { unsigned long flags; - spin_lock_irqsave(&t->queue_lock, flags); + spin_lock_irqsave(t->queue_lock, flags); queue_flag_clear(QUEUE_FLAG_CLUSTER, t); - spin_unlock_irqrestore(&t->queue_lock, flags); + spin_unlock_irqrestore(t->queue_lock, flags); } } EXPORT_SYMBOL(blk_queue_stack_limits); diff --git a/drivers/md/linear.c b/drivers/md/linear.c index 552f81b..1074824 100644 --- a/drivers/md/linear.c +++ b/drivers/md/linear.c @@ -250,7 +250,7 @@ static int linear_run (mddev_t *mddev) { linear_conf_t *conf; - mddev->queue_lock = &mddev->__queue_lock; + mddev->queue->queue_lock = &mddev->queue->__queue_lock; conf = linear_conf(mddev, mddev->raid_disks); if (!conf) diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c index 90f85e4..4f4d1f3 100644 --- a/drivers/md/multipath.c +++ b/drivers/md/multipath.c @@ -417,7 +417,7 @@ static int multipath_run (mddev_t *mddev) * bookkeeping area. [whatever we allocate in multipath_run(), * should be freed in multipath_stop()] */ - mddev->queue_lock = &mddev->__queue_lock; + mddev->queue->queue_lock = &mddev->queue->__queue_lock; conf = kzalloc(sizeof(multipath_conf_t), GFP_KERNEL); mddev->private = conf; diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c index a179c8f..914c04d 100644 --- a/drivers/md/raid0.c +++ b/drivers/md/raid0.c @@ -280,7 +280,7 @@ static int raid0_run (mddev_t *mddev) (mddev->chunk_size>>1)-1); blk_queue_max_sectors(mddev->queue, mddev->chunk_size >> 9); blk_queue_segment_boundary(mddev->queue, (mddev->chunk_size>>1) - 1); - mddev->queue_lock = &mddev->__queue_lock; + mddev->queue->queue_lock = &mddev->queue->__queue_lock; conf = kmalloc(sizeof (raid0_conf_t), GFP_KERNEL); if (!conf) diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index f46d448..8536ede 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -2083,7 +2083,7 @@ static int run(mddev_t *mddev) } spin_lock_init(&conf->device_lock); - mddev->queue->queue_lock = &mddev->queue->__queue_lock; + mddev->queue->queue_lock = &conf->device_lock; rdev_for_each(rdev, tmp, mddev) { disk_idx = rdev->raid_disk; ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: WARNING in 2.6.25-07422-gb66e1f1 2008-05-12 17:46 ` Dan Williams @ 2008-05-13 1:08 ` Neil Brown 0 siblings, 0 replies; 11+ messages in thread From: Neil Brown @ 2008-05-13 1:08 UTC (permalink / raw) To: Dan Williams Cc: Jens Axboe, Rafael J. Wysocki, Jacek Luczak, Prakash Punnoor, Linux Kernel list, linux-raid On Monday May 12, dan.j.williams@intel.com wrote: > > Yes, this is simpler than what I had... spotted some fixups. > Ahh, you noticed that I hadn't actually compiled it. :-) Thanks. I've set it off to Linus. NeilBrown ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2008-05-13 1:08 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200805031151.44287.prakash@punnoor.de>
[not found] ` <481DB3F3.5080102@gmail.com>
[not found] ` <20080504183839.GN12774@kernel.dk>
2008-05-05 7:24 ` WARNING in 2.6.25-07422-gb66e1f1 Neil Brown
2008-05-05 18:03 ` Dan Williams
2008-05-05 19:02 ` Jens Axboe
2008-05-08 18:39 ` Rafael J. Wysocki
2008-05-08 18:46 ` Dan Williams
2008-05-08 23:18 ` Dan Williams
2008-05-09 2:15 ` Neil Brown
2008-05-09 4:59 ` Dan Williams
2008-05-09 5:38 ` Neil Brown
2008-05-12 17:46 ` Dan Williams
2008-05-13 1:08 ` Neil Brown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).