linux-raid.vger.kernel.org archive mirror
* Re: WARNING in 2.6.25-07422-gb66e1f1
       [not found]   ` <20080504183839.GN12774@kernel.dk>
@ 2008-05-05  7:24     ` Neil Brown
  2008-05-05 18:03       ` Dan Williams
  2008-05-05 19:02       ` Jens Axboe
  0 siblings, 2 replies; 11+ messages in thread
From: Neil Brown @ 2008-05-05  7:24 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Jacek Luczak, Prakash Punnoor, Linux Kernel list, Dan Williams,
	linux-raid

On Sunday May 4, jens.axboe@oracle.com wrote:
> On Sun, May 04 2008, Jacek Luczak wrote:
> > Hi,
> > 
> > I've CC'd a few guys who may help.
> > 
> > Prakash Punnoor pisze:
> > > Hi, I got this on boot:
> > >
> > > usb 2-1.3: new full speed USB device using ehci_hcd and address 3
> > > usb 2-1.3: configuration #1 chosen from 1 choice
> > > Clocksource tsc unstable (delta = -117343945 ns)
> > > ------------[ cut here ]------------
> > > WARNING: at include/linux/blkdev.h:443 blk_remove_plug+0x7d/0x90()
...
> 
> Looks like it caught a real bug there - unfortunately we have to check
> for ->queue_lock here as well, in case this is another stacked device and
> not the bottom device. Does this make the warning go away for you?
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 087eee0..958f26b 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3264,6 +3264,8 @@ static void raid5_unplug_device(struct request_queue *q)
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&conf->device_lock, flags);
> +	if (q->queue_lock)
> +		spin_lock(q->queue_lock);
>  
>  	if (blk_remove_plug(q)) {
>  		conf->seq_flush++;
> @@ -3271,6 +3273,8 @@ static void raid5_unplug_device(struct request_queue *q)
>  	}
>  	md_wakeup_thread(mddev->thread);
>  
> +	if (q->queue_lock)
> +		spin_unlock(q->queue_lock);
>  	spin_unlock_irqrestore(&conf->device_lock, flags);
>  
>  	unplug_slaves(mddev);
> 

I suspect that will just cause more problems, as the 'q' for an md
device never gets ->queue_lock initialised.
I suspect the correct thing to do is set
	q->queue_lock = &conf->device_lock;

at some stage, probably immediately after device_lock is initialised
in 'run'.

I was discussing this with Dan Williams starting
  http://marc.info/?l=linux-raid&m=120951839903995&w=4
though we don't have an agreed patch yet.

I'm wondering why you mention the issue of stacked devices, though.  I
don't see how it applies.  Could you explain?

Thanks,
NeilBrown


* Re: WARNING in 2.6.25-07422-gb66e1f1
  2008-05-05  7:24     ` WARNING in 2.6.25-07422-gb66e1f1 Neil Brown
@ 2008-05-05 18:03       ` Dan Williams
  2008-05-05 19:02       ` Jens Axboe
  1 sibling, 0 replies; 11+ messages in thread
From: Dan Williams @ 2008-05-05 18:03 UTC (permalink / raw)
  To: Neil Brown
  Cc: Jens Axboe, Jacek Luczak, Prakash Punnoor, Linux Kernel list,
	linux-raid


On Mon, 2008-05-05 at 00:24 -0700, Neil Brown wrote:
> On Sunday May 4, jens.axboe@oracle.com wrote:
> > On Sun, May 04 2008, Jacek Luczak wrote:
> > > Hi,
> > >
> > > I've CC'd a few guys who may help.
> > >
> > > Prakash Punnoor pisze:
> > > > Hi, I got this on boot:
> > > >
> > > > usb 2-1.3: new full speed USB device using ehci_hcd and address 3
> > > > usb 2-1.3: configuration #1 chosen from 1 choice
> > > > Clocksource tsc unstable (delta = -117343945 ns)
> > > > ------------[ cut here ]------------
> > > > WARNING: at include/linux/blkdev.h:443 blk_remove_plug+0x7d/0x90()
> ...
> >
> > Looks like it caught a real bug there - unfortunately we have to check
> > for ->queue_lock here as well, in case this is another stacked device and
> > not the bottom device. Does this make the warning go away for you?
> >
[..]
> I suspect that will just cause more problems, as the 'q' for an md
> device never gets ->queue_lock initialised.
> I suspect the correct thing to do is set
>         q->queue_lock = &conf->device_lock;
> 
> at some stage, probably immediately after device_lock is initialised
> in 'run'.
> 
> I was discussing this with Dan Williams starting
>   http://marc.info/?l=linux-raid&m=120951839903995&w=4
> though we don't have an agreed patch yet.

The patch below appears to work for the raid5 case, but I am
encountering a new issue when testing linear arrays.  raid0/1/10 do not
trigger this issue.

$ mdadm --create /dev/md0 /dev/loop[0-3] -n 4 -l linear
mdadm: RUN_ARRAY failed: Invalid argument # huh?
mdadm: stopped /dev/md0
$ cat /proc/mdstat
Personalities : [raid0] [linear]
unused devices: <none>
$ mdadm --create /dev/md0 /dev/loop[0-3] -n 4 -l linear
Segmentation fault

[293399.915068] BUG: unable to handle kernel NULL pointer dereference at 00000000
[293399.931249] IP: [<c0441cfa>] find_usage_backwards+0x9c/0xb6
[293399.945735] *pde = 00000000
[293399.957323] Oops: 0000 [#1] SMP
[293399.968978] Modules linked in: raid456 async_xor async_memcpy async_tx xor linear loop ipt_MASQUERADE iptable_nat nf_nat bridge rfcomm l2cap bluetooth ]
[293400.093457]
[293400.105809] Pid: 30652, comm: mdadm Not tainted (2.6.25-imsm #63)
[293400.123339] EIP: 0060:[<c0441cfa>] EFLAGS: 00210046 CPU: 2
[293400.140261] EIP is at find_usage_backwards+0x9c/0xb6
[293400.156651] EAX: 00000002 EBX: 00000000 ECX: 00000001 EDX: 0000a9a8
[293400.174211] ESI: 00000000 EDI: d54f2400 EBP: d1db9ba8 ESP: d1db9b9c
[293400.191645]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[293400.207967] Process mdadm (pid: 30652, ti=d1db9000 task=e0f28000 task.ti=d1db9000)
[293400.216021] Stack: e0f284f0 e0f28000 00000004 d1db9bb8 c0441d2d e0f284f0 e0f28000 d1db9bd4
[293400.236094]        c0442032 c06d1fed 00000010 00200246 e0f284f0 d54f2400 d1db9c24 c0442b63
[293400.256296]        0000025d 00000002 00000000 00000000 f72cd3ec 00000001 e0f28000 00000000
[293400.276699] Call Trace:
[293400.302628]  [<c0441d2d>] ? check_usage_backwards+0x19/0x3b
[293400.320626]  [<c0442032>] ? mark_lock+0x228/0x399
[293400.337629]  [<c0442b63>] ? __lock_acquire+0x440/0xad5
[293400.355036]  [<c04421e4>] ? mark_held_locks+0x41/0x5c
[293400.372027]  [<c0408124>] ? native_sched_clock+0x8d/0x9f
[293400.389053]  [<c04435a2>] ? lock_acquire+0x57/0x73
[293400.405617]  [<f8cea220>] ? linear_conf+0xac/0x399 [linear]
[293400.422874]  [<c0628507>] ? _spin_lock+0x1c/0x49
[293400.439193]  [<f8cea220>] ? linear_conf+0xac/0x399 [linear]
[293400.456628]  [<f8cea220>] ? linear_conf+0xac/0x399 [linear]
[293400.474060]  [<c04421e4>] ? mark_held_locks+0x41/0x5c
[293400.491130]  [<c0627061>] ? __mutex_unlock_slowpath+0xe1/0xe9
[293400.509098]  [<c044093b>] ? lock_release_holdtime+0x3f/0x44
[293400.526942]  [<c059524c>] ? do_md_run+0x514/0x9ea
[293400.543989]  [<f8cea5e2>] ? linear_run+0x11/0x71 [linear]
[293400.561848]  [<c0595407>] ? do_md_run+0x6cf/0x9ea
[293400.579013]  [<c0628863>] ? _spin_unlock_irq+0x22/0x26
[293400.596696]  [<c04421e4>] ? mark_held_locks+0x41/0x5c
[293400.614585]  [<c0626f6c>] ? mutex_lock_interruptible_nested+0x25f/0x273
[293400.634244]  [<c044235f>] ? trace_hardirqs_on+0xe1/0x102
[293400.652580]  [<c0626f76>] ? mutex_lock_interruptible_nested+0x269/0x273
[293400.672573]  [<c0599249>] ? md_ioctl+0xb8/0xdc6
[293400.690261]  [<c0599d3d>] ? md_ioctl+0xbac/0xdc6
[293400.708073]  [<c0408124>] ? native_sched_clock+0x8d/0x9f
[293400.726798]  [<c044093b>] ? lock_release_holdtime+0x3f/0x44
[293400.745947]  [<c06289c2>] ? _spin_unlock_irqrestore+0x36/0x3c
[293400.765174]  [<c044235f>] ? trace_hardirqs_on+0xe1/0x102
[293400.783838]  [<c043acec>] ? down+0x2b/0x2f
[293400.801143]  [<c04e8035>] ? blkdev_driver_ioctl+0x49/0x5b
[293400.819931]  [<c04e8762>] ? blkdev_ioctl+0x71b/0x769
[293400.837909]  [<c0462006>] ? free_hot_cold_page+0x15c/0x185
[293400.856024]  [<c0408124>] ? native_sched_clock+0x8d/0x9f
[293400.873546]  [<c044093b>] ? lock_release_holdtime+0x3f/0x44
[293400.891111]  [<c06289c2>] ? _spin_unlock_irqrestore+0x36/0x3c
[293400.908540]  [<c044235f>] ? trace_hardirqs_on+0xe1/0x102
[293400.925100]  [<c049f217>] ? block_ioctl+0x16/0x1b
[293400.940642]  [<c049f201>] ? block_ioctl+0x0/0x1b
[293400.956000]  [<c04887d2>] ? vfs_ioctl+0x22/0x67
[293400.971108]  [<c0488a7b>] ? do_vfs_ioctl+0x264/0x27b
[293400.986610]  [<c0488ad2>] ? sys_ioctl+0x40/0x5a
[293401.001599]  [<c0403915>] ? sysenter_past_esp+0x6a/0xb1
[293401.017331]  =======================
[293401.031194] Code: 89 3d 30 86 a2 c0 b8 02 00 00 00 eb 33 8b 9f b4 00 00 00 eb 16 8b 43 08 8d 56 01 e8 6f ff ff ff 83 f8 02 74 1b 85 c0 74 17 8b 1b <8b>
[293401.073207] EIP: [<c0441cfa>] find_usage_backwards+0x9c/0xb6 SS:ESP 0068:d1db9b9c
[293401.121680] ---[ end trace 6a498ad836843586 ]---

---
md: tell blk-core about device_lock for protecting the queue flags

From: Dan Williams <dan.j.williams@intel.com>

Now that queue flags are no longer atomic (commit
75ad23bc0fcb4f992a5d06982bf0857ab1738e9e), blk-core checks that the queue is
locked via ->queue_lock.  As noticed by Neil, conf->device_lock already
satisfies this requirement.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/md/linear.c         |    6 ++++++
 drivers/md/multipath.c      |    6 ++++++
 drivers/md/raid0.c          |    6 ++++++
 drivers/md/raid1.c          |    7 ++++++-
 drivers/md/raid10.c         |    7 ++++++-
 drivers/md/raid5.c          |    2 ++
 include/linux/raid/linear.h |    1 +
 include/linux/raid/raid0.h  |    1 +
 8 files changed, 34 insertions(+), 2 deletions(-)


diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 0b85117..d026f08 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -122,6 +122,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
 	cnt = 0;
 	conf->array_size = 0;
 
+	spin_lock_init(&conf->device_lock);
+	/* blk-core uses queue_lock to verify protection of the queue flags */
+	mddev->queue->queue_lock = &conf->device_lock;
+
 	rdev_for_each(rdev, tmp, mddev) {
 		int j = rdev->raid_disk;
 		dev_info_t *disk = conf->disks + j;
@@ -133,8 +137,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
 
 		disk->rdev = rdev;
 
+		spin_lock(&conf->device_lock);
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
+		spin_unlock(&conf->device_lock);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 42ee1a2..ee7df38 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -436,6 +436,10 @@ static int multipath_run (mddev_t *mddev)
 		goto out_free_conf;
 	}
 
+	spin_lock_init(&conf->device_lock);
+	/* blk-core uses queue_lock to verify protection of the queue flags */
+	mddev->queue->queue_lock = &conf->device_lock;
+
 	conf->working_disks = 0;
 	rdev_for_each(rdev, tmp, mddev) {
 		disk_idx = rdev->raid_disk;
@@ -446,8 +450,10 @@ static int multipath_run (mddev_t *mddev)
 		disk = conf->multipaths + disk_idx;
 		disk->rdev = rdev;
 
+		spin_lock(&conf->device_lock);
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
+		spin_unlock(&conf->device_lock);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, not that we ever expect a device with
 		 * a merge_bvec_fn to be involved in multipath */
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 818b482..deb5609 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -117,6 +117,10 @@ static int create_strip_zones (mddev_t *mddev)
 	if (!conf->devlist)
 		return 1;
 
+	spin_lock_init(&conf->device_lock);
+	/* blk-core uses queue_lock to verify protection of the queue flags */
+	mddev->queue->queue_lock = &conf->device_lock;
+
 	/* The first zone must contain all devices, so here we check that
 	 * there is a proper alignment of slots to devices and find them all
 	 */
@@ -138,8 +142,10 @@ static int create_strip_zones (mddev_t *mddev)
 		}
 		zone->dev[j] = rdev1;
 
+		spin_lock(&conf->device_lock);
 		blk_queue_stack_limits(mddev->queue,
 				       rdev1->bdev->bd_disk->queue);
+		spin_unlock(&conf->device_lock);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 6778b7c..a01fc7e 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1935,6 +1935,10 @@ static int run(mddev_t *mddev)
 	if (!conf->r1bio_pool)
 		goto out_no_mem;
 
+	spin_lock_init(&conf->device_lock);
+	/* blk-core uses queue_lock to verify protection of the queue flags */
+	mddev->queue->queue_lock = &conf->device_lock;
+
 	rdev_for_each(rdev, tmp, mddev) {
 		disk_idx = rdev->raid_disk;
 		if (disk_idx >= mddev->raid_disks
@@ -1944,8 +1948,10 @@ static int run(mddev_t *mddev)
 
 		disk->rdev = rdev;
 
+		spin_lock(&conf->device_lock);
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
+		spin_unlock(&conf->device_lock);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
@@ -1958,7 +1964,6 @@ static int run(mddev_t *mddev)
 	}
 	conf->raid_disks = mddev->raid_disks;
 	conf->mddev = mddev;
-	spin_lock_init(&conf->device_lock);
 	INIT_LIST_HEAD(&conf->retry_list);
 
 	spin_lock_init(&conf->resync_lock);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5938fa9..c28af78 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2082,6 +2082,10 @@ static int run(mddev_t *mddev)
 		goto out_free_conf;
 	}
 
+	spin_lock_init(&conf->device_lock);
+	/* blk-core uses queue_lock to verify protection of the queue flags */
+	mddev->queue->queue_lock = &conf->device_lock;
+
 	rdev_for_each(rdev, tmp, mddev) {
 		disk_idx = rdev->raid_disk;
 		if (disk_idx >= mddev->raid_disks
@@ -2091,8 +2095,10 @@ static int run(mddev_t *mddev)
 
 		disk->rdev = rdev;
 
+		spin_lock(&conf->device_lock);
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
+		spin_unlock(&conf->device_lock);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
@@ -2103,7 +2109,6 @@ static int run(mddev_t *mddev)
 
 		disk->head_position = 0;
 	}
-	spin_lock_init(&conf->device_lock);
 	INIT_LIST_HEAD(&conf->retry_list);
 
 	spin_lock_init(&conf->resync_lock);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ee0ea91..59964a7 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4257,6 +4257,8 @@ static int run(mddev_t *mddev)
 			goto abort;
 	}
 	spin_lock_init(&conf->device_lock);
+	/* blk-core uses queue_lock to verify protection of the queue flags */
+	mddev->queue->queue_lock = &conf->device_lock;
 	init_waitqueue_head(&conf->wait_for_stripe);
 	init_waitqueue_head(&conf->wait_for_overlap);
 	INIT_LIST_HEAD(&conf->handle_list);
diff --git a/include/linux/raid/linear.h b/include/linux/raid/linear.h
index ba15469..1bb90cf 100644
--- a/include/linux/raid/linear.h
+++ b/include/linux/raid/linear.h
@@ -19,6 +19,7 @@ struct linear_private_data
 	sector_t		array_size;
 	int			preshift; /* shift before dividing by hash_spacing */
 	dev_info_t		disks[0];
+	spinlock_t		device_lock;
 };
 
 
diff --git a/include/linux/raid/raid0.h b/include/linux/raid/raid0.h
index 1b2dda0..3d20d14 100644
--- a/include/linux/raid/raid0.h
+++ b/include/linux/raid/raid0.h
@@ -21,6 +21,7 @@ struct raid0_private_data
 
 	sector_t hash_spacing;
 	int preshift;			/* shift this before divide by hash_spacing */
+	spinlock_t device_lock;
 };
 
 typedef struct raid0_private_data raid0_conf_t;




* Re: WARNING in 2.6.25-07422-gb66e1f1
  2008-05-05  7:24     ` WARNING in 2.6.25-07422-gb66e1f1 Neil Brown
  2008-05-05 18:03       ` Dan Williams
@ 2008-05-05 19:02       ` Jens Axboe
  2008-05-08 18:39         ` Rafael J. Wysocki
  1 sibling, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2008-05-05 19:02 UTC (permalink / raw)
  To: Neil Brown
  Cc: Jacek Luczak, Prakash Punnoor, Linux Kernel list, Dan Williams,
	linux-raid

On Mon, May 05 2008, Neil Brown wrote:
> On Sunday May 4, jens.axboe@oracle.com wrote:
> > On Sun, May 04 2008, Jacek Luczak wrote:
> > > Hi,
> > > 
> > > I've CC'd a few guys who may help.
> > > 
> > > Prakash Punnoor pisze:
> > > > Hi, I got this on boot:
> > > >
> > > > usb 2-1.3: new full speed USB device using ehci_hcd and address 3
> > > > usb 2-1.3: configuration #1 chosen from 1 choice
> > > > Clocksource tsc unstable (delta = -117343945 ns)
> > > > ------------[ cut here ]------------
> > > > WARNING: at include/linux/blkdev.h:443 blk_remove_plug+0x7d/0x90()
> ...
> > 
> > Looks like it caught a real bug there - unfortunately we have to check
> > for ->queue_lock here as well, in case this is another stacked device and
> > not the bottom device. Does this make the warning go away for you?
> > 
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index 087eee0..958f26b 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -3264,6 +3264,8 @@ static void raid5_unplug_device(struct request_queue *q)
> >  	unsigned long flags;
> >  
> >  	spin_lock_irqsave(&conf->device_lock, flags);
> > +	if (q->queue_lock)
> > +		spin_lock(q->queue_lock);
> >  
> >  	if (blk_remove_plug(q)) {
> >  		conf->seq_flush++;
> > @@ -3271,6 +3273,8 @@ static void raid5_unplug_device(struct request_queue *q)
> >  	}
> >  	md_wakeup_thread(mddev->thread);
> >  
> > +	if (q->queue_lock)
> > +		spin_unlock(q->queue_lock);
> >  	spin_unlock_irqrestore(&conf->device_lock, flags);
> >  
> >  	unplug_slaves(mddev);
> > 
> 
> I suspect that will just cause more problems, as the 'q' for an md
> device never gets ->queue_lock initialised.
> I suspect the correct thing to do is set
> 	q->queue_lock = &conf->device_lock;
> 
> at some stage, probably immediately after device_lock is initialised
> in 'run'.
> 
> I was discussing this with Dan Williams starting
>   http://marc.info/?l=linux-raid&m=120951839903995&w=4
> though we don't have an agreed patch yet.

I agree with the usage of the device lock. I (mistakenly) thought that
raid5 used the bottom device queue for that unplug - I see that it does
not, so where does the warning come from? mddev->queue->queue_lock
should be NULL, since md never sets it and it's zeroed to begin with??

> I'm wondering why you mention the issue of stacked devices, though.  I
> don't see how it applies.  Could you explain?

See above: if the queue had been the bottom queue, ->queue_lock may or
may not be NULL, depending on whether this is the real device or
(another) stacked device.

-- 
Jens Axboe



* Re: WARNING in 2.6.25-07422-gb66e1f1
  2008-05-05 19:02       ` Jens Axboe
@ 2008-05-08 18:39         ` Rafael J. Wysocki
  2008-05-08 18:46           ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2008-05-08 18:39 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Neil Brown, Jacek Luczak, Prakash Punnoor, Linux Kernel list,
	Dan Williams, linux-raid

On Monday, 5 of May 2008, Jens Axboe wrote:
> On Mon, May 05 2008, Neil Brown wrote:
> > On Sunday May 4, jens.axboe@oracle.com wrote:
> > > On Sun, May 04 2008, Jacek Luczak wrote:
> > > > Hi,
> > > > 
> > > > I've CC'd a few guys who may help.
> > > > 
> > > > Prakash Punnoor pisze:
> > > > > Hi, I got this on boot:
> > > > >
> > > > > usb 2-1.3: new full speed USB device using ehci_hcd and address 3
> > > > > usb 2-1.3: configuration #1 chosen from 1 choice
> > > > > Clocksource tsc unstable (delta = -117343945 ns)
> > > > > ------------[ cut here ]------------
> > > > > WARNING: at include/linux/blkdev.h:443 blk_remove_plug+0x7d/0x90()
> > ...
> > > 
> > > Looks like it caught a real bug there - unfortunately we have to check
> > > for ->queue_lock here as well, in case this is another stacked device and
> > > not the bottom device. Does this make the warning go away for you?
> > > 
> > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > > index 087eee0..958f26b 100644
> > > --- a/drivers/md/raid5.c
> > > +++ b/drivers/md/raid5.c
> > > @@ -3264,6 +3264,8 @@ static void raid5_unplug_device(struct request_queue *q)
> > >  	unsigned long flags;
> > >  
> > >  	spin_lock_irqsave(&conf->device_lock, flags);
> > > +	if (q->queue_lock)
> > > +		spin_lock(q->queue_lock);
> > >  
> > >  	if (blk_remove_plug(q)) {
> > >  		conf->seq_flush++;
> > > @@ -3271,6 +3273,8 @@ static void raid5_unplug_device(struct request_queue *q)
> > >  	}
> > >  	md_wakeup_thread(mddev->thread);
> > >  
> > > +	if (q->queue_lock)
> > > +		spin_unlock(q->queue_lock);
> > >  	spin_unlock_irqrestore(&conf->device_lock, flags);
> > >  
> > >  	unplug_slaves(mddev);
> > > 
> > 
> > I suspect that will just cause more problems, as the 'q' for an md
> > device never gets ->queue_lock initialised.
> > I suspect the correct thing to do is set
> > 	q->queue_lock = &conf->device_lock;
> > 
> > at some stage, probably immediately after device_lock is initialised
> > in 'run'.
> > 
> > I was discussing this with Dan Williams starting
> >   http://marc.info/?l=linux-raid&m=120951839903995&w=4
> > though we don't have an agreed patch yet.
> 
> I agree with the usage of the device lock. I (mistakenly) thought that
> raid5 used the bottom device queue for that unplug - I see that it does
> not, so where does the warning come from? mddev->queue->queue_lock
> should be NULL, since md never sets it and it's zeroed to begin with??
> 
> > I'm wondering why you mention the issue of stacked devices, though.  I
> > don't see how it applies.  Could you explain?
> 
> See above: if the queue had been the bottom queue, ->queue_lock may or
> may not be NULL, depending on whether this is the real device or
> (another) stacked device.

I get a similar warning with RAID1 on one of my test boxes:

WARNING: at /home/rafael/src/linux-2.6/include/linux/blkdev.h:443 blk_remove_plug+0x85/0xa0()
Modules linked in: raid456 async_xor async_memcpy async_tx xor raid0 ehci_hcd ohci_hcd sd_mod edd raid1 ext3 jbd fan sata_uli pata_ali thermal processor
Pid: 2159, comm: md1_raid1 Not tainted 2.6.26-rc1 #158

Call Trace:
 [<ffffffff80238bbf>] warn_on_slowpath+0x5f/0x80
 [<ffffffff8025e8d8>] ? __lock_acquire+0x748/0x10d0
 [<ffffffff80348f55>] blk_remove_plug+0x85/0xa0
 [<ffffffffa004df64>] :raid1:flush_pending_writes+0x44/0xb0
 [<ffffffffa004e649>] :raid1:raid1d+0x59/0xfe0
 [<ffffffff8025e8d8>] ? __lock_acquire+0x748/0x10d0
 [<ffffffff8025dc4f>] ? trace_hardirqs_on+0xbf/0x150
 [<ffffffff8043ea8c>] md_thread+0x3c/0x110
 [<ffffffff8024f6a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8043ea50>] ? md_thread+0x0/0x110
 [<ffffffff8024f23d>] kthread+0x4d/0x80
 [<ffffffff8020c548>] child_rip+0xa/0x12
 [<ffffffff8020bc5f>] ? restore_args+0x0/0x30
 [<ffffffff8024f1f0>] ? kthread+0x0/0x80
 [<ffffffff8020c53e>] ? child_rip+0x0/0x12

---[ end trace 05d4e0844c61f45d ]---

This is the WARN_ON_ONCE(!queue_is_locked(q)) in queue_flag_clear(),
apparently.

Thanks,
Rafael


* Re: WARNING in 2.6.25-07422-gb66e1f1
  2008-05-08 18:39         ` Rafael J. Wysocki
@ 2008-05-08 18:46           ` Dan Williams
  2008-05-08 23:18             ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2008-05-08 18:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jens Axboe, Neil Brown, Jacek Luczak, Prakash Punnoor,
	Linux Kernel list, linux-raid

On Thu, 2008-05-08 at 11:39 -0700, Rafael J. Wysocki wrote:
> I get a similar warning with RAID1 on one of my test boxes:
> 
> WARNING: at /home/rafael/src/linux-2.6/include/linux/blkdev.h:443 blk_remove_plug+0x85/0xa0()
> Modules linked in: raid456 async_xor async_memcpy async_tx xor raid0 ehci_hcd ohci_hcd sd_mod edd raid1 ext3 jbd fan sata_uli pata_ali thermal processor
> Pid: 2159, comm: md1_raid1 Not tainted 2.6.26-rc1 #158
> 
> Call Trace:
>  [<ffffffff80238bbf>] warn_on_slowpath+0x5f/0x80
>  [<ffffffff8025e8d8>] ? __lock_acquire+0x748/0x10d0
>  [<ffffffff80348f55>] blk_remove_plug+0x85/0xa0
>  [<ffffffffa004df64>] :raid1:flush_pending_writes+0x44/0xb0
>  [<ffffffffa004e649>] :raid1:raid1d+0x59/0xfe0
>  [<ffffffff8025e8d8>] ? __lock_acquire+0x748/0x10d0
>  [<ffffffff8025dc4f>] ? trace_hardirqs_on+0xbf/0x150
>  [<ffffffff8043ea8c>] md_thread+0x3c/0x110
>  [<ffffffff8024f6a0>] ? autoremove_wake_function+0x0/0x40
>  [<ffffffff8043ea50>] ? md_thread+0x0/0x110
>  [<ffffffff8024f23d>] kthread+0x4d/0x80
>  [<ffffffff8020c548>] child_rip+0xa/0x12
>  [<ffffffff8020bc5f>] ? restore_args+0x0/0x30
>  [<ffffffff8024f1f0>] ? kthread+0x0/0x80
>  [<ffffffff8020c53e>] ? child_rip+0x0/0x12
> 
> ---[ end trace 05d4e0844c61f45d ]---
> 
> This is the WARN_ON_ONCE(!queue_is_locked(q)) in queue_flag_clear(),
> apparently.

Yes, it triggers on all RAID levels.  The patch in this message:

http://marc.info/?l=linux-raid&m=121001065404056&w=2
...fixes the raid 0/1/10/5/6 cases, but I am still trying to isolate an
issue (potentially unrelated) with linear arrays.

--
Dan



* Re: WARNING in 2.6.25-07422-gb66e1f1
  2008-05-08 18:46           ` Dan Williams
@ 2008-05-08 23:18             ` Dan Williams
  2008-05-09  2:15               ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2008-05-08 23:18 UTC (permalink / raw)
  To: Neil Brown
  Cc: Jens Axboe, Rafael J. Wysocki, Jacek Luczak, Prakash Punnoor,
	Linux Kernel list, linux-raid

On Thu, 2008-05-08 at 11:46 -0700, Dan Williams wrote:
> On Thu, 2008-05-08 at 11:39 -0700, Rafael J. Wysocki wrote:
> > I get a similar warning with RAID1 on one of my test boxes:
> > 
> > WARNING: at /home/rafael/src/linux-2.6/include/linux/blkdev.h:443 blk_remove_plug+0x85/0xa0()
> > Modules linked in: raid456 async_xor async_memcpy async_tx xor raid0 ehci_hcd ohci_hcd sd_mod edd raid1 ext3 jbd fan sata_uli pata_ali thermal processor
> > Pid: 2159, comm: md1_raid1 Not tainted 2.6.26-rc1 #158
> > 
> > Call Trace:
> >  [<ffffffff80238bbf>] warn_on_slowpath+0x5f/0x80
> >  [<ffffffff8025e8d8>] ? __lock_acquire+0x748/0x10d0
> >  [<ffffffff80348f55>] blk_remove_plug+0x85/0xa0
> >  [<ffffffffa004df64>] :raid1:flush_pending_writes+0x44/0xb0
> >  [<ffffffffa004e649>] :raid1:raid1d+0x59/0xfe0
> >  [<ffffffff8025e8d8>] ? __lock_acquire+0x748/0x10d0
> >  [<ffffffff8025dc4f>] ? trace_hardirqs_on+0xbf/0x150
> >  [<ffffffff8043ea8c>] md_thread+0x3c/0x110
> >  [<ffffffff8024f6a0>] ? autoremove_wake_function+0x0/0x40
> >  [<ffffffff8043ea50>] ? md_thread+0x0/0x110
> >  [<ffffffff8024f23d>] kthread+0x4d/0x80
> >  [<ffffffff8020c548>] child_rip+0xa/0x12
> >  [<ffffffff8020bc5f>] ? restore_args+0x0/0x30
> >  [<ffffffff8024f1f0>] ? kthread+0x0/0x80
> >  [<ffffffff8020c53e>] ? child_rip+0x0/0x12
> > 
> > ---[ end trace 05d4e0844c61f45d ]---
> > 
> > This is the WARN_ON_ONCE(!queue_is_locked(q)) in queue_flag_clear(),
> > apparently.
> 
> Yes, it triggers on all RAID levels.  The patch in this message:
> 
> http://marc.info/?l=linux-raid&m=121001065404056&w=2
> ...fixes the raid 0/1/10/5/6 cases, but I am still trying to isolate an
> issue (potentially unrelated) with linear arrays.
> 

Gah, 'device_lock' cannot come after 'disks[0]' in 'struct
linear_private_data', since a zero-length (flexible) array must be the last
member of the struct.  Updated patch below.  Simple testing passes:
'mdadm --create /dev/md0; mkfs.ext3 /dev/md0' for each raid level
linear, 0, 1, 10, 5, and 6.

---snip--->
Subject: md: tell blk-core about device_lock for protecting the queue flags
From: Dan Williams <dan.j.williams@intel.com>

Now that queue flags are no longer atomic (commit
75ad23bc0fcb4f992a5d06982bf0857ab1738e9e), blk-core checks that the queue is
locked via ->queue_lock.  As noticed by Neil, conf->device_lock already
satisfies this requirement.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/md/linear.c         |    6 ++++++
 drivers/md/multipath.c      |    6 ++++++
 drivers/md/raid0.c          |    6 ++++++
 drivers/md/raid1.c          |    7 ++++++-
 drivers/md/raid10.c         |    7 ++++++-
 drivers/md/raid5.c          |    2 ++
 include/linux/raid/linear.h |    3 ++-
 include/linux/raid/raid0.h  |    1 +
 8 files changed, 35 insertions(+), 3 deletions(-)


diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 0b85117..d026f08 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -122,6 +122,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
 	cnt = 0;
 	conf->array_size = 0;
 
+	spin_lock_init(&conf->device_lock);
+	/* blk-core uses queue_lock to verify protection of the queue flags */
+	mddev->queue->queue_lock = &conf->device_lock;
+
 	rdev_for_each(rdev, tmp, mddev) {
 		int j = rdev->raid_disk;
 		dev_info_t *disk = conf->disks + j;
@@ -133,8 +137,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
 
 		disk->rdev = rdev;
 
+		spin_lock(&conf->device_lock);
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
+		spin_unlock(&conf->device_lock);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 42ee1a2..ee7df38 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -436,6 +436,10 @@ static int multipath_run (mddev_t *mddev)
 		goto out_free_conf;
 	}
 
+	spin_lock_init(&conf->device_lock);
+	/* blk-core uses queue_lock to verify protection of the queue flags */
+	mddev->queue->queue_lock = &conf->device_lock;
+
 	conf->working_disks = 0;
 	rdev_for_each(rdev, tmp, mddev) {
 		disk_idx = rdev->raid_disk;
@@ -446,8 +450,10 @@ static int multipath_run (mddev_t *mddev)
 		disk = conf->multipaths + disk_idx;
 		disk->rdev = rdev;
 
+		spin_lock(&conf->device_lock);
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
+		spin_unlock(&conf->device_lock);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, not that we ever expect a device with
 		 * a merge_bvec_fn to be involved in multipath */
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 818b482..deb5609 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -117,6 +117,10 @@ static int create_strip_zones (mddev_t *mddev)
 	if (!conf->devlist)
 		return 1;
 
+	spin_lock_init(&conf->device_lock);
+	/* blk-core uses queue_lock to verify protection of the queue flags */
+	mddev->queue->queue_lock = &conf->device_lock;
+
 	/* The first zone must contain all devices, so here we check that
 	 * there is a proper alignment of slots to devices and find them all
 	 */
@@ -138,8 +142,10 @@ static int create_strip_zones (mddev_t *mddev)
 		}
 		zone->dev[j] = rdev1;
 
+		spin_lock(&conf->device_lock);
 		blk_queue_stack_limits(mddev->queue,
 				       rdev1->bdev->bd_disk->queue);
+		spin_unlock(&conf->device_lock);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 6778b7c..a01fc7e 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1935,6 +1935,10 @@ static int run(mddev_t *mddev)
 	if (!conf->r1bio_pool)
 		goto out_no_mem;
 
+	spin_lock_init(&conf->device_lock);
+	/* blk-core uses queue_lock to verify protection of the queue flags */
+	mddev->queue->queue_lock = &conf->device_lock;
+
 	rdev_for_each(rdev, tmp, mddev) {
 		disk_idx = rdev->raid_disk;
 		if (disk_idx >= mddev->raid_disks
@@ -1944,8 +1948,10 @@ static int run(mddev_t *mddev)
 
 		disk->rdev = rdev;
 
+		spin_lock(&conf->device_lock);
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
+		spin_unlock(&conf->device_lock);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
@@ -1958,7 +1964,6 @@ static int run(mddev_t *mddev)
 	}
 	conf->raid_disks = mddev->raid_disks;
 	conf->mddev = mddev;
-	spin_lock_init(&conf->device_lock);
 	INIT_LIST_HEAD(&conf->retry_list);
 
 	spin_lock_init(&conf->resync_lock);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5938fa9..c28af78 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2082,6 +2082,10 @@ static int run(mddev_t *mddev)
 		goto out_free_conf;
 	}
 
+	spin_lock_init(&conf->device_lock);
+	/* blk-core uses queue_lock to verify protection of the queue flags */
+	mddev->queue->queue_lock = &conf->device_lock;
+
 	rdev_for_each(rdev, tmp, mddev) {
 		disk_idx = rdev->raid_disk;
 		if (disk_idx >= mddev->raid_disks
@@ -2091,8 +2095,10 @@ static int run(mddev_t *mddev)
 
 		disk->rdev = rdev;
 
+		spin_lock(&conf->device_lock);
 		blk_queue_stack_limits(mddev->queue,
 				       rdev->bdev->bd_disk->queue);
+		spin_unlock(&conf->device_lock);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
@@ -2103,7 +2109,6 @@ static int run(mddev_t *mddev)
 
 		disk->head_position = 0;
 	}
-	spin_lock_init(&conf->device_lock);
 	INIT_LIST_HEAD(&conf->retry_list);
 
 	spin_lock_init(&conf->resync_lock);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ee0ea91..59964a7 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4257,6 +4257,8 @@ static int run(mddev_t *mddev)
 			goto abort;
 	}
 	spin_lock_init(&conf->device_lock);
+	/* blk-core uses queue_lock to verify protection of the queue flags */
+	mddev->queue->queue_lock = &conf->device_lock;
 	init_waitqueue_head(&conf->wait_for_stripe);
 	init_waitqueue_head(&conf->wait_for_overlap);
 	INIT_LIST_HEAD(&conf->handle_list);
diff --git a/include/linux/raid/linear.h b/include/linux/raid/linear.h
index ba15469..3c35e1e 100644
--- a/include/linux/raid/linear.h
+++ b/include/linux/raid/linear.h
@@ -18,7 +18,8 @@ struct linear_private_data
 	sector_t		hash_spacing;
 	sector_t		array_size;
 	int			preshift; /* shift before dividing by hash_spacing */
-	dev_info_t		disks[0];
+	spinlock_t		device_lock;
+	dev_info_t		disks[0]; /* grows depending on 'raid_disks' */
 };
 
 
diff --git a/include/linux/raid/raid0.h b/include/linux/raid/raid0.h
index 1b2dda0..3d20d14 100644
--- a/include/linux/raid/raid0.h
+++ b/include/linux/raid/raid0.h
@@ -21,6 +21,7 @@ struct raid0_private_data
 
 	sector_t hash_spacing;
 	int preshift;			/* shift this before divide by hash_spacing */
+	spinlock_t device_lock;
 };
 
 typedef struct raid0_private_data raid0_conf_t;




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: WARNING in 2.6.25-07422-gb66e1f1
  2008-05-08 23:18             ` Dan Williams
@ 2008-05-09  2:15               ` Neil Brown
  2008-05-09  4:59                 ` Dan Williams
  2008-05-09  5:38                 ` Neil Brown
  0 siblings, 2 replies; 11+ messages in thread
From: Neil Brown @ 2008-05-09  2:15 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jens Axboe, Rafael J. Wysocki, Jacek Luczak, Prakash Punnoor,
	Linux Kernel list, linux-raid

On Thursday May 8, dan.j.williams@intel.com wrote:
> On Thu, 2008-05-08 at 11:46 -0700, Dan Williams wrote:
> Subject: md: tell blk-core about device_lock for protecting the queue flags
> From: Dan Williams <dan.j.williams@intel.com>
> 
> Now that queue flags are no longer atomic (commit:
> 75ad23bc0fcb4f992a5d06982bf0857ab1738e9e) blk-core checks the queue is locked
> via ->queue_lock.  As noticed by Neil conf->device_lock already satisfies this
> requirement.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> 
>  drivers/md/linear.c         |    6 ++++++
>  drivers/md/multipath.c      |    6 ++++++
>  drivers/md/raid0.c          |    6 ++++++
>  drivers/md/raid1.c          |    7 ++++++-
>  drivers/md/raid10.c         |    7 ++++++-
>  drivers/md/raid5.c          |    2 ++
>  include/linux/raid/linear.h |    3 ++-
>  include/linux/raid/raid0.h  |    1 +
>  8 files changed, 35 insertions(+), 3 deletions(-)
> 
> 
> diff --git a/drivers/md/linear.c b/drivers/md/linear.c
> index 0b85117..d026f08 100644
> --- a/drivers/md/linear.c
> +++ b/drivers/md/linear.c
> @@ -122,6 +122,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
>  	cnt = 0;
>  	conf->array_size = 0;
>  
> +	spin_lock_init(&conf->device_lock);
> +	/* blk-core uses queue_lock to verify protection of the queue flags */
> +	mddev->queue->queue_lock = &conf->device_lock;
> +
>  	rdev_for_each(rdev, tmp, mddev) {
>  		int j = rdev->raid_disk;
>  		dev_info_t *disk = conf->disks + j;
> @@ -133,8 +137,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
>  
>  		disk->rdev = rdev;
>  
> +		spin_lock(&conf->device_lock);
>  		blk_queue_stack_limits(mddev->queue,
>  				       rdev->bdev->bd_disk->queue);
> +		spin_unlock(&conf->device_lock);
>  		/* as we don't honour merge_bvec_fn, we must never risk
>  		 * violating it, so limit ->max_sector to one PAGE, as
>  		 * a one page request is never in violation.

This shouldn't be necessary.
There is no actual race here -- mddev->queue->queue_flags is not going to be
accessed by anyone else until do_md_run does
	mddev->queue->make_request_fn = mddev->pers->make_request;
which is much later.
So we only need to be sure that "queue_is_locked" doesn't complain.
And as q->queue_lock is still NULL at this point, it won't complain.

I think that the *only* change that is needed is to put

> +	/* blk-core uses queue_lock to verify protection of the queue flags */
> +	mddev->queue->queue_lock = &conf->device_lock;

after each
> +	spin_lock_init(&conf->device_lock);

i.e. in raid1.c, raid10.c and raid5.c

??

NeilBrown

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: WARNING in 2.6.25-07422-gb66e1f1
  2008-05-09  2:15               ` Neil Brown
@ 2008-05-09  4:59                 ` Dan Williams
  2008-05-09  5:38                 ` Neil Brown
  1 sibling, 0 replies; 11+ messages in thread
From: Dan Williams @ 2008-05-09  4:59 UTC (permalink / raw)
  To: Neil Brown
  Cc: Jens Axboe, Rafael J. Wysocki, Jacek Luczak, Prakash Punnoor,
	Linux Kernel list, linux-raid

On Thu, May 8, 2008 at 7:15 PM, Neil Brown <neilb@suse.de> wrote:
> On Thursday May 8, dan.j.williams@intel.com wrote:
>  > @@ -133,8 +137,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
>  >
>  >               disk->rdev = rdev;
>  >
>  > +             spin_lock(&conf->device_lock);
>  >               blk_queue_stack_limits(mddev->queue,
>  >                                      rdev->bdev->bd_disk->queue);
>  > +             spin_unlock(&conf->device_lock);
>  >               /* as we don't honour merge_bvec_fn, we must never risk
>  >                * violating it, so limit ->max_sector to one PAGE, as
>  >                * a one page request is never in violation.
>
>  This shouldn't be necessary.
>  There is no actual race here -- mddev->queue->queue_flags is not going to be
>  accessed by anyone else until do_md_run does
>         mddev->queue->make_request_fn = mddev->pers->make_request;
>  which is much later.
>  So we only need to be sure that "queue_is_locked" doesn't complain.
>  And as q->queue_lock is still NULL at this point, it won't complain.
>
>  I think that the *only* change that is needed is to put
>
>
>  > +     /* blk-core uses queue_lock to verify protection of the queue flags */
>  > +     mddev->queue->queue_lock = &conf->device_lock;
>
>  after each
>
> > +     spin_lock_init(&conf->device_lock);
>
>  i.e. in raid1.c, raid10.c and raid5.c
>
>  ??

Yes, locking shouldn't be needed at those points; however, the warning
still fires because blk_queue_stack_limits() is using
queue_flag_clear() instead of queue_flag_clear_unlocked().  Taking a look at
converting it to queue_flag_clear_unlocked() uncovered a couple more
overlooked sites (multipath.c:multipath_add_disk and
raid1.c:raid1_add_disk) where ->run has already been called...

The options I am thinking of all seem ugly:
1/ keep the unnecessary locking in MD
2/ make blk_queue_stack_limits() use queue_flag_clear_unlocked() even
though it needs to be locked sometimes
3/ conditionally use queue_flag_clear_unlocked if !t->queue_lock

--
Dan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: WARNING in 2.6.25-07422-gb66e1f1
  2008-05-09  2:15               ` Neil Brown
  2008-05-09  4:59                 ` Dan Williams
@ 2008-05-09  5:38                 ` Neil Brown
  2008-05-12 17:46                   ` Dan Williams
  1 sibling, 1 reply; 11+ messages in thread
From: Neil Brown @ 2008-05-09  5:38 UTC (permalink / raw)
  To: Dan Williams, Jens Axboe, Rafael J. Wysocki, Jacek Luczak,
	Prakash Punnoor

On Friday May 9, neilb@suse.de wrote:
> On Thursday May 8, dan.j.williams@intel.com wrote:
> > @@ -133,8 +137,10 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
> >  
> >  		disk->rdev = rdev;
> >  
> > +		spin_lock(&conf->device_lock);
> >  		blk_queue_stack_limits(mddev->queue,
> >  				       rdev->bdev->bd_disk->queue);
> > +		spin_unlock(&conf->device_lock);
> >  		/* as we don't honour merge_bvec_fn, we must never risk
> >  		 * violating it, so limit ->max_sector to one PAGE, as
> >  		 * a one page request is never in violation.
> 
> This shouldn't be necessary.
> There is no actual race here -- mddev->queue->queue_flags is not going to be
> accessed by anyone else until do_md_run does
> 	mddev->queue->make_request_fn = mddev->pers->make_request;
> which is much later.
> So we only need to be sure that "queue_is_locked" doesn't complain.
> And as q->queue_lock is still NULL at this point, it won't complain.

Sorry, I got that backwards.  It will complain, won't it. :-)

I gotta say that I think it shouldn't.  Introducing a spinlock in
linear.c, raid0.c, multipath.c just to silence a "WARN_ON" seems like
the wrong thing to do.  Of course we could just use q->__queue_lock so
we don't have to add a new lock, but we still have to take the lock
unnecessarily.

Unfortunately I cannot find a nice solution that both avoids clutter
in md code and also protects against carelessly changing flags without
a proper lock.....

Maybe....
We could get blk_queue_stack_limits to lock the queue, and always
initialise __queue_lock with spin_lock_init().  Then the only change
needed in linear.c et al would be to set ->queue_lock to &->__queue_lock.

Jens:  What do you think of this??

NeilBrown



diff --git a/block/blk-core.c b/block/blk-core.c
index b754a4a..2d31dc2 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -479,6 +479,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	kobject_init(&q->kobj, &blk_queue_ktype);
 
 	mutex_init(&q->sysfs_lock);
+	spin_lock_init(&q->__queue_lock);
 
 	return q;
 }
@@ -541,10 +542,8 @@ blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
 	 * if caller didn't supply a lock, they get per-queue locking with
 	 * our embedded lock
 	 */
-	if (!lock) {
-		spin_lock_init(&q->__queue_lock);
+	if (!lock)
 		lock = &q->__queue_lock;
-	}
 
 	q->request_fn		= rfn;
 	q->prep_rq_fn		= NULL;
diff --git a/block/blk-settings.c b/block/blk-settings.c
index bb93d4c..488199a 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -286,8 +286,14 @@ void blk_queue_stack_limits(struct request_queue *t, struct request_queue *b)
 	t->max_hw_segments = min(t->max_hw_segments, b->max_hw_segments);
 	t->max_segment_size = min(t->max_segment_size, b->max_segment_size);
 	t->hardsect_size = max(t->hardsect_size, b->hardsect_size);
-	if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags))
+	if (!t->queue_lock)
+		WARN_ON_ONCE(1);
+	else if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) {
+		unsigned long flags;
+		spin_lock_irqsave(&t->queue_lock, flags);
 		queue_flag_clear(QUEUE_FLAG_CLUSTER, t);
+		spin_unlock_irqrestore(&t->queue_lock, flags);
+	}
 }
 EXPORT_SYMBOL(blk_queue_stack_limits);
 
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 0b85117..552f81b 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -250,6 +250,7 @@ static int linear_run (mddev_t *mddev)
 {
 	linear_conf_t *conf;
 
+	mddev->queue_lock = &mddev->__queue_lock;
 	conf = linear_conf(mddev, mddev->raid_disks);
 
 	if (!conf)
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 42ee1a2..90f85e4 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -417,6 +417,7 @@ static int multipath_run (mddev_t *mddev)
 	 * bookkeeping area. [whatever we allocate in multipath_run(),
 	 * should be freed in multipath_stop()]
 	 */
+	mddev->queue_lock = &mddev->__queue_lock;
 
 	conf = kzalloc(sizeof(multipath_conf_t), GFP_KERNEL);
 	mddev->private = conf;
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 818b482..a179c8f 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -280,6 +280,7 @@ static int raid0_run (mddev_t *mddev)
 	       (mddev->chunk_size>>1)-1);
 	blk_queue_max_sectors(mddev->queue, mddev->chunk_size >> 9);
 	blk_queue_segment_boundary(mddev->queue, (mddev->chunk_size>>1) - 1);
+	mddev->queue_lock = &mddev->__queue_lock;
 
 	conf = kmalloc(sizeof (raid0_conf_t), GFP_KERNEL);
 	if (!conf)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 6778b7c..ac409b7 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1935,6 +1935,9 @@ static int run(mddev_t *mddev)
 	if (!conf->r1bio_pool)
 		goto out_no_mem;
 
+	spin_lock_init(&conf->device_lock);
+	mddev->queue->queue_lock = &conf->device_lock;
+
 	rdev_for_each(rdev, tmp, mddev) {
 		disk_idx = rdev->raid_disk;
 		if (disk_idx >= mddev->raid_disks
@@ -1958,7 +1961,6 @@ static int run(mddev_t *mddev)
 	}
 	conf->raid_disks = mddev->raid_disks;
 	conf->mddev = mddev;
-	spin_lock_init(&conf->device_lock);
 	INIT_LIST_HEAD(&conf->retry_list);
 
 	spin_lock_init(&conf->resync_lock);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5938fa9..740f670 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2082,6 +2082,9 @@ static int run(mddev_t *mddev)
 		goto out_free_conf;
 	}
 
+	spin_lock_init(&conf->device_lock);
+	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
+
 	rdev_for_each(rdev, tmp, mddev) {
 		disk_idx = rdev->raid_disk;
 		if (disk_idx >= mddev->raid_disks
@@ -2103,7 +2106,6 @@ static int run(mddev_t *mddev)
 
 		disk->head_position = 0;
 	}
-	spin_lock_init(&conf->device_lock);
 	INIT_LIST_HEAD(&conf->retry_list);
 
 	spin_lock_init(&conf->resync_lock);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 087eee0..4fafc79 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4256,6 +4256,7 @@ static int run(mddev_t *mddev)
 			goto abort;
 	}
 	spin_lock_init(&conf->device_lock);
+	mddev->queue->queue_lock = &conf->device_lock;
 	init_waitqueue_head(&conf->wait_for_stripe);
 	init_waitqueue_head(&conf->wait_for_overlap);
 	INIT_LIST_HEAD(&conf->handle_list);

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: WARNING in 2.6.25-07422-gb66e1f1
  2008-05-09  5:38                 ` Neil Brown
@ 2008-05-12 17:46                   ` Dan Williams
  2008-05-13  1:08                     ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2008-05-12 17:46 UTC (permalink / raw)
  To: Neil Brown
  Cc: Jens Axboe, Rafael J. Wysocki, Jacek Luczak, Prakash Punnoor,
	Linux Kernel list, linux-raid


On Thu, 2008-05-08 at 22:38 -0700, Neil Brown wrote:
> On Friday May 9, neilb@suse.de wrote:
> [... full patch from the previous message quoted in full; trimmed ...]
> 
Yes, this is simpler than what I had... spotted some fixups.

--
Dan

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 488199a..8dd8641 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -290,9 +290,9 @@ void blk_queue_stack_limits(struct request_queue *t, struct request_queue *b)
 		WARN_ON_ONCE(1);
 	else if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) {
 		unsigned long flags;
-		spin_lock_irqsave(&t->queue_lock, flags);
+		spin_lock_irqsave(t->queue_lock, flags);
 		queue_flag_clear(QUEUE_FLAG_CLUSTER, t);
-		spin_unlock_irqrestore(&t->queue_lock, flags);
+		spin_unlock_irqrestore(t->queue_lock, flags);
 	}
 }
 EXPORT_SYMBOL(blk_queue_stack_limits);
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 552f81b..1074824 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -250,7 +250,7 @@ static int linear_run (mddev_t *mddev)
 {
 	linear_conf_t *conf;
 
-	mddev->queue_lock = &mddev->__queue_lock;
+	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
 	conf = linear_conf(mddev, mddev->raid_disks);
 
 	if (!conf)
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 90f85e4..4f4d1f3 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -417,7 +417,7 @@ static int multipath_run (mddev_t *mddev)
 	 * bookkeeping area. [whatever we allocate in multipath_run(),
 	 * should be freed in multipath_stop()]
 	 */
-	mddev->queue_lock = &mddev->__queue_lock;
+	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
 
 	conf = kzalloc(sizeof(multipath_conf_t), GFP_KERNEL);
 	mddev->private = conf;
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index a179c8f..914c04d 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -280,7 +280,7 @@ static int raid0_run (mddev_t *mddev)
 	       (mddev->chunk_size>>1)-1);
 	blk_queue_max_sectors(mddev->queue, mddev->chunk_size >> 9);
 	blk_queue_segment_boundary(mddev->queue, (mddev->chunk_size>>1) - 1);
-	mddev->queue_lock = &mddev->__queue_lock;
+	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
 
 	conf = kmalloc(sizeof (raid0_conf_t), GFP_KERNEL);
 	if (!conf)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index f46d448..8536ede 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2083,7 +2083,7 @@ static int run(mddev_t *mddev)
 	}
 
 	spin_lock_init(&conf->device_lock);
-	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
+	mddev->queue->queue_lock = &conf->device_lock;
 
 	rdev_for_each(rdev, tmp, mddev) {
 		disk_idx = rdev->raid_disk;



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: WARNING in 2.6.25-07422-gb66e1f1
  2008-05-12 17:46                   ` Dan Williams
@ 2008-05-13  1:08                     ` Neil Brown
  0 siblings, 0 replies; 11+ messages in thread
From: Neil Brown @ 2008-05-13  1:08 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jens Axboe, Rafael J. Wysocki, Jacek Luczak, Prakash Punnoor,
	Linux Kernel list, linux-raid

On Monday May 12, dan.j.williams@intel.com wrote:
> 
> Yes, this is simpler than what I had... spotted some fixups.
> 

Ahh, you noticed that I hadn't actually compiled it. :-)
Thanks.

I've sent it off to Linus.

NeilBrown

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2008-05-13  1:08 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <200805031151.44287.prakash@punnoor.de>
     [not found] ` <481DB3F3.5080102@gmail.com>
     [not found]   ` <20080504183839.GN12774@kernel.dk>
2008-05-05  7:24     ` WARNING in 2.6.25-07422-gb66e1f1 Neil Brown
2008-05-05 18:03       ` Dan Williams
2008-05-05 19:02       ` Jens Axboe
2008-05-08 18:39         ` Rafael J. Wysocki
2008-05-08 18:46           ` Dan Williams
2008-05-08 23:18             ` Dan Williams
2008-05-09  2:15               ` Neil Brown
2008-05-09  4:59                 ` Dan Williams
2008-05-09  5:38                 ` Neil Brown
2008-05-12 17:46                   ` Dan Williams
2008-05-13  1:08                     ` Neil Brown
