* [patch 1/2] raid6_end_write_request() spinlock fix
@ 2006-04-25 3:35 Coywolf Qi Hunt
2006-04-25 5:13 ` Neil Brown
0 siblings, 1 reply; 5+ messages in thread
From: Coywolf Qi Hunt @ 2006-04-25 3:35 UTC (permalink / raw)
To: akpm; +Cc: neilb, linux-kernel, linux-raid
Hello,
Reduce the raid6_end_write_request() spinlock window.
Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
---
diff --git a/drivers/md/raid6main.c b/drivers/md/raid6main.c
index bc69355..820536e 100644
--- a/drivers/md/raid6main.c
+++ b/drivers/md/raid6main.c
@@ -468,7 +468,6 @@ static int raid6_end_write_request (stru
struct stripe_head *sh = bi->bi_private;
raid6_conf_t *conf = sh->raid_conf;
int disks = conf->raid_disks, i;
- unsigned long flags;
int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags);
if (bi->bi_size)
@@ -486,16 +485,14 @@ static int raid6_end_write_request (stru
return 0;
}
- spin_lock_irqsave(&conf->device_lock, flags);
if (!uptodate)
md_error(conf->mddev, conf->disks[i].rdev);
rdev_dec_pending(conf->disks[i].rdev, conf->mddev);
-
clear_bit(R5_LOCKED, &sh->dev[i].flags);
set_bit(STRIPE_HANDLE, &sh->state);
- __release_stripe(conf, sh);
- spin_unlock_irqrestore(&conf->device_lock, flags);
+ release_stripe(sh);
+
return 0;
}
--
Coywolf Qi Hunt
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [patch 1/2] raid6_end_write_request() spinlock fix
2006-04-25 3:35 [patch 1/2] raid6_end_write_request() spinlock fix Coywolf Qi Hunt
@ 2006-04-25 5:13 ` Neil Brown
2006-04-25 6:43 ` Coywolf Qi Hunt
From: Neil Brown @ 2006-04-25 5:13 UTC (permalink / raw)
To: Coywolf Qi Hunt; +Cc: akpm, linux-kernel, linux-raid
On Tuesday April 25, qiyong@fc-cn.com wrote:
> Hello,
>
> Reduce the raid6_end_write_request() spinlock window.
Andrew: please don't include these in -mm. This one and the
corresponding raid5 one are wrong, and I'm not sure yet about the
unplug_device changes.
In this case, the call to md_error, which in turn calls "error" in
raid6main.c, requires the lock to be held as it contains:
if (!test_bit(Faulty, &rdev->flags)) {
mddev->sb_dirty = 1;
if (test_bit(In_sync, &rdev->flags)) {
conf->working_disks--;
mddev->degraded++;
conf->failed_disks++;
clear_bit(In_sync, &rdev->flags);
/*
* if recovery was running, make sure it aborts.
*/
set_bit(MD_RECOVERY_ERR, &mddev->recovery);
}
set_bit(Faulty, &rdev->flags);
which is fairly clearly not safe without some locking.
Coywolf: As I think I have already said, I appreciate your review of
the md/raid code and your attempts to improve it - I'm sure there is
plenty of room to make improvements.
However posting patches with minimal commentary on code that you don't
fully understand is not the best way to work with the community.
If you see something that you think is wrong, it is much better to ask
why it is the way it is, explain why you think it isn't right, and
quite possibly include an example patch. Then we can discuss the
issue and find the best solution.
So please feel free to post further patches, but please include more
commentary, and don't assume you understand something that you don't
really.
Thanks,
NeilBrown
>
> Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
> ---
>
> diff --git a/drivers/md/raid6main.c b/drivers/md/raid6main.c
> index bc69355..820536e 100644
> --- a/drivers/md/raid6main.c
> +++ b/drivers/md/raid6main.c
> @@ -468,7 +468,6 @@ static int raid6_end_write_request (stru
> struct stripe_head *sh = bi->bi_private;
> raid6_conf_t *conf = sh->raid_conf;
> int disks = conf->raid_disks, i;
> - unsigned long flags;
> int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags);
>
> if (bi->bi_size)
> @@ -486,16 +485,14 @@ static int raid6_end_write_request (stru
> return 0;
> }
>
> - spin_lock_irqsave(&conf->device_lock, flags);
> if (!uptodate)
> md_error(conf->mddev, conf->disks[i].rdev);
>
> rdev_dec_pending(conf->disks[i].rdev, conf->mddev);
> -
> clear_bit(R5_LOCKED, &sh->dev[i].flags);
> set_bit(STRIPE_HANDLE, &sh->state);
> - __release_stripe(conf, sh);
> - spin_unlock_irqrestore(&conf->device_lock, flags);
> + release_stripe(sh);
> +
> return 0;
> }
>
>
> --
> Coywolf Qi Hunt
* Re: [patch 1/2] raid6_end_write_request() spinlock fix
2006-04-25 5:13 ` Neil Brown
@ 2006-04-25 6:43 ` Coywolf Qi Hunt
2006-04-25 6:50 ` Neil Brown
From: Coywolf Qi Hunt @ 2006-04-25 6:43 UTC (permalink / raw)
To: Neil Brown; +Cc: akpm, linux-kernel, linux-raid
On Tue, Apr 25, 2006 at 03:13:49PM +1000, Neil Brown wrote:
> On Tuesday April 25, qiyong@fc-cn.com wrote:
> > Hello,
> >
> > Reduce the raid6_end_write_request() spinlock window.
>
> Andrew: please don't include these in -mm. This one and the
> corresponding raid5 one are wrong, and I'm not sure yet about the
> unplug_device changes.
I am sure about the unplug_device change. Just follow the path...
>
> In this case, the call to md_error, which in turn calls "error" in
> raid6main.c, requires the lock to be held as it contains:
> if (!test_bit(Faulty, &rdev->flags)) {
> mddev->sb_dirty = 1;
> if (test_bit(In_sync, &rdev->flags)) {
> conf->working_disks--;
> mddev->degraded++;
> conf->failed_disks++;
> clear_bit(In_sync, &rdev->flags);
> /*
> * if recovery was running, make sure it aborts.
> */
> set_bit(MD_RECOVERY_ERR, &mddev->recovery);
> }
> set_bit(Faulty, &rdev->flags);
>
> which is fairly clearly not safe without some locking.
Yes. Let's fix error() instead. In any case, the current code is broken (see raid5/6_end_read_request).
Comments? Thanks.
Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
---
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 9c24377..192de19 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -638,7 +638,7 @@ static void error(mddev_t *mddev, mdk_rd
raid5_conf_t *conf = (raid5_conf_t *) mddev->private;
PRINTK("raid5: error called\n");
- if (!test_bit(Faulty, &rdev->flags)) {
+ if (!test_and_set_bit(Faulty, &rdev->flags)) {
mddev->sb_dirty = 1;
if (test_bit(In_sync, &rdev->flags)) {
conf->working_disks--;
@@ -650,7 +650,6 @@ static void error(mddev_t *mddev, mdk_rd
*/
set_bit(MD_RECOVERY_ERR, &mddev->recovery);
}
- set_bit(Faulty, &rdev->flags);
printk (KERN_ALERT
"raid5: Disk failure on %s, disabling device."
" Operation continuing on %d devices\n",
diff --git a/drivers/md/raid6main.c b/drivers/md/raid6main.c
index d3deedb..fc0b31d 100644
--- a/drivers/md/raid6main.c
+++ b/drivers/md/raid6main.c
@@ -527,7 +527,7 @@ static void error(mddev_t *mddev, mdk_rd
raid6_conf_t *conf = (raid6_conf_t *) mddev->private;
PRINTK("raid6: error called\n");
- if (!test_bit(Faulty, &rdev->flags)) {
+ if (!test_and_set_bit(Faulty, &rdev->flags)) {
mddev->sb_dirty = 1;
if (test_bit(In_sync, &rdev->flags)) {
conf->working_disks--;
@@ -539,7 +539,6 @@ static void error(mddev_t *mddev, mdk_rd
*/
set_bit(MD_RECOVERY_ERR, &mddev->recovery);
}
- set_bit(Faulty, &rdev->flags);
printk (KERN_ALERT
"raid6: Disk failure on %s, disabling device."
" Operation continuing on %d devices\n",
>
> Coywolf: As I think I have already said, I appreciate your review of
> the md/raid code and your attempts to improve it - I'm sure there is
> plenty of room to make improvements.
> However posting patches with minimal commentary on code that you don't
> fully understand is not the best way to work with the community.
> If you see something that you think is wrong, it is much better to ask
> why it is the way it is, explain why you think it isn't right, and
> quite possibly include an example patch. Then we can discuss the
> issue and find the best solution.
>
> So please feel free to post further patches, but please include more
> commentary, and don't assume you understand something that you don't
> really.
>
> Thanks,
> NeilBrown
>
>
>
> >
> > Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
> > ---
> >
> > diff --git a/drivers/md/raid6main.c b/drivers/md/raid6main.c
> > index bc69355..820536e 100644
> > --- a/drivers/md/raid6main.c
> > +++ b/drivers/md/raid6main.c
> > @@ -468,7 +468,6 @@ static int raid6_end_write_request (stru
> > struct stripe_head *sh = bi->bi_private;
> > raid6_conf_t *conf = sh->raid_conf;
> > int disks = conf->raid_disks, i;
> > - unsigned long flags;
> > int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags);
> >
> > if (bi->bi_size)
> > @@ -486,16 +485,14 @@ static int raid6_end_write_request (stru
> > return 0;
> > }
> >
> > - spin_lock_irqsave(&conf->device_lock, flags);
> > if (!uptodate)
> > md_error(conf->mddev, conf->disks[i].rdev);
> >
> > rdev_dec_pending(conf->disks[i].rdev, conf->mddev);
> > -
> > clear_bit(R5_LOCKED, &sh->dev[i].flags);
> > set_bit(STRIPE_HANDLE, &sh->state);
> > - __release_stripe(conf, sh);
> > - spin_unlock_irqrestore(&conf->device_lock, flags);
> > + release_stripe(sh);
> > +
> > return 0;
> > }
> >
--
Coywolf Qi Hunt
* Re: [patch 1/2] raid6_end_write_request() spinlock fix
2006-04-25 6:43 ` Coywolf Qi Hunt
@ 2006-04-25 6:50 ` Neil Brown
2006-04-25 8:07 ` Coywolf Qi Hunt
From: Neil Brown @ 2006-04-25 6:50 UTC (permalink / raw)
To: Coywolf Qi Hunt; +Cc: akpm, linux-kernel, linux-raid
On Tuesday April 25, qiyong@fc-cn.com wrote:
> On Tue, Apr 25, 2006 at 03:13:49PM +1000, Neil Brown wrote:
> > On Tuesday April 25, qiyong@fc-cn.com wrote:
> > > Hello,
> > >
> > > Reduce the raid6_end_write_request() spinlock window.
> >
> > Andrew: please don't include these in -mm. This one and the
> > corresponding raid5 one are wrong, and I'm not sure yet about the
> > unplug_device changes.
>
> I am sure about the unplug_device change. Just follow the path...
>
What path? There are probably several. If I follow the path, will I
see the same things as you see? Who knows, because you haven't
bothered to tell us what you see.
>
> Yes. Let's fix error() instead. In any case, the current code is broken (see raid5/6_end_read_request).
What will I see in raidX_end_read_request? Surely it isn't that hard
to write a few more sentences?
> Comments? Thanks.
conf->working_disks isn't atomic_t, so decrementing it without a
spinlock isn't safe. So let's just leave it all inside a spinlock.
Also I have a vague memory that clearing In_sync before Faulty is
important, but I'm not certain of that.
Remember: the code is there for a reason. It might not be a good
reason, and the code could well be wrong. But it would be worth your
effort trying to find out what the reason is before blithely changing
it (as I discovered recently with a change I suggested to
invalidate_mapping_pages).
NeilBrown
* Re: [patch 1/2] raid6_end_write_request() spinlock fix
2006-04-25 6:50 ` Neil Brown
@ 2006-04-25 8:07 ` Coywolf Qi Hunt
From: Coywolf Qi Hunt @ 2006-04-25 8:07 UTC (permalink / raw)
To: Neil Brown; +Cc: akpm, linux-kernel, linux-raid
On Tue, Apr 25, 2006 at 04:50:10PM +1000, Neil Brown wrote:
> On Tuesday April 25, qiyong@fc-cn.com wrote:
> > On Tue, Apr 25, 2006 at 03:13:49PM +1000, Neil Brown wrote:
> > > On Tuesday April 25, qiyong@fc-cn.com wrote:
> > > > Hello,
> > > >
> > > > Reduce the raid6_end_write_request() spinlock window.
> > >
> > > Andrew: please don't include these in -mm. This one and the
> > > corresponding raid5 one are wrong, and I'm not sure yet about the
> > > unplug_device changes.
> >
> > I am sure about the unplug_device change. Just follow the path...
> >
>
> What path? There are probably several. If I follow the path, will I
> see the same things as you see? Who knows, because you haven't
> bothered to tell us what you see.
There are only two places where handle_list is possibly re-filled:
__release_stripe() and raidX_activate_delayed(). So raidXd should only
wake up after these two points.
>
> >
> > Yes. Let's fix error() instead. In any case, the current code is broken (see raid5/6_end_read_request).
>
> What will I see in raidX_end_read_request? Surely it isn't that hard
> to write a few more sentences?
You should see that the md_error() call in raidX_end_read_request isn't made under any spinlock.
> conf->working_disks isn't atomic_t, so decrementing it without a
> spinlock isn't safe. So let's just leave it all inside a spinlock.
test_and_set_bit(Faulty, &rdev->flags) protects it as well, imho:
the block can be entered only once.
>
> Also I have a vague memory that clearing In_sync before Faulty is
> important, but I'm not certain of that.
Maybe, but that seems not to apply here.
>
> Remember: the code is there for a reason. It might not be a good
> reason, and the code could well be wrong. But it would be worth your
> effort trying to find out what the reason is before blithely changing
> it (as I discovered recently with a change I suggested to
> invalidate_mapping_pages).
Thanks :)
--
Coywolf Qi Hunt