All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shaohua Li <shli@kernel.org>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org, Dan Williams <dan.j.williams@intel.com>
Subject: Re: [patch 3/3] raid5: relieve lock contention in get_active_stripe()
Date: Tue, 10 Sep 2013 14:59:12 +0800	[thread overview]
Message-ID: <20130910065912.GA12038@kernel.org> (raw)
In-Reply-To: <20130910152032.48631492@notabene.brown>

On Tue, Sep 10, 2013 at 03:20:32PM +1000, NeilBrown wrote:
> On Tue, 10 Sep 2013 12:24:38 +0800 Shaohua Li <shli@kernel.org> wrote:
> 
> > On Tue, Sep 10, 2013 at 02:06:29PM +1000, NeilBrown wrote:
> > > On Tue, 10 Sep 2013 10:35:55 +0800 Shaohua Li <shli@kernel.org> wrote:
> > > 
> > > > On Tue, Sep 10, 2013 at 11:13:18AM +1000, NeilBrown wrote:
> > > > > On Mon, 9 Sep 2013 12:33:18 +0800 Shaohua Li <shli@kernel.org> wrote:
> > > > > >  		} else {
> > > > > > +			spin_lock(&conf->device_lock);
> > > > > > +
> > > > > >  			if (atomic_read(&sh->count)) {
> > > > > >  				BUG_ON(!list_empty(&sh->lru)
> > > > > >  				    && !test_bit(STRIPE_EXPANDING, &sh->state)
> > > > > > @@ -611,13 +725,14 @@ get_active_stripe(struct r5conf *conf, s
> > > > > >  					sh->group = NULL;
> > > > > >  				}
> > > > > >  			}
> > > > > > +			spin_unlock(&conf->device_lock);
> > > > > 
> > > > > The device_lock is only really needed in the 'else' branch of the if
> > > > > statement.  So can we have it only there.  i.e. don't take the lock if
> > > > > sh->count is non-zero.
> > > > 
> > > > This is correct, I assume this isn't worthy optimizing before. Will fix soon.
> > > 
> > > It isn't really about optimising performance.  It is about making the code
> > > easier to understand.  If we keep the region covered by the lock as small as
> > > reasonably possible, it makes it more obvious to the reader which values are
> > > being protected.
> > > 
> > >  
> > > > > > -	spin_lock_irqsave(&conf->device_lock, flags);
> > > > > > +	lock_all_device_hash_locks_irqsave(conf, &flags);
> > > > > >  	clear_bit(In_sync, &rdev->flags);
> > > > > >  	mddev->degraded = calc_degraded(conf);
> > > > > > -	spin_unlock_irqrestore(&conf->device_lock, flags);
> > > > > > +	unlock_all_device_hash_locks_irqrestore(conf, &flags);
> > > > > >  	set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> > > > > 
> > > > > Why do you think you need to take all the hash locks here and elsewhere when
> > > > > ->degraded is set?
> > > > > The lock is only need to ensure that the 'In_sync' flags are consistent with
> > > > > the 'degraded' count.
> > > > > ->degraded isn't used in get_active_stripe so I cannot see how it is relevant
> > > > > to the hash locks.
> > > > > 
> > > > > We need to lock everything in raid5_quiesce().  I don't think we need to
> > > > > anywhere else.
> > > > 
> > > > init_stripe() accesses some filelds, don't need to protect?
> > > 
> > > What fields?  Not ->degraded.
> > > 
> > > I think the fields that it accesses are effectively protected by the new
> > > seqlock.
> > > If you don't think so, please be explicit.
> > 
> > Like raid_disks, previous_raid_disks, chunk_sectors, prev_chunk_sectors,
> > algorithm and so on. They are used in raid5_compute_sector(), stripe_set_idx()
> > and init_stripe(). The former two are called by init_stripe().
> 
> Yes.  Those are only changed in raid5_start_reshape() and are protected by
> conf->gen_lock.

Ok, I thought I misread degraded as max_degraded, so added unnecessary code.
The last question, in raid5_start_reshape(), I thought we should use seqlock to
protect the '!mddev->sync_thread' case, no?

> If they change while init_stripe is running, the read_seqcount_retry() call in
> make_request() will notice the inconsistency, release the stripe, and try
> again.
> 
> I guess we probably need an extra check on gen_lock inside init_stripe().
> i.e. a
>   do {
>      seq = read_seqcount_begin(&conf->gen_lock);
> 
> just after the "remove_hash(sh)", and a
> 
>   } while (read_seqcount_retry(&conf->gen_lock, seq));
> 
> just before the "insert_hash(sh)".  That will ensure the stripe inserted into
> the hash is consistent.  The read_seqcount_retry() in make_request is still
> needed to ensure that the correct stripe_head is used.

Good point. If it's in hash list, the seqcount check could be skiped.

Thanks,
Shaohua

  reply	other threads:[~2013-09-10  6:59 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-12  2:24 [patch 0/3] raid5: relieve lock contention of get_active_stripe() Shaohua Li
2013-08-12  2:24 ` [patch 1/3] raid5: rename stripe_hash() Shaohua Li
2013-08-12  2:24 ` [patch 2/3] wait: add wait_event_cmd() Shaohua Li
2013-08-12  2:24 ` [patch 3/3] raid5: relieve lock contention in get_active_stripe() Shaohua Li
2013-08-27  3:17   ` NeilBrown
2013-08-27  8:53     ` Shaohua Li
2013-08-28  4:32       ` NeilBrown
2013-08-28  6:39         ` Shaohua Li
2013-09-03  6:08           ` NeilBrown
2013-09-03  7:02             ` Shaohua Li
2013-09-04  6:41               ` NeilBrown
2013-09-05  5:40                 ` Shaohua Li
2013-09-05  6:29                   ` NeilBrown
2013-09-05  9:18                     ` Shaohua Li
2013-09-09  4:33                       ` Shaohua Li
2013-09-10  1:13                         ` NeilBrown
2013-09-10  2:35                           ` Shaohua Li
2013-09-10  4:06                             ` NeilBrown
2013-09-10  4:24                               ` Shaohua Li
2013-09-10  5:20                                 ` NeilBrown
2013-09-10  6:59                                   ` Shaohua Li [this message]
2013-09-10  7:28                                     ` NeilBrown
2013-09-10  7:37                                       ` Shaohua Li
2013-09-11  1:34                                         ` NeilBrown
2013-09-12  1:55                                           ` Shaohua Li
2013-09-12  5:38                                             ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130910065912.GA12038@kernel.org \
    --to=shli@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.