linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Niu Yawei <yawei.niu@gmail.com>
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	yawei.niu@intel.com, andreas.dilger@intel.com, jack@suse.cz,
	lai.siyao@intel.com
Subject: Re: [PATCH] quota: remove dqptr_sem for scalability
Date: Thu, 22 May 2014 15:25:56 +0200	[thread overview]
Message-ID: <20140522132556.GE7999@quack.suse.cz> (raw)
In-Reply-To: <537DD5BA.1050105@gmail.com>

  Hello,

  thanks for the work! I have some comments below.

On Thu 22-05-14 18:47:22, Niu Yawei wrote:
> There are several global locks in the VFS quota code which hurts
> performance a lot when quota accounting enabled, dqptr_sem is the major one.
> 
> This patch tries to make the VFS quota code scalable with minimal changes.
> 
> Following tests (mdtest & dbench) were running over ext4 fs in a
> centos6.5 vm (8 cpus, 4G mem, kenrel: 3.15.0-rc5+), and the result shows
> the patch relieved the lock congestion a lot.
> 
Snipped performance results - thanks for those but first let's concentrate
on correctness side of things.

> [PATCH] quota: remove dqptr_sem for scalability
> 
> Remove dqptr_sem (but kept in struct quota_info to keep kernel ABI
> unchanged), and the functionality of this lock is implemented by
> other locks:
> * i_dquot is protected by i_lock, however only this pointer, the
>   content of this struct is by dq_data_lock.
> * Q_GETFMT is now protected with dqonoff_mutex instead of dqptr_sem.
> * Small changes in __dquot_initialize() to avoid unnecessary
>   dqget()/dqput() calls.
  Each of these three steps should be a separate patch please.

  Now regarding each of these steps: Using i_lock for protection of dquot
pointers doesn't quite work. You have e.g.:
> @@ -1636,12 +1646,12 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
>  	}
>  	inode_incr_space(inode, number, reserve);
>  	spin_unlock(&dq_data_lock);
> +	spin_unlock(&inode->i_lock);
>  
>  	if (reserve)
>  		goto out_flush_warn;
>  	mark_all_dquot_dirty(dquots);
>  out_flush_warn:
> -	up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
>  	flush_warnings(warn);
>  out:
>  	return ret;
  So you release protection of dquot pointers from inode before calling
mark_all_dquot_dirty(). So dquot pointers can be removed by
remove_inode_dquot_ref() while mark_all_dquot_dirty() works on them. That's
wrong and can lead to use after free.

Quota code uses dqptr_sem to provide exclusion for three cases:
* dquot_init()
* dquot_transfer()
* various places just reading dquot pointers to update allocation
  information
* remove_dquot_ref() (called from quotaoff code)

Now exclusion against remove_dquot_ref() is relatively easy to deal with.
We can create srcu for dquots, whoever looks at dquot pointers from inode
takes srcu_read_lock() (so you basically convert read side of dqptr_sem 
and write side in dquot_init() and dquot_transfer() to that) and use
synchronize_srcu() in remove_dquot_ref() to make sure all users are done
before starting to remove dquot pointers. You'll need to move
dquot_active() checks inside srcu_read_lock() to make this reliable but that
should be easy.

What remains to deal with is an exclusion between the remaining places.
dquot_init(). dq_data_lock spinlock should already provide the necessary
exclusion between readers of allocation pointers and dquot_transfer() (that
spinlock would actually be a good candidate to a conversion to per-inode
one - using i_lock for this should noticeably reduce the contention - but
that's the next step). dquot_init() doesn't take the spinlock so far but
probably we can make it to take it for simplicity for now.

>  static void __dquot_initialize(struct inode *inode, int type)
>  {
> -	int cnt;
> -	struct dquot *got[MAXQUOTAS];
> +	int cnt, dq_get = 0;
> +	struct dquot *got[MAXQUOTAS] = { NULL, NULL };
>  	struct super_block *sb = inode->i_sb;
>  	qsize_t rsv;
>  
> -	/* First test before acquiring mutex - solves deadlocks when we
> -         * re-enter the quota code and are already holding the mutex */
>  	if (!dquot_active(inode))
>  		return;
>  
> -	/* First get references to structures we might need. */
> +	/* In most case, the i_dquot should have been initialized, except
> +	 * the newly allocated one. We'd always try to skip the dqget() and
> +	 * dqput() calls to avoid unnecessary global lock contention. */
> +	if (!(inode->i_state & I_NEW))
> +		goto init_idquot;
  The optimization is a good idea but dquot_init() is often called for
!I_NEW inodes when the initialization is necessary. So I'd rather first
check whether relevant i_dquot[] pointers are != NULL before taking any
lock. If yes, we are done, otherwise we grab pointers to dquots, take the
lock and update the pointers.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

  reply	other threads:[~2014-05-22 13:25 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-22 10:47 [PATCH] quota: remove dqptr_sem for scalability Niu Yawei
2014-05-22 13:25 ` Jan Kara [this message]
2014-05-23  3:37   ` Niu Yawei
2014-05-27 10:13   ` [PATCH 2/3] quota: avoid unnecessary dqget()/dqput() calls Niu Yawei
2014-05-27 10:15   ` [PATCH 3/3] quota: remove dqptr_sem Niu Yawei
2014-05-23  4:02 ` [PATCH] quota: remove dqptr_sem for scalability Eric Sandeen
2014-05-23  5:22   ` Niu Yawei
2014-05-23 13:02     ` Eric Sandeen
2014-05-27 10:10 ` [PATCH 1/3] quota: protect Q_GETFMT by dqonoff_mutex Niu Yawei
2014-05-27 10:12   ` Christoph Hellwig
2014-05-27 10:28     ` Niu Yawei
2014-05-28  1:52     ` [PATCH 1/3 v2] " Niu Yawei
2014-06-02  7:32       ` Jan Kara
2014-05-28  1:53     ` [PATCH 2/3 v2] quota: avoid unnecessary dqget()/dqput() calls Niu Yawei
2014-06-02  7:42       ` Jan Kara
2014-05-28  1:55     ` [PATCH 3/3 v2] quota: remove dqptr_sem Niu Yawei
2014-05-28  2:01       ` Niu Yawei
2014-06-02  8:34       ` Jan Kara
2014-06-03  9:51         ` Niu Yawei
2014-06-03 15:43           ` Jan Kara
2014-06-04  4:19             ` [PATCH 1/5] quota: protect Q_GETFMT by dqonoff_mutex Niu Yawei
2014-06-04 15:36               ` Jan Kara
2014-06-04  4:20             ` [PATCH 2/5] quota: avoid unnecessary dqget()/dqput() calls Niu Yawei
2014-06-04  4:21             ` [PATCH 3/5] quota: simplify remove_inode_dquot_ref() Niu Yawei
2014-06-04  4:22             ` [PATCH 4/5] quota: missing lock in dqcache_shrink_scan() Niu Yawei
2014-06-04  4:23             ` [PATCH 5/5] quota: remove dqptr_sem Niu Yawei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140522132556.GE7999@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=andreas.dilger@intel.com \
    --cc=lai.siyao@intel.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=yawei.niu@gmail.com \
    --cc=yawei.niu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).