From: Niu Yawei <yawei.niu@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
yawei.niu@intel.com, andreas.dilger@intel.com,
lai.siyao@intel.com
Subject: Re: [PATCH] quota: remove dqptr_sem for scalability
Date: Fri, 23 May 2014 11:37:43 +0800 [thread overview]
Message-ID: <537EC287.70002@gmail.com> (raw)
In-Reply-To: <20140522132556.GE7999@quack.suse.cz>
Thanks for your review, Jack.
> Hello,
>
> thanks for the work! I have some comments below.
>
> On Thu 22-05-14 18:47:22, Niu Yawei wrote:
>> There are several global locks in the VFS quota code which hurts
>> performance a lot when quota accounting enabled, dqptr_sem is the major one.
>>
>> This patch tries to make the VFS quota code scalable with minimal changes.
>>
>> Following tests (mdtest & dbench) were running over ext4 fs in a
>> centos6.5 vm (8 cpus, 4G mem, kenrel: 3.15.0-rc5+), and the result shows
>> the patch relieved the lock congestion a lot.
>>
> Snipped performance results - thanks for those but first let's concentrate
> on correctness side of things.
>
>> [PATCH] quota: remove dqptr_sem for scalability
>>
>> Remove dqptr_sem (but kept in struct quota_info to keep kernel ABI
>> unchanged), and the functionality of this lock is implemented by
>> other locks:
>> * i_dquot is protected by i_lock, however only this pointer, the
>> content of this struct is by dq_data_lock.
>> * Q_GETFMT is now protected with dqonoff_mutex instead of dqptr_sem.
>> * Small changes in __dquot_initialize() to avoid unnecessary
>> dqget()/dqput() calls.
> Each of these three steps should be a separate patch please.
Ok, sure.
>
> Now regarding each of these steps: Using i_lock for protection of dquot
> pointers doesn't quite work. You have e.g.:
>> @@ -1636,12 +1646,12 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
>> }
>> inode_incr_space(inode, number, reserve);
>> spin_unlock(&dq_data_lock);
>> + spin_unlock(&inode->i_lock);
>>
>> if (reserve)
>> goto out_flush_warn;
>> mark_all_dquot_dirty(dquots);
>> out_flush_warn:
>> - up_read(&sb_dqopt(inode->i_sb)->dqptr_sem);
>> flush_warnings(warn);
>> out:
>> return ret;
> So you release protection of dquot pointers from inode before calling
> mark_all_dquot_dirty(). So dquot pointers can be removed by
> remove_inode_dquot_ref() while mark_all_dquot_dirty() works on them. That's
> wrong and can lead to use after free.
Indeed, I didn't realise that the getting refcount code has been removed
from these functions,
is it ok to add it back?
>
> Quota code uses dqptr_sem to provide exclusion for three cases:
> * dquot_init()
> * dquot_transfer()
> * various places just reading dquot pointers to update allocation
> information
> * remove_dquot_ref() (called from quotaoff code)
>
> Now exclusion against remove_dquot_ref() is relatively easy to deal with.
> We can create srcu for dquots, whoever looks at dquot pointers from inode
> takes srcu_read_lock() (so you basically convert read side of dqptr_sem
> and write side in dquot_init() and dquot_transfer() to that) and use
> synchronize_srcu() in remove_dquot_ref() to make sure all users are done
> before starting to remove dquot pointers. You'll need to move
> dquot_active() checks inside srcu_read_lock() to make this reliable but that
> should be easy.
Ok, but adding back the refcounting code looks more straightforward to
me, may I add them back?
> What remains to deal with is an exclusion between the remaining places.
> dquot_init(). dq_data_lock spinlock should already provide the necessary
> exclusion between readers of allocation pointers and dquot_transfer() (that
> spinlock would actually be a good candidate to a conversion to per-inode
> one - using i_lock for this should noticeably reduce the contention - but
> that's the next step). dquot_init() doesn't take the spinlock so far but
> probably we can make it to take it for simplicity for now.
Using global lock in dquot_initalize() will drop the performance a lot
(because it could
be called several times for a single create/unlink operation), so I'm
afraid that we have to
use per-inode lock (i_lock) to serialize them.
>
>> static void __dquot_initialize(struct inode *inode, int type)
>> {
>> - int cnt;
>> - struct dquot *got[MAXQUOTAS];
>> + int cnt, dq_get = 0;
>> + struct dquot *got[MAXQUOTAS] = { NULL, NULL };
>> struct super_block *sb = inode->i_sb;
>> qsize_t rsv;
>>
>> - /* First test before acquiring mutex - solves deadlocks when we
>> - * re-enter the quota code and are already holding the mutex */
>> if (!dquot_active(inode))
>> return;
>>
>> - /* First get references to structures we might need. */
>> + /* In most case, the i_dquot should have been initialized, except
>> + * the newly allocated one. We'd always try to skip the dqget() and
>> + * dqput() calls to avoid unnecessary global lock contention. */
>> + if (!(inode->i_state & I_NEW))
>> + goto init_idquot;
> The optimization is a good idea but dquot_init() is often called for
> !I_NEW inodes when the initialization is necessary. So I'd rather first
> check whether relevant i_dquot[] pointers are != NULL before taking any
> lock. If yes, we are done, otherwise we grab pointers to dquots, take the
> lock and update the pointers.
Ok, thank you.
>
> Honza
next prev parent reply other threads:[~2014-05-23 3:37 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-05-22 10:47 [PATCH] quota: remove dqptr_sem for scalability Niu Yawei
2014-05-22 13:25 ` Jan Kara
2014-05-23 3:37 ` Niu Yawei [this message]
2014-05-27 10:13 ` [PATCH 2/3] quota: avoid unnecessary dqget()/dqput() calls Niu Yawei
2014-05-27 10:15 ` [PATCH 3/3] quota: remove dqptr_sem Niu Yawei
2014-05-23 4:02 ` [PATCH] quota: remove dqptr_sem for scalability Eric Sandeen
2014-05-23 5:22 ` Niu Yawei
2014-05-23 13:02 ` Eric Sandeen
2014-05-27 10:10 ` [PATCH 1/3] quota: protect Q_GETFMT by dqonoff_mutex Niu Yawei
2014-05-27 10:12 ` Christoph Hellwig
2014-05-27 10:28 ` Niu Yawei
2014-05-28 1:52 ` [PATCH 1/3 v2] " Niu Yawei
2014-06-02 7:32 ` Jan Kara
2014-05-28 1:53 ` [PATCH 2/3 v2] quota: avoid unnecessary dqget()/dqput() calls Niu Yawei
2014-06-02 7:42 ` Jan Kara
2014-05-28 1:55 ` [PATCH 3/3 v2] quota: remove dqptr_sem Niu Yawei
2014-05-28 2:01 ` Niu Yawei
2014-06-02 8:34 ` Jan Kara
2014-06-03 9:51 ` Niu Yawei
2014-06-03 15:43 ` Jan Kara
2014-06-04 4:19 ` [PATCH 1/5] quota: protect Q_GETFMT by dqonoff_mutex Niu Yawei
2014-06-04 15:36 ` Jan Kara
2014-06-04 4:20 ` [PATCH 2/5] quota: avoid unnecessary dqget()/dqput() calls Niu Yawei
2014-06-04 4:21 ` [PATCH 3/5] quota: simplify remove_inode_dquot_ref() Niu Yawei
2014-06-04 4:22 ` [PATCH 4/5] quota: missing lock in dqcache_shrink_scan() Niu Yawei
2014-06-04 4:23 ` [PATCH 5/5] quota: remove dqptr_sem Niu Yawei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=537EC287.70002@gmail.com \
--to=yawei.niu@gmail.com \
--cc=andreas.dilger@intel.com \
--cc=jack@suse.cz \
--cc=lai.siyao@intel.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=yawei.niu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.