From: Andrew Morton <akpm@linux-foundation.org>
To: Nathan Zimmer <nzimmer@sgi.com>
Cc: viro@zeniv.linux.org.uk, eric.dumazet@gmail.com,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
"Eric W. Biederman" <ebiederm@xmission.com>,
David Woodhouse <dwmw2@infradead.org>,
Alexey Dobriyan <adobriyan@gmail.com>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH v3 resend] procfs: Improve Scaling in proc
Date: Fri, 15 Feb 2013 14:12:39 -0800 [thread overview]
Message-ID: <20130215141239.99a3bccc.akpm@linux-foundation.org> (raw)
In-Reply-To: <1360961274-24652-1-git-send-email-nzimmer@sgi.com>
On Fri, 15 Feb 2013 14:47:54 -0600
Nathan Zimmer <nzimmer@sgi.com> wrote:
> I am currently tracking a hotlock reported by a customer on a large system,
> 512 cores. I am currently running 3.8-rc7 but the issue looks like it has been
> this way for a very long time.
> The offending lock is proc_dir_entry->pde_unload_lock.
>
> This patch converts the replaces the lock with the rcu. However the pde_openers
> list still is controlled by a spin lock. I tested on a 4096 machine and the lock
> doesn't seem hot at least according to perf.
>
> This is a refresh/resend of what was orignally suggested by Eric Dumazet some
> time ago.
>
> Supporting numbers, lower is better, they are from the test I posted earlier.
> cpuinfo baseline Rcu
> tasks read-sec read-sec
> 1 0.0141 0.0141
> 2 0.0140 0.0142
> 4 0.0140 0.0141
> 8 0.0145 0.0140
> 16 0.0553 0.0168
> 32 0.1688 0.0549
> 64 0.5017 0.1690
> 128 1.7005 0.5038
> 256 5.2513 2.0804
> 512 8.0529 3.0162
>
> ...
>
> diff --git a/fs/proc/generic.c b/fs/proc/generic.c
> index 76ddae8..6896a70 100644
> --- a/fs/proc/generic.c
> +++ b/fs/proc/generic.c
> @@ -191,13 +191,16 @@ proc_file_read(struct file *file, char __user *buf, size_t nbytes,
> struct proc_dir_entry *pde = PDE(file->f_path.dentry->d_inode);
> ssize_t rv = -EIO;
>
> - spin_lock(&pde->pde_unload_lock);
> - if (!pde->proc_fops) {
> - spin_unlock(&pde->pde_unload_lock);
> + const struct file_operations *fops;
There's now a stray newline in the definitions section.
> + rcu_read_lock();
> + fops = rcu_dereference(pde->proc_fops);
> + if (!fops) {
> + rcu_read_unlock();
> return rv;
> }
> - pde->pde_users++;
> - spin_unlock(&pde->pde_unload_lock);
> + atomic_inc(&pde->pde_users);
> + rcu_read_unlock();
So what's up with pde_users? Seems that it's atomic_t *and* uses a
form of RCU protection. We can't make it a plain old integer because
it's modified under rcu_read_lock() and we can't move the atomic_inc()
outside rcu_read_lock() because of the synchronization games in
remove_proc_entry()?
> rv = __proc_file_read(file, buf, nbytes, ppos);
>
>
> ...
>
> @@ -802,37 +809,30 @@ void remove_proc_entry(const char *name, struct proc_dir_entry *parent)
> return;
> }
>
> - spin_lock(&de->pde_unload_lock);
> /*
> * Stop accepting new callers into module. If you're
> * dynamically allocating ->proc_fops, save a pointer somewhere.
> */
> - de->proc_fops = NULL;
> - /* Wait until all existing callers into module are done. */
> - if (de->pde_users > 0) {
> - DECLARE_COMPLETION_ONSTACK(c);
> -
> - if (!de->pde_unload_completion)
> - de->pde_unload_completion = &c;
>
> - spin_unlock(&de->pde_unload_lock);
> + rcu_assign_pointer(de->proc_fops, NULL);
> + synchronize_rcu();
> + /* Wait until all existing callers into module are done. */
>
> + DECLARE_COMPLETION_ONSTACK(c);
This should have generated a c99-style definition warning. Did your
compiler version not do this?
> + de->pde_unload_completion = &c;
> + if (!atomic_dec_and_test(&de->pde_users))
> wait_for_completion(de->pde_unload_completion);
>
> - spin_lock(&de->pde_unload_lock);
> - }
> -
> + spin_lock(&de->pde_openers_lock);
> while (!list_empty(&de->pde_openers)) {
> struct pde_opener *pdeo;
>
> pdeo = list_first_entry(&de->pde_openers, struct pde_opener, lh);
> list_del(&pdeo->lh);
> - spin_unlock(&de->pde_unload_lock);
> pdeo->release(pdeo->inode, pdeo->file);
> kfree(pdeo);
> - spin_lock(&de->pde_unload_lock);
> }
> - spin_unlock(&de->pde_unload_lock);
> + spin_unlock(&de->pde_openers_lock);
>
> if (S_ISDIR(de->mode))
> parent->nlink--;
>
> ...
>
> static loff_t proc_reg_llseek(struct file *file, loff_t offset, int whence)
> {
> + const struct file_operations *fops;
> struct proc_dir_entry *pde = PDE(file->f_path.dentry->d_inode);
> loff_t rv = -EINVAL;
> loff_t (*llseek)(struct file *, loff_t, int);
>
> - spin_lock(&pde->pde_unload_lock);
> + rcu_read_lock();
> + fops = rcu_dereference(pde->proc_fops);
> /*
> * remove_proc_entry() is going to delete PDE (as part of module
> * cleanup sequence). No new callers into module allowed.
> */
> - if (!pde->proc_fops) {
> - spin_unlock(&pde->pde_unload_lock);
> + if (!fops) {
> + rcu_read_unlock();
> return rv;
> }
> /*
> * Bump refcount so that remove_proc_entry will wail for ->llseek to
> * complete.
> */
> - pde->pde_users++;
> + atomic_inc(&pde->pde_users);
> /*
> * Save function pointer under lock, to protect against ->proc_fops
> * NULL'ifying right after ->pde_unload_lock is dropped.
> */
This comment needs updating.
However, it doesn't appear to be true any more. With this patch we no
longer set ->fops to NULL in remove_proc_entry(). (What replaced that
logic?)
So are all these games with local variable `llseek' still needed?
afaict the increment of pde_users will stabilize ->fops?
> - llseek = pde->proc_fops->llseek;
> - spin_unlock(&pde->pde_unload_lock);
> + llseek = fops->llseek;
> + rcu_read_unlock();
>
> if (!llseek)
> llseek = default_llseek;
> @@ -182,15 +176,17 @@ static ssize_t proc_reg_read(struct file *file, char __user *buf, size_t count,
> struct proc_dir_entry *pde = PDE(file->f_path.dentry->d_inode);
> ssize_t rv = -EIO;
> ssize_t (*read)(struct file *, char __user *, size_t, loff_t *);
> + const struct file_operations *fops;
>
> - spin_lock(&pde->pde_unload_lock);
> - if (!pde->proc_fops) {
> - spin_unlock(&pde->pde_unload_lock);
> + rcu_read_lock();
> + fops = rcu_dereference(pde->proc_fops);
> + if (!fops) {
> + rcu_read_unlock();
> return rv;
> }
> - pde->pde_users++;
> - read = pde->proc_fops->read;
> - spin_unlock(&pde->pde_unload_lock);
> + atomic_inc(&pde->pde_users);
> + read = fops->read;
> + rcu_read_unlock();
Many dittoes.
> if (read)
> rv = read(file, buf, count, ppos);
> @@ -204,15 +200,17 @@ static ssize_t proc_reg_write(struct file *file, const char __user *buf, size_t
> struct proc_dir_entry *pde = PDE(file->f_path.dentry->d_inode);
> ssize_t rv = -EIO;
> ssize_t (*write)(struct file *, const char __user *, size_t, loff_t *);
> + const struct file_operations *fops;
>
> - spin_lock(&pde->pde_unload_lock);
> - if (!pde->proc_fops) {
> - spin_unlock(&pde->pde_unload_lock);
> + rcu_read_lock();
> + fops = rcu_dereference(pde->proc_fops);
> + if (!fops) {
> + rcu_read_unlock();
> return rv;
> }
> - pde->pde_users++;
> - write = pde->proc_fops->write;
> - spin_unlock(&pde->pde_unload_lock);
> + atomic_inc(&pde->pde_users);
> + write = fops->write;
> + rcu_read_unlock();
>
> ...
>
next prev parent reply other threads:[~2013-02-15 22:12 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-02-15 20:47 [PATCH v3 resend] procfs: Improve Scaling in proc Nathan Zimmer
2013-02-15 22:12 ` Andrew Morton [this message]
2013-02-15 23:39 ` Nathan Zimmer
2013-02-15 23:44 ` Andrew Morton
2013-03-04 20:16 ` Stephen Warren
2013-03-04 21:00 ` Nathan Zimmer
2013-03-04 22:00 ` Stephen Warren
[not found] <fa.Y6PHWInLhuIydVYAcT5/eAuyF3Y@ifi.uio.no>
2013-02-22 7:06 ` Zheng Huai Cheng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130215141239.99a3bccc.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=adobriyan@gmail.com \
--cc=dwmw2@infradead.org \
--cc=ebiederm@xmission.com \
--cc=eric.dumazet@gmail.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nzimmer@sgi.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.