From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andrea Arcangeli <andrea@suse.de>
Cc: Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org, hugh@veritas.com
Subject: Re: smp race fix between invalidate_inode_pages* and do_no_page
Date: Wed, 11 Jan 2006 20:34:11 +1100
Message-ID: <43C4D113.4060705@yahoo.com.au>
In-Reply-To: <20060111082359.GV15897@opteron.random>
Andrea Arcangeli wrote:
> On Wed, Jan 11, 2006 at 03:08:31PM +1100, Nick Piggin wrote:
>
>>I'd be inclined to think a lock_page is not a big SMP scalability
>>problem because the struct page's cacheline(s) will be written to
>>several times in the process of refcounting anyway. Such a workload
>>would also be running into tree_lock as well.
>
>
> I seem to recall you wanted to make the tree_lock a readonly lock for
> readers for the exact same scalability reason? do_no_page is quite a
I think Bill Irwin or Peter Chubb made the tree_lock a reader-writer
lock back in the day.
I have some patches (ref: lockless pagecache) that completely remove
the tree_lock from read-side operations like find_get_page and
find_lock_page, and turns the write side back into a regular spinlock.
You must be thinking of that?
> fast path for the tree lock too. But I totally agree the unavoidable part
> is the atomic_inc though, good point, so it is worth more to remove the
> tree_lock than to remove the page lock: the tree_lock can be avoided, the
> atomic_inc on page->_count cannot.
>
Yep, my thinking as well.
> The other bonus that makes this attractive is that then we can drop the
> *whole* vm_truncate_count mess... vm_truncate_count and
> inode->truncate_count exist for the single reason that do_no_page
> must not map into the pte a page that is under truncation. We can
> provide the same guarantee with the page lock, doing as
> invalidate_inode_pages2_range does (that is, checking page_mapping under
> the page lock and executing unmap_mapping_range with the page lock held if
> needed). That will free 4 bytes per vma (without even counting the
> truncate_count on every inode out there! that could be an even larger
> gain). On my system I have 9191 vmas in use, so that's 36K of ram saved
> on x86; on x86-64 it's 72K of physical ram saved, since it's an unsigned
> long after a pointer, and the vma must not be hw aligned (and in fact it
> isn't, so the saving is real). On the inode side it saves 4 bytes * 1384
> on my current system; on a busy nfs server it can save a lot more. The
> inode also must not be hw aligned, and correctly it isn't. On a server
> with a lot more vmas and a lot more inodes it'll save more ram.
>
> So if I make this change this could give me a grant for lifetime
> guarantee of seccomp in the kernel that takes less than 1kbyte on a x86,
> right? (on a normal desktop I'll save at minimum 30 times more than what
> I cost to the kernel users ;) Just kidding of course...
>
Sounds like a good idea (and your proposed implementation -
lock_page and recheck mapping in do_no_page sounds sane).
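
For illustration only, here is a rough userspace model of the "take the
page lock, then recheck the mapping" idea discussed above. It is a sketch
under stated assumptions, not kernel code: struct page, struct mapping,
model_truncate_page() and model_fault_map_page() are hypothetical
stand-ins, a pthread mutex plays the role of the page lock, and a boolean
plays the role of the pte.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's real structures. */
struct mapping {
    int id;
};

struct page {
    pthread_mutex_t lock;    /* models the page lock */
    struct mapping *mapping; /* NULL once the page has been truncated */
    bool mapped;             /* models "a pte points at this page" */
};

/*
 * Truncation side: with the page lock held, tear down any user mapping
 * and then detach the page from its file.
 */
static void model_truncate_page(struct page *page)
{
    assert(page != NULL);
    pthread_mutex_lock(&page->lock);
    page->mapped = false;  /* models unmapping with the page lock held */
    page->mapping = NULL;  /* the page no longer belongs to the file */
    pthread_mutex_unlock(&page->lock);
}

/*
 * Fault side: take the page lock and recheck that the page still belongs
 * to the expected mapping before mapping it. Returns false if the page
 * was truncated in the meantime, in which case the caller would have to
 * look the page up again and retry.
 */
static bool model_fault_map_page(struct page *page, struct mapping *expected)
{
    bool ok = false;

    assert(page != NULL);
    pthread_mutex_lock(&page->lock);
    if (page->mapping == expected) {
        page->mapped = true;  /* models installing the pte */
        ok = true;
    }
    pthread_mutex_unlock(&page->lock);
    return ok;
}
```

Because both sides serialise on the same lock, the fault side either maps
the page before truncation runs (and truncation then unmaps it), or sees
mapping == NULL and backs off. In this model no separate truncate counter
is needed to close the race.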
Thanks,
Nick
--
SUSE Labs, Novell Inc.