From: "Theodore Ts'o" <tytso@mit.edu>
To: Linux Kernel Developers List <linux-kernel@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
"Theodore Ts'o" <tytso@mit.edu>,
Al Viro <viro@zeniv.linux.org.uk>
Subject: [PATCH 06/49] ext3: avoid unnecessary spinlock in critical POSIX ACL path
Date: Mon, 8 Jun 2009 15:22:24 -0400
Message-ID: <1244488987-32564-7-git-send-email-tytso@mit.edu>
In-Reply-To: <1244488987-32564-6-git-send-email-tytso@mit.edu>
From: Linus Torvalds <torvalds@linux-foundation.org>
If a filesystem supports POSIX ACLs, the VFS layer expects the filesystem
to do POSIX ACL checks on any files not owned by the caller, and it does
this for every single pathname component that it looks up.
That obviously can be pretty expensive if the filesystem isn't careful
about it, especially with locking. That's doubly sad, since the common
case tends to be that there are no ACLs associated with the files in
question.
ext3 already caches the ACL data so that it doesn't have to look it up
over and over again, but it does so by taking the inode->i_lock spinlock
on every lookup. That is a noticeable overhead even if it's a private
lock, especially on CPUs where the serialization is expensive (e.g. Intel
Netburst, aka 'P4').
For the special case of not actually having any ACLs, all that locking is
unnecessary. Even if somebody else were to be changing the ACLs on
another CPU, we simply don't care - if we've seen a NULL ACL, we might as
well use it.
So just load the ACL speculatively without any locking, and if it was
NULL, just use it. If it's non-NULL (either because we had a cached
entry, or because the cache hasn't been filled in at all), it means that
we'll need to get the lock and re-load it properly.
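
The fast path described above can be sketched in userspace C11. This is only an illustration under stated assumptions, not the ext3 code: `struct acl`, `struct node`, `ACL_NOT_CACHED`, `node_lock()`, `node_unlock()`, `acl_dup()` and `node_get_acl()` are hypothetical stand-ins for `struct posix_acl`, the inode, `EXT3_ACL_NOT_CACHED`, `inode->i_lock`, `posix_acl_dup()` and `ext3_iget_acl()`, and a relaxed atomic load stands in for `ACCESS_ONCE()`.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct posix_acl: just a reference count. */
struct acl {
	atomic_int refcount;
};

/* Sentinel for "cache not filled in yet" (assumed, mirrors EXT3_ACL_NOT_CACHED). */
#define ACL_NOT_CACHED ((struct acl *)-1)

/* Hypothetical stand-in for the inode: a tiny spinlock plus the cached pointer. */
struct node {
	atomic_flag lock;           /* plays the role of inode->i_lock */
	struct acl *_Atomic cached; /* NULL, ACL_NOT_CACHED, or a real ACL */
};

static void node_lock(struct node *n)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&n->lock, memory_order_acquire))
		;
}

static void node_unlock(struct node *n)
{
	atomic_flag_clear_explicit(&n->lock, memory_order_release);
}

/* Take a reference; tolerates NULL so callers need not check first. */
static struct acl *acl_dup(struct acl *acl)
{
	if (acl)
		atomic_fetch_add(&acl->refcount, 1);
	return acl;
}

static struct acl *node_get_acl(struct node *n)
{
	/* Speculative read without the lock. */
	struct acl *acl = atomic_load_explicit(&n->cached, memory_order_relaxed);

	if (acl) {
		/* Cached entry or unfilled cache: re-read under the lock. */
		node_lock(n);
		acl = atomic_load_explicit(&n->cached, memory_order_relaxed);
		if (acl != ACL_NOT_CACHED)
			acl = acl_dup(acl);
		node_unlock(n);
	}
	/* A NULL seen on the speculative read is returned without locking. */
	return acl;
}
```

A caller would treat a returned `ACL_NOT_CACHED` as "go read the ACL from disk", a NULL as "no ACL", and anything else as a referenced ACL that must be released later.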
This is noticeable even on Nehalem, which does locking quite well (much
better than P4). From lmbench:
Processor, Processes - times in microseconds - smaller is better
--------------------------------------------------------------------
Host      OS            Mhz  null null      open slct fork exec sh
                             call  I/O stat clos TCP  proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ----
 - before:
nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.95 1.45 2.18 69.1 273. 1141
nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.95 1.48 2.28 69.9 253. 1140
nehalem.l Linux 2.6.30- 3193 0.04 0.10 0.95 1.42 2.19 68.6 284. 1141
 - after:
nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.44 2.12 68.3 282. 1094
nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.39 2.20 67.0 308. 1123
nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.39 2.36 67.4 293. 1148
where you can see what appears to be a roughly 3% improvement in stat
and open/close latencies from just the removal of the locking overhead.
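
The "roughly 3%" figure can be checked from the stat and open/close columns of the table above. The helper names `mean()` and `improvement_pct()` below are made up for this sketch; the numbers are copied from the three before and three after runs.

```c
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative latency reduction, in percent, going from before to after. */
static double improvement_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* lmbench latencies in microseconds, copied from the table. */
static const double stat_before[] = { 0.95, 0.95, 0.95 };
static const double stat_after[]  = { 0.92, 0.92, 0.92 };
static const double open_before[] = { 1.45, 1.48, 1.42 };
static const double open_after[]  = { 1.44, 1.39, 1.39 };
```

Feeding the per-column means into `improvement_pct()` gives about 3.2% for stat and about 3.0% for open/close, which matches the estimate in the text. With only three runs per side this is an indication, not a rigorous measurement.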
Of course, this only matters for files you don't own (the owner never
needs to do the ACL checks), but that's the common case for libraries,
header files, and executables. As well as for the base components of any
absolute pathname, even if you are the owner of the final file.
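
A much simplified model of that ordering, as a sketch only: the owner is checked against the owner mode bits and never reaches the ACL hook, while everybody else goes through it on every check. All names here (`struct file_info`, `check_acl()`, `may_access()`) are hypothetical, group membership and capabilities are left out, and the hook simply counts calls and reports "no ACL".

```c
#include <errno.h>

/* Hypothetical, stripped-down view of an inode for this sketch. */
struct file_info {
	unsigned int uid;   /* owner */
	unsigned int mode;  /* permission bits, e.g. 0755 */
	int acl_lookups;    /* how often the ACL hook was called */
};

/* Hypothetical ACL hook: counts calls and says "no ACL, use mode bits". */
static int check_acl(struct file_info *f, unsigned int mask)
{
	(void)mask;
	f->acl_lookups++;
	return -EAGAIN;
}

/* mask uses the usual rwx encoding: 4 = read, 2 = write, 1 = execute. */
static int may_access(struct file_info *f, unsigned int caller_uid,
		      unsigned int mask)
{
	unsigned int mode = f->mode;

	if (caller_uid == f->uid) {
		/* Owner: use the owner bits, no ACL check at all. */
		mode >>= 6;
	} else {
		/* Non-owner: the ACL hook runs on every check. */
		int err = check_acl(f, mask);

		if (err != -EAGAIN)
			return err;
		/* No ACL: fall back to the "other" bits (groups omitted). */
	}
	return (mask & ~mode & 7) == 0 ? 0 : -EACCES;
}
```

In this model, a non-owner walking a path calls the hook once per checked component, which is why the cost of the hook matters so much even when no ACL is ever found.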
[ At some point we probably want to move this ACL caching logic entirely
into the VFS layer (and only call down to the filesystem when
uncached), but in the meantime this improves ext3 a bit.
A similar fix to btrfs makes a much bigger difference (15x improvement
in lmbench) due to broken caching. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ext3/acl.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/ext3/acl.c b/fs/ext3/acl.c
index d81ef2f..e0c7454 100644
--- a/fs/ext3/acl.c
+++ b/fs/ext3/acl.c
@@ -129,12 +129,15 @@ fail:
 static inline struct posix_acl *
 ext3_iget_acl(struct inode *inode, struct posix_acl **i_acl)
 {
-	struct posix_acl *acl = EXT3_ACL_NOT_CACHED;
+	struct posix_acl *acl = ACCESS_ONCE(*i_acl);
 
-	spin_lock(&inode->i_lock);
-	if (*i_acl != EXT3_ACL_NOT_CACHED)
-		acl = posix_acl_dup(*i_acl);
-	spin_unlock(&inode->i_lock);
+	if (acl) {
+		spin_lock(&inode->i_lock);
+		acl = *i_acl;
+		if (acl != EXT3_ACL_NOT_CACHED)
+			acl = posix_acl_dup(acl);
+		spin_unlock(&inode->i_lock);
+	}
 	return acl;
 }
 
--
1.6.3.2.1.gb9f7d.dirty