* [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list [not found] ` <1311660013.2996.6.camel@edumazet-laptop> @ 2011-07-26 8:21 ` Eric Dumazet 2011-07-26 9:03 ` Christoph Hellwig 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2011-07-26 8:21 UTC (permalink / raw) To: Tim Chen, Al Viro, David Miller Cc: Christoph Hellwig, Andi Kleen, Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev Le mardi 26 juillet 2011 à 08:00 +0200, Eric Dumazet a écrit : > Next step is to not chain pipes/sockets into superblock s_inodes list > > inode_sb_list_add()/inode_sb_list_del() is the very last contention > point because of spin_lock(&inode_sb_list_lock); Well, not 'last' contention point, as we still hit remove_inode_hash(), inode_wb_list_del(), inode_lru_list_del(), but thats a clear win on my 2x4x2 machine : 9 seconds instead of 22 on a close(socket()) benchmark. [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list Workloads using pipes and sockets hit inode_sb_list_lock contention. superblock s_inodes list is needed for quota, dirty, pagecache and fsnotify management. pipe/anon/socket fs are clearly not candidates for these. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- fs/anon_inodes.c | 2 +- fs/inode.c | 31 +++++++++++++++++++++---------- fs/pipe.c | 2 +- include/linux/fs.h | 3 ++- net/socket.c | 2 +- 5 files changed, 26 insertions(+), 14 deletions(-) diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c index 4d433d3..269499e 100644 --- a/fs/anon_inodes.c +++ b/fs/anon_inodes.c @@ -187,7 +187,7 @@ EXPORT_SYMBOL_GPL(anon_inode_getfd); */ static struct inode *anon_inode_mkinode(void) { - struct inode *inode = new_inode(anon_inode_mnt->mnt_sb); + struct inode *inode = __new_inode(anon_inode_mnt->mnt_sb); if (!inode) return ERR_PTR(-ENOMEM); diff --git a/fs/inode.c b/fs/inode.c index 96c77b8..8a6d62b 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -362,9 +362,11 @@ EXPORT_SYMBOL_GPL(inode_sb_list_add); static inline void inode_sb_list_del(struct inode *inode) { - spin_lock(&inode_sb_list_lock); - list_del_init(&inode->i_sb_list); - spin_unlock(&inode_sb_list_lock); + if (!list_empty(&inode->i_sb_list)) { + spin_lock(&inode_sb_list_lock); + list_del_init(&inode->i_sb_list); + spin_unlock(&inode_sb_list_lock); + } } static unsigned long hash(struct super_block *sb, unsigned long hashval) @@ -796,6 +798,19 @@ unsigned int get_next_ino(void) } EXPORT_SYMBOL(get_next_ino); +struct inode *__new_inode(struct super_block *sb) +{ + struct inode *inode = alloc_inode(sb); + + if (inode) { + spin_lock(&inode->i_lock); + inode->i_state = 0; + spin_unlock(&inode->i_lock); + INIT_LIST_HEAD(&inode->i_sb_list); + } + return inode; +} + /** * new_inode - obtain an inode * @sb: superblock @@ -814,13 +829,9 @@ struct inode *new_inode(struct super_block *sb) spin_lock_prefetch(&inode_sb_list_lock); - inode = alloc_inode(sb); - if (inode) { - spin_lock(&inode->i_lock); - inode->i_state = 0; - spin_unlock(&inode->i_lock); - inode_sb_list_add(inode); - } + inode = __new_inode(sb); + if (inode) + inode_sb_list_add(inode); return inode; } EXPORT_SYMBOL(new_inode); diff --git a/fs/pipe.c b/fs/pipe.c index 1b7f9af..937b962 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -948,7 +948,7 @@ static const struct dentry_operations pipefs_dentry_operations = { static struct inode * get_pipe_inode(void) { - struct inode *inode = new_inode(pipe_mnt->mnt_sb); + struct inode *inode = __new_inode(pipe_mnt->mnt_sb); struct pipe_inode_info *pipe; if (!inode) diff --git a/include/linux/fs.h b/include/linux/fs.h index a665804..60be54f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2310,7 +2310,8 @@ extern void __iget(struct inode * inode); extern void iget_failed(struct inode *); extern void end_writeback(struct inode *); extern void __destroy_inode(struct inode *); -extern struct inode *new_inode(struct super_block *); +extern struct inode *__new_inode(struct super_block *sb); +extern struct inode *new_inode(struct super_block *sb); extern void free_inode_nonrcu(struct inode *inode); extern int should_remove_suid(struct dentry *); extern int file_remove_suid(struct file *); diff --git a/net/socket.c b/net/socket.c index 02dc82d..b4b8a08 100644 --- a/net/socket.c +++ b/net/socket.c @@ -467,7 +467,7 @@ static struct socket *sock_alloc(void) struct inode *inode; struct socket *sock; - inode = new_inode(sock_mnt->mnt_sb); + inode = __new_inode(sock_mnt->mnt_sb); if (!inode) return NULL; -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list 2011-07-26 8:21 ` [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list Eric Dumazet @ 2011-07-26 9:03 ` Christoph Hellwig 2011-07-26 9:36 ` Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2011-07-26 9:03 UTC (permalink / raw) To: Eric Dumazet Cc: Tim Chen, Al Viro, David Miller, Christoph Hellwig, Andi Kleen, Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev On Tue, Jul 26, 2011 at 10:21:06AM +0200, Eric Dumazet wrote: > Well, not 'last' contention point, as we still hit remove_inode_hash(), There should be no ned to put pipe or anon inodes on the inode hash. Probably sockets don't need it either, but I'd need to look at it in detail. > inode_wb_list_del() The should never be on the wb list either, doing an unlocked check for actually beeing on the list before taking the lock should help you. > inode_lru_list_del(), No real need to keep inodes in the LRU if we only allocate them using new_inode but never look them up either. You might want to try setting .drop_inode to generic_delete_inode for these. > +struct inode *__new_inode(struct super_block *sb) > +{ > + struct inode *inode = alloc_inode(sb); > + > + if (inode) { > + spin_lock(&inode->i_lock); > + inode->i_state = 0; > + spin_unlock(&inode->i_lock); > + INIT_LIST_HEAD(&inode->i_sb_list); > + } > + return inode; > +} This needs a much better name like new_inode_pseudo, and a kerneldoc comment explaining when it is safe to use, and the consequences, which appear to me: - fs may never be unmount - quotas can't work on the filesystem - writeback can't work on the filesystem > @@ -814,13 +829,9 @@ struct inode *new_inode(struct super_block *sb) > > spin_lock_prefetch(&inode_sb_list_lock); > > - inode = alloc_inode(sb); > - if (inode) { > - spin_lock(&inode->i_lock); > - inode->i_state = 0; > - spin_unlock(&inode->i_lock); > - inode_sb_list_add(inode); > - } > + inode = __new_inode(sb); > + if (inode) > + inode_sb_list_add(inode); bad indentation. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list 2011-07-26 9:03 ` Christoph Hellwig @ 2011-07-26 9:36 ` Eric Dumazet 2011-07-26 9:42 ` Christoph Hellwig 2011-07-27 15:21 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet 0 siblings, 2 replies; 14+ messages in thread From: Eric Dumazet @ 2011-07-26 9:36 UTC (permalink / raw) To: Christoph Hellwig Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev Le mardi 26 juillet 2011 à 05:03 -0400, Christoph Hellwig a écrit : > On Tue, Jul 26, 2011 at 10:21:06AM +0200, Eric Dumazet wrote: > > Well, not 'last' contention point, as we still hit remove_inode_hash(), > > There should be no ned to put pipe or anon inodes on the inode hash. > Probably sockets don't need it either, but I'd need to look at it in > detail. > > > inode_wb_list_del() > > The should never be on the wb list either, doing an unlocked check for > actually beeing on the list before taking the lock should help you. Yes, it might even help regular inodes ;) > > > inode_lru_list_del(), > > No real need to keep inodes in the LRU if we only allocate them using > new_inode but never look them up either. You might want to try setting > .drop_inode to generic_delete_inode for these. Yes, I'll take a look, thanks. > > > +struct inode *__new_inode(struct super_block *sb) > > +{ > > + struct inode *inode = alloc_inode(sb); > > + > > + if (inode) { > > + spin_lock(&inode->i_lock); > > + inode->i_state = 0; > > + spin_unlock(&inode->i_lock); > > + INIT_LIST_HEAD(&inode->i_sb_list); > > + } > > + return inode; > > +} > > This needs a much better name like new_inode_pseudo, and a kerneldoc > comment explaining when it is safe to use, and the consequences, which > appear to me: > > - fs may never be unmount > - quotas can't work on the filesystem > - writeback can't work on the filesystem Thanks for reviewing, here is v2 of the patch, addressing your comments. [PATCH v2] vfs: dont chain pipe/anon/socket on superblock s_inodes list Workloads using pipes and sockets hit inode_sb_list_lock contention. superblock s_inodes list is needed for quota, dirty, pagecache and fsnotify management. pipe/anon/socket fs are clearly not candidates for these. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- v2: address Christoph comments fs/anon_inodes.c | 2 +- fs/inode.c | 39 ++++++++++++++++++++++++++++++--------- fs/pipe.c | 2 +- include/linux/fs.h | 3 ++- net/socket.c | 2 +- 5 files changed, 35 insertions(+), 13 deletions(-) diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c index 4d433d3..f11e43e 100644 --- a/fs/anon_inodes.c +++ b/fs/anon_inodes.c @@ -187,7 +187,7 @@ EXPORT_SYMBOL_GPL(anon_inode_getfd); */ static struct inode *anon_inode_mkinode(void) { - struct inode *inode = new_inode(anon_inode_mnt->mnt_sb); + struct inode *inode = new_inode_pseudo(anon_inode_mnt->mnt_sb); if (!inode) return ERR_PTR(-ENOMEM); diff --git a/fs/inode.c b/fs/inode.c index 96c77b8..319b93b 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -362,9 +362,11 @@ EXPORT_SYMBOL_GPL(inode_sb_list_add); static inline void inode_sb_list_del(struct inode *inode) { - spin_lock(&inode_sb_list_lock); - list_del_init(&inode->i_sb_list); - spin_unlock(&inode_sb_list_lock); + if (!list_empty(&inode->i_sb_list)) { + spin_lock(&inode_sb_list_lock); + list_del_init(&inode->i_sb_list); + spin_unlock(&inode_sb_list_lock); + } } static unsigned long hash(struct super_block *sb, unsigned long hashval) @@ -797,6 +799,29 @@ unsigned int get_next_ino(void) EXPORT_SYMBOL(get_next_ino); /** + * new_inode_pseudo - obtain an inode + * @sb: superblock + * + * Allocates a new inode for given superblock. + * Inode wont be chained in superblock s_inodes list + * This means : + * - fs can't be unmount + * - quotas, fsnotify, writeback can't work + */ +struct inode *new_inode_pseudo(struct super_block *sb) +{ + struct inode *inode = alloc_inode(sb); + + if (inode) { + spin_lock(&inode->i_lock); + inode->i_state = 0; + spin_unlock(&inode->i_lock); + INIT_LIST_HEAD(&inode->i_sb_list); + } + return inode; +} + +/** * new_inode - obtain an inode * @sb: superblock * @@ -814,13 +839,9 @@ struct inode *new_inode(struct super_block *sb) spin_lock_prefetch(&inode_sb_list_lock); - inode = alloc_inode(sb); - if (inode) { - spin_lock(&inode->i_lock); - inode->i_state = 0; - spin_unlock(&inode->i_lock); + inode = new_inode_pseudo(sb); + if (inode) inode_sb_list_add(inode); - } return inode; } EXPORT_SYMBOL(new_inode); diff --git a/fs/pipe.c b/fs/pipe.c index 1b7f9af..0e0be1d 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -948,7 +948,7 @@ static const struct dentry_operations pipefs_dentry_operations = { static struct inode * get_pipe_inode(void) { - struct inode *inode = new_inode(pipe_mnt->mnt_sb); + struct inode *inode = new_inode_pseudo(pipe_mnt->mnt_sb); struct pipe_inode_info *pipe; if (!inode) diff --git a/include/linux/fs.h b/include/linux/fs.h index a665804..cc363fa 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2310,7 +2310,8 @@ extern void __iget(struct inode * inode); extern void iget_failed(struct inode *); extern void end_writeback(struct inode *); extern void __destroy_inode(struct inode *); -extern struct inode *new_inode(struct super_block *); +extern struct inode *new_inode_pseudo(struct super_block *sb); +extern struct inode *new_inode(struct super_block *sb); extern void free_inode_nonrcu(struct inode *inode); extern int should_remove_suid(struct dentry *); extern int file_remove_suid(struct file *); diff --git a/net/socket.c b/net/socket.c index 02dc82d..26ed35c 100644 --- a/net/socket.c +++ b/net/socket.c @@ -467,7 +467,7 @@ static struct socket *sock_alloc(void) struct inode *inode; struct socket *sock; - inode = new_inode(sock_mnt->mnt_sb); + inode = new_inode_pseudo(sock_mnt->mnt_sb); if (!inode) return NULL; ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list 2011-07-26 9:36 ` Eric Dumazet @ 2011-07-26 9:42 ` Christoph Hellwig 2011-07-26 10:43 ` Eric Dumazet 2011-07-27 15:21 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet 1 sibling, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2011-07-26 9:42 UTC (permalink / raw) To: Eric Dumazet Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev On Tue, Jul 26, 2011 at 11:36:34AM +0200, Eric Dumazet wrote: > [PATCH v2] vfs: dont chain pipe/anon/socket on superblock s_inodes list > > Workloads using pipes and sockets hit inode_sb_list_lock contention. > > superblock s_inodes list is needed for quota, dirty, pagecache and > fsnotify management. pipe/anon/socket fs are clearly not candidates for > these. > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Looks good to me, Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list 2011-07-26 9:42 ` Christoph Hellwig @ 2011-07-26 10:43 ` Eric Dumazet 2011-07-26 11:49 ` Christoph Hellwig 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2011-07-26 10:43 UTC (permalink / raw) To: Christoph Hellwig Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev Le mardi 26 juillet 2011 à 05:42 -0400, Christoph Hellwig a écrit : > On Tue, Jul 26, 2011 at 11:36:34AM +0200, Eric Dumazet wrote: > > [PATCH v2] vfs: dont chain pipe/anon/socket on superblock s_inodes list > > > > Workloads using pipes and sockets hit inode_sb_list_lock contention. > > > > superblock s_inodes list is needed for quota, dirty, pagecache and > > fsnotify management. pipe/anon/socket fs are clearly not candidates for > > these. > > > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > > Looks good to me, > > Reviewed-by: Christoph Hellwig <hch@lst.de> > Thanks ! BTW, we have one atomic op that could be avoided in new_inode() spin_lock(&inode->i_lock); inode->i_state = 0; spin_unlock(&inode->i_lock); can probably be changed to something less expensive... inode->i_state = 0; smp_wmb(); Not clear if we really need a memory barrier either.... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list 2011-07-26 10:43 ` Eric Dumazet @ 2011-07-26 11:49 ` Christoph Hellwig 0 siblings, 0 replies; 14+ messages in thread From: Christoph Hellwig @ 2011-07-26 11:49 UTC (permalink / raw) To: Eric Dumazet Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev On Tue, Jul 26, 2011 at 12:43:33PM +0200, Eric Dumazet wrote: > BTW, we have one atomic op that could be avoided in new_inode() > > spin_lock(&inode->i_lock); > inode->i_state = 0; > spin_unlock(&inode->i_lock); > > can probably be changed to something less expensive... > > inode->i_state = 0; > smp_wmb(); > > Not clear if we really need a memory barrier either.... I think we already had this in some of the earlier vfs/inode scale series, but it got lost when Al asked to just put the fundamental changes in. For plain new_inode() the barrier shouldn't be needed as we take the sb list lock just a little later. I'm not sure about your new variant, so I'll rather lave that to you. There's a few other things missing from earlier iterations, most notable the non-atomic i_count, and the bucket locks for the inode hash, if you're eager enough to look into that area. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] vfs: avoid taking locks if inode not in lists 2011-07-26 9:36 ` Eric Dumazet 2011-07-26 9:42 ` Christoph Hellwig @ 2011-07-27 15:21 ` Eric Dumazet 2011-07-27 17:12 ` Andi Kleen 2011-07-27 20:44 ` Christoph Hellwig 1 sibling, 2 replies; 14+ messages in thread From: Eric Dumazet @ 2011-07-27 15:21 UTC (permalink / raw) To: Christoph Hellwig Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev Le mardi 26 juillet 2011 à 11:36 +0200, Eric Dumazet a écrit : > Le mardi 26 juillet 2011 à 05:03 -0400, Christoph Hellwig a écrit : > > On Tue, Jul 26, 2011 at 10:21:06AM +0200, Eric Dumazet wrote: > > > Well, not 'last' contention point, as we still hit remove_inode_hash(), > > > > There should be no ned to put pipe or anon inodes on the inode hash. > > Probably sockets don't need it either, but I'd need to look at it in > > detail. > > > > > inode_wb_list_del() > > > > The should never be on the wb list either, doing an unlocked check for > > actually beeing on the list before taking the lock should help you. > > Yes, it might even help regular inodes ;) > > > > > > inode_lru_list_del(), > > > > No real need to keep inodes in the LRU if we only allocate them using > > new_inode but never look them up either. You might want to try setting > > .drop_inode to generic_delete_inode for these. > > Yes, I'll take a look, thanks. If I am not mistaken, we can add unlocked checks on the three hot spots. After following patch, a close(socket(PF_INET, SOCK_DGRAM, 0)) pair on my dev machine takes ~3us instead of ~9us. Maybe its better to split it in three patches, just let me know. 22us -> 3us, thats a nice patch series ;) Thanks [PATCH] vfs: avoid taking locks if inode not in lists sockets and pipes inodes destruction hits three possibly contended locks : system-wide inode_hash_lock in remove_inode_hash() superblock s_inode_lru_lock in inode_lru_list_del() bdi wb.list_lock in inode_wb_list_del() Before even taking locks, we can perform an unlocked test to check if inode can possibly be in the lists. On a 2x4x2 machine, a close(socket()) pair can be 200% faster with these changes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- fs/fs-writeback.c | 10 ++++++---- fs/inode.c | 6 ++++++ 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 1599aa9..8b90bdb 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -182,11 +182,13 @@ void bdi_start_background_writeback(struct backing_dev_info *bdi) */ void inode_wb_list_del(struct inode *inode) { - struct backing_dev_info *bdi = inode_to_bdi(inode); + if (!list_empty(&inode->i_wb_list)) { + struct backing_dev_info *bdi = inode_to_bdi(inode); - spin_lock(&bdi->wb.list_lock); - list_del_init(&inode->i_wb_list); - spin_unlock(&bdi->wb.list_lock); + spin_lock(&bdi->wb.list_lock); + list_del_init(&inode->i_wb_list); + spin_unlock(&bdi->wb.list_lock); + } } /* diff --git a/fs/inode.c b/fs/inode.c index d0c72ff..796a420 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -338,6 +338,9 @@ static void inode_lru_list_add(struct inode *inode) static void inode_lru_list_del(struct inode *inode) { + if (list_empty(&inode->i_lru)) + return; + spin_lock(&inode->i_sb->s_inode_lru_lock); if (!list_empty(&inode->i_lru)) { list_del_init(&inode->i_lru); @@ -406,6 +409,9 @@ EXPORT_SYMBOL(__insert_inode_hash); */ void remove_inode_hash(struct inode *inode) { + if (inode_unhashed(inode)) + return; + spin_lock(&inode_hash_lock); spin_lock(&inode->i_lock); hlist_del_init(&inode->i_hash); -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: avoid taking locks if inode not in lists 2011-07-27 15:21 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet @ 2011-07-27 17:12 ` Andi Kleen 2011-07-27 20:44 ` Christoph Hellwig 1 sibling, 0 replies; 14+ messages in thread From: Andi Kleen @ 2011-07-27 17:12 UTC (permalink / raw) To: Eric Dumazet Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev > Before even taking locks, we can perform an unlocked test to check if > inode can possibly be in the lists. > > On a 2x4x2 machine, a close(socket()) pair can be 200% faster with these > changes. Great result! Seems simple and obvious to me. -Andi ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: avoid taking locks if inode not in lists 2011-07-27 15:21 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet 2011-07-27 17:12 ` Andi Kleen @ 2011-07-27 20:44 ` Christoph Hellwig 2011-07-27 20:59 ` Andi Kleen ` (2 more replies) 1 sibling, 3 replies; 14+ messages in thread From: Christoph Hellwig @ 2011-07-27 20:44 UTC (permalink / raw) To: Eric Dumazet Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev On Wed, Jul 27, 2011 at 05:21:05PM +0200, Eric Dumazet wrote: > If I am not mistaken, we can add unlocked checks on the three hot spots. > > After following patch, a close(socket(PF_INET, SOCK_DGRAM, 0)) pair on > my dev machine takes ~3us instead of ~9us. > > Maybe its better to split it in three patches, just let me know. I think three patches would be a lot cleaner. As for safety of the unlocked checks: - inode are either hashed when created or never, so that one looks fine. - same for the sb list. - the writeback list is a bit more dynamic as we move things around quite a bit. But in additon to the inode_wb_list_del call from evict() it only ever gets remove in writeback_single_inode, which for a freeing inode can only be called from the callers of evict(). Btw, I wonder if you should micro-optimize things a bit further by moving the unhashed checks from the deletion functions into the callers and thus save a function call for each of them. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: avoid taking locks if inode not in lists 2011-07-27 20:44 ` Christoph Hellwig @ 2011-07-27 20:59 ` Andi Kleen 2011-07-27 21:01 ` Christoph Hellwig 2011-07-28 4:41 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet 2011-07-28 4:55 ` [PATCH] vfs: avoid call to inode_lru_list_del() if possible Eric Dumazet 2 siblings, 1 reply; 14+ messages in thread From: Andi Kleen @ 2011-07-27 20:59 UTC (permalink / raw) To: Christoph Hellwig Cc: Eric Dumazet, Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev > Btw, I wonder if you should micro-optimize things a bit further by > moving the unhashed checks from the deletion functions into the callers > and thus save a function call for each of them. If the caller is in the same file modern gcc is able to do that automatically if you're lucky enough ("partial inlining") I would not uglify the code for it. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: avoid taking locks if inode not in lists 2011-07-27 20:59 ` Andi Kleen @ 2011-07-27 21:01 ` Christoph Hellwig 2011-07-28 4:11 ` [PATCH] vfs: conditionally call inode_wb_list_del() Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2011-07-27 21:01 UTC (permalink / raw) To: Andi Kleen Cc: Christoph Hellwig, Eric Dumazet, Tim Chen, Al Viro, David Miller, Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev On Wed, Jul 27, 2011 at 10:59:57PM +0200, Andi Kleen wrote: > > Btw, I wonder if you should micro-optimize things a bit further by > > moving the unhashed checks from the deletion functions into the callers > > and thus save a function call for each of them. > > If the caller is in the same file modern gcc is able to do that automatically > if you're lucky enough ("partial inlining") > > I would not uglify the code for it. Depending on how you look at it the code might actually be a tad cleaner. One of called functions is outside of inode.c. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] vfs: conditionally call inode_wb_list_del() 2011-07-27 21:01 ` Christoph Hellwig @ 2011-07-28 4:11 ` Eric Dumazet 0 siblings, 0 replies; 14+ messages in thread From: Eric Dumazet @ 2011-07-28 4:11 UTC (permalink / raw) To: Christoph Hellwig Cc: Andi Kleen, Tim Chen, Al Viro, David Miller, Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev Le mercredi 27 juillet 2011 à 17:01 -0400, Christoph Hellwig a écrit : > On Wed, Jul 27, 2011 at 10:59:57PM +0200, Andi Kleen wrote: > > > Btw, I wonder if you should micro-optimize things a bit further by > > > moving the unhashed checks from the deletion functions into the callers > > > and thus save a function call for each of them. > > > > If the caller is in the same file modern gcc is able to do that automatically > > if you're lucky enough ("partial inlining") > > > > I would not uglify the code for it. > > Depending on how you look at it the code might actually be a tad > cleaner. One of called functions is outside of inode.c. > Thats right, thanks again for your valuable input Christoph. The following is a clear win, since we avoid the call to external function. [PATCH] vfs: conditionally call inode_wb_list_del() Some inodes (pipes, sockets, ...) are not in bdi writeback list. evict() can avoid calling inode_wb_list_del() and its expensive spinlock by checking inode i_wb_list being empty or not. At this point, no other cpu/user can concurrently manipulate this inode i_wb_list Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- fs/inode.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/inode.c b/fs/inode.c index d0c72ff..9dab13a 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -454,7 +454,9 @@ static void evict(struct inode *inode) BUG_ON(!(inode->i_state & I_FREEING)); BUG_ON(!list_empty(&inode->i_lru)); - inode_wb_list_del(inode); + if (!list_empty(&inode->i_wb_list)) + inode_wb_list_del(inode); + inode_sb_list_del(inode); if (op->evict_inode) { -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] vfs: avoid taking locks if inode not in lists 2011-07-27 20:44 ` Christoph Hellwig 2011-07-27 20:59 ` Andi Kleen @ 2011-07-28 4:41 ` Eric Dumazet 2011-07-28 4:55 ` [PATCH] vfs: avoid call to inode_lru_list_del() if possible Eric Dumazet 2 siblings, 0 replies; 14+ messages in thread From: Eric Dumazet @ 2011-07-28 4:41 UTC (permalink / raw) To: Christoph Hellwig Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev Le mercredi 27 juillet 2011 à 16:44 -0400, Christoph Hellwig a écrit : > On Wed, Jul 27, 2011 at 05:21:05PM +0200, Eric Dumazet wrote: > > If I am not mistaken, we can add unlocked checks on the three hot spots. > > > > After following patch, a close(socket(PF_INET, SOCK_DGRAM, 0)) pair on > > my dev machine takes ~3us instead of ~9us. > > > > Maybe its better to split it in three patches, just let me know. > > I think three patches would be a lot cleaner. > > As for safety of the unlocked checks: > > - inode are either hashed when created or never, so that one looks > fine. > - same for the sb list. > - the writeback list is a bit more dynamic as we move things around > quite a bit. But in additon to the inode_wb_list_del call from > evict() it only ever gets remove in writeback_single_inode, which > for a freeing inode can only be called from the callers of evict(). > > Btw, I wonder if you should micro-optimize things a bit further by > moving the unhashed checks from the deletion functions into the callers > and thus save a function call for each of them. > What about following patch, addressing the micro-optimization and Andi Kleen concern about evict() readability ? Thanks ! [PATCH] vfs: avoid taking inode_hash_lock on pipes and sockets Some inodes (pipes, sockets, ...) are not hashed, no need to take contended inode_hash_lock at dismantle time. nice speedup on SMP machines on socket intensive workloads. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- fs/inode.c | 6 +++--- include/linux/fs.h | 9 ++++++++- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index d0c72ff..73b5598 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -399,12 +399,12 @@ void __insert_inode_hash(struct inode *inode, unsigned long hashval) EXPORT_SYMBOL(__insert_inode_hash); /** - * remove_inode_hash - remove an inode from the hash + * __remove_inode_hash - remove an inode from the hash * @inode: inode to unhash * * Remove an inode from the superblock. */ -void remove_inode_hash(struct inode *inode) +void __remove_inode_hash(struct inode *inode) { spin_lock(&inode_hash_lock); spin_lock(&inode->i_lock); @@ -412,7 +412,7 @@ void remove_inode_hash(struct inode *inode) spin_unlock(&inode->i_lock); spin_unlock(&inode_hash_lock); } -EXPORT_SYMBOL(remove_inode_hash); +EXPORT_SYMBOL(__remove_inode_hash); void end_writeback(struct inode *inode) { diff --git a/include/linux/fs.h b/include/linux/fs.h index f23bcb7..786b3b1 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2317,11 +2317,18 @@ extern int should_remove_suid(struct dentry *); extern int file_remove_suid(struct file *); extern void __insert_inode_hash(struct inode *, unsigned long hashval); -extern void remove_inode_hash(struct inode *); static inline void insert_inode_hash(struct inode *inode) { __insert_inode_hash(inode, inode->i_ino); } + +extern void __remove_inode_hash(struct inode *); +static inline void remove_inode_hash(struct inode *inode) +{ + if (!inode_unhashed(inode)) + __remove_inode_hash(inode); +} + extern void inode_sb_list_add(struct inode *inode); #ifdef CONFIG_BLOCK -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH] vfs: avoid call to inode_lru_list_del() if possible 2011-07-27 20:44 ` Christoph Hellwig 2011-07-27 20:59 ` Andi Kleen 2011-07-28 4:41 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet @ 2011-07-28 4:55 ` Eric Dumazet 2 siblings, 0 replies; 14+ messages in thread From: Eric Dumazet @ 2011-07-28 4:55 UTC (permalink / raw) To: Christoph Hellwig Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev Le mercredi 27 juillet 2011 à 16:44 -0400, Christoph Hellwig a écrit : > On Wed, Jul 27, 2011 at 05:21:05PM +0200, Eric Dumazet wrote: > > If I am not mistaken, we can add unlocked checks on the three hot spots. > > > > After following patch, a close(socket(PF_INET, SOCK_DGRAM, 0)) pair on > > my dev machine takes ~3us instead of ~9us. > > > > Maybe its better to split it in three patches, just let me know. > > I think three patches would be a lot cleaner. > > As for safety of the unlocked checks: > > - inode are either hashed when created or never, so that one looks > fine. > - same for the sb list. > - the writeback list is a bit more dynamic as we move things around > quite a bit. But in additon to the inode_wb_list_del call from > evict() it only ever gets remove in writeback_single_inode, which > for a freeing inode can only be called from the callers of evict(). > > Btw, I wonder if you should micro-optimize things a bit further by > moving the unhashed checks from the deletion functions into the callers > and thus save a function call for each of them. > Here is the last patch, addressing inode_lru_list_del() call. Only the call done from iput_final() can obviously benefit from checking i_lru being empty or not, so it makes sense to perform the check at caller site instead of doing it in inode_lru_list_del() [PATCH] vfs: avoid call to inode_lru_list_del() if possible inode_lru_list_del() is expensive because of per superblock lru locking, while some inodes are not in lru list. Adding a check in iput_final() can speedup pipe/sockets workloads on SMP. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- fs/inode.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/inode.c b/fs/inode.c index d0c72ff..b8b8939 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -1328,7 +1328,8 @@ static void iput_final(struct inode *inode) } inode->i_state |= I_FREEING; - inode_lru_list_del(inode); + if (!list_empty(&inode->i_lru)) + inode_lru_list_del(inode); spin_unlock(&inode->i_lock); evict(inode); -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2011-07-28 4:55 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20110717010427.GC5359@parisc-linux.org>
[not found] ` <20110717084630.GC8006@one.firstfloor.org>
[not found] ` <1310977028.5756.1.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
[not found] ` <20110718154008.GA29593@infradead.org>
[not found] ` <20110718155141.GA11013@ZenIV.linux.org.uk>
[not found] ` <1311093158.2707.75.camel@schen9-DESK>
[not found] ` <20110721204042.GB31405@ZenIV.linux.org.uk>
[not found] ` <1311294452.2576.18.camel@schen9-DESK>
[not found] ` <20110723132411.GA22183@infradead.org>
[not found] ` <1311633550.2576.33.camel@schen9-DESK>
[not found] ` <20110725225154.GD22133@ZenIV.linux.org.uk>
[not found] ` <1311636178.2576.34.camel@schen9-DESK>
[not found] ` <1311660013.2996.6.camel@edumazet-laptop>
2011-07-26 8:21 ` [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list Eric Dumazet
2011-07-26 9:03 ` Christoph Hellwig
2011-07-26 9:36 ` Eric Dumazet
2011-07-26 9:42 ` Christoph Hellwig
2011-07-26 10:43 ` Eric Dumazet
2011-07-26 11:49 ` Christoph Hellwig
2011-07-27 15:21 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
2011-07-27 17:12 ` Andi Kleen
2011-07-27 20:44 ` Christoph Hellwig
2011-07-27 20:59 ` Andi Kleen
2011-07-27 21:01 ` Christoph Hellwig
2011-07-28 4:11 ` [PATCH] vfs: conditionally call inode_wb_list_del() Eric Dumazet
2011-07-28 4:41 ` [PATCH] vfs: avoid taking locks if inode not in lists Eric Dumazet
2011-07-28 4:55 ` [PATCH] vfs: avoid call to inode_lru_list_del() if possible Eric Dumazet
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).