From: Nick Piggin
Subject: Re: [PATCH 17/18] fs: icache remove inode_lock
Date: Sat, 16 Oct 2010 07:50:45 +1100
Message-ID: <20101015205045.GA20476@amd>
References: <20101013123008.GA6067@amd> <20101013232319.GZ4681@dastard>
 <20101014090609.GB3144@amd> <20101014144159.GA11972@infradead.org>
 <20101015001443.GA3146@amd> <20101015031343.GG4681@dastard>
 <20101015033017.GA6750@amd> <20101015034451.GA6827@amd>
 <20101015064150.GB7511@amd> <20101015105943.GE32255@dastard>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Nick Piggin, Christoph Hellwig, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org
To: Dave Chinner
Content-Disposition: inline
In-Reply-To: <20101015105943.GE32255@dastard>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Fri, Oct 15, 2010 at 09:59:43PM +1100, Dave Chinner wrote:
> On Fri, Oct 15, 2010 at 05:41:50PM +1100, Nick Piggin wrote:
> > You're worried about mere mortals reviewing and understanding it...
> > I don't really know. If you understand inode locking today, you
> > can understand the inode scaling series quite easily. Ditto for
> > dcache. rcu-walk path walking is trickier, but it is described in
> > detail in the documentation and changelog.
> >
> > And you can understand the high-level approach without digesting
> > every detail at once. The inode locking work breaks up all the
> > global locks:
> >
> > - a single inode object is protected (to the same level as
> >   inode_lock) with i_lock. This makes it really trivial for
> >   filesystems to lock down the object without taking a global
> >   lock.
>
> Which is unnecessarily wide, and results in i_lock having to have
> list locks nested inside it, and that leads to the lock
> inversion try-lock mess that several people have complained about.

Gee, you keep repeating this so often that you have me doubting
myself a tiny bit, so I have to check.

$ grep spin_trylock fs/inode.c fs/fs-writeback.c
fs/inode.c:        if (!spin_trylock(&inode->i_lock)) {
fs/inode.c:        if (!spin_trylock(&old->i_lock)) {
fs/inode.c:        if (!spin_trylock(&old->i_lock)) {
fs/fs-writeback.c: if (!spin_trylock(&inode->i_lock)) {
fs/fs-writeback.c: if (!spin_trylock(&inode->i_lock)) {
fs/fs-writeback.c: if (!spin_trylock(&inode->i_lock)) {

This is your unmaintainable mess? You decided to rewrite your own
vfs-scale tree because I wanted i_lock to protect the icache state
(and offered you very good reasons for it)?

Well, surely these must be horribly complex and unmaintainable
beasts...

repeat:
        /*
         * Don't use an RCU walk; the common case of no old inode
         * being found requires the hash lock anyway.
         */
        spin_lock_bucket(b);
        hlist_bl_for_each_entry(old, node, &b->head, i_hash) {
                if (old->i_ino != ino)
                        continue;
                if (old->i_sb != sb)
                        continue;
                if (old->i_state & (I_FREEING|I_WILL_FREE))
                        continue;
                if (!spin_trylock(&old->i_lock)) {
                        spin_unlock_bucket(b);
                        cpu_relax();
                        goto repeat;
                }

Nope, no big deal. The rest are much the same.

So thanks for the repeated suggestion, but I'd actually prefer to
keep my regular i_lock locking scheme, where you don't need to look
up the documentation and think hard about coherency between protected
and unprotected parts of the inode every time you use it. I didn't
stumble upon my locking design by chance. If you think a few trylocks
== impending doom, then xfs is looking pretty poor at this point.

So I would ask that you stop making things up about my patch series.
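FWIW, strip away the icache specifics and the pattern above is just
the standard trylock-and-back-off idiom you use whenever the
per-object lock ranks outside the list lock. Here is a minimal
standalone sketch of that idiom; it is userspace pthreads, and the
names (struct obj, table_find, and so on) are made up for
illustration, not the kernel code:

/*
 * Lock ordering rule assumed here: obj->lock ranks outside
 * table->lock. A walker that already holds table->lock therefore
 * must not block on obj->lock (that would invert the order against
 * a thread taking obj->lock then table->lock), so it trylocks and
 * restarts the walk on failure.
 */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

struct obj {
        pthread_spinlock_t lock;
        struct obj *next;
        int key;
};

struct table {
        pthread_spinlock_t lock;
        struct obj *head;
};

/* Find an object by key; on success it is returned with obj->lock held. */
static struct obj *table_find(struct table *t, int key)
{
        struct obj *o;
repeat:
        pthread_spin_lock(&t->lock);
        for (o = t->head; o; o = o->next) {
                if (o->key != key)
                        continue;
                if (pthread_spin_trylock(&o->lock)) {
                        /*
                         * Someone holds o->lock and may be waiting
                         * on t->lock: drop the list lock and retry
                         * rather than deadlocking.
                         */
                        pthread_spin_unlock(&t->lock);
                        sched_yield();
                        goto repeat;
                }
                pthread_spin_unlock(&t->lock);
                return o;
        }
        pthread_spin_unlock(&t->lock);
        return NULL;
}

int main(void)
{
        struct table t = { .head = NULL };
        struct obj o = { .next = NULL, .key = 42 };
        struct obj *found;

        pthread_spin_init(&t.lock, PTHREAD_PROCESS_PRIVATE);
        pthread_spin_init(&o.lock, PTHREAD_PROCESS_PRIVATE);
        t.head = &o;

        found = table_find(&t, 42);
        if (found) {
                printf("found key %d\n", found->key);
                pthread_spin_unlock(&found->lock);
        }
        return 0;
}

The retry only fires when someone else actually holds that one
object's lock, so the cost is bounded by real contention on a single
inode, not by traffic on a global lock.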
If you dislike the trylocks so much that you think it is worth
breaking the i_lock regularity, or using RCU, or whatever, then
please propose those changes as incremental patches on top of my
series, which is where they logically fit. You know I will argue
that locking consistency is more important for maintainability than
these few trylocks, however.