From: Nick Piggin <npiggin@kernel.dk>
To: Christoph Hellwig <hch@infradead.org>
Cc: npiggin@kernel.dk, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [patch 00/35] my inode scaling series for review
Date: Wed, 20 Oct 2010 14:05:01 +1100 [thread overview]
Message-ID: <20101020030501.GG3740@amd> (raw)
In-Reply-To: <20101019162207.GA3555@infradead.org>
On Tue, Oct 19, 2010 at 12:22:07PM -0400, Christoph Hellwig wrote:
> On Tue, Oct 19, 2010 at 02:42:16PM +1100, npiggin@kernel.dk wrote:
> > * My locking design allows i_lock to lock the entire state of the icache
> > for a particular inode. Not so with Dave's, and he had to add code not
> > required with inode_lock synchronisation or my i_lock synchronisation.
> > I prefer being very conservative about making changes, especially before
> > inode_lock is lifted (which will be the end-point of bisection for any
> > locking breakage before it).
>
> Which code exaxtly? I've done a diff between his inode.c and yours -
I pointed it out earlier (not that you replied).
> and Dave's is a lot simpler. Mostly due to the more regular and simpler
> locking, but also because he did various cleanups before tackling the
> actual locking. See the diff at the end of this mail for a direct
> comparism.
How can you say it is more regular and simpler locking? _Acquiring_
locks is not the difficult part of a locking rewrite. Verifying that
no unhandled concurrency is introduced is.
At first glance there are parts of my code that is more complex, that
is because it is actually more regular locking model that is easier to
verify when you lift inode_lock away.
Like I said to Dave when he first looked at my series: he's welcome to
submit a small incremental change to reduce i_lock coverage or put
the lock order another way and we can see what the impact looks like.
Locking changes should start simpler and more conservative.
> > * As far as I can tell, I have addressed all Dave and Christoph's real
> > concerns. The disagreement about the i_lock locking model can easily be
> > solved if they post a couple of small incremental patches to the end of the
> > series, making i_lock locking less regular and no longer protecting icache
> > state of that given inode (like inode_lock was able to pre-patchset). I've
> > repeatedly disagreed with this approach, however.
>
> The diff below and looking over the other patches doesn't make it look
> like you have actually picked up much at all, neither of the feedback
> from me, nor Dave nor Andrew or Al.
I picked up most of it. I missed the per-cpu one from Andrew because
I was waiting for Eric's new API.
I did pick up feedback from you or Dave, and where I didn't, I replied
with my reasons.
> Even worse than that none of the sometimes quite major bug fixes were
> picked up either. The get_new_inode re-lookup locking is still wrong,
I see, yes, I missed that point.
> the exofs fix is not there.
It doesn't belong there.
> And the fix for mapping move of the
> block devices which we unfortunately still have seems to be paper
> over by passing the bdi_writeback to the requing helpers instead
> of fixing it. While this makes the assert_spin_lock panic go away
> it still leaves a really nasty race as your version locks a different
> bdi than the one that it actually modifies.
Yeah Dave's fix got omitted from my series for some reason. I
agreed it is needed and I just had it commented out for some
reason. Thanks.
> There's also another bug which was there in your very first version
> with an XXX but that Dave AFAIK never picked up: invalidate_inodes is
> called from a lot of other places than umount, and unlocked list
> access is everything but safe there.
No, I didn't ignore that. It is quite definitely already buggy upstream
if there is concurrent list modification there. I noticed that Al was
thinking about it and I was going to wait until he replied with
something definitive.
> Anyway, below is the diff between the two trees. I've cut down the
> curn in filesystem a bit - every related to the gratious i_ref vs i_ref
> and iref vs inode_get difference, as well as the call_rcu boilerplat
> additions and get_next_ino calls are removed to make it somewhat
> readable.
>
> To me the inode.c and especially fs-writeback.c code in Dave's version
> looks a lot more polished.
Well, like I said, I disagree. I don't like how the icache state
of the inode is not locked (as it is with inode_lock) when the
inode is being manipulated.
I see any such step in that direction as an incremental patch,
and shouldn't be lumped in with lock breaking work. That's just
not how to structure a series that adds fine grained locking.
As I said, I am willing to review and comment on such an incremental
patch with an open mind. But I would have to see a good justification
and a reason why RCU or another technique would not work.
You also missed my feedback that he didn't add required RCU, didn't
structure the series properly for a serious scalability rework,
didn't break out the rest of the global locks, and hadn't had it
tested with the rest of the vfs stack.
But your review so far has been very helpful, thanks.
next prev parent reply other threads:[~2010-10-20 3:05 UTC|newest]
Thread overview: 70+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-10-19 3:42 [patch 00/35] my inode scaling series for review npiggin
2010-10-19 3:42 ` [patch 01/35] bit_spinlock: add required includes npiggin
2010-10-19 3:42 ` [patch 02/35] kernel: add bl_list npiggin
2010-10-19 3:42 ` [patch 03/35] mm: implement per-zone shrinker npiggin
2010-10-19 4:49 ` KOSAKI Motohiro
2010-10-19 5:33 ` Nick Piggin
2010-10-19 5:40 ` KOSAKI Motohiro
2010-10-19 3:42 ` [patch 04/35] vfs: convert inode and dentry caches to " npiggin
2010-10-19 3:42 ` [patch 05/35] fs: icache lock s_inodes list npiggin
2010-10-19 3:42 ` [patch 06/35] fs: icache lock inode hash npiggin
2010-10-19 3:42 ` [patch 07/35] fs: icache lock i_state npiggin
2010-10-19 10:47 ` Miklos Szeredi
2010-10-19 17:06 ` Peter Zijlstra
2010-10-19 3:42 ` [patch 08/35] fs: icache lock i_count npiggin
2010-10-19 10:16 ` Boaz Harrosh
2010-10-20 2:14 ` Nick Piggin
2010-10-19 3:42 ` [patch 09/35] fs: icache lock lru/writeback lists npiggin
2010-10-19 3:42 ` [patch 10/35] fs: icache atomic inodes_stat npiggin
2010-10-19 3:42 ` [patch 11/35] fs: icache lock inode state npiggin
2010-10-19 3:42 ` [patch 12/35] fs: inode atomic last_ino, iunique lock npiggin
2010-10-19 3:42 ` [patch 13/35] fs: icache remove inode_lock npiggin
2010-10-19 3:42 ` [patch 14/35] fs: icache factor hash lock into functions npiggin
2010-10-19 3:42 ` [patch 15/35] fs: icache per-bucket inode hash locks npiggin
2010-10-19 3:42 ` [patch 16/35] fs: icache lazy inode lru npiggin
2010-10-19 3:42 ` [patch 17/35] fs: icache RCU free inodes npiggin
2010-10-19 3:42 ` [patch 18/35] fs: avoid inode RCU freeing for pseudo fs npiggin
2010-10-19 3:42 ` [patch 19/35] fs: icache remove redundant i_sb_list umount locking npiggin
2010-10-20 12:46 ` Al Viro
2010-10-20 13:03 ` Nick Piggin
2010-10-20 13:27 ` Al Viro
2010-10-19 3:42 ` [patch 20/35] fs: icache rcu walk for i_sb_list npiggin
2010-10-19 3:42 ` [patch 21/35] fs: icache per-cpu nr_inodes, non-atomic nr_unused counters npiggin
2010-10-19 3:42 ` [patch 22/35] fs: icache per-cpu last_ino allocator npiggin
2010-10-19 3:42 ` [patch 23/35] fs: icache use per-CPU lists and locks for sb inode lists npiggin
2010-10-19 15:33 ` Miklos Szeredi
2010-10-20 2:37 ` Nick Piggin
2010-10-19 3:42 ` [patch 24/35] fs: icache use RCU to avoid locking in hash lookups npiggin
2010-10-19 3:42 ` [patch 25/35] fs: icache reduce some locking overheads npiggin
2010-10-19 3:42 ` [patch 26/35] fs: icache alloc anonymous inode allocation npiggin
2010-10-19 15:50 ` Miklos Szeredi
2010-10-20 2:38 ` Nick Piggin
2010-10-19 16:33 ` Christoph Hellwig
2010-10-20 3:07 ` Nick Piggin
2010-10-19 3:42 ` [patch 27/35] fs: icache split IO and LRU lists npiggin
2010-10-19 16:12 ` Miklos Szeredi
2010-10-20 2:41 ` Nick Piggin
2010-10-19 3:42 ` [patch 28/35] fs: icache split writeback and lru locks npiggin
2010-10-19 3:42 ` [patch 29/35] fs: icache per-bdi writeback list locking npiggin
2010-10-19 3:42 ` [patch 30/35] fs: icache lazy LRU avoid LRU locking after IO operation npiggin
2010-10-19 3:42 ` [patch 31/35] fs: icache per-zone inode LRU npiggin
2010-10-19 12:38 ` Dave Chinner
2010-10-20 2:35 ` Nick Piggin
2010-10-20 3:12 ` Nick Piggin
2010-10-20 9:43 ` Dave Chinner
2010-10-20 10:02 ` Nick Piggin
2010-10-20 3:14 ` KOSAKI Motohiro
2010-10-20 3:20 ` Nick Piggin
2010-10-20 3:29 ` KOSAKI Motohiro
2010-10-20 10:19 ` Dave Chinner
2010-10-20 10:41 ` Nick Piggin
2010-10-19 3:42 ` [patch 32/35] fs: icache minimise I_FREEING latency npiggin
2010-10-19 3:42 ` [patch 33/35] fs: icache introduce inode_get/inode_get_ilock npiggin
2010-10-19 10:17 ` Boaz Harrosh
2010-10-20 2:17 ` Nick Piggin
2010-10-19 3:42 ` [patch 34/35] fs: inode rename i_count to i_refs npiggin
2010-10-19 3:42 ` [patch 35/35] fs: icache document more lock orders npiggin
2010-10-19 16:22 ` [patch 00/35] my inode scaling series for review Christoph Hellwig
2010-10-20 3:05 ` Nick Piggin [this message]
2010-10-20 13:14 ` Al Viro
2010-10-20 13:59 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101020030501.GG3740@amd \
--to=npiggin@kernel.dk \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).