* Re: Linux 2.6.35 [not found] ` <20100802055834.GB19164@dastard> @ 2010-08-02 7:55 ` Nick Piggin 2010-08-02 8:24 ` Christoph Hellwig 0 siblings, 1 reply; 9+ messages in thread From: Nick Piggin @ 2010-08-02 7:55 UTC (permalink / raw) To: Dave Chinner, linux-fsdevel; +Cc: Linus Torvalds, Linux Kernel Mailing List On Mon, Aug 02, 2010 at 03:58:34PM +1000, Dave Chinner wrote: > On Sun, Aug 01, 2010 at 07:50:02PM -0700, Linus Torvalds wrote: > > On Sun, Aug 1, 2010 at 7:33 PM, Dave Chinner <david@fromorbit.com> wrote: > > > > > > There hasn't been nearly enough review or testing of this patch > > > series yet. Before a merge, it needs to be split up in smaller, > > > more digestable chunks for more comprehensive review, regression > > > testing and behavioural analysis. BTW. it has in fact had quite a bit of testing in earlier form in the -rt tree for a long time, and several fixes come from there. And good performance results there too. > > I dunno. We merge _way_ scarier things in the VM and the block layer, > > for much less actual upside, and with less review. > > Scary stuff outside of direct VFS/FS interfaces is generally hidden > from me by my +6 Blinkers of Blissful Ignorance. I make the > assumption that the experts involved know the risks and have weighed > them up appropriately. ;) > > > The RCU pathname lookup has some rather impressive performance > > upsides, and I agree that it would be good to get a lot of review and > > testing, but the latter isn't going to happen without it being > > mainlined, and the former is sadly lacking. The person I'd like most > > to review it is Al, > > Most definitely. I hate to say but I would like to see it mature for another release. It should also clash a bit with Al's recent inode work that he'll want to push. What I can do is send some of the ground work patches this time around, put the tree into linux-next, and put reviewers on notice. I think it is all conceptually sound, but it will inevitably have some bugs left to shake out, and things to be fixed on the review side. I don't anticipate a problem that could not be fixed in the release cycle, but I think aiming for post 2.6.36 is a bit fairer for vfs guys, honestly. LSF is next week too, so most of them will be busy with travel and such. But I do hope to discuss the vfs-scale patches there. > > but anybody in the filesystem world should > > basically see it as a #1 priority, > > Agreed - I've actually looked at every patch, commented on some > of the more questionable things, got quoted by LWN for saying that > it "fell off the locking cliff", have run benchmarks on it and sent > patches fixing bugs back to Nick. > > It's just really hard to digest it all in one lump and core VFS > changes on this scale scare me.... For filesystems developers, the dcache and inode locking changes should be more or less just following simple steps as shown in the patch series. If they're not abusing dcache_lock (and most except autofs4 are not), then it should not be a big deal. There are a couple of locking constraints changed at the API level, but I didn't run into any problems there yet. It should be all documented in Documentation/filesystems/* although I need to run a few more passes over the series to ensure I caught everything. > > because unlike all the masturbatory > > patches like xstat() that add new functionality that nobody will > > likely ever use, Nick's patchseries improves on the thing that > > everybody uses heavily every day without even thinking about it. > > > > Is it tough to review? Yes. It's core code, not just some random > > addition that adds a new feature and doesn't impact any old code. But > > that's also the thing that makes it meaningful, and makes me think it > > should get merged _much_ more eagerly than most code we ever see. > > I agree with you for the pure locking changes. > > But for the bits that change writeback, LRU ordering and reclaim > calculations the benefits are not quite so obvious, nor is the > correctness of the code/behaviour quite so provably correct. Maybe > I'm being a bit too paranoid, but generally it pays to be a bit > conservative as a filesystem developer because the cost of screwing > up can be pretty high... Writeback shouldn't be changed. LRU ordering is changed for 2 reasons. Firstly, to make things per-zone instead of global. This basically fits our whole reclaim model much better, although it will inevitably cause some random little changes but I think it is agreed this is a good thing (memory shortage in one zone or node does not require global shrinkings, NUMA level parallelism of reclaim.) The other thing is converting the last few dcache refcounting, and all of inode refcounting over to this "lazy LRU" model. This can have a bigger impact, but it really reduces locking on the per-zone lists, so it definitely helps speed and scalability of non-reclaim fastpaths. I'm up for changing this if numbers show it hurts, it would be rather easy to do, but in comparison to the overall patchset, it would rate as a minor tweak :) -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Linux 2.6.35 2010-08-02 7:55 ` Linux 2.6.35 Nick Piggin @ 2010-08-02 8:24 ` Christoph Hellwig 2010-08-02 8:46 ` KOSAKI Motohiro ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Christoph Hellwig @ 2010-08-02 8:24 UTC (permalink / raw) To: Nick Piggin Cc: Dave Chinner, linux-fsdevel, Linus Torvalds, Linux Kernel Mailing List On Mon, Aug 02, 2010 at 05:55:37PM +1000, Nick Piggin wrote: > I hate to say but I would like to see it mature for another release. It > should also clash a bit with Al's recent inode work that he'll want to > push. > > What I can do is send some of the ground work patches this time around, > put the tree into linux-next, and put reviewers on notice. > > I think it is all conceptually sound, but it will inevitably have some > bugs left to shake out, and things to be fixed on the review side. I > don't anticipate a problem that could not be fixed in the release cycle, > but I think aiming for post 2.6.36 is a bit fairer for vfs guys, > honestly. LSF is next week too, so most of them will be busy with travel > and such. But I do hope to discuss the vfs-scale patches there. What I'm most concerned bit merging everything in one go. It's a huge series and I'd rather see it start going in in batches over multiple kernel releases. Things like the fs_struct spinlock and some other preparatory patches should be ver easily to do for 2.6.36. Scaling the files and vfsmount locks should also be easily doable, but we need to sort out the struct file growth in the later. We really can't grow struct file by two pointers as that would have devasting effects on various workloads. What follows after that is the dcache_lock scaling which to seems the most immature bit of the series, and the one that showed by far the most problems in -RT. I'm very much dead set against merging that in .36. I'd much rather see the inode_lock scaling or the lockless path walk going in before, but I haven't checked how complicated the reordering would be. The lockless path walk also is only rather theoretically useful until we do ACL checks lockless as we're having ACLs enabled pretty much everywhere at least in the distros. The per-zone shrinkers are another thing that's not directly related, I think they need a lot more discussion with the VM folks, and integrating with Dave's work in that area. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Linux 2.6.35 2010-08-02 8:24 ` Christoph Hellwig @ 2010-08-02 8:46 ` KOSAKI Motohiro 2010-08-02 9:05 ` Christoph Hellwig 2010-08-02 9:51 ` Nick Piggin 2 siblings, 0 replies; 9+ messages in thread From: KOSAKI Motohiro @ 2010-08-02 8:46 UTC (permalink / raw) To: Christoph Hellwig Cc: kosaki.motohiro, Nick Piggin, Dave Chinner, linux-fsdevel, Linus Torvalds, Linux Kernel Mailing List > The per-zone shrinkers are another thing that's not directly related, > I think they need a lot more discussion with the VM folks, and > integrating with Dave's work in that area. per-zone shrinkers don't cause so much impact to VM design except zone reclaim feature. So, if FS folks think it's ok, I'm not against this at all. btw, however, I haven't review such patch series in the detail yet. so perhaps I might post some bug fix later. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Linux 2.6.35 2010-08-02 8:24 ` Christoph Hellwig 2010-08-02 8:46 ` KOSAKI Motohiro @ 2010-08-02 9:05 ` Christoph Hellwig 2010-08-02 10:07 ` Nick Piggin 2010-08-02 9:51 ` Nick Piggin 2 siblings, 1 reply; 9+ messages in thread From: Christoph Hellwig @ 2010-08-02 9:05 UTC (permalink / raw) To: Nick Piggin Cc: Dave Chinner, linux-fsdevel, Linus Torvalds, Linux Kernel Mailing List On Mon, Aug 02, 2010 at 04:24:28AM -0400, Christoph Hellwig wrote: > .36. I'd much rather see the inode_lock scaling or the lockless path > walk going in before, but I haven't checked how complicated the > reordering would be. The lockless path walk also is only rather > theoretically useful until we do ACL checks lockless as we're having > ACLs enabled pretty much everywhere at least in the distros. >From a quick look it seems like the inode_lock splitup can easily be moved forward, and it would help us with doing some work on the writeback side. The problem is that it would need rebasing ontop of both the vfs and writeback (aka block) trees. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Linux 2.6.35 2010-08-02 9:05 ` Christoph Hellwig @ 2010-08-02 10:07 ` Nick Piggin 0 siblings, 0 replies; 9+ messages in thread From: Nick Piggin @ 2010-08-02 10:07 UTC (permalink / raw) To: Christoph Hellwig Cc: Nick Piggin, Dave Chinner, linux-fsdevel, Linus Torvalds, Linux Kernel Mailing List On Mon, Aug 02, 2010 at 05:05:42AM -0400, Christoph Hellwig wrote: > On Mon, Aug 02, 2010 at 04:24:28AM -0400, Christoph Hellwig wrote: > > .36. I'd much rather see the inode_lock scaling or the lockless path > > walk going in before, but I haven't checked how complicated the > > reordering would be. The lockless path walk also is only rather > > theoretically useful until we do ACL checks lockless as we're having > > ACLs enabled pretty much everywhere at least in the distros. > > >From a quick look it seems like the inode_lock splitup can easily > be moved forward, and it would help us with doing some work on the > writeback side. The problem is that it would need rebasing ontop > of both the vfs and writeback (aka block) trees. inode_lock splitup is much simpler than dcache_lock, yes. And I have to rebase it on the work currently queued for 2.6.35 anyway, so that's no problem. I can easily put it in front of dcache_lock patches in the series (as I said, I've kept everything independent and well split up). I do want opinions on how to do the big-picture merge, though, before I start moving things around. And obviously reviewing each of the parts is more important at this point than exact way to order the thing. But even the inode_lock patches I am wary of merging in 2.6.36 without having much review or any linux-next / vfs-tree exposure. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Linux 2.6.35 2010-08-02 8:24 ` Christoph Hellwig 2010-08-02 8:46 ` KOSAKI Motohiro 2010-08-02 9:05 ` Christoph Hellwig @ 2010-08-02 9:51 ` Nick Piggin 2 siblings, 0 replies; 9+ messages in thread From: Nick Piggin @ 2010-08-02 9:51 UTC (permalink / raw) To: Christoph Hellwig Cc: Nick Piggin, Dave Chinner, linux-fsdevel, Linus Torvalds, Linux Kernel Mailing List On Mon, Aug 02, 2010 at 04:24:28AM -0400, Christoph Hellwig wrote: > On Mon, Aug 02, 2010 at 05:55:37PM +1000, Nick Piggin wrote: > > I hate to say but I would like to see it mature for another release. It > > should also clash a bit with Al's recent inode work that he'll want to > > push. > > > > What I can do is send some of the ground work patches this time around, > > put the tree into linux-next, and put reviewers on notice. > > > > I think it is all conceptually sound, but it will inevitably have some > > bugs left to shake out, and things to be fixed on the review side. I > > don't anticipate a problem that could not be fixed in the release cycle, > > but I think aiming for post 2.6.36 is a bit fairer for vfs guys, > > honestly. LSF is next week too, so most of them will be busy with travel > > and such. But I do hope to discuss the vfs-scale patches there. > > What I'm most concerned bit merging everything in one go. It's a huge > series and I'd rather see it start going in in batches over multiple > kernel releases. One problem is that to win much benefit, several different aspects must be scaled. If not, then you end up with more locks *and* still have bouncing global cachelines. And filesystems will go through multiple releases where locking changes are in flux. This is what I'm concerned about. I definitely have tried to keep everything as conceptually seperate small chunks. But there is a real big-picture aspect that is required to review it. For example, you asked for just the locking split-up without any of the per-hash-locks and per-cpu locks etc. That's fine for review, but you cannot merge it because then you end up with N bouncing global locks instead of 1. It also tends to be much uglier than a final outcome because I have not applied any transformations to improve lock orderings and reduce trylocking etc. > Things like the fs_struct spinlock and some other preparatory patches > should be ver easily to do for 2.6.36. Scaling the files and vfsmount > locks should also be easily doable, but we need to sort out the struct > file growth in the later. We really can't grow struct file by two > pointers as that would have devasting effects on various workloads. Strictly, it is a filesystem corruption bug-fix for the tty layer and nothing to do with tty scaling patches. I don't have the patience at the moment to sort through tty layer crap, but whoever is maintaining that should. I could possibly come back and look at it some point, but given your half-working patch as a guide, I think someone who knows the code can fix it. > What follows after that is the dcache_lock scaling which to seems the > most immature bit of the series, and the one that showed by far the > most problems in -RT. I'm very much dead set against merging that in > .36. That's a fair point, I agree with. It needs most review. > I'd much rather see the inode_lock scaling or the lockless path > walk going in before, but I haven't checked how complicated the > reordering would be. I would much prefer not to re-order it before either of inode or dcache scaling patches. It would introduce a lot of churn and locking is significantly changed. It probably should be possible, although we would still get path walk contention on dcache_lock, vfsmount_lock, and requires inode-RCU (making inodes more expensive without being offset by any benefits of inode scaling), and requires changes to filesystem dcache and inode APIs. I could work on re-ordering it certainly, but only if it is decided that we definitely don't want dcache-scale or inode-scale patch sets in the forseeable future. I think we definitely do want them, so I find it hard to justify a big reordering. > The lockless path walk also is only rather > theoretically useful until we do ACL checks lockless as we're having > ACLs enabled pretty much everywhere at least in the distros. True, it needs a last bit of work for permission checking. The conceptual idea and the bulk of the code I think is ready to review though. ACLs should be just more of the same. > The per-zone shrinkers are another thing that's not directly related, > I think they need a lot more discussion with the VM folks, and > integrating with Dave's work in that area. Well I'm a VM folk :) Conceptually, there is no problems for MM here. This is really the right way to drive reclaim from the MM perspective (ie. per-zone). Of course I will work with Dave and take suggestions on implementation. It is directly related in that it is required to remove global lock and global list scanning from vfs reclaim, which is something that we've known and wanted for a long time. On one hand, you might say I'm going overboard, but on another hand, vfs really sucks on NUMA and SMP right now and it's only going to get worse for "normal" (ie. not HPC) people. ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <87tyncds89.fsf@basil.nowhere.org>]
* Re: Linux 2.6.35 [not found] ` <87tyncds89.fsf@basil.nowhere.org> @ 2010-08-03 9:28 ` Nick Piggin 2010-08-03 9:49 ` Andi Kleen 2010-08-03 15:05 ` Henrique de Moraes Holschuh 0 siblings, 2 replies; 9+ messages in thread From: Nick Piggin @ 2010-08-03 9:28 UTC (permalink / raw) To: Andi Kleen, linux-fsdevel, Christoph Hellwig Cc: Dave Chinner, Linus Torvalds, Linux Kernel Mailing List, npiggin On Tue, Aug 03, 2010 at 10:18:30AM +0200, Andi Kleen wrote: > Dave Chinner <david@fromorbit.com> writes: > > > On Sun, Aug 01, 2010 at 04:52:42PM -0700, Linus Torvalds wrote: > >> On a slightly happier note: one thing I do hope we can merge in the > >> upcoming merge window is Nick Piggin's cool VFS scalability series. > >> I've been using it on my own machine, and gone through all the commits > >> (not that I shouldn't go through some of them some more), and am > >> personally really excited about it. It's seldom we see major > >> performance improvements in core code that are quite that noticeable, > >> and Nick's whole RCU pathname lookup in particular just tickles me > >> pink. > > > > There hasn't been nearly enough review or testing of this patch > > series yet. Before a merge, it needs to be split up in smaller, > > more digestable chunks for more comprehensive review, regression > > testing and behavioural analysis. > > We started some testing of the VFS series on larger systems and so > far it looks all very good and the performance improvements are impressive > (but of course new bottlenecks are being exposed then) > > The only snag found so far was that an ACL enabled file system > disables all the nice path walk improvements, so right now you > need to remount with noacl. I'm hoping this can be fixed > before a mainline release, otherwise I suspect it would disable > the improvements for lots of people (common distributions default > to acl on) OK, vfs-scale-working branch now has commits to enable rcu-walk aware d_revalidate, permission, and check_acl in the filesystems, and implements a basic rcu-walk aware scheme for generic/posix acls and implements that for tmpfs, ext2, btrfs. It just drops out of rcu-walk if there is an ACL on a directory, or if it is not cached. I think that's enough to be production ready now. Pushing rcu-walk awareness down into acl checking code would not be hard. I was under the impression that ACLs on directories are not that common, so maybe this is as far as we need to go for now anyway. It does need more commenting of the new methods and explaining how they can and can't be used by filesystems. The tree is also getting messy with incremental changes -- I'm avoiding rebasing it so people following it can see response to reviews and issues that arise. Obviously it will all get cleaned up and rebased properly onto a new branch before anything is merged. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Linux 2.6.35 2010-08-03 9:28 ` Nick Piggin @ 2010-08-03 9:49 ` Andi Kleen 2010-08-03 15:05 ` Henrique de Moraes Holschuh 1 sibling, 0 replies; 9+ messages in thread From: Andi Kleen @ 2010-08-03 9:49 UTC (permalink / raw) To: Nick Piggin Cc: Andi Kleen, linux-fsdevel, Christoph Hellwig, Dave Chinner, Linus Torvalds, Linux Kernel Mailing List On Tue, Aug 03, 2010 at 07:28:23PM +1000, Nick Piggin wrote: > OK, vfs-scale-working branch now has commits to enable rcu-walk aware > d_revalidate, permission, and check_acl in the filesystems, and > implements a basic rcu-walk aware scheme for generic/posix acls and Cool. I was just looking at that. > I was under the impression that ACLs on directories are not that common, > so maybe this is as far as we need to go for now anyway. Yes it shouldn't be common normally. I think the common case for distros is just a few ACLs in /dev. Of course you never know for specific end user workloads. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Linux 2.6.35 2010-08-03 9:28 ` Nick Piggin 2010-08-03 9:49 ` Andi Kleen @ 2010-08-03 15:05 ` Henrique de Moraes Holschuh 1 sibling, 0 replies; 9+ messages in thread From: Henrique de Moraes Holschuh @ 2010-08-03 15:05 UTC (permalink / raw) To: Nick Piggin Cc: Andi Kleen, linux-fsdevel, Christoph Hellwig, Dave Chinner, Linus Torvalds, Linux Kernel Mailing List On Tue, 03 Aug 2010, Nick Piggin wrote: > I was under the impression that ACLs on directories are not that common, > so maybe this is as far as we need to go for now anyway. They are quite common on fileserver data areas, at least on the places where I worked at. -- "One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie." -- The Silicon Valley Tarot Henrique Holschuh ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-08-03 15:05 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <AANLkTim8FALeVG+NPLNLEQChu=dPD=GHSpnxmZHYgsNx@mail.gmail.com> [not found] ` <20100802023322.GA19164@dastard> [not found] ` <AANLkTint_s6h_EC7mDiPsyxr=C0GSrnrgJYkCCU7JEtN@mail.gmail.com> [not found] ` <20100802055834.GB19164@dastard> 2010-08-02 7:55 ` Linux 2.6.35 Nick Piggin 2010-08-02 8:24 ` Christoph Hellwig 2010-08-02 8:46 ` KOSAKI Motohiro 2010-08-02 9:05 ` Christoph Hellwig 2010-08-02 10:07 ` Nick Piggin 2010-08-02 9:51 ` Nick Piggin [not found] ` <87tyncds89.fsf@basil.nowhere.org> 2010-08-03 9:28 ` Nick Piggin 2010-08-03 9:49 ` Andi Kleen 2010-08-03 15:05 ` Henrique de Moraes Holschuh
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).