From mboxrd@z Thu Jan 1 00:00:00 1970 From: Daniel Phillips Subject: Re: [RFC] Tux3 for review Date: Thu, 19 Jun 2014 14:58:20 -0700 Message-ID: References: <5376B273.7000800@partner.samsung.com> <20140518235555.GC18954@dastard> <537AA802.408@phunq.net> <20140520031802.GF18954@dastard> <20140613103216.GA4589@amd.pavel.ucw.cz> <02d3b094-808c-4b17-903d-1280d451704b@phunq.net> <20140613202039.GA23872@amd.pavel.ucw.cz> <1402932354.2197.61.camel@dabdike.int.hansenpartnership.com> <20140619082129.GA4309@xo-6d-61-c0.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Pavel Machek , James Bottomley , Dave Chinner , , , Linus Torvalds , Andrew Morton To: =?utf-8?B?THVrw6HFoSBDemVybmVy?= Return-path: Received: from mail.phunq.net ([184.71.0.62]:39088 "EHLO starbase.phunq.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965284AbaFSV6c convert rfc822-to-8bit (ORCPT ); Thu, 19 Jun 2014 17:58:32 -0400 In-Reply-To: Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Thursday, June 19, 2014 2:26:48 AM PDT, Luk=C3=A1=C5=A1 Czerner wrot= e: > On Thu, 19 Jun 2014, Pavel Machek wrote: > >> Date: Thu, 19 Jun 2014 10:21:29 +0200 >> From: Pavel Machek >> To: James Bottomley >> Cc: Daniel Phillips , Dave Chinner=20 >> , >> linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, > ... > > Flamed ? really ? Yes, really. There were valid points and there were also unabashed flam= es.=20 The latter are not helpful to anybody, even the flamer. But note that t= here=20 were no counter flames. The boy scout rule applies: always leave your=20 campsite cleaner than you found it. > Dave pointed out some serious coding style problems. > Those should be very easy to fix. One needs to be careful about the definition of "fix" so that it does n= ot=20 turn into "throw the baby out with the bath water". Our kernel code=20 necessarily has a few __KERNEL__ #ifdefs because the majority of it als= o=20 runs in user space. This not a feature to disparage, far from it. Among other benefits, running in user space supports automated unit tes= ting=20 at fine granularity. We run make tests as a habit to catch a wide spect= rum=20 of correctness regressions. A successful make tests usually indicates t= hat=20 the heavyweight kernel stress tests are going to pass. Obviously, there= are=20 occasional exceptions to this. For example user space does not catch SM= P=20 races. In practice, only a handful of those have slipped through and=20 required kernel level bug chasing. That said, we will will happily merge any concrete suggestion that redu= ces=20 the frequency of __KERNEL__. But please be realistic. There are 32=20 __KERNEL__ ifdefs in our 18K line code base. That hardly amounts to a=20 "dog's breakfast". > Let me remind you some more important problems Dave brought up, > including page forking: > > " > The hacks around VFS and MM functionality need to have demonstrated > methods for being removed. We already removed 450 lines of core kernel workarounds from Tux3 with = an=20 approach that was literally cut and pasted from one of Dave's emails. T= hen=20 Dave changed his mind. Now the Tux3 team has been assigned a research=20 project to improve core kernel writeback instead of simply adapting the= =20 approach that is already proven to work well enough. That is a rather=20 blatant example of "perfect is the enemy of good enough". Please read t= he=20 thread. > We're not going to merge that page > forking stuff (like you were told at LSF 2013 more than a year ago: > http://lwn.net/Articles/548091/) without rigorous design review and > a demonstration of the solutions to all the hard corner cases it > has. The current code doesn't solve them (e.g. direct IO doesn't > work in tux3), and there's no clear patch set we can review that > demonstrates how it is all supposed to work. i.e. you need to > separate out all the page forking code into a separate patchset for > review, independent of the tux3 code and applies to the core mm/ > code. > " Direct IO is a spurious issue. To recap: direct IO does not introduce a= ny=20 new page forking issues. All of the page forking issues already exist w= ith=20 normal buffered IO and mmap. We have little interest and scant availabl= e=20 time for heading off on a tangent to implement direct IO at this point = just=20 as a precondition for merging. On the other hand, page forking itself has a number of interesting issu= es.=20 Hirofumi is currently preparing a set of core kernel patches for review= =2E=20 These patches explicitly do not attempt to package page forking up into= a=20 nice and easy API that other filesystems could patch in tomorrow. That=20 would be an unreasonable research burden on our small development team.= =20 Instead, we show how it works in Tux3, and if other filesystems want to= get=20 those benefits, they can make similar changes. If we (the kernel commun= ity)=20 are lucky enough to find a pattern in it such that substantial parts of= the=20 code can be abstracted into a library, then good. But requiring such a=20 library to be developed as a precondition to merging Tux3 is unreasonab= le. > " > Then there's all the writeback hacks. You've simply copy-n-pasted > most of fs-writeback.c, including duplicating structures like struct > wb_writeback_work and then hacked in crap (kallsyms lookups!) to be > able to access core structures from kernel module context > (tux3_setup_writeback(), I'm looking at you). This is completely > unacceptible for a merge. Again, you need to separate out all the > writeback changes you need into an independent patchset so that they > can be reviewed independently of the tux3 code that uses it. > " That was already fixed as noted above, and all the relevant changes wer= e=20 already posted as an independent patch set. After that, some developers= =20 weighed in with half formed ideas about how the same thing could be don= e=20 better, but without concrete suggestions. There is nothing wrong with h= alf=20 formed ideas, except when they turn into a way of blocking forward=20 progress. See "perfect is the enemy of good enough" above. It is worth noting that we (the kernel community) have been thrashing a= way=20 at the writeback problem for more than twenty years, and the current=20 solution still leaves much to be desired. It is unfair to expect us, th= e=20 Tux3 team, to fix that mess in a week or two, just to merge our filesys= tem.=20 We prefer to adapt the existing infrastructure for now, as expressed in= the=20 currently proposed patch set. With that, we allow core to mark our inod= es=20 dirty just as it has always done, and we continue to use the usual inod= e=20 writeback lists for writeback sheduling, which work just fine. Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel= " in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html