* ->direct_IO API change in current 2.4 BK @ 2003-07-09 12:31 Christoph Hellwig 2003-07-09 17:03 ` Andreas Dilger 0 siblings, 1 reply; 27+ messages in thread From: Christoph Hellwig @ 2003-07-09 12:31 UTC (permalink / raw) To: marcelo, trond.myklebust; +Cc: linux-kernel I just got a nice XFS oops due to the direct_IO API change in 2.4. Guys, this is a STABLE series and APIs are supposed to be exactly that, _STABLE_. If you really think O_DIRECT on NFS is soo important please add a ->direct_IO2 for NFS like the reiserfs read_inode2 hack. But what's the use of it anyway? AFAIK it's mostly for whoracle setups that have their data on netapps but that needs a certified vendor kernel not mainline.. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 12:31 ->direct_IO API change in current 2.4 BK Christoph Hellwig @ 2003-07-09 17:03 ` Andreas Dilger 2003-07-09 17:24 ` Marcelo Tosatti 0 siblings, 1 reply; 27+ messages in thread From: Andreas Dilger @ 2003-07-09 17:03 UTC (permalink / raw) To: Christoph Hellwig, marcelo, trond.myklebust, linux-kernel On Jul 09, 2003 13:31 +0100, Christoph Hellwig wrote: > I just got a nice XFS oops due to the direct_IO API change in > 2.4. Guys, this is a STABLE series and APIs are supposed to be exactly > that, _STABLE_. If you really think O_DIRECT on NFS is soo important > please add a ->direct_IO2 for NFS like the reiserfs read_inode2 hack. I would have to agree with that sentiment - we shouldn't change the API in an "almost compatible" way, although I would have hoped that compile warnings and/or module symbol versioning would have avoided a crash. > But what's the use of it anyway? AFAIK it's mostly for whoracle setups > that have their data on netapps but that needs a certified vendor kernel > not mainline.. Actually, it is useful for Lustre to do this, because it allows us to have a file handle (which, naturally, holds per-file data) at the time the IO is sent over the wire, instead of the "anonymous" writes that happen now. This helps us with readahead on the server and other minor improvements. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 17:03 ` Andreas Dilger @ 2003-07-09 17:24 ` Marcelo Tosatti 2003-07-09 17:43 ` Marc-Christian Petersen 2003-07-09 18:29 ` Trond Myklebust 0 siblings, 2 replies; 27+ messages in thread From: Marcelo Tosatti @ 2003-07-09 17:24 UTC (permalink / raw) To: Andreas Dilger; +Cc: Christoph Hellwig, marcelo, Trond Myklebust, lkml On Wed, 9 Jul 2003, Andreas Dilger wrote: > On Jul 09, 2003 13:31 +0100, Christoph Hellwig wrote: > > I just got a nice XFS oops due to the direct_IO API change in > > 2.4. Guys, this is a STABLE series and APIs are supposed to be exactly > > that, _STABLE_. If you really think O_DIRECT on NFS is soo important > > please add a ->direct_IO2 for NFS like the reiserfs read_inode2 hack. > > I would have to agree with that sentiment - we shouldn't change the > API in an "almost compatible" way, although I would have hoped that > compile warnings and/or module symbol versioning would have avoided > a crash. > > > But what's the use of it anyway? AFAIK it's mostly for whoracle setups > > that have their data on netapps but that needs a certified vendor kernel > > not mainline.. > > Actually, it is useful for Lustre to do this, because it allows us to have > a file handle (which, naturally, holds per-file data) at the time the IO is > sent over the wire, instead of the "anonymous" writes that happen now. > This helps us with readahead on the server and other minor improvements. Fine, I agree. Trond, I'll have to revert your direct IO patches because they break the _stable_ API, indeed. Please come up with another solution for the problem (->direct_IO2 ? its ugly, but...). ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 17:24 ` Marcelo Tosatti @ 2003-07-09 17:43 ` Marc-Christian Petersen 2003-07-09 17:46 ` Marcelo Tosatti 2003-07-09 18:29 ` Trond Myklebust 1 sibling, 1 reply; 27+ messages in thread From: Marc-Christian Petersen @ 2003-07-09 17:43 UTC (permalink / raw) To: Marcelo Tosatti, Andreas Dilger, Andrea Arcangeli Cc: Christoph Hellwig, marcelo, Trond Myklebust, lkml On Wednesday 09 July 2003 19:24, Marcelo Tosatti wrote: Hi, > > > I just got a nice XFS oops due to the direct_IO API change in > > > 2.4. Guys, this is a STABLE series and APIs are supposed to be exactly > > > that, _STABLE_. If you really think O_DIRECT on NFS is soo important > > > please add a ->direct_IO2 for NFS like the reiserfs read_inode2 hack. I wonder why -aa and -wolk don't have these problems with O_DIRECT vs. XFS. ciao, Marc ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 17:43 ` Marc-Christian Petersen @ 2003-07-09 17:46 ` Marcelo Tosatti 2003-07-09 17:55 ` Marc-Christian Petersen 0 siblings, 1 reply; 27+ messages in thread From: Marcelo Tosatti @ 2003-07-09 17:46 UTC (permalink / raw) To: Marc-Christian Petersen Cc: Andreas Dilger, Andrea Arcangeli, Christoph Hellwig, marcelo, Trond Myklebust, lkml On Wed, 9 Jul 2003, Marc-Christian Petersen wrote: > On Wednesday 09 July 2003 19:24, Marcelo Tosatti wrote: > > Hi, > > > > > I just got a nice XFS oops due to the direct_IO API change in > > > > 2.4. Guys, this is a STABLE series and APIs are supposed to be exactly > > > > that, _STABLE_. If you really think O_DIRECT on NFS is soo important > > > > please add a ->direct_IO2 for NFS like the reiserfs read_inode2 hack. > I wonder why -aa and -wolk don't have these problems with O_DIRECT vs. XFS. Do they have the NFS DIRECT IO patch? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 17:46 ` Marcelo Tosatti @ 2003-07-09 17:55 ` Marc-Christian Petersen 2003-07-09 18:08 ` Marcelo Tosatti 0 siblings, 1 reply; 27+ messages in thread From: Marc-Christian Petersen @ 2003-07-09 17:55 UTC (permalink / raw) To: Marcelo Tosatti Cc: Andreas Dilger, Andrea Arcangeli, Christoph Hellwig, marcelo, Trond Myklebust, lkml On Wednesday 09 July 2003 19:46, Marcelo Tosatti wrote: Hi Marcelo, > > > > > I just got a nice XFS oops due to the direct_IO API change in > > > > > 2.4. Guys, this is a STABLE series and APIs are supposed to be > > > > > exactly that, _STABLE_. If you really think O_DIRECT on NFS is soo > > > > > important please add a ->direct_IO2 for NFS like the reiserfs > > > > > read_inode2 hack. > > > > I wonder why -aa and -wolk don't have these problems with O_DIRECT vs. > > XFS. > Do they have the NFS DIRECT IO patch? Yes. If not, my sentence would be superflous ;) ciao, Marc ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 17:55 ` Marc-Christian Petersen @ 2003-07-09 18:08 ` Marcelo Tosatti 2003-07-09 18:22 ` Marc-Christian Petersen 2003-07-09 18:33 ` Trond Myklebust 0 siblings, 2 replies; 27+ messages in thread From: Marcelo Tosatti @ 2003-07-09 18:08 UTC (permalink / raw) To: Marc-Christian Petersen Cc: Andreas Dilger, Andrea Arcangeli, Christoph Hellwig, marcelo, Trond Myklebust, lkml On Wed, 9 Jul 2003, Marc-Christian Petersen wrote: > On Wednesday 09 July 2003 19:46, Marcelo Tosatti wrote: > > Hi Marcelo, > > > > > > > I just got a nice XFS oops due to the direct_IO API change in > > > > > > 2.4. Guys, this is a STABLE series and APIs are supposed to be > > > > > > exactly that, _STABLE_. If you really think O_DIRECT on NFS is soo > > > > > > important please add a ->direct_IO2 for NFS like the reiserfs > > > > > > read_inode2 hack. > > > > > > I wonder why -aa and -wolk don't have these problems with O_DIRECT vs. > > > XFS. > > Do they have the NFS DIRECT IO patch? > Yes. If not, my sentence would be superflous ;) Ok, right. Well, I dont know why it doesnt happen there. Maybe not enough testing? Anyway, I'm going to revert the NFS DIRECT IO patch because, as Christoph mentioned, breaks the API. I except another solution from Trond (maybe ->direct_IO2). ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 18:08 ` Marcelo Tosatti @ 2003-07-09 18:22 ` Marc-Christian Petersen 2003-07-09 19:13 ` Marcelo Tosatti 2003-07-09 18:33 ` Trond Myklebust 1 sibling, 1 reply; 27+ messages in thread From: Marc-Christian Petersen @ 2003-07-09 18:22 UTC (permalink / raw) To: Marcelo Tosatti Cc: Andreas Dilger, Andrea Arcangeli, Christoph Hellwig, marcelo, Trond Myklebust, lkml On Wednesday 09 July 2003 20:08, Marcelo Tosatti wrote: Hi Marcelo, > Ok, right. Well, I dont know why it doesnt happen there. Maybe not enough > testing? aehm, I'll bet -aa is tested _alot_ and also -wolk has tons of testers, but well, this is a different talk ;) > Anyway, I'm going to revert the NFS DIRECT IO patch because, as Christoph > mentioned, breaks the API. > I except another solution from Trond (maybe ->direct_IO2). I wonder why you merge such stuff then in the first place, if you agree with hch's opinion, that an API should not change in stable kernel series in the 2nd place? ... Did you temporarly forget it? ;) ciao, Marc ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 18:22 ` Marc-Christian Petersen @ 2003-07-09 19:13 ` Marcelo Tosatti 2003-07-09 19:45 ` Marc-Christian Petersen 2003-07-09 23:43 ` Alan Cox 0 siblings, 2 replies; 27+ messages in thread From: Marcelo Tosatti @ 2003-07-09 19:13 UTC (permalink / raw) To: Marc-Christian Petersen Cc: Andreas Dilger, Andrea Arcangeli, Christoph Hellwig, marcelo, Trond Myklebust, lkml On Wed, 9 Jul 2003, Marc-Christian Petersen wrote: > On Wednesday 09 July 2003 20:08, Marcelo Tosatti wrote: > > Hi Marcelo, > > > Ok, right. Well, I dont know why it doesnt happen there. Maybe not enough > > testing? > aehm, I'll bet -aa is tested _alot_ and also -wolk has tons of testers, but > well, this is a different talk ;) > > > Anyway, I'm going to revert the NFS DIRECT IO patch because, as Christoph > > mentioned, breaks the API. > > I except another solution from Trond (maybe ->direct_IO2). > I wonder why you merge such stuff then in the first place, if you agree with > hch's opinion, that an API should not change in stable kernel series in the > 2nd place? ... Did you temporarly forget it? ;) No, I did not. I applied it because, in my ignorance, I did not noticed it would break the stable API. I applied it because I wanted comments useful from people (Like hch and others did). Thank you very much for your flaming. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 19:13 ` Marcelo Tosatti @ 2003-07-09 19:45 ` Marc-Christian Petersen 2003-07-09 23:43 ` Alan Cox 1 sibling, 0 replies; 27+ messages in thread From: Marc-Christian Petersen @ 2003-07-09 19:45 UTC (permalink / raw) To: Marcelo Tosatti Cc: Andreas Dilger, Andrea Arcangeli, Christoph Hellwig, marcelo, Trond Myklebust, lkml On Wednesday 09 July 2003 21:13, Marcelo Tosatti wrote: Hi Marcelo, > No, I did not. > I applied it because, in my ignorance, I did not noticed it would break > the stable API. > I applied it because I wanted comments useful from people (Like hch and > others did). > Thank you very much for your flaming. Somewhere deep in my head I knew you'd say that :-( .. When I flame, it looks very different. It was not any kind of flaming or any kind of bad meaning or such, just a simple question because I was just curious! ciao, Marc ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 19:13 ` Marcelo Tosatti 2003-07-09 19:45 ` Marc-Christian Petersen @ 2003-07-09 23:43 ` Alan Cox 2003-07-10 0:21 ` Jeff Garzik 1 sibling, 1 reply; 27+ messages in thread From: Alan Cox @ 2003-07-09 23:43 UTC (permalink / raw) To: Marcelo Tosatti Cc: Marc-Christian Petersen, Andreas Dilger, Andrea Arcangeli, Christoph Hellwig, marcelo, Trond Myklebust, lkml On Mer, 2003-07-09 at 20:13, Marcelo Tosatti wrote: > I applied it because, in my ignorance, I did not noticed it would break > the stable API. > > I applied it because I wanted comments useful from people (Like hch and > others did). I'm not sure I see what the fuss is about a slight API change that is safe since it spews warnings/breaks existing code that isnt fixed. At least one vendor kernel also has the changed API anyway ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 23:43 ` Alan Cox @ 2003-07-10 0:21 ` Jeff Garzik 0 siblings, 0 replies; 27+ messages in thread From: Jeff Garzik @ 2003-07-10 0:21 UTC (permalink / raw) To: Alan Cox Cc: Marcelo Tosatti, Marc-Christian Petersen, Andreas Dilger, Andrea Arcangeli, Christoph Hellwig, marcelo, Trond Myklebust, lkml Alan Cox wrote: > On Mer, 2003-07-09 at 20:13, Marcelo Tosatti wrote: > >>I applied it because, in my ignorance, I did not noticed it would break >>the stable API. >> >>I applied it because I wanted comments useful from people (Like hch and >>others did). > > > I'm not sure I see what the fuss is about a slight API change that is > safe since it spews warnings/breaks existing code that isnt fixed. At > least one vendor kernel also has the changed API anyway "safe" ignores the pain of people trying to support multiple kernels. Each API change like the direct_IO one introduces ifdefs. Changing a function prototype is particularly annoying because you can't create a backwards-compat wrapper I disagree with the AC97 codec changes being merged into 2.4, too, for the same reason. Yes I recognize it is required to support new hardware. Yes I realize it vastly simplifies supporting some existing hardware. But I don't think you realize (or don't care?) about the maintenance pain created by the change. If a vendor wishes their driver to support 2.4.21 _and_ 2.4.22 (not a lot to ask), they must add a bunch of ifdef crud in their OSS driver. Feature and API additions are _far_ less painful than API changes in the middle of a stable series. Overall, I think we are looking at a question which needs to be answered by the community: what constitutes a stable series? when do we stop changing the API and let it stabilize? ... and I am writing a mail right now to ask that question (as requested by Marcelo and a couple others, though I wanted to do it for a while now). Jeff ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 18:08 ` Marcelo Tosatti 2003-07-09 18:22 ` Marc-Christian Petersen @ 2003-07-09 18:33 ` Trond Myklebust 2003-07-09 18:41 ` Marc-Christian Petersen 1 sibling, 1 reply; 27+ messages in thread From: Trond Myklebust @ 2003-07-09 18:33 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Marc-Christian Petersen, lkml >>>>> " " == Marcelo Tosatti <marcelo@conectiva.com.br> writes: > Ok, right. Well, I dont know why it doesnt happen there. Maybe > not enough testing? Probably nobody is applying the XFS patches on those kernel? Cheers, Trond ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 18:33 ` Trond Myklebust @ 2003-07-09 18:41 ` Marc-Christian Petersen 2003-07-09 18:50 ` Trond Myklebust 0 siblings, 1 reply; 27+ messages in thread From: Marc-Christian Petersen @ 2003-07-09 18:41 UTC (permalink / raw) To: trond.myklebust, Marcelo Tosatti; +Cc: lkml On Wednesday 09 July 2003 20:33, Trond Myklebust wrote: Hi Trond, > >>>>> " " == Marcelo Tosatti <marcelo@conectiva.com.br> writes: > > Ok, right. Well, I dont know why it doesnt happen there. Maybe > > not enough testing? > Probably nobody is applying the XFS patches on those kernel? err, -aa has XFS per default, -wolk has XFS per default. So ... ;) ciao, Marc ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 18:41 ` Marc-Christian Petersen @ 2003-07-09 18:50 ` Trond Myklebust 2003-07-09 18:55 ` Marc-Christian Petersen ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Trond Myklebust @ 2003-07-09 18:50 UTC (permalink / raw) To: Marc-Christian Petersen; +Cc: Marcelo Tosatti, lkml >>>>> " " == Marc-Christian Petersen <m.c.p@wolk-project.de> writes: > err, -aa has XFS per default, -wolk has XFS per default. So > ... ;) So they have both XFS + NFS O_DIRECT? The answer to your question is then that somebody made the trivial conversion on XFS... It's just a question of replacing the second argument of the direct_IO() method with a filp, then extracting the inode from that. A 2-liner patch at most... The point here is that Marcelo's tree does not include XFS, so my patch can't fix it up... As I said, I suggest replacing KERNEL_HAS_O_DIRECT with KERNEL_HAS_O_DIRECT2 so that the XFS patches can switch on that, and hence provide the 2-liner on newer kernels... Cheers, Trond ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 18:50 ` Trond Myklebust @ 2003-07-09 18:55 ` Marc-Christian Petersen 2003-07-09 19:05 ` Jeff Garzik 2003-07-09 23:40 ` Alan Cox 2 siblings, 0 replies; 27+ messages in thread From: Marc-Christian Petersen @ 2003-07-09 18:55 UTC (permalink / raw) To: trond.myklebust; +Cc: Marcelo Tosatti, lkml On Wednesday 09 July 2003 20:50, Trond Myklebust wrote: Hi Trond, > So they have both XFS + NFS O_DIRECT? yes. > The answer to your question is then that somebody made the trivial > conversion on XFS... It's just a question of replacing the second > argument of the direct_IO() method with a filp, then extracting the > inode from that. A 2-liner patch at most... EXACT! That was my intention with my small first post ;) > The point here is that Marcelo's tree does not include XFS, so my > patch can't fix it up... > As I said, I suggest replacing KERNEL_HAS_O_DIRECT with > KERNEL_HAS_O_DIRECT2 so that the XFS patches can switch on that, and > hence provide the 2-liner on newer kernels... It's very okay with me. ciao, Marc ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 18:50 ` Trond Myklebust 2003-07-09 18:55 ` Marc-Christian Petersen @ 2003-07-09 19:05 ` Jeff Garzik 2003-07-09 19:08 ` Trond Myklebust 2003-07-09 23:40 ` Alan Cox 2 siblings, 1 reply; 27+ messages in thread From: Jeff Garzik @ 2003-07-09 19:05 UTC (permalink / raw) To: Trond Myklebust; +Cc: Marc-Christian Petersen, Marcelo Tosatti, lkml On Wed, Jul 09, 2003 at 08:50:59PM +0200, Trond Myklebust wrote: > >>>>> " " == Marc-Christian Petersen <m.c.p@wolk-project.de> writes: > > > err, -aa has XFS per default, -wolk has XFS per default. So > > ... ;) > > So they have both XFS + NFS O_DIRECT? > > The answer to your question is then that somebody made the trivial > conversion on XFS... It's just a question of replacing the second > argument of the direct_IO() method with a filp, then extracting the > inode from that. A 2-liner patch at most... > > The point here is that Marcelo's tree does not include XFS, so my > patch can't fix it up... > As I said, I suggest replacing KERNEL_HAS_O_DIRECT with > KERNEL_HAS_O_DIRECT2 so that the XFS patches can switch on that, and > hence provide the 2-liner on newer kernels... s/replacing/adding/ A new ->direct_IO2 hook would be an addition, so you really want to simply add another feature flag. Since the 2.5 direct_IO API is already different from current 2.4, I would also suggestion considering KERNEL24_HAS_O_DIRECT2 as the name, to specify the feature is specific to 2.4. Jeff ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 19:05 ` Jeff Garzik @ 2003-07-09 19:08 ` Trond Myklebust 2003-07-09 19:17 ` Jeff Garzik 0 siblings, 1 reply; 27+ messages in thread From: Trond Myklebust @ 2003-07-09 19:08 UTC (permalink / raw) To: Jeff Garzik; +Cc: Marc-Christian Petersen, Marcelo Tosatti, lkml >>>>> " " == Jeff Garzik <jgarzik@pobox.com> writes: > s/replacing/adding/ Revert ;-) > A new ->direct_IO2 hook would be an addition, so you really > want to simply add another feature flag. You missed the point. This is *instead* of ->direct_IO2. It's only purpose would be to distinguish between the old int (*direct_IO)(int, struct inode *, struct kiobuf *, unsigned long, int); and the new int (*direct_IO)(int, struct file *, struct kiobuf *, unsigned long, int); Cheers, Trond ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 19:08 ` Trond Myklebust @ 2003-07-09 19:17 ` Jeff Garzik 2003-07-09 19:51 ` Trond Myklebust 2003-07-09 23:42 ` Alan Cox 0 siblings, 2 replies; 27+ messages in thread From: Jeff Garzik @ 2003-07-09 19:17 UTC (permalink / raw) To: Trond Myklebust; +Cc: Marc-Christian Petersen, Marcelo Tosatti, lkml On Wed, Jul 09, 2003 at 09:08:53PM +0200, Trond Myklebust wrote: > >>>>> " " == Jeff Garzik <jgarzik@pobox.com> writes: > > > s/replacing/adding/ > > Revert ;-) > > > A new ->direct_IO2 hook would be an addition, so you really > > want to simply add another feature flag. > > You missed the point. This is *instead* of ->direct_IO2. It's only > purpose would be to distinguish between the old > > int (*direct_IO)(int, struct inode *, struct kiobuf *, unsigned long, int); > > > and the new > > int (*direct_IO)(int, struct file *, struct kiobuf *, unsigned long, int); I respectfully disagree, then. The 2.5 direct_IO API is already different from 2.4, so, changing the 2.4 stable API implies creating yet another different version of the API. When this situation presented itself earlier, with reiserfs, the solution was ->read_inode2. I think that same situation applies here. Having the stable API change, conditional on a define, is really nasty and IMO will create maintenance and support headaches down the line. I do not recall Linux VFS _ever_ having a hook's definition conditional. We should not start now... We need a 2.4-specific solution for this. ->direct_IO2 should suffice, and it follows historical example. XFS, ocfs, and others need only to test the HAVE_NEW_24_DIRECT_IO define. In the core kernel implementation, it is trivial to say "if direct_IO2 is non-NULL, use that, otherwise use direct_IO", without needing to make any code conditional at all. Jeff ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 19:17 ` Jeff Garzik @ 2003-07-09 19:51 ` Trond Myklebust 2003-07-09 21:43 ` Jeff Garzik 2003-07-09 23:42 ` Alan Cox 1 sibling, 1 reply; 27+ messages in thread From: Trond Myklebust @ 2003-07-09 19:51 UTC (permalink / raw) To: Jeff Garzik Cc: Trond Myklebust, Marc-Christian Petersen, Marcelo Tosatti, lkml >>>>> " " == Jeff Garzik <jgarzik@pobox.com> writes: > Having the stable API change, conditional on a define, is > really nasty and IMO will create maintenance and support > headaches down the line. I do not recall Linux VFS _ever_ > having a hook's definition conditional. We should not start > now... direct_IO() was precisely such a conditional hook definition. It appeared in 2.4.15, and anybody who does not check for KERNEL_HAS_O_DIRECT is not backward compatible. To comment further: There is at least one example I can think of which was exactly equivalent to the proposed change, namely the redefinition of the filldir_t type in 2.4.9. It was admittedly not documented using a define... Note: We could at the same time replace the name direct_IO() with direct_IO2() (that has several precedents). There are currently only a small number of filesystems that provide O_DIRECT, and converting them all is (as has been pointed out before) trivial... The problem with read_inode2() was rather that it overloaded the the existing iget4() interface... Cheers, Trond ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 19:51 ` Trond Myklebust @ 2003-07-09 21:43 ` Jeff Garzik 0 siblings, 0 replies; 27+ messages in thread From: Jeff Garzik @ 2003-07-09 21:43 UTC (permalink / raw) To: trond.myklebust; +Cc: Marc-Christian Petersen, Marcelo Tosatti, lkml Trond Myklebust wrote: >>>>>>" " == Jeff Garzik <jgarzik@pobox.com> writes: > > > > Having the stable API change, conditional on a define, is > > really nasty and IMO will create maintenance and support > > headaches down the line. I do not recall Linux VFS _ever_ > > having a hook's definition conditional. We should not start > > now... > > direct_IO() was precisely such a conditional hook definition. It > appeared in 2.4.15, and anybody who does not check for > KERNEL_HAS_O_DIRECT is not backward compatible. You misunderstand. The 2.4.15 direct_IO hook was _not_ conditionally defined. It appeared in the middle of a stable series, yes. It has a feature macro, yes. But the definition of the hook in include/linux/fs.h does not _change_ based on a define. That is what I mean by a conditional hook definition. It is far less trouble for everyone to add a new hook, instead of changing an existing hook, in the middle of a stable series. > To comment further: There is at least one example I can think of which > was exactly equivalent to the proposed change, namely the redefinition > of the filldir_t type in 2.4.9. It was admittedly not documented using > a define... No doubt you can find more :) That doesn't make the right thing to do, though :) > Note: We could at the same time replace the name direct_IO() with > direct_IO2() (that has several precedents). There are currently only > a small number of filesystems that provide O_DIRECT, and converting > them all is (as has been pointed out before) trivial... We cannot just-fix-up filesystems which are not in-tree, which is what the KERNEL_HAS_O_DIRECT2 define would be mainly used for. In-tree filesystems would just unconditionally use the new, or old, interface as they chose. > The problem with read_inode2() was rather that it overloaded the the > existing iget4() interface... The higher-level problem was that we didn't want to change the VFS API... otherwise we could have simply used the new interface, and converted all in-tree filesystems. Jeff ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 19:17 ` Jeff Garzik 2003-07-09 19:51 ` Trond Myklebust @ 2003-07-09 23:42 ` Alan Cox 2003-07-10 0:23 ` Jeff Garzik 1 sibling, 1 reply; 27+ messages in thread From: Alan Cox @ 2003-07-09 23:42 UTC (permalink / raw) To: Jeff Garzik Cc: Trond Myklebust, Marc-Christian Petersen, Marcelo Tosatti, lkml On Mer, 2003-07-09 at 20:17, Jeff Garzik wrote: > Having the stable API change, conditional on a define, is really > nasty and IMO will create maintenance and support headaches down > the line. I do not recall Linux VFS _ever_ having a hook's definition > conditional. We should not start now... The new quota code in 2.4.22pre already changed the rules slightly, as do the shmemfs fixes if you are pedantic ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 23:42 ` Alan Cox @ 2003-07-10 0:23 ` Jeff Garzik 0 siblings, 0 replies; 27+ messages in thread From: Jeff Garzik @ 2003-07-10 0:23 UTC (permalink / raw) To: Alan Cox; +Cc: Trond Myklebust, Marc-Christian Petersen, Marcelo Tosatti, lkml Alan Cox wrote: > On Mer, 2003-07-09 at 20:17, Jeff Garzik wrote: > >>Having the stable API change, conditional on a define, is really >>nasty and IMO will create maintenance and support headaches down >>the line. I do not recall Linux VFS _ever_ having a hook's definition >>conditional. We should not start now... > > > The new quota code in 2.4.22pre already changed the rules slightly, as > do the shmemfs fixes if you are pedantic I am ;-) ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 18:50 ` Trond Myklebust 2003-07-09 18:55 ` Marc-Christian Petersen 2003-07-09 19:05 ` Jeff Garzik @ 2003-07-09 23:40 ` Alan Cox 2 siblings, 0 replies; 27+ messages in thread From: Alan Cox @ 2003-07-09 23:40 UTC (permalink / raw) To: trond.myklebust; +Cc: Marc-Christian Petersen, Marcelo Tosatti, lkml On Mer, 2003-07-09 at 19:50, Trond Myklebust wrote: > >>>>> " " == Marc-Christian Petersen <m.c.p@wolk-project.de> writes: > > > err, -aa has XFS per default, -wolk has XFS per default. So > > ... ;) > > So they have both XFS + NFS O_DIRECT? -ac has both ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 17:24 ` Marcelo Tosatti 2003-07-09 17:43 ` Marc-Christian Petersen @ 2003-07-09 18:29 ` Trond Myklebust 2003-07-09 18:51 ` Andreas Dilger 1 sibling, 1 reply; 27+ messages in thread From: Trond Myklebust @ 2003-07-09 18:29 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Andreas Dilger, Christoph Hellwig, marcelo, lkml >>>>> " " == Marcelo Tosatti <marcelo@conectiva.com.br> writes: > Fine, I agree. Trond, I'll have to revert your direct IO > patches because they break the _stable_ API, indeed. > Please come up with another solution for the problem > (->direct_IO2 ? its ugly, but...). Like this? Blech... How about instead following Alan's suggestion to replace KERNEL_HAS_O_DIRECT with a KERNEL_HAS_O_DIRECT2 that can be used to switch between the old and new O_DIRECT format in the XFS patches? Cheers, Trond diff -u --recursive --new-file linux-2.4.22-nodirect/Documentation/Configure.help linux-2.4.22-blech/Documentation/Configure.help --- linux-2.4.22-nodirect/Documentation/Configure.help 2003-07-09 19:39:39.000000000 +0200 +++ linux-2.4.22-blech/Documentation/Configure.help 2003-07-09 20:07:36.000000000 +0200 @@ -15938,6 +15938,30 @@ If unsure, say N. +Allow direct I/O on files in NFS +CONFIG_NFS_DIRECTIO + There are important applications whose performance or correctness + depends on uncached access to file data. Database clusters (multiple + copies of the same instance running on separate hosts) implement their + own cache coherency protocol that subsumes the NFS cache protocols. + Applications that process datasets considerably larger than the client's + memory do not always benefit from a local cache. A streaming video + server, for instance, has no need to cache the contents of a file. + + This option enables applications to perform direct I/O on files in NFS + file systems using the O_DIRECT open() flag. When O_DIRECT is set for + files, their data is not cached in the system's page cache. Direct + read and write operations are aligned to block boundaries. Data is + moved to and from user-level application buffers directly. + + Unless your program is designed to use O_DIRECT properly, you are much + better off allowing the NFS client to manage caching for you. Misusing + O_DIRECT can cause poor server performance or network storms. This + kernel build option defaults OFF to avoid exposing system administrators + unwittingly to a potentially hazardous feature. + + If unsure, say N. + Root file system on NFS CONFIG_ROOT_NFS If you want your Linux box to mount its whole root file system (the diff -u --recursive --new-file linux-2.4.22-nodirect/fs/Config.in linux-2.4.22-blech/fs/Config.in --- linux-2.4.22-nodirect/fs/Config.in 2003-07-09 19:39:39.000000000 +0200 +++ linux-2.4.22-blech/fs/Config.in 2003-07-09 20:08:20.000000000 +0200 @@ -106,6 +106,7 @@ dep_tristate 'InterMezzo file system support (replicating fs) (EXPERIMENTAL)' CONFIG_INTERMEZZO_FS $CONFIG_INET $CONFIG_EXPERIMENTAL dep_tristate 'NFS file system support' CONFIG_NFS_FS $CONFIG_INET dep_mbool ' Provide NFSv3 client support' CONFIG_NFS_V3 $CONFIG_NFS_FS + dep_mbool ' Allow direct I/O on NFS files (EXPERIMENTAL)' CONFIG_NFS_DIRECTIO $CONFIG_NFS_FS $CONFIG_EXPERIMENTAL dep_bool ' Root file system on NFS' CONFIG_ROOT_NFS $CONFIG_NFS_FS $CONFIG_IP_PNP dep_tristate 'NFS server support' CONFIG_NFSD $CONFIG_INET diff -u --recursive --new-file linux-2.4.22-nodirect/fs/nfs/direct.c linux-2.4.22-blech/fs/nfs/direct.c --- linux-2.4.22-nodirect/fs/nfs/direct.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.4.22-blech/fs/nfs/direct.c 2003-07-08 11:52:37.000000000 +0200 @@ -0,0 +1,382 @@ +/* + * linux/fs/nfs/direct.c + * + * High-performance direct I/O for the NFS client + * + * When an application requests uncached I/O, all read and write requests + * are made directly to the server; data stored or fetched via these + * requests is not cached in the Linux page cache. The client does not + * correct unaligned requests from applications. All requested bytes are + * held on permanent storage before a direct write system call returns to + * an application. Applications that manage their own data caching, such + * as databases, make very good use of direct I/O on local file systems. + * + * Solaris implements an uncached I/O facility called directio() that + * is used for backups and sequential I/O to very large files. Solaris + * also supports uncaching whole NFS partitions with "-o forcedirectio," + * an undocumented mount option. + * + * Note that I/O to read in executables (e.g. kernel_read) cannot use + * direct (kiobuf) reads because there is no vma backing the passed-in + * data buffer. + * + * Designed by Jeff Kimmel, Chuck Lever, and Trond Myklebust. + * + * Initial implementation: 12/2001 by Chuck Lever <cel@netapp.com> + * + * TODO: + * + * 1. Use concurrent asynchronous network requests rather than + * serialized synchronous network requests for normal (non-sync) + * direct I/O. + */ + +#include <linux/config.h> +#include <linux/sched.h> +#include <linux/kernel.h> +#include <linux/file.h> +#include <linux/errno.h> +#include <linux/nfs_fs.h> +#include <linux/smp_lock.h> +#include <linux/sunrpc/clnt.h> +#include <linux/iobuf.h> + +#include <asm/system.h> +#include <asm/uaccess.h> + +#define NFSDBG_FACILITY (NFSDBG_PAGECACHE | NFSDBG_VFS) +#define VERF_SIZE (2 * sizeof(__u32)) + +static inline int +nfs_direct_read_rpc(struct file *file, struct nfs_readargs *arg) +{ + int result; + struct inode * inode = file->f_dentry->d_inode; + struct nfs_fattr fattr; + struct rpc_message msg; + struct nfs_readres res = { &fattr, arg->count, 0 }; + +#ifdef CONFIG_NFS_V3 + msg.rpc_proc = (NFS_PROTO(inode)->version == 3) ? + NFS3PROC_READ : NFSPROC_READ; +#else + msg.rpc_proc = NFSPROC_READ; +#endif + msg.rpc_argp = arg; + msg.rpc_resp = &res; + + lock_kernel(); + msg.rpc_cred = nfs_file_cred(file); + fattr.valid = 0; + result = rpc_call_sync(NFS_CLIENT(inode), &msg, 0); + nfs_refresh_inode(inode, &fattr); + unlock_kernel(); + + return result; +} + +static inline int +nfs_direct_write_rpc(struct file *file, struct nfs_writeargs *arg, + struct nfs_writeverf *verf) +{ + int result; + struct inode *inode = file->f_dentry->d_inode; + struct nfs_fattr fattr; + struct rpc_message msg; + struct nfs_writeres res = { &fattr, verf, 0 }; + +#ifdef CONFIG_NFS_V3 + msg.rpc_proc = (NFS_PROTO(inode)->version == 3) ? + NFS3PROC_WRITE : NFSPROC_WRITE; +#else + msg.rpc_proc = NFSPROC_WRITE; +#endif + msg.rpc_argp = arg; + msg.rpc_resp = &res; + + lock_kernel(); + msg.rpc_cred = get_rpccred(nfs_file_cred(file)); + fattr.valid = 0; + result = rpc_call_sync(NFS_CLIENT(inode), &msg, 0); + nfs_write_attributes(inode, &fattr); + put_rpccred(msg.rpc_cred); + unlock_kernel(); + +#ifdef CONFIG_NFS_V3 + if (NFS_PROTO(inode)->version == 3) { + if (result > 0) { + if ((arg->stable == NFS_FILE_SYNC) && + (verf->committed != NFS_FILE_SYNC)) { + printk(KERN_ERR + "%s: server didn't sync stable write request\n", + __FUNCTION__); + return -EIO; + } + + if (result != arg->count) { + printk(KERN_INFO + "%s: short write, count=%u, result=%d\n", + __FUNCTION__, arg->count, result); + } + } + return result; + } else { +#endif + verf->committed = NFS_FILE_SYNC; /* NFSv2 always syncs data */ + if (result == 0) + return arg->count; + return result; +#ifdef CONFIG_NFS_V3 + } +#endif +} + +#ifdef CONFIG_NFS_V3 +static inline int +nfs_direct_commit_rpc(struct inode *inode, loff_t offset, size_t count, + struct nfs_writeverf *verf) +{ + int result; + struct nfs_fattr fattr; + struct nfs_writeargs arg = { NFS_FH(inode), offset, count, 0, 0, + NULL }; + struct nfs_writeres res = { &fattr, verf, 0 }; + struct rpc_message msg = { NFS3PROC_COMMIT, &arg, &res, NULL }; + + fattr.valid = 0; + + lock_kernel(); + result = rpc_call_sync(NFS_CLIENT(inode), &msg, 0); + nfs_write_attributes(inode, &fattr); + unlock_kernel(); + + return result; +} +#else +static inline int +nfs_direct_commit_rpc(struct inode *inode, loff_t offset, size_t count, + struct nfs_writeverf *verf) +{ + return 0; +} +#endif + +/* + * Walk through the iobuf and create an iovec for each "rsize" bytes. + */ +static int +nfs_direct_read(struct file *file, struct kiobuf *iobuf, loff_t offset, + size_t count) +{ + int curpage, total; + int result = 0; + struct inode *inode = file->f_dentry->d_inode; + int rsize = NFS_SERVER(inode)->rsize; + struct page *pages[NFS_READ_MAXIOV]; + struct nfs_readargs args = { NFS_FH(inode), offset, 0, iobuf->offset, + pages }; + + total = 0; + curpage = 0; + while (count) { + int len, request; + struct page **dest = pages; + + request = count; + if (count > rsize) + request = rsize; + args.count = request; + args.offset = offset; + args.pgbase = (iobuf->offset + total) & ~PAGE_MASK; + len = PAGE_SIZE - args.pgbase; + + do { + struct page *page = iobuf->maplist[curpage]; + + if (curpage >= iobuf->nr_pages || !page) { + result = -EFAULT; + goto out_err; + } + + *dest++ = page; + /* zero after the first iov */ + if (request < len) + break; + request -= len; + len = PAGE_SIZE; + curpage++; + } while (request != 0); + + result = nfs_direct_read_rpc(file, &args); + + if (result < 0) + break; + + total += result; + if (result < args.count) /* NFSv2ism */ + break; + count -= result; + offset += result; + }; +out_err: + if (!total) + return result; + return total; +} + +/* + * Walk through the iobuf and create an iovec for each "wsize" bytes. + * If only one network write is necessary, or if the O_SYNC flag or + * 'sync' mount option are present, or if this is a V2 inode, use + * FILE_SYNC. Otherwise, use UNSTABLE and finish with a COMMIT. + * + * The mechanics of this function are much the same as nfs_direct_read, + * with the added complexity of committing unstable writes. + */ +static int +nfs_direct_write(struct file *file, struct kiobuf *iobuf, + loff_t offset, size_t count) +{ + int curpage, total; + int need_commit = 0; + int result = 0; + loff_t save_offset = offset; + struct inode *inode = file->f_dentry->d_inode; + int wsize = NFS_SERVER(inode)->wsize; + struct nfs_writeverf first_verf, ret_verf; + struct page *pages[NFS_WRITE_MAXIOV]; + struct nfs_writeargs args = { NFS_FH(inode), 0, 0, NFS_FILE_SYNC, 0, + pages }; + +#ifdef CONFIG_NFS_V3 + if ((NFS_PROTO(inode)->version == 3) && (count > wsize) && + (!IS_SYNC(inode))) + args.stable = NFS_UNSTABLE; +#endif + +retry: + total = 0; + curpage = 0; + while (count) { + int len, request; + struct page **dest = pages; + + request = count; + if (count > wsize) + request = wsize; + args.count = request; + args.offset = offset; + args.pgbase = (iobuf->offset + total) & ~PAGE_MASK; + len = PAGE_SIZE - args.pgbase; + + do { + struct page *page = iobuf->maplist[curpage]; + + if (curpage >= iobuf->nr_pages || !page) { + result = -EFAULT; + goto out_err; + } + + *dest++ = page; + /* zero after the first iov */ + if (request < len) + break; + request -= len; + len = PAGE_SIZE; + curpage++; + } while (request != 0); + + result = nfs_direct_write_rpc(file, &args, &ret_verf); + + if (result < 0) + break; + + if (!total) + memcpy(&first_verf.verifier, &ret_verf.verifier, + VERF_SIZE); + if (ret_verf.committed != NFS_FILE_SYNC) { + need_commit = 1; + if (memcmp(&first_verf.verifier, &ret_verf.verifier, + VERF_SIZE)) + goto print_retry; + } + + total += result; + count -= result; + offset += result; + }; + +out_err: + /* + * Commit data written so far, even in the event of an error + */ + if (need_commit) { + if (nfs_direct_commit_rpc(inode, save_offset, + iobuf->length - count, &ret_verf)) + goto print_retry; + if (memcmp(&first_verf.verifier, &ret_verf.verifier, + VERF_SIZE)) + goto print_retry; + } + + if (!total) + return result; + return total; + +print_retry: + printk(KERN_INFO "%s: detected server restart; retrying with FILE_SYNC\n", + __FUNCTION__); + args.stable = NFS_FILE_SYNC; + offset = save_offset; + count = iobuf->length; + goto retry; +} + +/* + * Read or write data, moving the data directly to/from the + * application's buffer without caching in the page cache. + * + * Rules for direct I/O + * + * 1. block size = 512 bytes or more + * 2. file byte offset is block aligned + * 3. byte count is a multiple of block size + * 4. user buffer is not aligned + * 5. user buffer is faulted in and pinned + * + * These are verified before we get here. + */ +int +nfs_direct_IO(int rw, struct file *file, struct kiobuf *iobuf, + unsigned long blocknr, int blocksize) +{ + int result = -EINVAL; + size_t count = iobuf->length; + struct dentry *dentry = file->f_dentry; + struct inode *inode = dentry->d_inode; + loff_t offset = blocknr << inode->i_blkbits; + + switch (rw) { + case READ: + dfprintk(VFS, + "NFS: direct_IO(READ) (%s/%s) off/cnt(%Lu/%d)\n", + dentry->d_parent->d_name.name, + dentry->d_name.name, offset, count); + + result = nfs_direct_read(file, iobuf, offset, count); + break; + case WRITE: + dfprintk(VFS, + "NFS: direct_IO(WRITE) (%s/%s) off/cnt(%Lu/%d)\n", + dentry->d_parent->d_name.name, + dentry->d_name.name, offset, count); + + result = nfs_direct_write(file, iobuf, offset, count); + break; + default: + break; + } + + dfprintk(VFS, "NFS: direct_IO result = %d\n", result); + return result; +} diff -u --recursive --new-file linux-2.4.22-nodirect/fs/nfs/file.c linux-2.4.22-blech/fs/nfs/file.c --- linux-2.4.22-nodirect/fs/nfs/file.c 2003-07-09 19:39:39.000000000 +0200 +++ linux-2.4.22-blech/fs/nfs/file.c 2003-07-09 20:10:21.000000000 +0200 @@ -16,6 +16,7 @@ * nfs regular file handling functions */ +#include <linux/config.h> #include <linux/sched.h> #include <linux/kernel.h> #include <linux/errno.h> @@ -199,6 +200,9 @@ readpage: nfs_readpage, sync_page: nfs_sync_page, writepage: nfs_writepage, +#ifdef CONFIG_NFS_DIRECTIO + direct_fileIO: nfs_direct_IO, +#endif prepare_write: nfs_prepare_write, commit_write: nfs_commit_write }; diff -u --recursive --new-file linux-2.4.22-nodirect/fs/nfs/Makefile linux-2.4.22-blech/fs/nfs/Makefile --- linux-2.4.22-nodirect/fs/nfs/Makefile 2003-07-09 19:39:39.000000000 +0200 +++ linux-2.4.22-blech/fs/nfs/Makefile 2003-07-09 20:08:47.000000000 +0200 @@ -14,6 +14,7 @@ obj-$(CONFIG_ROOT_NFS) += nfsroot.o mount_clnt.o obj-$(CONFIG_NFS_V3) += nfs3proc.o nfs3xdr.o +obj-$(CONFIG_NFS_DIRECTIO) += direct.o obj-m := $(O_TARGET) diff -u --recursive --new-file linux-2.4.22-nodirect/fs/nfs/write.c linux-2.4.22-blech/fs/nfs/write.c --- linux-2.4.22-nodirect/fs/nfs/write.c 2003-07-09 19:39:39.000000000 +0200 +++ linux-2.4.22-blech/fs/nfs/write.c 2003-07-09 20:11:04.000000000 +0200 @@ -123,23 +123,6 @@ } /* - * This function will be used to simulate weak cache consistency - * under NFSv2 when the NFSv3 attribute patch is included. - * For the moment, we just call nfs_refresh_inode(). - */ -static __inline__ int -nfs_write_attributes(struct inode *inode, struct nfs_fattr *fattr) -{ - if ((fattr->valid & NFS_ATTR_FATTR) && !(fattr->valid & NFS_ATTR_WCC)) { - fattr->pre_size = NFS_CACHE_ISIZE(inode); - fattr->pre_mtime = NFS_CACHE_MTIME(inode); - fattr->pre_ctime = NFS_CACHE_CTIME(inode); - fattr->valid |= NFS_ATTR_WCC; - } - return nfs_refresh_inode(inode, fattr); -} - -/* * Write a page synchronously. * Offset is the data offset within the page. */ diff -u --recursive --new-file linux-2.4.22-nodirect/include/linux/fs.h linux-2.4.22-blech/include/linux/fs.h --- linux-2.4.22-nodirect/include/linux/fs.h 2003-07-09 19:39:39.000000000 +0200 +++ linux-2.4.22-blech/include/linux/fs.h 2003-07-09 19:42:37.000000000 +0200 @@ -396,6 +396,8 @@ int (*releasepage) (struct page *, int); #define KERNEL_HAS_O_DIRECT /* this is for modules out of the kernel */ int (*direct_IO)(int, struct inode *, struct kiobuf *, unsigned long, int); +#define KERNEL_HAS_O_DIRECT2 /* Unfortunate kludge due to lack of foresight */ + int (*direct_fileIO)(int, struct file *, struct kiobuf *, unsigned long, int); void (*removepage)(struct page *); /* called when page gets removed from the inode */ }; diff -u --recursive --new-file linux-2.4.22-nodirect/include/linux/nfs_fs.h linux-2.4.22-blech/include/linux/nfs_fs.h --- linux-2.4.22-nodirect/include/linux/nfs_fs.h 2003-07-09 19:39:39.000000000 +0200 +++ linux-2.4.22-blech/include/linux/nfs_fs.h 2003-07-09 20:12:27.000000000 +0200 @@ -274,6 +274,11 @@ #define NFS_TestClearPageSync(page) test_and_clear_bit(PG_fs_1, &(page)->flags) /* + * linux/fs/nfs/direct.c + */ +extern int nfs_direct_IO(int, struct file *, struct kiobuf *, unsigned long, int); + +/* * linux/fs/mount_clnt.c * (Used only by nfsroot module) */ @@ -302,6 +307,23 @@ return __nfs_refresh_inode(inode,fattr); } +/* + * This function will be used to simulate weak cache consistency + * under NFSv2 when the NFSv3 attribute patch is included. + * For the moment, we just call nfs_refresh_inode(). + */ +static __inline__ int +nfs_write_attributes(struct inode *inode, struct nfs_fattr *fattr) +{ + if ((fattr->valid & NFS_ATTR_FATTR) && !(fattr->valid & NFS_ATTR_WCC)) { + fattr->pre_size = NFS_CACHE_ISIZE(inode); + fattr->pre_mtime = NFS_CACHE_MTIME(inode); + fattr->pre_ctime = NFS_CACHE_CTIME(inode); + fattr->valid |= NFS_ATTR_WCC; + } + return nfs_refresh_inode(inode, fattr); +} + static inline loff_t nfs_size_to_loff_t(__u64 size) { diff -u --recursive --new-file linux-2.4.22-nodirect/include/linux/nfs_xdr.h linux-2.4.22-blech/include/linux/nfs_xdr.h --- linux-2.4.22-nodirect/include/linux/nfs_xdr.h 2003-07-09 19:39:39.000000000 +0200 +++ linux-2.4.22-blech/include/linux/nfs_xdr.h 2003-07-09 20:13:22.000000000 +0200 @@ -59,7 +59,7 @@ /* Arguments to the read call. * Note that NFS_READ_MAXIOV must be <= (MAX_IOVEC-2) from sunrpc/xprt.h */ -#define NFS_READ_MAXIOV 8 +#define NFS_READ_MAXIOV (9) struct nfs_readargs { struct nfs_fh * fh; @@ -78,7 +78,7 @@ /* Arguments to the write call. * Note that NFS_WRITE_MAXIOV must be <= (MAX_IOVEC-2) from sunrpc/xprt.h */ -#define NFS_WRITE_MAXIOV 8 +#define NFS_WRITE_MAXIOV (9) struct nfs_writeargs { struct nfs_fh * fh; __u64 offset; diff -u --recursive --new-file linux-2.4.22-nodirect/mm/filemap.c linux-2.4.22-blech/mm/filemap.c --- linux-2.4.22-nodirect/mm/filemap.c 2003-07-09 19:39:39.000000000 +0200 +++ linux-2.4.22-blech/mm/filemap.c 2003-07-09 19:57:09.000000000 +0200 @@ -1651,6 +1651,95 @@ return retval; } +/* Differs from the above in that it uses the a_op->direct_fileIO() call */ +static ssize_t direct_fileIO(int rw, struct file * filp, char * buf, size_t count, loff_t offset) +{ + ssize_t retval; + int new_iobuf, chunk_size, blocksize_mask, blocksize, blocksize_bits, iosize, progress; + struct kiobuf * iobuf; + struct address_space * mapping = filp->f_dentry->d_inode->i_mapping; + struct inode * inode = mapping->host; + loff_t size = inode->i_size; + + new_iobuf = 0; + iobuf = filp->f_iobuf; + if (test_and_set_bit(0, &filp->f_iobuf_lock)) { + /* + * A parallel read/write is using the preallocated iobuf + * so just run slow and allocate a new one. + */ + retval = alloc_kiovec(1, &iobuf); + if (retval) + goto out; + new_iobuf = 1; + } + + blocksize = 1 << inode->i_blkbits; + blocksize_bits = inode->i_blkbits; + blocksize_mask = blocksize - 1; + chunk_size = KIO_MAX_ATOMIC_IO << 10; + + retval = -EINVAL; + if ((offset & blocksize_mask) || (count & blocksize_mask) || ((unsigned long) buf & blocksize_mask)) + goto out_free; + if (!mapping->a_ops->direct_fileIO) + goto out_free; + + if ((rw == READ) && (offset + count > size)) + count = size - offset; + + /* + * Flush to disk exclusively the _data_, metadata must remain + * completly asynchronous or performance will go to /dev/null. + */ + retval = filemap_fdatasync(mapping); + if (retval == 0) + retval = fsync_inode_data_buffers(inode); + if (retval == 0) + retval = filemap_fdatawait(mapping); + if (retval < 0) + goto out_free; + + progress = retval = 0; + while (count > 0) { + iosize = count; + if (iosize > chunk_size) + iosize = chunk_size; + + retval = map_user_kiobuf(rw, iobuf, (unsigned long) buf, iosize); + if (retval) + break; + + retval = mapping->a_ops->direct_fileIO(rw, filp, iobuf, (offset+progress) >> blocksize_bits, blocksize); + + if (rw == READ && retval > 0) + mark_dirty_kiobuf(iobuf, retval); + + if (retval >= 0) { + count -= retval; + buf += retval; + /* warning: weird semantics here, we're reporting a read behind the end of the file */ + progress += retval; + } + + unmap_kiobuf(iobuf); + + if (retval != iosize) + break; + } + + if (progress) + retval = progress; + + out_free: + if (!new_iobuf) + clear_bit(0, &filp->f_iobuf_lock); + else + free_kiovec(1, &iobuf); + out: + return retval; +} + int file_read_actor(read_descriptor_t * desc, struct page *page, unsigned long offset, unsigned long size) { char *kaddr; @@ -1721,7 +1810,10 @@ down(&inode->i_sem); size = inode->i_size; if (pos < size) { - retval = generic_file_direct_IO(READ, filp, buf, count, pos); + if (mapping->a_ops->direct_fileIO) + retval = direct_fileIO(READ, filp, buf, count, pos); + else + retval = generic_file_direct_IO(READ, filp, buf, count, pos); if (retval > 0) *ppos = pos + retval; } @@ -3230,7 +3322,10 @@ inode->i_ctime = inode->i_mtime = CURRENT_TIME; mark_inode_dirty_sync(inode); - written = generic_file_direct_IO(WRITE, file, (char *) buf, count, pos); + if (mapping->a_ops->direct_fileIO) + written = direct_fileIO(WRITE, file, (char *) buf, count, pos); + else + written = generic_file_direct_IO(WRITE, file, (char *) buf, count, pos); if (written > 0) { loff_t end = pos + written; if (end > inode->i_size && !S_ISBLK(inode->i_mode)) { ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 18:29 ` Trond Myklebust @ 2003-07-09 18:51 ` Andreas Dilger 2003-07-09 19:18 ` Jeff Garzik 0 siblings, 1 reply; 27+ messages in thread From: Andreas Dilger @ 2003-07-09 18:51 UTC (permalink / raw) To: Trond Myklebust; +Cc: Marcelo Tosatti, Christoph Hellwig, marcelo, lkml On Jul 09, 2003 20:29 +0200, Trond Myklebust wrote: > How about instead following Alan's suggestion to replace > KERNEL_HAS_O_DIRECT with a KERNEL_HAS_O_DIRECT2 that can be used to > switch between the old and new O_DIRECT format in the XFS patches? Actually, I like that a lot more, as it allows out-of-tree code to know which interface to use, and we don't have to key on kernel version (which is bogus if compiling against a vendor kernel that back-ports this fix). Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: ->direct_IO API change in current 2.4 BK 2003-07-09 18:51 ` Andreas Dilger @ 2003-07-09 19:18 ` Jeff Garzik 0 siblings, 0 replies; 27+ messages in thread From: Jeff Garzik @ 2003-07-09 19:18 UTC (permalink / raw) To: Trond Myklebust, Marcelo Tosatti, Christoph Hellwig, marcelo, lkml On Wed, Jul 09, 2003 at 11:51:48AM -0700, Andreas Dilger wrote: > On Jul 09, 2003 20:29 +0200, Trond Myklebust wrote: > > How about instead following Alan's suggestion to replace > > KERNEL_HAS_O_DIRECT with a KERNEL_HAS_O_DIRECT2 that can be used to > > switch between the old and new O_DIRECT format in the XFS patches? > > Actually, I like that a lot more, as it allows out-of-tree code to > know which interface to use, and we don't have to key on kernel version > (which is bogus if compiling against a vendor kernel that back-ports > this fix). Feature test macros are definitely quite useful... ...but making the stable series VFS API definition _conditional_ is quite unprecedented, and IMO wrong. Jeff ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2003-07-10 0:12 UTC | newest] Thread overview: 27+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2003-07-09 12:31 ->direct_IO API change in current 2.4 BK Christoph Hellwig 2003-07-09 17:03 ` Andreas Dilger 2003-07-09 17:24 ` Marcelo Tosatti 2003-07-09 17:43 ` Marc-Christian Petersen 2003-07-09 17:46 ` Marcelo Tosatti 2003-07-09 17:55 ` Marc-Christian Petersen 2003-07-09 18:08 ` Marcelo Tosatti 2003-07-09 18:22 ` Marc-Christian Petersen 2003-07-09 19:13 ` Marcelo Tosatti 2003-07-09 19:45 ` Marc-Christian Petersen 2003-07-09 23:43 ` Alan Cox 2003-07-10 0:21 ` Jeff Garzik 2003-07-09 18:33 ` Trond Myklebust 2003-07-09 18:41 ` Marc-Christian Petersen 2003-07-09 18:50 ` Trond Myklebust 2003-07-09 18:55 ` Marc-Christian Petersen 2003-07-09 19:05 ` Jeff Garzik 2003-07-09 19:08 ` Trond Myklebust 2003-07-09 19:17 ` Jeff Garzik 2003-07-09 19:51 ` Trond Myklebust 2003-07-09 21:43 ` Jeff Garzik 2003-07-09 23:42 ` Alan Cox 2003-07-10 0:23 ` Jeff Garzik 2003-07-09 23:40 ` Alan Cox 2003-07-09 18:29 ` Trond Myklebust 2003-07-09 18:51 ` Andreas Dilger 2003-07-09 19:18 ` Jeff Garzik
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox