* O_DIRECT fails in some kernel and FS
@ 2002-02-01 20:37 Ricardo Galli
2002-02-01 20:44 ` Andrew Morton
0 siblings, 1 reply; 32+ messages in thread
From: Ricardo Galli @ 2002-02-01 20:37 UTC (permalink / raw)
To: linux-kernel
After some comments from Oliver Diedrich (editor at heise.de), who told me
he couldn't make O_DIRECT work on 2.4.17, I tried different kernel versions
and file systems.
These are the results:
2.4.14 - Ext[23] - redhat7.2 glibs: OK (at least the bytes are written)
2.4.17 - ReiserFS - Debian Sid : FAILS (0 bytes file, write returns -1)
2.4.17 - Ext2 - Debian Woody : OK (bytes written)
2.4.17 - Ext3 - Debian Woody : FAILS (0 bytes file, write returns -1)
Oliver Diedrich also told me he could make O_DIRECT work with ext3 and 2.4.17.
Is this normal? Does it really work on 2.4.14? Or does it not really work,
and the kernel simply doesn't bypass the cache?
Funny behaviour...
Regards,
--
ricardo
"I just stopped using Windows and now you tell me to use Mirrors?"
- said Aunt Tillie, just before downloading 2.5.3 kernel.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS
2002-02-01 20:37 O_DIRECT fails in some kernel and FS Ricardo Galli
@ 2002-02-01 20:44 ` Andrew Morton
2002-02-01 20:49 ` Ricardo Galli
` (2 more replies)
0 siblings, 3 replies; 32+ messages in thread
From: Andrew Morton @ 2002-02-01 20:44 UTC (permalink / raw)
To: Ricardo Galli; +Cc: linux-kernel

Ricardo Galli wrote:
>
> After some comments from Oliver Diedrich (editor of heise.de), which told me
> he couldn't make O_DIRECT work on 2.4.17, I tried with different versions and
> file systems:
>
> This is the result:
>
> 2.4.14 - Ext[23] - redhat7.2 glibs: OK (at least the bytes are written)
> 2.4.17 - ReiserFS - Debian Sid : FAILS (0 bytes file, write returns -1)
> 2.4.17 - Ext2 - Debian Woody : OK (bytes written)
> 2.4.17 - Ext3 - Debian Woody : FAILS (0 bytes file, write returns -1)
>
> Oliver Diedrich also told he could make work O_DIRECT with ext3 and 2.4.17.
>
> Is this normal? Does it really work on 2.4.14? Or it doesn't but the kernel
> doesn't avoid caching?
>

ext2 is the only filesystem which has O_DIRECT support.

-
^ permalink raw reply [flat|nested] 32+ messages in thread
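For anyone who wants to repeat this kind of test on another kernel or filesystem, here is a minimal userspace sketch of a probe. It is not the program Ricardo or Oliver ran (that was not posted in this part of the thread); the name probe_direct_write and its blocksize parameter are made up for illustration. It opens a file with O_DIRECT, writes one block from an aligned buffer, and returns the negated errno of whichever step failed, so a filesystem without O_DIRECT support should show up as a negative return value.

```c
#define _GNU_SOURCE           /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if one aligned block was written through
 * an O_DIRECT descriptor, or a negative errno value for the first failure.
 * blocksize is assumed to be a power of two (4096 is a common choice). */
int probe_direct_write(const char *path, size_t blocksize)
{
    void *buf = NULL;

    /* O_DIRECT transfers need an aligned user buffer. */
    int err = posix_memalign(&buf, blocksize, blocksize);
    if (err != 0)
        return -err;
    memset(buf, 'x', blocksize);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0) {
        err = errno;
        unlink(path);         /* the file may exist even though open failed */
        free(buf);
        return -err;
    }

    int result = 0;
    ssize_t n = write(fd, buf, blocksize);
    if (n < 0)
        result = -errno;      /* the "write returns -1" case from the report */
    else if ((size_t)n != blocksize)
        result = -EIO;        /* short write, treated as a failure here */

    close(fd);
    unlink(path);
    free(buf);
    return result;
}
```

Calling probe_direct_write("testfile", 4096) on a mounted filesystem and printing strerror(-result) for a negative result gives roughly the OK/FAILS column of the table above. Depending on the kernel, the failure may surface at open() or only at write(), which is why both are checked.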
* Re: O_DIRECT fails in some kernel and FS 2002-02-01 20:44 ` Andrew Morton @ 2002-02-01 20:49 ` Ricardo Galli 2002-02-01 20:57 ` Andrew Morton 2002-02-01 21:05 ` Steve Lord 2002-02-02 17:14 ` Christoph Hellwig 2 siblings, 1 reply; 32+ messages in thread From: Ricardo Galli @ 2002-02-01 20:49 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel On 01/02/02 21:44, Andrew Morton wrote: > Ricardo Galli wrote: > > After some comments from Oliver Diedrich (editor of heise.de), which told > > me he couldn't make O_DIRECT work on 2.4.17, I tried with different > > versions and file systems: > > > > This is the result: > > > > 2.4.14 - Ext[23] - redhat7.2 glibs: OK (at least the bytes are written) > > 2.4.17 - ReiserFS - Debian Sid : FAILS (0 bytes file, write returns > > -1) 2.4.17 - Ext2 - Debian Woody : OK (bytes written) > > 2.4.17 - Ext3 - Debian Woody : FAILS (0 bytes file, write returns > > -1) > > > > Oliver Diedrich also told he could make work O_DIRECT with ext3 and > > 2.4.17. > > > > Is this normal? Does it really work on 2.4.14? Or it doesn't but the > > kernel doesn't avoid caching? > > ext2 is the only filesystem which has O_DIRECT support. Does that mean that the succesful test with ext3 and 2.4.14 is bogus? -- ricardo "I just stopped using Windows and now you tell me to use Mirrors?" - said Aunt Tillie, just before downloading 2.5.3 kernel. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS 2002-02-01 20:49 ` Ricardo Galli @ 2002-02-01 20:57 ` Andrew Morton 0 siblings, 0 replies; 32+ messages in thread From: Andrew Morton @ 2002-02-01 20:57 UTC (permalink / raw) To: Ricardo Galli; +Cc: linux-kernel Ricardo Galli wrote: > > > ext2 is the only filesystem which has O_DIRECT support. > > Does that mean that the succesful test with ext3 and 2.4.14 is bogus? > Yep. O_DIRECT was added around 2.4.10, was tugged out for a while and then went back in again. - ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS 2002-02-01 20:44 ` Andrew Morton 2002-02-01 20:49 ` Ricardo Galli @ 2002-02-01 21:05 ` Steve Lord 2002-02-02 9:35 ` Chris Wedgwood 2002-02-02 17:14 ` Christoph Hellwig 2 siblings, 1 reply; 32+ messages in thread From: Steve Lord @ 2002-02-01 21:05 UTC (permalink / raw) To: Andrew Morton; +Cc: Ricardo Galli, Linux Kernel On Fri, 2002-02-01 at 14:44, Andrew Morton wrote: > Ricardo Galli wrote: > > > > After some comments from Oliver Diedrich (editor of heise.de), which told me > > he couldn't make O_DIRECT work on 2.4.17, I tried with different versions and > > file systems: > > > > This is the result: > > > > 2.4.14 - Ext[23] - redhat7.2 glibs: OK (at least the bytes are written) > > 2.4.17 - ReiserFS - Debian Sid : FAILS (0 bytes file, write returns -1) > > 2.4.17 - Ext2 - Debian Woody : OK (bytes written) > > 2.4.17 - Ext3 - Debian Woody : FAILS (0 bytes file, write returns -1) > > > > Oliver Diedrich also told he could make work O_DIRECT with ext3 and 2.4.17. > > > > Is this normal? Does it really work on 2.4.14? Or it doesn't but the kernel > > doesn't avoid caching? > > > > ext2 is the only filesystem which has O_DIRECT support. And XFS ;-) Steve ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS 2002-02-01 21:05 ` Steve Lord @ 2002-02-02 9:35 ` Chris Wedgwood 2002-02-02 10:25 ` Hans Reiser ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Chris Wedgwood @ 2002-02-02 9:35 UTC (permalink / raw) To: Steve Lord; +Cc: Andrew Morton, Ricardo Galli, Linux Kernel, Chris Mason On Fri, Feb 01, 2002 at 03:05:38PM -0600, Steve Lord wrote: > ext2 is the only filesystem which has O_DIRECT support. And XFS ;-) I sent reiserfs O_DIRECT support patches to someone a while ago. I can look to ressurect these (assuming I can find them!) Chris Mason is always going to be a better source for these anyhow, he certainly understands any complex nuances there may be. Chris, do you have any cycles to comment on this please? --cw ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS 2002-02-02 9:35 ` Chris Wedgwood @ 2002-02-02 10:25 ` Hans Reiser 2002-02-02 15:24 ` Chris Mason 2002-02-02 18:20 ` Chris Mason 2 siblings, 0 replies; 32+ messages in thread From: Hans Reiser @ 2002-02-02 10:25 UTC (permalink / raw) To: Chris Wedgwood Cc: Steve Lord, Andrew Morton, Ricardo Galli, Linux Kernel, Chris Mason, green Chris Wedgwood wrote: >On Fri, Feb 01, 2002 at 03:05:38PM -0600, Steve Lord wrote: > > > ext2 is the only filesystem which has O_DIRECT support. > > And XFS ;-) > >I sent reiserfs O_DIRECT support patches to someone a while ago. I >can look to ressurect these (assuming I can find them!) > >Chris Mason is always going to be a better source for these anyhow, he >certainly understands any complex nuances there may be. Chris, do you >have any cycles to comment on this please? > > > > > --cw >- >To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html >Please read the FAQ at http://www.tux.org/lkml/ > > You might try sending them to me if you want them to be reviewed and hopefully go in. Cc green@namesys.com if you do, because I will be away until the 24th. Hans ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS 2002-02-02 9:35 ` Chris Wedgwood 2002-02-02 10:25 ` Hans Reiser @ 2002-02-02 15:24 ` Chris Mason 2002-02-02 18:20 ` Chris Mason 2 siblings, 0 replies; 32+ messages in thread From: Chris Mason @ 2002-02-02 15:24 UTC (permalink / raw) To: Chris Wedgwood, Steve Lord; +Cc: Andrew Morton, Ricardo Galli, Linux Kernel On Saturday, February 02, 2002 01:35:56 AM -0800 Chris Wedgwood <cw@f00f.org> wrote: > On Fri, Feb 01, 2002 at 03:05:38PM -0600, Steve Lord wrote: > > > ext2 is the only filesystem which has O_DIRECT support. > > And XFS ;-) > > I sent reiserfs O_DIRECT support patches to someone a while ago. I > can look to ressurect these (assuming I can find them!) > > Chris Mason is always going to be a better source for these anyhow, he > certainly understands any complex nuances there may be. Chris, do you > have any cycles to comment on this please? I've dug your patch out of my archives, it should be safer now that we've got the expanding truncate patch into the kernel (2.2.18pre). I'm porting it forward now. thanks, chris ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS
2002-02-02 9:35 ` Chris Wedgwood
2002-02-02 10:25 ` Hans Reiser
2002-02-02 15:24 ` Chris Mason
@ 2002-02-02 18:20 ` Chris Mason
2002-02-02 19:54 ` Andrea Arcangeli
2 siblings, 1 reply; 32+ messages in thread
From: Chris Mason @ 2002-02-02 18:20 UTC (permalink / raw)
To: Chris Wedgwood, Steve Lord
Cc: Andrew Morton, Ricardo Galli, Linux Kernel, andrea

Ok, the tricky part of direct io on reiserfs is the tails. But,
since direct io isn't allowed on non-page aligned file sizes, we'll
never have direct io onto a normal file tail.

< 2.4.18 reiserfs versions allowed expanding truncates to set i_size
without creating the corresponding metadata, so we still have to deal
with that. It means we could have a packed tail on any file size,
including those bigger than the 16k limit after which we don't create
tails any more.

Chris and I had initially decided to unpack the tails on file open
if O_DIRECT is used, but it seems cleaner to add a
reiserfs_get_block_direct_io, and have it return -EINVAL if a read
went to a tail. writes that happen to a tail will trigger tail
conversion.

Anyway, this patch is very lightly tested, I'll try all the corner
cases on sunday.

-chris

# against 2.4.18-pe7
#
--- temp.1/fs/reiserfs/inode.c Mon, 28 Jan 2002 09:51:50 -0500
+++ temp.1(w)/fs/reiserfs/inode.c Sat, 02 Feb 2002 12:26:50 -0500
@@ -445,6 +445,20 @@
     return reiserfs_get_block(inode, block, bh_result, GET_BLOCK_NO_HOLE) ;
 }
 
+static int reiserfs_get_block_direct_io (struct inode * inode, long block,
+                        struct buffer_head * bh_result, int create) {
+    int ret ;
+
+    ret = reiserfs_get_block(inode, block, bh_result, create) ;
+
+    /* don't allow direct io onto tail pages */
+    if (ret == 0 && buffer_mapped(bh_result) && bh_result->b_blocknr == 0) {
+        ret = -EINVAL ;
+    }
+    return ret ;
+}
+
+
 /*
 ** helper function for when reiserfs_get_block is called for a hole
 ** but the file tail is still in a direct item
@@ -2050,11 +2064,20 @@
     return ret ;
 }
 
+static int reiserfs_direct_io(int rw, struct inode *inode,
+                              struct kiobuf *iobuf, unsigned long blocknr,
+                              int blocksize)
+{
+    return generic_direct_IO(rw, inode, iobuf, blocknr, blocksize,
+                             reiserfs_get_block_direct_io) ;
+}
+
 struct address_space_operations reiserfs_address_space_operations = {
     writepage: reiserfs_writepage,
     readpage: reiserfs_readpage,
     sync_page: block_sync_page,
     prepare_write: reiserfs_prepare_write,
     commit_write: reiserfs_commit_write,
-    bmap: reiserfs_aop_bmap
+    bmap: reiserfs_aop_bmap,
+    direct_IO: reiserfs_direct_io,
 } ;

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS 2002-02-02 18:20 ` Chris Mason @ 2002-02-02 19:54 ` Andrea Arcangeli 2002-02-02 20:10 ` Chris Mason 0 siblings, 1 reply; 32+ messages in thread From: Andrea Arcangeli @ 2002-02-02 19:54 UTC (permalink / raw) To: Chris Mason Cc: Chris Wedgwood, Steve Lord, Andrew Morton, Ricardo Galli, Linux Kernel On Sat, Feb 02, 2002 at 01:20:08PM -0500, Chris Mason wrote: > > Ok, the tricky part of direct io on reiserfs is the tails. But, > since direct io isn't allowed on non-page aligned file sizes, we'll > never have direct io onto a normal file tail. > > < 2.4.18 reiserfs versions allowed expanding truncates to set i_size > without creating the corresponding metadata, so we still have to deal > with that. It means we could have a packed tail on any file size, > including those bigger than the 16k limit after which we don't create > tails any more. > > Chris and I had initially decided to unpack the tails on file open > if O_DIRECT is used, but it seems cleaner to add a > reiserfs_get_block_direct_io, and have it return -EINVAL if a read > went to a tail. writes that happen to a tail will trigger tail > conversion. This is a safe approch (no risk of corruption etc..). However to provide the same semantics of the other filesystems it would be even better if we could unpack the tail within reiserfs_get_block_direct_io rather than returning -EINVAL, but ok, most apps should work fine anyways (and as worse people can workaround the magic by remounting reiserfs with notail before writing the data that will need to be handled later via O_DIRECT). thanks for the patch, Andrea ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS
2002-02-02 19:54 ` Andrea Arcangeli
@ 2002-02-02 20:10 ` Chris Mason
2002-02-02 20:16 ` Stephen Lord
0 siblings, 1 reply; 32+ messages in thread
From: Chris Mason @ 2002-02-02 20:10 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Chris Wedgwood, Steve Lord, Andrew Morton, Ricardo Galli, Linux Kernel

On Saturday, February 02, 2002 08:54:38 PM +0100 Andrea Arcangeli <andrea@suse.de> wrote:

>> Chris and I had initially decided to unpack the tails on file open
>> if O_DIRECT is used, but it seems cleaner to add a
>> reiserfs_get_block_direct_io, and have it return -EINVAL if a read
>> went to a tail. writes that happen to a tail will trigger tail
>> conversion.
>
> This is a safe approch (no risk of corruption etc..). However to provide
> the same semantics of the other filesystems it would be even better if
> we could unpack the tail within reiserfs_get_block_direct_io rather than
> returning -EINVAL, but ok, most apps should work fine anyways (and as
> worse people can workaround the magic by remounting reiserfs with notail
> before writing the data that will need to be handled later via
> O_DIRECT).

In the normal case, O_DIRECT can't be done on a file with a tail.

The way I read generic_file_direct_IO, O_DIRECT is only done in
units that start block aligned, and continue for a block aligned
length. So, this can never include a packed file tail.

We should only need to worry if i_size on the file is wrong, and allows a
read/write to a block aligned chunk on a file with a tail, which should
only be legal in the expanding truncate case from older kernels. The
-EINVAL return should only happen in this (very unlikely) case.

-chris

^ permalink raw reply [flat|nested] 32+ messages in thread
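The alignment argument in the message above can be checked with a little arithmetic. The following is only a userspace sketch of that reasoning, not the code in generic_file_direct_IO; the function names (direct_io_range_ok, tail_start, aligned_request_hits_tail) are hypothetical, and it assumes that a packed tail can only hold the partial last block of a file whose i_size is correct. The expanding-truncate case from older kernels, where i_size does not match the metadata, is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a request starts on a block boundary and covers a whole,
 * non-zero number of blocks. blocksize must be a power of two. */
bool direct_io_range_ok(uint64_t offset, uint64_t length, uint32_t blocksize)
{
    assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);
    uint64_t mask = (uint64_t)blocksize - 1;
    return length != 0 && (offset & mask) == 0 && (length & mask) == 0;
}

/* Offset of the first byte that could live in a packed tail: i_size
 * rounded down to a block boundary. Equals i_size when the file has
 * no partial last block. */
uint64_t tail_start(uint64_t i_size, uint32_t blocksize)
{
    assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);
    return i_size & ~((uint64_t)blocksize - 1);
}

/* Does an aligned request that fits inside i_size overlap the tail?
 * Under the assumptions stated above this returns false for every
 * input, which is the point of the argument: the end of an aligned
 * request is a multiple of blocksize, so it cannot pass tail_start(). */
bool aligned_request_hits_tail(uint64_t offset, uint64_t length,
                               uint64_t i_size, uint32_t blocksize)
{
    if (!direct_io_range_ok(offset, length, blocksize))
        return false;   /* would be rejected before any block lookup */
    if (offset + length > i_size)
        return false;   /* request past EOF, not modelled in this sketch */
    return offset + length > tail_start(i_size, blocksize);
}
```

For a 5000-byte file with 4096-byte blocks, tail_start() is 4096, and the only aligned request that fits inside the file is offset 0, length 4096, which stops exactly where the tail would begin.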
* Re: O_DIRECT fails in some kernel and FS 2002-02-02 20:10 ` Chris Mason @ 2002-02-02 20:16 ` Stephen Lord 2002-02-02 20:50 ` Jeff Garzik 0 siblings, 1 reply; 32+ messages in thread From: Stephen Lord @ 2002-02-02 20:16 UTC (permalink / raw) To: Chris Mason Cc: Andrea Arcangeli, Chris Wedgwood, Andrew Morton, Ricardo Galli, Linux Kernel Chris Mason wrote: > >On Saturday, February 02, 2002 08:54:38 PM +0100 Andrea Arcangeli <andrea@suse.de> wrote: > >>>Chris and I had initially decided to unpack the tails on file open >>>if O_DIRECT is used, but it seems cleaner to add a >>>reiserfs_get_block_direct_io, and have it return -EINVAL if a read >>>went to a tail. writes that happen to a tail will trigger tail >>>conversion. >>> >>This is a safe approch (no risk of corruption etc..). However to provide >>the same semantics of the other filesystems it would be even better if >>we could unpack the tail within reiserfs_get_block_direct_io rather than >>returning -EINVAL, but ok, most apps should work fine anyways (and as >>worse people can workaround the magic by remounting reiserfs with notail >>before writing the data that will need to be handled later via >>O_DIRECT). >> > >In the normal case, O_DIRECT can't be done on a file with a tail. > >The way I read generic_file_direct_IO, O_DIRECT is only done in >units that start block aligned, and continue for a block aligned >length. So, this can never include a packed file tail. > >We should only need to worry if i_size on the file is wrong, and allows a >read/write to a block aligned chunk on a file with a tail, which should >only be legal in the expanding truncate case from older kernels. The >-EINVAL return should only happen in this (very unlikely) case. > >-chris > Can't you fall back to buffered I/O for the tail? OK it complicates the code, probably a lot, but it keeps things sane from the user's point of view. Steve ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS 2002-02-02 20:16 ` Stephen Lord @ 2002-02-02 20:50 ` Jeff Garzik 2002-02-03 13:40 ` Stephen Lord 0 siblings, 1 reply; 32+ messages in thread From: Jeff Garzik @ 2002-02-02 20:50 UTC (permalink / raw) To: Stephen Lord Cc: Chris Mason, Andrea Arcangeli, Chris Wedgwood, Andrew Morton, Ricardo Galli, Linux Kernel On Sat, Feb 02, 2002 at 02:16:41PM -0600, Stephen Lord wrote: > Can't you fall back to buffered I/O for the tail? OK it complicates the > code, probably a lot, but it keeps things sane from the user's point of > view. For O_DIRECT, IMHO you should fail not fallback. You're simply lying to the underlying program otherwise. In the ibu fs I am hacking on, the idea for O_DIRECT is to fail a read if the file is small enough to fit in the inode. If the O_DIRECT action is a write, then I will invalidate the data in the inode, then follow the standard path (which eventually calls get_block()). For file tails (a different case from small-file-in-inode), I imagine it would be prudent to support O_DIRECT for all actions except reading the file tail. If you want to be complicated, you could provide userspace with a way to say "this is a dense file" and/or simply not create a tail at all... Jeff ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS
2002-02-02 20:50 ` Jeff Garzik
@ 2002-02-03 13:40 ` Stephen Lord
2002-02-03 14:09 ` Chris Wedgwood
0 siblings, 1 reply; 32+ messages in thread
From: Stephen Lord @ 2002-02-03 13:40 UTC (permalink / raw)
To: Jeff Garzik
Cc: Chris Mason, Andrea Arcangeli, Chris Wedgwood, Andrew Morton, Ricardo Galli, Linux Kernel

Jeff Garzik wrote:

>On Sat, Feb 02, 2002 at 02:16:41PM -0600, Stephen Lord wrote:
>
>>Can't you fall back to buffered I/O for the tail? OK it complicates the
>>code, probably a lot, but it keeps things sane from the user's point of
>>view.
>>
>
>For O_DIRECT, IMHO you should fail not fallback. You're simply lying
>to the underlying program otherwise.
>

By fallback I meant just for the tail, not the whole file. I have been
there before. I had to implement the mixed mode buffered/direct I/O on
Unicos because a change in underlying disk subsystems stopped customer
applications from working - the allowed boundaries for O_DIRECT stopped
working when the sales people sold them some new disks. This also meant
you could get most of the speed benefits of O_DIRECT without having to
align your I/O, it also meant really large I/Os could be made to
automatically bypass cache to avoid cache thrashing.

What we had were two flags, one which indicated use direct I/O, and
another which indicated return an error to user space rather than go
through buffers. So there were "lie to me and make it work" and "don't
lie to me" options, I suppose.

>
>
>In the ibu fs I am hacking on, the idea for O_DIRECT is to fail a read
>if the file is small enough to fit in the inode. If the O_DIRECT
>action is a write, then I will invalidate the data in the inode,
>then follow the standard path (which eventually calls get_block()).
>
>For file tails (a different case from small-file-in-inode), I
>imagine it would be prudent to support O_DIRECT for all actions
>except reading the file tail. If you want to be complicated, you
>could provide userspace with a way to say "this is a dense file"
>and/or simply not create a tail at all...
>

I suspect the reason XFS never did small files in the inode was because
of the problems with implementing mmap and O_DIRECT.

>
> Jeff
>

Steve

^ permalink raw reply [flat|nested] 32+ messages in thread
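The two-flag scheme Steve describes (direct I/O with a silent buffered fallback for the unaligned parts, versus direct I/O that returns an error instead) can be sketched as a planning step that splits one request into pieces. This is an illustration only, not Unicos, IRIX or Linux code: struct io_plan, plan_io and the strict parameter are hypothetical names, and only file-offset alignment is modelled (buffer alignment and offset overflow are ignored).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* How one request would be carved up (hypothetical structure). */
struct io_plan {
    uint64_t head_len;    /* unaligned leading bytes, done through buffers */
    uint64_t direct_len;  /* block-aligned middle, done as direct I/O */
    uint64_t tail_len;    /* unaligned trailing bytes, done through buffers */
};

/* strict == false: "lie to me and make it work" (mixed buffered/direct).
 * strict == true:  "don't lie to me" (fail unless everything is direct).
 * Returns 0 on success or -EINVAL; the plan is filled in either way.
 * blocksize must be a power of two. */
int plan_io(uint64_t offset, uint64_t length, uint32_t blocksize,
            bool strict, struct io_plan *plan)
{
    assert(plan != NULL);
    assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);

    uint64_t mask  = (uint64_t)blocksize - 1;
    uint64_t end   = offset + length;          /* overflow not handled here */
    uint64_t first = (offset + mask) & ~mask;  /* offset rounded up */
    uint64_t last  = end & ~mask;              /* end rounded down */

    if (first >= last) {
        /* No whole aligned block inside the request: all buffered. */
        plan->head_len   = length;
        plan->direct_len = 0;
        plan->tail_len   = 0;
    } else {
        plan->head_len   = first - offset;
        plan->direct_len = last - first;
        plan->tail_len   = end - last;
    }

    if (strict && (plan->head_len != 0 || plan->tail_len != 0))
        return -EINVAL;
    return 0;
}
```

With 4096-byte blocks, a 10000-byte request at offset 0 splits into 8192 bytes of direct I/O plus a 1808-byte buffered remainder in the permissive mode, and is refused with -EINVAL in the strict mode. Jeff's position in this thread corresponds to always running with strict set.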
* Re: O_DIRECT fails in some kernel and FS
2002-02-03 13:40 ` Stephen Lord
@ 2002-02-03 14:09 ` Chris Wedgwood
2002-02-03 15:05 ` Stephen Lord
0 siblings, 1 reply; 32+ messages in thread
From: Chris Wedgwood @ 2002-02-03 14:09 UTC (permalink / raw)
To: Stephen Lord
Cc: Jeff Garzik, Chris Mason, Andrea Arcangeli, Andrew Morton, Ricardo Galli, Linux Kernel

On Sun, Feb 03, 2002 at 07:40:57AM -0600, Stephen Lord wrote:

    What we had were two flags, one which indicated use direct I/O,
    and another which indicated return an error to user space rather
    than go through buffers. So lie to me and make it work, or don't
    lie to me options I suppose.

This seems way too complex in the case of reiserfs... you're only going
to see tails for small files (typically under 16k) and for the tail
part when less than a block.

Since O_DIRECT must be block sized and block aligned, I'm not sure
if this is a problem at present...

    I suspect the reason XFS never did small files in the inode was
    because of the problems with implementing mmap and O_DIRECT.

How does IRIX deal with O_DIRECT read/writes of a mapped area?
Invalidate them or just accept things as being incoherent?

  --cw

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS 2002-02-03 14:09 ` Chris Wedgwood @ 2002-02-03 15:05 ` Stephen Lord 2002-02-03 22:44 ` Chris Wedgwood 0 siblings, 1 reply; 32+ messages in thread From: Stephen Lord @ 2002-02-03 15:05 UTC (permalink / raw) To: Chris Wedgwood Cc: Jeff Garzik, Chris Mason, Andrea Arcangeli, Andrew Morton, Ricardo Galli, Linux Kernel Chris Wedgwood wrote: >On Sun, Feb 03, 2002 at 07:40:57AM -0600, Stephen Lord wrote: > > What we had were two flags, one which indicated use direct I/O, > and another which indicated return an error to user space rather > than go through buffers. So lie to me and make it work, or don't > lie to me options I suppose. > >This seems way to complex in the case of reiserfs... you're only going >to see tails for small files (typically under 16k) and for the tail >part when less than a block. > >Since O_DIRECT much be blocked sized and block aligned, I'm not sure >if this is a problem at present... > I agree is is not a big issue in this case - my interpretation of tails was the end of any file could be packed, but if it is only small files..... > > > I suspect the reason XFS never did small files in the inode was > because of the problems with implementing mmap and O_DIRECT. > >How does IRIX deal with O_DIRECT read/writes of a mapped area? >Invalidate them or just accept things as being incoherent? > They are invalidated at the start of the I/O, but page faults are not blocked out for the duration of the I/O, so the coherency is weak. However, if an application is doing a combination of mmapped and direct I/O to a file at the same time, then it should generally have some form of user space synchronization anyway. For an application doing its own synchronization of different I/Os they are coherent. > > > > --cw > Steve ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS
2002-02-03 15:05 ` Stephen Lord
@ 2002-02-03 22:44 ` Chris Wedgwood
2002-02-04 15:04 ` Jeff Garzik
2002-02-04 15:15 ` Steve Lord
0 siblings, 2 replies; 32+ messages in thread
From: Chris Wedgwood @ 2002-02-03 22:44 UTC (permalink / raw)
To: Stephen Lord
Cc: Jeff Garzik, Chris Mason, Andrea Arcangeli, Andrew Morton, Ricardo Galli, Linux Kernel

On Sun, Feb 03, 2002 at 09:05:04AM -0600, Stephen Lord wrote:

    I agree is is not a big issue in this case - my interpretation of
    tails was the end of any file could be packed, but if it is only
    small files.....

But you can't mmap (say) a 1k file right now... so right now this
isn't a problem, but at some point a larger mmap granularity would be
nice --- especially on architectures with small (or untagged) TLBs.

I'm guessing so as not to break backwards compatibility we will have
to support variable page-sizes (creating a plethora of nasties I
imagine).

    They are invalidated at the start of the I/O

Cool. That much I'd like to see under Linux

    but page faults are not blocked out for the duration of the I/O so
    the coherency is weak.

I was thinking this would also be good, basically invalidate those
pages and remove them from the VMAs, marking them as unusable pending
IO completion --- the logic here being if you were to fault on an
invalidated page during IO you deserve to block indefinitely until the
IO completes.

    However, if an application is doing a combination of mmapped and
    direct I/O to a file at the same time, then it should generally
    have some form of user space synchronization anyway.

I hadn't considered that. I imagined an application doing either but
not both, and the kernel enforcing this. However, in the case when
you want to mmap a large file, you may want to manipulate some pages
using mmap whilst writing others with O_DIRECT. Although, in such
cases arguably you could use multiple mappings.

  --cw

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS 2002-02-03 22:44 ` Chris Wedgwood @ 2002-02-04 15:04 ` Jeff Garzik 2002-02-04 15:21 ` Chris Mason 2002-02-04 15:15 ` Steve Lord 1 sibling, 1 reply; 32+ messages in thread From: Jeff Garzik @ 2002-02-04 15:04 UTC (permalink / raw) To: Chris Wedgwood Cc: Stephen Lord, Chris Mason, Andrea Arcangeli, Andrew Morton, Ricardo Galli, Linux Kernel Chris Wedgwood wrote: > > On Sun, Feb 03, 2002 at 09:05:04AM -0600, Stephen Lord wrote: > > I agree is is not a big issue in this case - my interpretation of > tails was the end of any file could be packed, but if it is only > small files..... > > But you can't mmap (say) a 1k file right now... so right now this huh? You can mmap a file of any size > 0. Is this a reiserfs limitation or something? Jeff -- Jeff Garzik | "I went through my candy like hot oatmeal Building 1024 | through an internally-buttered weasel." MandrakeSoft | - goats.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS 2002-02-04 15:04 ` Jeff Garzik @ 2002-02-04 15:21 ` Chris Mason 0 siblings, 0 replies; 32+ messages in thread From: Chris Mason @ 2002-02-04 15:21 UTC (permalink / raw) To: Jeff Garzik, Chris Wedgwood Cc: Stephen Lord, Andrea Arcangeli, Andrew Morton, Ricardo Galli, Linux Kernel On Monday, February 04, 2002 10:04:45 AM -0500 Jeff Garzik <jgarzik@mandrakesoft.com> wrote: > Chris Wedgwood wrote: >> >> On Sun, Feb 03, 2002 at 09:05:04AM -0600, Stephen Lord wrote: >> >> I agree is is not a big issue in this case - my interpretation of >> tails was the end of any file could be packed, but if it is only >> small files..... >> >> But you can't mmap (say) a 1k file right now... so right now this > > huh? You can mmap a file of any size > 0. Is this a reiserfs > limitation or something? > No, reiserfs can mmap files of size 1k. Data past the end of file is zerod on write. -chris ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS 2002-02-03 22:44 ` Chris Wedgwood 2002-02-04 15:04 ` Jeff Garzik @ 2002-02-04 15:15 ` Steve Lord 2002-02-04 15:46 ` Alan Cox 1 sibling, 1 reply; 32+ messages in thread From: Steve Lord @ 2002-02-04 15:15 UTC (permalink / raw) To: Chris Wedgwood Cc: Jeff Garzik, Chris Mason, Andrea Arcangeli, Andrew Morton, Ricardo Galli, Linux Kernel On Sun, 2002-02-03 at 16:44, Chris Wedgwood wrote: > On Sun, Feb 03, 2002 at 09:05:04AM -0600, Stephen Lord wrote: > > but page faults are not blocked out for the duration of the I/O so > the coherency is weak. > > I was thinking this would also be goof, basically invalidate those > pages and remove them from the VMAs, marking them as unusable pending > IO completion --- the logic her being if you were to fault on an > invalidated page during IO you deserve to block indefinitely until the > IO completes. > > However, if an application is doing a combination of mmapped and > direct I/O to a file at the same time, then it should generally > have some form of user space synchronization anyway. > > I hadn't considered that. I imagined an application doing either but > not both, and the kernel enforcing this. However, in the case when > you want to mmap a large file, you may want to manipulate some pages > using mmap whilst writing others with O_DIRECT. Although, in such > cases arguably you could using multiple mapping's. > > If an application is single threaded then it cannot be doing both at the same time - so all we need to do is flush and invalidate mappings at the start of I/O. This is really only needed for the range covered by the direct read/write. If an application is multithreaded and is doing mmap and direct I/O from different threads without doing its own synchronization, then it is broken, there is no ordering guarantee provided by the kernel as to what happens first. 
> > --cw Steve -- Steve Lord voice: +1-651-683-3511 Principal Engineer, Filesystem Software email: lord@sgi.com ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS 2002-02-04 15:15 ` Steve Lord @ 2002-02-04 15:46 ` Alan Cox 2002-02-04 16:02 ` Steve Lord 2002-02-04 18:29 ` Joel Becker 0 siblings, 2 replies; 32+ messages in thread From: Alan Cox @ 2002-02-04 15:46 UTC (permalink / raw) To: Steve Lord Cc: Chris Wedgwood, Jeff Garzik, Chris Mason, Andrea Arcangeli, Andrew Morton, Ricardo Galli, Linux Kernel > If an application is multithreaded and is doing mmap and direct I/O > from different threads without doing its own synchronization, then it > is broken, there is no ordering guarantee provided by the kernel as > to what happens first. Providing we don't allow asynchronous I/O with O_DIRECT once asynchronous I/O is merged. Alan ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS
2002-02-04 15:46 ` Alan Cox
@ 2002-02-04 16:02 ` Steve Lord
2002-02-04 18:22 ` Daniel Phillips
2002-02-04 18:29 ` Joel Becker
1 sibling, 1 reply; 32+ messages in thread
From: Steve Lord @ 2002-02-04 16:02 UTC (permalink / raw)
To: Alan Cox
Cc: Chris Wedgwood, Jeff Garzik, Chris Mason, Andrea Arcangeli, Andrew Morton, Ricardo Galli, Linux Kernel

On Mon, 2002-02-04 at 09:46, Alan Cox wrote:
> > If an application is multithreaded and is doing mmap and direct I/O
> > from different threads without doing its own synchronization, then it
> > is broken, there is no ordering guarantee provided by the kernel as
> > to what happens first.
>
> Providing we don't allow asynchronous I/O with O_DIRECT once asynchronous
> I/O is merged.

But async I/O itself needs synchronisation (being English in this email ;-)
to be meaningful. If I issue a bunch of async I/O calls which overlap with
each other then the outcome is really undefined in terms of what ends up
on the disk. Scheduling of the actual I/O operations is really no different
from them being synchronous calls from different user space threads.

The only questions you can really ask is 'is read atomic with respect to
write?' and 'are writes atomic with respect to each other?'. So when you
perform a read it sees data from before or after writes, but never sees
data from half way through a write. And for multiple write calls the output
appears as if one write happened after the other, not intermingled
with each other.

Irix actually takes the viewpoint that it only needs to make a best effort
at synchronizing between direct I/O and other modes of I/O. Multiple
direct writers are allowed into a file at once, and direct writers and
buffered readers are also allowed to operate in parallel. At this point
coherency is really up to the applications. I am not presenting this as
a recommended model for linux, just reporting what it does.

>
> Alan

Steve

--
Steve Lord                                voice: +1-651-683-3511
Principal Engineer, Filesystem Software   email: lord@sgi.com

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS
2002-02-04 16:02 ` Steve Lord
@ 2002-02-04 18:22 ` Daniel Phillips
2002-02-04 19:11 ` Steve Lord
0 siblings, 1 reply; 32+ messages in thread
From: Daniel Phillips @ 2002-02-04 18:22 UTC (permalink / raw)
To: Steve Lord, Alan Cox
Cc: Chris Wedgwood, Jeff Garzik, Chris Mason, Andrea Arcangeli, Andrew Morton, Ricardo Galli, Linux Kernel

On February 4, 2002 05:02 pm, Steve Lord wrote:
> But async I/O itself needs synchronisation (being English in this email ;-)
> to be meaningful. If I issue a bunch of async I/O calls which overlap with
> each other then the outcome is really undefined in terms of what ends up
> on the disk. Scheduling of the actual I/O operations is really no different
> from them being synchronous calls from different user space threads.
>
> The only questions you can really ask is 'is read atomic with respect to
> write?' and 'are writes atomic with respect to each other?'. So when you
> perform a read it sees data from before or after writes, but never sees
> data from half way through a write. And for multiple write calls the output
> appears as if one write happened after the other, not intermingled
> with each other.

Why is it not ok to have the writes come out intermingled, if that's what the
user has asked for? (Implicitly, by not synchronizing the writes.)

> Irix actually takes the viewpoint that it only needs to make a best effort
> at synchronizing between direct I/O and other modes of I/O. Multiple
> direct writers are allowed into a file at once, and direct writers and
> buffered readers are also allowed to operate in parallel. At this point
> coherency is really up to the applications. I am not presenting this as
> a recommended model for linux, just reporting what it does.

I'm having a little trouble with this. Suppose an application does direct
IO on a file but, unbeknownst to it, some other program has done buffered
IO on the file, so that there are still dirty blocks in the page cache,
waiting to land by surprise on top of unbuffered data. A third program
may come along to do buffered IO on the file, and find stale blocks in
cache. Am I missing something here?

--
Daniel

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS
2002-02-04 18:22 ` Daniel Phillips
@ 2002-02-04 19:11 ` Steve Lord
0 siblings, 0 replies; 32+ messages in thread
From: Steve Lord @ 2002-02-04 19:11 UTC (permalink / raw)
To: Daniel Phillips
Cc: Alan Cox, Chris Wedgwood, Jeff Garzik, Chris Mason, Andrea Arcangeli, Andrew Morton, Ricardo Galli, Linux Kernel

On Mon, 2002-02-04 at 12:22, Daniel Phillips wrote:
> On February 4, 2002 05:02 pm, Steve Lord wrote:
> > But async I/O itself needs synchronisation (being English in this email ;-)
> > to be meaningful. If I issue a bunch of async I/O calls which overlap with
> > each other then the outcome is really undefined in terms of what ends up
> > on the disk. Scheduling of the actual I/O operations is really no different
> > from them being synchronous calls from different user space threads.
> >
> > The only questions you can really ask is 'is read atomic with respect to
> > write?' and 'are writes atomic with respect to each other?'. So when you
> > perform a read it sees data from before or after writes, but never sees
> > data from half way through a write. And for multiple write calls the output
> > appears as if one write happened after the other, not intermingled
> > with each other.
>
> Why is it not ok to have the writes come out intermingled, if that's what the
> user has asked for? (Implicitly, by not synchronizing the writes.)

I cannot quote a source, but I have heard people say Posix - or some other
standard, all I can find on google is people saying read is atomic wrt
write, but there is no definition of writes wrt other writes.

>
> > Irix actually takes the viewpoint that it only needs to make a best effort
> > at synchronizing between direct I/O and other modes of I/O. Multiple
> > direct writers are allowed into a file at once, and direct writers and
> > buffered readers are also allowed to operate in parallel. At this point
> > coherency is really up to the applications. I am not presenting this as
> > a recommended model for linux, just reporting what it does.
>
> I'm having a little trouble with this. Suppose an application does direct
> IO on a file but, unbeknownst to it, some other program has done buffered
> IO on the file, so that there are still dirty blocks in the page cache,
> waiting to land by surprise on top of unbuffered data. A third program
> may come along to do buffered IO on the file, and find stale blocks in
> cache. Am I missing something here?

No you are not, I did not say it was totally coherent, at the start of the
direct I/O the caches are made coherent, they can drift apart during the
operation if buffered or mmapped I/O is ongoing during the operation, and
yes those blocks are stale in the cache. In normal life people do not seem
to mix direct I/O and other forms of I/O in parallel.

If you want full coherency you have to lock out page faults and buffered
I/O during direct I/O. You also need deadlock avoidance code for the
case where someone does this:

	fd = open("file", O_DIRECT|O_RDWR);
	mem = mmap(&addr, 40960, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 20480);
	read(fd, mem, 32768);

Steve

>
> --
> Daniel

--
Steve Lord                                voice: +1-651-683-3511
Principal Engineer, Filesystem Software   email: lord@sgi.com

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS
2002-02-04 15:46 ` Alan Cox
2002-02-04 16:02 ` Steve Lord
@ 2002-02-04 18:29 ` Joel Becker
2002-02-04 18:49 ` Jeff Garzik
1 sibling, 1 reply; 32+ messages in thread
From: Joel Becker @ 2002-02-04 18:29 UTC (permalink / raw)
To: Alan Cox
Cc: Steve Lord, Chris Wedgwood, Jeff Garzik, Chris Mason, Andrea Arcangeli, Andrew Morton, Ricardo Galli, Linux Kernel

On Mon, Feb 04, 2002 at 03:46:20PM +0000, Alan Cox wrote:
> > If an application is multithreaded and is doing mmap and direct I/O
> > from different threads without doing its own synchronization, then it
> > is broken, there is no ordering guarantee provided by the kernel as
> > to what happens first.
>
> Providing we don't allow asynchronous I/O with O_DIRECT once asynchronous
> I/O is merged.

Oh, but async + O_DIRECT is a good thing. The fundamental ordering
comes down at the block layer. Things are synchronous there.
An application using async I/O knows that ordering is not
guaranteed. Applications using O_DIRECT know they are skipping the
buffer cache. "Caveat emptor" and "Don't do that then" apply to stupid
applications.

The big issues I see are O_DIRECT alignment size (see my patch
to allow hardsectsize alignment on O_DIRECT ops) and whether or not to
synchronize with the caches upon O_DIRECT write.

Keeping the page/buffer caches in sync with O_DIRECT writes is a
bit of work, especially with writes smaller than sb_blocksize. You can
either do that work, or you can say that applications and people using
O_DIRECT should know the caches might be inconsistent. Large O_DIRECT
users, such as databases, already know this. They are happily ignorant
of cache inconsistencies. All they care about is hardsectsize O_DIRECT
operations.

Joel

--
Life's Little Instruction Book #267
"Lie on your back and look at the stars."

http://www.jlbec.org/
jlbec@evilplan.org

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS
2002-02-04 18:29 ` Joel Becker
@ 2002-02-04 18:49 ` Jeff Garzik
2002-02-04 18:55 ` Joel Becker
0 siblings, 1 reply; 32+ messages in thread
From: Jeff Garzik @ 2002-02-04 18:49 UTC (permalink / raw)
To: Joel Becker
Cc: Alan Cox, Steve Lord, Chris Wedgwood, Chris Mason, Andrea Arcangeli, Andrew Morton, Ricardo Galli, Linux Kernel, linux-fsdevel

Joel Becker wrote:
> should know the caches might be inconsistent. Large O_DIRECT users,
> such as databases, already know this. They are happily ignorant of
> cache inconsistencies. All they care about is hardsectsize O_DIRECT
> operations.

I have similar inclination, that is inspired from the implementation of
"NTFS TNG": hard sector size should always equal sb->blocksize. This
allows for fine-grained operations at the O_DIRECT level, logical block
sizes > PAGE_CACHE_SIZE, easy implementation of fragments (>= hard sect
size), O_DIRECT for fragments, and other stuff.

This works right now in 2.4 and 2.5 with no modification to the VFS
core.

Jeff

--
Jeff Garzik   | "I went through my candy like hot oatmeal
Building 1024 |  through an internally-buttered weasel."
MandrakeSoft  |  - goats.com

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS
2002-02-04 18:49 ` Jeff Garzik
@ 2002-02-04 18:55 ` Joel Becker
2002-02-04 19:16 ` Jeff Garzik
0 siblings, 1 reply; 32+ messages in thread
From: Joel Becker @ 2002-02-04 18:55 UTC (permalink / raw)
To: Jeff Garzik
Cc: Joel Becker, Alan Cox, Steve Lord, Chris Wedgwood, Chris Mason, Andrea Arcangeli, Andrew Morton, Ricardo Galli, Linux Kernel, linux-fsdevel

On Mon, Feb 04, 2002 at 01:49:10PM -0500, Jeff Garzik wrote:
> I have similar inclination, that is inspired from the implementation of
> "NTFS TNG": hard sector size should always equal sb->blocksize. This
> allows for fine-grained operations at the O_DIRECT level, logical block
> sizes > PAGE_CACHE_SIZE, easy implementation of fragments (>= hard sect
> size), O_DIRECT for fragments, and other stuff.

I'm not sure I get you here. When I say hardsectsize, I mean
get_hardsectsize(dev), not super->s_blocksize. On ext2, s_blocksize is
1k, 2k, or 4k. Databases want to use O_DIRECT aligned at 512b. This
can be done (again, see my patch), and I would think it necessary.

If you meant that s_blocksize should match get_hardsectsize, I
agree. If you meant the other way around, then consumers that want to
do O_DIRECT operations at 512b alignments won't be able to.

Joel

--
"All alone at the end of the evening
When the bright lights have faded to blue.
I was thinking about a woman who had loved me
And I never knew"

http://www.jlbec.org/
jlbec@evilplan.org

^ permalink raw reply [flat|nested] 32+ messages in thread
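The distinction Joel draws between hard sector size and s_blocksize can be illustrated with a small user-space check. This is only a sketch and not code from the thread or from any patch mentioned in it: the helper name direct_io_aligned() is hypothetical, and it assumes that a direct I/O request is acceptable when the buffer address, file offset, and length are all multiples of whatever granularity the kernel enforces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: returns true when the buffer
 * address, the file offset, and the length are all multiples of `align`.
 * `align` stands for whichever granularity is enforced, for example a
 * 512-byte hard sector size or a 4096-byte filesystem block size.
 */
bool direct_io_aligned(const void *buf, uint64_t offset, size_t len,
                       size_t align)
{
    /* The mask trick below only works for a nonzero power of two. */
    assert(align != 0 && (align & (align - 1)) == 0);

    uint64_t mask = (uint64_t)align - 1;

    return ((uint64_t)(uintptr_t)buf & mask) == 0 &&
           (offset & mask) == 0 &&
           ((uint64_t)len & mask) == 0;
}
```

With a page-aligned buffer, a 512-byte request at offset 512 passes when `align` is 512 and is rejected when `align` is 4096, which is the case Joel describes for a database doing sector-sized operations on a filesystem with 4k blocks.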
* Re: O_DIRECT fails in some kernel and FS
2002-02-04 18:55 ` Joel Becker
@ 2002-02-04 19:16 ` Jeff Garzik
0 siblings, 0 replies; 32+ messages in thread
From: Jeff Garzik @ 2002-02-04 19:16 UTC (permalink / raw)
To: Joel Becker
Cc: Alan Cox, Steve Lord, Chris Wedgwood, Chris Mason, Andrea Arcangeli, Andrew Morton, Ricardo Galli, Linux Kernel, linux-fsdevel

Joel Becker wrote:
> On Mon, Feb 04, 2002 at 01:49:10PM -0500, Jeff Garzik wrote:
> > hard sector size should always equal sb->blocksize.

> If you meant that s_blocksize should match get_hardsectsize, I

yes. get_hardsectsize returns hard sector size, so this is what I
meant.

--
Jeff Garzik   | "I went through my candy like hot oatmeal
Building 1024 |  through an internally-buttered weasel."
MandrakeSoft  |  - goats.com

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS
2002-02-01 20:44 ` Andrew Morton
2002-02-01 20:49 ` Ricardo Galli
2002-02-01 21:05 ` Steve Lord
@ 2002-02-02 17:14 ` Christoph Hellwig
2 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2002-02-02 17:14 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Ricardo Galli

In article <3C5AFE2D.95A3C02E@zip.com.au> you wrote:
>> Oliver Diedrich also told he could make work O_DIRECT with ext3 and 2.4.17.
>>
>> Is this normal? Does it really work on 2.4.14? Or it doesn't but the kernel
>> doesn't avoid caching?
>>
>
> ext2 is the only filesystem which has O_DIRECT support.

You forgot JFS and XFS.
Also there is a patch for NFS, but this one requires a prototype change
for ->directIO.

Christoph

--
Of course it doesn't work. We've performed a software upgrade.

^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <E16WkQj-0005By-00@antoli.uib.es.suse.lists.linux.kernel>]
[parent not found: <3C5AFE2D.95A3C02E@zip.com.au.suse.lists.linux.kernel>]
[parent not found: <1012597538.26363.443.camel@jen.americas.sgi.com.suse.lists.linux.kernel>]
[parent not found: <20020202093554.GA7207@tapu.f00f.org.suse.lists.linux.kernel>]
[parent not found: <234710000.1012674008@tiny.suse.lists.linux.kernel>]
[parent not found: <20020202205438.D3807@athlon.random.suse.lists.linux.kernel>]
[parent not found: <242700000.1012680610@tiny.suse.lists.linux.kernel>]
[parent not found: <3C5C4929.5080403@sgi.com.suse.lists.linux.kernel>]
[parent not found: <20020202155028.B26147@havoc.gtf.org.suse.lists.linux.kernel>]
* Re: O_DIRECT fails in some kernel and FS
[not found] ` <20020202155028.B26147@havoc.gtf.org.suse.lists.linux.kernel>
@ 2002-02-03 7:26 ` Andi Kleen
2002-02-04 15:13 ` Jeff Garzik
0 siblings, 1 reply; 32+ messages in thread
From: Andi Kleen @ 2002-02-03 7:26 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linux-kernel

Jeff Garzik <garzik@havoc.gtf.org> writes:

> On Sat, Feb 02, 2002 at 02:16:41PM -0600, Stephen Lord wrote:
> > Can't you fall back to buffered I/O for the tail? OK it complicates the
> > code, probably a lot, but it keeps things sane from the user's point of
> > view.
>
> For O_DIRECT, IMHO you should fail not fallback. You're simply lying
> to the underlying program otherwise.

It's just impossible to write a tail which is smaller than a disk block
without another buffer.

-Andi

^ permalink raw reply [flat|nested] 32+ messages in thread
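Andi's point about needing another buffer can be sketched in user space as a read-modify-write through a bounce buffer. This is an illustrative sketch only, not kernel or reiserfs code: the helper write_tail_via_bounce() and its parameters are hypothetical, it uses ordinary pread()/pwrite() on a regular file to stand in for whole-block device I/O, and it assumes the block size is a power of two that is at least sizeof(void *).

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for pread, pwrite, posix_memalign, mkstemp */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only. Stores `tail_len` bytes at the
 * start of the block that begins at `block_start`, where tail_len may be
 * smaller than block_size. Because the transfer unit is a whole block, the
 * caller's short buffer cannot be used directly: the block is read into a
 * separate, aligned bounce buffer, patched, and written back in full.
 * Returns 0 on success and -1 on error.
 */
int write_tail_via_bounce(int fd, off_t block_start, size_t block_size,
                          const void *tail, size_t tail_len)
{
    void *bounce = NULL;

    /* Assumed precondition: block_size is a nonzero power of two. */
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    if (tail_len > block_size)
        return -1;

    /* The extra buffer: block-sized and block-aligned. */
    if (posix_memalign(&bounce, block_size, block_size) != 0)
        return -1;
    memset(bounce, 0, block_size);

    /* Read the existing block; bytes past end of file stay zero-filled. */
    if (pread(fd, bounce, block_size, block_start) < 0) {
        free(bounce);
        return -1;
    }

    /* Modify only the tail bytes, then write the whole block back. */
    memcpy(bounce, tail, tail_len);
    ssize_t put = pwrite(fd, bounce, block_size, block_start);

    free(bounce);
    return put == (ssize_t)block_size ? 0 : -1;
}
```

Note that this sketch always writes a full block, so on a regular file it extends the file to a block boundary; a real filesystem would track the true file size separately.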
* Re: O_DIRECT fails in some kernel and FS
2002-02-03 7:26 ` Andi Kleen
@ 2002-02-04 15:13 ` Jeff Garzik
2002-02-04 15:31 ` Chris Mason
0 siblings, 1 reply; 32+ messages in thread
From: Jeff Garzik @ 2002-02-04 15:13 UTC (permalink / raw)
To: Andi Kleen; +Cc: Chris Mason, Andrea Arcangeli, linux-kernel

Andi Kleen wrote:
>
> Jeff Garzik <garzik@havoc.gtf.org> writes:
>
> > On Sat, Feb 02, 2002 at 02:16:41PM -0600, Stephen Lord wrote:
> > > Can't you fall back to buffered I/O for the tail? OK it complicates the
> > > code, probably a lot, but it keeps things sane from the user's point of
> > > view.
> >
> > For O_DIRECT, IMHO you should fail not fallback. You're simply lying
> > to the underlying program otherwise.
>
> It's just impossible to write a tail which is smaller than a disk block
> without another buffer.

I argue, for reiserfs:

For O_DIRECT writes, the preferred behavior is to write disk blocks
obtained through the normal methods (get_block, etc.), and fully support
inodes for which file tails do not exist.

For O_DIRECT reads, if the data is determined to be in a file tail,
->direct_IO should either (a) fail or (b) dump the file tail to a normal
disk block before performing ->direct_IO.

--
Jeff Garzik   | "I went through my candy like hot oatmeal
Building 1024 |  through an internally-buttered weasel."
MandrakeSoft  |  - goats.com

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: O_DIRECT fails in some kernel and FS
2002-02-04 15:13 ` Jeff Garzik
@ 2002-02-04 15:31 ` Chris Mason
0 siblings, 0 replies; 32+ messages in thread
From: Chris Mason @ 2002-02-04 15:31 UTC (permalink / raw)
To: Jeff Garzik, Andi Kleen; +Cc: Andrea Arcangeli, linux-kernel

On Monday, February 04, 2002 10:13:37 AM -0500 Jeff Garzik <jgarzik@mandrakesoft.com> wrote:

> Andi Kleen wrote:
>>
>> Jeff Garzik <garzik@havoc.gtf.org> writes:
>>
>> > On Sat, Feb 02, 2002 at 02:16:41PM -0600, Stephen Lord wrote:
>> > > Can't you fall back to buffered I/O for the tail? OK it complicates the
>> > > code, probably a lot, but it keeps things sane from the user's point of
>> > > view.
>> >
>> > For O_DIRECT, IMHO you should fail not fallback. You're simply lying
>> > to the underlying program otherwise.
>>
>> It's just impossible to write a tail which is smaller than a disk block
>> without another buffer.
>
> I argue, for reiserfs:
>
> For O_DIRECT writes, the preferred behavior is to write disk blocks
> obtained through the normal methods (get_block, etc.), and fully support
> inodes for which file tails do not exist.

Done ;-)

>
> For O_DIRECT reads, if the data is determined to be in a file tail,
> ->direct_IO should either (a) fail or (b) dump the file tail to a normal
> disk block before performing ->direct_IO.

The current patch does A. Another option is to change the reiserfs open
code to detect the tail and do an -EINVAL for o_direct. This gives the
application a better way to fall back to normal open methods than
returning an error during the read.

Just to restate, the current O_DIRECT code can never hit a reiserfs tail
in the normal case. By definition, reiserfs tails are not block aligned,
and O_DIRECT writes are. The only time it is a concern is with a screwy
interaction between expanding truncates and tails on kernels < 2.4.17.

Since most O_DIRECT users are databases, and tails are never created on
files > 16k in size, I don't expect anyone to ever see the reiserfs
triggered -EINVAL from the current patch (famous last words).

-chris

^ permalink raw reply [flat|nested] 32+ messages in thread
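The application-side fallback Chris mentions, where a failed O_DIRECT open leads to a normal open, might look roughly like the following. This is a sketch under stated assumptions rather than anything posted in the thread: the helper open_direct_or_buffered() is hypothetical, and it assumes the kernel reports an unsupported O_DIRECT open with EINVAL, which is how open(2) documents it on current Linux systems.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc exposes O_DIRECT only when this is defined */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0    /* no O_DIRECT on this platform: plain buffered open */
#endif

/*
 * Hypothetical helper, for illustration only. Tries to open `path` with
 * O_DIRECT; if the open is refused with EINVAL, retries without O_DIRECT.
 * On success returns the descriptor and sets *is_direct to 1 (direct) or
 * 0 (buffered). On failure returns -1 with errno set by open().
 */
int open_direct_or_buffered(const char *path, int flags, mode_t mode,
                            int *is_direct)
{
    assert(path != NULL && is_direct != NULL);
    *is_direct = 0;

    int fd = open(path, flags | O_DIRECT, mode);
    if (fd >= 0) {
        *is_direct = (O_DIRECT != 0);
        return fd;
    }

    /* Any error other than EINVAL is not an O_DIRECT refusal. */
    if (errno != EINVAL)
        return -1;

    /* Fall back to an ordinary buffered open of the same file. */
    return open(path, flags & ~O_DIRECT, mode);
}
```

Reporting the outcome through `is_direct` lets the caller decide whether to apply its direct I/O alignment rules or treat the descriptor as ordinary buffered I/O. The retry is not suitable as-is for opens that use O_EXCL, since the first attempt may already have created the file.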
Thread overview: 32+ messages
2002-02-01 20:37 O_DIRECT fails in some kernel and FS Ricardo Galli
2002-02-01 20:44 ` Andrew Morton
2002-02-01 20:49 ` Ricardo Galli
2002-02-01 20:57 ` Andrew Morton
2002-02-01 21:05 ` Steve Lord
2002-02-02 9:35 ` Chris Wedgwood
2002-02-02 10:25 ` Hans Reiser
2002-02-02 15:24 ` Chris Mason
2002-02-02 18:20 ` Chris Mason
2002-02-02 19:54 ` Andrea Arcangeli
2002-02-02 20:10 ` Chris Mason
2002-02-02 20:16 ` Stephen Lord
2002-02-02 20:50 ` Jeff Garzik
2002-02-03 13:40 ` Stephen Lord
2002-02-03 14:09 ` Chris Wedgwood
2002-02-03 15:05 ` Stephen Lord
2002-02-03 22:44 ` Chris Wedgwood
2002-02-04 15:04 ` Jeff Garzik
2002-02-04 15:21 ` Chris Mason
2002-02-04 15:15 ` Steve Lord
2002-02-04 15:46 ` Alan Cox
2002-02-04 16:02 ` Steve Lord
2002-02-04 18:22 ` Daniel Phillips
2002-02-04 19:11 ` Steve Lord
2002-02-04 18:29 ` Joel Becker
2002-02-04 18:49 ` Jeff Garzik
2002-02-04 18:55 ` Joel Becker
2002-02-04 19:16 ` Jeff Garzik
2002-02-02 17:14 ` Christoph Hellwig
[not found] <E16WkQj-0005By-00@antoli.uib.es.suse.lists.linux.kernel>
[not found] ` <3C5AFE2D.95A3C02E@zip.com.au.suse.lists.linux.kernel>
[not found] ` <1012597538.26363.443.camel@jen.americas.sgi.com.suse.lists.linux.kernel>
[not found] ` <20020202093554.GA7207@tapu.f00f.org.suse.lists.linux.kernel>
[not found] ` <234710000.1012674008@tiny.suse.lists.linux.kernel>
[not found] ` <20020202205438.D3807@athlon.random.suse.lists.linux.kernel>
[not found] ` <242700000.1012680610@tiny.suse.lists.linux.kernel>
[not found] ` <3C5C4929.5080403@sgi.com.suse.lists.linux.kernel>
[not found] ` <20020202155028.B26147@havoc.gtf.org.suse.lists.linux.kernel>
2002-02-03 7:26 ` Andi Kleen
2002-02-04 15:13 ` Jeff Garzik
2002-02-04 15:31 ` Chris Mason