* O_DIRECT foolish question @ 2003-02-12 21:19 Bruno Diniz de Paula 2003-02-12 22:03 ` Andrew Morton 2003-02-12 22:07 ` Chris Wedgwood 0 siblings, 2 replies; 19+ messages in thread From: Bruno Diniz de Paula @ 2003-02-12 21:19 UTC (permalink / raw) To: linux-kernel [-- Attachment #1: Type: text/plain, Size: 394 bytes --] Hi, I am trying to use O_DIRECT to read ordinary files and read syscall always returns 0, unless when the file size equals the fs block size. Is it true that I can only use O_DIRECT when the size of the file written in the inode is a multiple of block size? Thanks and excuse me for the newbie question, Bruno. -- Bruno Diniz de Paula <diniz@cs.rutgers.edu> Rutgers University [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: O_DIRECT foolish question
  2003-02-12 21:19 O_DIRECT foolish question Bruno Diniz de Paula
@ 2003-02-12 22:03 ` Andrew Morton
  2003-02-12 22:29   ` Bruno Diniz de Paula
  2003-02-12 22:07 ` Chris Wedgwood
  1 sibling, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2003-02-12 22:03 UTC (permalink / raw)
  To: Bruno Diniz de Paula; +Cc: linux-kernel

Bruno Diniz de Paula <diniz@cs.rutgers.edu> wrote:
>
> Hi,
>
> I am trying to use O_DIRECT to read ordinary files and read syscall
> always returns 0, unless when the file size equals the fs block size.

It should be returning -1, with errno set to EINVAL.

> Is
> it true that I can only use O_DIRECT when the size of the file written
> in the inode is a multiple of block size?
>

The file can be of any size - the kernel will zero-fill any remaining bytes.

The address and length which you pass into the read() or write() system call
must both be a multiple of the filesystem block size.

It is always safe to just use the machine's page size for alignment
calculations - no filesystem has a blocksize larger than the pagesize.

A good way to do this is to run getpagesize(), and to then malloc a buffer
which is one page larger than you need. Then round that address up to the
next page boundary. And perform I/O into that memory with
multiple-of-page-size requests.

In the 2.5 kernel the "must be a multiple of blocksize" requirement was
relaxed. We now support alignments and lengths down to the minimum which is
supported by the underlying device. Typically 512 bytes, but not always.

Portable applications should not assume that 512-byte alignment is supported.
They should query the device's alignment requirements via the BLKSSZGET ioctl
against (say) /dev/hda1. Or they can simply try 512, 1024, 2048, ... at
initialisation time.

^ permalink raw reply	[flat|nested] 19+ messages in thread
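[Editorial sketch, not part of the original message.] The malloc-and-round-up
recipe described above could look roughly like this. The helper names
alloc_page_aligned() and round_up_to_page() are hypothetical, and the code
assumes the page size is a power of two.

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): return a page-aligned
 * pointer with at least `len` usable bytes behind it, obtained by
 * over-allocating with malloc() and rounding the address up.
 * The pointer to pass to free() later is stored in *raw.
 */
static void *alloc_page_aligned(size_t len, void **raw)
{
    size_t ps = (size_t)getpagesize();
    uintptr_t addr;

    /* One extra page leaves room to round the start address up. */
    *raw = malloc(len + ps);
    if (*raw == NULL)
        return NULL;

    /* Assumes ps is a power of two, so masking rounds to a page. */
    addr = ((uintptr_t)*raw + ps - 1) & ~(uintptr_t)(ps - 1);
    return (void *)addr;
}

/* Round an I/O length up to a whole number of pages. */
static size_t round_up_to_page(size_t len)
{
    size_t ps = (size_t)getpagesize();

    return (len + ps - 1) / ps * ps;
}
```

A caller would read() or write() round_up_to_page(n) bytes through the
aligned pointer and later free() the raw pointer, never the aligned one.
posix_memalign() and valloc(), both used later in this thread, give the same
alignment without the manual arithmetic.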
* Re: O_DIRECT foolish question 2003-02-12 22:03 ` Andrew Morton @ 2003-02-12 22:29 ` Bruno Diniz de Paula 2003-02-12 22:42 ` Chris Wedgwood 0 siblings, 1 reply; 19+ messages in thread From: Bruno Diniz de Paula @ 2003-02-12 22:29 UTC (permalink / raw) To: linux-kernel [-- Attachment #1: Type: text/plain, Size: 2545 bytes --] On Wed, 2003-02-12 at 17:03, Andrew Morton wrote: > Bruno Diniz de Paula <diniz@cs.rutgers.edu> wrote: > > > > Hi, > > > > I am trying to use O_DIRECT to read ordinary files and read syscall > > always returns 0, unless when the file size equals the fs block size. > > It should be returning -1, with errno set to EINVAL. But I am using multiples of page size in both buffer alignment and buffer size (2nd and 3rd parameters of read). The issue is that when I try to read files with sizes that are NOT multiples of block size (and therefore also not multiples of page size), the read syscall returns 0, with no errors. With files of size 4096, 8192 etc, everything works fine. The errors shouldn't occur indeed, as I am using the correct alignment and size to read. So the question remains, am I able to read just files whose size is a multiple of block size? Thanks, Bruno. PS: I am running 2.4.20... > > > Is > > it true that I can only use O_DIRECT when the size of the file written > > in the inode is a multiple of block size? > > > > The file can be of any size - the kernel will zero-fill any remaining bytes. > > The address and length which you pass into the read() or write() system call > must both be a multiple of the filesystem block size. > > It is always safe to just use the machine's page size for alignment > calculations - no filesystem has a blocksize larger than the pagesize. > > A good way to do this is to run getpagesize(), and to then malloc a buffer > which is one page larger than you need. Then round that address up to the > next page boundary. And perform I/O into that memory with > multiple-of-page-size requests. 
> > > > In the 2.5 kernel the "must be a multiple of blocksize" requirement was > relaxed. We now support alignments and lengths down to the minimum which is > supported by the underlying device. Typically 512 bytes, but not always. > > Portable applications should not assume that 512-byte alignment is supported. > They should query the device's aligment requirements via the BLKSSZGET ioctl > against (say) /dev/hda1. Or they can simply try 512, 1024, 2048, ... at > initialisation time. > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Bruno Diniz de Paula <diniz@cs.rutgers.edu> Rutgers University [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: O_DIRECT foolish question 2003-02-12 22:29 ` Bruno Diniz de Paula @ 2003-02-12 22:42 ` Chris Wedgwood 2003-02-12 23:02 ` Bruno Diniz de Paula 0 siblings, 1 reply; 19+ messages in thread From: Chris Wedgwood @ 2003-02-12 22:42 UTC (permalink / raw) To: Bruno Diniz de Paula; +Cc: linux-kernel On Wed, Feb 12, 2003 at 05:29:52PM -0500, Bruno Diniz de Paula wrote: > But I am using multiples of page size in both buffer alignment and > buffer size (2nd and 3rd parameters of read). The issue is that > when I try to read files with sizes that are NOT multiples of block > size (and therefore also not multiples of page size), the read > syscall returns 0, with no errors. What filesystem? Can you send an strace of this occurring? > So the question remains, am I able to read just files whose size is > a multiple of block size? No. You ideally should be able to read any length file with O_DIRECT. Even a 1-byte file. --cw ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: O_DIRECT foolish question 2003-02-12 22:42 ` Chris Wedgwood @ 2003-02-12 23:02 ` Bruno Diniz de Paula 2003-02-12 23:22 ` Bruno Diniz de Paula 2003-02-12 23:24 ` Chris Wedgwood 0 siblings, 2 replies; 19+ messages in thread From: Bruno Diniz de Paula @ 2003-02-12 23:02 UTC (permalink / raw) To: Chris Wedgwood; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 2771 bytes --] On Wed, 2003-02-12 at 17:42, Chris Wedgwood wrote: > On Wed, Feb 12, 2003 at 05:29:52PM -0500, Bruno Diniz de Paula wrote: > > > But I am using multiples of page size in both buffer alignment and > > buffer size (2nd and 3rd parameters of read). The issue is that > > when I try to read files with sizes that are NOT multiples of block > > size (and therefore also not multiples of page size), the read > > syscall returns 0, with no errors. > > What filesystem? ext2. > > Can you send an strace of this occurring? execve("./testopen", ["./testopen"], [/* 30 vars */]) = 0 uname({sys="Linux", node="urca", ...}) = 0 brk(0) = 0x80497fc open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=57677, ...}) = 0 old_mmap(NULL, 57677, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40012000 close(3) = 0 open("/lib/libc.so.6", O_RDONLY) = 3 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0]Z\1\000"..., 1024) = 1024 fstat64(3, {st_mode=S_IFREG|0755, st_size=1102984, ...}) = 0 old_mmap(NULL, 1112740, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40021000 mprotect(0x40129000, 31396, PROT_NONE) = 0 old_mmap(0x40129000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x107000) = 0x40129000 old_mmap(0x4012f000, 6820, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4012f000 close(3) = 0 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40131000 munmap(0x40012000, 57677) = 0 open("/var/tmp/testopen.txt", O_RDONLY|O_DIRECT) = 3 brk(0) = 0x80497fc 
brk(0x804c7fc) = 0x804c7fc brk(0) = 0x804c7fc brk(0x804d000) = 0x804d000 read(3, "", 4096) = 0 fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 4), ...}) = 0 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40012000 write(1, "0 bytes read from file.\n", 240 bytes read from file. ) = 24 close(3) = 0 write(1, "Message: ", 9Message: ) = 9 munmap(0x40012000, 4096) = 0 exit_group(0) = ? Thanks a lot, Bruno. > > > So the question remains, am I able to read just files whose size is > > a multiple of block size? > > No. > > You ideally should be able to read any length file with O_DIRECT. > Even a 1-byte file. > > > > --cw -- Bruno Diniz de Paula <diniz@cs.rutgers.edu> Rutgers University [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: O_DIRECT foolish question
  2003-02-12 23:02 ` Bruno Diniz de Paula
@ 2003-02-12 23:22 ` Bruno Diniz de Paula
  2003-02-13 1:46   ` Randy.Dunlap
  2003-02-12 23:24 ` Chris Wedgwood
  1 sibling, 1 reply; 19+ messages in thread
From: Bruno Diniz de Paula @ 2003-02-12 23:22 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3619 bytes --]

Just to complete the information, I am trying to read a file with 5
bytes, and here is the piece of code I am using:

  char *message;
  int fd = open("/var/tmp/testopen.txt", O_RDONLY|O_DIRECT);
  int len, pagesize = getpagesize();

  posix_memalign((void **)&message, pagesize, pagesize);
  if(fd < 0) {
    printf("Unable to open file, errno is %d.\n", errno);
  } else {
    if((len = read(fd, message, pagesize)) < 0) {
      perror("read");
    } else {
      printf("%d bytes read from file.\n", len);
      printf("Message: %s", message);
    }
  }
  close(fd);

Thanks,

Bruno.

On Wed, 2003-02-12 at 18:02, Bruno Diniz de Paula wrote:
> On Wed, 2003-02-12 at 17:42, Chris Wedgwood wrote:
> > On Wed, Feb 12, 2003 at 05:29:52PM -0500, Bruno Diniz de Paula wrote:
> >
> > > But I am using multiples of page size in both buffer alignment and
> > > buffer size (2nd and 3rd parameters of read). The issue is that
> > > when I try to read files with sizes that are NOT multiples of block
> > > size (and therefore also not multiples of page size), the read
> > > syscall returns 0, with no errors.
> >
> > What filesystem?
>
> ext2.
>
> >
> > Can you send an strace of this occurring?
> > execve("./testopen", ["./testopen"], [/* 30 vars */]) = 0 > uname({sys="Linux", node="urca", ...}) = 0 > brk(0) = 0x80497fc > open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or > directory) > open("/etc/ld.so.cache", O_RDONLY) = 3 > fstat64(3, {st_mode=S_IFREG|0644, st_size=57677, ...}) = 0 > old_mmap(NULL, 57677, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40012000 > close(3) = 0 > open("/lib/libc.so.6", O_RDONLY) = 3 > read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0]Z\1\000"..., > 1024) = 1024 > fstat64(3, {st_mode=S_IFREG|0755, st_size=1102984, ...}) = 0 > old_mmap(NULL, 1112740, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = > 0x40021000 > mprotect(0x40129000, 31396, PROT_NONE) = 0 > old_mmap(0x40129000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, > 3, 0x107000) = 0x40129000 > old_mmap(0x4012f000, 6820, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4012f000 > close(3) = 0 > old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, > -1, 0) = 0x40131000 > munmap(0x40012000, 57677) = 0 > open("/var/tmp/testopen.txt", O_RDONLY|O_DIRECT) = 3 > brk(0) = 0x80497fc > brk(0x804c7fc) = 0x804c7fc > brk(0) = 0x804c7fc > brk(0x804d000) = 0x804d000 > read(3, "", 4096) = 0 > fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 4), ...}) = 0 > old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, > -1, 0) = 0x40012000 > write(1, "0 bytes read from file.\n", 240 bytes read from file. > ) = 24 > close(3) = 0 > write(1, "Message: ", 9Message: ) = 9 > munmap(0x40012000, 4096) = 0 > exit_group(0) = ? > > Thanks a lot, > > Bruno. > > > > > > So the question remains, am I able to read just files whose size is > > > a multiple of block size? > > > > No. > > > > You ideally should be able to read any length file with O_DIRECT. > > Even a 1-byte file. 
> > > > > > > > --cw -- Bruno Diniz de Paula <diniz@cs.rutgers.edu> Rutgers University [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: O_DIRECT foolish question
  2003-02-12 23:22 ` Bruno Diniz de Paula
@ 2003-02-13 1:46 ` Randy.Dunlap
  0 siblings, 0 replies; 19+ messages in thread
From: Randy.Dunlap @ 2003-02-13 1:46 UTC (permalink / raw)
  To: Bruno Diniz de Paula; +Cc: linux-kernel

On 12 Feb 2003 18:22:35 -0500
Bruno Diniz de Paula <diniz@cs.rutgers.edu> wrote:

| Just to complete the information, I am trying to read a file with 5
| bytes, and here is the piece of code I am using:
|
|   char *message;
|   int fd = open("/var/tmp/testopen.txt", O_RDONLY|O_DIRECT);
|   int len, pagesize = getpagesize();
|
|   posix_memalign((void **)&message, pagesize, pagesize);
|   if(fd < 0) {
|     printf("Unable to open file, errno is %d.\n", errno);
|   } else {
|     if((len = read(fd, message, pagesize)) < 0) {
|       perror("read");
|     } else {
|       printf("%d bytes read from file.\n", len);
|       printf("Message: %s", message);
|     }
|   }
|   close(fd);
|
| Thanks,
|
| Bruno.
|
| On Wed, 2003-02-12 at 18:02, Bruno Diniz de Paula wrote:
| > On Wed, 2003-02-12 at 17:42, Chris Wedgwood wrote:
| > > On Wed, Feb 12, 2003 at 05:29:52PM -0500, Bruno Diniz de Paula wrote:
| > >
| > > > But I am using multiples of page size in both buffer alignment and
| > > > buffer size (2nd and 3rd parameters of read). The issue is that
| > > > when I try to read files with sizes that are NOT multiples of block
| > > > size (and therefore also not multiples of page size), the read
| > > > syscall returns 0, with no errors.
| > >
| > > What filesystem?
| >
| > ext2.
| >
| > >
| > > Can you send an strace of this occurring?
| >
[strace snipped]
| >
| > Thanks a lot,
| >
| > Bruno.
| >
| > >
| > > > So the question remains, am I able to read just files whose size is
| > > > a multiple of block size?
| > >
| > > No.
| > >
| > > You ideally should be able to read any length file with O_DIRECT.
| > > Even a 1-byte file.

Here's what I get using Bruno's and cw's (od) programs:

         2.4.8|2.4.20     2.4.20             2.5.54
         ext2             ext3               ext2|ext3
         ====             ====               =========
od:      read 0 bytes     read: Inv. arg.    read: Inv. arg.
bruno:   0 bytes read     read: Inv. arg.    read: Inv. arg.

--
~Randy

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: O_DIRECT foolish question
  2003-02-12 23:02 ` Bruno Diniz de Paula
  2003-02-12 23:22   ` Bruno Diniz de Paula
@ 2003-02-12 23:24 ` Chris Wedgwood
  2003-02-12 23:33   ` Bruno Diniz de Paula
  2003-02-12 23:33   ` Chris Wedgwood
  1 sibling, 2 replies; 19+ messages in thread
From: Chris Wedgwood @ 2003-02-12 23:24 UTC (permalink / raw)
  To: Bruno Diniz de Paula; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 331 bytes --]

On Wed, Feb 12, 2003 at 06:02:58PM -0500, Bruno Diniz de Paula wrote:

> ext2.

are you able to test with another fs? (reiserfs and XFS also support
O_DIRECT)

> read(3, "", 4096) = 0

odd... I'm not sure why you get this

i tested locally here and it works as expected ... my test code is
appended.

  --cw

[-- Attachment #2: od.c --]
[-- Type: text/x-csrc, Size: 422 bytes --]

#define _GNU_SOURCE

#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <stdio.h>

int main()
{
    int h;
    int ps;
    char *buf;
    ssize_t n;

    ps = getpagesize();

    if (!(buf = valloc(ps)))
        return 1;

    if ((h = open("test", O_RDONLY)) < 0)
        return 1;

    n = read(h, buf, ps);

    printf("read %d bytes\n", n);

    close(h);

    return 0;
}

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: O_DIRECT foolish question 2003-02-12 23:24 ` Chris Wedgwood @ 2003-02-12 23:33 ` Bruno Diniz de Paula 2003-02-12 23:38 ` Chris Wedgwood 2003-02-12 23:33 ` Chris Wedgwood 1 sibling, 1 reply; 19+ messages in thread From: Bruno Diniz de Paula @ 2003-02-12 23:33 UTC (permalink / raw) To: Chris Wedgwood; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 712 bytes --] On Wed, 2003-02-12 at 18:24, Chris Wedgwood wrote: > On Wed, Feb 12, 2003 at 06:02:58PM -0500, Bruno Diniz de Paula wrote: > > > ext2. > > are you able to test with another fs? (reiserfs and XFS also support > O_DIRECT) Unfortunately not, I just have ext2 partitions here... > > > read(3, "", 4096) = 0 > > odd... I'm not sure why you get this > > i tested locally here and it works as expected ... my test code is > appended. But your code doesn't use O_DIRECT: if ((h = open("test", O_RDONLY)) < 0) Let me know whether including O_DIRECT the test worked. Bruno. > > > --cw -- Bruno Diniz de Paula <diniz@cs.rutgers.edu> Rutgers University [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: O_DIRECT foolish question
  2003-02-12 23:33 ` Bruno Diniz de Paula
@ 2003-02-12 23:38 ` Chris Wedgwood
  2003-02-12 23:49   ` Bruno Diniz de Paula
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Wedgwood @ 2003-02-12 23:38 UTC (permalink / raw)
  To: Bruno Diniz de Paula; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 507 bytes --]

On Wed, Feb 12, 2003 at 06:33:23PM -0500, Bruno Diniz de Paula wrote:

> But your code doesn't use O_DIRECT:

Sorry, you need to edit it (see my previous email).

A better version (appended) gives the following results.

  cw:3@tapu(cw)$ cp od.c test
  cw:3@tapu(cw)$ gcc -Wall od.c
  cw:3@tapu(cw)$ ./a.out
  read 503 bytes
  read 0 bytes

> Let me know whether including O_DIRECT the test worked.

Seems to. I get 0 the 2nd time about, presumably this is EOF but
arguably it should return something else.

  --cw

[-- Attachment #2: od.c --]
[-- Type: text/x-csrc, Size: 503 bytes --]

#define _GNU_SOURCE

#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <stdio.h>

int main()
{
    int h;
    int ps;
    char *buf;
    ssize_t n;

    ps = getpagesize();

    if (!(buf = valloc(ps)))
        return 1;

    if ((h = open("test", O_RDONLY|O_DIRECT)) < 0)
        return 1;

    do {
        n = read(h, buf, ps);
        if (n == -1) {
            perror("read");
            break;
        }
        printf("read %d bytes\n", n);
    } while(n);

    close(h);

    return 0;
}

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: O_DIRECT foolish question 2003-02-12 23:38 ` Chris Wedgwood @ 2003-02-12 23:49 ` Bruno Diniz de Paula 2003-02-12 23:51 ` Chris Wedgwood 0 siblings, 1 reply; 19+ messages in thread From: Bruno Diniz de Paula @ 2003-02-12 23:49 UTC (permalink / raw) To: Chris Wedgwood; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 472 bytes --] On Wed, 2003-02-12 at 18:38, Chris Wedgwood wrote: > Seems to. I get 0 the 2nd time about, presumably this is EOF but > arguably it should return something else. It didn't work for me. See the output: diniz@urca:/var/tmp$ gcc -Wall od.c diniz@urca:/var/tmp$ cp od.c test diniz@urca:/var/tmp$ ./a.out read 0 bytes diniz@urca:/var/tmp$ What is your partition type? ext2? Bruno. -- Bruno Diniz de Paula <diniz@cs.rutgers.edu> Rutgers University [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: O_DIRECT foolish question 2003-02-12 23:49 ` Bruno Diniz de Paula @ 2003-02-12 23:51 ` Chris Wedgwood [not found] ` <1045094589.4767.106.camel@urca.rutgers.edu> 0 siblings, 1 reply; 19+ messages in thread From: Chris Wedgwood @ 2003-02-12 23:51 UTC (permalink / raw) To: Bruno Diniz de Paula; +Cc: linux-kernel On Wed, Feb 12, 2003 at 06:49:35PM -0500, Bruno Diniz de Paula wrote: > What is your partition type? ext2? XFS. I can't test e2fs right now as my test machine is running 2.5.60 where it fails just as it does for you. I think both use generic_direct_IO or whatever it's called so maybe I'll have a poke in there as to why 2.5.x is failing. --cw ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <1045094589.4767.106.camel@urca.rutgers.edu>]
[parent not found: <20030213001302.GA13833@f00f.org>]
* Re: O_DIRECT foolish question [not found] ` <20030213001302.GA13833@f00f.org> @ 2003-02-13 0:36 ` Bruno Diniz de Paula 2003-02-13 5:12 ` Andrew Morton 0 siblings, 1 reply; 19+ messages in thread From: Bruno Diniz de Paula @ 2003-02-13 0:36 UTC (permalink / raw) To: Chris Wedgwood; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 1446 bytes --] On Wed, 2003-02-12 at 19:13, Chris Wedgwood wrote: > If I had to guess, write should work more or less the same as reads > (ie. I should be able to write aligned-but-smaller-than-page-sized > blocks to the end of files). > > Testing this however shows this is *not* the case. This is not the case, I have also tested here and the file written has n*block_size always. The problem with writing is that we can't sign to the kernel that the actual data has finished and from that point on it should zero-fill the bytes. And what is worse, the information about the actual size is lost, since the write syscall will store what is passed on the 3rd argument in the inode (field st_size of stat). This means that after writing using O_DIRECT we can't read data correctly anymore. The exception is when we write together with the data information about the actual size and process disregarding information from stat, for instance. Well, I am sure I am completely wrong because this doesn't make any sense for me. Someone that has already dealt with this and can bring a light to the discussion? Thanks, Bruno. > > Now, this *might* actually be the right thing to do ... if we allow > 'small writes' how do we deal with larger writes once the file-write > position is messed up? > > Heh... tricky stuff. Though required. > > > > --cw -- Bruno Diniz de Paula <diniz@cs.rutgers.edu> Rutgers University [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: O_DIRECT foolish question
  2003-02-13 0:36 ` Bruno Diniz de Paula
@ 2003-02-13 5:12 ` Andrew Morton
  2003-02-13 15:22   ` Bruno Diniz de Paula
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2003-02-13 5:12 UTC (permalink / raw)
  To: Bruno Diniz de Paula; +Cc: cw, linux-kernel

Bruno Diniz de Paula <diniz@cs.rutgers.edu> wrote:
>
> On Wed, 2003-02-12 at 19:13, Chris Wedgwood wrote:
> > If I had to guess, write should work more or less the same as reads
> > (ie. I should be able to write aligned-but-smaller-than-page-sized
> > blocks to the end of files).
> >
> > Testing this however shows this is *not* the case.
>
> This is not the case, I have also tested here and the file written has
> n*block_size always. The problem with writing is that we can't sign to
> the kernel that the actual data has finished and from that point on it
> should zero-fill the bytes. And what is worse, the information about the
> actual size is lost, since the write syscall will store what is passed
> on the 3rd argument in the inode (field st_size of stat). This means
> that after writing using O_DIRECT we can't read data correctly anymore.
> The exception is when we write together with the data information about
> the actual size and process disregarding information from stat, for
> instance.
>
> Well, I am sure I am completely wrong because this doesn't make any
> sense for me. Someone that has already dealt with this and can bring a
> light to the discussion?
>

For writes, I don't think it is reasonable for the kernel to have to
handle byte-granular appends. O_DIRECT is different. For this case the
application should ftruncate the file back to the desired size prior to
closing it.

For the short reads at EOF, the 2.4 kernel refuses to read anything, and
returns zero. The 2.5 kernel will return -EINVAL, which is better behaviour
(shouldn't make it just look like the file is shorter than it really is).

The ideal behaviour is that which I mistakenly described previously: we
should fill with zeroes and return the partial result. I'll look at
converting 2.5 to do that. As long as the changes are small - the direct-io
code does a ton of stuff, is complex, is not tested a lot and breakage tends
to be subtle.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: O_DIRECT foolish question 2003-02-13 5:12 ` Andrew Morton @ 2003-02-13 15:22 ` Bruno Diniz de Paula 2003-02-13 17:31 ` Andrew Morton 0 siblings, 1 reply; 19+ messages in thread From: Bruno Diniz de Paula @ 2003-02-13 15:22 UTC (permalink / raw) To: Andrew Morton; +Cc: cw, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2484 bytes --] Thanks, Andrew. So, no chances of getting this working correctly on 2.4 kernel for now (I mean, reading files with size != n*block_size), and I'd better give up on this... Is it the case, or you think there is still something to do to get this working on ext2 and 2.4 kernel? Bruno. On Thu, 2003-02-13 at 00:12, Andrew Morton wrote: > Bruno Diniz de Paula <diniz@cs.rutgers.edu> wrote: > > > > On Wed, 2003-02-12 at 19:13, Chris Wedgwood wrote: > > > If I had to guess, write should work more or less the same as reads > > > (ie. I should be able to write aligned-but-smaller-than-page-sized > > > blocks to the end of files). > > > > > > Testing this however shows this is *not* the case. > > > > This is not the case, I have also tested here and the file written has > > n*block_size always. The problem with writing is that we can't sign to > > the kernel that the actual data has finished and from that point on it > > should zero-fill the bytes. And what is worse, the information about the > > actual size is lost, since the write syscall will store what is passed > > on the 3rd argument in the inode (field st_size of stat). This means > > that after writing using O_DIRECT we can't read data correctly anymore. > > The exception is when we write together with the data information about > > the actual size and process disregarding information from stat, for > > instance. > > > > Well, I am sure I am completely wrong because this doesn't make any > > sense for me. Someone that has already dealt with this and can bring a > > light to the discussion? 
> > > > For writes, I don't think it is reasonable for the kernel to be have to > handle byte-granular appends. O_DIRECT is different. For this case the > application should ftruncate the file back to the desired size prior to > closing it. > > For the short reads at EOF, the 2.4 kernel refuses to read anything, and > returns zero. The 2.5 kernel will return -EINVAL, which is better behaviour > (shouldn't make it just look like the file is shorter than it really is). > > The ideal behaviour is that which I mistakenly described previously: we > should fill with zeroes and return the partial result. I'll look at > converting 2.5 to do that. As long as the changes are small - the direct-io > code does a ton of stuff, is complex, is not tested a lot and breakage tends > to be subtle. -- Bruno Diniz de Paula <diniz@cs.rutgers.edu> Rutgers University [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: O_DIRECT foolish question
  2003-02-13 15:22 ` Bruno Diniz de Paula
@ 2003-02-13 17:31 ` Andrew Morton
  2003-02-13 22:45   ` Bruno Diniz de Paula
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2003-02-13 17:31 UTC (permalink / raw)
  To: Bruno Diniz de Paula; +Cc: cw, linux-kernel

Bruno Diniz de Paula <diniz@cs.rutgers.edu> wrote:
>
> Thanks, Andrew. So, no chances of getting this working correctly on 2.4
> kernel for now (I mean, reading files with size != n*block_size), and
> I'd better give up on this... Is it the case, or you think there is
> still something to do to get this working on ext2 and 2.4 kernel?
>

Oh I think we can probably fix this up. Can you test this diff?


diff -puN fs/buffer.c~o_direct-length-fix fs/buffer.c
--- 24/fs/buffer.c~o_direct-length-fix	2003-02-13 09:23:34.000000000 -0800
+++ 24-akpm/fs/buffer.c	2003-02-13 09:24:39.000000000 -0800
@@ -2107,7 +2107,7 @@ int generic_direct_IO(int rw, struct ino
 	int length;
 
 	length = iobuf->length;
-	nr_blocks = length / blocksize;
+	nr_blocks = (length + blocksize - 1) / blocksize;
 	/* build the blocklist */
 	for (i = 0; i < nr_blocks; i++, blocknr++) {
 		struct buffer_head bh;
@@ -2148,6 +2148,10 @@ int generic_direct_IO(int rw, struct ino
 	retval = brw_kiovec(rw, 1, &iobuf, inode->i_dev, iobuf->blocks, blocksize);
 	/* restore orig length */
 	iobuf->length = length;
+
+	/* Return correct value for reads at eof */
+	if (retval > 0 && retval > length)
+		retval = length;
  out:
 
 	return retval;

_

^ permalink raw reply	[flat|nested] 19+ messages in thread
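[Editorial sketch, not part of the original message.] The arithmetic in the
patch above can be checked in user space. The function names below are
hypothetical stand-ins for the two expressions in the diff. The sketch
assumes that `length` has already been trimmed to the bytes left before EOF
when the division runs; the thread itself does not show that step.

```c
#include <stddef.h>

/* Block count as computed before the patch: truncating division. */
static size_t blocks_truncating(size_t length, size_t blocksize)
{
    return length / blocksize;
}

/* Block count as computed after the patch: round up to whole blocks. */
static size_t blocks_rounded_up(size_t length, size_t blocksize)
{
    return (length + blocksize - 1) / blocksize;
}

/* Mirror of the second hunk: never report more bytes than requested. */
static long clamp_retval(long retval, long length)
{
    if (retval > 0 && retval > length)
        retval = length;
    return retval;
}
```

For Bruno's 5-byte file and a 4096-byte block, the truncating form yields
0 blocks, which would match the read() returning 0, while the rounded form
yields 1 block. The clamp then reports 5 bytes rather than the full 4096
transferred, and leaves negative error values untouched.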
* Re: O_DIRECT foolish question 2003-02-13 17:31 ` Andrew Morton @ 2003-02-13 22:45 ` Bruno Diniz de Paula 0 siblings, 0 replies; 19+ messages in thread From: Bruno Diniz de Paula @ 2003-02-13 22:45 UTC (permalink / raw) To: Andrew Morton; +Cc: cw, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1942 bytes --] Hi Andrew, it worked perfectly on my box. Now I am going to try in my experiments environment and I'll let you know if everything was ok. Thanks a lot, Bruno. PS: BTW, is this patch going to be added to 2.4 kernel? On Thu, 2003-02-13 at 12:31, Andrew Morton wrote: > Bruno Diniz de Paula <diniz@cs.rutgers.edu> wrote: > > > > Thanks, Andrew. So, no chances of getting this working correctly on 2.4 > > kernel for now (I mean, reading files with size != n*block_size), and > > I'd better give up on this... Is it the case, or you think there is > > still something to do to get this working on ext2 and 2.4 kernel? > > > > Oh I think we can probably fix this up. Can you test this diff? 
> > > diff -puN fs/buffer.c~o_direct-length-fix fs/buffer.c > --- 24/fs/buffer.c~o_direct-length-fix 2003-02-13 09:23:34.000000000 -0800 > +++ 24-akpm/fs/buffer.c 2003-02-13 09:24:39.000000000 -0800 > @@ -2107,7 +2107,7 @@ int generic_direct_IO(int rw, struct ino > int length; > > length = iobuf->length; > - nr_blocks = length / blocksize; > + nr_blocks = (length + blocksize - 1) / blocksize; > /* build the blocklist */ > for (i = 0; i < nr_blocks; i++, blocknr++) { > struct buffer_head bh; > @@ -2148,6 +2148,10 @@ int generic_direct_IO(int rw, struct ino > retval = brw_kiovec(rw, 1, &iobuf, inode->i_dev, iobuf->blocks, blocksize); > /* restore orig length */ > iobuf->length = length; > + > + /* Return correct value for reads at eof */ > + if (retval > 0 && retval > length) > + retval = length; > out: > > return retval; > > _ > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Bruno Diniz de Paula <diniz@cs.rutgers.edu> Rutgers University [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: O_DIRECT foolish question 2003-02-12 23:24 ` Chris Wedgwood 2003-02-12 23:33 ` Bruno Diniz de Paula @ 2003-02-12 23:33 ` Chris Wedgwood 1 sibling, 0 replies; 19+ messages in thread From: Chris Wedgwood @ 2003-02-12 23:33 UTC (permalink / raw) To: Bruno Diniz de Paula; +Cc: linux-kernel On Wed, Feb 12, 2003 at 03:24:43PM -0800, Chris Wedgwood wrote: > i tested locally here and it works as expected ... my test code is > appended. btw, edit the args to open for test with as i was messing about before i sent this --cw ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: O_DIRECT foolish question 2003-02-12 21:19 O_DIRECT foolish question Bruno Diniz de Paula 2003-02-12 22:03 ` Andrew Morton @ 2003-02-12 22:07 ` Chris Wedgwood 1 sibling, 0 replies; 19+ messages in thread From: Chris Wedgwood @ 2003-02-12 22:07 UTC (permalink / raw) To: Bruno Diniz de Paula; +Cc: linux-kernel On Wed, Feb 12, 2003 at 04:19:24PM -0500, Bruno Diniz de Paula wrote: > I am trying to use O_DIRECT to read ordinary files and read syscall > always returns 0, unless when the file size equals the fs block > size. Sounds correct. > Is it true that I can only use O_DIRECT when the size of the file > written in the inode is a multiple of block size? You usually can only do O_DIRECT reads/writes in multiples of the block size (or in some cases multiples of 512-bytes, but I'm not sure of that code is still about though). It depends on the filesystem to some extent. --cw ^ permalink raw reply [flat|nested] 19+ messages in thread
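[Editorial sketch, not part of the original message.] Andrew Morton's reply
in this thread recommends querying the device's requirement with the
BLKSSZGET ioctl rather than assuming 512 bytes. A minimal Linux-only version
might look like this. The helper name device_sector_size() is hypothetical,
and opening a block device usually needs suitable permissions.

```c
#include <fcntl.h>
#include <linux/fs.h>   /* BLKSSZGET */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the logical sector size of the block
 * device at `dev`, or -1 if it cannot be opened or queried (for
 * example when `dev` is not a block device).
 */
static int device_sector_size(const char *dev)
{
    int size = 0;
    int fd = open(dev, O_RDONLY);

    if (fd < 0)
        return -1;
    if (ioctl(fd, BLKSSZGET, &size) != 0)
        size = -1;
    close(fd);
    return size;
}
```

An application could call this once at start-up with the device backing its
files and fall back to the page size whenever it returns -1.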