* TAKE 972756 - Implement fallocate.
From: David Chinner @ 2007-11-02 2:43 UTC
To: sgi.bugs.xfs; +Cc: xfs

Implement fallocate.

Implement the new generic callout for file preallocation.
Atomically change the file size if requested.

Date: Fri Nov 2 13:42:52 AEDT 2007
Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs
Inspected by: hch@infradead.org

The following file(s) were checked into:
  longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb

Modid: xfs-linux-melb:xfs-kern:30009a
fs/xfs/linux-2.6/xfs_iops.c - 1.268 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_iops.c.diff?r1=text&tr1=1.268&r2=text&tr2=1.267&f=h
	- implement ->fallocate()
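From userspace, the "change the file size if requested" choice is made through the mode argument of Linux fallocate(2): mode 0 extends the file size when the allocated range runs past EOF, while FALLOC_FL_KEEP_SIZE preallocates blocks but leaves the size alone. The sketch below is a minimal illustration, assuming Linux with glibc and a filesystem that implements fallocate(2); the helper `size_after_fallocate` and the /tmp paths used with it are hypothetical, not part of this patch.

```c
#define _GNU_SOURCE             /* exposes fallocate() and FALLOC_FL_KEEP_SIZE */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an empty file at `path`, call fallocate(2)
 * on [0, len) with the given mode, and return the resulting st_size.
 * Returns -1 on any error (e.g. EOPNOTSUPP if the filesystem does not
 * implement fallocate). The file is removed before returning. */
static off_t size_after_fallocate(const char *path, int mode, off_t len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    struct stat st;
    off_t result = -1;
    /* mode 0: allocate and extend the file size past EOF;
     * FALLOC_FL_KEEP_SIZE: allocate only, size unchanged. */
    if (fallocate(fd, mode, 0, len) == 0 && fstat(fd, &st) == 0)
        result = st.st_size;

    close(fd);
    unlink(path);
    return result;
}
```

Called with mode 0 and a length of 262144, the resulting size should be 262144; with FALLOC_FL_KEEP_SIZE it should stay 0, because only blocks, not size, are reserved.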
* Re: TAKE 972756 - Implement fallocate.
From: Bhagi rathi @ 2007-11-05 18:42 UTC
To: David Chinner; +Cc: sgi.bugs.xfs, xfs

David,

What happens if the offset is not aligned to 4k? Let's say we have a file
whose size is not aligned to 4k. It could have blocks beyond EOF which
haven't been zeroed out. fallocate may increase the size, and we can then
read garbage from the disk block if it hasn't been zeroed out.

-Thanks,
Bhagi.

On 11/2/07, David Chinner <dgc@sgi.com> wrote:
> Implement fallocate.
>
> Implement the new generic callout for file preallocation.
> Atomically change the file size if requested.
* Re: TAKE 972756 - Implement fallocate.
From: David Chinner @ 2007-11-06 0:12 UTC
To: Bhagi rathi; +Cc: David Chinner, xfs

On Tue, Nov 06, 2007 at 12:12:52AM +0530, Bhagi rathi wrote:
> David, What happens if the offset is not aligned to 4k? Let's say we have a
> file whose size is not aligned to 4k. It could have blocks beyond EOF which
> haven't been zeroed out.

No, it won't. They are *preallocated* blocks, which by definition are
zero-filled. Preallocated blocks are marked as unwritten on disk, so
it is known that they contain zeros, even if they lie beyond EOF.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
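Dave's point is that unwritten extents are a read-side guarantee: the extent map records that a preallocated range was never written, so reads of that range return zeros without looking at whatever stale data sits on disk. The following is a deliberately simplified model of that rule, not XFS code; `struct extent` and `model_read` are hypothetical names, and real XFS extent records store block numbers rather than byte offsets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified extent record: just enough to show the
 * unwritten-extent rule, not XFS's real on-disk format. */
struct extent {
    size_t file_off;   /* byte offset in the file */
    size_t len;        /* length in bytes */
    size_t disk_off;   /* byte offset on the "disk" */
    bool   unwritten;  /* preallocated but never written */
};

/* Read len bytes at file offset off into buf. Ranges covered by an
 * unwritten extent are returned as zeros without touching the disk,
 * so stale disk contents never leak out; holes also read as zeros. */
static void model_read(const unsigned char *disk,
                       const struct extent *map, size_t nmap,
                       size_t off, size_t len, unsigned char *buf)
{
    memset(buf, 0, len);                       /* default: zeros */
    for (size_t i = 0; i < nmap; i++) {
        const struct extent *e = &map[i];
        size_t ext_end = e->file_off + e->len;
        size_t start = off > e->file_off ? off : e->file_off;
        size_t end = off + len < ext_end ? off + len : ext_end;
        if (start >= end || e->unwritten)
            continue;                          /* no overlap, or unwritten */
        memcpy(buf + (start - off),
               disk + e->disk_off + (start - e->file_off),
               end - start);
    }
}
```

With a written extent over the first bytes and an unwritten one after it, a read returns the real data followed by zeros, even if the disk behind the unwritten extent holds garbage.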
* Re: TAKE 972756 - Implement fallocate.
From: Bhagi rathi @ 2007-11-06 17:27 UTC
To: David Chinner; +Cc: xfs

File is of size 1k. A 4k block is allocated, as the file-system block size is
4k. Preallocation happened from 1k to 256k. Now, it looks to me that we have
unwritten extents from 4k to 256k. There is no guarantee that the data from 1k
to 4k is all zeroes. Fallocate is updating the size. Hence on a subsequent
read, we can get garbage from 1k to 4k and all zeroes from 4k to 256k.

Is the expectation here that the application should take responsibility for
zeroing the data? I still need to go through the fallocate requirements.

-Thanks,
Bhagi.

On 11/6/07, David Chinner <dgc@sgi.com> wrote:
> On Tue, Nov 06, 2007 at 12:12:52AM +0530, Bhagi rathi wrote:
> > David, What happens if the offset is not aligned to 4k? Let's say we have
> > a file whose size is not aligned to 4k. It could have blocks beyond EOF
> > which haven't been zeroed out.
>
> No, it won't. They are *preallocated* blocks, which by definition are
> zero-filled. Preallocated blocks are marked as unwritten on disk, so
> it is known that they contain zeros, even if they lie beyond EOF.
>
> Cheers,
>
> Dave.
* Re: TAKE 972756 - Implement fallocate.
From: Eric Sandeen @ 2007-11-06 19:04 UTC
To: Bhagi rathi; +Cc: David Chinner, xfs

Bhagi rathi wrote:
> File is of size 1k. A 4k block is allocated, as the file-system block size
> is 4k. Preallocation happened from 1k to 256k. Now, it looks to me that we
> have unwritten extents from 4k to 256k. There is no guarantee that the data
> from 1k to 4k is all zeroes. Fallocate is updating the size. Hence on a
> subsequent read, we can get garbage from 1k to 4k and all zeroes from 4k
> to 256k.

You've tested this and found it to be true?

-Eric

> Is the expectation here that the application should take responsibility
> for zeroing the data? I still need to go through the fallocate requirements.
* Re: TAKE 972756 - Implement fallocate. 2007-11-06 17:27 ` Bhagi rathi 2007-11-06 19:04 ` Eric Sandeen @ 2007-11-06 20:41 ` David Chinner 2007-11-06 22:38 ` Nathan Scott 1 sibling, 1 reply; 10+ messages in thread From: David Chinner @ 2007-11-06 20:41 UTC (permalink / raw) To: Bhagi rathi; +Cc: David Chinner, xfs On Tue, Nov 06, 2007 at 10:57:03PM +0530, Bhagi rathi wrote: > File is of size 1k. A 4k block is allocated as file-system block size is > 4k. > Preallocation happened from 1k to 256k. Now, it looks to me that we have > un-written extents from 4k to 256k. There is no guarantee that data from 1k > to 4k is all zero'es. Fallocate is updating size. Hence on subsequent read, > we can get garbage from 1k to 4k and all zero'es from 4k to 256k # rm /mnt/test/fred # xfs_io -f -c "pwrite 0 1024" -c "fsync" -c "falloc_allocsp 0 262144" -c "bmap -vp" /mnt/test/fred wrote 1024/1024 bytes at offset 0 1 KiB, 1 ops; 0.0000 sec (42.459 MiB/sec and 43478.2609 ops/sec) /mnt/test/fred: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..7]: 14520..14527 0 (14520..14527) 8 00000 1: [8..511]: 345688..346191 0 (345688..346191) 504 10000 # dd if=/mnt/test/fred bs=4k count=1 |od -Ax 1+0 records in 1+0 records out 4096 bytes (4.1 kB) copied, 0.004566 seconds, 897 kB/s 000000 146715 146715 146715 146715 146715 146715 146715 146715 * 000400 000000 000000 000000 000000 000000 000000 000000 000000 * 001000 Only 1k of modified data, then 3k of zeros, then a bunch of unwritten extents out to EOF. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: TAKE 972756 - Implement fallocate.
From: Nathan Scott @ 2007-11-06 22:38 UTC
To: Bhagi rathi, David Chinner; +Cc: xfs

On Wed, 2007-11-07 at 07:41 +1100, David Chinner wrote:
> > Preallocation happened from 1k to 256k. Now, it looks to me that we have
> > unwritten extents from 4k to 256k. There is no guarantee that the data
> > from 1k to 4k is all zeroes.

That guarantee does exist - when the initial 1K block write is done, the
end of the block is zeroed (by the kernel write path). This is always
done (guaranteed) and is required independently of unwritten extents.

cheers.

--
Nathan
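Nathan's guarantee comes from the write path rather than from unwritten extents: when a 1 KiB write lands in a newly allocated 4 KiB block, the parts of the block the write does not cover are zeroed before the block can reach disk. A simplified model of that rule, assuming a 4 KiB block size; `fill_new_block` and `MODEL_BLOCK_SIZE` are hypothetical and stand in for much more involved page-cache code.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_BLOCK_SIZE 4096   /* assumed filesystem block size */

/* Hypothetical model of a partial write into a freshly allocated block:
 * copy the caller's bytes at offset `off` and zero every other byte of
 * the block, so whatever stale data the disk held is never written back
 * or exposed. Returns the number of bytes copied. */
static size_t fill_new_block(unsigned char block[MODEL_BLOCK_SIZE],
                             size_t off, const void *data, size_t len)
{
    if (off > MODEL_BLOCK_SIZE)
        return 0;
    if (len > MODEL_BLOCK_SIZE - off)
        len = MODEL_BLOCK_SIZE - off;                  /* clamp to block */
    memset(block, 0, off);                             /* head before write */
    memcpy(block + off, data, len);                    /* the write itself */
    memset(block + off + len, 0, MODEL_BLOCK_SIZE - off - len); /* tail */
    return len;
}
```

For the thread's case (1 KiB written at offset 0), bytes 1024 to 4095 of the block end up zero regardless of what the block previously contained.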
* Re: TAKE 972756 - Implement fallocate.
From: Bhagi rathi @ 2007-11-07 5:42 UTC
To: nscott; +Cc: David Chinner, xfs

Since the logged size change and the data I/O are not tied together, it is
always possible for the size to reach the disk before the data I/O does.

Also, the other problem comes from speculative allocation. A writeback
allocation can convert delayed extents into real ones, and these are pruned
only on close of the file. If we get an fallocate before that, it allocates
the extents, but the extents left over from delayed-allocation writeback
will not have zeroed content.

Conceptually, if fallocate intends to change the size, it is no different
from a size-extending write. We do xfs_zero_eof for a write, but not in this
case. Perhaps I am missing the context in which fallocate is used, if it has
some overloaded semantics.

-Thanks,
Bhagi.

On 11/7/07, Nathan Scott <nscott@aconex.com> wrote:
> On Wed, 2007-11-07 at 07:41 +1100, David Chinner wrote:
> > > Preallocation happened from 1k to 256k. Now, it looks to me that we have
> > > unwritten extents from 4k to 256k. There is no guarantee that the data
> > > from 1k to 4k is all zeroes.
>
> That guarantee does exist - when the initial 1K block write is done, the
> end of the block is zeroed (by the kernel write path). This is always
> done (guaranteed) and is required independently of unwritten extents.
>
> cheers.
>
> --
> Nathan
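The comparison with a size-extending write rests on the arithmetic of EOF zeroing: before the size grows past an old EOF that falls mid-block, the bytes from the old EOF to the end of that block (or to the new size, if that comes first) must read as zeros. This is a hypothetical helper that only computes that range under a stated simplification; it is not xfs_zero_eof itself, which also has to deal with any allocated blocks lying wholly between the old EOF and the new size.

```c
#include <stdint.h>

struct byte_range {
    uint64_t start;   /* first byte to zero */
    uint64_t end;     /* one past the last byte to zero */
};

/* Hypothetical helper: the part of the old EOF block that must be zeroed
 * before the file size grows from old_size to new_size. Simplification:
 * whole blocks past that one are assumed to be holes or unwritten extents
 * (which already read as zeros), so only the partial block is returned.
 * An empty range has start == end. */
static struct byte_range eof_zero_range(uint64_t old_size, uint64_t new_size,
                                        uint64_t blocksize)
{
    struct byte_range r = { old_size, old_size };    /* empty by default */
    if (new_size <= old_size || old_size % blocksize == 0)
        return r;                     /* not growing, or EOF block-aligned */
    uint64_t block_end = (old_size / blocksize + 1) * blocksize;
    r.end = new_size < block_end ? new_size : block_end;
    return r;
}
```

For the thread's example (old size 1024, new size 262144, 4096-byte blocks) the range is [1024, 4096), the same 3 KiB that Dave's od output shows as zeros.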
* Re: TAKE 972756 - Implement fallocate.
From: nscott @ 2007-11-07 9:35 UTC
To: Bhagi rathi; +Cc: David Chinner, xfs

> Since the logged size change and the data I/O are not tied together, it is
> always possible for the size to reach the disk before the data I/O does.

Not clear what that has to do with whether partial blocks are zeroed or
not? Can you give a specific series of steps that would demonstrate a
problem? (preferably with a test case)

> Also, the other problem comes from speculative allocation. A writeback
> allocation can convert delayed extents into real ones, and these are pruned
> only on close of the file. If we get an fallocate before that, it allocates
> the extents, but the extents left over from delayed-allocation writeback
> will not have zeroed content.

Again, I think a test case demonstrating the problem would go a long way
to helping explain the issue. The preallocation code and ioctl interface
have been in XFS forever on Linux - are you reporting problems you've
actually observed here, or are these rather "potential issues" that you
foresee from code analysis?

cheers.

--
Nathan
* Re: TAKE 972756 - Implement fallocate.
From: Eric Sandeen @ 2007-12-10 17:50 UTC
To: David Chinner; +Cc: xfs

David Chinner wrote:
> Implement fallocate.
>
> Implement the new generic callout for file preallocation.
> Atomically change the file size if requested.
>
> Modid: xfs-linux-melb:xfs-kern:30009a
> fs/xfs/linux-2.6/xfs_iops.c - 1.268 - changed
> http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_iops.c.diff?r1=text&tr1=1.268&r2=text&tr2=1.267&f=h
> 	- implement ->fallocate()

Is this ever going to go upstream...?

-eric