* TAKE 972756 - Implement fallocate.
@ 2007-11-02 2:43 David Chinner
2007-11-05 18:42 ` Bhagi rathi
2007-12-10 17:50 ` Eric Sandeen
0 siblings, 2 replies; 10+ messages in thread
From: David Chinner @ 2007-11-02 2:43 UTC (permalink / raw)
To: sgi.bugs.xfs; +Cc: xfs
Implement fallocate.
Implement the new generic callout for file preallocation.
Atomically change the file size if requested.
Date: Fri Nov 2 13:42:52 AEDT 2007
Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs
Inspected by: hch@infradead.org
The following file(s) were checked into:
longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb
Modid: xfs-linux-melb:xfs-kern:30009a
fs/xfs/linux-2.6/xfs_iops.c - 1.268 - changed
http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_iops.c.diff?r1=text&tr1=1.268&r2=text&tr2=1.267&f=h
- implement ->fallocate()
* Re: TAKE 972756 - Implement fallocate.
From: Bhagi rathi @ 2007-11-05 18:42 UTC (permalink / raw)
To: David Chinner; +Cc: sgi.bugs.xfs, xfs
David, what happens if the offset is not aligned to 4k? Say we have a file
whose size is not aligned to 4k. It could have blocks beyond EOF that
haven't been zeroed out. fallocate may increase the size, and we could then
read garbage from a disk block that hasn't been zeroed out.
-Thanks,
Bhagi.
* Re: TAKE 972756 - Implement fallocate.
From: David Chinner @ 2007-11-06 0:12 UTC (permalink / raw)
To: Bhagi rathi; +Cc: David Chinner, xfs
On Tue, Nov 06, 2007 at 12:12:52AM +0530, Bhagi rathi wrote:
> David, what happens if the offset is not aligned to 4k? Say we have a file
> whose size is not aligned to 4k. It could have blocks beyond EOF that
> haven't been zeroed out.
No it won't. They are *preallocated* blocks, which by definition are
zero-filled. Preallocated blocks are marked as unwritten on disk, so
it is known that they contain zeros, even if they lie beyond EOF.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
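
The point about unwritten extents can be pictured with a toy model. This is
only an illustrative sketch, not XFS code: the names Extent and read_range are
hypothetical, and the model assumes that a read simply skips any extent
flagged as unwritten, so stale on-disk contents never reach the caller.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    start: int       # file offset in bytes
    length: int      # length in bytes
    unwritten: bool  # True for preallocated, never-written space
    disk: bytes      # whatever happens to be on disk for this range


def read_range(extents, offset, length):
    """Return `length` bytes at `offset`; holes and unwritten extents read as zeros."""
    out = bytearray(length)  # zero-filled by default
    for ext in extents:
        lo = max(offset, ext.start)
        hi = min(offset + length, ext.start + ext.length)
        if lo >= hi or ext.unwritten:
            continue  # no overlap, or unwritten: leave the zeros in place
        out[lo - offset:hi - offset] = ext.disk[lo - ext.start:hi - ext.start]
    return bytes(out)


# A written 4k block, followed by a preallocated extent holding stale data.
extents = [
    Extent(0, 4096, False, b"\xcd" * 1024 + b"\x00" * 3072),
    Extent(4096, 258048, True, b"\xff" * 258048),
]
data = read_range(extents, 0, 8192)
```

In this model the stale 0xff bytes in the second extent are never returned,
because the unwritten flag is consulted before any data is copied.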
* Re: TAKE 972756 - Implement fallocate.
From: Bhagi rathi @ 2007-11-06 17:27 UTC (permalink / raw)
To: David Chinner; +Cc: xfs
The file is 1k in size. A 4k block is allocated because the filesystem block
size is 4k. Preallocation happened from 1k to 256k. Now, it looks to me like
we have unwritten extents from 4k to 256k. There is no guarantee that the
data from 1k to 4k is all zeroes. fallocate updates the size. Hence on a
subsequent read, we can get garbage from 1k to 4k and all zeroes from 4k to
256k.
Is the expectation here that the application should take responsibility for
zeroing the data? I still need to go through the fallocate requirements.
-Thanks,
Bhagi.
* Re: TAKE 972756 - Implement fallocate.
From: Eric Sandeen @ 2007-11-06 19:04 UTC (permalink / raw)
To: Bhagi rathi; +Cc: David Chinner, xfs
Bhagi rathi wrote:
> The file is 1k in size. A 4k block is allocated because the filesystem block
> size is 4k. Preallocation happened from 1k to 256k. Now, it looks to me like
> we have unwritten extents from 4k to 256k. There is no guarantee that the
> data from 1k to 4k is all zeroes. fallocate updates the size. Hence on a
> subsequent read, we can get garbage from 1k to 4k and all zeroes from 4k to
> 256k.
You've tested this and found it to be true?
-Eric
> Is the expectation here that the application should take responsibility for
> zeroing the data? I still need to go through the fallocate requirements.
* Re: TAKE 972756 - Implement fallocate.
From: David Chinner @ 2007-11-06 20:41 UTC (permalink / raw)
To: Bhagi rathi; +Cc: David Chinner, xfs
On Tue, Nov 06, 2007 at 10:57:03PM +0530, Bhagi rathi wrote:
> The file is 1k in size. A 4k block is allocated because the filesystem block
> size is 4k. Preallocation happened from 1k to 256k. Now, it looks to me like
> we have unwritten extents from 4k to 256k. There is no guarantee that the
> data from 1k to 4k is all zeroes. fallocate updates the size. Hence on a
> subsequent read, we can get garbage from 1k to 4k and all zeroes from 4k to
> 256k.
# rm /mnt/test/fred
# xfs_io -f -c "pwrite 0 1024" -c "fsync" -c "falloc_allocsp 0 262144" -c "bmap -vp" /mnt/test/fred
wrote 1024/1024 bytes at offset 0
1 KiB, 1 ops; 0.0000 sec (42.459 MiB/sec and 43478.2609 ops/sec)
/mnt/test/fred:
 EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET            TOTAL FLAGS
   0: [0..7]:          14520..14527        0 (14520..14527)           8 00000
   1: [8..511]:        345688..346191      0 (345688..346191)       504 10000
# dd if=/mnt/test/fred bs=4k count=1 |od -Ax
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.004566 seconds, 897 kB/s
000000 146715 146715 146715 146715 146715 146715 146715 146715
*
000400 000000 000000 000000 000000 000000 000000 000000 000000
*
001000
Only 1k of modified data, then 3k of zeros, then a bunch of unwritten extents
out to EOF.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
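
For readers unfamiliar with the tools, the numbers in the output above can be
checked by hand. The sketch below is only arithmetic on the values shown; it
assumes that bmap reports offsets in 512-byte units, that od prints 16-bit
words in octal by default, and that -Ax makes od print addresses in hex. The
helper name extent_bytes is made up for illustration.

```python
SECTOR = 512  # assumed unit of the bmap FILE-OFFSET column


def extent_bytes(first, last):
    """Convert an inclusive [first..last] sector range to a byte range [lo, hi)."""
    return first * SECTOR, (last + 1) * SECTOR


written = extent_bytes(0, 7)       # extent 0, FLAGS 00000
unwritten = extent_bytes(8, 511)   # extent 1, FLAGS 10000

pattern = int("146715", 8)    # od data word, octal
data_end = int("000400", 16)  # od address where the zeros begin, hex
read_end = int("001000", 16)  # od address at end of the 4k read, hex
```

Under those assumptions, extent 0 covers bytes 0 to 4096 and extent 1 covers
4096 to 262144, the data word is 0xcdcd, and the zeros run from byte 1024 to
byte 4096, which matches the summary of 1k of data followed by 3k of zeros.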
* Re: TAKE 972756 - Implement fallocate.
From: Nathan Scott @ 2007-11-06 22:38 UTC (permalink / raw)
To: Bhagi rathi, David Chinner; +Cc: xfs
On Wed, 2007-11-07 at 07:41 +1100, David Chinner wrote:
> > Preallocation happened from 1k to 256k. Now, it looks to me that we have
> > un-written extents from 4k to 256k. There is no guarantee that data from
> > 1k to 4k is all zero'es.
That guarantee does exist - when the initial 1K block write is done, the
end of the block is zeroed (by the kernel write path). This is always
done (guaranteed) and is required independently of unwritten extents.
cheers.
--
Nathan
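
A rough user-space check of the behaviour described above might look like the
following. It is a sketch under assumptions, not the test from this thread: it
uses Python's os.posix_fallocate rather than the xfs_io falloc_allocsp command,
it runs against whatever filesystem holds the temporary directory, and it
falls back to os.ftruncate (which only creates a hole) if preallocation is not
available. The function name is hypothetical.

```python
import os
import tempfile


def first_block_after_prealloc(data_len=1024, new_size=262144, block=4096):
    """Write data_len bytes, extend the file to new_size, return (size, first block)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\xcd" * data_len)
        os.fsync(fd)
        try:
            # Allocates space and extends the file size to new_size.
            os.posix_fallocate(fd, 0, new_size)
        except (AttributeError, OSError):
            # Fallback: extends the size only, leaving a hole.
            os.ftruncate(fd, new_size)
        size = os.fstat(fd).st_size
        return size, os.pread(fd, block, 0)
    finally:
        os.close(fd)
        os.unlink(path)


size, first_block = first_block_after_prealloc()
```

If the tail of the first block is zeroed as described, first_block holds 1k of
0xcd followed by 3k of zeros.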
* Re: TAKE 972756 - Implement fallocate.
From: Bhagi rathi @ 2007-11-07 5:42 UTC (permalink / raw)
To: nscott; +Cc: David Chinner, xfs
Since the size log change and the data I/O are not bound together, it is
always possible for the size to reach disk before the data I/O does. The
other problem comes from speculative allocation. Write-back can convert
delayed-allocation extents into real extents, and these are pruned only at
close of the file. If fallocate arrives before that, it allocates extents,
but the extents left behind by delayed-allocation write-back will not have
zeroed content.
Conceptually, if fallocate intends to change the size, it is no different
from a size-extending write. We do xfs_zero_eof for write but not in this
case. I am probably missing the context in which fallocate is used, if it
has overloaded semantics.
-Thanks,
Bhagi.
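
The comparison with a size-extending write can be observed from user space.
The sketch below assumes a POSIX system and only shows the visible result
(the gap between the old EOF and the new write reads as zeros); it says
nothing about how xfs_zero_eof or any other kernel path achieves that. The
function name is hypothetical.

```python
import os
import tempfile


def gap_after_extending_write(old_len=1024, new_off=8192):
    """Write old_len bytes, then write past EOF at new_off; return the gap contents."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\xcd" * old_len)
        # Size-extending write that starts beyond the old end of file.
        os.pwrite(fd, b"\xab" * 512, new_off)
        # Read everything between the old EOF and the start of the new write.
        return os.pread(fd, new_off - old_len, old_len)
    finally:
        os.close(fd)
        os.unlink(path)


gap = gap_after_extending_write()
```

The question raised in this message is whether a size-changing fallocate gives
the same result for the partial block at the old EOF.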
* Re: TAKE 972756 - Implement fallocate.
From: nscott @ 2007-11-07 9:35 UTC (permalink / raw)
To: Bhagi rathi; +Cc: David Chinner, xfs
> Since the size log change and the data I/O are not bound together, it is
> always possible for the size to reach disk before the data I/O does.
Not clear what that has to do with whether partial blocks are zeroed or not?
Can you give a specific series of steps that would demonstrate a problem?
(preferably with a test case)
> The other problem comes from speculative allocation. Write-back can
> convert delayed-allocation extents into real extents, and these are pruned
> only at close of the file. If fallocate arrives before that, it allocates
> extents, but the extents left behind by delayed-allocation write-back will
> not have zeroed content.
Again, I think a test case demonstrating the problem would go a long way
to helping explain the issue.
The preallocation code and ioctl interface have been in XFS forever on
Linux - are you reporting problems you've actually observed here, or
are these rather "potential issues" that you foresee from code analysis?
cheers.
--
Nathan
* Re: TAKE 972756 - Implement fallocate.
From: Eric Sandeen @ 2007-12-10 17:50 UTC (permalink / raw)
To: David Chinner; +Cc: xfs
David Chinner wrote:
> Implement fallocate.
>
> Implement the new generic callout for file preallocation.
> Atomically change the file size if requested.
>
>
> Date: Fri Nov 2 13:42:52 AEDT 2007
> Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs
> Inspected by: hch@infradead.org
>
> The following file(s) were checked into:
> longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb
>
>
> Modid: xfs-linux-melb:xfs-kern:30009a
> fs/xfs/linux-2.6/xfs_iops.c - 1.268 - changed
> http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-linux/linux-2.6/xfs_iops.c.diff?r1=text&tr1=1.268&r2=text&tr2=1.267&f=h
> - implement ->fallocate()
>
>
>
Is this ever going to go upstream...?
-eric