* ReiserFS Maximum file size (in practice)
@ 2004-05-18 20:40 Jeff Mahoney
2004-05-19 0:45 ` Chris Mason
2004-05-19 5:39 ` Hans Reiser
0 siblings, 2 replies; 7+ messages in thread
From: Jeff Mahoney @ 2004-05-18 20:40 UTC (permalink / raw)
To: Reiserfs mail-list
[-- Attachment #1: Type: text/plain, Size: 764 bytes --]
Hey all -
The ReiserFS FAQ that we quote and point people to when they ask
questions about limits in ReiserFS states that the maximum file size
for a reiserfs v3 filesystem is 2^60-1. However, the actual limits, in
practice, are far less.
I tried to create a 3 TB sparse file, and ended up getting told it was
too large. 2 TB was too large also, just under 2 TB was ok.
This is a result of s->s_maxbytes = (512LL << 32) - s->s_blocksize; in
fs/reiserfs/super.c, which is set so that i_blocks doesn't overflow.
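The arithmetic behind that limit can be sketched in plain C. The helper
name below is made up for illustration; only the expression itself comes
from super.c. i_blocks counts 512-byte units, so a 32-bit i_blocks can
describe 512 << 32 = 2^41 bytes (2 TiB), and the old code stops one
filesystem block short of that.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the kernel): the old s_maxbytes value for a
 * given filesystem block size. 512LL << 32 is the number of bytes that a
 * 32-bit count of 512-byte units can describe, i.e. 2^41 bytes = 2 TiB.
 */
static int64_t old_reiserfs_maxbytes(uint32_t blocksize)
{
    return (512LL << 32) - blocksize;
}
```

With a 4096-byte block size this gives 2199023251456 bytes, which is
consistent with 2 TB being rejected while just under 2 TB is accepted.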
Other filesystems that have the ability to cross the 2 TB limit on file
sizes simply ignore the limit and allow i_blocks to wrap. There's really
no reason we can't do the same.
The patch is attached.
-Jeff
--
Jeff Mahoney
SuSE Labs
[-- Attachment #2: reiserfs-large-file.diff --]
[-- Type: text/plain, Size: 402 bytes --]
--- linux-2.6.5/fs/reiserfs/super.c 2004-05-14 15:32:49.000000000 -0400
+++ linux-2.6.5.fix/fs/reiserfs/super.c 2004-05-18 12:07:25.000000000 -0400
@@ -1204,7 +1204,7 @@
/* new format is limited by the 32 bit wide i_blocks field, want to
** be one full block below that.
*/
- s->s_maxbytes = (512LL << 32) - s->s_blocksize ;
+ s->s_maxbytes = MAX_LFS_FILESIZE;
return 0;
}
* Re: ReiserFS Maximum file size (in practice)
2004-05-18 20:40 ReiserFS Maximum file size (in practice) Jeff Mahoney
@ 2004-05-19 0:45 ` Chris Mason
2004-05-19 8:33 ` Alex Zarochentsev
2004-05-19 12:13 ` Jeffrey Mahoney
2004-05-19 5:39 ` Hans Reiser
1 sibling, 2 replies; 7+ messages in thread
From: Chris Mason @ 2004-05-19 0:45 UTC (permalink / raw)
To: Jeff Mahoney; +Cc: Reiserfs mail-list
On Tue, 2004-05-18 at 16:40, Jeff Mahoney wrote:
> Hey all -
>
> The ReiserFS FAQ that we quote and point people to when they ask
> questions about limits in ReiserFS states that the maximum file size
> for a reiserfs v3 filesystem is 2^60-1. However, the actual limits, in
> practice, are far less.
>
> I tried to create a 3 TB sparse file, and ended up getting told it was
> too large. 2 TB was too large also, just under 2 TB was ok.
>
> This is a result of super->s_maxbytes = (512LL << 32) - s->s_blocksize;,
> in fs/reiserfs/super.c, which is set so that i_blocks isn't overflowed.
>
> Other filesystems that have the ability to cross the 2 TB limit on file
> sizes simply ignore the limit and allow i_blocks to wrap. There's really
> no reason we can't do the same.
>
> The patch is attached.
Are quotas happy with this?
-chris
* Re: ReiserFS Maximum file size (in practice)
2004-05-18 20:40 ReiserFS Maximum file size (in practice) Jeff Mahoney
2004-05-19 0:45 ` Chris Mason
@ 2004-05-19 5:39 ` Hans Reiser
1 sibling, 0 replies; 7+ messages in thread
From: Hans Reiser @ 2004-05-19 5:39 UTC (permalink / raw)
To: Jeff Mahoney; +Cc: Reiserfs mail-list, Vladimir Saveliev
Jeff Mahoney wrote:
>
> Hey all -
>
> The ReiserFS FAQ that we quote and point people to when they ask
> questions about limits in ReiserFS states that the maximum file size
> for a reiserfs v3 filesystem is 2^60-1. However, the actual limits, in
> practice, are far less.
>
> I tried to create a 3 TB sparse file, and ended up getting told it was
> too large. 2 TB was too large also, just under 2 TB was ok.
>
> This is a result of super->s_maxbytes = (512LL << 32) -
> s->s_blocksize;, in fs/reiserfs/super.c, which is set so that i_blocks
> isn't overflowed.
if vs approves it, I do.
>
> Other filesystems that have the ability to cross the 2 TB limit on
> file sizes simply ignore the limit and allow i_blocks to wrap. There's
> really no reason we can't do the same.
>
> The patch is attached.
>
> -Jeff
>
>------------------------------------------------------------------------
>
>--- linux-2.6.5/fs/reiserfs/super.c 2004-05-14 15:32:49.000000000 -0400
>+++ linux-2.6.5.fix/fs/reiserfs/super.c 2004-05-18 12:07:25.000000000 -0400
>@@ -1204,7 +1204,7 @@
> /* new format is limited by the 32 bit wide i_blocks field, want to
> ** be one full block below that.
> */
>- s->s_maxbytes = (512LL << 32) - s->s_blocksize ;
>+ s->s_maxbytes = MAX_LFS_FILESIZE;
> return 0;
> }
>
>
>
* Re: ReiserFS Maximum file size (in practice)
2004-05-19 0:45 ` Chris Mason
@ 2004-05-19 8:33 ` Alex Zarochentsev
2004-05-19 12:13 ` Jeffrey Mahoney
1 sibling, 0 replies; 7+ messages in thread
From: Alex Zarochentsev @ 2004-05-19 8:33 UTC (permalink / raw)
To: Chris Mason; +Cc: Jeff Mahoney, Reiserfs mail-list
On Tue, May 18, 2004 at 08:45:07PM -0400, Chris Mason wrote:
> On Tue, 2004-05-18 at 16:40, Jeff Mahoney wrote:
> > Hey all -
> >
> > The ReiserFS FAQ that we quote and point people to when they ask
> > questions about limits in ReiserFS states that the maximum file size
> > for a reiserfs v3 filesystem is 2^60-1. However, the actual limits, in
> > practice, are far less.
> >
> > I tried to create a 3 TB sparse file, and ended up getting told it was
> > too large. 2 TB was too large also, just under 2 TB was ok.
> >
> > This is a result of super->s_maxbytes = (512LL << 32) - s->s_blocksize;,
> > in fs/reiserfs/super.c, which is set so that i_blocks isn't overflowed.
> >
> > Other filesystems that have the ability to cross the 2 TB limit on file
> > sizes simply ignore the limit and allow i_blocks to wrap. There's really
> > no reason we can't do the same.
> >
> > The patch is attached.
>
> Are quotas happy with this?
quota database initialization is affected, I guess.
> -chris
--
Alex.
* Re: ReiserFS Maximum file size (in practice)
2004-05-19 0:45 ` Chris Mason
2004-05-19 8:33 ` Alex Zarochentsev
@ 2004-05-19 12:13 ` Jeffrey Mahoney
2004-05-19 12:57 ` Chris Mason
1 sibling, 1 reply; 7+ messages in thread
From: Jeffrey Mahoney @ 2004-05-19 12:13 UTC (permalink / raw)
To: Chris Mason; +Cc: Reiserfs mail-list
Chris Mason wrote:
> On Tue, 2004-05-18 at 16:40, Jeff Mahoney wrote:
>
>>Hey all -
>>
>>The ReiserFS FAQ that we quote and point people to when they ask
>>questions about limits in ReiserFS states that the maximum file size
>>for a reiserfs v3 filesystem is 2^60-1. However, the actual limits, in
>>practice, are far less.
>>
>>I tried to create a 3 TB sparse file, and ended up getting told it was
>>too large. 2 TB was too large also, just under 2 TB was ok.
>>
>>This is a result of super->s_maxbytes = (512LL << 32) - s->s_blocksize;,
>>in fs/reiserfs/super.c, which is set so that i_blocks isn't overflowed.
>>
>>Other filesystems that have the ability to cross the 2 TB limit on file
>>sizes simply ignore the limit and allow i_blocks to wrap. There's really
>>no reason we can't do the same.
>>
>>The patch is attached.
>
>
> Are quotas happy with this?
They should be, but aren't. The data structures support it. The on-disk
quota format doesn't keep track of blocks, it keeps track of bytes. The
DQUOT* macros do the translation based on the blocksize in the super.
The value to track bytes is a u64, so it should track the increased
maximum just fine. As expected, the places where it may get sticky are
when i_blocks/i_bytes are referenced. There are several places in the
quota code where this is done, but most are just keeping them in line
with its view of the world. The values will be appropriately wrong as
they are elsewhere after they wrap.
There is one important place where they're accessed though, and that's
in dquot_transfer. inode_get_bytes() is used to obtain the number of
bytes to transfer from one quota to another, and if wrapped, will
contain a wrong value and the wrong transfer will occur. The
infrastructure allows the dq_op->transfer() call to be overridden, but
I'm not too wild about copying that function wholesale just to change
one line to be something reiserfs specific.
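
To make the dquot_transfer concern concrete, here is a small userspace
model of the wrap. It assumes inode_get_bytes() amounts to
i_blocks * 512 + i_bytes and that i_blocks is 32 bits wide, as on a
32-bit system. The struct and function names are stand-ins, not kernel
code.

```c
#include <stdint.h>

/* Stand-in for the relevant part of struct inode on a 32-bit system. */
struct model_inode {
    uint32_t i_blocks;  /* 512-byte units; wraps at 2^32 */
    uint16_t i_bytes;   /* remainder below 512 bytes */
};

/* Record a file's byte usage the way a 32-bit i_blocks would hold it. */
static void model_set_bytes(struct model_inode *inode, uint64_t bytes)
{
    inode->i_blocks = (uint32_t)(bytes >> 9);   /* high bits are lost here */
    inode->i_bytes = (uint16_t)(bytes & 511);
}

/* Model of inode_get_bytes(): what the quota code would see afterwards. */
static uint64_t model_get_bytes(const struct model_inode *inode)
{
    return ((uint64_t)inode->i_blocks << 9) + inode->i_bytes;
}
```

For a fully allocated 3 TiB file, model_get_bytes() reports 1 TiB, so a
transfer between quotas would move 1 TiB of usage instead of 3 TiB.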
It seems to me that while we're not allowed to change the st_blocks
exported to userspace, we're allowed to change struct inode to reflect
the increased block count available in 2.6. We should at least be
tracking it correctly internally. Was it simply overlooked when the max
file size was increased for the entire system? Currently, i_blocks is an
unsigned long which makes it work fine on 64-bit systems.
i_blocks/i_bytes are protected by inode->i_lock, so making 32-bit
systems use a 64-bit value for i_blocks doesn't introduce any atomicity
issues. Other filesystems are wrapping i_blocks as well, but for various
reasons, we're just the first to see the quota problem. (JFS doesn't yet
support quotas, XFS uses its own implementation)
The question is whether being correct on a case that gets truncated when
exported to userspace anyway is worth adding 4 bytes to every inode,
when there is a workable option to get around the shortcomings.
The only other limit where the data structures would allow an overflow
is the 16 TB limit imposed by the maximum filesystem size.
stat_data->sd_blocks is 32-bit, but unless blocks-in-file <=
blocks-in-filesystem ceases to be an invariant, this will never happen. ;)
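
The 16 TB figure follows from the same kind of arithmetic, assuming a
32-bit count of filesystem blocks and a 4096-byte block size (the block
size is an assumption here, not something stated above). A quick check,
with a hypothetical helper name:

```c
#include <stdint.h>

/* Bytes that a 32-bit count of filesystem blocks can describe. */
static uint64_t bytes_covered_by_32bit_blocks(uint32_t blocksize)
{
    return ((uint64_t)1 << 32) * blocksize;
}
```

With 4096-byte blocks this is 16 TiB; with 512-byte units, as used by
i_blocks, the same 32-bit count only reaches 2 TiB.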
-Jeff
--
Jeff Mahoney
SuSE Labs
jeffm@suse.com
* Re: ReiserFS Maximum file size (in practice)
2004-05-19 12:13 ` Jeffrey Mahoney
@ 2004-05-19 12:57 ` Chris Mason
2004-05-19 20:04 ` Jeff Mahoney
0 siblings, 1 reply; 7+ messages in thread
From: Chris Mason @ 2004-05-19 12:57 UTC (permalink / raw)
To: Jeffrey Mahoney; +Cc: Reiserfs mail-list
On Wed, 2004-05-19 at 08:13, Jeffrey Mahoney wrote:
> Chris Mason wrote:
> > On Tue, 2004-05-18 at 16:40, Jeff Mahoney wrote:
> >
> >>Hey all -
> >>
> >>The ReiserFS FAQ that we quote and point people to when they ask
> >>questions about limits in ReiserFS states that the maximum file size
> >>for a reiserfs v3 filesystem is 2^60-1. However, the actual limits, in
> >>practice, are far less.
> >>
> >>I tried to create a 3 TB sparse file, and ended up getting told it was
> >>too large. 2 TB was too large also, just under 2 TB was ok.
> >>
> >>This is a result of super->s_maxbytes = (512LL << 32) - s->s_blocksize;,
> >>in fs/reiserfs/super.c, which is set so that i_blocks isn't overflowed.
> >>
> >>Other filesystems that have the ability to cross the 2 TB limit on file
> >>sizes simply ignore the limit and allow i_blocks to wrap. There's really
> >>no reason we can't do the same.
> >>
> >>The patch is attached.
> >
> >
> > Are quotas happy with this?
>
> They should be, but aren't. The data structures support it. The on-disk
> quota format doesn't keep track of blocks, it keeps track of bytes. The
> DQUOT* macros do the translation based on the blocksize in the super.
> The value to track bytes is a u64, so it should track the increased
> maximum just fine. As expected, the places where it may get sticky are
> when i_blocks/i_bytes are referenced. There are several places in the
> quota code where this is done, but most are just keeping them in line
> with its view of the world. The values will be appropriately wrong as
> they are elsewhere after they wrap.
>
> There is one important place where they're accessed though, and that's
> in dquot_transfer. inode_get_bytes() is used to obtain the number of
> bytes to transfer from one quota to another, and if wrapped, will
> contain a wrong value and the wrong transfer will occur. The
> infrastructure allows the dq_op->transfer() call to be overridden, but
> I'm not too wild about copying that function wholesale just to change
> one line to be something reiserfs specific.
>
We can probably talk Jan Kara into fixing dquot_transfer, and probably
talk Andrew Morton into allowing a resolution on the inode->i_bytes
mess. ext2/3 suffer from the same problem (read ext[23]_max_size).
> It seems to me that while we're not allowed to change the st_blocks
> exported to userspace, we're allowed to change struct inode to reflect
> the increased block count available in 2.6. We should at least be
> tracking it correctly internally. Was it simply overlooked when the max
> file size was increased for the entire system? Currently, i_blocks is an
> unsigned long which makes it work fine on 64-bit systems.
> i_blocks/i_bytes are protected by inode->i_lock, so making 32-bit
> systems use a 64-bit value for i_blocks doesn't introduce any atomicity
> issues. Other filesystems are wrapping i_blocks as well, but for various
> reasons, we're just the first to see the quota problem. (JFS doesn't yet
> support quotas, XFS uses its own implementation)
>
> The question is if being correct on a case that gets truncated when
> exported to userspace anyway is worth adding 4 bytes to every inode,
> when there is a workable option to get around the shortcomings.
>
I think I'd rather just fix the quota problem right now, it's a smaller
change.
-chris
* Re: ReiserFS Maximum file size (in practice)
2004-05-19 12:57 ` Chris Mason
@ 2004-05-19 20:04 ` Jeff Mahoney
0 siblings, 0 replies; 7+ messages in thread
From: Jeff Mahoney @ 2004-05-19 20:04 UTC (permalink / raw)
To: Chris Mason; +Cc: Reiserfs mail-list
Chris Mason wrote:
> We can probably talk Jan Kara into fixing dquot_transfer, and probably
> talk Andrew Morton into allowing a resolution on the inode->i_bytes
> mess. ext2/3 suffer from the same problem (read ext[23]_max_size).
The comment on ext[23]_max_size doesn't only refer to the in-memory
i_blocks, but to the on-disk one as well. Unlike ReiserFS, Ext[23]
doesn't track blocks as a multiple of the filesystem block size. It
tracks it as a multiple of 512B blocks, just like the in memory one.
(see ext[23]_read_inode and ext[23]_do_update_inode; There's no
blocksize translation)
ReiserFS is artificially limited by the VFS layer (and in its own use of
i_blocks to update stat_data->sd_blocks); Ext[23] is limited by its own
disk format. Other than its i_blocks, ext[23]'s triple indirection
should be able to support files up to ~ 4TB.
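
The ~4 TB figure can be reproduced with a rough calculation, assuming
the classic ext2 layout of 12 direct block pointers plus single, double,
and triple indirect blocks, with 4-byte block pointers. The function
name is illustrative only.

```c
#include <stdint.h>

/*
 * Largest file, in bytes, reachable through 12 direct pointers plus
 * single, double, and triple indirect blocks with 4-byte block pointers.
 */
static uint64_t indirect_scheme_max_bytes(uint32_t blocksize)
{
    uint64_t ptrs = blocksize / 4;      /* pointers per indirect block */
    uint64_t blocks = 12 + ptrs + ptrs * ptrs + ptrs * ptrs * ptrs;

    return blocks * blocksize;
}
```

With 4096-byte blocks this comes to a little over 4 TiB, whereas a
32-bit count of 512-byte units stops at 2 TiB.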
> I think I'd rather just fix the quota problem right now, it's a smaller
> change.
I think there's going to be a decent amount of work in either case.
Definitely more so than the patch I posted earlier. There are two scenarios:
1) Fix just reiserfs. This means adding a sd_blocks field to the
in-memory inode, increasing every reiserfs inode by 4 bytes. It also
means that every time a quota operation occurs, we need to follow it
with an update to our internal block count as well. It's easy enough to
do, just not pretty.
2) Fix the VFS layer. Fixing the VFS layer is simple, but there are
filesystems that assume i_blocks is 32-bit already and they'd need to be
cleaned up also.
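
Scenario 1 could look roughly like the following userspace sketch. It
assumes a 4-byte private counter kept in filesystem-block units next to
the 32-bit i_blocks; every name here is hypothetical, and none of this
is existing reiserfs code.

```c
#include <stdint.h>

/* Hypothetical in-memory inode carrying both counters. */
struct sketch_inode {
    uint32_t i_blocks;   /* 512-byte units, as the VFS sees it; may wrap */
    uint32_t sd_blocks;  /* private count in filesystem blocks */
    uint32_t blocksize;  /* filesystem block size in bytes */
};

/* Account for newly allocated filesystem blocks in both counters. */
static void sketch_add_blocks(struct sketch_inode *inode, uint32_t nblocks)
{
    /* The 512-byte-unit count is truncated to 32 bits, modelling the wrap. */
    inode->i_blocks += (uint32_t)((uint64_t)nblocks * (inode->blocksize >> 9));
    inode->sd_blocks += nblocks;
}

/* Byte usage derived from the private counter, unaffected by the wrap. */
static uint64_t sketch_bytes(const struct sketch_inode *inode)
{
    return (uint64_t)inode->sd_blocks * inode->blocksize;
}
```

The point of the sketch is only that the private count stays correct
past 2 TiB while i_blocks wraps; where the second update would hook into
the quota calls is left open.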
Unfortunately, all this may be a moot point anyway. If a file is created
on a kernel supporting file sizes > 2 TB, and is then updated on a
previous kernel, wrapping will occur and cause breakage.
-Jeff
--
Jeff Mahoney
SuSE Labs