* 16 TB filesystem limit on 32bit machine
@ 2012-02-20 15:31 Rabeeh Khoury
2012-02-20 16:50 ` Eric Sandeen
From: Rabeeh Khoury @ 2012-02-20 15:31 UTC
To: linux-ext4
I'm trying to figure out all the issues with regard to >16TB filesystem
support on ARM (32-bit) machines.
Clearly this issue was hot a few years ago; part of the discussions -
https://bugzilla.kernel.org/show_bug.cgi?id=12556
http://www.redhat.com/archives/dm-devel/2009-July/msg00131.html
And there was Eric's patch that checks the size of pgoff_t and
refuses the mount accordingly.
Today, with 4TB hard drives on the market, having 5 of those on an
ARM machine is really common, so the ext4 limitation is becoming
reachable and requires attention.
What I'm trying to achieve is the following two items -
---- item #1 ---
Understand where the limitation really comes from. Is this an ext4
implementation limitation, or will 32-bit machines never work with >16TB
filesystems?
I understand that there is a 16TB file size limitation (2^32 * 4K page
size, so you won't be able to mmap() further than that point), but how
is that related to filesystem size?
Will a 64KB page size fix this issue (ARM supports 4KB and 64KB pages)?
Clearly memory fragmentation will be a cost here.
----- item #2 ---
Reproduce a failing scenario.
For now I've created a 24TB volume (thin provisioned) - RAID-0 on 3 x
loopback devices backed by 3 x truncated (sparse) 8TB files, for a total of 24TB
mkfs.ext4 /dev/md0 (e2fsprogs 1.42 - thanks for the >16TB support)
mount on a hacked kernel (#define pgoff_t unsigned long long, thus
making the filesystem mount check disappear)
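The sparse backing files for the loopback devices could be created along the following lines. This is only a minimal Python sketch of the file-creation step: the helper name `make_sparse_file`, the file names, and the scaled-down demo size are made up for illustration, and attaching the files to loop devices and assembling the RAID-0 array is not shown.

```python
import os
import tempfile


def make_sparse_file(path, size_bytes):
    """Create a file of size_bytes by extending it, without writing any data.

    On filesystems that support sparse files, the extended range typically
    occupies no disk blocks until it is written to.
    """
    with open(path, "wb") as f:
        f.truncate(size_bytes)
    return os.stat(path).st_size


# Scaled-down demo (assumption): 3 backing files of 8 MiB each,
# standing in for the 3 x 8TB files described in the message.
DEMO_SIZE = 8 * 1024 * 1024

with tempfile.TemporaryDirectory() as tmp:
    sizes = [
        make_sparse_file(os.path.join(tmp, f"disk{i}.img"), DEMO_SIZE)
        for i in range(3)
    ]
    print("total apparent size:", sum(sizes), "bytes")
```

For the real setup the size would be 8TB per file, which only works if the host filesystem allows files that large.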
The volume mounts OK; now how do I get into corruption? I don't have a
physical 24TB drive, so it would be best if there is a pinpointed test to
reproduce the issue.
Best regards,
Rabeeh
* Re: 16 TB filesystem limit on 32bit machine
2012-02-20 15:31 16 TB filesystem limit on 32bit machine Rabeeh Khoury
@ 2012-02-20 16:50 ` Eric Sandeen
2012-02-20 21:17 ` Phillip Susi
From: Eric Sandeen @ 2012-02-20 16:50 UTC
To: Rabeeh Khoury; +Cc: linux-ext4
On 2/20/12 9:31 AM, Rabeeh Khoury wrote:
> I'm trying to figure out all issues with regards >16TB filesystem
> support on ARM (32bit) machines.
> Clearly this issue was hot few years ago, part of the discussions -
>
> https://bugzilla.kernel.org/show_bug.cgi?id=12556
> http://www.redhat.com/archives/dm-devel/2009-July/msg00131.html
>
> And there was Eric's patch of checking length of pgoff_t and
> accordingly refuse mount.
>
> Now, today with 4TB hard drives in the market, having 5 of those on an
> ARM machine is really common and the ext4 limitation is becoming more
> reachable and requires attention.
It's not an ext4 limitation, though - it's a limitation of the pagecache.
With a 32-bit index into 4k pages, you can only address 16T in the
pagecache. XFS won't mount it either, for example.
> What i'm trying to achieve is the following two items -
>
> ---- item #1 ---
> Understand where the limitation is really coming from? Is this ext4
> implementation limitation or 32bit machines will never work with >16TB
> filesystems?
The latter, see above.
> I understand that there is a 16TB file size limitation (2^32*4K page
> size so you won't be able to mmap() further than that point) but how
> is that related to filesystem size?
fs metadata is mapped into an address space, IIRC, so can't be addressed
past 2^32 pages. Also, mkfs can't do buffered IO to the device past
16T (it is writing to a device _file_) and ditto for e2fsck.
> Will 64KB page size fix this issue (ARM supports 4KB and 64KB pages) -
> clearly memory fragmentation will be a hit here.
If you can have 64k pages, I think you can address 2^32 * 64k.
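The arithmetic behind the 4k and 64k figures in this reply can be checked with a small Python sketch. The function name is made up for illustration, and the 24TB volume from the original question is assumed here to mean 24 TiB.

```python
KIB = 1024
TIB = 1024 ** 4


def pagecache_limit_bytes(page_size, index_bits=32):
    """Bytes addressable through a page index that is index_bits wide."""
    return (1 << index_bits) * page_size


limit_4k = pagecache_limit_bytes(4 * KIB)
limit_64k = pagecache_limit_bytes(64 * KIB)

print("4k pages: ", limit_4k // TIB, "TiB")    # 2^32 * 2^12 = 2^44 bytes
print("64k pages:", limit_64k // TIB, "TiB")   # 2^32 * 2^16 = 2^48 bytes

# Assumption: the 24TB test volume is treated as 24 TiB.
volume = 24 * TIB
print("24 TiB fits with 4k pages: ", volume <= limit_4k)
print("24 TiB fits with 64k pages:", volume <= limit_64k)
```

Under these assumptions the 24 TiB volume is out of reach with 4k pages and within reach with 64k pages, which matches the reasoning above.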
> ----- item #2 ---
> Reproduce a failing scenario.
> For now i'v created a 24TB volume (thin provisioned) - RAID-0 on a 3 x
> loopback on a 3 x truncted 8TB consisting total of 24TB volume
> mkfs.ext4 /dev/md0 (e2fsprogrs 1.42 - thanks for the >16TB support)
> mount on a hacked kernel (#define pgoff_t unsigned long long thus
> making filesystem mounting check disappear)
that's the other way to do it; pgoff_t was made a typedef just
for that reason, but someone would need to audit a ton of code
to be sure it's used consistently, and doesn't overflow anywhere,
before it can be made larger.
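To illustrate what "overflow" could mean here, the following Python sketch models a page index being stored in a variable that is too narrow. It is a toy model under stated assumptions (4k pages, plain truncation to the low bits), not a description of any specific kernel code path; the function name is hypothetical.

```python
PAGE_SHIFT = 12          # assumption: 4 KiB pages
TIB = 1 << 40


def page_index(byte_offset, index_bits):
    """Page index of byte_offset as held in an index_bits-wide variable.

    Truncation is modelled by masking off the high bits.
    """
    return (byte_offset >> PAGE_SHIFT) & ((1 << index_bits) - 1)


offset = 17 * TIB                      # just past the 16 TiB boundary
idx64 = page_index(offset, 64)         # full index, nothing lost
idx32 = page_index(offset, 32)         # high bits silently dropped
alias = idx32 << PAGE_SHIFT            # byte offset the 32-bit index points at

print("64-bit index:", idx64)
print("32-bit index:", idx32)
print("aliases to offset:", alias // TIB, "TiB")
```

In this model an access at 17 TiB lands on the page that belongs to 1 TiB, so any single code path that still narrows the index to 32 bits would read or write the wrong data rather than fail cleanly. That is why an audit would be needed before widening the type.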
Another thing to consider is whether you can successfully run e2fsck
on a very large filesystem on this box, even if you resolve the above
issues. Would you have the resources you need to fsck, say, a 32T fs
if^Wwhen something goes wrong?
-Eric
> The volume mounts ok; now how do i get into corruption? I don't have
> physical 24TB drive, so best if there is a pin-pointed to test to
> reproduce the issue.
>
> Best regards,
> Rabeeh
* Re: 16 TB filesystem limit on 32bit machine
2012-02-20 16:50 ` Eric Sandeen
@ 2012-02-20 21:17 ` Phillip Susi
2012-02-20 22:50 ` Eric Sandeen
From: Phillip Susi @ 2012-02-20 21:17 UTC
To: Eric Sandeen; +Cc: Rabeeh Khoury, linux-ext4
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 2/20/2012 11:50 AM, Eric Sandeen wrote:
> fs metadata is mapped into an address space, IIRC, so can't be
> addressed past 2^32 pages. Also, mkfs can't do buffered IO to the
> device past 16T (it is writing to a device _file_) and ditto for
> e2fsck.
But the file is only used for open(), after that the IO is handled by
the correct device driver, which handles 64 bit offsets ( when you
have CONFIG_LBDAF on ). So if you can change the page cache index
size ( and it doesn't blow up ) this should work fine.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iQEcBAEBAgAGBQJPQrhrAAoJEJrBOlT6nu75yX8IAKNmJv0tH2CMWefUFV/4un0g
NKU5L2uwuMmyAzVRUOv165d1KAJkNc0z3OLUyh3425yku6wQ4dDqqnWSHCIZHJs7
/d1ltEP1g/vhpM5keAwwWHfC7N2KL+FRyqze8YAe6Qx5GGEWlUOqK2ALSxtuL2Il
TQ+lfLI9kAA5Ggtwf6mjjD/fKFTdJILQuZikxpJRfKasLg/4Y25LWdaPG8BOm78l
qPxkazpmIG8HVFczHwVWpuu+xY8nAmgDdZCTJBFoK9xmBUO0kjzl8JlAl4ScaXdi
v1793CSzev/2aIhEwXSK0HdI7e6KI4iT292/k4ryjfwJ+QX5L6eyfd4HlSVMM88=
=IOG0
-----END PGP SIGNATURE-----
* Re: 16 TB filesystem limit on 32bit machine
2012-02-20 21:17 ` Phillip Susi
@ 2012-02-20 22:50 ` Eric Sandeen
From: Eric Sandeen @ 2012-02-20 22:50 UTC
To: Phillip Susi; +Cc: Rabeeh Khoury, linux-ext4
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 2/20/12 3:17 PM, Phillip Susi wrote:
> On 2/20/2012 11:50 AM, Eric Sandeen wrote:
>> fs metadata is mapped into an address space, IIRC, so can't be
>> addressed past 2^32 pages. Also, mkfs can't do buffered IO to the
>> device past 16T (it is writing to a device _file_) and ditto for
>> e2fsck.
>
> But the file is only used for open(), after that the IO is handled by
> the correct device driver, which handles 64 bit offsets ( when you
> have CONFIG_LBDAF on ). So if you can change the page cache index
> size ( and it doesn't blow up ) this should work fine.
Oh, sure - if you change pgoff_t to 64 bits, but until then, you
can't even mkfs.ext4 a device larger than 16T; there is nowhere
for that buffered IO to go in the page cache, right?
(You could mkfs.ext4 a 16T partition on a 40T lun today, thanks
to LBDAF, but not a 17T without changing pgoff_t et al)
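The distinction between the two limits (sector addressing on the LUN versus page cache addressing on the partition being formatted) can be sketched in Python as follows. This is a simplified model with assumed 512-byte sectors and 4k pages; the function names are hypothetical, and the boundary behaviour of the real kernel checks may differ.

```python
SECTOR_SIZE = 512        # assumption: 512-byte sectors
PAGE_SIZE = 4096         # assumption: 4 KiB pages
TIB = 1 << 40


def sectors_addressable(device_bytes, sector_bits):
    """True if the last sector number fits in sector_bits bits."""
    last_sector = (device_bytes - 1) // SECTOR_SIZE
    return last_sector < (1 << sector_bits)


def pages_addressable(device_bytes, index_bits=32):
    """True if the last page index fits in index_bits bits."""
    last_page = (device_bytes - 1) // PAGE_SIZE
    return last_page < (1 << index_bits)


# 40T LUN: reachable only with 64-bit sector numbers (the LBDAF case).
print("40T LUN, 32-bit sectors:", sectors_addressable(40 * TIB, 32))
print("40T LUN, 64-bit sectors:", sectors_addressable(40 * TIB, 64))

# Partition size is what has to fit behind a 32-bit page index.
print("16T partition, 32-bit page index:", pages_addressable(16 * TIB))
print("17T partition, 32-bit page index:", pages_addressable(17 * TIB))
```

In this model the 40T LUN needs 64-bit sector numbers, a 16T partition on it still fits behind a 32-bit page index, and a 17T partition does not.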
And "if you change the page cache index size" that's a big leap
of faith, today. Might be interesting work to do, but I'd bet
it's a real chunk of work.
- -Eric
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iQIcBAEBAgAGBQJPQs5IAAoJECCuFpLhPd7gFGcP/REp9fxQJEXP/JoV8ixcdPmi
Grs15shrEuaJDTnz120wOVS+HSyYU7EczdC1uL6dHFtxqSm/gpdRka5zabz2L46l
RWcPjEofLjR45PWJaGzkWQrmrL9tyJdgP83idUUPl9AlGLbK4jgpzgs9OB0tf639
7lf/a711cZT1G7fLg02ZHb88TGE5BltEQHtX1nk1k4srLgjFKRCi5Am+auXgu1ta
W0Q3a+oxPlkOiVcr499xmInsAhPHBuErtd7B/S7ViP7Cz+Bhbv25xcM77jwRHmtp
9kwkt2ntQ4v9dccmlqpIMElqQJQGKU1li2ySzmJTUbS8jzmBXG/kXtUEr1y50tNc
tm6kIPkMX0RkXRIOfri2jq4LBV0Nl1uGIqUEbUvtJDMh9s4tBtlKV0ZWJi9foIaW
OqMAiEhvgb5tpMZG9gjfCSxnMelUAMC9LrygRG04O26Q9vQMEDD57Ee77MOTOZId
1g01fxZzcmd+UXlfDWmHtgSjotSMyp6pV0lWro82qe1pKo7HwwCjIA2KeHu9owTa
zmzvzbopd0OoUUmXzqAx2kBdxhhNqoqB2AQXfDf8tmyckA8YY4KMgJhKjFSfMc54
ZDzvWuJI11WoBsia9LlAbcFq0Dchd64Mq6As/By9N6QBWV0LdhkikSTFGTzKL4D2
FrAh6QhIiRhkaGiEI5XG
=2bla
-----END PGP SIGNATURE-----