Re: ext4 64bit (disk >16TB) question

public inbox for linux-ext4@vger.kernel.org

From: Ric Wheeler <rwheeler@redhat.com>
To: Bernd Schubert <bs@q-leap.de>
Cc: Goswin von Brederlow <goswin-v-b@web.de>, linux-ext4@vger.kernel.org
Subject: Re: ext4 64bit (disk >16TB) question
Date: Tue, 15 Jul 2008 10:08:42 -0400
Message-ID: <487CAF6A.8070403@redhat.com>
In-Reply-To: <200807151601.20881.bs@q-leap.de>

Bernd Schubert wrote:
> On Tuesday 15 July 2008 15:16:33 Ric Wheeler wrote:
>   
>> Goswin von Brederlow wrote:
>>     
>>> Theodore Tso <tytso@mit.edu> writes:
>>>       
>>>> On Mon, Jul 14, 2008 at 09:50:56PM +0200, Goswin von Brederlow wrote:
>>>>         
>>>>> I found ext4 64bit patches for e2fsprogs 1.39 that fix at least
>>>>> mkfs. Does anyone know if there is an updated patch set for 1.41
>>>>> anywhere? And when will that be added to e2fsprogs upstream?
>>>>>           
>>>> Yes, this is correct.  The 1.39 64-bit patches break the shared
>>>> library ABI, and also there were some long-term problems with having
>>>> super-large bitmaps taking huge amounts of memory without some kind of
>>>> run-length encoding or other compression technique.  I decided to
>>>> reject the 1.39 approach because it would have caused short- and
>>>> long-term maintenance issues.
>>>>         
>>> Is that a problem for the kernel or for user space? I noticed that
>>> mke2fs 1.39 used over a gigabyte of memory to format a >16TiB disk. While
>>> that is a lot, it is not really a problem here.
>>>
>>>       
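To put the memory figures above in perspective: an uncompressed block bitmap costs one bit per block, so a filesystem of 2^32 4KiB blocks (16TiB) needs 512MiB for the block bitmap alone. The Python sketch below compares that with a run-length (extent) representation that stores only the start and end of each allocated run. It is a generic illustration of the technique mentioned in the thread, assuming mostly contiguous allocation; `RunBitmap` and `flat_bitmap_bytes` are hypothetical names, not e2fsprogs APIs, and this is not necessarily the design e2fsprogs adopted.

```python
from bisect import bisect_left, bisect_right


def flat_bitmap_bytes(nblocks: int) -> int:
    """Bytes needed by an uncompressed bitmap: one bit per block, rounded up."""
    return (nblocks + 7) // 8


class RunBitmap:
    """Hypothetical run-length bitmap: allocated blocks are kept as sorted,
    disjoint, non-adjacent half-open runs [start, end)."""

    def __init__(self) -> None:
        self.starts: list[int] = []  # first block of each run
        self.ends: list[int] = []    # one past the last block of each run

    def mark_range(self, start: int, end: int) -> None:
        """Mark blocks [start, end) in use, merging overlapping or touching runs."""
        if start >= end:
            return
        # Runs before index i end strictly before `start`, so they cannot touch it.
        i = bisect_left(self.ends, start)
        j = i
        new_start, new_end = start, end
        # Absorb every run that starts at or before `end` (overlap or adjacency).
        while j < len(self.starts) and self.starts[j] <= end:
            new_start = min(new_start, self.starts[j])
            new_end = max(new_end, self.ends[j])
            j += 1
        self.starts[i:j] = [new_start]
        self.ends[i:j] = [new_end]

    def test(self, block: int) -> bool:
        """Return True if `block` lies inside an allocated run."""
        i = bisect_right(self.starts, block) - 1
        return i >= 0 and block < self.ends[i]

    def run_count(self) -> int:
        return len(self.starts)


if __name__ == "__main__":
    nblocks = 1 << 32  # 16TiB worth of 4KiB blocks
    print(flat_bitmap_bytes(nblocks) // (1 << 20), "MiB for a flat bitmap")
    bm = RunBitmap()
    bm.mark_range(0, nblocks // 2)                  # first half allocated contiguously
    bm.mark_range(nblocks // 2, nblocks // 2 + 10)  # adjacent run merges into it
    print(bm.run_count(), "run(s) in the run-length form")
```

The run-length form only wins when allocation is largely contiguous; a heavily fragmented bitmap produces many runs and can end up larger than the flat array, so a real implementation would likely need a fallback to a plain bitmap.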
>>>> At the moment 1.41 does not support > 32 bit block numbers.  The
>>>> priority was to get something which supported all of the other ext4
>>>> features out the door, since that would allow much better testing of
>>>> the ext4 code base.  We are now working on 64-bit support in
>>>> e2fsprogs, with mke2fs coming first, and the other tools coming later.
>>>> But yeah, good quality 64-bit e2fsprogs support is going to lag for a
>>>> bit.  Sorry, we're working as fast as we can, given the resources we
>>>> have.
>>>>         
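The 16TiB ceiling in the subject line follows directly from 32-bit block numbers: the largest addressable filesystem is 2^32 blocks times the block size. A small Python check of that arithmetic, assuming the common 4KiB block size (1KiB shown for comparison); `max_fs_bytes` is a hypothetical helper:

```python
TIB = 1 << 40  # bytes per tebibyte


def max_fs_bytes(block_number_bits: int, block_size: int) -> int:
    """Largest filesystem addressable with block numbers of the given width."""
    return (1 << block_number_bits) * block_size


if __name__ == "__main__":
    for bs in (1024, 4096):
        # 32-bit block numbers: 2^32 blocks of `bs` bytes each
        print(f"{bs}-byte blocks: {max_fs_bytes(32, bs) // TIB} TiB")
```

With 4KiB blocks this gives 16TiB (2^44 bytes, roughly 17.6TB in decimal units), so any larger volume needs block numbers wider than 32 bits.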
>>> Will there be filesystem changes as well? The above-mentioned
>>> run-length encoding sounds a bit like a new bitmap format, or is that
>>> only supposed to be the in-memory format in userspace?
>>>
>>> What is the plan for adding 64-bit support to the shared lib now?
>>> Will you introduce a do_foo64() function in parallel to do_foo() to
>>> maintain ABI compatibility? Will you add versioned symbols? Or will
>>> there be an ABI break at some point?
>>>
>>> The reason I ask all this is because I'm willing to spend some time
>>> patching and testing. A single >16TiB filesystem instead of multiple
>>> smaller ones would be a great benefit for us.
>>>       
>> Can you give us any details about your use case? Is it hundreds of very
>> large files, or 100 million little ones?
>>     
>
> It depends on our customers. Lustre is rather slow for small files, and we
> try to inform our customers about that. On the other hand, there are also
> no cluster filesystem choices for small files.
>   

Thanks - so this is not an internal application, but hosting for various 
workloads? We have different scalability issues depending on the nature 
and mix of file sizes, etc.

>   
>> Any interesting hardware in the mix on the storage or server side?
>>     
>
> What exactly do you want to know? Usually we have a server pair and Infortrend
> RAID units. Since Lustre doesn't do any redundancy on its own, we usually
> also have a RAID1, RAID5 or RAID6 built from several RAID units.
>   

One thing that we have been working on/thinking about is how best to
automatically self-tune a file system to the storage. Today, XFS is
probably the best general-purpose Linux file system at figuring out RAID
stripe size, etc. Getting this enhanced in ext4 could lead to a significant
performance win for users who are not masters of performance tuning.
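Until that tuning is automatic, the RAID geometry can be passed to mke2fs by hand through its `-E stride=...,stripe_width=...` extended options: stride is the RAID chunk size in filesystem blocks, and stripe_width is stride times the number of data-bearing disks. The Python sketch below does that arithmetic. It assumes the chunk size and member count are already known from the RAID controller (it does not probe hardware), handles only RAID0/5/6, and `ext4_raid_hints` is a hypothetical helper, not part of e2fsprogs.

```python
PARITY_DISKS = {0: 0, 5: 1, 6: 2}  # RAID level -> parity members per stripe


def ext4_raid_hints(chunk_bytes: int, block_bytes: int, ndisks: int, level: int) -> tuple[int, int]:
    """Return (stride, stripe_width) in filesystem blocks for one striped array."""
    if level not in PARITY_DISKS:
        raise ValueError(f"unsupported RAID level {level}")
    if chunk_bytes % block_bytes:
        raise ValueError("chunk size must be a multiple of the block size")
    data_disks = ndisks - PARITY_DISKS[level]
    if data_disks < 1:
        raise ValueError("array has no data-bearing disks")
    stride = chunk_bytes // block_bytes   # blocks written to one disk before moving on
    return stride, stride * data_disks    # blocks per full data stripe


def mke2fs_extended_opts(stride: int, stripe_width: int) -> str:
    """Format the values as mke2fs extended options."""
    return f"-E stride={stride},stripe_width={stripe_width}"


if __name__ == "__main__":
    # Example: 8-disk RAID6 with 64KiB chunks and 4KiB filesystem blocks.
    s, w = ext4_raid_hints(64 * 1024, 4096, 8, 6)
    print(mke2fs_extended_opts(s, w))  # -E stride=16,stripe_width=96
```

Nested setups, such as a software RAID across hardware RAID units, make the right values less obvious; this sketch only covers a single striped layer.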

How long would you wait for something like fsck to run to completion
before you would need to go to backup tapes? 6 hours? 1 day? 1 week ;-) ?

> For ease of management and optimal performance, we need single partitions
> larger than 8TiB (RAID1) or 16TiB (RAID5 or RAID6), and the present 8TiB
> limit really bites us.
>
>
> Cheers,
> Bernd
>   

Makes sense, thanks for the information!

Regards,

Ric


Thread overview: 16+ messages
2008-07-14 19:50 ext4 64bit (disk >16TB) question Goswin von Brederlow
2008-07-14 23:46 ` Theodore Tso
2008-07-15  5:42   ` Goswin von Brederlow
2008-07-15 12:36     ` Theodore Tso
2008-07-15 17:00       ` Goswin von Brederlow
2008-07-15 17:19         ` Theodore Tso
2008-07-15 13:16     ` Ric Wheeler
2008-07-15 14:01       ` Bernd Schubert
2008-07-15 14:08         ` Ric Wheeler [this message]
2008-07-15 16:13           ` Goswin von Brederlow
2008-07-15 18:27 ` Jose R. Santos
2008-07-15 20:12   ` Andreas Dilger
2008-07-15 20:15     ` Ric Wheeler
2008-07-15 21:03       ` Goswin von Brederlow
2008-07-15 21:20     ` Jose R. Santos
2008-07-16 10:10       ` Goswin von Brederlow
