From: Andrew Morton <akpm@linux-foundation.org>
To: Neil Brown <neilb@suse.de>
Cc: linux-fsdevel@vger.kernel.org, linux-raid@vger.kernel.org,
dm-devel@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: How to handle >16TB devices on 32 bit hosts ??
Date: Tue, 21 Jul 2009 23:59:04 -0700
Message-ID: <20090721235904.42e6cd35.akpm@linux-foundation.org>
In-Reply-To: <19041.4714.686158.130252@notabene.brown>
On Sat, 18 Jul 2009 10:08:10 +1000 Neil Brown <neilb@suse.de> wrote:
> It has recently come to my attention that Linux on a 32 bit host does
> not handle devices beyond 16TB particularly well.
>
> In particular, any access that goes through the page cache for the
> block device is limited to a pgoff_t number of pages.
> As pgoff_t is "unsigned long" and hence 32bit, and as page size is
> 4096, this comes to 16TB total.
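
The 16TB figure quoted above follows directly from the index width and the page size. A minimal sketch of that arithmetic, assuming a 32-bit page index and 4096-byte pages as stated in the quote (the function and constant names are illustrative only, not kernel identifiers):

```python
PAGE_SIZE = 4096   # bytes per page, as stated in the quoted message
PGOFF_BITS = 32    # width of "unsigned long" on a 32-bit host


def max_page_cache_bytes(page_size=PAGE_SIZE, index_bits=PGOFF_BITS):
    """Bytes addressable when the page index has index_bits bits."""
    return (1 << index_bits) * page_size


limit = max_page_cache_bytes()
print(limit, "bytes =", limit // 2**40, "TiB")
# prints: 17592186044416 bytes = 16 TiB
```

Strictly speaking the limit is 2^44 bytes, i.e. 16 TiB, which the thread writes as 16TB.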

I expect that the VFS could be made to work with 64-bit pgoff_t fairly
easily. The generated code will be pretty damn sad.

Radix trees use a ulong index, so we would need a new
lib/radix_tree64.c or some other means of fixing that up.
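
To illustrate why a 32-bit index is a problem for any lookup structure keyed on it, here is a small model of what happens when a page index is squeezed into 32 bits. This is a sketch under stated assumptions: the mask stands in for a 32-bit unsigned long, and `page_index_32` is a hypothetical helper, not a kernel function.

```python
PAGE_SHIFT = 12                # 4096-byte pages
ULONG_MASK = (1 << 32) - 1     # models a 32-bit unsigned long


def page_index_32(byte_offset):
    """Page index as it would land in a 32-bit unsigned long (assumed model)."""
    return (byte_offset >> PAGE_SHIFT) & ULONG_MASK


low = 8192                     # third page of the device (index 2)
high = low + (1 << 44)         # the same position, 16 TiB further on

# Both offsets collapse onto the same 32-bit index, so a structure
# keyed on that index cannot tell the two pages apart.
assert page_index_32(low) == page_index_32(high) == 2
```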

The bigger problem is filesystems - they'll each need to be checked,
tested, fixed and enabled. It's probably not too bad for the
mainstream filesystems which mostly bounce their operations into VFS
library functions anyway.

There's perhaps a middle ground - support >16TB devices, but not >16TB
partitions. That way everything remains 32-bit and we just have to get
the offsetting right (probably already the case).
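
A rough model of the offsetting in that middle ground: the page index stays within 32 bits relative to the partition, and only the translation to an absolute device sector needs a wider type. The 512-byte sector unit, the names, and the range check are assumptions made for illustration and are not taken from the kernel source.

```python
SECTOR_SIZE = 512                            # assumed sector unit
PAGE_SIZE = 4096
SECTORS_PER_PAGE = PAGE_SIZE // SECTOR_SIZE  # 8
MAX_PAGES_32 = 1 << 32                       # pages a 32-bit index can address


def partition_page_to_sector(part_start_sector, page_index):
    """Map a partition-relative page index to an absolute device sector."""
    if not 0 <= page_index < MAX_PAGES_32:
        raise ValueError("page index does not fit in 32 bits")
    return part_start_sector + page_index * SECTORS_PER_PAGE


# A partition that starts 20 TiB into the device: the page index is tiny,
# but the resulting sector number no longer fits in 32 bits.
start = 20 * 2**40 // SECTOR_SIZE
sector = partition_page_to_sector(start, 1)
assert sector == 42949672968
assert sector > 2**32
```

The point of the sketch is that the 32-bit quantity never has to describe a position on the whole device, only a position inside one partition.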

So now /dev/sda1, /dev/sda2 etc are all <16TB. The remaining problem
is that /dev/sda is >16TB. I expect that we could arrange for the
kernel to error out if userspace tries to access /dev/sda beyond the
16TB point, and those very few applications which want to touch
that part of the disk will need to be written using direct-io (or
perhaps sgio) or run on 64-bit machines.
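
The "error out" idea for the whole-disk device can be sketched as a simple bounds check on buffered access. Everything here is hypothetical: the message does not say which error would be returned, so EFBIG is only a placeholder, and `check_buffered_access` is an illustrative name rather than an existing kernel or libc interface.

```python
import errno

PAGE_CACHE_LIMIT = 1 << 44     # 16 TiB: 2^32 pages of 4096 bytes


def check_buffered_access(offset, length, direct_io=False):
    """Return 0 if the access is allowed, else a negative errno value.

    Direct I/O is modelled as bypassing the page cache, so it is not
    subject to the 32-bit page index limit in this sketch.
    """
    if direct_io:
        return 0
    if offset + length > PAGE_CACHE_LIMIT:
        return -errno.EFBIG    # placeholder choice of error code
    return 0


assert check_buffered_access(PAGE_CACHE_LIMIT - 4096, 4096) == 0
assert check_buffered_access(PAGE_CACHE_LIMIT - 4096, 8192) == -errno.EFBIG
assert check_buffered_access(PAGE_CACHE_LIMIT, 4096, direct_io=True) == 0
```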