* Re: Questions for article
@ 2008-06-03 20:48 Thomas King
2008-06-03 22:00 ` Martin K. Petersen
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Thomas King @ 2008-06-03 20:48 UTC
To: xfs
>
>
> On Tue, 3 Jun 2008, Thomas King wrote:
>
>> I am writing an article to answer Henry Newman's at
>> http://www.enterprisestorageforum.com/sans/features/article.php/3749926. I've
>> already been bugging folks on the ext4 mailing list and one of them mentioned I
>> should also send some of the same questions to this list. Please let me know if
>> I may do so.
>>
>> Thanks!
>> Tom King
>>
>>
>
> What are the questions?
>
> Justin.
For the most part, XFS is used for massive filesystems (hundreds of petabytes)
successfully in Linux (among other OS's). However, Mr. Newman still believes
there are features that XFS lacks or that Linux limits (such as page sizes on
x86 limiting block sizes).
With that preface, here are some questions:
-Is XFS fully RAID aware in that it aligns metadata with RAID stripes? Some of
the information I see states XFS can get geometry information from LVM and MD,
but what about hardware RAID?
-Does XFS take advantage of T10 DIF (block protection?)?
-Does/Will XFS support NFS v4.1?
-Concerning the block-size limit, will this eventually be a thing of the past?
Mr. Newman's contention is massive filesystems should have much larger block
sizes, but he also contends that OSD is the eventual answer instead of using
block allocation.
-Is there anything else y'all would like folks to understand about XFS and
massive implementations?
Thanks!
Tom King
* Re: Questions for article
2008-06-03 20:48 Questions for article Thomas King
@ 2008-06-03 22:00 ` Martin K. Petersen
2008-06-03 22:14 ` Eric Sandeen
2008-06-04 5:31 ` Christoph Hellwig
2 siblings, 0 replies; 11+ messages in thread
From: Martin K. Petersen @ 2008-06-03 22:00 UTC
To: Thomas King; +Cc: xfs
>>>>> "Thomas" == Thomas King <kingttx@tomslinux.homelinux.org> writes:
Thomas> -Is XFS fully RAID aware in that it aligns metadata with RAID
Thomas> stripes? Some of the information I see states XFS can get
Thomas> geometry information from LVM and MD, but what about hardware
Thomas> RAID?
The stuff that queries MD/LVM for stripe unit/stripe size has been in
XFS for a while[1].
For hardware RAID there is no non-proprietary way to obtain the
information from the device. So whoever runs mkfs on a hardware RAID
device must manually specify the geometry using the sunit and swidth
parameters. That capability has been there since the dawn of time.
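As a rough illustration of what specifying the geometry by hand involves,
here is a minimal Python sketch that derives sunit and swidth from a RAID
layout. mkfs.xfs takes both values in 512-byte sectors. The helper name, the
64 KiB chunk size, the 8-data-disk layout and the device path are all
hypothetical; the real chunk size has to come from the array's own
configuration.

```python
SECTOR_BYTES = 512  # mkfs.xfs expresses sunit and swidth in 512-byte sectors


def stripe_geometry(chunk_kib, data_disks):
    """Return (sunit, swidth) in 512-byte sectors.

    chunk_kib  -- per-disk chunk (stripe unit) size in KiB
    data_disks -- disks holding data, i.e. excluding parity disks
    """
    if chunk_kib <= 0 or data_disks <= 0:
        raise ValueError("chunk size and data disk count must be positive")
    sunit = chunk_kib * 1024 // SECTOR_BYTES
    swidth = sunit * data_disks
    return sunit, swidth


# Hypothetical array: RAID-6 across 10 disks (8 data + 2 parity), 64 KiB chunks.
sunit, swidth = stripe_geometry(chunk_kib=64, data_disks=8)
print(f"mkfs.xfs -d sunit={sunit},swidth={swidth} /dev/sdX")
```

The point of the arithmetic is that swidth covers only the data disks, so a
full-stripe write lines up with what the controller actually stripes across.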
Note that the upcoming version of SBC-3 (SCSI Block Commands)
finally features a VPD page that the array firmware can fill out to
let the operating system know about stripe size, etc. I have been
working on a patch that extracts this information and presents it to
the block layer in a generic fashion. But so far I have not seen a
single array that implements said VPD page. IOW, there hasn't been
much motivation to finish that work.
Also, SBC-3 is a work in progress. The standard has not been ratified
yet so things could change before it is released. I doubt they are
going to change the block limits VPD, but who knows?
Thomas> -Does XFS take advantage of T10 DIF (block protection?)?
As I mentioned earlier today, filesystems do not need to be explicitly
DIF-aware. I/Os submitted by XFS will be protected if the kernel does
DIF.
The DIF support has not been accepted upstream yet. Working on that.
But in any case DIF-capable hardware is not generally available.
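To illustrate why the filesystem can stay unaware of all this, here is a
minimal Python sketch of the 8-byte protection tuple that DIF attaches to each
512-byte sector. It assumes Type 1 protection, where the reference tag is the
low 32 bits of the LBA. The function names are hypothetical, and in practice
this work is done by the HBA, the array, or the kernel's block layer, not by
user-space code.

```python
import struct

T10_DIF_POLY = 0x8BB7  # generator polynomial of the DIF guard-tag CRC


def crc16_t10dif(data):
    """Bitwise CRC-16 with polynomial 0x8BB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect(sector, lba, app_tag=0):
    """Append the tuple: guard tag (2 bytes), application tag (2), reference tag (4)."""
    if len(sector) != 512:
        raise ValueError("this sketch assumes 512-byte sectors")
    guard = crc16_t10dif(sector)
    return sector + struct.pack(">HHI", guard, app_tag, lba & 0xFFFFFFFF)


def verify(protected, lba):
    """Recompute the guard tag and compare the reference tag with the target LBA."""
    sector, tuple_bytes = protected[:512], protected[512:]
    guard, _app_tag, ref = struct.unpack(">HHI", tuple_bytes)
    return guard == crc16_t10dif(sector) and ref == (lba & 0xFFFFFFFF)
```

Everything here happens per sector, below the point where the filesystem
submits its I/O, which is why XFS itself does not need to change.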
[1] http://www.linux.sgi.com/archives/xfs/2001-03/msg00435.html
--
Martin K. Petersen Oracle Linux Engineering
* Re: Questions for article
2008-06-03 20:48 Questions for article Thomas King
2008-06-03 22:00 ` Martin K. Petersen
@ 2008-06-03 22:14 ` Eric Sandeen
2008-06-03 22:19 ` Thomas King
2008-06-04 5:28 ` Christoph Hellwig
2008-06-04 5:31 ` Christoph Hellwig
2 siblings, 2 replies; 11+ messages in thread
From: Eric Sandeen @ 2008-06-03 22:14 UTC
To: Thomas King; +Cc: xfs
Thomas King wrote:
> -Concerning the block-size limit, will this eventually be a thing of the past?
> Mr. Newman's contention is massive filesystems should have much larger block
> sizes, but he also contends that OSD is the eventual answer instead of using
> block allocation.
Just to reiterate what I already put on the ext4 list... :)
ftp://ftp.kernel.org/pub/linux/kernel/people/christoph/largeblocksize/4/patches/
http://kerneltrap.org/Linux/Large_Blocksize_Performance
Not sure where those patches are headed.
It's also not clear to me that this is really a critical feature for
large filesystems; space allocation is not done block by block per se in
xfs, as Mr. Newman seems (?) to imply (?). The block granularity is
there throughout the fs but I'm not sure how much it matters in
practice. Dave...?
OSDs may have their place, we'll see. It's pretty new stuff (unless you
count Lustre, I guess, but I thought he didn't want to talk lustre...)
I don't think this relates to a linux shortcoming in any way (or to
xfs...), it's awfully new stuff that just about nobody really has in
production.
> -Is there anything else y'all would like folks to understand about XFS and
> massive implementations?
I already pointed him at the xfs_repair paper, since he seems concerned
about fsck (and pointed out that yes, xfs_repair really *DOES* check all
filesystem data and does not simply replay the log...)
http://mirror.linux.org.au/pub/linux.conf.au/2008/slides/135-fixing_xfs_faster.pdf
Maybe some of the folks on the list with said massive implementations
can speak up too. :)
-Eric
* Re: Questions for article
2008-06-03 22:14 ` Eric Sandeen
@ 2008-06-03 22:19 ` Thomas King
2008-06-04 5:28 ` Christoph Hellwig
1 sibling, 0 replies; 11+ messages in thread
From: Thomas King @ 2008-06-03 22:19 UTC
To: Eric Sandeen; +Cc: xfs
> Thomas King wrote:
>
>> -Concerning the block-size limit, will this eventually be a thing of the past?
>> Mr. Newman's contention is massive filesystems should have much larger block
>> sizes, but he also contends that OSD is the eventual answer instead of using
>> block allocation.
>
> Just to reiterate what I already put on the ext4 list... :)
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/christoph/largeblocksize/4/patches/
> http://kerneltrap.org/Linux/Large_Blocksize_Performance
>
> Not sure where those patches are headed.
>
> It's also not clear to me that this is really a critical feature for
> large filesystems; space allocation is not done block by block per se in
> xfs, as Mr. Newman seems (?) to imply (?) The block granularity is
> there throughout the fs but I'm not sure how much it matters in
> practice. Dave...?
>
> OSDs may have their place, we'll see. It's pretty new stuff (unless you
> count Lustre, I guess, but I thought he didn't want to talk lustre...)
> I don't think this relates to a linux shortcoming in any way (or to
> xfs...), it's awfully new stuff that just about nobody really has in
> production.
>
>> -Is there anything else y'all would like folks to understand about XFS and
>> massive implementations?
>
> I already pointed him at the xfs_repair paper, since he seems concerned
> about fsck (and pointed out that yes, xfs_repair really *DOES* check all
> filesystem data and does not simply replay the log...)
>
> http://mirror.linux.org.au/pub/linux.conf.au/2008/slides/135-fixing_xfs_faster.pdf
>
> Maybe some of the folks on the list with said massive implementations
> can speak up too. :)
>
> -Eric
>
Both you and Andreas gave me some excellent information on both lists, and thank
you all for your patience. I appreciate everyone piping in. Like you say, if
there is anyone with massive implementations that wishes to add, please do so.
Thanks!
Tom King
* Re: Questions for article
2008-06-03 22:14 ` Eric Sandeen
2008-06-03 22:19 ` Thomas King
@ 2008-06-04 5:28 ` Christoph Hellwig
1 sibling, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2008-06-04 5:28 UTC
To: Eric Sandeen; +Cc: Thomas King, xfs
On Tue, Jun 03, 2008 at 05:14:44PM -0500, Eric Sandeen wrote:
> It's also not clear to me that this is really a critical feature for
> large filesystems; space allocation is not done block by block per se in
> xfs, as Mr. Newman seems (?) to imply (?) The block granularity is
> there throughout the fs but I'm not sure how much it matters in
> practice. Dave...?
For streaming I/O workloads it doesn't matter anymore, see Dave's 2006
OLS talk. The direct to bio I/O path mitigates any blocksize impact.
* Re: Questions for article
2008-06-03 20:48 Questions for article Thomas King
2008-06-03 22:00 ` Martin K. Petersen
2008-06-03 22:14 ` Eric Sandeen
@ 2008-06-04 5:31 ` Christoph Hellwig
2008-06-04 14:16 ` Thomas King
2 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2008-06-04 5:31 UTC
To: Thomas King; +Cc: xfs
On Tue, Jun 03, 2008 at 03:48:49PM -0500, Thomas King wrote:
> For the most part, XFS is used for massive filesystems (hundreds of petabytes)
I think hundreds of petabytes is not something we commonly see today :)
hundreds of TB is more reasonable.
> -Does/Will XFS support NFS v4.1?
I suspect he means support for PNFS. PNFS is just like CXFS over
sunrpc, so for anyone who cares, adding an XFS layout driver shouldn't
be a problem, and shouldn't actually require changes to the disk format or
low-level XFS code. Note that I think pnfs is a really good idea.
* Re: Questions for article
2008-06-04 5:31 ` Christoph Hellwig
@ 2008-06-04 14:16 ` Thomas King
2008-06-04 15:06 ` Eric Sandeen
0 siblings, 1 reply; 11+ messages in thread
From: Thomas King @ 2008-06-04 14:16 UTC
To: Christoph Hellwig; +Cc: xfs
> On Tue, Jun 03, 2008 at 03:48:49PM -0500, Thomas King wrote:
>> For the most part, XFS is used for massive filesystems (hundreds of petabytes)
>
> I think hundreds of petabytes is not something we commonly see today :)
> hundreds of TB is more reasonable.
If I'm going to answer his two articles, he's speaking in the context of massive
filesystems. True, hundreds of petabytes are not common but that's the
environment he's talking about.
From what I'm seeing from XFS, BTRFS, ext4, and HAMMER, Linux filesystems are
going to easily keep up with the current trend. For the massive filesystems
Henry speaks of, XFS has some new features that I don't think he's aware of and
that need to come out in this answer.
Tom King
* Re: Questions for article
2008-06-04 14:16 ` Thomas King
@ 2008-06-04 15:06 ` Eric Sandeen
0 siblings, 0 replies; 11+ messages in thread
From: Eric Sandeen @ 2008-06-04 15:06 UTC
To: Thomas King; +Cc: Christoph Hellwig, xfs
Thomas King wrote:
>> On Tue, Jun 03, 2008 at 03:48:49PM -0500, Thomas King wrote:
>>> For the most part, XFS is used for massive filesystems (hundreds of petabytes)
>> I think hundreds of petabytes is not something we commonly see today :)
>> hundreds of TB is more reasonable.
>
> If I'm going to answer his two articles, he's speaking in the context of massive
> filesystems. True, hundreds of petabytes are not common but that's the
> environment he's talking about.
>
> From what I'm seeing from XFS, BTRFS, ext4, and HAMMER, Linux filesystems are
> going to easily keep up with the current trend. For the massive filesystems
> Henry speaks of, XFS has some new features I don't think he's aware of and needs
> to come out in this answer.
>
> Tom King
One thing I would be careful of is not to fall into the trap of letting
Linux filesystems get bashed over things that *nobody* really has today.
Stuff like PNFS, OSD, DIF etc. are bleeding-edge for almost *everybody*.
Petabyte filesystems are hard. For *everybody*.
And hundred-petabyte filesystems aren't just uncommon, they don't exist
AFAIK.
-Eric
* Questions for article
@ 2008-06-03 15:34 Thomas King
2008-06-03 19:42 ` Justin Piszcz
2008-06-04 14:52 ` Emmanuel Florac
0 siblings, 2 replies; 11+ messages in thread
From: Thomas King @ 2008-06-03 15:34 UTC
To: xfs
I am writing an article to answer Henry Newman's at
http://www.enterprisestorageforum.com/sans/features/article.php/3749926. I've
already been bugging folks on the ext4 mailing list and one of them mentioned I
should also send some of the same questions to this list. Please let me know if
I may do so.
Thanks!
Tom King
* Re: Questions for article
2008-06-03 15:34 Thomas King
@ 2008-06-03 19:42 ` Justin Piszcz
2008-06-04 14:52 ` Emmanuel Florac
1 sibling, 0 replies; 11+ messages in thread
From: Justin Piszcz @ 2008-06-03 19:42 UTC
To: Thomas King; +Cc: xfs
On Tue, 3 Jun 2008, Thomas King wrote:
> I am writing an article to answer Henry Newman's at
> http://www.enterprisestorageforum.com/sans/features/article.php/3749926. I've
> already been bugging folks on the ext4 mailing list and one of them mentioned I
> should also send some of the same questions to this list. Please let me know if
> I may do so.
>
> Thanks!
> Tom King
>
>
What are the questions?
Justin.
* Re: Questions for article
2008-06-03 15:34 Thomas King
2008-06-03 19:42 ` Justin Piszcz
@ 2008-06-04 14:52 ` Emmanuel Florac
1 sibling, 0 replies; 11+ messages in thread
From: Emmanuel Florac @ 2008-06-04 14:52 UTC
To: Thomas King; +Cc: xfs
On Tue, 3 Jun 2008 10:34:48 -0500 (CDT)
Thomas King <kingttx@tomslinux.homelinux.org> wrote:
> I am writing an article to answer Henry Newman's at
> http://www.enterprisestorageforum.com/sans/features/article.php/3749926.
> I've already been bugging folks on the ext4 mailing list and one of
> them mentioned I should also send some of the same questions to this
> list. Please let me know if I may do so.
Seems like a good idea. This guy doesn't even mention XFS, while it's
more or less the only viable option for big filesystems (more than 8TB).
I currently use 30, 40TB XFS filesystems that work just fine.
I've already compared all filesystems: XFS works great for big
filesystems. JFS works well too, however it lacks a defragmenting
utility which is quite a problem for big filesystems with lots of write
activity. reiserfs 3.6 simply breaks over 4TB; mkfs.ext3 is so slow
that it's a problem from the start, and then the performance is abysmal.
--
----------------------------------------
Emmanuel Florac | Intellique
----------------------------------------