* Re: 3ware 9650 tips
[not found] ` <4697CA4D.6020304@etelos.com>
@ 2007-07-13 19:36 ` Justin Piszcz
2007-07-16 2:41 ` David Chinner
0 siblings, 1 reply; 18+ messages in thread
From: Justin Piszcz @ 2007-07-13 19:36 UTC (permalink / raw)
To: Jon Collette; +Cc: Joshua Baker-LePain, linux-ide-arrays, linux-raid, xfs
On Fri, 13 Jul 2007, Jon Collette wrote:
> Wouldn't Raid 6 be slower than Raid 5 because of the extra fault tolerance?
> http://www.enterprisenetworksandservers.com/monthly/art.php?1754 - 20%
> drop according to this article
>
> His 500GB WD drives are 7200RPM compared to the Raptors' 10K, so his numbers
> will be slower.
> Justin, what file system do you have running on the Raptors? I think that's an
> interesting point made by Joshua.
I use XFS:
But I also run several optimizations for SW RAID and my overall
configuration. However, the mkfs.xfs options auto-optimize for
whatever (SW) RAID I have configured. If XFS cannot see the
disks underneath the HW RAID, I do not think those optimizations
will be applied, which means you'd have to set the sunit and swidth
appropriately yourself.
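For what it's worth, a hypothetical sketch of doing that by hand (the chunk size, disk count and device name here are made-up values; use your controller's actual settings):

```shell
# Hypothetical geometry: a HW RAID5 of 8 disks (7 data + 1 parity)
# with a 64 KiB per-disk chunk. XFS cannot probe this through the
# controller, so compute it and pass it explicitly.
chunk_kib=64
total_disks=8
data_disks=$((total_disks - 1))   # RAID5 spends one disk's worth on parity
echo "mkfs.xfs -d su=${chunk_kib}k,sw=${data_disks} /dev/sdX"
```

The echoed command is what you would actually run against the array device.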
>
>
> Justin Piszcz wrote:
>>
>>
>> On Fri, 13 Jul 2007, Joshua Baker-LePain wrote:
>>
>>> My new system has a 3ware 9650SE-24M8 controller hooked to 24 500GB WD
>>> drives. The controller is set up as a RAID6 w/ a hot spare. OS is CentOS
>>> 5 x86_64. It's all running on a couple of Xeon 5130s on a Supermicro
>>> X7DBE motherboard w/ 4GB of RAM.
>>>
>>> Trying to stick with a supported config as much as possible, I need to run
>>> ext3. As per usual, though, initial ext3 numbers are less than
>>> impressive. Using bonnie++ to get a baseline, I get (after doing 'blockdev
>>> --setra 65536' on the device):
>>> Write: 136MB/s
>>> Read: 384MB/s
>>>
>>> Proving it's not the hardware, with XFS the numbers look like:
>>> Write: 333MB/s
>>> Read: 465MB/s
>>>
>>> How many folks are using these? Any tuning tips?
>>>
>>> Thanks.
>>>
>>> --
>>> Joshua Baker-LePain
>>> Department of Biomedical Engineering
>>> Duke University
>>>
>>
>> Let's try that again with the right address :)
>>
>>
>> You are using HW RAID then? Those numbers seem pretty awful for that
>> setup. I'm including linux-raid@ even though it appears you're running
>> HW RAID, because this is rather peculiar.
>>
>> To give you an example, I get 464MB/s write and 627MB/s read with a
>> 10-disk Raptor software RAID5.
>>
>> Justin.
>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: 3ware 9650 tips
2007-07-13 19:36 ` 3ware 9650 tips Justin Piszcz
@ 2007-07-16 2:41 ` David Chinner
2007-07-16 12:22 ` David Chinner
2007-07-16 15:43 ` Joshua Baker-LePain
0 siblings, 2 replies; 18+ messages in thread
From: David Chinner @ 2007-07-16 2:41 UTC (permalink / raw)
To: Justin Piszcz
Cc: Jon Collette, Joshua Baker-LePain, linux-ide-arrays, linux-raid,
xfs
On Fri, Jul 13, 2007 at 03:36:46PM -0400, Justin Piszcz wrote:
> On Fri, 13 Jul 2007, Jon Collette wrote:
>
> >Wouldn't Raid 6 be slower than Raid 5 because of the extra fault tolerance?
> > http://www.enterprisenetworksandservers.com/monthly/art.php?1754 - 20%
> >drop according to this article
> >
> >His 500GB WD drives are 7200RPM compared to the Raptors 10K. So his
> >numbers will be slower.
> >Justin what file system do you have running on the Raptors? I think thats
> >an interesting point made by Joshua.
>
> I use XFS:
When it comes to bandwidth, there is good reason for that.
> >>>Trying to stick with a supported config as much as possible, I need to
> >>>run ext3. As per usual, though, initial ext3 numbers are less than
> >>>impressive. Using bonnie++ to get a baseline, I get (after doing
> >>>'blockdev --setra 65536' on the device):
> >>>Write: 136MB/s
> >>>Read: 384MB/s
> >>>
> >>>Proving it's not the hardware, with XFS the numbers look like:
> >>>Write: 333MB/s
> >>>Read: 465MB/s
> >>>
Those are pretty typical numbers. In my experience, ext3 is limited to about
250MB/s buffered write speed. It's not disk limited, it's design limited. e.g.
on a disk subsystem where XFS was getting 4-5GB/s buffered write, ext3 was doing
250MB/s.
http://oss.sgi.com/projects/xfs/papers/ols2006/ols-2006-paper.pdf
If you've got any sort of serious disk array, ext3 is not the filesystem
to use....
> >>>How many folks are using these? Any tuning tips?
Make sure you tell XFS the correct sunit/swidth. For hardware
raid5/6, sunit = per-disk chunksize, swidth = number of *data* disks in
array.
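Applied to the array in this thread, that rule works out as follows (the 64 KiB chunk size and device name are assumptions; check the controller's actual configuration):

```shell
# 24 ports minus 1 hot spare -> 23 drives in the RAID6 set;
# RAID6 dedicates two drives' worth of capacity to parity.
drives_in_set=23
parity=2
data_disks=$((drives_in_set - parity))
echo "mkfs.xfs -d su=64k,sw=${data_disks} /dev/sdX"   # sw = data disks only
```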
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: 3ware 9650 tips
2007-07-16 2:41 ` David Chinner
@ 2007-07-16 12:22 ` David Chinner
2007-07-16 12:39 ` Bernd Schubert
2007-07-16 15:50 ` Eric Sandeen
2007-07-16 15:43 ` Joshua Baker-LePain
1 sibling, 2 replies; 18+ messages in thread
From: David Chinner @ 2007-07-16 12:22 UTC (permalink / raw)
To: David Chinner
Cc: Justin Piszcz, Jon Collette, Joshua Baker-LePain,
linux-ide-arrays, linux-raid, xfs
On Mon, Jul 16, 2007 at 12:41:15PM +1000, David Chinner wrote:
> On Fri, Jul 13, 2007 at 03:36:46PM -0400, Justin Piszcz wrote:
> > On Fri, 13 Jul 2007, Jon Collette wrote:
> >
> > >Wouldn't Raid 6 be slower than Raid 5 because of the extra fault tolerance?
> > > http://www.enterprisenetworksandservers.com/monthly/art.php?1754 - 20%
> > >drop according to this article
> > >
> > >His 500GB WD drives are 7200RPM compared to the Raptors 10K. So his
> > >numbers will be slower.
> > >Justin what file system do you have running on the Raptors? I think thats
> > >an interesting point made by Joshua.
> >
> > I use XFS:
>
> When it comes to bandwidth, there is good reason for that.
>
> > >>>Trying to stick with a supported config as much as possible, I need to
> > >>>run ext3. As per usual, though, initial ext3 numbers are less than
> > >>>impressive. Using bonnie++ to get a baseline, I get (after doing
> > >>>'blockdev --setra 65536' on the device):
> > >>>Write: 136MB/s
> > >>>Read: 384MB/s
> > >>>
> > >>>Proving it's not the hardware, with XFS the numbers look like:
> > >>>Write: 333MB/s
> > >>>Read: 465MB/s
> > >>>
>
> Those are pretty typical numbers. In my experience, ext3 is limited to about
> 250MB/s buffered write speed. It's not disk limited, it's design limited. e.g.
> on a disk subsystem where XFS was getting 4-5GB/s buffered write, ext3 was doing
> 250MB/s.
>
> http://oss.sgi.com/projects/xfs/papers/ols2006/ols-2006-paper.pdf
>
> If you've got any sort of serious disk array, ext3 is not the filesystem
> to use....
To show what the difference is, I used blktrace and Chris Mason's
seekwatcher script on a simple, single threaded dd command on
a 12 disk dm RAID0 stripe:
# dd if=/dev/zero of=/mnt/scratch/fred bs=1024k count=10k; sync
http://oss.sgi.com/~dgc/writes/ext3_write.png
http://oss.sgi.com/~dgc/writes/xfs_write.png
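Roughly, graphs like these can be produced with something like the following (device and paths are placeholders, seekwatcher's exact options may differ by version, and running it needs root; this is a sketch, not the exact commands used):

```shell
# Trace the block device while the workload runs, then turn the
# trace into a seeks/throughput graph with seekwatcher.
blktrace -d /dev/mapper/dm0 -o trace &
dd if=/dev/zero of=/mnt/scratch/fred bs=1024k count=10k; sync
kill %1; wait
seekwatcher -t trace -o write.png
```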
You can see from the ext3 graph that it comes to a screeching halt
every 5s (probably when pdflush runs), and at all other times the
seek rate is >10,000 seeks/s. That's pretty bad for a brand new,
empty filesystem, and the only reason it is sustained at all is that
the disks have their write caches turned on. ext4 will probably show
better results, but I haven't got any of the tools installed to be
able to test it....
The XFS pattern consistently shows an order of magnitude fewer seeks
and consistent throughput above 600MB/s. To put the number of seeks
in context, XFS is doing 512k I/Os at about 1200-1300 per second. The
number of seeks? A bit above 10^3 per second, or roughly one seek per
I/O, which is pretty much optimal.
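The arithmetic is easy to check: ~1200 I/Os per second at 512 KiB each works out to about 600 MB/s, matching the throughput in the graph.

```shell
# 1200 I/Os per second * 512 KiB per I/O, expressed in MiB/s.
ios_per_sec=1200
io_kib=512
mb_per_sec=$((ios_per_sec * io_kib / 1024))
echo "${mb_per_sec} MB/s"    # prints "600 MB/s"
```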
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: 3ware 9650 tips
2007-07-16 12:22 ` David Chinner
@ 2007-07-16 12:39 ` Bernd Schubert
2007-07-16 15:50 ` Eric Sandeen
1 sibling, 0 replies; 18+ messages in thread
From: Bernd Schubert @ 2007-07-16 12:39 UTC (permalink / raw)
To: David Chinner
Cc: Justin Piszcz, Jon Collette, Joshua Baker-LePain,
linux-ide-arrays, linux-raid, xfs
On Monday 16 July 2007 14:22:25 David Chinner wrote:
> You can see from the ext3 graph that it comes to a screeching halt
> every 5s (probably when pdflush runs) and at all other times the
> seek rate is >10,000 seeks/s. That's pretty bad for a brand new,
> empty filesystem and the only way it is sustained is the fact that
> the disks have their write caches turned on. ext4 will probably show
> better results, but I haven't got any of the tools installed to be
> able to test it....
I recently did some filesystem throughput tests; you can find them here:
http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/lustre/performance/
ldiskfs is ext3 + extents + mballoc + some smaller patches, so it is almost
ext4 (delayed allocation is still missing, but the clusterfs/lustre people
didn't port it, and I'm afraid of hard-to-detect filesystem corruption if I
port it myself).
Write performance is still slower than with xfs, and I'm seriously
considering trying to use xfs in lustre.
Cheers,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH
* Re: 3ware 9650 tips
2007-07-16 2:41 ` David Chinner
2007-07-16 12:22 ` David Chinner
@ 2007-07-16 15:43 ` Joshua Baker-LePain
2007-07-16 17:15 ` [Advocacy] " Bryan J. Smith
2007-07-16 17:34 ` Stuart Levy
1 sibling, 2 replies; 18+ messages in thread
From: Joshua Baker-LePain @ 2007-07-16 15:43 UTC (permalink / raw)
To: David Chinner
Cc: Justin Piszcz, Jon Collette, linux-ide-arrays, linux-raid, xfs
On Mon, 16 Jul 2007 at 12:41pm, David Chinner wrote
> If you've got any sort of serious disk array, ext3 is not the filesystem
> to use....
I do so wish that RedHat shared this view...
--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
* Re: 3ware 9650 tips
2007-07-16 12:22 ` David Chinner
2007-07-16 12:39 ` Bernd Schubert
@ 2007-07-16 15:50 ` Eric Sandeen
2007-07-16 22:21 ` David Chinner
1 sibling, 1 reply; 18+ messages in thread
From: Eric Sandeen @ 2007-07-16 15:50 UTC (permalink / raw)
To: David Chinner
Cc: Justin Piszcz, Jon Collette, Joshua Baker-LePain,
linux-ide-arrays, linux-raid, xfs
David Chinner wrote:
> On Mon, Jul 16, 2007 at 12:41:15PM +1000, David Chinner wrote:
>> On Fri, Jul 13, 2007 at 03:36:46PM -0400, Justin Piszcz wrote:
...
>> If you've got any sort of serious disk array, ext3 is not the filesystem
>> to use....
>
> To show what the difference is, I used blktrace and Chris Mason's
> seekwatcher script on a simple, single threaded dd command on
> a 12 disk dm RAID0 stripe:
>
> # dd if=/dev/zero of=/mnt/scratch/fred bs=1024k count=10k; sync
>
> http://oss.sgi.com/~dgc/writes/ext3_write.png
> http://oss.sgi.com/~dgc/writes/xfs_write.png
Were those all with default mkfs & mount options? ext3 in writeback
mode might be an interesting comparison too.
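For reference, ext3's writeback mode is selected at mount time; a sketch with placeholder device and mount point:

```shell
# data=writeback drops ext3's default ordering of data before metadata;
# it is usually faster, but stale data can appear in files after a crash.
mount -t ext3 -o data=writeback /dev/mapper/dm0 /mnt/scratch
```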
-Eric
* [Advocacy] Re: 3ware 9650 tips
2007-07-16 15:43 ` Joshua Baker-LePain
@ 2007-07-16 17:15 ` Bryan J. Smith
[not found] ` <200707162040.00062.a1426z@gawab.com>
2007-07-16 17:34 ` Stuart Levy
1 sibling, 1 reply; 18+ messages in thread
From: Bryan J. Smith @ 2007-07-16 17:15 UTC (permalink / raw)
To: Joshua Baker-LePain
Cc: David Chinner, Justin Piszcz, Jon Collette, linux-ide-arrays,
linux-raid, xfs
Off-topic, advocacy-level response ...
On Mon, 2007-07-16 at 11:43 -0400, Joshua Baker-LePain wrote:
> I do so wish that RedHat shared this view...
I've been trying to convince them since Red Hat Linux 7 (and, later, 9)
that they need to realize the limits of Ext3 at the enterprise end of
the scalability spectrum -- you know, that whole market they are
seemingly saying they are the king of and a replacement for Sun? ;->
The problem with Red Hat is that when anyone brings up an alternative to
Ext3, Red Hat falls back on arguments against other filesystems, which
is rather easy given the various compatibility issues with JFS (ported
from OS/2, requiring a lot of inode compatibility hacks -- don't get me
started on my experiences) and ReiserFS (an utter lack of inode
compatibility in its structures, requiring kernel-level emulation, etc...
that never seems to work, regardless of what the advocates say, let
alone the almost always "out-of-sync" off-line repair tools).
But when you bring up XFS and its history of a stable but advanced
inode structure, quota support from day 1, POSIX ACLs from nearly day 1,
and all the work the SGI team put into 2.5.3+ that is now in the stock
kernel, they still try to dance. One thing I always get is "oh, its
extents don't perform well for /tmp or /var" or countless other
arguments, to which I merely respond, "all the more reason to use Ext3
for those few filesystems, and XFS when Ext3 doesn't scale -- like for
large /home, /export, etc... filesystems." No matter how many times I
put forth the argument that XFS complements Ext3, they seem to treat it
as yet another JFS/ReiserFS argument.
Hopeless?
-- Bryan "one of the reasons I still deploy Solaris instead of RHEL for
fileservers, even though RHL7+XFS and RHL9+XFS rocked (and are still
rocking!)" Smith
--
Bryan J. Smith Professional, Technical Annoyance
mailto:b.j.smith@ieee.org http://thebs413.blogspot.com
--------------------------------------------------------
Fission Power: An Inconvenient Solution
* Re: 3ware 9650 tips
2007-07-16 15:43 ` Joshua Baker-LePain
2007-07-16 17:15 ` [Advocacy] " Bryan J. Smith
@ 2007-07-16 17:34 ` Stuart Levy
2007-07-16 18:44 ` [Advocacy] " Bryan J. Smith
1 sibling, 1 reply; 18+ messages in thread
From: Stuart Levy @ 2007-07-16 17:34 UTC (permalink / raw)
To: Joshua Baker-LePain
Cc: David Chinner, Justin Piszcz, Jon Collette, linux-ide-arrays,
linux-raid, xfs
On Mon, Jul 16, 2007 at 11:43:24AM -0400, Joshua Baker-LePain wrote:
> On Mon, 16 Jul 2007 at 12:41pm, David Chinner wrote
>
> >If you've got any sort of serious disk array, ext3 is not the filesystem
> >to use....
>
> I do so wish that RedHat shared this view...
So they support XFS in Fedora, but not in RHEL??
(I've been using Fedora...)
* Re: [Advocacy] Re: 3ware 9650 tips
[not found] ` <200707162040.00062.a1426z@gawab.com>
@ 2007-07-16 17:48 ` Matthew Wilcox
2007-07-16 18:28 ` [RFC] VFS: data=ordered (was: [Advocacy] Re: 3ware 9650 tips) Al Boldi
2007-07-16 18:38 ` [Advocacy] Re: 3ware 9650 tips Bryan J. Smith
0 siblings, 2 replies; 18+ messages in thread
From: Matthew Wilcox @ 2007-07-16 17:48 UTC (permalink / raw)
To: Al Boldi
Cc: Bryan J. Smith, Joshua Baker-LePain, David Chinner, Justin Piszcz,
Jon Collette, linux-ide-arrays, linux-raid, xfs, linux-fsdevel
On Mon, Jul 16, 2007 at 08:40:00PM +0300, Al Boldi wrote:
> XFS surely rocks, but it's missing one critical component: data=ordered
> And that's one component that's just too critical to overlook for an
> enterprise environment that is built on data-integrity over performance.
>
> So that's the secret why people still use ext3, and XFS' reliance on external
> hardware to ensure integrity is really misplaced.
>
> Now, maybe when we get the data=ordered onto the VFS level, then maybe XFS
> may become viable for the enterprise, and ext3 may cease to be KING.
Wow, thanks for bringing an advocacy thread onto linux-fsdevel. Just what
we wanted. Do you have any insight into how to "get the data=ordered
onto the VFS level"? Because to me, that sounds like pure nonsense.
--
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
* [RFC] VFS: data=ordered (was: [Advocacy] Re: 3ware 9650 tips)
2007-07-16 17:48 ` Matthew Wilcox
@ 2007-07-16 18:28 ` Al Boldi
2007-07-16 19:02 ` Matthew Wilcox
2007-07-16 18:38 ` [Advocacy] Re: 3ware 9650 tips Bryan J. Smith
1 sibling, 1 reply; 18+ messages in thread
From: Al Boldi @ 2007-07-16 18:28 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Bryan J. Smith, Joshua Baker-LePain, David Chinner, Justin Piszcz,
Jon Collette, linux-ide-arrays, linux-raid, xfs, linux-fsdevel
Matthew Wilcox wrote:
> On Mon, Jul 16, 2007 at 08:40:00PM +0300, Al Boldi wrote:
> > XFS surely rocks, but it's missing one critical component: data=ordered
> > And that's one component that's just too critical to overlook for an
> > enterprise environment that is built on data-integrity over performance.
> >
> > So that's the secret why people still use ext3, and XFS' reliance on
> > external hardware to ensure integrity is really misplaced.
> >
> > Now, maybe when we get the data=ordered onto the VFS level, then maybe
> > XFS may become viable for the enterprise, and ext3 may cease to be KING.
>
> Wow, thanks for bringing an advocacy thread onto linux-fsdevel. Just what
> we wanted. Do you have any insight into how to "get the data=ordered
> onto the VFS level"? Because to me, that sounds like pure nonsense.
Well, conceptually it sounds like a piece of cake, technically your guess is
as good as mine. IIRC, akpm once mentioned something like this.
But seriously, can you think of a technical reason why it shouldn't be
possible to abstract data=ordered mode out into the VFS?
Thanks!
--
Al
* Re: [Advocacy] Re: 3ware 9650 tips
2007-07-16 17:48 ` Matthew Wilcox
2007-07-16 18:28 ` [RFC] VFS: data=ordered (was: [Advocacy] Re: 3ware 9650 tips) Al Boldi
@ 2007-07-16 18:38 ` Bryan J. Smith
1 sibling, 0 replies; 18+ messages in thread
From: Bryan J. Smith @ 2007-07-16 18:38 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Al Boldi, Joshua Baker-LePain, David Chinner, Justin Piszcz,
Jon Collette, linux-ide-arrays, linux-raid, xfs, linux-fsdevel
On Mon, 2007-07-16 at 11:48 -0600, Matthew Wilcox wrote:
> Wow, thanks for bringing an advocacy thread onto linux-fsdevel. Just what
> we wanted. Do you have any insight into how to "get the data=ordered
> onto the VFS level"? Because to me, that sounds like pure nonsense.
First off, I have no idea who decided to respond to my post and CC:
linux-fsdevel on it.
Secondly, in retrospect, I should not have posted to linux-raid
in the first place (is that list now mirrored to linux-fsdevel or
something?). I was just sharing my frustration at the lack of XFS
support from Red Hat.
So, lastly, and in any case, my apologies to all; even though I did not
proliferate it to linux-fsdevel myself, it was probably not ideal for me
to post such a thing to anything on vger.kernel.org (like linux-raid) in
the first place.
--
Bryan J. Smith Professional, Technical Annoyance
mailto:b.j.smith@ieee.org http://thebs413.blogspot.com
--------------------------------------------------------
Fission Power: An Inconvenient Solution
* [Advocacy] Re: 3ware 9650 tips
2007-07-16 17:34 ` Stuart Levy
@ 2007-07-16 18:44 ` Bryan J. Smith
2007-07-17 17:30 ` Simon Matter
0 siblings, 1 reply; 18+ messages in thread
From: Bryan J. Smith @ 2007-07-16 18:44 UTC (permalink / raw)
To: Stuart Levy
Cc: Joshua Baker-LePain, David Chinner, Justin Piszcz, Jon Collette,
linux-ide-arrays, xfs
[ I removed linux-raid, the only vgers list I posted to (who
re-posted/responded to my post on linux-fsdevel anyway?), from this
response. ]
On Mon, 2007-07-16 at 12:34 -0500, Stuart Levy wrote:
> So they support XFS in Fedora, but not in RHEL??
> (I've been using Fedora...)
Fedora ships support for all filesystems, and I believe has since Fedora
Core 2 / kernel 2.6. RHEL only ships with support for Ext3.
There are various support issues with XFS that I, among others, feel
could be quickly addressed if Red Hat took a keen interest in supporting
XFS as a second, supplemental filesystem to Ext3. I have gone on-record
several times about this, although I've dropped the advocacy over the
last few years and just "given up." I honestly haven't kept up with the
issues either (like issues with XFS and the 4G/4G kernel model and/or
4KiB stacks -- have they been addressed?).
Luckily I have been doing more and more embedded as of late, so I
haven't had to deploy large filesystems. But I have put in a few
Solaris 10/Opteron systems as NFS/SMB file server solutions in the last
year. I'm not saying Solaris is "better," I'm just saying I would
really like RHEL Ext3 with an officially supported XFS release. That's
all.
--
Bryan J. Smith Professional, Technical Annoyance
mailto:b.j.smith@ieee.org http://thebs413.blogspot.com
--------------------------------------------------------
Fission Power: An Inconvenient Solution
* Re: [RFC] VFS: data=ordered (was: [Advocacy] Re: 3ware 9650 tips)
2007-07-16 18:28 ` [RFC] VFS: data=ordered (was: [Advocacy] Re: 3ware 9650 tips) Al Boldi
@ 2007-07-16 19:02 ` Matthew Wilcox
0 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox @ 2007-07-16 19:02 UTC (permalink / raw)
To: Al Boldi
Cc: Bryan J. Smith, Joshua Baker-LePain, David Chinner, Justin Piszcz,
Jon Collette, linux-ide-arrays, linux-raid, xfs, linux-fsdevel
On Mon, Jul 16, 2007 at 09:28:08PM +0300, Al Boldi wrote:
> Well, conceptually it sounds like a piece of cake, technically your guess is
> as good as mine. IIRC, akpm once mentioned something like this.
How much have you looked at the VFS? There's nothing journalling-related
in the VFS right now. ext3 and XFS share no common journalling code,
nor do I think that would be possible, due to the very different concepts
they have of journalling.
Here's a good hint:
$ find fs -type f |xargs grep -l journal_start
fs/ext3/acl.c
fs/ext3/inode.c
fs/ext3/ioctl.c
fs/ext3/namei.c
fs/ext3/resize.c
fs/ext3/super.c
fs/ext3/xattr.c
fs/ext4/acl.c
fs/ext4/extents.c
fs/ext4/inode.c
fs/ext4/ioctl.c
fs/ext4/namei.c
fs/ext4/resize.c
fs/ext4/super.c
fs/ext4/xattr.c
fs/jbd/journal.c
fs/jbd/transaction.c
fs/jbd2/journal.c
fs/jbd2/transaction.c
fs/ocfs2/journal.c
fs/ocfs2/super.c
JBD and JBD2 provide a journalling implementation that ext3, ext4 and
ocfs2 use. Note that XFS doesn't, it has its own journalling code.
If you want XFS to support data=ordered, talk to the XFS folks. Or
start picking through XFS yourself, of course -- you do have the source
code.
--
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
* Re: 3ware 9650 tips
2007-07-16 15:50 ` Eric Sandeen
@ 2007-07-16 22:21 ` David Chinner
0 siblings, 0 replies; 18+ messages in thread
From: David Chinner @ 2007-07-16 22:21 UTC (permalink / raw)
To: Eric Sandeen
Cc: David Chinner, Justin Piszcz, Jon Collette, Joshua Baker-LePain,
linux-ide-arrays, linux-raid, xfs
On Mon, Jul 16, 2007 at 10:50:34AM -0500, Eric Sandeen wrote:
> David Chinner wrote:
> > On Mon, Jul 16, 2007 at 12:41:15PM +1000, David Chinner wrote:
> >> On Fri, Jul 13, 2007 at 03:36:46PM -0400, Justin Piszcz wrote:
> ...
> >> If you've got any sort of serious disk array, ext3 is not the filesystem
> >> to use....
> >
> > To show what the difference is, I used blktrace and Chris Mason's
> > seekwatcher script on a simple, single threaded dd command on
> > a 12 disk dm RAID0 stripe:
> >
> > # dd if=/dev/zero of=/mnt/scratch/fred bs=1024k count=10k; sync
> >
> > http://oss.sgi.com/~dgc/writes/ext3_write.png
> > http://oss.sgi.com/~dgc/writes/xfs_write.png
>
> Were those all with default mkfs & mount options? ext3 in writeback
> mode might be an interesting comparison too.
Defaults. i.e.
# mkfs.ext3 /dev/mapper/dm0
# mkfs.xfs /dev/mapper/dm0
The mkfs.xfs picked up sunit/swidth correctly from the dm volume.
Last time I checked, writeback made little difference to ext3 throughput;
maybe 5-10% at most. I'll run it again later today...
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: [Advocacy] Re: 3ware 9650 tips
2007-07-16 18:44 ` [Advocacy] " Bryan J. Smith
@ 2007-07-17 17:30 ` Simon Matter
0 siblings, 0 replies; 18+ messages in thread
From: Simon Matter @ 2007-07-17 17:30 UTC (permalink / raw)
To: Bryan J. Smith
Cc: Stuart Levy, Joshua Baker-LePain, David Chinner, Justin Piszcz,
Jon Collette, linux-ide-arrays, xfs
> [ I removed linux-raid, the only vgers list I posted to (who
> re-posted/responded to my post on linux-fsdevel anyway?), from this
> response. ]
>
> On Mon, 2007-07-16 at 12:34 -0500, Stuart Levy wrote:
>> So they support XFS in Fedora, but not in RHEL??
>> (I've been using Fedora...)
>
> Fedora ships support for all filesystems, and I believe has since Fedora
> Core 2 / kernel 2.6. RHEL only ships with support for Ext3.
>
> There are various support issues with XFS that I, among others, feel
> could be quickly addressed if Red Hat took a keen interest in supporting
> XFS as a second, supplemental filesystem to Ext3. I have gone on-record
> several times about this, although I've dropped the advocacy over the
> last few years and just "given up." I honestly haven't kept up with the
> issues either (like issues with XFS and the 4G/4G kernel model and/or
> 4KiB stacks -- have they been addressed?).
I already wrote quite a long mail for this list yesterday but then decided
not to send it, because I have also given up on this issue with RedHat. It
has all been said in past years and it's still true: you can't compare
anything with XFS on Linux, and the competitors aren't sleeping (just
think of ZFS, for example). I know that XFS had, and maybe still has, some
minor issues -- but if RedHat decided to use XFS and put the same effort
into it as they did with Ext3, those problems would have been gone long ago.
I'm a long-time XFS user, starting with RedHat 7.x, and I'm still using it
as my filesystem of choice on RHEL3, 4 and 5. I just don't want to live
without lvextend and xfs_growfs.
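The grow sequence Simon is referring to can be done online; a sketch with placeholder volume and mount-point names:

```shell
# Grow the LVM logical volume, then grow the mounted XFS to fill it.
# xfs_growfs takes the mount point and works on a mounted filesystem.
lvextend -L +100G /dev/vg0/home
xfs_growfs /home
```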
Simon
* Re: [Advocacy] Re: 3ware 9650 tips
[not found] <582908739-1184695294-cardhu_decombobulator_blackberry.rim.net-122921225-@bxe015.bisx.prod.on.blackberry>
@ 2007-07-17 20:20 ` Giuseppe Ghibò
2007-07-18 1:41 ` David Chinner
0 siblings, 1 reply; 18+ messages in thread
From: Giuseppe Ghibò @ 2007-07-17 20:20 UTC (permalink / raw)
To: b.j.smith
Cc: Simon Matter, xfs-bounce, Stuart Levy, Joshua Baker-LePain,
David Chinner, Justin Piszcz, Jon Collette, linux-ide-arrays, xfs
Bryan J Smith wrote:
> Matter, Simon wrote:
>> I know that XFS had and maybe still has some minor issues -
>> but, if RedHat decided to use XFS and put the same effort into it
>> like they did with Ext3, those problems would have gone long ago.
>
> Ditto. XFS was the ultimate, complementary filesystem to Ext3.
>
> In all honesty, I'm just looking forward to ZFS on RHEL, sorry to say.
ZFS won't appear officially in Linux until there is a change of
license, so right now it is only available in userspace.
> Although I don't see that being "proven" for years, so I'm with XFS "on my own"
> when I deploy RHEL - which is, many times, the only viable filesystem.
>
> --
Indeed XFS is a lot faster than ext3 at many tasks (e.g. copying/moving
or deleting huge files, creating filesystems, or dumping with xfsdump),
and it worked fine until the linux kernels around 2.6.15|16|17|18, when it
had serious problems with data corruption. Furthermore, when you run
xfs_repair to fix such errors, you find that it has lost all the directory
names and places the restored files into "random" dirs named with "number"
names.
See for instance:
http://lkml.org/lkml/2006/8/4/97
http://lkml.org/lkml/2006/8/28/88
or http://qa.mandriva.com/show_bug.cgi?id=24716
and it sounds like the fix is not easily backportable to kernel series
older than 2.6.18, such as 2.6.17 (used, for instance, in ubuntu 7.04 and
mandriva 2007.1).
Also in the recent 2.6.20|21 kernel series I found it has serious
performance problems, especially when used in softraid (e.g. when storing
huge vmware filedisk images, a simple "sync" takes fifteen minutes on a
raid1).
Bye
Giuseppe.
* Re: [Advocacy] Re: 3ware 9650 tips
2007-07-17 20:20 ` Giuseppe Ghibò
@ 2007-07-18 1:41 ` David Chinner
2007-07-18 8:47 ` Giuseppe Ghibò
0 siblings, 1 reply; 18+ messages in thread
From: David Chinner @ 2007-07-18 1:41 UTC (permalink / raw)
To: Giuseppe Ghibò
Cc: b.j.smith, Simon Matter, Stuart Levy, Joshua Baker-LePain,
Justin Piszcz, Jon Collette, linux-ide-arrays, xfs
On Tue, Jul 17, 2007 at 10:20:54PM +0200, Giuseppe Ghibò wrote:
> Indeed XFS is a lot faster than ext3 on many task (e.g.
> copy/moving or delete huge files o creating filesystems or dumping
> with xfsdump), and worked fine, until linux kernels around
> 2.6.15|16|17|18 when it had serious problems about data
> corruptions.
<sigh>
I don't mind advocacy, but misleading FUD about filesystem
corruption, regardless of filesystem type, is *not acceptable* in
any forum.
So, to set the record straight, the only XFS corruption I know of in the
releases you mention above is this:
http://oss.sgi.com/projects/xfs/faq.html#dir2
Which was introduced in 2.6.17-rc1 and fixed in 2.6.18-rc2 (IIRC).
The only released kernels affected are 2.6.17.0-6, i.e. it was fixed
in 2.6.17.7.
And in the interest of full disclosure, there was another in 2.6.19 (IIRC)
to do with a brand new feature that nobody used (the attr2 bug) - until
it was enabled by default on Fedora and the installer tripped over
it - that was fixed in 2.6.20.
If you know of more, then where are the bug reports?
> Furthermore when you run xfs_repair to fix such errors, you find that it lost
> all the directory names, and places restored files into "random" dirs
> named with "number" names.
Please, a little research would tell you what these mean.
When you lose directory entries on a filesystem for *any* reason,
you'll end up with files named by *inode number* placed in
lost+found because they are guaranteed to be unique. The names and
the structure that end up in lost+found are certainly not random
and it's not just XFS that does this. e.g. ext2/3/4 does this, too [1]:
"Some of the directory and files may not pop-up at their right
places. Instead they will be located in /lost+found with names after
their inode numbers."
[1] trivial google search "e2fsck lost+found" points to
http://tldp.org/HOWTO/archived/Ext2fs-Undeletion-Dir-Struct/lostnfnd.html
> See for instance:
>
> http://lkml.org/lkml/2006/8/4/97
> http://lkml.org/lkml/2006/8/28/88
A kernel panic in 2.6.18-rc3/5 due to a bad error handling path that
nobody had hit - or, more correctly, reported - for a couple of
years. This is not a filesystem corrupting bug.
> or http://qa.mandriva.com/show_bug.cgi?id=24716
"------- Comment #3 From Thomas Backlund 2006-08-25 09:32:46 CEST -------
What you are hitting is a bug I tried to warn about before releasing 2007b1,
namely kernel.org-2.6.17.6 had a nasty xfs bug, wich mdv 2.6.17.1mdv was based
on, and I tried to point out before beta1 was released that 2.6.17.7 was out
and had this fixed, but no-one with powers to do anything listened..."
And yes, I can see that you raised this bug. I'm sorry that you were
affected by it, but in reality you should be complaining to your
kernel release team, who released a kernel with a known serious
corruption bug that they had been pre-warned about.
IOWs, your evidence points to one data corruption bug that only
affected 2.6.17.0-2.6.17.6, and you *already knew this*. How does
this translate into data corruption problems that span four whole
kernel releases?
Hence, in future, can you please try to stick to the facts, as
filesystem corruption is something that we take extremely seriously.
> Also in the recent 2.6.20|21 kernel series I found it has serious
> performance problems, especially when used in software RAID (e.g.
> when storing huge VMware disk images, a simple "sync" takes
> fifteen minutes in a RAID1).
Where's the bug report? We can't fix what we don't know about.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: [Advocacy] Re: 3ware 9650 tips
2007-07-18 1:41 ` David Chinner
@ 2007-07-18 8:47 ` Giuseppe Ghibò
0 siblings, 0 replies; 18+ messages in thread
From: Giuseppe Ghibò @ 2007-07-18 8:47 UTC (permalink / raw)
To: David Chinner; +Cc: linux-ide-arrays, xfs
David Chinner wrote:
> On Tue, Jul 17, 2007 at 10:20:54PM +0200, Giuseppe Ghibò wrote:
>> Indeed XFS is a lot faster than ext3 at many tasks (e.g.
>> copying/moving or deleting huge files, creating filesystems, or
>> dumping with xfsdump), and it worked fine until the Linux kernels
>> around 2.6.15|16|17|18, when it had serious data corruption
>> problems.
>
> <sigh>
>
> I don't mind advocacy, but misleading FUD about filesystem
> corruption, regardless of filesystem type, is *not acceptable* in
> any forum.
>
> So, to set the record straight, the only XFS corruption I know of in the
> releases you mention above is this:
>
> http://oss.sgi.com/projects/xfs/faq.html#dir2
>
> Which was introduced in 2.6.17-rc1 and fixed in 2.6.18-rc2 (IIRC).
> The only released kernels affected are 2.6.17.0-6. i.e. it was fixed
> in 2.6.17.7.
>
> And in the interest of full disclosure, there was another in 2.6.19 (IIRC)
> to do with a brand new feature that nobody used (the attr2 bug) - until
> it was enabled by default on Fedora and the installer tripped over
> it - that was fixed in 2.6.20.
Sorry, but I'm not spreading any FUD (if you think so, then my apologies);
I'm just reporting experiences, like everyone else in this thread.
I was a *strong* supporter of XFS for several reasons (I was even using
it for /usr, and indeed I was using it everywhere, including for /home),
but after the experience that led me to those problems I switched back
to ext3+dir_index, even though it's slower, basically because I often
have to access the filesystems (e.g. I use them on removable devices)
through different kernels, and I don't know which kernel I will find
(e.g. I might write with a 2.6.12, then reopen with a 2.6.17 or another).
FYI, yes, I was the first who spotted the problem (or rather the first
who filed the bug report there), but I was not the only one experiencing
it. Olivier Thauvin, the maintainer of distrib-coffee (a 3TB mirror,
see http://distrib-coffee.ipsl.jussieu.fr/), also had the problem with
a kernel based on 2.6.17.14 (the final release, not an RC). The problem
usually occurs under high/heavy I/O load, e.g. rsync (especially when
rsyncing hard and soft links over a gigabit network or to an external
USB disk). He resolved it only recently, by upgrading to the final
stock 2.6.20|21.
>
> If you know of more, then where are the bug reports?
>
>> Furthermore, when you run xfs_repair to fix such errors, you find that it has
>> lost all the directory names and placed the restored files into "random" dirs
>> named with "number" names.
>
> Please, a little research would tell you what these mean.
>
> When you lose directory entries on a filesystem for *any* reason,
> you'll end up with files named by *inode number* placed in
> lost+found because they are guaranteed to be unique. The names and
> the structure that end up in lost+found are certainly not random
> and it's not just XFS that does this. e.g. ext2/3/4 does this, too [1]:
>
> "Some of the directory and files may not pop-up at their right
> places. Instead they will be located in /lost+found with names after
> their inode numbers."
Yes, I knew the names come from the inode numbers, but it was not just
"some" files that took inode names; it was a great many. Note that in
my case, had I not run xfs_repair at all, I probably would not have
lost any files. Indeed, the term "lost" is wrong: it didn't lose any
files; they were still there, just moved by xfs_repair under inode
names (and not just a couple, but thousands of dirs like that, even
though the problem originated from trying to delete a simple softlink).
I know that other filesystems also place files in lost+found, but I
have never experienced such a high number of recovered/renamed dirs
(thousands) in the past, and if you have the same filename repeated
under thousands of dirs (think of a mirror of software branches), it's
hard to manually move them back to the right place.
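For what it's worth, when a known-good copy of the tree exists (as with a mirror), matching file contents can at least propose where each lost+found orphan belongs. A minimal, hypothetical Python sketch follows (all names invented; this is not a tool mentioned in this thread):

```python
# Hypothetical sketch: index a reference tree by content hash, then
# propose a destination for each orphan that xfs_repair dropped into
# lost+found under an inode-number name.
import hashlib
import os

def index_by_hash(reference_root):
    """Map sha256(file contents) -> path relative to the reference tree."""
    index = {}
    for dirpath, _, filenames in os.walk(reference_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            index[digest] = os.path.relpath(path, reference_root)
    return index

def propose_destinations(lost_found, reference_root):
    """Yield (orphan_path, proposed_relative_path) for content matches."""
    index = index_by_hash(reference_root)
    for name in os.listdir(lost_found):
        orphan = os.path.join(lost_found, name)
        if not os.path.isfile(orphan):
            continue
        with open(orphan, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest in index:
            yield orphan, index[digest]
```

Note that identical contents under many directories (exactly the "same filename repeated under thousands of dirs" case) would collide in the index; a real tool would keep a list of candidate paths per hash, while this sketch keeps only one.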
> [...]
> Hence, in future, can you please try to stick to the facts, as
> filesystem corruption is something that we take extremely seriously.
So do I.
Bye
Giuseppe.
Thread overview: 18+ messages
[not found] <alpine.LRH.0.999.0707131356520.25773@chaos.egr.duke.edu>
[not found] ` <Pine.LNX.4.64.0707131434470.31742@p34.internal.lan>
[not found] ` <4697CA4D.6020304@etelos.com>
2007-07-13 19:36 ` 3ware 9650 tips Justin Piszcz
2007-07-16 2:41 ` David Chinner
2007-07-16 12:22 ` David Chinner
2007-07-16 12:39 ` Bernd Schubert
2007-07-16 15:50 ` Eric Sandeen
2007-07-16 22:21 ` David Chinner
2007-07-16 15:43 ` Joshua Baker-LePain
2007-07-16 17:15 ` [Advocacy] " Bryan J. Smith
[not found] ` <200707162040.00062.a1426z@gawab.com>
2007-07-16 17:48 ` Matthew Wilcox
2007-07-16 18:28 ` [RFC] VFS: data=ordered (was: [Advocacy] Re: 3ware 9650 tips) Al Boldi
2007-07-16 19:02 ` Matthew Wilcox
2007-07-16 18:38 ` [Advocacy] Re: 3ware 9650 tips Bryan J. Smith
2007-07-16 17:34 ` Stuart Levy
2007-07-16 18:44 ` [Advocacy] " Bryan J. Smith
2007-07-17 17:30 ` Simon Matter
[not found] <582908739-1184695294-cardhu_decombobulator_blackberry.rim.net-122921225-@bxe015.bisx.prod.on.blackberry>
2007-07-17 20:20 ` Giuseppe Ghibò
2007-07-18 1:41 ` David Chinner
2007-07-18 8:47 ` Giuseppe Ghibò