public inbox for linux-xfs@vger.kernel.org
From: Ming Zhang <mingz@ele.uri.edu>
To: Peter Grandi <pg_xfs@xfs.for.sabi.co.UK>
Cc: Linux XFS <linux-xfs@oss.sgi.com>
Subject: Re: stable xfs
Date: Wed, 19 Jul 2006 10:45:04 -0400	[thread overview]
Message-ID: <1153320304.2691.56.camel@localhost.localdomain> (raw)
In-Reply-To: <17598.3876.565887.172598@base.ty.sabi.co.UK>

On Wed, 2006-07-19 at 11:53 +0100, Peter Grandi wrote:
> [ ... ]
> 
> mingz> when you say large parallel storage system, you mean
> mingz> independent spindles, right? But most people will have all
> mingz> disks configured in one RAID5/6, and thus it is not parallel
> mingz> any more.
> 
> cw> it depends, you might have 100s of spindles in groups, you
> cw> don't make a giant raid5/6 array with that many disks, you
> cw> make a number of smaller arrays
> 
> Perhaps you are underestimating the ''if it can be done''
> mindset...
> 
> Also, if one does a number of smaller RAID5s, is each one a
> separate filesystem, or do they get aggregated, for example with
> LVM with ''concat''? Either way, how likely is it that the
> consequences have been thought through?
> 
> I would personally hesitate to recommend either, especially a
> two-level arrangement where the base level is a RAID5.

Could you give us some hints on this? It is really popular to have an
FS/LV/MD structure, and I believe LVM is designed for this purpose.
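To make the structure concrete, here is a minimal sketch of the kind of
two-level setup I mean; the device names (/dev/sd[a-f]), array layout,
and volume names are all hypothetical, just for illustration:

```shell
# Two small 3-disk RAID5 arrays at the base level (hypothetical disks)
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda /dev/sdb /dev/sdc
mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sdd /dev/sde /dev/sdf

# Aggregate them with LVM: one volume group, one LV spanning both;
# linear allocation is the default, which gives the ''concat'' layout
pvcreate /dev/md0 /dev/md1
vgcreate bigvg /dev/md0 /dev/md1
lvcreate -l 100%FREE -n data bigvg

# One XFS filesystem on top of the whole thing
mkfs.xfs /dev/bigvg/data
```

This is the two-level arrangement in question: RAID5 below, LVM concat
above, and a single large filesystem spanning all the arrays.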


> 
> [I am making an effort in this discussion to use euphemisms]
> 
> mingz> I think with write barrier support, a system without a UPS
> mingz> should be OK.
> 
> cw> with barrier support a UPS shouldn't be necessary
> 
> Sure, «should» and «shouldn't» are nice hopeful concepts.
> 
> But write barriers are difficult to achieve, and when achieved
> they are often unreliable, except on enterprise level hardware,
> because many disks/host adapters/...  simply lie as to whether
> they have actually started writing (never mind finished writing,
> or written correctly) stuff.
> 
> To get reliable write barriers, one often has to source special
> cards or disks with custom firmware; or leave system integration
> to the big expensive guys and buy an Altix or equivalent system
> from Sun or IBM.
> 
> Besides I have seen many reports of ''corruption'' that cannot
> be fixed by write barriers: many have the expectation that
> *data* should not be lost, even if no 'fsync' is done, *as if*
> 'mount -o sync' or 'mount -o data=ordered' were in effect.
> 
> Of course that is a bit of an inflated expectation, but all that
> the vast majority of sysadms care about is whether it ''just
> works'', without ''wasting time'' figuring things out.
> 
> mingz> considering even if you have a UPS, a kernel oops in other
> mingz> parts can still take the FS down.
> 
> cw> but a crash won't cause writes to be 'reordered' [ ... ]
> 
> The metadata will be consistent, but metadata and data may well
> be lost. So the filesystem is still ''corrupted'', at least
> from the point of view of a sysadm who just wants the filesystem
> to be effortlessly foolproof. Anyhow, if a crash happens all
> bets are off, because who knows *what* gets written.
> 
> Look at it from the point of view of a ''practitioner'' sysadm:
> 
>   ''who cares if the metadata is consistent, if my 3TiB
>   application database is unusable (and I don't do backups
>   because after all it is a concat of RAID5s, backups are not
>   necessary) as there is a huge gap in some data file, and my
>   users are yelling at me, and it is not my fault''
> 
> The tradeoff in XFS is that if you know exactly what you are
> doing you get extra performance...

Then I think unless you disable all write caches, no file system can
achieve this goal. Or maybe ext3, with both data and metadata going
into the log (data=journal), might do this?
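For example, a rough sketch of that combination; the device name and
mount point are hypothetical:

```shell
# Disable the drive's volatile on-disk write cache, so completed
# writes are actually on the platters (hypothetical device)
hdparm -W 0 /dev/sda

# Mount ext3 with full data journaling: data as well as metadata
# goes through the journal before reaching its final location
mount -t ext3 -o data=journal /dev/sda1 /mnt/data

# Or persistently, as an /etc/fstab line:
# /dev/sda1  /mnt/data  ext3  data=journal  0 2
```

Of course this trades away a lot of write performance, which is
exactly the tradeoff being discussed.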

Ming


Thread overview: 33+ messages
2006-07-17 15:30 stable xfs Ming Zhang
2006-07-17 16:20 ` Peter Grandi
2006-07-18 22:36   ` Ming Zhang
2006-07-18 23:14     ` Peter Grandi
2006-07-19  1:20       ` Ming Zhang
2006-07-19  5:56         ` Chris Wedgwood
2006-07-19 10:53           ` Peter Grandi
2006-07-19 14:45             ` Ming Zhang [this message]
2006-07-22 17:13               ` Peter Grandi
2006-07-20  6:12             ` Chris Wedgwood
2006-07-22 17:31               ` Peter Grandi
2006-07-19 14:10           ` Ming Zhang
2006-07-19 10:24         ` Peter Grandi
2006-07-19 13:11           ` Ming Zhang
2006-07-20  6:15             ` Chris Wedgwood
2006-07-20 14:08               ` Ming Zhang
2006-07-20 16:17                 ` Chris Wedgwood
2006-07-20 16:38                   ` Ming Zhang
2006-07-20 19:04                     ` Chris Wedgwood
2006-07-21  0:19                       ` Ming Zhang
2006-07-21  3:26                         ` Chris Wedgwood
2006-07-21 13:10                           ` Ming Zhang
2006-07-21 16:07                             ` Chris Wedgwood
2006-07-21 17:00                               ` Ming Zhang
2006-07-21 18:07                                 ` Chris Wedgwood
2006-07-24  1:14                                   ` Ming Zhang
2006-07-22 18:09                     ` Peter Grandi
2006-07-22 17:47                 ` Peter Grandi
2006-07-22 15:37             ` Peter Grandi
2006-07-18 23:54 ` Nathan Scott
2006-07-19  1:15   ` Ming Zhang
2006-07-19  7:40   ` Martin Steigerwald
2006-07-19 14:11     ` Ming Zhang
