* Re: Some very basic questions
@ 2008-10-22 14:35 dbz
  2008-10-27 15:43 ` Stephan von Krawczynski
  0 siblings, 1 reply; 80+ messages in thread
From: dbz @ 2008-10-22 14:35 UTC (permalink / raw)
  To: linux-btrfs

Concerning this discussion, I'd like to put forward some "requests" that
strongly oppose those brought up initially:

- if you run into an error in the fs structure or any IO error that prevents
you from bringing the fs into a consistent state, please simply oops. If a
user feels that availability is a main issue, he has to use a failover
solution. In this case a fast and clean cut is desirable, not a
"pray-and-hope-mode" or "90%-mode". If availability is not the issue, it is
in any case most important that data on the fs is safe. If you don't oops,
you risk doing further damage to the filesystem and ending up with a
completely destroyed fs.

- if you get any IO error, please **don't** add a number of retries or
anything. If the device reports an error, simply believe it. It is bad enough
that many block drivers or controllers try to be smart and perform hundreds
of retries. Adding further retries only wastes hours on useless attempts.
If availability is an issue, the user again has to set up a
failover solution. Again, a clean cut is what is needed. The user has to
make sure he uses an appropriate configuration according to the importance of
his data (mirroring on the fs and/or RAID, failover ...)

- if during mount something unexpected comes up and you can't be sure that
the fs will work properly, please deny mounting and request a fsck. This can
be easily handled by a start- or mount-script. During mount, take the time
you need to ensure that the fs looks proper and safe to use. I'd rather know
during boot that something is wrong than run with a foul fs and end up
with data loss or any other mixup later on.

- btrfs is no cluster fs, so there is no point in even thinking about it. If
somebody feels he needs multiple writeable mounts of the same fs, please use
a cluster fs. Of course, you have to live with the tradeoffs. Dreaming of a
fs that uses something like witchcraft to do things like locking, quorums,
and cache synchronisation without penalty and, of course, without any
configuration, is pointless.
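
The "deny mounting and request a fsck" point could be handled by a mount
script along the following lines. This is only a minimal sketch: the function
name mount_or_request_fsck, the injectable runner parameter and the device
and mountpoint paths are hypothetical, and it assumes the fs signals its
refusal through a non-zero exit status of mount.

```python
import subprocess
import sys
from typing import Callable, Sequence

# A runner takes a command line and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command once and return its exit status (no retries)."""
    return subprocess.run(list(cmd)).returncode


def mount_or_request_fsck(device: str, mountpoint: str,
                          runner: Runner = run) -> bool:
    """Try to mount exactly once; on refusal, stop and ask for a fsck.

    Hypothetical helper: it never retries and never attempts a repair
    itself, it only reports that the admin has to check the fs.
    """
    status = runner(["mount", device, mountpoint])
    if status != 0:
        print(f"{device}: mount refused (exit status {status}); "
              "run fsck before mounting again", file=sys.stderr)
        return False
    return True
```

A boot script would call mount_or_request_fsck("/dev/sdb1", "/srv/data")
(both paths made up here) and leave the fs unmounted when it returns False.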

In my opinion, the whole thing arises from the idea of using cheap hardware
and out-of-the-box configurations to keep promises of reliability and
availability which are not realistic. There is a reason why there are more
expensive HDDs, RAIDs, SANs with volume mirroring, multipathing and so on.
Simply ignoring the fact that you have to use the proper tools to address
specific problems and praying to the tooth fairy to put a
solve-all-my-problems-fs under your pillow is no solution. I'd rather have a
solid fs with deterministic behavior and some state-of-the-art features.

Just my 2c.
(Gerald) 


* Re: Some very basic questions
@ 2008-10-21 17:37 calin
  2008-10-21 20:08 ` jim owens
  0 siblings, 1 reply; 80+ messages in thread
From: calin @ 2008-10-21 17:37 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-btrfs

> question is: if you had such an implementation, are there
> drawbacks expectable for the single-mount case? If not I'd vote for it
> because there are not really many alternatives "on the market".

As I understand it, the largest issue is in locking and boundaries.  Two different systems could mount a filesystem, and try to use some sort of on-disk markers to keep from writing to the same area at the same time... but there is often some bit of time between when a system sends data to the disk and when it would become available to read from the disk, and little or no guarantee about the order in which the data is written.  All the work that goes into making transactions atomic depends on there only being a single path to the disk - through the code that handles transactions.  If data can arrive on the disk without being managed by that code, all bets are off.
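
The timing problem can be shown with a small, deliberately simplified
simulation. Everything in it is made up for illustration (the SharedDisk and
Host classes, the "lock" block name); it only models the gap between a host
sending a write and that write becoming readable by another host.

```python
class SharedDisk:
    """What is actually readable from the shared device."""

    def __init__(self):
        self.blocks = {}


class Host:
    """A host with its own queue of writes that are not yet on disk."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.pending = []  # writes sent, but not yet visible to other hosts

    def read(self, block):
        return self.disk.blocks.get(block)

    def write(self, block, value):
        self.pending.append((block, value))

    def flush(self):
        for block, value in self.pending:
            self.disk.blocks[block] = value
        self.pending.clear()

    def try_lock(self, block):
        """Naive on-disk marker: check that nobody holds it, then claim it."""
        if self.read(block) is None:
            self.write(block, self.name)
            return True
        return False


disk = SharedDisk()
a, b = Host("A", disk), Host("B", disk)
got_a = a.try_lock("lock")  # A sees no marker and queues its claim
got_b = b.try_lock("lock")  # A's claim is not on disk yet, so B succeeds too
a.flush()
b.flush()
print(got_a, got_b, disk.blocks["lock"])  # prints: True True B
```

Both hosts believe they hold the lock, and whichever write lands last wins on
disk. A real cluster fs avoids this with a lock manager that does not go
through the same unordered write path.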

* Some very basic questions
@ 2008-10-21 11:23 Stephan von Krawczynski
  2008-10-21 12:13 ` Andi Kleen
                   ` (2 more replies)
  0 siblings, 3 replies; 80+ messages in thread
From: Stephan von Krawczynski @ 2008-10-21 11:23 UTC (permalink / raw)
  To: linux-btrfs

Hello all,

Having read the list for a while, it looks like all kinds of implementation
topics are covered but no basic user requests or talks are going on. Since I
have found no other list on vger covering these issues I chose this one;
forgive my ignorance if it is the wrong place.
Like many people on the planet we try to handle fairly large amounts of data
(TBs) and try to solve this with several linux-based fileservers.
Years of (mostly bad) experience led us to the following minimum requirements
for a new fs on our servers:

1. filesystem-check
1.1 it should not
    - delay boot process (we have to wait for hours currently)
    - prevent mount in case of errors
    - be a part of the mount process at all
    - always check the whole fs
1.2 it should be able 
    - to always be started interactively by user
    - to check parts/subtrees of the fs
    - to run purely informational (reporting, non-modifying)
    - to run on a mounted fs
2. general requirements
    - fs errors without file/dir names are useless
    - errors in parts of the fs are no reason for a fs to go offline as a whole
    - mounting must not delay the system startup significantly
    - resizing during runtime (up and down)
    - parallel mounts (very important!)
      (two or more hosts mount the same fs concurrently for reading and
      writing)
    - journaling
    - versioning (file and dir)
    - undelete (file and dir)
    - snapshots
    - run into hd errors more than once for the same file (as an option)
    - map out dead blocks
      (and of course display of the currently mapped out list)
    - no size limitations (more or less)
    - performant handling of large numbers of files inside single dirs
      (to check this, use > 100,000 files in a dir; note that it is
      not a good idea to spread inode blocks over the whole hd because of
      seek times)
    - power loss at any time must not corrupt the fs (atomic fs modification)
      (new-data loss is acceptable)
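
The "> 100,000 files in a dir" check from the list above could be scripted
roughly like this. It is only a sketch: the function name
time_large_directory and its base_dir parameter are hypothetical, and the
timings depend heavily on caching, so they are a rough indication at best.

```python
import os
import tempfile
import time
from typing import Optional


def time_large_directory(n_files: int, base_dir: Optional[str] = None) -> dict:
    """Create n_files empty files in one directory, then time list and stat.

    base_dir should point at a directory on the fs under test; if it is
    None, the system default temp location is used instead.
    """
    timings = {}
    with tempfile.TemporaryDirectory(dir=base_dir) as d:
        start = time.perf_counter()
        for i in range(n_files):
            with open(os.path.join(d, f"f{i:07d}"), "w"):
                pass
        timings["create"] = time.perf_counter() - start

        start = time.perf_counter()
        names = os.listdir(d)
        timings["list"] = time.perf_counter() - start

        start = time.perf_counter()
        for name in names:
            os.stat(os.path.join(d, name))
        timings["stat"] = time.perf_counter() - start

        timings["count"] = len(names)
    return timings
```

Calling time_large_directory(100_000, base_dir=...) with a directory on the
fs in question gives the three timings in seconds; the directory is removed
again afterwards.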

Remember, this is not meant to be a request for features, it is a list that
built up over 10 years of handling data and the failures we experienced. To
our knowledge no fs meets this list, but hey, is that a reason for not talking
about it? Our goal is pretty simple: maximize fs uptime.
How does btrfs match?
-- 
Regards,
Stephan

