Re: trouble with generic/081

From: Dave Chinner <david@fromorbit.com>
To: Zdenek Kabelac <zkabelac@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	dm-devel@redhat.com, Eric Sandeen <sandeen@sandeen.net>,
	eguan@redhat.com
Subject: Re: trouble with generic/081
Date: Fri, 6 Jan 2017 09:46:00 +1100
Message-ID: <20170105224600.GC4326@dastard>
In-Reply-To: <7b8fa79f-89f8-bb9a-a2fc-8a3b966a877d@redhat.com>

On Thu, Jan 05, 2017 at 10:12:25PM +0100, Zdenek Kabelac wrote:
> On 5.1.2017 at 20:29, Eric Sandeen wrote:
> >On 1/5/17 1:13 PM, Zdenek Kabelac wrote:
> >>>Anyway, at this point I'm not convinced that anything but the filesystem
> >>>should be making decisions based on storage error conditions.
> >>
> >>So far I'm not convinced doing nothing is better than at least trying to unmount.
> >>
> >>Since doing nothing is known to cause SEVERE filesystem damage,
> >>while I haven't heard of any when 'unmount' is in the field.
> >
> >I'm pretty sure that's exactly what started this thread.  ;)
> >
> >Failing IOs should never cause "severe filesystem damage" - that is what
> >a journaling filesystem is /for/.  Can you explain further?
> 
> well, all I know are user reports - from users who were able to use 'XFS'
> with an exhausted thin-pool while having 'snapshots' of their volumes.
> 
> Since there was no 'umount' and XFS upon write error just retried
> endlessly to write the block over and over - the system appeared

Which has already been fixed upstream.

And my 2c worth on the "lvm unmounting filesystems on error" - stop
it, now. It's the wrong thing to do, and it makes it impossible for
filesystems to handle the error and recover gracefully when
possible.

> to the users nice & usable for quite a long time (especially when
> boxes had 32G of RAM or more...)
> 
> Maybe writes passed to 'uniquely' owned blocks....
> 
> Then after a day, two, three, OOM finally killed.
> Users realized the thin-pool was out of space - added room to the VG and pool
> and tried xfs_repair - but the whole FS was largely lost.

That sounds very much like a block device snapshot corruption
problem, not a filesystem problem. As always, the filesystem gets
blamed for data loss, regardless of where the problem really lies.

> Use an LV and make some thin snapshots.
> 
> Then change various parts of the origin - at various moments before the pool
> is out of space
> 
> So you will get lots of different scenarios of missing data.
> 
> You will mostly not get into the trouble mentioned if you
> have just a single thinLV and you exhaust the thin-pool while using it.
> 
> Games with snapshots are needed.

This really sounds like a problem with snapshot ENOSPC error
handling, not a filesystem issue - the filesystem is simply the
messenger here...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
