Re: Shutdown filesystem when a thin pool become full

From: Gionatan Danti <g.danti@assyoma.it>
To: linux-xfs@vger.kernel.org
Cc: g.danti@assyoma.it
Subject: Re: Shutdown filesystem when a thin pool become full
Date: Tue, 20 Jun 2017 17:03:42 +0200
Message-ID: <dee0d3cc198663bb850ef2576c84d620@assyoma.it>
In-Reply-To: <20170620110548.eruly7ygixydyk2o@eorzea.usersys.redhat.com>

On 20-06-2017 13:05, Carlos Maiolino wrote:
> 
> AFAIK, it will return ENOSPC with O_DIRECT, yes. With async writes, you
> won't have any error returned until you issue an fsync/fdatasync, which,
> per my understanding, will return an EIO.
> 

Ok, I was missing that; so ENOSPC will be returned for O_DIRECT only.
I'll make a note of it ;)
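
To make the buffered-write case concrete, here is a minimal Python sketch
of a write followed by fsync(), which is the point where a deferred error
such as EIO would be reported to the application. The function name
write_and_sync is hypothetical, and the exact errno seen on a full thin
pool is an assumption that depends on the kernel and filesystem in use.

```python
import errno
import os


def write_and_sync(path, data):
    """Buffered write followed by fsync().

    Returns None on success, or the errno of the first failure.
    (Hypothetical helper, for illustration only.)
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # Without this call, a writeback failure never reaches the caller.
        os.fsync(fd)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)


if __name__ == "__main__":
    result = write_and_sync("/tmp/probe.dat", b"some data")
    if result is not None:
        print("write failed:", errno.errorcode.get(result, result))
```

An application that skips the fsync() call, or ignores its result, has no
way of learning that the data never reached stable storage.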

> 
> The application won't be alerted in any way unless it uses
> fsync()/fdatasync(), whatever filesystem is being used. Even with
> data=journal in ext4 this won't happen: ext4 gets remounted read-only
> because there were 'metadata' errors when writing the file to the
> journal. But again, it is not a fix for a faulty application, and it is
> not even a reliable way of shutting down the filesystem the way you
> think it is. It will only shut down the filesystem depending on the
> amount of blocks being allocated: even when using data=journal, if the
> blocks allocated are enough to hold the metadata, but not the data, you
> will see the same problem as you are seeing with XFS (or ext4 without
> data=journal). So, don't rely on it.
> 

This somewhat scares me. From my understanding, a full thin pool will
eventually bring XFS to a halt (filesystem shutdown) but, from my
testing, this can take a fair amount of time and failed writes. During
this period, any writes will be lost without anybody noticing. In fact, I
opened a similar thread on the lvm mailing list discussing this very
same problem.

> 
> Yes, these options won't help, because they are configuration options
> for metadata errors, not data errors.
> 
> Please bear in mind that your question should be: "how can I stop a
> filesystem when async writes return I/O errors", because this isn't an
> XFS issue.
> 
> But again, there isn't too much you can do here: async writes are
> supposed to behave this way. And whoever is writing "data" to the
> device is supposed to take care of their own data.
> 
> Imagine, for example, a situation where you have 2 applications using
> the same filesystem (quite common, right?). Applications A and B both
> issue buffered writes and, for some reason, application A's data hits
> an I/O error: maybe a too-busy storage, a missed SCSI command, whatever,
> anything that can be retried.
> 
> Then the filesystem shuts down because of that, which will also affect
> application B, even if nothing wrong happened with application B.
> 
> One of the goals of multitasking is having applications running at the
> same time without affecting each other.
> 
> Now, consider that application B is a well-written application, and
> application A isn't.
> 
> App B cares for its data to be written to disk, while app A doesn't.
> 
> In case of an occasional error, app B will retry writing its data,
> while app A won't.
> 
> Should we really shut down the filesystem here, affecting everything on
> the system, because application A is not caring for its own data?
> 
> Shutting a filesystem down has basically one purpose: avoiding
> corruption. We basically only shut down a filesystem when keeping it
> alive can cause a problem with everything using it (really, really
> simple explanation here).
> 
> Surely this can be improved but, in the end, the application will always
> need to check for its own data.
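
The well-behaved "app B" described above could be sketched roughly as
follows, in Python. The name durable_write and the retry count are
hypothetical. The sketch assumes the application still holds its own copy
of the data: on Linux, calling fsync() a second time after a failure is,
as far as is known, not a reliable retry, because the failed pages may no
longer be marked dirty, so each attempt rewrites the data from scratch.

```python
import os


def durable_write(path, data, attempts=3):
    """Write data and fsync it, rewriting from the in-memory copy on failure.

    Returns True once an attempt fully succeeds; re-raises the last OSError
    if every attempt fails. (Hypothetical helper, for illustration only.)
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            try:
                view = memoryview(data)
                while view:
                    # Advance past however many bytes were accepted.
                    view = view[os.write(fd, view):]
                os.fsync(fd)
            finally:
                os.close(fd)
            return True
        except OSError as exc:
            # Remember the failure and start over with the full buffer.
            last_error = exc
    raise last_error
```

An "app A" in this picture is the same code without the fsync() and
without the except branch: it never learns that anything went wrong.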

I think the key improvement would be to let the filesystem know about
the full thin pool, i.e. returning ENOSPC at some convenient time (a wild
guess: can we return ENOSPC during delayed block allocation?)

> 
> I am not really a device-mapper developer and I don't know much about
> its code in depth. But I know it will issue warnings when there is no
> more space left, and you can configure a watermark too, to warn the
> admin when the space used reaches that watermark.
> 
> For now, I believe the best solution is to have a reasonable watermark
> set on the thin device, and have the admin take the appropriate action
> whenever this watermark is reached.

Yeah, lvmthin *will* return appropriate warnings as the pool fills up.
However, this requires active monitoring which, albeit a great idea and
"the right thing to do (tm)", adds complexity and can itself fail. In
recent enough (experimental) versions, lvmthin can be instructed to
execute specific actions when data allocation is higher than some
threshold, which somewhat addresses my concerns at the block layer.
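
As an illustration of the kind of active monitoring discussed here, the
following Python sketch checks thin pool usage against a watermark. It
assumes the input text comes from
`lvs --noheadings -o lv_name,data_percent`; the function name, the 80%
default, and the sample pool names are hypothetical. It is only a sketch
of an external check, not the built-in lvmthin mechanism mentioned above.

```python
def pools_over_threshold(lvs_output, threshold=80.0):
    """Return (name, data_percent) pairs at or above the threshold.

    Expects one logical volume per line, as printed by
    `lvs --noheadings -o lv_name,data_percent` (assumed input format).
    """
    over = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            # Blank lines, or volumes that report no data_percent.
            continue
        name, percent = fields
        try:
            # Some locales print a decimal comma.
            used = float(percent.replace(",", "."))
        except ValueError:
            continue
        if used >= threshold:
            over.append((name, used))
    return over


if __name__ == "__main__":
    # Hypothetical sample output, not taken from a real system.
    sample = "  pool0 91.25\n  pool1 12.00\n  root\n"
    print(pools_over_threshold(sample))  # [('pool0', 91.25)]
```

Such a check still has the weakness noted above: it is one more moving
part that has to be running, and working, when the pool fills up.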

Thank you for your patience and sharing, Carlos.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
