* Reducing ext4 fs issues resulting from frequent hard poweroffs
From: Julio Lajara @ 2020-05-12 21:08 UTC
To: linux-ext4

Hi all,

I currently manage an IoT fleet based on Intel NUCs running Ubuntu 18.04 Server on SSDs with ext4, no swap. The device usage is more CPU bound than I/O bound, and we are having some issues keeping a subset of devices running because they are being hard powered off in the field in some regions (sometimes as frequently as every 12 hours). Due to current difficulties in getting devices back from the field, I'm looking into tweaking them as best as possible to survive these hard power-offs, barring any physical SSD issues.

Currently I have tried tweaking some ext4 and I/O settings with the following:

* kernel options:
  elevator=noop fsck.mode=force fsck.repair=yes

* fstab ext4 specific mount options:
  commit=1,max_batch_time=0

Are there any other configuration settings or changes to the above that would make sense to try here for this use case? I am hoping to at least make the fsck repair the last line of defence so it doesn't get stuck waiting for a prompt to repair at boot, but I want to change the I/O / ext4 behavior if possible so it's writing as frequently as sanely possible, to reduce how often fsck is actually needed.

Thanks,
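One way to confirm that the fstab options listed above actually took effect on a given device is to read them back from `/proc/mounts`. The following is only a minimal Python sketch and was not part of the original message: the function name `mount_options` and the `SAMPLE` text are hypothetical, and the sample line is made up for illustration rather than captured from a real device.

```python
def mount_options(mounts_text, mount_point):
    """Return the options of mount_point as a dict, or None if not mounted.

    mounts_text is text in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        opts = {}
        for item in fields[3].split(","):
            # "commit=1" -> ("commit", "1"); a bare flag like "rw" -> ("rw", "")
            key, _, value = item.partition("=")
            opts[key] = value
        found = opts  # a later entry for the same mount point shadows earlier ones
    return found


# Hypothetical sample, invented for illustration only.
SAMPLE = (
    "proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0\n"
    "/dev/sda2 / ext4 rw,relatime,commit=1,max_batch_time=0 0 0\n"
)

if __name__ == "__main__":
    print(mount_options(SAMPLE, "/"))
```

On a device, the same function could be fed the live table, for example `mount_options(open("/proc/mounts").read(), "/")`. Note that an option may be absent from the output when it equals the kernel default, so a missing key does not by itself prove the option was ignored.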
* Re: Reducing ext4 fs issues resulting from frequent hard poweroffs
From: Theodore Y. Ts'o @ 2020-05-12 22:01 UTC
To: julio.lajara; Cc: linux-ext4

On Tue, May 12, 2020 at 05:08:51PM -0400, Julio Lajara wrote:
> Hi all, I currently manage an IOT fleet based on Intel NUCs running
> Ubuntu 18.04 Server on SSDs with ext4, no swap. The device usage is
> more CPU bound than I/O bound and we are having some issues keeping a
> subset of devices running due to them being hard powered off in the
> field in some regions (sometimes as frequently as every 12hrs). Due to
> current difficulties in getting devices back from the field I'm
> looking into tweaking them as best as possible to survive these hard
> power off barring any physical SSD issues.

Hi Julio,

If the hardware devices are behaving appropriately --- that is, after receiving a CACHE FLUSH command the storage device persists all blocks written up to the CACHE FLUSH command, such that when the OS receives the command completion notification of the CACHE FLUSH, everything is persisted even after a hard power off --- no special configuration should be necessary. We have regression tests which simulate this, and ext4 regularly passes them.

If you need to tweak settings, that's an indication that your hardware is buggy. And unfortunately, there's not much we can do to prevent failures. A lot is going to depend on *how* crappy the SSDs happen to be.

Your best bet might be to find a way to make your root filesystem read-only, so it's not being modified at all, and then set up a scratch partition with state which can be reformatted at any time if it gets corrupted --- and then try to get all of your data pushed out to your remote servers / cloud as often as possible.

And next time, qualify the SSDs ahead of time to make sure they aren't overly "cost optimized" (read: crap) before you buy your fleet of devices. :-(

- Ted
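On the application side, state written to a scratch partition like the one Ted describes is usually made crash-tolerant with the write-to-temp, `fsync()`, rename pattern. The following is a minimal Python sketch of that general pattern, not something proposed in the thread; the function name `durable_write` is hypothetical, and it assumes a Linux system where a directory can be opened and fsynced. As Ted points out, it only helps if the SSD actually honors the resulting cache flush.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace the file at path with data (bytes) so that, after a crash,
    a reader sees either the complete old content or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays
    # within one filesystem and is therefore atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to persist the file data
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Persist the directory entry created by the rename as well.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A caller would use it as `durable_write("/scratch/state.json", payload)`, where the path is again only an illustrative assumption. The design choice here is to never modify the existing file in place, so a power cut in the middle of a write leaves the previous version intact.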
* Re: Reducing ext4 fs issues resulting from frequent hard poweroffs
From: Eric Sandeen @ 2020-05-13 3:16 UTC
To: julio.lajara, linux-ext4

On 5/12/20 4:08 PM, Julio Lajara wrote:
> Hi all, I currently manage an IOT fleet based on Intel NUCs running
> Ubuntu 18.04 Server on SSDs with ext4, no swap. The device usage is
> more CPU bound than I/O bound and we are having some issues keeping a
> subset of devices running due to them being hard powered off in the
> field in some regions (sometimes as frequently as every 12hrs). Due to
> current difficulties in getting devices back from the field I'm
> looking into tweaking them as best as possible to survive these hard
> power off barring any physical SSD issues.

I don't think you've actually said what the failure mode after power loss is, have you?

> Currently I have tried tweaking some ext4 and I/O settings with the following:
>
> * kernel options:
> elevator=noop fsck.mode=force fsck.repair=yes
>
> * fstab ext4 specific mount options:
> commit=1,max_batch_time=0
>
> Are there any other configuration settings or changes to the above
> that would make sense to try here for this use case? I am hoping to at
> least make the fsck repair the last line of defence so it doesn't get
> stuck waiting for a prompt to repair it at boot, but want to try to
> change the I/O / ext4 behavior if possible so it's writing as
> frequently as sanely possible to try to reduce the frequency where
> fsck is actually needed.

I can't tell from this why fsck is needed in the first place; what actually goes wrong when power is lost?

Ted's right that properly behaving hardware should not require any special attention after power loss to restore filesystem consistency, but I can't tell for sure from this email what your actual root cause for boot failure is...

-Eric
Thread overview: 3+ messages
[not found] <CAPA0+rx8eLJU6j1uus2bBY63SrY_WC4TU_WTy0MoXk031wNjJw@mail.gmail.com>
2020-05-12 21:08 ` Reducing ext4 fs issues resulting from frequent hard poweroffs Julio Lajara
2020-05-12 22:01 ` Theodore Y. Ts'o
2020-05-13 3:16 ` Eric Sandeen