From: Rafał Miłecki
Date: Mon, 22 Oct 2018 23:27:00 +0200
Subject: Re: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")
To: Richard Weinberger
Cc: Amir Goldstein, Miklos Szeredi, linux-unionfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, Artem Bityutskiy, Adrian Hunter, linux-mtd@lists.infradead.org, Russell Senior, OpenWrt Development List
In-Reply-To: <3000620.g91H2S8sUk@blindfold>

[resending without HTML attachment]
[see http://files.zajec.net/openwrt/ubifs-dbg.html instead]

On Mon, 22 Oct 2018 at 10:26, Richard Weinberger wrote:
> On Monday, 22 October 2018, 09:14:08 CEST, Rafał Miłecki wrote:
> > On Fri, 19 Oct 2018 at 14:31, Rafał Miłecki wrote:
> > > Since OpenWrt switched from kernel 4.9 to 4.14, users have been randomly
> > > reporting file system corruption. OpenWrt uses overlay(fs) with
> > > squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> > > and describe a test case that reproduces the corruption when doing a
> > > power cut after the first boot.
> > >
> > > (...)
> > >
> > > Can I ask you to check if there is something possibly wrong with the
> > > above ovl commit? Or does it expose some problem in ubifs? Or
> > > maybe in the whole UBI?
> > >
> > > FWIW, testing the above commit (and the one before it) always results
> > > in a single error in the kernel log:
> > > [ 14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice
> > >
> > > That UBIFS error doesn't occur with 4.12.14. Unfortunately, it's
> > > impossible to cleanly revert 3a1e819b4e80 on top of 4.12.14.
> >
> > Let me provide a summary of all relevant commits & tests:
> >
> > By "corruption" I mean file system corruption after a power cut.
>
> Well, is the filesystem not consistent anymore?
> From what Russell explained to me, I thought the main problem is that no writeback happens.
> IOW, the inode is present and has the correct length, but no content is there (all zeros).

I probably misused the word "corruption". What I meant by "corruption"
was a file containing all zeros ("00" bytes) instead of the expected data.

> Just like the typical case where userspace does not fsync.
> But in your case, writeback should have happened sooner or later, because the writeback timer
> fires at some point.

As you probably noticed, I wrote tmptest.c. Could you test it, please,
and see if you can reproduce the problem?

Please call it with 3 arguments:
1) Path on the ubifs mount point
2) Some xattr name
3) Some xattr value

Please note that I wait 5 seconds before doing a power cut (this matches
vm.dirty_writeback_centisecs being 500, i.e. 500 centiseconds). That
gives ubifs time to write to flash.

For some reason, files that had fsetxattr() called on them are still all
"00"s after a power cut.

I did some extra testing after enabling ubifs debugging for io.c, file.c
and journal.c. The debugging output looks as expected: I can clearly see
wbuf_timer_callback_nolock() being called.

I attached my debugging summary as ubifs-dbg.html; please take a look at
it in case I missed something.

-- 
Rafał