From: Richard Weinberger <richard@nod.at>
To: sg@emlix.com
Cc: linux-mtd <linux-mtd@lists.infradead.org>,
Artem Bityutskiy <dedekind1@gmail.com>,
Adrian Hunter <adrian.hunter@intel.com>,
chengzhihao1 <chengzhihao1@huawei.com>
Subject: Re: ubifs: corrupted dirent (ENOENT), probably related to O_TMPFILE and linkat
Date: Thu, 22 Jul 2021 10:33:34 +0200 (CEST)
Message-ID: <2125418384.6749.1626942814175.JavaMail.zimbra@nod.at>
In-Reply-To: <YPkj9NUxNLGG5tjL@emlix.com>
----- Original Message -----
> From: sg@emlix.com
> To: "linux-mtd" <linux-mtd@lists.infradead.org>
> CC: "richard" <richard@nod.at>, "Artem Bityutskiy" <dedekind1@gmail.com>, "Adrian Hunter" <adrian.hunter@intel.com>,
> "chengzhihao1" <chengzhihao1@huawei.com>
> Sent: Thursday, 22 July 2021 09:53:24
> Subject: ubifs: corrupted dirent (ENOENT), probably related to O_TMPFILE and linkat
> Hi,
>
> we encountered an error in ubifs most likely when using O_TMPFILE and linkat.
>
> The code (dir.c:ubifs_lookup) we ran into states
>
> /*
> * This should not happen. Probably the file-system needs
> * checking.
> */
> I'd love to, but how? AFAIK recovery.c should already have taken care of
> that.
The UBIFS code sometimes contains a small dose of humor.
>
> I wish I could give more elaborate information but this one is hard to track
> and reproduce.
>
> What has been tried so far, which is also how we first encountered the error:
>
> - systemd-coredump has been triggered (in our test-setup with kill -SIGABRT of
> some process)
> - hardware power-cut (the timing when to poweroff has yet to be found)
> - on replaying the journal the fs was mounted read only
> - on later boots the fs was mounted properly but when accessing the respective
> directory an inode could not be found (with ls).
> Traced it down to ubifs_tnc_lookup.
> - dead directory entry error led to ro_mode
>
> Following part of dmesg shows the error
>
> [ 67.480858] UBIFS DBG gen (pid 241): dir ino 93, f_pos 0x0
> [ 67.481634] UBIFS DBG gen (pid 241): ino 40735, new f_pos 0x295e98
> [ 67.482024] UBIFS DBG gen (pid 241): ino 98493, new f_pos 0x9f9a65
> [ 67.482753] UBIFS DBG gen (pid 241): ino 98146, new f_pos 0x18b5dd81
> [ 67.483257] UBIFS DBG gen (pid 241): 'tmp' in dir ino 93
> [ 67.486920] UBIFS DBG gen (pid 241): 'core.WPEWebProcess.0.e7c55980c230401d8e3941e72f56a95c.251.1530017543000000' in dir ino 93
> [ 67.489412] UBIFS error (ubi0:2 pid 241): ubifs_iget: failed to read inode 98493, error -2
>
>
> systemd-coredump opens a file with O_TMPFILE, writes to it and later calls
> linkat()
> With O_TMPFILE disabled in kernel the error does not occur and systemd uses an
> alternate code path.
We have had a lot of issues in this area, simply because the concept of inode
"rebirth" was unknown to UBIFS.
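
For readers unfamiliar with the pattern, here is a minimal userspace sketch of
the O_TMPFILE + linkat() sequence described above: an inode is created with a
link count of zero and later gains a name, which is the "rebirth" case. This is
not the systemd-coredump code; the helper name publish_via_tmpfile() and its
parameters are hypothetical, and linking through /proc/self/fd follows the
example in the open(2) man page (it assumes /proc is mounted).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an unnamed file inside `dir`, write `len`
 * bytes of `data` to it, then give it the name `final_path`.
 * Returns 0 on success or a negative errno value.
 */
static int publish_via_tmpfile(const char *dir, const char *final_path,
                               const void *data, size_t len)
{
    char proc_path[64];
    const char *p = data;
    int ret = 0;

    /* The new inode has no directory entry yet, so its link count is 0. */
    int fd = open(dir, O_TMPFILE | O_WRONLY, 0600);
    if (fd < 0)
        return -errno;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            ret = -errno;
            goto out;
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) < 0) {
        ret = -errno;
        goto out;
    }

    /* Link the unnamed inode into the namespace: link count goes 0 -> 1. */
    snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", fd);
    if (linkat(AT_FDCWD, proc_path, AT_FDCWD, final_path,
               AT_SYMLINK_FOLLOW) < 0)
        ret = -errno;

out:
    close(fd);
    return ret;
}
```

A caller would use it as publish_via_tmpfile("/some/dir", "/some/dir/core.x",
buf, len); on a filesystem or kernel without O_TMPFILE support the open() call
fails (typically with EOPNOTSUPP), which is the case where an application has
to fall back to another code path.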
> System
>
> Linux 4.19.186 #26 SMP Wed Jul 21 13:19:08 CEST 2021 armv7l GNU/Linux
This is new enough to contain ee1438ce5dc4 ("ubifs: Check link count of inodes when killing orphans.").
So it must be something different.
>
> with the following patches, which I thought might solve the problem
> 325b731983d010d358505bf2c40f5e0069af62d2
You definitely want this one. It could explain the problem you see.
BTW: Applying this patch will not recover the corrupted fs. It only makes sure
that the corruption does not happen in the first place.
> dd7db149bcd914db8fdeb19cd5597c0740121bc7
This one is unrelated. In the worst case you'll hit a UBIFS assert.
> /data/var is bind-mounted to /var the directory used by systemd-coredump.
> /data is the actual mountpoint of ubi volume ubi0_2
>
> systemd version 239
>
> Hardware
> i.MX6Q on a phyCore SOM with a SPANSION S34ML08G201TFI00 NAND flash
>
> The power-cut is done hard by pulling the CPU reset line.
>
>
> Can you give me hints how to proceed on this?
> - are there any patches from 5.14-rc2 I might have missed
> - how to reproduce the error more reliably, i.e. find the right time to cut power
> It is now at around 4s after issuing 'kill'. With a margin of 500ms
> and step width of 25ms
Can you reproduce using a recent kernel?
> - with the above I might get my hands on more useful
> information than the blank tnc and journal spam with DEBUG enabled (had to set
> LOG_BUF to
> 24 (16M) to capture/save all messages)
> - how to recover by hand
Without inspecting the filesystem this is impossible to answer.
Do you have a recovery tool in mind?
> - might it be safe to just get rid of the inode in ubifs_readdir when it cannot
> be found? In our particular case it is just a coredump and we don't care about
> data loss as long as we can still use the rest of the volume/filesystem.
Yes. Maybe it makes sense to make UBIFS more forgiving at this stage than it
currently is.
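
As a side note on the power-cut timing mentioned above, the sweep can be
sketched as a list of delays. This assumes that "margin of 500ms" means the
window 4000 ms +/- 500 ms around the 'kill', stepped in 25 ms increments; the
function name sweep_delays_ms() is hypothetical and only illustrates the
arithmetic, not the actual test rig.

```c
#include <stddef.h>

/*
 * Fill `out` with delays from (center - margin) to (center + margin),
 * inclusive, in `step` increments. All values are in milliseconds.
 * Returns the number of delays written, never more than `cap`.
 */
static size_t sweep_delays_ms(int center_ms, int margin_ms, int step_ms,
                              int *out, size_t cap)
{
    size_t n = 0;

    if (step_ms <= 0 || margin_ms < 0)
        return 0;

    for (int d = center_ms - margin_ms;
         d <= center_ms + margin_ms && n < cap;
         d += step_ms)
        out[n++] = d;

    return n;
}
```

With center 4000, margin 500 and step 25 this yields 41 cut points from
3500 ms to 4500 ms, so one full sweep needs 41 power cycles per attempt.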
Thanks,
//richard