public inbox for linux-mtd@lists.infradead.org
From: sg@emlix.com
To: linux-mtd@lists.infradead.org
Cc: Richard Weinberger <richard@nod.at>,
	Artem Bityutskiy <dedekind1@gmail.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Zhihao Cheng <chengzhihao1@huawei.com>
Subject: ubifs: corrupted dirent (ENOENT), probably related to O_TMPFILE and linkat
Date: Thu, 22 Jul 2021 09:53:24 +0200
Message-ID: <YPkj9NUxNLGG5tjL@emlix.com>

Hi,

we encountered an error in UBIFS that is most likely triggered by the use of
O_TMPFILE and linkat().

The code we ran into (dir.c:ubifs_lookup) states:

        /*
         * This should not happen. Probably the file-system needs
         * checking.
         */
I'd love to, but how? AFAIK recovery.c should already have taken care of that.


I wish I could give more detailed information, but this one is hard to track
down and reproduce.

What has been tried so far (this is also how we first encountered the error):

- systemd-coredump has been triggered (in our test setup with kill -SIGABRT of
  some process)
- hardware power-cut (the right moment to cut power has yet to be found)
- on replaying the journal the fs was mounted read-only
- on later boots the fs was mounted properly, but when accessing the respective
  directory (with ls) an inode could not be found.
  Traced it down to ubifs_tnc_lookup.
- the dead directory entry error led to ro_mode

The following part of dmesg shows the error:

   [   67.480858] UBIFS DBG gen (pid 241): dir ino 93, f_pos 0x0
   [   67.481634] UBIFS DBG gen (pid 241): ino 40735, new f_pos 0x295e98
   [   67.482024] UBIFS DBG gen (pid 241): ino 98493, new f_pos 0x9f9a65
   [   67.482753] UBIFS DBG gen (pid 241): ino 98146, new f_pos 0x18b5dd81
   [   67.483257] UBIFS DBG gen (pid 241): 'tmp' in dir ino 93
   [   67.486920] UBIFS DBG gen (pid 241): 'core.WPEWebProcess.0.e7c55980c230401d8e3941e72f56a95c.251.1530017543000000' in dir ino 93
   [   67.489412] UBIFS error (ubi0:2 pid 241): ubifs_iget: failed to read inode 98493, error -2
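
For triage, a small script along the following lines could pull the affected
inode out of such a log. It is only a sketch: the regular expressions are
assumptions modelled on the two message shapes in the excerpt above, and
find_dangling_dirents is a hypothetical helper name, not part of any UBIFS
tooling.

```python
import re

# Assumed message shapes, copied from the dmesg excerpt in this mail.
READDIR_RE = re.compile(
    r"UBIFS DBG gen \(pid \d+\): ino (\d+), new f_pos (0x[0-9a-f]+)")
IGET_FAIL_RE = re.compile(
    r"ubifs_iget: failed to read inode (\d+), error (-?\d+)")


def find_dangling_dirents(log_text):
    """Return (ino, f_pos, error) for each inode that readdir reported
    and that ubifs_iget later failed to read."""
    seen = {}      # inode number -> f_pos value logged by readdir
    dangling = []
    for line in log_text.splitlines():
        match = READDIR_RE.search(line)
        if match:
            seen[int(match.group(1))] = int(match.group(2), 16)
            continue
        match = IGET_FAIL_RE.search(line)
        if match and int(match.group(1)) in seen:
            ino = int(match.group(1))
            dangling.append((ino, seen[ino], int(match.group(2))))
    return dangling


if __name__ == "__main__":
    import sys
    # Usage: dmesg | python3 this_script.py
    for ino, f_pos, err in find_dangling_dirents(sys.stdin.read()):
        print(f"inode {ino} (f_pos {f_pos:#x}) failed with error {err}")
```

Fed with the excerpt above, this should single out inode 98493, i.e. the
directory entry exists but the inode it points to cannot be read.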


systemd-coredump opens a file with O_TMPFILE, writes to it and later calls
linkat().
With O_TMPFILE disabled in the kernel, systemd falls back to an alternate code
path and the error does not occur.
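
To make the pattern concrete, here is a minimal sketch of the
O_TMPFILE-then-link sequence. It is not systemd's code; write_then_link and
its parameters are hypothetical names, and it assumes Linux, a mounted /proc
and a filesystem that supports O_TMPFILE. Passing dst_dir_fd is meant to make
CPython go through linkat(2) with AT_SYMLINK_FOLLOW, which is my understanding
of os.link on Linux rather than something verified on this target.

```python
import os


def write_then_link(directory, name, payload):
    """Create an unnamed file in `directory`, fill it, then give it a name."""
    # O_TMPFILE creates an inode without a directory entry; it has to be
    # combined with O_WRONLY or O_RDWR.
    fd = os.open(directory, os.O_TMPFILE | os.O_WRONLY, 0o600)
    dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # os.write may write fewer bytes than requested, so loop until done.
        view = memoryview(payload)
        while view:
            view = view[os.write(fd, view):]
        # Link the still unnamed inode into the directory through its
        # /proc/self/fd symlink (the unprivileged variant of the pattern).
        os.link(f"/proc/self/fd/{fd}", name,
                dst_dir_fd=dir_fd, follow_symlinks=True)
    finally:
        os.close(dir_fd)
        os.close(fd)
    return os.path.join(directory, name)
```

Running something like write_then_link("/data/var/tmp", "probe", b"x" * 4096)
in a loop while cutting power might be a lighter-weight trigger than a full
coredump, though whether it hits the same window is untested.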

System

  Linux 4.19.186 #26 SMP Wed Jul 21 13:19:08 CEST 2021 armv7l GNU/Linux

with the following patches applied, which I thought might solve the problem:
  325b731983d010d358505bf2c40f5e0069af62d2
  dd7db149bcd914db8fdeb19cd5597c0740121bc7

/data/var is bind-mounted to /var, the directory used by systemd-coredump.
/data is the actual mountpoint of UBI volume ubi0_2.

  systemd version 239

Hardware
  i.MX6Q on a phyCore SOM with a SPANSION S34ML08G201TFI00 NAND flash

  The power-cut is done hard by pulling the CPU reset line.


Can you give me hints on how to proceed?
- are there any patches from 5.14-rc2 I might have missed?
- how to reproduce the error more reliably, i.e. find the right time to cut
  power? It is now at around 4 s after issuing 'kill', with a margin of 500 ms
  and a step width of 25 ms.
- with the above I might get my hands on more useful information than the
  blank tnc and journal spam with DEBUG enabled (I had to set LOG_BUF to
  24 (16M) to capture/save all messages)
- how to recover by hand?
- might it be safe to just get rid of the inode in ubifs_readdir when it cannot
  be found? In our particular case it is just a coredump and we don't care about
  data loss as long as we can still use the rest of the volume/filesystem.
- any more information you require?
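
For reference, the power-cut sweep mentioned above can be written down as
follows. This is a sketch under assumptions: I read "margin of 500ms" as
+/- 500 ms around the 4 s mark, and run_once is a hypothetical callback that
triggers the coredump, cuts power after the given delay, waits for the reboot
and reports whether the error showed up.

```python
def power_cut_delays_ms(center_ms=4000, margin_ms=500, step_ms=25):
    """Delays (in ms after 'kill') to try, from center-margin to center+margin."""
    return list(range(center_ms - margin_ms,
                      center_ms + margin_ms + 1,
                      step_ms))


def sweep(delays_ms, run_once):
    """Call run_once(delay_ms) for every delay and collect the delays for
    which it returned True (i.e. the error was reproduced)."""
    return [delay for delay in delays_ms if run_once(delay)]
```

With the defaults this gives 41 delays per pass, from 3500 ms to 4500 ms.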

@Zhihao Cheng: I put you in CC because you seem to have some experience with
similar problems.

Thanks in advance and regards


Sebastian Groß

-- 
B.Sc. Sebastian Groß, emlix GmbH, http://www.emlix.com
Phone +49 551 30664-0, Fax +49 551 30664-11,
Gothaer Platz 3, 37083 Göttingen, Germany
Registered office: Göttingen, Amtsgericht Göttingen HR B 3160
Management: Heike Jordan, Dr. Uwe Kracke
VAT ID: DE 205 198 055

emlix - your embedded linux partner

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
