public inbox for linux-mtd@lists.infradead.org
From: Sebastian Gross <sebastian.gross@emlix.com>
To: Richard Weinberger <richard@nod.at>
Cc: linux-mtd <linux-mtd@lists.infradead.org>,
	Artem Bityutskiy <dedekind1@gmail.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	chengzhihao1 <chengzhihao1@huawei.com>
Subject: Re: ubifs: corrupted dirent (ENOENT), problably related to O_TMPFILE and linkat
Date: Thu, 22 Jul 2021 12:47:53 +0200
Message-ID: <3b009a01-1f13-e029-0341-9728d4dc16ea@emlix.com>
In-Reply-To: <2125418384.6749.1626942814175.JavaMail.zimbra@nod.at>

On 7/22/21 10:33 AM, Richard Weinberger wrote:
> ----- Original Message -----
>> From: sg@emlix.com
>> To: "linux-mtd" <linux-mtd@lists.infradead.org>
>> CC: "richard" <richard@nod.at>, "Artem Bityutskiy" <dedekind1@gmail.com>, "Adrian Hunter" <adrian.hunter@intel.com>,
>> "chengzhihao1" <chengzhihao1@huawei.com>
>> Sent: Thursday, 22 July 2021 09:53:24
>> Subject: ubifs: corrupted dirent (ENOENT), problably related to O_TMPFILE and linkat
>> The code (dir.c:ubifs_lookup) we ran into states
>>
>>         /*
>>          * This should not happen. Probably the file-system needs
>>          * checking.
>>          */
>> I'd love to, but how? AFAIK, recovery.c should already have taken care of
>> that.
> 
> UBIFS code sometimes contains a little portion of humor.
I appreciate that. In fact, it kept me going while digging into the problem over the last two days =)

>> System
>>
>>   Linux 4.19.186 #26 SMP Wed Jul 21 13:19:08 CEST 2021 armv7l GNU/Linux
> 
> This is new enough to contain ee1438ce5dc4 ("ubifs: Check link count of inodes when killing orphans.").
Confirmed

> So it must be something different.
> 
Agreed

>>
>> with the following patches, which I thought might solve the problem
>>   325b731983d010d358505bf2c40f5e0069af62d2
> 
> You want this one definitely. It could explain the problem you see.
> BTW: Applying this patch will not recover the corrupted fs. It just makes sure
> that the corruption does not happen.
I am not so sure anymore, but I think this one was already applied when I reproduced the corruption for the third time.

> 
>>   dd7db149bcd914db8fdeb19cd5597c0740121bc7
> 
> This one is unrelated. In worst case you'll hit an UBIFS assert.
Acknowledged. I'll drop it.

>> Can you give me hints how to proceed on this?
>> - are there any patches from 5.14-rc2 I might have missed?
>> - how to reproduce the error more reliably, i.e. find the right time to cut power.
>>   It is now at around 4s after issuing 'kill', with a margin of 500ms
>>   and a step width of 25ms
> 
> Can you reproduce using a recent kernel?
As stated, I am already having a hard time reproducing it as it is. The last time took me ~2000 runs.
But I will try anyway.
Can you give me a pointer as to when there might be a good time/place to cut the power?
During the linkat or fsync call? Or maybe in a certain function in the ubifs driver?
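
For readers following along, here is a minimal user-space sketch of the O_TMPFILE + linkat + fsync sequence the question refers to. It is not the reporter's actual workload: the helper name publish_via_tmpfile, the 0600 mode, and the order of the fsync calls relative to linkat are assumptions made for illustration. The /proc/self/fd path with AT_SYMLINK_FOLLOW is the unprivileged way to name an O_TMPFILE inode described in open(2).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): write len bytes into an
 * unnamed file created in dir, then give that file the name path.
 * Returns 0 on success, -1 on error with errno set.
 */
static int publish_via_tmpfile(const char *dir, const char *path,
                               const void *buf, size_t len)
{
    char proc_path[64];
    int saved_errno;
    int dfd;
    ssize_t written;
    int fd = open(dir, O_TMPFILE | O_WRONLY, 0600);

    if (fd < 0)
        return -1;

    /* Step 1: the data lands in an inode that has no directory entry yet. */
    written = write(fd, buf, len);
    if (written < 0)
        goto fail;
    if ((size_t)written != len) {
        errno = EIO; /* treat a short write as an error in this sketch */
        goto fail;
    }

    /* Step 2: flush the file (assumed here to happen before the link). */
    if (fsync(fd) < 0)
        goto fail;

    /* Step 3: create the directory entry for the so far unnamed inode. */
    snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", fd);
    if (linkat(AT_FDCWD, proc_path, AT_FDCWD, path, AT_SYMLINK_FOLLOW) < 0)
        goto fail;

    /* Step 4: flush the directory so that the new entry is persisted. */
    dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        goto fail;
    if (fsync(dfd) < 0) {
        saved_errno = errno;
        close(dfd);
        errno = saved_errno;
        goto fail;
    }
    close(dfd);
    close(fd);
    return 0;

fail:
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return -1;
}
```

In this sketch the candidate moments for a power cut are the gaps between the numbered steps, in particular between step 3 and step 4.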

> 
>> - with the above I might get my hands on more useful
>>   information than the blank tnc and journal spam with DEBUG enabled (had to set
>>   LOG_BUF to 24 (16M) to capture/save all messages)
>> - how to recover by hand
> 
> Without inspecting the filesystem this is impossible to answer.
> Do you have some recovery tool in mind?
No, not a particular tool, but rather the method I suggested below.

> 
>> - might it be safe to just get rid of the inode in ubifs_readdir when it cannot
>>   be found? In our particular case it is just a coredump and we don't care about
>>   data loss as long as we can still use the rest of the volume/filesystem.
> 
> Yes. Maybe it makes sense to make UBIFS at this stage more forgiving instead of
> the current situation.
ubifs_readdir would then have to check whether each inode it gets can actually be found by the TNC, and remove the entry if it cannot.
That seems quite hacky to me and not really upstream-worthy.
Are there other places where a "faulty" inode can be "discovered"?
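
To make the idea concrete, here is a small user-space model of a "forgiving" readdir. It is only a sketch: model_dirent, model_inode_exists and model_readdir are hypothetical names, the linear search stands in for an index lookup, and none of this is UBIFS code. Entries whose inode cannot be found are skipped and counted. Actually removing the stale entry from the medium would be a separate step that is not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are not UBIFS structures. */
struct model_dirent {
    const char *name;
    unsigned long inum;
};

/* Stand-in for an index lookup: true if an inode with this number exists. */
static bool model_inode_exists(const unsigned long *inodes, size_t n_inodes,
                               unsigned long inum)
{
    for (size_t i = 0; i < n_inodes; i++)
        if (inodes[i] == inum)
            return true;
    return false;
}

/*
 * Emit only the entries whose inode can be found. Dangling entries are
 * skipped and counted in *n_dangling instead of failing the whole listing.
 * out must have room for n_ents pointers. Returns the number emitted.
 */
static size_t model_readdir(const struct model_dirent *ents, size_t n_ents,
                            const unsigned long *inodes, size_t n_inodes,
                            const struct model_dirent **out,
                            size_t *n_dangling)
{
    size_t n_out = 0;

    *n_dangling = 0;
    for (size_t i = 0; i < n_ents; i++) {
        if (model_inode_exists(inodes, n_inodes, ents[i].inum))
            out[n_out++] = &ents[i];
        else
            (*n_dangling)++;
    }
    return n_out;
}
```

The model shows why this feels hacky: every listed entry costs an extra lookup, and the check only covers the readdir path.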

Regards


Sebastian

Thread overview: 6+ messages
2021-07-22  7:53 ubifs: corrupted dirent (ENOENT), problably related to O_TMPFILE and linkat sg
2021-07-22  8:33 ` Richard Weinberger
2021-07-22 10:47   ` Sebastian Gross [this message]
2021-07-27 21:06     ` Richard Weinberger
2021-08-12 12:01       ` Sebastian Gross
2021-09-30  7:29         ` Richard Weinberger
