linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
To: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"jkoolstra@xs4all.nl" <jkoolstra@xs4all.nl>
Cc: "glaubitz@physik.fu-berlin.de" <glaubitz@physik.fu-berlin.de>,
	"frank.li@vivo.com" <frank.li@vivo.com>,
	"slava@dubeyko.com" <slava@dubeyko.com>,
	"skhan@linuxfoundation.org" <skhan@linuxfoundation.org>,
	"syzbot+17cc9bb6d8d69b4139f0@syzkaller.appspotmail.com"
	<syzbot+17cc9bb6d8d69b4139f0@syzkaller.appspotmail.com>
Subject: RE: [PATCH v2] hfs: replace BUG_ONs with error handling
Date: Mon, 1 Dec 2025 19:57:38 +0000	[thread overview]
Message-ID: <01e9ce7fb96e555f0ab07f27890b0ed3406a92ae.camel@ibm.com> (raw)
In-Reply-To: <299926848.3375545.1764534866882@kpc.webmail.kpnmail.nl>

Hi Jori,

On Sun, 2025-11-30 at 21:34 +0100, Jori Koolstra wrote:
> Hi Viachslav,
> 
> Thanks for your time to write such a detailed answer. Your comments are very useful
> to someone like me starting out in the linux kernel. I really appreciate it.
> 
> > > @@ -264,9 +264,9 @@ static int hfs_remove(struct inode *dir, struct dentry *dentry)
> > >  		return res;
> > >  	clear_nlink(inode);
> > >  	inode_set_ctime_current(inode);
> > > -	hfs_delete_inode(inode);
> > > +	res = hfs_delete_inode(inode);
> > >  	mark_inode_dirty(inode);
> > > -	return 0;
> > > +	return res;
> > 
> > This modification doesn't look good, frankly speaking. The hfs_delete_inode()
> > will return error code pretty at the beginning of execution. So, it doesn't make
> > sense to call mark_inode_dirty() then. However, we already did a lot of activity
> > before hfs_delete_inode() call:
> > 
> > static int hfs_remove(struct inode *dir, struct dentry *dentry)
> > {
> > 	struct inode *inode = d_inode(dentry);
> > 	int res;
> > 
> > 	if (S_ISDIR(inode->i_mode) && inode->i_size != 2)
> > 		return -ENOTEMPTY;
> > 	res = hfs_cat_delete(inode->i_ino, dir, &dentry->d_name);
> > 	if (res)
> > 		return res;
> > 	clear_nlink(inode);
> > 	inode_set_ctime_current(inode);
> > 	hfs_delete_inode(inode);
> > 	mark_inode_dirty(inode);
> > 	return 0;
> > }
> > 
> > So, not full executing of hfs_delete_inode() makes situation really bad.
> > Because, we deleted record from Catalog File but rejected of execution of
> > hfs_delete_inode() functionality.
> > 
> > I am thinking that, maybe, better course of action is to check HFS_SB(sb)-
> > > folder_count and HFS_SB(sb)->file_count at the beginning of hfs_remove():
> > 
> > static int hfs_remove(struct inode *dir, struct dentry *dentry)
> > {
> > 	struct inode *inode = d_inode(dentry);
> > 	int res;
> > 
> > 	if (S_ISDIR(inode->i_mode) && inode->i_size != 2)
> > 		return -ENOTEMPTY;
> > 
> > <<-- Check it here and return error
> > 
> > 	res = hfs_cat_delete(inode->i_ino, dir, &dentry->d_name);
> > 	if (res)
> > 		return res;
> > 	clear_nlink(inode);
> > 	inode_set_ctime_current(inode);
> > 	hfs_delete_inode(inode);
> > 	mark_inode_dirty(inode);
> > 	return 0;
> > }
> > 
> 
> That sounds good. But maybe we should do the check even before the ENOTEMPTY check,
> as corruption detection is perhaps more interesting than informing that the operation
> cannot complete because of some other (less acute) reason.
> 

If we are not going to call hfs_cat_delete() and hfs_delete_inode(), then it
doesn't need to check folder_count or file_count. So, this is why I've suggested
to place these checks after.

Again, we could have complete mess of in-core file system's data structures
because of bugs, memory issues or anything else. But if we prevent this state
from writing to the file system's volume, then no corruption of file system
volume will happen. So, we cannot say that file system volume corrupted if we
detected incorrect values of folder_count or/and file_count before write. But we
can say that file system volume could be corrupted if we read inconsistent state
of metadata from the volume.

> > In such case, we reject to make the removal, to return error and no activity
> > will happened. Let's move the check from hfs_delete_inode() to hfs_remove(). We
> > can ignore hfs_create() [1] and hfs_mkdir() [2] because these methods simply
> > processing erroneous situation.
> > 
> 
> One thing we can also do is what happens in ext4. We introduce an errors= mount option
> which can be set to readonly, panic, or continue depending on the desired behavior in
> case of serious error (like corruption). I already implemented this for minix fs, and
> the patch was fine. However, the people responsible for minix felt that it was more
> sensible to deprecate minix and write a FUSE driver for it. [1]

I had impression that HFS already has it. But even if it is not so, then it
sounds like another task. Let's don't mix the different problems into one
solution. Otherwise, we will have a huge patch.

> 
> > > +#define EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */
> > 
> > I don't think that rename existing error code is good idea. Especially, because
> > we will not need the newly introduce error code's name. Please, see my comments
> > below.
> > 
> 
> For context, I took this from ext4.

I still don't see corruption here. :) Please, see my remarks above.

> 
> > > --- a/fs/hfs/inode.c
> > > +++ b/fs/hfs/inode.c
> > > @@ -186,16 +186,22 @@ struct inode *hfs_new_inode(struct inode *dir, const struct qstr *name, umode_t
> > >  	s64 next_id;
> > >  	s64 file_count;
> > >  	s64 folder_count;
> > > +	int err = -ENOMEM;
> > >  
> > >  	if (!inode)
> > > -		return NULL;
> > > +		goto out_err;
> > > +
> > > +	err = -EFSCORRUPTED;
> > 
> > In 99% of cases, this logic will be called for file system internal logic when
> > mount was successful. So, file system volume is not corrupted. Even if we
> > suspect that volume is corrupted, then potential reason could be failed read (-
> > EIO). It needs to run FSCK tool to be sure that volume is really corrupted.
> > 
> 
> I get your point, maybe just warn for possible corruption?

We can consider this as corruption only for the case of mount operation. So, we
can share warning only if we are executing the mount operation. But
hfs_fill_super() is the right place for such warning then.

> 
> > >  
> > >  	mutex_init(&HFS_I(inode)->extents_lock);
> > >  	INIT_LIST_HEAD(&HFS_I(inode)->open_dir_list);
> > >  	spin_lock_init(&HFS_I(inode)->open_dir_lock);
> > >  	hfs_cat_build_key(sb, (btree_key *)&HFS_I(inode)->cat_key, dir->i_ino, name);
> > >  	next_id = atomic64_inc_return(&HFS_SB(sb)->next_id);
> > > -	BUG_ON(next_id > U32_MAX);
> > > +	if (next_id > U32_MAX) {
> > > +		pr_err("next CNID exceeds limit — filesystem corrupted. It is recommended to run fsck\n");
> > 
> > File system volume is not corrupted here. Because, it is only error of file
> > system logic. And we will not store this not correct number to the volume,
> > anyway. At minimum, we should protect the logic from doing this. And it doesn't
> > need to recommend to run FSCK tool here.
> 
> What if e.g. next_id is not U32_MAX, but some other slightly smaller value, that's still
> not possible, correct? And then we find out not at mount time (at least not right now).
> Maybe we should just check at mount time and when the mdb is written if the values like
> file/folder_count and next_id make any sense. I think they indicate corruption even for
> much smaller values than U32_MAX, but I could not really distill that.
> 
> If we have this, then the other BUG_ONs should not indicate corruption but implementation
> logic issues. Correct?
> 
> 

It's easy to make conclusion about inconsistent state of file/folder_count and
next_id if these values are U32_MAX. Otherwise, if it is smaller than U32_MAX
but still huge, then the check of these values correctness requires of checking
all of Catalog File's entries. Usually, we cannot afford of doing such check on
file system driver side. And this is responsibility of FSCK tool, usually.

Thanks,
Slava.

> > Probably, it makes sense to decrement erroneous back.
> > 
> > Potentially, if we have such situation, maybe, it makes sense to consider to
> > make file system READ-ONLY. But I am not fully sure.
> > 
> 
> See my comment above.
> 
> Thanks,
> Jori.
> 
> [1] https://lkml.org/lkml/2025/10/28/1786  

  reply	other threads:[~2025-12-01 19:57 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-25 21:13 [PATCH v2] hfs: replace BUG_ONs with error handling Jori Koolstra
2025-11-26 21:09 ` Viacheslav Dubeyko
2025-11-30 20:34   ` Jori Koolstra
2025-12-01 19:57     ` Viacheslav Dubeyko [this message]
2025-12-02 16:45       ` Jori Koolstra
2025-12-02 20:33         ` Viacheslav Dubeyko
2025-12-07 18:30           ` Jori Koolstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=01e9ce7fb96e555f0ab07f27890b0ed3406a92ae.camel@ibm.com \
    --to=slava.dubeyko@ibm.com \
    --cc=frank.li@vivo.com \
    --cc=glaubitz@physik.fu-berlin.de \
    --cc=jkoolstra@xs4all.nl \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=skhan@linuxfoundation.org \
    --cc=slava@dubeyko.com \
    --cc=syzbot+17cc9bb6d8d69b4139f0@syzkaller.appspotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).