From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	dsterba@suse.com, Al Viro <viro@zeniv.linux.org.uk>,
	Jan Kara <jack@suse.com>, Theodore Tso <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Jaegeuk Kim <jaegeuk@kernel.org>,
	yuchao0@huawei.com, Hugh Dickins <hughd@google.com>,
	Christoph Hellwig <hch@infradead.org>,
	Richard Weinberger <richard@nod.at>,
	Artem Bityutskiy <dedekind1@gmail.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	Linux Btrfs <linux-btrfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Ext4 <linux-ext4@vger.kernel.org>,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-mtd@lists.infradead.org, Linux MM <linux-mm@kvack.org>
Subject: Re: [PATCH] vfs: don't decrement i_nlink in d_tmpfile
Date: Fri, 15 Feb 2019 07:56:04 -0800
Message-ID: <20190215155604.GL32253@magnolia>
In-Reply-To: <CAOQ4uxho2AK7g-uhHykGaG6n+aqad-SaCTC6Z_EaA4Jn07tDSg@mail.gmail.com>

On Fri, Feb 15, 2019 at 10:04:12AM +0200, Amir Goldstein wrote:
> On Fri, Feb 15, 2019 at 4:23 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
> >
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > d_tmpfile was introduced to instantiate an inode in the dentry cache as
> > a temporary file.  This helper decrements the inode's nlink count and
> > dirties the inode, presumably so that filesystems could call new_inode
> > to create a new inode with nlink == 1 and then call d_tmpfile which will
> > decrement nlink.
> >
> > However, this doesn't play well with XFS, which needs to allocate,
> > initialize, and insert a tempfile inode on its unlinked list in a single
> > transaction.  In order to maintain referential integrity of the XFS
> > metadata, we cannot have an inode on the unlinked list with nlink >= 1.
> >
> > XFS and btrfs hack around d_tmpfile's behavior by creating the inode
> > with nlink == 0 and then incrementing it just prior to calling
> > d_tmpfile, anticipating that it will be reset to 0.
> >
> > Everywhere else outside of d_tmpfile, it appears that nlink updates and
> > persistence are the responsibility of individual filesystems.  Therefore,
> > move the nlink decrement out of d_tmpfile into the callers, and require
> > that callers only pass in inodes with nlink already set to 0.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/btrfs/inode.c  |    8 --------
> >  fs/dcache.c       |    8 ++++++--
> >  fs/ext2/namei.c   |    2 +-
> >  fs/ext4/namei.c   |    1 +
> >  fs/f2fs/namei.c   |    1 +
> >  fs/minix/namei.c  |    2 +-
> >  fs/ubifs/dir.c    |    1 +
> >  fs/udf/namei.c    |    2 +-
> >  fs/xfs/xfs_iops.c |   13 ++-----------
> >  mm/shmem.c        |    1 +
> >  10 files changed, 15 insertions(+), 24 deletions(-)
> >
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 5c349667c761..bd189fc50f83 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -10382,14 +10382,6 @@ static int btrfs_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
> >         if (ret)
> >                 goto out;
> >
> > -       /*
> > -        * We set number of links to 0 in btrfs_new_inode(), and here we set
> > -        * it to 1 because d_tmpfile() will issue a warning if the count is 0,
> > -        * through:
> > -        *
> > -        *    d_tmpfile() -> inode_dec_link_count() -> drop_nlink()
> > -        */
> > -       set_nlink(inode, 1);
> >         d_tmpfile(dentry, inode);
> >         unlock_new_inode(inode);
> >         mark_inode_dirty(inode);
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index aac41adf4743..5fb4ecce2589 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -3042,12 +3042,16 @@ void d_genocide(struct dentry *parent)
> >
> >  EXPORT_SYMBOL(d_genocide);
> >
> > +/*
> > + * Instantiate an inode in the dentry cache as a temporary file.  Callers must
> > + * ensure that @inode has a zero link count.
> > + */
> >  void d_tmpfile(struct dentry *dentry, struct inode *inode)
> >  {
> > -       inode_dec_link_count(inode);
> >         BUG_ON(dentry->d_name.name != dentry->d_iname ||
> >                 !hlist_unhashed(&dentry->d_u.d_alias) ||
> > -               !d_unlinked(dentry));
> > +               !d_unlinked(dentry) ||
> > +               inode->i_nlink != 0);
> 
> You've just promoted i_nlink filesystem accounting errors (which
> are not that rare) from WARN_ON() to BUG_ON(), not to mention
> Linus' objection to any use of BUG_ON() at all.
> 
> !hlist_unhashed is anyway checked again in d_instantiate().
> !d_unlinked is not a reason to break the machine.
> The name check is really not a reason to break the machine.
> Can probably make tmp name code conditional to WARN_ON().

Fair enough, I'll remove the redundant checks and downgrade that to a
WARN_ON, if nobody else objects....

--D

> Thanks,
> Amir.

Thread overview:
2019-02-14 23:49 [PATCH] vfs: don't decrement i_nlink in d_tmpfile Darrick J. Wong
2019-02-15  8:04 ` Amir Goldstein
2019-02-15 15:56   ` Darrick J. Wong [this message]
2019-02-15 22:39 ` [PATCH v2] " Darrick J. Wong
2019-02-17  0:26   ` Al Viro