From: "J. Bruce Fields" <bfields@fieldses.org>
To: Sage Weil <sage@inktank.com>
Cc: Christoph Hellwig <hch@infradead.org>,
Al Viro <viro@ZenIV.linux.org.uk>,
linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
sandeen@redhat.com
Subject: Re: why is i_ino unsigned long, anyway?
Date: Wed, 2 Oct 2013 15:00:34 -0400 [thread overview]
Message-ID: <20131002190034.GH14808@fieldses.org> (raw)
In-Reply-To: <alpine.DEB.2.00.1310021130280.7765@cobra.newdream.net>
On Wed, Oct 02, 2013 at 11:47:22AM -0700, Sage Weil wrote:
> On Wed, 2 Oct 2013, J. Bruce Fields wrote:
> > On Wed, Oct 02, 2013 at 09:05:27AM -0700, Christoph Hellwig wrote:
> > > On Wed, Oct 02, 2013 at 10:25:27AM -0400, J. Bruce Fields wrote:
> > > > If so then it's no huge code duplication to it by hand:
> > > >
> > > > if (inode->i_op->getattr)
> > > > inode->i_op->getattr(path->mnt, path->dentry, &stat);
> > > > else
> > > > generic_fillattr(inode, &stat);
> > >
> > > Maybe make that a vfs_getattr_nosec and let vfs_getattr call it?
> > >
> > > Including a proper kerneldoc comment explaining when to use it, please.
> >
> > Something like this?
>
> I'm late to this thread, but: getattr() is a comparatively expensive
> operation for just getting an (immutable) ino value on a distributed or
> clustered fs. It would be nice if there were a separate call that didn't
> try to populate all the other kstat fields with valid data. Something
> like this was part of the xstat series from forever ago (a bit mask passed
> to getattr indicating which fields were needed), so the approach below
> might be okay if we think we'll get there sometime soon, but my preference
> would be for another fix...
Understood, and perhaps this code should eventually take advantage of
xstat, but:
- this code isn't handling the common case, so we're not too
worried about the performance, and
- you also have the option of defining your own
export_operations->get_name. Especially consider that if you
have some better way to answer the question "what is the name
of inode X in directory Y" that's better than reading Y
looking for a matching inode number.
--b.
> (Ceph also uses 64-bit inos.)
>
> Thanks!
> sage
>
>
> >
> > --b.
> >
> > commit 8418a41b7192cf2f372ae091207adb29a088f9a0
> > Author: J. Bruce Fields <bfields@redhat.com>
> > Date: Tue Sep 10 11:41:12 2013 -0400
> >
> > exportfs: fix 32-bit nfsd handling of 64-bit inode numbers
> >
> > Symptoms were spurious -ENOENTs on stat of an NFS filesystem from a
> > 32-bit NFS server exporting a very large XFS filesystem, when the
> > server's cache is cold (so the inodes in question are not in cache).
> >
> > Reported-by: Trevor Cordes <trevor@tecnopolis.ca>
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> >
> > diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
> > index 293bc2e..811831a 100644
> > --- a/fs/exportfs/expfs.c
> > +++ b/fs/exportfs/expfs.c
> > @@ -215,7 +215,7 @@ struct getdents_callback {
> > struct dir_context ctx;
> > char *name; /* name that was found. It already points to a
> > buffer NAME_MAX+1 is size */
> > - unsigned long ino; /* the inum we are looking for */
> > + u64 ino; /* the inum we are looking for */
> > int found; /* inode matched? */
> > int sequence; /* sequence counter */
> > };
> > @@ -255,10 +255,10 @@ static int get_name(const struct path *path, char *name, struct dentry *child)
> > struct inode *dir = path->dentry->d_inode;
> > int error;
> > struct file *file;
> > + struct kstat stat;
> > struct getdents_callback buffer = {
> > .ctx.actor = filldir_one,
> > .name = name,
> > - .ino = child->d_inode->i_ino
> > };
> >
> > error = -ENOTDIR;
> > @@ -268,6 +268,16 @@ static int get_name(const struct path *path, char *name, struct dentry *child)
> > if (!dir->i_fop)
> > goto out;
> > /*
> > + * inode->i_ino is unsigned long, kstat->ino is u64, so the
> > + * former would be insufficient on 32-bit hosts when the
> > + * filesystem supports 64-bit inode numbers. So we need to
> > + * actually call ->getattr, not just read i_ino:
> > + */
> > + error = vfs_getattr_nosec(path, &stat);
> > + if (error)
> > + return error;
> > + buffer.ino = stat.ino;
> > + /*
> > * Open the directory ...
> > */
> > file = dentry_open(path, O_RDONLY, cred);
> > diff --git a/fs/stat.c b/fs/stat.c
> > index 04ce1ac..71a39e8 100644
> > --- a/fs/stat.c
> > +++ b/fs/stat.c
> > @@ -37,14 +37,21 @@ void generic_fillattr(struct inode *inode, struct kstat *stat)
> >
> > EXPORT_SYMBOL(generic_fillattr);
> >
> > -int vfs_getattr(struct path *path, struct kstat *stat)
> > +/**
> > + * vfs_getattr_nosec - getattr without security checks
> > + * @path: file to get attributes from
> > + * @stat: structure to return attributes in
> > + *
> > + * Get attributes without calling security_inode_getattr.
> > + *
> > + * Currently the only caller other than vfs_getattr is internal to the
> > + * filehandle lookup code, which uses only the inode number and returns
> > + * no attributes to any user. Any other code probably wants
> > + * vfs_getattr.
> > + */
> > +int vfs_getattr_nosec(struct path *path, struct kstat *stat)
> > {
> > struct inode *inode = path->dentry->d_inode;
> > - int retval;
> > -
> > - retval = security_inode_getattr(path->mnt, path->dentry);
> > - if (retval)
> > - return retval;
> >
> > if (inode->i_op->getattr)
> > return inode->i_op->getattr(path->mnt, path->dentry, stat);
> > @@ -53,6 +60,18 @@ int vfs_getattr(struct path *path, struct kstat *stat)
> > return 0;
> > }
> >
> > +EXPORT_SYMBOL_GPL(vfs_getattr_nosec);
> > +
> > +int vfs_getattr(struct path *path, struct kstat *stat)
> > +{
> > + int retval;
> > +
> > + retval = security_inode_getattr(path->mnt, path->dentry);
> > + if (retval)
> > + return retval;
> > + return vfs_getattr_nosec(path, stat);
> > +}
> > +
> > EXPORT_SYMBOL(vfs_getattr);
> >
> > int vfs_fstat(unsigned int fd, struct kstat *stat)
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 9818747..5a51faa 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2500,6 +2500,7 @@ extern int page_symlink(struct inode *inode, const char *symname, int len);
> > extern const struct inode_operations page_symlink_inode_operations;
> > extern int generic_readlink(struct dentry *, char __user *, int);
> > extern void generic_fillattr(struct inode *, struct kstat *);
> > +int vfs_getattr_nosec(struct path *path, struct kstat *stat);
> > extern int vfs_getattr(struct path *, struct kstat *);
> > void __inode_add_bytes(struct inode *inode, loff_t bytes);
> > void inode_add_bytes(struct inode *inode, loff_t bytes);
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> >
WARNING: multiple messages have this Message-ID (diff)
From: "J. Bruce Fields" <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
To: Sage Weil <sage-4GqslpFJ+cxBDgjK7y7TUQ@public.gmane.org>
Cc: Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
Al Viro <viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Subject: Re: why is i_ino unsigned long, anyway?
Date: Wed, 2 Oct 2013 15:00:34 -0400 [thread overview]
Message-ID: <20131002190034.GH14808@fieldses.org> (raw)
In-Reply-To: <alpine.DEB.2.00.1310021130280.7765-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
On Wed, Oct 02, 2013 at 11:47:22AM -0700, Sage Weil wrote:
> On Wed, 2 Oct 2013, J. Bruce Fields wrote:
> > On Wed, Oct 02, 2013 at 09:05:27AM -0700, Christoph Hellwig wrote:
> > > On Wed, Oct 02, 2013 at 10:25:27AM -0400, J. Bruce Fields wrote:
> > > > If so then it's no huge code duplication to it by hand:
> > > >
> > > > if (inode->i_op->getattr)
> > > > inode->i_op->getattr(path->mnt, path->dentry, &stat);
> > > > else
> > > > generic_fillattr(inode, &stat);
> > >
> > > Maybe make that a vfs_getattr_nosec and let vfs_getattr call it?
> > >
> > > Including a proper kerneldoc comment explaining when to use it, please.
> >
> > Something like this?
>
> I'm late to this thread, but: getattr() is a comparatively expensive
> operation for just getting an (immutable) ino value on a distributed or
> clustered fs. It would be nice if there were a separate call that didn't
> try to populate all the other kstat fields with valid data. Something
> like this was part of the xstat series from forever ago (a bit mask passed
> to getattr indicating which fields were needed), so the approach below
> might be okay if we think we'll get there sometime soon, but my preference
> would be for another fix...
Understood, and perhaps this code should eventually take advantage of
xstat, but:
- this code isn't handling the common case, so we're not too
worried about the performance, and
- you also have the option of defining your own
export_operations->get_name. Especially consider that if you
have some better way to answer the question "what is the name
of inode X in directory Y" that's better than reading Y
looking for a matching inode number.
--b.
> (Ceph also uses 64-bit inos.)
>
> Thanks!
> sage
>
>
> >
> > --b.
> >
> > commit 8418a41b7192cf2f372ae091207adb29a088f9a0
> > Author: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > Date: Tue Sep 10 11:41:12 2013 -0400
> >
> > exportfs: fix 32-bit nfsd handling of 64-bit inode numbers
> >
> > Symptoms were spurious -ENOENTs on stat of an NFS filesystem from a
> > 32-bit NFS server exporting a very large XFS filesystem, when the
> > server's cache is cold (so the inodes in question are not in cache).
> >
> > Reported-by: Trevor Cordes <trevor-CGgvEiIIHbIFyWsGDH9TEg@public.gmane.org>
> > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> >
> > diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
> > index 293bc2e..811831a 100644
> > --- a/fs/exportfs/expfs.c
> > +++ b/fs/exportfs/expfs.c
> > @@ -215,7 +215,7 @@ struct getdents_callback {
> > struct dir_context ctx;
> > char *name; /* name that was found. It already points to a
> > buffer NAME_MAX+1 is size */
> > - unsigned long ino; /* the inum we are looking for */
> > + u64 ino; /* the inum we are looking for */
> > int found; /* inode matched? */
> > int sequence; /* sequence counter */
> > };
> > @@ -255,10 +255,10 @@ static int get_name(const struct path *path, char *name, struct dentry *child)
> > struct inode *dir = path->dentry->d_inode;
> > int error;
> > struct file *file;
> > + struct kstat stat;
> > struct getdents_callback buffer = {
> > .ctx.actor = filldir_one,
> > .name = name,
> > - .ino = child->d_inode->i_ino
> > };
> >
> > error = -ENOTDIR;
> > @@ -268,6 +268,16 @@ static int get_name(const struct path *path, char *name, struct dentry *child)
> > if (!dir->i_fop)
> > goto out;
> > /*
> > + * inode->i_ino is unsigned long, kstat->ino is u64, so the
> > + * former would be insufficient on 32-bit hosts when the
> > + * filesystem supports 64-bit inode numbers. So we need to
> > + * actually call ->getattr, not just read i_ino:
> > + */
> > + error = vfs_getattr_nosec(path, &stat);
> > + if (error)
> > + return error;
> > + buffer.ino = stat.ino;
> > + /*
> > * Open the directory ...
> > */
> > file = dentry_open(path, O_RDONLY, cred);
> > diff --git a/fs/stat.c b/fs/stat.c
> > index 04ce1ac..71a39e8 100644
> > --- a/fs/stat.c
> > +++ b/fs/stat.c
> > @@ -37,14 +37,21 @@ void generic_fillattr(struct inode *inode, struct kstat *stat)
> >
> > EXPORT_SYMBOL(generic_fillattr);
> >
> > -int vfs_getattr(struct path *path, struct kstat *stat)
> > +/**
> > + * vfs_getattr_nosec - getattr without security checks
> > + * @path: file to get attributes from
> > + * @stat: structure to return attributes in
> > + *
> > + * Get attributes without calling security_inode_getattr.
> > + *
> > + * Currently the only caller other than vfs_getattr is internal to the
> > + * filehandle lookup code, which uses only the inode number and returns
> > + * no attributes to any user. Any other code probably wants
> > + * vfs_getattr.
> > + */
> > +int vfs_getattr_nosec(struct path *path, struct kstat *stat)
> > {
> > struct inode *inode = path->dentry->d_inode;
> > - int retval;
> > -
> > - retval = security_inode_getattr(path->mnt, path->dentry);
> > - if (retval)
> > - return retval;
> >
> > if (inode->i_op->getattr)
> > return inode->i_op->getattr(path->mnt, path->dentry, stat);
> > @@ -53,6 +60,18 @@ int vfs_getattr(struct path *path, struct kstat *stat)
> > return 0;
> > }
> >
> > +EXPORT_SYMBOL_GPL(vfs_getattr_nosec);
> > +
> > +int vfs_getattr(struct path *path, struct kstat *stat)
> > +{
> > + int retval;
> > +
> > + retval = security_inode_getattr(path->mnt, path->dentry);
> > + if (retval)
> > + return retval;
> > + return vfs_getattr_nosec(path, stat);
> > +}
> > +
> > EXPORT_SYMBOL(vfs_getattr);
> >
> > int vfs_fstat(unsigned int fd, struct kstat *stat)
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 9818747..5a51faa 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2500,6 +2500,7 @@ extern int page_symlink(struct inode *inode, const char *symname, int len);
> > extern const struct inode_operations page_symlink_inode_operations;
> > extern int generic_readlink(struct dentry *, char __user *, int);
> > extern void generic_fillattr(struct inode *, struct kstat *);
> > +int vfs_getattr_nosec(struct path *path, struct kstat *stat);
> > extern int vfs_getattr(struct path *, struct kstat *);
> > void __inode_add_bytes(struct inode *inode, loff_t bytes);
> > void inode_add_bytes(struct inode *inode, loff_t bytes);
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2013-10-02 19:01 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-12 16:03 why is i_ino unsigned long, anyway? J. Bruce Fields
2013-09-12 19:33 ` Al Viro
2013-09-29 11:54 ` Christoph Hellwig
2013-09-29 11:54 ` Christoph Hellwig
2013-10-02 14:25 ` J. Bruce Fields
2013-10-02 14:25 ` J. Bruce Fields
2013-10-02 15:43 ` J. Bruce Fields
2013-10-02 15:43 ` J. Bruce Fields
2013-10-02 16:04 ` Christoph Hellwig
2013-10-02 16:04 ` Christoph Hellwig
2013-10-02 18:14 ` J. Bruce Fields
2013-10-02 16:05 ` Christoph Hellwig
2013-10-02 16:05 ` Christoph Hellwig
2013-10-02 17:53 ` J. Bruce Fields
2013-10-02 17:53 ` J. Bruce Fields
2013-10-02 17:57 ` Christoph Hellwig
2013-10-02 17:57 ` Christoph Hellwig
2013-10-02 21:07 ` J. Bruce Fields
2013-10-02 21:28 ` [PATCH 1/2] vfs: split out vfs_getattr_nosec J. Bruce Fields
2013-10-02 21:28 ` J. Bruce Fields
2013-10-02 21:28 ` [PATCH 2/2] exportfs: fix 32-bit nfsd handling of 64-bit inode numbers J. Bruce Fields
2013-10-04 22:12 ` J. Bruce Fields
2013-10-04 22:12 ` J. Bruce Fields
2013-10-04 22:15 ` J. Bruce Fields
2013-10-04 22:15 ` J. Bruce Fields
2013-10-08 21:56 ` J. Bruce Fields
2013-10-09 0:16 ` Dave Chinner
2013-10-09 14:53 ` J. Bruce Fields
2013-10-09 14:53 ` J. Bruce Fields
2013-10-10 22:28 ` Dave Chinner
2013-10-11 21:53 ` J. Bruce Fields
2013-10-11 21:53 ` J. Bruce Fields
2013-10-13 22:52 ` Dave Chinner
2013-10-02 18:47 ` why is i_ino unsigned long, anyway? Sage Weil
2013-10-02 19:00 ` J. Bruce Fields [this message]
2013-10-02 19:00 ` J. Bruce Fields
2013-10-02 19:04 ` Sage Weil
2013-10-02 19:04 ` Sage Weil
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131002190034.GH14808@fieldses.org \
--to=bfields@fieldses.org \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=sage@inktank.com \
--cc=sandeen@redhat.com \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.