Re: [PATCH 2/2] exportfs: fix 32-bit nfsd handling of 64-bit inode numbers

From: "J. Bruce Fields" <bfields@fieldses.org>
To: Dave Chinner <david@fromorbit.com>
Cc: "J. Bruce Fields" <bfields@redhat.com>,
	Al Viro <viro@ZenIV.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
	sandeen@redhat.com
Subject: Re: [PATCH 2/2] exportfs: fix 32-bit nfsd handling of 64-bit inode numbers
Date: Wed, 9 Oct 2013 10:53:20 -0400
Message-ID: <20131009145320.GD3456@fieldses.org>
In-Reply-To: <20131009001631.GD4446@dastard>

On Wed, Oct 09, 2013 at 11:16:31AM +1100, Dave Chinner wrote:
> On Tue, Oct 08, 2013 at 05:56:56PM -0400, J. Bruce Fields wrote:
> > On Fri, Oct 04, 2013 at 06:15:22PM -0400, J. Bruce Fields wrote:
> > > On Fri, Oct 04, 2013 at 06:12:16PM -0400, bfields wrote:
> > > > On Wed, Oct 02, 2013 at 05:28:14PM -0400, J. Bruce Fields wrote:
> > > > > @@ -268,6 +268,16 @@ static int get_name(const struct path *path, char *name, struct dentry *child)
> > > > >  	if (!dir->i_fop)
> > > > >  		goto out;
> > > > >  	/*
> > > > > +	 * inode->i_ino is unsigned long, kstat->ino is u64, so the
> > > > > +	 * former would be insufficient on 32-bit hosts when the
> > > > > +	 * filesystem supports 64-bit inode numbers.  So we need to
> > > > > +	 * actually call ->getattr, not just read i_ino:
> > > > > +	 */
> > > > > +	error = vfs_getattr_nosec(path, &stat);
> > > > 
> > > > Doh, "path" here is for the parent....  The following works better!
> > > 
> > > By the way, I'm testing this with:
> > > 
> > > 	- create a bunch of nested subdirectories, use
> > > 	  name_to_handle_at to get a handle for the bottom directory.
> > > 	- echo 2 >/proc/sys/vm/drop_caches
> > > 	- open_by_handle_at on the filehandle
> > > 
> > > But this only actually exercises the reconnect path on the first run
> > > after boot.  Is there something obvious I'm missing here?
> > 
> > Looking at the code....  OK, most of the work of drop_caches is done by
> > shrink_slab_node, which doesn't actually try to free every single thing
> > that it could free--in particular, it won't try to free anything if it
> > thinks there are less than shrinker->batch_size (1024 in the
> > super_block->s_shrink case) objects to free.

(Oops, sorry, that should have been "less than half of
shrinker->batch_size", see below.)

> That's not quite right. Yes, the shrinker won't be called if the
> calculated scan count is less than the batch size, but the leftover
> is added back to the shrinker scan count to carry over to the next call
> to the shrinker. Hence if you repeatedly call the shrinker on a small
> cache with a large batch size, it will eventually aggregate the scan
> counts to over the batch size and trim the cache....

No, in shrink_slab_node, we do this:

	if (total_scan > max_pass * 2)
		total_scan = max_pass * 2;

	while (total_scan >= batch_size) {
		...
	}

where max_pass is the value returned from count_objects.  So as long as
count_objects returns less than half batch_size, nothing ever happens.

(I wonder if that check's correct?  The "forever" in the comment above
it seems wrong at least.)
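
To make the arithmetic concrete, here is a minimal user-space sketch of
that clamp-then-loop logic.  It is a simplified model, not the kernel
code: struct model_shrinker, model_shrink() and run_calls() are
hypothetical names, the per-call delta is simply passed in rather than
computed, and the carry-over is reduced to a plain counter.

```c
/* Simplified user-space model of the logic discussed above (not kernel code). */
struct model_shrinker {
	long batch_size;	/* objects handled per batch */
	long nr_deferred;	/* leftover scan count carried between calls */
};

/* One shrinker invocation; returns how many objects it would scan. */
static long model_shrink(struct model_shrinker *s, long max_pass, long delta)
{
	long total_scan = s->nr_deferred + delta;
	long scanned = 0;

	/* The clamp: never more than twice the freeable estimate. */
	if (total_scan > max_pass * 2)
		total_scan = max_pass * 2;

	/* Work is only done in whole batches. */
	while (total_scan >= s->batch_size) {
		scanned += s->batch_size;
		total_scan -= s->batch_size;
	}

	/* Whatever is left carries over, but it was already clamped. */
	s->nr_deferred = total_scan;
	return scanned;
}

/* Call the model repeatedly and return the total number scanned. */
static long run_calls(long batch_size, long max_pass, long delta, int calls)
{
	struct model_shrinker s = { .batch_size = batch_size, .nr_deferred = 0 };
	long total = 0;

	for (int i = 0; i < calls; i++)
		total += model_shrink(&s, max_pass, delta);
	return total;
}
```

In this model run_calls(1024, 511, 400, 1000) returns 0: the carried-over
count is clamped to 1022 on every call and never reaches a full batch.
run_calls(1024, 512, 400, 3) returns 1024, since the clamp then admits
exactly one batch on the third call.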

--b.
