linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fsstack: fsstack_copy_inode_size locking
@ 2008-06-29  0:05 Hugh Dickins
  2008-06-29  7:15 ` Christoph Hellwig
  2008-06-30  4:31 ` hooanon05
  0 siblings, 2 replies; 10+ messages in thread
From: Hugh Dickins @ 2008-06-29  0:05 UTC (permalink / raw)
  To: Erez Zadok
  Cc: Andrew Morton, mhalcrow, hooanon05, hch, viro, linux-fsdevel,
	linux-kernel

LTP's iogen01 doio tests used to hang nicely on 32-bit SMP when /tmp was a
unionfs mount of a tmpfs, i_size_read spinning forever, waiting for a lost
seqcount update: fixed by taking i_lock around i_size_write when 32-bit SMP.

But akpm was dissatisfied with the resulting patch: its lack of commentary,
the #ifs, the nesting around i_size_read, the lack of attention to i_blocks.
I promised to redo it with the general spin_lock_32bit() he proposed; but 
disliked the result, partly because "32bit" obscures the real constraints,
which are best commented within fsstack_copy_inode_size itself.

This version adds those comments, and uses sizeof comparisons which the
compiler can optimize out, instead of CONFIG_SMP, CONFIG_LSF. BITS_PER_LONG.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
Should follow mmotm's git-unionfs-fixup.patch

 fs/stack.c               |   58 +++++++++++++++++++++++++++++++------
 include/linux/fs_stack.h |    3 -
 2 files changed, 50 insertions(+), 11 deletions(-)

--- mmotm/fs/stack.c	2008-06-27 13:39:18.000000000 +0100
+++ linux/fs/stack.c	2008-06-27 18:24:19.000000000 +0100
@@ -19,16 +19,56 @@
  * This function cannot be inlined since i_size_{read,write} is rather
  * heavy-weight on 32-bit systems
  */
-void fsstack_copy_inode_size(struct inode *dst, const struct inode *src)
+void fsstack_copy_inode_size(struct inode *dst, struct inode *src)
 {
-#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
-	spin_lock(&dst->i_lock);
-#endif
-	i_size_write(dst, i_size_read(src));
-	dst->i_blocks = src->i_blocks;
-#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
-	spin_unlock(&dst->i_lock);
-#endif
+	loff_t i_size;
+	blkcnt_t i_blocks;
+
+	/*
+	 * i_size_read() includes its own seqlocking and protection from
+	 * preemption (see include/linux/fs.h): we need nothing extra for
+	 * that here, and prefer to avoid nesting locks than attempt to
+	 * keep i_size and i_blocks in synch together.
+	 */
+	i_size = i_size_read(src);
+
+	/*
+	 * But if CONFIG_LSF (on 32-bit), we ought to make an effort to keep
+	 * the two halves of i_blocks in synch despite SMP or PREEMPT - though
+	 * stat's generic_fillattr() doesn't bother, and we won't be applying
+	 * quotas (where i_blocks does become important) at the upper level.
+	 *
+	 * We don't actually know what locking is used at the lower level; but
+	 * if it's a filesystem that supports quotas, it will be using i_lock
+	 * as in inode_add_bytes().  tmpfs uses other locking, and its 32-bit
+	 * is (just) able to exceed 2TB i_size with the aid of holes; but its
+	 * i_blocks cannot carry into the upper long without almost 2TB swap -
+	 * let's ignore that case.
+	 */
+	if (sizeof(i_blocks) > sizeof(long))
+		spin_lock(&src->i_lock);
+	i_blocks = src->i_blocks;
+	if (sizeof(i_blocks) > sizeof(long))
+		spin_unlock(&src->i_lock);
+
+	/*
+	 * If CONFIG_SMP on 32-bit, it's vital for fsstack_copy_inode_size()
+	 * to hold some lock around i_size_write(), otherwise i_size_read()
+	 * may spin forever (see include/linux/fs.h).  We don't necessarily
+	 * hold i_mutex when this is called, so take i_lock for that case.
+	 *
+	 * And if CONFIG_LSF (on 32-bit), continue our effort to keep the
+	 * two halves of i_blocks in synch despite SMP or PREEMPT: use i_lock
+	 * for that case too, and do both at once by combining the tests.
+	 *
+	 * There is none of this locking overhead in the 64-bit case.
+	 */
+	if (sizeof(i_size) > sizeof(long) || sizeof(i_blocks) > sizeof(long))
+		spin_lock(&dst->i_lock);
+	i_size_write(dst, i_size);
+	dst->i_blocks = i_blocks;
+	if (sizeof(i_size) > sizeof(long) || sizeof(i_blocks) > sizeof(long))
+		spin_unlock(&dst->i_lock);
 }
 EXPORT_SYMBOL_GPL(fsstack_copy_inode_size);
 
--- mmotm/include/linux/fs_stack.h	2008-06-27 13:39:19.000000000 +0100
+++ linux/include/linux/fs_stack.h	2008-06-27 14:08:00.000000000 +0100
@@ -21,8 +21,7 @@
 
 /* externs for fs/stack.c */
 extern void fsstack_copy_attr_all(struct inode *dest, const struct inode *src);
-extern void fsstack_copy_inode_size(struct inode *dst,
-				    const struct inode *src);
+extern void fsstack_copy_inode_size(struct inode *dst, struct inode *src);
 
 /* inlines */
 static inline void fsstack_copy_attr_atime(struct inode *dest,

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] fsstack: fsstack_copy_inode_size locking
  2008-06-29  0:05 [PATCH] fsstack: fsstack_copy_inode_size locking Hugh Dickins
@ 2008-06-29  7:15 ` Christoph Hellwig
  2008-06-29 11:56   ` Hugh Dickins
  2008-06-30  4:31 ` hooanon05
  1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2008-06-29  7:15 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Erez Zadok, Andrew Morton, mhalcrow, hooanon05, hch, viro,
	linux-fsdevel, linux-kernel

On Sun, Jun 29, 2008 at 01:05:09AM +0100, Hugh Dickins wrote:
> LTP's iogen01 doio tests used to hang nicely on 32-bit SMP when /tmp was a
> unionfs mount of a tmpfs, i_size_read spinning forever, waiting for a lost
> seqcount update: fixed by taking i_lock around i_size_write when 32-bit SMP.
> 
> But akpm was dissatisfied with the resulting patch: its lack of commentary,
> the #ifs, the nesting around i_size_read, the lack of attention to i_blocks.
> I promised to redo it with the general spin_lock_32bit() he proposed; but 
> disliked the result, partly because "32bit" obscures the real constraints,
> which are best commented within fsstack_copy_inode_size itself.
> 
> This version adds those comments, and uses sizeof comparisons which the
> compiler can optimize out, instead of CONFIG_SMP, CONFIG_LSF. BITS_PER_LONG.

Btw, I hope fsstack doesn't rely on i_size having any particular
meaning.  As far as the VFS is concerned i_size is field only used by
the filesystem (or library routines like generic_file_*).


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] fsstack: fsstack_copy_inode_size locking
  2008-06-29  7:15 ` Christoph Hellwig
@ 2008-06-29 11:56   ` Hugh Dickins
  2008-06-30 18:19     ` Erez Zadok
  0 siblings, 1 reply; 10+ messages in thread
From: Hugh Dickins @ 2008-06-29 11:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Erez Zadok, Andrew Morton, mhalcrow, hooanon05, viro,
	linux-fsdevel, linux-kernel

On Sun, 29 Jun 2008, Christoph Hellwig wrote:
> 
> Btw, I hope fsstack doesn't rely on i_size having any particular
> meaning.  As far as the VFS is concerned i_size is field only used by
> the filesystem (or library routines like generic_file_*).

Interesting point.  I can't speak for fsstack itself (I'm not even
sure if it's anything beyond fs/stack.c and the tag I used to identify
where this patch lies); but certainly fs/stack.c doesn't use i_size
for anything, just duplicates it from the lower filesystem.

unionfs (which I think you don't care for at all in general) does
look as if it assumes it's the lower file size in a few places,
when copying up or truncating.  Isn't that reasonable?  Wouldn't
users make the same assumption?

Or are you saying that filesystems which don't support the usual
meaning of inode->i_size (leave it 0?) would supply their own
equivalent to vmtruncate() if they support truncation, and their
own getattr which fills in stat->size from somewhere else.

Yes, I think you are saying that: unionfs may not play well with them.

Hugh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] fsstack: fsstack_copy_inode_size locking
  2008-06-29  0:05 [PATCH] fsstack: fsstack_copy_inode_size locking Hugh Dickins
  2008-06-29  7:15 ` Christoph Hellwig
@ 2008-06-30  4:31 ` hooanon05
  2008-06-30 12:10   ` Hugh Dickins
  1 sibling, 1 reply; 10+ messages in thread
From: hooanon05 @ 2008-06-30  4:31 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Erez Zadok, Andrew Morton, mhalcrow, hch, viro, linux-fsdevel,
	linux-kernel


Hugh Dickins:
> LTP's iogen01 doio tests used to hang nicely on 32-bit SMP when /tmp was a
> unionfs mount of a tmpfs, i_size_read spinning forever, waiting for a lost
> seqcount update: fixed by taking i_lock around i_size_write when 32-bit SMP.

I don't know why dst->i_lock is affected by src->i_size_seqcount.
Do you mean that your test issued write(2) to the lower/actual file so
frequently that i_size_read() in unionfs always failed?

Is your test
iogen01 export LTPROOT; rwtest -N iogen01 -i 120s -s read,write -Da -Dv -n 2 500b:doio.f1.$$ 1000b:doio.f2.$$
line in runtest/fs?


Junjiro Okajima

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] fsstack: fsstack_copy_inode_size locking
  2008-06-30  4:31 ` hooanon05
@ 2008-06-30 12:10   ` Hugh Dickins
  2008-06-30 13:08     ` hooanon05
  0 siblings, 1 reply; 10+ messages in thread
From: Hugh Dickins @ 2008-06-30 12:10 UTC (permalink / raw)
  To: hooanon05
  Cc: Erez Zadok, Andrew Morton, mhalcrow, hch, viro, linux-fsdevel,
	linux-kernel

On Mon, 30 Jun 2008, hooanon05@yahoo.co.jp wrote:
> Hugh Dickins:
> > LTP's iogen01 doio tests used to hang nicely on 32-bit SMP when /tmp was a
> > unionfs mount of a tmpfs, i_size_read spinning forever, waiting for a lost
> > seqcount update: fixed by taking i_lock around i_size_write when 32-bit SMP.

(That, of course, is already fixed in Erez's unionfs git, so in mm;
but I thought I'd better repeat the history because some of akpm's
objections to pushing the fix to mainline came from the lack of
this explanation when the patch was submitted for mainline.)

> 
> I don't know why dst->i_lock is affected by src->i_size_seqcount.

It certainly shouldn't be.  The problem would have come from two
racing i_size_write(dst)s, one of the unlocked increments getting
lost, leaving seqcount odd, so the next i_size_read(dst) would
spin forever waiting for it to go even.

> Do you mean that your test issued write(2) to the lower/actual file

I was merely running LTP, on a Core(2) Quad machine, with /tmp as
unionfs mount of a tmpfs.  It wouldn't have been doing anything
directly to the lower file, that would all have been coming via
the unionfs mount.

> so frequently that i_size_read() in unionfs always failed?

I'm not sure what you mean by that.  i_size_read() doesn't fail,
but it may loop; and if the seqcount has got out of step from
concurrent unlocked i_size_write()s, then it'll spin forever.

> 
> Is your test
> iogen01 export LTPROOT; rwtest -N iogen01 -i 120s -s read,write -Da -Dv -n 2 500b:doio.f1.$$ 1000b:doio.f2.$$
> line in runtest/fs?

That looks like it: I don't think I ever tried it as a standalone test,
just investigated what was happening when standard LTP runs hung here.

Hugh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] fsstack: fsstack_copy_inode_size locking
  2008-06-30 12:10   ` Hugh Dickins
@ 2008-06-30 13:08     ` hooanon05
  0 siblings, 0 replies; 10+ messages in thread
From: hooanon05 @ 2008-06-30 13:08 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Erez Zadok, Andrew Morton, mhalcrow, hch, viro, linux-fsdevel,
	linux-kernel


Hugh Dickins:
> It certainly shouldn't be.  The problem would have come from two
> racing i_size_write(dst)s, one of the unlocked increments getting
> lost, leaving seqcount odd, so the next i_size_read(dst) would
> spin forever waiting for it to go even.

I see.
The unlocked increment can cause the next i_size_read() hang.


> I'm not sure what you mean by that.  i_size_read() doesn't fail,
> but it may loop; and if the seqcount has got out of step from
> concurrent unlocked i_size_write()s, then it'll spin forever.

What I meant by "fail" was "loop" actually.
And I understand that you didn't writing directly (bypassing unionfs)
too.


Junjiro Okajima

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] fsstack: fsstack_copy_inode_size locking
  2008-06-29 11:56   ` Hugh Dickins
@ 2008-06-30 18:19     ` Erez Zadok
  2008-06-30 21:49       ` Michael Halcrow
  0 siblings, 1 reply; 10+ messages in thread
From: Erez Zadok @ 2008-06-30 18:19 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Christoph Hellwig, Erez Zadok, Andrew Morton, mhalcrow, hooanon05,
	viro, linux-fsdevel, linux-kernel

In message <Pine.LNX.4.64.0806291244440.32708@blonde.site>, Hugh Dickins writes:
> On Sun, 29 Jun 2008, Christoph Hellwig wrote:
> > 
> > Btw, I hope fsstack doesn't rely on i_size having any particular
> > meaning.  As far as the VFS is concerned i_size is field only used by
> > the filesystem (or library routines like generic_file_*).
> 
> Interesting point.  I can't speak for fsstack itself (I'm not even
> sure if it's anything beyond fs/stack.c and the tag I used to identify
> where this patch lies); but certainly fs/stack.c doesn't use i_size
> for anything, just duplicates it from the lower filesystem.
> 
> unionfs (which I think you don't care for at all in general) does
> look as if it assumes it's the lower file size in a few places,
> when copying up or truncating.  Isn't that reasonable?  Wouldn't
> users make the same assumption?
> 
> Or are you saying that filesystems which don't support the usual
> meaning of inode->i_size (leave it 0?) would supply their own
> equivalent to vmtruncate() if they support truncation, and their
> own getattr which fills in stat->size from somewhere else.
> 
> Yes, I think you are saying that: unionfs may not play well with them.
> 
> Hugh

Hugh, yes, the only place in fsstack where i_size is used is to copy the
lower i_size to the upper one verbatim.  If this assumption is incorrect for
some lower file systems, then stackable file systems in general may have a
problem with this assumption; in that case, we'll need an alternative way to
"interpret" the lower i_size, and report the i_size in the upper inode (and
hence to the user).

Is there such an alternative?

BTW, ecryptfs may have a problem with this, b/c it uses i_size_read/write
b/t the lower and upper inodes.  If some filesystems have a different
interpretation for i_size, then stacking ecryptfs on top of them could be an
issue.  Mike?

Erez.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] fsstack: fsstack_copy_inode_size locking
  2008-06-30 18:19     ` Erez Zadok
@ 2008-06-30 21:49       ` Michael Halcrow
  2008-07-01  7:03         ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Halcrow @ 2008-06-30 21:49 UTC (permalink / raw)
  To: Erez Zadok
  Cc: Hugh Dickins, Christoph Hellwig, Andrew Morton, hooanon05, viro,
	linux-fsdevel, linux-kernel

On Mon, Jun 30, 2008 at 02:19:27PM -0400, Erez Zadok wrote:
> BTW, ecryptfs may have a problem with this, b/c it uses
> i_size_read/write b/t the lower and upper inodes.  If some
> filesystems have a different interpretation for i_size, then
> stacking ecryptfs on top of them could be an issue.  Mike?

eCryptfs depends on the results of i_size_read() on the lower inode
when it needs to interpolate the filesize when the metadata is stored
in the lower file's xattr region.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] fsstack: fsstack_copy_inode_size locking
  2008-06-30 21:49       ` Michael Halcrow
@ 2008-07-01  7:03         ` Christoph Hellwig
  2008-07-02  0:00           ` Erez Zadok
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2008-07-01  7:03 UTC (permalink / raw)
  To: Michael Halcrow
  Cc: Erez Zadok, Hugh Dickins, Christoph Hellwig, Andrew Morton,
	hooanon05, viro, linux-fsdevel, linux-kernel

On Mon, Jun 30, 2008 at 04:49:01PM -0500, Michael Halcrow wrote:
> eCryptfs depends on the results of i_size_read() on the lower inode
> when it needs to interpolate the filesize when the metadata is stored
> in the lower file's xattr region.

please use vfs_getattr to get the size in thst case.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] fsstack: fsstack_copy_inode_size locking
  2008-07-01  7:03         ` Christoph Hellwig
@ 2008-07-02  0:00           ` Erez Zadok
  0 siblings, 0 replies; 10+ messages in thread
From: Erez Zadok @ 2008-07-02  0:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael Halcrow, Erez Zadok, Hugh Dickins, Andrew Morton,
	hooanon05, viro, linux-fsdevel, linux-kernel

In message <20080701070350.GA22914@infradead.org>, Christoph Hellwig writes:
> On Mon, Jun 30, 2008 at 04:49:01PM -0500, Michael Halcrow wrote:
> > eCryptfs depends on the results of i_size_read() on the lower inode
> > when it needs to interpolate the filesize when the metadata is stored
> > in the lower file's xattr region.
> 
> please use vfs_getattr to get the size in thst case.

So, Christoph, is using vfs_getattr() the preferred choice for stackable
file systems in general to get the lower inode size?

If so, the Hugh's fsstack_copy_inode_size patch will have to be updated: and
we'll probably have to change the names of the src/dst parameters to
lower/upper, possibly watch out for whether "src" was the lower or upper
inode, and be careful about locking semantics when calling vfs_getattr.

Who'd have thought that a concept as simple as
      dst->i_size = src->i_size
could turn out to be so hairy... 1/2 a :-)

Thanks,
Erez.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2008-07-02  0:00 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-29  0:05 [PATCH] fsstack: fsstack_copy_inode_size locking Hugh Dickins
2008-06-29  7:15 ` Christoph Hellwig
2008-06-29 11:56   ` Hugh Dickins
2008-06-30 18:19     ` Erez Zadok
2008-06-30 21:49       ` Michael Halcrow
2008-07-01  7:03         ` Christoph Hellwig
2008-07-02  0:00           ` Erez Zadok
2008-06-30  4:31 ` hooanon05
2008-06-30 12:10   ` Hugh Dickins
2008-06-30 13:08     ` hooanon05

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).