* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-29 6:25 ` Andrew Morton
@ 2004-01-29 23:20 ` Miquel van Smoorenburg
2004-02-04 0:03 ` Christoph Hellwig
2004-01-30 16:01 ` Miquel van Smoorenburg
2004-01-30 20:21 ` Miquel van Smoorenburg
2 siblings, 1 reply; 25+ messages in thread
From: Miquel van Smoorenburg @ 2004-01-29 23:20 UTC (permalink / raw)
To: Andrew Morton, Nathan Scott; +Cc: linux-kernel, linux-xfs
According to Andrew Morton:
> "Miquel van Smoorenburg" <miquels@cistron.nl> wrote:
> >
> > On the NFS server I'm exporting an XFS filesystem to the client
> > over Gigabit ethernet. The client mounts using
> > mount -o nfsvers=3,intr,rsize=32768,wsize=32768 server:/export/hwr /mnt
> >
> > On the client I then run a large dd to a file on the server,
> > like dd if=/dev/zero of=/mnt/file bs=4096 count=800000
> >
> > In a few seconds, the server locks up. It spins in
> > generic_fillattr(), apparently in the i_size_read() inline function.
> > Server responds to pings and sysrq, but nothing else.
> >
> > 2.6.1 doesn't show this behaviour. I tested several kernels:
Okay so I did some more tests today. As it turns out, all stock
2.6 kernels lock up - it's just that 2.6.2-rc* locks up faster
than earlier versions. I tested 2.6.1-rc1-mm3 and 2.6.2-rc2-mm1,
these do not lock up (test on -mm has been running for 8 hours
straight now, 2.6.* vanilla locks up in a few minutes).
> > Here's the sysrq output:
> >
> > Pid: 531, comm: nfsd
> > EIP: 0060:[<c01577c2>] CPU: 0
> > EIP is at generic_fillattr+0x82/0xa0
> > EFLAGS: 00000246 Not tainted
> > EAX: 00000000 EBX: 07ae7200 ECX: f650a0a0 EDX: 000113eb
> > ESI: 00000000 EDI: f66cfed4 EBP: f66e5804 DS: 007b ES: 007b
> > CR0: 8005003b CR2: 40019000 CR3: 37245000 CR4: 000006d0
> > Call Trace:
> > [<f8ab1f6b>] linvfs_getattr+0x2b/0x34 [xfs]
> > [<c0157805>] vfs_getattr+0x25/0x84
> > [<c01c19a3>] encode_post_op_attr+0x53/0x238
> > [<c01c1e27>] encode_wcc_data+0x29f/0x2a8
> > [<c01c4521>] nfs3svc_encode_commitres+0x19/0x5c
> > [<c01b709d>] nfsd_dispatch+0x14d/0x1a3
> > [<c02fb79b>] svc_process+0x3b3/0x640
> > [<c01b6ddc>] nfsd+0x1e4/0x358
> > [<c01b6bf8>] nfsd+0x0/0x358
> > [<c0107251>] kernel_thread_helper+0x5/0xc
> >
>
> Is the EIP _always_ inside generic_fillattr()?
Yes, if a keep on doing BREAK-p the output is alway the same.
EIP varies a few bytes, ofcourse, not much.
> If so then yes, your analysis look right. I'd say that the inode has been
> corrupted and the seqcount counter has assumed an non-even value. That
> will cause i_size_read() to lock up.
>
> Are you using slab debugging? Is so, does the lockup go away if you change
> mm/slab.c:POISON_FREE to an even value, say 0x6a? That would tend to
> confirm a use-after-free problem. Or a totally wild pointer.
I turned on slab debugging but it didn't help, and it didn't make
any difference. Well actually it made it take longer to show the
effect - at first I thought that the problem dissapeared when I turned
on slab debugging, but when I removed the "count=bignumber" from the
dd command, it locked up eventually (in a few minutes).
I also changed the debug constanst from odd to even and the
other way around, but that had no effect - no messages logged,
and the kernel still locked up.
Note that at this heavy load, there are up to 8 nfsd's running
on the server concurrently. There must be some lock or race ...
According to Nathan Scott:
> Hmmmm... I don't think Andrew has any XFS fixes in his tree that
> might help there; and I can't think of any XFS change in -rc1 that
> might have caused this (does the fs/xfs subdir from 2.6.1 plonked
> down in place of the 2.6.2-rc1/2 version still have the problem?)
Yes, sorry for the confusion - all 2.6 (non-mm) shows this behaviour.
I tried -mm1 XFS in 2.6.1-rc2, but that did't help, it still
locks up (the diff is just a one-liner, so that was to be expected).
So it doesn't look like an XFS problem, perse.
> i_size_read seems to have a fair bit of config dependency - are you
> CONFIG_SMP / CONFIG_PREEMPT / neither? and is your BITS_PER_LONG
> 32 or 64? thanks.
CONFIG_SMP (for P IV hyperthreading), 32 bits.
Mike.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-29 23:20 ` Miquel van Smoorenburg
@ 2004-02-04 0:03 ` Christoph Hellwig
2004-02-03 14:13 ` Steve Lord
2004-02-04 0:06 ` David Weinehall
0 siblings, 2 replies; 25+ messages in thread
From: Christoph Hellwig @ 2004-02-04 0:03 UTC (permalink / raw)
To: Miquel van Smoorenburg
Cc: Andrew Morton, Nathan Scott, linux-kernel, linux-xfs
Okay, what about this little patch?:
Index: fs/xfs/linux-2.6/xfs_iops.c
===================================================================
RCS file: /cvs/linux-2.6-xfs/fs/xfs/linux-2.6/xfs_iops.c,v
retrieving revision 1.212
diff -u -p -r1.212 xfs_iops.c
--- fs/xfs/linux-2.6/xfs_iops.c 12 Dec 2003 04:17:52 -0000 1.212
+++ fs/xfs/linux-2.6/xfs_iops.c 3 Feb 2004 23:56:17 -0000
@@ -80,11 +80,15 @@ validate_fields(
vattr_t va;
int error;
- va.va_mask = XFS_AT_NLINK|XFS_AT_SIZE|XFS_AT_NBLOCKS;
+ va.va_mask = XFS_AT_NLINK|XFS_AT_SIZE|XFS_AT_NBLOCKS|XFS_AT_SIZE;
VOP_GETATTR(vp, &va, ATTR_LAZY, NULL, error);
ip->i_nlink = va.va_nlink;
ip->i_size = va.va_size;
ip->i_blocks = va.va_nblocks;
+
+ /* we're under i_sem so i_size can't change under us */
+ if (i_size_read(ip) != va.va_size)
+ i_size_write(ip, va.va_size);
}
/*
@@ -186,8 +190,7 @@ linvfs_mknod(
if (S_ISCHR(mode) || S_ISBLK(mode))
ip->i_rdev = rdev;
- else if (S_ISDIR(mode))
- validate_fields(ip);
+ validate_fields(ip);
d_instantiate(dentry, ip);
validate_fields(dir);
}
@@ -536,6 +539,7 @@ linvfs_setattr(
if (error)
return(-error); /* Positive error up from XFS */
if (ia_valid & ATTR_SIZE) {
+ i_size_write(inode, vattr.va_size);
error = vmtruncate(inode, attr->ia_size);
}
Index: fs/xfs/linux-2.6/xfs_vnode.c
===================================================================
RCS file: /cvs/linux-2.6-xfs/fs/xfs/linux-2.6/xfs_vnode.c,v
retrieving revision 1.120
diff -u -p -r1.120 xfs_vnode.c
--- fs/xfs/linux-2.6/xfs_vnode.c 20 Oct 2003 02:08:58 -0000 1.120
+++ fs/xfs/linux-2.6/xfs_vnode.c 3 Feb 2004 23:56:17 -0000
@@ -213,7 +213,6 @@ vn_revalidate(
inode->i_mtime = va.va_mtime;
inode->i_ctime = va.va_ctime;
inode->i_atime = va.va_atime;
- i_size_write(inode, va.va_size);
if (va.va_xflags & XFS_XFLAG_IMMUTABLE)
inode->i_flags |= S_IMMUTABLE;
else
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-02-04 0:03 ` Christoph Hellwig
@ 2004-02-03 14:13 ` Steve Lord
2004-02-04 15:16 ` Christoph Hellwig
2004-02-04 18:17 ` Christoph Hellwig
2004-02-04 0:06 ` David Weinehall
1 sibling, 2 replies; 25+ messages in thread
From: Steve Lord @ 2004-02-03 14:13 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Miquel van Smoorenburg, Andrew Morton, Nathan Scott, linux-kernel,
linux-xfs
Christoph Hellwig wrote:
> Okay, what about this little patch?:
>
>
> Index: fs/xfs/linux-2.6/xfs_iops.c
> ===================================================================
> RCS file: /cvs/linux-2.6-xfs/fs/xfs/linux-2.6/xfs_iops.c,v
> retrieving revision 1.212
> diff -u -p -r1.212 xfs_iops.c
> --- fs/xfs/linux-2.6/xfs_iops.c 12 Dec 2003 04:17:52 -0000 1.212
> +++ fs/xfs/linux-2.6/xfs_iops.c 3 Feb 2004 23:56:17 -0000
> @@ -80,11 +80,15 @@ validate_fields(
> vattr_t va;
> int error;
>
> - va.va_mask = XFS_AT_NLINK|XFS_AT_SIZE|XFS_AT_NBLOCKS;
> + va.va_mask = XFS_AT_NLINK|XFS_AT_SIZE|XFS_AT_NBLOCKS|XFS_AT_SIZE;
> VOP_GETATTR(vp, &va, ATTR_LAZY, NULL, error);
> ip->i_nlink = va.va_nlink;
> ip->i_size = va.va_size;
> ip->i_blocks = va.va_nblocks;
> +
> + /* we're under i_sem so i_size can't change under us */
> + if (i_size_read(ip) != va.va_size)
> + i_size_write(ip, va.va_size);
> }
>
> /*
> @@ -186,8 +190,7 @@ linvfs_mknod(
>
> if (S_ISCHR(mode) || S_ISBLK(mode))
> ip->i_rdev = rdev;
> - else if (S_ISDIR(mode))
> - validate_fields(ip);
> + validate_fields(ip);
There was some reason this was only necessary on directories, but I
cannot remember why just now.
> d_instantiate(dentry, ip);
> validate_fields(dir);
> }
> @@ -536,6 +539,7 @@ linvfs_setattr(
> if (error)
> return(-error); /* Positive error up from XFS */
> if (ia_valid & ATTR_SIZE) {
> + i_size_write(inode, vattr.va_size);
> error = vmtruncate(inode, attr->ia_size);
> }
>
> Index: fs/xfs/linux-2.6/xfs_vnode.c
> ===================================================================
> RCS file: /cvs/linux-2.6-xfs/fs/xfs/linux-2.6/xfs_vnode.c,v
> retrieving revision 1.120
> diff -u -p -r1.120 xfs_vnode.c
> --- fs/xfs/linux-2.6/xfs_vnode.c 20 Oct 2003 02:08:58 -0000 1.120
> +++ fs/xfs/linux-2.6/xfs_vnode.c 3 Feb 2004 23:56:17 -0000
> @@ -213,7 +213,6 @@ vn_revalidate(
> inode->i_mtime = va.va_mtime;
> inode->i_ctime = va.va_ctime;
> inode->i_atime = va.va_atime;
> - i_size_write(inode, va.va_size);
> if (va.va_xflags & XFS_XFLAG_IMMUTABLE)
> inode->i_flags |= S_IMMUTABLE;
> else
>
I think this should work, it just leaves the extending O_DIRECT write
case. Keeping the revalidate call out of the path for creating regular
files would be nice though, why did you deem that necessary?
Steve
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-02-03 14:13 ` Steve Lord
@ 2004-02-04 15:16 ` Christoph Hellwig
2004-02-04 18:17 ` Christoph Hellwig
1 sibling, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2004-02-04 15:16 UTC (permalink / raw)
To: Steve Lord
Cc: Christoph Hellwig, Miquel van Smoorenburg, Andrew Morton,
Nathan Scott, linux-kernel, linux-xfs
On Tue, Feb 03, 2004 at 08:13:04AM -0600, Steve Lord wrote:
> > ip->i_rdev = rdev;
> > - else if (S_ISDIR(mode))
> > - validate_fields(ip);
> > + validate_fields(ip);
>
> There was some reason this was only necessary on directories, but I
> cannot remember why just now.
Well, it is nessecary now to update i_size. Or rather it was, I think
I can get rid of it again after taking care of initialize_vnode.
> I think this should work, it just leaves the extending O_DIRECT write
> case.
And initialize_vnode. I have a working patch for the latter, but I still
need to take a look at O_DIRECT.
> Keeping the revalidate call out of the path for creating regular
> files would be nice though, why did you deem that necessary?
I thought I need it for i_size udates, but we should be able to take
care of it in initialize_vnode.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-02-03 14:13 ` Steve Lord
2004-02-04 15:16 ` Christoph Hellwig
@ 2004-02-04 18:17 ` Christoph Hellwig
1 sibling, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2004-02-04 18:17 UTC (permalink / raw)
To: Steve Lord
Cc: Christoph Hellwig, Miquel van Smoorenburg, Andrew Morton,
Nathan Scott, linux-kernel, linux-xfs
Here's a new diff, that should get rid of all the warnings except for
O_DIRECT. I've also addressed the comments from Dave and Steve.
Index: fs/xfs/linux-2.6/xfs_iops.c
===================================================================
RCS file: /cvs/linux-2.6-xfs/fs/xfs/linux-2.6/xfs_iops.c,v
retrieving revision 1.212
diff -u -p -r1.212 xfs_iops.c
--- fs/xfs/linux-2.6/xfs_iops.c 12 Dec 2003 04:17:52 -0000 1.212
+++ fs/xfs/linux-2.6/xfs_iops.c 4 Feb 2004 17:56:34 -0000
@@ -82,9 +82,14 @@ validate_fields(
va.va_mask = XFS_AT_NLINK|XFS_AT_SIZE|XFS_AT_NBLOCKS;
VOP_GETATTR(vp, &va, ATTR_LAZY, NULL, error);
- ip->i_nlink = va.va_nlink;
- ip->i_size = va.va_size;
- ip->i_blocks = va.va_nblocks;
+ if (likely(!error)) {
+ ip->i_nlink = va.va_nlink;
+ ip->i_blocks = va.va_nblocks;
+
+ /* we're under i_sem so i_size can't change under us */
+ if (i_size_read(ip) != va.va_size)
+ i_size_write(ip, va.va_size);
+ }
}
/*
@@ -536,6 +541,7 @@ linvfs_setattr(
if (error)
return(-error); /* Positive error up from XFS */
if (ia_valid & ATTR_SIZE) {
+ i_size_write(inode, vattr.va_size);
error = vmtruncate(inode, attr->ia_size);
}
Index: fs/xfs/linux-2.6/xfs_super.c
===================================================================
RCS file: /cvs/linux-2.6-xfs/fs/xfs/linux-2.6/xfs_super.c,v
retrieving revision 1.294
diff -u -p -r1.294 xfs_super.c
--- fs/xfs/linux-2.6/xfs_super.c 21 Jan 2004 16:46:06 -0000 1.294
+++ fs/xfs/linux-2.6/xfs_super.c 4 Feb 2004 17:56:55 -0000
@@ -178,7 +178,7 @@ xfs_revalidate_inode(
}
inode->i_blksize = PAGE_CACHE_SIZE;
inode->i_generation = ip->i_d.di_gen;
- i_size_write(inode, ip->i_d.di_size);
+ inode->i_size = ip->i_d.di_size;
inode->i_blocks =
XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks);
inode->i_atime.tv_sec = ip->i_d.di_atime.t_sec;
Index: fs/xfs/linux-2.6/xfs_vnode.c
===================================================================
RCS file: /cvs/linux-2.6-xfs/fs/xfs/linux-2.6/xfs_vnode.c,v
retrieving revision 1.120
diff -u -p -r1.120 xfs_vnode.c
--- fs/xfs/linux-2.6/xfs_vnode.c 20 Oct 2003 02:08:58 -0000 1.120
+++ fs/xfs/linux-2.6/xfs_vnode.c 4 Feb 2004 17:56:55 -0000
@@ -213,7 +213,6 @@ vn_revalidate(
inode->i_mtime = va.va_mtime;
inode->i_ctime = va.va_ctime;
inode->i_atime = va.va_atime;
- i_size_write(inode, va.va_size);
if (va.va_xflags & XFS_XFLAG_IMMUTABLE)
inode->i_flags |= S_IMMUTABLE;
else
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-02-04 0:03 ` Christoph Hellwig
2004-02-03 14:13 ` Steve Lord
@ 2004-02-04 0:06 ` David Weinehall
2004-02-04 0:07 ` Christoph Hellwig
1 sibling, 1 reply; 25+ messages in thread
From: David Weinehall @ 2004-02-04 0:06 UTC (permalink / raw)
To: Christoph Hellwig, Miquel van Smoorenburg, Andrew Morton,
Nathan Scott, linux-kernel, linux-xfs
On Wed, Feb 04, 2004 at 12:03:15AM +0000, Christoph Hellwig wrote:
> Okay, what about this little patch?:
>
>
> Index: fs/xfs/linux-2.6/xfs_iops.c
> ===================================================================
> RCS file: /cvs/linux-2.6-xfs/fs/xfs/linux-2.6/xfs_iops.c,v
> retrieving revision 1.212
> diff -u -p -r1.212 xfs_iops.c
> --- fs/xfs/linux-2.6/xfs_iops.c 12 Dec 2003 04:17:52 -0000 1.212
> +++ fs/xfs/linux-2.6/xfs_iops.c 3 Feb 2004 23:56:17 -0000
> @@ -80,11 +80,15 @@ validate_fields(
> vattr_t va;
> int error;
>
> - va.va_mask = XFS_AT_NLINK|XFS_AT_SIZE|XFS_AT_NBLOCKS;
> + va.va_mask = XFS_AT_NLINK|XFS_AT_SIZE|XFS_AT_NBLOCKS|XFS_AT_SIZE;
Huh? XFS_AT_SIZE twice?!
[snip]
Regards: David Weinehall
--
/) David Weinehall <tao@acc.umu.se> /) Northern lights wander (\
// Maintainer of the v2.0 kernel // Dance across the winter sky //
\) http://www.acc.umu.se/~tao/ (/ Full colour fire (/
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-02-04 0:06 ` David Weinehall
@ 2004-02-04 0:07 ` Christoph Hellwig
0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2004-02-04 0:07 UTC (permalink / raw)
To: Christoph Hellwig, Miquel van Smoorenburg, Andrew Morton,
Nathan Scott, linux-kernel, linux-xfs
On Wed, Feb 04, 2004 at 01:06:23AM +0100, David Weinehall wrote:
> > - va.va_mask = XFS_AT_NLINK|XFS_AT_SIZE|XFS_AT_NBLOCKS;
> > + va.va_mask = XFS_AT_NLINK|XFS_AT_SIZE|XFS_AT_NBLOCKS|XFS_AT_SIZE;
>
> Huh? XFS_AT_SIZE twice?!
Ah thanks. Stupid braino but fortunately harmless..
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-29 6:25 ` Andrew Morton
2004-01-29 23:20 ` Miquel van Smoorenburg
@ 2004-01-30 16:01 ` Miquel van Smoorenburg
2004-01-30 20:21 ` Miquel van Smoorenburg
2 siblings, 0 replies; 25+ messages in thread
From: Miquel van Smoorenburg @ 2004-01-30 16:01 UTC (permalink / raw)
To: Andrew Morton; +Cc: Miquel van Smoorenburg, linux-kernel
On Thu, 29 Jan 2004 07:25:21, Andrew Morton wrote:
> "Miquel van Smoorenburg" <miquels@cistron.nl> wrote:
> >
> > I have a Linux 2.6.2-rc2 NFS file server and another similar
> > box as client. Kernel is compiled for SMP (hyperthreading).
> >
> > In a few seconds, the server locks up. It spins in
> > generic_fillattr(), apparently in the i_size_read() inline function.
> > Server responds to pings and sysrq, but nothing else.
> > Here's the sysrq output:
> >
> > Pid: 531, comm: nfsd
> > EIP: 0060:[<c01577c2>] CPU: 0
> > EIP is at generic_fillattr+0x82/0xa0
> > EFLAGS: 00000246 Not tainted
> > EAX: 00000000 EBX: 07ae7200 ECX: f650a0a0 EDX: 000113eb
> > ESI: 00000000 EDI: f66cfed4 EBP: f66e5804 DS: 007b ES: 007b
> > CR0: 8005003b CR2: 40019000 CR3: 37245000 CR4: 000006d0
> > Call Trace:
> > [<f8ab1f6b>] linvfs_getattr+0x2b/0x34 [xfs]
> > [<c0157805>] vfs_getattr+0x25/0x84
> > [<c01c19a3>] encode_post_op_attr+0x53/0x238
> > [<c01c1e27>] encode_wcc_data+0x29f/0x2a8
> > [<c01c4521>] nfs3svc_encode_commitres+0x19/0x5c
> > [<c01b709d>] nfsd_dispatch+0x14d/0x1a3
> > [<c02fb79b>] svc_process+0x3b3/0x640
> > [<c01b6ddc>] nfsd+0x1e4/0x358
> > [<c01b6bf8>] nfsd+0x0/0x358
> > [<c0107251>] kernel_thread_helper+0x5/0xc
> >
>
> Is the EIP _always_ inside generic_fillattr()?
>
> If so then yes, your analysis look right. I'd say that the inode has been
> corrupted and the seqcount counter has assumed an non-even value. That
> will cause i_size_read() to lock up.
Yes, but why does it lock up the whole machine? I'd say just that one
process should go into 'R', but other things should get scheduled right?
Or is some very common lock being held? The machine responds to pings,
so interrupts are working.
In fact as this is a hyperthreaded machine, why don't I see anything
running on the "other virtual cpu" ? I only see nfsd spinning on CPU#0
Could it be a CPU problem, the other CPU-thread is in i_size_write() but
that CPU-thread is simply never run ? To test this, I rebooted the exact
same 2.6.2-rc2 kernel with "maxcpus=1" .. and it doesn't lock up ..
(Still weird that -mm doesn't lock up)
Mike.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-29 6:25 ` Andrew Morton
2004-01-29 23:20 ` Miquel van Smoorenburg
2004-01-30 16:01 ` Miquel van Smoorenburg
@ 2004-01-30 20:21 ` Miquel van Smoorenburg
2004-01-30 22:13 ` Miquel van Smoorenburg
2 siblings, 1 reply; 25+ messages in thread
From: Miquel van Smoorenburg @ 2004-01-30 20:21 UTC (permalink / raw)
To: Andrew Morton; +Cc: Miquel van Smoorenburg, linux-kernel
On Thu, 29 Jan 2004 07:25:21, Andrew Morton wrote:
> "Miquel van Smoorenburg" <miquels@cistron.nl> wrote:
> >
> > I have a Linux 2.6.2-rc2 NFS file server and another similar
> > box as client. Kernel is compiled for SMP (hyperthreading).
> > In a few seconds, the server locks up. It spins in
> > generic_fillattr(), apparently in the i_size_read() inline function.
> > Server responds to pings and sysrq, but nothing else.
>
> Is the EIP _always_ inside generic_fillattr()?
>
> If so then yes, your analysis look right. I'd say that the inode has been
> corrupted and the seqcount counter has assumed an non-even value. That
> will cause i_size_read() to lock up.
I added some extra code to i_size_read() and i_size_write(). First,
some debugging code:
--- fs.h.orig 2004-01-30 21:10:28.000000000 +0100
+++ fs.h.v1 2004-01-30 21:11:19.000000000 +0100
@@ -425,6 +425,7 @@
} u;
#ifdef __NEED_I_SIZE_ORDERED
seqcount_t i_size_seqcount;
+ pid_t seq_pid; /* XXX */
#endif
};
@@ -450,6 +451,12 @@
do {
seq = read_seqcount_begin(&inode->i_size_seqcount);
i_size = inode->i_size;
+#if 1 /* XXX HACK */
+ if ((++count & 65535) == 0) {
+ printk("i_size_read() seems to be looping - pid %d\n", inode->seq_pid);
+ mdelay(100);
+ }
+#endif
} while (read_seqcount_retry(&inode->i_size_seqcount, seq));
return i_size;
#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT)
@@ -467,9 +474,20 @@
static inline void i_size_write(struct inode *inode, loff_t i_size)
{
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+#if 1 /* XXX */
+ inode->seq_pid = current->tgid;
+ write_seqcount_begin(&inode->i_size_seqcount);
+ inode->i_size = i_size;
+ write_seqcount_end(&inode->i_size_seqcount);
+ if (inode->i_size_seqcount.sequence & 1)
+ printk("i_size_write: pid %d: sequence is odd!\n",
+ current->tgid);
+ inode->seq_pid = 0;
+#else
write_seqcount_begin(&inode->i_size_seqcount);
inode->i_size = i_size;
write_seqcount_end(&inode->i_size_seqcount);
+#endif
#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT)
preempt_disable();
inode->i_size = i_size;
I then started the test that locks up the kernel, and it printed this:
i_size_write: pid 542: sequence is odd!
i_size_write: pid 543: sequence is odd!
i_size_write: pid 542: sequence is odd!
i_size_read() seems to be looping - pid 0
i_size_read() seems to be looping - pid 0
[this keeps on being printed and the kernel is locked up]
It took some time for the i_size_write messages to show up, and they were
spaced 10-30 seconds apart, and during that time the server was still
up - right until the first i_size_read message.
Then I added this patch:
--- fs.h.v1 2004-01-30 21:11:19.000000000 +0100
+++ fs.h 2004-01-30 20:16:35.000000000 +0100
@@ -426,6 +426,7 @@
#ifdef __NEED_I_SIZE_ORDERED
seqcount_t i_size_seqcount;
pid_t seq_pid; /* XXX */
+ spinlock_t i_size_lock;
#endif
};
@@ -475,6 +476,7 @@
{
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
#if 1 /* XXX */
+ spin_lock(&inode->i_size_lock);
inode->seq_pid = current->tgid;
write_seqcount_begin(&inode->i_size_seqcount);
inode->i_size = i_size;
@@ -483,6 +485,7 @@
printk("i_size_write: pid %d: sequence is odd!\n",
current->tgid);
inode->seq_pid = 0;
+ spin_unlock(&inode->i_size_lock);
#else
write_seqcount_begin(&inode->i_size_seqcount);
inode->i_size = i_size;
(and some code in fs/inode.c to initialize i_size_lock)
Guess what. No more debug output, no more lockups ... is there
anything else I can do to debug this ? Because I'm not really
sure what I'm doing, you see :)
Mike.
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-30 20:21 ` Miquel van Smoorenburg
@ 2004-01-30 22:13 ` Miquel van Smoorenburg
2004-01-30 22:34 ` Andrew Morton
0 siblings, 1 reply; 25+ messages in thread
From: Miquel van Smoorenburg @ 2004-01-30 22:13 UTC (permalink / raw)
To: Andrew Morton; +Cc: Miquel van Smoorenburg, linux-kernel, Nathan Scott
On Fri, 30 Jan 2004 21:21:55, Miquel van Smoorenburg wrote:
> I added some extra code to i_size_read() and i_size_write(). First,
> some debugging code:
> I then started the test that locks up the kernel, and it printed this:
>
> i_size_write: pid 542: sequence is odd!
> i_size_write: pid 543: sequence is odd!
> i_size_write: pid 542: sequence is odd!
>
> i_size_read() seems to be looping - pid 0
> i_size_read() seems to be looping - pid 0
> [this keeps on being printed and the kernel is locked up]
>
> It took some time for the i_size_write messages to show up, and they were
> spaced 10-30 seconds apart, and during that time the server was still
> up - right until the first i_size_read message.
>
Okay, I added a patch to make the sequence increments atomic. Now
i_size_write() still sometimes ends up with an odd sequence, but
i_size_read() doesn't lock up anymore.
What lock exactly is supposed to protect i_size_write, since it
appears that i_size_write is being called without proper locking ?
(Am I right?)
--- fs.h.v1 2004-01-30 21:11:19.000000000 +0100
+++ fs.h 2004-01-30 21:55:17.000000000 +0100
@@ -426,6 +426,7 @@
#ifdef __NEED_I_SIZE_ORDERED
seqcount_t i_size_seqcount;
pid_t seq_pid; /* XXX */
+ spinlock_t i_size_lock;
#endif
};
@@ -441,6 +442,7 @@
*/
#include <linux/delay.h>
#include <linux/sched.h>
+#include <asm/atomic.h>
static inline loff_t i_size_read(struct inode *inode)
{
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
@@ -476,9 +478,11 @@
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
#if 1 /* XXX */
inode->seq_pid = current->tgid;
- write_seqcount_begin(&inode->i_size_seqcount);
+ atomic_inc((atomic_t *)&inode->i_size_seqcount.sequence);
+ smp_wmb();
inode->i_size = i_size;
- write_seqcount_end(&inode->i_size_seqcount);
+ smp_wmb();
+ atomic_inc((atomic_t *)&inode->i_size_seqcount.sequence);
if (inode->i_size_seqcount.sequence & 1)
printk("i_size_write: pid %d: sequence is odd!\n",
current->tgid);
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-30 22:13 ` Miquel van Smoorenburg
@ 2004-01-30 22:34 ` Andrew Morton
2004-01-30 22:53 ` Christoph Hellwig
2004-01-30 23:07 ` Nathan Scott
0 siblings, 2 replies; 25+ messages in thread
From: Andrew Morton @ 2004-01-30 22:34 UTC (permalink / raw)
To: Miquel van Smoorenburg; +Cc: miquels, linux-kernel, nathans
Miquel van Smoorenburg <miquels@cistron.nl> wrote:
>
> On Fri, 30 Jan 2004 21:21:55, Miquel van Smoorenburg wrote:
> > I added some extra code to i_size_read() and i_size_write(). First,
> > some debugging code:
> > I then started the test that locks up the kernel, and it printed this:
> >
> > i_size_write: pid 542: sequence is odd!
> > i_size_write: pid 543: sequence is odd!
> > i_size_write: pid 542: sequence is odd!
> >
> > i_size_read() seems to be looping - pid 0
> > i_size_read() seems to be looping - pid 0
> > [this keeps on being printed and the kernel is locked up]
> >
> > It took some time for the i_size_write messages to show up, and they were
> > spaced 10-30 seconds apart, and during that time the server was still
> > up - right until the first i_size_read message.
> >
>
> Okay, I added a patch to make the sequence increments atomic. Now
> i_size_write() still sometimes ends up with an odd sequence, but
> i_size_read() doesn't lock up anymore.
Go, Miquel!
> What lock exactly is supposed to protect i_size_write, since it
> appears that i_size_write is being called without proper locking ?
> (Am I right?)
If two CPUs hit i_size_write() at the same time we have a bug. That
function requires that the caller provide external serialisation, via i_sem.
Try adding this to the start of i_size_write():
if (down_trylock(&inode->i_sem) == 0) {
printk("I am buggy\n");
dump_stack();
up(&inode->i_sem);
}
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-30 22:34 ` Andrew Morton
@ 2004-01-30 22:53 ` Christoph Hellwig
2004-01-30 23:13 ` Andrew Morton
2004-01-30 23:07 ` Nathan Scott
1 sibling, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2004-01-30 22:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: Miquel van Smoorenburg, linux-kernel, nathans
On Fri, Jan 30, 2004 at 02:34:59PM -0800, Andrew Morton wrote:
> If two CPUs hit i_size_write() at the same time we have a bug. That
> function requires that the caller provide external serialisation, via i_sem.
O_APPEND|O_DIRECT writes could do that under XFS..
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-30 22:53 ` Christoph Hellwig
@ 2004-01-30 23:13 ` Andrew Morton
2004-01-31 1:25 ` Miquel van Smoorenburg
0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2004-01-30 23:13 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: miquels, linux-kernel, nathans
Christoph Hellwig <hch@infradead.org> wrote:
>
> On Fri, Jan 30, 2004 at 02:34:59PM -0800, Andrew Morton wrote:
> > If two CPUs hit i_size_write() at the same time we have a bug. That
> > function requires that the caller provide external serialisation, via i_sem.
>
> O_APPEND|O_DIRECT writes could do that under XFS..
Sigh.
diff -puN include/linux/fs.h~i_size_write-check include/linux/fs.h
--- 25/include/linux/fs.h~i_size_write-check Fri Jan 30 15:09:47 2004
+++ 25-akpm/include/linux/fs.h Fri Jan 30 15:10:28 2004
@@ -464,9 +464,11 @@ static inline loff_t i_size_read(struct
#endif
}
+void i_size_write_check(struct inode *inode);
static inline void i_size_write(struct inode *inode, loff_t i_size)
{
+ i_size_write_check(inode);
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
write_seqcount_begin(&inode->i_size_seqcount);
inode->i_size = i_size;
diff -puN mm/filemap.c~i_size_write-check mm/filemap.c
--- 25/mm/filemap.c~i_size_write-check Fri Jan 30 15:10:23 2004
+++ 25-akpm/mm/filemap.c Fri Jan 30 15:11:41 2004
@@ -2010,3 +2010,18 @@ out:
}
EXPORT_SYMBOL_GPL(generic_file_direct_IO);
+
+void i_size_write_check(struct inode *inode)
+{
+ static int count = 0;
+
+ if (down_trylock(&inode->i_sem) == 0) {
+ if (count < 10) {
+ count++;
+ printk("i_size_write() called without i_sem\n");
+ dump_stack();
+ }
+ up(&inode->i_sem);
+ }
+}
+EXPORT_SYMBOL(i_size_write_check);
_
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-30 23:13 ` Andrew Morton
@ 2004-01-31 1:25 ` Miquel van Smoorenburg
2004-01-31 1:38 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 25+ messages in thread
From: Miquel van Smoorenburg @ 2004-01-31 1:25 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christoph Hellwig, miquels, linux-kernel, nathans
On Sat, 31 Jan 2004 00:13:16, Andrew Morton wrote:
> Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Fri, Jan 30, 2004 at 02:34:59PM -0800, Andrew Morton wrote:
> > > If two CPUs hit i_size_write() at the same time we have a bug. That
> > > function requires that the caller provide external serialisation, via i_sem.
> >
> > O_APPEND|O_DIRECT writes could do that under XFS..
>
> Sigh.
>
> diff -puN mm/filemap.c~i_size_write-check mm/filemap.c
> --- 25/mm/filemap.c~i_size_write-check Fri Jan 30 15:10:23 2004
> +++ 25-akpm/mm/filemap.c Fri Jan 30 15:11:41 2004
> @@ -2010,3 +2010,18 @@ out:
> }
>
> EXPORT_SYMBOL_GPL(generic_file_direct_IO);
> +
> +void i_size_write_check(struct inode *inode)
> +{
> + static int count = 0;
> +
> + if (down_trylock(&inode->i_sem) == 0) {
> + if (count < 10) {
You want to set this to 100 at least, since at boot time the message
happens _often_ even without XFS.
It's caused by sysfs:
Jan 31 00:48:33 meghan kernel: i_size_write() called without i_sem
Jan 31 00:48:33 meghan kernel: Call Trace:
Jan 31 00:48:33 meghan kernel: [i_size_write_check+95/97] i_size_write_check+0x5f/0x61
Jan 31 00:48:33 meghan kernel: [simple_commit_write+74/148] simple_commit_write+0x4a/0x94
Jan 31 00:48:33 meghan kernel: [page_symlink+210/414] page_symlink+0xd2/0x19e
Jan 31 00:48:33 meghan kernel: [sysfs_symlink+107/126] sysfs_symlink+0x6b/0x7e
Jan 31 00:48:33 meghan kernel: [sysfs_create_link+313/317] sysfs_create_link+0x139/0x13d
Jan 31 00:48:33 meghan kernel: [bus_add_device+143/161] bus_add_device+0x8f/0xa1
[etc etc]
Also, bd_set_size runs unlocked:
Jan 31 02:02:39 meghan kernel: I am buggy
Jan 31 02:02:39 meghan kernel: Call Trace:
Jan 31 02:02:39 meghan kernel: [bd_set_size+201/218] bd_set_size+0xc9/0xda
Jan 31 02:02:39 meghan kernel: [do_open+282/1014] do_open+0x11a/0x3f6
Jan 31 02:02:39 meghan kernel: [blkdev_get+114/128] blkdev_get+0x72/0x80
Jan 31 02:02:39 meghan kernel: [do_open+603/1014] do_open+0x25b/0x3f6
Jan 31 02:02:39 meghan kernel: [blkdev_get+114/128] blkdev_get+0x72/0x80
Jan 31 02:02:39 meghan kernel: [open_bdev_excl+87/172] open_bdev_excl+0x57/0xacJan 31 02:02:39 meghan kernel: [get_sb_bdev+55/344] get_sb_bdev+0x37/0x158
Jan 31 02:02:39 meghan kernel: [ext3_get_sb+47/51] ext3_get_sb+0x2f/0x33
Jan 31 02:02:39 meghan kernel: [ext3_fill_super+0/2949] ext3_fill_super+0x0/0xb85
Jan 31 02:02:39 meghan kernel: [do_kern_mount+95/213] do_kern_mount+0x5f/0xd5
Jan 31 02:02:39 meghan kernel: [do_add_mount+120/334] do_add_mount+0x78/0x14e
Jan 31 02:02:39 meghan kernel: [do_mount+289/359] do_mount+0x121/0x167
Jan 31 02:02:39 meghan kernel: [__copy_from_user_ll+104/108] __copy_from_user_ll+0x68/0x6c
Jan 31 02:02:39 meghan kernel: [copy_mount_options+128/234] copy_mount_options+0x80/0xea
Jan 31 02:02:39 meghan kernel: [sys_mount+203/309] sys_mount+0xcb/0x135
Jan 31 02:02:39 meghan kernel: [do_mount_root+47/156] do_mount_root+0x2f/0x9c
Jan 31 02:02:39 meghan kernel: [mount_block_root+84/285] mount_block_root+0x54/0x11d
Jan 31 02:02:39 meghan kernel: [mount_root+94/102] mount_root+0x5e/0x66
Jan 31 02:02:39 meghan kernel: [prepare_namespace+38/195] prepare_namespace+0x26/0xc3
Jan 31 02:02:39 meghan kernel: [init+80/334] init+0x50/0x14e
Jan 31 02:02:39 meghan kernel: [init+0/334] init+0x0/0x14e
Jan 31 02:02:39 meghan kernel: [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb
But the XFS problem appears to be vn_revalidate which calls i_size_write()
without holding i_sem:
Jan 31 02:04:00 meghan kernel: I am buggy
Jan 31 02:04:00 meghan kernel: Call Trace:
Jan 31 02:04:00 meghan kernel: [__crc_pm_idle+1813107/4617778] vn_revalidate+0x18e/0x1a9 [xfs]
Jan 31 02:04:00 meghan kernel: [nfsd_commit+205/219] nfsd_commit+0xcd/0xdb
Jan 31 02:04:00 meghan kernel: [__crc_pm_idle+1798087/4617778] linvfs_getattr+0x39/0x3f [xfs]
Jan 31 02:04:00 meghan kernel: [vfs_getattr+57/154] vfs_getattr+0x39/0x9a
Jan 31 02:04:00 meghan kernel: [encode_post_op_attr+104/577] encode_post_op_attr+0x68/0x241
Jan 31 02:04:00 meghan kernel: [nfs3svc_encode_commitres+42/115] nfs3svc_encode_commitres+0x2a/0x73
Jan 31 02:04:00 meghan kernel: [nfsd_dispatch+297/485] nfsd_dispatch+0x129/0x1e5
Jan 31 02:04:00 meghan kernel: [svc_process+1170/1621] svc_process+0x492/0x655
Jan 31 02:04:00 meghan kernel: [apic_timer_interrupt+26/32] apic_timer_interrupt+0x1a/0x20
Jan 31 02:04:00 meghan kernel: [nfsd+490/915] nfsd+0x1ea/0x393
Jan 31 02:04:00 meghan kernel: [nfsd+0/915] nfsd+0x0/0x393
Jan 31 02:04:00 meghan kernel: [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb
Mike.
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-31 1:25 ` Miquel van Smoorenburg
@ 2004-01-31 1:38 ` Andrew Morton
2004-01-31 11:46 ` Miquel van Smoorenburg
2004-01-31 17:07 ` Christoph Hellwig
2004-02-01 1:46 ` Anton Blanchard
2 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2004-01-31 1:38 UTC (permalink / raw)
To: Miquel van Smoorenburg; +Cc: hch, miquels, linux-kernel, nathans
Miquel van Smoorenburg <miquels@cistron.nl> wrote:
>
> On Sat, 31 Jan 2004 00:13:16, Andrew Morton wrote:
> > Christoph Hellwig <hch@infradead.org> wrote:
> > >
> > > On Fri, Jan 30, 2004 at 02:34:59PM -0800, Andrew Morton wrote:
> > > > If two CPUs hit i_size_write() at the same time we have a bug. That
> > > > function requires that the caller provide external serialisation, via i_sem.
> > >
> > > O_APPEND|O_DIRECT writes could do that under XFS..
> >
> > Sigh.
> >
> > diff -puN mm/filemap.c~i_size_write-check mm/filemap.c
> > --- 25/mm/filemap.c~i_size_write-check Fri Jan 30 15:10:23 2004
> > +++ 25-akpm/mm/filemap.c Fri Jan 30 15:11:41 2004
> > @@ -2010,3 +2010,18 @@ out:
> > }
> >
> > EXPORT_SYMBOL_GPL(generic_file_direct_IO);
> > +
> > +void i_size_write_check(struct inode *inode)
> > +{
> > + static int count = 0;
> > +
> > + if (down_trylock(&inode->i_sem) == 0) {
> > + if (count < 10) {
>
> You want to set this to 100 at least, since at boot time the message
> happens _often_ even without XFS.
fun.
> It's caused by sysfs:
OK, sysfs_symlink() needs i_sem.
> Also, bd_set_size runs unlocked:
I don't expect we'll ever see a race over the blockdev's i_size in there.
It might be simplest to just do a straight assignment. I'll work that with
viro.
> But the XFS problem appears to be vn_revalidate which calls i_size_write()
> without holding i_sem:
There's your bug.
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-31 1:38 ` Andrew Morton
@ 2004-01-31 11:46 ` Miquel van Smoorenburg
2004-01-31 15:59 ` Steve Lord
0 siblings, 1 reply; 25+ messages in thread
From: Miquel van Smoorenburg @ 2004-01-31 11:46 UTC (permalink / raw)
To: nathans; +Cc: hch, miquels, linux-kernel
On Sat, 31 Jan 2004 02:38:51, Andrew Morton wrote:
> Miquel van Smoorenburg <miquels@cistron.nl> wrote:
> > But the XFS problem appears to be vn_revalidate which calls i_size_write()
> > without holding i_sem:
>
> There's your bug.
Okay. It seems that XFS uses its own locking with xfs_ilock() etc,
so I am not sure if this should be fixed by using down(&inode->i_sem)
or by using xfs_ilock().
Perhaps xfs_ilock() should also get the inode->i_sem semaphore
in the XFS_ILOCK_EXCL case ?
Also, there are one or two more places that call i_size_write()
that should be looked at I guess.
Ofcourse I'll test any patches you send me.
Mike.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-31 11:46 ` Miquel van Smoorenburg
@ 2004-01-31 15:59 ` Steve Lord
2004-02-01 16:15 ` Christoph Hellwig
0 siblings, 1 reply; 25+ messages in thread
From: Steve Lord @ 2004-01-31 15:59 UTC (permalink / raw)
To: Miquel van Smoorenburg; +Cc: nathans, hch, linux-kernel
Miquel van Smoorenburg wrote:
> On Sat, 31 Jan 2004 02:38:51, Andrew Morton wrote:
>
>>Miquel van Smoorenburg <miquels@cistron.nl> wrote:
>>
>>>But the XFS problem appears to be vn_revalidate which calls i_size_write()
>>>without holding i_sem:
>>
>>There's your bug.
>
>
> Okay. It seems that XFS uses its own locking with xfs_ilock() etc,
> so I am not sure if this should be fixed by using down(&inode->i_sem)
> or by using xfs_ilock().
>
> Perhaps xfs_ilock() should also get the inode->i_sem semaphore
> in the XFS_ILOCK_EXCL case ?
>
> Also, there are one or two more places that call i_size_write()
> that should be looked at I guess.
>
Actually it gets more messy than this. i_size_write is called by
different spots by xfs without the i_sem held. The call in
generic_file_aio_write_nolock() is made without the lock for O_DIRECT
writes, XFS allows multiple parallel writes for O_DIRECT
writes.
The vn_revalidate call is called out of linvfs_setattr,
which is called with the i_sem held, it is also potentially called out
of linvfs_getattr, although since the i_size is always maintained
as it is changed, this call should not actually be updating the size.
Possibly changing the code in vn_revalidate to do this:
if (i_size_read(inode) < va.va_size))
i_size_write(inode, va.va_size);
Would be a good starting point, I suspect those calls from the nfs
revalidate call are not really going to change the inode size. My
guess is this will make your problem go away.
Probably some larger code restructure is needed so that revalidate
knows if the i_sem is held or not at this point.
The O_DIRECT write case is the hard one. In XFS's internal view of
the world, the inode size is maintained via the XFS_ILOCK, but we
only hold that across metadata manipulation within the fs code,
not across I/O such as a call to generic_file_aio_write_nolock.
Right now the only way I see of dealing with that is to make
writes which we know will extent the file hold the i_sem for
the duration in the O_DIRECT case.
Steve
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-31 15:59 ` Steve Lord
@ 2004-02-01 16:15 ` Christoph Hellwig
2004-01-31 16:41 ` Steve Lord
0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2004-02-01 16:15 UTC (permalink / raw)
To: Steve Lord; +Cc: Miquel van Smoorenburg, nathans, linux-kernel
On Sat, Jan 31, 2004 at 09:59:06AM -0600, Steve Lord wrote:
> The vn_revalidate call is called out of linvfs_setattr,
> which is called with the i_sem held, it is also potentially called out
> of linvfs_getattr, although since the i_size is always maintained
> as it is changed, this call should not actually be updating the size.
> Possibly changing the code in vn_revalidate to do this:
>
> if (i_size_read(inode) < va.va_size))
> i_size_write(inode, va.va_size);
>
> Would be a good starting point, I suspect those calls from the nfs
> revalidate call are not really going to change the inode size. My
> guess is this will make your problem go away.
>
> Probably some larger code restructure is needed so that revalidate
> knows if the i_sem is held or not at this point.
I think the right fix is to update the Linux i_size always shortly after
di_size is updated. There's a lot of updates in the directory code while
should be handled by an i_size_write in the matching linvfs routines, and
there's a few more but we should be able to handle those without
vn_revalidate aswell.
> The O_DIRECT write case is the hard one. In XFS's internal view of
> the world, the inode size is maintained via the XFS_ILOCK, but we
> only hold that across metadata manipulation within the fs code,
> not across I/O such as a call to generic_file_aio_write_nolock.
> Right now the only way I see of dealing with that is to make
> writes which we know will extent the file hold the i_sem for
> the duration in the O_DIRECT case.
That's the more difficult case. Any reson why you'd hold i_sem
for the whole O_DIRECT I/O instead of just for updating i_sem?
Note that the EOF zeroing code also calls i_size_write.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-02-01 16:15 ` Christoph Hellwig
@ 2004-01-31 16:41 ` Steve Lord
0 siblings, 0 replies; 25+ messages in thread
From: Steve Lord @ 2004-01-31 16:41 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Miquel van Smoorenburg, nathans, linux-kernel
Christoph Hellwig wrote:
> On Sat, Jan 31, 2004 at 09:59:06AM -0600, Steve Lord wrote:
>
>>The vn_revalidate call is called out of linvfs_setattr,
>>which is called with the i_sem held, it is also potentially called out
>>of linvfs_getattr, although since the i_size is always maintained
>>as it is changed, this call should not actually be updating the size.
>>Possibly changing the code in vn_revalidate to do this:
>>
>> if (i_size_read(inode) < va.va_size))
>> i_size_write(inode, va.va_size);
>>
>>Would be a good starting point, I suspect those calls from the nfs
>>revalidate call are not really going to change the inode size. My
>>guess is this will make your problem go away.
>>
>>Probably some larger code restructure is needed so that revalidate
>>knows if the i_sem is held or not at this point.
>
>
> I think the right fix is to update the Linux i_size always shortly after
> di_size is updated. There's a lot of updates in the directory code while
> should be handled by an i_size_write in the matching linvfs routines, and
> there's a few more but we should be able to handle those without
> vn_revalidate aswell.
>
Changing the validate_fields call to use
if (i_size_read(inode) != va.va_size)
i_size_write(inode, va.va_size);
should take care of directories, certainly better than the
direct assignment to i_size which is in there now....
This is called from the directory ops which modify the
contents of a directory and should already be under the
i_sem. The vn_revalidate code should use a != test
as well of course...
>
>>The O_DIRECT write case is the hard one. In XFS's internal view of
>>the world, the inode size is maintained via the XFS_ILOCK, but we
>>only hold that across metadata manipulation within the fs code,
>>not across I/O such as a call to generic_file_aio_write_nolock.
>>Right now the only way I see of dealing with that is to make
>>writes which we know will extent the file hold the i_sem for
>>the duration in the O_DIRECT case.
>
>
> That's the more difficult case. Any reson why you'd hold i_sem
> for the whole O_DIRECT I/O instead of just for updating i_sem?
>
We worked hard not to hold it, but that i_size_write in
the middle of the async_io code is tough to work around.
> Note that the EOF zeroing code also calls i_size_write.
>
But that is called out of an extending setattr or a buffered write which
hold the semaphore. Hmm, actually O_DIRECT write can be in there as
well, but that is the same problem as the generic I/O path call.
Steve
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-31 1:25 ` Miquel van Smoorenburg
2004-01-31 1:38 ` Andrew Morton
@ 2004-01-31 17:07 ` Christoph Hellwig
2004-02-01 1:46 ` Anton Blanchard
2 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2004-01-31 17:07 UTC (permalink / raw)
To: Miquel van Smoorenburg; +Cc: Andrew Morton, linux-kernel, nathans
Here's an (untested) quickhack to take i_sem in linvfs_getattr.
I'll try to come up with a real fix next week.
--- 1.39/fs/xfs/linux/xfs_iops.c Fri Jan 9 01:07:07 2004
+++ edited/fs/xfs/linux/xfs_iops.c Sat Jan 31 18:41:44 2004
@@ -478,8 +478,11 @@
vnode_t *vp = LINVFS_GET_VP(inode);
int error = 0;
- if (unlikely(vp->v_flag & VMODIFIED))
+ if (unlikely(vp->v_flag & VMODIFIED)) {
+ down(&inode->i_sem);
error = vn_revalidate(vp);
+ up(&inode->i_sem);
+ }
if (!error)
generic_fillattr(inode, stat);
return 0;
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-31 1:25 ` Miquel van Smoorenburg
2004-01-31 1:38 ` Andrew Morton
2004-01-31 17:07 ` Christoph Hellwig
@ 2004-02-01 1:46 ` Anton Blanchard
2 siblings, 0 replies; 25+ messages in thread
From: Anton Blanchard @ 2004-02-01 1:46 UTC (permalink / raw)
To: Miquel van Smoorenburg
Cc: Andrew Morton, Christoph Hellwig, linux-kernel, nathans
> > + if (down_trylock(&inode->i_sem) == 0) {
> > + if (count < 10) {
>
> You want to set this to 100 at least, since at boot time the message
> happens _often_ even without XFS.
You could also just printk_ratelimit it :)
Anton
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: 2.6.2-rc2 nfsd+xfs spins in i_size_read()
2004-01-30 22:34 ` Andrew Morton
2004-01-30 22:53 ` Christoph Hellwig
@ 2004-01-30 23:07 ` Nathan Scott
1 sibling, 0 replies; 25+ messages in thread
From: Nathan Scott @ 2004-01-30 23:07 UTC (permalink / raw)
To: Andrew Morton; +Cc: Miquel van Smoorenburg, linux-kernel
On Fri, Jan 30, 2004 at 02:34:59PM -0800, Andrew Morton wrote:
> Miquel van Smoorenburg <miquels@cistron.nl> wrote:
> > What lock exactly is supposed to protect i_size_write, since it
> > appears that i_size_write is being called without proper locking ?
> > (Am I right?)
>
> If two CPUs hit i_size_write() at the same time we have a bug. That
> function requires that the caller provide external serialisation, via i_sem.
Hmm... I suspect we may not always be providing that in XFS -
I'll go audit our calls.
> Try adding this to the start of i_size_write():
>
> if (down_trylock(&inode->i_sem) == 0) {
> printk("I am buggy\n");
> dump_stack();
> up(&inode->i_sem);
> }
Let me know what you hit Miquel :) - I'll run the XFS tests
next week with this too and see what I can find.
thanks.
--
Nathan
^ permalink raw reply [flat|nested] 25+ messages in thread