* NFSv3 and linux-2.4.10-ac3 => oops
@ 2001-10-01 23:40 H. Peter Anvin
  2001-10-02  9:40 ` Trond Myklebust
  0 siblings, 1 reply; 8+ messages in thread
From: H. Peter Anvin @ 2001-10-01 23:40 UTC (permalink / raw)
  To: alan, linux-kernel, trond.myklebust

Hello everyone,

I have a reproducible (and rather quick) oops on a system running
linux-2.4.10-ac3, which seems to be NFS (v3) related; although
ksymoops core dumps when I try to use it, I have manually decoded
the dump to indicate that it happens in rwsem_down_read_failed
called from nfs_file_write.  Rather than posting too much here,
I have put as much information as I have been able to gather at:

ftp://ftp.zytor.com/pub/hpa/oops/

This includes the configuration, System.map, oops text etc.


* Re: NFSv3 and linux-2.4.10-ac3 => oops
  2001-10-01 23:40 NFSv3 and linux-2.4.10-ac3 => oops H. Peter Anvin
@ 2001-10-02  9:40 ` Trond Myklebust
  2001-10-02 11:32   ` Matt Bernstein
  0 siblings, 1 reply; 8+ messages in thread
From: Trond Myklebust @ 2001-10-02  9:40 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: alan, linux-kernel

>>>>> " " == H Peter Anvin <hpa@transmeta.com> writes:

     > Hello everyone, I have a reproducible (and rather quick) oops
     > on a system running linux-2.4.10-ac3, which seems to be NFS
     > (v3) related; although ksymoops core dumps when I try to use
     > it, I have manually decoded the dump to indicate that it
     > happens in rwsem_down_read_failed called from nfs_file_write.
     > Rather than posting too much here, I have put as much
     > information as I have been able to gather at:

     > ftp://ftp.zytor.com/pub/hpa/oops/

I'm trying to look at this, but it seems a hopeless mess: there are no
calls to any read/write semaphore routines in the NFS code.

AFAICS the second stack return point corresponds to the call to
generic_file_write() in nfs_file_write(), so I'd guess that the Oops
is actually happening somewhere there...

Hmm... Looking at the code in generic_file_write(), I see that Alan
hasn't merged in the kmap() stuff in generic_file_write() from
Linus. At the same time, the nfs_prepare_write() seems to have been
synced with Linus, and so the kmap() that used to be there has
disappeared.

As your config indicates that you *are* using CONFIG_HIGHMEM4G,
perhaps one ought to start with a patch that fixes the obvious bug (in
the hope that it'll at least clean up the next Oops)...

Cheers,
  Trond

--- linux-2.4.10-hpa/fs/nfs/file.c.orig	Sun Sep 23 18:48:01 2001
+++ linux-2.4.10-hpa/fs/nfs/file.c	Tue Oct  2 11:33:43 2001
@@ -155,7 +155,12 @@
  */
 static int nfs_prepare_write(struct file *file, struct page *page, unsigned offset, unsigned to)
 {
-	return nfs_flush_incompatible(file, page);
+	int status;
+	kmap(page);
+	status = nfs_flush_incompatible(file, page);
+	if (status)
+		kunmap(page);
+	return status;
 }
 
 static int nfs_commit_write(struct file *file, struct page *page, unsigned offset, unsigned to)


* Re: NFSv3 and linux-2.4.10-ac3 => oops
  2001-10-02  9:40 ` Trond Myklebust
@ 2001-10-02 11:32   ` Matt Bernstein
  2001-10-02 12:03     ` Trond Myklebust
  2001-10-02 13:47     ` Alan Cox
  0 siblings, 2 replies; 8+ messages in thread
From: Matt Bernstein @ 2001-10-02 11:32 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: H. Peter Anvin, alan, linux-kernel

I wonder if this is related to oopses I sent in in the last two days?
We're running 4GB setups with NFSv3 client and server on our fileservers,
and the oopses might (don't really have strong correlation evidence yet)
be related to when our fileservers push online backups to cheaper NFS
servers (running the same kernel based on 2.4.9-ac10). Is there a last
known good kernel I can try on my production systems while I try to
reproduce the problem on smaller boxes? Or would you like me to try your
patch?

Matt

At 11:40 +0200 Trond Myklebust wrote:

>>>>>> " " == H Peter Anvin <hpa@transmeta.com> writes:
>
>     > Hello everyone, I have a reproducible (and rather quick) oops
>     > on a system running linux-2.4.10-ac3, which seems to be NFS
>     > (v3) related; although ksymoops core dumps when I try to use
[snip]
>     > ftp://ftp.zytor.com/pub/hpa/oops/
>
>I'm trying to look at this, but it seems a hopeless mess: there are no
>calls to any read/write semaphore routines in the NFS code.
>
>AFAICS the second stack return point corresponds to the call to
>generic_file_write() in nfs_file_write(), so I'd guess that the Oops
>is actually happening somewhere there...
>
>Hmm... Looking at the code in generic_file_write(), I see that Alan
>hasn't merged in the kmap() stuff in generic_file_write() from
>Linus. At the same time, the nfs_prepare_write() seems to have been
>synced with Linus, and so the kmap() that used to be there has
>disappeared.



* Re: NFSv3 and linux-2.4.10-ac3 => oops
  2001-10-02 11:32   ` Matt Bernstein
@ 2001-10-02 12:03     ` Trond Myklebust
  2001-10-02 13:49       ` Alan Cox
  2001-10-02 13:47     ` Alan Cox
  1 sibling, 1 reply; 8+ messages in thread
From: Trond Myklebust @ 2001-10-02 12:03 UTC (permalink / raw)
  To: Matt Bernstein; +Cc: H. Peter Anvin, alan, linux-kernel

>>>>> " " == Matt Bernstein <matt@theBachChoir.org.uk> writes:

     > I wonder if this is related to oopses I sent in in the last two
     > days?  We're running 4GB setups with NFSv3 client and server on
     > our fileservers, and the oopses might (don't really have strong
     > correlation evidence yet) be related to when our fileservers
     > push online backups to cheaper NFS servers (running the same
     > kernel based on 2.4.9-ac10). Is there a last known good kernel
     > I can try on my production systems while I try to reproduce the
     > problem on smaller boxes? Or would you like me to try your
     > patch?

Linus changed nfs_prepare_write() in his tree around 2.4.10-pre5. From
what I can see, Alan merged that particular patch into 2.4.9-ac11 (but
without merging in the related changes to linux/mm/filemap.c).

Argh. I see that in the patch I put out earlier today, I forgot to
also revert the removal of the kunmap() in nfs_commit_write() (sorry -
my coffee was particularly weak this morning).

Please apply the following patch to the 'ac' tree instead.

People who use Linus' tree should *not* apply this patch!!!!!

Cheers,
  Trond

diff -u --recursive --new-file linux-2.4.10-reclaim/fs/nfs/file.c linux-2.4.10-ac4/fs/nfs/file.c
--- linux-2.4.10-reclaim/fs/nfs/file.c	Sun Sep 23 18:48:01 2001
+++ linux-2.4.10-ac4/fs/nfs/file.c	Tue Oct  2 13:40:58 2001
@@ -155,7 +155,12 @@
  */
 static int nfs_prepare_write(struct file *file, struct page *page, unsigned offset, unsigned to)
 {
-	return nfs_flush_incompatible(file, page);
+	int status;
+	kmap(page);
+	status = nfs_flush_incompatible(file, page);
+	if (status)
+		kunmap(page);
+	return status;
 }
 
 static int nfs_commit_write(struct file *file, struct page *page, unsigned offset, unsigned to)
@@ -164,6 +169,7 @@
 	loff_t pos = ((loff_t)page->index<<PAGE_CACHE_SHIFT) + to;
 	struct inode *inode = page->mapping->host;
 
+	kunmap(page);
 	lock_kernel();
 	status = nfs_updatepage(file, page, offset, to-offset);
 	unlock_kernel();
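
The pairing that the patch above restores can be modelled in user space.
This is only a rough sketch, not kernel code: struct model_page and every
model_* function are hypothetical names invented for illustration, and the
assumption that the 'ac' generic_file_write() relies on prepare_write to map
the page and on commit_write to unmap it is inferred from the diffs in this
thread. The model counts outstanding mappings per page; a non-zero count
after a completed write corresponds to a leaked kmap().

```c
#include <assert.h>

/* Hypothetical stand-in for struct page: tracks only outstanding mappings. */
struct model_page {
	int kmap_count;
};

static void model_kmap(struct model_page *p)   { p->kmap_count++; }
static void model_kunmap(struct model_page *p) { p->kmap_count--; }

/* Mirrors the patched nfs_prepare_write(): map, and unmap again on error. */
static int model_prepare_write(struct model_page *p, int flush_status)
{
	int status;

	model_kmap(p);
	status = flush_status;	/* stands in for nfs_flush_incompatible() */
	if (status)
		model_kunmap(p);
	return status;
}

/* Models nfs_commit_write() with or without the restored kunmap(). */
static void model_commit_write(struct model_page *p, int commit_unmaps)
{
	if (commit_unmaps)
		model_kunmap(p);
}

/*
 * One pass through the write path. Returns the number of mappings still
 * outstanding afterwards; 0 means kmap() and kunmap() were balanced.
 */
static int model_write_once(int flush_status, int commit_unmaps)
{
	struct model_page page = { 0 };

	if (model_prepare_write(&page, flush_status) == 0) {
		/* The copy from user space needs the page to be mapped here. */
		assert(page.kmap_count > 0);
		model_commit_write(&page, commit_unmaps);
	}
	return page.kmap_count;
}
```

Under these assumptions, model_write_once(0, 1) models the complete patch
and leaves nothing mapped, while model_write_once(0, 0) models the earlier
patch without the kunmap() in nfs_commit_write() and leaves one mapping
outstanding per written page.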


* Re: NFSv3 and linux-2.4.10-ac3 => oops
  2001-10-02 11:32   ` Matt Bernstein
  2001-10-02 12:03     ` Trond Myklebust
@ 2001-10-02 13:47     ` Alan Cox
  2001-10-02 14:02       ` Matt Bernstein
  1 sibling, 1 reply; 8+ messages in thread
From: Alan Cox @ 2001-10-02 13:47 UTC (permalink / raw)
  To: Matt Bernstein; +Cc: Trond Myklebust, H. Peter Anvin, alan, linux-kernel

> I wonder if this is related to oopses I sent in in the last two days?
> We're running 4GB setups with NFSv3 client and server on our fileservers,
> and the oopses might (don't really have strong correlation evidence yet)
> be related to when our fileservers push online backups to cheaper NFS
> servers (running the same kernel based on 2.4.9-ac10). Is there a last
> known good kernel I can try on my production systems while I try to
> reproduce the problem on smaller boxes? Or would you like me to try your
> patch?

Are these oopses new as of the 2.4.10-based tree? If so, do you see them
with 2.4.10-ac3?

Right now we have a sort of bug candidate set that is

                VM      NFS     LOCKING
2.4.9-ac10      old     old     old
2.4.9-ac16      new     old     old
2.4.9-ac18      new     old     half-way
2.4.10-ac3      new     new     new

that may help deduce which problem it is.
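
One way to read the candidate set is to compare the last kernel that ran
cleanly with the first one that oopsed and see which columns differ. The
sketch below only encodes the table above; the type and function names
(kernel_row, changed_between and so on) are hypothetical, and it assumes
that a difference in a column is what makes a subsystem a suspect.

```c
#include <assert.h>

enum state { OLD, HALF_WAY, NEW };
enum { VM, NFS, LOCKING, NSUBSYS };

struct kernel_row {
	const char *version;
	enum state sub[NSUBSYS];
};

/* The candidate set from the table, oldest kernel first. */
static const struct kernel_row candidates[] = {
	{ "2.4.9-ac10", { OLD, OLD, OLD } },
	{ "2.4.9-ac16", { NEW, OLD, OLD } },
	{ "2.4.9-ac18", { NEW, OLD, HALF_WAY } },
	{ "2.4.10-ac3", { NEW, NEW, NEW } },
};

/*
 * Returns a bitmask with bit s set for every subsystem whose state differs
 * between the rows at index 'good' and index 'bad'.
 */
static unsigned changed_between(int good, int bad)
{
	unsigned mask = 0;
	int s;

	for (s = 0; s < NSUBSYS; s++)
		if (candidates[good].sub[s] != candidates[bad].sub[s])
			mask |= 1u << s;
	return mask;
}
```

For example, a machine that is stable on 2.4.9-ac18 (index 2) but oopses on
2.4.10-ac3 (index 3) would leave NFS and LOCKING as suspects and clear the
VM, since the VM column is "new" in both rows.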

Alan


* Re: NFSv3 and linux-2.4.10-ac3 => oops
  2001-10-02 12:03     ` Trond Myklebust
@ 2001-10-02 13:49       ` Alan Cox
  2001-10-02 14:03         ` Trond Myklebust
  0 siblings, 1 reply; 8+ messages in thread
From: Alan Cox @ 2001-10-02 13:49 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Matt Bernstein, H. Peter Anvin, alan, linux-kernel

> what I can see, Alan merged that particular patch into 2.4.9-ac11 (but
> without merging in the related changes to linux/mm/filemap.c).

OK, it's probably better that I merge the related mm/filemap.c changes if
someone has the relevant bits handy. That helps to keep the differences down.


* Re: NFSv3 and linux-2.4.10-ac3 => oops
  2001-10-02 13:47     ` Alan Cox
@ 2001-10-02 14:02       ` Matt Bernstein
  0 siblings, 0 replies; 8+ messages in thread
From: Matt Bernstein @ 2001-10-02 14:02 UTC (permalink / raw)
  To: Alan Cox; +Cc: Trond Myklebust, H. Peter Anvin, linux-kernel

At 14:47 +0100 Alan Cox wrote:

>> I wonder if this is related to oopses I sent in in the last two days?
[snip]
>
>Are these oopses new as of the 2.4.10-based tree? If so, do you see them
>with 2.4.10-ac3?

Mine were from 2.4.9-ac10 + ext3-0.9.9 + ext3 speedup patch (which is in
0.9.10) + "experimental VM patch" (see the ext3 for 2.4 page) + jfs-1.0.4
(compiled with gcc 2.96-85, romfs initrd, everything possible as modules)

I've booted two of our servers into 2.4.9-ac18 compiled with egcs-1.1.2
(so far without Trond's patches) and will report anything odd.

Incidentally a third server on my 2.4.9-ac10 things has oopsed (output
below). What these three servers have in common is that they're all using
ICP-Vortex gdth raid arrays, and no IDE. I have four or five other setups
with the exact same kernel (well, two of them compiled for UP Athlon
rather than SMP Coppermine) with IDE root and further SCSI partitions
(some aic7xxx, some gdth) which have all been very stable. We haven't
ruled out a cabling/termination problem, but it's a bit spooky.

Thanks for the responses :)

Matt


ksymoops 2.4.1 on i686 2.4.9-ac10-jfs.  Options used
     -V (default)
     -K (specified)
     -L (specified)
     -o /lib/modules/2.4.9-ac10-jfs/ (default)
     -m /boot/System.map-2.4.9-ac10-jfs (default)

No modules in ksyms, skipping objects
Unable to handle kernel paging request at virtual address 756f6a00
756f6a00
*pde = 00000000
Oops: 0000
CPU:    1
EIP:    0010:[<756f6a00>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 756f6a00   ebx: c7e219cc   ecx: d01fc594   edx: d01fc594
esi: c7e219b4   edi: d01fc584   ebp: ffff4909   esp: c1969f68
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 5, stackpage=c1969000)
Stack: c015a271 c7e219b4 d01fc584 c022c940 00000206 ffffffff 00003044 c11fa670
       c11fa670 00000000 00000001 000000c0 00000001 c0231d60 0008e000 c015a891
       00000000 c0139306 00000000 000000c0 000000c0 00000000 c1968000 ffffffff
Call Trace: [<c015a271>] [<c015a891>] [<c0139306>] [<c01393ae>] [<c0105000>]
   [<c0105000>] [<c0105926>] [<c0139340>]
Code:  Bad EIP value.

>>EIP; 756f6a00 Before first symbol   <=====
Trace; c015a271 <prune_dcache+141/270>
Trace; c015a891 <shrink_dcache_memory+21/40>
Trace; c0139306 <do_try_to_free_pages+26/60>
Trace; c01393ae <kswapd+6e/f0>
Trace; c0105000 <_stext+0/0>
Trace; c0105000 <_stext+0/0>
Trace; c0105926 <kernel_thread+26/30>
Trace; c0139340 <kswapd+0/f0>

 <1>Unable to handle kernel paging request at virtual address 756f6a00
756f6a00
*pde = 00000000
Oops: 0000
CPU:    1
EIP:    0010:[<756f6a00>]
EFLAGS: 00010206
eax: 756f6a00   ebx: de844de0   ecx: c0819e54   edx: c0819e54
esi: de844dc8   edi: c0819e44   ebp: 00000000   esp: cc7c3e74
ds: 0018   es: 0018   ss: 0018
Process bonnie++ (pid: 17138, stackpage=cc7c3000)
Stack: c015a271 de844dc8 c0819e44 00000082 c01383ba c10143c0 00000082 c10143dc
       c1509d24 00000000 00000000 000000d2 00015ec2 00000000 000000d2 c015a891
       00000000 c0139306 00000000 000000d2 000000d2 00000001 cc7c2000 00000010
Call Trace: [<c015a271>] [<c01383ba>] [<c015a891>] [<c0139306>] [<c0139488>]
   [<c013a13e>] [<c0131f0b>] [<c0109437>] [<e099ae42>] [<c0142656>] [<c01128bc>]
   [<c010772b>]
Code:  Bad EIP value.

>>EIP; 756f6a00 Before first symbol   <=====
Trace; c015a271 <prune_dcache+141/270>
Trace; c01383ba <try_to_release_page+3a/60>
Trace; c015a891 <shrink_dcache_memory+21/40>
Trace; c0139306 <do_try_to_free_pages+26/60>
Trace; c0139488 <try_to_free_pages+28/40>
Trace; c013a13e <__alloc_pages+1be/250>
Trace; c0131f0b <generic_file_write+35b/610>
Trace; c0109437 <do_IRQ+1a7/1c0>
Trace; e099ae42 <END_OF_CODE+206ce85a/????>
Trace; c0142656 <sys_write+96/d0>
Trace; c01128bc <smp_apic_timer_interrupt+ec/110>
Trace; c010772b <system_call+33/38>
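
For readers unfamiliar with the decoded lines: each "Trace;" entry gives
the return address followed by symbol+offset/size, all in hexadecimal, so
subtracting the offset from the address gives the start of the function. A
small sketch of pulling those fields apart follows. The struct and function
names are hypothetical and this is not part of ksymoops; it assumes the line
layout shown in the output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one decoded "Trace;" line. */
struct trace_entry {
	unsigned long addr;	/* return address found on the stack */
	char symbol[64];	/* enclosing symbol name */
	unsigned long offset;	/* offset of addr into the symbol */
	unsigned long size;	/* total size of the symbol */
};

/*
 * Parses a line such as "Trace; c015a271 <prune_dcache+141/270>".
 * Returns 1 on success, 0 if the line does not match that layout
 * (for example when the size is printed as "????").
 */
static int parse_trace_line(const char *line, struct trace_entry *out)
{
	int n = sscanf(line, "Trace; %lx <%63[^+]+%lx/%lx>",
		       &out->addr, out->symbol, &out->offset, &out->size);

	return n == 4;
}

/* Start address of the symbol containing the traced address. */
static unsigned long symbol_start(const struct trace_entry *e)
{
	return e->addr - e->offset;
}
```

Applied to the first trace line above, this places prune_dcache at c015a130
(c015a271 minus 141). The END_OF_CODE entry is rejected by this parser
because its size field is not a hexadecimal number.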





* Re: NFSv3 and linux-2.4.10-ac3 => oops
  2001-10-02 13:49       ` Alan Cox
@ 2001-10-02 14:03         ` Trond Myklebust
  0 siblings, 0 replies; 8+ messages in thread
From: Trond Myklebust @ 2001-10-02 14:03 UTC (permalink / raw)
  To: Alan Cox; +Cc: Matt Bernstein, H. Peter Anvin, alan, linux-kernel

>>>>> " " == Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

    >> what I can see, Alan merged that particular patch into
    >> 2.4.9-ac11 (but without merging in the related changes to
    >> linux/mm/filemap.c).

     > OK, it's probably better that I merge the related mm/filemap.c
     > changes if someone has the relevant bits handy. That helps to
     > keep the differences down.

The following ought to be sufficient.

Cheers,
   Trond

--- linux-2.4.10-ac/mm/filemap.c	Tue Oct  2 15:53:04 2001
+++ linux-2.4.10-new/mm/filemap.c	Tue Oct  2 15:56:29 2001
@@ -2673,10 +2673,10 @@
 			PAGE_BUG(page);
 		}
 
+		kaddr = kmap(page);
 		status = mapping->a_ops->prepare_write(file, page, offset, offset+bytes);
 		if (status)
 			goto sync_failure;
-		kaddr = page_address(page);
 		status = __copy_from_user(kaddr+offset, buf, bytes);
 		flush_dcache_page(page);
 		if (status) {
@@ -2695,6 +2695,7 @@
 			buf += status;
 		}
 unlock:
+		kunmap(page);
 		/* Mark it unlocked again and drop the page.. */
 		UnlockPage(page);
 		if (deactivate)
@@ -2728,9 +2729,9 @@
 fail_write:
 	status = -EFAULT;
 	ClearPageUptodate(page);
-	kunmap(page);
 	goto unlock;
 sync_failure:
+	kunmap(page);
 	UnlockPage(page);
 	deactivate_page(page);
 	page_cache_release(page);

