netdev.vger.kernel.org archive mirror
* Re: Frequent Oops on Shutdown 2.6.10
@ 2005-02-28 22:25 AndyLiebman
  0 siblings, 0 replies; 15+ messages in thread
From: AndyLiebman @ 2005-02-28 22:25 UTC (permalink / raw)
  To: yoshfuji; +Cc: herbert, terryg, netdev, davem, akpm

In a message dated 2/23/2005 11:42:42 A.M. Eastern Standard Time,
yoshfuji@linux-ipv6.org writes:
> In article <b9.522b6602.2f4df280@aol.com> (at Wed, 23 Feb 2005 09:51:44
> EST), AndyLiebman@aol.com says:
>
> > Should I bother to apply this patch, or should I wait for you to make this
> > last change? What did you think about my comment that the Oops only
> > occurred when the Ethernet cable had been unplugged during operation?
>
> Well, not sure, but I think it is worth trying.
>
> Thanks.
>
> --yoshfuji

Hi Yoshi,

I just thought I would let you know that I applied the patch to the 2.6.10
kernel and recompiled. It seems to have made the Oops go away. At least, I
can't make the Oops happen any more by unplugging the Ethernet cables during
operation and then shutting down.

I might add that when I tried to apply the patch with:

patch --dry-run -p1 -d dir < patchfile

I got all kinds of errors about this line or that line. I can't remember
what they were. In the end, I VERY CAREFULLY cut and pasted your patch, and
removed the lines the patch was supposed to remove.

Did you attempt to apply the patch? Anyway, looks good.
 
Regards, 
Andy Liebman
 

* Re: Frequent Oops on Shutdown 2.6.10
@ 2005-02-23 14:53 AndyLiebman
  0 siblings, 0 replies; 15+ messages in thread
From: AndyLiebman @ 2005-02-23 14:53 UTC (permalink / raw)
  To: herbert, yoshfuji; +Cc: terryg, netdev, davem, akpm

Hello, Yoshi

Should I bother to apply this patch, or should I wait for you to make this
last change? What did you think about my comment that the Oops only occurred
when the Ethernet cable had been unplugged during operation?

Regards, 
Andy Liebman


In a message dated 2/23/2005 4:52:59 A.M. Eastern Standard Time,
herbert@gondor.apana.org.au writes:
On Wed, Feb 23, 2005 at 06:35:55PM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
>
> What do you think of this?

Thanks, this looks great.  There is just one technical detail
to patch up.

> -int snmp6_unregister_dev(struct inet6_dev *idev)
> +int snmp6_free_dev(struct inet6_dev *idev)
>  {
>      snmp6_mib_free((void **)idev->stats.icmpv6);
>      return 0;
>  }

You need to check whether icmpv6[0] is NULL either here or in
snmp6_mib_free.  Otherwise when snmp6_alloc_dev fails we'll
wind up here and then call free_percpu on a pair of NULL pointers.
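
The NULL check requested here can be illustrated outside the kernel. The following is only a user-space sketch, not the patch under discussion: `mib_alloc`, `mib_free_guarded`, `fake_free_percpu` and `free_calls` are hypothetical names, and the stand-in free routine is simply assumed not to tolerate NULL, as the comment above implies for `free_percpu`.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts how often the underlying free routine actually ran. */
int free_calls;

/* Hypothetical stand-in for free_percpu(); assumed NOT to accept NULL. */
void fake_free_percpu(void *p)
{
    assert(p != NULL);
    free_calls++;
    free(p);
}

/* Hypothetical allocator for a pair of statistics blocks.
 * On failure both slots are left NULL, mirroring the failure path
 * described in the message. */
int mib_alloc(void *ptr[2], size_t size, int simulate_failure)
{
    ptr[0] = NULL;
    ptr[1] = NULL;
    if (simulate_failure)
        return -1;
    ptr[0] = calloc(1, size);
    ptr[1] = calloc(1, size);
    if (ptr[0] == NULL || ptr[1] == NULL) {
        free(ptr[0]);
        free(ptr[1]);
        ptr[0] = NULL;
        ptr[1] = NULL;
        return -1;
    }
    return 0;
}

/* Guarded free: returns early when the pair was never allocated,
 * so the cleanup path is safe to call after a failed allocation. */
void mib_free_guarded(void *ptr[2])
{
    if (ptr == NULL || ptr[0] == NULL)
        return;
    fake_free_percpu(ptr[0]);
    fake_free_percpu(ptr[1]);
    ptr[0] = NULL;
    ptr[1] = NULL;
}
```

With the guard in place, calling `mib_free_guarded` after a failed `mib_alloc` never reaches the free routine, and a second call after a successful free does nothing either.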

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: Frequent Oops on Shutdown 2.6.10
@ 2005-02-23 14:51 AndyLiebman
  2005-02-23 16:43 ` YOSHIFUJI Hideaki / 吉藤英明
  0 siblings, 1 reply; 15+ messages in thread
From: AndyLiebman @ 2005-02-23 14:51 UTC (permalink / raw)
  To: herbert, yoshfuji; +Cc: terryg, netdev, davem, akpm, AndyLiebman

Hello, Yoshi
 
Should I bother to apply this patch, or should I wait for you to make this  
last change? What did you think about my comment that the Oops only occurred  
when the Ethernet cable had been unplugged during operation? 
 
Regards, 
Andy Liebman

On Wed, Feb 23, 2005 at 06:35:55PM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
>
> What do you think of this?

Thanks, this looks great.  There is just one technical detail
to patch up.

> -int snmp6_unregister_dev(struct inet6_dev *idev)
> +int snmp6_free_dev(struct inet6_dev *idev)
>  {
>      snmp6_mib_free((void **)idev->stats.icmpv6);
>      return 0;
>  }

You need to check whether icmpv6[0] is NULL either here or in
snmp6_mib_free.  Otherwise when snmp6_alloc_dev fails we'll
wind up here and then call free_percpu on a pair of NULL pointers.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt






* Re: Frequent Oops on Shutdown 2.6.10
@ 2005-02-22 23:30 AndyLiebman
  0 siblings, 0 replies; 15+ messages in thread
From: AndyLiebman @ 2005-02-22 23:30 UTC (permalink / raw)
  To: AndyLiebman, herbert, yoshfuji; +Cc: terryg, netdev, davem, akpm


I tested my server today several times. I booted up and shut down the server
eight times while running the 2.6.10 kernel. Six times the server shut down
fine. Two times -- only when I had unplugged the Ethernet connection during my
session -- I got the Oops when I shut down. In both cases, it was about 2
minutes after unplugging the Ethernet cable.

So, do you know what this means? Sure, the easy solution would be "don't
unplug the Ethernet cable while you're running." I can follow that rule, but out
in the field where the servers go, there will be accidents.

Should I still try that patch? 

Regards, 
Andy Liebman  

* Re: Frequent Oops on Shutdown 2.6.10
@ 2005-02-22 14:14 AndyLiebman
  0 siblings, 0 replies; 15+ messages in thread
From: AndyLiebman @ 2005-02-22 14:14 UTC (permalink / raw)
  To: herbert, yoshfuji; +Cc: terryg, netdev, davem, akpm

For what it's worth, I believe I only get this Oops if I have unplugged an
Ethernet cable while running the server.

I have 4 Ethernet ports on the server -- and in fact I am testing and
configuring many servers at the same time. All servers are set up with the exact
same image, and the same set of IP addresses. Sometimes for convenience, I
unplug an Ethernet cable from one server and plug it into another server --
while they're running -- so that I can operate a machine remotely (I never
connect more than one server to my network at a time, to avoid IP address
conflicts). Unplugging and plugging Ethernet cables while running ALWAYS leads to nmbd
errors on shutdown -- guaranteed -- but with the 2.6.6 kernel never an Oops.
I only get an Oops with the 2.6.10 kernel.

I'm going to do a more rigorous test today to see if the Oops behavior
really is 100 percent correlated with unplugging and plugging the Ethernet cable.

So, should I test the patch?

Andy Liebman

------------------------- OLD MESSAGES BELOW -------------------------
In a message dated 2/22/2005 5:17:19 A.M. Eastern Standard Time,
herbert@gondor.apana.org.au writes:
On Tue, Feb 22, 2005 at 08:57:19PM +1100, Herbert Xu wrote:
> YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> wrote:
> > In article <20050221.162241.24618885.yoshfuji@linux-ipv6.org> (at Mon,
> > 21 Feb 2005 16:22:41 +0900 (JST)), YOSHIFUJI Hideaki / 吉藤英明
> > <yoshfuji@linux-ipv6.org> says:
> >
> >> [IPV6] Don't remove dev_snmp6 procfs entry until all users gone.
>
> Sorry, but I don't see how this patch explains the oops the
> people saw.

OK, I think I see what you were trying to fix now.  Unfortunately
I think this patch doesn't quite cure the problem.

First of all you can't sleep in snmp6_unregister_dev so semaphores
are out.  More importantly, the race is still on.

Here is what happens:

CPU0                            CPU1
ifdown eth0
...
ifup eth0
snmp6_register_dev
adds proc entry
                                in6_dev_finish_destroy
                                snmp6_unregister_dev
                                deletes new proc entry

The next ifdown may fail because snmp6_unregister_dev will retrieve the
name from a proc entry that's already been deleted.

I see two solutions:

1) Unregister the proc entry earlier.  In other words, do it in
addrconf_ifdown.  Since this is highly serialised it means that
we can't add the new proc entry before the old proc entry has
been deleted.

2) Fix procfs so that we delete by pointer instead of name.  This
makes sense from a semantic point of view.  However, for this
particular instance it means that we may have two "eth0" entries
for as long as the old idev entry sticks around.
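
The difference between deleting by name and deleting by pointer can be sketched in user space. This is a toy model, not procfs code: `struct entry`, `entry_add`, `entry_del_by_name` and `entry_del_by_ptr` are hypothetical names, and the list is assumed to be searched newest-first, which is what makes the by-name delete hit the new entry as in the diagram above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 8

/* Hypothetical stand-in for a directory entry keyed by interface name. */
struct entry {
    char name[16];
    int in_use;
    struct entry *next;
};

struct entry pool[POOL_SIZE];
struct entry *head;   /* newest entry first (an assumption of this sketch) */

/* Register an entry; duplicates of the same name are allowed. */
struct entry *entry_add(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < POOL_SIZE; i++) {
        struct entry *e = &pool[i];
        if (e->in_use)
            continue;
        strncpy(e->name, name, sizeof(e->name) - 1);
        e->name[sizeof(e->name) - 1] = '\0';
        e->in_use = 1;
        e->next = head;
        head = e;
        return e;
    }
    return NULL;   /* pool exhausted */
}

/* Unlink the entry that *link points at and mark it free. */
static void unlink_entry(struct entry **link)
{
    struct entry *e = *link;
    *link = e->next;
    e->next = NULL;
    e->in_use = 0;
}

/* Delete by name: removes the first match, whoever owns it. */
int entry_del_by_name(const char *name)
{
    for (struct entry **link = &head; *link != NULL; link = &(*link)->next) {
        if (strcmp((*link)->name, name) == 0) {
            unlink_entry(link);
            return 1;
        }
    }
    return 0;
}

/* Delete by pointer: removes exactly the caller's own entry. */
int entry_del_by_ptr(struct entry *victim)
{
    for (struct entry **link = &head; *link != NULL; link = &(*link)->next) {
        if (*link == victim) {
            unlink_entry(link);
            return 1;
        }
    }
    return 0;
}
```

If an old and a new "eth0" entry coexist, the delayed teardown of the old owner removes the new entry when it deletes by name, leaving the new owner with a stale entry. Deleting by pointer removes only the old one, at the cost of two "eth0" entries existing for a while, which is the trade-off noted in option 2.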

Cheers,
--  
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Fw: Frequent Oops on Shutdown 2.6.10
@ 2005-02-21  6:03 Andrew Morton
  2005-02-21  7:22 ` YOSHIFUJI Hideaki / 吉藤英明
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2005-02-21  6:03 UTC (permalink / raw)
  To: netdev; +Cc: AndyLiebman



Begin forwarded message:

Date: Sun, 20 Feb 2005 09:56:04 EST
From: AndyLiebman@aol.com
To: linux-kernel@vger.kernel.org
Subject: Frequent Oops on Shutdown 2.6.10


Hi, 
I compiled the 2.6.10 kernel with HyperThreading optimization (I'm running a
3.06 GHz single Xeon processor with HT enabled). More or less, I'm running
Mandrake 10 Official, but with my own kernel. Can anybody help explain why I'm
getting this Oops on shutdown? It doesn't happen all the time -- about 50
percent of the time. Never happens with the 2.6.6 kernel. I configured the
2.6.10 kernel with mostly the same settings -- saying "no" to everything new,
except optimizing for P4/Xeon processor, enabling HT optimization, and NOT
enabling lots of "ham radio" and "ISDN" stuff that seemed to be enabled in the
Mandrake kernel.
 
Some relevant things about the machine: 
Single Xeon 3.06 processor
2 GB ECC RAM
2x 3ware 9500S-8 SATA RAID Cards
16 SATA drives
2 built-in GigE ports on motherboard
2 Intel 1000 MT Server Adapters on each of the two 133 MHz PCI-X slots
 
Here's the output from the Oops


*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: raid) appletalk xfs sd_mod sg sr_mod 3w_9xxx scsi_mod nfsd exportfs ipv6 af_packet raw ide_floppy ide_tape ide_cd cdrom e1000 uhci_hcd usbcore rtc ext3 jbd
CPU: 1
EIP: 0060:[<c018b600>] Not tainted VLI
EFLAGS: 00010246 (2.6.10es-feb06)
EIP is at remove_proc_entry+0x2a/0x166
eax: 00000000 ebx: f66a4e00 ecx: ffffffff edx: f6da1300
esi: f7cfb000 edi: 00000005 ebp: c2183eb4 esp: c2183e94
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c2182000 task=c214e520)
Stack: c043402c c043402c 00000000 c2024980 00000005 f66a4e00 f7cfb000 00000000
       c2183ec8 f8c4f051 00000005 f6da1300 f66a4e00 c2183ee8 f8c2cc7b f66a4e00
       c2024980 00000002 00000000 f6da7e80 f6da6080 c2183f04 c0289967 f66a4e00
Call Trace:
[<c0103352>] show_stack+0xaf/0xb7
[<c01034d9>] show_registers+0x15f/0x1d2
[<c01036dd>] die+0xfa/0x180
[<c011365e>] do_page_fault+0x464/0x646
[<c0102fbf>] error_code+0x2b/0x30
[<f8c4f051>] snmp6_unregister_dev+0x41/0x57 [ipv6]
[<f8c2cc7b>] in6_dev_finish_destroy+0x35/0xb6 [ipv6]
[<c0289967>] dst_destroy+0xa2/0xcd
[<c028969a>] dst_run_gc+0x72/0xfb
[<c0123584>] run_timer_softirq+0xc4/0x185
[<c011f631>] __do_softirq+0x65/0xd3
[<c011f6d0>] do_softirq+0x31/0x33
[<c0102f18>] apic_timer_interrupt+0x1c/0x24
[<c0100747>] cpu_idle+0x31/0x3f
[<00000000>] 0x0
[<c2183fbc>] 0xc2183fbc
Code: e2 55 89 ef 83 ec 20 8b 55 0c 8b 4d 08 89 5d f4 85 d2 89 75 f8 89 7d fc 89 4d f0 0f 84 b0 00 00 00 8b 7d f0 31 c0 b9 ff ff ff ff <f2> ae f7 d1 49 8b 42 34 8d 5a 34 85 c0 89 ce 0f 84 84 00 00 00
<0>Kernel panic - not syncing: Fatal exception in interrupt
 



Thread overview: 15+ messages
2005-02-28 22:25 Frequent Oops on Shutdown 2.6.10 AndyLiebman
  -- strict thread matches above, loose matches on Subject: below --
2005-02-23 14:53 AndyLiebman
2005-02-23 14:51 AndyLiebman
2005-02-23 16:43 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-22 23:30 AndyLiebman
2005-02-22 14:14 AndyLiebman
2005-02-21  6:03 Fw: " Andrew Morton
2005-02-21  7:22 ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-21  7:29   ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-22  9:57     ` Herbert Xu
2005-02-22 10:15       ` Herbert Xu
2005-02-23  9:35         ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-23  9:51           ` Herbert Xu
2005-02-23 16:41             ` YOSHIFUJI Hideaki / 吉藤英明
2005-02-23 22:48               ` Herbert Xu
2005-02-24  4:17                 ` David S. Miller
