* help with horrible network failures
@ 2005-03-01 21:06 Rob Gardner
  0 siblings, 0 replies; 13+ messages in thread
From: Rob Gardner @ 2005-03-01 21:06 UTC (permalink / raw)
  To: xen-devel

I've got two machines running identical versions of xen & linux. One of 
them constantly has problems under high network loads (70-100% httperf 
loads). This is using xen-unstable checked out on or about Feb 13. Any 
clues? See below for details.

Rob


KERNEL: assertion (flags & MSG_PEEK) failed at net/ipv4/tcp.c (1284)
(this message was sometimes repeated dozens of times...)
Followed by:
Unable to handle kernel paging request at virtual address a02e19e0
  printing eip:
c0115fad
*pde = ma 00000000 pa 55555000
  [<c0116b21>] __wake_up_common+0x41/0x60
  [<c0116b8c>] __wake_up+0x4c/0xb0
  [<c023d2d9>] sock_def_wakeup+0x49/0x50
  [<c026a9d9>] tcp_rcv_state_process+0x749/0x970
  [<c02723d6>] tcp_v4_do_rcv+0xa6/0x130
  [<c0272a18>] tcp_v4_rcv+0x5b8/0x850
  [<c02158b7>] add_timer_randomness+0x107/0x130
  [<c025740b>] ip_local_deliver+0xab/0x160
  [<c02577ea>] ip_rcv+0x32a/0x460
  [<c0210bf0>] memmove+0x50/0x60
  [<c0244553>] netif_receive_skb+0x133/0x1c0
  [<c0238601>] netif_poll+0x301/0x660
  [<c023de84>] kfree_skbmem+0x24/0x30
  [<c0244815>] net_rx_action+0xb5/0x1a0
  [<c011f545>] __do_softirq+0xc5/0xf0
  [<c011f5fa>] do_softirq+0x8a/0x90
  [<c0136355>] irq_exit+0x35/0x40
  [<c010e262>] do_IRQ+0x22/0x30
  [<c0106048>] evtchn_do_upcall+0xa8/0x110
  [<c0109dc7>] hypervisor_callback+0x37/0x40
Oops: 0000 [#1]
PREEMPT
Modules linked in:
CPU:    0
EIP:    0061:[<c0115fad>]    Not tainted VLI
EFLAGS: 00010202   (2.6.10-xenU)
EIP is at try_to_wake_up+0x1d/0xf0
eax: c033d860   ebx: a02e19e0   ecx: 00000001   edx: c2403d88
esi: c527dc98   edi: 00000000   ebp: c2403d98   esp: c2403d7c
ds: 007b   es: 007b   ss: 0069
Process httpd (pid: 7900, threadinfo=c2402000 task=c41c7a60)
Stack: a02e19e0 c2403d88 00000004 00000001 00000000 c527dc98 00000000 c2403dbc
        c0116b21 a02e19e0 00000001 00000000 00000000 00000000 00000000 c2402000
        c2403de8 c0116b8c c527dc98 00000001 00000000 00000000 00000000 00000000
Call Trace:
  [<c0116b21>] __wake_up_common+0x41/0x60
  [<c0116b8c>] __wake_up+0x4c/0xb0
  [<c023d2d9>] sock_def_wakeup+0x49/0x50
  [<c026a9d9>] tcp_rcv_state_process+0x749/0x970
  [<c02723d6>] tcp_v4_do_rcv+0xa6/0x130
  [<c0272a18>] tcp_v4_rcv+0x5b8/0x850
  [<c02158b7>] add_timer_randomness+0x107/0x130
  [<c025740b>] ip_local_deliver+0xab/0x160
  [<c02577ea>] ip_rcv+0x32a/0x460
  [<c0210bf0>] memmove+0x50/0x60
  [<c0244553>] netif_receive_skb+0x133/0x1c0
  [<c0238601>] netif_poll+0x301/0x660
  [<c023de84>] kfree_skbmem+0x24/0x30
  [<c0244815>] net_rx_action+0xb5/0x1a0
  [<c011f545>] __do_softirq+0xc5/0xf0
  [<c011f5fa>] do_softirq+0x8a/0x90
  [<c0136355>] irq_exit+0x35/0x40
  [<c010e262>] do_IRQ+0x22/0x30
  [<c0106048>] evtchn_do_upcall+0xa8/0x110
  [<c0109dc7>] hypervisor_callback+0x37/0x40
Code: 28 00 00 00 00 8b 5d fc 89 ec 5d c3 89 f6 55 89 e5 57 8d 45 f0 31 ff 56 53 83 ec 10 8b 5d 08 89 44 24 04 89 1c 24 e8 33 fc ff ff <8b> 13 89 c6 8b 45 0c 85 d0 74 4d 8b 43 28 85 c0 75 40 83 fa 02
  <0>Kernel panic - not syncing: Fatal exception in interrupt




-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: help with horrible network failures
@ 2005-03-02 13:37 Ian Pratt
  2005-03-02 18:01 ` Nivedita Singhvi
  2005-03-02 18:10 ` Rob Gardner
  0 siblings, 2 replies; 13+ messages in thread
From: Ian Pratt @ 2005-03-02 13:37 UTC (permalink / raw)
  To: Rob Gardner, xen-devel; +Cc: ian.pratt

 
> I've got two machines running identical versions of xen & 
> linux. One of 
> them constantly has problems under high network loads 
> (70-100% httperf 
> loads). This is using xen-unstable checked out on or about 
> Feb 13. Any 
> clues? See below for details.

One machine exhibits the bug, the other doesn't? How similar is the h/w?

Can you reproduce with a single high-rate TCP stream? Do you have any of
the iptables/netfilter connection tracking stuff in your kernel?

Can you reproduce with an older version of Xen?

Ian


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: help with horrible network failures
  2005-03-02 13:37 Ian Pratt
@ 2005-03-02 18:01 ` Nivedita Singhvi
  2005-03-02 18:10 ` Rob Gardner
  1 sibling, 0 replies; 13+ messages in thread
From: Nivedita Singhvi @ 2005-03-02 18:01 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Rob Gardner, xen-devel, ian.pratt

Ian Pratt wrote:

>>I've got two machines running identical versions of xen & 
>>linux. One of 
>>them constantly has problems under high network loads 
>>(70-100% httperf 
>>loads). This is using xen-unstable checked out on or about 
>>Feb 13. Any 
>>clues? See below for details.
> 
> 
> One machine exhibits the bug, the other doesn't? How similar is the h/w?

I believe this is a known, still unresolved bug in mainline.
It's possible Rob is provoking it by using NAPI, and
it is load-related, so that might explain why one box
is seeing it and the other isn't. Essentially, data is getting
reordered where it really shouldn't be, due to a race that
hasn't been pinned down. It would be very helpful to finally
track this down, if it hasn't been already.
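A simplified model may help make "reordered where it really shouldn't" concrete. The sketch below is a hypothetical user-space illustration, not the kernel's tcp_recvmsg() code: Segment, Scan and scan_queue are made-up names, and it assumes the (flags & MSG_PEEK) assertion guards roughly this invariant. A reader walks the receive queue looking for the segment that holds its next sequence number; a peeking reader may legitimately step over data it has already looked at, but a normal read should never find wholly consumed data still queued. Sequence numbers are compared wraparound-safely, in the same way as the before() helper in the Linux TCP code.

```python
from dataclasses import dataclass
from enum import Enum

SEQ_MASK = 0xFFFFFFFF  # TCP sequence numbers are 32-bit and wrap around


def seq_before(a, b):
    """True if sequence number a comes before b, allowing for wraparound.

    Same idea as the kernel's before() helper: treat the 32-bit
    difference a - b as signed and check whether it is negative.
    """
    return ((a - b) & SEQ_MASK) >= 0x80000000


@dataclass
class Segment:
    seq: int     # sequence number of the first payload byte
    length: int  # number of payload bytes


class Scan(Enum):
    FOUND = "found"  # a queued segment holds next_seq
    GAP = "gap"      # the first candidate starts after next_seq
    STALE = "stale"  # data wholly before next_seq is still queued
    EMPTY = "empty"  # ran out of queued segments


def scan_queue(queue, next_seq, peeking=False):
    """Look for the segment holding next_seq; return (Scan, index or None)."""
    for i, seg in enumerate(queue):
        if seq_before(next_seq, seg.seq):
            return Scan.GAP, None
        offset = (next_seq - seg.seq) & SEQ_MASK
        if offset < seg.length:
            return Scan.FOUND, i
        # Every byte of seg precedes next_seq.  A peek reads past data
        # without removing it, so skipping seg is expected; a normal read
        # should already have removed it from the queue.
        if not peeking:
            return Scan.STALE, i
    return Scan.EMPTY, None
```

For example, with Segment(1000, 100) and Segment(1100, 100) queued, scan_queue(queue, 1150) reports Scan.STALE for the first segment, while the same call with peeking=True finds the byte in the second one.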

> Can you reproduce with a single high-rate TCP stream? Do you have any of
> the iptables/netfilter connection tracking stuff in your kernel?

> Can you reproduce with an older version of Xen?

I suspect that it is independent of Xen - unless some versions
of Xen slow down traffic enough to change the window..

thanks,
Nivedita



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: help with horrible network failures
  2005-03-02 13:37 Ian Pratt
  2005-03-02 18:01 ` Nivedita Singhvi
@ 2005-03-02 18:10 ` Rob Gardner
  2005-03-02 18:37   ` Nivedita Singhvi
  1 sibling, 1 reply; 13+ messages in thread
From: Rob Gardner @ 2005-03-02 18:10 UTC (permalink / raw)
  Cc: xen-devel

Ian Pratt wrote:
>  
> Rob Gardner wrote: 
>>I've got two machines running identical versions of xen & 
>>linux. One of 
>>them constantly has problems under high network loads 
>>(70-100% httperf 
>>loads). This is using xen-unstable checked out on or about 
>>Feb 13. Any 
>>clues? See below for details.

> One machine exhibits the bug, the other doesn't? How similar is the h/w?

The machine with the problem:
Intel e100 NIC
1.7 GHz Xeon

The machine that does not exhibit the problem:
Tigon3 NIC
2.6 GHz P4

> 
> Can you reproduce with a single high-rate TCP stream? 

No. The problem seems to stem from creating or using many
concurrent streams.

> Can you reproduce with an older version of Xen?

We never saw this problem with any of the countless older versions of Xen we used before.

>                                           Do you have any of
> the iptables/netfilter connection tracking stuff in your kernel?

We're using the vanilla xen-unstable configuration and haven't changed any parameters:

#
# IP: Virtual Server Configuration
#
# CONFIG_IP_VS is not set
# CONFIG_IPV6 is not set
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
CONFIG_BRIDGE_NETFILTER=y

#
# IP: Netfilter Configuration
#
CONFIG_IP_NF_CONNTRACK=m
CONFIG_IP_NF_CT_ACCT=y
# CONFIG_IP_NF_CONNTRACK_MARK is not set
# CONFIG_IP_NF_CT_PROTO_SCTP is not set
CONFIG_IP_NF_FTP=m
# CONFIG_IP_NF_IRC is not set
# CONFIG_IP_NF_TFTP is not set
# CONFIG_IP_NF_AMANDA is not set
# CONFIG_IP_NF_QUEUE is not set
CONFIG_IP_NF_IPTABLES=m
# CONFIG_IP_NF_MATCH_LIMIT is not set
CONFIG_IP_NF_MATCH_IPRANGE=m
# CONFIG_IP_NF_MATCH_MAC is not set
# CONFIG_IP_NF_MATCH_PKTTYPE is not set
# CONFIG_IP_NF_MATCH_MARK is not set
# CONFIG_IP_NF_MATCH_MULTIPORT is not set
# CONFIG_IP_NF_MATCH_TOS is not set
# CONFIG_IP_NF_MATCH_RECENT is not set
# CONFIG_IP_NF_MATCH_ECN is not set
# CONFIG_IP_NF_MATCH_DSCP is not set
# CONFIG_IP_NF_MATCH_AH_ESP is not set
# CONFIG_IP_NF_MATCH_LENGTH is not set
# CONFIG_IP_NF_MATCH_TTL is not set
# CONFIG_IP_NF_MATCH_TCPMSS is not set
# CONFIG_IP_NF_MATCH_HELPER is not set
# CONFIG_IP_NF_MATCH_STATE is not set
# CONFIG_IP_NF_MATCH_CONNTRACK is not set
# CONFIG_IP_NF_MATCH_OWNER is not set
# CONFIG_IP_NF_MATCH_PHYSDEV is not set
# CONFIG_IP_NF_MATCH_ADDRTYPE is not set
# CONFIG_IP_NF_MATCH_REALM is not set
# CONFIG_IP_NF_MATCH_SCTP is not set
# CONFIG_IP_NF_MATCH_COMMENT is not set
# CONFIG_IP_NF_MATCH_HASHLIMIT is not set
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
# CONFIG_IP_NF_TARGET_LOG is not set
# CONFIG_IP_NF_TARGET_ULOG is not set
# CONFIG_IP_NF_TARGET_TCPMSS is not set
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=m
# CONFIG_IP_NF_TARGET_REDIRECT is not set
# CONFIG_IP_NF_TARGET_NETMAP is not set
# CONFIG_IP_NF_TARGET_SAME is not set
# CONFIG_IP_NF_NAT_LOCAL is not set
# CONFIG_IP_NF_NAT_SNMP_BASIC is not set
CONFIG_IP_NF_NAT_FTP=m
# CONFIG_IP_NF_MANGLE is not set
# CONFIG_IP_NF_RAW is not set
# CONFIG_IP_NF_ARPTABLES is not set
# CONFIG_IP_NF_COMPAT_IPCHAINS is not set
# CONFIG_IP_NF_COMPAT_IPFWADM is not set

#
# Bridge: Netfilter Configuration
#
# CONFIG_BRIDGE_NF_EBTABLES is not set





^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: help with horrible network failures
  2005-03-02 18:10 ` Rob Gardner
@ 2005-03-02 18:37   ` Nivedita Singhvi
  2005-03-02 20:07     ` Jon Mason
  2005-03-02 22:52     ` Rob Gardner
  0 siblings, 2 replies; 13+ messages in thread
From: Nivedita Singhvi @ 2005-03-02 18:37 UTC (permalink / raw)
  To: Rob Gardner; +Cc: xen-devel

Rob Gardner wrote:

> The machine with the problem:
> Intel e100 nic
> 1.7 Ghz Xeon

Yep, practically all instances of this problem were
with the e100. Unfortunately, the current version
of the driver no longer has NAPI as a dynamically
tunable parameter via ethtool. It can be disabled
via a kernel config parameter (CONFIG_E100_NAPI).
You could recompile and see if the problem disappears.
(Not a real fix).

I'd be very interested if you could switch to tg3
on the other box too, and see if you can reproduce
the problem. It all depends on the traffic, phase
of the moon, etc..

> The machine that does not exhibit the problem:
> Tigon3 nic
> 2.6Ghz P4

Your sysctl settings would be helpful, too..


thanks,
Nivedita




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: help with horrible network failures
  2005-03-02 18:37   ` Nivedita Singhvi
@ 2005-03-02 20:07     ` Jon Mason
  2005-03-02 20:20       ` Nivedita Singhvi
  2005-03-02 22:52     ` Rob Gardner
  1 sibling, 1 reply; 13+ messages in thread
From: Jon Mason @ 2005-03-02 20:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Nivedita Singhvi, Rob Gardner

On Wednesday 02 March 2005 12:37 pm, Nivedita Singhvi wrote:
> Rob Gardner wrote:
> > The machine with the problem:
> > Intel e100 nic
> > 1.7 Ghz Xeon
>
> Yep, practically all instances of this problem were
> with the e100. Unfortunately, the current version
> of the driver no longer has NAPI as a dynamically
> tunable parameter via ethtool. It can be disabled
> via a kernel config parameter (CONFIG_E100_NAPI).
> You could recompile and see if the problem disappears.
> (Not a real fix).

This isn't quite true. Although NAPI was a compile option until recently, it was
discovered that e100 used its NAPI hooks regardless of whether the option was
enabled. The compile option has been removed from the latest kernels.


> I'd be very interested if you could switch to tg3
> on the other box too, and see if you can reproduce
> the problem. It all depends on the traffic, phase
> of the moon, etc..
>
> > The machine that does not exhibit the problem:
> > Tigon3 nic
> > 2.6Ghz P4
>
> Your sysctl settings would be helpful, too..
>
>
> thanks,
> Nivedita

-- 
Jon Mason
jdmason@us.ibm.com



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: help with horrible network failures
  2005-03-02 20:07     ` Jon Mason
@ 2005-03-02 20:20       ` Nivedita Singhvi
  2005-03-02 20:55         ` Nivedita Singhvi
  0 siblings, 1 reply; 13+ messages in thread
From: Nivedita Singhvi @ 2005-03-02 20:20 UTC (permalink / raw)
  To: Jon Mason; +Cc: xen-devel, Rob Gardner

Jon Mason wrote:

> This isn't quite true.  Though a compile option until recently, it was 
> discovered that e100 had NAPI hooks regardless of config enablement.  The 

Can one still use ethtool to disable it if it's compiled in, Jon?

> compile option has been removed from the latest kernels.  

Yep, but that was post 2.6.10 (latest xen kernel).
So it's not tunable at all now? Was that Scott's intent?

thanks,
Nivedita




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: help with horrible network failures
  2005-03-02 20:20       ` Nivedita Singhvi
@ 2005-03-02 20:55         ` Nivedita Singhvi
  0 siblings, 0 replies; 13+ messages in thread
From: Nivedita Singhvi @ 2005-03-02 20:55 UTC (permalink / raw)
  To: Jon Mason; +Cc: xen-devel, Rob Gardner

Nivedita Singhvi wrote:

> Can one still use ethtool to disable it if it's compiled in, Jon?

Sorry, ethtool can't be used to disable/enable NAPI. Ignore
this idiot, neuron misfire...

thanks,
Nivedita




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: help with horrible network failures
  2005-03-02 18:37   ` Nivedita Singhvi
  2005-03-02 20:07     ` Jon Mason
@ 2005-03-02 22:52     ` Rob Gardner
  2005-03-02 23:13       ` Jon Mason
  2005-03-02 23:43       ` Nivedita Singhvi
  1 sibling, 2 replies; 13+ messages in thread
From: Rob Gardner @ 2005-03-02 22:52 UTC (permalink / raw)
  Cc: xen-devel

Nivedita Singhvi wrote:
> Rob Gardner wrote:
> 
>> The machine with the problem:
>> Intel e100 nic
>> 1.7 Ghz Xeon
> 
> 
> Yep, practically all instances of this problem were
> with the e100. Unfortunately, the current version
> of the driver no longer has NAPI as a dynamically
> tunable parameter via ethtool. It can be disabled
> via a kernel config parameter (CONFIG_E100_NAPI).
> You could recompile and see if the problem disappears.
> (Not a real fix).

My current configuration has:
# CONFIG_E100_NAPI is not set

Isn't that the same as it being disabled? Or should I change it to:
CONFIG_E100_NAPI=n


> I'd be very interested if you could switch to tg3
> on the other box too, and see if you can reproduce
> the problem. It all depends on the traffic, phase
> of the moon, etc..

I'm afraid that isn't easy to accomplish. I am working with another 
researcher in a faraway land, and so I do not have direct control over 
their machine. Changing their nic could take a while.

> Sorry, ethtool can't be used to disable/enable NAPI. Ignore
> this idiot, neuron misfire...

So... what's the conclusion?


> Your sysctl settings would be helpful, too..

sunrpc.tcp_slot_table_entries = 16
sunrpc.udp_slot_table_entries = 16
sunrpc.nlm_debug = 0
sunrpc.nfsd_debug = 0
sunrpc.nfs_debug = 0
sunrpc.rpc_debug = 0
xen.independent_wallclock = 0
dev.raid.speed_limit_max = 200000
dev.raid.speed_limit_min = 1000
dev.cdrom.check_media = 0
dev.cdrom.lock = 1
dev.cdrom.debug = 0
dev.cdrom.autoeject = 0
dev.cdrom.autoclose = 1
dev.cdrom.info = CD-ROM information, Id: cdrom.c 3.20 2003/12/17
dev.cdrom.info =
dev.cdrom.info = drive name:		hdc
dev.cdrom.info = drive speed:		48
dev.cdrom.info = drive # of slots:	1
dev.cdrom.info = Can close tray:		1
dev.cdrom.info = Can open tray:		1
dev.cdrom.info = Can lock tray:		1
dev.cdrom.info = Can change speed:	1
dev.cdrom.info = Can select disk:	0
dev.cdrom.info = Can read multisession:	1
dev.cdrom.info = Can read MCN:		1
dev.cdrom.info = Reports media changed:	1
dev.cdrom.info = Can play audio:		1
dev.cdrom.info = Can write CD-R:		0
dev.cdrom.info = Can write CD-RW:	0
dev.cdrom.info = Can read DVD:		0
dev.cdrom.info = Can write DVD-R:	0
dev.cdrom.info = Can write DVD-RAM:	0
dev.cdrom.info = Can read MRW:		1
dev.cdrom.info = Can write MRW:		1
dev.cdrom.info = Can write RAM:		1
dev.cdrom.info =
dev.cdrom.info =
dev.scsi.logging_level = 0
fs.nfs.nlm_tcpport = 0
fs.nfs.nlm_udpport = 0
fs.nfs.nlm_timeout = 10
fs.nfs.nlm_grace_period = 0
fs.aio-max-nr = 65536
fs.aio-nr = 0
fs.lease-break-time = 45
fs.dir-notify-enable = 1
fs.leases-enable = 1
fs.overflowgid = 65534
fs.overflowuid = 65534
fs.dentry-state = 1566	448	45	0	0	0
fs.file-max = 12185
fs.file-nr = 525	0	12185
fs.inode-state = 1638	222	0	0	0	0	0
fs.inode-nr = 1638	222
net.bridge.bridge-nf-filter-vlan-tagged = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1
net.unix.max_dgram_qlen = 10
net.ipv4.conf.xen-br0.force_igmp_version = 0
net.ipv4.conf.xen-br0.disable_policy = 0
net.ipv4.conf.xen-br0.disable_xfrm = 0
net.ipv4.conf.xen-br0.arp_ignore = 0
net.ipv4.conf.xen-br0.arp_announce = 0
net.ipv4.conf.xen-br0.arp_filter = 0
net.ipv4.conf.xen-br0.tag = 0
net.ipv4.conf.xen-br0.log_martians = 0
net.ipv4.conf.xen-br0.bootp_relay = 0
net.ipv4.conf.xen-br0.medium_id = 0
net.ipv4.conf.xen-br0.proxy_arp = 0
net.ipv4.conf.xen-br0.accept_source_route = 1
net.ipv4.conf.xen-br0.send_redirects = 1
net.ipv4.conf.xen-br0.rp_filter = 1
net.ipv4.conf.xen-br0.shared_media = 1
net.ipv4.conf.xen-br0.secure_redirects = 1
net.ipv4.conf.xen-br0.accept_redirects = 1
net.ipv4.conf.xen-br0.mc_forwarding = 0
net.ipv4.conf.xen-br0.forwarding = 0
net.ipv4.conf.eth0.force_igmp_version = 0
net.ipv4.conf.eth0.disable_policy = 0
net.ipv4.conf.eth0.disable_xfrm = 0
net.ipv4.conf.eth0.arp_ignore = 0
net.ipv4.conf.eth0.arp_announce = 0
net.ipv4.conf.eth0.arp_filter = 0
net.ipv4.conf.eth0.tag = 0
net.ipv4.conf.eth0.log_martians = 0
net.ipv4.conf.eth0.bootp_relay = 0
net.ipv4.conf.eth0.medium_id = 0
net.ipv4.conf.eth0.proxy_arp = 0
net.ipv4.conf.eth0.accept_source_route = 1
net.ipv4.conf.eth0.send_redirects = 1
net.ipv4.conf.eth0.rp_filter = 1
net.ipv4.conf.eth0.shared_media = 1
net.ipv4.conf.eth0.secure_redirects = 1
net.ipv4.conf.eth0.accept_redirects = 1
net.ipv4.conf.eth0.mc_forwarding = 0
net.ipv4.conf.eth0.forwarding = 0
net.ipv4.conf.lo.force_igmp_version = 0
net.ipv4.conf.lo.disable_policy = 0
net.ipv4.conf.lo.disable_xfrm = 0
net.ipv4.conf.lo.arp_ignore = 0
net.ipv4.conf.lo.arp_announce = 0
net.ipv4.conf.lo.arp_filter = 0
net.ipv4.conf.lo.tag = 0
net.ipv4.conf.lo.log_martians = 0
net.ipv4.conf.lo.bootp_relay = 0
net.ipv4.conf.lo.medium_id = 0
net.ipv4.conf.lo.proxy_arp = 0
net.ipv4.conf.lo.accept_source_route = 1
net.ipv4.conf.lo.send_redirects = 1
net.ipv4.conf.lo.rp_filter = 1
net.ipv4.conf.lo.shared_media = 1
net.ipv4.conf.lo.secure_redirects = 1
net.ipv4.conf.lo.accept_redirects = 1
net.ipv4.conf.lo.mc_forwarding = 0
net.ipv4.conf.lo.forwarding = 0
net.ipv4.conf.default.force_igmp_version = 0
net.ipv4.conf.default.disable_policy = 0
net.ipv4.conf.default.disable_xfrm = 0
net.ipv4.conf.default.arp_ignore = 0
net.ipv4.conf.default.arp_announce = 0
net.ipv4.conf.default.arp_filter = 0
net.ipv4.conf.default.tag = 0
net.ipv4.conf.default.log_martians = 0
net.ipv4.conf.default.bootp_relay = 0
net.ipv4.conf.default.medium_id = 0
net.ipv4.conf.default.proxy_arp = 0
net.ipv4.conf.default.accept_source_route = 1
net.ipv4.conf.default.send_redirects = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.shared_media = 1
net.ipv4.conf.default.secure_redirects = 1
net.ipv4.conf.default.accept_redirects = 1
net.ipv4.conf.default.mc_forwarding = 0
net.ipv4.conf.default.forwarding = 0
net.ipv4.conf.all.force_igmp_version = 0
net.ipv4.conf.all.disable_policy = 0
net.ipv4.conf.all.disable_xfrm = 0
net.ipv4.conf.all.arp_ignore = 0
net.ipv4.conf.all.arp_announce = 0
net.ipv4.conf.all.arp_filter = 0
net.ipv4.conf.all.tag = 0
net.ipv4.conf.all.log_martians = 0
net.ipv4.conf.all.bootp_relay = 0
net.ipv4.conf.all.medium_id = 0
net.ipv4.conf.all.proxy_arp = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.send_redirects = 1
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.all.shared_media = 1
net.ipv4.conf.all.secure_redirects = 1
net.ipv4.conf.all.accept_redirects = 1
net.ipv4.conf.all.mc_forwarding = 0
net.ipv4.conf.all.forwarding = 0
net.ipv4.neigh.xen-br0.locktime = 100
net.ipv4.neigh.xen-br0.proxy_delay = 80
net.ipv4.neigh.xen-br0.anycast_delay = 100
net.ipv4.neigh.xen-br0.proxy_qlen = 64
net.ipv4.neigh.xen-br0.unres_qlen = 3
net.ipv4.neigh.xen-br0.gc_stale_time = 60
net.ipv4.neigh.xen-br0.delay_first_probe_time = 5
net.ipv4.neigh.xen-br0.base_reachable_time = 30
net.ipv4.neigh.xen-br0.retrans_time = 100
net.ipv4.neigh.xen-br0.app_solicit = 0
net.ipv4.neigh.xen-br0.ucast_solicit = 3
net.ipv4.neigh.xen-br0.mcast_solicit = 3
net.ipv4.neigh.eth0.locktime = 100
net.ipv4.neigh.eth0.proxy_delay = 80
net.ipv4.neigh.eth0.anycast_delay = 100
net.ipv4.neigh.eth0.proxy_qlen = 64
net.ipv4.neigh.eth0.unres_qlen = 3
net.ipv4.neigh.eth0.gc_stale_time = 60
net.ipv4.neigh.eth0.delay_first_probe_time = 5
net.ipv4.neigh.eth0.base_reachable_time = 30
net.ipv4.neigh.eth0.retrans_time = 100
net.ipv4.neigh.eth0.app_solicit = 0
net.ipv4.neigh.eth0.ucast_solicit = 3
net.ipv4.neigh.eth0.mcast_solicit = 3
net.ipv4.neigh.lo.locktime = 100
net.ipv4.neigh.lo.proxy_delay = 80
net.ipv4.neigh.lo.anycast_delay = 100
net.ipv4.neigh.lo.proxy_qlen = 64
net.ipv4.neigh.lo.unres_qlen = 3
net.ipv4.neigh.lo.gc_stale_time = 60
net.ipv4.neigh.lo.delay_first_probe_time = 5
net.ipv4.neigh.lo.base_reachable_time = 30
net.ipv4.neigh.lo.retrans_time = 100
net.ipv4.neigh.lo.app_solicit = 0
net.ipv4.neigh.lo.ucast_solicit = 3
net.ipv4.neigh.lo.mcast_solicit = 3
net.ipv4.neigh.default.gc_thresh3 = 1024
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_interval = 30
net.ipv4.neigh.default.locktime = 100
net.ipv4.neigh.default.proxy_delay = 80
net.ipv4.neigh.default.anycast_delay = 100
net.ipv4.neigh.default.proxy_qlen = 64
net.ipv4.neigh.default.unres_qlen = 3
net.ipv4.neigh.default.gc_stale_time = 60
net.ipv4.neigh.default.delay_first_probe_time = 5
net.ipv4.neigh.default.base_reachable_time = 30
net.ipv4.neigh.default.retrans_time = 100
net.ipv4.neigh.default.app_solicit = 0
net.ipv4.neigh.default.ucast_solicit = 3
net.ipv4.neigh.default.mcast_solicit = 3
net.ipv4.tcp_tso_win_divisor = 8
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_bic_low_window = 14
net.ipv4.tcp_bic_fast_convergence = 1
net.ipv4.tcp_bic = 1
net.ipv4.tcp_vegas_gamma = 2
net.ipv4.tcp_vegas_beta = 6
net.ipv4.tcp_vegas_alpha = 2
net.ipv4.tcp_vegas_cong_avoid = 0
net.ipv4.tcp_westwood = 0
net.ipv4.tcp_no_metrics_save = 0
net.ipv4.ipfrag_secret_interval = 600
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_frto = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.icmp_ratemask = 6168
net.ipv4.icmp_ratelimit = 100
net.ipv4.tcp_adv_win_scale = 2
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_rmem = 4096	87380	174760
net.ipv4.tcp_wmem = 4096	16384	131072
net.ipv4.tcp_mem = 12288	16384	24576
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_fack = 1
net.ipv4.tcp_orphan_retries = 0
net.ipv4.inet_peer_gc_maxtime = 120
net.ipv4.inet_peer_gc_mintime = 10
net.ipv4.inet_peer_maxttl = 600
net.ipv4.inet_peer_minttl = 120
net.ipv4.inet_peer_threshold = 65664
net.ipv4.igmp_max_msf = 10
net.ipv4.route.secret_interval = 600
net.ipv4.route.min_adv_mss = 256
net.ipv4.route.min_pmtu = 552
net.ipv4.route.mtu_expires = 600
net.ipv4.route.gc_elasticity = 8
net.ipv4.route.error_burst = 500
net.ipv4.route.error_cost = 100
net.ipv4.route.redirect_silence = 2048
net.ipv4.route.redirect_number = 9
net.ipv4.route.redirect_load = 2
net.ipv4.route.gc_interval = 60
net.ipv4.route.gc_timeout = 300
net.ipv4.route.gc_min_interval = 0
net.ipv4.route.max_size = 16384
net.ipv4.route.gc_thresh = 1024
net.ipv4.route.max_delay = 10
net.ipv4.route.min_delay = 2
net.ipv4.icmp_ignore_bogus_error_responses = 0
net.ipv4.icmp_echo_ignore_broadcasts = 0
net.ipv4.icmp_echo_ignore_all = 0
net.ipv4.ip_local_port_range = 1024	4999
net.ipv4.tcp_max_syn_backlog = 256
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.ipfrag_time = 30
net.ipv4.ip_dynaddr = 0
net.ipv4.ipfrag_low_thresh = 196608
net.ipv4.ipfrag_high_thresh = 262144
net.ipv4.tcp_max_tw_buckets = 16384
net.ipv4.tcp_max_orphans = 8192
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syn_retries = 5
net.ipv4.ip_nonlocal_bind = 0
net.ipv4.ip_no_pmtu_disc = 0
net.ipv4.ip_autoconfig = 0
net.ipv4.ip_default_ttl = 64
net.ipv4.ip_forward = 0
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.core.somaxconn = 128
net.core.optmem_max = 10240
net.core.message_burst = 10
net.core.message_cost = 5
net.core.mod_cong = 290
net.core.lo_cong = 100
net.core.no_cong = 20
net.core.no_cong_thresh = 10
net.core.netdev_max_backlog = 300
net.core.dev_weight = 64
net.core.rmem_default = 109568
net.core.wmem_default = 109568
net.core.rmem_max = 109568
net.core.wmem_max = 109568
vm.swap_token_timeout = 0
vm.legacy_va_layout = 0
vm.vfs_cache_pressure = 100
vm.block_dump = 0
vm.laptop_mode = 0
vm.max_map_count = 65536
vm.min_free_kbytes = 1448
vm.lower_zone_protection = 0
vm.swappiness = 60
vm.nr_pdflush_threads = 2
vm.dirty_expire_centisecs = 3000
vm.dirty_writeback_centisecs = 500
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10
vm.page-cluster = 3
vm.overcommit_ratio = 50
vm.overcommit_memory = 0
kernel.ngroups_max = 65536
kernel.printk_ratelimit_burst = 10
kernel.printk_ratelimit = 5
kernel.panic_on_oops = 0
kernel.pid_max = 32768
kernel.overflowgid = 65534
kernel.overflowuid = 65534
kernel.pty.nr = 4
kernel.pty.max = 4096
kernel.random.uuid = 97a20c1a-1f40-4445-bf5a-8ad6d1d7058e
kernel.random.boot_id = 26286f61-9d4f-4a61-acf7-d814ee43879b
kernel.random.write_wakeup_threshold = 128
kernel.random.read_wakeup_threshold = 64
kernel.random.entropy_avail = 2708
kernel.random.poolsize = 512
kernel.threads-max = 2048
kernel.cad_pid = 1
kernel.sem = 250	32000	32	128
kernel.msgmnb = 16384
kernel.msgmni = 16
kernel.msgmax = 8192
kernel.shmmni = 4096
kernel.shmall = 2097152
kernel.shmmax = 33554432
kernel.hotplug = /bin/true
kernel.modprobe = /bin/true
kernel.printk = 6	4	1	7
kernel.ctrl-alt-del = 0
kernel.real-root-dev = 0
kernel.cap-bound = -257
kernel.tainted = 0
kernel.core_pattern = core
kernel.core_uses_pid = 1
kernel.panic = 1
kernel.domainname = (none)
kernel.hostname = vmlc0.hpl.hp.com
kernel.version = #1 Mon Feb 14 11:48:06 MST 2005
kernel.osrelease = 2.6.10-xen0
kernel.ostype = Linux



-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click

* Re: help with horrible network failures
  2005-03-02 22:52     ` Rob Gardner
@ 2005-03-02 23:13       ` Jon Mason
  2005-03-02 23:34         ` Rob Gardner
  2005-03-02 23:43       ` Nivedita Singhvi
  1 sibling, 1 reply; 13+ messages in thread
From: Jon Mason @ 2005-03-02 23:13 UTC
  To: xen-devel; +Cc: Rob Gardner

On Wednesday 02 March 2005 04:52 pm, Rob Gardner wrote:
> Nivedita Singhvi wrote:
> > Rob Gardner wrote:
> >> The machine with the problem:
> >> Intel e100 nic
> >> 1.7 Ghz Xeon
> >
> > Yep, practically all instances of this problem were
> > with the e100. Unfortunately, the current version
> > of the driver no longer has NAPI as a dynamically
> > tunable parameter via ethtool. It can be disabled
> > via a kernel config parameter (CONFIG_E100_NAPI).
> > You could recompile and see if the problem disappears.
> > (Not a real fix).
>
> My current configuration has:
> # CONFIG_E100_NAPI is not set
>
> Isn't that the same as it being disabled? Or should I change it to:
> CONFIG_E100_NAPI=n

Yes, that is disabled, but as I said previously, that option does nothing in
e100 (NAPI is enabled regardless of that compile flag). I think a few driver
fixes went into the newly released 2.6.11. If you can't change the adapter,
it might be worth trying.
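For reference, kconfig records a disabled boolean option as a `# CONFIG_FOO is not set` comment, so that line and an explicit `CONFIG_FOO=n` mean the same thing. Below is a minimal sketch of reading an option's state from a `.config` file. The helper name `kconfig_state` is hypothetical. It only reports what the config file says; as noted above, the e100 driver may use NAPI regardless.

```shell
#!/bin/sh
# kconfig_state CONFIG_FILE OPTION
# Hypothetical helper: prints y, m or n for a Kconfig option in a .config file.
kconfig_state() {
    if grep -q "^$2=y\$" "$1"; then
        echo y
    elif grep -q "^$2=m\$" "$1"; then
        echo m
    else
        # "# OPTION is not set", an explicit OPTION=n, or no line at all: all mean n
        echo n
    fi
}
```

For example, run `kconfig_state .config CONFIG_E100_NAPI` from the top of the kernel tree.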

> > I'd be very interested if you could switch to tg3
> > on the other box too, and see if you can reproduce
> > the problem. It all depends on the traffic, phase
> > of the moon, etc..
>
> I'm afraid that isn't easy to accomplish. I am working with another
> researcher in a faraway land, and so I do not have direct control over
> their machine. Changing their nic could take a while.
>
> > Sorry, ethtool can't be used to disable/enable NAPI. Ignore
> > this idiot, neuron misfire...
>
> So... what's the conclusion?

NAPI is a compile-time option, not a runtime option, so it must be set when
the kernel is compiled.
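One way to check whether a config symbol can affect a driver at all is to search the driver source for it. If the C file never mentions the symbol, changing it in `.config` will not change how that file is built, unless a header it includes tests the symbol. The sketch below assumes the 2.6.10 e100 driver lives at `drivers/net/e100.c`. `option_referenced` is a hypothetical helper.

```shell
#!/bin/sh
# option_referenced SOURCE_FILE OPTION
# Hypothetical helper: prints the numbered lines of SOURCE_FILE that mention
# OPTION as a whole word, and succeeds only if there is at least one.
# (-w keeps CONFIG_E100 from matching inside CONFIG_E100_NAPI.)
option_referenced() {
    grep -n -w -- "$2" "$1"
}
```

For example, run `option_referenced drivers/net/e100.c CONFIG_E100_NAPI || echo "not referenced"` from the top of the kernel tree.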

> > Your sysctl settings would be helpful, too..
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/xen-devel

-- 
Jon Mason
jdmason@us.ibm.com


* Re: help with horrible network failures
  2005-03-02 23:13       ` Jon Mason
@ 2005-03-02 23:34         ` Rob Gardner
  0 siblings, 0 replies; 13+ messages in thread
From: Rob Gardner @ 2005-03-02 23:34 UTC
  Cc: xen-devel

Jon Mason wrote:
> On Wednesday 02 March 2005 04:52 pm, Rob Gardner wrote:
> 
>>Nivedita Singhvi wrote:
>>
>>>Rob Gardner wrote:
>>>
>>>>The machine with the problem:
>>>>Intel e100 nic
>>>>1.7 Ghz Xeon
>>>
>>>Yep, practically all instances of this problem were
>>>with the e100. Unfortunately, the current version
>>>of the driver no longer has NAPI as a dynamically
>>>tunable parameter via ethtool. It can be disabled
>>>via a kernel config parameter (CONFIG_E100_NAPI).
>>>You could recompile and see if the problem disappears.
>>>(Not a real fix).
>>
>>My current configuration has:
>># CONFIG_E100_NAPI is not set
>>
>>Isn't that the same as it being disabled? Or should I change it to:
>>CONFIG_E100_NAPI=n
> 
> 
> Yes, that is disabled, but as I said previously, that option does nothing in
> e100 (NAPI is enabled regardless of that compile flag). I think a few driver
> fixes went into the newly released 2.6.11. If you can't change the adapter,
> it might be worth trying.

So... the answer is to try Linux kernel 2.6.11? The latest xen-unstable
still looks like it's using 2.6.10, and so does xen-2.04, so I don't
understand what exactly I should try next.

> 
> NAPI is a compile-time option, not a runtime option, so it must be set when
> the kernel is compiled.

But you just said that it does nothing in e100. Or do you mean I need to 
modify the e100 driver directly?


Rob






* Re: help with horrible network failures
  2005-03-02 22:52     ` Rob Gardner
  2005-03-02 23:13       ` Jon Mason
@ 2005-03-02 23:43       ` Nivedita Singhvi
  2005-03-02 23:58         ` Rob Gardner
  1 sibling, 1 reply; 13+ messages in thread
From: Nivedita Singhvi @ 2005-03-02 23:43 UTC
  To: Rob Gardner; +Cc: xen-devel

Rob Gardner wrote:

> My current configuration has:
> # CONFIG_E100_NAPI is not set
> 
> Isn't that the same as it being disabled? Or should I change it to:
> CONFIG_E100_NAPI=n

That means NAPI is off, but from what Jon said earlier,
the e100 driver ignores the compile option.

> I'm afraid that isn't easy to accomplish. I am working with another 
> researcher in a faraway land, and so I do not have direct control over 
> their machine. Changing their nic could take a while.

That's ok - it wasn't critical.

>> Sorry, ethtool can't be used to disable/enable NAPI. Ignore
>> this idiot, neuron misfire...

> So... what's the conclusion?

Er, that was just saying that ethtool couldn't be used
to alter this - you had to recompile the kernel in order to
disable/enable NAPI in the e100. And it's unclear whether
you are actually disabling it, since it is always on. I'm not
sure what the maintainers intended there, but it was claimed
to be fixed very recently.

However, I think you have two problems here. The
assertion is one of them, but it is not the fatal one: the
bad address in __wake_up_common is the fatal error.
It would help if you could run some debug code (?),
unless someone on the Xen team or on this list has seen
that one before and knows what's going on.

>> Your sysctl settings would be helpful, too..

None of the following is very relevant to your
oops, but it will help in your testing.

Could you turn this on? (This might affect the
assertion, but it's really just playing with the race
window.)

> net.ipv4.tcp_low_latency = 0
echo 1 > /proc/sys/net/ipv4/tcp_low_latency
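Note that the dotted sysctl name is not itself a file. Redirecting output to it by that name would simply create a regular file in the current directory. The setting lives under `/proc/sys`, with the dots of the name turned into slashes. As root, `sysctl -w net.ipv4.tcp_low_latency=1` does the same thing. Below is a sketch of the name-to-path mapping, using a hypothetical helper `sysctl_path`.

```shell
#!/bin/sh
# sysctl_path NAME
# Hypothetical helper: maps a dotted sysctl name to its /proc/sys file.
# Caveat: names with a literal dot inside one component (such as an
# interface called eth0.100) cannot be mapped this simply.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}
```

For example, as root: `echo 1 > "$(sysctl_path net.ipv4.tcp_low_latency)"`.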


BTW, if you are running httperf, you would probably
benefit from increasing your socket buffer sizes from
the default.
> net.ipv4.tcp_rmem = 4096    87380    174760
4096 109568 109568

> net.ipv4.tcp_wmem = 4096    16384    131072
4096 109568 109568

> net.ipv4.tcp_mem = 12288    16384    24576
122880 163840 245760


> net.core.optmem_max = 10240
102400
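When comparing these numbers, note that `tcp_rmem`, `tcp_wmem` and `optmem_max` are in bytes, whereas the three `tcp_mem` values are counted in pages. The sketch below converts page counts to MiB. It assumes 4096-byte pages, the i386 page size. `pages_to_mib` is a hypothetical helper.

```shell
#!/bin/sh
# pages_to_mib PAGES [PAGE_SIZE]
# Hypothetical helper: converts a page count to whole MiB (integer division).
# PAGE_SIZE defaults to 4096 bytes, the page size on i386.
pages_to_mib() {
    echo $(( $1 * ${2:-4096} / 1048576 ))
}

# Current and suggested tcp_mem maximum (in pages), expressed in MiB.
for pages in 24576 245760; do
    echo "$pages pages = $(pages_to_mib "$pages") MiB"
done
```

This prints 96 MiB for the current `tcp_mem` maximum and 960 MiB for the suggested one. The third `tcp_mem` value is the total number of pages that all TCP sockets together may use for queueing, so it is worth checking against how much memory the domain actually has.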

thanks,
Nivedita




* Re: help with horrible network failures
  2005-03-02 23:43       ` Nivedita Singhvi
@ 2005-03-02 23:58         ` Rob Gardner
  0 siblings, 0 replies; 13+ messages in thread
From: Rob Gardner @ 2005-03-02 23:58 UTC
  To: Nivedita Singhvi; +Cc: xen-devel

Nivedita Singhvi wrote:

> Could you turn this on? (this might affect the
> assertion - but it's just playing with the race
> window, really)..
> 
>> net.ipv4.tcp_low_latency = 0
> 
> echo 1 > /proc/sys/net/ipv4/tcp_low_latency


I'll try it and see what happens.

Rob



end of thread

Thread overview: 13+ messages
2005-03-01 21:06 help with horrible network failures Rob Gardner
  -- strict thread matches above, loose matches on Subject: below --
2005-03-02 13:37 Ian Pratt
2005-03-02 18:01 ` Nivedita Singhvi
2005-03-02 18:10 ` Rob Gardner
2005-03-02 18:37   ` Nivedita Singhvi
2005-03-02 20:07     ` Jon Mason
2005-03-02 20:20       ` Nivedita Singhvi
2005-03-02 20:55         ` Nivedita Singhvi
2005-03-02 22:52     ` Rob Gardner
2005-03-02 23:13       ` Jon Mason
2005-03-02 23:34         ` Rob Gardner
2005-03-02 23:43       ` Nivedita Singhvi
2005-03-02 23:58         ` Rob Gardner
