netdev.vger.kernel.org archive mirror
* Re: Kernel 3.7.2 strange warning and short system hang
       [not found] <5124F57A.6080908@enas.net>
@ 2013-02-20 16:52 ` Eric Dumazet
  2013-02-20 17:57   ` David Miller
  2013-02-22  9:49   ` Urban Loesch
  0 siblings, 2 replies; 4+ messages in thread
From: Eric Dumazet @ 2013-02-20 16:52 UTC (permalink / raw)
  To: Urban Loesch; +Cc: linux-kernel, netdev

On Wed, 2013-02-20 at 17:10 +0100, Urban Loesch wrote:
> Hi,
> 
> today I had a strange system hang on one of our new Dell PER620 machines.
> I'm running a self-compiled kernel, version 3.7.2, with the Linux-VServer patch included.
> 
> uname -a
> Linux dbhost04 3.7.2-vs2.3.5.5-rol-em64t #4 SMP Sun Feb 3 14:08:37 CET 2013 x86_64 GNU/Linux
> 
> The 15-minute system load is between 1 and 3.
> 
> 
> Today the system hung for a few seconds and I got the following errors in syslog multiple times within one second:
> 
> ...
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196338] WARNING: at net/core/skbuff.c:573 skb_release_head_state+0xed/0x100()
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196338] Hardware name: PowerEdge R620
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196352] Modules linked in: lru_cache netconsole configfs act_police cls_basic cls_flow cls_fw cls_u32
> sch_tbf sch_prio sch_hfsc sch_htb sch_ingress sch_sfq xt_statistic xt_CT xt_realm xt_LOG xt_connlimit
> iptable_raw xt_comment xt_nat xt_recent ipt_ULOG ipt_REJECT ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP ipt_ah nf_nat_tftp nf_nat_sip nf_nat_pptp
> nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_conntrack_tftp nf_conntrack_sane
> nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink
> nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc ts_kmp nf_conntrack_h323 nf_conntrack_amanda
> nf_conntrack_ftp xt_TPROXY xt_time nf_tproxy_core xt_TCPMSS xt_tcpmss xt_sctp xt_policy xt_pkttype xt_NFLOG nfnetlink_log xt_physdev
> xt_owner xt_NFQUEUE xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit
> xt_DSCP xt_dscp xt_dccp xt_connmark xt_CLASSIFY iptable_nat nf_nat_ipv
> Feb 20 15:58:04 dbhost04 kernel: 4 nf_nat ip6t_REJECT nf_conntrack_ipv4 xt_tcpudp nf_defrag_ipv4 xt_state nf_conntrack_ipv6 nf_defrag_ipv6
> xt_conntrack nf_conntrack iptable_mangle ip6table_raw ip6table_mangle nfnetlink ip6table_filter ip6_tables
> iptable_filter ip_tables x_tables ipmi_devintf ipmi_si ipmi_msghandler coretemp kvm_intel kvm ghash_clmulni_intel aesni_intel xts aes_x86_64
> lrw gf128mul ablk_helper cryptd iTCO_wdt iTCO_vendor_support dcdbas microcode pcspkr joydev
> lpc_ich shpchp hed evbug hid_generic usbhid hid ahci libahci megaraid_sas tg3 [last unloaded: drbd]
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196368] Pid: 10942, comm: mysqld Tainted: G        W    3.7.2-vs2.3.5.5-rol-em64t #4
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196368] Call Trace:
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196370]  <IRQ> [<ffffffff81053bff>] warn_slowpath_common+0x7f/0xc0
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196371] [<ffffffff81594c52>] ? skb_release_data+0xf2/0x110
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196372] [<ffffffff81053c5a>] warn_slowpath_null+0x1a/0x20
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196373] [<ffffffff81594e9d>] skb_release_head_state+0xed/0x100
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196374] [<ffffffff81594c86>] __kfree_skb+0x16/0xa0
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196375] [<ffffffff8159521c>] consume_skb+0x2c/0x80
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196379] [<ffffffffa000b0af>] tg3_poll_work+0x5ef/0xdb0 [tg3]
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196384] [<ffffffffa000b055>] ? tg3_poll_work+0x595/0xdb0 [tg3]
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196388] [<ffffffffa00145cf>] tg3_poll+0x7f/0x390 [tg3]
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196392] [<ffffffffa000b927>] ? tg3_poll_msix+0xb7/0x140 [tg3]
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196394] [<ffffffff815b9622>] netpoll_poll_dev+0x162/0x580
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196395] [<ffffffff815b9bcc>] netpoll_send_skb_on_dev+0x18c/0x3a0
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196398] [<ffffffff815ba0f7>] netpoll_send_udp+0x277/0x290
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196400] [<ffffffffa03ae91f>] write_msg+0xaf/0x100 [netconsole]
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196401] [<ffffffff81054959>] call_console_drivers.constprop.16+0x99/0x100
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196403] [<ffffffff810553b9>] console_unlock+0x3d9/0x420
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196404] [<ffffffff81055ca5>] vprintk_emit+0x255/0x510
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196406] [<ffffffff8169f0b9>] printk+0x61/0x63
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196407] [<ffffffff81031e8e>] therm_throt_process+0x13e/0x180
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196408] [<ffffffff81032066>] intel_thermal_interrupt+0x196/0x1a0
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196410] [<ffffffff810320c1>] smp_thermal_interrupt+0x21/0x40
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196411] [<ffffffff816b1a1a>] thermal_interrupt+0x6a/0x70
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196413]  <EOI> [<ffffffff816b0e19>] ? system_call_fastpath+0x16/0x1b
> Feb 20 15:58:04 dbhost04 kernel: [1463997.196414] ---[ end trace e3ec69533a534ff5 ]---
> ...
> 
> After the last message I also got these entries in syslog:
> Feb 20 15:58:04 dbhost04 kernel: [1464001.755218] CPU18: Core power limit normal
> Feb 20 15:58:04 dbhost04 kernel: [1464001.760038] Clocksource tsc unstable (delta = 299966106527 ns)
> Feb 20 15:58:04 dbhost04 kernel: [1464001.769627] Switching to clocksource hpet
> 
> I searched the archives for this error, but I couldn't find any solution.
> My second PER620 hasn't shown this error so far.
> 
> Do you have any idea what this problem could be?
> 
> I'm not subscribed to lkml; if you need more information, please contact me directly by email.
> 
> Many thanks for your help.
> Urban

CC netdev

I guess tg3 needs to call dev_kfree_skb_any()

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index bdb0869..22d9e44 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -5942,7 +5942,7 @@ static void tg3_tx(struct tg3_napi *tnapi)
 		pkts_compl++;
 		bytes_compl += skb->len;
 
-		dev_kfree_skb(skb);
+		dev_kfree_skb_any(skb);
 
 		if (unlikely(tx_bug)) {
 			tg3_tx_recover(tp);
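
For readers following the patch above, here is a minimal userspace sketch of why the one-line change is expected to silence the warning. It is a model, not kernel code: `exec_ctx`, `model_skb`, `free_now_would_warn()` and `pick_free_path()` are hypothetical names introduced only for illustration. The sketch assumes, as in 3.7-era sources, that skb_release_head_state() warns when an skb destructor is about to run in hard-IRQ context, and that dev_kfree_skb_any() defers the free (via dev_kfree_skb_irq()) when called in hard-IRQ context or with interrupts disabled, and otherwise frees immediately (via dev_kfree_skb()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: every name here is an illustrative stand-in,
 * not a kernel symbol. */
enum free_path { FREE_NOW, FREE_DEFERRED };

struct exec_ctx {
    bool in_hardirq;     /* models in_irq() */
    bool irqs_disabled;  /* models irqs_disabled() */
};

struct model_skb {
    bool has_destructor; /* models skb->destructor != NULL */
};

/* Models the check behind the reported WARNING: freeing an skb that
 * still has a destructor while in hard-IRQ context triggers it. */
static bool free_now_would_warn(const struct exec_ctx *ctx,
                                const struct model_skb *skb)
{
    assert(ctx != NULL && skb != NULL);
    return skb->has_destructor && ctx->in_hardirq;
}

/* Models the dispatch in dev_kfree_skb_any(): defer the free when the
 * caller cannot safely free right now, otherwise free immediately. */
static enum free_path pick_free_path(const struct exec_ctx *ctx)
{
    assert(ctx != NULL);
    if (ctx->in_hardirq || ctx->irqs_disabled)
        return FREE_DEFERRED;
    return FREE_NOW;
}
```

In this model the normal NAPI (softirq) case still takes the immediate path, so the change would only alter behaviour on the netpoll-from-interrupt path seen in the trace.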


* Re: Kernel 3.7.2 strange warning and short system hang
  2013-02-20 16:52 ` Kernel 3.7.2 strange warning and short system hang Eric Dumazet
@ 2013-02-20 17:57   ` David Miller
  2013-02-20 19:30     ` Eric Dumazet
  2013-02-22  9:49   ` Urban Loesch
  1 sibling, 1 reply; 4+ messages in thread
From: David Miller @ 2013-02-20 17:57 UTC (permalink / raw)
  To: eric.dumazet; +Cc: bind, linux-kernel, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 20 Feb 2013 08:52:56 -0800

> I guess tg3 needs to call dev_kfree_skb_any()
> 
> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> index bdb0869..22d9e44 100644
> --- a/drivers/net/ethernet/broadcom/tg3.c
> +++ b/drivers/net/ethernet/broadcom/tg3.c
> @@ -5942,7 +5942,7 @@ static void tg3_tx(struct tg3_napi *tnapi)
>  		pkts_compl++;
>  		bytes_compl += skb->len;
>  
> -		dev_kfree_skb(skb);
> +		dev_kfree_skb_any(skb);
>  
>  		if (unlikely(tx_bug)) {
>  			tg3_tx_recover(tp);

I've seen this pattern on several occasions and I have to wonder...

Do we really require, therefore, every NAPI driver to use dev_kfree_skb_any()
in its TX reclaim if it supports netpoll?

That seems completely bogus.

netpoll is supposed to provide an execution environment when it invokes
->poll() that is identical to the normal NAPI execution.  If that were
true, then this change above would be completely unnecessary.

We need to figure out what is the case here, and audit all the NAPI
drivers to make sure they do the right thing once we know what the
right thing actually is.


* Re: Kernel 3.7.2 strange warning and short system hang
  2013-02-20 17:57   ` David Miller
@ 2013-02-20 19:30     ` Eric Dumazet
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2013-02-20 19:30 UTC (permalink / raw)
  To: David Miller; +Cc: bind, linux-kernel, netdev

On Wed, 2013-02-20 at 12:57 -0500, David Miller wrote:

> I've seen this pattern on several occasions and I have to wonder...
> 
> Do we really require, therefore, every NAPI driver to use dev_kfree_skb_any()
> in its TX reclaim if it supports netpoll?
> 
> That seems completely bogus.
> 
> netpoll is supposed to provide an execution environment when it invokes
> ->poll() that is identical to the normal NAPI execution.  If that were
> true, then this change above would be completely unnecessary.
> 
> We need to figure out what is the case here, and audit all the NAPI
> drivers to make sure they do the right thing once we know what the
> right thing actually is.

netpoll directly calls n->poll()

(poll_napi() -> poll_one_napi() -> napi->poll(napi, budget) )

Presumably it should not do that if running in interrupt context.
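
To make the point about context concrete, here is a small userspace sketch (again a model, not kernel code). All names in it (`model_in_hardirq`, `model_napi`, `model_driver_poll()`, `model_poll_direct()`, `model_poll_guarded()`) are hypothetical. It shows that a poll callback invoked through a plain function pointer simply inherits whatever context its caller is in, and what a guard that refuses to poll from hard-IRQ context might look like. The guard is only one reading of "should not do that", not a description of an actual netpoll fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: all names are hypothetical, not kernel symbols. */
static bool model_in_hardirq;  /* stand-in for in_irq() */

struct model_napi {
    int (*poll)(struct model_napi *napi, int budget);
    bool saw_hardirq;  /* context the callback observed on its last run */
    int polls;         /* number of times the callback ran */
};

/* Stand-in for a driver's ->poll(): records the context it ran in. */
static int model_driver_poll(struct model_napi *napi, int budget)
{
    (void)budget;
    napi->saw_hardirq = model_in_hardirq;
    napi->polls++;
    return 0;
}

/* Direct call, as in the chain described above: the callback runs in
 * whatever context the caller happens to be in. */
static void model_poll_direct(struct model_napi *napi, int budget)
{
    assert(napi != NULL && napi->poll != NULL);
    napi->poll(napi, budget);
}

/* Hypothetical guard: skip polling when called from hard-IRQ context.
 * Returns true if the callback was invoked. */
static bool model_poll_guarded(struct model_napi *napi, int budget)
{
    assert(napi != NULL && napi->poll != NULL);
    if (model_in_hardirq)
        return false;
    napi->poll(napi, budget);
    return true;
}
```

In the reported trace the caller was a thermal interrupt that printed through netconsole, which corresponds to the direct call being made while `model_in_hardirq` is true.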


* Re: Kernel 3.7.2 strange warning and short system hang
  2013-02-20 16:52 ` Kernel 3.7.2 strange warning and short system hang Eric Dumazet
  2013-02-20 17:57   ` David Miller
@ 2013-02-22  9:49   ` Urban Loesch
  1 sibling, 0 replies; 4+ messages in thread
From: Urban Loesch @ 2013-02-22  9:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, netdev

Hi,

thanks for your help. I patched my kernel yesterday. Now I have to wait a few days.
The error does not occur periodically. If it occurs again, I'll let you know.

many thanks
Urban


