netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Woodhouse <dwmw2@infradead.org>
To: Francois Romieu <romieu@fr.zoreil.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>,
	David Miller <davem@davemloft.net>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net 2/2]  8139cp: reset BQL when ring tx ring cleared
Date: Thu, 17 Sep 2015 12:36:33 +0100	[thread overview]
Message-ID: <1442489793.131189.142.camel@infradead.org> (raw)
In-Reply-To: <20150914215913.GA24962@electric-eye.fr.zoreil.com>

[-- Attachment #1: Type: text/plain, Size: 8925 bytes --]

On Mon, 2015-09-14 at 23:59 +0200, Francois Romieu wrote:
> 
> [...]
> > [308309.574551] 8139cp 0000:00:0b.0 eth1: Transmit timeout, status 
> c   2b    0 80ff
> 
> Rx and Tx are enabled.
> 
> Instant (untested) hack below.

Thanks; I'll try that. In fact since updating to 4.2 the problem has
got worse — now the whole machine dies:

[  232.064630] ------------[ cut here ]------------                             
[  232.069282] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x1e5/0x200()                                                                 
[  232.077840] NETDEV WATCHDOG: eth1 (8139cp): transmit queue 0 timed out       
[  232.084380] Modules linked in: sch_teql 8139cp mii iptable_nat pppoe nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT solos_pci pppox ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ledtrig_heartbeat ledtrig_gpio ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm geode_aes cbc arc4 aes_i586                                                               
[  232.157787] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-gx+ #25            
[  232.163982]  00000000 c10313eb dee95000 00000000 ffffff32 00000258 c1031446 00000009                                                                         
[  232.171988]  dec0bf74 c13c3afc dec0bf8c c1272ef5 c13bfe82 0000012f c13c3afc dee95000                                                                         
[  232.179978]  e04cfd3c 00000000 dee95000 dee95240 00000258 80000100 c1272d10 dee95000                                                                         
[  232.188012] Call Trace:                                                      
[  232.190482]  [<c10313eb>] ? warn_slowpath_common+0x5b/0x90                   
[  232.196063]  [<c1031446>] ? warn_slowpath_fmt+0x26/0x30                      
[  232.201307]  [<c1272ef5>] ? dev_watchdog+0x1e5/0x200                         
[  232.206317]  [<c1272d10>] ? qdisc_rcu_free+0x30/0x30                         
[  232.211307]  [<c105b19e>] ? call_timer_fn.isra.7+0xe/0x60                    
[  232.216811]  [<c1272d10>] ? qdisc_rcu_free+0x30/0x30                         
[  232.221794]  [<c105b33d>] ? run_timer_softirq+0xfd/0x1b0                     
[  232.227221]  [<c1033777>] ? __do_softirq+0xa7/0x190                          
[  232.232117]  [<c10336d0>] ? __hrtimer_tasklet_trampoline+0x20/0x20           
[  232.238395]  [<c100336b>] ? do_softirq_own_stack+0x1b/0x20                   
[  232.243881]  <IRQ>  [<c1003165>] ? do_IRQ+0x35/0xa0                          
[  232.248904]  [<c131e2a9>] ? common_interrupt+0x29/0x30                       
[  232.254141]  [<c104007b>] ? put_unbound_pool+0x17b/0x1a0                     
[  232.259470]  [<c1008bf2>] ? default_idle+0x2/0x10                            
[  232.264213]  [<c1009106>] ? arch_cpu_idle+0x6/0x10                           
[  232.269026]  [<c104eee5>] ? cpu_startup_entry+0xf5/0x190                     
[  232.274459]  [<c142a9b9>] ? start_kernel+0x2e5/0x2e8                         
[  232.279432] ---[ end trace 30ae4e701c36b431 ]---                             
[  232.284167] 8139cp 0000:00:0b.0 eth1: Transmit timeout, status  c   2b    1 80ac                                                                             
[  260.106382] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper:0]
[  260.113515] Modules linked in: sch_teql 8139cp mii iptable_nat pppoe nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT solos_pci pppox ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ledtrig_heartbeat ledtrig_gpio ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm geode_aes cbc arc4 aes_i586                                                               
[  260.116369] CPU: 0 PID: 0 Comm: swapper Tainted: G        W       4.2.0-gx+ #25                                                                              
[  260.116369] task: c13f7540 ti: c13f0000 task.ti: c13f0000                    
[  260.116369] EIP: 0060:[<c131d6fa>] EFLAGS: 00200292 CPU: 0                   
[  260.116369] EIP is at _raw_spin_unlock_irqrestore+0xa/0x10                   
[  260.116369] EAX: 00000000 EBX: 00200292 ECX: e04d8100 EDX: 00200292          
[  260.116369] ESI: 00000000 EDI: ffffff32 EBP: 00000258 ESP: dec0bf90          
[  260.116369]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068                    
[  260.116369] CR0: 8005003b CR2: 098c4c50 CR3: 1ef88000 CR4: 00000090          
[  260.116369] Stack:                                                           
[  260.116369]  dee95000 c1272ecb dee95000 dee95240 00000258 80000100 c1272d10 dee95000                                                                         
[  260.116369]  00000100 c105b19e 00000410 c1272d10 c105b33d 00000000 c146d524 00000001                                                                         
[  260.116369]  00000002 c1033777 00000002 ffffe577 c13f2000 0000000a 00200000 c13f1f70                                                                         
[  260.116369] Call Trace:                                                      
[  260.116369]  [<c1272ecb>] ? dev_watchdog+0x1bb/0x200                         
[  260.116369]  [<c1272d10>] ? qdisc_rcu_free+0x30/0x30                         
[  260.116369]  [<c105b19e>] ? call_timer_fn.isra.7+0xe/0x60                    
[  260.116369]  [<c1272d10>] ? qdisc_rcu_free+0x30/0x30                         
[  260.116369]  [<c105b33d>] ? run_timer_softirq+0xfd/0x1b0                     
[  260.116369]  [<c1033777>] ? __do_softirq+0xa7/0x190                          
[  260.116369]  [<c10336d0>] ? __hrtimer_tasklet_trampoline+0x20/0x20           
[  260.116369]  [<c100336b>] ? do_softirq_own_stack+0x1b/0x20                   
[  260.116369]  <IRQ>                                                           
[  260.116369]  [<c1003165>] ? do_IRQ+0x35/0xa0                                 
[  260.116369]  [<c131e2a9>] ? common_interrupt+0x29/0x30                       
[  260.116369]  [<c104007b>] ? put_unbound_pool+0x17b/0x1a0                     
[  260.116369]  [<c1008bf2>] ? default_idle+0x2/0x10                            
[  260.116369]  [<c1009106>] ? arch_cpu_idle+0x6/0x10                           
[  260.116369]  [<c104eee5>] ? cpu_startup_entry+0xf5/0x190                     
[  260.116369]  [<c142a9b9>] ? start_kernel+0x2e5/0x2e8                         
[  260.116369] Code: 00 b8 01 00 00 00 c3 8d 76 00 8d bc 27 00 00 00 00 e8 db 2e d3 ff c3 8d 76 00 8d bc 27 00 00 00 00 53 89 d3 e8 c8 2e d3 ff 53 9d <5b> c3 8d 74 26 00 e8 bb 2e d3 ff fb c3 89 f6 8d bc 27 00 00 00                          

At this point even sysrq on the serial console isn't working; I'm
going to have to go and physically reset it.

(gdb) list *dev_watchdog+0x1bb
0xc1272ecb is in dev_watchdog (net/sched/sch_generic.c:304).
299				}
300	
301				if (some_queue_timedout) {
302					WARN_ONCE(1, KERN_INFO
"NETDEV WATCHDOG: %s (%s): transmit queue %u timed out\n",
303					       dev->name,
netdev_drivername(dev), i);
304					dev->netdev_ops
->ndo_tx_timeout(dev);
305				}
306				if (!mod_timer(&dev->watchdog_timer,
307					       round_jiffies(jiffies
+
308							     dev
->watchdog_timeo)))

So that _raw_spin_unlock_irqrestore() is going to be a tailcall from
the end of cp_tx_timeout(). Moderately confused that it dies on *unlock*.

_raw_spin_lock_irqrestore+0xa is the instruction after a 'popf', which
makes me wonder if it dies in an IRQ storm. Although then *some* of the
NMI watchdog invocations would surely show an IRQ on the stack, but
they
don't; they're all right on the same instruction.


-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

  reply	other threads:[~2015-09-17 11:36 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-20 16:54 [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared Stephen Hemminger
2013-05-20 21:03 ` David Miller
2013-05-20 21:37 ` Francois Romieu
2013-05-21  0:27   ` Stephen Hemminger
2015-09-14 12:05     ` David Woodhouse
2015-09-14 21:59       ` Francois Romieu
2015-09-17 11:36         ` David Woodhouse [this message]
2015-09-17 15:06           ` David Woodhouse
2015-09-17 20:44             ` Francois Romieu
2015-09-17 22:14               ` David Woodhouse
2015-09-17 23:44                 ` Francois Romieu
2015-09-18  1:04                   ` David Woodhouse
2015-09-18  7:40                     ` David Woodhouse
2015-09-18 22:42                   ` David Woodhouse
2015-09-17 12:02         ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1442489793.131189.142.camel@infradead.org \
    --to=dwmw2@infradead.org \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=romieu@fr.zoreil.com \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).