From: Lekensteyn <lekensteyn@gmail.com>
To: Xiaotian Feng <dannyfeng@tencent.com>,
	"David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Subject: [REGRESSION,v3.7-rc5,bisected] 100% CPU usage in softirqd, unable to shutdown
Date: Mon, 12 Nov 2012 22:20:59 +0100
Message-ID: <5934660.4HuQirPM4Z@al>

Hi,

After upgrading from 3.7-rc4 to 3.7-rc5, I found that I could not suspend
without the system locking up afterwards. Nor could I shut down: the system
would simply hang at the point where it should halt. The second suspend/resume
in a session would make NetworkManager hang.

When looking at my process list, I saw that ksoftirqd was using one full CPU
core. Watching the contents of /proc/softirqs showed that the TASKLET counter
was increasing rapidly.

I got this message when trying to suspend for the second time in a session:

 Freezing user space processes ... 
 Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
 NetworkManager  R  running task        0   332      1 0x00000004
  ffff88023169d628 ffffffff81549b86 ffff8802316e4470 ffff88023169dfd8
  ffff88023169dfd8 ffff88023169dfd8 ffff8802316e4470 ffff8802316e4470
  ffff88023169d698 ffff88023bc92a80 ffff88022fd3db70 ffff88022fd3dc90
 Call Trace:
  [<ffffffff81549b9e>] ? __schedule+0x13e/0x760
  [<ffffffff8154a4f9>] schedule+0x29/0x70
  [<ffffffff810722aa>] sys_sched_yield+0x4a/0x60
  [<ffffffff8154a7c2>] yield+0x32/0x40
  [<ffffffff810451f5>] tasklet_kill+0x35/0x80
  [<ffffffffa017c2f3>] jme_close+0xd3/0x850 [jme]
  [<ffffffff8146325d>] __dev_close_many+0x7d/0xc0
  [<ffffffff814632cd>] __dev_close+0x2d/0x40
  [<ffffffff81469551>] __dev_change_flags+0xa1/0x180
  [<ffffffff814696e8>] dev_change_flags+0x28/0x70
  [<ffffffff81475b68>] do_setlink+0x378/0xa00
  [<ffffffff81078a56>] ? find_busiest_group+0x36/0x490
  [<ffffffff812d5821>] ? nla_parse+0x31/0xe0
  [<ffffffff812d5821>] ? nla_parse+0x31/0xe0
  [<ffffffff81477e6e>] rtnl_newlink+0x36e/0x590
  [<ffffffff81286e16>] ? apparmor_capable+0x26/0x90
  [<ffffffff81477694>] rtnetlink_rcv_msg+0x114/0x300
  [<ffffffff8114d1c3>] ? __kmalloc_node_track_caller+0x63/0x1b0
  [<ffffffff8145b65b>] ? __alloc_skb+0x8b/0x290
  [<ffffffff81477580>] ? __rtnl_unlock+0x20/0x20
  [<ffffffff8148f471>] netlink_rcv_skb+0xb1/0xc0
  [<ffffffff814748f5>] rtnetlink_rcv+0x25/0x40
  [<ffffffff8148ed8b>] netlink_unicast+0x19b/0x220
  [<ffffffff8148f111>] netlink_sendmsg+0x301/0x3c0
  [<ffffffff8144f8ec>] sock_sendmsg+0xbc/0xf0
  [<ffffffff81450797>] ? sock_recvmsg+0xd7/0x110
  [<ffffffff8145045c>] __sys_sendmsg+0x3ac/0x3c0
  [<ffffffff810854dc>] ? ktime_get_ts+0x4c/0xf0
  [<ffffffff81452699>] sys_sendmsg+0x49/0x90
  [<ffffffff81553906>] system_call_fastpath+0x1a/0x1f

Bisecting leads to:
commit 175c0dffef310fc7d7f026ca4a7682beb2fbd8ec
Author: Xiaotian Feng <xtfeng@gmail.com>
Date:   Wed Oct 31 00:29:57 2012 +0000

    drivers/net: use tasklet_kill in device remove/close process
    
    Some driver uses tasklet_disable in device remove/close process,
    tasklet_disable will inc tasklet->count and return. If the tasklet
    is not handled yet because some softirq pressure, the tasklet will
    placed on the tasklet_vec, never have a chance to excute. This might
    lead to ksoftirqd heavy loaded, wakeup with pending_softirq, but
    tasklet is disabled. tasklet_kill should be used in this case.
    
    Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller <davem@davemloft.net>

(In case it wasn't obvious: I have an Ethernet device that needs the "jme" driver,
04:00.5 Ethernet controller [0200]: JMicron Technology Corp. JMC250 PCI
Express Gigabit Ethernet Controller [197b:0250] (rev 03))

Since 3.7, I sometimes get the messages below during suspend, but they were
never accompanied by a hang:
smpboot: CPU 2 is now offline
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 202
NOHZ: local_softirq_pending 202
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 202
NOHZ: local_softirq_pending 202
smpboot: CPU 3 is now offline

Time for a revert, or do you have another fix to propose?

Regards,
Peter
