public inbox for linux-kernel@vger.kernel.org
From: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
To: "Darrick J. Wong" <djwong@us.ibm.com>
Cc: linux-kernel@vger.kernel.org, ebiederm@xmission.com
Subject: Re: Device hang when offlining a CPU due to IRQ misrouting
Date: Tue, 5 Jun 2007 10:23:10 -0700
Message-ID: <20070605172310.GD17143@linux-os.sc.intel.com>
In-Reply-To: <20070601004427.GI30788@tree.beaverton.ibm.com>

On Thu, May 31, 2007 at 05:44:27PM -0700, Darrick J. Wong wrote:
> Hi there,
> 
> I'm seeing a driver hang with 2.6.22-rc3 while being slightly stupid
> about offlining CPUs.  I suspect that this problem extends beyond a
> particular machine, as I've been able to replicate it with an IBM x3650
> and an IBM x3755.  This is what I'm doing:
> 
> 1) I tie an IRQ to a particular CPU via /proc/irq/XXX/smp_affinity (IRQ
> 4341 is the network card and we're picking on CPU1 in this example):
> echo 2 > /proc/irq/4341/smp_affinity

Darrick, I see a kernel bug in this area (which is already filled with bugs,
and I am looking into ways to fix them). Are you making sure that, between
step 1 and step 2, interrupts have actually started arriving at cpu1?

i.e., do step 1 and wait until the IRQs start hitting cpu1. At that point
do step 2 and let us know if you still hit this bug.
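
The check suggested above can be scripted. What follows is only a minimal sketch, assuming the usual /proc/interrupts layout (a header row of CPUn columns, then one row per IRQ with a count per online CPU). The function names (cpus_in_mask, parse_interrupts, irq_reached_cpu, wait_for_irq_on_cpu) are hypothetical helpers, not part of any kernel tooling.

```python
import time


def cpus_in_mask(mask):
    """Decode a hex smp_affinity mask such as "2" or "00000000,00000002"."""
    value = int(mask.replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def parse_interrupts(text):
    """Map each IRQ label to {"CPU0": count, "CPU1": count, ...}.

    Assumes the header row names the online CPUs; rows without a
    count for every listed CPU (e.g. ERR:) are skipped.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()[:len(cpus)]
        if len(fields) == len(cpus) and all(f.isdigit() for f in fields):
            counts[label.strip()] = dict(zip(cpus, (int(f) for f in fields)))
    return counts


def irq_reached_cpu(before, after, irq, cpu):
    """True if the count for `irq` on `cpu` grew between two snapshots."""
    return after[irq].get(cpu, 0) > before[irq].get(cpu, 0)


def wait_for_irq_on_cpu(irq, cpu, timeout=30.0, interval=0.5,
                        path="/proc/interrupts"):
    """Poll until `irq` is seen arriving on `cpu`, or give up after `timeout`."""
    def snapshot():
        with open(path) as f:
            return parse_interrupts(f.read())

    before = snapshot()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        time.sleep(interval)
        if irq_reached_cpu(before, snapshot(), irq, cpu):
            return True
    return False
```

With the values from this report, cpus_in_mask("2") gives [1], and wait_for_irq_on_cpu("4341", "CPU1") would be called between step 1 and step 2; only offline the CPU once it returns True.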

> 
> 2) I then take CPU1 offline:
> echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> 3) The kernel prints this:
> [ 1101.968040] Breaking affinity for irq 4341
> [ 1102.074019] CPU 1 is now offline
> [ 1102.081593] lockdep: not fixing up alternatives.
> [ 1112.886919] nfs: server 9.47.66.169 not responding, still trying
> 
> After step 2 the system never sees interrupts from the network card and
> remains hung like that until CPU1 is brought back up.  It looks as
> though the kernel is trying to reroute the IRQ (or so I'm assuming from
> the "Breaking affinity" message), but this doesn't ever happen, so the
> kernel stops seeing interrupts from the device.
> 
> Granted, one should not be offlining the CPU that is currently
> designated to handle an IRQ, but I suspect that the kernel ought at a
> minimum to reject the offlining or route the IRQ to any online CPU
> instead of screwing things up.
> 
> There exists a similar scenario.  Set the IRQ affinity to a bunch of
> CPUs, watch /proc/interrupts to see which CPU is actually servicing the
> interrupts, then offline that CPU.  The kernel does not reroute the IRQ
> to any of the other CPUs and the device also hangs.
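
For the multi-CPU scenario quoted above, "which CPU is actually servicing the interrupts" can be read off two /proc/interrupts snapshots taken a few seconds apart. This is a rough sketch under the same layout assumption (header row of CPUn names, one count per online CPU); irq_row and servicing_cpus are hypothetical names used only for illustration.

```python
def irq_row(text, irq):
    """Return {"CPU0": count, ...} for one IRQ from /proc/interrupts text."""
    lines = text.splitlines()
    cpus = lines[0].split()
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if sep and label.strip() == irq:
            fields = rest.split()[:len(cpus)]
            return dict(zip(cpus, (int(f) for f in fields)))
    raise KeyError(irq)


def servicing_cpus(before_text, after_text, irq):
    """List the CPUs whose count for `irq` increased between snapshots."""
    before = irq_row(before_text, irq)
    after = irq_row(after_text, irq)
    return sorted(cpu for cpu in after if after[cpu] > before.get(cpu, 0))
```

The CPU reported here is the one to offline when reproducing the second scenario; an empty list after the offlining would match the hang described.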

Is this a theory or did you observe this problem happening?

thanks,
suresh

> 
> The furthest that I've dug is that it works on 2.6.17 and is broken in
> 2.6.22-rc3 and 2.6.21.  Will git-bisect further, but I wanted to know if
> anyone else has seen this sort of problem.  afaik, this seems to happen
> with both IOAPIC and MSI interrupts, possibly more.
