From: Michael Breuer <mbreuer@majjas.com>
To: Stephen Hemminger <shemminger@vyatta.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Berck E. Nash" <flyboy@gmail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
netdev@vger.kernel.org
Subject: Re: audit.c skb - tty race condition - was sky2 panic in 2.6.32.1 under load (new oops)
Date: Wed, 30 Dec 2009 15:44:20 -0500
Message-ID: <4B3BBBA4.5090400@majjas.com>
In-Reply-To: <4B3BA6E6.40906@majjas.com>
A couple more observations:
1) enabling auditd for runlevel 3 mitigates the issue
2) starting a remote X session (XDMCP) while under load and while auditd
is already running also triggers the sky2 interrupt status messages - so
maybe it's not tty1, but some sort of X & auditd interaction. Even in
this case, the error messages are much less frequent than when auditd
is started in runlevel 5 for the first time.
On 12/30/2009 2:15 PM, Michael Breuer wrote:
> And now, looking at audit.c, it seems reasonable that there is a race
> condition when auditd is started at roughly the same time as X. I'm
> guessing that the kaudit thread is fired up, the tty is connected, and
> at the same time X grabs the tty. Somewhere in there an skb gets hosed
> and is then reused by whatever comes along - in my case sky2, as that's
> where the subsequent demand is. If the demand happens first, the
> contaminated skb (I don't know in what way yet) is probably waiting to
> manifest as some other bug that's been frustrating people.
> On 12/30/2009 12:49 PM, Michael Breuer wrote:
>> On 12/30/2009 2:58 AM, Stephen Hemminger wrote:
>>> On Wed, 30 Dec 2009 02:23:20 -0500
>>> Michael Breuer<mbreuer@majjas.com> wrote:
>>>
>>>> Ok - I called dump_txring from sky2_err_intr:
>>>> --- a/drivers/net/sky2.c
>>>> +++ b/drivers/net/sky2.c
>>>> @@ -2725,8 +2791,10 @@ static void sky2_watchdog(unsigned long arg)
>>>>  /* Hardware/software error handling */
>>>>  static void sky2_err_intr(struct sky2_hw *hw, u32 status)
>>>>  {
>>>> -	if (net_ratelimit())
>>>> +	if (net_ratelimit()) {
>>>>  		dev_warn(&hw->pdev->dev, "error interrupt status=%#x\n", status);
>>>> +		dump_txring(hw, 0);
>>>> +	}
>>>>
>>>>  	if (status & Y2_IS_HW_ERR)
>>>>  		sky2_hw_intr(hw);
>>>>
>>>> And got this:
>>>> Dec 30 02:17:23 mail kernel: sky2 0000:06:00.0: error interrupt status=0x40000008
>>>> Dec 30 02:17:23 mail kernel: sky2 0000:06:00.0: error interrupt status=0x40000008
>>>> Dec 30 02:17:23 mail kernel: sky2 Tx ring pending=28...30 report=29 done=29
>>>> Dec 30 02:17:23 mail kernel: sky2 Tx ring pending=28...30 report=29 done=29
>>>> Dec 30 02:17:23 mail kernel: sky2 0000:06:00.0: error interrupt status=0x8
>>>> Dec 30 02:17:23 mail kernel: sky2 0000:06:00.0: error interrupt status=0x8
>>>> Dec 30 02:17:23 mail kernel: sky2 Tx ring pending=30...32 report=30 done=31
>>>> Dec 30 02:17:23 mail kernel: sky2 Tx ring pending=30...32 report=30 done=31
>>>>
>>> I notice that you have the NOUVEAU Nvidia drivers loaded? The one
>>> difference in HW between your board and mine is that I have an ATI
>>> video card.
>>>
>> Seems the problem is linked to auditd and X11 (but not nouveau).
>>
>> Today, I ran a bunch of scenarios. I first determined that the
>> problem only manifests in runlevel 5. Next, it occurred with or
>> without KMS and with or without nouveau. It happened whether or not
>> I was logged in (local or remote), and regardless of display manager
>> (xdm, gdm, kdm). I then checked to see what else was different
>> between runlevel 3 and 5 - the only thing was auditd. I disabled
>> auditd and reran - no errors.
>>
>> Now for the odd stuff:
>>
>> The errors only manifest if the high throughput data transfer is
>> initiated when the system is in runlevel 5 and auditd was started by
>> init when transitioning from runlevel 3 to 5. For example, the
>> following scenarios do not cause the errors to manifest:
>>
>> runlevel3; start auditd; runlevel5; start transfer
>> runlevel3; chkconfig auditd off; runlevel5; start auditd; start transfer
>> runlevel3; start transfer (note: errors do not occur if I transition
>> to runlevel 5 after the high bandwidth transfer has started)
>> runlevel3; startx; start transfer
>>
>> The only way I can get the problem to manifest is to transition to
>> runlevel 5 with chkconfig auditd on (level 5 only) and then initiate
>> the Windows backup.
>>
>> I'm guessing that there is some sort of race condition happening
>> between X (xdm/gdm/kdm/greeter?) and auditd that is somehow
>> corrupting something. I'd hazard a more or less obvious guess that
>> whatever's being corrupted differs when there is already a high
>> throughput transfer under way.
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe
>> linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/