From: Eric Paris <eparis@redhat.com>
To: Peter Moody <pmoody@google.com>
Cc: linux-audit@redhat.com
Subject: Re: Kernel oops+crash on repeated auditd restarts
Date: Thu, 05 Apr 2012 17:07:01 -0400
Message-ID: <1333660021.2273.0.camel@localhost>
In-Reply-To: <CALnj_=44rFZBci99ns8Y8_0ZAJ9CEQ1N+bLFfHaA_3cHAcJN1Q@mail.gmail.com>

please please please keep on list.  Everything you say might help track
it down!

On Thu, 2012-04-05 at 14:03 -0700, Peter Moody wrote:
> (please let me know if I should take this off-list)
> 
> One other thing (again, maybe already known), but this seems to be
> exacerbated by SMP. On my machine, I can't reproduce the crash if I
> boot with maxcpus=1.
> 
> Still hunting.
> 
> Cheers,
> peter
> 
> On Tue, Apr 3, 2012 at 9:15 AM, Peter Moody <pmoody@google.com> wrote:
> > This may already be known, but the issue seems to be limited to watch
> > rules. With any watch rules, I can reliably crash my machine while
> > freeing a watch rule after only starting/stopping auditd a few times.
> > With no watch rules, I have no issues.
> >
> > Cheers,
> > peter
> >
> > On Wed, Mar 28, 2012 at 11:44 PM, Valentin Avram <aval13@gmail.com> wrote:
> >> Yes, i know that patch. It made it into kernel 3.2.2. I tested it
> >> successfully (oops in 3.2.1, no oops in 3.2.9), but this oops i'm seeing is
> >> also in 3.2.9.
> >>
> >> I monitored changelogs from 3.2.1 to 3.2.12 but there were no fixes either
> >> in the audit subsystem or in fsnotify. I'll try to reproduce in latest 3.2.13
> >> and repost the oops, but i'm 99% confident it will be the same.
> >>
> >> Sadly nobody except you seems to pay attention to this problem, probably
> >> because it requires special conditions to reproduce (really, who starts and
> >> stops auditd every 5 seconds on a production server?). We only ran into it
> >> because one of our servers would randomly oops and then freeze about once a
> >> month after stopping and then starting auditd every morning (and the
> >> stop-start sequence was needed to work around a bug somewhere that would
> >> hang a gzip running on a file outside a watched folder).
> >>
> >> Anyway, as a last note, i have a feeling that the oops is not exactly
> >> random, there is a pattern, just that i haven't figured it out completely
> >> yet.
> >>
> >> Will keep you up to date with the things i find out.
> >>
> >> V.
> >>
> >> On Mar 29, 2012 4:14 AM, "Eric Paris" <eparis@redhat.com> wrote:
> >>>
> >>> That patch fixes a BUG().  The report has a NULL ptr deref and some
> >>> apparent list corruption....  Sadly they aren't the same....
> >>>
> >>> On Wed, 2012-03-28 at 15:42 -0700, Peter Moody wrote:
> >>> > fyi: this patch [1] seems to fix the issue for me. The explanation in
> >>> > the subject would reliably oops my machine.
> >>> >
> >>> > [1]
> >>> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fed474857efbed79cd390d0aee224231ca718f63
> >>> >
> >>> > On Wed, Mar 28, 2012 at 1:51 PM, Peter Moody <pmoody@google.com> wrote:
> >>> > > Are you still able to reliably reproduce this oops? I'm trying to
> >>> > > track this down because this bug (or a very similar bug) is causing
> >>> > > some significant headaches here at work, but I haven't had a lot of
> >>> > > luck. I'm using usermode linux, though, so that might be interfering
> >>> > > with things.
> >>> > >
> >>> > > On Mon, Mar 5, 2012 at 12:35 AM, Valentin Avram <aval13@gmail.com> wrote:
> >>> > >> Finally i found some time and spare server to retest the oops and
> >>> > >> list_add corruptions i was getting with the 3.x kernels and auditd
> >>> > >> 2.1.3.
> >>> > >>
> >>> > >> I tested now with gentoo's latest stable 3.2.1-gentoo-r2 and
> >>> > >> kernel.org's 3.2.9.
> >>> > >>
> >>> > >> Both get the oops/BUG in the same way and after that, they keep
> >>> > >> pouring list_add corruptions with audit_prune_tre(truncated?) and
> >>> > >> auditctl as comms.
> >>> > >>
> >>> > >> Since this is not about Gentoo's kernel only, i'll post here the oops
> >>> > >> in 3.2.9 and also attach some list_add corruptions.
> >>> > >>
> >>> > >> 3.2.9 BUG:
> >>> > >>
> >>> > >> kernel: [  301.240011] BUG: unable to handle kernel NULL pointer dereference at   (null)
> >>> > >> kernel: [  301.240305] IP: [<c1238dd0>] __list_del_entry+0x20/0xe0
> >>> > >> kernel: [  301.240481] *pdpt = 0000000000000000 *pde = f000ddc8f000ddc8
> >>> > >> kernel: [  301.240698] Oops: 0000 [#1] SMP
> >>> > >> kernel: [  301.240910]
> >>> > >> kernel: [  301.241030] Pid: 642, comm: fsnotify_mark Not tainted 3.2.9-drbd-version3 #1 Dell Inc. PowerEdge 2950/0CX396
> >>> > >> kernel: [  301.241370] EIP: 0060:[<c1238dd0>] EFLAGS: 00010287 CPU: 6
> >>> > >> kernel: [  301.241498] EIP is at __list_del_entry+0x20/0xe0
> >>> > >> kernel: [  301.241623] EAX: f4fae544 EBX: f47cffa4 ECX: ffffffff EDX: 00000000
> >>> > >> kernel: [  301.241751] ESI: f4fae544 EDI: f4fae508 EBP: f47cff7c ESP: f47cff64
> >>> > >> kernel: [  301.241879]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> >>> > >> kernel: [  301.242005] Process fsnotify_mark (pid: 642, ti=f47ce000 task=f4f47c00 task.ti=f47ce000)
> >>> > >> kernel: [  301.242207] Stack:
> >>> > >> kernel: [  301.242327]  c10813c0 f47cffa4 f4f47c00 f4e70888 f47cff7c f47cffa4 f47cffb8 c10f6976
> >>> > >> kernel: [  301.242882]  ffffffc3 f4f47c00 f4f47c00 00000000 f4f47c00 c10530c0 f47cff9c f47cff9c
> >>> > >> kernel: [  301.243438]  f4fae544 f4fae544 f4c47f58 00000000 c10f68f0 f47cffe4 c1052834 00000000
> >>> > >> kernel: [  301.243995] Call Trace:
> >>> > >> kernel: [  301.244119]  [<c10813c0>] ? rcu_check_callbacks+0x110/0x110
> >>> > >> kernel: [  301.244248]  [<c10f6976>] fsnotify_mark_destroy+0x86/0x120
> >>> > >> kernel: [  301.244377]  [<c10530c0>] ? abort_exclusive_wait+0x80/0x80
> >>> > >> kernel: [  301.244504]  [<c10f68f0>] ? fsnotify_put_mark+0x30/0x30
> >>> > >> kernel: [  301.244631]  [<c1052834>] kthread+0x74/0x80
> >>> > >> kernel: [  301.244756]  [<c10527c0>] ? kthread_flush_work_fn+0x10/0x10
> >>> > >> kernel: [  301.244885]  [<c1582ab6>] kernel_thread_helper+0x6/0xd
> >>> > >> kernel: [  301.245011] Code: 55 f4 8b 45 f8 e9 75 ff ff ff 90 55 89 e5 53 83 ec 14 8b 08 8b 50 04 81 f9 00 01 10 00 74 24 81 fa 00 02 20 00 0f 84 8e 00 00 00 <8b> 1a 39 d8 75 62 8b 59 04 39 d8 75 35 89 51 04 89 0a 83 c4 14
> >>> > >> kernel: [  301.248195] EIP: [<c1238dd0>] __list_del_entry+0x20/0xe0 SS:ESP 0068:f47cff64
> >>> > >> kernel: [  301.248414] CR2: 0000000000000000
> >>> > >> kernel: [  301.248538] ---[ end trace 15082dbfb353f84c ]---
> >>> > >>
> >>> > >> The kernel was compiled with the following DEBUG support (the bolded
> >>> > >> ones were requested by Gentoo's Dev):
> >>> > >> CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
> >>> > >> CONFIG_SLUB_DEBUG=y
> >>> > >> CONFIG_HAVE_DMA_API_DEBUG=y
> >>> > >> CONFIG_X86_DEBUGCTLMSR=y
> >>> > >> CONFIG_PNP_DEBUG_MESSAGES=y
> >>> > >> CONFIG_AIC94XX_DEBUG=y
> >>> > >> CONFIG_USB_DEBUG=y
> >>> > >> CONFIG_DEBUG_KERNEL=y
> >>> > >> CONFIG_SCHED_DEBUG=y
> >>> > >> CONFIG_DEBUG_RT_MUTEXES=y
> >>> > >> CONFIG_DEBUG_PI_LIST=y
> >>> > >> CONFIG_DEBUG_BUGVERBOSE=y
> >>> > >> CONFIG_DEBUG_INFO=y
> >>> > >> CONFIG_DEBUG_MEMORY_INIT=y
> >>> > >> CONFIG_DEBUG_LIST=y
> >>> > >> CONFIG_DEBUG_STACKOVERFLOW=y
> >>> > >> CONFIG_DEBUG_RODATA=y
> >>> > >> CONFIG_DEBUG_RODATA_TEST=y
> >>> > >>
> >>> > >> I attached the kernel config i used for 3.2.9 to generate this oops
> >>> > >> and warnings.
> >>> > >>
> >>> > >> From the list_add warnings that come after, out of 805 warnings i
> >>> > >> processed, after masking with XXXXX the PID and next= values that kept
> >>> > >> changing in every one, i got 26 distinct MD5 hashes. I also attached
> >>> > >> the relevant files as an archive to this email.
> >>> > >>
> >>> > >> The Gentoo bug i opened is sleeping, it seems nobody has the time to
> >>> > >> at least test to confirm or not the problems i'm seeing (or everybody's
> >>> > >> thinking that nobody would restart auditd so often, so the bug is not
> >>> > >> that serious).
> >>> > >>
> >>> > >>
> >>> > >> Thank you for your time.
> >>> > >>
> >>> > >> On Wed, Feb 8, 2012 at 6:11 PM, Valentin Avram <aval13@gmail.com> wrote:
> >>> > >>
> >>> > >>
> >>> > >> --
> >>> > >> Linux-audit mailing list
> >>> > >> Linux-audit@redhat.com
> >>> > >> https://www.redhat.com/mailman/listinfo/linux-audit
> >>> > >
> >>> > >
> >>> > >
> >>> > > --
> >>> > > Peter Moody      Google    1.650.253.7306
> >>> > > Security Engineer  pgp:0xC3410038
> >>> >
> >>> >
> >>> >
> >>>
> >>>
> >>
> >
> >
> >
> > --
> > Peter Moody      Google    1.650.253.7306
> > Security Engineer  pgp:0xC3410038
> 
> 
> 
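
For readers following the oops quoted above, here is a minimal sketch of reading its "Code:" line by hand. It is not part of the original thread. It relies on the byte in angle brackets being the one at the faulting EIP, and the helper names (parse_code_line, function_offset, cmp_immediates) are made up for illustration. The scan for cmp opcodes is a naive byte match, not a real disassembler.

```python
import re

# Copied from the "Code:" and "EIP is at" lines of the oops quoted above.
CODE_LINE = (
    "55 f4 8b 45 f8 e9 75 ff ff ff 90 55 89 e5 53 83 "
    "ec 14 8b 08 8b 50 04 81 f9 00 01 10 00 74 24 81 fa 00 02 20 00 0f 84 "
    "8e 00 00 00 <8b> 1a 39 d8 75 62 8b 59 04 39 d8 75 35 89 51 04 89 0a 83 c4 14"
)
EIP_LINE = "EIP is at __list_del_entry+0x20/0xe0"


def parse_code_line(text):
    """Return (byte values, index of the byte located at the faulting EIP)."""
    tokens = text.split()
    fault_index = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    values = [int(tok.strip("<>"), 16) for tok in tokens]
    return values, fault_index


def function_offset(eip_line):
    """Extract OFF from a 'symbol+0xOFF/0xLEN' annotation."""
    match = re.search(r"\+0x([0-9a-f]+)/0x[0-9a-f]+", eip_line)
    return int(match.group(1), 16)


def cmp_immediates(values):
    """Collect imm32 operands after the byte pairs 81 f9 / 81 fa (naive match)."""
    found = []
    for i in range(len(values) - 5):
        if values[i] == 0x81 and values[i + 1] in (0xF9, 0xFA):
            found.append(int.from_bytes(bytes(values[i + 2:i + 6]), "little"))
    return found


code, fault = parse_code_line(CODE_LINE)
start = fault - function_offset(EIP_LINE)  # first byte of the function in the dump
func_bytes = code[start:]
immediates = cmp_immediates(func_bytes)

print("bytes before faulting EIP:", fault)
print("function starts with:", " ".join(f"{b:02x}" for b in func_bytes[:3]))
print("cmp immediates:", [hex(v) for v in immediates])
print("faulting bytes:", " ".join(f"{b:02x}" for b in code[fault:fault + 2]))
```

On this dump the function start lands on 55 89 e5 (a standard x86 prologue), which suggests the +0x20 offset was applied correctly. The two immediates, 0x100100 and 0x200200, appear to correspond to LIST_POISON1 and LIST_POISON2 on 32-bit x86. The faulting bytes 8b 1a decode as a load through EDX, which the register dump shows as 00000000, consistent with CR2 being 0. In practice the kernel tree's scripts/decodecode does this job properly.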

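The de-duplication Valentin describes in the quoted message (mask the PID and next= values with XXXXX, then hash each warning with MD5) can be sketched roughly as below. This is not his actual script, which is not in the thread. The regular expressions, the function names, and the sample warnings are all assumptions for illustration, and the sample text is invented rather than taken from his logs.

```python
import hashlib
import re
from collections import Counter

MASK = "XXXXX"


def normalize(warning):
    """Mask the fields that change between otherwise identical warnings."""
    warning = re.sub(r"Pid: \d+", f"Pid: {MASK}", warning)
    warning = re.sub(r"next=[0-9a-f]+", f"next={MASK}", warning)
    return warning


def fingerprint(warning):
    """MD5 of the masked warning text, used only as a grouping key."""
    return hashlib.md5(normalize(warning).encode("utf-8")).hexdigest()


def count_types(warnings):
    """Map each fingerprint to the number of warnings that share it."""
    return Counter(fingerprint(w) for w in warnings)


# Invented sample data: two warnings differ only in PID and next=,
# the third differs in the comm field as well.
SAMPLE = [
    "list_add corruption (next=f4fae544)\nPid: 700, comm: auditctl",
    "list_add corruption (next=f4e70888)\nPid: 812, comm: auditctl",
    "list_add corruption (next=f4fae544)\nPid: 901, comm: audit_prune_tre",
]

types = count_types(SAMPLE)
print(len(SAMPLE), "warnings,", len(types), "distinct types")
```

Each warning is treated as one multi-line string, so the log would first need to be split into individual warnings. Any other field that varies per occurrence (timestamps, for example) would also have to be masked before hashing.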