public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Luiz Capitulino <lcapitulino@redhat.com>
To: Eric Paris <eparis@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>,
	linux-kernel@vger.kernel.org, oleg@redhat.com,
	akpm@linux-foundation.org, rgb@redhat.com
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()
Date: Tue, 10 Sep 2013 13:45:51 -0400	[thread overview]
Message-ID: <20130910134551.791611b0@redhat.com> (raw)
In-Reply-To: <1378829005.12992.30.camel@flatline.rdu.redhat.com>

On Tue, 10 Sep 2013 12:03:25 -0400
Eric Paris <eparis@redhat.com> wrote:

> On Mon, 2013-09-09 at 18:32 +0400, Konstantin Khlebnikov wrote:
> > Luiz Capitulino wrote:
> > > I'm getting the following soft lockup:
> > >
> > > CPU: 6 PID: 2278 Comm: killall5 Tainted: GF            3.11.0-rc7+ #1
> > > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > >   0000000000000099 ffff88011fd83de8 ffffffff815324df 0000000000002800
> > >   ffffffff817d48f9 ffff88011fd83e68 ffffffff8152e669 ffff88011fd83e68
> > >   ffffffff00000008 ffff88011fd83e78 ffff88011fd83e18 0000004081dac040
> > > Call Trace:
> > >   <IRQ>   [<ffffffff815324df>] dump_stack+0x46/0x58
> > >   [<ffffffff8152e669>] panic+0xbb/0x1c4
> > >   [<ffffffff810d03c3>] watchdog_timer_fn+0x163/0x170
> > >   [<ffffffff8106c691>] __run_hrtimer+0x81/0x1c0
> > >   [<ffffffff810d0260>] ? watchdog+0x30/0x30
> > >   [<ffffffff8106cea7>] hrtimer_interrupt+0x107/0x240
> > >   [<ffffffff8102f61b>] local_apic_timer_interrupt+0x3b/0x60
> > >   [<ffffffff81542465>] smp_apic_timer_interrupt+0x45/0x60
> > >   [<ffffffff8154124a>] apic_timer_interrupt+0x6a/0x70
> > >   <EOI>   [<ffffffff810c2f5f>] ? audit_log_start+0xbf/0x430
> > >   [<ffffffff810c2fe7>] ? audit_log_start+0x147/0x430
> > >   [<ffffffff81079030>] ? try_to_wake_up+0x2a0/0x2a0
> > >   [<ffffffff810c86be>] audit_log_exit+0x6ae/0xc30
> > >   [<ffffffff81188662>] ? __alloc_fd+0x42/0x100
> > >   [<ffffffff810c98e7>] __audit_syscall_exit+0x257/0x2b0
> > >   [<ffffffff81540794>] sysret_audit+0x17/0x21
> > >
> > > The reproducer is somewhat unusual:
> > >
> > >   1. Install RHEL6.5 (maybe a similar older user-space will do)
> > >   2. Boot the just installed system
> > >   3. In this first boot you'll meet the firstboot script, which
> > >      will do some setup and (depending on your answers) it will
> > >      reboot the machine
> > >   4. During that first reboot the system hangs while terminating
> > >      all processes:
> > >
> > >         Sending all processes the TERM signal...
> > >
> > >      It's when the soft lockup above happens. And yes, I managed
> > >      to get this with latest upstream kernel (HEAD fa8218def1b1)
> > >
> > > I'm reproducing it on a VM, but the first report was on bare-metal.
> > >
> > > This is what is happening:
> > >
> > >   1. audit_log_start() is called
> > >   2. As we have SKBs waiting in audit_skb_queue and all conditions
> > >      evaluate to true, we sleep in wait_for_auditd()
> > >   3. Go to 2, until sleep_time gets negative and audit_log_start()
> > >      just busy-waits
> > >
> > > Now, *why* this is happening is a mistery to me. I tried debugging
> > > it, but all I could find is that at some point the kauditd thread
> > > never wakes up after having called schedule(). I even tried waking
> > > it up before calling wait_for_auditd(), but it didn't.
> > 
> > We run into the same problem in rhel6 kernel.
> > 
> > "readahead-collector" uses audit interface and sometimes stuck in 'stopped' state.
> > 
> > After commit 829199197a430dade2519d54f5545c4a094393b8
> > (which was backported by RH into their kernel)
> > audit emiters will block forever if userspace daemon cannot handle backlog.
> > That commit just breaks timeout condition, after timeout waiting loop turns
> > into busy loop until deamon dies or returns back to work.
> > 
> > this trivial patch should fix that problem
> > 
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -1215,9 +1215,10 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> > 
> >                          sleep_time = timeout_start + audit_backlog_wait_time -
> >                                          jiffies;
> > -                       if ((long)sleep_time > 0)
> > +                       if ((long)sleep_time > 0) {
> >                                  wait_for_auditd(sleep_time);
> > -                       continue;
> > +                               continue;
> > +                       }
> >                  }
> >                  if (audit_rate_check() && printk_ratelimit())
> >                          printk(KERN_WARNING
> 
> I think this is the right(ish) fix, at least it gets at the real bug.
> 829199197a430dade2519d54f5545c4a094393b8 definitely is the problem.  The
> 60 second wait is NOT causing the soft lockup.

It's not. What it does cause is a 60 seconds hang, during which my system
is unusable. Let me stress that. It's not auditd that gets a 60 second hang.
My _impression_ is that any process doing a system call will hang there,
which causes the system to become unusable. IMHO, that's not acceptable
behavior.

Now, I don't know if the other changes you suggest Richard doing are going
to help with that (if they are then I'm all for them). Also, I wonder why
this long hang is not a problem for Chuck's and Konstantin's test-cases,
given they submitted similar fixes (which are also similar to my RFC).

  reply	other threads:[~2013-09-10 17:45 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-28 22:21 [RFC] audit: avoid soft lockup in audit_log_start() Luiz Capitulino
2013-08-28 22:33 ` Andrew Morton
2013-08-28 22:54   ` Luiz Capitulino
2013-08-28 23:08     ` Andrew Morton
2013-08-29  0:49       ` Luiz Capitulino
2013-08-30 18:23       ` Luiz Capitulino
2013-09-09 14:32 ` Konstantin Khlebnikov
2013-09-09 14:54   ` Luiz Capitulino
2013-09-09 15:19     ` Konstantin Khlebnikov
2013-09-09 15:29       ` Luiz Capitulino
2013-09-09 15:42         ` Konstantin Khlebnikov
2013-09-10 16:03   ` Eric Paris
2013-09-10 17:45     ` Luiz Capitulino [this message]
2013-09-17 22:28     ` Andrew Morton
2013-09-17 22:54       ` Luiz Capitulino
2013-09-18  1:57       ` Richard Guy Briggs
2013-09-18  9:48       ` [PATCH] audit: fix endless wait " Konstantin Khlebnikov
2013-09-18 13:31         ` Richard Guy Briggs
2013-09-18 19:06       ` [PATCH 0/8] Audit backlog queue fixes related to soft lockup Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 1/8] audit: avoid soft lockup due to audit_log_start() incorrect loop termination Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 2/8] audit: reset audit backlog wait time after error recovery Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 3/8] audit: make use of remaining sleep time from wait_for_auditd Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 4/8] audit: efficiency fix 1: only wake up if queue shorter than backlog limit Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 5/8] audit: efficiency fix 2: request exclusive wait since all need same resource Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 6/8] audit: add boot option to override default backlog limit Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 7/8] audit: clean up AUDIT_GET/SET local variables and future-proof API Richard Guy Briggs
2013-09-19 21:18           ` Steve Grubb
2013-09-20 14:47             ` Eric Paris
2013-09-23 16:38               ` Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 8/8] audit: add audit_backlog_wait_time configuration option Richard Guy Briggs
2013-09-18 20:33           ` Eric Paris
2013-09-18 20:49             ` Richard Guy Briggs
2013-09-18 20:54               ` Eric Paris

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130910134551.791611b0@redhat.com \
    --to=lcapitulino@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=eparis@redhat.com \
    --cc=khlebnikov@openvz.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=rgb@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox