From: Eric Sandeen <sandeen@redhat.com>
To: Peter Moody <pmoody@google.com>
Cc: lczerner@redhat.com, linux-audit@redhat.com,
Jeff Layton <jlayton@redhat.com>
Subject: Re: [PATCH] audit: grab a reference to context->pwd when it's cached
Date: Fri, 05 Oct 2012 15:23:02 -0500 [thread overview]
Message-ID: <506F41A6.5060104@redhat.com> (raw)
In-Reply-To: <CALnj_=4BBdLJi4yvcfd1M+XYQZ9OGsm8EJeT0DiyiVKiV4oRUQ@mail.gmail.com>
On 10/5/12 10:57 AM, Peter Moody wrote:
> On Fri, Oct 5, 2012 at 8:18 AM, Peter Moody <pmoody@google.com> wrote:
>> On Fri, Oct 5, 2012 at 8:16 AM, Peter Moody <pmoody@google.com> wrote:
>>> On Fri, Oct 5, 2012 at 7:26 AM, Jeff Layton <jlayton@redhat.com> wrote:
>>>> On Fri, 5 Oct 2012 06:57:59 -0700
>>>> Peter Moody <pmoody@google.com> wrote:
>>>>
>>>>> On Fri, Oct 5, 2012 at 5:55 AM, Jeff Layton <jlayton@redhat.com> wrote:
>>>>>> On Thu, 4 Oct 2012 11:48:23 -0700
>>>>>> Peter Moody <pmoody@google.com> wrote:
>>>>>>
>>>>>>> On Wed, Sep 26, 2012 at 6:50 AM, Alexander Viro <aviro@redhat.com> wrote:
>>>>>>>> On Tue, Sep 25, 2012 at 10:03:23AM -0700, Peter Moody wrote:
>>>>>>>>> Hey folks,
>>>>>>>>>
>>>>>>>>> following up on old patches, are there any comments on this? Did you
>>>>>>>>> get around to finding a better way to fix this bug, Al?
>>>>>>>>
>>>>>>>> Alas, I've found none ;-/ Looks like we'll have to go with this one,
>>>>>>>> at least until somebody comes up with better solution.
>>>>>>>
>>>>>>> Not surprisingly, this patch doesn't actually fix the issue (or at
>>>>>>> least doesn't do it correctly).
>>>>>>>
>>>>>>> I hadn't noticed that get_fs_pwd() actually calls path_get() on
>>>>>>> &context->pwd so the additional path_get() is useless and the
>>>>>>> reference doesn't ever actually get freed if audit_putname is called
>>>>>>> while we're in a syscall.
>>>>>>>
>>>>>>> Al, Eric, Jeff; do any of you guys have an understanding of what the
>>>>>>> initial bug actually is since this clearly doesn't fix it?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> peter
>>>>>>>
>>>>>>
>>>>>> BTW, I ran this test on one of my KVM guests and it ran just fine. That
>>>>>> one is an x86_64 guest running a 3.6.0+ kernel. The root fs on there is
>>>>>> ext4 though, not ext3. So perhaps that's a factor?
>>>>>>
>>>>>> The oops message you posted at least looks like something down in the
>>>>>> bowels of ext3 or fs/buffer.c.
>>>>>
>>>>> Yeah, the only place this actually happens for me on these giant xen
>>>>> instances we have (6 cores, 32G ram) and it happens on both ext3 and
>>>>> ext4 filesystems and it happens with 100% reliability.
>>>>>
>>>>> The actual oops is from:
>>>>>
>>>>> static inline void check_irqs_on(void)
>>>>> {
>>>>> #ifdef irqs_disabled
>>>>> BUG_ON(irqs_disabled());
>>>>> #endif
>>>>> }
>>>>>
>>>>> with the code path looking like:
>>>>>
>>>>> __find_get_block() -> lookup_bh_lru() -> check_irqs_on() -> BUG()
>>>>>
>>>>
>>>> Do you have a backtrace from a more recent kernel? I wonder if
>>>> something in the syscall exit codepath is disabling IRQs here?
>>>
>>> is 3.6.0-rc1 recent enough or do you want something newer?
>>
>> nevermind, that doesn't boot. One sec.
>
>
> here's 3.5.0
>
> ------------[ cut here ]------------
> kernel BUG at fs/buffer.c:1220!
> invalid opcode: 0000 [#1] SMP
> CPU 0
> Pid: 3683, comm: a.out Not tainted 3.5.0 #3
> RIP: e030:[<ffffffff816a99f4>] [<ffffffff816a99f4>]
> check_irqs_on.part.8+0x4/0x6
> RSP: e02b:ffff8807b156dc28 EFLAGS: 00010046
> RAX: ffff8807d0dd0000 RBX: ffff8807a7d6df28 RCX: 0000000005883396
> RDX: 0000000000001000 RSI: 0000000005883396 RDI: ffff8807cfc0c000
> RBP: ffff8807b156dc28 R08: 0000000000000001 R09: ffff8807a7d6de50
> R10: f83a2b0a359bf007 R11: 0000000000000000 R12: ffff8807a7d6de54
> R13: ffff8807a7d6de80 R14: ffff8807cfc1f120 R15: 0000000005883396
> FS: 00007f97164ec700(0000) GS:ffff8807ffc00000(0063) knlGS:0000000000000000
> CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
> CR2: 00000000f76ca3b0 CR3: 00000007bbb53000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process a.out (pid: 3683, threadinfo ffff8807b156c000, task ffff8807bbae8000)
One thing that crossed my mind is that back in the 4k stacks days I
think sometimes a blown stack could corrupt threadinfo and we'd get
spurious warnings like this.
Is there any chance something like that has happened? (any stack
depth messages, etc?)
Maybe a crashdump on the BUG() for further poking around would
be in order.
-Eric
> Stack:
> ffff8807b156dc98 ffffffff8116a099 ffff8807b59f3000 ffff8807b156dd30
> ffff8807b156dd60 ffff8807b156de78 ffff8807b156dc78 ffffffff816af231
> ffff8807b156dcd8 ffff8807a7d6e538 ffff8807a7d6df28 ffff8807a7d6de54
> Call Trace:
> [<ffffffff8116a099>] __find_get_block+0x1f9/0x200
> [<ffffffff816af231>] ? down_read+0x11/0x30
> [<ffffffff811d1405>] ext3_clear_blocks+0x75/0x140
> [<ffffffff811d15dc>] ext3_free_data+0x10c/0x150
> [<ffffffff811e2061>] ? ext3_journal_start_sb+0x31/0x60
> [<ffffffff811d1cb5>] ext3_truncate+0x4a5/0x600
> [<ffffffff8123d5b8>] ? journal_start+0xb8/0x100
> [<ffffffff8106f406>] ? bit_waitqueue+0x16/0xc0
> [<ffffffff811d4598>] ext3_evict_inode+0x248/0x2c0
> [<ffffffff81153b9a>] evict+0xaa/0x1b0
> [<ffffffff81154843>] iput+0x103/0x210
> [<ffffffff8114fc88>] dentry_iput+0x88/0xd0
> [<ffffffff811505ec>] dput+0x12c/0x250
> [<ffffffff81146275>] path_put+0x15/0x30
> [<ffffffff810b2f35>] __audit_syscall_exit+0x2e5/0x460
> [<ffffffff816b30be>] sysexit_audit+0x29/0x5b
> Code: 04 00 00 4c 8d 88 c0 02 00 00 31 c0 e8 5f da ff ff 48 85 db 74
> 0c 80 43 5c 01 48 89 df e8 d5
> 6a aa ff 5b 41 5c 5d c3 55 48 89 e5 <0f> 0b 55 48 89 e5 0f 0b 55 48 89
> e5 0f 0b 55 48 89 e5 41 54 53
> RIP [<ffffffff816a99f4>] check_irqs_on.part.8+0x4/0x6
> RSP <ffff8807b156dc28>
> ---[ end trace 8d09f8cfbb601c14 ]---
>
>
>>>> --
>>>> Jeff Layton <jlayton@redhat.com>
>>>
>>>
>>>
>>> --
>>> Peter Moody Google 1.650.253.7306
>>> Security Engineer pgp:0xC3410038
>>
>>
>>
>> --
>> Peter Moody Google 1.650.253.7306
>> Security Engineer pgp:0xC3410038
>
>
>
next prev parent reply other threads:[~2012-10-05 20:23 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-08-16 1:13 [PATCH] audit: grab a reference to context->pwd when it's cached Peter Moody
2012-08-16 1:26 ` Alexander Viro
2012-08-16 1:41 ` Peter Moody
2012-08-20 17:19 ` Peter Moody
2012-09-25 17:03 ` Peter Moody
2012-09-26 13:50 ` Alexander Viro
2012-09-26 18:20 ` Peter Moody
2012-10-04 18:48 ` Peter Moody
2012-10-05 11:02 ` Jeff Layton
2012-10-05 12:55 ` Jeff Layton
2012-10-05 13:57 ` Peter Moody
2012-10-05 14:10 ` Jeff Layton
2012-10-05 14:26 ` Jeff Layton
2012-10-05 15:16 ` Peter Moody
2012-10-05 15:18 ` Peter Moody
2012-10-05 15:57 ` Peter Moody
2012-10-05 16:15 ` Jeff Layton
2012-10-05 17:04 ` Peter Moody
2012-10-05 19:41 ` Peter Moody
2012-10-08 12:23 ` Jeff Layton
2012-10-08 15:22 ` Peter Moody
2012-10-08 16:45 ` Peter Moody
2012-10-08 14:10 ` Jeff Layton
2012-10-05 20:23 ` Eric Sandeen [this message]
2012-10-07 3:22 ` Peter Moody
2012-10-08 18:48 ` Eric Sandeen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=506F41A6.5060104@redhat.com \
--to=sandeen@redhat.com \
--cc=jlayton@redhat.com \
--cc=lczerner@redhat.com \
--cc=linux-audit@redhat.com \
--cc=pmoody@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox