All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com>
Cc: linux-mm@kvack.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Kees Cook <keescook@chromium.org>, Christoph Hellwig <hch@lst.de>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: exec error: BUG: Bad rss-counter
Date: Tue, 02 Mar 2021 13:37:30 -0600	[thread overview]
Message-ID: <m1blc1gxdx.fsf@fess.ebiederm.org> (raw)
In-Reply-To: <CALCv0x1Wka10b-mgb1wRHW-W-qRaZOKvJ_-ptq85Hj849PFPSw@mail.gmail.com> (Ilya Lipnitskiy's message of "Mon, 1 Mar 2021 23:59:36 -0800")

Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> writes:

> On Mon, Mar 1, 2021 at 12:43 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> writes:
>>
>> > Eric, All,
>> >
>> > The following error appears when running Linux 5.10.18 on an embedded
>> > MIPS mt7621 target:
>> > [    0.301219] BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1
>> >
>> > Being a very generic error, I started digging and added a stack dump
>> > before the BUG:
>> > Call Trace:
>> > [<80008094>] show_stack+0x30/0x100
>> > [<8033b238>] dump_stack+0xac/0xe8
>> > [<800285e8>] __mmdrop+0x98/0x1d0
>> > [<801a6de8>] free_bprm+0x44/0x118
>> > [<801a86a8>] kernel_execve+0x160/0x1d8
>> > [<800420f4>] call_usermodehelper_exec_async+0x114/0x194
>> > [<80003198>] ret_from_kernel_thread+0x14/0x1c
>> >
>> > So that's how I got to looking at fs/exec.c and noticed quite a few
>> > changes last year. Turns out this message only occurs once very early
>> > at boot during the very first call to kernel_execve. current->mm is
>> > NULL at this stage, so acct_arg_size() is effectively a no-op.
>>
>> If you believe this is a new error you could bisect the kernel
>> to see which change introduced the behavior you are seeing.
>>
>> > More digging, and I traced the RSS counter increment to:
>> > [<8015adb4>] add_mm_counter_fast+0xb4/0xc0
>> > [<80160d58>] handle_mm_fault+0x6e4/0xea0
>> > [<80158aa4>] __get_user_pages.part.78+0x190/0x37c
>> > [<8015992c>] __get_user_pages_remote+0x128/0x360
>> > [<801a6d9c>] get_arg_page+0x34/0xa0
>> > [<801a7394>] copy_string_kernel+0x194/0x2a4
>> > [<801a880c>] kernel_execve+0x11c/0x298
>> > [<800420f4>] call_usermodehelper_exec_async+0x114/0x194
>> > [<80003198>] ret_from_kernel_thread+0x14/0x1c
>> >
>> > In fact, I also checked vma_pages(bprm->vma) and lo and behold it is set to 1.
>> >
>> > How is fs/exec.c supposed to handle implied RSS increments that happen
>> > due to page faults when discarding the bprm structure? In this case,
>> > the bug-generating kernel_execve call never succeeded, it returned -2,
>> > but I didn't trace exactly what failed.
>>
>> Unless I am mistaken any left over pages should be purged by exit_mmap
>> which is called by mmput before mmput calls mmdrop.
> Good to know. Some more digging and I can say that we hit this error
> when trying to unmap PFN 0 (is_zero_pfn(pfn) returns TRUE,
> vm_normal_page returns NULL, zap_pte_range does not decrement
> MM_ANONPAGES RSS counter). Is my understanding correct that PFN 0 is
> usable, but special? Or am I totally off the mark here?

It would be good to know if that is the page that get_user_pages_remote
returned to copy_string_kernel.  The zero page that is always zero,
should never be returned when a writable mapping is desired.

> Here is the (optimized) stack trace when the counter does not get decremented:
> [<8015b078>] vm_normal_page+0x114/0x1a8
> [<8015dc98>] unmap_page_range+0x388/0xacc
> [<8015e5a0>] unmap_vmas+0x6c/0x98
> [<80166194>] exit_mmap+0xd8/0x1ac
> [<800290c0>] mmput+0x58/0xf8
> [<801a6f8c>] free_bprm+0x2c/0xc4
> [<801a8890>] kernel_execve+0x160/0x1d8
> [<800420e0>] call_usermodehelper_exec_async+0x114/0x194
> [<80003198>] ret_from_kernel_thread+0x14/0x1c
>
>>
>> AKA it looks very very fishy this happens and this does not look like
>> an execve error.
> I think you are right, I'm probably wrong to bother you. However,
> since the thread is already started, let me add linux-mm here :)

It happens during exec.  I don't mind looking and pointing you a useful
direction.

>>
>> On the other hand it would be good to know why kernel_execve is failing.
>> Then the error handling paths could be scrutinized, and we can check to
>> see if everything that should happen on an error path does.
> I can check on this, but likely it's the init system not doing things
> quite in the right order on my platform, or something similar. The
> error is ENOENT from do_open_execat().

That does narrow things down considerably.
After the error all we do is:
Clear in_execve and fs->in_exec.
Return from bprm_execve
Call free_bprm
Which does:
	if (bprm->mm) {
		acct_arg_size(bprm, 0);
		mmput(bprm->mm);
	}

So it really needs to be the mmput that cleans things up.\

I would really verify the correspondence between what get_arg_page
returns and what gets freed in mmput if it is not too difficult.
I think it should just be a page or two.

Eric

  parent reply	other threads:[~2021-03-03  5:38 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-01  3:28 exec error: BUG: Bad rss-counter Ilya Lipnitskiy
2021-03-01 20:43 ` Eric W. Biederman
2021-03-02  7:59   ` Ilya Lipnitskiy
2021-03-02 18:56     ` Linus Torvalds
2021-03-03  7:07       ` Ilya Lipnitskiy
2021-03-02 19:37     ` Eric W. Biederman [this message]
2021-03-03  7:01       ` Ilya Lipnitskiy
2021-03-03 15:50         ` Eric W. Biederman
2021-03-03 15:55           ` Ilya Lipnitskiy
2021-03-03 16:07             ` Eric W. Biederman
2021-03-20 15:59             ` Zhou Yanjie
2021-03-29  2:48               ` Ilya Lipnitskiy
2021-03-30  4:56                 ` Zhou Yanjie
2021-03-30 16:11                   ` Linus Torvalds
2021-03-30 16:36                     ` Ilya Lipnitskiy
2021-03-30 16:47                       ` Linus Torvalds
2021-03-29  2:46           ` Ilya Lipnitskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m1blc1gxdx.fsf@fess.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=hch@lst.de \
    --cc=ilya.lipnitskiy@gmail.com \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.