From: Ingo Molnar <mingo@kernel.org>
To: Song Liu <songliubraving@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
the arch/x86 maintainers <x86@kernel.org>
Subject: Re: [GIT pull] x86/pti for 5.4-rc1
Date: Wed, 25 Sep 2019 08:23:23 +0200
Message-ID: <20190925062323.GA65860@gmail.com>
In-Reply-To: <C6FC577A-A589-46FD-92FE-5C441BDB922D@fb.com>
* Song Liu <songliubraving@fb.com> wrote:
>
>
> > On Sep 17, 2019, at 4:35 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >
> > On Tue, Sep 17, 2019 at 4:29 PM Song Liu <songliubraving@fb.com> wrote:
> >>
> >> How about we just do:
> >>
> >> diff --git i/arch/x86/mm/pti.c w/arch/x86/mm/pti.c
> >> index b196524759ec..0437f65250db 100644
> >> --- i/arch/x86/mm/pti.c
> >> +++ w/arch/x86/mm/pti.c
> >> @@ -341,6 +341,7 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
> >> }
> >>
> >> if (pmd_large(*pmd) || level == PTI_CLONE_PMD) {
> >> + WARN_ON_ONCE(addr & ~PMD_MASK);
> >> target_pmd = pti_user_pagetable_walk_pmd(addr);
> >> if (WARN_ON(!target_pmd))
> >> return;
> >>
> >> So it is a "warn and continue" check just for unaligned PMD address.
> >
> > The problem there is that the "continue" part can be wrong.
> >
> > Admittedly it requires a pretty crazy setup: you first hit a
> > pmd_large() entry, but the *next* pmd is regular, so you start doing
> > the per-page cloning.
> >
> > And that per-page cloning will be wrong, because it will start in the
> > middle of the next pmd, because addr wasn't aligned, and the previous
> > pmd-only clone did
> >
> > addr += PMD_SIZE;
> >
> > to go to the next case.
> >
> > See?
>
> I see. This is tricky.
>
> Maybe we should skip clone of the first unaligned large pmd?
>
> diff --git i/arch/x86/mm/pti.c w/arch/x86/mm/pti.c
> index 7f2140414440..1dfa69f8196b 100644
> --- i/arch/x86/mm/pti.c
> +++ w/arch/x86/mm/pti.c
> @@ -343,6 +343,11 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
> }
>
> if (pmd_large(*pmd) || level == PTI_CLONE_PMD) {
> + if (WARN_ON_ONCE(addr & ~PMD_MASK)) {
> + addr = round_up(addr, PMD_SIZE);
> + continue;
> + }
> +
> target_pmd = pti_user_pagetable_walk_pmd(addr);
> if (WARN_ON(!target_pmd))
> return;
No, we should do a proper iteration of the page table structures.
> Or we can round_down the addr and copy the whole PMD properly:
>
> diff --git i/arch/x86/mm/pti.c w/arch/x86/mm/pti.c
> index 7f2140414440..bee9881f2e85 100644
> --- i/arch/x86/mm/pti.c
> +++ w/arch/x86/mm/pti.c
> @@ -343,6 +343,9 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
> }
>
> if (pmd_large(*pmd) || level == PTI_CLONE_PMD) {
> + if (WARN_ON_ONCE(addr & ~PMD_MASK))
> + addr &= PMD_MASK;
> +
> target_pmd = pti_user_pagetable_walk_pmd(addr);
> if (WARN_ON(!target_pmd))
> return;
>
> I think the latter is better, but I am not sure.
While this works, I believe it's the wrong iterator pattern.
In this function we iterate by passing in a 'random' [start,end) virtual
memory address range with no particular alignment assumptions, then look
up all pagetable entries covered by that range.
The iteration's principle is straightforward: we look up the first
address (byte granular) then continue iterating according to the observed
structure of the kernel pagetables, by skipping the range we have just
looked up:
 - If the current PUD is not mapped, then we set 'addr' to the first byte
   after the virtual memory range represented by the current PUD entry:

        addr = round_up(addr + 1, PUD_SIZE);

 - If the current PMD is not mapped, then the next byte is:

        addr = round_up(addr + 1, PMD_SIZE);

The part that Linus correctly pointed out is still iterating incorrectly,
and might not be robust, is:

        addr += PMD_SIZE;

This is buggy because it doesn't step to the next byte after the current
mapped PMD, but potentially somewhere into the middle of the next
PMD-sized range of virtual memory (which might or might not be covered by
a PMD entry). The iterations after that might be similarly offset and
buggy as well.
The right fix is to *fix the address iterator*, to use the basic
principle of the function, with the same general calculation pattern
we use in the other cases:

        addr = round_down(addr, PMD_SIZE) + PMD_SIZE;

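To make the difference between the two stepping expressions concrete,
here is a small userspace sketch (not kernel code). It assumes a 2 MiB
PMD_SIZE, as on x86-64 with 4 KiB base pages, and uses a local stand-in
for round_down(). The names SKETCH_PMD_SIZE, sketch_round_down(),
step_buggy() and step_fixed() are invented for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2 MiB PMD mappings (x86-64 with 4 KiB base pages). */
#define SKETCH_PMD_SIZE (UINT64_C(1) << 21)

/*
 * Userspace stand-in for the kernel's round_down().
 * 'size' must be a non-zero power of two.
 */
static uint64_t sketch_round_down(uint64_t addr, uint64_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return addr & ~(size - 1);
}

/* Current stepping: carries any misalignment of 'addr' forward. */
static uint64_t step_buggy(uint64_t addr)
{
	return addr + SKETCH_PMD_SIZE;
}

/* Proposed stepping: first byte of the next PMD-sized range. */
static uint64_t step_fixed(uint64_t addr)
{
	return sketch_round_down(addr, SKETCH_PMD_SIZE) + SKETCH_PMD_SIZE;
}
```

For an address one page into a PMD range, say 0x201000, step_buggy()
gives 0x401000 (still one page into the next range), while step_fixed()
gives 0x400000. For an already aligned address the two agree.
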
BTW, I'd also suggest using this new round_down() pattern in the other
two cases as well:

        addr = round_down(addr, PUD_SIZE) + PUD_SIZE;
        ...
        addr = round_down(addr, PMD_SIZE) + PMD_SIZE;

Why? Because this:

        addr = round_up(addr + 1, PUD_SIZE);

Will iterate incorrectly if 'addr' (which is byte granular) is the last
*byte* of a PUD range, it will incorrectly skip the next PUD range...

Is a page-unaligned address likely to be passed in to this function? With
the current users I really hope it won't happen, but it costs nothing to
use clean iterators and think through all cases - it also makes the code
more readable.
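
The scenario Linus described (an unaligned start inside a large PMD,
followed by a regular PMD) can be walked through with a toy model. This
is only a userspace sketch under stated assumptions: the "page table" is
a flat array with one entry per 2 MiB range, and every toy_* / TOY_*
name is hypothetical. It is not how pti_clone_pgtable() is implemented,
it only mirrors the address stepping:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE (UINT64_C(1) << 12)	/* assumed 4 KiB pages */
#define TOY_PMD_SIZE  (UINT64_C(1) << 21)	/* assumed 2 MiB PMDs  */

/* One entry per PMD-sized range: unmapped, large page, or page table. */
enum toy_pmd { TOY_NONE, TOY_LARGE, TOY_TABLE };

struct toy_result {
	unsigned large_cloned;	/* PMD entries cloned as a whole */
	unsigned pages_cloned;	/* 4 KiB pages cloned one by one */
};

static struct toy_result toy_clone(const enum toy_pmd *pmds, size_t n,
				   uint64_t start, uint64_t end,
				   bool fixed_step)
{
	struct toy_result r = { 0, 0 };
	uint64_t addr = start;

	assert(pmds != NULL);

	while (addr < end) {
		size_t idx = (size_t)(addr / TOY_PMD_SIZE);
		/* First byte after the PMD range that contains 'addr'. */
		uint64_t next_pmd = (addr & ~(TOY_PMD_SIZE - 1)) + TOY_PMD_SIZE;

		if (idx >= n)
			break;

		switch (pmds[idx]) {
		case TOY_NONE:
			addr = next_pmd;
			break;
		case TOY_LARGE:
			r.large_cloned++;
			/* The only step that differs between old and new. */
			addr = fixed_step ? next_pmd : addr + TOY_PMD_SIZE;
			break;
		case TOY_TABLE:
			r.pages_cloned++;
			addr += TOY_PAGE_SIZE;
			break;
		}
	}
	return r;
}
```

With pmds = { TOY_LARGE, TOY_TABLE }, start = 0x1000 and
end = 2 * TOY_PMD_SIZE, the fixed stepping clones the large entry and
then all 512 pages of the second range. The old stepping resumes at
0x201000 and clones only 511 pages, silently missing the page at
0x200000.
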
Three random nits about the pti_clone_pgtable() function:

 - Could we please also fix all WARN()'s in that function to be
   WARN_ONCE()? Any warning from that function is probably fatal to the
   bootup anyway, and it doesn't help if we potentially spam many
   warnings.

 - Please add a comment explaining why the 'BUG();' case is
   unrecoverable and needs us to crash the kernel.

 - Please add a comment about what the 'level' parameter does. It's
   non-obvious.

Thanks,
Ingo