linux-arch.vger.kernel.org archive mirror
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Andi Kleen <ak@linux.intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Andy Lutomirski <luto@amacapital.net>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging
Date: Fri, 26 May 2017 16:00:57 +0300
Message-ID: <20170526130057.t7zsynihkdtsepkf@node.shutemov.name>
In-Reply-To: <CA+55aFznnXPDxYy5CN6qVU7QJ3Y9hbSf-s2-w0QkaNJuTspGcQ@mail.gmail.com>

On Thu, May 25, 2017 at 04:24:24PM -0700, Linus Torvalds wrote:
> On Thu, May 25, 2017 at 1:33 PM, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> > Here's my first attempt to bring boot-time switching between 4- and 5-level paging.
> > It looks not too terrible to me. I've expected it to be worse.
> 
> If I read this right, you just made it a global on/off thing.
> 
> May I suggest possibly a different model entirely? Can you make it a
> per-mm flag instead?
> 
> And then we
> 
>  (a) make all kthreads use the 4-level page tables
> 
>  (b) which means that all the init code uses the 4-level page tables
> 
>  (c) which means that all those checks for "start_secondary" etc can
> just go away, because those all run with 4-level page tables.
> 
> Or is it just much too expensive to switch between 4-level and 5-level
> paging at run-time?

Hm..

I don't see how kernel threads can use 4-level paging. It doesn't work
from a virtual memory layout POV. The kernel claims half of the full
virtual address space for itself -- 256 PGD entries, not the one we would
effectively have after switching to 4-level paging. For instance, the
addresses where vmalloc and vmemmap are mapped are not canonical with
4-level paging.
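
The canonical-address point can be sketched in C. This is a minimal illustration, not kernel code: is_canonical(), canonical_examples() and EXAMPLE_5LEVEL_KERNEL_ADDR are hypothetical names, and the example address is picked only because it lies in the upper half of a 57-bit address space, not because it is the actual vmalloc or vmemmap base. The sketch assumes 48 virtual address bits for 4-level paging and 57 for 5-level paging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * An address is canonical for a given virtual-address width when
 * bits 63 down to (va_bits - 1) are all zeros or all ones.
 * va_bits is 48 for 4-level paging and 57 for 5-level paging.
 * Assumes 2 <= va_bits <= 64.
 */
static bool is_canonical(uint64_t addr, unsigned int va_bits)
{
	unsigned int shift = va_bits - 1;
	uint64_t upper = addr >> shift;
	uint64_t all_ones = (1ULL << (64 - shift)) - 1;

	return upper == 0 || upper == all_ones;
}

/* Illustrative address only: upper half of a 57-bit address space. */
#define EXAMPLE_5LEVEL_KERNEL_ADDR 0xff90000000000000ULL

static void canonical_examples(void)
{
	/* Valid while 5-level paging is active ... */
	assert(is_canonical(EXAMPLE_5LEVEL_KERNEL_ADDR, 57));
	/* ... but not canonical once only 48 bits are translated. */
	assert(!is_canonical(EXAMPLE_5LEVEL_KERNEL_ADDR, 48));
	/* Start of the 4-level kernel half is canonical in both modes. */
	assert(is_canonical(0xffff800000000000ULL, 48));
	assert(is_canonical(0xffff800000000000ULL, 57));
}
```

Any kernel mapping placed below 0xffff800000000000 in the 5-level layout fails the 48-bit check, which is why a thread running with 4-level tables could not reach it.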

And you cannot see the whole direct mapping of physical memory. Back to
highmem? (Please, no, please).

We could possibly reduce the number of PGD entries required by the kernel.
Currently, the layout for 5-level paging allows up to 55-bit physical
memory. That's redundant, as the SDM claims that we will never get more
than 52. So we could reduce the size of the kernel part of the layout by a
few bits, but definitely not to 1.
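
The arithmetic behind "a few bits, but not to 1" can be sketched as follows. This is a rough illustration only: pgd_entries_for() and pgd_examples() are hypothetical helpers, and the sketch counts only the top-level entries needed to span a direct mapping of the given size, assuming each 5-level PGD entry covers 2^48 bytes. Other kernel regions (vmalloc, vmemmap and so on) need entries of their own.

```c
#include <assert.h>
#include <stdint.h>

/* With 5-level paging each top-level (PGD) entry maps 2^48 bytes. */
#define PGDIR_SHIFT_5LEVEL 48

/*
 * Number of PGD entries needed to span a direct mapping of
 * 2^phys_bits bytes. Assumes phys_bits < 64 + PGDIR_SHIFT_5LEVEL.
 */
static uint64_t pgd_entries_for(unsigned int phys_bits)
{
	if (phys_bits <= PGDIR_SHIFT_5LEVEL)
		return 1;
	return 1ULL << (phys_bits - PGDIR_SHIFT_5LEVEL);
}

static void pgd_examples(void)
{
	/* Direct mapping sized for 55-bit physical memory. */
	assert(pgd_entries_for(55) == 128);
	/* Direct mapping sized for the 52-bit architectural limit. */
	assert(pgd_entries_for(52) == 16);
	/* The entire 48-bit (4-level) address space fits in one entry. */
	assert(pgd_entries_for(48) == 1);
}
```

Shrinking the direct mapping from 55 to 52 bits saves three bits of address space, yet it still spans 16 top-level entries, far more than the single entry that 4-level paging would leave available.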

I don't see how it can possibly work.

Besides the difficulties of getting the switching between paging modes
correct, which Andy mentioned, it would also hurt performance. You cannot
switch between paging modes directly. It would require disabling paging
completely. That means we lose the benefit of global page table entries on
every such switch. More page walks.

Even ignoring all of the above, I don't see much benefit in per-mm
switching. It adds complexity without much gain -- saving a few lines of
logic during early boot doesn't look like a huge win to me.

-- 
 Kirill A. Shutemov

