From: Eric Dumazet <eric.dumazet@gmail.com>
To: Michal Hocko <mhocko@kernel.org>, Andy Lutomirski <luto@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	LKML <linux-kernel@vger.kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Linux-MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>
Subject: Re: Use higher-order pages in vmalloc
Date: Thu, 01 Mar 2018 10:16:48 -0800
Message-ID: <1519928208.11375.3.camel@gmail.com>
In-Reply-To: <20180223121300.GU30681@dhcp22.suse.cz>

On Fri, 2018-02-23 at 13:13 +0100, Michal Hocko wrote:
> On Thu 22-02-18 19:01:35, Andy Lutomirski wrote:
> > On Thu, Feb 22, 2018 at 1:36 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > > On Thu 22-02-18 04:22:54, Matthew Wilcox wrote:
> > > > On Thu, Feb 22, 2018 at 07:59:43AM +0100, Michal Hocko wrote:
> > > > > On Wed 21-02-18 09:01:29, Matthew Wilcox wrote:
> > > > > > Right.  It helps with fragmentation if we can keep higher-order
> > > > > > allocations together.
> > > > > 
> > > > > > Hmm, wouldn't it help if we made vmalloc pages migratable instead? That
> > > > > > would help compaction and get us to lower fragmentation in the long term
> > > > > > without playing tricks in the allocation path.
> > > > 
> > > > I was wondering about that possibility.  If we want to migrate a page
> > > > then we have to shoot down the PTE across all CPUs, copy the data to the
> > > > new page, and insert the new PTE.  Copying 4kB doesn't take long; if you
> > > > have 12GB/s (current example on Wikipedia: dual-channel memory and one
> > > > DDR2-800 module per channel gives a theoretical bandwidth of 12.8GB/s)
> > > > > then we should be able to copy a page in 666ns.  So there's no problem
> > > > holding a spinlock for it.
> > > > 
> > > > But we can't handle a fault in vmalloc space today.  It's handled in
> > > > > arch-specific code, see vmalloc_fault() in arch/x86/mm/fault.c.
> > > > If we're going to do this, it'll have to be something arches opt into
> > > > because I'm not taking on the job of fixing every architecture!
> > > 
> > > yes.
> > 
> > On x86, if you shoot down the PTE for the current stack, you're dead.
> > vmalloc_fault() might not even be called.  Instead we hit
> > do_double_fault(), and the manual warns extremely strongly against
> > trying to recover, and, in this case, I agree with the SDM.  If you
> > actually want this to work, there needs to be a special IPI broadcast
> > to the task in question (with appropriate synchronization) that calls
> > magic arch code that does the switcheroo.
> 
> Why can't we use the PTE swap entry trick for vmalloc migration as well?
> I haven't explored this path at all, to be honest.
> 
> > Didn't someone (Christoph?) have a patch to teach the page allocator
> > to give high-order allocations if available and otherwise fall back to
> > low order?
> 
> Do you mean kvmalloc?


I sent something last year but had not finished the patch series :/

https://marc.info/?l=linux-kernel&m=148233423610544&w=2

