public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Andi Kleen <andi@firstfloor.org>, Ingo Molnar <mingo@elte.hu>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Steven Rostedt <rostedt@rostedt.homelinux.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Christoph Hellwig <hch@lst.de>, Li Zefan <lizf@cn.fujitsu.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Johannes Berg <johannes.berg@intel.com>,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	Arnaldo Carvalho de Melo <acme@infradead.org>,
	Tom Zanussi <tzanussi@gmail.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Jeremy Fitzhardinge <jeremy@goop.org>,
	"Frank Ch. Eigler" <fche@redhat.com>,
	Tejun Heo <htejun@gmail.com>
Subject: Re: [patch 1/2] x86_64 page fault NMI-safe
Date: Thu, 15 Jul 2010 16:11:22 +0200	[thread overview]
Message-ID: <20100715141118.GA6417@nowhere> (raw)
In-Reply-To: <AANLkTik-sJL_pAJg38Hzpl_jpn2v90t7tppYqOGmu6p-@mail.gmail.com>

On Wed, Jul 14, 2010 at 03:56:43PM -0700, Linus Torvalds wrote:
> On Wed, Jul 14, 2010 at 3:31 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> >
> > Until now I didn't because I clearly misunderstand the vmalloc internals. I'm
> > not even quite sure why a memory allocated with vmalloc sometimes can be not
> > mapped (and then fault once for this to sync). Some people have tried to explain
> > me but the picture is still vague to me.
> 
> So the issue is that the system can have thousands and thousands of
> page tables (one for each process), and what do you do when you add a
> new kernel virtual mapping?
> 
> You can:
> 
>  - make sure that you only ever use _one_ single top-level entry for
> all vmalloc issues, and can make sure that all processes are created
> with that static entry filled in. This is optimal, but it just doesn't
> work on all architectures (eg on 32-bit x86, it would limit the
> vmalloc space to 4MB in non-PAE, whatever)


But then, even if you ensure that, don't we need to also fill lower level
entries for the new mapping.

Also, why is this a worry for vmalloc but not for kmalloc? Don't we also
risk to add a new memory mapping for new memory allocated with kmalloc?



>  - at vmalloc time, when adding a new page directory entry, walk all
> the tens of thousands of existing page tables under a lock that
> guarantees that we don't add any new ones (ie it will lock out fork())
> and add the required pgd entry to them.
> 
>  - or just take the fault and do the "fill the page tables" on demand.
> 
> Quite frankly, most of the time it's probably better to make that last
> choice (unless your hardware makes it easy to make the first choice,
> which is obviously simplest for everybody). It makes it _much_ cheaper
> to do vmalloc. It also avoids that nasty latency issue. And it's just
> simpler too, and has no interesting locking issues with how/when you
> expose the page tables in fork() etc.
> 
> So the only downside is that you do end up taking a fault in the
> (rare) case where you have a newly created task that didn't get an
> even newer vmalloc entry.


But then how did the previous tasks get this new mapping? You said
we don't walk through every process page tables for vmalloc.

I would understand this race if we were to walk on every processes page
tables and add the new mapping on them, but we missed one new task that
forked or so, because we didn't lock (or just rcu).



> And that fault can sometimes be in an
> interrupt or an NMI. Normally it's trivial to handle that fairly
> simple nested fault. But NMI has that inconvenient "iret unblocks
> NMI's, because there is no dedicated 'nmiret' instruction" problem on
> x86.


Yeah.


So the parts of the problem I don't understand are:

- why don't we have this problem with kmalloc() ?
- did I understand well the race that makes the fault necessary,
  ie: we walk the tasklist lockless, add the new mapping if
  not present, but we might miss a task lately forked, but
  the fault will fix that.

Thanks.


  parent reply	other threads:[~2010-07-15 14:11 UTC|newest]

Thread overview: 163+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-14 15:49 [patch 0/2] x86: NMI-safe trap handlers Mathieu Desnoyers
2010-07-14 15:49 ` [patch 1/2] x86_64 page fault NMI-safe Mathieu Desnoyers
2010-07-14 16:28   ` Linus Torvalds
2010-07-14 17:06     ` Mathieu Desnoyers
2010-07-14 18:10       ` Linus Torvalds
2010-07-14 18:46         ` Ingo Molnar
2010-07-14 19:14           ` Linus Torvalds
2010-07-14 19:36             ` Frederic Weisbecker
2010-07-14 19:54               ` Linus Torvalds
2010-07-14 20:17                 ` Mathieu Desnoyers
2010-07-14 20:55                   ` Linus Torvalds
2010-07-14 21:18                     ` Ingo Molnar
2010-07-14 22:14                 ` Frederic Weisbecker
2010-07-14 22:31                   ` Mathieu Desnoyers
2010-07-14 22:48                     ` Frederic Weisbecker
2010-07-14 23:11                       ` Mathieu Desnoyers
2010-07-14 23:38                         ` Frederic Weisbecker
2010-07-15 16:26                           ` Mathieu Desnoyers
2010-08-03 17:18                             ` Peter Zijlstra
2010-08-03 18:25                               ` Mathieu Desnoyers
2010-08-04  6:46                                 ` Peter Zijlstra
2010-08-04  7:14                                   ` Ingo Molnar
2010-08-04 14:45                                   ` Mathieu Desnoyers
2010-08-04 14:56                                     ` Peter Zijlstra
2010-08-06  1:49                                       ` Mathieu Desnoyers
2010-08-06  9:51                                         ` Peter Zijlstra
2010-08-06 13:46                                           ` Mathieu Desnoyers
2010-08-06  6:18                                       ` Masami Hiramatsu
2010-08-06  9:50                                         ` Peter Zijlstra
2010-08-06 13:37                                           ` Mathieu Desnoyers
2010-08-07  9:51                                           ` Masami Hiramatsu
2010-08-09 16:53                                           ` Frederic Weisbecker
2010-08-03 18:56                               ` Linus Torvalds
2010-08-03 19:45                                 ` Mathieu Desnoyers
2010-08-03 20:02                                   ` Linus Torvalds
2010-08-03 20:10                                     ` Ingo Molnar
2010-08-03 20:21                                       ` Ingo Molnar
2010-08-03 21:16                                         ` Mathieu Desnoyers
2010-08-03 20:54                                     ` Mathieu Desnoyers
2010-08-04  6:27                                 ` Peter Zijlstra
2010-08-04 14:06                                   ` Mathieu Desnoyers
2010-08-04 14:50                                     ` Peter Zijlstra
2010-08-06  1:42                                       ` Mathieu Desnoyers
2010-08-06 10:11                                         ` Peter Zijlstra
2010-08-06 11:14                                           ` Peter Zijlstra
2010-08-06 14:15                                             ` Mathieu Desnoyers
2010-08-06 14:13                                           ` Mathieu Desnoyers
2010-08-11 14:44                                             ` Steven Rostedt
2010-08-11 14:34                                   ` Steven Rostedt
2010-08-15 13:35                                     ` Mathieu Desnoyers
2010-08-15 16:33                                     ` Avi Kivity
2010-08-15 16:44                                       ` Mathieu Desnoyers
2010-08-15 16:51                                         ` Avi Kivity
2010-08-15 18:31                                           ` Mathieu Desnoyers
2010-08-16 10:49                                             ` Avi Kivity
2010-08-16 11:29                                             ` Avi Kivity
2010-08-04  6:46                                 ` Dave Chinner
2010-08-04  7:21                                   ` Ingo Molnar
2010-07-14 23:40                         ` Steven Rostedt
2010-07-14 19:41             ` Linus Torvalds
2010-07-14 19:56               ` Andi Kleen
2010-07-14 20:05                 ` Mathieu Desnoyers
2010-07-14 20:07                   ` Andi Kleen
2010-07-14 20:08                     ` H. Peter Anvin
2010-07-14 23:32                       ` Tejun Heo
2010-07-14 22:31                   ` Frederic Weisbecker
2010-07-14 22:56                     ` Linus Torvalds
2010-07-14 23:09                       ` Andi Kleen
2010-07-14 23:22                         ` Linus Torvalds
2010-07-15 14:11                       ` Frederic Weisbecker [this message]
2010-07-15 14:35                         ` Andi Kleen
2010-07-16 11:21                           ` Frederic Weisbecker
2010-07-15 14:46                         ` Steven Rostedt
2010-07-16 10:47                           ` Frederic Weisbecker
2010-07-16 11:43                             ` Steven Rostedt
2010-07-15 14:51                         ` Linus Torvalds
2010-07-15 15:38                           ` Linus Torvalds
2010-07-16 12:00                           ` Frederic Weisbecker
2010-07-16 12:54                             ` Steven Rostedt
2010-07-14 20:39         ` Mathieu Desnoyers
2010-07-14 21:23           ` Linus Torvalds
2010-07-14 21:45             ` Maciej W. Rozycki
2010-07-14 21:52               ` Linus Torvalds
2010-07-14 22:31                 ` Maciej W. Rozycki
2010-07-14 22:21             ` Mathieu Desnoyers
2010-07-14 22:37               ` Linus Torvalds
2010-07-14 22:51                 ` Jeremy Fitzhardinge
2010-07-14 23:02                   ` Linus Torvalds
2010-07-14 23:54                     ` Jeremy Fitzhardinge
2010-07-15  1:23                 ` Linus Torvalds
2010-07-15  1:45                   ` Linus Torvalds
2010-07-15 18:31                     ` Mathieu Desnoyers
2010-07-15 18:43                       ` Linus Torvalds
2010-07-15 18:48                         ` Linus Torvalds
2010-07-15 22:01                           ` Mathieu Desnoyers
2010-07-15 22:16                             ` Linus Torvalds
2010-07-15 22:24                               ` H. Peter Anvin
2010-07-15 22:26                               ` Linus Torvalds
2010-07-15 22:46                                 ` H. Peter Anvin
2010-07-15 22:58                                 ` Andi Kleen
2010-07-15 23:20                                   ` H. Peter Anvin
2010-07-15 23:23                                     ` Linus Torvalds
2010-07-15 23:41                                       ` H. Peter Anvin
2010-07-15 23:44                                         ` Linus Torvalds
2010-07-15 23:46                                           ` H. Peter Anvin
2010-07-15 23:48                                       ` Andi Kleen
2010-07-15 22:30                               ` Mathieu Desnoyers
2010-07-16 19:13                             ` Mathieu Desnoyers
2010-07-15 16:44                   ` Mathieu Desnoyers
2010-07-15 16:49                     ` Linus Torvalds
2010-07-15 17:38                       ` Mathieu Desnoyers
2010-07-15 20:44                         ` H. Peter Anvin
2010-07-18 11:03                   ` Avi Kivity
2010-07-18 17:36                     ` Linus Torvalds
2010-07-18 18:04                       ` Avi Kivity
2010-07-18 18:22                         ` Linus Torvalds
2010-07-19  7:32                           ` Avi Kivity
2010-07-18 18:17                       ` Linus Torvalds
2010-07-18 18:43                         ` Steven Rostedt
2010-07-18 19:26                           ` Linus Torvalds
2010-07-14 15:49 ` [patch 2/2] x86 NMI-safe INT3 and Page Fault Mathieu Desnoyers
2010-07-14 16:42   ` Maciej W. Rozycki
2010-07-14 18:12     ` Mathieu Desnoyers
2010-07-14 19:21       ` Maciej W. Rozycki
2010-07-14 19:58         ` Mathieu Desnoyers
2010-07-14 20:36           ` Maciej W. Rozycki
2010-07-16 12:28   ` Avi Kivity
2010-07-16 14:49     ` Mathieu Desnoyers
2010-07-16 15:34       ` Andi Kleen
2010-07-16 15:40         ` Mathieu Desnoyers
2010-07-16 16:47       ` Avi Kivity
2010-07-16 16:58         ` Mathieu Desnoyers
2010-07-16 17:54           ` Avi Kivity
2010-07-16 18:05             ` H. Peter Anvin
2010-07-16 18:15               ` Avi Kivity
2010-07-16 18:17                 ` H. Peter Anvin
2010-07-16 18:28                   ` Avi Kivity
2010-07-16 18:37                     ` Linus Torvalds
2010-07-16 19:26                       ` Avi Kivity
2010-07-16 21:39                         ` Linus Torvalds
2010-07-16 22:07                           ` Andi Kleen
2010-07-16 22:26                             ` Linus Torvalds
2010-07-16 22:41                               ` Andi Kleen
2010-07-17  1:15                                 ` Linus Torvalds
2010-07-16 22:40                             ` Mathieu Desnoyers
2010-07-18  9:23                           ` Avi Kivity
2010-07-16 18:22                 ` Mathieu Desnoyers
2010-07-16 18:32                   ` Avi Kivity
2010-07-16 19:29                     ` H. Peter Anvin
2010-07-16 19:39                       ` Avi Kivity
2010-07-16 19:32                     ` Andi Kleen
2010-07-16 18:25                 ` Linus Torvalds
2010-07-16 19:30                   ` Andi Kleen
2010-07-18  9:26                     ` Avi Kivity
2010-07-16 19:28               ` Andi Kleen
2010-07-16 19:32                 ` Avi Kivity
2010-07-16 19:34                   ` Andi Kleen
2010-08-04  9:46               ` Peter Zijlstra
2010-08-04 20:23                 ` H. Peter Anvin
2010-07-14 17:06 ` [patch 0/2] x86: NMI-safe trap handlers Andi Kleen
2010-07-14 17:08   ` Mathieu Desnoyers
2010-07-14 18:56     ` Andi Kleen
2010-07-14 23:29       ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100715141118.GA6417@nowhere \
    --to=fweisbec@gmail.com \
    --cc=acme@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=fche@redhat.com \
    --cc=hch@lst.de \
    --cc=hpa@zytor.com \
    --cc=htejun@gmail.com \
    --cc=jeremy@goop.org \
    --cc=johannes.berg@intel.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=masami.hiramatsu.pt@hitachi.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=rostedt@rostedt.homelinux.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=tzanussi@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox