All of lore.kernel.org
 help / color / mirror / Atom feed
From: Masami Hiramatsu <mhiramat@redhat.com>
To: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
	Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@linux-foundation.org>,
	Nick Piggin <npiggin@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Andi Kleen <andi@firstfloor.org>,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Arjan van de Ven <arjan@infradead.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Steven Rostedt <srostedt@redhat.com>
Subject: [RFC][PATCH] x86: make text_poke() atomic
Date: Mon, 02 Mar 2009 12:01:29 -0500	[thread overview]
Message-ID: <49AC10E9.1090102@redhat.com> (raw)
In-Reply-To: <49A853CD.3020607@redhat.com>

Masami Hiramatsu wrote:
> Mathieu Desnoyers wrote:
>> * Masami Hiramatsu (mhiramat@redhat.com) wrote:
>>> Mathieu Desnoyers wrote:
>>>> * Masami Hiramatsu (mhiramat@redhat.com) wrote:
>>>>> Steven Rostedt wrote:
>>>>>> On Mon, 23 Feb 2009, Mathieu Desnoyers wrote:
>>>>>>>> Hmm, lets see. I simply set a bit in the PTE mappings. There's not many, 
>>>>>>>> since a lot are 2M pages, for x86_64. Call stop_machine, and now I can 
>>>>>>>> modify 1 or 20,000 locations. Set the PTE bit back. Note, the changing of 
>>>>>>>> the bits are only done when CONFIG_DEBUG_RODATA is set.
>>>>>>>>
>>>>>>>> text_poke requires allocating a page. Map the page into memory. Set up a 
>>>>>>>> break point.
>>>>>>> text_poke does not _require_ a break point. text_poke can work with
>>>>>>> stop_machine.
>>>>>> It can? Doesn't text_poke require allocating pages? The code called by 
>>>>>> stop_machine is all atomic. vmap does not give an option to allocate with 
>>>>>> GFP_ATOMIC.
>>>>> Hi,
>>>>>
>>>>> With my patch, text_poke() never allocate pages any more :)
>>>>>
>>>>> BTW, IMHO, both of your methods are useful and have trade-off.
>>>>>
>>>>> ftrace wants to change massive amount of code at once. If we do
>>>>> that with text_poke(), we have to map/unmap pages each time and
>>>>> it will take a long time -- might be longer than one stop_machine_run().
>>>>>
>>>>> On the other hand, text_poke() user like as kprobes and tracepoints,
>>>>> just want to change a few amount of code at once, and it will be
>>>>> added/removed incrementally. If we do that with stop_machine_run(),
>>>>> we'll be annoyed by frequent machine stops.(Moreover, kprobes uses
>>>>> breakpoint, so it doesn't need stop_machine_run())
>>>>>
>>>> Hi Masami,
>>>>
>>>> Is this text_poke version executable in atomic context ? If yes, then
>>>> that would be good to add a comment saying it. Please see below for
>>>> comments.
>>> Thank you for comments!
>>> I think it could be. ah, spin_lock might be changed to spin_lock_irqsave()...
>>>
>> You are right. If we plan to execute this in both atomic and non-atomic
>> context, spin_lock_irqsave would make sure we are always busy-looping
>> with interrupts off.
> 
> Oops, when I tested spin_lock_irqsave, it caused warnings.
> 
> ------------[ cut here ]------------
> WARNING: at /home/mhiramat/ksrc/linux-2.6/kernel/smp.c:329 smp_call_function_man
> y+0x37/0x1c9()
> Hardware name: Precision WorkStation T5400
> Modules linked in:
> Pid: 1, comm: swapper Tainted: G        W  2.6.29-rc6 #16
> Call Trace:
>  [<c042f7b1>] warn_slowpath+0x71/0xa8
>  [<c044dccb>] ? trace_hardirqs_on_caller+0x18/0x145
>  [<c06dc42f>] ? _spin_unlock_irq+0x22/0x2f
>  [<c044efc9>] ? print_lock_contention_bug+0x14/0xd7
>  [<c044efc9>] ? print_lock_contention_bug+0x14/0xd7
>  [<c044cfbb>] ? trace_hardirqs_off_caller+0x18/0xa3
>  [<c045383b>] smp_call_function_many+0x37/0x1c9
>  [<c04138fc>] ? do_flush_tlb_all+0x0/0x3c
>  [<c04138fc>] ? do_flush_tlb_all+0x0/0x3c
>  [<c04539e9>] smp_call_function+0x1c/0x23
>  [<c0433ee1>] on_each_cpu+0xf/0x3a
>  [<c04138c6>] flush_tlb_all+0x14/0x16
>  [<c04946f7>] unmap_kernel_range+0xf/0x11
>  [<c06dd78a>] text_poke+0xf1/0x12c
> 
> unmap_kernel_range() requires irq enabled, but spin_lock_irqsave
> (and stop_machine too)disables irq. so we have to solve this issue.
> 
> I have some ideas:
> 
> - export(just remove static) vunmap_page_range() and don't use
>   flush_tlb_all().
>  Because this vm_area are not used by other cpus, we don't need
>  to flush TLB of all cpus.
> 
> - make unmap_kernel_range_local() function.
> 
> - extend kmap_atomic_prot() to map lowmem page when the 'prot'
>   is different.
> 

I updated my patch based on the first idea.
I also checked that text_poke() can be called from stop_machine()
with this patch.
---

Use map_vm_area() instead of vmap() in text_poke() for avoiding page allocation
and delayed unmapping, and call vunmap_page_range() and local_flush_tlb()
directly because this mapping is temporary and local.

At the result of above change, text_poke() becomes atomic and can be called
from stop_machine() etc.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Nick Piggin <npiggin@suse.de>
---
  arch/x86/include/asm/alternative.h |    1 +
  arch/x86/kernel/alternative.c      |   36 +++++++++++++++++++++++++++++-------
  include/linux/vmalloc.h            |    1 +
  init/main.c                        |    3 +++
  mm/vmalloc.c                       |    2 +-
  5 files changed, 35 insertions(+), 8 deletions(-)

Index: linux-2.6/arch/x86/include/asm/alternative.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/alternative.h
+++ linux-2.6/arch/x86/include/asm/alternative.h
@@ -177,6 +177,7 @@ extern void add_nops(void *insns, unsign
   * The _early version expects the memory to already be RW.
   */

+extern void text_poke_init(void);
  extern void *text_poke(void *addr, const void *opcode, size_t len);
  extern void *text_poke_early(void *addr, const void *opcode, size_t len);

Index: linux-2.6/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/alternative.c
+++ linux-2.6/arch/x86/kernel/alternative.c
@@ -12,6 +12,7 @@
  #include <asm/nmi.h>
  #include <asm/vsyscall.h>
  #include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
  #include <asm/io.h>

  #define MAX_PATCH_LEN (255-1)
@@ -485,6 +486,16 @@ void *text_poke_early(void *addr, const
  	return addr;
  }

+static struct vm_struct *text_poke_area[2];
+static DEFINE_SPINLOCK(text_poke_lock);
+
+void __init text_poke_init(void)
+{
+	text_poke_area[0] = get_vm_area(PAGE_SIZE, VM_ALLOC);
+	text_poke_area[1] = get_vm_area(2 * PAGE_SIZE, VM_ALLOC);
+	BUG_ON(!text_poke_area[0] || !text_poke_area[1]);
+}
+
  /**
   * text_poke - Update instructions on a live kernel
   * @addr: address to modify
@@ -501,8 +512,9 @@ void *__kprobes text_poke(void *addr, co
  	unsigned long flags;
  	char *vaddr;
  	int nr_pages = 2;
-	struct page *pages[2];
-	int i;
+	struct page *pages[2], **pgp = pages;
+	int i, ret;
+	struct vm_struct *vma;

  	if (!core_kernel_text((unsigned long)addr)) {
  		pages[0] = vmalloc_to_page(addr);
@@ -515,12 +527,22 @@ void *__kprobes text_poke(void *addr, co
  	BUG_ON(!pages[0]);
  	if (!pages[1])
  		nr_pages = 1;
-	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
-	BUG_ON(!vaddr);
-	local_irq_save(flags);
+	spin_lock_irqsave(&text_poke_lock, flags);
+	vma = text_poke_area[nr_pages-1];
+	ret = map_vm_area(vma, PAGE_KERNEL, &pgp);
+	BUG_ON(ret);
+	vaddr = vma->addr;
  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
-	local_irq_restore(flags);
-	vunmap(vaddr);
+	/* Ported from unmap_kernel_range() */
+	flush_cache_vunmap((unsigned long)vma->addr, (unsigned long)vma->size);
+	vunmap_page_range((unsigned long)vma->addr,
+			  (unsigned long)vma->addr + (unsigned long)vma->size);
+	/*
+	 * Since this mapping is temporary, local and protected by spinlock,
+	 * we just need to flush TLB on local processor.
+	 */
+	local_flush_tlb();
+	spin_unlock_irqrestore(&text_poke_lock, flags);
  	sync_core();
  	/* Could also do a CLFLUSH here to speed up CPU recovery; but
  	   that causes hangs on some VIA CPUs. */
Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c
+++ linux-2.6/init/main.c
@@ -676,6 +676,9 @@ asmlinkage void __init start_kernel(void
  	taskstats_init_early();
  	delayacct_init();

+#ifdef CONFIG_X86
+	text_poke_init();
+#endif
  	check_bugs();

  	acpi_early_init(); /* before LAPIC and SMP init */
Index: linux-2.6/mm/vmalloc.c
===================================================================
--- linux-2.6.orig/mm/vmalloc.c
+++ linux-2.6/mm/vmalloc.c
@@ -71,7 +71,7 @@ static void vunmap_pud_range(pgd_t *pgd,
  	} while (pud++, addr = next, addr != end);
  }

-static void vunmap_page_range(unsigned long addr, unsigned long end)
+void vunmap_page_range(unsigned long addr, unsigned long end)
  {
  	pgd_t *pgd;
  	unsigned long next;
Index: linux-2.6/include/linux/vmalloc.h
===================================================================
--- linux-2.6.orig/include/linux/vmalloc.h
+++ linux-2.6/include/linux/vmalloc.h
@@ -96,6 +96,7 @@ extern struct vm_struct *remove_vm_area(
  extern int map_vm_area(struct vm_struct *area, pgprot_t prot,
  			struct page ***pages);
  extern void unmap_kernel_range(unsigned long addr, unsigned long size);
+extern void vunmap_page_range(unsigned long addr, unsigned long end);

  /* Allocate/destroy a 'vmalloc' VM area. */
  extern struct vm_struct *alloc_vm_area(size_t size);

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@redhat.com



  reply	other threads:[~2009-03-02 17:02 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-20  1:13 [git pull] changes for tip, and a nasty x86 page table bug Steven Rostedt
2009-02-20  1:13 ` [PATCH 1/6] x86: check PMD in spurious_fault handler Steven Rostedt
2009-02-20  1:13 ` [PATCH 2/6] x86: keep pmd rw bit set when creating 4K level pages Steven Rostedt
2009-02-20  1:13 ` [PATCH 3/6] ftrace: allow archs to preform pre and post process for code modification Steven Rostedt
2009-02-20  1:13 ` [PATCH 4/6] ftrace, x86: make kernel text writable only for conversions Steven Rostedt
2009-02-20  1:32   ` Andrew Morton
2009-02-20  1:44     ` Steven Rostedt
2009-02-20  2:05       ` [PATCH][git pull] update to tip/tracing/ftrace Steven Rostedt
2009-02-22 17:50   ` [PATCH 4/6] ftrace, x86: make kernel text writable only for conversions Andi Kleen
2009-02-22 22:53     ` Steven Rostedt
2009-02-23  0:29       ` Andi Kleen
2009-02-23  2:33       ` Mathieu Desnoyers
2009-02-23  4:29         ` Steven Rostedt
2009-02-23  4:53           ` Mathieu Desnoyers
2009-02-23 14:48             ` Steven Rostedt
2009-02-23 15:42               ` Mathieu Desnoyers
2009-02-23 15:51                 ` Steven Rostedt
2009-02-23 15:55                   ` Steven Rostedt
2009-02-23 16:13                   ` Mathieu Desnoyers
2009-02-23 16:48                     ` Steven Rostedt
2009-02-23 17:31                       ` Mathieu Desnoyers
2009-02-23 18:17                         ` Steven Rostedt
2009-02-23 18:34                           ` Mathieu Desnoyers
2009-02-27 17:52                           ` Masami Hiramatsu
2009-02-27 18:07                             ` Mathieu Desnoyers
2009-02-27 18:34                               ` Masami Hiramatsu
2009-02-27 18:53                                 ` Mathieu Desnoyers
2009-02-27 20:57                                   ` Masami Hiramatsu
2009-03-02 17:01                                     ` Masami Hiramatsu [this message]
2009-03-02 17:19                                       ` [RFC][PATCH] x86: make text_poke() atomic Mathieu Desnoyers
2009-03-02 22:15                                         ` Masami Hiramatsu
2009-03-02 22:22                                           ` Ingo Molnar
2009-03-02 22:55                                             ` Masami Hiramatsu
2009-03-02 23:09                                               ` Ingo Molnar
2009-03-02 23:38                                                 ` Masami Hiramatsu
2009-03-02 23:49                                                   ` Ingo Molnar
2009-03-03  0:00                                                     ` Mathieu Desnoyers
2009-03-03  0:00                                                     ` [PATCH] Text Edit Lock - Architecture Independent Code Mathieu Desnoyers
2009-03-03  0:32                                                       ` Ingo Molnar
2009-03-03  0:39                                                         ` Mathieu Desnoyers
2009-03-03  1:30                                                         ` [PATCH] Text Edit Lock - Architecture Independent Code (v2) Mathieu Desnoyers
2009-03-03  1:31                                                         ` [PATCH] Text Edit Lock - kprobes architecture independent support (v2) Mathieu Desnoyers
2009-03-03  9:27                                                           ` Ingo Molnar
2009-03-03 12:06                                                             ` Ananth N Mavinakayanahalli
2009-03-03 14:28                                                               ` Mathieu Desnoyers
2009-03-03 14:33                                                               ` [PATCH] Text Edit Lock - kprobes architecture independent support (v3) Mathieu Desnoyers
2009-03-03 14:53                                                               ` [PATCH] Text Edit Lock - kprobes architecture independent support (v2) Ingo Molnar
2009-03-03  0:01                                                     ` [PATCH] Text Edit Lock - kprobes architecture independent support Mathieu Desnoyers
2009-03-03  0:10                                                       ` Masami Hiramatsu
2009-03-03  0:05                                                     ` [RFC][PATCH] x86: make text_poke() atomic Masami Hiramatsu
2009-03-03  0:22                                                       ` Ingo Molnar
2009-03-03  0:31                                                         ` Masami Hiramatsu
2009-03-03 16:31                                                           ` [PATCH] x86: make text_poke() atomic using fixmap Masami Hiramatsu
2009-03-03 17:08                                                             ` Mathieu Desnoyers
2009-03-05 10:38                                                             ` Ingo Molnar
2009-03-06 14:06                                                               ` Ingo Molnar
2009-03-06 14:49                                                                 ` Masami Hiramatsu
2009-03-02 18:28                                       ` [RFC][PATCH] x86: make text_poke() atomic Arjan van de Ven
2009-03-02 18:36                                         ` Mathieu Desnoyers
2009-03-02 18:55                                           ` Arjan van de Ven
2009-03-02 19:13                                             ` Masami Hiramatsu
2009-03-02 19:23                                               ` H. Peter Anvin
2009-03-02 19:47                                             ` Mathieu Desnoyers
2009-03-02 18:42                                         ` Linus Torvalds
2009-03-03  4:54                                       ` Nick Piggin
2009-02-23 18:23                         ` [PATCH 4/6] ftrace, x86: make kernel text writable only for conversions Steven Rostedt
2009-02-23  9:02         ` Ingo Molnar
2009-02-27 21:08     ` Pavel Machek
2009-02-28 16:56       ` Andi Kleen
2009-02-28 22:08         ` Pavel Machek
     [not found]           ` <87wsba1a9f.fsf@basil.nowhere.org>
2009-02-28 22:19             ` Pavel Machek
2009-02-28 23:52               ` Andi Kleen
2009-02-20  1:13 ` [PATCH 5/6] ftrace: immediately stop code modification if failure is detected Steven Rostedt
2009-02-20  1:13 ` [PATCH 6/6] ftrace: break out modify loop immediately on detection of error Steven Rostedt
2009-02-20  2:00 ` [git pull] changes for tip, and a nasty x86 page table bug Linus Torvalds
2009-02-20  2:08   ` Steven Rostedt
2009-02-20  3:44     ` Linus Torvalds
2009-02-20  4:00       ` Steven Rostedt
2009-02-20  4:17         ` Linus Torvalds
2009-02-20  4:34           ` Steven Rostedt
2009-02-20  5:02           ` Huang Ying
2009-02-20  7:29       ` [PATCH] x86: use the right protections for split-up pagetables Ingo Molnar
2009-02-20  7:39         ` [PATCH, v2] " Ingo Molnar
2009-02-20  8:02           ` Ingo Molnar
2009-02-20 10:24             ` Ingo Molnar
2009-02-20 13:57         ` [PATCH] " Steven Rostedt
2009-02-20 15:40         ` Linus Torvalds
2009-02-20 16:59           ` Ingo Molnar
2009-02-20 18:33           ` H. Peter Anvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49AC10E9.1090102@redhat.com \
    --to=mhiramat@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=arjan@infradead.org \
    --cc=fweisbec@gmail.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@polymtl.ca \
    --cc=mingo@elte.hu \
    --cc=npiggin@suse.de \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=rusty@rustcorp.com.au \
    --cc=srostedt@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.