From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Andi Kleen <andi@firstfloor.org>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Arjan van de Ven <arjan@infradead.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Steven Rostedt <srostedt@redhat.com>
Subject: Re: [PATCH 4/6] ftrace, x86: make kernel text writable only for conversions
Date: Fri, 27 Feb 2009 13:53:16 -0500
Message-ID: <20090227185316.GA19811@Krystal>
In-Reply-To: <49A83237.40604@redhat.com>

* Masami Hiramatsu (mhiramat@redhat.com) wrote:
> Mathieu Desnoyers wrote:
> > * Masami Hiramatsu (mhiramat@redhat.com) wrote:
> >> Steven Rostedt wrote:
> >>> On Mon, 23 Feb 2009, Mathieu Desnoyers wrote:
> >>>>> Hmm, let's see. I simply set a bit in the PTE mappings. There aren't many,
> >>>>> since a lot are 2M pages, for x86_64. Call stop_machine, and now I can
> >>>>> modify 1 or 20,000 locations. Set the PTE bit back. Note, the changing of
> >>>>> the bits is only done when CONFIG_DEBUG_RODATA is set.
> >>>>>
> >>>>> text_poke requires allocating a page. Map the page into memory. Set up a 
> >>>>> break point.
> >>>> text_poke does not _require_ a break point. text_poke can work with
> >>>> stop_machine.
> >>> It can? Doesn't text_poke require allocating pages? The code called by 
> >>> stop_machine is all atomic. vmap does not give an option to allocate with 
> >>> GFP_ATOMIC.
> >> Hi,
> >>
> >> With my patch, text_poke() never allocates pages any more :)
> >>
> >> BTW, IMHO, both of your methods are useful and have trade-offs.
> >>
> >> ftrace wants to change a massive amount of code at once. If we do
> >> that with text_poke(), we have to map/unmap pages each time and
> >> it will take a long time -- might be longer than one stop_machine_run().
> >>
> >> On the other hand, text_poke() users such as kprobes and tracepoints
> >> just want to change a small amount of code at once, and it will be
> >> added/removed incrementally. If we do that with stop_machine_run(),
> >> we'll be annoyed by frequent machine stops. (Moreover, kprobes uses
> >> a breakpoint, so it doesn't need stop_machine_run().)
> >>
> > 
> > Hi Masami,
> > 
> > Is this text_poke version executable in atomic context ? If yes, then
> > that would be good to add a comment saying it. Please see below for
> > comments.
> 
> Thank you for comments!
> I think it could be. ah, spin_lock might be changed to spin_lock_irqsave()...
> 

You are right. If we plan to execute this in both atomic and non-atomic
context, spin_lock_irqsave() would make sure we always busy-loop with
interrupts off.

Taking the same spinlock in _both_ interrupts-on and interrupts-off
contexts leads to higher interrupt latencies: an interrupts-off waiter
can end up spinning while the lock holder, running with interrupts on,
is delayed by an interrupt handler.


> >> Thank you,
> >>
> > [...]
> >> Use map_vm_area() instead of vmap() in text_poke() for avoiding page allocation
> >> and delayed unmapping.
> >>
> >> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
> >> ---
> >>  arch/x86/include/asm/alternative.h |    1 +
> >>  arch/x86/kernel/alternative.c      |   25 ++++++++++++++++++++-----
> >>  init/main.c                        |    3 +++
> >>  3 files changed, 24 insertions(+), 5 deletions(-)
> >>
> >> Index: linux-2.6/arch/x86/include/asm/alternative.h
> >> ===================================================================
> >> --- linux-2.6.orig/arch/x86/include/asm/alternative.h
> >> +++ linux-2.6/arch/x86/include/asm/alternative.h
> >> @@ -177,6 +177,7 @@ extern void add_nops(void *insns, unsign
> >>   * The _early version expects the memory to already be RW.
> >>   */
> >>  
> >> +extern void text_poke_init(void);
> >>  extern void *text_poke(void *addr, const void *opcode, size_t len);
> >>  extern void *text_poke_early(void *addr, const void *opcode, size_t len);
> >>  
> >> Index: linux-2.6/arch/x86/kernel/alternative.c
> >> ===================================================================
> >> --- linux-2.6.orig/arch/x86/kernel/alternative.c
> >> +++ linux-2.6/arch/x86/kernel/alternative.c
> >> @@ -485,6 +485,16 @@ void *text_poke_early(void *addr, const 
> >>  	return addr;
> >>  }
> >>  
> >> +static struct vm_struct *text_poke_area[2];
> >> +static DEFINE_SPINLOCK(text_poke_lock);
> >> +
> >> +void __init text_poke_init(void)
> >> +{
> >> +	text_poke_area[0] = get_vm_area(PAGE_SIZE, VM_ALLOC);
> >> +	text_poke_area[1] = get_vm_area(2 * PAGE_SIZE, VM_ALLOC);
> > 
> > Why is this text_poke_area[1] 2 * PAGE_SIZE in size ? I would have
> > thought that text_poke_area[0] would be PAGE_SIZE, text_poke_area[1]
> > also be PAGE_SIZE, and that the sum of both would be 2 * PAGE_SIZE..
> 
> Unfortunately, the current map_vm_area() maps the whole size of the vm_area,
> which means you can't use a 2-page vm_area to map just 1 page...
> (or maybe we can set pages[1] = pages[0] when the 2nd page doesn't exist)
> 

OK, given we sometimes have to map only a single page (e.g. at the end
of a text section), we really need both 1-page and 2-page mappings. So I
think your solution is good.

> 
> >> +	BUG_ON(!text_poke_area[0] || !text_poke_area[1]);
> >> +}
> >> +
> >>  /**
> >>   * text_poke - Update instructions on a live kernel
> >>   * @addr: address to modify
> >> @@ -501,8 +511,9 @@ void *__kprobes text_poke(void *addr, co
> >>  	unsigned long flags;
> >>  	char *vaddr;
> >>  	int nr_pages = 2;
> >> -	struct page *pages[2];
> >> -	int i;
> >> +	struct page *pages[2], **pgp = pages;
> > 
> > Hrm, why do you need **pgp ? Could you simply pass &pages to map_vm_area ?
> 
> As you know, pages is just the address (value) of an array, so you can't
> get the address of that address... (pages and &pages have the same value.)
> 
>         int array[2];
>         printf("%p, %p", (void *)array, (void *)&array);
> 
> please try it :)
> 
> And actually, map_vm_area() requires the address of a pointer.

Ah yes, thanks for the explanation.

Once spin_lock is changed to spin_lock_irqsave, I think this patch would
be good to merge. Steve could then use text_poke within stop_machine if
he likes.

Mathieu

> ---
> int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages)
> {
>         unsigned long addr = (unsigned long)area->addr;
>         unsigned long end = addr + area->size - PAGE_SIZE;
>         int err;
> 
>         err = vmap_page_range(addr, end, prot, *pages);
>         if (err > 0) {
>                 *pages += err;
>                 ^^^^^^^^^^^^^^ Here, it tries to add err(=number of mapped pages)
>                                to the pages pointer!
>                 err = 0;
>         }
> 
>         return err;
> }
> ---
> 
> 
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> >> +	int i, ret;
> >> +	struct vm_struct *vma;
> >>  
> >>  	if (!core_kernel_text((unsigned long)addr)) {
> >>  		pages[0] = vmalloc_to_page(addr);
> >> @@ -515,12 +526,16 @@ void *__kprobes text_poke(void *addr, co
> >>  	BUG_ON(!pages[0]);
> >>  	if (!pages[1])
> >>  		nr_pages = 1;
> >> -	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> >> -	BUG_ON(!vaddr);
> >> +	spin_lock(&text_poke_lock);
> >> +	vma = text_poke_area[nr_pages-1];
> >> +	ret = map_vm_area(vma, PAGE_KERNEL, &pgp);
> >> +	BUG_ON(ret);
> >> +	vaddr = vma->addr;
> >>  	local_irq_save(flags);
> >>  	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> >>  	local_irq_restore(flags);
> >> -	vunmap(vaddr);
> >> +	unmap_kernel_range((unsigned long)vma->addr, (unsigned long)vma->size);
> >> +	spin_unlock(&text_poke_lock);
> >>  	sync_core();
> >>  	/* Could also do a CLFLUSH here to speed up CPU recovery; but
> >>  	   that causes hangs on some VIA CPUs. */
> >> @@ -528,3 +543,4 @@ void *__kprobes text_poke(void *addr, co
> >>  		BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
> >>  	return addr;
> >>  }
> >> +EXPORT_SYMBOL_GPL(text_poke);
> >> Index: linux-2.6/init/main.c
> >> ===================================================================
> >> --- linux-2.6.orig/init/main.c
> >> +++ linux-2.6/init/main.c
> >> @@ -676,6 +676,9 @@ asmlinkage void __init start_kernel(void
> >>  	taskstats_init_early();
> >>  	delayacct_init();
> >>  
> >> +#ifdef CONFIG_X86
> >> +	text_poke_init();
> >> +#endif
> >>  	check_bugs();
> >>  
> >>  	acpi_early_init(); /* before LAPIC and SMP init */
> > 
> > 
> 
> -- 
> Masami Hiramatsu
> 
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
> 
> e-mail: mhiramat@redhat.com
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
